Hello,
I have a high-volume installation with 100% RTP relay using rtpengine.
We run rtpengine with a high and low port range of 8000 - 65000. Generally the peak concurrent channel count is around 2000-4000 ports, so in theory, so we should have enough ports for all calls currently in process.
In practice, however, we get a nearly constant stream of these errors from Kamailio:
Feb 2 15:52:58 sip /usr/local/sbin/kamailio[16438]: ERROR: rtpengine [rtpengine.c:1541]: rtpp_function_call(): proxy replied with error: Error rewriting SDP
When I trace this back to rtpengine logs on the server, I see:
---
Feb 2 15:55:47 rtphost rtpengine[14710]: [FSRIGZPJGXZWPNO.52364.907376806] Received command 'offer' from 215.111.222.101:54100 Feb 2 15:55:47 rtphost rtpengine[14710]: [FSRIGZPJGXZWPNO.52364.907376806] Creating new call Feb 2 15:55:47 rtphost rtpengine[14710]: [FSRIGZPJGXZWPNO.52364.907376806] Failed to get 2 consecutive UDP ports for relay Feb 2 15:55:47 rtphost rtpengine[14710]: [FSRIGZPJGXZWPNO.52364.907376806] Error allocating media ports Feb 2 15:55:47 rtphost rtpengine[14710]: [FSRIGZPJGXZWPNO.52364.907376806] Protocol error in packet from 215.111.222.101:54100: Error rewriting SDP [d3:sdp285:v=0#015#012o=- 1631132006999214513 4263823554000404972 IN IP4 45.201.202.203#015#012s=VOS2009#015#012c=IN IP4 45.201.202.203#015#012t=0 0#015#012m=audio 29034 RTP/AVP 18 8 0 101#015#012a=rtpmap:18 G729/8000#015#012a=fmtp:18 annexb=no#015#012a=rtpmap:8 PCMA/8000#015#012a=rtpmap:0 PCMU/8000#015#012a=rtpmap:101 telephone-event/8000#015#012a=fmtp:101 0-15#015#0123:ICE6:remove7:replacel6:origin18:session-connectione7:call-id31:FSRIGZPJGXZWPNO.52364.90737680613:received-froml3:IP415:215.111.222 ... Feb 2 15:55:47 rtphost rtpengine[14710]: [FSRIGZPJGXZWPNO.52364.907376806] ... 5.125e8:from-tag32:709abb985b675e1497597a5bd44ca2d67:command5:offere]
---
This is despite nearly no rtpengine engagement:
[root@csrprtpus1 log]# cat /proc/rtpengine/0/status Refcount: 1 Control PID: 14709 Targets: 0 Buckets: 0
As far as I can see, the cause is that rtpengine is not deallocating ports:
--- [root@rtphost log]# lsof -n | grep -i rtp rtpengine 14710 root *542u IPv6 1082109949 0t0 UDP *:39910 rtpengine 14710 root *543u IPv6 1082109950 0t0 UDP *:39911 rtpengine 14710 root *544u IPv6 1082218768 0t0 UDP *:41469 rtpengine 14710 root *545u IPv6 1082058934 0t0 UDP *:46424 rtpengine 14710 root *546u IPv6 1082366799 0t0 UDP *:22915 rtpengine 14710 root *547u IPv6 1082288397 0t0 UDP *:57245 rtpengine 14710 root *548u IPv6 1082428020 0t0 UDP *:24311 rtpengine 14710 root *549u IPv6 1082366800 0t0 UDP *:23266 rtpengine 14710 root *550u IPv6 1082798144 0t0 UDP *:22743 rtpengine 14710 root *551u IPv6 1082605782 0t0 UDP *:47845 rtpengine 14710 root *552u IPv6 1082062515 0t0 UDP *:52766 rtpengine 14710 root *553u IPv6 1082648709 0t0 UDP *:58266 rtpengine 14710 root *554u IPv6 1082122218 0t0 UDP *:54359 ... ---
There are _lots_ of these:
[root@rtphost log]# lsof -n | grep -i rtpengine | grep 'UDP *:' | wc -l 56964
So, I suppose most of these just aren't being freed up.
These are short-duration calls, and the vast majority of them are CANCEL'd at various stages of completion. My approach to invocation of rtpengine in the config is:
1) rtpengine_offer() in branch_route[] on outbound call:
set_rtpengine_set("1"); rtpengine_offer("replace-origin replace-session-connection ICE=remove"); add_rr_param(";proxy_media=yes"); setflag(PROXY_MEDIA);
This can happen multiple times, as many branches may be attempted before a gateway is found to take the call.
2) Symmetrical rtpengine_answer() in onreply_route:
if(isflagset(PROXY_MEDIA) && has_body("application/sdp") && $var(origin_b2bua) == 0) { set_rtpengine_set("1"); rtpengine_answer("replace-origin replace-session-connection ICE=remove"); }
3) rtpengine_delete() in event of CANCEL:
if(is_method("CANCEL")) { # Kill RTP stream if this call was using rtpengine.
set_rtpengine_set("1"); rtpengine_delete();
# Implicitly forward the CANCEL and drop out of script # if the corresponding INVITE transaction exists.
if(!t_relay_cancel()) { # If we are still here, the INVITE transaction # was found but an error occurred.
sl_send_reply("500", "Internal Server Error"); exit; }
# If we are still here, this is also bad because it means # the corresponding INVITE transaction no longer exists.
xlog("L_INFO", 'action=R-MAIN-REQUEST | ret=CANCEL_NO_TRANS_MATCH');
exit;
4) rtpengine_delete() in case of BYE:
if(has_totag()) { if(loose_route()) { if(is_method("BYE")) { if(check_route_param("proxy_media=yes")) { set_rtpengine_set("1"); rtpengine_delete(); } } else if(is_method("INVITE")) { # update rtpengine_offer() etc. }
... } else { ... } }
Any obvious problem here?
Thanks!
-- Alex
Ah, never mind. This was a stupid question.
In short-duration traffic land, most calls end with all branches failing. I didn't have rtpengine_delete() cleanups in the failure_route[] for that scenario.
The volume of calls is far too large for rtpengine's own RTP timeout ("garbage collector") to release the ports fast enough.
False alarm, and apologies for wasting time.
-- Alex
For sake of information - the rtpengine_manage() function was added to simplify the config logic of using rtpengien_offer()/_answer()/_delete(). The manage function (for both rtpproxy and rtpengine) calls the other functions based on request method and response code. The current config file is using it in a subroute, executed in few other places, no longer under conditions related to method or reply code.
Also, the manage function handles properly when INVITE is without SDP and the ACK is carrying the SDP.
Cheers, Daniel
On 02/02/16 22:38, Alex Balashov wrote:
Ah, never mind. This was a stupid question.
In short-duration traffic land, most calls end with all branches failing. I didn't have rtpengine_delete() cleanups in the failure_route[] for that scenario.
The volume of calls is far too large for rtpengine's own RTP timeout ("garbage collector") to release the ports fast enough.
False alarm, and apologies for wasting time.
-- Alex
Daniel,
I do like rtpengine_manage() philosophically, and agree it's easier to use and handles the case where the SDP offer is in the reply from the UAS.
I used to use it. However, it doesn't behave correctly in certain route script contexts. Here's an example from my own environment:
--- route[XYZ_TRY] { # Lots of stuff that can be executed either with or without # transaction already having been created, e.g. use send_reply(), # never just sl_send_reply() or t_reply(), etc.
t_on_branch("XYZ_BRANCH"); t_on_failure("XYZ_FAILURE");
t_relay(); }
branch_route[XYZ_BRANCH] { rtpengine_manage(); }
failure_route[XYZ_FAILURE] { if(t_is_canceled()) exit;
# Load some more routes maybe, create new # branch.
$rd = "new_host";
route(XYZ_TRY); } ---
This behaves mostly as you'd expect, _except_ in the case of a branch timeout. If there's a _timeout_ per se, e.g. branch #1 host did not respond, then in the subsequent branch #2 attempt, rtpengine_manage() in the branch_route will send a 'delete' command rather than an updated 'offer' command.
Perhaps the root of the problem is that I am calling a request_route from a failure_route, but I don't know how else to recycle the huge corpus of logic that's in XYZ_TRY. I'm open to better suggestions.
Notwithstanding that, there may be other exotic cases as well where rtpengine_manage() doesn't do the right thing. Either way, for this reason I do not use it.
-- Alex
I don't think it behaves as you describe -- I have mainly used rtpproxy so far, but also rtpengine quite many times, and rtpengine variant is based on the rpproxy_manage() -- I quickly looked at the sources right now.
For an INVITE request, the delete command to rtp relay is done only when executed from failure_route. But you execute it from branch_route, which is different top route type and it is sending init or update session, based in intial request or re-invite.
The default configuration file uses rtpproxy_manage() for many stable versions right now. I use the same architecture in all my configs for rtpproxy like in default configuration, where the rtpproxy_manage() ends up being executed from branch_route, failure_route and onreply_route. No issues with serial forking and that is done a lot.
Are you sure it does a delete command when rtpengine_manage() is executed from branch_route following a relay to a new destination from failure_route? Or is just your guess?!? If it does, it's a bug in setting the route block type, but I would think that is very unlikely.
Cheers, Daniel
On 03/02/16 15:14, Alex Balashov wrote:
Daniel,
I do like rtpengine_manage() philosophically, and agree it's easier to use and handles the case where the SDP offer is in the reply from the UAS.
I used to use it. However, it doesn't behave correctly in certain route script contexts. Here's an example from my own environment:
route[XYZ_TRY] { # Lots of stuff that can be executed either with or without # transaction already having been created, e.g. use send_reply(), # never just sl_send_reply() or t_reply(), etc.
t_on_branch("XYZ_BRANCH"); t_on_failure("XYZ_FAILURE");
t_relay(); }
branch_route[XYZ_BRANCH] { rtpengine_manage(); }
failure_route[XYZ_FAILURE] { if(t_is_canceled()) exit;
# Load some more routes maybe, create new # branch.
$rd = "new_host";
route(XYZ_TRY); }
This behaves mostly as you'd expect, _except_ in the case of a branch timeout. If there's a _timeout_ per se, e.g. branch #1 host did not respond, then in the subsequent branch #2 attempt, rtpengine_manage() in the branch_route will send a 'delete' command rather than an updated 'offer' command.
Perhaps the root of the problem is that I am calling a request_route from a failure_route, but I don't know how else to recycle the huge corpus of logic that's in XYZ_TRY. I'm open to better suggestions.
Notwithstanding that, there may be other exotic cases as well where rtpengine_manage() doesn't do the right thing. Either way, for this reason I do not use it.
-- Alex
On 03/02/16 15:14, Alex Balashov wrote:
[...]
Notwithstanding that, there may be other exotic cases as well where rtpengine_manage() doesn't do the right thing.
... forgot about the above remark: when I added rtpproxy_manage() I evaluated all possibilities, so while I am not excluding I overlooked something, the function is quite old by now not to have such case surfaced.
It handles properly the situations when INVITE has or not SDP, ACK has or not SDP, UPDATE has or not SDP, INVITE responses with SDP, as well as failed/cancelled initial INVITEs and the BYE requests.
Practically rtpproxy_manage() is executing the rtpproxy_offer()/_answer()/_delete() based on the request/response type and state of transaction.
If you are aware of a case not properly covered, then report it as a bug on tracker and it will be fixed, the role of rtpproxy_manage() is to simplify the config file, by doing all those different conditions in code.
Cheers, Daniel