Mitigation of unavailable rtpproxy - sr-users

6 Nov 2013


Hello,
(Sorry for cross-posting to -users and -dev; I'm not sure which list
this post belongs on.)
A few days ago, I ran into an issue with a Kamailio server becoming
somewhat unresponsive under moderate call volume because its rtpproxy
was down, and that was the only rtpproxy in the set.  This is classic
rtpproxy, not ngcp-mediaproxy-ng.
Rtpproxy was not actually engaged on any of the initial INVITEs going 
through the server;  the server is configured to invoke it conditionally 
based on a setting, and the setting was not set for any endpoints. 
rtpproxy_manage() was never called.
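
A minimal sketch of what such conditional engagement might look like.
The $avp(proxy_media) flag and the ";proxy_media=yes" value are
assumptions for illustration, not the actual configuration:

```cfg
# Initial INVITE only (no To-tag yet).
if (is_method("INVITE") && !has_totag()) {
    record_route();
    # Hypothetical per-endpoint setting; how it gets loaded is
    # deployment-specific and not shown here.
    if ($avp(proxy_media) == 1) {
        # Engage rtpproxy and mark the dialog in the Record-Route
        # header so in-dialog requests can tell it was used.
        if (rtpproxy_manage()) {
            add_rr_param(";proxy_media=yes");
        }
    }
}
```

With the flag unset for every endpoint, the inner branch is never
taken and rtpproxy is not contacted for the INVITE.
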
However, I call unforce_rtp_proxy() unconditionally in my config when 
handling CANCELs, reasoning that it can't do any harm if 
rtpproxy_manage() was not called before[1].
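
Roughly, the CANCEL handling described above could be sketched as
follows.  The surrounding structure is an assumption modelled on the
stock Kamailio configuration, not the exact script in use:

```cfg
if (is_method("CANCEL")) {
    # Only act if the CANCEL matches an existing INVITE transaction.
    if (t_check_trans()) {
        # Called unconditionally, whether or not rtpproxy_manage()
        # was ever invoked for the original INVITE.
        unforce_rtp_proxy();
        t_relay();
    }
    exit;
}
```
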
Nevertheless, this situation appeared to be clogging up SIP worker
threads, because some SIP messages were definitely dropped.  Periodic
log messages about the inability to reach the rtpproxy were emitted as
well.  The problem cleared up almost immediately once the rtpproxy
instance was restored to service.
This raised some questions in my mind about the relationship between 
rtpproxy management and SIP worker thread utilisation.  I assume it was 
my indiscriminate unforce_rtp_proxy() calls that were actually clogging 
up the worker threads, right?  If so, why?  I figured that in the 
unforce_rtp_proxy() case, the rtpproxy module simply sends 
fire-and-forget UDP messages down the UDP control socket without any 
sort of blocking for acknowledgement, since in this case the call must 
be released on the rtpproxy side without doing any rewriting of SDP on 
the Kamailio side (unlike in the case where rtpproxy is engaged).  Thus, 
there should be no need to wait for ports to substitute into the 
message.  Or is the same response-wait mechanism used regardless, even 
in the unforce_rtp_proxy() case, for programmatic reasons?
More broadly, is there any way that this scenario can be prevented?  In 
other words, is there a way to work around an outage of all rtpproxies 
in the set without tying up workers, or at least tying them up less 
severely?
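
For reference, the Kamailio rtpproxy module README documents three
parameters that bound how long a control command waits for a reply and
how long an unreachable proxy is skipped.  The values below are
illustrative assumptions, not recommendations, and whether they fully
prevent the stall described above is not established here:

```cfg
# Seconds to wait for a reply from rtpproxy on each attempt.
modparam("rtpproxy", "rtpproxy_tout", 1)
# How many times to retry after a timeout.
modparam("rtpproxy", "rtpproxy_retr", 2)
# Seconds an unreachable proxy stays disabled before being retried.
modparam("rtpproxy", "rtpproxy_disable_tout", 30)
```
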
Thanks!
-- Alex
[1] Is this a reasonable assumption?
     The reason I do this is that I don't see a way to find out from a
     CANCEL message itself whether rtpproxy was engaged.  I do check
     for a ;proxy_media RR parameter when handling BYEs, but since a
     CANCEL is not an in-dialog request, I'm not sure what to do other
     than call unforce_rtp_proxy()/rtpproxy_manage() indiscriminately,
     short of storing state in htable or other complications I don't
     want.
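
     A sketch of the BYE-side check mentioned in this note, assuming
     the parameter is matched with the rr module's
     check_route_param(); the exact structure is illustrative:

```cfg
# In-dialog request: To-tag present and Route set consumed.
if (has_totag() && loose_route()) {
    # check_route_param() is only meaningful after loose_route().
    if (is_method("BYE") && check_route_param("proxy_media")) {
        # Release the session only when the dialog was marked as
        # having gone through rtpproxy.
        unforce_rtp_proxy();
    }
    t_relay();
    exit;
}
```

     A CANCEL carries no such Route parameter, which is why the same
     test is not available there.
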
-- 
Alex Balashov - Principal
Evariste Systems LLC
235 E Ponce de Leon Ave
Suite 106
Decatur, GA 30030
United States
Tel: +1-678-954-0670
Web: http://www.evaristesys.com/, http://www.alexbalashov.com/