[SR-Users] Solutions to missing BYEs, accounting for them

Wed Apr 21 11:21:01 CEST 2010

Hi all,

Please forgive the slightly long post, but if you have anything to 
contribute on this topic, please consider giving it a read as I could 
really use your input.  :-)

Like, I'm sure, many of you running proxy-based service delivery 
platforms of some description, I am faced with the problem of 
trying to account for calls with missing BYEs in a realistic way. 
There is no shortage of mailing list posts over the years on this 
topic.  Inevitably, in a platform with sufficient call volume, with 
some NAT and/or endpoint diversity and other technical causes, there 
will be some calls that are never officially terminated from the point 
of view of a proxy.

The ability of the 'dialog' module to spoof bidirectional BYEs on 
timeout[1] goes a long way toward addressing this problem 
theoretically.  However, there are practical obstacles to relying on 
it solely as a solution, mainly because no single timeout value makes 
an acceptable trade-off.  If the timeout period is set to 
a very low value, users will obviously complain, and in any case, 
depending on the destination, the worst-case scenario for maximum call 
billing may still be far too high.  If the timeout period is set 
high--perhaps something like 5-8 hours--then all calls that fail to 
end in the normal way will be billed some excessively large amount 
that certainly will not sit well with users either.
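
To put rough numbers on that trade-off, here is a small sketch of the 
worst case: the BYE is lost and billing stops only when the dialog 
timeout fires.  The per-minute rate and the 3-minute call are made-up 
figures for illustration, not anything from my platform.

```python
def worst_case_overbilling(timeout_s, true_duration_s, rate_per_min):
    """Amount billed beyond the real call length when the BYE never
    arrives and the dialog is only ended by the dialog timeout."""
    extra_s = max(0, timeout_s - true_duration_s)
    return round(extra_s / 60 * rate_per_min, 2)


# Hypothetical figures: a call that really lasted 3 minutes,
# billed at 0.25 per minute.
for timeout_h in (1, 5, 8):
    extra = worst_case_overbilling(timeout_h * 3600, 180, 0.25)
    print(f"{timeout_h} h timeout -> {extra:.2f} overbilled")
```

Even the 1-hour case overbills a 3-minute call by 57 minutes, which is 
why neither end of the range is comfortable.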

If either the core delivery element of the platform or the user agent 
is tightly controlled by the operator of the proxy from an 
administrative point of view, it is probably possible to rely 
on RTP timeouts or SIP Session Timers (SSTs) on one of the endpoints.

That doesn't create a satisfying resolution for those of us dealing 
with indeterminate call completion scenarios with a great deal of user 
and vendor diversity, though.  For instance, I route to about 15 ITSPs 
and carriers;  I think maybe one of them does 15-minute SSTs, and the 
rest are certainly not going to turn them on just for me, even if 
their SBCs/switches/things have the capability.  The user endpoints 
are mostly Asterisk and do RTP timeout, of course, and in most cases I 
do get the resulting BYE.  However, this discussion is about the 
minute but nontrivial percentage of cases in which I do not get the 
BYE, whether because of NAT statekeeping problems or network 
reachability or whatever underlying causes--in truth, I cannot 
accurately characterise these.
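
For a sense of how tightly an SST bounds a dead call, here is a sketch 
of the timing as I read RFC 4028: the refresher re-INVITEs or UPDATEs 
at half the session interval, and the other side is recommended to 
send BYE at the interval minus min(32 s, interval/3) if no refresh 
arrived.  The function name is my own, and treating that BYE time as 
the upper bound for accounting is an assumption.

```python
def session_timer_deadlines(session_expires_s):
    """Return (refresh_at, bye_at) in seconds after the last refresh,
    following the RFC 4028 recommendations."""
    refresh_at = session_expires_s / 2
    bye_at = session_expires_s - min(32, session_expires_s / 3)
    return refresh_at, bye_at


# The 15-minute interval mentioned above.
refresh_at, bye_at = session_timer_deadlines(900)
print(f"refresh due at {refresh_at:.0f} s, BYE at {bye_at:.0f} s")
```

So with a 15-minute interval, a call whose far end has vanished should 
linger for something under 15 minutes rather than for hours.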

So, it seems to me that from a theoretical point of view, there are 
basically two directions someone in this position can go from here:

1) Inline B2BUA in the signaling path of all calls;

1a) Make it do SSTs; or
1b) Make it relay media, too, and hang up the call (bidirectional BYE) 
on RTP receive timeout;

2) Couple the proxy to an RTP relay and provide some mechanism by 
which the proxy can be made aware, in an asynchronous fashion, that an 
RTP timeout was detected by the relay.

It seems to me from a brief and informal survey of prior mailing list 
literature that #1 is the option usually recommended here.

If #1 is pursued, what is the best tool to use in the 
Kamailio/SIP-Router-oriented ecosystem?  My default instinct would say 
SEMS;  I really like SEMS, and use it a lot for various related chores.

The problem is that the pre-built modules and examples for SEMS mostly 
center on application-level functionality, while low-level 
documentation of its powerful C++ API is a bit impoverished, so this 
would take a lot of work.

Needless to say, I am interested in the option that requires the least 
work but still solves the problem in an elegant way from a technical 
and--dare I say--aesthetic perspective.

For instance, it seems clear from looking at the SEMS-1.1.1 sources 
that SSTs are supported in principle in core/plug-in/session_timer. 
But unless I am missing something, I cannot find anywhere in the 
sources or examples where it is actually used.

So, I suppose one option is to figure out how to make this stuff work 
in SEMS, and make it work.  But for someone who is not attuned to 
the universe of its C++ API, it is a rather formidable chore.  I think 
the same would hold true of making it observe bidirectional RTP timeout.

Turning attention to option #2, I have looked at rtpproxy (my 
preferred default), iptrtpproxy, and mediaproxy modules but have not 
found any evidence that the control protocols Kamailio/SR uses to 
engage them support any notion of backward asynchronous feedback in 
case of RTP timeout.

It would be really nice if one of these stream control protocols were 
augmented to kick back a packet to Kamailio that could be caught in a 
special event_route, like event_route[nathelper:rtp-stream-timeout], 
but that is clearly not the case today.

To be honest, I would not use MediaProxy even if it had this feature, 
because, well, let's be bluntly honest and acknowledge what the more 
politically aware presumably already conjecture: in light of AG 
Projects' zealous OpenSIPS partnership, it's difficult to muster 
confidence in future compatibility of MediaProxy with Kamailio.  The 
module is there, it works, and I'm sure its maintainers are dedicated 
to doing whatever it takes to reverse engineer and keep it working, 
lift patches from OpenSIPS as necessary, etc., but who wants to be on 
the wrong side of the project ecosystem fence?  Not I.

That leaves iptrtpproxy, whose 'switchboard' concept I do not fully 
comprehend due to lack of experience with it, but which holds a 
potentially viable, if slightly kludgy/Rube Goldbergian answer.  Of 
the three RTP proxies, it is the only one that provides a ready means 
of exporting a list of media streams it is currently tracking, 
together with statistics on how many packets have been received, etc. 
It is not inconceivable to cook up an external process that will 
frequently check this 'switchboard', as it were, and incite 
Kamailio/SR to tear down the dialog via MI, much as dlg_bye() would 
from the script, if it appears that the media stream 
has disappeared from either side;  the dialog module helpfully exports 
the MI command dlg_end_dlg.
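
A rough sketch of what the core of such an external process might 
look like follows.  Several things here are assumptions: I do not know 
the actual format in which iptrtpproxy exports its statistics, so the 
snapshots are plain dicts of packet counters; keying them by the 
dialog's (h_entry, h_id) pair presumes some mapping from media stream 
to dialog that would have to be built separately; and the request 
string follows my understanding of the mi_fifo syntax, with a 
hypothetical reply FIFO name.

```python
def stalled_streams(previous, current):
    """Return keys of streams where at least one leg received no new
    packets between two snapshots.

    Each snapshot maps a (h_entry, h_id) key to a tuple of
    (caller_packets, callee_packets) counters.
    """
    stalled = []
    for key, (caller_now, callee_now) in current.items():
        if key not in previous:
            continue  # first sighting, no baseline to compare against
        caller_before, callee_before = previous[key]
        if caller_now == caller_before or callee_now == callee_before:
            stalled.append(key)
    return stalled


def mi_end_dialog_command(h_entry, h_id, reply_fifo="dlg_reply"):
    """Format a dlg_end_dlg request in (assumed) MI FIFO syntax."""
    return f":dlg_end_dlg:{reply_fifo}\n{h_entry}\n{h_id}\n\n"


# Hypothetical snapshots taken one polling interval apart.
before = {(101, 7): (500, 480), (202, 9): (300, 310)}
after = {(101, 7): (750, 730), (202, 9): (300, 460)}

for h_entry, h_id in stalled_streams(before, after):
    print(repr(mi_end_dialog_command(h_entry, h_id)))
```

In practice one would want several consecutive stalled polls before 
ending anything, since a leg on hold or using silence suppression can 
legitimately go quiet for a while.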

Still, this does not seem nearly as parsimonious and reliable a 
solution as simply building some kind of RTP stream leg timeout 
notification into the control socket.  After all, the control socket 
is open persistently, right, not on-demand?  The various RTP proxies 
all seem to have some kind of dead peer detection internally in order 
to have some means of gracefully expiring resources allocated to media 
streams that have gone away, so it would just be a matter of passing a 
control frame up the socket to Kamailio/SR and wiring that to a custom 
event_route or a more static callback in the code.

By the way, I should mention that I am aware of and historically very 
sympathetic to the perspective that this kind of call control is alien 
to the nature of a proxy, and an appropriate job for UAs and not 
proxies at all.  However, we all have to make pragmatic concessions to 
the realities of real-world operation, which I assume is the 
motivation for dialog timeouts, dlg_bye(), and other perversions from 
the point of view of a purist.  :-)

I welcome your thoughts and suggestions about the easiest and most 
technically meritorious approach.

Thanks,

-- Alex

[1] Enabled via $dlg_ctx(timeout_bye) = 1

-- 
Alex Balashov - Principal
Evariste Systems LLC
1170 Peachtree Street
12th Floor, Suite 1200
Atlanta, GA 30309
Tel: +1-678-954-0670
Fax: +1-404-961-1892
Web: http://www.evaristesys.com/