[sr-dev] Solutions to missing BYEs, accounting for them

Wed Apr 21 16:07:19 CEST 2010

Hello Alex,

I share your view that one must allow some "real-world" exceptions, especially when faced with problems that were not addressed in the original concept.
I am not a SIP purist, but after thinking about workarounds similar to the ones you suggest, I believe that the SIP Session Timers you mention (RFC 4028) are the only answer worth working on, even though you say they are not satisfying. They actually do solve some problems I have where timeout detection is necessary. I was forced to disable them completely in my environment because of some interoperability problems, but I do plan to turn them on again soon.
Perhaps it is easier for me to say that because my environment is not as diverse as yours, but we do deal with 10 thousand clients from 5 different vendors. I have seen very diverse behavior, and many times we had to push vendors to implement some feature we needed; nothing worked better than pointing to an RFC instead of implementing workarounds. This is the philosophy I am still trying to keep: get the manufacturers to conform to the RFCs, and then try to turn the timers on again. But I am afraid that may be different in your case.

May I politely ask why the carriers would not switch on the Session Timers "for you"? I don't mean to suggest you haven't tried already; I actually believe you have probably heard a lot of "no", but before we go on with the discussion, I'd really be interested to know their reasoning.

Best regards

Ricardo Keigo de Sales Andrade

Robert Bosch GmbH
 (CI/AFU)
Postfach 30 02 20
70442 Stuttgart
GERMANY
www.bosch.com

Tel. +49(711)811-3607004
Mobil +49(172)1081152
ricardo.andrade at br.bosch.com

Registered office: Stuttgart, Registration court: Amtsgericht Stuttgart, HRB 14000;
Chairman of the Supervisory Board: Hermann Scholl; Managing Directors: Franz Fehrenbach, Siegfried Dais;
Bernd Bohr, Rudolf Colm, Volkmar Denner, Gerhard Kümmel, Wolfgang Malchow, Peter Marks,
Peter Tyroller; Uwe Raschke

-----Original Message-----
Date: Wed, 21 Apr 2010 05:40:03 -0400
From: Alex Balashov <abalashov at evaristesys.com>
Subject: [sr-dev] Solutions to missing BYEs, accounting for them
To: sr-dev <sr-dev at lists.sip-router.org>
Message-ID: <4BCEC7F3.5070506 at evaristesys.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

[Sorry for cross-posting this from sr-users;  after some reflection
upon posting, I got the impression this question may be more
developer-centric than I initially imagined.]

Hi all,

Please forgive the slightly long post, but if you have anything to
contribute on this topic, please consider giving it a read as I could
really use your input.  :-)

As is surely true of many of you running proxy-based service delivery
platforms of some description, I am faced with the problem of
trying to account for calls with missing BYEs in a realistic way.
There is no shortage of mailing list posts over the years on this
topic.  Inevitably, in a platform with sufficient call volume, with
some NAT'd and/or endpoint diversity and other technical causes, there
will be some calls that are never officially terminated from the point
of view of a proxy.

The ability of the 'dialog' module to spoof bidirectional BYEs on
timeout[1] goes a long way toward addressing this problem
theoretically.  However, there are practical obstacles to relying on
it solely as a solution, mainly because there is not an acceptable
timeout value to use as a trade-off.  If the timeout period is set to
a very low value, users will obviously complain, and in any case,
depending on the destination, the worst-case scenario for maximum call
billing may still be far too high.  If the timeout period is set
high--perhaps something like 5-8 hours--then all calls that fail to
end in the normal way will be billed some excessively large amount
that certainly will not sit well with users either.
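
To put rough numbers on that trade-off, here is a minimal sketch of the
extra charge a caller would see when the BYE is lost and the dialog is
only torn down at the timeout. The function name, the 3-minute call and
the per-minute rate are all made up for illustration; they do not come
from any real rate deck.

```python
def worst_case_overbilling(timeout_s: int, actual_duration_s: int,
                           rate_per_min: float) -> float:
    """Extra charge when a call really ended at actual_duration_s but,
    with the BYE lost, is only closed when the dialog timeout fires."""
    extra_s = max(0, timeout_s - actual_duration_s)
    return round(extra_s / 60 * rate_per_min, 2)


if __name__ == "__main__":
    # Hypothetical 3-minute call at a hypothetical rate of 0.02 per minute.
    for hours in (1, 5, 8):
        extra = worst_case_overbilling(hours * 3600, 180, 0.02)
        print(f"{hours}h timeout -> extra charge {extra}")
```

The overcharge grows linearly with the timeout, which is why no single
value is comfortable: a short timeout cuts off legitimate long calls,
while a long one inflates every call whose BYE went missing.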

If either the core delivery element of the platform or the user agent
is tightly controlled by the operator of the proxy from an
administrative point of view, it is indeed probably possible to rely
on RTP timeouts or SIP Session Timers (SSTs) on one of the endpoints.

That doesn't create a satisfying resolution for those of us dealing
with indeterminate call completion scenarios with a great deal of user
and vendor diversity, though.  For instance, I route to about 15 ITSPs
and carriers;  I think maybe one of them does 15-minute SSTs, and the
rest are certainly not going to turn them on just for me, even if
their SBCs/switches/things have the capability.  The user endpoints
are mostly Asterisk and do RTP timeout, of course, and in most cases I
do get the resulting BYE.  However, this discussion is about the
minute but nontrivial percentage of cases in which I do not get the
BYE, whether because of NAT statekeeping problems or network
reachability or whatever underlying causes--in truth, I cannot
accurately characterise these.
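
For reference, the timing that a 15-minute session timer implies can be
sketched as follows. This assumes the recommendations in RFC 4028 as I
read them: the refresher sends its refresh once half the session
interval has elapsed, and the other side sends a BYE shortly before
expiry, with a margin of the smaller of 32 seconds and one third of the
interval. The function and key names are hypothetical.

```python
def session_timer_schedule(session_expires_s: int) -> dict:
    """Offsets in seconds from the last successful refresh, following the
    RFC 4028 recommendations (assumed here, not taken from any UA)."""
    refresh_at = session_expires_s // 2            # refresh at half the interval
    bye_margin = min(32, session_expires_s // 3)   # BYE slightly before expiry
    return {
        "refresh_at_s": refresh_at,
        "bye_at_s": session_expires_s - bye_margin,
    }


if __name__ == "__main__":
    # The 15-minute SST mentioned above.
    print(session_timer_schedule(15 * 60))
```

With a 15-minute interval, a call whose far end has silently vanished
would be torn down within roughly 15 minutes of the last refresh, which
bounds the overbilling far more tightly than a multi-hour dialog timeout.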

So, it seems to me that from a theoretical point of view, there are
basically two directions someone in this position can go from here:

1) Inline B2BUA in the signaling path of all calls;

1a) Make it do SSTs; or
1b) Make it relay media, too, and hang up the call (bidirectional BYE)
on RTP receive timeout;

2) Couple the proxy to an RTP relay and provide some mechanism by
which the proxy can be made aware, in an asynchronous fashion, that an
RTP timeout was detected by the relay.

It seems to me from a brief and informal survey of prior mailing list
literature that #1 is the usually recommended option here.

If #1 is pursued, what is the best tool to use in the
Kamailio/SIP-Router-oriented ecosystem?  My default instinct would say
SEMS;  I really like SEMS, and use it a lot for various related chores.

The problem is that the pre-built modules and examples for SEMS mostly
center on application-level functionality, while low-level
documentation of its powerful C++ API is a bit impoverished, so this
would take a lot of work.

Needless to say, I am interested in the option that requires the least
work but still solves the problem in an elegant way from a technical
and--dare I say--aesthetic perspective.

For instance, it seems clear from looking at the SEMS-1.1.1 sources
that SSTs are supported in principle in core/plug-in/session_timer.
But unless I am missing something, I cannot find anywhere in the
sources or examples where it is actually used.

So, I suppose one option is to figure out how to make this stuff work
in SEMS, and make it work.  But for someone who is not attuned to
the universe of its C++ API, it is a rather formidable chore.  I think
the same would hold true of making it observe bidirectional RTP timeout.

Turning attention to option #2, I have looked at rtpproxy (my
preferred default), iptrtpproxy, and mediaproxy modules but have not
found any evidence that the control protocols Kamailio/SR uses to
engage them support any notion of backward asynchronous feedback in
case of RTP timeout.

It would be really nice if one of these stream control protocols were
augmented to kick back a packet to Kamailio that can be caught in a
special event_route, like event_route[nathelper:rtp-stream-timeout],
but that is clearly not the case today.

To be honest, I would not use MediaProxy even if it had this feature,
because, well, let's be bluntly honest and acknowledge what the more
politically aware presumably already conjecture: in light of AG
Projects' zealous OpenSIPS partnership, it's difficult to muster
confidence in future compatibility of MediaProxy with Kamailio.  The
module is there, it works, and I'm sure its maintainers are dedicated
to doing whatever it takes to reverse engineer and keep it working,
lift patches from OpenSIPS as necessary, etc., but who wants to be on
the wrong side of the project ecosystem fence?  Not I.

That leaves iptrtpproxy, whose 'switchboard' concept I do not fully
comprehend due to lack of experience with it, but which holds a
potentially viable, if slightly kludgy/Rube Goldbergian answer.  Of
the three RTP proxies, it is the only one that provides a ready means
of exporting a list of media streams it is currently tracking,
together with statistics on how many packets have been received, etc.
It is not inconceivable to cook up an external process that will
frequently check this 'switchboard', as it were, and incite
Kamailio/SR to do dlg_bye() via MI if it appears that the media stream
has disappeared from either side;  the dialog module helpfully exports
the MI command dlg_end_dlg.
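
A minimal sketch of what such an external poller might look like. The
snapshot format (stream id mapped to a pair of packet counters) is an
assumption made for illustration, not the actual iptrtpproxy switchboard
output, and end_dialog is a hypothetical callback standing in for
whatever issues dlg_end_dlg over MI.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical snapshot entry: (packets from caller, packets from callee).
Counters = Tuple[int, int]


def stalled_streams(prev: Dict[str, Counters],
                    curr: Dict[str, Counters]) -> List[str]:
    """Return ids of streams whose packet counter did not advance in at
    least one direction between two consecutive snapshots."""
    stalled = []
    for stream_id, now in curr.items():
        before = prev.get(stream_id)
        if before is None:
            continue  # first sighting, nothing to compare against yet
        if now[0] == before[0] or now[1] == before[1]:
            stalled.append(stream_id)
    return stalled


def poll_once(prev: Dict[str, Counters], curr: Dict[str, Counters],
              end_dialog: Callable[[str], None]) -> List[str]:
    """Invoke end_dialog (hypothetical) for every stalled stream."""
    hung = stalled_streams(prev, curr)
    for stream_id in hung:
        end_dialog(stream_id)
    return hung


if __name__ == "__main__":
    before = {"a": (10, 10), "b": (5, 5)}
    after = {"a": (20, 10), "b": (9, 8), "c": (1, 1)}
    poll_once(before, after, lambda sid: print("would end dialog for", sid))
```

A real poller would probably require several consecutive stalled polls
before acting, since call hold or silence suppression can legitimately
stop media in one direction for a while.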

Still, this does not seem nearly as parsimonious and reliable a
solution as simply building some kind of RTP stream leg timeout
notification into the control socket.  After all, the control socket
is open persistently, right, not on-demand?  The various RTP proxies
all seem to have some kind of dead peer detection internally in order
to have some means of gracefully expiring resources allocated to media
streams that have gone away, so it would just be a matter of passing a
control frame up the socket to Kamailio/SR and wiring that to a custom
event_route or a more static callback in the code.
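
To make the idea concrete, here is a rough sketch of relay-side dead
peer detection that pushes a notification instead of waiting to be
polled. Everything in it is hypothetical: the class, the leg labels and
the notify callback, which stands in for writing a control frame to the
socket. None of the existing RTP proxies is claimed to work this way.

```python
from typing import Callable, Dict, List, Tuple

Leg = Tuple[str, str]  # (call id, "caller" or "callee"), hypothetical labels


class RtpIdleWatch:
    """Track the last packet time per leg and notify once when a leg has
    been idle for at least idle_limit_s seconds."""

    def __init__(self, idle_limit_s: float,
                 notify: Callable[[str, str], None]) -> None:
        self.idle_limit_s = idle_limit_s
        self.notify = notify  # stand-in for sending a control frame
        self.last_seen: Dict[Leg, float] = {}

    def packet(self, call_id: str, leg: str, now: float) -> None:
        """Record that an RTP packet arrived on this leg at time now."""
        self.last_seen[(call_id, leg)] = now

    def sweep(self, now: float) -> List[Leg]:
        """Expire idle legs, notifying exactly once per expired leg."""
        expired = [key for key, seen in self.last_seen.items()
                   if now - seen >= self.idle_limit_s]
        for key in expired:
            del self.last_seen[key]
            self.notify(*key)
        return expired


if __name__ == "__main__":
    watch = RtpIdleWatch(30, lambda cid, leg: print("timeout", cid, leg))
    watch.packet("abc", "caller", now=0)
    watch.packet("abc", "callee", now=25)
    watch.sweep(now=31)  # only the caller leg has been idle for 30 seconds
```

On the proxy side, the notification would then be the hook for ending
the dialog, whether through an event_route or a callback in the code.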

By the way, I should mention that I am aware of and historically very
sympathetic to the perspective that this kind of call control is alien
to the nature of a proxy, and an appropriate job for UAs and not
proxies at all.  However, we all have to make pragmatic concessions to
the realities of real-world operation, which I assume is the
motivation for dialog timeouts, dlg_bye(), and other perversions from
the point of view of a purist.  :-)

I welcome your thoughts and suggestions about the easiest and most
technically meritorious approach.

Thanks,

-- Alex

[1] Enabled via $dlg_ctx(timeout_bye) = 1

--
Alex Balashov - Principal
Evariste Systems LLC
1170 Peachtree Street
12th Floor, Suite 1200
Atlanta, GA 30309
Tel: +1-678-954-0670
Fax: +1-404-961-1892
Web: http://www.evaristesys.com/