improving the dialog module - sr-dev

12 Mar 2010


Hey all,
I'm currently working on improving the dialog module. As some may know,
the module isn't perfect with regard to how and under which
circumstances end-to-end calls are tracked properly. Although there are
several places in the code where additional work could presumably
raise the module's quality, I'd like to focus the discussion on two major
components that IMHO need rework first and foremost.
Any feedback on the described issues, my proposed solutions, or anything
else I haven't specifically addressed will be greatly appreciated.
Let's start with the problems and follow up with the solutions.

Problems
--------

(1)
When an *unconfirmed* request is received by a proxy multiple times, the
dialog module establishes a new dialog for each reception where
there should be just one. For instance, in our infrastructure, call
forwardings are implemented by means of sending out messages with the
request URI rewritten to a set of local balancers which will re-route
the (slightly modified) request back to our proxies. When this happens,
the dialog module doesn't match the subsequent, still unconfirmed
request to the initial, unconfirmed request, and creates a new dialog.
This erroneous behavior has several consequences, such as distorted
statistics and wasted memory, among others.
(2)
Forking isn't handled at all. Once you fork by, e.g., calling
append_branch() in the configuration script, all branches are mapped
to the same, single dialog. Consequently, the dialog's state machine
is affected by each and every one of these branches, and dialog
tracking depends on the order in which the forked responses are
received. (Consider the case where one branch's 5xx response is observed
first by the proxy, followed by another branch's successful 200. In
this case, the 200 no longer has any positive effect on the dialog state
because the 5xx has already terminated the dialog prematurely.)

Solutions
---------

(1)
The reason why Kamailio keeps instantiating new dialogs when it should
not is that the dlg_onreq() function keeps things too simple. This
function is set up as a callback to tm's TMCB_REQUEST_IN event and only
refrains from its main purpose of creating new dialogs if a dialog for a
given message is already established. Naturally, this does not cover
scenarios where unconfirmed requests are legitimately re-seen by a proxy
as described above.
The solution is to re-use the same basic dialog matching logic that the
dlg_onroute() function has, with one exception: the still-missing
To-tag of an unconfirmed dialog must not be required for
dialog matching to succeed. I have created a patch that implements
this kind of dialog continuation and tested it. The fix was
written for Kamailio 1.5 but should be fairly easy to port to
sip-router. In my opinion, problem (1) should be considered a bug
since it breaks correct dialog handling in certain situations, and
therefore the patch should be included in 1.5 as well. If no one
objects, I will submit it for evaluation and upstream incorporation soon.
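
As a rough illustration of the matching rule described above (this is not
the actual patch), the following sketch matches on Call-ID and From-tag and
only requires the To-tag once the dialog has one. The struct and function
names are hypothetical and do not correspond to the dialog module's real
data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified dialog record; not the module's real struct. */
struct sketch_dlg {
    const char *callid;
    const char *from_tag;
    const char *to_tag;   /* NULL while the dialog is still unconfirmed */
};

/* True only if both strings are present and equal. */
static bool same_str(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Returns true if a request with the given identifiers belongs to dlg. */
bool sketch_dlg_matches(const struct sketch_dlg *dlg, const char *callid,
                        const char *from_tag, const char *to_tag)
{
    if (!same_str(dlg->callid, callid) || !same_str(dlg->from_tag, from_tag))
        return false;

    /* Unconfirmed dialog: there is no To-tag yet, so Call-ID and
     * From-tag alone decide the match. */
    if (dlg->to_tag == NULL)
        return true;

    /* Confirmed dialog: the To-tag has to match as well. */
    return same_str(dlg->to_tag, to_tag);
}
```

With a rule like this, a re-seen unconfirmed request would be matched to the
dialog created on first reception instead of triggering a new one.
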
(2)
Kamailio maps all forked branches to the same dialog because they all
carry the same dialog ID, i.e., Call-ID, From-tag, and To-tag. To tell
them apart, a branch ID must be introduced to the dialog module. One way
to do so would be to pass the branch
index to TMCB_REQUEST_BUILT callbacks during execution of the
t_forward_nonack() function (tm/t_fwd.c). That way, the dialog module
could register for that specific tm event and treat branched,
unconfirmed dialogs accordingly.
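
To make the idea of a branch-aware identifier concrete, here is a minimal
sketch of a lookup key that adds a branch index to Call-ID and From-tag. The
struct, its field names, and the comparison helper are assumptions made for
illustration only; how the branch index actually reaches the dialog module
through tm is left open here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lookup key for an unconfirmed, possibly forked dialog. */
struct sketch_dlg_key {
    const char *callid;
    const char *from_tag;
    int branch;           /* index of the forked branch (assumed input) */
};

/* Two keys are equal only if Call-ID, From-tag and branch index agree. */
bool sketch_key_equal(const struct sketch_dlg_key *a,
                      const struct sketch_dlg_key *b)
{
    return a->branch == b->branch
        && strcmp(a->callid, b->callid) == 0
        && strcmp(a->from_tag, b->from_tag) == 0;
}
```

Two branches of the same forked request would then compare as different
keys even though their Call-ID and From-tag are identical.
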
A follow-up design choice to be made is how to treat branches
once the dialog module is able to tell them apart. Two approaches come to mind:
make the branch ID part of the dialog, thereby treating forked branches
as separate dialogs; or continue mapping branches to the same dialog
but adjust the dialog's state machine such that a forking
dialog is not discarded unless all branches have failed. To me, the first
approach seems more straightforward, as it allows the same state
machine logic to be applied to each branched dialog.
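
The second approach can be sketched as follows: one dialog keeps a small
per-branch state table and is only considered terminated once every branch
has failed. All names, the state enums, and the fixed branch limit are
hypothetical; this is a sketch under those assumptions, not the module's
state machine.

```c
#include <stddef.h>

enum sketch_branch_state { BR_PENDING, BR_FAILED, BR_CONFIRMED };
enum sketch_dlg_state { DLG_UNCONFIRMED, DLG_CONFIRMED, DLG_TERMINATED };

#define SKETCH_MAX_BRANCHES 12   /* arbitrary limit for this sketch */

/* Hypothetical dialog that tracks each forked branch separately. */
struct sketch_forked_dlg {
    enum sketch_branch_state branch[SKETCH_MAX_BRANCHES];
    size_t nbranches;
};

/* Record a final reply on one branch; provisional 1xx replies are ignored. */
void sketch_on_reply(struct sketch_forked_dlg *dlg, size_t idx, int code)
{
    if (idx >= dlg->nbranches || dlg->branch[idx] != BR_PENDING)
        return;
    if (code >= 200 && code < 300)
        dlg->branch[idx] = BR_CONFIRMED;
    else if (code >= 300)
        dlg->branch[idx] = BR_FAILED;
}

/* Derive the dialog state from all branches taken together. */
enum sketch_dlg_state sketch_aggregate_state(const struct sketch_forked_dlg *dlg)
{
    size_t failed = 0;
    for (size_t i = 0; i < dlg->nbranches; i++) {
        if (dlg->branch[i] == BR_CONFIRMED)
            return DLG_CONFIRMED;      /* one successful branch is enough */
        if (dlg->branch[i] == BR_FAILED)
            failed++;
    }
    /* Terminate only when every branch has failed. */
    if (dlg->nbranches > 0 && failed == dlg->nbranches)
        return DLG_TERMINATED;
    return DLG_UNCONFIRMED;
}
```

In the 5xx-then-200 scenario from problem (2), the early 5xx only marks its
own branch as failed, so the later 200 on the other branch can still confirm
the dialog.
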
To me, a proper solution to problem (2) seems more difficult to find,
and I haven't settled on a preferred method myself yet. That's
why I'd be especially glad for any further input on this point.
Finally, please note that I've been digging into Kamailio 1.5 code most
of the time because this is where I need to implement any improvements.
Last time I checked, however, sip-router's implementation seemed to be
pretty much on a par with 1.5's. Naturally, I'd be happy to help get any
improvements upstream in whichever version is feasible.
Thanks and
cheers,
--Timo