Hey all,
I'm currently working on improving the dialog module. As some may know, the module isn't perfect with regard to how and under which circumstances end-to-end calls are tracked properly. Although there are several locations in the code where additional work could presumably raise the module's quality, I'd like to focus discussion on two major components that IMHO require refurbishment first and foremost.
Any feedback on the described issues, my proposed solutions, or anything else I haven't specifically addressed will be greatly appreciated.
Let's start with the problems and follow up with the solutions.
Problems
--------
(1) When an *unconfirmed* request is received by a proxy multiple times, the dialog module establishes a new dialog for each reception where there should be just one. For instance, in our infrastructure, call forwarding is implemented by sending messages with a rewritten request URI to a set of local balancers, which re-route the (slightly modified) request back to our proxies. When this happens, the dialog module doesn't match the subsequent, still unconfirmed request to the initial, unconfirmed request, and creates a new dialog. This erroneous behavior has several consequences, including distorted statistics and wasted memory.
(2) Forking isn't handled at all. Once you fork by, e.g., calling append_branch() in the configuration script, all branches are mapped to the same, single dialog. Consequently, the dialog's state machine is driven by each and every one of these branches, and dialog tracking depends on the order in which the forked responses arrive. (Consider the case where one branch's 5xx response is observed first by the proxy, followed by a successful 200 from another branch. In this case, the 200 no longer affects the dialog state positively since the 5xx already made it terminate prematurely.)
Solutions
---------
(1) The reason why Kamailio keeps instantiating new dialogs when it should not is that the dlg_onreq() function keeps things too simple. This function is set up as a callback to tm's TMCB_REQUEST_IN event and only refrains from its main purpose of creating new dialogs if a dialog for a given message is already established. Naturally, this does not cover scenarios where unconfirmed requests are legitimately re-seen by a proxy as described above.
The solution is to re-use the same basic dialog matching logic that the dlg_onroute() function has, with one particular exception: the To-tag, which is still missing for an unconfirmed dialog, must not be required for dialog matching to succeed. I managed to create a patch that implements this kind of dialog continuation and tested it accordingly. The fix was created for Kamailio 1.5 but should be rather easily portable to sip-router. In my opinion, problem (1) should be considered a bug since it breaks correct dialog tracking in certain situations, and therefore the patch should be included in 1.5 as well. If no one objects, I will submit it for evaluation and upstream incorporation soon.
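To illustrate the matching rule only (this is not the patch itself), here is a minimal Python sketch. The names Dialog, matches() and on_request() are hypothetical and do not exist in the module; the dialog table is a plain list rather than the module's hash table, and the sketch assumes that an unconfirmed dialog is simply one whose To-tag is not yet known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dialog:
    # Hypothetical, simplified stand-in for a dialog record.
    call_id: str
    from_tag: str
    to_tag: Optional[str] = None  # None while the dialog is unconfirmed


def matches(dlg: Dialog, call_id: str, from_tag: str,
            to_tag: Optional[str]) -> bool:
    """Match on Call-ID and From-tag; require the To-tag only once known."""
    if dlg.call_id != call_id or dlg.from_tag != from_tag:
        return False
    if dlg.to_tag is None:
        # Unconfirmed dialog: a missing To-tag must not prevent a match.
        return True
    return dlg.to_tag == to_tag


def on_request(table: List[Dialog], call_id: str, from_tag: str,
               to_tag: Optional[str] = None) -> Dialog:
    """Continue an existing dialog if one matches, else create a new one."""
    for dlg in table:
        if matches(dlg, call_id, from_tag, to_tag):
            return dlg
    dlg = Dialog(call_id, from_tag, to_tag)
    table.append(dlg)
    return dlg


# The same unconfirmed request seen twice maps to a single dialog.
table: List[Dialog] = []
first = on_request(table, "call-1", "from-a")
second = on_request(table, "call-1", "from-a")
```

With this rule, a request that loops back through the balancers continues the existing unconfirmed dialog instead of creating a second one.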
(2) Kamailio maps all forked branches to the same dialog because they all carry the same dialog ID, i.e., Call-ID, From-tag, and To-tag. In order to differentiate, a branch ID must be introduced to the dialog module to tell dialogs apart. One way to do so would be to provide the branch index to TMCB_REQUEST_BUILT callbacks during execution of the t_forward_nonack() function (tm/t_fwd.c). That way, the dialog module could register for that specific tm event and treat branched, unconfirmed dialogs accordingly.
A follow-up design choice to be made is how to treat branches once the dialog module is capable of telling them apart. Two approaches come to mind: make the branch ID part of the dialog, thereby treating forked branches as separate dialogs; or continue mapping branches to the same dialog but adjust the dialog's state machine such that a forking dialog is not discarded unless all branches have failed. To me, the first approach seems more straightforward, as it allows the same state machine logic to be applied to each branched dialog.
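To make the difference between the two approaches concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the State enum, next_state(), per_branch_states() and shared_dialog_state() are hypothetical names, the state machine is reduced to final responses to the initial INVITE, and responses are given as (branch index, status code) pairs.

```python
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    UNCONFIRMED = 1
    CONFIRMED = 2
    TERMINATED = 3


def next_state(state: State, status: int) -> State:
    """Simplified transition of one unconfirmed dialog on a response."""
    if state is not State.UNCONFIRMED:
        return state  # already settled; later responses change nothing here
    if 200 <= status < 300:
        return State.CONFIRMED
    if status >= 300:
        return State.TERMINATED
    return state  # provisional response


def per_branch_states(responses: List[Tuple[int, int]]) -> Dict[int, State]:
    """Approach 1: one dialog (and one state machine) per branch index."""
    states: Dict[int, State] = {}
    for branch, status in responses:
        current = states.get(branch, State.UNCONFIRMED)
        states[branch] = next_state(current, status)
    return states


def shared_dialog_state(responses: List[Tuple[int, int]],
                        branch_count: int) -> State:
    """Approach 2: one dialog, terminated only once all branches failed."""
    failed = set()
    state = State.UNCONFIRMED
    for branch, status in responses:
        if state is not State.UNCONFIRMED:
            break
        if 200 <= status < 300:
            state = State.CONFIRMED
        elif status >= 300:
            failed.add(branch)
            if len(failed) == branch_count:
                state = State.TERMINATED
    return state


# The scenario from problem (2): branch 0 fails first, branch 1 succeeds.
responses = [(0, 503), (1, 200)]
branch_view = per_branch_states(responses)
shared_view = shared_dialog_state(responses, branch_count=2)
```

In this scenario the first approach ends with branch 0 terminated and branch 1 confirmed, while the second ends with the single shared dialog confirmed. In the first approach the per-dialog transition function stays untouched, which is what makes it attractive; the second needs extra bookkeeping of failed branches.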
To me, a proper solution to problem (2) seems more difficult to find, and I haven't settled on a preferred method myself yet. That's why I'd be especially glad for any further input on this point.
Finally, please note that I've been digging into Kamailio 1.5 code most of the time because this is where I need to implement any improvements. Last time I checked, however, sip-router's implementation seemed to be quite on a par with 1.5's. Naturally, I'd be happy to help get any improvements upstream in whichever version allows it.
Thanks and
cheers,
--Timo