Hey all,
I'm currently working on improving the dialog module. As some may know, the module isn't perfect with regard to how, and under which circumstances, end-to-end calls are tracked. Although there are several places in the code where additional work could presumably raise the module's quality, I'd like to focus the discussion on two major components that IMHO require refurbishment first and foremost.
Any feedback on the described issues, my proposed solutions, or anything else I haven't specifically addressed will be greatly appreciated.
Let's start with the problems and follow up with the solutions.
Problems
--------
(1) When an *unconfirmed* request is received by a proxy multiple times, the dialog module establishes a new dialog for each reception, whereas there should be just one. For instance, in our infrastructure, call forwarding is implemented by sending out requests with the Request-URI rewritten to point to a set of local balancers, which re-route the (slightly modified) request back to our proxies. When this happens, the dialog module doesn't match the subsequent, still unconfirmed request to the initial one and creates a new dialog. This erroneous behavior has several consequences, such as distorted statistics and wasted memory, among others.
(2) Forking isn't handled at all. Once you fork, e.g., by calling append_branch() in the configuration script, all branches are mapped to the same single dialog. Consequently, the dialog's state machine is driven by every one of these branches, and the outcome of dialog tracking depends on the order in which the forked responses arrive. (Consider the case where the proxy first sees a 5xx response from one branch, followed by a 200 from another branch. The 200 no longer moves the dialog forward because the 5xx has already terminated it prematurely.)
Solutions
---------
(1) Kamailio keeps instantiating new dialogs where it should not because the dlg_onreq() function keeps things too simple. This function is registered as a callback for tm's TMCB_REQUEST_IN event and refrains from its main purpose, creating a new dialog, only if a dialog for the given message has already been established. Naturally, this does not cover scenarios where unconfirmed requests are legitimately seen again by a proxy, as described above.
The solution is to reuse the basic dialog matching logic of the dlg_onroute() function, with one exception: the To-tag, which an unconfirmed dialog does not have yet, must not be required for dialog matching to succeed. I have created a patch that implements this kind of dialog continuation and tested it accordingly. The fix was written for Kamailio 1.5 but should be fairly easy to port to sip-router. In my opinion, problem (1) should be considered a bug since it breaks correct dialog tracking in certain situations, and the patch should therefore be included in 1.5 as well. If no one objects, I will submit it for evaluation and upstream incorporation soon.
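A minimal C sketch of the relaxed matching rule described above. The types and function names (`struct dlg_sketch`, `dlg_match_sketch()`, `on_initial_request()`) are hypothetical and far simpler than Kamailio's actual dialog structures and hash table; the sketch only illustrates the idea that, while a dialog has no To-tag yet, a re-seen initial request is matched on Call-ID and From-tag alone instead of producing a second dialog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified dialog record (not Kamailio's struct dlg_cell). */
enum dlg_state_sketch { SK_UNCONFIRMED, SK_CONFIRMED, SK_DELETED };

struct dlg_sketch {
    const char *callid;
    const char *from_tag;
    const char *to_tag;             /* NULL until a To-tag is known */
    enum dlg_state_sketch state;
};

static bool is_empty(const char *s) { return s == NULL || s[0] == '\0'; }

/* Call-ID and From-tag must always match. The To-tag is only compared once
 * the dialog has one; a dialog without a To-tag matches a request that has
 * none either, so a spiralled initial request is recognized. */
static bool dlg_match_sketch(const struct dlg_sketch *d, const char *callid,
                             const char *from_tag, const char *to_tag)
{
    assert(d != NULL && callid != NULL && from_tag != NULL);
    if (d->state == SK_DELETED)
        return false;
    if (strcmp(d->callid, callid) != 0 || strcmp(d->from_tag, from_tag) != 0)
        return false;
    if (is_empty(d->to_tag))        /* unconfirmed: nothing to compare yet */
        return is_empty(to_tag);
    return !is_empty(to_tag) && strcmp(d->to_tag, to_tag) == 0;
}

/* Hypothetical stand-in for the request-in hook: continue an existing
 * dialog if one matches, otherwise create a new one. Returns its index. */
static size_t on_initial_request(struct dlg_sketch *tbl, size_t *n, size_t cap,
                                 const char *callid, const char *from_tag)
{
    for (size_t i = 0; i < *n; i++)
        if (dlg_match_sketch(&tbl[i], callid, from_tag, NULL))
            return i;               /* request seen again: reuse dialog */
    assert(*n < cap);
    tbl[*n] = (struct dlg_sketch){ callid, from_tag, NULL, SK_UNCONFIRMED };
    return (*n)++;
}
```

Feeding the same initial request through `on_initial_request()` twice yields the same index and a single table entry; only a request with a different Call-ID or From-tag creates a new dialog.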
(2) Kamailio maps all forked branches to the same dialog because they all carry the same dialog ID, i.e., Call-ID, From-tag, and To-tag. To tell the resulting dialogs apart, a branch ID must be introduced to the dialog module. One way to do so would be to pass the branch index to TMCB_REQUEST_BUILT callbacks during execution of the t_forward_nonack() function (tm/t_fwd.c). That way, the dialog module could register for that specific tm event and treat branched, unconfirmed dialogs accordingly.
A follow-up design choice is how to treat branches once the dialog module is capable of distinguishing them. Two approaches come to mind: make the branch ID part of the dialog, thereby treating forked branches as separate dialogs; or continue mapping branches to the same dialog but adjust the dialog's state machine such that a forked dialog is not discarded unless all branches have failed. To me, the first approach seems more straightforward, as it allows applying the same state machine logic to each branched dialog.
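To illustrate the first approach, here is a minimal C sketch in which the branch index is part of the dialog key and each branch runs its own copy of a simple state machine. `struct branch_dlg`, `find_branch_dlg()` and `on_reply()` are hypothetical names; the sketch assumes the branch index has somehow been made available to the dialog module (for example through the TMCB_REQUEST_BUILT idea above) and ignores locking, hashing and timers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum br_state { BR_UNCONFIRMED, BR_EARLY, BR_CONFIRMED, BR_TERMINATED };

/* Hypothetical per-branch dialog: the branch index is part of the key. */
struct branch_dlg {
    const char *callid;
    const char *from_tag;
    int branch;             /* branch index within the transaction */
    enum br_state state;
};

/* Find the dialog for (Call-ID, From-tag, branch); NULL if there is none. */
static struct branch_dlg *find_branch_dlg(struct branch_dlg *tbl, size_t n,
                                          const char *callid,
                                          const char *from_tag, int branch)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].branch == branch && strcmp(tbl[i].callid, callid) == 0 &&
            strcmp(tbl[i].from_tag, from_tag) == 0)
            return &tbl[i];
    return NULL;
}

/* The same state machine, applied independently to each branch. */
static void on_reply(struct branch_dlg *d, int code)
{
    assert(d != NULL && code >= 100 && code <= 699);
    if (d->state == BR_CONFIRMED || d->state == BR_TERMINATED)
        return;                     /* final state already reached */
    if (code < 200) {
        if (code > 100)             /* 100 Trying does not change state */
            d->state = BR_EARLY;
    } else if (code < 300) {
        d->state = BR_CONFIRMED;
    } else {
        d->state = BR_TERMINATED;
    }
}
```

With this layout, a 503 on branch 0 followed by a 200 on branch 1 leaves branch 0 terminated and branch 1 confirmed, so the 5xx-before-200 case no longer kills the successful branch. What the sketch leaves open is when the call as a whole should be considered over, e.g., once every branch has terminated.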
Overall, a proper solution to problem (2) seems more difficult to find, and I haven't settled on a preferred method myself yet. That's why I'd be especially glad for any further input on this point.
Finally, please note that I've been digging into the Kamailio 1.5 code most of the time because that is where I need to implement any improvements. However, the last time I checked, sip-router's implementation seemed to be largely on a par with 1.5's. Naturally, I'd be happy to help get any improvements upstream into either version.
Thanks and
cheers,
--Timo
Hi Timo!
On 12.03.2010 15:22, Timo Reimann wrote:
(2) Kamailio maps all forked branches to the same dialog because they all carry the same dialog ID, i.e., Call-ID, From-tag, and To-tag. In order
Why do they have the same to-tag? Actually the to-tag creates a dialog.
regards klaus
Hey Klaus,
Klaus Darilion wrote:
On 12.03.2010 15:22, Timo Reimann wrote:
(2) Kamailio maps all forked branches to the same dialog because they all carry the same dialog ID, i.e., Call-ID, From-tag, and To-tag. In order
Why do they have the same to-tag? Actually the to-tag creates a dialog.
Yeah my bad, you're completely right. Of course, they do not have any To-tag while still unconfirmed. Still, branches will be mapped to the same dialog.
Thanks for clarifying.
Cheers,
--timo
On 12.03.2010 15:57, Timo Reimann wrote:
Hey Klaus,
Klaus Darilion wrote:
On 12.03.2010 15:22, Timo Reimann wrote:
(2) Kamailio maps all forked branches to the same dialog because they all carry the same dialog ID, i.e., Call-ID, From-tag, and To-tag. In order
Why do they have the same to-tag? Actually the to-tag creates a dialog.
Yeah my bad, you're completely right. Of course, they do not have any To-tag while still unconfirmed. Still, branches will be mapped to the same dialog.
So, maybe if a response is received and there is no dialog with this totag, just create a new dialog. Also somehow group these early dialogs to destroy them in case one of these gets into confirmed state.
regards klaus
Thanks for clarifying.
Cheers,
--timo
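Klaus's suggestion could look roughly like the following C sketch, assuming a hypothetical `struct dlg_group` that holds all early dialogs created by one initial request (same Call-ID and From-tag). A response with an unseen To-tag creates a new early dialog, and a 2xx on one of them terminates its still-early siblings. The names, the fixed capacity `MAX_EARLY` and the 32-byte To-tag buffer are illustrative assumptions, not Kamailio code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_EARLY 8             /* hypothetical fixed capacity */

enum ed_state { ED_EARLY, ED_CONFIRMED, ED_TERMINATED };

struct early_dlg {
    char to_tag[32];
    enum ed_state state;
};

/* All early dialogs created by one request, kept together for cleanup. */
struct dlg_group {
    size_t n;
    struct early_dlg d[MAX_EARLY];
};

static struct early_dlg *group_find(struct dlg_group *g, const char *to_tag)
{
    for (size_t i = 0; i < g->n; i++)
        if (strcmp(g->d[i].to_tag, to_tag) == 0)
            return &g->d[i];
    return NULL;
}

/* Handle a response carrying a To-tag. Returns the affected early dialog,
 * or NULL if the sketch has no room for a new one. */
static struct early_dlg *group_on_response(struct dlg_group *g,
                                           const char *to_tag, int code)
{
    assert(g != NULL && to_tag != NULL && code > 100 && code <= 699);
    struct early_dlg *e = group_find(g, to_tag);
    if (e == NULL) {            /* unseen To-tag: create a new early dialog */
        size_t len = strlen(to_tag);
        if (g->n == MAX_EARLY || len == 0 || len >= sizeof g->d[0].to_tag)
            return NULL;
        e = &g->d[g->n++];
        memcpy(e->to_tag, to_tag, len + 1);
        e->state = ED_EARLY;
    }
    if (e->state != ED_EARLY)
        return e;               /* already final, nothing to do */
    if (code >= 300) {
        e->state = ED_TERMINATED;
    } else if (code >= 200) {
        e->state = ED_CONFIRMED;
        for (size_t i = 0; i < g->n; i++)   /* drop still-early siblings */
            if (&g->d[i] != e && g->d[i].state == ED_EARLY)
                g->d[i].state = ED_TERMINATED;
    }
    return e;
}
```

For example, 180 with To-tag "a", 183 with To-tag "b", then 200 with To-tag "b" leaves "b" confirmed and "a" terminated.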
Klaus Darilion wrote:
On 12.03.2010 15:22, Timo Reimann wrote:
(2) Kamailio maps all forked branches to the same dialog because they all carry the same dialog ID, i.e., Call-ID, From-tag, and To-tag. In order
Why do they have the same to-tag? Actually the to-tag creates a dialog.
Yeah my bad, you're completely right. Of course, they do not have any To-tag while still unconfirmed. Still, branches will be mapped to the same dialog.
So, maybe if a response is received and there is no dialog with this totag, just create a new dialog. Also somehow group these early dialogs to destroy them in case one of these gets into confirmed state.
Applying your idea to the example from my original post (a 5xx response received from branch #1 before a 200 response from branch #2): on reception of the 5xx response, the proxy would first destroy the dialog that was created when the request was received. Then, after reception of the 200, it would find no matching dialog and (re-)create a new one. Did I get that right?
If so, my major concern with this approach is that it will break dialog callback functionality. If a dialog user, upon creation of an unconfirmed dialog (initial request received), registers for further callbacks associated with that dialog (for instance, DLGCB_CONFIRMED), it won't get any further callbacks in the scenario outlined above. The reason for this is that when the dialog is terminated due to the 5xx response, all associated callbacks will be swept out too, and the re-created dialog's structure will not contain any callbacks yet.
One possible workaround to this would be to somehow remember the number of branches forked initially and not shut down the dialog unless all branches have failed, just like I mentioned in my original post.
But 5xx-before-200 is just one particular scenario in which multiple branches can affect the dialog state. You'd still need to deal with other cases, such as multiple branches in the early state, which leads back to my initial question of how that could be achieved.
Cheers,
--Timo
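The branch-counting workaround could be sketched in C as follows. `struct forked_dlg`, `dlg_branch_added()` and `dlg_branch_final()` are hypothetical; a counter is incremented per forked branch, and the single dialog is only terminated once every pending branch has returned a failure. The counter can only account for branches that already exist when a final reply arrives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical forking-aware dialog: one record for all branches. */
struct forked_dlg {
    unsigned pending;   /* branches forked but still without a final reply */
    bool confirmed;     /* at least one branch answered with 2xx */
    bool terminated;    /* every branch failed */
};

/* Call once per forked branch (e.g. per append_branch()). */
static void dlg_branch_added(struct forked_dlg *d)
{
    assert(d != NULL && !d->terminated);
    d->pending++;
}

/* Apply the final reply of one branch. */
static void dlg_branch_final(struct forked_dlg *d, int code)
{
    assert(d != NULL && d->pending > 0 && code >= 200 && code <= 699);
    d->pending--;
    if (code < 300)
        d->confirmed = true;
    else if (d->pending == 0 && !d->confirmed)
        d->terminated = true;   /* no branch left that could still succeed */
}
```

With two parallel branches, a 503 followed by a 200 leaves the dialog confirmed. With a single branch that fails, however, the dialog is terminated immediately, even if another branch would be started afterwards (serial forking).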
2010/3/12 Timo Reimann timo.reimann@1und1.de:
If so, my major concern with this approach is that it will break dialog callback functionality. If a dialog user, upon creation of an unconfirmed dialog (initial request received), registers for further callbacks associated with that dialog (for instance, DLGCB_CONFIRMED), it won't get any further callbacks in the scenario outlined above. The reason for this is that when the dialog is terminated due to the 5xx response, all associated callbacks will be swept out too, and the re-created dialog's structure will not contain any callbacks yet.
One possible workaround to this would be to somehow remember the number of branches forked initially and not shut down the dialog unless all branches have failed, just like I mentioned in my original post.
This workaround wouldn't work in serial forking, in which the proxy generates the second branch after the first one has been terminated.
2010/3/12 Klaus Darilion klaus.mailinglists@pernau.at:
Why do they have the same to-tag? Actually the to-tag creates a dialog.
Yeah my bad, you're completely right. Of course, they do not have any To-tag while still unconfirmed. Still, branches will be mapped to the same dialog.
Hummm, this is not correct. The only response having no To-tag is a 100 Trying (which doesn't create an early-dialog). Any other provisional response MUST contain a To-tag and identifies an early-dialog.
The dialog module should create a dialog entry for every early dialog generated by a single request (this can occur when the proxy itself forks or when a proxy further downstream forks; in either case, inspecting the To-tag should do the job).
Having a separate dialog entry for each early dialog is also required to properly handle the CSeq value. For example, a request could be forked in parallel to GW1 and GW2, and GW2 could require 100rel, so the UAC would send a PRACK within the early dialog with GW2. This PRACK would increase the client's CSeq only in the second early dialog, and this should be reflected in that early dialog's entry in the dialog module. Otherwise, locally generated requests (such as BYE) could fail if they don't contain an increased CSeq within the appropriate (early) dialog.
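A minimal C sketch of per-early-dialog CSeq tracking for the GW1/GW2 example, assuming a hypothetical `struct early_entry` per To-tag. Only the entry whose early dialog carried the PRACK advances its caller-side CSeq, so a request generated locally for that early dialog picks a higher value than one for the other gateway.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical early-dialog entry: one per To-tag, each with its own
 * caller-side CSeq. */
struct early_entry {
    const char *to_tag;     /* identifies the early dialog (e.g. GW1 or GW2) */
    uint32_t caller_cseq;   /* highest CSeq seen from the caller in it */
};

/* Record a request the caller sent within this early dialog (e.g. PRACK). */
static void note_caller_request(struct early_entry *e, uint32_t cseq)
{
    assert(e != NULL);
    if (cseq > e->caller_cseq)
        e->caller_cseq = cseq;
}

/* CSeq for a request generated locally on the caller's behalf in this
 * early dialog: it must be higher than anything the callee has seen. */
static uint32_t next_local_cseq(const struct early_entry *e)
{
    assert(e != NULL);
    return e->caller_cseq + 1;
}
```

If the INVITE used CSeq 1 and the PRACK towards GW2 used CSeq 2, a locally generated request would need CSeq 2 towards GW1 but CSeq 3 towards GW2; a single shared counter cannot express both.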
Fixing this point raises a potential new issue: I use the dialog module to count the number of dialogs per client (in order to limit them). If a request is forked by the proxy, I still need it to be counted as a single dialog for the originating client (even though the dialog module would have two active early dialogs). A solution would be a new MI function that counts dialogs by Call-ID and From-tag, so that two early dialogs caused by local (or remote) forking would be considered a single dialog for the client.
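The counting rule behind the proposed MI function could be sketched like this. It is not an MI command, only the core logic: count distinct (Call-ID, From-tag) pairs over all dialog entries, so that several early dialogs caused by forking count as one call. `struct dlg_key` and `count_calls()` are hypothetical names, and the quadratic scan is only acceptable for a sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one dialog entry: early dialogs created by the same
 * forked request share Call-ID and From-tag but differ in To-tag. */
struct dlg_key {
    const char *callid;
    const char *from_tag;
    const char *to_tag;
};

/* Count calls as distinct (Call-ID, From-tag) pairs; O(n^2), sketch only. */
static size_t count_calls(const struct dlg_key *d, size_t n)
{
    assert(d != NULL || n == 0);
    size_t calls = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < i && !(strcmp(d[i].callid, d[j].callid) == 0 &&
                          strcmp(d[i].from_tag, d[j].from_tag) == 0))
            j++;
        if (j == i)             /* no earlier entry with the same pair */
            calls++;
    }
    return calls;
}
```

Two early dialogs towards GW1 and GW2 for the same request plus one unrelated dialog count as two calls.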
So, maybe if a response is received and there is no dialog with this totag, just create a new dialog. Also somehow group these early dialogs to destroy them in case one of these gets into confirmed state.
They should be destroyed by the 487 received after cancelling them, right? If no 487 is received, a local timeout would do the job. I don't see the need for special handling of forked requests. Instead, the whole dialog module should be improved to manage early dialogs rather than just confirmed dialogs, and that should be all; no more logic is needed.
Regards.
Hello Timo,
I have a question with respect to issue no. 1. If I understand this correctly, in your particular scenario you have two proxy servers (P1 and P2) and the topology of the call is described by the following diagram.
UA1 --> P1 --> P2 --> P1 -->UA2
Are you suggesting that one single dialog should be kept in the proxy server P1?
Regards, Ovidiu Sas
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Hi Ovidiu,
Ovidiu Sas wrote:
I have a question with respect to issue no. 1. If I understand this correctly, in your particular scenario you have two proxy servers (P1 and P2) and the topology of the call is described by the following diagram.
UA1 --> P1 --> P2 --> P1 -->UA2
Are you suggesting that one single dialog should be kept in the proxy server P1?
If the request message that is routed in this path is just the same with respect to the dialog ID (excluding the To-tag which does not exist at this point as Klaus pointed out), then I do suggest it should be the same dialog.
The dialog module doesn't do that yet but instead creates a second dialog. Consequently, only one of them is handled regularly (i.e., confirmed, terminated, whatever), while the other dangles until a dialog timer fires to clean up forgotten dialogs. (That timer is meant to discard dialogs whose BYE got lost or was never sent, not to deal with implementation bugs.)
Cheers,
--Timo
Hello Tim,
You are right about the server's current behavior in this case, but on the other hand, I disagree with the one-dialog approach.
IMHO, two dialogs should be created and a better dialog match should be enforced (adding Via and Route/Record-Route header checks). That way, each message would be mapped to its proper dialog, and each dialog would be properly terminated when the corresponding BYE is received.
One issue that I have with one single dialog is related to how dialog termination is handled on timeout. When a BYE needs to be sent on timeout, where will it be sent, given that the single dialog will have four endpoints:
- UA1 (the original caller)
- P2 (the routed destination for the initial request)
- P2 (the incoming destination for the forwarded request)
- UA2 (the routed destination for the forwarded request)
Regards, Ovidiu Sas
On Fri, Mar 12, 2010 at 11:30 AM, Timo Reimann timo.reimann@1und1.de wrote:
Hi Ovidiu,
Ovidiu Sas wrote:
I have a question with respect to issue no. 1. If I understand this correctly, in your particular scenario you have two proxy servers (P1 and P2) and the topology of the call is described by the following diagram.
UA1 --> P1 --> P2 --> P1 -->UA2
Are you suggesting that one single dialog should be kept in the proxy server P1?
If the request message that is routed in this path is just the same with respect to the dialog ID (excluding the To-tag which does not exist at this point as Klaus pointed out), then I do suggest it should be the same dialog.
The dialog module doesn't do that yet but instead creates a second dialog. Consequently, only one of them is handled regularly (i.e., confirmed, terminated, whatever), while the other dangles until a dialog timer fires to clean up forgotten dialogs. (That timer is meant to discard dialogs whose BYE got lost or was never sent, not to deal with implementation bugs.)
Cheers,
--Timo
Hey Ovidiu,
Ovidiu Sas wrote:
You are right about the server's current behavior in this case, but on the other hand, I disagree with the one-dialog approach.
IMHO, two dialogs should be created and a better dialog match should be enforced (adding Via and Route/Record-Route header checks). That way, each message would be mapped to its proper dialog, and each dialog would be properly terminated when the corresponding BYE is received.
Would that improved dialog matching substantially shorten the time until dangling dialogs are destroyed?
If not, I don't think creating two dialogs for each call, where one of them will definitely be dangling and not cleaned up until the dialog timeout (which is quite long by default) triggers, is a viable option for large-scale environments such as ours. The rate at which new dialogs are created could possibly outrun the rate at which they terminate, which isn't desirable resource-wise.
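A rough way to size this concern, assuming spiralled calls arrive at a steady average rate and each unmatched dialog lives exactly until the dialog timeout (Little's law), is:

```latex
N_{\text{dangling}} \approx \lambda_{\text{spiral}} \cdot T_{\text{timeout}}
```

For illustration only, with a hypothetical rate of 100 spiralled calls per second and a 12-hour timeout (T = 43200 s), about 100 × 43200 = 4.32 million dangling dialog entries would be held in memory at any given time.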
One issue that I have with one single dialog is related to how dialog termination is handled on timeout. When a BYE needs to be sent on timeout, where will it be sent, given that the single dialog will have four endpoints:
- UA1 (the original caller)
- P2 (the routed destination for the initial request)
- P2 (the incoming destination for the forwarded request)
- UA2 (the routed destination for the forwarded request)
I believe it's still just two endpoints, no matter how long the route path is: namely, the hosts at the edges of the end-to-end dialog relationship, as defined by the Contact header addresses.
Judging from the dialog code, that's where BYE messages are sent. Record-routing also seems to be honored, so hosts on the route that need to see the triggered or "natural" BYE message will do so.
Was that your issue, or did I miss something vital?
Cheers,
--Timo
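A minimal C sketch of this two-endpoint view, assuming a hypothetical `struct dlg_endpoints` that stores, for each side of the dialog, the Contact URI and the route set towards that side. Under loose routing as defined in RFC 3261, the Request-URI of a locally generated BYE is the Contact, and the request is sent to the first Route entry if there is one; strict routing is ignored here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROUTES 4            /* hypothetical fixed capacity */

/* Hypothetical view of one side of a dialog as stored by the proxy. */
struct dlg_leg {
    const char *contact;                /* remote target (Contact URI) */
    const char *routes[MAX_ROUTES];     /* route set towards that side */
    size_t nroutes;
};

/* Just two endpoints, however long the path between them is. */
struct dlg_endpoints {
    struct dlg_leg caller;
    struct dlg_leg callee;
};

/* First hop of a locally generated BYE on one leg (loose routing only). */
static const char *bye_next_hop(const struct dlg_leg *leg)
{
    assert(leg != NULL && leg->contact != NULL && leg->nroutes <= MAX_ROUTES);
    return leg->nroutes > 0 ? leg->routes[0] : leg->contact;
}
```

On timeout, one BYE per leg would be built; a leg whose route set includes other record-routing proxies sends its BYE through them first.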
On Fri, Mar 12, 2010 at 12:56 PM, Timo Reimann timo.reimann@1und1.de wrote:
Hey Ovidiu,
Ovidiu Sas wrote:
You are right about the server's current behavior in this case, but on the other hand, I disagree with the one-dialog approach.
IMHO, two dialogs should be created and a better dialog match should be enforced (adding Via and Route/Record-Route header checks). That way, each message would be mapped to its proper dialog, and each dialog would be properly terminated when the corresponding BYE is received.
Would that improved dialog matching substantially shorten the time until dangling dialogs are destroyed?
If not, I don't think creating two dialogs for each call, where one of them will definitely be dangling and not cleaned up until the dialog timeout (which is quite long by default) triggers, is a viable option for large-scale environments such as ours. The rate at which new dialogs are created could possibly outrun the rate at which they terminate, which isn't desirable resource-wise.
The issue in the current design is the dialog matching algorithm. If two dialogs are created, both of them will be chained in the same dialog list. When an in dialog request is received, the first dialog that matched the callid, to/from tag is updated and the second one is just hanging around (it will never be touched). What we need, is a better dialog match (if an in dialog request for the second dialog is received, then the second dialog should be selected and updated, not the first one). What I'm proposing here is that dialog matching should be done based on callid, to/from tag, branch id, list of Route/Record-Route headers). This should ensure that the proper dialog is handle for each in dialog request and no dialog will be left over.
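A stricter match along the lines Ovidiu proposes might look roughly like the following. This is a minimal sketch, not the dialog module's code: `struct dlg_key`, `match_ids()`, `match_strict()` and `lookup_dialog()` are hypothetical names, and route URIs are assumed to be normalized to caller-to-callee order before they are stored or compared. The Via branch ID is left out of this sketch, since in SIP a branch identifies a single transaction and every in-dialog request carries a fresh one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ROUTES 8

/* Hypothetical, simplified view of a tracked dialog (not struct dlg_cell).
 * Route URIs are assumed to be normalized to caller->callee order. */
struct dlg_key {
    const char *callid;
    const char *from_tag;   /* caller's tag */
    const char *to_tag;     /* callee's tag */
    const char *routes[MAX_ROUTES];
    size_t n_routes;
};

static bool str_eq(const char *a, const char *b)
{
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* Classic match: Call-ID plus tags, in either request direction. */
static bool match_ids(const struct dlg_key *d, const struct dlg_key *req)
{
    if (!str_eq(d->callid, req->callid))
        return false;
    return (str_eq(d->from_tag, req->from_tag) && str_eq(d->to_tag, req->to_tag)) ||
           (str_eq(d->from_tag, req->to_tag) && str_eq(d->to_tag, req->from_tag));
}

/* Stricter match: additionally require an identical route set, which is
 * what tells two legs of a spiral apart on the same proxy. */
static bool match_strict(const struct dlg_key *d, const struct dlg_key *req)
{
    if (!match_ids(d, req) || d->n_routes != req->n_routes)
        return false;
    for (size_t i = 0; i < d->n_routes; i++)
        if (!str_eq(d->routes[i], req->routes[i]))
            return false;
    return true;
}

/* Return the first dialog that matches strictly, or NULL if none does. */
static const struct dlg_key *lookup_dialog(const struct dlg_key *list, size_t n,
                                           const struct dlg_key *req)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (match_strict(&list[i], req))
            return &list[i];
    return NULL;
}
```

With `match_ids()` alone, both spiral legs would match and the first one in the list would always win, which is exactly how the second dialog ends up dangling.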
Even if we have a spiral and the INVITE is going twice through the server (and two individual transactions are created), IMHO one dialog is not a proper representation of the call.
Let's assume the following scenario: UA1 --> P1 --> P2 --> P1 --> UA2 Now, UA2 rejects the call, but P2 decides to reroute the call to an IVR UA1 --> P1 --> P2 --> IVR What will be the state of the dialog on P1?
I think having multiple dialogs for each branch of the spiral keeps things clean and easy to understand. The key is to perform proper matching for in-dialog requests to the corresponding dialog.
One issue that I have with a single dialog is how dialog termination is handled on timeout. When a BYE needs to be sent on timeout, where will it be sent, given that the single dialog will have four endpoints:
- UA1 (the original caller)
- P2 (the routed destination for the initial request)
- P2 (the incoming destination for the forwarded request)
- UA2 (the routed destination for the forwarded request)
I believe it's still just two endpoints no matter how long the route path is, namely the hosts comprising the end-to-end dialog relationship at the edge defined by the Contact header addresses.
That's where BYE messages are sent, based on a look at the dialog code. It also seems that record-routing is honored, so any hosts on the route that need to see the triggered or "natural" BYE message will do so.
Was that your issue, or did I miss something vital?
If you have a single dialog, where will the BYE messages be sent?
- to UA1 and P2
- to P2 and UA1
- to UA1 and UA2
Cheers,
--Timo
Ovidiu Sas wrote:
If not, I don't think creating two dialogs for each call, where one of them will definitely be left dangling and not cleaned up until the dialog timeout (which is quite long by default) triggers, is a viable option for large-scale environments (such as ours). The rate at which new dialogs are created could outrun the rate at which they terminate, which isn't desirable resource-wise.
The issue in the current design is the dialog matching algorithm. If two dialogs are created, both of them are chained in the same dialog list. When an in-dialog request is received, the first dialog that matches the Call-ID and To/From tags is updated, and the second one is just left hanging around (it will never be touched). What we need is a better dialog match (if an in-dialog request for the second dialog is received, then the second dialog should be selected and updated, not the first one). What I'm proposing here is that dialog matching should be based on the Call-ID, To/From tags, branch ID, and the list of Route/Record-Route headers. This should ensure that the proper dialog is handled for each in-dialog request and that no dialog is left over.
Agreed that more strict matching logic will allow having multiple dialogs to be resolved properly. Still, multiple dialogs come at a price, please see further below.
Even if we have a spiral and the INVITE is going twice through the server (and two individual transactions are created), IMHO one dialog is not a proper representation of the call.
Can you be more specific as to why a single dialog isn't appropriate?
I am thinking that a single dialog could actually be helpful: it abstracts from the numerous transactions that may be established at several locations in spiral scenarios for what is essentially the same request (i.e., apart from routing information). From an end-to-end perspective, it doesn't seem that important to know how many transactions a request spawns, but rather how the associated dialog's state changes over time. Having as few dialogs as possible will help with observing these changes.
Let's assume the following scenario: UA1 --> P1 --> P2 --> P1 --> UA2 Now, UA2 rejects the call, but P2 decides to reroute the call to an IVR UA1 --> P1 --> P2 --> IVR What will be the state of the dialog on P1?
I think this example of yours shows a potential problem of the multiple-dialogs case while there is none in the single-dialog case.
First, the single-dialog case: When P1 receives the request from UA1, it establishes a new, yet unconfirmed dialog. When it receives the request from P2, it continues the dialog and does not create a new one. When UA2's rejection is received by P1, the dialog will be terminated. (This is what the dialog module already does now.) When P2 decides to send out the request to another destination (like the IVR), P1 will establish a new dialog (since there's none to continue anymore). I cannot see any dialog handling problems in this approach.
Now to the multiple-dialog case. When P1 receives P2's request, instead of continuing the first dialog, it establishes another one, which is distinguished from the first by the improved dialog matching (covering the additional items you mentioned above, the branch ID and the Route/Record-Route headers). Now, if I understand you correctly, when UA2's rejection is received by P1, it will terminate just one of its dialogs (the second one, I suppose). The next question I ask myself is: when P2's re-routed request is received by P1, will it reuse the first dialog? I'm not sure that will definitely happen, especially if you use the Via branch ID, which changes between the initial and the re-routed request. In that case, you will end up with another new dialog (a third one), and the very first one will never have a chance to be destroyed before the dialog timeout triggers.
I think having multiple dialogs for each branch of the spiral keeps things clean and easy to understand. The key is to perform proper matching for in-dialog requests to the corresponding dialog.
Unless my comparison of the two approaches w.r.t. the call setup above is wrong, it seems to me that the continuation method is easier to grasp (you do not have to think in terms of multiple dialogs), consumes less memory, and presumably requires fewer code modifications.
Another issue, from the dialog users' perspective: if you want to track the call given above using the dialog module's callbacks, it should be easier to do so the fewer dialogs of essentially the same call exist. When there are multiple dialogs, users themselves will need to take care not to track calls multiple times. For instance, if you want to make a copy of the SIP messages exchanged for some reason, you'd need extra effort to avoid duplicate copies if several dialogs track the same call data.
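To make that bookkeeping burden concrete: with one dialog per spiral leg, a callback user who wants to act once per call (e.g., copy its messages only once) might need something like the following. It is a hypothetical sketch; `call_entry()`, `on_dialog_created()` and `on_dialog_terminated()` are illustrative names, not dialog module callbacks, and the Call-ID is assumed to be a sufficient call key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALLS 16
#define CALLID_LEN 64

/* Hypothetical state a callback user must keep when one call may be
 * tracked by several dialogs (one per spiral leg) on the same proxy. */
struct call_ref {
    char callid[CALLID_LEN];
    unsigned live_dialogs;
};

static struct call_ref calls[MAX_CALLS];

/* Find the entry for callid, or claim a free slot for it. */
static struct call_ref *call_entry(const char *callid)
{
    struct call_ref *free_slot = NULL;

    if (strlen(callid) >= CALLID_LEN)
        return NULL;                    /* sketch: reject oversized IDs */
    for (size_t i = 0; i < MAX_CALLS; i++) {
        if (calls[i].live_dialogs == 0) {
            if (free_slot == NULL)
                free_slot = &calls[i];
        } else if (strcmp(calls[i].callid, callid) == 0) {
            return &calls[i];
        }
    }
    if (free_slot != NULL)
        strcpy(free_slot->callid, callid);  /* length checked above */
    return free_slot;
}

/* "Dialog created": true only for the first dialog of a call, i.e. when
 * the user should actually start tracking it. */
static bool on_dialog_created(const char *callid)
{
    struct call_ref *c = call_entry(callid);
    assert(c != NULL);
    return c->live_dialogs++ == 0;
}

/* "Dialog terminated": true only when the last dialog of the call ends. */
static bool on_dialog_terminated(const char *callid)
{
    struct call_ref *c = call_entry(callid);
    assert(c != NULL && c->live_dialogs > 0);
    return --c->live_dialogs == 0;
}
```

With a single continued dialog per call, this reference counting would not be needed on the user side.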
One issue that I have with a single dialog is how dialog termination is handled on timeout. When a BYE needs to be sent on timeout, where will it be sent, given that the single dialog will have four endpoints:
- UA1 (the original caller)
- P2 (the routed destination for the initial request)
- P2 (the incoming destination for the forwarded request)
- UA2 (the routed destination for the forwarded request)
I believe it's still just two endpoints no matter how long the route path is, namely the hosts comprising the end-to-end dialog relationship at the edge defined by the Contact header addresses.
That's where BYE messages are sent, based on a look at the dialog code. It also seems that record-routing is honored, so any hosts on the route that need to see the triggered or "natural" BYE message will do so.
If you have a single dialog, where will the BYE messages be sent?
- to UA1 and P2
- to P2 and UA1
- to UA1 and UA2
To UA1 and UA2: They represent the endpoints because they provide the respective Contact headers used to send the BYE message. The Contact addresses will never change no matter at which point in the routing path you set up a dialog.
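The endpoint reasoning can be illustrated with a small sketch that assumes loose routing (`;lr`) throughout. The types, `bye_for_side()` and `dlg_bye_targets()` are hypothetical and not the dialog module's API: each side keeps the peer's Contact as the Request-URI, and the BYE is physically sent to the first Route entry when one exists, so record-routed hops still see it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-side state of a dialog-stateful proxy. */
struct dlg_side {
    const char *contact;       /* remote target from the peer's Contact header */
    const char *first_route;   /* first route-set entry towards the peer, or NULL */
};

struct bye_target {
    const char *request_uri;   /* always the peer's Contact */
    const char *next_hop;      /* where the BYE is actually sent */
};

/* Loose routing assumed: keep the Contact as Request-URI and send the BYE
 * to the first Route entry if there is one, otherwise to the Contact. */
static struct bye_target bye_for_side(const struct dlg_side *s)
{
    struct bye_target t;

    assert(s != NULL && s->contact != NULL);
    t.request_uri = s->contact;
    t.next_hop = (s->first_route != NULL) ? s->first_route : s->contact;
    return t;
}

/* On timeout, exactly two BYEs are generated, one per dialog side,
 * regardless of how often the initial request spiraled through the proxy. */
static void dlg_bye_targets(const struct dlg_side sides[2], struct bye_target out[2])
{
    out[0] = bye_for_side(&sides[0]);   /* towards the caller (UA1) */
    out[1] = bye_for_side(&sides[1]);   /* towards the callee (UA2) */
}
```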
Cheers,
--Timo
Let's assume the following scenario: UA1 --> P1 --> P2 --> P1 --> UA2 Now, UA2 rejects the call, but P2 decides to reroute the call to an IVR UA1 --> P1 --> P2 --> IVR What will be the state of the dialog on P1?
I think this example of yours shows a potential problem of the multiple-dialogs case while there is none in the single-dialog case.
First, the single-dialog case: When P1 receives the request from UA1, it establishes a new, yet unconfirmed dialog. When it receives the request from P2, it continues the dialog and does not create a new one. When UA2's rejection is received by P1, the dialog will be terminated. (This is what the dialog module already does now.) When P2 decides to send out the request to another destination (like the IVR), P1 will establish a new dialog (since there's none to continue anymore). I cannot see any dialog handling problems in this approach.
I don't think that you got the right scenario here: when the call is rerouted, it is rerouted by P2 directly to IVR and P1 is no longer involved. If P1 kills the dialog on the first rejection from UA2, there will be no dialog left on P1.
Regards, Ovidiu Sas
Let's assume the following scenario: UA1 --> P1 --> P2 --> P1 --> UA2 Now, UA2 rejects the call, but P2 decides to reroute the call to an IVR UA1 --> P1 --> P2 --> IVR What will be the state of the dialog on P1?
I think this example of yours shows a potential problem of the multiple-dialogs case while there is none in the single-dialog case.
First, the single-dialog case: When P1 receives the request from UA1, it establishes a new, yet unconfirmed dialog. When it receives the request from P2, it continues the dialog and does not create a new one. When UA2's rejection is received by P1, the dialog will be terminated. (This is what the dialog module already does now.) When P2 decides to send out the request to another destination (like the IVR), P1 will establish a new dialog (since there's none to continue anymore). I cannot see any dialog handling problems in this approach.
I don't think that you got the right scenario here: when the call is rerouted, it is rerouted by P2 directly to IVR and P1 is no longer involved.
You are right, I got the scenario wrong. Thanks for pointing me at that.
If P1 kills the dialog on the first rejection from UA2, there will be no dialog left on P1.
That will definitely break the single-dialog approach. However, one could work around this by introducing a per-dialog counter which is incremented for each re-seen, unconfirmed request and decremented when responses are forwarded (i.e., when the dialog module calls dlg_onreply()). The dialog's state would then not be modified unless the counter drops to zero.
That way, the single dialog from the example above would not be destroyed when UA2's rejection passes P1 because its counter will simply drop from 2 to 1, and will only be adjusted once a final reply from the IVR is forwarded to P1 by P2 since the counter will drop to 0 then.
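The counter workaround can be sketched as follows, replaying the example above. `struct spiral_dlg`, `dlg_request_seen()` and `dlg_final_reply()` are hypothetical names; only final replies are modeled, and `dlg_final_reply()` is meant to run at the point where the module currently calls dlg_onreply().

```c
#include <assert.h>

enum dlg_state { DLG_UNCONFIRMED, DLG_CONFIRMED, DLG_TERMINATED };

/* Hypothetical single dialog shared by every spiral leg on one proxy. */
struct spiral_dlg {
    enum dlg_state state;
    unsigned pending;   /* unconfirmed copies still waiting for a final reply */
};

/* Initial request and every re-seen, still unconfirmed copy of it. */
static void dlg_request_seen(struct spiral_dlg *d)
{
    assert(d->state == DLG_UNCONFIRMED);
    d->pending++;
}

/* A final reply is forwarded. The state is left alone until the counter
 * reaches zero; the reply that brings it to zero decides between
 * confirmed (2xx) and terminated (any other final status). */
static void dlg_final_reply(struct spiral_dlg *d, int status)
{
    assert(d->pending > 0 && status >= 200 && status <= 699);
    if (--d->pending > 0)
        return;
    d->state = (status < 300) ? DLG_CONFIRMED : DLG_TERMINATED;
}
```

In the rerouting case, UA2's 486 only lowers the counter from 2 to 1; the IVR's 200, forwarded by P2, brings it to 0 and confirms the dialog. If P2 had simply forwarded the 486 upstream instead, the second decrement would terminate the dialog.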
I realize that this looks more complicated than the multiple-dialogs approach, which may well work out of the box with no additional counters and exceptions required whatsoever. However, I believe that convenience for users of dialog callbacks should be one primary goal when improving the module, and I still find multiple dialogs much harder to grasp and more tedious to code against (you will have to account for multiple callbacks representing the same call in the same state, as I noted in my last post). I am not sure if you (implicitly) suggested a different kind of SIP dialog model where the endpoints are redefined for multiple dialogs (i.e., between UA1 and P2, and between P2 and UA2). Again, that would complicate dialog callback usage because users cannot rely on a single peer-to-peer relationship anymore.
Paraphrasing, I think we should put effort into making dialog callbacks look and feel easy to use when in fact there is a lot of complex machinery running in the background to get things right.
Cheers,
--Timo
From my perspective, in the case of a spiral, there is more than one call. And I like the fact that the statistics show this. There are many other cases where a single-dialog approach breaks other things. For instance, the qos module keeps track of the media connections between the two endpoints. In a single-dialog approach, this will not work. It seems that the mediaproxy module will be affected too.
A single dialog approach breaks too many things and I'm against it. As Juha pointed out too, the dialog matching algorithm should be fixed in order to avoid the left over dialogs in case of spiral calls.
Regards, Ovidiu Sas
On Sun, Mar 14, 2010 at 8:37 AM, Timo Reimann Timo.Reimann@1und1.de wrote:
Let's assume the following scenario: UA1 --> P1 --> P2 --> P1 --> UA2 Now, UA2 rejects the call, but P2 decides to reroute the call to an IVR UA1 --> P1 --> P2 --> IVR What will be the state of the dialog on P1?
I think this example of yours shows a potential problem of the multiple-dialogs case while there is none in the single-dialog case.
First, the single-dialog case: When P1 receives the request from UA1, it establishes a new, yet unconfirmed dialog. When it receives the request from P2, it continues the dialog and does not create a new one. When UA2's rejection is received by P1, the dialog will be terminated. (This is what the dialog module already does now.) When P2 decides to send out the request to another destination (like the IVR), P1 will establish a new dialog (since there's none to continue anymore). I cannot see any dialog handling problems in this approach.
I don't think that you got the right scenario here: when the call is rerouted, it is rerouted by P2 directly to IVR and P1 is no longer involved.
You are right, I got the scenario wrong. Thanks for pointing me at that.
If P1 kills the dialog on the first rejection from UA2, there will be no dialog left on P1.
That will definitely break the single-dialog approach. However, one could work around this by introducing a per-dialog counter which is incremented for each re-seen, unconfirmed request and decremented when responses are forwarded (i.e., when the dialog module calls dlg_onreply()). The dialog's state would then not be modified unless the counter drops to zero.
That way, the single dialog from the example above would not be destroyed when UA2's rejection passes P1 because its counter will simply drop from 2 to 1, and will only be adjusted once a final reply from the IVR is forwarded to P1 by P2 since the counter will drop to 0 then.
I realize that this looks more complicated than the multiple-dialogs approach, which may well work out of the box with no additional counters and exceptions required whatsoever. However, I believe that convenience for users of dialog callbacks should be one primary goal when improving the module, and I still find multiple dialogs much harder to grasp and more tedious to code against (you will have to account for multiple callbacks representing the same call in the same state, as I noted in my last post). I am not sure if you (implicitly) suggested a different kind of SIP dialog model where the endpoints are redefined for multiple dialogs (i.e., between UA1 and P2, and between P2 and UA2). Again, that would complicate dialog callback usage because users cannot rely on a single peer-to-peer relationship anymore.
Paraphrasing, I think we should put effort into making dialog callbacks look and feel easy to use when in fact there is a lot of complex machinery running in the background to get things right.
Cheers,
--Timo
Ovidiu Sas wrote:
From my perspective, in the case of a spiral, there is more than one call. And I like the fact that the statistics show this.
Just to be sure, let me stress that we are talking about dialog-transparent proxy stuff all the time, nothing like B2BUAs where a single dialog would be split into two dialogs for the sake of, say, media translation. Regular, dialog-transparent proxying is what my single-dialog approach is dealing with only; nothing more, nothing less.
If that's not an issue, I believe we have a fundamentally different understanding of what a SIP dialog is.
There are many other cases where a single-dialog approach breaks other things. For instance, the qos module keeps track of the media connections between the two endpoints. In a single-dialog approach, this will not work.
Could you elaborate on this please? I'm wondering because the number of media connections should not change no matter how many spirals there are, no matter whether there is just one or multiple dialogs used to track the spiraled call flow. It will still be just one end-to-end SIP connection with a single media connection.
Cheers,
--Timo
There are many other cases where a single-dialog approach breaks other things. For instance, the qos module keeps track of the media connections between the two endpoints. In a single-dialog approach, this will not work.
Could you elaborate on this please? I'm wondering because the number of media connections should not change no matter how many spirals there are, no matter whether there is just one or multiple dialogs used to track the spiraled call flow. It will still be just one end-to-end SIP connection with a single media connection.
Just assume that P2 is proxying media using the nathelper module. In this case, the media endpoints are different for the first and the second dialog on P1.
Regards, Ovidiu Sas
Timo Reimann writes:
That will definitely break the single-dialog approach. However, one could work around this by introducing a per-dialog counter which is incremented for each re-seen, unconfirmed request and decremented when responses are forwarded (i.e., when the dialog module calls dlg_onreply()). The dialog's state would then not be modified unless the counter drops to zero.
if you don't like two dialogs, why not just skip calling the dialog functions when the request spirals to P1 the second time?
-- juha
Juha Heinanen wrote:
Timo Reimann writes:
That will definitely break the single-dialog approach. However, one could work around this by introducing a per-dialog counter which is incremented for each re-seen, unconfirmed request and decremented when responses are forwarded (i.e., when the dialog module calls dlg_onreply()). The dialog's state would then not be modified unless the counter drops to zero.
if you don't like two dialogs, why not just skip calling the dialog functions when the request spirals to P1 the second time?
By "dialog functions", do you mean the dialog module-internal ones that manage dialogs or the dialog callback functions?
In case of the former: My custom patch does exactly that: If a spiral is detected, skip calls to internal functions that would otherwise create a new dialog. That's how my notion of dialog continuation works but as Ovidiu showed, it breaks stuff in certain scenarios (which should be solved by using a counter just like I proposed).
In case of the latter (and assuming that the former isn't done): From a dialog callback user's perspective, you cannot easily tell whether a call is spiraling or is a "real" new call without keeping additional state (e.g., "have I seen this call before?"). That's the main reason why I think multiple dialogs are troublesome.
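The continuation behavior of the custom patch mentioned above (the patch itself is not shown in the thread) might look roughly like this. Everything here, including `dlg_on_initial_request()` and the fixed-size table, is a hypothetical illustration: an initial request is treated as spiraling when an unconfirmed dialog with the same Call-ID and From-tag already exists, in line with the dialog-ID argument made in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DLGS 16
#define ID_LEN 64

/* Hypothetical, simplified dialog table (the real module uses a hash table). */
struct tracked_dlg {
    bool in_use;
    bool confirmed;
    char callid[ID_LEN];
    char from_tag[ID_LEN];
};

static struct tracked_dlg dlg_table[MAX_DLGS];

/* Called for an initial request that has no To-tag yet. If an unconfirmed
 * dialog with the same Call-ID and From-tag exists, the request is treated
 * as a spiral and that dialog is continued instead of creating a new one.
 * Returns NULL if the IDs are too long for this sketch or the table is full. */
static struct tracked_dlg *dlg_on_initial_request(const char *callid,
                                                  const char *from_tag,
                                                  bool *is_spiral)
{
    struct tracked_dlg *free_slot = NULL;

    assert(callid != NULL && from_tag != NULL && is_spiral != NULL);
    *is_spiral = false;
    if (strlen(callid) >= ID_LEN || strlen(from_tag) >= ID_LEN)
        return NULL;

    for (size_t i = 0; i < MAX_DLGS; i++) {
        struct tracked_dlg *d = &dlg_table[i];
        if (!d->in_use) {
            if (free_slot == NULL)
                free_slot = d;
        } else if (!d->confirmed && strcmp(d->callid, callid) == 0 &&
                   strcmp(d->from_tag, from_tag) == 0) {
            *is_spiral = true;          /* spiral: continue the existing dialog */
            return d;
        }
    }
    if (free_slot == NULL)
        return NULL;                    /* table full */
    free_slot->in_use = true;
    free_slot->confirmed = false;
    strcpy(free_slot->callid, callid);  /* lengths checked above */
    strcpy(free_slot->from_tag, from_tag);
    return free_slot;
}
```

As Ovidiu's rerouting example shows, this alone is not enough: it would have to be combined with a per-dialog counter so that a rejection on one leg does not end the shared dialog.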
Cheers,
--Timo
Timo Reimann writes:
UA1 --> P1 --> P2 --> P1 -->UA2
Are you suggesting that one single dialog should be kept in the proxy server P1?
If the request message that is routed in this path is just the same with respect to the dialog ID (excluding the To-tag which does not exist at this point as Klaus pointed out), then I do suggest it should be the same dialog.
i agree with ovidiu here, i.e., there should be two dialogs in P1. also, i don't understand why the second dialog should remain dangling until cleanup, because BYE will pass P1 twice and would terminate both dialogs.
-- juha
UA1 --> P1 --> P2 --> P1 -->UA2
Are you suggesting that one single dialog should be kept in the proxy server P1?
If the request message that is routed in this path is just the same with respect to the dialog ID (excluding the To-tag which does not exist at this point as Klaus pointed out), then I do suggest it should be the same dialog.
i agree with ovidiu here, i.e., there should be two dialogs in P1. also, i don't understand why the second dialog should remain dangling until cleanup, because BYE will pass P1 twice and would terminate both dialogs.
The dangling dialog occurs due to insufficient dialog matching capabilities in the current dialog module. As Ovidiu explained in his last mail, a more strict matching logic could make sure that the right dialog is chosen, thereby not depending on long timeouts getting triggered.
However, I still do not see what the benefit of having separate dialogs in the particular case described is, and also believe that there is at least another major drawback. Let me outline that in the other thread...
--Timo
Timo Reimann writes:
The dangling dialog occurs due to insufficient dialog matching capabilities in the current dialog module. As Ovidiu explained in his last mail, a more strict matching logic could make sure that the right dialog is chosen, thereby not depending on long timeouts getting triggered.
then strict matching should be implemented to fix this bug.
-- juha