Hello,
the new dialog module proposal seems to imply that dialog_out management (and thereby, implicitly, dialog_in management) is to be carried out when responses arrive at a proxy, i.e., on execution of the tm callback type TMCB_RESPONSE_IN.
However, as the comment to this callback type in modules/tm/t_hooks.h illustrates, such callbacks will also be run on retransmissions. This means that the dialog module will have to take precautions and either
- check whether the response being observed is a retransmission, for instance by means of checking whether a dialog_out entry corresponding to the response already exists, or
- ignore retransmissions and stupidly (over-)write the dialog_out entry, i.e., in case of a retransmission re-write the same entry.
The first option induces some delay for CPU cycles and memory accesses, and adds some complexity w.r.t. correct retransmission checking which must not miss possible corner cases. The second option, in addition to wasting even more resources (maybe even I/O when persistent storage of dialog data is enabled), requires that overwriting of dialog state does not have any side-effects. I am not sure if this is always the case.
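To make the first option concrete, here is a minimal sketch of a retransmission check based on dialog_out existence. Everything in it is hypothetical: the function name on_response_in(), the in-memory dict and the (Call-ID, From-tag, To-tag) key are assumptions for illustration, not the module's actual data structures.

```python
from typing import Dict, Tuple

# Hypothetical key: (Call-ID, From-tag, To-tag) identifies one dialog_out entry.
DialogKey = Tuple[str, str, str]

# Hypothetical in-memory stand-in for the dialog_out table.
dialog_out: Dict[DialogKey, dict] = {}


def on_response_in(call_id: str, from_tag: str, to_tag: str, code: int) -> bool:
    """Return True if the response changed dialog_out, False if it was
    treated as a retransmission (same dialog, same response code)."""
    key = (call_id, from_tag, to_tag)
    entry = dialog_out.get(key)
    if entry is not None and entry["code"] == code:
        # An identical entry already exists: assume a retransmission and skip.
        return False
    # New dialog or new response code: create or update the entry.
    dialog_out[key] = {"code": code}
    return True
```

Even this small check has to decide what counts as "the same response" (here: same key and same code), which is exactly the kind of corner case that makes this option hard to get right.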
Instead of thinking about which option to prefer and what needs to be taken care of, I'd rather re-propose a different approach that I have brought forward before in a slightly different context: Instead of using TMCB_RESPONSE_IN, run the dialog state machinery on TMCB_RESPONSE_PRE_OUT. It's guaranteed to be retransmission-free and should cover everything we have discussed so far.
Moreover, it will render dialog_out state needless and ease state keeping: As long as provisional responses are being forwarded (either by a local or remote proxy), the dialog_in state must be "early" because the proxy would only stop informing the UAC about provisional responses when it received a final response, either due to exhaustion of all branches or a positive OK response. Once the forking proxy decides to forward a response with code 200+, all other early dialogs can be safely deleted and the dialog_in state set accordingly.
Consequently, step 2 of the algorithm that changes dialog_in state ("Algorithm for dialog_in state") given in the proposal boils down to a simple matching rule as follows:
Set dialog_in state to the state corresponding to the return code of the forwarded message. That is, if the response code is
* greater than 100, but smaller than 200: set it to "early" (may be skipped if there is a dialog_out entry already as this implies that another early branch has already been established and the dialog_in state set to "early"),
* 200: set it to "confirmed" (step 1 will have deleted the remaining branches),
* greater than 200: set it to "terminated" (step 1 will have deleted the remaining branches).
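The matching rule above can be sketched as a single function. This is only an illustration that follows the rule literally as written in this message; the function name dialog_in_state() is hypothetical, and rejecting codes of 100 or below is an assumption.

```python
def dialog_in_state(code: int) -> str:
    """Map the code of a forwarded response to a dialog_in state,
    following the rule exactly as stated in the message."""
    if 100 < code < 200:
        return "early"
    if code == 200:
        return "confirmed"
    if code > 200:
        return "terminated"
    # Codes of 100 and below are not covered by the rule (assumption: reject).
    raise ValueError(f"response code {code} is not covered by the rule")
```

For example, dialog_in_state(180) yields "early" and dialog_in_state(486) yields "terminated".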
In-dialog BYE requests will just have to set the dialog_in state to "terminated", trimming down the state machine further.
Considering the corner case of two branches' 200 responses arriving at exactly the same time (let's name them concurrently confirmed dialogs), I believe that this should yield two completely independent dialog entries, with one dialog_in and dialog_out each. Managing them under the same dialog_in as two dialog_outs gives the impression that they are somewhat related but this is not true as each carries a unique dialog identifier. Furthermore, under the current proposal, I believe that callbacks registered for the yet non-confirmed dialog will be effective for the concurrently confirmed dialog too because both OK responses should be forwarded by the same transaction, resulting in the same dialog callbacks being run for both dialogs. If the transaction module handles simultaneously forwarded OK responses differently, then there will be no callbacks at all for one of the dialogs which is equally bad.
Instead, there should be a clean separation of the two dialogs supported by a new dialog callback type (e.g., DLGCB_CONCURRENTLY_CONFIRMED) which allows the user to decide whether he wants to track the additionally confirmed dialog as well, and how long he wants to track it based on further dialog callbacks he chooses. For example, he may register DLGCB_TERMINATED on the additional dialog or opt that getting to know about its generation is sufficient. Generally, in-dialog requests will be mapped to the respective dialogs (which may involve some more effort because dialog hashes are stored in the route headers during processing of the initial request, but it's doable) and managed individually.
Concluding, this modified approach will keep the new dialog module from doing unnecessary and hard-to-do duplicate checks, eases state management, and keeps concurrently confirmed dialogs cleanly separated.
Opinions?
Cheers,
--Timo
2010/3/24 Timo Reimann timo.reimann@1und1.de:
Consequently, step 2 of the algorithm that changes dialog_in state ("Algorithm for dialog_in state") given in the proposal boils down to a simple matching rule as follows:
Set dialog_in state to the state corresponding to the return code of the forwarded message. That is, if the response code is
* greater than 100, but smaller than 200: set it to "early" (may be skipped if there is a dialog_out entry already as this implies that another early branch has already been established and the dialog_in state set to "early"),
* 200: set it to "confirmed" (step 1 will have deleted the remaining branches),
* greater than 200: set it to "terminated" (step 1 will have deleted the remaining branches).
This means that the dialog status is 100% mapped to the INVITE transaction status. It's ok, of course, except for the corner case in which two 200s arrive at the same time (as you describe now).
In-dialog BYE requests will just have to set the dialog_in state to "terminated", trimming down the state machine further.
Considering the corner case of two branches' 200 responses arriving at exactly the same time (let's name them concurrently confirmed dialogs), I believe that this should yield two completely independent dialog entries, with one dialog_in and dialog_out each.
How and when would the second dialog_in entry be generated? The only approach coming to my mind is generating a new dialog_in entry if a new 200 arrives for an already established dialog. Not sure if it adds too much complexity or becomes easier to manage...
Managing them under the same dialog_in as two dialog_outs gives the impression that they are somewhat related but this is not true as each carries a unique dialog identifier.
But the complete dialog identifier is the union of a dialog_in entry and a dialog_out entry. By itself, a dialog_in entry means "nothing".
When parallel forking occurs and we have different ringing branches it's clear that the approach of having one dialog_in for N dialog_out is appropriate. Each dialog_out means an early dialog. When one of them is established it becomes a "dialog", but the concept is the same (IMHO).
If the same INVITE receives two different 200 (so two established dialogs) both belong to the same dialog_in entry, while each dialog has its own properties (cseq, route-set, target, dflags...) in its corresponding dialog_out entry. IMHO it makes sense.
Furthermore, under the current proposal, I believe that callbacks registered for the yet non-confirmed dialog will be effective for the concurrently confirmed dialog too because both OK responses should be forwarded by the same transaction, resulting in the same dialog callbacks being run for both dialogs. If the transaction module handles simultaneously forwarded OK responses differently, then there will be no callbacks at all for one of the dialogs which is equally bad.
That's a good technical note :)
Instead, there should be a clean separation of the two dialogs supported by a new dialog callback type (e.g., DLGCB_CONCURRENTLY_CONFIRMED) which allows the user to decide whether he wants to track the additionally confirmed dialog as well, and how long he wants to track it based on further dialog callbacks he chooses. For example, he may register DLGCB_TERMINATED on the additional dialog or opt that getting to know about its generation is sufficient. Generally, in-dialog requests will be mapped to the respective dialogs (which may involve some more effort because dialog hashes are stored in the route headers during processing of the initial request, but it's doable) and managed individually.
Then you couldn't set specific dflags on the second generated dialog, nor manage it like any other "normal" dialog. Imagine the UAC receives both 200s and sends the BYE for the first one. Then there would exist an active dialog for which the dialog module is not storing its information properly. Am I missing something here?
Concluding, this modified approach will keep the new dialog module from doing unnecessary and hard-to-do duplicate checks, eases state management, and keeps concurrently confirmed dialogs cleanly separated.
Opinions?
I really agree with you, sure. By just taking into account the initial INVITE transaction status it is easier to determine the "global" dialog state (from the client's point of view).
But I still need to understand the possible limitation in double-200 scenarios. Could you please draw an example of how the dialog table(s) would look when such a scenario occurs? :)
Thanks a lot.
Iñaki Baz Castillo wrote:
2010/3/24 Timo Reimann timo.reimann@1und1.de:
Considering the corner case of two branches' 200 responses arriving at exactly the same time (let's name them concurrently confirmed dialogs), I believe that this should yield two completely independent dialog entries, with one dialog_in and dialog_out each.
How and when would the second dialog_in entry be generated? The only approach coming to my mind is generating a new dialog_in entry if a new 200 arrives for an already established dialog. Not sure if it adds too much complexity or becomes easier to manage...
Generating a new dialog_in entry if a new 200 arrives is what I had in mind. Should be easy to do because response messages contain the associated request message as well, so you would just call the yet-to-be-written create_dialog_in() function based on the request message. This should be fairly similar to what you need to do when a request message forms the basis for dialog generation.
The only tricky part is the dialog hash ID added to the record-route header during request processing. Since the two UASes that simultaneously answered 200 already stored the dialog hash ID in their respective route sets, both dialogs must be managed under the same hash ID. Consequently, the dialog module must keep a list of dialog_ins maintained under the same hash ID. They will be distinguishable by their differing To-tags, which must be taken into account when routing in-dialog requests.
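As a rough illustration of keeping several dialog_ins under one hash ID, consider the sketch below. The DialogIn class, its fields and the two helper functions are hypothetical names chosen for this example; they are not part of the proposal or of any existing module.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional


@dataclass
class DialogIn:
    """Hypothetical dialog_in entry; only the fields needed here."""
    hash_id: int
    to_tag: str
    state: str = "confirmed"


# One hash ID may map to several dialog_in entries (concurrent 200s).
dialogs_by_hash: DefaultDict[int, List[DialogIn]] = defaultdict(list)


def add_dialog_in(hash_id: int, to_tag: str) -> DialogIn:
    """Store a new dialog_in under the hash ID taken from the route set."""
    entry = DialogIn(hash_id=hash_id, to_tag=to_tag)
    dialogs_by_hash[hash_id].append(entry)
    return entry


def find_dialog_in(hash_id: int, to_tag: str) -> Optional[DialogIn]:
    """Match an in-dialog request: hash ID first, then the To-tag."""
    for entry in dialogs_by_hash.get(hash_id, []):
        if entry.to_tag == to_tag:
            return entry
    return None
```

The hash ID only narrows the search down to a short list; the To-tag then picks the one dialog the in-dialog request belongs to.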
I realize that this introduces a certain degree of ugliness in the design but I see this as the price in order to handle concurrent dialogs truly separately.
Managing them under the same dialog_in as two dialog_outs gives the impression that they are somewhat related but this is not true as each carries a unique dialog identifier.
But the complete dialog identifier is the union of a dialog_in entry and a dialog_out entry. By itself, a dialog_in entry means "nothing".
I agree with that. However, by attaching dialog callbacks to the dialog_in entry, you associate multiple dialog_outs resulting from concurrent 200 responses to the same dialog from a user's perspective. That's why I believe they need to be separated.
Don't get me wrong -- I still want to keep one dialog_out per confirmed response. However, I do not see them attached properly under the same dialog_in entry. Instead, there should be distinct dialog_ins per confirmed response, and that includes concurrent 200 responses.
When parallel forking occurs and we have different ringing branches it's clear that the approach of having one dialog_in for N dialog_out is appropriate. Each dialog_out means an early dialog. When one of them is established it becomes a "dialog", but the concept is the same (IMHO).
If the same INVITE receives two different 200 (so two established dialogs) both belong to the same dialog_in entry, while each dialog has its own properties (cseq, route-set, target, dflags...) in its corresponding dialog_out entry. IMHO it makes sense.
Again, I do not think it does from a dialog module user's perspective. In the current proposal, you cannot use the dialog module callbacks to track just one of the two calls, you can only track either both or none. That's not how it should be as each 200 response constitutes a distinct SIP dialog, and the modules' callbacks should be able to reflect that.
Instead, there should be a clean separation of the two dialogs supported by a new dialog callback type (e.g., DLGCB_CONCURRENTLY_CONFIRMED) which allows the user to decide whether he wants to track the additionally confirmed dialog as well, and how long he wants to track it based on further dialog callbacks he chooses. For example, he may register DLGCB_TERMINATED on the additional dialog or opt that getting to know about its generation is sufficient. Generally, in-dialog requests will be mapped to the respective dialogs (which may involve some more effort because dialog hashes are stored in the route headers during processing of the initial request, but it's doable) and managed individually.
Then you couldn't set specific dflags on the second generated dialog, nor manage it like any other "normal" dialog. Imagine the UAC receives both 200s and sends the BYE for the first one. Then there would exist an active dialog for which the dialog module is not storing its information properly. Am I missing something here?
I think a callback to DLGCB_CONCURRENTLY_CONFIRMED should be treated similarly to a callback to DLGCB_CREATED, i.e., it is not related to a prior dialog. So the earliest moment a user can set dflags on a concurrently confirmed dialog is when DLGCB_CONCURRENTLY_CONFIRMED is executed, analogous to a regularly created dialog and DLGCB_CREATED. After that, both call types' dflags should be usable in the same fashion.
If this doesn't answer your question or you think I got something wrong...
But I still need to understand the possible limitation in double-200 scenarios. Could you please draw an example of how the dialog table(s) would look when such a scenario occurs? :)
...please have a look at the wiki section covering an example scenario I just created
http://www.kamailio.org/dokuwiki/doku.php/dialog-stateful:new-dialog-module-...
and add further comments here.
Cheers,
--Timo
2010/3/25 Timo Reimann timo.reimann@1und1.de:
But the complete dialog identifier is the union of a dialog_in entry and a dialog_out entry. By itself, a dialog_in entry means "nothing".
I agree with that. However, by attaching dialog callbacks to the dialog_in entry, you associate multiple dialog_outs resulting from concurrent 200 responses to the same dialog from a user's perspective. That's why I believe they need to be separated.
Good reason :)
Don't get me wrong -- I still want to keep one dialog_out per confirmed response. However, I do not see them attached properly under the same dialog_in entry. Instead, there should be distinct dialog_ins per confirmed response, and that includes concurrent 200 responses.
ok
Then you couldn't set specific dflags on the second generated dialog, nor manage it like any other "normal" dialog. Imagine the UAC receives both 200s and sends the BYE for the first one. Then there would exist an active dialog for which the dialog module is not storing its information properly. Am I missing something here?
I think a callback to DLGCB_CONCURRENTLY_CONFIRMED should be treated similarly to a callback to DLGCB_CREATED, i.e., it is not related to a prior dialog. So the earliest moment a user can set dflags on a concurrently confirmed dialog is when DLGCB_CONCURRENTLY_CONFIRMED is executed, analogous to a regularly created dialog and DLGCB_CREATED. After that, both call types' dflags should be usable in the same fashion.
Just to be sure I understand properly:
It should be possible to set a dialog flag even in early state (imagine the mediaproxy module setting a dialog flag to force MediaProxy usage; it must also work if there is RTP during the 183 session progress).
Is it allowed in your proposal?
But I still need to understand the possible limitation in double-200 scenarios. Could you please draw an example of how the dialog table(s) would look when such a scenario occurs? :)
...please have a look at the wiki section covering an example scenario I just created
http://www.kamailio.org/dokuwiki/doku.php/dialog-stateful:new-dialog-module-...
and add further comments here.
Just great :) It simplifies the design a lot. Also the approach of two "dialog_in" entries (when there are two confirmed dialogs) seems really reasonable.
But I don't like the usage of three tables (dialog, dialog_in and dialog_out), so I've tried to simplify it by just leaving dialog_in and dialog_out tables with some modifications:
- dialog_in contains a new field "dialog_id".
- dialog_out doesn't contain "hash_id" field anymore. Instead it contains "dialog_in".
Please let me know if it's ok or I miss something:
http://www.kamailio.org/dokuwiki/doku.php/dialog-stateful:new-dialog-module-...
Best regards.
Iñaki Baz Castillo wrote:
2010/3/25 Timo Reimann timo.reimann@1und1.de:
Then you couldn't set specific dflags on the second generated dialog, nor manage it like any other "normal" dialog. Imagine the UAC receives both 200s and sends the BYE for the first one. Then there would exist an active dialog for which the dialog module is not storing its information properly. Am I missing something here?
I think a callback to DLGCB_CONCURRENTLY_CONFIRMED should be treated similarly to a callback to DLGCB_CREATED, i.e., it is not related to a prior dialog. So the earliest moment a user can set dflags on a concurrently confirmed dialog is when DLGCB_CONCURRENTLY_CONFIRMED is executed, analogous to a regularly created dialog and DLGCB_CREATED. After that, both call types' dflags should be usable in the same fashion.
Just to be sure I understand properly:
It should be possible to set a dialog flag even in early state (imagine the mediaproxy module setting a dialog flag to force MediaProxy usage; it must also work if there is RTP during the 183 session progress).
Is it allowed in your proposal?
I haven't explicitly considered it but it's easily feasible: When the second (just marginally slower) 200 response is handled by the dialog module, the dflags from the already existing dialog_out entry can be copied into the new dialog_out entry. For better understanding, please see the updated example for that specific part of the scenario.
However, I'm wondering if this is the way it's supposed to be done. I consider the second 200 response to result in the creation of a new dialog, so it seems wrong that the new dialog already comes with set flags. For instance, there might be use cases where a set of already configured dflags is not desired or, worse, breaks things.
Configurability of behavior might be one way out. For further considerations, let me ask you this about mediaproxy: What does that module do when multiple early responses with early media arrive for a single INVITE? Will it relay to a different port for each response?
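To illustrate the flag-copying idea together with a configurable way out, here is a small sketch. The DialogOut class, the create_concurrent_dialog_out() helper and the inherit_dflags switch are all hypothetical; the proposal does not define such a parameter.

```python
from dataclasses import dataclass


@dataclass
class DialogOut:
    """Hypothetical dialog_out entry; dflags is modelled as a bitmask."""
    to_tag: str
    dflags: int = 0


def create_concurrent_dialog_out(existing: DialogOut, to_tag: str,
                                 inherit_dflags: bool = True) -> DialogOut:
    """Create the dialog_out for a second 200 response.

    With inherit_dflags=True the flags of the already existing entry are
    copied; with False the new dialog starts with no flags set.
    """
    flags = existing.dflags if inherit_dflags else 0
    return DialogOut(to_tag=to_tag, dflags=flags)
```

Whether copying should be the default is the open question; the switch merely shows that both behaviors are cheap to support.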
But I still need to understand the possible limitation in double-200 scenarios. Could you please draw an example of how the dialog table(s) would look when such a scenario occurs? :)
...please have a look at the wiki section covering an example scenario I just created
http://www.kamailio.org/dokuwiki/doku.php/dialog-stateful:new-dialog-module-...
and add further comments here.
Just great :) It simplifies the design a lot. Also the approach of two "dialog_in" entries (when there are two confirmed dialogs) seems really reasonable.
But I don't like the usage of three tables (dialog, dialog_in and dialog_out), so I've tried to simplify it by just leaving dialog_in and dialog_out tables with some modifications:
- dialog_in contains a new field "dialog_id".
- dialog_out doesn't contain "hash_id" field anymore. Instead it
contains "dialog_in".
Please let me know if it's ok or I miss something:
That's a good improvement. When I wrote the proposal amendment, I had this feeling that three tables weren't optimal but couldn't get my head around it. Very much appreciated. :)
I replaced my initial approach with yours.
Cheers,
--Timo
2010/3/25 Timo Reimann timo.reimann@1und1.de:
It should be possible to set a dialog flag even in early state (imagine the mediaproxy module setting a dialog flag to force MediaProxy usage; it must also work if there is RTP during the 183 session progress).
Is it allowed in your proposal?
I haven't explicitly considered it but it's easily feasible: When the second (just marginally slower) 200 response is handled by the dialog module, the dflags from the already existing dialog_out entry can be copied into the new dialog_out entry. For better understanding, please see the updated example for that specific part of the scenario.
It's ok IMHO.
However, I'm wondering if this is the way it's supposed to be done. I consider the second 200 response to result in the creation of a new dialog, so it seems wrong that the new dialog already comes with set flags. For instance, there might be use cases where a set of already configured dflags is not desired or, worse, breaks things.
Configurability of behavior might be one way out. For further considerations, let me ask you this about mediaproxy: What does that module do when multiple early responses with early media arrive for a single INVITE? Will it relay to a different port for each response?
The fact is that I don't use mediaproxy but I do know about its limitations when parallel forking occurs. I think it creates an RTP session (different port-in and port-out) for each branch having RTP (183 with SDP), but I'm not 100% sure.
However I do know that with the "engage_media_proxy()" function it is not possible to force MediaProxy just for certain branches (i.e., those detected as natted). This function is based on the dialog module (just one dialog entry is created, as we know). It's like a "dialog flag", so there is no need to invoke mediaproxy for in-dialog requests as the information about whether or not to use mediaproxy belongs to the whole dialog. This of course makes it impossible to use mediaproxy just for certain branches.
But I don't like the usage of three tables (dialog, dialog_in and dialog_out), so I've tried to simplify it by just leaving dialog_in and dialog_out tables with some modifications:
- dialog_in contains a new field "dialog_id".
- dialog_out doesn't contain "hash_id" field anymore. Instead it
contains "dialog_in".
Obviously I meant:
- dialog_out doesn't contain "hash_id" field anymore. Instead it contains "dialog_id".
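A possible reading of the simplified two-table layout is sketched below. Only the "hash_id" and "dialog_id" fields come from this thread; the remaining columns, the class names and the outs_for() helper are assumptions for illustration, and it is also an assumption that dialog_in keeps its "hash_id" field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DialogInRow:
    hash_id: int      # shared by concurrently confirmed dialogs
    dialog_id: int    # new field: unique per dialog_in entry
    state: str        # assumed column


@dataclass
class DialogOutRow:
    dialog_id: int    # replaces the former hash_id reference
    to_tag: str       # assumed column
    dflags: int = 0   # assumed column


def outs_for(dialog_in: DialogInRow,
             dialog_outs: List[DialogOutRow]) -> List[DialogOutRow]:
    """Return the dialog_out rows that belong to one dialog_in row."""
    return [o for o in dialog_outs if o.dialog_id == dialog_in.dialog_id]
```

With this layout, two dialog_in rows may share a hash_id while each dialog_out row still points at exactly one of them through dialog_id.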
Please let me know if it's ok or I miss something:
That's a good improvement. When I wrote the proposal amendment, I had this feeling that three tables weren't optimal but couldn't get my head around it. Very much appreciated. :)
I replaced my initial approach with yours.
Well, it seems we are very close to a definitive design which would also handle the most complex cases :)
Thanks a lot for all your work.
2010/3/25 Iñaki Baz Castillo ibc@aliax.net:
Well, it seems we are very close to a definitive design which would also handle the most complex cases :)
IMHO the page needs some clean-up. I will replace the "dialog state" section (as it has been really simplified with your improvement). Could you please replace the "spiral" treatment section by explaining the callback you suggested (and removing the "allow_multiple_dialogs_for_spiral" parameter)? I could try to do it but I think you understand that part much better :)
Thanks a lot.
Iñaki Baz Castillo writes:
However I do know that with the "engage_media_proxy()" function it is not possible to force MediaProxy just for certain branches (i.e., those detected as natted). This function is based on the dialog module (just one dialog entry is created, as we know). It's like a "dialog flag", so there is no need to invoke mediaproxy for in-dialog requests as the information about whether or not to use mediaproxy belongs to the whole dialog. This of course makes it impossible to use mediaproxy just for certain branches.
incorrect. function use_media_proxy() works just fine. i would not even dream of using engage_media_proxy(), just because it depends on dialog module, which (based on the number of bugs fixed) seems to be by far the buggiest module of all modules.
-- juha
2010/3/26 Juha Heinanen jh@tutpro.com:
Iñaki Baz Castillo writes:
> However I do know that with the "engage_media_proxy()" function it is
> not possible to force MediaProxy just for certain branches (i.e., those
> detected as natted). This function is based on the dialog module (just
> one dialog entry is created, as we know). It's like a "dialog flag", so
> there is no need to invoke mediaproxy for in-dialog requests as the
> information about whether or not to use mediaproxy belongs to the whole
> dialog. This of course makes it impossible to use mediaproxy just for
> certain branches.
incorrect. function use_media_proxy() works just fine.
Yes, sure, same as using force_rtpproxy() in the nathelper module. However I just meant the "engage_media_proxy()" function, which is supposed to be a "convenience" function called just once at the beginning of the dialog rather than per transaction (like use_media_proxy()).