Hello,
the new dialog module proposal seems to imply that dialog_out management (and thereby, implicitly, dialog_in management) is to be carried out when responses arrive at a proxy, i.e., on execution of the tm callback type TMCB_RESPONSE_IN.
However, as the comment to this callback type in modules/tm/t_hooks.h illustrates, such callbacks will also be run on retransmissions. This means that the dialog module will have to take precautions and either
- check whether the response being observed is a retransmission, for instance by means of checking whether a dialog_out entry corresponding to the response already exists, or - ignore retransmissions and stupidly (over-)write the dialog_out entry, i.e., in case of a retransmission re-write the same entry.
The first option induces some delay for CPU cycles and memory accesses, and adds some complexity w.r.t. correct retransmission checking which must not miss possible corner cases. The second option, in addition to wasting even more resources (maybe even I/O when persistent storage of dialog data is enabled), requires that overwriting of dialog state does not have any side-effects. I am not sure if this is always the case.
Instead of thinking about which option to prefer and what needs to be taken care of, I'd rather re-propose a different approach that I have brought forward before in a slightly different context: Instead of using TMCB_RESPONSE_IN, run the dialog state machinery on TMCB_RESPONSE_PRE_OUT. It's guaranteed to be retransmission-free and should cover everything we have discussed so far.
Moreover, it will render dialog_out state needless and ease state keeping: As long as provisional responses are being forwarded (either by a local or remote proxy), the dialog_in state must be "early" because the proxy would only stop informing the UAC about provisional responses when it received a final response, either due to exhaustion of all branches or a positive OK response. Once the forking proxy decides to forward a response with code 200+, all other early dialogs can be safely deleted and the dialog_in state set accordingly.
Consequently, step 2 of the algorithm that changes dialog_in state ("Algorithm for dialog_in state") given in the proposal boils down to a simple matching rule as follows:
Set dialog_in state to the state corresponding to the return code of the forwarded message. That is, if the response code is * greater than 100, but smaller than 200: set it to "early" (may be skipped if there is a dialog_out entry already as this implies that another early branch has already been established and the dialog_in state set to "early"), * 200: set it to "confirmed" (step 1 will have deleted the remaining branches), * greater than 200: set it to "terminated" (step 1 will have deleted the remaining branches).
In-dialog BYE requests will just have to set the dialog_in state to "terminated", trimming down the state machine further.
Considering the corner case of two branches' 200 responses arriving at exactly the same time (let's name them concurrently confirmed dialogs), I believe that this should yield two completely independent dialog entries, with one dialog_in and dialog_out each. Managing them under the same dialog_in as two dialog_outs gives the impression that they are somewhat related but this is not true as each carries a unique dialog identifier. Furthermore, under the current proposal, I believe that callbacks registered for the yet non-confirmed dialog will be effective for the concurrently confirmed dialog too because both OK responses should be forwarded by the same transaction, resulting in the same dialog callbacks being run for both dialogs. If the transaction module handles simultaneously forwarded OK responses differently, then there will be no callbacks at all for one of the dialogs which is equally bad.
Instead, there should be a clean separation of the two dialogs supported by a new dialog callback type (e.g., DLGCB_CONCURRENTLY_CONFIRMED) which allows the user to decide whether he wants to track the additionally confirmed dialog as well, and how long he wants to track it based on further dialog callbacks he chooses. For example, he may register DGLCB_TERMINATED on the additional dialog or opt that getting to know about its generation is sufficient. Generally, in-dialog requests will be mapped to the respective dialogs (which may involve some more effort because dialog hashes are stored in the route headers during processing of the initial request, but it's doable) and managed individually.
Concluding, this modified approach will keep the new dialog module from doing unnecessary and hard-to-do duplicate checks, eases state management, and keeps concurrently confirmed dialogs cleanly separated.
Opinions?
Cheers,
--Timo