Hello,
Thank you very much for your help. I have tried using t_precheck_trans()
to detect retransmission, and not allow parallel processing for BYE
requests, and the issue is solved by this modification.
Also using an mqueue+rtimer solution works, and no retransmits of the
BYE request are happening because of it.
Regards,
Claudiu.
On 03.07.2017 14:58, Daniel-Constantin Mierla wrote:
Hello,
do you have proper retransmission detection in the kamailio.cfg? If you
look at etc/kamailio.cfg with the latest master, it uses
t_precheck_trans() to detect the retransmission in early phase and not
allow parallel processing of the same request.
If routing of BYE doesn't depend on any response from your socket
communication in event route dialog:end, maybe you can redesign a bit
and do that in a dedicated process using mqueue+rtimer, so you avoid
blocking from sip routing workers.
Cheers,
Daniel
On 03.07.17 13:31, Paul Claudiu Boriga wrote:
> Hello,
>
> We are experiencing an issue when having some blocking code in the
> dialog:end event route. If the code takes too long to execute then CDR
> information is not logged.
>
> In our case we have a blocking socket trying to establish a connection
> and waiting 500ms before failing. When it timeouts, the accounting
> module will never log any CDR logs for that call. It appears that the
> callbacks for DLGCB_TERMINATED_CONFIRMED event are never called. That
> is because the function dlg_terminated_confirmed is never called (this
> function is registered as a callback to the TMCB_RESPONSE_OUT event
> from tm module).
>
> After further investigation, the problem seems to be related to a race
> condition between two processes when dealing with ending a dialog:
>
> From what we saw the dialog:end event route is treated when a BYE
> request is received. But if that takes too long, and that BYE request
> will be retransmitted (we are using SIPp to test this, and SIPp will
> automatically retransmit the message), it will then be treated by a
> different process.
>
> When this happens, the first process has already modified the dialog
> state, so the second process will not execute code specific to dialog
> ending from the dialog module dlg_onroute function. It will not
> execute the event route, nor the DLGCB_TERMINATED event callbacks, and
> it will neither register the dlg_terminated_confirmed function to the
> TMCB_RESPONSE_OUT event. This means that the second process will
> actually finish processing the BYE faster then the first process. By
> the time the first process finishes with processing the dialog:end
> event route, the transaction corresponding to the BYE request is
> already created, and even though dlg_terminated_confirmed function is
> eventually registered for the TMCB_RESPONSE_OUT event, it is too late
> because this event was already treated, and all callbacks registered
> for it were called. This means that DLGCB_TERMINATED_CONFIRMED event
> will never be treated, and the registered callbacks for this event
> (including the accounting module functions) will not get called.
>
> We managed to work around this behavior by decreasing the timeout of
> the socket so that the BYE can be processed before a retransmit
> occurs, but maybe there is a different solution to fix this issue. Do
> you have any suggestions on how to solve this problem?
>
> Thank you,
> Claudiu Boriga.
>
>
> _______________________________________________
> Kamailio (SER) - Development Mailing List
> sr-dev(a)lists.kamailio.org
>
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev