[sr-dev] CDR information is not getting logged, when dialog:end takes too long

Paul Claudiu Boriga paul.boriga at 1and1.ro
Mon Jul 3 13:31:44 CEST 2017


Hello,

We are experiencing an issue when having some blocking code in the 
dialog:end event route. If the code takes too long to execute then CDR 
information is not logged.

In our case we have a blocking socket trying to establish a connection 
and waiting 500ms before failing. When it timeouts, the accounting 
module will never log any CDR logs for that call. It appears that the 
callbacks for DLGCB_TERMINATED_CONFIRMED event are never called. That is 
because the function dlg_terminated_confirmed is never called (this 
function is registered as a callback to the TMCB_RESPONSE_OUT event from 
tm module).

After further investigation, the problem seems to be related to a race 
condition between two processes when dealing with ending a dialog:

 From what we saw the dialog:end event route is treated when a BYE 
request is received. But if that takes too long, and that BYE request 
will be retransmitted (we are using SIPp to test this, and SIPp will 
automatically retransmit the message), it will then be treated by a 
different process.

When this happens, the first process has already modified the dialog 
state, so the second process will not execute code specific to dialog 
ending from the dialog module dlg_onroute function. It will not execute 
the event route, nor the DLGCB_TERMINATED event callbacks, and it will 
neither register the dlg_terminated_confirmed function to the 
TMCB_RESPONSE_OUT event. This means that the second process will 
actually finish processing the BYE faster then the first process. By the 
time the first process finishes with processing the dialog:end event 
route, the transaction corresponding to the BYE request is already 
created, and even though dlg_terminated_confirmed function is eventually 
registered for the TMCB_RESPONSE_OUT event, it is too late because this 
event was already treated, and all callbacks registered for it were 
called. This means that DLGCB_TERMINATED_CONFIRMED event will never be 
treated, and the registered callbacks for this event (including the 
accounting module functions) will not get called.

We managed to work around this behavior by decreasing the timeout of the 
socket so that the BYE can be processed before a retransmit occurs, but 
maybe there is a different solution to fix this issue. Do you have any 
suggestions on how to solve this problem?

Thank you,
Claudiu Boriga.




More information about the sr-dev mailing list