[Devel] TM failure route bug

Bogdan-Andrei Iancu bogdan at voice-system.ro
Tue Oct 18 20:23:35 CEST 2005


Hi everybody,

thanks to Klaus's help, we managed to hunt down one of the 
remaining bugs. It was reported as a random crash when executing the 
failure route for CANCEL requests.

I found the scenario, and what was happening inside OpenSER, rather 
interesting, so here is the short version of the story - basically a 
story about an INVITE and its CANCEL:
    1) a fast CANCEL happens and the CANCEL is processed by the proxy 
prior to the INVITE. So, no INVITE is matched by the CANCEL.
    2) the CANCEL is just forwarded statefully, without any other info 
regarding the INVITE
    3) the INVITE is finally processed and, without any knowledge of 
the CANCEL, it is statefully forwarded.
    4) the destination for both CANCEL and INVITE (the same) does not 
reply at all
    5) the final response (FR) timer hits for the CANCEL and triggers 
the sending of a 408 for the CANCEL
    6) the 408 reply triggers failure route execution for the CANCEL
    7) t_relay(), called in the failure route for the CANCEL, looks 
again for the corresponding INVITE (if any) and this time actually 
finds one
    8) as the found INVITE transaction still has a pending branch (the 
initial forward) which hasn't received any reply, it triggers the 
sending of a "487 Request Terminated" reply for the INVITE
    9) the 487 reply triggers failure route execution for the INVITE. 
But STOP!!!!! We are already inside another failure route execution, 
the one for the CANCEL reply (step 6). So we end up with nested failure 
route execution!!!!!!


The problem is that failure route execution wasn't designed for nested 
calls, as it relies on static variables to back up vital information 
regarding the TM environment and request.

And here is the crash :(.
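
To illustrate why a single static backup slot breaks under nesting, 
here is a toy C sketch. It is not the TM code: the names (env, 
env_backup, failure_route_static) are hypothetical, and a plain int 
stands in for the TM environment and request that really get saved.

```c
/* Toy model only: "env" stands in for the TM state that a failure route
 * must save and restore. None of these names come from the OpenSER code. */
static int env;          /* current environment (e.g. active transaction) */
static int env_backup;   /* the single static backup slot */

void set_env(int v) { env = v; }
int  get_env(void)  { return env; }

/* Run a failure route for transaction t. If nested_t is non-zero, the
 * route triggers a second failure route before it returns (as in step 9). */
void failure_route_static(int t, int nested_t)
{
    env_backup = env;    /* save the caller's environment in the static slot */
    env = t;             /* switch to this transaction */
    if (nested_t != 0)
        failure_route_static(nested_t, 0);  /* overwrites env_backup */
    env = env_backup;    /* restore: stale value if a nested call ran */
}
```

With the environment set to 100, failure_route_static(1, 0) restores 
100 as expected, but failure_route_static(1, 2) leaves it at 1: the 
inner call overwrote the backup made by the outer one. In the real 
module the stale values are pointers and state rather than an int, 
which fits the random crash reported.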

I did a quick fix in TM: failure route execution now uses a stack to 
back up the info (instead of static variables). This allows normal 
processing of this scenario.
This fix is not the best one: as previously discussed with Klaus, the 
normal behaviour would be that in step 3, when the INVITE is processed, 
we detect the presence of the CANCEL and stop any relay of the INVITE. 
But this is more complex and I prefer to do it after the release.
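
As a rough illustration of the stack-based backup, here is a minimal C 
sketch. It assumes the simplest form of "stack", a local variable, so 
that each failure route call gets its own backup slot on the C call 
stack; the actual TM change may be organised differently. All names 
(ctx, failure_route_stacked) are hypothetical, and an int stands in for 
the saved TM environment.

```c
/* Toy model only: "ctx" stands in for the TM environment and request
 * that a failure route saves and restores. Names are hypothetical. */
static int ctx;

void ctx_set(int v) { ctx = v; }
int  ctx_get(void)  { return ctx; }

/* Run a failure route for transaction t. If nested_t is non-zero, a
 * second failure route is executed from inside the first one. */
void failure_route_stacked(int t, int nested_t)
{
    int backup = ctx;    /* per-call backup, one slot per nesting level */
    ctx = t;             /* switch to this transaction */
    if (nested_t != 0)
        failure_route_stacked(nested_t, 0);  /* uses its own backup slot */
    ctx = backup;        /* restore exactly what this call saved */
}
```

With the context set to 100, failure_route_stacked(1, 2) leaves it at 
100 after both routes return, since the nested call can no longer 
overwrite the outer call's backup.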

Again, many thanks to Klaus for help in testing  ;)


regards,
bogdan


