[sr-dev] TM possible deadlock

Daniel-Constantin Mierla miconda at gmail.com
Tue Feb 3 15:27:38 CET 2015


Hi Jason,

On 03/02/15 08:29, Jason Penton wrote:
> Hi Daniel,
>
> I have found another instance of a similar bug where the a process
> gets itself into deadlock. The sequence of events:
>
> 1. INVITE request is relayed downstream
> 2. error response or locally generated 408 occurs
> 2. Failure route in cfg is armed and tries to relay to alternate
> destination (voicemail/annoucnement server, etc) - using t_relay()
> 3. t_relay fails to destination for whatever reason (in this case the
> result of the destination being blacklisted in TM)
> 4. Original reply is attempted to be relayed upstream (this is all
> done safely - ie using t_reply_unsafe() as the REPLY_LOCK has already
> been acquired in failure route
> 5. At or around line 562 of t_reply.c we check if the reply is > 200
> (final response). If it is and if its an INVITE transaction, we clean
> up timers and cancel all UAC branches. This is where we hit a problem:
>
> cancel_uacs( trans, &cancel_data, F_CANCEL_B_KILL, lock );
>
> 6. cancel_uacs calls cancel_branch if there was no response on the
> branch by calling cancel_branch()
> 7. cancel branch calls LOCK_REPLIES(t) on the transaction and you hit
> deadlock.... (original call to LOCK_REPLIES(t) happened in failure
> route processing)
>
> This use case is slightly different to the one you fixed related to
> retransmission timers... (the beginning of this thread)
>
> Suggestion: What about a pkg/process variable that can be used to
> check if you have already acquired the reply lock to combat these
> non-reentrant deadlocks. A side-effect of this would save on passing
> variables around in various functions indicating intention to lock or
> not. 
>
> Thoughts?
What about making the reply lock re-entrant? I think it should be safer
in long term to allow parts of code executed by same process to safely
acquire the lock without worrying if the same process did it before.
Global variables in pkg would need more complex management to set, test
and reset.

Cheers,
Daniel

-- 
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda




More information about the sr-dev mailing list