[sr-dev] TM possible deadlock

Thu Apr 9 17:07:51 CEST 2015

Hey Daniel,

Ok great, I will test. No I don't see a need to make async_mutex re-entrant
but let me check.

I will test your change too. Do you have a commit ref for your change?

Cheers
Jason

On Mon, 30 Mar 2015 at 17:03 Daniel-Constantin Mierla <miconda at gmail.com>
wrote:

> Hi Jason,
>
> On 03/02/15 15:27, Daniel-Constantin Mierla wrote:
> > Hi Jason,
> >
> > On 03/02/15 08:29, Jason Penton wrote:
> >> Hi Daniel,
> >>
> >> I have found another instance of a similar bug where the a process
> >> gets itself into deadlock. The sequence of events:
> >>
> >> 1. INVITE request is relayed downstream
> >> 2. error response or locally generated 408 occurs
> >> 2. Failure route in cfg is armed and tries to relay to alternate
> >> destination (voicemail/annoucnement server, etc) - using t_relay()
> >> 3. t_relay fails to destination for whatever reason (in this case the
> >> result of the destination being blacklisted in TM)
> >> 4. Original reply is attempted to be relayed upstream (this is all
> >> done safely - ie using t_reply_unsafe() as the REPLY_LOCK has already
> >> been acquired in failure route
> >> 5. At or around line 562 of t_reply.c we check if the reply is > 200
> >> (final response). If it is and if its an INVITE transaction, we clean
> >> up timers and cancel all UAC branches. This is where we hit a problem:
> >>
> >> cancel_uacs( trans, &cancel_data, F_CANCEL_B_KILL, lock );
> >>
> >> 6. cancel_uacs calls cancel_branch if there was no response on the
> >> branch by calling cancel_branch()
> >> 7. cancel branch calls LOCK_REPLIES(t) on the transaction and you hit
> >> deadlock.... (original call to LOCK_REPLIES(t) happened in failure
> >> route processing)
> >>
> >> This use case is slightly different to the one you fixed related to
> >> retransmission timers... (the beginning of this thread)
> >>
> >> Suggestion: What about a pkg/process variable that can be used to
> >> check if you have already acquired the reply lock to combat these
> >> non-reentrant deadlocks. A side-effect of this would save on passing
> >> variables around in various functions indicating intention to lock or
> >> not.
> >>
> >> Thoughts?
> > What about making the reply lock re-entrant? I think it should be safer
> > in long term to allow parts of code executed by same process to safely
> > acquire the lock without worrying if the same process did it before.
> > Global variables in pkg would need more complex management to set, test
> > and reset.
> >
> I made the reply lock re-entrant, this should handle the situation you
> were exposing it. if you have a chance to simulate it, then let me know
> the result.
>
> On a slightly different idea, I wonder if async_mutex should be made
> re-entrant as well, what do you think? I guess it is unlikely to get two
> async attempts in the same process, but you know better how the mutex is
> planned to be used.
>
> Cheers,
> Daniel
>
> --
> Daniel-Constantin Mierla
> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
> Kamailio World Conference, May 27-29, 2015
> Berlin, Germany - http://www.kamailioworld.com
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sip-router.org/pipermail/sr-dev/attachments/20150409/c0f6fb95/attachment.html>