[sr-dev] dedicated mutex for t_continue()

Jason Penton jason.penton at gmail.com
Wed May 13 08:56:02 CEST 2015


Hey Daniel,

You are correct this is no longer needed - we actually only used this to be
able to set a flag on the transaction that we later used for branch
picking. We made a fix a while back for doing the correct branch picking
which no longer required this extra flag and therefore no longer required
the mutex. I have tested in our env and all looks good.

p.s. I have committed the cleanup.

Cheers
Jason

On Tue, 12 May 2015 at 10:34 Daniel-Constantin Mierla <miconda at gmail.com>
wrote:

>  Hi Jason,
>
> ok, would be great to sort it out properly, thanks for your time!
>
> Cheers,
> Daniel
>
>
> On 12/05/15 10:10, Jason Penton wrote:
>
> Hey Daniel,
>
> Okay great, let me look into this. It will be great if we have one less
> mutex to worry about ;) - If not required I will remove and commit.
>
>  Cheers
> Jason
>
> On Mon, 11 May 2015 at 09:55 Daniel-Constantin Mierla <miconda at gmail.com>
> wrote:
>
>> Hi Jason,
>>
>> over the weekend I pushed a patch that disables the use of dedicated
>> mutex for t_continue(). It can be enabled by defining ENABLE_ASYNC_MUTEX.
>>
>> While investigated some reports of crash when removing from time, I
>> found the potential of a race when t_coninue() is executed at the time
>> the fr_timer for suspended transaction elapsed. The timer process will
>> get the transaction out and remove it from timer under the reply lock
>> and the worker doing t_continue() will get it out under the async lock.
>>
>> I looked at the commit you did when introducing the dedicated async
>> mutex, the note being:
>>
>>         - "dedicated lock to prevent multiple invocations of suspend on
>> tz (reply lock used to be used)"
>>
>> Perhaps tz is tx and stands for transmission - however, the reply lock
>> should be safe for this case as well. Moreover, the continue is like the
>> suspended branch got a reply and transaction continues processing, which
>> implies the reply lock is aquired (like execution of failure_route,
>> which can also happen if fr_timer elapses before t_continue() is
>> executed).
>>
>> Given those, I don't see anymore a reason for dedicated async mutex.
>> Also, it protects to races of using two mutexes, which can easily lead
>> to deadlocks (e.g., one process acquires the reply lock and tries to get
>> the async lock while another one wanted first the reply lock and later
>> the async lock).
>>
>> For now I disabled the code with defines, as I wanted to discuss and be
>> sure I haven't overlooked any issue you tried to avoid with the
>> dedicated mutex. Let me know what you think about.
>>
>> Cheers,
>> Daniel
>>
>> --
>> Daniel-Constantin Mierla
>> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>> Kamailio World Conference, May 27-29, 2015
>> Berlin, Germany - http://www.kamailioworld.com
>>
>>
> --
> Daniel-Constantin Mierlahttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
> Kamailio World Conference, May 27-29, 2015
> Berlin, Germany - http://www.kamailioworld.com
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sip-router.org/pipermail/sr-dev/attachments/20150513/7a4f7e76/attachment-0001.html>


More information about the sr-dev mailing list