[sr-dev] [Fwd: Re: [SR-Users] Kamailio 1.5 stops processing messages]

Andrei Pelinescu-Onciul andrei at iptel.org
Wed May 26 18:15:54 CEST 2010


On May 25, 2010 at 17:10, marius zbihlei <marius.zbihlei at 1and1.ro> wrote:
> Andrei Pelinescu-Onciul wrote:
> >On May 20, 2010 at 13:53, marius zbihlei <marius.zbihlei at 1and1.ro> wrote:
> >>Forwarded the message from sr-users to sr-dev list
> >>
> >>Cheers
> >>Marius
> >
> >[...]
> >>Hello
> >>
> >>I am a little busy atm, so before I dig into the code, I have a
> >>question for core devs. Is the LOCK_HASH() call  recursive (being
> >>called again from the same process will not block) ? I ask this
> >>because in the 4th blocked INVITE the hash _might_ be blocked by
> >>both t_newtran(#16 0xb7b535fa in t_newtran (p_msg=0x81f18a8) at
> >>t_lookup.c:1064)
> >
> >No it's not recursive (it will deadlock if called twice for the same
> >entry in the same process). This is true for all *ser versions
> >(sip-router, kamailio < 3.0, ser *).
> >
> Hello Andrei
> 
> I have looked thru the code and it seems this is the case. While the
> transaction is locked in t_newtrans(), the tm callbacks are called,
> which call the pua_dialog callback, which in turn call a transaction
> function that requires the same lock; so the mutex is locked twice
> from the same process. Is there anyway we can prevent this from the
> future? One way is to patch pua_dialog so it doesn't call t_uac()(?)
> but this still leaves the possibility of another lock happening
> somewhere else. I was thinking that a better approach was
> implementing a recursive mutex.


I don't like it. It would induce a performance hit (depending on the
arch. and the locking method used).  While the performance hit of a
single locking operation will be small, given the number of
locks/unlocks we do under heavy traffic it will have a measurable impact
on the overall performance.
Moreover it's only use would be to "hide" bugs in the code (IMHO locking
twice it's a bug).

> 
> If the recursive mutex solution is not desired, what can we do to
> prevent these types of deadlocks  ?

Well, there is no general solution and I'm not familiar with the dialog
code in question. Somehow locking twice should be avoided (unsafe
callback called from failure route?).

Does the same deadlock appears with 3.0/sip-router?
> 
> Marius
> >>and 6  t_uac (#6  0xb7b6ce01 in t_uac (method=0xbff60558,
> >>headers=0x81e3108, body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622
> >><publ_cback_func>, cbp=0xa7715158)), thus causing a deadlock.
> >>
> >

Andrei



More information about the sr-dev mailing list