[sr-dev] TM possible deadlock

Daniel-Constantin Mierla miconda at gmail.com
Wed Apr 9 23:00:04 CEST 2014


Hello,

What tm parameters are you using?

Cheers,
Daniel

On 09/04/14 22:32, Jason Penton wrote:
> Hey Daniel,
>
> Yes, I did a test with a very basic config file and I am not able to 
> re-create it. However, with my *complex* cfg file I can re-create it 
> every time. Tomorrow I will compare what is different and report back... 
> hopefully with a fix ;)
>
> Here is the backtrace of the timer process deadlocking itself:
>
> #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:39
> #1  0x00007f5009f22004 in futex_get (lock=0x7f4fc55030d8) at ../../mem/../futexlock.h:123
> #2  0x00007f5009f223e1 in _lock (s=0x7f4fc55030d8, file=0x7f5009f90fd1 "t_cancel.c", function=0x7f5009f91980 "cancel_branch", line=250) at lock.h:99
> #3  0x00007f5009f23271 in cancel_branch (t=0x7f4fc5501b40, branch=0, reason=0x7fff646d03a8, flags=3) at t_cancel.c:250
> #4  0x00007f5009f22c02 in cancel_uacs (t=0x7f4fc5501b40, cancel_data=0x7fff646d03a0, flags=1) at t_cancel.c:123
> #5  0x00007f5009f718c4 in _reply_light (trans=0x7f4fc5501b40, buf=0x7f500a24dc68 "SIP/2.0 500 Server error on LIR select next S-CSCF\r\nVia: SIP/2.0/UDP 10.0.1.167:6060;branch=z9hG4bKb7.2ae09f29ffbd0034cd6d58483053603b.1\r\nVia: SIP/2.0/UDP 10.0.1.166:4060;branch=z9hG4bKb7.3faa03ddea80"..., len=778, code=500, to_tag=0x7f500a1c7ae0 "c82b15d7f12ef185f95fe4945457d449-8bab", to_tag_len=37, lock=0, bm=0x7fff646d0b60) at t_reply.c:660
> #6  0x00007f5009f7244c in _reply (trans=0x7f4fc5501b40, p_msg=0x7f500a1c6bc0, code=500, text=0x7f500a249a48 "Server error on LIR select next S-CSCF", lock=0) at t_reply.c:795
> #7  0x00007f5009f76436 in t_reply_unsafe (t=0x7f4fc5501b40, p_msg=0x7f500a1c6bc0, code=500, text=0x7f500a249a48 "Server error on LIR select next S-CSCF") at t_reply.c:1643
> #8  0x00007f5009f57621 in w_t_reply (msg=0x7f500a1c6bc0, p1=0x7f500a2497d8 "\340\332$\nP\177", p2=0x7f500a249870 "h\321$\nP\177") at tm.c:1324
> #9  0x000000000041a700 in do_action (h=0x7fff646d1d30, a=0x7f500a24cee8, msg=0x7f500a1c6bc0) at action.c:1119
> #10 0x0000000000423831 in run_actions (h=0x7fff646d1d30, a=0x7f500a24cee8, msg=0x7f500a1c6bc0) at action.c:1607
> #11 0x000000000041a5a4 in do_action (h=0x7fff646d1d30, a=0x7f500a24d478, msg=0x7f500a1c6bc0) at action.c:1102
> #12 0x0000000000423831 in run_actions (h=0x7fff646d1d30, a=0x7f500a249148, msg=0x7f500a1c6bc0) at action.c:1607
> #13 0x000000000041a54e in do_action (h=0x7fff646d1d30, a=0x7f500a24c500, msg=0x7f500a1c6bc0) at action.c:1098
> #14 0x0000000000423831 in run_actions (h=0x7fff646d1d30, a=0x7f500a247a28, msg=0x7f500a1c6bc0) at action.c:1607
> #15 0x0000000000423fdf in run_top_route (a=0x7f500a247a28, msg=0x7f500a1c6bc0, c=0x0) at action.c:1693
> #16 0x00007f5009f73815 in run_failure_handlers (t=0x7f4fc5501b40, rpl=0xffffffffffffffff, code=408, extra_flags=96) at t_reply.c:1061
> #17 0x00007f5009f7527a in t_should_relay_response (Trans=0x7f4fc5501b40, new_code=408, branch=1, should_store=0x7fff646d201c, should_relay=0x7fff646d2018, cancel_data=0x7fff646d2070, reply=0xffffffffffffffff) at t_reply.c:1416
> #18 0x00007f5009f76ede in relay_reply (t=0x7f4fc5501b40, p_msg=0xffffffffffffffff, branch=1, msg_status=408, cancel_data=0x7fff646d2070, do_put_on_wait=0) at t_reply.c:1819
> #19 0x00007f5009f44c88 in fake_reply (t=0x7f4fc5501b40, branch=1, code=408) at timer.c:354
> #20 0x00007f5009f450e7 in final_response_handler (r_buf=0x7f4fc5501e60, t=0x7f4fc5501b40) at timer.c:526
> #21 0x00007f5009f4518d in retr_buf_handler (ticks=260027386, tl=0x7f4fc5501e80, p=0x3e8) at timer.c:584
> #22 0x0000000000544119 in timer_list_expire (t=260027386, h=0x7f4fc527cbe0, slow_l=0x7f4fc527cdf0, slow_mark=0) at timer.c:894
> #23 0x0000000000544418 in timer_handler () at timer.c:959
> #24 0x00000000005446b2 in timer_main () at timer.c:998
> #25 0x0000000000471ddf in main_loop () at main.c:1689
>
>
>
> On Wed, Apr 9, 2014 at 9:34 PM, Daniel-Constantin Mierla 
> <miconda at gmail.com> wrote:
>
>     Hello,
>
>     that should not be a very rare case, and I would have expected it
>     to be caught by now. Anyhow, this looks easy to reproduce; have
>     you tried it?
>
>     You can run two Kamailio instances, one relaying the INVITE to the
>     second, which replies with 100, then wait for the timeout on the
>     first instance. You can add some debug messages in the code to see
>     if the lock is taken twice.
>
>     Cheers,
>     Daniel
>
>
>     On 09/04/14 17:51, Jason Penton wrote:
>>     Hi All,
>>
>>     I have been experiencing a deadlock when a timeout occurs on an
>>     INVITE relayed with t_relay(). Going through the code, I have
>>     noticed a possible deadlock (when re-entrant locks are not
>>     enabled). Here is my thinking:
>>
>>     t_should_relay_response() is called with REPLY_LOCK held when the
>>     timer process fires on the fr_inv_timer (no response to the
>>     relayed INVITE other than a 100 provisional) and a 408 is
>>     generated. However, from within that function there are calls
>>     to run_failure_handlers(), which in turn *could* try to lock the
>>     reply (e.g. somebody having a t_reply() call in the failure route
>>     block of the cfg file). This would result in a second lock attempt
>>     on the same transaction's REPLY_LOCK....
>>
>>     Has anybody else experienced something like this?
>>
>>     This is on master, btw.
>>
>>     Cheers
>>     Jason
>>
>>
>>     _______________________________________________
>>     sr-dev mailing list
>>     sr-dev at lists.sip-router.org
>>     http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>
>     -- 
>     Daniel-Constantin Mierla - http://www.asipto.com
>     http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>
>
>
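
The double-lock scenario described in the quoted messages can be sketched in isolation. The following is a minimal illustration only, not Kamailio code: all demo_* names are hypothetical, and a POSIX error-checking mutex stands in for the tm reply lock so that the second lock attempt by the same owner returns EDEADLK instead of blocking forever.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a transaction; only the reply lock is modelled. */
struct demo_trans {
    pthread_mutex_t reply_lock;
};

/* Initialise the lock as an error-checking mutex (assumption for this demo:
 * a relock by the owner is reported as EDEADLK rather than hanging). */
static int demo_trans_init(struct demo_trans *t)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&t->reply_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Models a reply sent from the failure route: it takes the reply lock itself.
 * Returns 0 on success or the pthread error code if the lock is refused. */
static int demo_failure_route_reply(struct demo_trans *t)
{
    int rc = pthread_mutex_lock(&t->reply_lock);

    if (rc == 0)
        pthread_mutex_unlock(&t->reply_lock);
    return rc;
}

/* Models the timer path: the reply lock is held while the failure
 * handler runs, so the handler's own lock attempt hits the same lock. */
static int demo_timer_timeout(struct demo_trans *t)
{
    int rc = pthread_mutex_lock(&t->reply_lock);

    if (rc != 0)
        return rc;
    rc = demo_failure_route_reply(t);   /* second lock by the same owner */
    pthread_mutex_unlock(&t->reply_lock);
    return rc;
}

/* Runs the whole scenario once and returns the result of the inner lock. */
int demo_run(void)
{
    struct demo_trans t;
    int rc;

    if (demo_trans_init(&t) != 0)
        return -1;
    rc = demo_timer_timeout(&t);
    pthread_mutex_destroy(&t.reply_lock);
    return rc;
}
```

Calling demo_run() reports the nested lock attempt as an error code. With a plain non-re-entrant lock the inner call would simply never return, which appears consistent with the timer process sitting in futex_get() in frames #0 to #3 of the backtrace.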

-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
