[sr-dev] [Fwd: Re: [SR-Users] Kamailio 1.5 stops processing messages]

Santiago Gimeno santiago.gimeno at gmail.com
Tue May 25 10:32:23 CEST 2010


Hi,

Some more information that may help:

- The same problem happened again, and this time I was able to print the
buffer holding the message that caused the deadlock:

0x8158040 <buf.4000>:    "INVITE sip:yyyyyy@xxxxxxxxx.com SIP/2.0\r\nRecord-Route: <sip:10.172.0.252;lr=on;ftag=as2b58915e>\r\nVia: SIP/2.0/UDP 10.172.0.252;branch=z9hG4bKa81d.d5c45ce4.0\r\nVia: SIP/2.0/UDP 10.172.0.253:5060;branch=z"...
0x8158108 <buf.4000+200>:        "9hG4bK3fd5e08b;rport=5060\r\nFrom: \"SSSSSS Ssssss\" <sip:zzzzz@xxxxxxxxx.com>;tag=as2b58915e\r\nTo: <sip:yyyyy@xxxxxxxxxx.com>\r\nContact: <sip:zzzzz@10.172.0.253>\r\nCall-ID: 50bec1d32a47ca2b3a71253357c4f11e@x"...
0x81581d0 <buf.4000+400>:        "xxxxxxxx.com\r\nCSeq: 102 INVITE\r\nUser-Agent: Vvvvvvvvvvvvv\r\nMax-Forwards: 68\r\nDate: Tue, 25 May 2010 17:20:08 GMT\r\nAllow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY\r\nSupported: replace"...
0x8158298 <buf.4000+600>:        "s\r\nOrigin: from-phone\r\nContent-Type: application/sdp\r\nContent-Length: 263\r\n\r\nv=0\r\no=root 28715 28715 IN IP4 10.172.0.253\r\ns=session\r\nc=IN IP4 10.172.0.253\r\nt=0 0\r\nm=audio 15034 RTP/AVP 18 101\r\na=rtpma"...
0x8158360 <buf.4000+800>:        "p:18 G729/8000\r\na=fmtp:18 annexb=no\r\na=rtpmap:101 telephone-event/8000\r\na=fmtp:101 0-16\r\na=silenceSupp:off - - - -\r\na=ptime:20\r\na=sendrecv\r\n"

- The backtraces suggest that the pua_dialoginfo module could be involved.
On this server we applied the patches from
http://sip-router.org/tracker/index.php?do=details&task_id=18 and
http://sip-router.org/tracker/index.php?do=details&task_id=20. I think these
were ported to 3.0.0 but not to 1.5. Could they be causing the problem?

Thank you in advance.

Regards,

Santi

2010/5/20 marius zbihlei <marius.zbihlei at 1and1.ro>

> Forwarded the message from sr-users to sr-dev list
>
> Cheers
> Marius
>
> Santiago Gimeno wrote:
>
>> Hi,
>>
>> The problem happened again and I can provide some more info.
>> 6 of the UDP worker processes got blocked. The logs show that 4 of them,
>> which seem to be related to the same INVITE request, got blocked at the
>> same time. The other 2 got blocked some hours later and not at the same
>> time.
>>
>> Of the first 4 processes, 3 have this backtrace:
>>
>> #0  0xb7f6d410 in ?? ()
>> #1  0xbff60768 in ?? ()
>> #2  0x00000001 in ?? ()
>> #3  0xa7358180 in ?? ()
>> #4  0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6
>> #5  0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182
>> #6  0xb7b52587 in t_lookup_request (p_msg=0x82403d0, leave_new_locked=1)
>> at t_lookup.c:468
>> #7  0xb7b534ae in t_newtran (p_msg=0x82403d0) at t_lookup.c:1124
>>
>> The backtrace of the fourth process is:
>>
>> #0  0xb7f6d410 in ?? ()
>> #1  0xbff60138 in ?? ()
>> #2  0x00000001 in ?? ()
>> #3  0xa7358180 in ?? ()
>> #4  0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6
>> #5  0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182
>> #6  0xb7b6ce01 in t_uac (method=0xbff60558, headers=0x81e3108,
>> body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622 <publ_cback_func>,
>> cbp=0xa7715158)
>>    at uac.c:306
>> #7  0xb7b6e311 in request (m=0xbff60558, ruri=0x81d9adc, to=0x81d9adc,
>> from=0x81d9adc, h=0x81e3108, b=0x81d9afb, oburi=0xb73564ac,
>>    cb=0xb734a622 <publ_cback_func>, cbp=0xa7715158) at uac.c:503
>> #8  0xb7349641 in send_publish (publ=0x81d9aa8) at send_publish.c:552
>> #9  0xb73339bf in dialog_publish (state=0xb7335bb4 "Trying",
>> entity=0xa7709f34, peer=0xa7709f3c, callid=0xa7709f2c, initiator=1,
>> lifetime=300,
>>    localtag=0x0, remotetag=0x0, localtarget=0x0, remotetarget=0x0) at
>> dialog_publish.c:347
>> #10 0xb73348ea in __dialog_created (dlg=0xa7709ef0, type=2,
>> _params=0xb7a7cb9c) at pua_dialoginfo.c:343
>> #11 0xb7a586ff in run_create_callbacks (dlg=0xa7709ef0, msg=0x81f18a8) at
>> dlg_cb.c:230
>> #12 0xb7a60d1f in dlg_new_dialog (msg=0x81f18a8, t=0xa75deeb0) at
>> dlg_handlers.c:494
>> #13 0xb7a61f77 in dlg_onreq (t=0xa75deeb0, type=1, param=0xb7b785a8) at
>> dlg_handlers.c:414
>> #14 0xb7b4a791 in run_reqin_callbacks (trans=0xa75deeb0, req=0x81f18a8,
>> code=1) at t_hooks.c:272
>> #15 0xb7b376af in build_cell (p_msg=0x81f18a8) at h_table.c:284
>> #16 0xb7b535fa in t_newtran (p_msg=0x81f18a8) at t_lookup.c:1064
>> #17 0xb7b4540c in t_relay_to (p_msg=0x81f18a8, proxy=0x0, flags=8) at
>> t_funcs.c:212
>> #18 0xb7b58ac7 in w_t_relay (p_msg=0x81f18a8, proxy=0x0, flags=0x8
>> <Address 0x8 out of bounds>) at tm.c:1002
>> #19 0x0805301c in do_action (a=0x818c370, msg=0x81f18a8) at action.c:874
>> #20 0x080557aa in run_action_list (a=0x818c370, msg=0x81f18a8) at
>> action.c:145
>> #21 0x0809c304 in eval_expr (e=0x818c3d8, msg=0x81f18a8, val=0x0) at
>> route.c:1171
>> #22 0x0809bd80 in eval_expr (e=0x818c400, msg=0x81f18a8, val=0x0) at
>> route.c:1488
>> #23 0x0809bd16 in eval_expr (e=0x818c428, msg=0x81f18a8, val=0x0) at
>> route.c:1493
>> #24 0x080527ed in do_action (a=0x818c740, msg=0x81f18a8) at action.c:729
>> #25 0x080557aa in run_action_list (a=0x818be08, msg=0x81f18a8) at
>> action.c:145
>> #26 0x08053efb in do_action (a=0x81a12e0, msg=0x81f18a8) at action.c:120
>> #27 0x080557aa in run_action_list (a=0x818ee78, msg=0x81f18a8) at
>> action.c:145
>> #28 0x08053efb in do_action (a=0x81b3448, msg=0x81f18a8) at action.c:120
>> #29 0x080557aa in run_action_list (a=0x81b0ed0, msg=0x81f18a8) at
>> action.c:145
>> #30 0x08054491 in do_action (a=0x81b5f68, msg=0x81f18a8) at action.c:746
>> #31 0x080557aa in run_action_list (a=0x81b5f68, msg=0x81f18a8) at
>> action.c:145
>> #32 0x08054f2d in do_action (a=0x81b5fd0, msg=0x81f18a8) at action.c:752
>> #33 0x080557aa in run_action_list (a=0x81aefd0, msg=0x81f18a8) at
>> action.c:145
>> #34 0x08053efb in do_action (a=0x818bc08, msg=0x81f18a8) at action.c:120
>> #35 0x080557aa in run_action_list (a=0x8187910, msg=0x81f18a8) at
>> action.c:145
>> #36 0x08055b43 in run_top_route (a=0x8187910, msg=0x81f18a8) at
>> action.c:120
>> #37 0x0808c659 in receive_msg (
>>    buf=0x8158040 "INVITE sip:xxxxx@xxxxxxxxxxxxxx.com
>> SIP/2.0\r\nRecord-Route: <sip:10.100.29.7;lr=on;ftag=as60035314>\r\nVia:
>> SIP/2.0/UDP 10.100.29.7;branch=z9hG4bKb6d4.a49d7633.0\r\nVia: SIP/2.0/UDP
>> 10.100.29.8:5060;branch=z9hG"..., len=926, rcv_info=0xbff62334) at
>> receive.c:175
>> #38 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449
>> #39 0x0806e394 in main (argc=9, argv=0xbff62514) at main.c:774
>>
>>
> Hello
>
> I am a little busy at the moment, so before I dig into the code, I have a
> question for the core devs. Is the LOCK_HASH() call recursive, i.e. will a
> second call from the same process return without blocking? I ask because in
> the 4th blocked INVITE process the hash _might_ be locked by both t_newtran
> (#16 0xb7b535fa in t_newtran (p_msg=0x81f18a8) at t_lookup.c:1064)
> and t_uac (#6  0xb7b6ce01 in t_uac (method=0xbff60558,
> headers=0x81e3108, body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622
> <publ_cback_func>, cbp=0xa7715158)), thus causing a deadlock.
>
> Thanks
> Marius