[SR-Users] Kamailio 1.5 stops processing messages

marius zbihlei marius.zbihlei at 1and1.ro
Wed May 19 12:10:42 CEST 2010


Santiago Gimeno wrote:
> Hi,
>
> The problem happened again and I can provide some more info.
> 6 of the UDP worker processes got blocked. By checking the logs I can 
> see that 4 of them, that seem to be related to the same INVITE 
> request, got blocked at the same time. The other 2 got blocked some 
> hours later and not at the same time.
>
> From the 4 first processes, the backtrace of 3 of them is this:
>
> #0  0xb7f6d410 in ?? ()
> #1  0xbff60768 in ?? ()
> #2  0x00000001 in ?? ()
> #3  0xa7358180 in ?? ()
> #4  0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6
> #5  0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182
> #6  0xb7b52587 in t_lookup_request (p_msg=0x82403d0, 
> leave_new_locked=1) at t_lookup.c:468
> #7  0xb7b534ae in t_newtran (p_msg=0x82403d0) at t_lookup.c:1124

> The backtrace of the other is:
>
> #0  0xb7f6d410 in ?? ()
> #1  0xbff60138 in ?? ()
> #2  0x00000001 in ?? ()
> #3  0xa7358180 in ?? ()
> #4  0xb7ec94ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6
> #5  0xb7b37463 in lock_hash (i=19819) at ../../mem/../fastlock.h:182
> #6  0xb7b6ce01 in t_uac (method=0xbff60558, headers=0x81e3108, 
> body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622 <publ_cback_func>, 
> cbp=0xa7715158)
>     at uac.c:306
> #7  0xb7b6e311 in request (m=0xbff60558, ruri=0x81d9adc, to=0x81d9adc, 
> from=0x81d9adc, h=0x81e3108, b=0x81d9afb, oburi=0xb73564ac,
>     cb=0xb734a622 <publ_cback_func>, cbp=0xa7715158) at uac.c:503
> #8  0xb7349641 in send_publish (publ=0x81d9aa8) at send_publish.c:552
> #9  0xb73339bf in dialog_publish (state=0xb7335bb4 "Trying", 
> entity=0xa7709f34, peer=0xa7709f3c, callid=0xa7709f2c, initiator=1, 
> lifetime=300,
>     localtag=0x0, remotetag=0x0, localtarget=0x0, remotetarget=0x0) at 
> dialog_publish.c:347
> #10 0xb73348ea in __dialog_created (dlg=0xa7709ef0, type=2, 
> _params=0xb7a7cb9c) at pua_dialoginfo.c:343
> #11 0xb7a586ff in run_create_callbacks (dlg=0xa7709ef0, msg=0x81f18a8) 
> at dlg_cb.c:230
> #12 0xb7a60d1f in dlg_new_dialog (msg=0x81f18a8, t=0xa75deeb0) at 
> dlg_handlers.c:494
> #13 0xb7a61f77 in dlg_onreq (t=0xa75deeb0, type=1, param=0xb7b785a8) 
> at dlg_handlers.c:414
> #14 0xb7b4a791 in run_reqin_callbacks (trans=0xa75deeb0, 
> req=0x81f18a8, code=1) at t_hooks.c:272
> #15 0xb7b376af in build_cell (p_msg=0x81f18a8) at h_table.c:284
> #16 0xb7b535fa in t_newtran (p_msg=0x81f18a8) at t_lookup.c:1064
> #17 0xb7b4540c in t_relay_to (p_msg=0x81f18a8, proxy=0x0, flags=8) at 
> t_funcs.c:212
> #18 0xb7b58ac7 in w_t_relay (p_msg=0x81f18a8, proxy=0x0, flags=0x8 
> <Address 0x8 out of bounds>) at tm.c:1002
> #19 0x0805301c in do_action (a=0x818c370, msg=0x81f18a8) at action.c:874
> #20 0x080557aa in run_action_list (a=0x818c370, msg=0x81f18a8) at 
> action.c:145
> #21 0x0809c304 in eval_expr (e=0x818c3d8, msg=0x81f18a8, val=0x0) at 
> route.c:1171
> #22 0x0809bd80 in eval_expr (e=0x818c400, msg=0x81f18a8, val=0x0) at 
> route.c:1488
> #23 0x0809bd16 in eval_expr (e=0x818c428, msg=0x81f18a8, val=0x0) at 
> route.c:1493
> #24 0x080527ed in do_action (a=0x818c740, msg=0x81f18a8) at action.c:729
> #25 0x080557aa in run_action_list (a=0x818be08, msg=0x81f18a8) at 
> action.c:145
> #26 0x08053efb in do_action (a=0x81a12e0, msg=0x81f18a8) at action.c:120
> #27 0x080557aa in run_action_list (a=0x818ee78, msg=0x81f18a8) at 
> action.c:145
> #28 0x08053efb in do_action (a=0x81b3448, msg=0x81f18a8) at action.c:120
> #29 0x080557aa in run_action_list (a=0x81b0ed0, msg=0x81f18a8) at 
> action.c:145
> #30 0x08054491 in do_action (a=0x81b5f68, msg=0x81f18a8) at action.c:746
> #31 0x080557aa in run_action_list (a=0x81b5f68, msg=0x81f18a8) at 
> action.c:145
> #32 0x08054f2d in do_action (a=0x81b5fd0, msg=0x81f18a8) at action.c:752
> #33 0x080557aa in run_action_list (a=0x81aefd0, msg=0x81f18a8) at 
> action.c:145
> #34 0x08053efb in do_action (a=0x818bc08, msg=0x81f18a8) at action.c:120
> #35 0x080557aa in run_action_list (a=0x8187910, msg=0x81f18a8) at 
> action.c:145
> #36 0x08055b43 in run_top_route (a=0x8187910, msg=0x81f18a8) at 
> action.c:120
> #37 0x0808c659 in receive_msg (
>     buf=0x8158040 "INVITE sip:xxxxx at xxxxxxxxxxxxxx.com 
> <mailto:sip%3Axxxxx at xxxxxxxxxxxxxx.com> SIP/2.0\r\nRecord-Route: 
> <sip:10.100.29.7;lr=on;ftag=as60035314>\r\nVia: SIP/2.0/UDP 
> 10.100.29.7;branch=z9hG4bKb6d4.a49d7633.0\r\nVia: SIP/2.0/UDP 
> 10.100.29.8:5060;branch=z9hG"..., len=926, rcv_info=0xbff62334) at 
> receive.c:175
> #38 0x080c3ea3 in udp_rcv_loop () at udp_server.c:449
> #39 0x0806e394 in main (argc=9, argv=0xbff62514) at main.c:774,
>

Hello

I am a little busy atm, so before I dig into the code, I have a question 
for core devs. Is the LOCK_HASH() call  recursive (being called again 
from the same process will not block) ? I ask this because in the 4th 
blocked INVITE the hash _might_ be blocked by both t_newtran(#16 
0xb7b535fa in t_newtran (p_msg=0x81f18a8) at t_lookup.c:1064)
and 6  t_uac (#6  0xb7b6ce01 in t_uac (method=0xbff60558, 
headers=0x81e3108, body=0x81d9afb, dialog=0xa772c6a8, cb=0xb734a622 
<publ_cback_func>, cbp=0xa7715158)), thus causing a deadlock.

Thanks
Marius



More information about the sr-users mailing list