[SR-Users] two Kamailio instances suddenly stopped handling packets

Daniel-Constantin Mierla miconda at gmail.com
Fri Jan 12 11:08:40 CET 2018


Hello,

have you checked that any of the route blocks used in between lock() and
unlock() don't do an exit/drop (also not return 0)?

Do you have other lock() / unlock() in the config?

Cheers,
Daniel


On 12.01.18 08:49, Sebastian Damm wrote:
> Hi,
>
> this morning, two Kamailio servers suddenly stopped working after
> having worked without a problem for months. Both are running 5.0.2 on
> Debian Jessie. Those systems only analyze mirrored traffic and write
> some information to a RabbitMQ.
>
> I tried restarting, but after a few seconds they get stuck the same
> way as before. Then I attached a gdb to some of the UDP listeners, and
> they all look pretty much the same.
>
>
> ===========8< process 1 =================
> (gdb) bt
> #0  0x00007f404ca435b9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007f4049f41260 in futex_get (lock=0x7f4041bccd00) at
> ../../core/parser/../mem/../futexlock.h:108
> #2  0x00007f4049f477d7 in cfg_lock_helper (lkey=0x7fff3c048650,
> mode=0) at cfgutils.c:667
> #3  0x00007f4049f481ed in w_cfg_lock_wrapper (msg=0x7f404ba17f18,
> key=0x7f404ba15488, mode=0) at cfgutils.c:712
> #4  0x00007f4049f4823c in w_cfg_lock (msg=0x7f404ba17f18,
> key=0x7f404ba15488 "\210a\241K@\177", s2=0x0) at cfgutils.c:717
> #5  0x000000000045cd85 in do_action (h=0x7fff3c049410,
> a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1060
> #6  0x0000000000469afa in run_actions (h=0x7fff3c049410,
> a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1552
> #7  0x000000000045958d in do_action (h=0x7fff3c049410,
> a=0x7f404b9ce558, msg=0x7f404ba17f18) at core/action.c:678
> #8  0x0000000000469afa in run_actions (h=0x7fff3c049410,
> a=0x7f404b9cdf70, msg=0x7f404ba17f18) at core/action.c:1552
> #9  0x000000000046a2bd in run_top_route (a=0x7f404b9cdf70,
> msg=0x7f404ba17f18, c=0x0) at core/action.c:1641
> #10 0x0000000000580033 in receive_msg (
>     buf=0xa393f9 <buf+121> "REGISTER sip:domain SIP/2.0\r\nVia:
> SIP/2.0/UDP
> 1.2.3.4;branch=z9hG4bKfc98.c762d7151f110c7eb71fc7d4e0648f1f.0\r\nVia:
> SIP/2.0/UDP
> 192.168.0.3:5060;rport=38020;received=62.30.8.128;branch=z9hG4"...,
>     len=799, rcv_info=0x7fff3c049810) at core/receive.c:264
> #11 0x00007f40483efcdf in parsing_hepv3_message (buf=0xa39380 <buf>
> "HEP3\003\230", len=920) at hep.c:499
> #12 0x00007f40483ee264 in hepv3_received (buf=0xa39380 <buf>
> "HEP3\003\230", len=920, ri=0x7fff3c049a50) at hep.c:231
> #13 0x00007f40483ec9cb in hep_msg_received (data=0x7fff3c049a30) at
> hep.c:85
> #14 0x000000000049e4e1 in sr_event_exec (type=7, data=0x7fff3c049a30)
> at core/events.c:263
> #15 0x000000000048731a in udp_rcv_loop () at core/udp_server.c:466
> #16 0x0000000000422d08 in main_loop () at main.c:1623
> #17 0x000000000042a408 in main (argc=13, argv=0x7fff3c049ef8) at
> main.c:2643
> (gdb) quit
> ===========8< process 1 =================
>
>
> ===========8< process 2 =================
> (gdb) bt
> #0  0x00007f404ca435b9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007f4049f41260 in futex_get (lock=0x7f4041bccd00) at
> ../../core/parser/../mem/../futexlock.h:108
> #2  0x00007f4049f477d7 in cfg_lock_helper (lkey=0x7fff3c048650,
> mode=0) at cfgutils.c:667
> #3  0x00007f4049f481ed in w_cfg_lock_wrapper (msg=0x7f404ba17f18,
> key=0x7f404ba15488, mode=0) at cfgutils.c:712
> #4  0x00007f4049f4823c in w_cfg_lock (msg=0x7f404ba17f18,
> key=0x7f404ba15488 "\210a\241K@\177", s2=0x0) at cfgutils.c:717
> #5  0x000000000045cd85 in do_action (h=0x7fff3c049600,
> a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1060
> #6  0x0000000000469afa in run_actions (h=0x7fff3c049600,
> a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1552
> #7  0x000000000045958d in do_action (h=0x7fff3c049600,
> a=0x7f404b9cf920, msg=0x7f404ba17f18) at core/action.c:678
> #8  0x0000000000469afa in run_actions (h=0x7fff3c049600,
> a=0x7f404b9ce9c0, msg=0x7f404ba17f18) at core/action.c:1552
> #9  0x000000000046a2bd in run_top_route (a=0x7f404b9ce9c0,
> msg=0x7f404ba17f18, c=0x7fff3c049600) at core/action.c:1641
> #10 0x0000000000580a69 in receive_msg (
>     buf=0xa393f9 <buf+121> "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP
> 172.20.40.8;branch=z9hG4bKa69c.5cca5a54c4d270c515eeb4bbb5e0bb44.0\r\nVia:
> SIP/2.0/UDP 2.3.4.5:5060 <http://2.3.4.5:5060>\r\nContact:
> <sip:1234567 at 9.8.7.6:59669;transport=UDP>\r\nTo: "..., len=506,
> rcv_info=0x7fff3c049810) at core/receive.c:327
> #11 0x00007f40483efcdf in parsing_hepv3_message (buf=0xa39380 <buf>
> "HEP3\002s", len=627) at hep.c:499
> #12 0x00007f40483ee264 in hepv3_received (buf=0xa39380 <buf>
> "HEP3\002s", len=627, ri=0x7fff3c049a50) at hep.c:231
> #13 0x00007f40483ec9cb in hep_msg_received (data=0x7fff3c049a30) at
> hep.c:85
> #14 0x000000000049e4e1 in sr_event_exec (type=7, data=0x7fff3c049a30)
> at core/events.c:263
> #15 0x000000000048731a in udp_rcv_loop () at core/udp_server.c:466
> #16 0x0000000000422d08 in main_loop () at main.c:1623
> #17 0x000000000042a408 in main (argc=13, argv=0x7fff3c049ef8) at
> main.c:2643
> (gdb) quit
> ===========8< process 2 =================
>
> Looks to me as if Kamailio gets stuck trying to get a lock for this
> packet. 
>
> The config file when handling those packets looks like this:
>
> request_route {
>         route(initialize_variables);
>         route(foo);
> }
>
> onreply_route {
>         route(initialize_variables);
>         route(foo);
> }
>
> route[initialize_variables] {
>         $vn(bar) = $null;
>         $vn(baz) = $null;
>         $vn(barbaz) = $null;
>         $vn(foobar) = $null;
>
> }
>
> route[foo] {
>         lock("$ci");
>         xlog("L_DBG", "Obtained lock, calling lua...\n");
>         if(!lua_run("handle_packet")) {
>                 xlog("L_ERR", "SCRIPT: failed to execute lua
> function!\n");
>         }
>
>         if ($vn(bar) != $null) {
>           route(mediaIpToRedis);
>         }
>
>         if ($vn(baz) != $null) {
>           route(channelInfoToRedis);
>         }
>
>         if ($vn(barbaz) != $null) {
>                 route(sendToQueue);
>         }
>         xlog("L_DBG", "All finished, releasing lock...\n");
>         unlock("$ci");
>         xlog("L_DBG", "Released lock...\n");
>
>         # update stats
>         if ($vn(foobar) != $null) {
>                 update_stat("$vn(foobar)", "+1");
>         }
>
>         drop;
>         exit;
> }
>
> Is there any race condition I am missing? Until this morning, it ran
> without problems and only every 2 minutes or so, 4 packets were sent
> to the RabbitMQ. So the system throws away nearly 100% of the traffic
> (which is around 20 Mbit). 
>
> In the log file, there are no errors at all, no aborted lua scripts or
> whatsoever.
>
> Does anybody have a hint for me? Thanks in advance.
>
> Regards
> Sebastian
>
>
> _______________________________________________
> Kamailio (SER) - Users Mailing List
> sr-users at lists.kamailio.org
> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users

-- 
Daniel-Constantin Mierla
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio Advanced Training - March 5-7, 2018, Berlin - www.asipto.com
Kamailio World Conference - May 14-16, 2018 - www.kamailioworld.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.kamailio.org/pipermail/sr-users/attachments/20180112/078602c7/attachment.html>


More information about the sr-users mailing list