<div dir="ltr">Hi Daniel,<div><br></div><div>unless there is a silent exit from some module (those extra routes just write some information into redis), I can't find any other drop and exits than the one in the route posted. </div><div><br></div><div>We have added some logging around the lock functions, and although we are missing some log lines for some packets, making it look like some locks don't get released, we do see that another PID can successfully lock the same key later.</div><div><br></div><div>There is no other lock/unlock in the config.</div><div><br></div><div>Is there a way to debug the lock array? One guess could be that the lock array is full, although we already have the lock size set to 10.</div><div><br></div><div>Regards,</div><div>Sebastian<br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 12, 2018 at 11:08 AM, Daniel-Constantin Mierla <span dir="ltr"><<a href="mailto:miconda@gmail.com" target="_blank">miconda@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    <p>Hello,</p>
    <p>have you checked that any of the route blocks used in between
      lock() and unlock() don't do an exit/drop (also not return 0)?</p>
    <p>Do you have other lock() / unlock() in the config?<br>
    </p>
    <p>Cheers,<br>
      Daniel<br>
    </p><div><div class="h5">
    <br>
    <div class="m_8714090039324674859moz-cite-prefix">On 12.01.18 08:49, Sebastian Damm
      wrote:<br>
    </div>
    </div></div><blockquote type="cite"><div><div class="h5">
      <div dir="ltr">Hi,
        <div><br>
        </div>
        <div>this morning, two Kamailio servers suddenly stopped working
          after having worked without a problem for months. Both are
          running 5.0.2 on Debian Jessie. Those systems only analyze
          mirrored traffic and write some information to a RabbitMQ.</div>
        <div><br>
        </div>
        <div>I tried restarting, but after a few seconds they get stuck
          the same way as before. Then I attached a gdb to some of the
          UDP listeners, and they all look pretty much the same.</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>===========8< process 1 =================</div>
        <div>
          <div>(gdb) bt</div>
          <div>#0  0x00007f404ca435b9 in syscall () from
            /lib/x86_64-linux-gnu/libc.so.<wbr>6</div>
          <div>#1  0x00007f4049f41260 in futex_get (lock=0x7f4041bccd00)
            at ../../core/parser/../mem/../<wbr>futexlock.h:108</div>
          <div>#2  0x00007f4049f477d7 in cfg_lock_helper
            (lkey=0x7fff3c048650, mode=0) at cfgutils.c:667</div>
          <div>#3  0x00007f4049f481ed in w_cfg_lock_wrapper
            (msg=0x7f404ba17f18, key=0x7f404ba15488, mode=0) at
            cfgutils.c:712</div>
          <div>#4  0x00007f4049f4823c in w_cfg_lock (msg=0x7f404ba17f18,
            key=0x7f404ba15488 "\210a\241K@\177", s2=0x0) at
            cfgutils.c:717</div>
          <div>#5  0x000000000045cd85 in do_action (h=0x7fff3c049410,
            a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1060</div>
          <div>#6  0x0000000000469afa in run_actions (h=0x7fff3c049410,
            a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1552</div>
          <div>#7  0x000000000045958d in do_action (h=0x7fff3c049410,
            a=0x7f404b9ce558, msg=0x7f404ba17f18) at core/action.c:678</div>
          <div>#8  0x0000000000469afa in run_actions (h=0x7fff3c049410,
            a=0x7f404b9cdf70, msg=0x7f404ba17f18) at core/action.c:1552</div>
          <div>#9  0x000000000046a2bd in run_top_route
            (a=0x7f404b9cdf70, msg=0x7f404ba17f18, c=0x0) at
            core/action.c:1641</div>
          <div>#10 0x0000000000580033 in receive_msg (</div>
          <div>    buf=0xa393f9 <buf+121> "REGISTER <a class="m_8714090039324674859moz-txt-link-freetext">sip:domain</a>
            SIP/2.0\r\nVia: SIP/2.0/UDP
            1.2.3.4;branch=z9hG4bKfc98.<wbr>c762d7151f110c7eb71fc7d4e0648f<wbr>1f.0\r\nVia:
            SIP/2.0/UDP
            192.168.0.3:5060;rport=38020;<wbr>received=62.30.8.128;branch=<wbr>z9hG4"...,</div>
          <div>    len=799, rcv_info=0x7fff3c049810) at
            core/receive.c:264</div>
          <div>#11 0x00007f40483efcdf in parsing_hepv3_message
            (buf=0xa39380 <buf> "HEP3\003\230", len=920) at
            hep.c:499</div>
          <div>#12 0x00007f40483ee264 in hepv3_received (buf=0xa39380
            <buf> "HEP3\003\230", len=920, ri=0x7fff3c049a50) at
            hep.c:231</div>
          <div>#13 0x00007f40483ec9cb in hep_msg_received
            (data=0x7fff3c049a30) at hep.c:85</div>
          <div>#14 0x000000000049e4e1 in sr_event_exec (type=7,
            data=0x7fff3c049a30) at core/events.c:263</div>
          <div>#15 0x000000000048731a in udp_rcv_loop () at
            core/udp_server.c:466</div>
          <div>#16 0x0000000000422d08 in main_loop () at main.c:1623</div>
          <div>#17 0x000000000042a408 in main (argc=13,
            argv=0x7fff3c049ef8) at main.c:2643</div>
          <div>(gdb) quit</div>
        </div>
        <div>
          <div>===========8< process 1 =================</div>
        </div>
        <div>
          <div><br>
          </div>
          <div><br>
          </div>
          <div>
            <div>===========8< process 2 =================</div>
          </div>
          <div>(gdb) bt</div>
          <div>#0  0x00007f404ca435b9 in syscall () from
            /lib/x86_64-linux-gnu/libc.so.<wbr>6</div>
          <div>#1  0x00007f4049f41260 in futex_get (lock=0x7f4041bccd00)
            at ../../core/parser/../mem/../<wbr>futexlock.h:108</div>
          <div>#2  0x00007f4049f477d7 in cfg_lock_helper
            (lkey=0x7fff3c048650, mode=0) at cfgutils.c:667</div>
          <div>#3  0x00007f4049f481ed in w_cfg_lock_wrapper
            (msg=0x7f404ba17f18, key=0x7f404ba15488, mode=0) at
            cfgutils.c:712</div>
          <div>#4  0x00007f4049f4823c in w_cfg_lock (msg=0x7f404ba17f18,
            key=0x7f404ba15488 "\210a\241K@\177", s2=0x0) at
            cfgutils.c:717</div>
          <div>#5  0x000000000045cd85 in do_action (h=0x7fff3c049600,
            a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1060</div>
          <div>#6  0x0000000000469afa in run_actions (h=0x7fff3c049600,
            a=0x7f404b9d5170, msg=0x7f404ba17f18) at core/action.c:1552</div>
          <div>#7  0x000000000045958d in do_action (h=0x7fff3c049600,
            a=0x7f404b9cf920, msg=0x7f404ba17f18) at core/action.c:678</div>
          <div>#8  0x0000000000469afa in run_actions (h=0x7fff3c049600,
            a=0x7f404b9ce9c0, msg=0x7f404ba17f18) at core/action.c:1552</div>
          <div>#9  0x000000000046a2bd in run_top_route
            (a=0x7f404b9ce9c0, msg=0x7f404ba17f18, c=0x7fff3c049600) at
            core/action.c:1641</div>
          <div>#10 0x0000000000580a69 in receive_msg (</div>
          <div>    buf=0xa393f9 <buf+121> "SIP/2.0 200 OK\r\nVia:
            SIP/2.0/UDP
172.20.40.8;branch=<wbr>z9hG4bKa69c.<wbr>5cca5a54c4d270c515eeb4bbb5e0bb<wbr>44.0\r\nVia:
            SIP/2.0/UDP <a href="http://2.3.4.5:5060" target="_blank">2.3.4.5:5060</a>\r\nContact:
            <a class="m_8714090039324674859moz-txt-link-rfc2396E"><sip:1234567@9.8.7.6:59669;<wbr>transport=UDP></a>\r\nTo: "...,
            len=506, rcv_info=0x7fff3c049810) at core/receive.c:327</div>
          <div>#11 0x00007f40483efcdf in parsing_hepv3_message
            (buf=0xa39380 <buf> "HEP3\002s", len=627) at hep.c:499</div>
          <div>#12 0x00007f40483ee264 in hepv3_received (buf=0xa39380
            <buf> "HEP3\002s", len=627, ri=0x7fff3c049a50) at
            hep.c:231</div>
          <div>#13 0x00007f40483ec9cb in hep_msg_received
            (data=0x7fff3c049a30) at hep.c:85</div>
          <div>#14 0x000000000049e4e1 in sr_event_exec (type=7,
            data=0x7fff3c049a30) at core/events.c:263</div>
          <div>#15 0x000000000048731a in udp_rcv_loop () at
            core/udp_server.c:466</div>
          <div>#16 0x0000000000422d08 in main_loop () at main.c:1623</div>
          <div>#17 0x000000000042a408 in main (argc=13,
            argv=0x7fff3c049ef8) at main.c:2643</div>
          <div>(gdb) quit</div>
        </div>
        <div>
          <div>===========8< process 2 =================</div>
          <div><br>
          </div>
          <div>Looks to me as if Kamailio gets stuck trying to get a
            lock for this packet. </div>
          <div><br>
          </div>
          <div>The config file when handling those packets looks like
            this:</div>
          <div><br>
          </div>
          <div>
            <div>request_route {</div>
            <div>        route(initialize_variables);<br>
            </div>
            <div>        route(foo);</div>
            <div>}</div>
            <div><br>
            </div>
            <div>onreply_route {</div>
            <div>        route(initialize_variables);<br>
            </div>
            <div>        route(foo);</div>
            <div>}</div>
          </div>
          <div><br>
          </div>
          <div>
            <div>route[initialize_variables] {</div>
            <div>        $vn(bar) = $null;</div>
            <div>        $vn(baz) = $null;</div>
            <div>
              <div>        $vn(barbaz) = $null;</div>
            </div>
            <div>
              <div>        $vn(foobar) = $null;</div>
            </div>
            <div><br>
            </div>
            <div>}<br>
            </div>
          </div>
          <div><br>
          </div>
          <div>
            <div>route[foo] {</div>
            <div>        lock("$ci");</div>
            <div>        xlog("L_DBG", "Obtained lock, calling
              lua...\n");</div>
            <div>        if(!lua_run("handle_packet")) {</div>
            <div>                xlog("L_ERR", "SCRIPT: failed to
              execute lua function!\n");</div>
            <div>        }</div>
            <div><br>
            </div>
            <div>        if ($vn(bar) != $null) {</div>
            <div>          route(mediaIpToRedis);</div>
            <div>        }</div>
            <div><br>
            </div>
            <div>        if ($vn(baz) != $null) {</div>
            <div>          route(channelInfoToRedis);</div>
            <div>        }</div>
            <div><br>
            </div>
            <div>        if ($vn(barbaz) != $null) {</div>
            <div>                route(sendToQueue);</div>
            <div>        }</div>
            <div>        xlog("L_DBG", "All finished, releasing
              lock...\n");</div>
            <div>        unlock("$ci");</div>
            <div>        xlog("L_DBG", "Released lock...\n");</div>
            <div><br>
            </div>
            <div>        # update stats</div>
            <div>        if ($vn(foobar) != $null) {</div>
            <div>                update_stat("$vn(foobar)", "+1");</div>
            <div>        }</div>
            <div><br>
            </div>
            <div>        drop;</div>
            <div>        exit;</div>
            <div>}</div>
          </div>
          <div><br>
          </div>
          <div>Is there any race condition I am missing? Until this
            morning, it ran without problems and only every 2 minutes or
            so, 4 packets were sent to the RabbitMQ. So the system
            throws away nearly 100% of the traffic (which is around 20
            Mbit). </div>
          <div><br>
          </div>
          <div>In the log file, there are no errors at all, no aborted
            lua scripts or whatsoever.</div>
          <div><br>
          </div>
          <div>Does anybody have a hint for me? Thanks in advance.</div>
          <div><br>
          </div>
          <div>Regards</div>
          <div>Sebastian</div>
        </div>
      </div>
      <br>
      <fieldset class="m_8714090039324674859mimeAttachmentHeader"></fieldset>
      <br>
      </div></div><pre>______________________________<wbr>_________________
Kamailio (SER) - Users Mailing List
<a class="m_8714090039324674859moz-txt-link-abbreviated" href="mailto:sr-users@lists.kamailio.org" target="_blank">sr-users@lists.kamailio.org</a>
<a class="m_8714090039324674859moz-txt-link-freetext" href="https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users" target="_blank">https://lists.kamailio.org/<wbr>cgi-bin/mailman/listinfo/sr-<wbr>users</a><span class="HOEnZb"><font color="#888888">
</font></span></pre><span class="HOEnZb"><font color="#888888">
    </font></span></blockquote><span class="HOEnZb"><font color="#888888">
    <br>
    <pre class="m_8714090039324674859moz-signature" cols="72">-- 
Daniel-Constantin Mierla
<a class="m_8714090039324674859moz-txt-link-abbreviated" href="http://www.twitter.com/miconda" target="_blank">www.twitter.com/miconda</a> -- <a class="m_8714090039324674859moz-txt-link-abbreviated" href="http://www.linkedin.com/in/miconda" target="_blank">www.linkedin.com/in/miconda</a>
Kamailio Advanced Training - March 5-7, 2018, Berlin - <a class="m_8714090039324674859moz-txt-link-abbreviated" href="http://www.asipto.com" target="_blank">www.asipto.com</a>
Kamailio World Conference - May 14-16, 2018 - <a class="m_8714090039324674859moz-txt-link-abbreviated" href="http://www.kamailioworld.com" target="_blank">www.kamailioworld.com</a></pre>
  </font></span></div>

</blockquote></div><br><br clear="all"><div><br></div><div class="gmail_signature" data-smartmail="gmail_signature"></div>
</div></div></div>