<div dir="ltr">Hi Daniel,<div><br></div><div>thanks for pointing out the details about the lock structure. After some investigation towards locking issues, we found out that the root cause was an external system that blocked kamailio's connection attempts which subsequently lead to blocked threads. Sorry for the noise and have a nice weekend!</div><div><br></div><div>Regards,</div><div>Moritz</div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 12, 2018 at 1:50 PM, Daniel-Constantin Mierla <span dir="ltr"><<a href="mailto:miconda@gmail.com" target="_blank">miconda@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    <p>Hello,<br>
    </p><span class="">
    <br>
    <div class="m_4485556811802776221moz-cite-prefix">On 12.01.18 11:48, Sebastian Damm
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">Hi Daniel,
        <div><br>
        </div>
        <div>unless there is a silent exit from some module (those extra
          routes just write some information into redis), I can't find
          any other drop and exits than the one in the route posted. </div>
        <div><br>
        </div>
        <div>We have added some logging around the lock functions, and
          although we are missing some log lines for some packets,
          making it look like some locks don't get released, we do see
          that another PID can successfully lock the same key later.</div>
        <div><br>
        </div>
        <div>There is no other lock/unlock in the config.</div>
        <div><br>
        </div>
        <div>Is there a way to debug the lock array?</div>
      </div>
    </blockquote></span>
    attaching with gdb and looking at the values is always an option.
    Otherwise, I do not recall anything for this purpose written in
    kamailio.<span class=""><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div> One guess could be that the lock array is full, although
          we already have the lock size set to 10.</div>
      </div>
    </blockquote>
    <br></span>
    Lock array cannot become "full", if you thought like that if there
    are 10 lock operations with different keys on an array of 10 locks.<br>
    <br>
    Actually many string keys can use same lock/mutex -- a hashing
    function is performed over the key value results in an integer value
    which then is made modulo number of locks and the lock at that index
    is chosen from the array. So if you do lock("abc") and then you
    don't release it, then lock("xyz") can get blocked. Also, doing
    unlock("xyz") without lock("xyz") is releasing the lock("abc") --
    note that "abc" and "xyz" are just random example to explain what
    can happen, but does not mean it happens for these keys.<br>
    <br>
    Cheers,<br>
    Daniel<div><div class="h5"><br>
    <br>
    <blockquote type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>Regards,</div>
        <div>Sebastian<br>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On Fri, Jan 12, 2018 at 11:08 AM,
              Daniel-Constantin Mierla <span dir="ltr"><<a href="mailto:miconda@gmail.com" target="_blank">miconda@gmail.com</a>></span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF">
                  <p>Hello,</p>
                  <p>have you checked that any of the route blocks used
                    in between lock() and unlock() don't do an exit/drop
                    (also not return 0)?</p>
                  <p>Do you have other lock() / unlock() in the config?<br>
                  </p>
                  <p>Cheers,<br>
                    Daniel<br>
                  </p>
                  <div>
                    <div class="m_4485556811802776221h5"> <br>
                      <div class="m_4485556811802776221m_8714090039324674859moz-cite-prefix">On
                        12.01.18 08:49, Sebastian Damm wrote:<br>
                      </div>
                    </div>
                  </div>
                  <blockquote type="cite">
                    <div>
                      <div class="m_4485556811802776221h5">
                        <div dir="ltr">Hi,
                          <div><br>
                          </div>
                          <div>this morning, two Kamailio servers
                            suddenly stopped working after having worked
                            without a problem for months. Both are
                            running 5.0.2 on Debian Jessie. Those
                            systems only analyze mirrored traffic and
                            write some information to a RabbitMQ.</div>
                          <div><br>
                          </div>
                          <div>I tried restarting, but after a few
                            seconds they get stuck the same way as
                            before. Then I attached a gdb to some of the
                            UDP listeners, and they all look pretty much
                            the same.</div>
                          <div><br>
                          </div>
                          <div><br>
                          </div>
                          <div>===========8< process 1
                            =================</div>
                          <div>
                            <div>(gdb) bt</div>
                            <div>#0  0x00007f404ca435b9 in syscall ()
                              from /lib/x86_64-linux-gnu/libc.so.<wbr>6</div>
                            <div>#1  0x00007f4049f41260 in futex_get
                              (lock=0x7f4041bccd00) at
                              ../../core/parser/../mem/../fu<wbr>texlock.h:108</div>
                            <div>#2  0x00007f4049f477d7 in
                              cfg_lock_helper (lkey=0x7fff3c048650,
                              mode=0) at cfgutils.c:667</div>
                            <div>#3  0x00007f4049f481ed in
                              w_cfg_lock_wrapper (msg=0x7f404ba17f18,
                              key=0x7f404ba15488, mode=0) at
                              cfgutils.c:712</div>
                            <div>#4  0x00007f4049f4823c in w_cfg_lock
                              (msg=0x7f404ba17f18, key=0x7f404ba15488
                              "\210a\241K@\177", s2=0x0) at
                              cfgutils.c:717</div>
                            <div>#5  0x000000000045cd85 in do_action
                              (h=0x7fff3c049410, a=0x7f404b9d5170,
                              msg=0x7f404ba17f18) at core/action.c:1060</div>
                            <div>#6  0x0000000000469afa in run_actions
                              (h=0x7fff3c049410, a=0x7f404b9d5170,
                              msg=0x7f404ba17f18) at core/action.c:1552</div>
                            <div>#7  0x000000000045958d in do_action
                              (h=0x7fff3c049410, a=0x7f404b9ce558,
                              msg=0x7f404ba17f18) at core/action.c:678</div>
                            <div>#8  0x0000000000469afa in run_actions
                              (h=0x7fff3c049410, a=0x7f404b9cdf70,
                              msg=0x7f404ba17f18) at core/action.c:1552</div>
                            <div>#9  0x000000000046a2bd in run_top_route
                              (a=0x7f404b9cdf70, msg=0x7f404ba17f18,
                              c=0x0) at core/action.c:1641</div>
                            <div>#10 0x0000000000580033 in receive_msg (</div>
                            <div>    buf=0xa393f9 <buf+121>
                              "REGISTER <a class="m_4485556811802776221m_8714090039324674859moz-txt-link-freetext">sip:domain</a>
                              SIP/2.0\r\nVia: SIP/2.0/UDP
                              1.2.3.4;branch=z9hG4bKfc98.c76<wbr>2d7151f110c7eb71fc7d4e0648f1f.<wbr>0\r\nVia:
                              SIP/2.0/UDP 192.168.0.3:5060;rport=38020;r<wbr>eceived=62.30.8.128;branch=z9h<wbr>G4"...,</div>
                            <div>    len=799, rcv_info=0x7fff3c049810)
                              at core/receive.c:264</div>
                            <div>#11 0x00007f40483efcdf in
                              parsing_hepv3_message (buf=0xa39380
                              <buf> "HEP3\003\230", len=920) at
                              hep.c:499</div>
                            <div>#12 0x00007f40483ee264 in
                              hepv3_received (buf=0xa39380 <buf>
                              "HEP3\003\230", len=920,
                              ri=0x7fff3c049a50) at hep.c:231</div>
                            <div>#13 0x00007f40483ec9cb in
                              hep_msg_received (data=0x7fff3c049a30) at
                              hep.c:85</div>
                            <div>#14 0x000000000049e4e1 in sr_event_exec
                              (type=7, data=0x7fff3c049a30) at
                              core/events.c:263</div>
                            <div>#15 0x000000000048731a in udp_rcv_loop
                              () at core/udp_server.c:466</div>
                            <div>#16 0x0000000000422d08 in main_loop ()
                              at main.c:1623</div>
                            <div>#17 0x000000000042a408 in main
                              (argc=13, argv=0x7fff3c049ef8) at
                              main.c:2643</div>
                            <div>(gdb) quit</div>
                          </div>
                          <div>
                            <div>===========8< process 1
                              =================</div>
                          </div>
                          <div>
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                            <div>
                              <div>===========8< process 2
                                =================</div>
                            </div>
                            <div>(gdb) bt</div>
                            <div>#0  0x00007f404ca435b9 in syscall ()
                              from /lib/x86_64-linux-gnu/libc.so.<wbr>6</div>
                            <div>#1  0x00007f4049f41260 in futex_get
                              (lock=0x7f4041bccd00) at
                              ../../core/parser/../mem/../fu<wbr>texlock.h:108</div>
                            <div>#2  0x00007f4049f477d7 in
                              cfg_lock_helper (lkey=0x7fff3c048650,
                              mode=0) at cfgutils.c:667</div>
                            <div>#3  0x00007f4049f481ed in
                              w_cfg_lock_wrapper (msg=0x7f404ba17f18,
                              key=0x7f404ba15488, mode=0) at
                              cfgutils.c:712</div>
                            <div>#4  0x00007f4049f4823c in w_cfg_lock
                              (msg=0x7f404ba17f18, key=0x7f404ba15488
                              "\210a\241K@\177", s2=0x0) at
                              cfgutils.c:717</div>
                            <div>#5  0x000000000045cd85 in do_action
                              (h=0x7fff3c049600, a=0x7f404b9d5170,
                              msg=0x7f404ba17f18) at core/action.c:1060</div>
                            <div>#6  0x0000000000469afa in run_actions
                              (h=0x7fff3c049600, a=0x7f404b9d5170,
                              msg=0x7f404ba17f18) at core/action.c:1552</div>
                            <div>#7  0x000000000045958d in do_action
                              (h=0x7fff3c049600, a=0x7f404b9cf920,
                              msg=0x7f404ba17f18) at core/action.c:678</div>
                            <div>#8  0x0000000000469afa in run_actions
                              (h=0x7fff3c049600, a=0x7f404b9ce9c0,
                              msg=0x7f404ba17f18) at core/action.c:1552</div>
                            <div>#9  0x000000000046a2bd in run_top_route
                              (a=0x7f404b9ce9c0, msg=0x7f404ba17f18,
                              c=0x7fff3c049600) at core/action.c:1641</div>
                            <div>#10 0x0000000000580a69 in receive_msg (</div>
                            <div>    buf=0xa393f9 <buf+121>
                              "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP
                              172.20.40.8;branch=z9hG4bKa69c<wbr>.5cca5a54c4d270c515eeb4bbb5e0b<wbr>b44.0\r\nVia:
                              SIP/2.0/UDP <a href="http://2.3.4.5:5060" target="_blank">2.3.4.5:5060</a>\r\nContact:
                              <a class="m_4485556811802776221m_8714090039324674859moz-txt-link-rfc2396E"><sip:1234567@9.8.7.6:59669;tra<wbr>nsport=UDP></a>\r\nTo:
                              "..., len=506, rcv_info=0x7fff3c049810) at
                              core/receive.c:327</div>
                            <div>#11 0x00007f40483efcdf in
                              parsing_hepv3_message (buf=0xa39380
                              <buf> "HEP3\002s", len=627) at
                              hep.c:499</div>
                            <div>#12 0x00007f40483ee264 in
                              hepv3_received (buf=0xa39380 <buf>
                              "HEP3\002s", len=627, ri=0x7fff3c049a50)
                              at hep.c:231</div>
                            <div>#13 0x00007f40483ec9cb in
                              hep_msg_received (data=0x7fff3c049a30) at
                              hep.c:85</div>
                            <div>#14 0x000000000049e4e1 in sr_event_exec
                              (type=7, data=0x7fff3c049a30) at
                              core/events.c:263</div>
                            <div>#15 0x000000000048731a in udp_rcv_loop
                              () at core/udp_server.c:466</div>
                            <div>#16 0x0000000000422d08 in main_loop ()
                              at main.c:1623</div>
                            <div>#17 0x000000000042a408 in main
                              (argc=13, argv=0x7fff3c049ef8) at
                              main.c:2643</div>
                            <div>(gdb) quit</div>
                          </div>
                          <div>
                            <div>===========8< process 2
                              =================</div>
                            <div><br>
                            </div>
                            <div>Looks to me as if Kamailio gets stuck
                              trying to get a lock for this packet. </div>
                            <div><br>
                            </div>
                            <div>The config file when handling those
                              packets looks like this:</div>
                            <div><br>
                            </div>
                            <div>
                              <div>request_route {</div>
                              <div>        route(initialize_variables);<br>
                              </div>
                              <div>        route(foo);</div>
                              <div>}</div>
                              <div><br>
                              </div>
                              <div>onreply_route {</div>
                              <div>        route(initialize_variables);<br>
                              </div>
                              <div>        route(foo);</div>
                              <div>}</div>
                            </div>
                            <div><br>
                            </div>
                            <div>
                              <div>route[initialize_variables] {</div>
                              <div>        $vn(bar) = $null;</div>
                              <div>        $vn(baz) = $null;</div>
                              <div>
                                <div>        $vn(barbaz) = $null;</div>
                              </div>
                              <div>
                                <div>        $vn(foobar) = $null;</div>
                              </div>
                              <div><br>
                              </div>
                              <div>}<br>
                              </div>
                            </div>
                            <div><br>
                            </div>
                            <div>
                              <div>route[foo] {</div>
                              <div>        lock("$ci");</div>
                              <div>        xlog("L_DBG", "Obtained lock,
                                calling lua...\n");</div>
                              <div>        if(!lua_run("handle_packet"))
                                {</div>
                              <div>                xlog("L_ERR",
                                "SCRIPT: failed to execute lua
                                function!\n");</div>
                              <div>        }</div>
                              <div><br>
                              </div>
                              <div>        if ($vn(bar) != $null) {</div>
                              <div>          route(mediaIpToRedis);</div>
                              <div>        }</div>
                              <div><br>
                              </div>
                              <div>        if ($vn(baz) != $null) {</div>
                              <div>          route(channelInfoToRedis);</div>
                              <div>        }</div>
                              <div><br>
                              </div>
                              <div>        if ($vn(barbaz) != $null) {</div>
                              <div>                route(sendToQueue);</div>
                              <div>        }</div>
                              <div>        xlog("L_DBG", "All finished,
                                releasing lock...\n");</div>
                              <div>        unlock("$ci");</div>
                              <div>        xlog("L_DBG", "Released
                                lock...\n");</div>
                              <div><br>
                              </div>
                              <div>        # update stats</div>
                              <div>        if ($vn(foobar) != $null) {</div>
                              <div>               
                                update_stat("$vn(foobar)", "+1");</div>
                              <div>        }</div>
                              <div><br>
                              </div>
                              <div>        drop;</div>
                              <div>        exit;</div>
                              <div>}</div>
                            </div>
                            <div><br>
                            </div>
                            <div>Is there any race condition I am
                              missing? Until this morning, it ran
                              without problems and only every 2 minutes
                              or so, 4 packets were sent to the
                              RabbitMQ. So the system throws away nearly
                              100% of the traffic (which is around 20
                              Mbit). </div>
                            <div><br>
                            </div>
                            <div>In the log file, there are no errors at
                              all, no aborted lua scripts or whatsoever.</div>
                            <div><br>
                            </div>
                            <div>Does anybody have a hint for me? Thanks
                              in advance.</div>
                            <div><br>
                            </div>
                            <div>Regards</div>
                            <div>Sebastian</div>
                          </div>
                        </div>
                        <br>
                        <fieldset class="m_4485556811802776221m_8714090039324674859mimeAttachmentHeader"></fieldset>
                        <br>
                      </div>
                    </div>
                    <pre>______________________________<wbr>_________________
Kamailio (SER) - Users Mailing List
<a class="m_4485556811802776221m_8714090039324674859moz-txt-link-abbreviated" href="mailto:sr-users@lists.kamailio.org" target="_blank">sr-users@lists.kamailio.org</a>
<a class="m_4485556811802776221m_8714090039324674859moz-txt-link-freetext" href="https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users" target="_blank">https://lists.kamailio.org/cgi<wbr>-bin/mailman/listinfo/sr-users</a><span class="m_4485556811802776221HOEnZb"><font color="#888888">
</font></span></pre>
                    <span class="m_4485556811802776221HOEnZb"><font color="#888888"> </font></span></blockquote>
                  <span class="m_4485556811802776221HOEnZb"><font color="#888888"> <br>
                      <pre class="m_4485556811802776221m_8714090039324674859moz-signature" cols="72">-- 
Daniel-Constantin Mierla
<a class="m_4485556811802776221m_8714090039324674859moz-txt-link-abbreviated" href="http://www.twitter.com/miconda" target="_blank">www.twitter.com/miconda</a> -- <a class="m_4485556811802776221m_8714090039324674859moz-txt-link-abbreviated" href="http://www.linkedin.com/in/miconda" target="_blank">www.linkedin.com/in/miconda</a>
Kamailio Advanced Training - March 5-7, 2018, Berlin - <a class="m_4485556811802776221m_8714090039324674859moz-txt-link-abbreviated" href="http://www.asipto.com" target="_blank">www.asipto.com</a>
Kamailio World Conference - May 14-16, 2018 - <a class="m_4485556811802776221m_8714090039324674859moz-txt-link-abbreviated" href="http://www.kamailioworld.com" target="_blank">www.kamailioworld.com</a></pre>
                    </font></span></div>
              </blockquote>
            </div>
            <br>
            <br clear="all">
            <div><br>
            </div>
          </div>
        </div>
      </div>
      <br>
      <fieldset class="m_4485556811802776221mimeAttachmentHeader"></fieldset>
      <br>
      <pre>______________________________<wbr>_________________
Kamailio (SER) - Users Mailing List
<a class="m_4485556811802776221moz-txt-link-abbreviated" href="mailto:sr-users@lists.kamailio.org" target="_blank">sr-users@lists.kamailio.org</a>
<a class="m_4485556811802776221moz-txt-link-freetext" href="https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users" target="_blank">https://lists.kamailio.org/<wbr>cgi-bin/mailman/listinfo/sr-<wbr>users</a>
</pre>
    </blockquote>
    <br>
    <pre class="m_4485556811802776221moz-signature" cols="72">-- 
Daniel-Constantin Mierla
<a class="m_4485556811802776221moz-txt-link-abbreviated" href="http://www.twitter.com/miconda" target="_blank">www.twitter.com/miconda</a> -- <a class="m_4485556811802776221moz-txt-link-abbreviated" href="http://www.linkedin.com/in/miconda" target="_blank">www.linkedin.com/in/miconda</a>
Kamailio Advanced Training - March 5-7, 2018, Berlin - <a class="m_4485556811802776221moz-txt-link-abbreviated" href="http://www.asipto.com" target="_blank">www.asipto.com</a>
Kamailio World Conference - May 14-16, 2018 - <a class="m_4485556811802776221moz-txt-link-abbreviated" href="http://www.kamailioworld.com" target="_blank">www.kamailioworld.com</a></pre>
  </div></div></div>

<br>______________________________<wbr>_________________<br>
Kamailio (SER) - Users Mailing List<br>
<a href="mailto:sr-users@lists.kamailio.org">sr-users@lists.kamailio.org</a><br>
<a href="https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users" rel="noreferrer" target="_blank">https://lists.kamailio.org/<wbr>cgi-bin/mailman/listinfo/sr-<wbr>users</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>sipgate GmbH - Gladbacher Str. 74 - 40219 Düsseldorf<br>HRB Düsseldorf 39841 - Geschäftsführer: Thilo Salmon, Tim Mois<br>Steuernummer: 106/5724/7147, Umsatzsteuer-ID: DE219349391<br><br><a href="http://www.sipgate.de" target="_blank">www.sipgate.de</a> - <a href="http://www.sipgate.co.uk" target="_blank">www.sipgate.co.uk</a></div></div></div>
</div></div>