<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>Hello,</p>
    <p>Since you say you can reproduce it while running load tests, it
      would be better to get the output of:</p>
    <p>kamctl trap</p>
    <p>It writes the gdb "bt full" output for all Kamailio processes to
      a file that you can attach here.</p>
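    <p>If the file turns out to be long, a small script can give a first
      overview of which locks the processes are waiting on. The
      following is only a sketch in Python: it assumes the file contains
      frames formatted like the futex_get lines in the backtraces quoted
      below, and the function name count_lock_waiters is made up for
      this example.</p>
    <pre>
```python
import re
from collections import Counter

# Matches frames such as:
#   "#1  0x... in futex_get (lock=0x7ffb35217b90) at ../../core/futexlock.h:108"
# The frame layout is an assumption based on the backtraces quoted in this thread.
FUTEX_FRAME = re.compile(r"in\s+futex_get\s+\(lock=(0x[0-9a-fA-F]+)\)")


def count_lock_waiters(trace_text):
    """Count how many futex_get frames wait on each lock address."""
    return Counter(FUTEX_FRAME.findall(trace_text))


def format_summary(trace_text):
    """Return one 'address waiters' line per lock, busiest lock first."""
    counts = count_lock_waiters(trace_text)
    return ["%s %d" % (lock, n) for lock, n in counts.most_common()]
```
    </pre>
    <p>Reading the saved file and passing its text to format_summary()
      gives one line per lock address. An address with many waiters is
      usually a reasonable place to start looking for the process that
      holds that lock.</p>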
    <p>All the locks you listed in your email can be a side effect of
      other blocking operations, because at first sight the lock()
      inside bcast_dmq_message1() has a corresponding unlock().</p>
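    <p>To illustrate the side-effect case, here is a toy sketch in
      Python, not Kamailio code. It uses threads and threading.Lock in
      place of Kamailio processes and futex locks, and all names in it
      are invented for the example. The holder pairs its lock and unlock
      correctly, yet the waiters still cannot get the lock while the
      holder is blocked on something else inside the critical section.</p>
    <pre>
```python
import threading


def simulate(num_waiters=3, wait_timeout=0.2):
    """Show waiters piling up on a correctly paired lock.

    Returns (acquired, free_afterwards): whether each waiter got the lock
    while the holder was blocked, and whether the lock was free once the
    holder's other operation finished.
    """
    lock = threading.Lock()
    holder_has_lock = threading.Event()
    other_operation_done = threading.Event()  # stands in for any blocking call
    acquired = [None] * num_waiters

    def holder():
        # lock/unlock are paired by the with-block, but the lock stays held
        # for as long as the operation inside the critical section blocks.
        with lock:
            holder_has_lock.set()
            other_operation_done.wait()

    def waiter(idx):
        # Use a timeout so the demo ends instead of hanging like the workers.
        got_it = lock.acquire(timeout=wait_timeout)
        acquired[idx] = got_it
        if got_it:
            lock.release()

    holder_thread = threading.Thread(target=holder)
    holder_thread.start()
    holder_has_lock.wait()

    waiters = [threading.Thread(target=waiter, args=(i,))
               for i in range(num_waiters)]
    for t in waiters:
        t.start()
    for t in waiters:
        t.join()

    # Unblock the holder; only now does its unlock actually run.
    other_operation_done.set()
    holder_thread.join()
    free_afterwards = lock.acquire(timeout=wait_timeout)
    if free_afterwards:
        lock.release()
    return acquired, free_afterwards
```
    </pre>
    <p>In this sketch every waiter times out while the holder is
      blocked, and the lock becomes available again as soon as the other
      operation completes. That is why the full backtrace of all
      processes matters: the interesting process is the one holding the
      lock, not the ones waiting on it.</p>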
    <p>Cheers,<br>
      Daniel<br>
    </p>
    <div class="moz-cite-prefix">On 26.10.20 23:22, Patrick Wakano
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAPu3kNVGK=TvYQBJ9X4LDcxK_x9RqEOpxuGMoQHBpfNswg6ZVg@mail.gmail.com">
      <div dir="ltr">
        <div>Hello list,</div>
        <div>Hope all are doing well!</div>
        <div><br>
        </div>
        <div>We are running load tests on our Kamailio server, which is
          just making inbound and outbound calls, and eventually (there
          is no identified pattern) Kamailio freezes and of course all
          calls start to fail. It does not crash, it just stops
          responding and has to be killed with kill -9. When this happens, SIP
          messages are not processed, dmq keepalive fails (so the other
          node reports as down), dialog KA are not sent, but
          Registrations from UAC seem to still go out (logs from
          local_route are seen).<br>
        </div>
        <div>We don't have a high amount of cps, it is max 3 or 4 per
          sec, and it gets around 1900 active calls. We are now using
          Kamailio 5.2.8 installed from the repo on a CentOS7 server.
          Dialog has KA active and DMQ (with 2 workers) is being used on
          an active-active instance.</div>
        <div>From investigation using GDB as pasted below, I can see UDP
          workers are stuck on a lock either on a callback from
          t_relay...<br>
          <span style="font-family:monospace">#0  0x00007ffb74e9bbf9 in
            syscall () from /lib64/libc.so.6<br>
            #1  0x00007ffb2b1bce08 in futex_get (lock=0x7ffb35217b90) at
            ../../core/futexlock.h:108<br>
            #2  0x00007ffb2b1bec44 in bcast_dmq_message1
            (peer=0x7ffb35e8bf38, body=0x7fff2e95ffb0, except=0x0,
            resp_cback=0x7ffb2a8a0ab0 &lt;dlg_dmq_resp_callback&gt;,
            max_forwards=1, content_type=0x7ffb2a8a0a70
            &lt;dlg_dmq_content_type&gt;, incl_inactive=0) at
            dmq_funcs.c:156<br>
            #3  0x00007ffb2b1bf46b in bcast_dmq_message
            (peer=0x7ffb35e8bf38, body=0x7fff2e95ffb0, except=0x0,
            resp_cback=0x7ffb2a8a0ab0 &lt;dlg_dmq_resp_callback&gt;,
            max_forwards=1, content_type=0x7ffb2a8a0a70
            &lt;dlg_dmq_content_type&gt;) at dmq_funcs.c:188<br>
            #4  0x00007ffb2a6448fa in dlg_dmq_send (body=0x7fff2e95ffb0,
            node=0x0) at dlg_dmq.c:88<br>
            #5  0x00007ffb2a64da5d in dlg_dmq_replicate_action
            (action=DLG_DMQ_UPDATE, dlg=0x7ffb362ea3c8, needlock=1,
            node=0x0) at dlg_dmq.c:628<br>
            #6  0x00007ffb2a61f28e in dlg_on_send (t=0x7ffb36c98120,
            type=16, param=0x7fff2e9601e0) at dlg_handlers.c:739<br>
            #7  0x00007ffb2ef285b6 in run_trans_callbacks_internal
            (cb_lst=0x7ffb36c98198, type=16, trans=0x7ffb36c98120,
            params=0x7fff2e9601e0) at t_hooks.c:260<br>
            #8  0x00007ffb2ef286d0 in run_trans_callbacks (type=16,
            trans=0x7ffb36c98120, req=0x7ffb742f27e0, rpl=0x0, code=-1)
            at t_hooks.c:287<br>
            #9  0x00007ffb2ef38ac1 in prepare_new_uac (t=0x7ffb36c98120,
            i_req=0x7ffb742f27e0, branch=0, uri=0x7fff2e9603e0,
            path=0x7fff2e9603c0, next_hop=0x7ffb742f2a58,
            fsocket=0x7ffb73e3e968, snd_flags=..., fproto=0, flags=2,
            instance=0x7fff2e9603b0, ruid=0x7fff2e9603a0,
            location_ua=0x7fff2e960390) at t_fwd.c:381<br>
            #10 0x00007ffb2ef3d02d in add_uac (t=0x7ffb36c98120,
            request=0x7ffb742f27e0, uri=0x7ffb742f2a58,
            next_hop=0x7ffb742f2a58, path=0x7ffb742f2e20, proxy=0x0,
            fsocket=0x7ffb73e3e968, snd_flags=..., proto=0, flags=2,
            instance=0x7ffb742f2e30, ruid=0x7ffb742f2e48,
            location_ua=0x7ffb742f2e58) at t_fwd.c:811<br>
            #11 0x00007ffb2ef4535a in t_forward_nonack
            (t=0x7ffb36c98120, p_msg=0x7ffb742f27e0, proxy=0x0, proto=0)
            at t_fwd.c:1699<br>
            #12 0x00007ffb2ef20505 in t_relay_to (p_msg=0x7ffb742f27e0,
            proxy=0x0, proto=0, replicate=0) at t_funcs.c:334<br>
          </span></div>
        <div><span style="font-family:arial,sans-serif"><br>
          </span></div>
        <div><span style="font-family:arial,sans-serif">or
            loose_route...</span></div>
        <div><span style="font-family:monospace">#0  0x00007ffb74e9bbf9
            in syscall () from /lib64/libc.so.6<br>
            #1  0x00007ffb2b1bce08 in futex_get (lock=0x7ffb35217b90) at
            ../../core/futexlock.h:108<br>
            #2  0x00007ffb2b1bec44 in bcast_dmq_message1
            (peer=0x7ffb35e8bf38, body=0x7fff2e9629d0, except=0x0,
            resp_cback=0x7ffb2a8a0ab0 &lt;dlg_dmq_resp_callback&gt;,
            max_forwards=1, content_type=0x7ffb2a8a0a70
            &lt;dlg_dmq_content_type&gt;, incl_inactive=0) at
            dmq_funcs.c:156<br>
            #3  0x00007ffb2b1bf46b in bcast_dmq_message
            (peer=0x7ffb35e8bf38, body=0x7fff2e9629d0, except=0x0,
            resp_cback=0x7ffb2a8a0ab0 &lt;dlg_dmq_resp_callback&gt;,
            max_forwards=1, content_type=0x7ffb2a8a0a70
            &lt;dlg_dmq_content_type&gt;) at dmq_funcs.c:188<br>
            #4  0x00007ffb2a6448fa in dlg_dmq_send (body=0x7fff2e9629d0,
            node=0x0) at dlg_dmq.c:88<br>
            #5  0x00007ffb2a64da5d in dlg_dmq_replicate_action
            (action=DLG_DMQ_STATE, dlg=0x7ffb363e0c10, needlock=0,
            node=0x0) at dlg_dmq.c:628<br>
            #6  0x00007ffb2a62b3bf in dlg_onroute (req=0x7ffb742f11d0,
            route_params=0x7fff2e962ce0, param=0x0) at
            dlg_handlers.c:1538<br>
            #7  0x00007ffb2e7db203 in run_rr_callbacks
            (req=0x7ffb742f11d0, rr_param=0x7fff2e962d80) at rr_cb.c:96<br>
            #8  0x00007ffb2e7eb2f9 in after_loose (_m=0x7ffb742f11d0,
            preloaded=0) at loose.c:945<br>
            #9  0x00007ffb2e7eb990 in loose_route (_m=0x7ffb742f11d0) at
            loose.c:979<br>
          </span></div>
        <div><span style="font-family:monospace"><br>
          </span></div>
        <div><span style="font-family:arial,sans-serif">or 
            t_check_trans:</span></div>
        <span style="font-family:monospace">#0  0x00007ffb74e9bbf9 in
          syscall () from /lib64/libc.so.6<br>
          #1  0x00007ffb2a5ea9c6 in futex_get (lock=0x7ffb35e78804) at
          ../../core/futexlock.h:108<br>
          #2  0x00007ffb2a5f1c46 in dlg_lookup_mode (h_entry=1609,
          h_id=59882, lmode=0) at dlg_hash.c:709<br>
          #3  0x00007ffb2a5f27aa in dlg_get_by_iuid
          (diuid=0x7ffb36326bd0) at dlg_hash.c:777<br>
          #4  0x00007ffb2a61ba1d in dlg_onreply (t=0x7ffb36952988,
          type=2, param=0x7fff2e963bf0) at dlg_handlers.c:437<br>
          #5  0x00007ffb2ef285b6 in run_trans_callbacks_internal
          (cb_lst=0x7ffb36952a00, type=2, trans=0x7ffb36952988,
          params=0x7fff2e963bf0) at t_hooks.c:260<br>
          #6  0x00007ffb2ef286d0 in run_trans_callbacks (type=2,
          trans=0x7ffb36952988, req=0x7ffb3675c360, rpl=0x7ffb742f1930,
          code=200) at t_hooks.c:287<br>
          #7  0x00007ffb2ee7037f in t_reply_matching
          (p_msg=0x7ffb742f1930, p_branch=0x7fff2e963ebc) at
          t_lookup.c:997<br>
          #8  0x00007ffb2ee725e4 in t_check_msg (p_msg=0x7ffb742f1930,
          param_branch=0x7fff2e963ebc) at t_lookup.c:1101<br>
        </span>
        <div><span style="font-family:monospace">#9  0x00007ffb2eee44c7
            in t_check_trans (msg=0x7ffb742f1930) at tm.c:2351</span></div>
        <div><br>
        </div>
        <div>And the DMQ workers are here:<br>
        </div>
        <div><span style="font-family:monospace">#0  0x00007ffb74e9bbf9
            in syscall () from /lib64/libc.so.6<br>
            #1  0x00007ffb2b1d6c81 in futex_get (lock=0x7ffb35217c34) at
            ../../core/futexlock.h:108<br>
            #2  0x00007ffb2b1d7c3a in worker_loop (id=1) at worker.c:86<br>
            #3  0x00007ffb2b1d5d35 in child_init (rank=0) at dmq.c:300<br>
          </span></div>
        <div><br>
        </div>
        <div>Currently I am not able to upgrade to the latest 5.4
          version to try to reproduce the error, and since 5.2.8 has
          already reached end-of-life, is there anything I can do in
          the configuration to avoid this condition?</div>
        <div>Any ideas are welcome!</div>
        <div><br>
        </div>
        <div>Kind regards,</div>
        <div>Patrick Wakano<br>
        </div>
      </div>
      <br>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
Daniel-Constantin Mierla -- <a class="moz-txt-link-abbreviated" href="http://www.asipto.com">www.asipto.com</a>
<a class="moz-txt-link-abbreviated" href="http://www.twitter.com/miconda">www.twitter.com/miconda</a> -- <a class="moz-txt-link-abbreviated" href="http://www.linkedin.com/in/miconda">www.linkedin.com/in/miconda</a>
Funding: <a class="moz-txt-link-freetext" href="https://www.paypal.me/dcmierla">https://www.paypal.me/dcmierla</a></pre>
  </body>
</html>