<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hello,</p>
    <p>can you get the output of the next gdb commands:</p>
    <p>  - first core file</p>
    <p>frame 2<br>
      p *p_cell<br>
      list</p>
    <p>  - second core file</p>
    <p>frame 8<br>
      p *p_cell<br>
      list</p>
    <p>Is there any way that I can get access to the system with the
      core files via ssh? It would help speeding up investigating the
      core files. If yes, I can provide my ssh key.</p>
    <p>Cheers,<br>
      Daniel<br>
    </p>
    <div class="moz-cite-prefix">On 19.03.19 16:12, Enrico Bandiera
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAGqwm+e+rHO__oKvHreY=paqVuHPq9++OZXK0zLJFjyqFf=K8w@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr">
          <div dir="ltr">
            <div dir="ltr">
              <div dir="ltr">
                <div dir="ltr">
                  <div dir="ltr">
                    <div dir="ltr">
                      <div dir="ltr">
                        <div dir="ltr">
                          <div dir="ltr">
                            <div dir="ltr">
                              <div>Hi guys, recently we had some issues
                                with kamailio in production, a proxy at
                                the edge of the infrastructure was
                                crashing seemingly randomly, it happened
                                a couple of times in a timespan of 4-5
                                days until a point where we found it in
                                an endless loop, unable to process SIP
                                packets anymore and printing in loop
                                messages like these:</div>
                              <div><br>
                              </div>
                              <div>50(60) CRITICAL: <core>
                                [core/mem/q_malloc.c:512]: qm_free():
                                BUG: freeing already freed pointer
                                (0x7ff40a359450), called from core:
                                core/usr_avp.c:
                                destroy_avp_list_unsafe(626), first free
                                core: core/usr_avp.c:
                                destroy_avp_list_unsafe(626) - ignoring</div>
                              <div>50(60) CRITICAL: <core>
                                [core/mem/q_malloc.c:512]: qm_free():
                                BUG: freeing already freed pointer
                                (0x7ff409c4fa58), called from core:
                                core/usr_avp.c:
                                destroy_avp_list_unsafe(626), first free
                                core: core/usr_avp.c:
                                destroy_avp_list_unsafe(626) - ignoring</div>
                              <div><br>
                              </div>
                              <div>We managed to replicate the crash and
                                the loop in a dev environment with a lot
                                of difficulties. It seems not related to
                                the peak load, but to the growing number
                                of processed calls</div>
                              <div>The kamailio version is:</div>
                              <div>
                                <div>version: kamailio 5.2.2
                                  (x86_64/linux) </div>
                                <div>flags: STATS: Off, USE_TCP,
                                  USE_TLS, USE_SCTP, TLS_HOOKS,
                                  USE_RAW_SOCKS, DISABLE_NAGLE,
                                  USE_MCAST, DNS_IP_HACK, SHM_MEM,
                                  SHM_MMAP, PKG_MALLOC, Q_MALLOC,
                                  F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY,
                                  USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT,
                                  USE_DNS_CACHE, USE_DNS_FAILOVER,
                                  USE_NAPTR, USE_DST_BLACKLIST,
                                  HAVE_RESOLV_RES</div>
                                <div>ADAPTIVE_WAIT_LOOPS=1024,
                                  MAX_RECV_BUFFER_SIZE 262144
                                  MAX_URI_SIZE 1024, BUF_SIZE 65535,
                                  DEFAULT PKG_SIZE 8MB</div>
                                <div>poll method support: poll,
                                  epoll_lt, epoll_et, sigio_rt, select.</div>
                                <div>id: unknown </div>
                                <div>compiled with gcc 5.3.1</div>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div>Every time this happened it was the
                                timer pid that was crashing or looping,
                                we managed to get 2 core dumps of the
                                crash and 1 of the loop (got it after
                                killing the timer with SIGQUIT).</div>
                              <div>
                                <div>The timer process crashes after
                                  attempting to free memory that it's
                                  not correctly managed. In particular
                                  we saw that it happens inside
                                  destroy_avp_list_unsafe() called by
                                  the tm timer, when uri_avps_from
                                  points to an invalid memory address.</div>
                                <div>We don't use dialog, uac,
                                  http_async_client, but we do use
                                  topos.</div>
                              </div>
                              <div><br>
                              </div>
                              <div>In the crash dumps it seems that
                                there's something wrong inside
                                free_cell_helper(), when trying to clean
                                the dead_cell something goes wrong while
                                freeing uri_avps_from, here's the crash
                                log:</div>
                              <div><br>
                              </div>
                              <div>
                                <div>61(76) CRITICAL: <core>
                                  [core/pass_fd.c:277]: receive_fd():
                                  EOF on 66</div>
                                <div> 0(15) ALERT: <core>
                                  [main.c:740]: handle_sigs(): child
                                  process 65 exited by a signal 11</div>
                                <div> 0(15) ALERT: <core>
                                  [main.c:743]: handle_sigs(): core was
                                  generated</div>
                                <div> 0(15) INFO: <core>
                                  [main.c:766]: handle_sigs():
                                  terminating due to SIGCHLD</div>
                                <div><childs getting the SIGTERM
                                  signal><br>
                                </div>
                                <div> 0(15) CRITICAL: <core>
                                  [core/mem/q_malloc.c:512]: qm_free():
                                  BUG: freeing already freed pointer
                                  (0x7f4d60a2e168), called from htable:
                                  ht_api.c: ht_cell_free(217), first
                                  free core: core/usr_avp.c:
                                  destroy_avp_list_unsafe(626) -
                                  ignoring</div>
                                <div> 0(15) INFO: <core>
                                  [core/sctp_core.c:53]:
                                  sctp_core_destroy(): SCTP API not
                                  initialized</div>
                              </div>
                              <div><br>
                              </div>
                              <div>Before the crash we have no warnings
                                on errors about shared memory allocation<br>
                              </div>
                              <div><br>
                              </div>
                              <div>backtrace:</div>
                              <div><br>
                              </div>
                              <div>
                                <div>(gdb) bt full</div>
                                <div>#0  0x00000000006371d7 in
                                  destroy_avp_list_unsafe
                                  (list=0x7f34f7a411b0) at
                                  core/usr_avp.c:625</div>
                                <div>        avp = 0x2</div>
                                <div>        foo = 0x2</div>
                                <div>        __func__ =
                                  "destroy_avp_list_unsafe"</div>
                                <div>#1  0x00007f351c6909ab in
                                  free_cell_helper
                                  (dead_cell=0x7f34f7a41018, silent=0,
                                  fname=0x7f351c796a02 "timer.c",
                                  fline=689) at h_table.c:260</div>
                                <div>        b = 0x0</div>
                                <div>        i = 1</div>
                                <div>        rpl = 0x0</div>
                                <div>        tt = 0x0</div>
                                <div>        foo = 0x11c68daf2</div>
                                <div>        cbs = 0x0</div>
                                <div>        cbs_tmp = 0x7f34f80c3790</div>
                                <div>        __func__ =
                                  "free_cell_helper"</div>
                                <div>#2  0x00007f351c7291e6 in
                                  wait_handler (ti=1984410416,
                                  wait_tl=0x7f34f7a410a0,
                                  data=0x7f34f7a41018) at timer.c:689</div>
                                <div>        p_cell = 0x7f34f7a41018</div>
                                <div>        ret = 1</div>
                                <div>        unlinked = 0</div>
                                <div>        rcount = 1</div>
                                <div>        __func__ = "wait_handler"</div>
                                <div>#3  0x00000000004d0f56 in
                                  timer_list_expire (t=1984410416,
                                  h=0x7f34f6be7850,
                                  slow_l=0x7f34f6bea368, slow_mark=2702)
                                  at core/timer.c:874</div>
                                <div>        tl = 0x7f34f7a410a0</div>
                                <div>        ret = 0</div>
                                <div>#4  0x00000000004d13bd in
                                  timer_handler () at core/timer.c:939</div>
                                <div>        saved_ticks = 1984410416</div>
                                <div>        run_slow_timer = 0</div>
                                <div>        i = 654</div>
                                <div>        __func__ = "timer_handler"</div>
                                <div>#5  0x00000000004d1845 in
                                  timer_main () at core/timer.c:978</div>
                                <div>No locals.</div>
                                <div>#6  0x0000000000425cc8 in main_loop
                                  () at main.c:1727</div>
                                <div>        i = 8</div>
                                <div>        pid = 0</div>
                                <div>        si = 0x0</div>
                                <div>        si_desc = "udp receiver
                                  child=7 sock=<a
                                    href="http://127.0.0.1:5060"
                                    moz-do-not-send="true">127.0.0.1:5060</a>\000\000\000\000q\000\000\000@6y\000\000\000\000\000\000\b\206|\210X\322:\032\301\255\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\002\000\000\000\000\063\275{\000\000\000\000\000>",
                                  '\000' <repeats 15 times>,
                                  "\320\220\377\226\374\177\000\000\351\022a\000\000\000\000"</div>
                                <div>        nrprocs = 8</div>
                                <div>        woneinit = 1</div>
                                <div>        __func__ = "main_loop"</div>
                                <div>#7  0x000000000042ca76 in main
                                  (argc=10, argv=0x7ffc96ff9398) at
                                  main.c:2696</div>
                                <div>        cfg_stream = 0x25ca010</div>
                                <div>        c = -1</div>
                                <div>        r = 0</div>
                                <div>        tmp = 0x7ffc96ffa4c9 ""</div>
                                <div>        tmp_len = -1761635728</div>
                                <div>        port = 32764</div>
                                <div>        proto = -1761635632</div>
                                <div>        options = 0x76c598
                                  ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"</div>
                                <div>        ret = -1</div>
                                <div>        seed = 1492539644</div>
                                <div>        rfd = 4</div>
                                <div>        debug_save = 0</div>
                                <div>        debug_flag = 0</div>
                                <div>        dont_fork_cnt = 2</div>
                                <div>        n_lst = 0x7f351f6f6718</div>
                                <div>        p = 0xffffffff <error:
                                  Cannot access memory at address
                                  0xffffffff></div>
                                <div>        st = {st_dev = 2049, st_ino
                                  = 1282695, st_nlink = 2, st_mode =
                                  16895, st_uid = 0, st_gid = 0, __pad0
                                  = 0, st_rdev = 0, st_size = 4096,
                                  st_blksize = 4096, st_blocks = 8,
                                  st_atim = {tv_sec = 1552931478,
                                  tv_nsec = 972761643}, </div>
                                <div>          st_mtim = {tv_sec =
                                  1552931478, tv_nsec = 972761643},
                                  st_ctim = {tv_sec = 1552931478,
                                  tv_nsec = 972761643}, __glibc_reserved
                                  = {0, 0, 0}}</div>
                                <div>        __func__ = "main"</div>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div>In the loop dump we have a different
                                situation, there's a loop in the
                                uri_avps_from structure as you can see
                                from the next pointers:</div>
                              <div><br>
                              </div>
                              <div>(gdb) p *dead_cell->uri_avps_from</div>
                              <div>$3 = {id = 66, flags = 275, next =
                                0x7ff40a359450, d = {p = 0x7ff40a9ad540,
                                l = 140686126667072, data =
                                "@\325\232\n\364\177\000"}}</div>
                              <div>(gdb) p
                                *dead_cell->uri_avps_from.next</div>
                              <div>$4 = {id = 66, flags = 275, next =
                                0x7ff409c4fa58, d = {p = 0x7ff40a359480,
                                l = 140686120031360, data =
                                "\200\224\065\n\364\177\000"}}</div>
                              <div>(gdb) p
                                *dead_cell->uri_avps_from.next.next</div>
                              <div>$5 = {id = 66, flags = 275, next =
                                0x7ff40a359450, d = {p = 0x7ff409c4fa88,
                                l = 140686112651912, data =
                                "\210\372\304\t\364\177\000"}}</div>
                              <div><br>
                              </div>
                              <div>backtrace:</div>
                              <div>
                                <div>(gdb) bt full</div>
                                <div>#0  0x00007ff43152e67e in
                                  _IO_default_xsputn () from
                                  /lib/x86_64-linux-gnu/libc.so.6</div>
                                <div>No symbol table info available.</div>
                                <div>#1  0x00007ff43150150b in vfprintf
                                  () from
                                  /lib/x86_64-linux-gnu/libc.so.6</div>
                                <div>No symbol table info available.</div>
                                <div>#2  0x00007ff431502ef1 in ?? ()
                                  from /lib/x86_64-linux-gnu/libc.so.6</div>
                                <div>No symbol table info available.</div>
                                <div>#3  0x00007ff43150032d in vfprintf
                                  () from
                                  /lib/x86_64-linux-gnu/libc.so.6</div>
                                <div>No symbol table info available.</div>
                                <div>#4  0x00007ff4315087f7 in fprintf
                                  () from
                                  /lib/x86_64-linux-gnu/libc.so.6</div>
                                <div>No symbol table info available.</div>
                                <div>#5  0x0000000000686e43 in qm_free
                                  (qmp=0x7ff409336000, p=0x7ff409c4fa58,
                                  file=0x7e0a75 "core: core/usr_avp.c",
                                  func=0x7e2d70 <__func__.8439>
                                  "destroy_avp_list_unsafe", line=626,
                                  mname=0x7e0a70 "core") at
                                  core/mem/q_malloc.c:512</div>
                                <div>        __llevel = -3</div>
                                <div>        qm = 0x7ff409336000</div>
                                <div>        f = 0x7ff409c4fa20</div>
                                <div>        size = 72</div>
                                <div>        next = 0x0</div>
                                <div>        prev = 0x0</div>
                                <div>        __func__ = "qm_free"</div>
                                <div>#6  0x0000000000637209 in
                                  destroy_avp_list_unsafe
                                  (list=0x7ff409e1b6c8) at
                                  core/usr_avp.c:626</div>
                                <div>        avp = 0x7ff40a359450</div>
                                <div>        foo = 0x7ff409c4fa58</div>
                                <div>        __func__ =
                                  "destroy_avp_list_unsafe"</div>
                                <div>#7  0x00007ff42ee5d9ab in
                                  free_cell_helper
                                  (dead_cell=0x7ff409e1b530, silent=0,
                                  fname=0x7ff42ef63a02 "timer.c",
                                  fline=689) at h_table.c:260</div>
                                <div>        b = 0x0</div>
                                <div>        i = 1</div>
                                <div>        rpl = 0x0</div>
                                <div>        tt = 0x0</div>
                                <div>        foo = 0x12ee5aaf2</div>
                                <div>        cbs = 0x0</div>
                                <div>        cbs_tmp = 0x7ff409c5d420</div>
                                <div>        __func__ =
                                  "free_cell_helper"</div>
                                <div>#8  0x00007ff42eef61e6 in
                                  wait_handler (ti=89094309,
                                  wait_tl=0x7ff409e1b5b8,
                                  data=0x7ff409e1b530) at timer.c:689</div>
                                <div>        p_cell = 0x7ff409e1b530</div>
                                <div>        ret = 1</div>
                                <div>        unlinked = 0</div>
                                <div>        rcount = 1</div>
                                <div>        __func__ = "wait_handler"</div>
                                <div>#9  0x00000000004d0f56 in
                                  timer_list_expire (t=89094309,
                                  h=0x7ff4093b4850,
                                  slow_l=0x7ff4093b8058, slow_mark=5981)
                                  at core/timer.c:874</div>
                                <div>        tl = 0x7ff409e1b5b8</div>
                                <div>        ret = 0</div>
                                <div>#10 0x00000000004d13bd in
                                  timer_handler () at core/timer.c:939</div>
                                <div>        saved_ticks = 89094309</div>
                                <div>        run_slow_timer = 0</div>
                                <div>        i = 861</div>
                                <div>        __func__ = "timer_handler"</div>
                                <div>#11 0x00000000004d1845 in
                                  timer_main () at core/timer.c:978</div>
                                <div>No locals.</div>
                                <div>#12 0x0000000000425cc8 in main_loop
                                  () at main.c:1727</div>
                                <div>        i = 8</div>
                                <div>        pid = 0</div>
                                <div>        si = 0x0</div>
                                <div>        si_desc = "udp receiver
                                  child=7 sock=<a
                                    href="http://127.0.0.1:5060"
                                    moz-do-not-send="true">127.0.0.1:5060</a>\000\000\000\000q\000\000\000@6y\000\000\000\000\000\000\032
t\220\260\242\024\032\301\255\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\002\000\000\000\000\063\275{\000\000\000\000\000>",
                                  '\000' <repeats 15 times>,
                                  "\220\306\336\071\377\177\000\000\351\022a\000\000\000\000"</div>
                                <div>        nrprocs = 8</div>
                                <div>        woneinit = 1</div>
                                <div>        __func__ = "main_loop"</div>
                                <div>#13 0x000000000042ca76 in main
                                  (argc=10, argv=0x7fff39dec958) at
                                  main.c:2696</div>
                                <div>        cfg_stream = 0x2292010</div>
                                <div>        c = -1</div>
                                <div>        r = 0</div>
                                <div>        tmp = 0x7fff39dee4c9 ""</div>
                                <div>        tmp_len = 970901552</div>
                                <div>        port = 32767</div>
                                <div>        proto = 970901648</div>
                                <div>        options = 0x76c598
                                  ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"</div>
                                <div>        ret = -1</div>
                                <div>        seed = 2710999419</div>
                                <div>        rfd = 4</div>
                                <div>        debug_save = 0</div>
                                <div>        debug_flag = 0</div>
                                <div>        dont_fork_cnt = 2</div>
                                <div>        n_lst = 0x7ff431ec3718</div>
                                <div>        p = 0xffffffff <error:
                                  Cannot access memory at address
                                  0xffffffff></div>
                                <div>        st = {st_dev = 2049, st_ino
                                  = 1282695, st_nlink = 2, st_mode =
                                  16895, st_uid = 0, st_gid = 0, __pad0
                                  = 0, st_rdev = 0, st_size = 4096,
                                  st_blksize = 4096, st_blocks = 8,
                                  st_atim = {tv_sec = 1552931523,
                                  tv_nsec = 918435869}, </div>
                                <div>          st_mtim = {tv_sec =
                                  1552994274, tv_nsec = 814438687},
                                  st_ctim = {tv_sec = 1552994274,
                                  tv_nsec = 814438687}, __glibc_reserved
                                  = {0, 0, 0}}</div>
                                <div>        __func__ = "main"</div>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                              <div>We are not sure on how to keep
                                debugging this issue which is causing
                                some serious troubles in our
                                environment, any help is appreciated</div>
                              <div><br>
                              </div>
                              <div>Here you can find the dumps (beware,
                                12M download but 1.5G once
                                uncompressed):</div>
                              <div><br>
                              </div>
                              <div><a
href="https://drive.google.com/file/d/1U0LZedyona8jJZHq6HaoqXpfnU83p6Sp/view?usp=sharing"
                                  moz-do-not-send="true">https://drive.google.com/file/d/1U0LZedyona8jJZHq6HaoqXpfnU83p6Sp/view?usp=sharing</a></div>
                              <div><br>
                              </div>
                              <div>Due thanks to Giacomo Vacca who
                                helped us to dig into this issue.</div>
                              <div><br>
                              </div>
                              <div>Thanks</div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <pre class="moz-quote-pre" wrap="">_______________________________________________
Kamailio (SER) - Users Mailing List
<a class="moz-txt-link-abbreviated" href="mailto:sr-users@lists.kamailio.org">sr-users@lists.kamailio.org</a>
<a class="moz-txt-link-freetext" href="https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users">https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users</a>
</pre>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
Daniel-Constantin Mierla -- <a class="moz-txt-link-abbreviated" href="http://www.asipto.com">www.asipto.com</a>
<a class="moz-txt-link-abbreviated" href="http://www.twitter.com/miconda">www.twitter.com/miconda</a> -- <a class="moz-txt-link-abbreviated" href="http://www.linkedin.com/in/miconda">www.linkedin.com/in/miconda</a>
Kamailio World Conference - May 6-8, 2019 -- <a class="moz-txt-link-abbreviated" href="http://www.kamailioworld.com">www.kamailioworld.com</a>
Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- <a class="moz-txt-link-abbreviated" href="http://www.asipto.com">www.asipto.com</a></pre>
  </body>
</html>