<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hello,</p>
<p>can you get the output of the next gdb commands:</p>
<p> - first core file</p>
<p>frame 2<br>
p *p_cell<br>
list</p>
<p> - second core file</p>
<p>frame 8<br>
p *p_cell<br>
list</p>
<p>Is there any way that I can get access to the system with the
core files via ssh? It would help speeding up investigating the
core files. If yes, I can provide my ssh key.</p>
<p>Cheers,<br>
Daniel<br>
</p>
<div class="moz-cite-prefix">On 19.03.19 16:12, Enrico Bandiera
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAGqwm+e+rHO__oKvHreY=paqVuHPq9++OZXK0zLJFjyqFf=K8w@mail.gmail.com">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div>Hi guys, recently we had some issues
with kamailio in production, a proxy at
the edge of the infrastructure was
crashing seemingly randomly, it happened
a couple of times in a timespan of 4-5
days until a point where we found it in
an endless loop, unable to process SIP
packets anymore and printing in loop
messages like these:</div>
<div><br>
</div>
<div>50(60) CRITICAL: <core>
[core/mem/q_malloc.c:512]: qm_free():
BUG: freeing already freed pointer
(0x7ff40a359450), called from core:
core/usr_avp.c:
destroy_avp_list_unsafe(626), first free
core: core/usr_avp.c:
destroy_avp_list_unsafe(626) - ignoring</div>
<div>50(60) CRITICAL: <core>
[core/mem/q_malloc.c:512]: qm_free():
BUG: freeing already freed pointer
(0x7ff409c4fa58), called from core:
core/usr_avp.c:
destroy_avp_list_unsafe(626), first free
core: core/usr_avp.c:
destroy_avp_list_unsafe(626) - ignoring</div>
<div><br>
</div>
<div>We managed to replicate the crash and
the loop in a dev environment with a lot
of difficulties. It seems not related to
the peak load, but to the growing number
of processed calls</div>
<div>The kamailio version is:</div>
<div>
<div>version: kamailio 5.2.2
(x86_64/linux) </div>
<div>flags: STATS: Off, USE_TCP,
USE_TLS, USE_SCTP, TLS_HOOKS,
USE_RAW_SOCKS, DISABLE_NAGLE,
USE_MCAST, DNS_IP_HACK, SHM_MEM,
SHM_MMAP, PKG_MALLOC, Q_MALLOC,
F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY,
USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT,
USE_DNS_CACHE, USE_DNS_FAILOVER,
USE_NAPTR, USE_DST_BLACKLIST,
HAVE_RESOLV_RES</div>
<div>ADAPTIVE_WAIT_LOOPS=1024,
MAX_RECV_BUFFER_SIZE 262144
MAX_URI_SIZE 1024, BUF_SIZE 65535,
DEFAULT PKG_SIZE 8MB</div>
<div>poll method support: poll,
epoll_lt, epoll_et, sigio_rt, select.</div>
<div>id: unknown </div>
<div>compiled with gcc 5.3.1</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Every time this happened it was the
timer pid that was crashing or looping,
we managed to get 2 core dumps of the
crash and 1 of the loop (got it after
killing the timer with SIGQUIT).</div>
<div>
<div>The timer process crashes after
attempting to free memory that it's
not correctly managed. In particular
we saw that it happens inside
destroy_avp_list_unsafe() called by
the tm timer, when uri_avps_from
points to an invalid memory address.</div>
<div>We don't use dialog, uac,
http_async_client, but we do use
topos.</div>
</div>
<div><br>
</div>
<div>In the crash dumps it seems that
there's something wrong inside
free_cell_helper(), when trying to clean
the dead_cell something goes wrong while
freeing uri_avps_from, here's the crash
log:</div>
<div><br>
</div>
<div>
<div>61(76) CRITICAL: <core>
[core/pass_fd.c:277]: receive_fd():
EOF on 66</div>
<div> 0(15) ALERT: <core>
[main.c:740]: handle_sigs(): child
process 65 exited by a signal 11</div>
<div> 0(15) ALERT: <core>
[main.c:743]: handle_sigs(): core was
generated</div>
<div> 0(15) INFO: <core>
[main.c:766]: handle_sigs():
terminating due to SIGCHLD</div>
<div><childs getting the SIGTERM
signal><br>
</div>
<div> 0(15) CRITICAL: <core>
[core/mem/q_malloc.c:512]: qm_free():
BUG: freeing already freed pointer
(0x7f4d60a2e168), called from htable:
ht_api.c: ht_cell_free(217), first
free core: core/usr_avp.c:
destroy_avp_list_unsafe(626) -
ignoring</div>
<div> 0(15) INFO: <core>
[core/sctp_core.c:53]:
sctp_core_destroy(): SCTP API not
initialized</div>
</div>
<div><br>
</div>
<div>Before the crash we have no warnings
on errors about shared memory allocation<br>
</div>
<div><br>
</div>
<div>backtrace:</div>
<div><br>
</div>
<div>
<div>(gdb) bt full</div>
<div>#0 0x00000000006371d7 in
destroy_avp_list_unsafe
(list=0x7f34f7a411b0) at
core/usr_avp.c:625</div>
<div> avp = 0x2</div>
<div> foo = 0x2</div>
<div> __func__ =
"destroy_avp_list_unsafe"</div>
<div>#1 0x00007f351c6909ab in
free_cell_helper
(dead_cell=0x7f34f7a41018, silent=0,
fname=0x7f351c796a02 "timer.c",
fline=689) at h_table.c:260</div>
<div> b = 0x0</div>
<div> i = 1</div>
<div> rpl = 0x0</div>
<div> tt = 0x0</div>
<div> foo = 0x11c68daf2</div>
<div> cbs = 0x0</div>
<div> cbs_tmp = 0x7f34f80c3790</div>
<div> __func__ =
"free_cell_helper"</div>
<div>#2 0x00007f351c7291e6 in
wait_handler (ti=1984410416,
wait_tl=0x7f34f7a410a0,
data=0x7f34f7a41018) at timer.c:689</div>
<div> p_cell = 0x7f34f7a41018</div>
<div> ret = 1</div>
<div> unlinked = 0</div>
<div> rcount = 1</div>
<div> __func__ = "wait_handler"</div>
<div>#3 0x00000000004d0f56 in
timer_list_expire (t=1984410416,
h=0x7f34f6be7850,
slow_l=0x7f34f6bea368, slow_mark=2702)
at core/timer.c:874</div>
<div> tl = 0x7f34f7a410a0</div>
<div> ret = 0</div>
<div>#4 0x00000000004d13bd in
timer_handler () at core/timer.c:939</div>
<div> saved_ticks = 1984410416</div>
<div> run_slow_timer = 0</div>
<div> i = 654</div>
<div> __func__ = "timer_handler"</div>
<div>#5 0x00000000004d1845 in
timer_main () at core/timer.c:978</div>
<div>No locals.</div>
<div>#6 0x0000000000425cc8 in main_loop
() at main.c:1727</div>
<div> i = 8</div>
<div> pid = 0</div>
<div> si = 0x0</div>
<div> si_desc = "udp receiver
child=7 sock=<a
href="http://127.0.0.1:5060"
moz-do-not-send="true">127.0.0.1:5060</a>\000\000\000\000q\000\000\000@6y\000\000\000\000\000\000\b\206|\210X\322:\032\301\255\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\002\000\000\000\000\063\275{\000\000\000\000\000>",
'\000' <repeats 15 times>,
"\320\220\377\226\374\177\000\000\351\022a\000\000\000\000"</div>
<div> nrprocs = 8</div>
<div> woneinit = 1</div>
<div> __func__ = "main_loop"</div>
<div>#7 0x000000000042ca76 in main
(argc=10, argv=0x7ffc96ff9398) at
main.c:2696</div>
<div> cfg_stream = 0x25ca010</div>
<div> c = -1</div>
<div> r = 0</div>
<div> tmp = 0x7ffc96ffa4c9 ""</div>
<div> tmp_len = -1761635728</div>
<div> port = 32764</div>
<div> proto = -1761635632</div>
<div> options = 0x76c598
":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"</div>
<div> ret = -1</div>
<div> seed = 1492539644</div>
<div> rfd = 4</div>
<div> debug_save = 0</div>
<div> debug_flag = 0</div>
<div> dont_fork_cnt = 2</div>
<div> n_lst = 0x7f351f6f6718</div>
<div> p = 0xffffffff <error:
Cannot access memory at address
0xffffffff></div>
<div> st = {st_dev = 2049, st_ino
= 1282695, st_nlink = 2, st_mode =
16895, st_uid = 0, st_gid = 0, __pad0
= 0, st_rdev = 0, st_size = 4096,
st_blksize = 4096, st_blocks = 8,
st_atim = {tv_sec = 1552931478,
tv_nsec = 972761643}, </div>
<div> st_mtim = {tv_sec =
1552931478, tv_nsec = 972761643},
st_ctim = {tv_sec = 1552931478,
tv_nsec = 972761643}, __glibc_reserved
= {0, 0, 0}}</div>
<div> __func__ = "main"</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>In the loop dump we have a different
situation, there's a loop in the
uri_avps_from structure as you can see
from the next pointers:</div>
<div><br>
</div>
<div>(gdb) p *dead_cell->uri_avps_from</div>
<div>$3 = {id = 66, flags = 275, next =
0x7ff40a359450, d = {p = 0x7ff40a9ad540,
l = 140686126667072, data =
"@\325\232\n\364\177\000"}}</div>
<div>(gdb) p
*dead_cell->uri_avps_from.next</div>
<div>$4 = {id = 66, flags = 275, next =
0x7ff409c4fa58, d = {p = 0x7ff40a359480,
l = 140686120031360, data =
"\200\224\065\n\364\177\000"}}</div>
<div>(gdb) p
*dead_cell->uri_avps_from.next.next</div>
<div>$5 = {id = 66, flags = 275, next =
0x7ff40a359450, d = {p = 0x7ff409c4fa88,
l = 140686112651912, data =
"\210\372\304\t\364\177\000"}}</div>
<div><br>
</div>
<div>backtrace:</div>
<div>
<div>(gdb) bt full</div>
<div>#0 0x00007ff43152e67e in
_IO_default_xsputn () from
/lib/x86_64-linux-gnu/libc.so.6</div>
<div>No symbol table info available.</div>
<div>#1 0x00007ff43150150b in vfprintf
() from
/lib/x86_64-linux-gnu/libc.so.6</div>
<div>No symbol table info available.</div>
<div>#2 0x00007ff431502ef1 in ?? ()
from /lib/x86_64-linux-gnu/libc.so.6</div>
<div>No symbol table info available.</div>
<div>#3 0x00007ff43150032d in vfprintf
() from
/lib/x86_64-linux-gnu/libc.so.6</div>
<div>No symbol table info available.</div>
<div>#4 0x00007ff4315087f7 in fprintf
() from
/lib/x86_64-linux-gnu/libc.so.6</div>
<div>No symbol table info available.</div>
<div>#5 0x0000000000686e43 in qm_free
(qmp=0x7ff409336000, p=0x7ff409c4fa58,
file=0x7e0a75 "core: core/usr_avp.c",
func=0x7e2d70 <__func__.8439>
"destroy_avp_list_unsafe", line=626,
mname=0x7e0a70 "core") at
core/mem/q_malloc.c:512</div>
<div> __llevel = -3</div>
<div> qm = 0x7ff409336000</div>
<div> f = 0x7ff409c4fa20</div>
<div> size = 72</div>
<div> next = 0x0</div>
<div> prev = 0x0</div>
<div> __func__ = "qm_free"</div>
<div>#6 0x0000000000637209 in
destroy_avp_list_unsafe
(list=0x7ff409e1b6c8) at
core/usr_avp.c:626</div>
<div> avp = 0x7ff40a359450</div>
<div> foo = 0x7ff409c4fa58</div>
<div> __func__ =
"destroy_avp_list_unsafe"</div>
<div>#7 0x00007ff42ee5d9ab in
free_cell_helper
(dead_cell=0x7ff409e1b530, silent=0,
fname=0x7ff42ef63a02 "timer.c",
fline=689) at h_table.c:260</div>
<div> b = 0x0</div>
<div> i = 1</div>
<div> rpl = 0x0</div>
<div> tt = 0x0</div>
<div> foo = 0x12ee5aaf2</div>
<div> cbs = 0x0</div>
<div> cbs_tmp = 0x7ff409c5d420</div>
<div> __func__ =
"free_cell_helper"</div>
<div>#8 0x00007ff42eef61e6 in
wait_handler (ti=89094309,
wait_tl=0x7ff409e1b5b8,
data=0x7ff409e1b530) at timer.c:689</div>
<div> p_cell = 0x7ff409e1b530</div>
<div> ret = 1</div>
<div> unlinked = 0</div>
<div> rcount = 1</div>
<div> __func__ = "wait_handler"</div>
<div>#9 0x00000000004d0f56 in
timer_list_expire (t=89094309,
h=0x7ff4093b4850,
slow_l=0x7ff4093b8058, slow_mark=5981)
at core/timer.c:874</div>
<div> tl = 0x7ff409e1b5b8</div>
<div> ret = 0</div>
<div>#10 0x00000000004d13bd in
timer_handler () at core/timer.c:939</div>
<div> saved_ticks = 89094309</div>
<div> run_slow_timer = 0</div>
<div> i = 861</div>
<div> __func__ = "timer_handler"</div>
<div>#11 0x00000000004d1845 in
timer_main () at core/timer.c:978</div>
<div>No locals.</div>
<div>#12 0x0000000000425cc8 in main_loop
() at main.c:1727</div>
<div> i = 8</div>
<div> pid = 0</div>
<div> si = 0x0</div>
<div> si_desc = "udp receiver
child=7 sock=<a
href="http://127.0.0.1:5060"
moz-do-not-send="true">127.0.0.1:5060</a>\000\000\000\000q\000\000\000@6y\000\000\000\000\000\000\032
t\220\260\242\024\032\301\255\000\000\000\000\000\000\000\000
\000\000\000\000\000\000\000\002\000\000\000\000\063\275{\000\000\000\000\000>",
'\000' <repeats 15 times>,
"\220\306\336\071\377\177\000\000\351\022a\000\000\000\000"</div>
<div> nrprocs = 8</div>
<div> woneinit = 1</div>
<div> __func__ = "main_loop"</div>
<div>#13 0x000000000042ca76 in main
(argc=10, argv=0x7fff39dec958) at
main.c:2696</div>
<div> cfg_stream = 0x2292010</div>
<div> c = -1</div>
<div> r = 0</div>
<div> tmp = 0x7fff39dee4c9 ""</div>
<div> tmp_len = 970901552</div>
<div> port = 32767</div>
<div> proto = 970901648</div>
<div> options = 0x76c598
":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:"</div>
<div> ret = -1</div>
<div> seed = 2710999419</div>
<div> rfd = 4</div>
<div> debug_save = 0</div>
<div> debug_flag = 0</div>
<div> dont_fork_cnt = 2</div>
<div> n_lst = 0x7ff431ec3718</div>
<div> p = 0xffffffff <error:
Cannot access memory at address
0xffffffff></div>
<div> st = {st_dev = 2049, st_ino
= 1282695, st_nlink = 2, st_mode =
16895, st_uid = 0, st_gid = 0, __pad0
= 0, st_rdev = 0, st_size = 4096,
st_blksize = 4096, st_blocks = 8,
st_atim = {tv_sec = 1552931523,
tv_nsec = 918435869}, </div>
<div> st_mtim = {tv_sec =
1552994274, tv_nsec = 814438687},
st_ctim = {tv_sec = 1552994274,
tv_nsec = 814438687}, __glibc_reserved
= {0, 0, 0}}</div>
<div> __func__ = "main"</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>We are not sure on how to keep
debugging this issue which is causing
some serious troubles in our
environment, any help is appreciated</div>
<div><br>
</div>
<div>Here you can find the dumps (beware,
12M download but 1.5G once
uncompressed):</div>
<div><br>
</div>
<div><a
href="https://drive.google.com/file/d/1U0LZedyona8jJZHq6HaoqXpfnU83p6Sp/view?usp=sharing"
moz-do-not-send="true">https://drive.google.com/file/d/1U0LZedyona8jJZHq6HaoqXpfnU83p6Sp/view?usp=sharing</a></div>
<div><br>
</div>
<div>Due thanks to Giacomo Vacca who
helped us to dig into this issue.</div>
<div><br>
</div>
<div>Thanks</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
Kamailio (SER) - Users Mailing List
<a class="moz-txt-link-abbreviated" href="mailto:sr-users@lists.kamailio.org">sr-users@lists.kamailio.org</a>
<a class="moz-txt-link-freetext" href="https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users">https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users</a>
</pre>
</blockquote>
<pre class="moz-signature" cols="72">--
Daniel-Constantin Mierla -- <a class="moz-txt-link-abbreviated" href="http://www.asipto.com">www.asipto.com</a>
<a class="moz-txt-link-abbreviated" href="http://www.twitter.com/miconda">www.twitter.com/miconda</a> -- <a class="moz-txt-link-abbreviated" href="http://www.linkedin.com/in/miconda">www.linkedin.com/in/miconda</a>
Kamailio World Conference - May 6-8, 2019 -- <a class="moz-txt-link-abbreviated" href="http://www.kamailioworld.com">www.kamailioworld.com</a>
Kamailio Advanced Training - Mar 25-27, 2019, in Washington, DC, USA -- <a class="moz-txt-link-abbreviated" href="http://www.asipto.com">www.asipto.com</a></pre>
</body>
</html>