Hello dear list,
Today I had multiple crashes. They seem to be linked to the tm.so module.
-rw------- 1 root kamailio 4299702272 Feb 25 13:08 core.kamailio.sig11.29204
-rw------- 1 root kamailio 1453023232 Feb 25 13:12 core.kamailio.sig11.29203
-rw------- 1 root kamailio 1416065024 Feb 25 13:12 core.kamailio.sig11.29207
-rw------- 1 root kamailio 4299681792 Feb 25 13:16 core.kamailio.sig11.19047
-rw------- 1 root kamailio 2108506112 Feb 25 13:20 core.kamailio.sig11.19043
-rw------- 1 root kamailio 4299689984 Feb 25 13:34 core.kamailio.sig11.19247
-rw------- 1 root kamailio 4299681792 Feb 25 13:34 core.kamailio.sig11.19246
-rw------- 1 root kamailio 4299698176 Feb 25 13:35 core.kamailio.sig11.19248
-rw------- 1 root kamailio 4299689984 Feb 25 13:35 core.kamailio.sig11.19243
-rw------- 1 root kamailio 4299685888 Feb 25 13:35 core.kamailio.sig11.19244
-rw------- 1 root kamailio 4299689984 Feb 25 13:36 core.kamailio.sig11.19242
root@sbc:/var/cores# gdb /usr/local/sbin/kamailio core.kamailio.sig11.29204
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/sbin/kamailio...done.
[New LWP 29204]

warning: .dynamic section for "/lib/x86_64-linux-gnu/libpthread.so.0" is not at the expected address (wrong library or version mismatch?)
Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: generic error

warning: File "/lib/x86_64-linux-gnu/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /lib/x86_64-linux-gnu/libthread_db-1.0.so
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
        info "(gdb)Auto-loading safe path"

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: generic error

warning: File "/lib/x86_64-linux-gnu/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `/usr/local/sbin/kamailio -P /var/run/kamailio/kamailio.pid -f /usr/local/etc/ka'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f68b9bb4515 in reply_received (p_msg=0x7f693ccd5cc8) at t_reply.c:2240
2240        last_uac_status=uac->last_received;
(gdb)
Any ideas?
Regards
Abdoul
Hello,
can you give the output of the following gdb commands:
bt full
info locals
list
Can you check with all core files and see if the backtrace is the same?
What is the version of Kamailio? Is it running on a bare metal server or a virtual machine/container?
Cheers, Daniel
On 25.02.19 14:21, Abdoul Osséni wrote:
Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
Hello,
Please see attached the output of the gdb commands.
Can you check with all core files and see if the backtrace is the same?
--> Yes, the backtrace is the same.
Sorry, I am using Kamailio v5.2:
root@sbc:/var/cores# kamailio -V
version: kamailio 5.2.1 (x86_64/linux) cd2583
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144 MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: cd2583
compiled on 07:33:25 Jan 31 2019 with gcc 4.9.2
Kamailio is running on a bare metal server.
Thanks
Abdoul
On Mon, 25 Feb 2019 at 14:40, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
--
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference - May 6-8, 2019 -- www.kamailioworld.com
Kamailio Advanced Training - Mar 4-6, 2019 in Berlin; Mar 25-27, 2019, in Washington, DC, USA -- www.asipto.com
Hello,
that's strange, but a while ago someone else reported an issue with the same backtrace.
So the crash happens at the last line of the following snippet from the reply_received() function in the tm module:
uac=&t->uac[branch];
LM_DBG("org. status uas=%d, uac[%d]=%d local=%d is_invite=%d)\n",
        t->uas.status, branch, uac->last_received,
        is_local(t), is_invite(t));
last_uac_status=uac->last_received;
The backtrace and info locals say that uac is null (0x0). As far as I know, the address of a field in a structure cannot be null, and uac is set to &t->uac[branch]. Moreover, uac->last_received is printed in the LM_DBG() just above the crashing line; if uac were 0x0, the crash should have happened there.
Also, uac is a local variable, so it lives on the stack of the process, in its private memory. There is no other assignment or copy operation between the line where uac is set and the crash, so there should be no race condition there. Either the kernel was doing something wrong, or the core dump was somehow corrupted.
What was the value of debug level you had during the crash (debug parameter in kamailio.cfg)?
Could the operating system have frozen for a long time and then resumed?
Can you give the output of the command:
uname -a
Which Linux distribution and version are you running?
Cheers, Daniel
On 25.02.19 15:27, Abdoul Osséni wrote:
On 25/02/2019 12.34, Daniel-Constantin Mierla wrote:
The backtrace and info locals say that uac is null (0x0). As far as I know, the address of a field in a structure cannot be null, and uac is set to &t->uac[branch]. Moreover, uac->last_received is printed in the LM_DBG() just above the crashing line; if uac were 0x0, the crash should have happened there.
t->uac is a pointer to an array, not a static array contained in the struct. So, if t->uac was null, then &t->uac[branch] would also yield null if branch was zero. (For a non-zero branch, it would yield a pointer to somewhere just past null. &t->uac[branch] is the same as t->uac + branch.)
As for LM_DBG, I'm not too familiar with the logging macros, but if they're defined in such a way to check the log level first and then skip calling the actual logging function if the log level is too low, then the LM_DBG arguments would never be evaluated and so no null dereference would occur there.
I was debugging a similar core dump just the other day, although in a different location. That one was in t_should_relay_response(), line 1282, and also had Trans->uac == null. The strange part about this one was that according to gdb, Trans->uac was valid:
#0  0x00007f3f11d5b5e8 in t_should_relay_response (Trans=Trans@entry=0x7f3e14a551f8, new_code=new_code@entry=200, branch=branch@entry=0, should_store=should_store@entry=0x7fffb0353408, should_relay=should_relay@entry=0x7fffb0353404, cancel_data=cancel_data@entry=0x7fffb0353670, reply=0x7f3f160aa6e8) at t_reply.c:1282
1282    in t_reply.c
(gdb) p Trans->uac[branch].last_received
$11 = 0
even though the asm instruction definitely was a null dereference into ->uac:
   0x00007f3f11d5b5de <+718>:   add    0x170(%rbx),%r8
=> 0x00007f3f11d5b5e8 <+728>:   mov    0x190(%r8),%eax
(gdb) p $r8
$2 = 0
%rbx had Trans and so %r8 had Trans->uac. At this point, %r8 == Trans->uac == null, even though:
(gdb) p (long int) Trans->uac
$18 = 139904611079176
Investigating further, we found that Trans resided in shared memory and so we (tentatively) concluded that this looks to be a race condition with another process overwriting the Trans shm. First Trans->uac was null and got assigned to %r8, then another process changed it to something valid in shm, then the segfault happened through %r8. We didn't have a chance to investigate further and I can't say for sure if these two crashes are related.
Cheers
On 25.02.19 19:05, Richard Fuchs wrote:
t->uac is a pointer to an array, not a static array contained in the struct. So, if t->uac was null, then &t->uac[branch] would also yield null if branch was zero. (For a non-zero branch, it would yield a pointer to somewhere just past null. &t->uac[branch] is the same as t->uac + branch.)
The t->uac should never be null for a valid t, it is allocated at the same time with t, in the same shm_malloc(). The operation is done under lock - LOCK_REPLIES(t) - but indeed, if there is a race somehow or operation done without lock check, the memory space for it can be overwritten.
As for LM_DBG, I'm not too familiar with the logging macros, but if they're defined in such a way to check the log level first and then skip calling the actual logging function if the log level is too low, then the LM_DBG arguments would never be evaluated and so no null dereference would occur there.
Yes, the macro checks first for the log level and only if it is going to be printed, does the rest of evaluation.
Investigating further, we found that Trans resided in shared memory and so we (tentatively) concluded that this looks to be a race condition with another process overwriting the Trans shm. First Trans->uac was null and got assigned to %r8, then another process changed it to something valid in shm, then the segfault happened through %r8. We didn't have a chance to investigate further and I can't say for sure if these two crashes are related.
I will look in this direction as well; something was also reported for t_should_relay_response() in the past.
You were running 5.2.1?
Cheers, Daniel
Short update on this thread as well: I think I found the issue and tried to come up with a solution in the commit:
- https://github.com/kamailio/kamailio/commit/814d5cc1f4f5b1e4b95737108dffc1e7...
The tests that reproduced the crash rather quickly before the commit (done by Yufei Tao) have now been running fine for a long time.
I will backport it to the stable branches, and I am considering doing a new release from the 5.2 branch, likely next week, and then from the 5.1 branch.
Cheers, Daniel
On 25.02.19 19:42, Richard Fuchs wrote:
On 25/02/2019 13.40, Daniel-Constantin Mierla wrote:
I will look into this direction as well, there was something reported also for t_should_relay_response() over the time.
You were running 5.2.1?
This one was on 5.1.7.
Cheers
Hello,
The debug level during the crash was 2 (debug=2).
I checked my monitoring tools and system logs to see whether the server had encountered any issue (freeze, network loss, database issues, ...) but found nothing.
I use Debian 8.6 (default version).
Regards
Abdoul