<!-- Kamailio Project uses GitHub Issues only for bugs in the code or feature requests. Please use this template only for bug reports.
If you have questions about using Kamailio or related to its configuration file, ask on sr-users mailing list:
* http://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
If you have questions about developing extensions to Kamailio or its existing C code, ask on sr-dev mailing list:
* http://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Please try to fill this template as much as possible for any issue. It helps the developers to troubleshoot the issue.
If there is no content to be filled in a section, the entire section can be removed.
You can delete the comments from the template sections when filling.
You can delete next line and everything above before submitting (it is a comment). -->
### Description
We have segfault in Kamailio v5.3.1 installed on Debain 9.x 64 bit occured while kamailio was shutting down while our script tryed to get metric using kamcmd utility at the same time.
### Troubleshooting
No troubleshooting was done, since it happened on a production server. We simply restarted the server.
#### Reproduction
The problem periodically happens on production servers during restart. Kamailio crashes when one of our scripts tried getting statistics about websocket and tls modules using kamcmd during server restart. As I see in core dump, shared memory was already freed when rpc_mod_print called in the child process.
#### Debugging Data
<!-- If you got a core dump, use gdb to extract troubleshooting data - full backtrace, local variables and the list of the code at the issue location.
gdb /path/to/kamailio /path/to/corefile bt full info locals list
If you are familiar with gdb, feel free to attach more of what you consider to be relevant. -->
``` core.kamailio.17684
/var/lib/ums/sbin/kamailio
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /var/lib/ums/sbin/kamailio...done. [New LWP 17684] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `/var/lib/ums/sbin/kamailio -m 2048 -M 12 -P /var/run/kamailio/kamailio.pid -f /'. Program terminated with signal SIGSEGV, Segmentation fault. #0 __strcmp_ssse3 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1197 #0 __strcmp_ssse3 () at ../sysdeps/x86_64/multiarch/../strcmp.S:1197 #1 0x00007ff4d6ddb2f5 in rpc_mod_print (rpc=0x7ff4d66f0540 <binrpc_callbacks>, ctx=0x7fff11cd5658, mname=0x1f140b8 "tls", stats=0x1f2f180, flag=2) at mod_stats.c:117 #2 0x00007ff4d6ddb0eb in rpc_mod_print_one (rpc=0x7ff4d66f0540 <binrpc_callbacks>, ctx=0x7fff11cd5658, mname=0x1f140b8 "tls", pkg_stats=0x1f2d9c0, shm_stats=0x1f2f180, flag=2) at mod_stats.c:159 #3 0x00007ff4d6ddaee1 in rpc_mod_mem_stats_mode (rpc=0x7ff4d66f0540 <binrpc_callbacks>, ctx=0x7fff11cd5658, fmode=0) at mod_stats.c:239 #4 0x00007ff4d6dda84f in rpc_mod_mem_stats (rpc=0x7ff4d66f0540 <binrpc_callbacks>, ctx=0x7fff11cd5658) at mod_stats.c:251 #5 0x00007ff4d64bfc80 in process_rpc_req (buf=0x1f140a4 "\241\003\026t\256\214=\221\nmod.stats", size=29, bytes_needed=0x7fff11cd5aa0, sh=0x7fff11cd5a10, saved_state=0x1f240a8) at binrpc_run.c:678 #6 0x00007ff4d64ad72f in handle_stream_read (s_c=0x1f14070, idx=-1) at io_listener.c:511 #7 0x00007ff4d64a9121 in handle_io (fm=0x7ff55a4e4ae0, events=1, idx=-1) at io_listener.c:706 #8 0x00007ff4d64a793a in io_wait_loop_epoll (h=0x7ff4d66f0348 <io_h>, t=10, repeat=0) at ./../../core/io_wait.h:1062 #9 0x00007ff4d649b62c in io_listen_loop (fd_no=2, cs_lst=0x1e28940) at io_listener.c:281 #10 0x00007ff4d64d172c in mod_child (rank=0) at ctl.c:338 #11 0x0000000000638c14 in init_mod_child (m=0x7ff55a3634b0, rank=0) at core/sr_module.c:780 #12 0x000000000063862d in init_mod_child (m=0x7ff55a363b78, rank=0) at core/sr_module.c:776 #13 0x000000000063862d in init_mod_child (m=0x7ff55a364018, rank=0) at core/sr_module.c:776 #14 0x000000000063862d in init_mod_child (m=0x7ff55a364528, rank=0) at core/sr_module.c:776 #15 0x000000000063862d in init_mod_child (m=0x7ff55a3649c8, rank=0) at core/sr_module.c:776 #16 0x000000000063862d in init_mod_child (m=0x7ff55a365140, rank=0) at core/sr_module.c:776 #17 0x000000000063862d in init_mod_child (m=0x7ff55a3657b0, rank=0) at core/sr_module.c:776 #18 0x000000000063862d in init_mod_child (m=0x7ff55a365c38, rank=0) at core/sr_module.c:776 #19 0x00000000006385b2 in init_child (rank=0) at core/sr_module.c:825 #20 0x000000000043140c in main_loop () at main.c:1753 #21 0x000000000043df6f in main (argc=9, argv=0x7fff11cd9dd8) at main.c:2802 ```
#### Log Messages
No any useful logs available.
#### SIP Traffic
No SIP traffic available.
### Possible Solutions
<!-- If you found a solution or workaround for the issue, describe it. Ideally, provide a pull request with a fix. -->
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
``` version: kamailio 5.3.1 (x86_64/linux) 283e46 flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. id: 283e46 compiled on 14:23:37 Jul 28 2020 with clang 9.0
```
* **Operating System**:
<!-- Details about the operating system, the type: Linux (e.g.,: Debian 8.4, Ubuntu 16.04, CentOS 7.1, ...), MacOS, xBSD, Solaris, ...; Kernel details (output of `uname -a`) -->
``` Linux devhpbx005-1.vx 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64 GNU/Linux ```
This is a race that can show up in a multi-process application and is hard to avoid it completely if you do not "cool down" the server (stop/reduce other interactions with kamailio) -- you run RPC commands as the same time with triggering shut down.
I pushed the commit referenced above (which depends on another commit to the core) to narrow down the window of this specific case, but it is not completely avoidable, as the rpc command can be in the middle of iterating some internal lists when kill is performed and some structures can be destroyed by other processes.
If someone wants to propose other solutions, make PRs and can be discussed there.
Closed #2433.
Thanks Daniel. I'll add this fix as a patch to our kamailio build.
**Updates:**
A few days ago we've got the same segfault on production server, but in this case it happened without shut down. I think there is a race when calling process_rpc_req function and it does not relate to server cooldown.
**Core dump:**
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git Copyright (C) 2016 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /var/lib/ums/sbin/kamailio...done. [New LWP 6226] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `/var/lib/ums/sbin/kamailio -m 2048 -M 12 -P /var/run/kamailio/kamailio.pid -f /'. Program terminated with signal SIGSEGV, Segmentation fault. #0 __strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:32 #0 __strcmp_sse2_unaligned () at ../sysdeps/x86_64/multiarch/strcmp-sse2-unaligned.S:32 #1 0x00007f327f55c2f5 in rpc_mod_print (rpc=0x7f327ee71540 <binrpc_callbacks>, ctx=0x7ffe1418a9b8, mname=0x1dca0c8 "tls", stats=0x1de5310, flag=2) at mod_stats.c:117 #2 0x00007f327f55c0eb in rpc_mod_print_one (rpc=0x7f327ee71540 <binrpc_callbacks>, ctx=0x7ffe1418a9b8, mname=0x1dca0c8 "tls", pkg_stats=0x1de39d0, shm_stats=0x1de5310, flag=2) at mod_stats.c:159 #3 0x00007f327f55bee1 in rpc_mod_mem_stats_mode (rpc=0x7f327ee71540 <binrpc_callbacks>, ctx=0x7ffe1418a9b8, fmode=0) at mod_stats.c:239 #4 0x00007f327f55b84f in rpc_mod_mem_stats (rpc=0x7f327ee71540 <binrpc_callbacks>, ctx=0x7ffe1418a9b8) at mod_stats.c:251 #5 0x00007f327ec40c80 in process_rpc_req (buf=0x1dca0b4 "\241\003\026F\270\067\r\221\nmod.stats", size=29, bytes_needed=0x7ffe1418ae00, sh=0x7ffe1418ad70, saved_state=0x1dda0b8) at binrpc_run.c:678 #6 0x00007f327ec2e72f in handle_stream_read (s_c=0x1dca080, idx=-1) at io_listener.c:511 #7 0x00007f327ec2a121 in handle_io (fm=0x7f3302c66cc0, events=1, idx=-1) at io_listener.c:706 #8 0x00007f327ec2893a in io_wait_loop_epoll (h=0x7f327ee71348 <io_h>, t=10, repeat=0) at ./../../core/io_wait.h:1062 #9 0x00007f327ec1c62c in io_listen_loop (fd_no=2, cs_lst=0x1cde940) at io_listener.c:281 #10 0x00007f327ec5272c in mod_child (rank=0) at ctl.c:338 #11 0x0000000000638c14 in init_mod_child (m=0x7f3302ae54b0, rank=0) at core/sr_module.c:780 #12 0x000000000063862d in init_mod_child (m=0x7f3302ae5b78, rank=0) at core/sr_module.c:776 #13 0x000000000063862d in init_mod_child (m=0x7f3302ae6018, rank=0) at core/sr_module.c:776 #14 0x000000000063862d in init_mod_child (m=0x7f3302ae6528, rank=0) at core/sr_module.c:776 #15 0x000000000063862d in init_mod_child (m=0x7f3302ae69c8, rank=0) at core/sr_module.c:776 #16 0x000000000063862d in init_mod_child (m=0x7f3302ae7140, rank=0) at core/sr_module.c:776 #17 0x000000000063862d in init_mod_child (m=0x7f3302ae77b0, rank=0) at core/sr_module.c:776 #18 0x000000000063862d in init_mod_child (m=0x7f3302ae7c38, rank=0) at core/sr_module.c:776 #19 0x00000000006385b2 in init_child (rank=0) at core/sr_module.c:825 #20 0x000000000043140c in main_loop () at main.c:1753 #21 0x000000000043df6f in main (argc=9, argv=0x7ffe1418f138) at main.c:2802
**Logs:**
2020-08-18T12:36:03.755718-07:00 hpbx031-1.va /var/lib/ums/sbin/kamailio[6204]: INFO: <script>: sip_call_id=9bf05e23-bdab-4c7e-93fd-975f1d27d3a9@hpbx031-1.va; Received ACK in a dialog 2020-08-18T12:36:03.755756-07:00 hpbx031-1.va /var/lib/ums/sbin/kamailio[6204]: INFO: <script>: sip_call_id=9bf05e23-bdab-4c7e-93fd-975f1d27d3a9@hpbx031-1.va; Relaying ACK from b2bua to sip:54171324@127.0.0.1:5060 2020-08-18T12:36:03.757855-07:00 hpbx031-1.va /var/lib/ums/sbin/kamailio[6197]: INFO: <script>: [75229096,R,0_1478685494]: Received, method: REGISTER, R-URI: sip:hpbx031.telecomsvc.com:5060, ct: sip:75229096@192.168.10.116:5100, cseq: 14166, s-ip: 74.89.64.126, s-port: 5100 2020-08-18T12:37:29.073600-07:00 hpbx031-1.va /var/lib/ums/sbin/kamailio[6193]: INFO: <core> [core/sctp_core.c:53]: sctp_core_destroy(): SCTP API not initialized 2020-08-18T12:37:39.583593-07:00 hpbx031-1.va kamailio: INFO: <core> [core/sctp_core.c:75]: sctp_core_check_support(): SCTP API not enabled - if you want to use it, load sctp module 2020-08-18T12:37:39.780830-07:00 hpbx031-1.va kamailio: INFO: <core> [core/tcp_main.c:5043]: init_tcp(): using epoll_lt as the io watch method (auto detected) 2020-08-18T12:37:39.781363-07:00 hpbx031-1.va kamailio[23718]: loading modules under config path: /var/lib/ums/lib64/kamailio/modules/ 2020-08-18T12:37:39.781546-07:00 hpbx031-1.va kamailio[23718]: Listening on
Was this a case of running the rpc command many times at the same time?
Was this a case of running the rpc command many times at the same time?
I'll ask our LSA about it tomorow. Do you think it could be a problem? I thought that RPC handlers have thread safe (interprocess, in this case) access to shared memory, is it?
This is a sequence of commands that python script runs every 10 seconds:
/var/lib/ums/sbin/kamcmd -s tcp:localhost:2048 stats.get_statistics websocket: /var/lib/ums/sbin/kamcmd -s tcp:localhost:2048 stats.get_statistics tcp: /var/lib/ums/sbin/kamcmd -s tcp:localhost:2048 stats.get_statistics shmem: /var/lib/ums/sbin/kamcmd -s tcp:localhost:2048 mod.stats tls pkg /var/lib/ums/sbin/kamcmd -s tcp:localhost:2048 mod.stats tls shm /var/lib/ums/sbin/kamcmd -s tcp:localhost:2048 mod.stats websocket pkg /var/lib/ums/sbin/kamcmd -s tcp:localhost:2048 mod.stats websocket shm /var/lib/ums/sbin/kamcmd -s tcp:localhost:2048 mod.stats core shm
This script parses output of every command and gets needed metrics. I see at least one problem here - the script should use udp instead of tcp.
Bump! Any updates?
Access to shared memory is unlocked if shutdown is triggered, otherwise some worker processes can be killed while it acquired the lock, leaving kamailio shutdown worker in deadlock. So there are potential races if you run the rpc commands when you trigger the shut down. Probably you can make your stats fetch script to execute only if shutdown is not triggered, by using a state file created on shutdown command.
Shut down multiprocess races with shared memory access are hard to avoid completely, being a trade between complexity and being able to do a fast restart. If anyone wants to work on this, he/she is more than welcome to propose a pull request.
I understand, but we've got a lot of such crashes during normal kamailio operation, NOT during shutdown. This is the issue I'm concerned about.
The problem is not reproducible on the hosts where we disabled metrics gathering on kamailio.
Somehow I failed to notice it happens at runtime as well, I thought it was only during shut down.
Open a new issue adding the output of `bt full` from a recent core file for a runtime crash, to start fresh on troubleshooting.
Created issue #2460
I pushed a few commits in master branch trying to address this issue. I haven't implemented the memory mod stats, but as I could see in the code, it didn't seem to be protected for races on accessing the shm fragments.
You would need to test with master branch or by using patches from the next commits:
* aa458a62f034c2cb57639bdc713ed3c51b0292c7 * 761eb0616fea2a859a2c0abb652b22feb6f59859 * 9645be245f899fa8ae11a6be045d2ef83fd66bf5