<!-- Kamailio Project uses GitHub Issues only for bugs in the code or feature requests. Please use this template only for bug reports.
If you have questions about using Kamailio or related to its configuration file, ask on sr-users mailing list:
* https://lists.kamailio.org/mailman3/postorius/lists/sr-users.lists.kamailio....
If you have questions about developing extensions to Kamailio or its existing C code, ask on sr-dev mailing list:
* https://lists.kamailio.org/mailman3/postorius/lists/sr-dev.lists.kamailio.or...
Please try to fill this template as much as possible for any issue. It helps the developers to troubleshoot the issue.
Note that an issue report may be closed automatically after about 2 months if there is no interest from developers or community users on pursuing it, being considered expired. In such case, it can be reopened by writing a comment that includes the token `/notexpired`. About two weeks before considered expired, the issue is marked with the label `stale`, trying to notify the submitter and everyone else that might be interested in it. To remove the label `stale`, write a comment that includes the token `/notstale`. Also, any comment postpone the `expire` timeline, being considered that there is interest in pursuing the issue.
If there is no content to be filled in a section, the entire section can be removed.
You can delete the comments from the template sections when filling.
You can delete next line and everything above before submitting (it is a comment). -->
### Description
We are now using kamailio on the edge as a presence proxy to scale presence. We have 3 internal kamailio instances handling the actual subscriptions and publishing. This works well, however, once per 24-36hours we experience a crash on the border proxy instance. We did not experience this when using the border proxy as a presence server (there was no proxying SUBSCRIBE).
### Troubleshooting
#### Reproduction
I believe this is happening when a customers network randomly experiences issues where it doesn't receive our replies to subscribes (and continues to send a subscribe). Logs are below.
#### Debugging Data
<!-- If you got a core dump, use gdb to extract troubleshooting data - full backtrace, local variables and the list of the code at the issue location.
gdb /path/to/kamailio /path/to/corefile bt full info locals list
If you are familiar with gdb, feel free to attach more of what you consider to be relevant. -->
Here's the output while replacing any sensitive info in the payload. ``` GNU gdb (Debian 13.1-3) 13.1 Copyright (C) 2023 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: https://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/.
For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/sbin/kamailio... Reading symbols from /usr/lib/debug/.build-id/ce/c19d0ed1a928e4e25a84ae20d77e9c904a4892.debug...
warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing [New LWP 26] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `kamailio -DD -E -m 8000 -M 512 -f /etc/kamailio/kamailio.cfg -P /var/run/kamail'. Program terminated with signal SIGSEGV, Segmentation fault. #0 via_builder (len=len@entry=0x7ffed7e3ecb8, msg=msg@entry=0x7ff9529548b0, send_info=send_info@entry=0x7ff764d20bb0, branch=branch@entry=0x7ffed7e3ecc0, extra_params=0x0, hp=hp@entry=0x7ffed7e3ebb0) at core/msg_translator.c:2910 2910 core/msg_translator.c: No such file or directory. (gdb) bt full #0 via_builder (len=len@entry=0x7ffed7e3ecb8, msg=msg@entry=0x7ff9529548b0, send_info=send_info@entry=0x7ff764d20bb0, branch=branch@entry=0x7ffed7e3ecc0, extra_params=0x0, hp=hp@entry=0x7ffed7e3ebb0) at core/msg_translator.c:2910 via_len = <optimized out> extra_len = <optimized out> line_buf = <optimized out> max_len = <optimized out> via_prefix_len = <optimized out> address_str = <optimized out> port_str = 0x0 send_sock = 0x0 comp_len = <optimized out> comp_name_len = <optimized out> port = <optimized out> proto = <optimized out> ip = {af = 536870912, len = 0, u = {addrl = {534820896, 1586192}, addr32 = {534820896, 0, 1586192, 0}, addr16 = {47136, 8160, 0, 0, 13328, 24, 0, 0}, addr = " \270\340\037\000\000\000\000\0204\030\000\000\000\000"}} from = 0x0 local_addr = {s = {sa_family = 16, sa_data = "#\000\000\000\000\000\020\000\000\000\000\000\000"}, sin = {sin_family = 16, sin_port = 35, sin_addr = {s_addr = 0}, sin_zero = "\020\000\000\000\000\000\000"}, sin6 = {sin6_family = 16, sin6_port = 35, sin6_flowinfo = 0, sin6_addr = {__in6_u = { __u6_addr8 = "\020\000\000\000\000\000\000\000\003\000\000\000\000\000\000", __u6_addr16 = {16, 0, 0, 0, 3, 0, 0, 0}, __u6_addr32 = {16, 0, 3, 0}}}, sin6_scope_id = 3000082688}, sas = {ss_family = 16, __ss_padding = "#\000\000\000\000\000\020\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000\000\241\321\262cg\357`@\000\000\000\000\000\000\000X\f-\177!V\000\000\020\016\217R\371\177\000\000\001\000\000\000!V\000\000*\000\000\000\000\000\000\000\001\000\000\000\376\177\000\000\004\360;\177!V\000\000\250,A\177!V\000\000\000\355\343\327\376\177\000\000\000\241\321\262cg\357`h\256\006R\371\177\000", __ss_align = 94701866020241}} con = 0x0 rxavp = 0x0 xname = {s = 0x0, len = -2141140176} __func__ = "via_builder" #1 0x000056217f155030 in create_via_hf (len=len@entry=0x7ffed7e3ecb8, msg=msg@entry=0x7ff9529548b0, send_info=send_info@entry=0x7ff764d20bb0, branch=branch@entry=0x7ffed7e3ecc0) at core/msg_translator.c:3232 via = <optimized out> extra_params = {s = 0x0, len = 0} hp = {host = 0x56217f4fb380 <default_global_address>, port = 0x56217f4fb370 <default_global_port>} sbuf = "\331\023\000\000\001", '\000' <repeats 18 times> --Type <RET> for more, q to quit, c to continue without paging--info locals slen = <optimized out> xparams = <optimized out> id_buf = <optimized out> id_len = 0 __func__ = "create_via_hf" __llevel = <optimized out> __kld = <optimized out> #2 0x000056217f1592f7 in build_req_buf_from_sip_req (msg=msg@entry=0x7ff9529548b0, returned_len=returned_len@entry=0x7ffed7e3eeac, send_info=send_info@entry=0x7ff764d20bb0, mode=mode@entry=128) at core/msg_translator.c:2096 len = 581 new_len = <optimized out> received_len = 0 rport_len = 0 uri_len = 0 via_len = 0 body_delta = 0 line_buf = 0x0 received_buf = 0x0 rport_buf = 0x0 new_buf = 0x0 buf = 0x56217f5866c0 <buf> "SUBSCRIBE sip:74@sip.domain.co SIP/2.0\r\nVia: SIP/2.0/UDP 12.171.207.82:25744;branch=z9hG4bK1805670803;rport\r\nFrom: "Heather Myers" sip:1086448@sip.domain.co;tag=1805518213\r\nTo: sip:74@sip.domain.co\r\nCa"... path_buf = {s = 0x0, len = 0} offset = 0 s_offset = 0 size = <optimized out> via_anchor = 0x7ff952951300 via_lump = <optimized out> via_rm = <optimized out> via_insert_param = 0x0 path_anchor = <optimized out> path_lump = <optimized out> branch = {s = 0x7ff952954ed8 "z9hG4bKc7ed.5819b4295fe83f981a2fc19b7f739d7e.0", len = 46} flags = 262273 --Type <RET> for more, q to quit, c to continue without paging--list udp_mtu = <optimized out> di = {send_sock = 0x0, to = {s = {sa_family = 18608, sa_data = "\225R\371\177\000\000\002\000\023\331\n4\001\r"}, sin = {sin_family = 18608, sin_port = 21141, sin_addr = { s_addr = 32761}, sin_zero = "\002\000\023\331\n4\001\r"}, sin6 = {sin6_family = 18608, sin6_port = 21141, sin6_flowinfo = 32761, sin6_addr = {__in6_u = { __u6_addr8 = "\002\000\023\331\n4\001\r\000\000\000\000\000\000\000", __u6_addr16 = {2, 55571, 13322, 3329, 0, 0, 0, 0}, __u6_addr32 = {3641901058, 218182666, 0, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 18608, __ss_padding = "\225R\371\177\000\000\002\000\023\331\n4\001\r", '\000' <repeats 103 times>, __ss_align = 0}}, id = 0, send_flags = {f = 0, blst_imask = 0}, proto = 0 '\000', proto_pad0 = 0 '\000', proto_pad1 = 0} ret = <optimized out> __func__ = "build_req_buf_from_sip_req" error00 = <optimized out> #3 0x00007ff951fbe90d in prepare_new_uac (t=t@entry=0x7ff764d20880, i_req=i_req@entry=0x7ff9529548b0, branch=branch@entry=0, uri=<optimized out>, uri@entry=0x7ff9529548e8, path=<optimized out>, path@entry=0x7ff952954f60, next_hop=<optimized out>, fsocket=0x7ff9528f0e10, snd_flags=..., fproto=<optimized out>, flags=<optimized out>, instance=<optimized out>, ruid=<optimized out>, location_ua=<optimized out>) at ./src/modules/tm/t_fwd.c:482 shbuf = 0x0 add_rm_backup = <optimized out> body_lumps_backup = <optimized out> parsed_uri_bak = {user = { s = 0x56217f5866ce <buf+14> "74@sip.domain.co SIP/2.0\r\nVia: SIP/2.0/UDP 12.171.207.82:25744;branch=z9hG4bK1805670803;rport\r\nFrom: "Heather Myers" sip:1086448@sip.domain.co;tag=1805518213\r\nTo: sip:74@sip.domain.co\r\nCall-ID: 0_18056"..., len = 2}, passwd = {s = 0x0, len = 0}, host = { s = 0x56217f5866d1 <buf+17> "sip.domain.co SIP/2.0\r\nVia: SIP/2.0/UDP 12.171.207.82:25744;branch=z9hG4bK1805670803;rport\r\nFrom: "Heather Myers" sip:1086448@sip.domain.co;tag=1805518213\r\nTo: sip:74@sip.domain.co\r\nCall-ID: 0_18056078"..., len = 11}, port = {s = 0x0, len = 0}, params = {s = 0x0, len = 0}, sip_params = {s = 0x0, len = 0}, headers = { s = 0x0, len = 0}, port_no = 0, proto = 0, type = SIP_URI_T, flags = 0, transport = {s = 0x0, len = 0}, ttl = {s = 0x0, len = 0}, user_param = {s = 0x0, len = 0}, maddr = { s = 0x0, len = 0}, method = {s = 0x0, len = 0}, lr = {s = 0x0, len = 0}, r2 = {s = 0x0, len = 0}, gr = {s = 0x0, len = 0}, transport_val = {s = 0x0, len = 0}, ttl_val = { s = 0x0, len = 0}, user_param_val = {s = 0x0, len = 0}, maddr_val = {s = 0x0, len = 0}, method_val = {s = 0x0, len = 0}, lr_val = {s = 0x0, len = 0}, r2_val = {s = 0x0, len = 0}, gr_val = {s = 0x0, len = 0}} ret = -1 len = 32761 parsed_uri_ok_bak = <optimized out> free_new_uri = 1 msg_uri_bak = {s = 0x0, len = 0} dst_uri_bak = {s = 0x7ff9529525f0 "sip:10.52.1.13:5081", len = 19} dst_uri_backed_up = 1 path_bak = {s = 0x0, len = 0} free_path = 1 instance_bak = {s = 0x0, len = 0} ```
#### Log Messages
<!-- Check the syslog file and if there are relevant log messages printed by Kamailio, add them next, or attach to issue, or provide a link to download them (e.g., to a pastebin site). -->
``` ERROR 2025-01-10T06:00:31.847348563Z [resource.labels.containerName: kamailio] 8(28) CRITICAL: {1 1 SUBSCRIBE 0_1907143137@192.168.100.28} tm [timer.h:188]: _set_fr_retr(): already added: 0x7ff763e62250 , tl=0x7ff763e62270!!! ERROR 2025-01-10T06:00:31.847388483Z [resource.labels.containerName: kamailio] 8(28) CRITICAL: {1 1 SUBSCRIBE 0_1907143137@192.168.100.28} tm [t_fwd.c:1613]: t_send_branch(): BUG: retransmission already started for: 0x7ff763e62250 ERROR 2025-01-10T06:00:31.847394809Z [resource.labels.containerName: kamailio] 8(28) ERROR: {1 1 SUBSCRIBE 0_1907143137@192.168.100.28} sl [sl_funcs.c:428]: sl_reply_error(): stateless error reply used: No error (0/SL) ERROR 2025-01-10T06:00:31.847400738Z [resource.labels.containerName: kamailio] 8(28) BUG: {1 1 SUBSCRIBE 0_1907143137@192.168.100.28} tm [t_lookup.c:2052]: t_unref(): REQ_ERR DELAYED should have been caught much earlier for 0x7ff763e61f70: 24 (hex 18) ERROR 2025-01-10T06:00:47.346744773Z [resource.labels.containerName: kamailio] 45(65) ERROR: <core> [core/tcp_main.c:4818]: tcpconn_main_timeout(): connect 64.63.142.90:12190 failed (timeout) ERROR 2025-01-10T06:00:58.274915089Z [resource.labels.containerName: kamailio] 45(65) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 20 ERROR 2025-01-10T06:01:02.715789076Z [resource.labels.containerName: kamailio] 0(1) ALERT: <core> [main.c:805]: handle_sigs(): child process 26 exited by a signal 11 ERROR 2025-01-10T06:01:02.715827333Z [resource.labels.containerName: kamailio] 0(1) ALERT: <core> [main.c:809]: handle_sigs(): core was generated ERROR 2025-01-10T06:01:02.718815287Z [resource.labels.containerName: kamailio] 45(65) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 25 ERROR 2025-01-10T06:01:02.720985493Z [resource.labels.containerName: kamailio] 45(65) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 23 ERROR 2025-01-10T06:01:02.725450552Z [resource.labels.containerName: kamailio] 45(65) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 30 ```
#### SIP Traffic
<!-- If the issue is exposed by processing specific SIP messages, grab them with ngrep or save in a pcap file, then add them next, or attach to issue, or provide a link to download them (e.g., to a pastebin site). -->
``` (paste your sip traffic here) ```
### Possible Solutions
I can atleast say that when this segfault happens that these logs are always present right before the crash. However, I also see instances of these logs without a crash immediately following.
On the proxy we are doing this is withindlg route to send subscribes to the presence route: ``` if KSR.is_SUBSCRIBE() and KSR.is_myself_ruri() then -- in-dialog subscribe requests ksr_route_presence(); KSR.x.exit(); end ```
Any then this is our ksr_route_presence(): ``` function ksr_route_presence() if not KSR.is_SUBSCRIBE() then return 1; end
local fuser = KSR.kx.get_fuser();
if fuser=='' then KSR.xlog.xinfo('Subscription attempt without username from IP ' .. KSR.kx.get_srcip() .. ' - Rejecting\n'); KSR.sl.sl_send_reply(404, 'Not Found'); KSR.x.exit(); end
KSR.pv.sets('$avp(s:tenant)', string.match(fuser, '-(.*)')); local setId = '6' .. KSR.kx.get_def('DEFAULT_DISPATCH_SET'); local modeId = '7';
if KSR.dispatcher.ds_select_dst(setId, modeId)<0 then KSR.sl.send_reply(404, 'No destination'); KSR.x.exit(); end
KSR.corex.set_send_socket(KSR.kx.get_def('PRIVATE_LISTEN_IP') .. ':' .. KSR.kx.get_def('INTERNAL_PORT'));
KSR.tm.t_on_failure('ksr_route_rtf_dispatch');
ksr_route_relay(); end ```
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
``` version: kamailio 5.8.4 (x86_64/linux) flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, MEM_JOIN_FREE, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_SEND_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. id: unknown compiled with gcc 12.2.0 ```
* **Operating System**:
<!-- Details about the operating system, the type: Linux (e.g.,: Debian 8.4, Ubuntu 16.04, CentOS 7.1, ...), MacOS, xBSD, Solaris, ...; Kernel details (output of `lsb_release -a` and `uname -a`) -->
``` PRETTY_NAME="Debian GNU/Linux 12 (bookworm)" NAME="Debian GNU/Linux" VERSION_ID="12" VERSION="12 (bookworm)" VERSION_CODENAME=bookworm ID=debian HOME_URL="https://www.debian.org/" SUPPORT_URL="https://www.debian.org/support" BUG_REPORT_URL="https://bugs.debian.org/" ``` ``` Linux gke-us-south1-external-sip-800ca69e-qyb3 5.15.0-1048-gke #53-Ubuntu SMP Tue Nov 28 00:39:01 UTC 2023 x86_64 GNU/Linux ```
Thanks for the report. This looks like a race condition somehow, but strange that it seems to happen related to tm and the core. You were saying that the logs are always like this in case of a crash. Do you also looked into the backtrace, does it looks from the called functions also similar?
To help troubleshooting, get also with gdb from core file the output of the commands:
``` frame 0 p send_info p * send_info list ```
I pushed a commit with a safety check (referenced above), but it is not clear if that is the reason, so getting the output of the commands in the previous comment is still useful.
Here's the output of the commands:
``` --Type <RET> for more, q to quit, c to continue without paging--frame 0 slen = <optimized out> xparams = <optimized out> id_buf = <optimized out> id_len = 0 __func__ = "create_via_hf" __llevel = <optimized out> __kld = <optimized out> #2 0x000056217f1592f7 in build_req_buf_from_sip_req (msg=msg@entry=0x7ff9529548b0, returned_len=returned_len@entry=0x7ffed7e3eeac, send_info=send_info@entry=0x7ff764d20bb0, mode=mode@entry=128) at core/msg_translator.c:2096 len = 581 new_len = <optimized out> received_len = 0 rport_len = 0 uri_len = 0 via_len = 0 body_delta = 0 line_buf = 0x0 received_buf = 0x0 rport_buf = 0x0 new_buf = 0x0 buf = 0x56217f5866c0 <buf> "SUBSCRIBE sip:74@sip.something.co SIP/2.0\r\nVia: SIP/2.0/UDP 12.171.207.82:25744;branch=z9hG4bK1805670803;rport\r\nFrom: "Heather Myers" sip:108-6448@sip.something.co;tag=1805518213\r\nTo: sip:74@sip.something.co\r\nCa"... path_buf = {s = 0x0, len = 0} offset = 0 s_offset = 0 size = <optimized out> via_anchor = 0x7ff952951300 via_lump = <optimized out> via_rm = <optimized out> via_insert_param = 0x0 path_anchor = <optimized out> path_lump = <optimized out> branch = {s = 0x7ff952954ed8 "z9hG4bKc7ed.5819b4295fe83f981a2fc19b7f739d7e.0", len = 46} flags = 262273 --Type <RET> for more, q to quit, c to continue without paging--p send_info udp_mtu = <optimized out> di = {send_sock = 0x0, to = {s = {sa_family = 18608, sa_data = "\225R\371\177\000\000\002\000\023\331\n4\001\r"}, sin = {sin_family = 18608, sin_port = 21141, sin_addr = { s_addr = 32761}, sin_zero = "\002\000\023\331\n4\001\r"}, sin6 = {sin6_family = 18608, sin6_port = 21141, sin6_flowinfo = 32761, sin6_addr = {__in6_u = { __u6_addr8 = "\002\000\023\331\n4\001\r\000\000\000\000\000\000\000", __u6_addr16 = {2, 55571, 13322, 3329, 0, 0, 0, 0}, __u6_addr32 = {3641901058, 218182666, 0, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 18608, __ss_padding = "\225R\371\177\000\000\002\000\023\331\n4\001\r", '\000' <repeats 103 times>, __ss_align = 0}}, id = 0, send_flags = {f = 0, blst_imask = 0}, proto = 0 '\000', proto_pad0 = 0 '\000', proto_pad1 = 0} ret = <optimized out> __func__ = "build_req_buf_from_sip_req" error00 = <optimized out> #3 0x00007ff951fbe90d in prepare_new_uac (t=t@entry=0x7ff764d20880, i_req=i_req@entry=0x7ff9529548b0, branch=branch@entry=0, uri=<optimized out>, uri@entry=0x7ff9529548e8, path=<optimized out>, path@entry=0x7ff952954f60, next_hop=<optimized out>, fsocket=0x7ff9528f0e10, snd_flags=..., fproto=<optimized out>, flags=<optimized out>, instance=<optimized out>, ruid=<optimized out>, location_ua=<optimized out>) at ./src/modules/tm/t_fwd.c:482 shbuf = 0x0 add_rm_backup = <optimized out> body_lumps_backup = <optimized out> parsed_uri_bak = {user = { s = 0x56217f5866ce <buf+14> "74@sip.something.co SIP/2.0\r\nVia: SIP/2.0/UDP 12.171.207.82:25744;branch=z9hG4bK1805670803;rport\r\nFrom: "Heather Myers" sip:108-6448@sip.something.co;tag=1805518213\r\nTo: sip:74@sip.something.co\r\nCall-ID: 0_18056"..., len = 2}, passwd = {s = 0x0, len = 0}, host = { s = 0x56217f5866d1 <buf+17> "sip.something.co SIP/2.0\r\nVia: SIP/2.0/UDP 12.171.207.82:25744;branch=z9hG4bK1805670803;rport\r\nFrom: "Heather Myers" sip:108-6448@sip.something.co;tag=1805518213\r\nTo: sip:74@sip.something.co\r\nCall-ID: 0_18056078"..., len = 11}, port = {s = 0x0, len = 0}, params = {s = 0x0, len = 0}, sip_params = {s = 0x0, len = 0}, headers = { s = 0x0, len = 0}, port_no = 0, proto = 0, type = SIP_URI_T, flags = 0, transport = {s = 0x0, len = 0}, ttl = {s = 0x0, len = 0}, user_param = {s = 0x0, len = 0}, maddr = { s = 0x0, len = 0}, method = {s = 0x0, len = 0}, lr = {s = 0x0, len = 0}, r2 = {s = 0x0, len = 0}, gr = {s = 0x0, len = 0}, transport_val = {s = 0x0, len = 0}, ttl_val = { s = 0x0, len = 0}, user_param_val = {s = 0x0, len = 0}, maddr_val = {s = 0x0, len = 0}, method_val = {s = 0x0, len = 0}, lr_val = {s = 0x0, len = 0}, r2_val = {s = 0x0, len = 0}, gr_val = {s = 0x0, len = 0}} ret = -1 len = 32761 parsed_uri_ok_bak = <optimized out> free_new_uri = 1 msg_uri_bak = {s = 0x0, len = 0} dst_uri_bak = {s = 0x7ff9529525f0 "sip:10.52.1.13:5081", len = 19} dst_uri_backed_up = 1 path_bak = {s = 0x0, len = 0} free_path = 1 instance_bak = {s = 0x0, len = 0} --Type <RET> for more, q to quit, c to continue without paging--p * send_info free_instance = 1 ruid_bak = {s = 0x0, len = 0} free_ruid = 1 ua_bak = {s = 0x0, len = 0} free_ua = 1 backup_route_type = <optimized out> test_dst = <optimized out> fwd_snd_flags_bak = <optimized out> rpl_snd_flags_bak = <optimized out> force_send_socket_bak = <optimized out> dst = <optimized out> ctx = {rec_lev = 0, run_flags = 0, last_retcode = 0, jmp_env = {{__jmpbuf = {140708809049006, 768409604, 140708809015412, 94701867372793, 1921917952, 6984915224974434560, 3, 140700351496192}, __mask_was_saved = 0, __saved_mask = {__val = {94701868763746, 94701868763680, 973, 94701868544004, 94701867582511, 2293776, 16, 3, 140732520460976, 140732520460968, 140708809099440, 94701868763680, 140732520460960, 0, 94701866710460, 140700390001264}}}}} bctx = <optimized out> keng = <optimized out> __func__ = "prepare_new_uac" #4 0x00007ff951fc1e8b in add_uac (t=t@entry=0x7ff764d20880, request=request@entry=0x7ff9529548b0, uri=<optimized out>, next_hop=<optimized out>, path=<optimized out>, proxy=proxy@entry=0x0, fsocket=<optimized out>, snd_flags=..., proto=<optimized out>, flags=<optimized out>, instance=<optimized out>, ruid=<optimized out>, location_ua=<optimized out>) at ./src/modules/tm/t_fwd.c:819 ret = <optimized out> branch = 0 __func__ = "add_uac" #5 0x00007ff951fc9b4d in t_forward_nonack (t=0x7ff764d20880, p_msg=p_msg@entry=0x7ff9529548b0, proxy=proxy@entry=0x0, proto=proto@entry=0) at ./src/modules/tm/t_fwd.c:1695 branch_ret = <optimized out> lowest_ret = -1 added_branches = 0 first_branch = 0 i = <optimized out> t_invite = <optimized out> success_branch = <optimized out> try_new = 1 lock_replies = <optimized out> backup_bflags = 0 --Type <RET> for more, q to quit, c to continue without paging--list obranch = {uri = {s = 0x7ffed7e3f4d8 "", len = 2132277275}, dst_uri = {s = 0x56217f3bf004 "core", len = 1556351488}, q = 5, path = {s = 0x7ff75cc40a00 "e\030\026}", len = 0}, flags = 3000082688, force_socket = 0x0, ruid = {s = 0x40 <error: Cannot access memory at address 0x40>, len = 117}, instance = {s = 0x56217f40d29f "core: core/xavp.c", len = 2134634500}, location_ua = {s = 0x60ef6763b2d1a100 <error: Cannot access memory at address 0x60ef6763b2d1a100>, len = 1544235120}, otcpid = 2133658182} __func__ = "t_forward_nonack" #6 0x00007ff951fb8be0 in t_relay_to (p_msg=0x7ff9529548b0, proxy=0x0, proto=0, replicate=replicate@entry=0) at ./src/modules/tm/t_funcs.c:355 ret = 0 new_tran = <optimized out> t = 0x7ff764d20880 dst = {send_sock = 0x56217f3bf004, to = {s = {sa_family = 64847, sa_data = "\021\177!V\000\000\000\000\000\000\000\000\000"}, sin = {sin_family = 64847, sin_port = 32529, sin_addr = {s_addr = 22049}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 64847, sin6_port = 32529, sin6_flowinfo = 22049, sin6_addr = {__in6_u = { __u6_addr8 = "\000\000\000\000\000\000\000\000\020\000\000\000\000\000\000", __u6_addr16 = {0, 0, 0, 0, 16, 0, 0, 0}, __u6_addr32 = {0, 0, 16, 0}}}, sin6_scope_id = 2134634500}, sas = {ss_family = 64847, __ss_padding = "\021\177!V\000\000\000\000\000\000\000\000\000\000\020\000\000\000\000\000\000\000\004\360;\177!V\000\000\260p=\177!V\000\000\004\360;\177!V\000\000@u=\177!V\000\000>\030\000\000\a\000\000\000\305\023\000\000\000\000\000\000\b\304u\200!V\000\000\n\000\000\000\367\177", '\000' <repeats 26 times>, "x1\026O\371\177\000", __ss_align = 35184372088832}}, id = -1294884608, send_flags = {f = 26467, blst_imask = 24815}, proto = 0 '\000', proto_pad0 = 0 '\000', proto_pad1 = 0} port = <optimized out> host = {s = 0x7ff764d20880 "p\271\236Z\367\177", len = -1294884608} comp = <optimized out> __func__ = "t_relay_to" #7 0x00007ff95203127f in _w_t_relay_to (p_msg=0x7ff9529548b0, proxy=0x0, force_proto=0) at ./src/modules/tm/tm.c:1859 t = <optimized out> res = <optimized out> __func__ = "_w_t_relay_to" #8 0x00007ff94f1985c2 in sr_kemi_lua_exec_func_ex (L=L@entry=0x56218060cb30, ket=ket@entry=0x7ff952085420 <tm_kemi_exports>, pdelta=pdelta@entry=0) at ./src/modules/app_lua/app_lua_api.c:981 i = <optimized out> argc = <optimized out> ret = <optimized out> fname = 0x7ff952085430 <tm_kemi_exports+16> mname = 0x7ff952085420 <tm_kemi_exports> vps = {{vtype = 2, v = {n = -2140581112, l = 94701888295688, s = {s = 0x562180695308 "ksr_manage_reply", len = 16}, dict = 0x562180695308}}, {vtype = 0, v = {n = 0, l = 0, s = { s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}, {vtype = 0, v = {n = 0, l = 0, s = {s = 0x0, len = 0}, dict = 0x0}}} env_L = 0x7ff94f1c4820 <_sr_L_env> ```
It does not seem to be the output of the commands I gave, but a truncated part of bt full.
Here you go:
``` (gdb) frame 0 #0 via_builder (len=len@entry=0x7ffed7e3ecb8, msg=msg@entry=0x7ff9529548b0, send_info=send_info@entry=0x7ff764d20bb0, branch=branch@entry=0x7ffed7e3ecc0, extra_params=0x0, hp=hp@entry=0x7ffed7e3ebb0) at core/msg_translator.c:2910 2910 in core/msg_translator.c (gdb) p send_info $1 = (struct dest_info *) 0x7ff764d20bb0 (gdb) p * send_info $2 = {send_sock = 0x7ff9528f0e10, to = {s = {sa_family = 2, sa_data = "\023\331\n4\001\r\000\000\000\000\000\000\000"}, sin = {sin_family = 2, sin_port = 55571, sin_addr = { s_addr = 218182666}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 2, sin6_port = 55571, sin6_flowinfo = 218182666, sin6_addr = {__in6_u = { __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 2, __ss_padding = "\023\331\n4\001\r", '\000' <repeats 111 times>, __ss_align = 0}}, id = 0, send_flags = {f = 4, blst_imask = 0}, proto = 1 '\001', proto_pad0 = 0 '\000', proto_pad1 = 0} (gdb) list 2905 in core/msg_translator.c ```
Get also:
``` p *send_info-> send_sock ```
``` (gdb) p *send_info->send_sock $3 = {socket = 9, gindex = 0, name = {s = 0x7ff9528f1020 "10.206.0.9", len = 10}, address = {af = 2, len = 4, u = {addrl = {151047690, 0}, addr32 = {151047690, 0, 0, 0}, addr16 = { 52746, 2304, 0, 0, 0, 0, 0, 0}, addr = "\n\316\000\t", '\000' <repeats 11 times>}}, address_str = {s = 0x7ff952934160 "10.206.0.9", len = 10}, port_no_str = { s = 0x7ff9529340e0 "5061", len = 4}, flags = SI_IS_IP, su = {s = {sa_family = 2, sa_data = "\023\305\n\316\000\t\000\000\000\000\000\000\000"}, sin = {sin_family = 2, sin_port = 50451, sin_addr = {s_addr = 151047690}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 2, sin6_port = 50451, sin6_flowinfo = 151047690, sin6_addr = { __in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 2, __ss_padding = "\023\305\n\316\000\t", '\000' <repeats 111 times>, __ss_align = 0}}, next = 0x0, prev = 0x7ff9528f08e0, port_no = 5061, proto = 1 '\001', proto_pad0 = 0 '\000', proto_pad1 = 0, sock_str = {s = 0x7ff952934330 "udp:10.206.0.9:5061", len = 19}, addr_info_lst = 0x0, workers = 0, workers_tcpidx = 0, sockname = {s = 0x0, len = 0}, useinfo = { name = {s = 0x0, len = 0}, proto = 0, af = 0, port_no = 0, port_pad = 0, port_no_str = {s = 0x0, len = 0}, address_str = {s = 0x0, len = 0}, address = {af = 0, len = 0, u = { addrl = {0, 0}, addr32 = {0, 0, 0, 0}, addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, addr = '\000' <repeats 15 times>}}, sock_str = {s = 0x0, len = 0}}, mcast = {s = 0x0, len = 0}} ```
The structure is ok, so it should be no reason to crash, maybe the core was overwritten, do you have one core per pid (or core pattern per process) enabled? It may happen that many cores are generated and the one that was the reason for the crash gets overwritten by the others generated as a side effect.
We have core pattern so a static 8 children for tcp and for udp listeners.
So, if the core file is not named only `core`, do you get many of them or only one? What is the format for the core file name?
Only one. Here’s the format: %t.%e.%s.core 173648827.kamailio.11.core.26
26 is a udp listener in this case.
How is the .26 suffix added because it is not in the pattern? Also, there are other processes which are not udp listeners.
You should add the pid of the process to be the part of the core file name, in that way it is ensured each process writes core in its own file.
Ok I added this and will report back when the new core dump is initiated.
Had another crash with the same exact output from gdb as before. The template is `%t.%e.%s.%p.core`
The filename is `1737592676.kamailio.11.24.core` 24 is a `udp receiver child=4`.
Anything else I can obtain from the core dump that might point to the cause? I do see a new log line produced right before the crash this time referring to `process_lumps()`:
``` ERROR 2025-01-23T00:36:51.270454249Z [resource.labels.containerName: kamailio] 1(21) CRITICAL: {1 1 SUBSCRIBE 0_1452069404@10.58.1.190} <core> [core/msg_translator.c:1546]: process_lumps(): null bind_address ERROR 2025-01-23T00:38:28.146835276Z [resource.labels.containerName: kamailio] 45(65) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 18 ERROR 2025-01-23T00:38:31.474909193Z [resource.labels.containerName: kamailio] 0(1) ALERT: <core> [main.c:805]: handle_sigs(): child process 24 exited by a signal 11 ERROR 2025-01-23T00:38:31.474968814Z [resource.labels.containerName: kamailio] 0(1) ALERT: <core> [main.c:809]: handle_sigs(): core was generated ```
Hopefully this helps.
The log:
``` ERROR 2025-01-23T00:36:51.270454249Z [resource.labels.containerName: kamailio] 1(21) CRITICAL: {1 1 SUBSCRIBE 0_1452069404@10.58.1.190} <core> [core/msg_translator.c:1546]: process_lumps(): null bind_address ```
is indeed out of common, like kamailio not listening on a socket. There were a couple of places in code, I pushed a commit to give more details. Can you test with latest branch 5.8 and paste again the logs when it happens again?
Ok. Does the 5.8.5 version have your changes?
The commit 13ef1a559 is part of 5.8.5, yes.
Thanks will get it loaded up and report back!
Here's the latest output of bt full for a segfault for pid 45:
``` GNU gdb (Debian 13.1-3) 13.1 Copyright (C) 2023 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: https://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/.
For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/sbin/kamailio... Reading symbols from /usr/lib/debug/.build-id/20/dc8be8c0aa02eba62b84db3c283789f068fb79.debug...
warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing [New LWP 45] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `kamailio -DD -E -m 8000 -M 512 -f /etc/kamailio/kamailio.cfg -P /var/run/kamail'. Program terminated with signal SIGSEGV, Segmentation fault. #0 timer_list_expire (slow_mark=<optimized out>, slow_l=<optimized out>, h=0x7f7b0b66ee20, t=<optimized out>) at core/timer.c:850 850 core/timer.c: No such file or directory. (gdb) bt full #0 timer_list_expire (slow_mark=<optimized out>, slow_l=<optimized out>, h=0x7f7b0b66ee20, t=<optimized out>) at core/timer.c:850 tl = 0x7f7b1987cd00 ret = <optimized out> tl = <optimized out> ret = <optimized out> #1 timer_handler () at core/timer.c:925 saved_ticks = <optimized out> run_slow_timer = 0 i = <optimized out> saved_ticks = <optimized out> run_slow_timer = <optimized out> i = <optimized out> __func__ = "timer_handler" __llevel = <optimized out> __kld = <optimized out> __llevel = <optimized out> __kld = <optimized out> #2 timer_main () at core/timer.c:963 No locals. #3 0x000055b4df9cf0c1 in main_loop () at ./src/main.c:1933 i = <optimized out> pid = <optimized out> si = <optimized out> si_desc = "udp receiver child=7 sock=10.142.0.56:5061\000(35.196.53.87:8060)\000\377\001\000\000\000\000\000\000\000\2145d#}\177\000\000\001\000\000\000\000\000\000\000\t\000\000\000\000\000\000\000Sun Jan 26 20:51:02 2025\n\000y#}\177\000" nrprocs = <optimized out> woneinit = 1 __func__ = "main_loop" error = <optimized out> #4 0x000055b4df9c353c in main (argc=<optimized out>, argv=<optimized out>) at ./src/main.c:3257 cfg_stream = <optimized out> c = <optimized out> r = <optimized out> tmp = 0x7ffd877e54f4 "" ```
Here's the bt full for pid 1:
``` GNU gdb (Debian 13.1-3) 13.1 Copyright (C) 2023 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: https://www.gnu.org/software/gdb/bugs/. Find the GDB manual and other documentation resources online at: http://www.gnu.org/software/gdb/documentation/.
For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /usr/sbin/kamailio... Reading symbols from /usr/lib/debug/.build-id/20/dc8be8c0aa02eba62b84db3c283789f068fb79.debug...
warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing [New LWP 1] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `kamailio -DD -E -m 8000 -M 512 -f /etc/kamailio/kamailio.cfg -P /var/run/kamail'. Program terminated with signal SIGSEGV, Segmentation fault. #0 __GI_abort () at ./stdlib/abort.c:107 107 ./stdlib/abort.c: No such file or directory. (gdb) bt full #0 __GI_abort () at ./stdlib/abort.c:107 act = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {18446744073709551615, 0 <repeats 15 times>}}, sa_flags = 0, sa_restorer = 0x0} #1 0x000055b4df9bcbf5 in sig_alarm_abort (signo=<optimized out>) at ./src/main.c:717 __func__ = "sig_alarm_abort" #2 <signal handler called> No locals. #3 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:37 No locals. #4 0x000055b4dfbe96ca in futex_get (lock=0x7f7b0b62ad20) at core/mem/../futexlock.h:108 v = <optimized out> i = <optimized out> v = <optimized out> i = <optimized out> retry = <optimized out> #5 timer_del_safe (tl=0x7f7b0cece950) at core/timer.c:696 ret = -1 again = <optimized out> __func__ = "timer_del_safe" #6 0x00007f7d02de8917 in unlink_timers (t=0x7f7b0cece650) at ./src/modules/tm/timer.h:367 i = <optimized out> i = <optimized out> #7 free_cell_helper (dead_cell=0x7f7b0cece650, silent=silent@entry=1, fname=fname@entry=0x7f7d02eb23fa "h_table.c", fline=fline@entry=470) at ./src/modules/tm/h_table.c:151 b = <optimized out> i = <optimized out> rpl = <optimized out> tt = <optimized out> foo = <optimized out> cbs = <optimized out> cbs_tmp = <optimized out> __func__ = "free_cell_helper" #8 0x00007f7d02deb602 in free_hash_table () at ./src/modules/tm/h_table.c:470 p_cell = <optimized out> tmp_cell = 0x7f7b0b685e40 i = 21 ```
These are the only logs I see worth noting before the crash: ``` ERROR 2025-01-27T18:30:05.205685846Z [resource.labels.containerName: kamailio] 0(1) ALERT: <core> [main.c:806]: handle_sigs(): child process 45 exited by a signal 11 ERROR 2025-01-27T18:30:05.205713122Z [resource.labels.containerName: kamailio] 0(1) ALERT: <core> [main.c:810]: handle_sigs(): core was generated ERROR 2025-01-27T18:31:34.797944246Z [resource.labels.containerName: kamailio] 0(1) CRITICAL: <core> [main.c:716]: sig_alarm_abort(): shutdown timeout triggered, dying... ```
Hey there just checking on this one.
Thanks!
Just bumping this. Do we need any further info on this one in order to dig deeper?
Thanks!
jrozhon left a comment (kamailio/kamailio#4102)
Adding myself with a same issue. Ready to provide what is necessary to debug this.