<!-- Kamailio Project uses GitHub Issues only for bugs in the code or feature requests. Please use this template only for bug reports.
If you have questions about using Kamailio or related to its configuration file, ask on sr-users mailing list:
* http://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
If you have questions about developing extensions to Kamailio or its existing C code, ask on sr-dev mailing list:
* http://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
Please try to fill this template as much as possible for any issue. It helps the developers to troubleshoot the issue.
If there is no content to be filled in a section, the entire section can be removed.
You can delete the comments from the template sections when filling.
You can delete next line and everything above before submitting (it is a comment). -->
### Description
<!-- Explain what you did, what you expected to happen, and what actually happened. --> I am using keepalive module to send keepalives to all the destinations registered with an application server which sits behind kamailio. When Kamailio recieves 200 Ok. I take URI contact from 200 OK and pass it as a param to ka_add_destination function.Most of the time it works fine but sometimes kamailio crashes when it recieves 200 OK of OPTIONS. ### Troubleshooting
#### Reproduction I am able to reproduce it easily in below scenario : All the URIs which we are feeding to KA module has this syntax : sip:2003@10.50.8.11:9090;alias=10.50.8.1~5060~1;pb-ip=35.185.177.49;pb-pt=5060;tp=udp
1. So request will be generate by KA module and it will be send to 10.50.8.11:9090 ( which is another port of same kamailio instance ) via UDP protocol. 2. On recieving OPTION on 9090 port , - We are calling handle_ruri_alias() function which will set $du =sip:10.50.8.1:5060;transport=udp - Modify RURI from sip:2003@10.50.8.11:9090;alias=10.50.8.1~5060~1;pb-ip=35.185.177.49;pb-pt=5060;tp=udp to sip:2003@**35.185.177.49:5060**;pb-ip=35.185.177.49;pb-pt=5060;**transport**=udp - and relay to 10.50.8.1 3. 10.50.8.1 is another instance of Kamailio which will relay OPTION packet to final destination
<!-- If the issue can be reproduced, describe how it can be done. -->
#### Debugging Data
<!-- If you got a core dump, use gdb to extract troubleshooting data - full backtrace, local variables and the list of the code at the issue location.
gdb /path/to/kamailio /path/to/corefile bt full info locals list
If you are familiar with gdb, feel free to attach more of what you consider to be relevant. -->
``` [root@SBC-4-2 /]# gdb kamailio core.67771 GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /usr/local/sbin/kamailio...done. [New LWP 67771] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/local/sbin/kamailio -m 5000 -M 500 -P /run/kamailio/kamailio.pid'. Program terminated with signal 11, Segmentation fault. #0 0x00000000abcdefed in ?? () Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libcurl-7.29.0-51.el7.x86_64 libevent-2.0.21-4.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.5-12.el7.x86_64 libssh2-1.4.3-10.el7_2.1.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64 libuuid-2.23.2-52.el7_5.1.x86_64 mariadb-libs-5.5.64-1.el7.x86_64 nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-5.el7_5.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 nss-util-3.36.0-1.el7_5.x86_64 openldap-2.4.44-15.el7_5.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64 (gdb) bt full #0 0x00000000abcdefed in ?? () No symbol table info available. #1 0x00007f931dc69c7e in ka_options_callback (t=0x7f91e58f05c0, type=1024, ps=0x7ffda0977bd0) at keepalive_core.c:126 uri = { s = 0x7f91e58f63f0 "sip:2003@10.50.8.11:9090;alias=10.50.8.1~5060~1;pb-ip=35.185.177.49;pb-pt=5060;tp=udp>\r\nFrom: sip:keepalive@fromsbc;tag=c7bfd876be500fff196414658218fc27-d6989f6d\r\nCSeq: 10 OPTIONS\r\nCall-ID: 3f2c3c7c"..., len = 85} msg = 0x0 state = 2 state_routes = {0x7f931dc6f844 "", 0x7f931dc6f93d "keepalive:dst-up", 0x7f931dc6f94e "keepalive:dst-down"} ka_dest = 0x7f91e5898988 __FUNCTION__ = "ka_options_callback" #2 0x00007f9324bfddd8 in run_trans_callbacks_internal (cb_lst=0x7f91e58f0638, type=1024, trans=0x7f91e58f05c0, params=0x7ffda0977bd0) at t_hooks.c:258 cbp = 0x7f91e58f6590 backup_from = 0xb2a750 <def_list+16> backup_to = 0xb2a758 <def_list+24> backup_dom_from = 0xb2a760 <def_list+32> backup_dom_to = 0xb2a768 <def_list+40> backup_uri_from = 0xb2a740 <def_list> backup_uri_to = 0xb2a748 <def_list+8> backup_xavps = 0xb29fd0 <_xavp_list_head> backup_xavus = 0xb29fd8 <_xavu_list_head> backup_xavis = 0xb29fe0 <_xavi_list_head> __FUNCTION__ = "run_trans_callbacks_internal" #3 0x00007f9324bfdf0a in run_trans_callbacks (type=1024, trans=0x7f91e58f05c0, req=0x0, rpl=0xffffffffffffffff, code=408) at t_hooks.c:285 params = {req = 0x0, rpl = 0xffffffffffffffff, param = 0x7f91e58f65a0, code = 408, flags = 0, branch = 0, t_rbuf = 0x0, dst = 0x0, send_buf = {s = 0x0, len = 0}} #4 0x00007f9324b943f1 in local_reply (t=0x7f91e58f05c0, p_msg=0xffffffffffffffff, branch=0, msg_status=408, cancel_data=0x7ffda0977d50) at t_reply.c:2265 local_store = 0 local_winner = 0 reply_status = RPS_COMPLETED winning_msg = 0xffffffffffffffff winning_code = 408 totag_retr = 0 ---Type <return> to continue, or q <return> to quit--- __FUNCTION__ = "local_reply" #5 0x00007f9324bcc9fd in fake_reply (t=0x7f91e58f05c0, branch=0, code=408) at timer.c:290 cancel_data = {cancel_bitmap = 0, reason = {cause = 0, u = {text = {s = 0x0, len = -443611712}, e2e_cancel = 0x0, packed_hdrs = {s = 0x0, len = -443611712}}}} do_cancel_branch = 0 reply_status = 67771 #6 0x00007f9324bcce95 in final_response_handler (r_buf=0x7f91e58f0860, t=0x7f91e58f05c0) at timer.c:462 silent = 0 branch_ret = 0 prev_branch = 0 now = 0 #7 0x00007f9324bccf56 in retr_buf_handler (ticks=112894435, tl=0x7f91e58f0880, p=0xfa0) at timer.c:518 rbuf = 0x7f91e58f0860 fr_remainder = 1 retr_remainder = 32657 retr_interval = 3847088576 new_retr_interval_ms = 6989250208 crt_retr_interval_ms = 140264594074048 t = 0x7f91e58f05c0 __FUNCTION__ = "retr_buf_handler" #8 0x00000000004b7bcd in slow_timer_main () at core/timer.c:1105 n = 12 ret = 4294967295 tl = 0x7f91e58f0880 i = 700 __FUNCTION__ = "slow_timer_main" #9 0x000000000042a4c3 in main_loop () at main.c:1737 i = 16 pid = 0 si = 0x0 si_desc = "udp receiver child=15 sock=10.50.8.11:9090\000:0:0:11]:5060\000\061:5060)\000\000\000\000\000\000\000\000\000\000F\345\221\177\000\000\000\000\000\000\000\000\000\000\bLv\345\221\177\000\000P\201\227\240\375\177\000\000\210\212K\000\000\000\000\000\260\270A\000\000\000\000\000\b{\017(\223\177\000" nrprocs = 16 woneinit = 1 ---Type <return> to continue, or q <return> to quit--- __FUNCTION__ = "main_loop" #10 0x0000000000433a66 in main (argc=7, argv=0x7ffda09786b8) at main.c:2856 cfg_stream = 0x27e7010 c = -1 r = 0 tmp = 0x7ffda0979f26 "" tmp_len = 0 port = 0 proto = 0 ahost = 0x0 aport = 0 options = 0x7d2498 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 791137056 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 0 n_lst = 0x7f934718ea00 <intel_02_known> p = 0x0 st = {st_dev = 23, st_ino = 30676, st_nlink = 2, st_mode = 16877, st_uid = 0, st_gid = 5001, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1597841333, tv_nsec = 176009291}, st_mtim = {tv_sec = 1597841333, tv_nsec = 166009247}, st_ctim = {tv_sec = 1597841333, tv_nsec = 176009291}, __unused = {0, 0, 0}} tbuf = "\377\377\377\377", '\000' <repeats 12 times>, "\340\263\001G\223\177\000\000\310T\320G\223\177", '\000' <repeats 90 times>, "p\342\252\000\000\000\000\000\260\270A\000\000\000\000\000\260\206\227\240\375\177", '\000' <repeats 26 times>, "\036_\260G\223\177\000\000\001", '\000' <repeats 23 times>... option_index = 0 long_options = {{name = 0x7d468f "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x7cfc94 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x7d4694 "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x7d469a "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x7d46a0 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x7d46a9 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x7d46b3 "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x7d46bd "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x7d46c8 "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x7d46d1 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x7d46dc "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __FUNCTION__ = "main" (gdb)
```
#### Log Messages
<!-- Check the syslog file and if there are relevant log messages printed by Kamailio, add them next, or attach to issue, or provide a link to download them (e.g., to a pastebin site). -->
``` Aug 19 12:58:26 SBC-4-2 /usr/local/sbin/kamailio[67772]: CRITICAL: <core> [core/mem/q_malloc.c:138]: qm_debug_check_frag(): BUG: qm: fragm. 0x7f91e5898950 (address 0x7f91e5898988) end overwritten (c0c0c0c1, abcdefed)! Memory allocator was called from core: core/usr_avp.c:626. Fragment marked by core: core/usr_avp.c:175. Exec from core/mem/q_malloc.c:511. Aug 19 12:58:38 SBC-4-2 /usr/local/sbin/kamailio[67793]: CRITICAL: <core> [core/pass_fd.c:277]: receive_fd(): EOF on 94 ```
#### SIP Traffic
<!-- If the issue is exposed by processing specific SIP messages, grab them with ngrep or save in a pcap file, then add them next, or attach to issue, or provide a link to download them (e.g., to a pastebin site). -->
``` (paste your sip traffic here) ```
### Possible Solutions
<!-- If you found a solution or workaround for the issue, describe it. Ideally, provide a pull request with a fix. -->
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
``` [root@SBC-4-2 /]# kamailio -v version: kamailio 5.4.0 (x86_64/linux) 6c4fce flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. id: 6c4fce compiled on 14:16:47 Aug 14 2020 with gcc 4.8.5 ```
* **Operating System**:
<!-- Details about the operating system, the type: Linux (e.g.,: Debian 8.4, Ubuntu 16.04, CentOS 7.1, ...), MacOS, xBSD, Solaris, ...; Kernel details (output of `uname -a`) -->
``` [root@SBC-4-2 /]# uname -a Linux SBC-4-2.netcarrier.net 5.1.0 #1 SMP Mon May 6 13:44:38 GMT 2019 x86_64 x86_64 x86_64 GNU/Linux ```
Do you use `ka_add_destination()` in kamailio configuration file? Or have you developed a module that uses internally this function?
Because looking at the code, when using the function in config, is not setting any state change callback function, which seems to be the one missing at the top of the backtrace (by looking what is at keepalive_core.c:126).
I have used ka_add_destination in config file.
Hi, @miconda So it a quick fix or complex one ? Can i expect it to get fixed in next stable version ?
I am not the developer of the module, I just looked quickly in the code and it seemed that it should not execute the callback that caused the crash if the function was used in the config. That's why I asked if you use it inside a custom module.
If I get some spare time, I will look again, but I cannot say if it is going to be fixed before the next minor release in 5.4 -- this is expected to happen soon, maximum 1-2 weeks and I expect to be quite busy getting Kamailio World online event organized.
The original developer changed the job and is no longer active, iirc, but there were other devs contributing to this module lately, maybe they can have a look as well -- pinging @luismartingil for it.
@miconda , Thanks i understand.
Supposing you still have the core file, can you get the output of next commands executed inside gdb:
``` frame 1 p * ka_dest ```
Hi,
(gdb) frame 1 #1 0x00007ff5bbb2ddd8 in run_trans_callbacks_internal (cb_lst=0x7ff47c9f86a8, type=1024, trans=0x7ff47c9f8630, params=0x7ffdb5b643e0) at t_hooks.c:258 258 cbp->callback( trans, type, params ); (gdb) p * ka_dest No symbol "ka_dest" in current context. (gdb)
On Sun, Aug 30, 2020 at 9:31 PM Daniel-Constantin Mierla < notifications@github.com> wrote:
Supposing you still have the core file, can you get the output of next commands executed inside gdb:
frame 1 p * ka_dest
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/kamailio/kamailio/issues/2448#issuecomment-683437869, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALMYVGVDFZR7G6BWCN5XXC3SDJZUTANCNFSM4QE75I5Q .
In the first backtrace, the frame 1 was different:
``` #1 0x00007f931dc69c7e in ka_options_callback (t=0x7f91e58f05c0, type=1024, ps=0x7ffda0977bd0) at keepalive_core.c:126 ```
Did you get another core file? I need the output of `p *ka_dest` in the frame showing execution of `ka_options_callback()`. Get also the output of `list`.
@NGSegovia FYI
I have generated fresh core dump :
``` GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /usr/local/sbin/kamailio...done. [New LWP 77082] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/local/sbin/kamailio -m 5000 -M 500 -P /run/kamailio/kamailio.pid'. Program terminated with signal 11, Segmentation fault. #0 0x00007fee0197dc7c in ka_options_callback (t=0x7fecc97f8450, type=1024, ps=0x7ffc051e10f0) at keepalive_core.c:126 126 ka_dest->statechanged_clb(&ka_dest->uri, state, ka_dest->user_attr); Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libcurl-7.29.0-51.el7.x86_64 libevent-2.0.21-4.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.5-12.el7.x86_64 libssh2-1.4.3-10.el7_2.1.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64 libuuid-2.23.2-52.el7_5.1.x86_64 mariadb-libs-5.5.64-1.el7.x86_64 nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-5.el7_5.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 nss-util-3.36.0-1.el7_5.x86_64 openldap-2.4.44-15.el7_5.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64 (gdb) bt full #0 0x00007fee0197dc7c in ka_options_callback (t=0x7fecc97f8450, type=1024, ps=0x7ffc051e10f0) at keepalive_core.c:126 uri = { s = 0x7fecc97fa612 "sip:1006@10.50.8.11:9090;alias=10.50.8.1~5060~1;pb-ip=43.228.229.106;pb-pt=39524;tp=tcp>\r\nFrom: sip:keepalive@fromsbc;tag=c7bfd876be500fff196414658218fc27-ab6132c4\r\nCSeq: 10 OPTIONS\r\nCall-ID: 7d20b9"..., len = 87} msg = 0x0 state = 1 state_routes = {0x7fee01983844 "", 0x7fee0198393d "keepalive:dst-up", 0x7fee0198394e "keepalive:dst-down"} ka_dest = 0x7fecc98b1350 __FUNCTION__ = "ka_options_callback" #1 0x00007fee08911dd8 in run_trans_callbacks_internal (cb_lst=0x7fecc97f84c8, type=1024, trans=0x7fecc97f8450, params=0x7ffc051e10f0) at t_hooks.c:258 cbp = 0x7fecc97fa7b8 backup_from = 0xb2a750 <def_list+16> backup_to = 0xb2a758 <def_list+24> backup_dom_from = 0xb2a760 <def_list+32> backup_dom_to = 0xb2a768 <def_list+40> backup_uri_from = 0xb2a740 <def_list> backup_uri_to = 0xb2a748 <def_list+8> backup_xavps = 0xb29fd0 <_xavp_list_head> backup_xavus = 0xb29fd8 <_xavu_list_head> backup_xavis = 0xb29fe0 <_xavi_list_head> __FUNCTION__ = "run_trans_callbacks_internal" #2 0x00007fee08911f0a in run_trans_callbacks (type=1024, trans=0x7fecc97f8450, req=0x0, rpl=0x7fee0be2d230, code=200) at t_hooks.c:285 params = {req = 0x0, rpl = 0x7fee0be2d230, param = 0x7fecc97fa7c8, code = 200, flags = 0, branch = 0, t_rbuf = 0x0, dst = 0x0, send_buf = { s = 0x0, len = 0}} #3 0x00007fee088a83f1 in local_reply (t=0x7fecc97f8450, p_msg=0x7fee0be2d230, branch=0, msg_status=200, cancel_data=0x7ffc051e14e0) at t_reply.c:2265 local_store = 0 local_winner = 0 reply_status = RPS_COMPLETED winning_msg = 0x7fee0be2d230 winning_code = 200 totag_retr = 0 __FUNCTION__ = "local_reply" #4 0x00007fee088ab28d in reply_received (p_msg=0x7fee0be2d230) at t_reply.c:2648 ---Type <return> to continue, or q <return> to quit--- msg_status = 200 last_uac_status = 0 ack = 0x5adbb1 <sr_event_exec+415> "\211E\374\213E\374\351\270\002" ack_len = 198791384 branch = 0 reply_status = 198791440 onreply_route = 0 cancel_data = {cancel_bitmap = 0, reason = {cause = 200, u = {text = {s = 0x0, len = 11322688}, e2e_cancel = 0x0, packed_hdrs = {s = 0x0, len = 11322688}}}} uac = 0x7fecc97f86e0 t = 0x7fecc97f8450 lack_dst = {send_sock = 0xacc673 <buf.7133+723>, to = {s = {sa_family = 50496, sa_data = "\254\000\000\000\000\000@\025\036\005\374\177\000"}, sin = {sin_family = 50496, sin_port = 172, sin_addr = {s_addr = 0}, sin_zero = "@\025\036\005\374\177\000"}, sin6 = {sin6_family = 50496, sin6_port = 172, sin6_flowinfo = 0, sin6_addr = {__in6_u = { __u6_addr8 = "@\025\036\005\374\177\000\000\240\327o\000\000\000\000", __u6_addr16 = {5440, 1310, 32764, 0, 55200, 111, 0, 0}, __u6_addr32 = {85857600, 32764, 7329696, 0}}}, sin6_scope_id = 85857600}, sas = {ss_family = 50496, __ss_padding = "\254\000\000\000\000\000@\025\036\005\374\177\000\000\240\327o\000\000\000\000\000@\025\036\005\374\177\000\000l=s\000\000\000\000\000\220N\205\000\000\000\000\000\006-\001\000\207\000\000\000dr\205\000\000\000\000\000x\365\332\v\356\177\000\000sƬ\000\000\000\000\000\066Ŭ\000\000\000\000\000\000\025\036\005\374\177\000\000\020Q\331\v\356\177\000\000sƬ\000\000\000\000\000\060Ŭ\000\000\000\000", __ss_align = 140720394343680}}, id = 0, send_flags = {f = 0, blst_imask = 0}, proto = -73 '\267', proto_pad0 = -60 '\304', proto_pad1 = 172} backup_user_from = 0x533d48 <get_send_socket2+38> backup_user_to = 0x52be18 <init_su+941> backup_domain_from = 0x3ce0c0 backup_domain_to = 0x737024 <get_hdr_field+6416> backup_uri_from = 0x16 backup_uri_to = 0x8 backup_xavps = 0x7ffc051e16f0 backup_xavus = 0x0 backup_xavis = 0x0 replies_locked = 1 branch_ret = 524288000 prev_branch = 0 blst_503_timeout = 7582578 hf = 0x854e90 ---Type <return> to continue, or q <return> to quit--- onsend_params = {req = 0x7ffc051e1540, rpl = 0x6c640e <qm_malloc+2479>, param = 0x854e90, code = 7562879, flags = 0, branch = 0, t_rbuf = 0x3cd9f8, dst = 0x12d06, send_buf = {s = 0x41b8b0 <_start> "1\355I\211\321^H\211\342H\203\344\360PTI\307\300p\367|", len = 85862032}} ctx = {rec_lev = 0, run_flags = 0, last_retcode = 1, jmp_env = {{__jmpbuf = {77062, -1461319875154594989, 8736400, 7140757, 140720394343312, 140660373053456, 140660377843008, 140660373053456}, __mask_was_saved = 85857232, __saved_mask = {__val = {140660210063967, 140660373061248, 0, 524288000, 7312777, 3989744, 5422768, 5455176, 8, 140660377843104, 140660377843088, 140720394343424, 5954481, 0, 21483460608, 656}}}}} bctx = 0x7fee0be2d230 keng = 0x0 ret = 0 evname = {s = 0x7fee08944dcc "on_sl_reply", len = 11} __FUNCTION__ = "reply_received" #5 0x000000000053bac3 in do_forward_reply (msg=0x7fee0be2d230, mode=0) at core/forward.c:757 new_buf = 0x0 dst = {send_sock = 0x0, to = {s = {sa_family = 0, sa_data = '\000' <repeats 13 times>}, sin = {sin_family = 0, sin_port = 0, sin_addr = { s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = { __in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}}, id = 0, send_flags = {f = 0, blst_imask = 0}, proto = 0 '\000', proto_pad0 = 0 '\000', proto_pad1 = 0} new_len = 1 r = 1 ip = {af = 4307120, len = 0, u = {addrl = {77062, 4259919}, addr32 = {77062, 0, 4259919, 0}, addr16 = {11526, 1, 0, 0, 79, 65, 0, 0}, addr = "\006-\001\000\000\000\000\000O\000A\000\000\000\000"}} s = 0x7fee0be2da60 "" len = 32764 __FUNCTION__ = "do_forward_reply" #6 0x000000000053d795 in forward_reply (msg=0x7fee0be2d230) at core/forward.c:858 No locals. #7 0x00000000005c70a4 in receive_msg ( buf=0xacc3a0 <buf.7133> "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP 198.136.226.11:5060;received=10.50.8.11;branch=z9hG4bK6e61.3b138063", '0' <repeats 24 times>, ".0\r\nRecord-Route: sip:198.136.226.1:5060;lr\r\nTo: <sip:1006@10.50.8.11:9090"..., len=723, rcv_info=0x7ffc051e1e10) at core/receive.c:509 msg = 0x7fee0be2d230 ctx = {rec_lev = 0, run_flags = 0, last_retcode = 0, jmp_env = {{__jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, __mask_was_saved = 0, __saved_mask = { __val = {0 <repeats 13 times>, 77062, 140660911038800, 126}}}}} ---Type <return> to continue, or q <return> to quit--- bctx = 0x0 ret = -1 tvb = {tv_sec = 0, tv_usec = 0} tve = {tv_sec = 0, tv_usec = 0} diff = 0 inb = { s = 0xacc3a0 <buf.7133> "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP 198.136.226.11:5060;received=10.50.8.11;branch=z9hG4bK6e61.3b138063", '0' <repeats 24 times>, ".0\r\nRecord-Route: sip:198.136.226.1:5060;lr\r\nTo: <sip:1006@10.50.8.11:9090"..., len = 723} netinfo = {data = {s = 0x0, len = 0}, rcv = 0x0, dst = 0x0} keng = 0x0 evp = {data = 0x7ffc051e1940, obuf = {s = 0x0, len = 0}, rcv = 0x7ffc051e1e10, dst = 0x0, req = 0x0, rpl = 0x0, rplcode = 0, mode = 0} cidlockidx = 0 cidlockset = 0 errsipmsg = 0 exectime = 0 __FUNCTION__ = "receive_msg" #8 0x000000000048c09e in udp_rcv_loop () at core/udp_server.c:543 len = 723 buf = "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP 198.136.226.11:5060;received=10.50.8.11;branch=z9hG4bK6e61.3b138063", '0' <repeats 24 times>, ".0\r\nRecord-Route: sip:198.136.226.1:5060;lr\r\nTo: <sip:1006@10.50.8.11:9090"... tmp = 0x7fecc91ae520 "" fromaddr = 0x7fee0bdcb8a8 fromaddrlen = 16 rcvi = {src_ip = {af = 2, len = 4, u = {addrl = {185086474, 0}, addr32 = {185086474, 0, 0, 0}, addr16 = {12810, 2824, 0, 0, 0, 0, 0, 0}, addr = "\n2\b\v", '\000' <repeats 11 times>}}, dst_ip = {af = 2, len = 4, u = {addrl = {185086474, 0}, addr32 = {185086474, 0, 0, 0}, addr16 = {12810, 2824, 0, 0, 0, 0, 0, 0}, addr = "\n2\b\v", '\000' <repeats 11 times>}}, src_port = 9090, dst_port = 5060, proto_reserved1 = 0, proto_reserved2 = 0, src_su = {s = {sa_family = 2, sa_data = "#\202\n2\b\v\000\000\000\000\000\000\000"}, sin = { sin_family = 2, sin_port = 33315, sin_addr = {s_addr = 185086474}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = { sin6_family = 2, sin6_port = 33315, sin6_flowinfo = 185086474, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 2, __ss_padding = "#\202\n2\b\v", '\000' <repeats 111 times>, __ss_align = 0}}, bind_address = 0x7fee0b9b2d88, proto = 1 '\001', proto_pad0 = 0 '\000', proto_pad1 = 0} evp = {data = 0x0, obuf = {s = 0x0, len = 0}, rcv = 0x0, dst = 0x0, req = 0x0, rpl = 0x0, rplcode = 0, mode = 0} printbuf = "\000\000\000\000\r\000\000\000\220\356\234\v\356\177", '\000' <repeats 90 times>, "L\206G\311\354\177\000\000\260\270A\000\000\000\000\000\220&\036\005\374\177", '\000' <repeats 18 times>, "`\035\036\005\374\177\000\000XxT", '\000' <repeats 93 times>... ---Type <return> to continue, or q <return> to quit--- i = -1 j = 112640 l = 1 __FUNCTION__ = "udp_rcv_loop" #9 0x0000000000429d17 in main_loop () at main.c:1683 i = 12 pid = 0 si = 0x7fee0b9b2d88 si_desc = "udp receiver child=12 sock=10.50.8.11:5060 (198.136.226.11:5060)\000\000\000\000\000\000\000\000\000@\027\311\354\177\000\000\002\000\000\000\000\000\000\000\200\262G\311\354\177\000\000\060!\036\005\374\177\000\000\210\212K\000\000\000\000\000\260\270A\000\000\000\000\000h\372\340\v\356\177\000" nrprocs = 16 woneinit = 1 __FUNCTION__ = "main_loop" #10 0x0000000000433a66 in main (argc=7, argv=0x7ffc051e2698) at main.c:2856 cfg_stream = 0x127d010 c = -1 r = 0 tmp = 0x7ffc051e3f26 "" tmp_len = 0 port = 0 proto = 0 ahost = 0x0 aport = 0 options = 0x7d2498 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 2273675507 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 0 n_lst = 0x0 p = 0x0 st = {st_dev = 23, st_ino = 30676, st_nlink = 2, st_mode = 16877, st_uid = 0, st_gid = 5001, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1598866491, tv_nsec = 369804494}, st_mtim = {tv_sec = 1598866491, ---Type <return> to continue, or q <return> to quit--- tv_nsec = 360804455}, st_ctim = {tv_sec = 1598866491, tv_nsec = 369804494}, __unused = {0, 0, 0}} tbuf = '\000' <repeats 88 times>, "p\342\252\000\000\000\000\000\260\270A\000\000\000\000\000\220&\036\005\374\177", '\000' <repeats 26 times>, "\036\237\201+\356\177\000\000\001", '\000' <repeats 23 times>, "\340\363\322*\356\177\000\000`&\036\005\374\177\000\000*\033\202+\356\177\000\000\034\000\000\000\000\000\000\000\000"... option_index = 0 long_options = {{name = 0x7d468f "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x7cfc94 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x7d4694 "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x7d469a "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x7d46a0 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x7d46a9 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x7d46b3 "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x7d46bd "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x7d46c8 "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x7d46d1 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x7d46dc "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __FUNCTION__ = "main" (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) (gdb) frame 0 #0 0x00007fee0197dc7c in ka_options_callback (t=0x7fecc97f8450, type=1024, ps=0x7ffc051e10f0) at keepalive_core.c:126 126 ka_dest->statechanged_clb(&ka_dest->uri, state, ka_dest->user_attr); (gdb) p *ka_dest $1 = {uri = {s = 0x113c98b000d <Address 0x113c98b000d out of bounds>, len = 0}, owner = {s = 0x7fecc98b1380 "uuid", len = 4}, flags = -913632379, state = 32748, last_checked = 54, last_up = 4779817840673387893, last_down = 1598866510, counter = 0, ping_interval = 1316440388, user_attr = 0x597855444f304544, statechanged_clb = 0x4d695657596b686a, response_clb = 0x4e786b7a4e30516a, sock = 0x4f7a4d324d355132, ip_address = { af = 3031364, len = 0, u = {addrl = {3233857728, 2882400237}, addr32 = {3233857728, 0, 2882400237, 0}, addr16 = {49344, 49344, 0, 0, 61421, 43981, 0, 0}, addr = "\300\300\300\300\000\000\000\000\355\357ͫ\000\000\000"}}, port = 0, proto = 0, timer = 0x7fecc98b1580, next = 0x70} (gdb) ```
Can you make a minimal kamailio config to reproduce this issue? The uri field is compromised, maybe due to a free/destroy of the keepalive structure. Are you adding/removing ka destinations on the fly/during runtime?
I'd say the reporter is adding the destination from another module, and not from .cfg. `statechanged_clb` should be null (and not run) when the destination is added through cfg.
@NGSegovia - it is what I asked in my first comments, based on what I saw in the code.
But the last backtrace shows also URI field being invalid, so maybe the ka structure is compromised/overwritten -- like already freed/destroyed while the keepalive was ongoing, and another process got the memory zone and wrote there. That's why I asked to see if the destination is eventually removed from the KA list. I am not the developer of the module, but does not look like there is a protection for this case (provided that destinations can be removed at any time)...
In principle, there is a lock used for destination list: https://github.com/kamailio/kamailio/blob/master/src/modules/keepalive/keepa... and when freeing the destination the timer is being stopped, https://github.com/kamailio/kamailio/blob/master/src/modules/keepalive/keepa...
mmm, I wonder why URI seems corrupted, but didn't get to corrupt "owner" in the backtrace.
The timer for keepalive is stopped, but here it seems to come a pointer which was stored inside sip transaction structure and used when the transaction completes (on 408 timeout or 200ok response). If between the options and final response the ka destination structure is destroyed, then the tm callback will provide a pointer to an address that can be used for something else at that moment.
In such case it is better to store in transaction a unique id for ka destination structure and on tm transaction callback to search and see if it is found, locking the structure while is updated.
@miconda , Yes i am adding and removing on the fly. I am adding(using ka_add_destination) to KA list on successful registration and removing(using ka_del_destination) when registration expires. ```
#####################KEEPALIVE####################################### modparam("keepalive", "ping_interval", 20) modparam("keepalive", "delete_counter", 5) modparam("keepalive", "ping_from", "sip:keepalive@fromsbc")
request_route {
##ON RECIEVING OPTIONS FROM KA MODULE ON 9090 PORT, Set $du using handle_ruri_alias , replace tp with transport and relay it to edge proxy from which we recieved register packet if ($Rp == "9090" && is_method("OPTIONS")){ if (t_newtran()) { if (handle_ruri_alias()){ $avp(request_uri)= $ru ; $avp(pb-ip)=$(avp(request_uri){param.value,pb-ip}); if (is_ipv6($avp(pb-ip))) { if (subst_uri('/^sip:(.*)@(.*);pb-ip=(.*);pb-pt=(.*);tp=(.*)$/sip:\1@[\3]:\4;pb-ip=\3;pb-pt=\4;transport=\5/i')){ xlog("L_INFO","(KA OPTIONS) : ($avp(uuid)) : Destination URI <$du> created from alias."); t_relay(); }
} else { if (subst_uri('/^sip:(.*)@(.*);pb-ip=(.*);pb-pt=(.*);tp=(.*)$/sip:\1@\3:\4;pb-ip=\3;pb-pt=\4;transport=\5/i')){ xlog("L_INFO","(KA OPTIONS) : ($avp(uuid)) : Destination URI <$du> created from alias."); t_relay(); }
}
}
exit; }
$du=sip:APPLICATION_SERVER_IP:5060; ##IP of application server ( Freeswitch) to which kamailio will relay register packet add_contact_alias(); t_on_reply("reply_register") if (!t_relay()) { sl_reply_error(); } exit; } onreply_route[reply_register ]{ if ((status=~"200") && $rm=="REGISTER" && $(avp(c_ct){param.value,expires}) != 0) { ## On receiving fresh registration (i.e with expires !=0) , Kamailio will register a contact address for that registration in KA module ping registry. $avp(ka_contact) = 'sip:'+$fU+'@LOCAL_IP:9090;alias='+$(avp(c_ct){param.value,alias})+';pb-ip='+$(avp(c_ct){param.value,pb-ip})+';pb-pt='+$(avp(c_ct){param.value,pb-pt})+';tp='+$(avp(c_ct){param.value,transport}{s.rm,>}); ka_add_destination("$avp(ka_contact)","config"); }
if ((status=~"200") && $rm=="REGISTER" && $(avp(c_ct){param.value,expires}) == 0) {
$avp(ka_contact) = 'sip:'+$fU+'@LOCAL_IP:9090;alias='+$(avp(c_ct){param.value,alias})+';pb-ip='+$(avp(c_ct){param.value,pb-ip})+';pb-pt='+$(avp(c_ct){param.value,pb-pt})+';tp='+$(avp(c_ct){param.value,transport}{s.rm,>}); ka_del_destination("$avp(ka_contact)","config"); }
} ```
@NGSegovia , I am adding destinations from CFG file.
@NGSegovia - not related to the issue itself, but is there any specific reason to do this complex logic for keepalives? If it is for registrations, the usrloc module in kamailio v5.4.x can do keepalive (storing also the round trip time).
Otherwise, the issue seems to be related to keeping the pointer to keepalive destinations structure in the transaction and can be destroyed until transaction completes. If no other devel looks at it, I will try to get familiar with the code of the module and come up with a fix after Kamailio World Online event.
I preferred this module over usrloc module because i dont store registrations on kamailio. It is stored on application server behind Kamailio. So it was easy for me to use keepalive module to send KA instead of usrloc.
@miconda the problem you pointed is the same pointed in this message, right? https://github.com/kamailio/kamailio/blob/master/src/modules/keepalive/keepa...
BTW I did some code, hopefully fixing this, but I didn't cleaned-up and test properly yet, but you can take a look. https://github.com/NGSegovia/kamailio/tree/keepalive_fix_race_condition_tm_r...
@NGSegovia , The changes you made has fixed the issue. Kamailio service is running since last 2 hours and I am not able to reproduce issue. Thanks so much.
@miconda , I understand you are busy with Kamailio world, Is there any chances for this fix to be available in next stable release ?
@NGSegovia - yes, that kind of problem.
The approach in the fix is correct, just wondering if worth adding dependency on external libuuid library instead of using the internal core/utils/sruid.{c,h}, like it is done for other modules. I planned to extend the internal sruid code to be able to use libuuid via the `uuid` module, can be something to consider in this development cycle.
By using the internal sruid backporting will be smooth, because there is no extra dependency, so it does not break packaging (debs, rpms, ... stay the same).
Not a problem @miconda , updated.
@NGSegovia , I tested latest changes under https://github.com/NGSegovia/kamailio/commit/aaded162a6523642873703f6b52e191...
Now it is crashing again but from different place : ``` (gdb) bt full #0 0x00007f3153cd82cc in __strncpy_ssse3 () from /lib64/libc.so.6 No symbol table info available. #1 0x00007f312a7d5132 in ka_str_copy (src=0x7f2ff2441b38, dest=0x7f2ff2416380, prefix=0x0) at keepalive_core.c:203 lp = 0 __FUNCTION__ = "ka_str_copy" #2 0x00007f312a7d2d19 in ka_check_timer (ticks=931009473, tl=0x7f2ff2416618, param=0x7f2ff2416360) at keepalive_core.c:71 ka_dest = 0x7f2ff2416360 ka_ping_method = {s = 0x7f312a7d9740 "OPTIONS", len = 7} ka_outbound_proxy = {s = 0x0, len = 0} uac_r = {method = 0x140b788ad20, headers = 0x7f2ff24162c8, body = 0x0, ssock = 0x0, ssockname = 0x0, dialog = 0x0, cb_flags = 1024, cb = 0x7f312a7d3174 <ka_options_callback>, cbp = 0x7f2ff243f378, callid = 0x0} __FUNCTION__ = "ka_check_timer" uuid = 0x7f2ff2441b38 #3 0x00000000004c3517 in slow_timer_main () at core/timer.c:1105 n = 12 ret = 320 tl = 0x7f2ff2416618 i = 1 __FUNCTION__ = "slow_timer_main" #4 0x000000000042a5f3 in main_loop () at main.c:1737 i = 16 pid = 0 si = 0x0 si_desc = "udp receiver child=15 sock=10.50.8.11:9090\000:0:0:11]:5060\000\061:5060)\000\000\000\000\000\000\000\000\000\220\374\361/\177\000\000\002\000\000\000\000\000\000\000\360\004-\362/\177\000\000\060\260\210\267\375\177\000\000\322CL\000\000\000\000\000\340\271A\000\000\000\000\000\250=\306\064\061\177\000" nrprocs = 16 woneinit = 1 __FUNCTION__ = "main_loop" #5 0x0000000000433ba5 in main (argc=7, argv=0x7ffdb788b598) at main.c:2861 cfg_stream = 0x27e1010 c = -1 r = 0 tmp = 0x7ffdb788bf26 "" tmp_len = 0 ---Type <return> to continue, or q <return> to quit--- port = 0 proto = 0 ahost = 0x0 aport = 0 options = 0x7cfaa0 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 862902793 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 0 n_lst = 0x0 p = 0x0 st = {st_dev = 23, st_ino = 30676, st_nlink = 2, st_mode = 16877, st_uid = 0, st_gid = 5001, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1599034463, tv_nsec = 259945261}, st_mtim = {tv_sec = 1599034433, tv_nsec = 160818732}, st_ctim = {tv_sec = 1599034463, tv_nsec = 259945261}, __unused = {0, 0, 0}} tbuf = '\000' <repeats 88 times>, "p\262\252\000\000\000\000\000\340\271A\000\000\000\000\000\220\265\210\267\375\177", '\000' <repeats 26 times>, "\036\017gT1\177\000\000\001", '\000' <repeats 23 times>, "\340c\270S1\177\000\000`\265\210\267\375\177\000\000*\213gT1\177\000\000\034\000\000\000\000\000\000\000"... option_index = 0 long_options = {{name = 0x7d1c8f "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x7cd2a1 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x7d1c94 "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x7d1c9a "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x7d1ca0 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x7d1ca9 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x7d1cb3 "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x7d1cbd "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x7d1cc8 "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x7d1cd1 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x7d1cdc "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __FUNCTION__ = "main"
```
@sagarmalam I have updated the PR, please check again.
Thanks!
@NGSegovia , It is working fine now. Thanks a lot.
Question : As a developer do you think this module can handle load of sending keepalives to 50 K destinations every 20 seconds ? or you suggest to use ursloc module instead for such a load ?
No idea @sagarmalam
ok. Thanks. I will share my results.
Hello Again,
I was able to reproduce again. It crashed from the same point and it seems that frequency of this issue has reduced but it is not fixed yet. As per my observation, it happens when entry of destination from KA list is removed during an ongoing KA transaction. I am removing destination from KA list when endpoint de registers so this case is possible. I have also observed same on pcaps. Below are the screenshots :
OPTIONS : 
REGISTER : 
Kamailio has dispatched 200 OK for register at 06:33:58.187520 and it received 200 OK for OPTIONS at 06:33:58.411501. Kamailio crashed exactly at a same time.
Core Dump : ``` GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/... Reading symbols from /usr/local/sbin/kamailio...done. [New LWP 12701] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/local/sbin/kamailio -m 5000 -M 500 -P /run/kamailio/kamailio.pid'. Program terminated with signal 11, Segmentation fault. #0 0x00007f09c138dc7c in ka_options_callback (t=0x7f0896cc1530, type=1024, ps=0x7ffdb22e9d20) at keepalive_core.c:126 126 ka_dest->statechanged_clb(&ka_dest->uri, state, ka_dest->user_attr); Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-222.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libcurl-7.29.0-51.el7.x86_64 libevent-2.0.21-4.el7.x86_64 libgcc-4.8.5-36.el7_6.2.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.5-12.el7.x86_64 libssh2-1.4.3-10.el7_2.1.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64 libuuid-2.23.2-52.el7_5.1.x86_64 mariadb-libs-5.5.60-1.el7_5.x86_64 nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-5.el7_5.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 nss-util-3.36.0-1.el7_5.x86_64 openldap-2.4.44-15.el7_5.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64 (gdb) bt full #0 0x00007f09c138dc7c in ka_options_callback (t=0x7f0896cc1530, type=1024, ps=0x7ffdb22e9d20) at keepalive_core.c:126 uri = { s = 0x7f0896cc39b7 "sip:7459@10.50.8.6:9090;alias=10.50.8.1~5060~1;pb-ip=174.198.221.83;pb-pt=3939;tp=tcp>\r\nFrom: sip:keepalive@fromsbc;tag=ee3b67b160621ee354865fd8162b5b29-1b81a45f\r\nCSeq: 10 OPTIONS\r\nCall-ID: 2557bb53"..., len = 85} msg = 0x0 state = 1 state_routes = {0x7f09c1393844 "", 0x7f09c139393d "keepalive:dst-up", 0x7f09c139394e "keepalive:dst-down"} ka_dest = 0x7f088ac83048 __FUNCTION__ = "ka_options_callback" #1 0x00007f09c8321dd8 in run_trans_callbacks_internal (cb_lst=0x7f0896cc15a8, type=1024, trans=0x7f0896cc1530, params=0x7ffdb22e9d20) at t_hooks.c:258 cbp = 0x7f0896cc3b60 backup_from = 0xb1f170 <def_list+16> backup_to = 0xb1f178 <def_list+24> backup_dom_from = 0xb1f180 <def_list+32> backup_dom_to = 0xb1f188 <def_list+40> backup_uri_from = 0xb1f160 <def_list> backup_uri_to = 0xb1f168 <def_list+8> backup_xavps = 0xb1ea60 <_xavp_list_head> backup_xavus = 0xb1ea68 <_xavu_list_head> backup_xavis = 0xb1ea70 <_xavi_list_head> __FUNCTION__ = "run_trans_callbacks_internal" #2 0x00007f09c8321f0a in run_trans_callbacks (type=1024, trans=0x7f0896cc1530, req=0x0, rpl=0x7f09cb832030, code=200) at t_hooks.c:285 params = {req = 0x0, rpl = 0x7f09cb832030, param = 0x7f0896cc3b70, code = 200, flags = 0, branch = 0, t_rbuf = 0x0, dst = 0x0, send_buf = { s = 0x0, len = 0}} #3 0x00007f09c82b2bef in local_reply (t=0x7f0896cc1530, p_msg=0x7f09cb832030, branch=0, msg_status=200, cancel_data=0x7ffdb22ea110) at t_reply.c:2265 local_store = 0 local_winner = 0 reply_status = RPS_COMPLETED winning_msg = 0x7f09cb832030 winning_code = 200 totag_retr = 0 __FUNCTION__ = "local_reply" #4 0x00007f09c82b5a8b in reply_received (p_msg=0x7f09cb832030) at t_reply.c:2648 ---Type <return> to continue, or q <return> to quit--- msg_status = 200 last_uac_status = 0 ack = 0x5a62bc <sr_event_exec+415> "\211E\374\213E\374\351)\003" ack_len = 2989400432 branch = 0 reply_status = -881216520 onreply_route = 0 cancel_data = {cancel_bitmap = 0, reason = {cause = 200, u = {text = {s = 0x0, len = 11245581}, e2e_cancel = 0x0, packed_hdrs = {s = 0x0, len = 11245581}}}} uac = 0x7f0896cc17c0 t = 0x7f0896cc1530 lack_dst = {send_sock = 0xab9917 <buf.7136+791>, to = {s = {sa_family = 38925, sa_data = "\253\000\000\000\000\000p\241.\262\375\177\000"}, sin = {sin_family = 38925, sin_port = 171, sin_addr = {s_addr = 0}, sin_zero = "p\241.\262\375\177\000"}, sin6 = {sin6_family = 38925, sin6_port = 171, sin6_flowinfo = 0, sin6_addr = {__in6_u = {__u6_addr8 = "p\241.\262\375\177\000\000\001\344o\000\000\000\000", __u6_addr16 = {41328, 45614, 32765, 0, 58369, 111, 0, 0}, __u6_addr32 = {2989400432, 32765, 7332865, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 38925, __ss_padding = "\253\000\000\000\000\000p\241.\262\375\177\000\000\001\344o", '\000' <repeats 13 times>, "\370\263y\313\t\177\000\000\000)\205\000\000\000\000\000\375\227\253\000\207\000\000\000\324L\205\000\000\000\000\000(\325w\313\t\177\000\000\027\231\253\000\000\000\000\000\003\230\253\000\000\000\000\000\060\241.\262\375\177\000\000P\262y\313\t\177\000\000\027\231\253\000\000\000\000\000\020\226\253\000\000\000\000", __ss_align = 139680045183032}}, id = 0, send_flags = {f = 0, blst_imask = 0}, proto = -123 '\205', proto_pad0 = -105 '\227', proto_pad1 = 171} backup_user_from = 0x53c7a0 <register_module+4320> backup_user_to = 0x521ec0 <check_self_port+680> backup_domain_from = 0x3c6b40 backup_domain_to = 0x7365ed <get_hdr_field+6416> backup_uri_from = 0x19 backup_uri_to = 0x8 backup_xavps = 0x7ffdb22ea320 backup_xavus = 0x0 backup_xavis = 0x0 replies_locked = 1 branch_ret = -881246328 prev_branch = 0 blst_503_timeout = 7579963 hf = 0x852900 ---Type <return> to continue, or q <return> to quit--- onsend_params = {req = 0x7ffdb22ea170, rpl = 0x6c65c9 <qm_malloc+2479>, param = 0x852900, code = 7560264, flags = 0, branch = 0, t_rbuf = 0x3c6298, dst = 0x3192, send_buf = {s = 0x41b980 <_start> "1\355I\211\321^H\211\342H\203\344\360PTI\307\300`\313|", len = -1305562432}} ctx = {rec_lev = 0, run_flags = 0, last_retcode = 1, jmp_env = {{__jmpbuf = {12690, 139680045183568, 8726784, 7141200, 140727592853440, 139680040542224, 139680045061360, 139680040542224}, __mask_was_saved = -1305567232, __saved_mask = {__val = {139679877548472, 11245445, 0, 524288000, 7315946, 3959664, 5381872, 5490592, 8, 139680045061456, 139680045061440, 140727592853552, 5923516, 0, 21483449424, 0}}}}} bctx = 0x7f09cb832030 keng = 0x0 ret = 0 evname = {s = 0x7f09c8354ffc "on_sl_reply", len = 11} __FUNCTION__ = "reply_received" #5 0x0000000000526903 in do_forward_reply (msg=0x7f09cb832030, mode=0) at core/forward.c:757 new_buf = 0x0 dst = {send_sock = 0x0, to = {s = {sa_family = 0, sa_data = '\000' <repeats 13 times>}, sin = {sin_family = 0, sin_port = 0, sin_addr = { s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = {sin6_family = 0, sin6_port = 0, sin6_flowinfo = 0, sin6_addr = { __in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 0, __ss_padding = '\000' <repeats 117 times>, __ss_align = 0}}, id = 0, send_flags = {f = 0, blst_imask = 0}, proto = 0 '\000', proto_pad0 = 0 '\000', proto_pad1 = 0} new_len = 1 r = 1 ip = {af = 4307328, len = 0, u = {addrl = {12690, 4259919}, addr32 = {12690, 0, 4259919, 0}, addr16 = {12690, 0, 0, 0, 79, 65, 0, 0}, addr = "\222\061\000\000\000\000\000\000O\000A\000\000\000\000"}} s = 0x7f09cb832860 "" len = 0 __FUNCTION__ = "do_forward_reply" #6 0x00000000005285d5 in forward_reply (msg=0x7f09cb832030) at core/forward.c:858 No locals. #7 0x00000000005bc8c2 in receive_msg ( buf=0xab9600 <buf.7136> "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP 198.136.226.6:5060;received=10.50.8.6;branch=z9hG4bK56d5.462830a6", '0' <repeats 24 times>, ".0\r\nRecord-Route: sip:198.136.226.1:5060;lr\r\nContact: <sip:100.103.40.117:38"..., len=791, rcv_info=0x7ffdb22eaa40) at core/receive.c:509 msg = 0x7f09cb832030 ctx = {rec_lev = 0, run_flags = 0, last_retcode = 0, jmp_env = {{__jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, __mask_was_saved = 0, __saved_mask = { __val = {0 <repeats 13 times>, 12690, 139680578527568, 126}}}}} ---Type <return> to continue, or q <return> to quit--- bctx = 0x0 ret = -1 tvb = {tv_sec = 0, tv_usec = 0} tve = {tv_sec = 0, tv_usec = 0} diff = 0 inb = { s = 0xab9600 <buf.7136> "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP 198.136.226.6:5060;received=10.50.8.6;branch=z9hG4bK56d5.462830a6", '0' <repeats 24 times>, ".0\r\nRecord-Route: sip:198.136.226.1:5060;lr\r\nContact: <sip:100.103.40.117:38"..., len = 791} netinfo = {data = {s = 0x0, len = 0}, rcv = 0x0, dst = 0x0} keng = 0x0 evp = {data = 0x7ffdb22ea570, obuf = {s = 0x0, len = 0}, rcv = 0x7ffdb22eaa40, dst = 0x0, req = 0x0, rpl = 0x0, rplcode = 0, mode = 0} cidlockidx = 0 cidlockset = 0 errsipmsg = 0 exectime = 0 __FUNCTION__ = "receive_msg" #8 0x00000000004bb55c in udp_rcv_loop () at core/udp_server.c:543 len = 791 buf = "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP 198.136.226.6:5060;received=10.50.8.6;branch=z9hG4bK56d5.462830a6", '0' <repeats 24 times>, ".0\r\nRecord-Route: sip:198.136.226.1:5060;lr\r\nContact: <sip:100.103.40.117:38"... tmp = 0x7f0888bbe520 "" fromaddr = 0x7f09cb79d880 fromaddrlen = 16 rcvi = {src_ip = {af = 2, len = 4, u = {addrl = {101200394, 0}, addr32 = {101200394, 0, 0, 0}, addr16 = {12810, 1544, 0, 0, 0, 0, 0, 0}, addr = "\n2\b\006", '\000' <repeats 11 times>}}, dst_ip = {af = 2, len = 4, u = {addrl = {101200394, 0}, addr32 = {101200394, 0, 0, 0}, addr16 = {12810, 1544, 0, 0, 0, 0, 0, 0}, addr = "\n2\b\006", '\000' <repeats 11 times>}}, src_port = 9090, dst_port = 5060, proto_reserved1 = 0, proto_reserved2 = 0, src_su = {s = {sa_family = 2, sa_data = "#\202\n2\b\006\000\000\000\000\000\000\000"}, sin = { sin_family = 2, sin_port = 33315, sin_addr = {s_addr = 101200394}, sin_zero = "\000\000\000\000\000\000\000"}, sin6 = { sin6_family = 2, sin6_port = 33315, sin6_flowinfo = 101200394, sin6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, sin6_scope_id = 0}, sas = {ss_family = 2, __ss_padding = "#\202\n2\b\006", '\000' <repeats 111 times>, __ss_align = 0}}, bind_address = 0x7f09cb3c2d98, proto = 1 '\001', proto_pad0 = 0 '\000', proto_pad1 = 0} evp = {data = 0x0, obuf = {s = 0x0, len = 0}, rcv = 0x0, dst = 0x0, req = 0x0, rpl = 0x0, rplcode = 0, mode = 0} printbuf = "\000\000\000\000\a\000\000\000\240\356=\313\t\177", '\000' <repeats 90 times>, "Tm\032\211\b\177\000\000\200\271A\000\000\000\000\000\300\262.\262\375\177", '\000' <repeats 18 times>, "\220\251.\262\375\177\000\000\006=T", '\000' <repeats 93 times>... ---Type <return> to continue, or q <return> to quit--- i = -1 j = 112640 l = 1 __FUNCTION__ = "udp_rcv_loop" #9 0x0000000000429de7 in main_loop () at main.c:1683 i = 6 pid = 0 si = 0x7f09cb3c2d98 si_desc = "udp receiver child=6 sock=10.50.8.6:5060 (198.136.226.6:5060)\000\000\000\004\000\000\000\000\000\000\000\000@\270\210\b\177", '\000' <repeats 11 times>, "\235\032\211\b\177\000\000`\255.\262\375\177\000\000mCL\000\000\000\000\000\200\271A\000\000\000\000\000PH\201\313\t\177\000" nrprocs = 16 woneinit = 1 __FUNCTION__ = "main_loop" #10 0x0000000000433b40 in main (argc=7, argv=0x7ffdb22eb2c8) at main.c:2859 cfg_stream = 0x1cfb010 c = -1 r = 0 tmp = 0x7ffdb22ebf26 "" tmp_len = 0 port = 0 proto = 0 ahost = 0x0 aport = 0 options = 0x7cf8a0 ":f:cm:M:dVIhEeb:l:L:n:vKrRDTN:W:w:t:u:g:P:G:SQ:O:a:A:x:X:Y:" ret = -1 seed = 1260081917 rfd = 4 debug_save = 0 debug_flag = 0 dont_fork_cnt = 0 n_lst = 0x7f09ea8b2a00 <intel_02_known> p = 0x0 st = {st_dev = 23, st_ino = 36410, st_nlink = 2, st_mode = 16877, st_uid = 0, st_gid = 5001, __pad0 = 0, st_rdev = 0, st_size = 40, st_blksize = 4096, st_blocks = 0, st_atim = {tv_sec = 1599228627, tv_nsec = 605193865}, st_mtim = {tv_sec = 1599228597, tv_nsec = 485081596}, st_ctim = {tv_sec = 1599228627, tv_nsec = 605193865}, __unused = {0, 0, 0}} ---Type <return> to continue, or q <return> to quit--- tbuf = "\340\363s\352\t\177\000\000ȔB\353\t\177", '\000' <repeats 90 times>, "p\262\252\000\000\000\000\000\200\271A\000\000\000\000\000\300\262.\262\375\177", '\000' <repeats 26 times>, "\036\237"\353\t\177\000\000\001", '\000' <repeats 23 times>, "\340\363s\352\t\177\000\000\220\262.\262\375\177\000\000*"... option_index = 0 long_options = {{name = 0x7d1a8f "help", has_arg = 0, flag = 0x0, val = 104}, {name = 0x7cd0a1 "version", has_arg = 0, flag = 0x0, val = 118}, {name = 0x7d1a94 "alias", has_arg = 1, flag = 0x0, val = 1024}, {name = 0x7d1a9a "subst", has_arg = 1, flag = 0x0, val = 1025}, {name = 0x7d1aa0 "substdef", has_arg = 1, flag = 0x0, val = 1026}, {name = 0x7d1aa9 "substdefs", has_arg = 1, flag = 0x0, val = 1027}, {name = 0x7d1ab3 "server-id", has_arg = 1, flag = 0x0, val = 1028}, {name = 0x7d1abd "loadmodule", has_arg = 1, flag = 0x0, val = 1029}, {name = 0x7d1ac8 "modparam", has_arg = 1, flag = 0x0, val = 1030}, {name = 0x7d1ad1 "log-engine", has_arg = 1, flag = 0x0, val = 1031}, {name = 0x7d1adc "debug", has_arg = 1, flag = 0x0, val = 1032}, {name = 0x0, has_arg = 0, flag = 0x0, val = 0}} __FUNCTION__ = "main" ```
Thanks
Hello Please ignore above comment.Fix is working.I picked up wrong branch while testing it. Sorry for confusion. Thanks
@NGSegovia , While testing latest fixes we have found that Keepalive module is leaking memory. We are testing with 7K destinations with time interval of 20 seconds This log file has output of : kamcmd mod.stats keepalive shm for every sec : [keepalive_memory_consumption.log](https://github.com/kamailio/kamailio/files/5194119/keepalive_memory_consumpt...)
At Wed Sep 9 09:02:07 GMT 2020 it is 3 MB At Wed Sep 9 09:33:47 GMT 2020 it is 47 MB So 42 MB consumed within 30 mins and growing.
Kamailio core dumped after running for 24 hours with 5GB SHM memory. core dump : [coredump.txt](https://github.com/kamailio/kamailio/files/5194043/coredump.txt)
Kamailio logs : [kamailio.log](https://github.com/kamailio/kamailio/files/5194055/kamailio.log)
Please get back to me in case you need any more information. Thanks a lot
@sagarmalam merged PR fixing this problem. Re-open if there are still issues.
Closed #2448.
@NGSegovia , Thanks for all the efforts in fix this issue. I think i have found regression issue. KA timeout parameter is not working anymore with latest changes. It keeps on sending KA even after failing for number of times defined in ka_timeout mod param.
Hi @sagarmalam , do you mean `delete_counter` parameter? I tried and seems to work for me (The truth is that if I set it to 3 I get 4 attempts but then stops).
BTW id you set that parameter to "0" means no limit. Not sure if that is the case.
Anyway, I can't find the relation with changes done here and this parameter. Can you try again a open a new issue with more information if you still have problems?
Thanks!