Description

There is a Segmentation Fault that occurs with Kamailio 5.3.7 and onward. It does not occurs always. I am using alpine version 3.12 and it happens after lot of traffic. This does not happens with previous versions. After it crash it keeps crashing for about 5-10 times (something with the dialogs I believe).

Troubleshooting

Reproduction

Is difficult to reproduce the issue it happens after some hours runnning. I have not been able to reproduce it out of prod environment

Debugging Data

I was able to have a core dump.

BT

#0  0x00007fbcae3d01fa in dlg_onroute (req=0x7fbcae9fd408, route_params=<optimized out>, param=<optimized out>) at dlg_handlers.c:1322
#1  0x00007fbcae71e6b5 in run_rr_callbacks (req=req@entry=0x7fbcae9fd408, rr_param=rr_param@entry=0x7ffe08bf50d0) at rr_cb.c:96
#2  0x00007fbcae714e7a in after_loose (_m=0x7fbcae9fd408, preloaded=<optimized out>) at loose.c:951
#3  0x0000558669815d5e in do_action (h=h@entry=0x7ffe08bf58c0, a=a@entry=0x7fbcae964878, msg=msg@entry=0x7fbcae9fd408) at core/action.c:1077
#4  0x0000558669810bbf in run_actions (h=h@entry=0x7ffe08bf58c0, a=0x7fbcae964878, msg=0x7fbcae9fd408) at core/action.c:1576
#5  0x000055866981c5f3 in run_actions_safe (h=h@entry=0x7ffe08bf6770, a=<optimized out>, msg=<optimized out>) at core/action.c:1640
#6  0x00005586698e8b5a in rval_get_int (h=0x7ffe08bf6770, msg=<optimized out>, i=i@entry=0x7ffe08bf5b98, rv=rv@entry=0x7fbcae9649d0, cache=cache@entry=0x0) at core/rvalue.c:915
#7  0x00005586698ed9b7 in rval_expr_eval_int (h=h@entry=0x7ffe08bf6770, msg=msg@entry=0x7fbcae9fd408, res=res@entry=0x7ffe08bf5b98, rve=rve@entry=0x7fbcae9649c8) at core/rvalue.c:1913
#8  0x0000558669815a51 in do_action (h=h@entry=0x7ffe08bf6770, a=a@entry=0x7fbcae967258, msg=msg@entry=0x7fbcae9fd408) at core/action.c:1047
#9  0x0000558669810bbf in run_actions (h=h@entry=0x7ffe08bf6770, a=0x7fbcae964640, msg=msg@entry=0x7fbcae9fd408) at core/action.c:1576
#10 0x0000558669813c56 in do_action (h=h@entry=0x7ffe08bf6770, a=a@entry=0x7fbcae941968, msg=msg@entry=0x7fbcae9fd408) at core/action.c:695
#11 0x0000558669810bbf in run_actions (h=h@entry=0x7ffe08bf6770, a=a@entry=0x7fbcae93c9f8, msg=msg@entry=0x7fbcae9fd408) at core/action.c:1576
#12 0x000055866981c68b in run_top_route (a=0x7fbcae93c9f8, msg=msg@entry=0x7fbcae9fd408, c=c@entry=0x0) at core/action.c:1661
#13 0x00005586698bff2f in receive_msg (buf=buf@entry=0x558669baffa0 <buf> "ACK sip:529999999@185.X.X.X:5060 SIP/2.0\r\nVia: SIP/2.0/UDP 201.X.X.X:51620;rport;branch=z9hG4bKa229em1tD95yg\r\nRoute: <sip:34.X.X.X;lr;did=d6c.7ac2>\r\nRoute: <sip:204.X.X.X;lr;ftag="...,
    len=<optimized out>, rcv_info=rcv_info@entry=0x7ffe08bf6ab8) at core/receive.c:423
#14 0x0000558669975ad5 in udp_rcv_loop () at core/udp_server.c:554
#15 0x000055866980f36f in main_loop () at main.c:1673
#16 0x00005586698072df in main (argc=8, argv=<optimized out>) at main.c:2802

I notice that req->callid is NULL and that's what the LM_WARN instruction is trying to answer. I see that there is really a callid in the request

(gdb) p req->callid
$2 = (struct hdr_field *) 0x0

buf with address sanitized

(gdb) p req->buf
$3 = 0x558669baffa0 <buf> "ACK sip:52999999999@185.X.X.X:5060 SIP/2.0\r\nVia: SIP/2.0/UDP 201.X.X.X:51620;rport;branch=z9hG4bKa229em1tD95yg\r\nRoute: <sip:34.X.X.X;lr;did=d6c.7ac2>\r\nRoute: <sip:204.44.138.145;lr;ftag=6Ua3gyNtjcNZg;did=d6c.a82f0ab6>\r\nMax-Forwards: 69\r\nFrom: \"V7211038140XXXXX\" <sip:3325040318@201.X.X.X>;tag=6Ua3gyNtjcNZg\r\nTo: <sip:0XXXXX#529999999@34.X.X.X:5060>;tag=yTTrUSrh-D1\r\nCall-ID: 80f064e6-64dc-123a-55b9-d4bed9f99694\r\nCSeq: 38887155 ACK\r\nContact: <sip:gw+1-XXXXXX-XXXXX@201.X.X.X:51620;transport=udp;gw=1-XXXXX-XXXX>\r\nContent-Length: 0\r\n\r\n"

Log Messages

{"log":" 8(20) NOTICE: {2 1 INVITE 91a0458-d564a8c0-13c4-65014-325a3-60f3462f-325a3} acc [acc.c:279]: acc_log_request(): ACC: transaction answered: timestamp=1626827218;method=INVITE;from_tag=a8f3938-d564a8c0-13c4-65014-325a3-504c0745-325a3;to_tag=00mTqYrLtU7;call_id=91a0458-d564a8c0-13c4-65014-325a3-60f3462f-325a3;code=200;reason=OK;src_user=99999999;src_domain=34.X.X.X;src_ip=201.X.X.X;dst_ouser=9999999#529999999;dst_user=9999999#529999999;dst_domain=200.X.X.X.145\n","stream":"stderr","time":"2021-07-21T00:26:58.667367489Z"}
{"log":" 7(19) NOTICE: {2 102 INVITE 2e0da1634918d23662e1ad786be4f5ee@34.X.X.X} acc [acc.c:279]: acc_log_request(): ACC: transaction answered: timestamp=1626827218;method=INVITE;from_tag=as751a378e;to_tag=rdsCYaseahb;call_id=2e0da1634918d23662e1ad786be4f5ee@34.X.X.X;code=200;reason=OK;src_user=asterisk;src_domain=34.X.X.X;src_ip=200.X.X.X.46;dst_ouser=888888#529999999;dst_user=888888#529999999;dst_domain=200.X.X.X.145\n","stream":"stderr","time":"2021-07-21T00:26:58.969958529999999Z"}
{"log":" 1(13) NOTICE: {2 1 INVITE 919e958-d564a8c0-13c4-65014-325a1-7577499e-325a1} acc [acc.c:279]: acc_log_request(): ACC: transaction answered: timestamp=1626827219;method=INVITE;from_tag=a8e0a78-d564a8c0-13c4-65014-325a1-7a5b131c-325a1;to_tag=maap2JSSk82;call_id=919e958-d564a8c0-13c4-65014-325a1-7577499e-325a1;code=200;reason=OK;src_user=99999999;src_domain=34.X.X.X;src_ip=201.X.X.X;dst_ouser=9999999#529999999;dst_user=9999999#529999999;dst_domain=200.X.X.X.145\n","stream":"stderr","time":"2021-07-21T00:26:59.067200485Z"}
{"log":" 1(13) NOTICE: {2 102 INVITE 1f5bbcfa4052aa7c0b9e6def3529999999d7b@34.X.X.X} acc [acc.c:279]: acc_log_request(): ACC: transaction answered: timestamp=1626827219;method=INVITE;from_tag=as27988b45;to_tag=2ugyneFdK-6;call_id=1f5bbcfa4052aa7c0b9e6def3529999999d7b@34.X.X.X;code=200;reason=OK;src_user=asterisk;src_domain=34.X.X.X;src_ip=200.X.X.X.46;dst_ouser=888888#529999999;dst_user=888888#529999999;dst_domain=200.X.X.X.145\n","stream":"stderr","time":"2021-07-21T00:26:59.790695092Z"}
{"log":"15(27) CRITICAL: \u003ccore\u003e [core/pass_fd.c:277]: receive_fd(): EOF on 11\n","stream":"stderr","time":"2021-07-21T00:27:00.21590784Z"}
{"log":" 0(12) ALERT: \u003ccore\u003e [main.c:766]: handle_sigs(): child process 16 exited by a signal 11\n","stream":"stderr","time":"2021-07-21T00:27:00.21832504Z"}
{"log":" 0(12) ALERT: \u003ccore\u003e [main.c:769]: handle_sigs(): core was generated\n","stream":"stderr","time":"2021-07-21T00:27:00.218383311Z"}
{"log":" 0(12) INFO: \u003ccore\u003e [main.c:792]: handle_sigs(): terminating due to SIGCHLD\n","stream":"stderr","time":"2021-07-21T00:27:00.21839027Z"}
{"log":" 7(19) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.218395693Z"}
{"log":"10(22) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.218401306Z"}
{"log":"11(23) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.218406783Z"}
{"log":"12(24) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.218479042Z"}
{"log":" 9(21) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.218598805Z"}
{"log":"15(27) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.218610128Z"}
{"log":"13(25) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.218629146Z"}
{"log":"14(26) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.218768686Z"}
{"log":" 8(20) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.218918555Z"}
{"log":" 2(14) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.22114204Z"}
{"log":" 3(15) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.222683081Z"}
{"log":" 5(17) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.2228392Z"}
{"log":" 1(13) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.224429948Z"}
{"log":" 6(18) INFO: \u003ccore\u003e [main.c:847]: sig_usr(): signal 15 received\n","stream":"stderr","time":"2021-07-21T00:27:00.225537387Z"}
{"log":" 0(12) INFO: \u003ccore\u003e [core/sctp_core.c:53]: sctp_core_destroy(): SCTP API not initialized\n","stream":"stderr","time":"2021-07-21T00:27:00.240625585Z"}

SIP Traffic

Unfortunately I couldn't capture traffic

Possible Solutions

I went back to alpine 3.10

Additional Information

version: kamailio 5.3.7 (x86_64/linux)
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled on 12:42:10 Jul 15 2021 with gcc 9.3.0

alpine 3.12 running on a container

(paste your output here)


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.