Yes. The crash was happening when using subst() or subst_body() inside of event_route[tm:local-request].
From: Daniel-Constantin Mierla [mailto:miconda@gmail.com]
Sent: Monday, August 7, 2017 12:55 PM
To: Cody Herzog <CHerzog@IntouchHealth.com>; Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Subject: Re: [SR-Users] Core crash in del_lump() "offset exceeds message size" when using subst() inside tm:local-request route.
Were you using the subst() in event_route[...]?
Cheers,
Daniel
On 07.08.17 19:47, Cody Herzog wrote:
Thanks for the response.
I will make a pull request soon.
>are you using msg_apply_changes()?
No. I tried to use it, but it does not seem to be allowed in the tm:local-request route.
>do you receive traffic over tcp/tls?
Yes. We are using TLS.
Thanks.
-Cody
From: Daniel-Constantin Mierla [mailto:miconda@gmail.com]
Sent: Monday, August 7, 2017 12:15 AM
To: Cody Herzog <CHerzog@IntouchHealth.com>; Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>
Subject: Re: [SR-Users] Core crash in del_lump() "offset exceeds message size" when using subst() inside tm:local-request route.
Hello,
can you make a pull request with the patch you did to terminate the buffer with '\0'? It is a valid fix if there are situations when the buffer is not zero-terminated.
To figure out when it is not zero-terminated, if you don't know already, and to analyze whether it has an impact on other parts of the code, a few questions:
- are you using msg_apply_changes()?
- do you receive traffic over tcp/tls?
Cheers,
Daniel
On 31.07.17 19:43, Cody Herzog wrote:
Hello.
We are running Kamailio 4.3.5 on Ubuntu 14.04 LTS 64-bit.
Sorry, I should have included that information in my first email.
I have done more debugging, and I believe the problem is caused by the subst_run() function being used on a message buffer which is not null terminated at msg->len.
The subst_f() and subst_body_f() functions both call subst_run(), and most of the time, everything seems fine. However, in our use case, the msg->buf does not always seem to be null terminated at msg->len.
In other words: msg->buf[msg->len] != '\0'
It seems there is sometimes extra data in the buffer beyond msg->len, and if subst_run() ever matches beyond msg->len, the resulting offset causes del_lump() to abort().
I found a few places in textops.c where a buffer is temporarily null terminated in order to perform a regexec() search, then restored afterwards, as in:
body = hf->body;
c = body.s[body.len];
body.s[body.len] = '\0';
ret = regexec((regex_t*) re, body.s, 1, &pmatch, 0);
body.s[body.len] = c;
I've tried making the same type of change to subst_f() and subst_body_f(), and now I am not seeing the crash.
If I guarantee that msg->buf[msg->len] == '\0' when subst_run() is called, then everything seems to be OK.
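For illustration, here is a minimal, self-contained sketch of that save/terminate/restore pattern, outside of Kamailio. The helper name regexec_bounded() is hypothetical and is not part of textops.c or the Kamailio core. It assumes the byte at buf[len] is writable memory owned by the caller, which is the same assumption the textops.c snippet above makes.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a Kamailio function): run regexec() on a
 * length-delimited buffer that may not be NUL-terminated at buf[len].
 *
 * Assumption: buf[len] is valid, writable memory, so it can be
 * overwritten for the duration of the search and restored afterwards.
 */
static int regexec_bounded(const regex_t *re, char *buf, size_t len,
                           size_t nmatch, regmatch_t *pmatch)
{
    char saved;
    int ret;

    assert(re != NULL && buf != NULL);

    saved = buf[len];      /* remember the byte just past the message */
    buf[len] = '\0';       /* stop regexec() from scanning past len */
    ret = regexec(re, buf, nmatch, pmatch, 0);
    buf[len] = saved;      /* restore the original byte */

    return ret;
}
```

With a wrapper like this, any match offsets returned in pmatch cannot exceed len, so they cannot point into stale data that happens to sit after the message in the same buffer.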
What do you think? Is that a valid change to make?
Let me know if you'd like me to make a pull request on the main branch with those changes.
Thanks very much.
-Cody
From: Daniel-Constantin Mierla [mailto:miconda@gmail.com]
Sent: Sunday, July 30, 2017 11:12 PM
To: Kamailio (SER) - Users Mailing List <sr-users@lists.kamailio.org>; Cody Herzog <CHerzog@IntouchHealth.com>
Subject: Re: [SR-Users] Core crash in del_lump() "offset exceeds message size" when using subst() inside tm:local-request route.
Hello,
what's the version of Kamailio and the operating system you run on?
Cheers,
Daniel
On 28.07.17 23:53, Cody Herzog wrote:
Hello.
Inside the tm:local-request route, I am trying to modify the body of outgoing NOTIFY messages using the subst_body() function.
Things seem to work fine most of the time. The NOTIFY messages are correctly modified. However, on rare occasions I hit a crash.
I can force the crash to happen more quickly by stress testing with thousands of NOTIFY messages being modified.
Is it safe to call subst() or subst_body() inside the tm:local-request route? If not, is there another way I can modify the outgoing NOTIFY messages safely?
Note that I am also calling append_hf() inside the same tm:local-request route, but that has been working for a very long time without causing any problems. Could there be some kind of bad interaction between append_hf() and subst_body()?
Strangely, it seems like using subst() makes the crash more likely to happen. For some reason, subst_body() seems to be more robust.
Here are the logs I see:
---------
Jul 28 10:57:40 SIPCOMM-VEGAS-TEST /usr/local/sbin/kamailio[2476]: CRITICAL: <core> [data_lump.c:292]: del_lump(): offset exceeds message size (1625 > 1194) aborting...
Jul 28 10:57:43 SIPCOMM-VEGAS-TEST /usr/local/sbin/kamailio[2481]: CRITICAL: <core> [pass_fd.c:275]: receive_fd(): EOF on 53
Jul 28 10:57:44 SIPCOMM-VEGAS-TEST /usr/local/sbin/kamailio[2367]: ALERT: <core> [main.c:731]: handle_sigs(): child process 2476 exited by a signal 6
Jul 28 10:57:44 SIPCOMM-VEGAS-TEST /usr/local/sbin/kamailio[2367]: ALERT: <core> [main.c:734]: handle_sigs(): core was generated
Jul 28 10:57:44 SIPCOMM-VEGAS-TEST /usr/local/sbin/kamailio[2367]: INFO: <core> [main.c:756]: handle_sigs(): terminating due to SIGCHLD
---------
Here is a GDB back trace from the core dump:
---------
#0 0x00007fd91a567c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007fd91a56b028 in __GI_abort () at abort.c:89
#2 0x000000000044e981 in del_lump (msg=0x7fd917c5ed60 <lreq>, offset=1625, len=20, type=HDR_OTHER_T) at data_lump.c:293
#3 0x00007fd91676297f in subst_f (msg=0x7fd917c5ed60 <lreq>, subst=0x7fd919e5ccc0 "\320\356\345\031\331\177", ignored=0x0) at textops.c:681
#4 0x000000000041e1dd in do_action (h=0x7ffd364b87c0, a=0x7fd919e9b438, msg=0x7fd917c5ed60 <lreq>) at action.c:1059
#5 0x000000000042a917 in run_actions (h=0x7ffd364b87c0, a=0x7fd919e9b438, msg=0x7fd917c5ed60 <lreq>) at action.c:1548
#6 0x000000000042af94 in run_actions_safe (h=0x7ffd364ba5b0, a=0x7fd919e9b438, msg=0x7fd917c5ed60 <lreq>) at action.c:1613
#7 0x0000000000549972 in rval_get_int (h=0x7ffd364ba5b0, msg=0x7fd917c5ed60 <lreq>, i=0x7ffd364b8b58, rv=0x7fd919e9ced8, cache=0x0) at rvalue.c:912
#8 0x000000000054dc7c in rval_expr_eval_int (h=0x7ffd364ba5b0, msg=0x7fd917c5ed60 <lreq>, res=0x7ffd364b8b58, rve=0x7fd919e9ced0) at rvalue.c:1910
#9 0x000000000041dc52 in do_action (h=0x7ffd364ba5b0, a=0x7fd919e9d990, msg=0x7fd917c5ed60 <lreq>) at action.c:1029
#10 0x000000000042a917 in run_actions (h=0x7ffd364ba5b0, a=0x7fd919e9d990, msg=0x7fd917c5ed60 <lreq>) at action.c:1548
#11 0x000000000041e0c4 in do_action (h=0x7ffd364ba5b0, a=0x7fd919e9dad8, msg=0x7fd917c5ed60 <lreq>) at action.c:1044
#12 0x000000000042a917 in run_actions (h=0x7ffd364ba5b0, a=0x7fd919e973b8, msg=0x7fd917c5ed60 <lreq>) at action.c:1548
#13 0x000000000041ab46 in do_action (h=0x7ffd364ba5b0, a=0x7fd919e95d70, msg=0x7fd917c5ed60 <lreq>) at action.c:677
#14 0x000000000042a917 in run_actions (h=0x7ffd364ba5b0, a=0x7fd919e8f330, msg=0x7fd917c5ed60 <lreq>) at action.c:1548
#15 0x000000000041e0c4 in do_action (h=0x7ffd364ba5b0, a=0x7fd919e95eb8, msg=0x7fd917c5ed60 <lreq>) at action.c:1044
#16 0x000000000042a917 in run_actions (h=0x7ffd364ba5b0, a=0x7fd919e8c8e0, msg=0x7fd917c5ed60 <lreq>) at action.c:1548
#17 0x000000000042b07d in run_top_route (a=0x7fd919e8c8e0, msg=0x7fd917c5ed60 <lreq>, c=0x0) at action.c:1634
#18 0x00007fd917a0539a in t_uac_prepare (uac_r=0x7ffd364baf20, dst_req=0x7ffd364ba8f0, dst_cell=0x7ffd364ba8f8) at uac.c:391
#19 0x00007fd917a07945 in t_uac_with_ids (uac_r=0x7ffd364baf20, ret_index=0x0, ret_label=0x0) at uac.c:599
#20 0x00007fd917a07918 in t_uac (uac_r=0x7ffd364baf20) at uac.c:584
#21 0x00007fd917a09b9e in req_within (uac_r=0x7ffd364baf20) at uac.c:802
#22 0x00007fd914674052 in send_notify_request (subs=0x7ffd364bb200, watcher_subs=0x0, n_body=0x0, force_null_body=0) at notify.c:1600
#23 0x00007fd9146754b3 in notify (subs=0x7ffd364bb200, watcher_subs=0x0, n_body=0x0, force_null_body=0) at notify.c:1690
#24 0x00007fd9146cec6a in update_subscription (msg=0x7fd919efeb70, subs=0x7ffd364bb200, to_tag_gen=1, sent_reply=0x7ffd364bb16c) at subscribe.c:697
#25 0x00007fd9146d45b7 in handle_subscribe (msg=0x7fd919efeb70, watcher_user=..., watcher_domain=...) at subscribe.c:1057
#26 0x00007fd9146d08b0 in handle_subscribe0 (msg=0x7fd919efeb70) at subscribe.c:816
#27 0x000000000041e155 in do_action (h=0x7ffd364bd680, a=0x7fd919e19508, msg=0x7fd919efeb70) at action.c:1053
#28 0x000000000042a917 in run_actions (h=0x7ffd364bd680, a=0x7fd919e19508, msg=0x7fd919efeb70) at action.c:1548
#29 0x000000000041e112 in do_action (h=0x7ffd364bd680, a=0x7fd919e19650, msg=0x7fd919efeb70) at action.c:1048
#30 0x000000000042a917 in run_actions (h=0x7ffd364bd680, a=0x7fd919e02638, msg=0x7fd919efeb70) at action.c:1548
#31 0x000000000041e0c4 in do_action (h=0x7ffd364bd680, a=0x7fd919e199c0, msg=0x7fd919efeb70) at action.c:1044
#32 0x000000000042a917 in run_actions (h=0x7ffd364bd680, a=0x7fd919e199c0, msg=0x7fd919efeb70) at action.c:1548
#33 0x000000000041e112 in do_action (h=0x7ffd364bd680, a=0x7fd919e19b08, msg=0x7fd919efeb70) at action.c:1048
#34 0x000000000042a917 in run_actions (h=0x7ffd364bd680, a=0x7fd919df99d8, msg=0x7fd919efeb70) at action.c:1548
#35 0x000000000041ab46 in do_action (h=0x7ffd364bd680, a=0x7fd919dbe960, msg=0x7fd919efeb70) at action.c:677
#36 0x000000000042a917 in run_actions (h=0x7ffd364bd680, a=0x7fd919daedf8, msg=0x7fd919efeb70) at action.c:1548
#37 0x000000000042b07d in run_top_route (a=0x7fd919daedf8, msg=0x7fd919efeb70, c=0x0) at action.c:1634
#38 0x000000000050f678 in receive_msg (buf=0x7fd8f2c6ce28 "SUBSCRIBE sip:endpoint-1720@intouchstaging.com SIP/2.0\r\nRecord-Route: <sip:64.64.203.109:443;transport=tls;lr=on>\r\nAccept: application/dialog-info+xml, application/pidf+xml, application/pidf-diff+xml,"..., len=1284, rcv_info=0x7fd8f2c6cb50) at receive.c:196
#39 0x00000000005fb99e in receive_tcp_msg (tcpbuf=0x7fd8f2c6ce28 "SUBSCRIBE sip:endpoint-1720@intouchstaging.com SIP/2.0\r\nRecord-Route: <sip:64.64.203.109:443;transport=tls;lr=on>\r\nAccept: application/dialog-info+xml, application/pidf+xml, application/pidf-diff+xml,"..., len=1284, rcv_info=0x7fd8f2c6cb50, con=0x7fd8f2c6cb38) at tcp_read.c:1207
#40 0x00000000005fd48c in tcp_read_req (con=0x7fd8f2c6cb38, bytes_read=0x7ffd364bdb7c, read_flags=0x7ffd364bdb84) at tcp_read.c:1411
#41 0x0000000000600aee in handle_io (fm=0x7fd919f07d80, events=1, idx=-1) at tcp_read.c:1643
#42 0x00000000005f3ab8 in io_wait_loop_epoll (h=0xa3d3e0 <io_w>, t=2, repeat=0) at io_wait.h:1061
#43 0x0000000000601cf7 in tcp_receive_loop (unix_sock=57) at tcp_read.c:1755
#44 0x00000000005ea235 in tcp_init_children () at tcp_main.c:4788
#45 0x00000000004ab048 in main_loop () at main.c:1675
#46 0x00000000004b10c2 in main (argc=13, argv=0x7ffd364be258) at main.c:2566
---------
Thanks very much.
-Cody
_______________________________________________
Kamailio (SER) - Users Mailing List
sr-users@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
--
Daniel-Constantin Mierla
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio Advanced Training - www.asipto.com
Kamailio World Conference - www.kamailioworld.com