Hello,

I think you were hit by an issue solved with commit 15fc8b9c59aaf31f005e38f54d363f1e9d0a068e :

https://github.com/kamailio/kamailio/commit/15fc8b9c59aaf31f005e38f54d363f1e9d0a068e

The 4.1.3 was released before, in April 2014:

  - https://www.kamailio.org/pub/kamailio/4.1.3/README

I am not sure if it was backported to 4.1 branch, but should not be hard to backport.

The issues was with many processing handling the same transaction, which has the sip_msg in shared memory, but then parsing of some headers created pointers to private memory of the process doing the parsing. Another process coming shortly after would see the pointer in sip_msg, but it would be to another process private memory and accessing it does a seg fault as expected.

Not all headers are cloned in shared memory, like it's done for To/From, and results in this kind of processing with parsing again in private memory.

Anyhow, I didn't have much time to dig in too much, so it's mu quick guess based on the fact you run a rather old release.

Cheers,
Daniel

On 14/01/2017 17:53, Matthew Jordan wrote:

On Fri, Jan 13, 2017 at 2:57 PM, Joshua Colp <jcolp@digium.com> wrote:
Greetings all,

I'm currently investigating a crash in a Kamailio 4.1.3 deployment. The
backtrace is as follows:

(I've commented out phone numbers and IP addresses using X but have
retained the correct length for each)

(gdb) bt
#0  0x00007f017f2ac96e in memcpy () from /lib64/libc.so.6
#1  0x00007f0174e1bb50 in rc_avpair_assign () from
/usr/lib64/libradiusclient-ng.so.2
#2  0x00007f0174e1bc92 in rc_avpair_new () from
/usr/lib64/libradiusclient-ng.so.2
#3  0x00007f0174e1bdd1 in rc_avpair_add () from
/usr/lib64/libradiusclient-ng.so.2
#4  0x00007f0175026806 in acc_radius_send_request (req=0x7f0162b6c1f0,
inf=0x7fff113e9cc0) at acc_radius_mod.c:365
#5  0x00007f017522ff13 in acc_run_engines (msg=0x7f0162b6c1f0, type=0,
reset=0x0) at acc.c:957
#6  0x00007f0175239b49 in acc_onreply (t=0x7f0161a12f00,
req=0x7f0162b6c1f0, reply=0x7f017ca845c8, code=200) at acc_logic.c:485
#7  0x00007f017523a07d in tmcb_func (t=0x7f0161a12f00, type=512,
ps=0x7fff113e9ee0) at acc_logic.c:559
#8  0x00007f0177434478 in run_trans_callbacks_internal
(cb_lst=0x7f0161a12f70, type=512, trans=0x7f0161a12f00,
params=0x7fff113e9ee0) at t_hooks.c:290
#9  0x00007f017743468a in run_trans_callbacks_with_buf (type=512,
rbuf=0x7f0161a12fc0, req=0x7f0162b6c1f0, repl=0x7f017ca845c8, flags=200)
at t_hooks.c:336
#10 0x00007f0177466c06 in relay_reply (t=0x7f0161a12f00,
p_msg=0x7f017ca845c8, branch=0, msg_status=200,
cancel_data=0x7fff113ea240, do_put_on_wait=1)
    at t_reply.c:2001
#11 0x00007f01774690b7 in reply_received (p_msg=0x7f017ca845c8) at
t_reply.c:2499
#12 0x000000000045d837 in do_forward_reply (msg=0x7f017ca845c8, mode=0)
at forward.c:777
#13 0x000000000045e0f8 in forward_reply (msg=0x7f017ca845c8) at
forward.c:860
#14 0x00000000004a5887 in receive_msg (
    buf=0x9245e0 "SIP/2.0 200 OK\r\nVia: SIP/2.0/UDP
    X.XX.XX.XX:5060;rport=5060;received=X.XX.XX.XX;branch=z9hG4bK9981.5d2e8535c8eb94d43283f83230b5011e.0\r\nVia:
    SIP/2.0/UDP X.XX.XX.XX;rport=5060;branch=z9hG4bK9981.78d8b40"...,
    len=1161, rcv_info=0x7fff113ea5c0) at receive.c:273
#15 0x000000000053c838 in udp_rcv_loop () at udp_server.c:536
#16 0x000000000046d42b in main_loop () at main.c:1617
#17 0x00000000004704d3 in main (argc=11, argv=0x7fff113ea8f8) at
main.c:2533

As you can see it's crashing when interacting with the radius client.

Looking at frame 4 we can examine what is being provided to radius:

(gdb) frame 4
#4  0x00007f0175026806 in acc_radius_send_request (req=0x7f0162b6c1f0,
inf=0x7fff113e9cc0) at acc_radius_mod.c:365
365                     ADD_RAD_AVPAIR(offset+i, inf->varr[i].s,
inf->varr[i].len);
(gdb) print inf->varr[i].s
$12 = 0x0
(gdb) print inf->varr[i].len
$13 = 3
(gdb) print inf->tarr[i]
$14 = 2 '\002'

This is a TYPE_STR with a length of 3, but no actual string. Needless to
say this upsets radius. Based on the configuration of the system and the
value of 'i' this is actually for the Remote-Party-ID header:

(gdb) print i
$15 = 8
(gdb) print *rad_extra->next->next->next->next
$16 = {name = {s = 0x7f017b27fcf4 "Sip-RPid", len = 8}, spec = {type =
PVT_OTHER, getf = 0x7f0176b79883 <pv_get_rpid>, setf = 0, pvp = {pvn =
{type = 0,
        nfree = 0, u = {isname = {type = 0, name = {n = 0, s = {s = 0x0,
        len = 0}, re = 0x0}}, dname = 0x0}}, pvi = {type = 0, u = {ival
        = 0, dval = 0x0}}},
    trans = 0x0}, next = 0x7f017b3ae4b0}

Going up a frame and looking at the RPID URI on the message it matches
the bad TYPE_STR:

(gdb) frame 5
#5  0x00007f017522ff13 in acc_run_engines (msg=0x7f0162b6c1f0, type=0,
reset=0x0) at acc.c:957
957                                     e->acc_req(msg, &inf);
(gdb) print ((struct to_body*)(msg)->rpid->parsed)->uri
$17 = {s = 0x0, len = 3}

And digging into the full parsed information shows that it's not correct
at all:
(gdb) print *((struct to_body*)(msg)->rpid->parsed)
$30 = {error = 3, body = {s = 0x0, len = 11064280}, uri = {s = 0x0, len
= 3}, display = {s = 0x0, len = 11064313}, tag_value = {s = 0x0, len =
5},
  parsed_uri = {user = {s = 0x0, len = 1484108237}, passwd = {s = 0x0,
  len = 2}, host = {s = 0x0, len = 0}, port = {s = 0x0, len = 3}, params
  = {s = 0x0,
      len = 11064395}, sip_params = {s = 0x0, len = 0}, headers = {s =
      0x0, len = 8619}, port_no = 0, proto = 0, type = ERROR_URI_T,
      flags = 0, transport = {
      s = 0x0, len = 0}, ttl = {s = 0x0, len = 0}, user_param = {s =
      0x0, len = 192}, maddr = {s = 0x0, len = 3}, method = {s = 0x0,
      len = 11064439}, lr = {
      s = 0x0, len = 3}, r2 = {s = 0x0, len = 11064464}, gr = {s = 0x0,
      len = 3}, transport_val = {s = 0x0, len = 2039735278}, ttl_val =
      {s = 0x0, len = 3},
    user_param_val = {s = 0x0, len = 11064487}, maddr_val = {s = 0x0,
    len = 0}, method_val = {s = 0x0, len = 16367}, lr_val = {s = 0x0,
    len = 5}, r2_val = {
      s = 0x0, len = 1484104637}, gr_val = {s = 0x0, len = 3}},
      param_lst = 0x0, last_param = 0xa8d4d6}

Looking at the actual Remote-Party-ID header on the other hand itself
shows nothing out of the ordinary:

(gdb) print *msg->rpid
$18 = {type = HDR_RPID_T, name = {
    s = 0x7f0162b6cba0 "Remote-Party-ID:
    <sip:XXXXXXXXXX@XXX.XX.XX.XX:5060>;privacy=off\r\nP-Charge-Info:
    sip:XXXXXXXXXX@XXX.XX.XX.XX:5060\r\nSupported:
    timer,replaces\r\nSession-Expires: 3600\r\nMin-SE:
    90\r\nContent-Length:  260\r\nCo"..., len = 15}, body = {
    s = 0x7f0162b6cbb1
    "<sip:XXXXXXXXXX@XXX.XX.XX.XX:5060>;privacy=off\r\nP-Charge-Info:
    sip:XXXXXXXXXX@XXX.XX.XX.XX:5060\r\nSupported:
    timer,replaces\r\nSession-Expires: 3600\r\nMin-SE:
    90\r\nContent-Length:  260\r\nContent-Disposition"..., len = 46},
    len = 65, parsed = 0x7f017b33c710, next = 0x7f0162b6d720}

I should also add that the value before and after RPID appears to be
correct:

(gdb) print inf->varr[i-1].s
$24 = 0x7f0161ce8427 "sip:XXXXXXXXXX@X.XX.XX.XXX:5060
SIP/2.0\r\nRecord-Route:
<sip:X.XX.XX.XX;lr=on;ftag=gK0c7f2820;did=438.50e1>\r\nRecord-Route:
<sip:X.XX.XX.XX;lr=on;did=ee5.4da5ae1>\r\nVia: SIP/2.0/UDP
X.XX.XX.XX:5060;branc"...
(gdb) print inf->varr[i-1].len
$25 = 31
(gdb) print inf->varr[i+1].s
$26 = 0x7f0176dbe720 "X.XX.XX.XX"
(gdb) print inf->varr[i+1].len
$27 = 10

Has anyone seen anything like this before or have a suggestion on where
to look further? I've traced through the code in question and it all
seems to be correct.

We have a theory that we hope to test out later today. If anyone more familiar with the inner workings of Kamailio could comment on it, that'd be hugely appreciated.

Looking through the Kamailio code base, not many things manipulate rpid->parsed once it has been set. One place that it could be manipulated is in sip_msg_shm_clone. Comparing the handling of the rpid header in that routine to the from/to headers is instructive - the from/to header handling directly handles the case of parsed being set, while rpid does not:

from/to:

case HDR_TO_T:
case HDR_FROM_T:
if (hdr->type == HDR_TO_T) {
if (!HOOK_SET(to)) new_msg->to = new_hdr;
} else {
if (!HOOK_SET(from)) new_msg->from = new_hdr;
}
    /* From header might be unparsed */
if (!hdr->parsed) break;
new_hdr->parsed = p;
p +=ROUND4(sizeof(struct to_body));
memcpy(new_hdr->parsed, hdr->parsed, sizeof(struct to_body));
((struct to_body*)new_hdr->parsed)->body.s =
translate_pointer( new_msg->buf , org_msg->buf ,
  ((struct to_body*)hdr->parsed)->body.s );
((struct to_body*)new_hdr->parsed)->display.s =
translate_pointer( new_msg->buf, org_msg->buf,
  ((struct to_body*)hdr->parsed)->display.s);
((struct to_body*)new_hdr->parsed)->uri.s =
translate_pointer( new_msg->buf , org_msg->buf ,
  ((struct to_body*)hdr->parsed)->uri.s );
if ( ((struct to_body*)hdr->parsed)->tag_value.s )
((struct to_body*)new_hdr->parsed)->tag_value.s =
translate_pointer( new_msg->buf , org_msg->buf ,
  ((struct to_body*)hdr->parsed)->tag_value.s );
if ( (((struct to_body*)new_hdr->parsed)->parsed_uri.user.s)
|| (((struct to_body*)new_hdr->parsed)->parsed_uri.host.s) )
uri_trans(new_msg->buf, org_msg->buf,
&((struct to_body*)new_hdr->parsed)->parsed_uri);

etc.

rpid:

case HDR_RPID_T:
if (!HOOK_SET(rpid)) {
new_msg->rpid = new_hdr;
}
break;

Since the rpid->parsed pointer points to a struct to_body, I would expect it would need the same handling as the from/to headers.

It's interesting to note in our backtrace that the crash occurs in a reply route, and - I think - Kamailio will clone the SIP message prior to executing the reply route (again - I think). Since the crash only happens if you evaluate the rpid pseudo variable, and since the Radius configuration we're using maps an 'extra field' to said psuedo variable, that might explain the issue we're seeing.

Does this sound plausible, or are we potentially going down the wrong path here?

-- 
Matthew Jordan
Digium, Inc. | CTO
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org


_______________________________________________
sr-dev mailing list
sr-dev@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev

-- 
Daniel-Constantin Mierla
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio World Conference - May 8-10, 2017 - www.kamailioworld.com