[sr-dev] crash (segfault) on Kamailio master branch

Daniel-Constantin Mierla miconda at gmail.com
Tue Oct 6 15:50:00 CEST 2015


Can you try to run with '-x qm' -- maybe there is a double free or a
buffer overflow, q_malloc is more suitable for debugging such cases.

Also, can you give some details on reproducing it? Are you stress
testing it?

Daniel

On 06/10/15 15:41, Jason Penton wrote:
> I can confirm this is not related to TLS as I have recreated using TCP
> only. Here is another BT (it seems the entire tcpconn structure is
> corrupt)....
>
> *(gdb) bt*
> #0  local_timer_del (h=h at entry=0x99f3a0 <tcp_main_ltimer>,
> tl=tl at entry=0x7ff3c82ce910) at local_timer.c:178
> #1  0x00000000005ad47c in handle_tcpconn_ev (tcpconn=0x7ff3c82ce828,
> ev=<optimized out>, fd_i=-1) at tcp_main.c:4139
> #2  0x00000000005c3e57 in io_wait_loop_epoll (h=<optimized out>,
> t=<optimized out>, repeat=<optimized out>) at io_wait.h:1061
> #3  tcp_main_loop () at tcp_main.c:4481
> #4  0x00000000004ab79f in main_loop () at main.c:1672
> #5  0x000000000042411e in main (argc=<optimized out>, argv=<optimized
> out>) at main.c:2566
> (gdb) frame 1
> #1  0x00000000005ad47c in handle_tcpconn_ev (tcpconn=0x7ff3c82ce828,
> ev=<optimized out>, fd_i=-1) at tcp_main.c:4139
> 4139tcp_main.c: No such file or directory.
>
> *(gdb) frame 0*
> *(gdb) p *tl*
> $3 = {next = 0x616a1a160109010d, prev = 0x746e65702e6e6f73, expire =
> 1933602415, initial_timeout = 1701603693, data = 0x6d6f632e736d6f63, f
> = 0x9060d3022018230, flags = 34346, slow_idx = 34376}
> Looks like there is corruption....
>
> *(gdb) frame 1*
> *(gdb) print *tcpconn*
> $11 = {s = 77, fd = -1, write_lock = {val = 14}, id = 16254464,
> reader_pid = -2110739197, rcv = {src_ip = {af = 60854530, len =
> 33685762, u = {addrl = {9667549471699894785, 363385298974377544},
> addr32 = {
>           221250049, 2250901766, 234325576, 84607233}, addr16 = {513,
> 3376, 2310, 34346, 34376, 3575, 257, 1291}, addr =
> "\001\002\060\r\006\t*\206H\206\367\r\001\001\v\005"}}, dst_ip = {af =
> 825700352, 
>       len = 101265419, u = {addrl = {5140016830441870595,
> 312159073630948145}, addr32 = {100947203, 1196753427, 154143537,
> 72680198}, addr16 = {21763, 1540, 531, 18261, 2865, 2352, 774, 1109}, 
>         addr = "\003U\004\006\023\002UG1\v0\t\006\003U\004"}},
> src_port = 3080, dst_port = 21762, proto_reserved1 = 806039879,
> proto_reserved2 = 1426261513, src_su = {s = {sa_family = 1796, 
>         sa_data = "\f\002UG1\016\060\f\006\003U\004\n\f"}, sin =
> {sin_family = 1796, sin_port = 524, sin_addr = {s_addr = 238110549},
> sin_zero = "0\f\006\003U\004\n\f"}, sin6 = {sin6_family = 1796, 
>         sin6_port = 524, sin6_flowinfo = 238110549, sin6_addr =
> {__in6_u = {__u6_addr8 = "0\f\006\003U\004\n\f\005Smile0\036",
> __u6_addr16 = {3120, 774, 1109, 3082, 21253, 26989, 25964, 7728},
> __u6_addr32 = {
>               50727984, 201983061, 1768772357, 506488172}}},
> sin6_scope_id = 892407063}}, bind_address = 0x3135393034323830, proto
> = 51 '3'}, req = {next = 0x3331353930333238, 
>     buf = 0x9300b317c305a34 <error: Cannot access memory at address
> 0x9300b317c305a34>, start = 0x5502130604550306 <error: Cannot access
> memory at address 0x5502130604550306>, 
>     pos = 0x55030609300b3147 <error: Cannot access memory at address
> 0x55030609300b3147>, parsed = 0xe314755020c0804 <error: Cannot access
> memory at address 0xe314755020c0804>, 
>     body = 0xc0a045503060c30 <error: Cannot access memory at address
> 0xc0a045503060c30>, b_size = 1768772357, content_len = 623994220,
> chunk_size = 50733872, flags = 1109, bytes_to_go = 1935896604, 
>     error = 1680762467, state = 1663989349}, refcnt = {val =
> 1818848627}, type = 1836016485, flags = 12155, send_flags = {f = 99
> 'c', blst_imask = 111 'o'}, state = 808005997, extra_data =
> 0xf78648862a090627, 
>   timer = {next = 0x616a1a160109010d, prev = 0x746e65702e6e6f73,
> expire = 1933602415, initial_timeout = 1701603693, data =
> 0x6d6f632e736d6f63, f = 0x9060d3022018230, flags = 34346, slow_idx =
> 34376}, 
>   timeout = 50332929, lifetime = 983426, id_hash = 167871024, id_next
> = 0xab4698aaf4cd000, id_prev = 0x82f3bd212ad477e1, c_next =
> 0xfebfadf2e80d476b, c_prev = 0x542ad26c6fff5d01, con_aliases = {{
>       parent = 0x7ee7daa501f3ec2, next = 0x2564cfdb796faf0b, prev =
> 0x67075f982b68c22d, port = 26158, hash = 52278}, {parent =
> 0xa61ccf520173dbcf, next = 0xedfd914bd9290055, prev = 0x4c82e1be15c8e6bd, 
>       port = 56482, hash = 35158}, {parent = 0x1dd62e187ab759f8, next
> = 0x5b9483e3bd60806b, prev = 0x3c0e501ef81ca353, port = 42794, hash =
> 37520}, {parent = 0xb6c3f82065f2d125, next = 0xa78be2be546e1fb7, 
>       prev = 0x7a2f32985edb7215, port = 16490, hash = 49867}, {parent
> = 0x12bd4ff964ba4650, next = 0x7f071da522484c27, prev =
> 0x949f60e68638512f, port = 26169, hash = 41924}, {parent =
> 0x60a5d1aac5486cb5, 
>       next = 0xfa82965a57bf2ff6, prev = 0x7121974cf7f338d5, port =
> 64152, hash = 31481}, {parent = 0xbe418ec01629988f, next =
> 0x57a216f873989cb5, prev = 0x8dceabdc4ec9c171, port = 55564, hash =
> 38148}, {
>       parent = 0x7ba30100010302ef, next = 0x1d55030609307930, prev =
> 0x62c300030020413, port = 24585, hash = 18566}, {parent =
> 0x704f1d161f040d01, next = 0x6547204c53536e65, prev = 0x206465746172656e, 
>       port = 25923, hash = 29810}, {parent = 0x5503061d30657461, next
> = 0xb4f5140416040e1d, prev = 0xb4484ad4fb6fc4a5, port = 45712, hash =
> 7028}, {parent = 0x1d5503061f30883e, next = 0xf714801630180423, 
>       prev = 0xf2ba7396fd05170d, port = 24267, hash = 6103}, {parent =
> 0x2a09060d30f186cf, next = 0xb01010df7864886, prev =
> 0x8f00010182030005, port = 51556, hash = 28499}}, aliases = 672183753,
> wbuf_q = {
>     first = 0x5b7bf0e590790cf6, last = 0x67d25b5f6e53d679, wr_timeout
> = 1518084272, queued = 3564629620, offset = 996062876, last_used =
> 950886522}}
>
>
>
>
>
> On Mon, 5 Oct 2015 at 15:56 Jason Penton <jason.penton at gmail.com
> <mailto:jason.penton at gmail.com>> wrote:
>
>     Hey Daniel
>
>     Using TLS only. not using tcpops nor websocket
>
>     On Mon, 5 Oct 2015 at 15:54 Daniel-Constantin Mierla
>     <miconda at gmail.com <mailto:miconda at gmail.com>> wrote:
>
>         It could be some other reason, I will investigate as well.
>         Using any modules on top of tcp (like tcpops, tls or websocket)?
>
>         Cheers,
>         Daniel
>
>
>         On 05/10/15 15:47, Jason Penton wrote:
>>         Hey Daniel,
>>
>>         Got it once so far... I'll take a look at that commit and see
>>         what it did - but very strange indeed.
>>
>>         Cheers
>>         Jason
>>
>>         On Mon, 5 Oct 2015 at 15:42 Daniel-Constantin Mierla
>>         <miconda at gmail.com <mailto:miconda at gmail.com>> wrote:
>>
>>             Hello,
>>
>>             the only recent change in the tcp core was resetting a
>>             flag to avoid infinite log messages (until connection was
>>             closed) via commit d36734d658cd0bcfc8357c7e85ca32da0612aaee .
>>
>>             Does it happen often, or you got it just once so far?
>>
>>             Cheers,
>>             Daniel
>>
>>
>>             On 05/10/15 08:29, Jason Penton wrote:
>>>             Hey guys, 
>>>
>>>             Not sure if there have been any changes but I have an
>>>             interesting problem here when using TCP:
>>>
>>>             *The BT is as follows:*
>>>
>>>             #0  local_timer_list_expire (h=0xa0f128
>>>             <tcp_reader_ltimer+178664>, t=723807134, l=0x9e3740
>>>             <tcp_reader_ltimer>) at local_timer.c:198
>>>             #1  local_timer_expire (t=723807134, h=<optimized out>)
>>>             at local_timer.c:227
>>>             #2  local_timer_run (lt=lt at entry=0x9e3740
>>>             <tcp_reader_ltimer>, saved_ticks=723807150) at
>>>             local_timer.c:250
>>>             #3  0x00000000005d8417 in tcp_reader_timer_run () at
>>>             tcp_read.c:1682
>>>             #4  tcp_receive_loop (unix_sock=<optimized out>) at
>>>             tcp_read.c:1734
>>>             #5  0x00000000005c81c8 in tcp_init_children () at
>>>             tcp_main.c:4788
>>>             #6  0x00000000004a9da3 in main_loop () at main.c:1664
>>>             #7  0x000000000042411e in main (argc=<optimized out>,
>>>             argv=<optimized out>) at main.c:2566
>>>
>>>             This seems to be related to clearing timers for TCP
>>>             connections. The crash is related to the following code:
>>>
>>>             *_timer_rm_list(tl)*
>>>
>>>             where it does a null pointer deref on tl->next and
>>>             tl->prev, which, according to the bt, are null (see below).
>>>
>>>             *(gdb) print *tl*
>>>             $14 = {next = 0x0, prev = 0x0, expire = 723807134,
>>>             initial_timeout = 32, data = 0x7fbbb05aa628, f =
>>>             0x5d02f0 <tcpconn_read_timeout>, flags = 512, slow_idx = 0}
>>>
>>>             Any ideas?
>>>
>>>             Cheers
>>>             Jason
>>>              
>>>
>>>
>>>
>>>
>>>
>>>             _______________________________________________
>>>             sr-dev mailing list
>>>             sr-dev at lists.sip-router.org
>>>             <mailto:sr-dev at lists.sip-router.org>
>>>             http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>
>>             -- 
>>             Daniel-Constantin Mierla
>>             http://twitter.com/#!/miconda
>>             <http://twitter.com/#%21/miconda> - http://www.linkedin.com/in/miconda
>>             Book: SIP Routing With Kamailio - http://www.asipto.com
>>             Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
>>
>>             _______________________________________________
>>             sr-dev mailing list
>>             sr-dev at lists.sip-router.org
>>             <mailto:sr-dev at lists.sip-router.org>
>>             http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>
>
>         -- 
>         Daniel-Constantin Mierla
>         http://twitter.com/#!/miconda <http://twitter.com/#%21/miconda> - http://www.linkedin.com/in/miconda
>         Book: SIP Routing With Kamailio - http://www.asipto.com
>         Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
>

-- 
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Book: SIP Routing With Kamailio - http://www.asipto.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sip-router.org/pipermail/sr-dev/attachments/20151006/a16626d4/attachment-0001.html>


More information about the sr-dev mailing list