Hey guys,
Not sure if there have been any changes but I have an interesting problem here when using TCP:
*The BT is as follows:*
#0 local_timer_list_expire (h=0xa0f128 <tcp_reader_ltimer+178664>, t=723807134, l=0x9e3740 <tcp_reader_ltimer>) at local_timer.c:198 #1 local_timer_expire (t=723807134, h=<optimized out>) at local_timer.c:227 #2 local_timer_run (lt=lt@entry=0x9e3740 <tcp_reader_ltimer>, saved_ticks=723807150) at local_timer.c:250 #3 0x00000000005d8417 in tcp_reader_timer_run () at tcp_read.c:1682 #4 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1734 #5 0x00000000005c81c8 in tcp_init_children () at tcp_main.c:4788 #6 0x00000000004a9da3 in main_loop () at main.c:1664 #7 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566
This seems to be related to clearing timers for TCP connections. The crash is related to the following code:
*_timer_rm_list(tl)*
where it does a null pointer deref on tl->next and tl->prev, which, according to the bt, are null (see below).
*(gdb) print *tl* $14 = {next = 0x0, prev = 0x0, expire = 723807134, initial_timeout = 32, data = 0x7fbbb05aa628, f = 0x5d02f0 <tcpconn_read_timeout>, flags = 512, slow_idx = 0}
Any ideas?
Cheers Jason
Hello,
the only recent change in the tcp core was resetting a flag to avoid infinite log messages (until connection was closed) via commit d36734d658cd0bcfc8357c7e85ca32da0612aaee .
Does it happen often, or you got it just once so far?
Cheers, Daniel
On 05/10/15 08:29, Jason Penton wrote:
Hey guys,
Not sure if there have been any changes but I have an interesting problem here when using TCP:
*The BT is as follows:*
#0 local_timer_list_expire (h=0xa0f128 <tcp_reader_ltimer+178664>, t=723807134, l=0x9e3740 <tcp_reader_ltimer>) at local_timer.c:198 #1 local_timer_expire (t=723807134, h=<optimized out>) at local_timer.c:227 #2 local_timer_run (lt=lt@entry=0x9e3740 <tcp_reader_ltimer>, saved_ticks=723807150) at local_timer.c:250 #3 0x00000000005d8417 in tcp_reader_timer_run () at tcp_read.c:1682 #4 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1734 #5 0x00000000005c81c8 in tcp_init_children () at tcp_main.c:4788 #6 0x00000000004a9da3 in main_loop () at main.c:1664 #7 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566
This seems to be related to clearing timers for TCP connections. The crash is related to the following code:
*_timer_rm_list(tl)*
where it does a null pointer deref on tl->next and tl->prev, which, according to the bt, are null (see below).
*(gdb) print *tl* $14 = {next = 0x0, prev = 0x0, expire = 723807134, initial_timeout = 32, data = 0x7fbbb05aa628, f = 0x5d02f0 <tcpconn_read_timeout>, flags = 512, slow_idx = 0}
Any ideas?
Cheers Jason
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Hey Daniel,
Got it once so far... I'll take a look at that commit and see what it did - but very strange indeed.
Cheers Jason
On Mon, 5 Oct 2015 at 15:42 Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
the only recent change in the tcp core was resetting a flag to avoid infinite log messages (until connection was closed) via commit d36734d658cd0bcfc8357c7e85ca32da0612aaee .
Does it happen often, or you got it just once so far?
Cheers, Daniel
On 05/10/15 08:29, Jason Penton wrote:
Hey guys,
Not sure if there have been any changes but I have an interesting problem here when using TCP:
*The BT is as follows:*
#0 local_timer_list_expire (h=0xa0f128 <tcp_reader_ltimer+178664>, t=723807134, l=0x9e3740 <tcp_reader_ltimer>) at local_timer.c:198 #1 local_timer_expire (t=723807134, h=<optimized out>) at local_timer.c:227 #2 local_timer_run (lt=lt@entry=0x9e3740 <tcp_reader_ltimer>, saved_ticks=723807150) at local_timer.c:250 #3 0x00000000005d8417 in tcp_reader_timer_run () at tcp_read.c:1682 #4 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1734 #5 0x00000000005c81c8 in tcp_init_children () at tcp_main.c:4788 #6 0x00000000004a9da3 in main_loop () at main.c:1664 #7 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566
This seems to be related to clearing timers for TCP connections. The crash is related to the following code:
*_timer_rm_list(tl)*
where it does a null pointer deref on tl->next and tl->prev, which, according to the bt, are null (see below).
*(gdb) print *tl* $14 = {next = 0x0, prev = 0x0, expire = 723807134, initial_timeout = 32, data = 0x7fbbb05aa628, f = 0x5d02f0 <tcpconn_read_timeout>, flags = 512, slow_idx = 0}
Any ideas?
Cheers Jason
sr-dev mailing listsr-dev@lists.sip-router.orghttp://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierlahttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
It could be some other reason, I will investigate as well. Using any modules on top of tcp (like tcpops, tls or websocket)?
Cheers, Daniel
On 05/10/15 15:47, Jason Penton wrote:
Hey Daniel,
Got it once so far... I'll take a look at that commit and see what it did - but very strange indeed.
Cheers Jason
On Mon, 5 Oct 2015 at 15:42 Daniel-Constantin Mierla <miconda@gmail.com mailto:miconda@gmail.com> wrote:
Hello, the only recent change in the tcp core was resetting a flag to avoid infinite log messages (until connection was closed) via commit d36734d658cd0bcfc8357c7e85ca32da0612aaee . Does it happen often, or you got it just once so far? Cheers, Daniel On 05/10/15 08:29, Jason Penton wrote:
Hey guys, Not sure if there have been any changes but I have an interesting problem here when using TCP: *The BT is as follows:* #0 local_timer_list_expire (h=0xa0f128 <tcp_reader_ltimer+178664>, t=723807134, l=0x9e3740 <tcp_reader_ltimer>) at local_timer.c:198 #1 local_timer_expire (t=723807134, h=<optimized out>) at local_timer.c:227 #2 local_timer_run (lt=lt@entry=0x9e3740 <tcp_reader_ltimer>, saved_ticks=723807150) at local_timer.c:250 #3 0x00000000005d8417 in tcp_reader_timer_run () at tcp_read.c:1682 #4 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1734 #5 0x00000000005c81c8 in tcp_init_children () at tcp_main.c:4788 #6 0x00000000004a9da3 in main_loop () at main.c:1664 #7 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566 This seems to be related to clearing timers for TCP connections. The crash is related to the following code: *_timer_rm_list(tl)* where it does a null pointer deref on tl->next and tl->prev, which, according to the bt, are null (see below). *(gdb) print *tl* $14 = {next = 0x0, prev = 0x0, expire = 723807134, initial_timeout = 32, data = 0x7fbbb05aa628, f = 0x5d02f0 <tcpconn_read_timeout>, flags = 512, slow_idx = 0} Any ideas? Cheers Jason _______________________________________________ sr-dev mailing list sr-dev@lists.sip-router.org <mailto:sr-dev@lists.sip-router.org> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierla http://twitter.com/#!/miconda <http://twitter.com/#%21/miconda> - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat _______________________________________________ sr-dev mailing list sr-dev@lists.sip-router.org <mailto:sr-dev@lists.sip-router.org> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
Hey Daniel
Using TLS only. not using tcpops nor websocket
On Mon, 5 Oct 2015 at 15:54 Daniel-Constantin Mierla miconda@gmail.com wrote:
It could be some other reason, I will investigate as well. Using any modules on top of tcp (like tcpops, tls or websocket)?
Cheers, Daniel
On 05/10/15 15:47, Jason Penton wrote:
Hey Daniel,
Got it once so far... I'll take a look at that commit and see what it did
- but very strange indeed.
Cheers Jason
On Mon, 5 Oct 2015 at 15:42 Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
the only recent change in the tcp core was resetting a flag to avoid infinite log messages (until connection was closed) via commit d36734d658cd0bcfc8357c7e85ca32da0612aaee .
Does it happen often, or you got it just once so far?
Cheers, Daniel
On 05/10/15 08:29, Jason Penton wrote:
Hey guys,
Not sure if there have been any changes but I have an interesting problem here when using TCP:
*The BT is as follows:*
#0 local_timer_list_expire (h=0xa0f128 <tcp_reader_ltimer+178664>, t=723807134, l=0x9e3740 <tcp_reader_ltimer>) at local_timer.c:198 #1 local_timer_expire (t=723807134, h=<optimized out>) at local_timer.c:227 #2 local_timer_run (lt=lt@entry=0x9e3740 <tcp_reader_ltimer>, saved_ticks=723807150) at local_timer.c:250 #3 0x00000000005d8417 in tcp_reader_timer_run () at tcp_read.c:1682 #4 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1734 #5 0x00000000005c81c8 in tcp_init_children () at tcp_main.c:4788 #6 0x00000000004a9da3 in main_loop () at main.c:1664 #7 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566
This seems to be related to clearing timers for TCP connections. The crash is related to the following code:
*_timer_rm_list(tl)*
where it does a null pointer deref on tl->next and tl->prev, which, according to the bt, are null (see below).
*(gdb) print *tl* $14 = {next = 0x0, prev = 0x0, expire = 723807134, initial_timeout = 32, data = 0x7fbbb05aa628, f = 0x5d02f0 <tcpconn_read_timeout>, flags = 512, slow_idx = 0}
Any ideas?
Cheers Jason
sr-dev mailing listsr-dev@lists.sip-router.orghttp://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierlahttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierlahttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
I can confirm this is not related to TLS as I have recreated using TCP only. Here is another BT (it seems the entire tcpconn structure is corrupt)....
*(gdb) bt* #0 local_timer_del (h=h@entry=0x99f3a0 <tcp_main_ltimer>, tl=tl@entry=0x7ff3c82ce910) at local_timer.c:178 #1 0x00000000005ad47c in handle_tcpconn_ev (tcpconn=0x7ff3c82ce828, ev=<optimized out>, fd_i=-1) at tcp_main.c:4139 #2 0x00000000005c3e57 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at io_wait.h:1061 #3 tcp_main_loop () at tcp_main.c:4481 #4 0x00000000004ab79f in main_loop () at main.c:1672 #5 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566 (gdb) frame 1 #1 0x00000000005ad47c in handle_tcpconn_ev (tcpconn=0x7ff3c82ce828, ev=<optimized out>, fd_i=-1) at tcp_main.c:4139 4139 tcp_main.c: No such file or directory.
*(gdb) frame 0* *(gdb) p *tl* $3 = {next = 0x616a1a160109010d, prev = 0x746e65702e6e6f73, expire = 1933602415, initial_timeout = 1701603693, data = 0x6d6f632e736d6f63, f = 0x9060d3022018230, flags = 34346, slow_idx = 34376} Looks like there is corruption....
*(gdb) frame 1* *(gdb) print *tcpconn* $11 = {s = 77, fd = -1, write_lock = {val = 14}, id = 16254464, reader_pid = -2110739197, rcv = {src_ip = {af = 60854530, len = 33685762, u = {addrl = {9667549471699894785, 363385298974377544}, addr32 = { 221250049, 2250901766, 234325576, 84607233}, addr16 = {513, 3376, 2310, 34346, 34376, 3575, 257, 1291}, addr = "\001\002\060\r\006\t*\206H\206\367\r\001\001\v\005"}}, dst_ip = {af = 825700352, len = 101265419, u = {addrl = {5140016830441870595, 312159073630948145}, addr32 = {100947203, 1196753427, 154143537, 72680198}, addr16 = {21763, 1540, 531, 18261, 2865, 2352, 774, 1109}, addr = "\003U\004\006\023\002UG1\v0\t\006\003U\004"}}, src_port = 3080, dst_port = 21762, proto_reserved1 = 806039879, proto_reserved2 = 1426261513, src_su = {s = {sa_family = 1796, sa_data = "\f\002UG1\016\060\f\006\003U\004\n\f"}, sin = {sin_family = 1796, sin_port = 524, sin_addr = {s_addr = 238110549}, sin_zero = "0\f\006\003U\004\n\f"}, sin6 = {sin6_family = 1796, sin6_port = 524, sin6_flowinfo = 238110549, sin6_addr = {__in6_u = {__u6_addr8 = "0\f\006\003U\004\n\f\005Smile0\036", __u6_addr16 = {3120, 774, 1109, 3082, 21253, 26989, 25964, 7728}, __u6_addr32 = { 50727984, 201983061, 1768772357, 506488172}}}, sin6_scope_id = 892407063}}, bind_address = 0x3135393034323830, proto = 51 '3'}, req = {next = 0x3331353930333238, buf = 0x9300b317c305a34 <error: Cannot access memory at address 0x9300b317c305a34>, start = 0x5502130604550306 <error: Cannot access memory at address 0x5502130604550306>, pos = 0x55030609300b3147 <error: Cannot access memory at address 0x55030609300b3147>, parsed = 0xe314755020c0804 <error: Cannot access memory at address 0xe314755020c0804>, body = 0xc0a045503060c30 <error: Cannot access memory at address 0xc0a045503060c30>, b_size = 1768772357, content_len = 623994220, chunk_size = 50733872, flags = 1109, bytes_to_go = 1935896604, error = 1680762467, state = 1663989349}, refcnt = {val = 1818848627}, type = 1836016485, flags = 12155, send_flags = {f = 99 'c', blst_imask = 111 'o'}, state = 808005997, extra_data = 0xf78648862a090627, timer = {next = 0x616a1a160109010d, prev = 0x746e65702e6e6f73, expire = 1933602415, initial_timeout = 1701603693, data = 0x6d6f632e736d6f63, f = 0x9060d3022018230, flags = 34346, slow_idx = 34376}, timeout = 50332929, lifetime = 983426, id_hash = 167871024, id_next = 0xab4698aaf4cd000, id_prev = 0x82f3bd212ad477e1, c_next = 0xfebfadf2e80d476b, c_prev = 0x542ad26c6fff5d01, con_aliases = {{ parent = 0x7ee7daa501f3ec2, next = 0x2564cfdb796faf0b, prev = 0x67075f982b68c22d, port = 26158, hash = 52278}, {parent = 0xa61ccf520173dbcf, next = 0xedfd914bd9290055, prev = 0x4c82e1be15c8e6bd, port = 56482, hash = 35158}, {parent = 0x1dd62e187ab759f8, next = 0x5b9483e3bd60806b, prev = 0x3c0e501ef81ca353, port = 42794, hash = 37520}, {parent = 0xb6c3f82065f2d125, next = 0xa78be2be546e1fb7, prev = 0x7a2f32985edb7215, port = 16490, hash = 49867}, {parent = 0x12bd4ff964ba4650, next = 0x7f071da522484c27, prev = 0x949f60e68638512f, port = 26169, hash = 41924}, {parent = 0x60a5d1aac5486cb5, next = 0xfa82965a57bf2ff6, prev = 0x7121974cf7f338d5, port = 64152, hash = 31481}, {parent = 0xbe418ec01629988f, next = 0x57a216f873989cb5, prev = 0x8dceabdc4ec9c171, port = 55564, hash = 38148}, { parent = 0x7ba30100010302ef, next = 0x1d55030609307930, prev = 0x62c300030020413, port = 24585, hash = 18566}, {parent = 0x704f1d161f040d01, next = 0x6547204c53536e65, prev = 0x206465746172656e, port = 25923, hash = 29810}, {parent = 0x5503061d30657461, next = 0xb4f5140416040e1d, prev = 0xb4484ad4fb6fc4a5, port = 45712, hash = 7028}, {parent = 0x1d5503061f30883e, next = 0xf714801630180423, prev = 0xf2ba7396fd05170d, port = 24267, hash = 6103}, {parent = 0x2a09060d30f186cf, next = 0xb01010df7864886, prev = 0x8f00010182030005, port = 51556, hash = 28499}}, aliases = 672183753, wbuf_q = { first = 0x5b7bf0e590790cf6, last = 0x67d25b5f6e53d679, wr_timeout = 1518084272, queued = 3564629620, offset = 996062876, last_used = 950886522}}
On Mon, 5 Oct 2015 at 15:56 Jason Penton jason.penton@gmail.com wrote:
Hey Daniel
Using TLS only. not using tcpops nor websocket
On Mon, 5 Oct 2015 at 15:54 Daniel-Constantin Mierla miconda@gmail.com wrote:
It could be some other reason, I will investigate as well. Using any modules on top of tcp (like tcpops, tls or websocket)?
Cheers, Daniel
On 05/10/15 15:47, Jason Penton wrote:
Hey Daniel,
Got it once so far... I'll take a look at that commit and see what it did
- but very strange indeed.
Cheers Jason
On Mon, 5 Oct 2015 at 15:42 Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
the only recent change in the tcp core was resetting a flag to avoid infinite log messages (until connection was closed) via commit d36734d658cd0bcfc8357c7e85ca32da0612aaee .
Does it happen often, or you got it just once so far?
Cheers, Daniel
On 05/10/15 08:29, Jason Penton wrote:
Hey guys,
Not sure if there have been any changes but I have an interesting problem here when using TCP:
*The BT is as follows:*
#0 local_timer_list_expire (h=0xa0f128 <tcp_reader_ltimer+178664>, t=723807134, l=0x9e3740 <tcp_reader_ltimer>) at local_timer.c:198 #1 local_timer_expire (t=723807134, h=<optimized out>) at local_timer.c:227 #2 local_timer_run (lt=lt@entry=0x9e3740 <tcp_reader_ltimer>, saved_ticks=723807150) at local_timer.c:250 #3 0x00000000005d8417 in tcp_reader_timer_run () at tcp_read.c:1682 #4 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1734 #5 0x00000000005c81c8 in tcp_init_children () at tcp_main.c:4788 #6 0x00000000004a9da3 in main_loop () at main.c:1664 #7 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566
This seems to be related to clearing timers for TCP connections. The crash is related to the following code:
*_timer_rm_list(tl)*
where it does a null pointer deref on tl->next and tl->prev, which, according to the bt, are null (see below).
*(gdb) print *tl* $14 = {next = 0x0, prev = 0x0, expire = 723807134, initial_timeout = 32, data = 0x7fbbb05aa628, f = 0x5d02f0 <tcpconn_read_timeout>, flags = 512, slow_idx = 0}
Any ideas?
Cheers Jason
sr-dev mailing listsr-dev@lists.sip-router.orghttp://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierlahttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierlahttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
Can you try to run with '-x qm' -- maybe there is a double free or a buffer overflow, q_malloc is more suitable for debugging such cases.
Also, can you give some details on reproducing it? Are you stress testing it?
Daniel
On 06/10/15 15:41, Jason Penton wrote:
I can confirm this is not related to TLS as I have recreated using TCP only. Here is another BT (it seems the entire tcpconn structure is corrupt)....
*(gdb) bt* #0 local_timer_del (h=h@entry=0x99f3a0 <tcp_main_ltimer>, tl=tl@entry=0x7ff3c82ce910) at local_timer.c:178 #1 0x00000000005ad47c in handle_tcpconn_ev (tcpconn=0x7ff3c82ce828, ev=<optimized out>, fd_i=-1) at tcp_main.c:4139 #2 0x00000000005c3e57 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at io_wait.h:1061 #3 tcp_main_loop () at tcp_main.c:4481 #4 0x00000000004ab79f in main_loop () at main.c:1672 #5 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566 (gdb) frame 1 #1 0x00000000005ad47c in handle_tcpconn_ev (tcpconn=0x7ff3c82ce828, ev=<optimized out>, fd_i=-1) at tcp_main.c:4139 4139tcp_main.c: No such file or directory.
*(gdb) frame 0* *(gdb) p *tl* $3 = {next = 0x616a1a160109010d, prev = 0x746e65702e6e6f73, expire = 1933602415, initial_timeout = 1701603693, data = 0x6d6f632e736d6f63, f = 0x9060d3022018230, flags = 34346, slow_idx = 34376} Looks like there is corruption....
*(gdb) frame 1* *(gdb) print *tcpconn* $11 = {s = 77, fd = -1, write_lock = {val = 14}, id = 16254464, reader_pid = -2110739197, rcv = {src_ip = {af = 60854530, len = 33685762, u = {addrl = {9667549471699894785, 363385298974377544}, addr32 = { 221250049, 2250901766, 234325576, 84607233}, addr16 = {513, 3376, 2310, 34346, 34376, 3575, 257, 1291}, addr = "\001\002\060\r\006\t*\206H\206\367\r\001\001\v\005"}}, dst_ip = {af = 825700352, len = 101265419, u = {addrl = {5140016830441870595, 312159073630948145}, addr32 = {100947203, 1196753427, 154143537, 72680198}, addr16 = {21763, 1540, 531, 18261, 2865, 2352, 774, 1109}, addr = "\003U\004\006\023\002UG1\v0\t\006\003U\004"}}, src_port = 3080, dst_port = 21762, proto_reserved1 = 806039879, proto_reserved2 = 1426261513, src_su = {s = {sa_family = 1796, sa_data = "\f\002UG1\016\060\f\006\003U\004\n\f"}, sin = {sin_family = 1796, sin_port = 524, sin_addr = {s_addr = 238110549}, sin_zero = "0\f\006\003U\004\n\f"}, sin6 = {sin6_family = 1796, sin6_port = 524, sin6_flowinfo = 238110549, sin6_addr = {__in6_u = {__u6_addr8 = "0\f\006\003U\004\n\f\005Smile0\036", __u6_addr16 = {3120, 774, 1109, 3082, 21253, 26989, 25964, 7728}, __u6_addr32 = { 50727984, 201983061, 1768772357, 506488172}}}, sin6_scope_id = 892407063}}, bind_address = 0x3135393034323830, proto = 51 '3'}, req = {next = 0x3331353930333238, buf = 0x9300b317c305a34 <error: Cannot access memory at address 0x9300b317c305a34>, start = 0x5502130604550306 <error: Cannot access memory at address 0x5502130604550306>, pos = 0x55030609300b3147 <error: Cannot access memory at address 0x55030609300b3147>, parsed = 0xe314755020c0804 <error: Cannot access memory at address 0xe314755020c0804>, body = 0xc0a045503060c30 <error: Cannot access memory at address 0xc0a045503060c30>, b_size = 1768772357, content_len = 623994220, chunk_size = 50733872, flags = 1109, bytes_to_go = 1935896604, error = 1680762467, state = 1663989349}, refcnt = {val = 1818848627}, type = 1836016485, flags = 12155, send_flags = {f = 99 'c', blst_imask = 111 'o'}, state = 808005997, extra_data = 0xf78648862a090627, timer = {next = 0x616a1a160109010d, prev = 0x746e65702e6e6f73, expire = 1933602415, initial_timeout = 1701603693, data = 0x6d6f632e736d6f63, f = 0x9060d3022018230, flags = 34346, slow_idx = 34376}, timeout = 50332929, lifetime = 983426, id_hash = 167871024, id_next = 0xab4698aaf4cd000, id_prev = 0x82f3bd212ad477e1, c_next = 0xfebfadf2e80d476b, c_prev = 0x542ad26c6fff5d01, con_aliases = {{ parent = 0x7ee7daa501f3ec2, next = 0x2564cfdb796faf0b, prev = 0x67075f982b68c22d, port = 26158, hash = 52278}, {parent = 0xa61ccf520173dbcf, next = 0xedfd914bd9290055, prev = 0x4c82e1be15c8e6bd, port = 56482, hash = 35158}, {parent = 0x1dd62e187ab759f8, next = 0x5b9483e3bd60806b, prev = 0x3c0e501ef81ca353, port = 42794, hash = 37520}, {parent = 0xb6c3f82065f2d125, next = 0xa78be2be546e1fb7, prev = 0x7a2f32985edb7215, port = 16490, hash = 49867}, {parent = 0x12bd4ff964ba4650, next = 0x7f071da522484c27, prev = 0x949f60e68638512f, port = 26169, hash = 41924}, {parent = 0x60a5d1aac5486cb5, next = 0xfa82965a57bf2ff6, prev = 0x7121974cf7f338d5, port = 64152, hash = 31481}, {parent = 0xbe418ec01629988f, next = 0x57a216f873989cb5, prev = 0x8dceabdc4ec9c171, port = 55564, hash = 38148}, { parent = 0x7ba30100010302ef, next = 0x1d55030609307930, prev = 0x62c300030020413, port = 24585, hash = 18566}, {parent = 0x704f1d161f040d01, next = 0x6547204c53536e65, prev = 0x206465746172656e, port = 25923, hash = 29810}, {parent = 0x5503061d30657461, next = 0xb4f5140416040e1d, prev = 0xb4484ad4fb6fc4a5, port = 45712, hash = 7028}, {parent = 0x1d5503061f30883e, next = 0xf714801630180423, prev = 0xf2ba7396fd05170d, port = 24267, hash = 6103}, {parent = 0x2a09060d30f186cf, next = 0xb01010df7864886, prev = 0x8f00010182030005, port = 51556, hash = 28499}}, aliases = 672183753, wbuf_q = { first = 0x5b7bf0e590790cf6, last = 0x67d25b5f6e53d679, wr_timeout = 1518084272, queued = 3564629620, offset = 996062876, last_used = 950886522}}
On Mon, 5 Oct 2015 at 15:56 Jason Penton <jason.penton@gmail.com mailto:jason.penton@gmail.com> wrote:
Hey Daniel Using TLS only. not using tcpops nor websocket On Mon, 5 Oct 2015 at 15:54 Daniel-Constantin Mierla <miconda@gmail.com <mailto:miconda@gmail.com>> wrote: It could be some other reason, I will investigate as well. Using any modules on top of tcp (like tcpops, tls or websocket)? Cheers, Daniel On 05/10/15 15:47, Jason Penton wrote:
Hey Daniel, Got it once so far... I'll take a look at that commit and see what it did - but very strange indeed. Cheers Jason On Mon, 5 Oct 2015 at 15:42 Daniel-Constantin Mierla <miconda@gmail.com <mailto:miconda@gmail.com>> wrote: Hello, the only recent change in the tcp core was resetting a flag to avoid infinite log messages (until connection was closed) via commit d36734d658cd0bcfc8357c7e85ca32da0612aaee . Does it happen often, or you got it just once so far? Cheers, Daniel On 05/10/15 08:29, Jason Penton wrote:
Hey guys, Not sure if there have been any changes but I have an interesting problem here when using TCP: *The BT is as follows:* #0 local_timer_list_expire (h=0xa0f128 <tcp_reader_ltimer+178664>, t=723807134, l=0x9e3740 <tcp_reader_ltimer>) at local_timer.c:198 #1 local_timer_expire (t=723807134, h=<optimized out>) at local_timer.c:227 #2 local_timer_run (lt=lt@entry=0x9e3740 <tcp_reader_ltimer>, saved_ticks=723807150) at local_timer.c:250 #3 0x00000000005d8417 in tcp_reader_timer_run () at tcp_read.c:1682 #4 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1734 #5 0x00000000005c81c8 in tcp_init_children () at tcp_main.c:4788 #6 0x00000000004a9da3 in main_loop () at main.c:1664 #7 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566 This seems to be related to clearing timers for TCP connections. The crash is related to the following code: *_timer_rm_list(tl)* where it does a null pointer deref on tl->next and tl->prev, which, according to the bt, are null (see below). *(gdb) print *tl* $14 = {next = 0x0, prev = 0x0, expire = 723807134, initial_timeout = 32, data = 0x7fbbb05aa628, f = 0x5d02f0 <tcpconn_read_timeout>, flags = 512, slow_idx = 0} Any ideas? Cheers Jason _______________________________________________ sr-dev mailing list sr-dev@lists.sip-router.org <mailto:sr-dev@lists.sip-router.org> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierla http://twitter.com/#!/miconda <http://twitter.com/#%21/miconda> - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat _______________________________________________ sr-dev mailing list sr-dev@lists.sip-router.org <mailto:sr-dev@lists.sip-router.org> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierla http://twitter.com/#!/miconda <http://twitter.com/#%21/miconda> - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
Ok cool Daniel, will do.
No, not stress testing at all.... I can't reproduce it on request - but let me add the mem debugging and see how we go...
Cheers Jason
On Tue, 6 Oct 2015 at 15:50 Daniel-Constantin Mierla miconda@gmail.com wrote:
Can you try to run with '-x qm' -- maybe there is a double free or a buffer overflow, q_malloc is more suitable for debugging such cases.
Also, can you give some details on reproducing it? Are you stress testing it?
Daniel
On 06/10/15 15:41, Jason Penton wrote:
I can confirm this is not related to TLS as I have recreated using TCP only. Here is another BT (it seems the entire tcpconn structure is corrupt)....
*(gdb) bt* #0 local_timer_del (h=h@entry=0x99f3a0 <tcp_main_ltimer>, tl=tl@entry=0x7ff3c82ce910) at local_timer.c:178 #1 0x00000000005ad47c in handle_tcpconn_ev (tcpconn=0x7ff3c82ce828, ev=<optimized out>, fd_i=-1) at tcp_main.c:4139 #2 0x00000000005c3e57 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at io_wait.h:1061 #3 tcp_main_loop () at tcp_main.c:4481 #4 0x00000000004ab79f in main_loop () at main.c:1672 #5 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566 (gdb) frame 1 #1 0x00000000005ad47c in handle_tcpconn_ev (tcpconn=0x7ff3c82ce828, ev=<optimized out>, fd_i=-1) at tcp_main.c:4139 4139 tcp_main.c: No such file or directory.
*(gdb) frame 0* *(gdb) p *tl* $3 = {next = 0x616a1a160109010d, prev = 0x746e65702e6e6f73, expire = 1933602415, initial_timeout = 1701603693, data = 0x6d6f632e736d6f63, f = 0x9060d3022018230, flags = 34346, slow_idx = 34376} Looks like there is corruption....
*(gdb) frame 1* *(gdb) print *tcpconn* $11 = {s = 77, fd = -1, write_lock = {val = 14}, id = 16254464, reader_pid = -2110739197, rcv = {src_ip = {af = 60854530, len = 33685762, u = {addrl = {9667549471699894785, 363385298974377544}, addr32 = { 221250049, 2250901766, 234325576, 84607233}, addr16 = {513, 3376, 2310, 34346, 34376, 3575, 257, 1291}, addr = "\001\002\060\r\006\t*\206H\206\367\r\001\001\v\005"}}, dst_ip = {af = 825700352, len = 101265419, u = {addrl = {5140016830441870595, 312159073630948145}, addr32 = {100947203, 1196753427, 154143537, 72680198}, addr16 = {21763, 1540, 531, 18261, 2865, 2352, 774, 1109}, addr = "\003U\004\006\023\002UG1\v0\t\006\003U\004"}}, src_port = 3080, dst_port = 21762, proto_reserved1 = 806039879, proto_reserved2 = 1426261513, src_su = {s = {sa_family = 1796, sa_data = "\f\002UG1\016\060\f\006\003U\004\n\f"}, sin = {sin_family = 1796, sin_port = 524, sin_addr = {s_addr = 238110549}, sin_zero = "0\f\006\003U\004\n\f"}, sin6 = {sin6_family = 1796, sin6_port = 524, sin6_flowinfo = 238110549, sin6_addr = {__in6_u = {__u6_addr8 = "0\f\006\003U\004\n\f\005Smile0\036", __u6_addr16 = {3120, 774, 1109, 3082, 21253, 26989, 25964, 7728}, __u6_addr32 = { 50727984, 201983061, 1768772357, 506488172}}}, sin6_scope_id = 892407063}}, bind_address = 0x3135393034323830, proto = 51 '3'}, req = {next = 0x3331353930333238, buf = 0x9300b317c305a34 <error: Cannot access memory at address 0x9300b317c305a34>, start = 0x5502130604550306 <error: Cannot access memory at address 0x5502130604550306>, pos = 0x55030609300b3147 <error: Cannot access memory at address 0x55030609300b3147>, parsed = 0xe314755020c0804 <error: Cannot access memory at address 0xe314755020c0804>, body = 0xc0a045503060c30 <error: Cannot access memory at address 0xc0a045503060c30>, b_size = 1768772357, content_len = 623994220, chunk_size = 50733872, flags = 1109, bytes_to_go = 1935896604, error = 1680762467, state = 1663989349}, refcnt = {val = 1818848627}, type = 1836016485, flags = 12155, send_flags = {f = 99 'c', blst_imask = 111 'o'}, state = 808005997, extra_data = 0xf78648862a090627, timer = {next = 0x616a1a160109010d, prev = 0x746e65702e6e6f73, expire = 1933602415, initial_timeout = 1701603693, data = 0x6d6f632e736d6f63, f = 0x9060d3022018230, flags = 34346, slow_idx = 34376}, timeout = 50332929, lifetime = 983426, id_hash = 167871024, id_next = 0xab4698aaf4cd000, id_prev = 0x82f3bd212ad477e1, c_next = 0xfebfadf2e80d476b, c_prev = 0x542ad26c6fff5d01, con_aliases = {{ parent = 0x7ee7daa501f3ec2, next = 0x2564cfdb796faf0b, prev = 0x67075f982b68c22d, port = 26158, hash = 52278}, {parent = 0xa61ccf520173dbcf, next = 0xedfd914bd9290055, prev = 0x4c82e1be15c8e6bd, port = 56482, hash = 35158}, {parent = 0x1dd62e187ab759f8, next = 0x5b9483e3bd60806b, prev = 0x3c0e501ef81ca353, port = 42794, hash = 37520}, {parent = 0xb6c3f82065f2d125, next = 0xa78be2be546e1fb7, prev = 0x7a2f32985edb7215, port = 16490, hash = 49867}, {parent = 0x12bd4ff964ba4650, next = 0x7f071da522484c27, prev = 0x949f60e68638512f, port = 26169, hash = 41924}, {parent = 0x60a5d1aac5486cb5, next = 0xfa82965a57bf2ff6, prev = 0x7121974cf7f338d5, port = 64152, hash = 31481}, {parent = 0xbe418ec01629988f, next = 0x57a216f873989cb5, prev = 0x8dceabdc4ec9c171, port = 55564, hash = 38148}, { parent = 0x7ba30100010302ef, next = 0x1d55030609307930, prev = 0x62c300030020413, port = 24585, hash = 18566}, {parent = 0x704f1d161f040d01, next = 0x6547204c53536e65, prev = 0x206465746172656e, port = 25923, hash = 29810}, {parent = 0x5503061d30657461, next = 0xb4f5140416040e1d, prev = 0xb4484ad4fb6fc4a5, port = 45712, hash = 7028}, {parent = 0x1d5503061f30883e, next = 0xf714801630180423, prev = 0xf2ba7396fd05170d, port = 24267, hash = 6103}, {parent = 0x2a09060d30f186cf, next = 0xb01010df7864886, prev = 0x8f00010182030005, port = 51556, hash = 28499}}, aliases = 672183753, wbuf_q = { first = 0x5b7bf0e590790cf6, last = 0x67d25b5f6e53d679, wr_timeout = 1518084272, queued = 3564629620, offset = 996062876, last_used = 950886522}}
On Mon, 5 Oct 2015 at 15:56 Jason Penton jason.penton@gmail.com wrote:
Hey Daniel
Using TLS only. not using tcpops nor websocket
On Mon, 5 Oct 2015 at 15:54 Daniel-Constantin Mierla miconda@gmail.com wrote:
It could be some other reason, I will investigate as well. Using any modules on top of tcp (like tcpops, tls or websocket)?
Cheers, Daniel
On 05/10/15 15:47, Jason Penton wrote:
Hey Daniel,
Got it once so far... I'll take a look at that commit and see what it did - but very strange indeed.
Cheers Jason
On Mon, 5 Oct 2015 at 15:42 Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
the only recent change in the tcp core was resetting a flag to avoid infinite log messages (until connection was closed) via commit d36734d658cd0bcfc8357c7e85ca32da0612aaee .
Does it happen often, or you got it just once so far?
Cheers, Daniel
On 05/10/15 08:29, Jason Penton wrote:
Hey guys,
Not sure if there have been any changes but I have an interesting problem here when using TCP:
*The BT is as follows:*
#0 local_timer_list_expire (h=0xa0f128 <tcp_reader_ltimer+178664>, t=723807134, l=0x9e3740 <tcp_reader_ltimer>) at local_timer.c:198 #1 local_timer_expire (t=723807134, h=<optimized out>) at local_timer.c:227 #2 local_timer_run (lt=lt@entry=0x9e3740 <tcp_reader_ltimer>, saved_ticks=723807150) at local_timer.c:250 #3 0x00000000005d8417 in tcp_reader_timer_run () at tcp_read.c:1682 #4 tcp_receive_loop (unix_sock=<optimized out>) at tcp_read.c:1734 #5 0x00000000005c81c8 in tcp_init_children () at tcp_main.c:4788 #6 0x00000000004a9da3 in main_loop () at main.c:1664 #7 0x000000000042411e in main (argc=<optimized out>, argv=<optimized out>) at main.c:2566
This seems to be related to clearing timers for TCP connections. The crash is related to the following code:
*_timer_rm_list(tl)*
where it does a null pointer deref on tl->next and tl->prev, which, according to the bt, are null (see below).
*(gdb) print *tl* $14 = {next = 0x0, prev = 0x0, expire = 723807134, initial_timeout = 32, data = 0x7fbbb05aa628, f = 0x5d02f0 <tcpconn_read_timeout>, flags = 512, slow_idx = 0}
Any ideas?
Cheers Jason
sr-dev mailing listsr-dev@lists.sip-router.orghttp://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierlahttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
-- Daniel-Constantin Mierlahttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat
-- Daniel-Constantin Mierlahttp://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda Book: SIP Routing With Kamailio - http://www.asipto.com
Le Tue, 6 Oct 2015 13:57:18 +0000, Jason Penton jason.penton@gmail.com a écrit :
No, not stress testing at all.... I can't reproduce it on request - but let me add the mem debugging and see how we go...
Hi Jason,
I'm digging up this message to know if you were using the usrloc module and its close_expire_tcp when this crash occurred, as a recent issue found in this functionality (memory leak) might have prevented some expired elements to be removed from the TCP connection list.
Hey Camille,
No, this was using the IMS usrloc modules. We found some memory corruption which we fixed.
Cheers Jason On 26 Nov 2015 8:05 pm, "Camille Oudot" camille.oudot@orange.com wrote:
Le Tue, 6 Oct 2015 13:57:18 +0000, Jason Penton jason.penton@gmail.com a écrit :
No, not stress testing at all.... I can't reproduce it on request - but let me add the mem debugging and see how we go...
Hi Jason,
I'm digging up this message to know if you were using the usrloc module and its close_expire_tcp when this crash occurred, as a recent issue found in this functionality (memory leak) might have prevented some expired elements to be removed from the TCP connection list.
-- Camille