i was sending some xmlrpc requests to kamailio and got a crash.
-- juha
(gdb) where
#0  0x00007f9c465da634 in __GI___libc_free (mem=0x17828c0) at malloc.c:2945
#1  0x00007f9c39e0b797 in collect_garbage () at xmlrpc.c:741
#2  0x00007f9c39e12669 in dispatch_rpc (msg=0x0, s1=0x17828b0 "ams", s2=0x7f9c46903628 <main_arena+8> "`\352w\001") at xmlrpc.c:2440
#3  0x000000000053c2c0 in do_action (h=0x7ffd17c562d0, a=0x7f9c46355c28, msg=0x7ffd17c563b0) at action.c:1060
#4  0x000000000053ad85 in run_actions (h=0x7f9c46903620 <main_arena>, a=0x17828b0, msg=0x7f9c46903628 <main_arena+8>) at action.c:1549
#5  0x00007f9c39e0d04b in em_receive_request (orig_msg=0x7f9c45cc43e8, new_buf=0x0, new_len=24651952) at xmlrpc.c:2303
#6  0x00007f9c39e14034 in process_xmlrpc (msg=0x7f9c45cc43e8) at xmlrpc.c:2385
#7  0x00000000004fdce3 in nonsip_msg_run_hooks (msg=0x7f9c45cc43e8) at nonsip_hooks.c:111
#8  0x0000000000549610 in receive_msg (buf=0x0, len=1171014632, rcv_info=0x7f9c3e4b0110) at receive.c:188
#9  0x00000000005cb97a in tcp_read_req (con=0x7f9c3e4b00f8, bytes_read=0x7ffd17c56e64, read_flags=0x7ffd17c56e68) at tcp_read.c:1418
#10 0x00000000005d0293 in handle_io (fm=0x7f9c3e4b00f8, events=10416, idx=-1) at tcp_read.c:1611
#11 0x00000000005d6bce in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at io_wait.h:1061
#12 tcp_receive_loop (unix_sock=1183856160) at tcp_read.c:1781
#13 0x00000000004d9df7 in tcp_init_children () at tcp_main.c:4788
#14 0x000000000050736f in main_loop () at main.c:1690
#15 0x000000000041cdbc in main (argc=0, argv=0x0) at main.c:2616
Hello,
do you get any error messages in the logs or an error code in the xmlrpc reply?
In frame 2 the msg structure is NULL; it should be the associated SIP/HTTP request.
Cheers, Daniel
Daniel-Constantin Mierla writes:
do you get any error messages in the logs or an error code in the xmlrpc reply?
no error messages. the crash does not happen immediately when the xmlrpc request is served. in the example below, the crash came about 20 seconds later, perhaps as a result of some garbage collection activity.
kamailio is listening for xmlrpc requests on port 6060 and one gets served:
#####
T 2016/05/02 21:54:19.117214 127.0.0.1:46250 -> 127.0.0.1:6060 [AP]
POST /RPC2 HTTP/1.1.
User-Agent: XMLRPC++ 0.8.
Host: 127.0.0.1:6060.
Content-Type: text/xml.
Content-length: 143.
.
<?xml version="1.0"?>.
<methodCall><methodName>htable.reload</methodName>.
<params><param><value>htable</value></param></params></methodCall>.
##
T 2016/05/02 21:54:19.118067 127.0.0.1:6060 -> 127.0.0.1:46250 [AP]
HTTP/1.1 200 OK.
Sia: SIP/2.0/TCP 127.0.0.1:46250.
Server: OpenXg SIP Proxy (4.4.0-1 (x86_64/linux)).
Content-Length: 108.
.
<?xml version="1.0"?>
<methodResponse>
<params>
<param>
<value></value>
</param>
</params>
</methodResponse>
#####
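For reference, serving xmlrpc over HTTP like this takes roughly the config wiring below (a minimal sketch only -- the actual config was not posted, so the route name and the url_match value are assumptions):

listen=tcp:127.0.0.1:6060

loadmodule "xmlrpc.so"

# run route[XMLRPC] for incoming non-SIP (HTTP/XMLRPC) requests
modparam("xmlrpc", "route", "XMLRPC")
modparam("xmlrpc", "url_match", "^/RPC2")

route[XMLRPC] {
    # hand the request over to the RPC core
    # (dispatch_rpc() is frame #2 in the backtrace above)
    dispatch_rpc();
}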
then the crash comes:
May 2 21:54:51 lohi kernel: [100671.026931] sip-proxy[1628]: segfault at 2213008 ip 00007f5debb87b45 sp 00007ffe98f51cf0 error 4 in libc-2.19.so[7f5debb0f000+1a2000]
May 2 21:54:51 lohi kernel: [100671.111115] sip-proxy[1643]: segfault at 2213008 ip 00007f5debb87b45 sp 00007ffe98f54660 error 4 in libc-2.19.so[7f5debb0f000+1a2000]
May 2 21:54:51 lohi kernel: [100671.114042] sip-proxy[1638]: segfault at 2213008 ip 00007f5debb87b45 sp 00007ffe98f54660 error 4 in libc-2.19.so[7f5debb0f000+1a2000]
May 2 21:54:51 lohi /usr/bin/sip-proxy[1649]: CRITICAL: <core> [pass_fd.c:275]: receive_fd(): EOF on 39
May 2 21:54:51 lohi /usr/bin/sip-proxy[1563]: ALERT: <core> [main.c:739]: handle_sigs(): child process 1628 exited by a signal 11
May 2 21:54:51 lohi /usr/bin/sip-proxy[1563]: ALERT: <core> [main.c:742]: handle_sigs(): core was generated
three cores are generated:
47508 -rw------- 1 root root 60002304 May 2 21:54 core.sip-proxy.sig11.1628
47508 -rw------- 1 root root 60002304 May 2 21:54 core.sip-proxy.sig11.1638
47508 -rw------- 1 root root 60002304 May 2 21:54 core.sip-proxy.sig11.1643
first one:
(gdb) where
#0  malloc_consolidate (av=av@entry=0x7f5debeb4620 <main_arena>) at malloc.c:4151
#1  0x00007f5debb88e98 in _int_malloc (av=0x7f5debeb4620 <main_arena>, bytes=8160) at malloc.c:3423
#2  0x00007f5debb8b020 in __GI___libc_malloc (bytes=8160) at malloc.c:2891
#3  0x00007f5de9f5fd14 in my_malloc () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#4  0x00007f5de9f5b6ac in alloc_root () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#5  0x00007f5de9f3fc5c in cli_read_rows () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#6  0x00007f5de9f40b75 in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#7  0x00007f5de9f4167e in mysql_real_query () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.18
#8  0x00007f5dea452705 in db_mysql_submit_query (_h=0x7f5deaf2f940, _s=0x7ffe98f520e0) at km_dbase.c:116
#9  0x00007f5dea89fb94 in db_do_submit_query (submit_query=<optimized out>, _query=<optimized out>, _h=<optimized out>) at db_query.c:53
#10 db_do_raw_query (_h=0x7f5deaf2f940, _s=0x7ffe98f520e0, _r=0x7ffe98f52090, submit_query=0x7f5debeb4678 <main_arena+88>, store_result=0x7f5dea453cd0 <db_mysql_store_result>) at db_query.c:180
#11 0x00007f5dde409f5c in sql_do_query (con=0x21f4080, query=0x7ffe98f520e0, res=0x7f5deb6964e8) at sql_api.c:262
#12 0x00007f5dde41554e in sql_query (msg=0x21f4080, dbl=0x7f5deaf5dbe0 "\030\333\365\352]\177", query=0x21f4080 "", res=0x7f5deb6964e8 "\245\246=|") at sqlops.c:210
#13 0x000000000053cf2a in do_action (h=0x7ffe98f52be0, a=0x7f5deb7ca0d8, msg=0x7f5deb2753e8) at action.c:1072
#14 0x000000000053ad85 in run_actions (h=0x21f4080, a=0x21f4080, msg=0x71) at action.c:1549
#15 0x000000000053c9b0 in do_action (h=0x7ffe98f52be0, a=0x7f5deb7c09e8, msg=0x7f5deb2753e8) at action.c:678
#16 0x000000000053ad85 in run_actions (h=0x21f4080, h@entry=0x7ffe98f52be0, a=0x21f4080, msg=0x71) at action.c:1549
#17 0x0000000000547757 in run_actions_safe (h=0x7ffe98f54aa0, a=<optimized out>, msg=<optimized out>) at action.c:1614
#18 0x0000000000424720 in rval_get_int (h=0x7ffe98f54aa0, msg=0x21f4080, i=0x7ffe98f52ee4, rv=0x7f5debeb4678 <main_arena+88>, cache=0x7f5debeb4628 <main_arena+8>) at rvalue.c:912
#19 0x000000000042a4ac in rval_expr_eval_int (h=0x7ffe98f54aa0, msg=0x7f5deb2753e8, res=0x7ffe98f52ee4, rve=0x7f5deb0f97f0) at rvalue.c:1910
#20 0x000000000042a5a6 in rval_expr_eval_int (h=0x7ffe98f54aa0, msg=0x7f5deb2753e8, res=0x7ffe98f53060, rve=0x7f5deb0f9f00) at rvalue.c:1918
#21 0x000000000053c797 in do_action (h=0x7ffe98f54aa0, a=0x7f5deb1006b8, msg=0x7f5deb2753e8) at action.c:1030
#22 0x000000000053ad85 in run_actions (h=0x21f4080, a=0x21f4080, msg=0x71) at action.c:1549
#23 0x000000000053c9b0 in do_action (h=0x7ffe98f54aa0, a=0x7f5deb7c09e8, msg=0x7f5deb2753e8) at action.c:678
#24 0x000000000053ad85 in run_actions (h=0x21f4080, a=0x21f4080, msg=0x71) at action.c:1549
#25 0x000000000053c7e8 in do_action (h=0x7ffe98f54aa0, a=0x7f5deaf79008, msg=0x7f5deb2753e8) at action.c:1049
#26 0x000000000053ad85 in run_actions (h=0x21f4080, a=0x21f4080, msg=0x71) at action.c:1549
#27 0x000000000053c9b0 in do_action (h=0x7ffe98f54aa0, a=0x7f5deb7c09e8, msg=0x7f5deb2753e8) at action.c:678
#28 0x000000000053ad85 in run_actions (h=0x21f4080, a=0x21f4080, msg=0x71) at action.c:1549
#29 0x000000000053c7e8 in do_action (h=0x7ffe98f54aa0, a=0x7f5deaf77f90, msg=0x7f5deb2753e8) at action.c:1049
#30 0x000000000053ad85 in run_actions (h=0x21f4080, h@entry=0x7ffe98f54aa0, a=0x21f4080, a@entry=0x7f5deaf61b08, msg=0x71, msg@entry=0x7f5deb2753e8) at action.c:1549
#31 0x00000000005477f5 in run_top_route (a=0x7f5deaf61b08, msg=0x7f5deb2753e8, c=<optimized out>) at action.c:1635
#32 0x000000000054968e in receive_msg (buf=0x0, len=3945223144, rcv_info=0x7f5de3a2e358) at receive.c:239
#33 0x00000000005cb97a in tcp_read_req (con=0x7f5de3a2e340, bytes_read=0x7ffe98f54e24, read_flags=0x7ffe98f54e28) at tcp_read.c:1418
#34 0x00000000005d0293 in handle_io (fm=0x7f5de3a2e340, events=16512, idx=-1) at tcp_read.c:1611
#35 0x00000000005d6bce in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at io_wait.h:1061
#36 tcp_receive_loop (unix_sock=35602560) at tcp_read.c:1781
#37 0x00000000004d9df7 in tcp_init_children () at tcp_main.c:4788
#38 0x000000000050736f in main_loop () at main.c:1690
#39 0x000000000041cdbc in main (argc=0, argv=0x0) at main.c:2616
second one:
(gdb) where
#0  malloc_consolidate (av=av@entry=0x7f5debeb4620 <main_arena>) at malloc.c:4151
#1  0x00007f5debb88e98 in _int_malloc (av=av@entry=0x7f5debeb4620 <main_arena>, bytes=bytes@entry=8192) at malloc.c:3423
#2  0x00007f5debb8b9dc in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3219
#3  0x00007f5debb80d43 in __GI_open_memstream (bufloc=bufloc@entry=0x7ffe98f547d0, sizeloc=sizeloc@entry=0x7ffe98f547d8) at memstream.c:85
#4  0x00007f5debbf307d in __GI___vsyslog_chk (pri=134, flag=flag@entry=-1, fmt=0x6d9c00 "%s: %s%s(): signal %d received\n", ap=ap@entry=0x7ffe98f548a8) at ../misc/syslog.c:167
#5  0x00007f5debbf35bf in __syslog (pri=<optimized out>, fmt=<optimized out>) at ../misc/syslog.c:117
#6  0x0000000000502f49 in sig_usr (signo=35602560) at main.c:809
#7  <signal handler called>
#8  0x00007f5debbf7e53 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#9  0x00000000005d6b81 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at io_wait.h:1030
#10 tcp_receive_loop (unix_sock=14) at tcp_read.c:1781
#11 0x00000000004d9df7 in tcp_init_children () at tcp_main.c:4788
#12 0x000000000050736f in main_loop () at main.c:1690
#13 0x000000000041cdbc in main (argc=0, argv=0x0) at main.c:2616
third one:
(gdb) where
#0  malloc_consolidate (av=av@entry=0x7f5debeb4620 <main_arena>) at malloc.c:4151
#1  0x00007f5debb88e98 in _int_malloc (av=av@entry=0x7f5debeb4620 <main_arena>, bytes=bytes@entry=8192) at malloc.c:3423
#2  0x00007f5debb8b9dc in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3219
#3  0x00007f5debb80d43 in __GI_open_memstream (bufloc=bufloc@entry=0x7ffe98f547d0, sizeloc=sizeloc@entry=0x7ffe98f547d8) at memstream.c:85
#4  0x00007f5debbf307d in __GI___vsyslog_chk (pri=134, flag=flag@entry=-1, fmt=0x6d9c00 "%s: %s%s(): signal %d received\n", ap=ap@entry=0x7ffe98f548a8) at ../misc/syslog.c:167
#5  0x00007f5debbf35bf in __syslog (pri=<optimized out>, fmt=<optimized out>) at ../misc/syslog.c:117
#6  0x0000000000502f49 in sig_usr (signo=35602560) at main.c:809
#7  <signal handler called>
#8  0x00007f5debbf7e53 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#9  0x00000000005d6b81 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at io_wait.h:1030
#10 tcp_receive_loop (unix_sock=14) at tcp_read.c:1781
#11 0x00000000004d9df7 in tcp_init_children () at tcp_main.c:4788
#12 0x000000000050736f in main_loop () at main.c:1690
#13 0x000000000041cdbc in main (argc=0, argv=0x0) at main.c:2616
-- juha
These ones are no longer related to xmlrpc, but triggered by an sqlops query -- that is the first backtrace; the other two are during signal handling.
The sqlops one leads quite deep inside libmysqlclient, which can happen either because a buffer overflow in kamailio corrupted operations in system memory, or because of an issue in libmysqlclient itself.
To investigate the first option -- does it always happen after an xmlrpc command? If yes, is it any kind of command or a specific xmlrpc command (I see you used htable.reload)?
Cheers, Daniel
Daniel-Constantin Mierla writes:
These ones are no longer related to xmlrpc, but triggered by an sqlops query -- that is the first backtrace; the other two are during signal handling.
OK, but there must be some kind of relation to xmlrpc, because I have had this same test system running without any problems for months, and now I have suddenly started to experience crashes, which always happen after xmlrpc queries.
The sqlops one leads quite deep inside libmysqlclient, which can happen either because a buffer overflow in kamailio corrupted operations in system memory, or because of an issue in libmysqlclient itself.
To investigate the first option -- does it always happen after an xmlrpc command? If yes, is it any kind of command or a specific xmlrpc command (I see you used htable.reload)?
Yes. I made another test: I sent three xmlrpc queries (htable.reload) and got a crash after the third. Why after the third could be explained by this sql_query that gets executed once per minute:
route [cleaner] {
    # Clean old statistics from database
    sql_query_async("sip_proxy_usage",
        "DELETE FROM statistics WHERE (unix_timestamp() - time_stamp) > 86400");
}
So the xmlrpc query or queries did something, and that something caused a subsequent sql_query to crash.
-- Juha
What is the version of kamailio running there?
xmlrpc is compiled by default to use the system memory manager (expecting to need big chunks for some responses); can you recompile it to use the pkg one? You have to edit modules/xmlrpc/Makefile and remove the line:
DEFS+=-DXMLRPC_SYSTEM_MALLOC
If there is a buffer overflow, it should be easier to catch using the internal memory manager checks -- some error messages should be printed in the syslog as well.
Cheers, Daniel
Daniel-Constantin Mierla writes:
What is the version of kamailio running there?
4.4, built from git on April 7.
xmlrpc is compiled by default to use the system memory manager (expecting to need big chunks for some responses); can you recompile it to use the pkg one? You have to edit modules/xmlrpc/Makefile and remove the line:
DEFS+=-DXMLRPC_SYSTEM_MALLOC
If there is a buffer overflow, it should be easier to catch using the internal memory manager checks -- some error messages should be printed in the syslog as well.
I'll give it a try.
-- Juha
Thinking more about it -- is the route[cleaner] executed with rtimer on a dedicated timer process?
If yes, then the private memory operations should not conflict between the rtimer process and the sip worker process (the one that handled the xmlrpc)... Even if there were a buffer overflow, only the same process would be affected.
Cheers, Daniel
Daniel-Constantin Mierla writes:
Thinking more about it -- is the route[cleaner] executed with rtimer on a dedicated timer process?
I have two timer routes:
modparam("rtimer", "timer", "name=stat_timer;interval=60;mode=1")
modparam("rtimer", "exec", "timer=stat_timer;route=stats")
modparam("rtimer", "timer", "name=clean_timer;interval=3600")
modparam("rtimer", "exec", "timer=clean_timer;route=cleaner")
and I remembered incorrectly: it is the stat_timer route that is executed once per minute, and it does two mysql selects and one insert.
i rebuilt the latest 4.4 kamailio where I have
# use system malloc
#DEFS+=-DXMLRPC_SYSTEM_MALLOC
in xmlrpc/Makefile.
Again I got a crash. Syslog has:
May 3 11:31:22 lohi /usr/bin/sip-proxy[7234]: CRITICAL: <core> [pass_fd.c:275]: receive_fd(): EOF on 39
May 3 11:31:22 lohi /usr/bin/sip-proxy[7152]: ALERT: <core> [main.c:739]: handle_sigs(): child process 7219 exited by a signal 6
May 3 11:31:22 lohi /usr/bin/sip-proxy[7152]: ALERT: <core> [main.c:742]: handle_sigs(): core was generated
May 3 11:31:22 lohi /usr/bin/sip-proxy[7152]: INFO: <core> [main.c:754]: handle_sigs(): terminating due to SIGCHLD
May 3 11:31:22 lohi /usr/bin/sip-proxy[7233]: INFO: <core> [main.c:809]: sig_usr(): signal 15 received
May 3 11:31:22 lohi kernel: [115953.517429] sip-proxy[7232]: segfault at 2987008 ip 00007fb666045b45 sp 00007ffe0eea3c60 error 4 in libc-2.19.so[7fb665fcd000+1a2000]
May 3 11:31:22 lohi /usr/bin/sip-proxy[7229]: INFO: <core> [main.c:809]: sig_usr(): signal 15 received
...
This one produced two core files:
47508 -rw------- 1 root root 60002304 May 3 11:31 core.sip-proxy.sig11.7232
47516 -rw------- 1 root root 60006400 May 3 11:31 core.sip-proxy.sig6.7219
gdbs are below.
-- juha
# gdb /usr/bin/sip-proxy /var/cores/core.sip-proxy.sig11.7232
...
(gdb) where
#0  malloc_consolidate (av=av@entry=0x7fb666372620 <main_arena>) at malloc.c:4151
#1  0x00007fb666046e98 in _int_malloc (av=av@entry=0x7fb666372620 <main_arena>, bytes=bytes@entry=8192) at malloc.c:3423
#2  0x00007fb6660499dc in __libc_calloc (n=<optimized out>, elem_size=<optimized out>) at malloc.c:3219
#3  0x00007fb66603ed43 in __GI_open_memstream (bufloc=bufloc@entry=0x7ffe0eea3dd0, sizeloc=sizeloc@entry=0x7ffe0eea3dd8) at memstream.c:85
#4  0x00007fb6660b107d in __GI___vsyslog_chk (pri=134, flag=flag@entry=-1, fmt=0x6d9c80 "%s: %s%s(): signal %d received\n", ap=ap@entry=0x7ffe0eea3ea8) at ../misc/syslog.c:167
#5  0x00007fb6660b15bf in __syslog (pri=<optimized out>, fmt=<optimized out>) at ../misc/syslog.c:117
#6  0x0000000000502f39 in sig_usr (signo=1714890272) at main.c:809
#7  <signal handler called>
#8  0x00007fb6660b5e53 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#9  0x00000000005d6c01 in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at io_wait.h:1030
#10 tcp_receive_loop (unix_sock=14) at tcp_read.c:1781
#11 0x00000000004d9df7 in tcp_init_children () at tcp_main.c:4788
#12 0x000000000050735f in main_loop () at main.c:1690
#13 0x000000000041cdbc in main (argc=0, argv=0x0) at main.c:2616
(gdb) quit
# gdb /usr/bin/sip-proxy /var/cores/core.sip-proxy.sig6.7219
...
(gdb) where
#0  0x00007fb666002067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fb666003448 in __GI_abort () at abort.c:89
#2  0x00007fb6660401b4 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7fb666135530 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007fb66604598e in malloc_printerr (action=1, str=0x7fb6661358d0 "malloc(): memory corruption (fast)", ptr=<optimized out>) at malloc.c:4996
#4  0x00007fb666047d2b in _int_malloc (av=0x7fb666372620 <main_arena>, bytes=4) at malloc.c:3359
#5  0x00007fb666049020 in __GI___libc_malloc (bytes=4) at malloc.c:2891
#6  0x00007fb6660926c5 in re_node_set_init_copy (dest=0x296a2c8, src=0x7ffe0eea2ba0) at regex_internal.c:1031
#7  0x00007fb6660929c3 in create_cd_newstate (hash=2, context=0, nodes=0x7ffe0eea2ba0, dfa=0x292acb0) at regex_internal.c:1673
#8  re_acquire_state_context (err=err@entry=0x7ffe0eea2b80, dfa=dfa@entry=0x292acb0, nodes=nodes@entry=0x7ffe0eea2ba0, context=context@entry=0) at regex_internal.c:1545
#9  0x00007fb665fee28b in build_trtable (dfa=<optimized out>, state=state@entry=0x292be30) at regexec.c:3439
#10 0x00007fb666098500 in transit_state (state=0x292be30, mctx=0x7ffe0eea2d30, err=0x7ffe0eea2cf8) at regexec.c:2322
#11 check_matching (p_match_first=0x7ffe0eea2cf4, fl_longest_match=1, mctx=0x7ffe0eea2d30) at regexec.c:1173
#12 re_search_internal (preg=preg@entry=0x7fb665e563a8, string=string@entry=0x2953558 "412", length=3, start=<optimized out>, start@entry=0, range=3, stop=<optimized out>, nmatch=<optimized out>, pmatch=0x7ffe0eea2ee0, eflags=0) at regexec.c:844
#13 0x00007fb66609ce65 in __regexec (preg=0x7fb665e563a8, string=0x2953558 "412", nmatch=<optimized out>, pmatch=0x7ffe0eea2ee0, eflags=<optimized out>) at regexec.c:252
#14 0x00007fb6632f0234 in t_check_status (msg=0x7fb6657333e8, p1=0x7fb665e56328 "pY\247e\266\177", foo=0x6 <error: Cannot access memory at address 0x6>) at tm.c:947
#15 0x000000000053c340 in do_action (h=0x7ffe0eea34a0, a=0x7fb665a75010, msg=0x7fb6657333e8) at action.c:1060
#16 0x000000000053ae05 in run_actions (h=0x1c33, h@entry=0x7ffe0eea34a0, a=0x1c33, msg=0x6) at action.c:1549
#17 0x00000000005477d7 in run_actions_safe (h=0x7ffe0eea3ec0, a=<optimized out>, msg=<optimized out>) at action.c:1614
#18 0x0000000000424720 in rval_get_int (h=0x7ffe0eea3ec0, msg=0x1c33, i=0x7ffe0eea37d0, rv=0x7fb666002067 <__GI_raise+55>, cache=0x3134323336393230) at rvalue.c:912
#19 0x000000000042a4ac in rval_expr_eval_int (h=0x7ffe0eea3ec0, msg=0x7fb6657333e8, res=0x7ffe0eea37d0, rve=0x7fb665a75208) at rvalue.c:1910
#20 0x000000000053c817 in do_action (h=0x7ffe0eea3ec0, a=0x7fb665a789c8, msg=0x7fb6657333e8) at action.c:1030
#21 0x000000000053ae05 in run_actions (h=0x1c33, h@entry=0x7ffe0eea3ec0, a=0x1c33, a@entry=0x7fb665a745f0, msg=0x6, msg@entry=0x7fb6657333e8) at action.c:1549
#22 0x0000000000547875 in run_top_route (a=0x7fb665a745f0, msg=0x7fb6657333e8, c=<optimized out>) at action.c:1635
#23 0x00007fb6632b7957 in reply_received (p_msg=0x7fb6657333e8) at t_reply.c:2323
#24 0x00000000004fbfe6 in do_forward_reply (msg=0x7fb6657333e8, mode=0) at forward.c:747
#25 0x0000000000549673 in receive_msg (buf=0x0, len=1702048744, rcv_info=0x7fb665770f50) at receive.c:298
#26 0x00000000005cb9fa in tcp_read_req (con=0x7fb65def8fd8, bytes_read=0x7ffe0eea4404, read_flags=0x7ffe0eea4408) at tcp_read.c:1418
#27 0x00000000005d0313 in handle_io (fm=0x7fb65def8fd8, events=7219, idx=-1) at tcp_read.c:1611
#28 0x00000000005d6c4e in io_wait_loop_epoll (h=<optimized out>, t=<optimized out>, repeat=<optimized out>) at io_wait.h:1061
#29 tcp_receive_loop (unix_sock=7219) at tcp_read.c:1781
#30 0x00000000004d9df7 in tcp_init_children () at tcp_main.c:4788
#31 0x000000000050735f in main_loop () at main.c:1690
#32 0x000000000041cdbc in main (argc=0, argv=0x0) at main.c:2616
(gdb)
The issue is still in system memory, even though the xmlrpc module was compiled with the pkg malloc manager.
The crash this time is in regexp functions from libc.
It could be the operations done with libxml2 inside the xmlrpc module, because they use the native libxml2 functions, which rely on the system malloc.
I checked the xmlrpc module commit log and the last code changes were in September 2014 -- and even those are not related to any xml ops. In the past, have you used the xmlrpc commands in other similar deployments with a kamailio version older than 4.4?
Do you remember if there was any OS update bringing in a new version of libxml2? No ops other than the libxml2 ones should use system memory in the xmlrpc module...
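As a side note on why libxml2 ends up in system memory: it routes all allocations through pluggable hooks that default to the libc malloc family, so corruption in those buffers surfaces in libc's heap, as in the backtraces above. A minimal sketch of that API (purely illustrative; nothing in this thread suggests kamailio overrides the hooks):

#include <libxml/xmlmemory.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* the hooks default to the system free/malloc/realloc/strdup;
     * an embedding application could redirect them like this */
    xmlMemSetup(free, malloc, realloc, strdup);
    return 0;
}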
Cheers, Daniel
Daniel-Constantin Mierla writes:
I checked the xmlrpc module commit log and the last code changes were in September 2014 -- and even those are not related to any xml ops. In the past, have you used the xmlrpc commands in other similar deployments with a kamailio version older than 4.4?
I have issued kamctl functions over xmlrpc for a long time and have never had issues before.
In the test where kamailio is crashing, the xmlrpc command is issued from sems (which I had not tried earlier). I compared this test to my normal xmlrpc usage and found a difference.
Sems sends this kind of xmlrpc command:
T 2016/05/03 13:28:20.001689 127.0.0.1:48615 -> 127.0.0.1:6060 [AP]
POST /RPC2 HTTP/1.1.
User-Agent: XMLRPC++ 0.8.
Host: 127.0.0.1:6060.
Content-Type: text/xml.
Content-length: 143.
.
<?xml version="1.0"?>.
<methodCall><methodName>htable.reload</methodName>.
<params><param><value>htable</value></param></params></methodCall>.
whereas the same command from my normal source (web server) looks like this:
T 2016/05/03 13:27:48.857234 127.0.0.1:48613 -> 127.0.0.1:6060 [AP]
POST /RPC2 HTTP/1.0.
Host: 127.0.0.1.
Connection: close.
User-Agent: OpenSIPg XML_RPC Client.
Content-Type: text/xml.
Content-Length: 194.
.
## T 2016/05/03 13:27:48.857263 127.0.0.1:48613 -> 127.0.0.1:6060 [AP]
<?xml version="1.0" ?>.
<methodCall>.
.<methodName>htable.reload</methodName>.
.<params>.
..<param>.
...<value>.
....<string>htable</string>.
...</value>.
..</param>.
.</params>.
</methodCall>.
i see two differences:
1) sems does not include a "Connection: close" header in the requests
2) sems does not wrap the param value inside <string></string>
perhaps the latter confuses the kamailio xmlrpc request parser and causes the crash.
i made several tests generating xmlrpc requests from my web server and didn't get the crash. then i made one xmlrpc request from sems and got a crash immediately when i made a call.
-- juha
a string parameter value without <string></string> tags is legal in xmlrpc:
Scalar <value>s
<value>s can be scalars, type is indicated by nesting the value inside one of the tags listed in this table:
Tag                  Type                                            Example
<i4> or <int>        four-byte signed integer                        -12
<boolean>            0 (false) or 1 (true)                           1
<string>             string                                          hello world
<double>             double-precision signed floating point number   -12.214
<dateTime.iso8601>   date/time                                       19980717T14:08:55
<base64>             base64-encoded binary                           eW91IGNhbid0IHJlYWQgdGhpcyE=
If no type is indicated, the type is string.
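so the two requests captured above carry the same parameter; per the spec, these two forms are equivalent:

<!-- untyped value, as sent by sems: the type defaults to string -->
<value>htable</value>

<!-- explicitly typed value, as sent by the web server client -->
<value><string>htable</string></value>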
-- juha
in my sems xmlrpc test, i changed the htable.reload command to permissions.trustedReload (which does not have parameters) and have not been able to produce a crash.
my conclusion is that an xmlrpc string parameter value without <string></string> is causing the crash (perhaps due to a memory allocation bug either in kamailio or in the xml lib it is using).
-- juha
Thanks for digging further; the details proved to be important.
And I think I found the issue -- when the type was not provided, the pointer inside the xmlNode structure was used directly, instead of calling the function that returns a clone of the content, which can be freed later via the xmlrpc module's garbage collector.
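In libxml2 terms, the difference is roughly the following (a sketch for illustration only, not the actual patch; the helper name is made up):

#include <libxml/tree.h>

/* illustrative helper: extract the string value of an untyped
 * <value> node */
static char *get_untyped_string_value(xmlNodePtr value_node)
{
    /* buggy pattern: value_node->content points into the parsed
     * document and is owned by libxml2; if the module's garbage
     * collector later free()s it, the heap gets corrupted --
     * matching the malloc_consolidate()/_int_malloc() crashes
     * earlier in the thread */
    /* return (char*)value_node->content; */

    /* fixed pattern: xmlNodeGetContent() returns a freshly
     * allocated copy, which can safely be handed to the garbage
     * collector and freed once the request is done */
    return (char*)xmlNodeGetContent(value_node);
}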
I couldn't test it -- if you can test the commit:
https://github.com/kamailio/kamailio/commit/6b23c27d8406780083a6b7bb774efa2a...
and all works fine, then you can backport to stable branches.
Cheers, Daniel
Daniel-Constantin Mierla writes:
And I think I found the issue -- when the type was not provided, the pointer inside the xmlNode structure was used directly, instead of calling the function that returns a clone of the content, which can be freed later via the xmlrpc module's garbage collector.
I couldn't test it -- if you can test the commit:
https://github.com/kamailio/kamailio/commit/6b23c27d8406780083a6b7bb774efa2a...
and all works fine, then you can backport to stable branches.
Thanks, I tested and can no longer reproduce the crash. I'll do the backporting.
-- Juha
OK, thanks, Daniel