This could be a buffer overflow somewhere.
First, run 'bt full' on both core files and send the output, so we can
see whether anything looks strange inside the structures.
Then, can you compile with MEMDBG=1 in Makefile.defs, reinstall and test
again? Check the logs for memory-related error messages and see whether
you get head/tail overwritten.
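To make the "head/tail overwritten" idea concrete, here is a minimal sketch of how a debugging allocator can detect writes past either end of a block. It is a generic illustration, not the q_malloc code: the names guard_hdr, guarded_malloc, guarded_check and guarded_free, and the guard values, are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard values, chosen only for this sketch. */
#define GUARD_HEAD 0xF0F0F0F0UL
#define GUARD_TAIL 0xC0C0C0C0UL

/* Bookkeeping stored immediately before the user data. */
struct guard_hdr {
    size_t size;          /* number of bytes the caller asked for */
    unsigned long head;   /* guard word in front of the user data */
};

/* Allocate size bytes with a guard word before and after them. */
static void *guarded_malloc(size_t size)
{
    const unsigned long tail = GUARD_TAIL;
    const size_t overhead = sizeof(struct guard_hdr) + sizeof(tail);
    struct guard_hdr *h;

    if (size > SIZE_MAX - overhead)
        return NULL;                     /* total size would overflow */
    h = malloc(overhead + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    h->head = GUARD_HEAD;
    /* The tail guard may be unaligned, so copy it byte-wise. */
    memcpy((char *)(h + 1) + size, &tail, sizeof(tail));
    return h + 1;
}

/* Return 0 if both guards are intact, 1 if the head guard was
 * overwritten, 2 if the tail guard was overwritten. */
static int guarded_check(const void *ptr)
{
    const struct guard_hdr *h = (const struct guard_hdr *)ptr - 1;
    unsigned long tail;

    if (h->head != GUARD_HEAD)
        return 1;
    memcpy(&tail, (const char *)ptr + h->size, sizeof(tail));
    if (tail != GUARD_TAIL)
        return 2;
    return 0;
}

/* Release a block obtained from guarded_malloc. */
static void guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((struct guard_hdr *)ptr - 1);
}
```

Writing one byte past the requested size lands in the tail guard, so guarded_check reports 2 the next time the block is inspected. That is the kind of condition the memory-related log messages would point at.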
Cheers,
Daniel
On 8/28/13 9:11 AM, Alex Balashov wrote:
With the patch applied, I sometimes get this
crash, too:
(gdb) where
#0 0x0000000000539ed9 in qm_detach_free (qm=0x7f516c197010,
frag=0x7f516c3b6dc0) at mem/q_malloc.c:268
#1 0x000000000053a118 in qm_malloc (qm=0x7f516c197010, size=960)
at mem/q_malloc.c:386
#2 0x00000000004bc41d in rval_new_empty (extra_size=102) at rvalue.c:236
#3 0x00000000004bc48f in rval_new_str (s=0x7ffff3194e70, extra_size=80)
at rvalue.c:260
#4 0x00000000004beb87 in rval_convert (h=0x7ffff3196cb0,
msg=0x7f516b6e7920,
type=RV_STR, v=0x7f516c2e3728, c=0x7ffff3195030) at rvalue.c:1321
#5 0x00000000004c002b in rval_str_lop2 (h=0x7ffff3196cb0,
msg=0x7f516b6e7920,
res=0x7ffff31954d8, op=RVE_EQ_OP, l=0x7f516c2e3728,
c1=0x7ffff3195030,
r=0x7f516c2e3ea8, c2=0x0) at rvalue.c:1752
#6 0x00000000004c0c61 in rval_expr_eval_int (h=0x7ffff3196cb0,
msg=0x7f516b6e7920, res=0x7ffff31954d8, rve=0x7f516c2e4580)
at rvalue.c:2058
#7 0x0000000000418d5a in do_action (h=0x7ffff3196cb0, a=0x7f516c2e59d0,
msg=0x7f516b6e7920) at action.c:1050
#8 0x0000000000421aa7 in run_actions (h=0x7ffff3196cb0,
a=0x7f516c2e3580,
msg=0x7f516b6e7920) at action.c:1573
#9 0x000000000042047f in do_action (h=0x7ffff3196cb0, a=0x7f516c2e5d70,
msg=0x7f516b6e7920) at action.c:1374
#10 0x0000000000421aa7 in run_actions (h=0x7ffff3196cb0,
a=0x7f516c2daae0,
msg=0x7f516b6e7920) at action.c:1573
#11 0x0000000000418fa2 in do_action (h=0x7ffff3196cb0, a=0x7f516c2eb450,
msg=0x7f516b6e7920) at action.c:1065
#12 0x0000000000421aa7 in run_actions (h=0x7ffff3196cb0,
a=0x7f516c2eb450,
msg=0x7f516b6e7920) at action.c:1573
#13 0x0000000000418ffb in do_action (h=0x7ffff3196cb0, a=0x7f516c2eb550,
msg=0x7f516b6e7920) at action.c:1069
#14 0x0000000000421aa7 in run_actions (h=0x7ffff3196cb0,
a=0x7f516c2c4170,
msg=0x7f516b6e7920) at action.c:1573
#15 0x0000000000416f3a in do_action (h=0x7ffff3196cb0, a=0x7f516c2f5800,
msg=0x7f516b6e7920) at action.c:690
#16 0x0000000000421aa7 in run_actions (h=0x7ffff3196cb0,
a=0x7f516c2ef330,
msg=0x7f516b6e7920) at action.c:1573
#17 0x0000000000422231 in run_top_route (a=0x7f516c2ef330,
msg=0x7f516b6e7920,
c=0x0) at action.c:1658
#18 0x00007f516b49b220 in run_failure_handlers (t=0x7f506769a3b0,
rpl=0x7f516c3b5df0, code=480, extra_flags=64) at t_reply.c:1024
#19 0x00007f516b49c39b in t_should_relay_response (Trans=0x7f506769a3b0,
new_code=480, branch=0, should_store=0x7ffff3196f90,
should_relay=0x7ffff3196f94, cancel_data=0x7ffff31971a0,
reply=0x7f516c3b5df0) at t_reply.c:1300
#20 0x00007f516b49dec4 in relay_reply (t=0x7f506769a3b0,
p_msg=0x7f516c3b5df0,
branch=0, msg_status=480, cancel_data=0x7ffff31971a0,
do_put_on_wait=1)
at t_reply.c:1703
#21 0x00007f516b4a0f46 in reply_received (p_msg=0x7f516c3b5df0)
at t_reply.c:2370
#22 0x0000000000458861 in do_forward_reply (msg=0x7f516c3b5df0, mode=0)
at forward.c:799
#23 0x00000000004590d0 in forward_reply (msg=0x7f516c3b5df0) at
forward.c:882
#24 0x000000000049e276 in receive_msg (
buf=0x9065c0 "SIP/2.0 480 Temporarily Unavailable\r\nVia:
SIP/2.0/UDP 55.177.31.199;branch=z9hG4bK25c.fe28da07.0\r\nVia:
SIP/2.0/UDP
192.13.219.87:5060;branch=z9hG4bK-1cc0-521da21e-332440e0-d482ab\r\nRecord-Route:
<sip:6"..., len=862,
rcv_info=0x7ffff3197520) at receive.c:272
#25 0x000000000052ffa1 in udp_rcv_loop () at udp_server.c:557
#26 0x0000000000467de2 in main_loop () at main.c:1638
#27 0x000000000046ad8b in main (argc=13, argv=0x7ffff3197858) at
main.c:2566
I've seen it before in this scenario, but so infrequently that I
didn't think it was worth mentioning.
On 08/28/2013 03:01 AM, Alex Balashov wrote:
Hi Daniel,
With your patch applied (setting param list head to NULL), it now
crashes in a different place:
Program terminated with signal 11, Segmentation fault.
#0 0x000000000055e602 in free_to_params (tb=0x7f31fee421a0)
at parser/parse_to.c:827
827 foo = tp->next;
Missing separate debuginfos, use: debuginfo-install
cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64 glibc-2.12-1.107.el6.x86_64
keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.2.x86_64
libcom_err-1.41.12-14.el6.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64
nspr-4.9.2-1.el6.x86_64 nss-3.14.0.0-12.el6.x86_64
nss-softokn-freebl-3.12.9-11.el6.x86_64 nss-util-3.14.0.0-2.el6.x86_64
openldap-2.4.23-32.el6_4.1.x86_64 openssl-1.0.0-27.el6_4.2.x86_64
postgresql92-libs-9.2.4-1PGDG.rhel6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) where
#0 0x000000000055e602 in free_to_params (tb=0x7f31fee421a0)
at parser/parse_to.c:827
#1 0x000000000055e658 in free_to (tb=0x7f31fee421a0) at
parser/parse_to.c:838
#2 0x000000000053e2a9 in clean_hdr_field (hf=0x7f31fee23bc0)
at parser/hf.c:113
#3 0x000000000053e51d in free_hdr_field_lst (hf=0x7f31fee20a60)
at parser/hf.c:223
#4 0x0000000000542d04 in free_sip_msg (msg=0x7f31fee40df0)
at parser/msg_parser.c:729
#5 0x000000000049e39d in receive_msg (
buf=0x9065c0 "SIP/2.0 480 Temporarily Unavailable\r\nVia:
SIP/2.0/UDP 55.177.31.199;branch=z9hG4bKbe3a.dab6345.0\r\nVia:
SIP/2.0/UDP
192.13.219.87:5060;branch=z9hG4bK-1a97-521d9f57-331967d3-3174bfdc\r\nRecord-Route:
<sip"..., len=866,
rcv_info=0x7fff34138bd0) at receive.c:296
#6 0x000000000052ffa1 in udp_rcv_loop () at udp_server.c:557
#7 0x0000000000467de2 in main_loop () at main.c:1638
#8 0x000000000046ad8b in main (argc=13, argv=0x7fff34138f08) at
main.c:2566
-- Alex
On 08/27/2013 08:49 AM, Alex Balashov wrote:
Hi Daniel,
On 08/27/2013 08:47 AM, Daniel-Constantin Mierla wrote:
> Hello,
>
> can you try this patch?
> - http://git.sip-router.org/cgi-bin/gitweb.cgi/sip-router/?a=commit;h=14835f8…
>
> One reason for such a crash could be a double free, which could
> happen because the pointer to params was not reset after freeing the
> list.
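The double-free scenario described above can be sketched in a few lines of C. This is only an illustration under assumed names: struct param, struct body, add_param and free_params are hypothetical stand-ins, not the actual parser code. It shows why resetting the list head after freeing makes a second cleanup call harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the parser structures. */
struct param {
    struct param *next;
};

struct body {
    struct param *param_lst;   /* head of the parameter list */
};

/* Push a freshly allocated node onto the list; 0 on success. */
static int add_param(struct body *b)
{
    struct param *p = malloc(sizeof(*p));

    if (p == NULL)
        return -1;
    p->next = b->param_lst;
    b->param_lst = p;
    return 0;
}

/* Free every node and reset the head. Returns the number of nodes
 * freed, so a repeated call on the same body returns 0. */
static int free_params(struct body *b)
{
    struct param *p = b->param_lst;
    int freed = 0;

    while (p != NULL) {
        struct param *next = p->next;   /* save before freeing */
        free(p);
        p = next;
        freed++;
    }
    /* Without this reset the head would keep pointing at freed
     * memory, and a second call would free the nodes again. */
    b->param_lst = NULL;
    return freed;
}
```

With the reset in place, calling free_params twice on the same body is safe. Without it, the second call would read p->next from already freed memory.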
I will certainly try it, thank you.
However, it is curious that this crash occurs only in this exact
situation, only when calling this PBX, only when it has two registrants
to fork among, only when I use this combination of request
routes/subroutines.