[sr-dev] SER crash : Segmentation fault

inge inge at legos.fr
Thu Aug 20 10:40:34 CEST 2009


Hi Andrei,

As I understand it, this changelog only applies to the tm module.
Are there any clues that this module caused the crash we experienced?

We would like to determine which of the known, already fixed bugs could
have caused the crash, in order to find a short-term workaround that gives
us some time to deploy an upgrade to the latest release on the 0.9.0
branch.

Adrien

On Tuesday 18 August 2009 at 09:00 +0200, Andrei Pelinescu-Onciul wrote:
> On Aug 17, 2009 at 14:42, inge <inge at legos.fr> wrote:
> > Hi Andrei,
> > 
> > Hope you are fine.
> > Do you have any update on our crash?
> > Is there anything we can do ourselves to find the cause of the
> > segmentation fault, for example by matching it against a well-known bug,
> > without bothering you?
> 
> 
> There are lots of changes between 0.9.5-pre and the latest 0.9.x
> version.
> You should try updating to the latest code on the rel_0_9_0 branch and
> see if you run into this problem again.
> To get the latest 0.9.x code either get the latest snapshot from
>  http://ftp.iptel.org/pub/ser/daily-snapshots/stable/ , use cvs to
>  get the rel_0_9_0 branch
>  (CVSROOT=:pserver:anonymous@cvs.berlios.de:/cvsroot/ser ;
>  export CVSROOT ; cvs co -r rel_0_9_0 sip_router ), or use git and the
>  ser repository (see http://sip-router.org/wiki/git/ser-repository).
> 
> Here's a short changelog for tm, between 0.9.5 and 0.9.7+
>  (git log --oneline v_0_9_5..origin/rel_0_9_0 modules/tm):
> - tm: fix delete_cell() when the transaction is referenced
> - variable timer fix: variable timers (avps) won't be exteneded anymore 
> - fix for free_rdata_list() which used to access the "next" pointer af
> - deadlock when t_relay-ing a message from the failure_route fixed  (e2e
> - added sems specific patch. This patch is present in the ser version ship
> - added diversion and rpid header cloning
> -bug fix: tm insert_timer used to eat too much cpu, decreasing dramatic
> - fixed misplaced set_avp list, courtesy of cesc.santa at gmail.com
> - int2reverse_hex/reverse_hex2int fixes  (tm with large "labels" was aff
> - fix of local ACK matching provided by cesc.santa at gmail.com
> - avp race condition fix (backported from HEAD)
> - CANCEL terminates retransmission timers properly (backported)
> 
> 
> Andrei
> 
> 
> > 
> > > On Friday 14 August 2009 at 17:03 +0200, inge wrote:
> > > > Please find the requested information attached.
> > > 
> > > I'm aware of the need for an update. It's in the list of tasks to be
> > > done, however, the priority is to troubleshoot the problem and maybe
> > > find a workaround.
> > > 
> > > Regards,
> > > 
> > > Adrien
> > > 
> > > > On Friday 14 August 2009 at 16:34 +0200, Andrei Pelinescu-Onciul wrote:
> > > > On Aug 14, 2009 at 15:01, inge <inge at legos.fr> wrote:
> > > > > Hi Andrei,
> > > > > 
> > > > > Thanks for your reply.
> > > > > 
> > > > > I use ser 0.9.5-pre4. 
> > > > > 
> > > > > > I don't really understand the bug you have identified. Where can I
> > > > > > find a description?
> > > > 
> > > > Sorry, I was wrong (that bug was in RR and appears only in newer code).
> > > > 
> > > > > Could you run gdb on the core again, type "frame 0" and then send me the
> > > > > output of the following commands:
> > > > 
> > > > print p_cell
> > > > print p_msg
> > > > print p_msg->buf
> > > > print p_cell->uas.local_totag.len
> > > > print p_cell->uas.local_totag.s
> > > > print p_msg->to
> > > > print p_msg->to->parsed
> > > > print *((struct to_body*)(p_msg->to->parsed))
> > > > print ((struct to_body*)(p_msg->to->parsed))->tag_value.len
> > > > print ((struct to_body*)(p_msg->to->parsed))->tag_value.s
> > > > 
> > > > 
> > > > Andrei
> > > > > P.S.: you could also try upgrading to ser 2.0, 2.1 or sip-router.
> > > > 
> > > > 
> > > > > 
> > > > > Regards,
> > > > > 
> > > > > Adrien
> > > > > 
> > > > > > On Friday 14 August 2009 at 14:45 +0200, Andrei Pelinescu-Onciul wrote:
> > > > > > On Aug 13, 2009 at 15:32, inge <inge at legos.fr> wrote:
> > > > > > > Hi Klaus,
> > > > > > > 
> > > > > > > Thanks.
> > > > > > > 
> > > > > > > > I have attached the gdb output.
> > > > > > > > 
> > > > > > > > I hope someone can decipher this. Thank you.
> > > > > > 
> > > > > > 
> > > > > > If you are using ser 2.1/latest cvs or sip-router then just update to
> > > > > > the latest cvs or git. It's a known fixed bug (sip router
> > > > > > git 6fcd5e or ser 2.1 commit starting with "rr: fix from header
> > > > > > access").
> > > > > > 
> > > > > > If you are using another version then tell me which one (ser -V) 
> > > > > > and I'll fix it.
> > > > > > 
> > > > > > Andrei
> > > > > > 
> > > > > > > 
> > > > > > > > On Thursday 13 August 2009 at 13:53 +0200, Klaus Darilion wrote:
> > > > > > > > locate the core file (either in the working dir or /tmp or /)
> > > > > > > > then execute:
> > > > > > > > 
> > > > > > > > gdb /usr/local/sbin/ser /path/to/core
> > > > > > > > (gdb) bt
> > > > > > > > 
> > > > > > > > regards
> > > > > > > > klaus
> > > > > > > > 
> > > > > > > > > inge wrote:
> > > > > > > > > Hi all,
> > > > > > > > > 
> > > > > > > > > > My SER process crashed today with the following log entries
> > > > > > > > > > in /var/log/messages:
> > > > > > > > > 
> > > > > > > > > ser[378]: child process 418 exited by a signal 11
> > > > > > > > > ser[378]: core was generated
> > > > > > > > > ser[378]: INFO: terminating due to SIGCHLD
> > > > > > > > > ser[421]: INFO: signal 15 received
> > > > > > > > > ...
> > > > > > > > > 
> > > > > > > > > > Can someone help me determine what kind of problem this is? I think I
> > > > > > > > > > need to use gdb to extract some information from the core dump. How can
> > > > > > > > > > I use it to extract the useful information?
> > > > > > > > > 
> > > > > > > > > Regards,
> > > > > > > > > 
> > > > > > > > > Adrien
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > _______________________________________________
> > > > > > > > > sr-dev mailing list
> > > > > > > > > sr-dev at lists.sip-router.org
> > > > > > > > > http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
> > > > > > 
> > > > > > > #0  0x00e964d3 in matching_3261 (p_msg=0x81647e8, trans=0xbff74f38, skip_method=4294967294) at t_lookup.c:222
> > > > > > > 222             if (memcmp(get_to(ack)->tag_value.s,p_cell->uas.local_totag.s,
> > > > > > > (gdb) bt
> > > > > > > #0  0x00e964d3 in matching_3261 (p_msg=0x81647e8, trans=0xbff74f38, skip_method=4294967294) at t_lookup.c:222
> > > > > > > #1  0x00e96aff in t_lookup_request (p_msg=0x81647e8, leave_new_locked=1) at t_lookup.c:421
> > > > > > > #2  0x00e992a0 in t_newtran (p_msg=0x81647e8) at t_lookup.c:1085
> > > > > > > #3  0x00e9116a in t_relay_to (p_msg=0x81647e8, proxy=0x0, proto=0, replicate=0) at t_funcs.c:224
> > > > > > > #4  0x00e9c410 in w_t_relay (p_msg=0x81647e8, _foo=0x0, _bar=0x0) at tm.c:889
> > > > > > > #5  0x0804fc81 in do_action (a=0x8117818, msg=0x81647e8) at action.c:610
> > > > > > > #6  0x0805099d in run_actions (a=0x8117818, msg=0x81647e8) at action.c:718
> > > > > > > #7  0x08073f08 in eval_elem (e=0x8117840, msg=0x81647e8) at route.c:605
> > > > > > > #8  0x08074392 in eval_expr (e=0x8117840, msg=0x81647e8) at route.c:654
> > > > > > > #9  0x080743ce in eval_expr (e=0x8117860, msg=0x81647e8) at route.c:670
> > > > > > > #10 0x0804ec95 in do_action (a=0x8117bc8, msg=0x81647e8) at action.c:586
> > > > > > > #11 0x0805099d in run_actions (a=0x8117630, msg=0x81647e8) at action.c:718
> > > > > > > #12 0x0804ffdf in do_action (a=0x8114f70, msg=0x81647e8) at action.c:375
> > > > > > > #13 0x0805099d in run_actions (a=0x8114f70, msg=0x81647e8) at action.c:718
> > > > > > > #14 0x0804ecd3 in do_action (a=0x8114fc0, msg=0x81647e8) at action.c:603
> > > > > > > #15 0x0805099d in run_actions (a=0x8114fc0, msg=0x81647e8) at action.c:718
> > > > > > > #16 0x0804ecd3 in do_action (a=0x8114fe8, msg=0x81647e8) at action.c:603
> > > > > > > #17 0x0805099d in run_actions (a=0x8114fe8, msg=0x81647e8) at action.c:718
> > > > > > > #18 0x0804ecd3 in do_action (a=0x8115010, msg=0x81647e8) at action.c:603
> > > > > > > #19 0x0805099d in run_actions (a=0x8115010, msg=0x81647e8) at action.c:718
> > > > > > > #20 0x0804ecd3 in do_action (a=0x8115038, msg=0x81647e8) at action.c:603
> > > > > > > #21 0x0805099d in run_actions (a=0x8115038, msg=0x81647e8) at action.c:718
> > > > > > > #22 0x0804ecd3 in do_action (a=0x8115060, msg=0x81647e8) at action.c:603
> > > > > > > #23 0x0805099d in run_actions (a=0x810fe88, msg=0x81647e8) at action.c:718
> > > > > > > #24 0x0806d062 in receive_msg (
> > > > > > > >     buf=0x80d61e0 "ACK sip:0389719641@domain.tld:5060 SIP/2.0\r\nMax-Forwards: 16\r\nContent-Length: 0\r\nVia: SIP/2.0/UDP 10.0.140.147:5060;branch=z9hG4bK4f1b8571c\r\nCall-ID: bf85c76a5e2066256679e3945f6b4e36@10.0.140.147\r\nF"..., len=592, rcv_info=0xbff76340) at receive.c:165
> > > > > > > #25 0x080843cc in udp_rcv_loop () at udp_server.c:472
> > > > > > > #26 0x0805cdaf in main_loop () at main.c:1056
> > > > > > > #27 0x0805e40b in main (argc=1, argv=0xbff76504) at main.c:1592
> > > > > > > 
> > > > > > 
> > > > > > 
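
Note on frame #0 of the backtrace: the faulting line (t_lookup.c:222) is a
memcmp() over the ACK's To-tag and the transaction's local to-tag, and the
gdb print commands requested earlier in the thread inspect exactly those
pointers and lengths. The thread does not establish which of them, if any,
was invalid. The following is only a minimal sketch of a defensive tag
comparison, assuming the crash comes from a NULL pointer or a bad length
reaching memcmp(); str_sketch and totag_equal_sketch are hypothetical names
for illustration, not SER's actual types or code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a counted string (pointer + length). */
struct str_sketch {
    const char *s;
    int len;
};

/*
 * Compare two to-tags defensively.
 * Returns 1 when both tags are present and byte-for-byte equal, 0 otherwise.
 * A NULL pointer or a non-positive length is treated as "no tag", so
 * memcmp() is never called on an invalid pointer.
 */
int totag_equal_sketch(const struct str_sketch *ack_tag,
                       const struct str_sketch *local_tag)
{
    if (ack_tag == NULL || local_tag == NULL)
        return 0;
    if (ack_tag->s == NULL || local_tag->s == NULL)
        return 0;
    if (ack_tag->len <= 0 || ack_tag->len != local_tag->len)
        return 0;
    return memcmp(ack_tag->s, local_tag->s, (size_t)ack_tag->len) == 0;
}
```

With a guard of this kind, an ACK whose To header carries no tag would simply
fail to match instead of dereferencing a NULL pointer. Whether that is the
right matching behaviour for a given SER version is a separate question that
this sketch does not answer.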



