[sr-dev] SER crash : Segmentation fault

Andrei Pelinescu-Onciul andrei at iptel.org
Wed Sep 9 12:26:39 CEST 2009


On Aug 20, 2009 at 10:40, inge <inge at legos.fr> wrote:
> Hi Andrei,
> 
> As I understand it, this changelog applies only to the tm module.
> Is there any indication that this module caused the crash we experienced?

Yes, according to the backtrace it crashed in tm. It looks like the tag
value was corrupted (one possible explanation is that matching against a
deleted transaction was attempted). It's also possible, but much less
likely, that despite the backtrace info the crash is not related to tm
(e.g. some other module corrupting shared memory).
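
As a rough illustration of that failure mode (a sketch under stated
assumptions, not the actual tm code): the crashing line compares two
pointer-plus-length tag values with memcmp(), and memcmp() will fault if
either pointer no longer refers to valid memory. The str_t type and the
tags_equal() helper below are hypothetical names invented for this example
and are not taken from the SER sources.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string (pointer + length).
 * Assumption for illustration only, not the SER definition. */
typedef struct {
    const char *s;
    int len;
} str_t;

/* Hypothetical helper: returns 1 if both tags are present and identical,
 * 0 otherwise. It refuses to call memcmp() on NULL pointers, on
 * non-positive lengths, or when the two lengths differ.
 * Limitation: it cannot detect a non-NULL pointer into freed memory. */
static int tags_equal(const str_t *a, const str_t *b)
{
    if (a == NULL || b == NULL)
        return 0;
    if (a->s == NULL || b->s == NULL)
        return 0;
    if (a->len <= 0 || a->len != b->len)
        return 0;
    return memcmp(a->s, b->s, (size_t)a->len) == 0;
}
```

Such a check only catches the obvious cases (NULL pointer, mismatched or
non-positive length). A tag that points into an already freed transaction
would still pass it, which is why fixing the underlying lifetime problem
matters more than adding guards.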

> 
> We would like to determine which of the known, corrected bugs could
> have caused the crash, in order to find a short-term workaround that
> gives us some time to deploy an upgrade to the latest release in the
> 0.9.0 branch.

That would be quite hard, since we don't know yet whether the crash is
really fixed in the latest 0.9.x.
If you can reproduce the crash, then you could try a test installation of
the latest 0.9.x and see if the crash is fixed.
It's very easy to upgrade between 0.9.x versions. There are no config or
db changes, the only differences are bug fixes.

If it still crashes with the latest 0.9.x, then the next step would be
to compile it with debugging info, in an attempt to get more meaningful
backtraces.


Andrei

> 
> On Tuesday 18 August 2009 at 09:00 +0200, Andrei Pelinescu-Onciul wrote:
> > On Aug 17, 2009 at 14:42, inge <inge at legos.fr> wrote:
> > > Hi Andrei,
> > > 
> > > Hope you are fine.
> > > Do you have any update on our crash ?
> > > Is there anything we can do to find the cause of the segmentation
> > > fault (perhaps it is a well-known bug) without bothering you?
> > 
> > 
> > There are lots of changes between 0.9.5-pre and the latest 0.9.x
> > version.
> > You should try updating to the latest code on the rel_0_9_0 branch and
> > see if you run into this problem again.
> > To get the latest 0.9.x code either get the latest snapshot from
> >  http://ftp.iptel.org/pub/ser/daily-snapshots/stable/ , use cvs to
> >  get the rel_0_9_0 branch
> > >  (CVSROOT=:pserver:anonymous@cvs.berlios.de:/cvsroot/ser ;
> >  export CVSROOT ; cvs co -r rel_0_9_0 sip_router ), or use git and the
> >  ser repository (see http://sip-router.org/wiki/git/ser-repository).
> > 
> > Here's a short changelog for tm, between 0.9.5 and 0.9.7+
> >  (git log --oneline v_0_9_5..origin/rel_0_9_0 modules/tm):
> > - tm: fix delete_cell() when the transaction is referenced
> > > - variable timer fix: variable timers (avps) won't be extended anymore
> > - fix for free_rdata_list() which used to access the "next" pointer af
> > - deadlock when t_relay-ing a message from the failure_route fixed  (e2e
> > - added sems specific patch. This patch is present in the ser version ship
> > - added diversion and rpid header cloning
> > -bug fix: tm insert_timer used to eat too much cpu, decreasing dramatic
> > - fixed misplaced set_avp list, courtesy of cesc.santa at gmail.com
> > - int2reverse_hex/reverse_hex2int fixes  (tm with large "labels" was aff
> > - fix of local ACK matching provided by cesc.santa at gmail.com
> > - avp race condition fix (backported from HEAD)
> > - CANCEL terminates retransmission timers properly (backported)
> > 
> > 
> > Andrei
> > 
> > 
> > > 
> > > > On Friday 14 August 2009 at 17:03 +0200, inge wrote:
> > > > > Please find the requested information attached.
> > > > 
> > > > I'm aware of the need for an update. It's in the list of tasks to be
> > > > done, however, the priority is to troubleshoot the problem and maybe
> > > > find a workaround.
> > > > 
> > > > Regards,
> > > > 
> > > > Adrien
> > > > 
> > > > > On Friday 14 August 2009 at 16:34 +0200, Andrei Pelinescu-Onciul
> > > > > wrote:
> > > > > On Aug 14, 2009 at 15:01, inge <inge at legos.fr> wrote:
> > > > > > Hi Andrei,
> > > > > > 
> > > > > > Thanks for your reply.
> > > > > > 
> > > > > > I use ser 0.9.5-pre4. 
> > > > > > 
> > > > > > > I don't really understand the bug you have identified. Where can
> > > > > > > I find a description?
> > > > > 
> > > > > Sorry, I was wrong (that bug was in RR and appears only in newer code).
> > > > > 
> > > > > Could you run gdb on the core again , type "frame 0" and then send me the 
> > > > > output of the following commands:
> > > > > 
> > > > > print p_cell
> > > > > print p_msg
> > > > > print p_msg->buf
> > > > > print p_cell->uas.local_totag.len
> > > > > print p_cell->uas.local_totag.s
> > > > > print p_msg->to
> > > > > print p_msg->to->parsed
> > > > > print *((struct to_body*)(p_msg->to->parsed))
> > > > > print ((struct to_body*)(p_msg->to->parsed))->tag_value.len
> > > > > print ((struct to_body*)(p_msg->to->parsed))->tag_value.s
> > > > > 
> > > > > 
> > > > > Andrei
> > > > > P.S.: you could try also upgrading to ser 2.0, 2.1 or sip-router.
> > > > > 
> > > > > 
> > > > > > 
> > > > > > Regards,
> > > > > > 
> > > > > > Adrien
> > > > > > 
> > > > > > > On Friday 14 August 2009 at 14:45 +0200, Andrei Pelinescu-Onciul
> > > > > > > wrote:
> > > > > > > On Aug 13, 2009 at 15:32, inge <inge at legos.fr> wrote:
> > > > > > > > Hi Klaus,
> > > > > > > > 
> > > > > > > > Thanks.
> > > > > > > > 
> > > > > > > > > I have attached the output of gdb.
> > > > > > > > > 
> > > > > > > > > I hope someone can decipher it. Thank you.
> > > > > > > 
> > > > > > > 
> > > > > > > If you are using ser 2.1/latest cvs or sip-router then just update to
> > > > > > > the latest cvs or git. It's a known fixed bug (sip router
> > > > > > > git 6fcd5e or ser 2.1 commit starting with "rr: fix from header
> > > > > > > access").
> > > > > > > 
> > > > > > > If you are using another version then tell me which one (ser -V) 
> > > > > > > and I'll fix it.
> > > > > > > 
> > > > > > > Andrei
> > > > > > > 
> > > > > > > > 
> > > > > > > > > On Thursday 13 August 2009 at 13:53 +0200, Klaus Darilion wrote:
> > > > > > > > > locate the core file (either in the working dir or /tmp or /)
> > > > > > > > > then execute:
> > > > > > > > > 
> > > > > > > > > gdb /usr/local/sbin/ser /path/to/core
> > > > > > > > > (gdb) bt
> > > > > > > > > 
> > > > > > > > > regards
> > > > > > > > > klaus
> > > > > > > > > 
> > > > > > > > > > inge wrote:
> > > > > > > > > > Hi all,
> > > > > > > > > > 
> > > > > > > > > > > My SER process crashed today with the following logs
> > > > > > > > > > > in /var/log/messages:
> > > > > > > > > > 
> > > > > > > > > > ser[378]: child process 418 exited by a signal 11
> > > > > > > > > > ser[378]: core was generated
> > > > > > > > > > ser[378]: INFO: terminating due to SIGCHLD
> > > > > > > > > > ser[421]: INFO: signal 15 received
> > > > > > > > > > ...
> > > > > > > > > > 
> > > > > > > > > > > Can someone help me determine what kind of problem this is? I think I
> > > > > > > > > > > need to use gdb to extract some information from the core dump. How can
> > > > > > > > > > > I use it to extract the useful information?
> > > > > > > > > > 
> > > > > > > > > > Regards,
> > > > > > > > > > 
> > > > > > > > > > Adrien
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > _______________________________________________
> > > > > > > > > > sr-dev mailing list
> > > > > > > > > > sr-dev at lists.sip-router.org
> > > > > > > > > > http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
> > > > > > > 
> > > > > > > > #0  0x00e964d3 in matching_3261 (p_msg=0x81647e8, trans=0xbff74f38, skip_method=4294967294) at t_lookup.c:222
> > > > > > > > 222             if (memcmp(get_to(ack)->tag_value.s,p_cell->uas.local_totag.s,
> > > > > > > > (gdb) bt
> > > > > > > > #0  0x00e964d3 in matching_3261 (p_msg=0x81647e8, trans=0xbff74f38, skip_method=4294967294) at t_lookup.c:222
> > > > > > > > #1  0x00e96aff in t_lookup_request (p_msg=0x81647e8, leave_new_locked=1) at t_lookup.c:421
> > > > > > > > #2  0x00e992a0 in t_newtran (p_msg=0x81647e8) at t_lookup.c:1085
> > > > > > > > #3  0x00e9116a in t_relay_to (p_msg=0x81647e8, proxy=0x0, proto=0, replicate=0) at t_funcs.c:224
> > > > > > > > #4  0x00e9c410 in w_t_relay (p_msg=0x81647e8, _foo=0x0, _bar=0x0) at tm.c:889
> > > > > > > > #5  0x0804fc81 in do_action (a=0x8117818, msg=0x81647e8) at action.c:610
> > > > > > > > #6  0x0805099d in run_actions (a=0x8117818, msg=0x81647e8) at action.c:718
> > > > > > > > #7  0x08073f08 in eval_elem (e=0x8117840, msg=0x81647e8) at route.c:605
> > > > > > > > #8  0x08074392 in eval_expr (e=0x8117840, msg=0x81647e8) at route.c:654
> > > > > > > > #9  0x080743ce in eval_expr (e=0x8117860, msg=0x81647e8) at route.c:670
> > > > > > > > #10 0x0804ec95 in do_action (a=0x8117bc8, msg=0x81647e8) at action.c:586
> > > > > > > > #11 0x0805099d in run_actions (a=0x8117630, msg=0x81647e8) at action.c:718
> > > > > > > > #12 0x0804ffdf in do_action (a=0x8114f70, msg=0x81647e8) at action.c:375
> > > > > > > > #13 0x0805099d in run_actions (a=0x8114f70, msg=0x81647e8) at action.c:718
> > > > > > > > #14 0x0804ecd3 in do_action (a=0x8114fc0, msg=0x81647e8) at action.c:603
> > > > > > > > #15 0x0805099d in run_actions (a=0x8114fc0, msg=0x81647e8) at action.c:718
> > > > > > > > #16 0x0804ecd3 in do_action (a=0x8114fe8, msg=0x81647e8) at action.c:603
> > > > > > > > #17 0x0805099d in run_actions (a=0x8114fe8, msg=0x81647e8) at action.c:718
> > > > > > > > #18 0x0804ecd3 in do_action (a=0x8115010, msg=0x81647e8) at action.c:603
> > > > > > > > #19 0x0805099d in run_actions (a=0x8115010, msg=0x81647e8) at action.c:718
> > > > > > > > #20 0x0804ecd3 in do_action (a=0x8115038, msg=0x81647e8) at action.c:603
> > > > > > > > #21 0x0805099d in run_actions (a=0x8115038, msg=0x81647e8) at action.c:718
> > > > > > > > #22 0x0804ecd3 in do_action (a=0x8115060, msg=0x81647e8) at action.c:603
> > > > > > > > #23 0x0805099d in run_actions (a=0x810fe88, msg=0x81647e8) at action.c:718
> > > > > > > > #24 0x0806d062 in receive_msg (
> > > > > > > >     buf=0x80d61e0 "ACK sip:0389719641 at domain.tld:5060 SIP/2.0\r\nMax-Forwards: 16\r\nContent-Length: 0\r\nVia: SIP/2.0/UDP 10.0.140.147:5060;branch=z9hG4bK4f1b8571c\r\nCall-ID: bf85c76a5e2066256679e3945f6b4e36 at 10.0.140.147\r\nF"..., len=592, rcv_info=0xbff76340) at receive.c:165
> > > > > > > > #25 0x080843cc in udp_rcv_loop () at udp_server.c:472
> > > > > > > > #26 0x0805cdaf in main_loop () at main.c:1056
> > > > > > > > #27 0x0805e40b in main (argc=1, argv=0xbff76504) at main.c:1592
> > > > > > > > 
> > > > > > > 
> > > > > > > 


