[SR-Users] core in dialog module

Anton Roman antonroman at gmail.com
Tue May 17 02:10:22 CEST 2011


Hi,

yes, you're totally right. We got the core on another server, and I thought the
fix was included in the code we compiled on this server, but it wasn't. My
fault.

Now a very recent copy of the 3.1 git branch is running, with Daniel's patch
included. I'll keep you informed, but it should go fine.

Thanks, and sorry for the misunderstanding,

Regards,
Anton



2011/5/13 Timo Reimann <timo.reimann at 1und1.de>

> Hey,
>
>
> On 13.05.2011 11:11, Timo Reimann wrote:
> > On 12.05.2011 15:55, Anton Roman wrote:
> >> my answer is inline:
> >>
> >> 2011/5/12 Timo Reimann <timo.reimann at 1und1.de>
> >>     As to the reason for the segfault, the dialog structure or hash table
> >>     may already be gone when unref_dlg() is called. Can you go to stack
> >>     #0 and tell us what the value of each of the following data
> >>     structures is (use "p <data structure>" in gdb):
> >>
> >>     *dlg
> >>     d_table
> >>     d_table->entries
> >>
> >>
> >> Here you have:
> >>
> >> (gdb) p *dlg
> >> $1 = {ref = 793790803, next = 0xa0d4b4f20303032, prev =
> >> 0x504953203a616956, h_id = 808333871, h_entry = 1346655535, state =
> >> 775174432,
> >>   lifetime = 841888562, start_ts = 892219952, dflags = 808794678, sflags
> >> = 1648046134, toroute = 1668178290, toroute_name = {
> >>     s = 0x62344768397a3d68 <Address 0x62344768397a3d68 out of bounds>,
> >> len = 946221643}, from_rr_nb = 1886534457, tl = {
> >>     next = 0x72460a0d30363035, prev = 0x6f6e4122203a6d6f, timeout =
> >> 1869445486}, callid = {
> >>     s = 0x6f6e613a7069733c <Address 0x6f6e613a7069733c out of bounds>,
> >> len = 1869445486}, from_uri = {
> >>     s = 0x3230322e33322e34 <Address 0x3230322e33322e34 out of bounds>,
> >> len = 1043739950}, to_uri = {
> >
> > [...]
> >
> > As I suspected, your dialog seems outdated already: the reference count
> > is 793790803, and the Call-ID is supposed to be roughly 2 billion
> > characters long. That's what I call unique. :)
> >
> > I could ask you for more details on the dump but it'd probably be
> > easiest if I could take a direct (gdb-)look at it. Would you mind
> > sending it to me in private (i.e., no CC to the mailing list) to the
> > address I am writing from?
>
> I (and Marius -- credits!) dug through your core dump and found a few
> curiosities. Before I bug you with the details, let me just say this:
> there might be something wrong with the dialog reference counter that
> determines when a dialog is to be removed from the hash table. In
> fact, your call stack indicates that an unreference operation was
> attempted on a hash table which looks empty:
>
> (gdb) frame 0
> #0  unref_dlg (dlg=0x7f08a9f67da8, cnt=1) at dlg_hash.c:598
> 598             dlg_lock( d_table, d_entry);
> (gdb) p *d_table->entries
> $53 = {first = 0x0, last = 0x0, next_id = 1124074261, lock_idx = 0}
>
>
> Looking through the mailing-list archive, I noticed you brought
> attention to another reference counter-related bug which Daniel provided
> a fix for with commit 2c28a251a. Since you reported that no more issues
> appeared with that fixed version, I just backported the patch into 3.1.
> However, I can see from your core dump that you are not using a Kamailio
> version that includes the fix.
>
> Before we continue with any bug hunting, could you try a version of
> Kamailio that comes with Daniel's "safer unref of terminated dialogs"
> patch? This can be a copy of the master branch or a recent copy of the 3.1 git
> branch. I'd suggest the latter so we can ensure that no bleeding-edge
> features added to the dialog module distort our analysis.
>
> Thanks and
>
>
> Cheers,
>
> --Timo
>
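The gdb dump above can be read another way that supports Timo's point that the dialog is stale. Assuming a little-endian host (the 0x7f08... addresses suggest x86-64, but that is an assumption here), the bytes of the "pointer" and integer fields decode as plain text. The minimal C sketch below performs that decoding; the helper names le_bytes_to_str and le_bytes_spell are hypothetical and not part of Kamailio. Applied to the dump, ref gives "SIP/", h_id gives "/2.0", next gives "200 OK\r\n" and prev gives "Via: SIP", so the memory at dlg appears to hold fragments of a SIP reply rather than a live dialog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the low `nbytes` bytes of `value` into `out`, least significant byte
 * first (little-endian order, as on x86-64), and NUL-terminate the result.
 * `out` must have room for nbytes + 1 chars; at most 8 bytes are decoded. */
void le_bytes_to_str(uint64_t value, size_t nbytes, char *out)
{
    assert(nbytes <= sizeof value);
    for (size_t i = 0; i < nbytes; i++)
        out[i] = (char)((value >> (8 * i)) & 0xFF);
    out[nbytes] = '\0';
}

/* Return 1 if the first strlen(text) little-endian bytes of `value`
 * spell exactly `text` (at most 8 characters), otherwise 0. */
int le_bytes_spell(uint64_t value, const char *text)
{
    char buf[sizeof(uint64_t) + 1];
    size_t n = strlen(text);

    if (n > sizeof value)
        return 0;
    le_bytes_to_str(value, n, buf);
    return memcmp(buf, text, n) == 0;
}
```

For example, le_bytes_spell(0x504953203a616956ULL, "Via: SIP") returns 1. The same decoding turns callid.s and callid.len into "<sip:ano" and "nymo", which looks like part of an anonymous SIP URI.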
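To make the reference-counting problem in Timo's reply more concrete, here is a minimal, single-threaded C sketch of a reference-counted object stored in one hash-table slot. The structures and functions (struct cell, struct entry, cell_new, cell_ref, cell_unref) are hypothetical simplifications, not the dialog module's actual dlg_hash.c code, and the locking that the real unref_dlg() performs via dlg_lock() is omitted. The object is unlinked and freed when its count drops to zero; unlinking the last object leaves the slot as {first = NULL, last = NULL}, the same shape as the entry printed in frame 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the dialog module's real structs. */
struct cell {
    int ref;                    /* outstanding references */
    struct cell *next, *prev;   /* links within one hash-table slot */
};

struct entry {
    struct cell *first, *last;  /* one hash-table slot (bucket) */
};

/* Allocate a cell, append it to slot `e`, and give it one reference
 * (the one held by the table itself). Returns NULL on allocation failure. */
struct cell *cell_new(struct entry *e)
{
    struct cell *c = calloc(1, sizeof *c);

    if (c == NULL)
        return NULL;
    c->ref = 1;
    c->prev = e->last;
    if (e->last != NULL)
        e->last->next = c;
    else
        e->first = c;
    e->last = c;
    return c;
}

/* Take `cnt` additional references, e.g. for a pending transaction. */
void cell_ref(struct cell *c, int cnt)
{
    assert(cnt > 0);
    c->ref += cnt;
}

/* Drop `cnt` references. When the count reaches zero the cell is unlinked
 * from its slot and freed. Returns 1 if freed, 0 if still referenced, and
 * -1 (touching nothing) if the caller tries to drop more than are held.
 * A real implementation would hold the slot's lock for this whole function. */
int cell_unref(struct entry *e, struct cell *c, int cnt)
{
    assert(cnt > 0);
    if (c->ref < cnt)
        return -1;
    c->ref -= cnt;
    if (c->ref > 0)
        return 0;
    if (c->prev != NULL)
        c->prev->next = c->next;
    else
        e->first = c->next;
    if (c->next != NULL)
        c->next->prev = c->prev;
    else
        e->last = c->prev;
    free(c);
    return 1;
}
```

The -1 check only catches an imbalance while the object is still allocated. If some code path drops one reference too many, the object is freed early and a later cell_unref() reads memory that may already have been reused, which would be consistent with the garbage values in the dump above; no check stored inside the object can reliably detect that case.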

