[SR-Users] core in dialog module

Timo Reimann timo.reimann at 1und1.de
Fri May 13 18:02:54 CEST 2011


Hey,


On 13.05.2011 11:11, Timo Reimann wrote:
> On 12.05.2011 15:55, Anton Roman wrote:
>> my answer is inline:
>>
>> 2011/5/12 Timo Reimann <timo.reimann at 1und1.de
>> <mailto:timo.reimann at 1und1.de>>
>>     As to the reason of the segfault, the dialog structure or hash table may
>>     already be gone when unref_dlg() is called. Can you go to stack #0 and
>>     tell us what the value of each of the following data structures is (use
>>     "p <data structure>" in gdb):
>>
>>     *dlg
>>     d_table
>>     d_table->entries
>>
>>
>> Here you have:
>>
>> (gdb) p *dlg
>> $1 = {ref = 793790803, next = 0xa0d4b4f20303032, prev =
>> 0x504953203a616956, h_id = 808333871, h_entry = 1346655535, state =
>> 775174432,
>>   lifetime = 841888562, start_ts = 892219952, dflags = 808794678, sflags
>> = 1648046134, toroute = 1668178290, toroute_name = {
>>     s = 0x62344768397a3d68 <Address 0x62344768397a3d68 out of bounds>,
>> len = 946221643}, from_rr_nb = 1886534457, tl = {
>>     next = 0x72460a0d30363035, prev = 0x6f6e4122203a6d6f, timeout =
>> 1869445486}, callid = {
>>     s = 0x6f6e613a7069733c <Address 0x6f6e613a7069733c out of bounds>,
>> len = 1869445486}, from_uri = {
>>     s = 0x3230322e33322e34 <Address 0x3230322e33322e34 out of bounds>,
>> len = 1043739950}, to_uri = {
> 
> [...]
> 
> As I suspected, your dialog seems outdated already: The reference count
> is 793790803, and the Call-ID is supposed to have roughly 2 billion
> characters. That's what I call unique. :)
> 
> I could ask you for more details on the dump but it'd probably be
> easiest if I could take a direct (gdb-)look at it. Would you mind
> sending it to me in private (i.e., no CC to the mailing list) to the
> address I am writing from?
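
A side note on the quoted dump: several of the "pointer" values in it read
like ASCII text rather than addresses, which would be consistent with the
dialog's memory having been freed and reused for something else. Below is a
minimal sketch for checking this, assuming a little-endian x86-64 host.
decode_le64 is a hypothetical helper written for this illustration; it is
not part of Kamailio or gdb.

```c
#include <stdint.h>

/* Hypothetical helper: copy the 8 bytes of v into out in memory order on a
 * little-endian machine (least significant byte first). Bytes that are not
 * printable ASCII are replaced with '.'; out is NUL-terminated. */
void decode_le64(uint64_t v, char out[9])
{
    for (int i = 0; i < 8; i++) {
        unsigned char c = (unsigned char)(v >> (8 * i));
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[8] = '\0';
}
```

Fed with prev = 0x504953203a616956 from the dump, this yields "Via: SIP";
callid.s = 0x6f6e613a7069733c yields "<sip:ano". Both look like fragments
of a SIP message.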

I (and Marius -- credits!) dug through your coredump and found a few
curiosities. Before I bug you with the details, let me just say this:
There might be something wrong with the dialog reference counter that
determines when a dialog is to be removed from the hash table. In
fact, your call stack indicates that an unreference operation was
attempted on a hash table which looks empty:

(gdb) frame 0
#0  unref_dlg (dlg=0x7f08a9f67da8, cnt=1) at dlg_hash.c:598
598		dlg_lock( d_table, d_entry);
(gdb) p *d_table->entries
$53 = {first = 0x0, last = 0x0, next_id = 1124074261, lock_idx = 0}
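
To illustrate the kind of bookkeeping involved, here is a minimal sketch of
reference-counted removal from a hash table entry, with a guard against
unreferencing more often than referenced. All names (toy_dlg, toy_entry,
toy_link, toy_unref) are hypothetical; this is neither the dialog module's
actual code nor Daniel's patch, and locking is left out for brevity.

```c
#include <stddef.h>

struct toy_dlg {
    int ref;                        /* number of holders of this dialog */
    struct toy_dlg *next, *prev;
};

struct toy_entry {
    struct toy_dlg *first, *last;   /* doubly linked list of dialogs */
};

/* Append d to the entry's list; the table itself holds one reference. */
void toy_link(struct toy_entry *e, struct toy_dlg *d)
{
    d->next = NULL;
    d->prev = e->last;
    if (e->last)
        e->last->next = d;
    else
        e->first = d;
    e->last = d;
    d->ref += 1;
}

/* Drop cnt references. Returns 1 if the dialog was unlinked, 0 if it is
 * still referenced, and -1 if the request would underflow the counter
 * (in which case nothing is touched). */
int toy_unref(struct toy_entry *e, struct toy_dlg *d, int cnt)
{
    if (cnt <= 0 || d->ref < cnt)
        return -1;                  /* counter out of sync: refuse */
    d->ref -= cnt;
    if (d->ref > 0)
        return 0;
    if (d->prev) d->prev->next = d->next; else e->first = d->next;
    if (d->next) d->next->prev = d->prev; else e->last = d->prev;
    d->next = d->prev = NULL;
    return 1;
}
```

Without such a guard, one unreference too many operates on a dialog that has
already been unlinked (and possibly freed), which is the kind of situation
the call stack above hints at.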


Looking through the mailing-list archive, I noticed you brought
attention to another reference counter-related bug which Daniel provided
a fix for with commit 2c28a251a. Since you reported that no more issues
appeared with that fixed version, I just backported the patch into 3.1.
However, I can see from your core dump that you are not using a Kamailio
version that includes the fix.

Before we continue with any bug hunting, could you try a version of
Kamailio that comes with Daniel's "safer unref of terminated dialogs"
patch? This can be a copy of the master branch or a recent copy of the
3.1 git branch. I'd suggest the latter so we can ensure that no
bleeding-edge features added to the dialog module distort our analysis.

Thanks and


Cheers,

--Timo


