Hi,
yes, you're totally right: we got the core on another server, and I thought the fix was included in the code we compiled on this server, but it wasn't. My fault.
Now a very recent copy of the 3.1 git branch is running, with Daniel's patch included. I'll keep you informed, but it should go fine.
Thanks, and sorry for the misunderstanding,
Regards,
Anton
Hey,
On 13.05.2011 11:11, Timo Reimann wrote:
> On 12.05.2011 15:55, Anton Roman wrote:
>> my answer is inline:
>>
>> 2011/5/12 Timo Reimann <timo.reimann@1und1.de
>> <mailto:timo.reimann@1und1.de>>
>> As to the reason of the segfault, the dialog structure or hash table may
>> already be gone when unref_dlg() is called. Can you go to stack #0 and
>> tell us what the value of each of the following data structures is (use
>> "p <data structure> in gdb):
>>
>> *dlg
>> d_table
>> d_table->entries
>>
>>
>> Here you have:
>>
>> (gdb) p *dlg
>> $1 = {ref = 793790803, next = 0xa0d4b4f20303032, prev =
>> 0x504953203a616956, h_id = 808333871, h_entry = 1346655535, state =
>> 775174432,
>> lifetime = 841888562, start_ts = 892219952, dflags = 808794678, sflags
>> = 1648046134, toroute = 1668178290, toroute_name = {
>> s = 0x62344768397a3d68 <Address 0x62344768397a3d68 out of bounds>,
>> len = 946221643}, from_rr_nb = 1886534457, tl = {
>> next = 0x72460a0d30363035, prev = 0x6f6e4122203a6d6f, timeout =
>> 1869445486}, callid = {
>> s = 0x6f6e613a7069733c <Address 0x6f6e613a7069733c out of bounds>,
>> len = 1869445486}, from_uri = {
>> s = 0x3230322e33322e34 <Address 0x3230322e33322e34 out of bounds>,
>> len = 1043739950}, to_uri = {
>
> [...]
>
> As I suspected, your dialog seems outdated already: The reference count
> is 793790803, and the Call-ID is supposed to have a rough 2 billions
> characters. That's what I call unique. :)
>
> I could ask you for more details on the dump but it'd probably be
> easiest if I could take a direct (gdb-)look at it. Would you mind
> sending it to me in private (i.e., no CC to the mailing list) to the
> address I am writing from?
I (and Marius -- credits!) dug through your coredump and found a few
curiosities. Before I bug you with the details, let me just say this:
There might be something wrong with the dialog reference counter that
determines when a dialog is to be removed from the hash table. In
fact, your call stack indicates that an unreference operation was
attempted on a hash table which looks empty:
(gdb) frame 0
#0  unref_dlg (dlg=0x7f08a9f67da8, cnt=1) at dlg_hash.c:598
598         dlg_lock( d_table, d_entry);
(gdb) p *d_table->entries
$53 = {first = 0x0, last = 0x0, next_id = 1124074261, lock_idx = 0}
Looking through the mailing-list archive, I noticed you brought
attention to another reference counter-related bug which Daniel provided
a fix for with commit 2c28a251a. Since you reported that no more issues
appeared with that fixed version, I just backported the patch into 3.1.
However, I can see from your core dump that you are not using a Kamailio
version that includes the fix.
Before we continue with any bug hunting, could you try a version of
Kamailio that comes with Daniel's "safer unref of terminated dialogs"
patch? This can be master branch copy or a recent copy of the 3.1 git
branch. I'd suggest the latter so we can ensure that no bleeding-edge
features added to the dialog module distort our analysis.
Thanks and
Cheers,
--Timo