Hey,
On 13.05.2011 11:11, Timo Reimann wrote:
On 12.05.2011 15:55, Anton Roman wrote:
my answer is inline:
2011/5/12 Timo Reimann <timo.reimann@1und1.de>:
As to the reason of the segfault, the dialog structure or hash table may already be gone when unref_dlg() is called. Can you go to stack frame #0 and tell us the value of each of the following data structures (use "p <data structure>" in gdb):
*dlg
d_table
d_table->entries
Here you have:
(gdb) p *dlg
$1 = {ref = 793790803, next = 0xa0d4b4f20303032, prev = 0x504953203a616956,
  h_id = 808333871, h_entry = 1346655535, state = 775174432,
  lifetime = 841888562, start_ts = 892219952, dflags = 808794678,
  sflags = 1648046134, toroute = 1668178290, toroute_name = {
    s = 0x62344768397a3d68 <Address 0x62344768397a3d68 out of bounds>,
    len = 946221643}, from_rr_nb = 1886534457, tl = {
    next = 0x72460a0d30363035, prev = 0x6f6e4122203a6d6f,
    timeout = 1869445486}, callid = {
    s = 0x6f6e613a7069733c <Address 0x6f6e613a7069733c out of bounds>,
    len = 1869445486}, from_uri = {
    s = 0x3230322e33322e34 <Address 0x3230322e33322e34 out of bounds>,
    len = 1043739950}, to_uri = {
[...]
As I suspected, your dialog seems to be stale already: the reference count is 793790803, and the Call-ID is supposedly close to 2 billion characters long. That's what I call unique. :)
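One way to see what these odd numbers really are is to read them as raw bytes instead of integers and pointers. Here is a minimal sketch, assuming a little-endian x86-64 host (the 64-bit pointers in the dump suggest as much). Note that decode_le() is a hypothetical helper written for this mail; it is not part of gdb or Kamailio.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the lowest `nbytes` bytes of `value` into
 * `out`, lowest byte first (the order they sit in memory on a
 * little-endian machine). Non-printable bytes are shown as '.'.
 * `out` must have room for nbytes + 1 characters. */
void decode_le(uint64_t value, size_t nbytes, char *out)
{
    assert(out != NULL && nbytes <= 8);
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char b = (unsigned char)((value >> (8 * i)) & 0xff);
        out[i] = (b >= 0x20 && b < 0x7f) ? (char)b : '.';
    }
    out[nbytes] = '\0';
}

/* Convenience wrapper that prints one decoded field. */
void print_decoded(const char *name, uint64_t value, size_t nbytes)
{
    char buf[9];
    decode_le(value, nbytes, buf);
    printf("%-10s \"%s\"\n", name, buf);
}
```

Fed with the values from the dump, ref (793790803, 4 bytes) reads "SIP/", next reads "200 OK" followed by CR LF, prev reads "Via: SIP" and callid.s reads "<sip:ano". So the memory that dlg points to appears to hold the text of a SIP message now, which fits the assumption that the dialog was freed and its memory reused.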
I could ask you for more details on the dump but it'd probably be easiest if I could take a direct (gdb-)look at it. Would you mind sending it to me in private (i.e., no CC to the mailing list) to the address I am writing from?
I (and Marius -- credits!) dug through your coredump and found a few curiosities. Before I bug you with the details, let me just say this: There might be something wrong with the dialog reference counter that determines when a dialog is to be removed from the hash table. In fact, your call stack indicates that an unreference operation was attempted on a hash table entry which looks empty:
(gdb) frame 0
#0  unref_dlg (dlg=0x7f08a9f67da8, cnt=1) at dlg_hash.c:598
598         dlg_lock( d_table, d_entry);
(gdb) p *d_table->entries
$53 = {first = 0x0, last = 0x0, next_id = 1124074261, lock_idx = 0}
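To illustrate the kind of bookkeeping I am talking about, here is a toy sketch of a reference-counted entry list. This is not the dialog module's code and not Daniel's patch: all toy_* names are made up for this mail, and locking (the dlg_lock() call visible in frame 0) is left out on purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a dialog and a hash table entry (hypothetical). */
struct toy_dlg {
    int ref;
    struct toy_dlg *next;
    struct toy_dlg *prev;
};

struct toy_entry {
    struct toy_dlg *first;
    struct toy_dlg *last;
};

/* Append d to the entry list; the list itself holds one reference. */
void toy_link(struct toy_entry *e, struct toy_dlg *d)
{
    assert(e != NULL && d != NULL);
    d->next = NULL;
    d->prev = e->last;
    if (e->last != NULL)
        e->last->next = d;
    else
        e->first = d;
    e->last = d;
    d->ref = 1;
}

/* Drop cnt references.
 * Returns  0 if the dialog is still referenced,
 *          1 if the count hit zero and the dialog was unlinked
 *            (the caller may free it now),
 *         -1 if the request would push the count below zero. */
int toy_unref(struct toy_entry *e, struct toy_dlg *d, int cnt)
{
    assert(e != NULL && d != NULL);
    if (cnt <= 0 || d->ref < cnt)
        return -1;              /* one unref too many: refuse */
    d->ref -= cnt;
    if (d->ref > 0)
        return 0;
    if (d->prev != NULL)
        d->prev->next = d->next;
    else
        e->first = d->next;
    if (d->next != NULL)
        d->next->prev = d->prev;
    else
        e->last = d->prev;
    d->next = d->prev = NULL;
    return 1;
}
```

If some code path drops one reference more than it took, the dialog is unlinked and freed while another path still holds a pointer to it. A guard like the -1 case only helps as long as the memory is still intact; once it has been reused, a later unref reads garbage such as the values seen in *dlg.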
Looking through the mailing-list archive, I noticed you brought attention to another reference counter-related bug which Daniel provided a fix for with commit 2c28a251a. Since you reported that no more issues appeared with that fixed version, I just backported the patch into 3.1. However, I can see from your core dump that you are not using a Kamailio version that includes the fix.
Before we continue with any bug hunting, could you try a version of Kamailio that comes with Daniel's "safer unref of terminated dialogs" patch? This can be a copy of the master branch or a recent copy of the 3.1 git branch. I'd suggest the latter so we can ensure that no bleeding-edge features added to the dialog module distort our analysis.
Thanks and
Cheers,
--Timo