Hey,
On 03.03.2011 10:19, Anton Roman wrote:
Checking the timestamps from the acc and the crash log, the BYE for the dialog was before the crash but the To-tag is not printed from dlg_hash.c, although it is in the acc for INVITE and BYE. Do you have parallel forking in front of this SIP server? I mean, is there another proxy that can do parallel forking and then send two or more branches to this instance?
AFAIK the client who is sending those calls is not doing parallel forking; they are sending calls over a SIP trunk to our Kamailio. They are calling PSTN numbers and we are sending those calls to a gateway, so they shouldn't do parallel forking. I'll get some traces to check it.
Your trace shows that there are two worker processes dealing with the segfault-triggering dialog, process IDs 32155 and 32158. However, I cannot see from your trace which module caused the latter process to execute unref_dlg() in dlg_hash.c.
What I can tell, though, is that the crash happens because the dialog reference counter is decremented too many times. Although I have no clue why, I believe the implementation of unref_dlg_unsafe() (a macro) could be somewhat more robust by not unlinking and destroying a dialog when the counter drops below zero. That is, instead of running the following block
    if ((_dlg)->ref<=0) { \
        unlink_unsafe_dlg( _d_entry, _dlg);\
        LM_DBG("ref <=0 for dialog %p\n",_dlg);\
        destroy_dlg(_dlg);\
    }\
for _dlg->ref <= 0, I see no reason not to change the compare operator to ==.
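To illustrate the difference between the two comparisons, here is a toy sketch. It is not the Kamailio code: struct toy_dlg and the toy_* functions are made-up names for illustration, and a simple "destroyed" counter stands in for unlink_unsafe_dlg() plus destroy_dlg(). It only shows how often the destroy path runs when one unref too many arrives.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a dialog; NOT Kamailio's struct dlg_cell. */
struct toy_dlg {
    int ref;        /* reference counter */
    int destroyed;  /* how many times the destroy path ran */
};

/* Comparison as in the quoted block: destroy whenever ref <= 0. */
static void toy_unref_le(struct toy_dlg *d, int cnt)
{
    assert(d != NULL);
    d->ref -= cnt;
    if (d->ref <= 0)
        d->destroyed++;  /* stands in for unlink_unsafe_dlg() + destroy_dlg() */
}

/* Suggested comparison: destroy only when ref is exactly 0. */
static void toy_unref_eq(struct toy_dlg *d, int cnt)
{
    assert(d != NULL);
    d->ref -= cnt;
    if (d->ref == 0)
        d->destroyed++;
    else if (d->ref < 0)
        fprintf(stderr, "bogus ref %d for dialog %p\n", d->ref, (void *)d);
}

/* Apply `unrefs` single decrements to a dialog that starts with
 * `initial_ref` references; return how often the destroy path ran. */
static int toy_count_destroys(void (*unref)(struct toy_dlg *, int),
                              int initial_ref, int unrefs)
{
    struct toy_dlg d = { initial_ref, 0 };
    int i;

    for (i = 0; i < unrefs; i++)
        unref(&d, 1);
    return d.destroyed;
}
```

With two references and three unrefs, toy_count_destroys(toy_unref_le, 2, 3) runs the destroy path twice, which in the real code would mean touching already freed memory, while toy_count_destroys(toy_unref_eq, 2, 3) runs it once and only logs the bogus counter. The trade-off in this sketch is that a single decrement with cnt > 1 can jump from a positive value straight past zero, in which case the == variant never destroys the dialog and leaks it instead.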
Of course, that just cures the symptoms. A coredump would be really helpful in identifying the root of the crash problem but I don't know why it wasn't generated in your case. Your configuration looks good to me.
Cheers,
--Timo