Hey,
On 03.03.2011 10:19, Anton Roman wrote:
Checking the timestamps from the acc and the crash log, the BYE for
the dialog was before the crash but the To-tag is not printed from
dlg_hash.c, although it is in the acc for INVITE and BYE. Do you
have parallel forking in front of this SIP server? I mean, is there
another proxy that can do parallel forking then send two or more
branches to this instance?
AFAIK the client sending those calls is not doing parallel forking;
they are sending calls over a SIP trunk to our Kamailio. They are
calling PSTN numbers and we are sending those calls to a gateway, so
they shouldn't do parallel forking. I'll get some traces to check it.
Your trace shows that there are two worker processes dealing with the
segfault-triggering dialog, process ID 32155 and 32158. I cannot see
from your trace what module caused the latter process to execute
unref_dlg() in dlg_hash.c, however.
What I can tell, though, is that the crash happens because the dialog
reference counter is decremented too many times. Although I have no clue why,
I believe the implementation of unref_dlg_unsafe() (a macro) could be
somewhat more robust by not unlinking and destroying a dialog when the
counter drops below zero. That is, instead of running the following block
    if ((_dlg)->ref<=0) { \
        unlink_unsafe_dlg( _d_entry, _dlg);\
        LM_DBG("ref <=0 for dialog %p\n",_dlg);\
        destroy_dlg(_dlg);\
    }\
for _dlg->ref <= 0, I see no reason not to change the compare operator to ==.
Of course, that just cures the symptoms. A coredump would be really
helpful in identifying the root of the crash problem but I don't know
why it wasn't generated in your case. Your configuration looks good to me.
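
To illustrate the compare operator change, here is a minimal standalone sketch. It is not the Kamailio code: struct toy_dlg, toy_destroy(), toy_unref_le() and toy_unref_eq() are hypothetical names made up for this example, and the "destroy" step only counts how often it ran instead of freeing memory, so the effect of one decrement too many can be observed safely.

```c
#include <stdio.h>

/* Hypothetical stand-in for a dialog; NOT Kamailio's struct dlg_cell. */
struct toy_dlg {
    int ref;        /* reference counter */
    int destroyed;  /* number of times the destroy step ran */
};

/* Stand-in for unlink + destroy; only counts, frees nothing. */
static void toy_destroy(struct toy_dlg *d)
{
    d->destroyed++;
}

/* Models the current check: destroy whenever the counter is <= 0. */
static void toy_unref_le(struct toy_dlg *d, int cnt)
{
    d->ref -= cnt;
    if (d->ref <= 0)
        toy_destroy(d);
}

/* Models the suggested check: destroy only when the counter hits 0. */
static void toy_unref_eq(struct toy_dlg *d, int cnt)
{
    d->ref -= cnt;
    if (d->ref == 0)
        toy_destroy(d);
    else if (d->ref < 0)
        fprintf(stderr, "bogus ref %d for dialog %p\n",
                d->ref, (void *)d);
}
```

Starting from ref == 1 and unreferencing twice, the <= variant runs the destroy step twice (a double destroy in real code), while the == variant runs it once and only logs the second, surplus decrement.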
Cheers,
--Timo