[SR-Users] problem unreferencing dialog in dialog module

Timo Reimann timo.reimann at 1und1.de
Thu Mar 3 11:11:15 CET 2011


Hey,


On 03.03.2011 10:19, Anton Roman wrote:
>     Checking the timestamps from the acc and the crash log, the BYE for
>     the dialog was before the crash but the To-tag is not printed from
>     dlg_hash.c, although it is in the acc for INVITE and BYE. Do you
>     have parallel forking in front of this SIP server? I mean, is there
>     another proxy that can do parallel forking then send two or more
>     branches to this instance?
> 
> AFAIK the client who is sending those calls is not doing parallel
> forking, they are sending calls over a SIP trunk to our Kamailio. They
> are calling PSTN numbers and we are sending those calls to a gateway,
> so they shouldn't do parallel forking, I'll get some traces to check it.

Your trace shows that there are two worker processes dealing with the
segfault-triggering dialog, process IDs 32155 and 32158. I cannot see
from your trace what module caused the latter process to execute
unref_dlg() in dlg_hash.c, however.

What I can tell though is that the crash happens because the dialog's
reference counter is decremented too many times. Although I have no clue
why, I believe the implementation of unref_dlg_unsafe() (a macro) could be
somewhat more robust by not unlinking and destroying a dialog when the
counter drops below zero. That is, instead of running the following block

if ((_dlg)->ref<=0) { \
	unlink_unsafe_dlg( _d_entry, _dlg);\
	LM_DBG("ref <=0 for dialog %p\n",_dlg);\
	destroy_dlg(_dlg);\
}\

for _dlg->ref <= 0, I see no reason not to change the compare operator
to ==, so that the block only runs when the counter reaches exactly zero.
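
To illustrate the idea, here is a minimal standalone sketch, not the actual
dialog module code. The names toy_dlg and toy_unref are made up for this
example, and incrementing a counter stands in for the unlink_unsafe_dlg() /
destroy_dlg() pair. It only shows that with an == 0 check the teardown runs
once, on the exact transition to zero, while a surplus decrement is merely
reported:

```c
#include <stdio.h>

/* Hypothetical stand-in for a dialog; NOT the real dialog structure. */
struct toy_dlg {
    int ref;        /* reference counter */
    int destroyed;  /* number of times teardown ran */
};

/* Decrement the counter by cnt. Tear down only when the counter hits
 * exactly 0; a negative counter is logged instead of destroying again.
 * Returns 1 if teardown ran, 0 otherwise. */
static int toy_unref(struct toy_dlg *dlg, int cnt)
{
    dlg->ref -= cnt;
    if (dlg->ref == 0) {
        dlg->destroyed++;   /* stands in for unlink + destroy */
        return 1;
    }
    if (dlg->ref < 0) {
        fprintf(stderr, "bogus ref %d for dialog %p\n",
                dlg->ref, (void *)dlg);
    }
    return 0;
}
```

With a <= 0 check, the third call in a sequence of three decrements on a
dialog holding two references would tear it down a second time; with == 0
it does not. Note that the toy keeps the struct around after teardown, which
the real module would not, so this does not address whatever issues the
surplus unref in the first place.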

Of course, that just cures the symptoms. A coredump would be really
helpful in identifying the root of the crash problem but I don't know
why it wasn't generated in your case. Your configuration looks good to me.


Cheers,

--Timo


