On Monday 03 November 2014, Daniel-Constantin Mierla wrote:
> Reference counters are good as long as they are used in predictable circumstances. The problems encountered so far were related to the fact that not all calls have proper signaling (e.g., network issues, buggy clients), which is the reason for this cleanup routine as well
True. But even then refcounting shouldn't be bypassed. For each (cleanup) situation the expected refcount should be compared to the actual refcount before deleting.
For the situation at hand, if the refcount is >1, something else is still handling the dlg (or buggy code did not unref it after finishing).
I think my fix catches those situations, though. When the ACK is missing, the refcount should be 1. When another process is handling the dlg, the refcount will be >1.
> (i.e., the SIP protocol itself also relies on a timeout when no ACK is received).
This specific situation is not about missing ACK. That would be DLG_STATE_CONFIRMED_NA.
> My goal was to figure out what the situation was and whether it is something predictable that can be fixed in the source code -- dialogs staying too long in a state that shouldn't take long are susceptible to issues, and it is better if we know the reason.
It is weird that the TMCB_DESTROY callback didn't clean up the dialog. I haven't investigated the cause, as I could fix the segfault pretty easily.