[sr-dev] git:master: dialog: Don't destroy dialogs in timer while they' re still being referenced

Daniel-Constantin Mierla miconda at gmail.com
Tue Nov 4 10:57:54 CET 2014


On 03/11/14 17:05, Alex Hermann wrote:
> On Monday 03 November 2014, Daniel-Constantin Mierla wrote:
>> Reference counters are good as long as they are used in predictable
>> circumstances. The problem encountered so far were related to the fact
>> that not all calls have proper signaling (e.g., network issues, buggy
>> clients), the reason for this cleanup routine as well
> True. But even then refcounting shouldn't be bypassed. For each (cleanup) 
> situation the expected refcount should be compared to the actual refcount 
> before deleting.
Not sure what would be expected reference counter value in each of the
states. The aim of the cleanup was to go beyond reasonable time a dialog
can stay in a state and clean up to avoid memory leaks.

> For the situation at hand, if the refcount is >1, something is handling the 
> dlg (or has buggy code not unreffing after finishing).

In theory, the dialog cannot last for ever in initial state, so cleaning
it up make sense to stay safe of memory leaks at runtime -- we can print
log messages to alert, but keeping it for ever brings more troubles
(e.g., besides memory leaks, the call limit can be affected).

>
> I think my fix catches those situations tough. When the ACK is missing, the 
> refcount should be 1. When another process is handling the dlg, the refcount 
> will be >1.

As I said, for operations that are long term, I consider better to keep
dlg index and label (two integers) than increasing the reference count
and storing the pointer (on 64b it is the same size). Then lookup again
the dialog when needed. Finding the reason of what determined you to do
this is important, not to push kind of workaround that we don't fully
understand its purpose and benefits.

IMO, it is not right to keep a dialog in proxy as long as caller/callee
don't have it anymore.

You can eventually make the behavior of the patch configurable, so you
can enable it via modparam, in the current form I don't find it ok.


>
>
>> (i.e., also sip
>> protocol relies on a time interval for not receiving ACK).
> This specific situation is not about missing ACK. That would be 
> DLG_STATE_CONFIRMED_NA.

It was a quick example that sip relies on time intervals for changing
states.

>
>> My goal was to figure out what was the situation, see if it is something
>> predictable that can be fixed in source code -- dialogs staying too long
>> in a state that shouldn't take too long are susceptible to issues and it
>> is better if we know what was the reason.
> It is weird that the TMCB_DESTROY callback didn't cleanup the dialog. I 
> haven't investigated the cause as i could fix the segfault pretty easily.
>
So far sounds like fixing something that we don't know what was and the
side effects are not good. I would like to get to a proper solution,
knowing what was the issue.

Cheers,
Daniel

-- 
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio Advanced Training, Nov 24-27, Berlin - http://www.asipto.com




More information about the sr-dev mailing list