[sr-dev] git:master: dialog: Don't destroy dialogs in timer while they' re still being referenced

Alex Hermann alex at speakup.nl
Tue Nov 4 14:19:58 CET 2014


On Tuesday 04 November 2014, Daniel-Constantin Mierla wrote:
> On 03/11/14 17:05, Alex Hermann wrote:
> > On Monday 03 November 2014, Daniel-Constantin Mierla wrote:

> > True. But even then refcounting shouldn't be bypassed. For each (cleanup)
> > situation the expected refcount should be compared to the actual refcount
> > before deleting.
> 
> Not sure what would be expected reference counter value in each of the
> states.

I think it is necessary to know it to be able to securely cleanup dialogs.


> In theory, the dialog cannot last for ever in initial state, so cleaning
> it up make sense to stay safe of memory leaks at runtime -- we can print
> log messages to alert, but keeping it for ever brings more troubles
> (e.g., besides memory leaks, the call limit can be affected).

I agree those must be cleaned, even though it is not technically a memory 
leak, as the dialog pointer is still used in the dialog hash.


> As I said, for operations that are long term, I consider better to keep
> dlg index and label (two integers) than increasing the reference count
> and storing the pointer (on 64b it is the same size). Then lookup again
> the dialog when needed.

Who is talking about "long term" operations? The duration is not of any 
relevance here.

Almost none of the dialog related operations apart from getting the dlg 
pointer are protected by locks but by the refcounter. Suppose process 1 
obtains a dlg pointer (from the h_entry, and h_id), and upped the refcounter 
as a result. It can get scheduled away at any random time, even directly after 
obtaining the dlg pointer. If the timer process get scheduled at that moment 
and frees the memory of the dlg (disregarding the fact that is is still in 
use), how is process 1 to know that?


> Finding the reason of what determined you to do
> this is important, not to push kind of workaround that we don't fully
> understand its purpose and benefits.
> 
> IMO, it is not right to keep a dialog in proxy as long as caller/callee
> don't have it anymore.

I agree, just not with the method you used to get rid of the dialog.

 
> You can eventually make the behavior of the patch configurable, so you
> can enable it via modparam, in the current form I don't find it ok.

I'm not going to make a known segfault path configurable, so i'll revert it if 
you insist. But may i kindly suggest you revert your code too which introduced 
this code in the first place as, IMHO, it is fundamentally wrong by deleting 
an object without checking if it is in use.


> >> (i.e., also sip
> >> protocol relies on a time interval for not receiving ACK).
> > 
> > This specific situation is not about missing ACK. That would be
> > DLG_STATE_CONFIRMED_NA.
> 
> It was a quick example that sip relies on time intervals for changing
> states.
> 
> >> My goal was to figure out what was the situation, see if it is something
> >> predictable that can be fixed in source code -- dialogs staying too long
> >> in a state that shouldn't take too long are susceptible to issues and it
> >> is better if we know what was the reason.
> > 
> > It is weird that the TMCB_DESTROY callback didn't cleanup the dialog. I
> > haven't investigated the cause as i could fix the segfault pretty easily.
> 
> So far sounds like fixing something that we don't know what was and the
> side effects are not good. I would like to get to a proper solution,
> knowing what was the issue.

I'm sorry, but IMHO this (and earlier) argument applies more to your initial 
cleanup routine for the stray dialogs than to my fix for the segfault.

-- 
Greetings,

Alex Hermann




More information about the sr-dev mailing list