On 03/11/14 17:05, Alex Hermann wrote:
On Monday 03 November 2014, Daniel-Constantin Mierla
wrote:
Reference counters are good as long as they are
used in predictable
circumstances. The problem encountered so far were related to the fact
that not all calls have proper signaling (e.g., network issues, buggy
clients), the reason for this cleanup routine as well
True. But even then
refcounting shouldn't be bypassed. For each (cleanup)
situation the expected refcount should be compared to the actual refcount
before deleting.
Not sure what would be expected reference counter value in each of
the
states. The aim of the cleanup was to go beyond reasonable time a dialog
can stay in a state and clean up to avoid memory leaks.
For the situation at hand, if the refcount is >1,
something is handling the
dlg (or has buggy code not unreffing after finishing).
In theory, the dialog cannot last for ever in initial state, so cleaning
it up make sense to stay safe of memory leaks at runtime -- we can print
log messages to alert, but keeping it for ever brings more troubles
(e.g., besides memory leaks, the call limit can be affected).
I think my fix catches those situations tough. When the ACK is missing, the
refcount should be 1. When another process is handling the dlg, the refcount
will be >1.
As I said, for operations that are long term, I consider better to keep
dlg index and label (two integers) than increasing the reference count
and storing the pointer (on 64b it is the same size). Then lookup again
the dialog when needed. Finding the reason of what determined you to do
this is important, not to push kind of workaround that we don't fully
understand its purpose and benefits.
IMO, it is not right to keep a dialog in proxy as long as caller/callee
don't have it anymore.
You can eventually make the behavior of the patch configurable, so you
can enable it via modparam, in the current form I don't find it ok.
(i.e., also sip
protocol relies on a time interval for not receiving ACK).
This specific situation
is not about missing ACK. That would be
DLG_STATE_CONFIRMED_NA.
It was a quick example that sip relies on time intervals for changing
states.
My goal was to figure out what was the situation,
see if it is something
predictable that can be fixed in source code -- dialogs staying too long
in a state that shouldn't take too long are susceptible to issues and it
is better if we know what was the reason.
It is weird that the TMCB_DESTROY
callback didn't cleanup the dialog. I
haven't investigated the cause as i could fix the segfault pretty easily.
So far sounds like fixing something that we don't know what was and the
side effects are not good. I would like to get to a proper solution,
knowing what was the issue.
Cheers,
Daniel
--
Daniel-Constantin Mierla
http://twitter.com/#!/miconda -
http://www.linkedin.com/in/miconda
Kamailio Advanced Training, Nov 24-27, Berlin -
http://www.asipto.com