Hello,
I thought I would start a dedicated thread to discuss cleaning up dialogs that stay for a long time in a state they should not be in, following the sr-dev debate on a commit from Alex (cc-ed).
Finding the root cause is always best, but sometimes it is hard or even impossible given the nature of SIP: packets can be misrouted or arrive out of order, and SIP itself relies on timeouts to change state. This cleanup is meant as a last resort for dealing with exceptional cases, and to stay safe with regard to memory usage and call limits.
The existing code cleaned up dialogs that had not been answered for more than 5 minutes, on the basis that, in theory, such a case does not happen. When I added it, I checked that the dialog and tm modules do not hold a dialog pointer for a long time relying on the reference counter; in such cases they should instead clone the dialog id and look the dialog up again.
As Alex pointed out, this is not completely safe if a lookup of a dialog in this special situation (staying too long in a state it theoretically should not be in) happens just as the cleanup task discovers the anomaly.
My first idea for fixing it is to track the time of last use and take it into account when cleaning. The condition for cleaning would then become: the dialog has been too long in a state it should not be in, and it was last accessed quite a while ago.
I am trying to find out whether anyone else has a different or better solution in mind.
Cheers, Daniel