On 2012-01-20 at 23:32, Timo Reimann wrote:
Hi and sorry for the late response!
Another possible root cause: After calling dlg_manage() on an INVITE, you
do not forward the request (e.g., by calling exit() instead). Could that
be the case? If so, the solution would be (again) to defer dialog
tracking unless you're sure the INVITE will be routed.
Thanks, this was indeed at least the main problem! We have now replaced a
lot of sl_send_reply("123", "message") calls with t_newtran() +
t_reply("123", "message") before exit(), and that solved most of the
hanging dialogs. These were, by the way, hanging in state 1.
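For illustration, the change looks roughly like the sketch below. This is
not copied from our config: the 403 reply is only a placeholder for
whatever rejection applies, and the explanation in the comments is my
understanding of why it helps, not something I have verified in the
dialog module source.

```
# Before: dlg_manage() was already called on this INVITE, then we
# rejected it statelessly. No transaction is created, so (as far as I
# understand) the dialog module never sees the request finish and the
# dialog stays in state 1.
#
#   sl_send_reply("403", "Forbidden");
#   exit;

# After: create the transaction first, then reply through tm, so the
# dialog module gets notified when the transaction completes.
t_newtran();
t_reply("403", "Forbidden");
exit;
```

The alternative Timo suggested, calling dlg_manage() only right before the
INVITE is actually relayed, should avoid the problem in the same way.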
If not, the last thing I can think of to try is to do some tracing (using
ngrep or tcpdump, for example) and attempt to catch a dialog that
dangles. If you succeed at that, analyzing the trace will probably help
in determining the issue.
We still have a few calls, probably one to two a day, which get stuck in
state 4. They have a proper INVITE, 200 OK and later BYE. We have tracked
the problem down to this: once in a while Asterisk, our bridge to the PSTN,
issues double INVITEs at the exact same time for the same call, and in this
case there seems to be a race within Kamailio. The call is properly set up
and terminated, but the entry is still in the dialog table.
Any ideas how to cope with the double INVITEs? We do btw use
dlg_match_mode = 1, as we used that in Kamailio 1.5 and that worked like a
charm. Have not tested altering it to either 0 or 2.
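For reference, the setting in question looks like this in the config. The
mode descriptions in the comments are how I read the dialog module
documentation, so treat them as a summary rather than a definitive
statement:

```
# 0 = DID_ONLY:     match only on the dialog ID carried in Record-Route
# 1 = DID_FALLBACK: match on the dialog ID, fall back to SIP-wise
#                   matching (Call-ID and tags) if it is missing
# 2 = DID_NONE:     always match SIP-wise
modparam("dialog", "dlg_match_mode", 1)
```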
Some extra information:
We also still get some dialogs stuck in state 1 when we see these double
invites (but the call is not set up due to busy, hang-up etc).
For these events we also (always?) see one or more of these messages in
the logs:
CRITICAL: dialog [dlg_hash.c:650]: bogus event 6 in state 1 for dlg 0xb5e88dc4 [2445:97666510] with clid '2adcd4b23355a3aa3a4ae5a73fe72631@pstn-gateway-ip' and tags 'as485e20a7' ''
CRITICAL: dialog [dlg_hash.c:650]: bogus event 7 in state 1 for dlg 0xb5e88dc4 [2445:97666510] with clid '2adcd4b23355a3aa3a4ae5a73fe72631@pstn-gateway-ip' and tags 'as485e20a7' ''
Also sometimes event 8.
This happens to a very small minority of calls (<1% I'd guess).
Best regards,
Marius Pedersen