Situation: Two Kamailio nodes syncing dialog profiles via DMQ
Observation: When a dialog times out, about half of the time the timeout fires on the peer node that is not handling the dialog. This prevents the call from being terminated correctly and leaves entries in the dialog database that never get deleted.
https://kamailio.org/docs/modules/5.7.x/modules/dialog.html#dialog.p.enable_...
The documentation makes clear that only the node handling the dialog in question can make changes that are not related to the dialog profiles. So when a dialog times out, it is this node that has to trigger the timeout, not any other one.
Looking at the source code, the 'lifetime' is transmitted via the DMQ dialog message to the peer nodes, which in turn arm a timer of their own. So the timer on a peer will sometimes fire before the instance handling the dialog triggers its own timer.
With some assistance from @oej I found a way to alter the JSON payload to extend the lifetime on the peer nodes:
```
route[DMQ_CAPTURE] {
    if(is_method("KDMQ")) {
        if(has_body("application/json") && $fU == 'dialog') {
            if (jansson_get("lifetime", $rb, "$var(lifetime)")) {
                # Add 60 seconds on DMQ peer to make sure it expires AFTER main node.
                $var(new_lifetime) = $var(lifetime) + 60;
                $var(newrb) = $rb;
                jansson_set("integer", "lifetime", $var(new_lifetime), "$var(newrb)");
                set_body("$var(newrb)","application/json");
                msg_apply_changes();
            }
        }
        dmq_handle_message();
        exit;
    }
}
```
Testing with this config fixed the issue of the timeout firing on a peer node instead of the node handling the dialog.
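For anyone trying to reproduce this, a minimal sketch of how the route might be wired in. The module list only covers the modules that the functions used in the route come from; a real config loads more (dmq and dialog have their own dependencies), and the placement at the top of `request_route` is an assumption, not something taken from my actual config.

```
loadmodule "textops.so"    # is_method(), has_body(), set_body()
loadmodule "textopsx.so"   # msg_apply_changes()
loadmodule "jansson.so"    # jansson_get(), jansson_set()
loadmodule "dmq.so"        # dmq_handle_message()
loadmodule "dialog.so"

request_route {
    # Assumed placement: hand KDMQ requests to the capture route first.
    # The route exits for KDMQ and simply returns for everything else.
    route(DMQ_CAPTURE);

    # regular SIP routing continues here
}
```

Calling the route before any other processing keeps the body rewrite ahead of `dmq_handle_message()`, which is the point where the dialog module on the peer sees the lifetime.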
The issue is present in 5.5 and 5.6 and, after looking at the code, I assume also in 5.7.
-Benoît
Maybe we can add a setting in the dialog module for
- ignoring the timeout on the DMQ peer (receiver)
- adding a value on the DMQ peer so the primary server triggers first.
/O
Just for reference, there is also another issue related to the DMQ dialog handling: #2080
I guess I stumbled over some other issue or bug...
`[1 dialog 0f57b21a4ca92561-1371066@x.x.x.x 10 KDMQ]jansson [jansson_funcs.c:47]: janssonmod_get_helper(): json error at line 1, col 3154: duplicate object key near '"tag1"'`
Indeed, when looking at the JSON structure sent with dialog action 1, state 2, the key 'tag1' is sometimes present twice. The value is the same in both, so they are exact duplicates. The jansson parser rejects duplicate object keys, so parsing fails and I cannot change the lifetime in those occurrences.
A quick check of the code revealed a potential execution path that duplicates attributes. There is a switch with a fall-through case. Not being the developer of that snippet, I don't know the design reasons, but hopefully the commit I pushed to master fixes the issue. Please test, and report if it doesn't.
Just a short warning to anyone considering using my work-around: I experienced dialog data corruption, which could be due to the work-around or maybe to the duplicate key. I reverted this change in my config.
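To see how often this happens, a small sketch that only logs the parse failures and leaves the body untouched. The route name `DMQ_BODY_CHECK` and the log text are hypothetical, and it assumes the xlog module is loaded next to jansson and textops.

```
# Hypothetical helper route, call it for KDMQ requests before dmq_handle_message().
route[DMQ_BODY_CHECK] {
    if (has_body("application/json") && $fU == 'dialog') {
        # jansson_get() fails when the body cannot be parsed,
        # for example on a duplicate "tag1" key.
        if (!jansson_get("lifetime", $rb, "$var(lifetime)")) {
            xlog("L_WARN", "DMQ dialog body from $si not parsed, lifetime left unchanged (Call-ID $ci)\n");
        }
    }
}
```

Since the route never modifies the message, the KDMQ request reaches `dmq_handle_message()` exactly as it was received.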
This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.
@hb9eue did you test the proposed fix?
Unfortunately not, as I'm working with an older version. But that fix was for the problem I found with my work-around. The original issue, that a node not in charge of a dialog might delete it, is not solved by it. So the issue that a timer is armed on a 'slave' node is still present, and I believe this should be addressed.
keep alive
Hmm, would `modparam("dialog", "dlg_filter_mode", 1)` fix the issue by not arming the timer?
Ping to remove stale label
This issue is stale because it has been open 6 weeks with no activity. Remove stale label or comment or this will be closed in 2 weeks.
Closed #3656 as not planned.