More info:
On another cluster with the same setup, as part of this troubleshooting I disabled DMQ and enabled MySQL for dialog replication. I also left one node out of rotation to observe the replication behavior.
Well, with 0 traffic I can see this:
```
root@sbc01:~# kamctl rpc dlg.stats_active
{
  "jsonrpc": "2.0",
  "result": {
    "starting": 0,
    "connecting": 0,
    "answering": 0,
    "ongoing": 0,
    "all": 0
  },
  "id": 2634
}
root@sbc01:~#
```
Which is correct, but...:
```
root@sbc01:~# kamctl rpc stats.fetch all | grep dialog
"dialog.active_dialogs": "38",
"dialog.early_dialogs": "0",
"dialog.expired_dialogs": "38",
"dialog.failed_dialogs": "0",
"dialog.processed_dialogs": "1",
root@sbc01:~#
```
I wonder whether it's correct that those 38 `expired` dialogs still count towards the `active` counter. Could that have something to do with it? Maybe those expired dialogs, which no longer exist but are still included in the "active" counter, only get replicated when DMQ is enabled?
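To make this comparison repeatable across nodes, here is a rough sketch of the check I'm doing by eye. It assumes both commands are run without `grep`, so each returns a full JSON-RPC document, and that `stats.fetch all` puts the counters in a flat `result` object with string values, as the grepped lines above suggest. The function name and the `drift` fields are mine, not anything Kamailio provides.

```python
import json


def dialog_counter_drift(stats_active_json: str, stats_fetch_json: str) -> dict:
    """Compare dialogs actually tracked (dlg.stats_active) with the
    statistics counters (stats.fetch).

    Assumption: stats.fetch returns {"result": {"dialog.active_dialogs": "38", ...}}.
    """
    live = json.loads(stats_active_json)["result"]
    counters = json.loads(stats_fetch_json)["result"]

    tracked = int(live["all"])
    active = int(counters["dialog.active_dialogs"])
    expired = int(counters["dialog.expired_dialogs"])

    drift = active - tracked  # 0 means the counter matches reality
    return {
        "tracked": tracked,
        "active_counter": active,
        "expired_counter": expired,
        "drift": drift,
        # True when the surplus in "active" is exactly the expired count
        "drift_equals_expired": drift == expired,
    }


# Sample input built from the values shown in this post (trimmed to the dialog keys).
STATS_ACTIVE = json.dumps({
    "jsonrpc": "2.0",
    "result": {"starting": 0, "connecting": 0, "answering": 0, "ongoing": 0, "all": 0},
    "id": 2634,
})
STATS_FETCH = json.dumps({
    "jsonrpc": "2.0",
    "result": {
        "dialog.active_dialogs": "38",
        "dialog.early_dialogs": "0",
        "dialog.expired_dialogs": "38",
        "dialog.failed_dialogs": "0",
        "dialog.processed_dialogs": "1",
    },
    "id": 1,  # placeholder id, not taken from a real reply
})

if __name__ == "__main__":
    print(dialog_counter_drift(STATS_ACTIVE, STATS_FETCH))
```

With the numbers from sbc01 this reports a drift of 38 that matches the expired count exactly, which is what makes me suspect the expired dialogs are never subtracted from `active`.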
So far, this much is clear to me:

1. There is definitely a scenario where counters go below 0.
2. It only happens (so far) when DMQ is enabled.
3. In my initial look, I found some logs showing that dialogs created on node1 were timed out on node2 (see previous posts).
4. Now I see a discrepancy between active and expired dialogs on a node with 0 active dialogs.
I feel we are slowly narrowing down what the possible problem can be, but I still don't have a clear picture of what causes what.