More info:

On another cluster with the same setup, I disabled DMQ during this troubleshooting and enabled MySQL for dialog replication. I also left one node out of rotation to observe the replication behavior.

Well, with 0 traffic I can see this:

root@sbc01:~# kamctl rpc dlg.stats_active
{
  "jsonrpc":  "2.0",
  "result": {
    "starting": 0,
    "connecting": 0,
    "answering":  0,
    "ongoing":  0,
    "all":  0
  },
  "id": 2634
}
root@sbc01:~#

That is correct, but:

root@sbc01:~# kamctl rpc stats.fetch all | grep dialog
    "dialog.active_dialogs":  "38",
    "dialog.early_dialogs": "0",
    "dialog.expired_dialogs": "38",
    "dialog.failed_dialogs":  "0",
    "dialog.processed_dialogs": "1",
root@sbc01:~#

Is it correct that those 38 expired dialogs still count toward the active counter? Could that be related? Maybe these non-existent, expired dialogs are still counted somewhere in the "active" counter, and they only get replicated when DMQ is enabled?
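To flag this kind of mismatch on each node without eyeballing it, the two outputs can be compared directly. Below is a minimal Python sketch, assuming the output formats look exactly like the ones pasted above (the function and variable names are hypothetical, not part of kamctl or Kamailio). It reports when `dialog.active_dialogs` differs from the `all` value returned by `dlg.stats_active`, or when any dialog counter is printed as negative:

```python
import json
import re

# Sample outputs copied from the node with zero traffic.
SAMPLE_STATS_ACTIVE = """{
  "jsonrpc":  "2.0",
  "result": {
    "starting": 0,
    "connecting": 0,
    "answering":  0,
    "ongoing":  0,
    "all":  0
  },
  "id": 2634
}"""

SAMPLE_STATS_FETCH = """
    "dialog.active_dialogs":  "38",
    "dialog.early_dialogs": "0",
    "dialog.expired_dialogs": "38",
    "dialog.failed_dialogs":  "0",
    "dialog.processed_dialogs": "1",
"""


def parse_dlg_stats_active(text: str) -> int:
    """Return the "all" value from `kamctl rpc dlg.stats_active` JSON output."""
    return int(json.loads(text)["result"]["all"])


def parse_dialog_stats(text: str) -> dict[str, int]:
    """Parse lines like '"dialog.active_dialogs":  "38",' into {name: value}.

    Assumes the grep'd `stats.fetch all` format shown above; the optional
    minus sign covers a counter that is printed as negative.
    """
    stats = {}
    for match in re.finditer(r'"dialog\.(\w+)":\s*"(-?\d+)"', text):
        stats[match.group(1)] = int(match.group(2))
    return stats


def check_consistency(stats_active_json: str, stats_fetch_text: str) -> list[str]:
    """Return a list of human-readable problems (empty if counters agree)."""
    in_memory = parse_dlg_stats_active(stats_active_json)
    stats = parse_dialog_stats(stats_fetch_text)
    problems = []
    counter = stats.get("active_dialogs", 0)
    if counter != in_memory:
        problems.append(
            f"active_dialogs={counter} but dlg.stats_active all={in_memory}"
        )
    for name, value in stats.items():
        if value < 0:
            problems.append(f"{name} is negative ({value})")
    return problems


if __name__ == "__main__":
    for problem in check_consistency(SAMPLE_STATS_ACTIVE, SAMPLE_STATS_FETCH):
        print(problem)
```

Run against the outputs above, it reports a single problem: the statistics counter says 38 active dialogs while the dialog module itself holds none. Feeding it the same two outputs captured from every node (with and without DMQ) would make it easier to see where the counters start to drift.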

So far, this much is clear:

1- There is definitely a scenario where the counters go below 0.
2- It only happens (so far) when DMQ is enabled.
3- In my initial investigation, I found logs showing that dialogs created on node1 were timed out on node2 (see previous posts).
4- Now I see a discrepancy between the active and expired dialog counters on a node with 0 active dialogs.

I feel we are slowly narrowing down the possible cause, but I still don't have a clear picture of what causes what.

