I have run some tests around this. Whilst I have not yet been able to reproduce the
negative counter issue, I do think dialog replication needs some further thought.
First thing to note - stats are not affected by replicated dialogs, so I don't think
DMQ is _directly_ responsible for the negative counting. However - and this may be
_indirectly_ related - if a node is restarted, any dialogs it owned at the time will be
'stuck' forever. This is because, in the current implementation, the
dialog owner is responsible for triggering update/removal across the rest of the cluster.
If the owner no longer exists - or has been restarted and has no idea that it was once the
owner of some/all of the dialogs it receives in its initial re-sync - then this link is
broken permanently. The problem is compounded by the fact that these orphaned dialogs never
(in my tests, anyway) time out.
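
To illustrate the failure mode described above, here is a toy model of owner-driven replication. It is only a sketch and not Kamailio code: the `Node` class and its `create`, `end` and `restart` methods are hypothetical names, and the re-sync is assumed to copy dialogs back from a peer without restoring any knowledge of ownership.

```python
# Toy model only: NOT Kamailio code. All names here are hypothetical.
class Node:
    def __init__(self, name):
        self.name = name
        self.dialogs = {}   # dialog id -> name of the owning node
        self.owned = set()  # dialog ids this running process knows it created

    def create(self, dlg_id, peers):
        # The creating node becomes the owner and replicates the dialog.
        self.owned.add(dlg_id)
        self.dialogs[dlg_id] = self.name
        for peer in peers:
            peer.dialogs[dlg_id] = self.name

    def end(self, dlg_id, peers):
        # Only a node that knows it is the owner propagates the removal.
        if dlg_id not in self.owned:
            return False
        self.owned.discard(dlg_id)
        self.dialogs.pop(dlg_id, None)
        for peer in peers:
            peer.dialogs.pop(dlg_id, None)
        return True

    def restart(self, peers):
        # In-memory state is lost. The assumed re-sync copies dialogs back
        # from a peer, but nothing tells this process which ones it owned.
        self.owned = set()
        self.dialogs = dict(peers[0].dialogs) if peers else {}


a, b = Node("a"), Node("b")
a.create("d1", [b])
a.restart([b])               # node "a" restarts while still owning d1
removed = a.end("d1", [b])   # "a" no longer knows it owns d1
print(removed, "d1" in a.dialogs, "d1" in b.dialogs)
```

In this model the dialog stays on both nodes after the restart, because the only party allowed to trigger its removal no longer recognises itself as the owner.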
I need to spend some more time on the DMQ side, since this is the first time I have looked
at it properly. In the meantime, @joelsdc, regarding your issue:
1. Do you have the database enabled alongside DMQ replication, or was it only for testing? I
suspect this is where the recent 38 'expired' dialogs came from - conversely, the
earlier 'bad' dialogs you mentioned were likely a result of the owning node having
been restarted (these would not have been included in the 'expired' counter).
2. Are you expecting the stats counters (the original ones, not the new
'dlg.stats_active') to reflect all dialogs across the cluster or just those
handled directly by the local instance?
--
https://github.com/kamailio/kamailio/issues/1591#issuecomment-409205552