I forgot to mention that the recoveries are actual Kamailio manual restarts.
> When you restart a node on the DMQ bus, it can trigger memory usage on the other nodes since they will start to do a SYNC and send one DMQ message / contact
> It could be that one node in the DMQ bus is restarted and not answering DMQ messages ?
The mem leak periods do not match the moment we restart any of the nodes.
> Few ideas :
> You could search your trace, maybe you will find the DMQ sync requests ...
We verified the traces and we found that at the moment of the mem leak there was nothing unusual.
> You can also confirm significant increase in active transactions.
Same here: nothing unusual in the active transactions.
> Verify the state of the bus :
> kamcmd dmq.list_nodes
The primary node state shows the affected secondary node as inactive.
> Verify the number of contacts on each node (confirm that the cluster is healthy)
> kamctl stats | grep usrloc | grep contact