Thank you @charlesrchance, I did some tests with this setup:
kamailio.cfg (meaningful lines):

fork=yes
children=1
tcp_connection_lifetime=3605
pv_buffer_size=8192

# ----- dmq params -----
modparam("dmq", "server_address", DMQ_SERVER_ADDRESS)
modparam("dmq", "notification_address", DMQ_NOTIFICATION_ADDRESS)
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 1)
modparam("dmq", "ping_interval", 15)
modparam("dmq", "worker_usleep", 1000)

# ----- htable params -----
modparam("htable", "enable_dmq", 1)
modparam("htable", "dmq_init_sync", 1)
modparam("htable", "htable", "ht=>size=16;dmqreplicate=1;autoexpire=10800;")               # Keep track of concurrent channels for accounts. Should be same as dialog
modparam("htable", "htable", "ht1=>size=16;dmqreplicate=1;autoexpire=10800;")               # Keep track of concurrent channels for accounts. Should be same as dialog
modparam("htable", "htable", "ht2=>size=16;dmqreplicate=1;autoexpire=10800;")               # Keep track of concurrent channels for accounts. Should be same as dialog
modparam("htable", "htable", "ht3=>size=16;dmqreplicate=1;autoexpire=10800;")               # Keep track of concurrent channels for accounts. Should be same as dialog

#!define ONEK "really 1 k chars, believe me :)"

event_route[htable:mod-init] {
  $var(name) = POD_NAME + "\n";
  xlog("L_ALERT", "$var(name)");
  # pre-populate only on the first pod; the other pods must pull the tables via dmq_init_sync
  if(POD_NAME == "kama-0") {
    $var(count) = 0;
    while($var(count) < 99) {
      $sht(ht=>$var(count)) = ONEK;
      $sht(ht1=>$var(count)) = ONEK;
      $sht(ht2=>$var(count)) = ONEK;
      $sht(ht3=>$var(count)) = ONEK;
      $var(count) = $var(count)+1;
    }
  }
}

# minimal routing for the test: only KDMQ (DMQ) traffic is handled
request_route {
  if ($rm == "KDMQ"){
    dmq_handle_message();
  }
  exit;
}
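
For reference, whether the mesh formed and the tables actually synced can be checked over RPC (assuming the ctl module is loaded so kamcmd works; dmq.list_nodes and htable.stats are the standard dmq/htable RPC commands):

  kamcmd dmq.list_nodes
  kamcmd htable.stats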

Started kama-0, which now holds 4 htables of ~99KB each (99 entries of ~1KB).
Started 10 Kubernetes pods and launched Kamailio 100 times on each pod, with a 3-second timeout per run.
So we had roughly 1000 Kamailio instances trying to fetch these htables from kama-0.
I didn't see any dangerous CPU spike, and the loop no longer happens.
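
Back-of-the-envelope, a full sync is 4 tables × 99 entries × ~1KB ≈ ~400KB of payload per requesting node, so ~1000 syncing nodes pull on the order of 400MB in total from kama-0, split across KDMQ messages.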

There's something I'm worried about, though: the memory of the DMQ worker (measured with top), which usually stays around 0.1%, is now stable at 1.4% and never goes back down.
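
One caveat I'm aware of (assumption on my side, not verified here): top's %MEM reflects resident set size, which for a Kamailio process also counts the shared-memory pages that process has touched, so a DMQ worker that handled large sync bodies can show a higher RES without any private-memory leak. The private (pkg) memory can be checked per process over RPC:

  kamcmd pkg.stats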

I fear there's a memory leak somewhere, but I'm not sure where. While debugging the loop issue I had some doubts about how the json_t structures are freed, but that may just be me not knowing the code well. Can you give us any hint, or tell us what would help you understand this issue better?
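
If it helps, here is what I plan to enable for the next run (a sketch; it assumes Kamailio is built with memory debugging support, and the values are taken from the core cookbook, to be double-checked):

# log memory accounting and dump per-process summaries of
# used pkg/shm blocks when each process exits
memdbg=5
memlog=5
mem_summary=15

With that, a pkg leak in the DMQ worker should show up in its shutdown dump together with the file/line of the allocation.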

Thanks

