Thank you @charlesrchance, I did some tests with this setup:
kamailio.cfg (meaningful lines):
fork=yes
children=1
tcp_connection_lifetime=3605
pv_buffer_size=8192
# ----- dmq params -----
modparam("dmq", "server_address", DMQ_SERVER_ADDRESS)
modparam("dmq", "notification_address", DMQ_NOTIFICATION_ADDRESS)
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 1)
modparam("dmq", "ping_interval", 15)
modparam("dmq", "worker_usleep", 1000)
# ----- htable params -----
modparam("htable", "enable_dmq", 1)
modparam("htable", "dmq_init_sync", 1)
modparam("htable", "htable", "ht=>size=16;dmqreplicate=1;autoexpire=10800;") # Keep track of concurrent channels for accounts. Should be same as dialog
modparam("htable", "htable", "ht1=>size=16;dmqreplicate=1;autoexpire=10800;") # Keep track of concurrent channels for accounts. Should be same as dialog
modparam("htable", "htable", "ht2=>size=16;dmqreplicate=1;autoexpire=10800;") # Keep track of concurrent channels for accounts. Should be same as dialog
modparam("htable", "htable", "ht3=>size=16;dmqreplicate=1;autoexpire=10800;") # Keep track of concurrent channels for accounts. Should be same as dialog
#!define ONEK "really 1 k chars, believe me :)"
event_route[htable:mod-init] {
    $var(name) = POD_NAME + "\n";
    xlog("L_ALERT", "$var(name)");
    # only the first pod preloads the htables; the others get them via dmq_init_sync
    if(POD_NAME == "kama-0") {
        $var(count) = 0;
        while($var(count) < 99) {
            $sht(ht=>$var(count)) = ONEK;
            $sht(ht1=>$var(count)) = ONEK;
            $sht(ht2=>$var(count)) = ONEK;
            $sht(ht3=>$var(count)) = ONEK;
            $var(count) = $var(count) + 1;
        }
    }
}
request_route {
    if ($rm == "KDMQ") {
        dmq_handle_message();
    }
    exit;
}
I started kama-0, which now holds 4 htables of ~99KB each (99 entries of 1KB per table).
Then I started 10 Kubernetes pods and, on each pod, launched Kamailio 100 times with a timeout of 3 seconds per run.
So we had roughly 1000 Kamailio instances trying to get these htables from kama-0 via dmq_init_sync.
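Rough numbers for that, assuming ONEK is exactly 1KB and ignoring the KDMQ/JSON overhead: one full sync is 4 tables * 99 entries * 1KB = ~396KB, so kama-0 served about 1000 * 396KB = ~400MB of htable content in total.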
I didn't see any dangerous CPU spike, and the loop doesn't happen anymore.
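To double-check that the syncs actually complete, these generic kamcmd checks can be run on a running instance (nothing specific to this issue, just the standard RPCs):
kamcmd dmq.list_nodes
kamcmd htable.dump ht
The first should list kama-0 and the notification peers, the second should show the 99 preloaded entries (same for ht1/ht2/ht3).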
There's something I'm worried about, though: the memory of the DMQ worker process (measured with top), which usually stays around 0.1%, is now stable at 1.4% and doesn't go down again.
I fear there's a memory leak somewhere, but I'm not sure where. While debugging the loop issue I had some doubts about how the json_t structures are freed, but that could just be me not knowing the code well. Can you give us any hints, or tell us what would help you understand this issue better?
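If it helps, I can re-run the test with the standard memory reporting enabled; a minimal sketch of what I'd add (generic settings, nothing specific to this setup):
# kamailio.cfg: dump memory usage at shutdown
# (memlog must not be higher than the active debug level, otherwise the dump is not logged)
memlog=1
mem_summary=3   # bit flags; 3 should dump both pkg and shm status, see the core cookbook
# at runtime, from the host:
kamcmd core.shmmem   # overall shared memory usage
kamcmd pkg.stats     # per-process private (pkg) memory, including the DMQ worker
That should at least show whether the extra 1.4% is pkg memory held by the DMQ worker process or shared memory.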
Thanks