Hello,
I have a production environment with 4 proxies using DMQ to sync terminal registrations, and the DMQ configuration parameters are as follows:
loadmodule "dmq.so"
#modparam("dmq", "server_address", "sip:MY_IP_ADDRESS:MY_PORT_ADDRESS")
modparam("dmq", "notification_address", "DMQ_HOSTS")
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 4)
In the lab environment, with the same configuration as in prod, I can reproduce this problem with 12000 terminal REGISTERs: when doing several restarts on a proxy, one of them generates a coredump.
Analyzing the core, I can see that the problem is a concurrency issue in the DMQ synchronization.
job_queue_size(...) is called from add_dmq_job(...) with a queue that looks valid in the caller's frame, but inside the function the queue is NULL:
(gdb) up
#1 0x00007fefe58ee246 in job_queue_size (queue=0x0) at worker.c:254
254 return atomic_get(&queue->count);
(gdb) print queue
$1 = (job_queue_t *) 0x0
(gdb) up
#2 0x00007fefe58ed920 in add_dmq_job (msg=0x7ff028b453f8, peer=0x7fefebcc4df8) at worker.c:184
184 if(job_queue_size(workers[i].queue) == 0) {
(gdb) print i
$2 = 1
(gdb) print workers[i].queue
$3 = (job_queue_t *) 0x7fefebce49b0
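To illustrate what I think the trace shows, here is a minimal sketch in C. It is not the actual dmq code: the types are simplified stand-ins, and job_queue_size_safe and pick_worker are hypothetical names of mine. It assumes that a worker's queue pointer can still be NULL while the worker processes are starting up after a restart, and it shows a selection loop that skips such workers instead of dereferencing the pointer.

```c
#include <stddef.h>

/* Simplified stand-ins for the dmq types (assumption: the real
 * definitions in the module differ, e.g. the counter is atomic). */
typedef struct job_queue {
    int count;
} job_queue_t;

typedef struct dmq_worker {
    job_queue_t *queue; /* assumed to be NULL until the worker is ready */
} dmq_worker_t;

/* Hypothetical NULL-safe size check: returns -1 instead of
 * dereferencing a queue that has not been set up yet. */
int job_queue_size_safe(const job_queue_t *queue)
{
    return queue ? queue->count : -1;
}

/* Hypothetical selection loop: returns the index of the least-loaded
 * worker that already has a queue, or -1 if none is ready.
 * Each shared pointer is copied into a local before it is checked and
 * used; real shared-memory code would also need an atomic or volatile
 * read here, which this sketch does not model. */
int pick_worker(dmq_worker_t *workers, int num_workers)
{
    int best = -1;
    int best_size = 0;

    for (int i = 0; i < num_workers; i++) {
        job_queue_t *q = workers[i].queue;
        int size = job_queue_size_safe(q);

        if (size < 0)
            continue; /* worker not initialized yet, skip it */
        if (best < 0 || size < best_size) {
            best = i;
            best_size = size;
        }
        if (size == 0)
            break; /* an idle worker is good enough */
    }
    return best;
}
```

With num_workers=1 there is only one queue to look at, which may be why I cannot trigger the problem in that setup, but this is only my guess.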
If I configure DMQ with num_workers=1, I can't reproduce this problem.
I'm using the Kamailio 5.0.8 release.
Is this problem known to you? What is the right way to solve it?
Best Regards,
Virgílio Cunha