Hello,
I have a production environment with 4 proxies using DMQ to sync terminal registrations. The DMQ configuration parameters are the following:

loadmodule "dmq.so"
#modparam("dmq", "server_address", "sip:MY_IP_ADDRESS:MY_PORT_ADDRESS")
modparam("dmq", "notification_address", "DMQ_HOSTS")
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 4)
In the lab environment, with the same configuration as in production, I can reproduce this problem with 12000 terminal REGISTERs: when doing several restarts on a proxy, one of them generates a core dump. Analyzing the core, I can see that the problem is a concurrency issue in the DMQ synchronization.
job_queue_size(...) is called with a valid *queue*, but inside this function the *queue* is NULL.
(gdb) up
#1 0x00007fefe58ee246 in job_queue_size (queue=0x0) at worker.c:254
254 return atomic_get(&queue->count);
(gdb) print queue
$1 = (job_queue_t *) 0x0
(gdb) up
#2 0x00007fefe58ed920 in add_dmq_job (msg=0x7ff028b453f8, peer=0x7fefebcc4df8) at worker.c:184
184 if(job_queue_size(workers[i].queue) == 0) {
(gdb) print i
$2 = 1
(gdb) print workers[i].queue
$3 = (job_queue_t *) 0x7fefebce49b0
If I configure DMQ with num_workers=1, I can't reproduce this problem. I'm using the Kamailio 5.0.8 release.
Is this problem known to you? What is the right way to solve it?
Best Regards,
Virgílio Cunha