Hello,
I don't use dmq much, but a quick look at the code showed some cases where the job fields were not released when processing did not complete for various reasons.
I pushed commit a1f5fbe2c18246d4afefa44fd8a52612a5182a46; can you try it and report the results?
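To illustrate the kind of leak described above, here is a minimal sketch in plain C. It is not the dmq code and does not reproduce the commit: every name in it (job_t, job_new, job_free, process_job_leaky, process_job_fixed, outstanding_allocs) is hypothetical, and malloc/free stand in for the shared-memory allocator. It only shows how an early return on a failed job skips the release of the job's fields, and how a single cleanup path avoids that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical job record; NOT the real dmq structure. */
typedef struct job {
    char *msg;   /* stands in for a cloned SIP message */
    char *peer;  /* stands in for per-job peer data */
} job_t;

/* Number of allocations not yet freed, so a leak is observable. */
static long live_allocs = 0;

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p == NULL)
        return;
    assert(live_allocs > 0);
    live_allocs--;
    free(p);
}

static char *tracked_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = tracked_alloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Release the job together with all of its fields. */
static void job_free(job_t *j)
{
    if (j == NULL)
        return;
    tracked_free(j->msg);
    tracked_free(j->peer);
    tracked_free(j);
}

static job_t *job_new(const char *msg, const char *peer)
{
    job_t *j = tracked_alloc(sizeof(*j));
    if (j == NULL)
        return NULL;
    j->msg = tracked_strdup(msg);
    j->peer = tracked_strdup(peer);
    if (j->msg == NULL || j->peer == NULL) {
        job_free(j);
        return NULL;
    }
    return j;
}

/* Leaky variant: the early return on failure skips the cleanup. */
static int process_job_leaky(job_t *j, int fail)
{
    if (fail)
        return -1;  /* the job and its fields are never released */
    job_free(j);
    return 0;
}

/* Fixed variant: every exit goes through the same cleanup. */
static int process_job_fixed(job_t *j, int fail)
{
    int rc = 0;
    if (fail) {
        rc = -1;
        goto done;
    }
    /* normal processing would happen here */
done:
    job_free(j);
    return rc;
}

long outstanding_allocs(void)
{
    return live_allocs;
}
```

With this sketch, a failed job handled by process_job_fixed leaves outstanding_allocs() at 0, while the same failure in process_job_leaky leaves three allocations behind for each job.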
Maybe Charles Chance can also review this, since he has done most of the recent work on dmq.
Cheers, Daniel
On Tue, Jul 31, 2018 at 6:58 AM, Rogelio Perez <rogelio@telnyx.com> wrote:
Hello,
We're running three instances of Kamailio v5.14 as registrars handling registrations from ~2000 SIP clients, with one instance as primary and the other two as backups.
All three use the dmq and dmq_usrloc modules to synchronize user locations. However, after a couple of days of operation, the two failover instances show memory-leak behavior, with the memory usage attributed to the core consuming all available resources.
When this happens we've noticed that:
- The shared memory used by the function "sip_msg_shm_clone" spikes (from 1 KB to 1.5 GB).
- The shared memory used by the function "dmq:worker.c:job_queue_push" also increases, but not as much (from 1 KB to 1 MB).
- DMQ requests are not answered (with a 200 OK) by the affected instance during this memory leak, which makes us think that the DMQ module becomes unresponsive.
A few more notes:
- The failover instances are doing nothing except receiving replicated contacts.
- The shared memory grows at the same rate on both instances, but the critical behavior never happens at the same time.
- We allocate 1 GB of memory to each instance on startup.
- We store the location DB in a PostgreSQL database and load it at startup.
- We didn't find any errors in syslog, even at debug level.
Has anyone experienced a similar issue who can suggest a possible solution?
Thanks, Rogelio Perez Telnyx
Kamailio (SER) - Users Mailing List sr-users@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users