[SR-Users] DMQ mem leak issues

Tue Jul 31 14:59:46 CEST 2018

Hi Daniel,

Nice spot! I had tried to reproduce locally, but had not considered the
possibility that jobs may be failing somewhere in Rogelio’s setup.

Most likely your patch will resolve it but I’m happy to take a look further
if not.
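
For anyone following the thread, here is a rough illustration of the kind of
bug Daniel describes below: job fields that are not released when processing
stops early. It is only a sketch under my own assumptions and is not the dmq
worker code or the content of the commit. The names job_t, process_leaky,
process_fixed and run_failing_jobs are hypothetical, and plain malloc/free
stand in for Kamailio's shared-memory allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a queued job; not Kamailio's real type. */
typedef struct job {
    char *msg; /* cloned message owned by the job */
} job_t;

static int live_allocs = 0; /* crude counter of blocks not yet freed */

static void *track_alloc(size_t n) {
    void *p = malloc(n);
    if (p) live_allocs++;
    return p;
}

static void track_free(void *p) {
    if (p) { live_allocs--; free(p); }
}

/* Allocate a job plus its own copy of the message (two blocks). */
static job_t *job_new(const char *text) {
    size_t n = strlen(text) + 1;
    job_t *j = track_alloc(sizeof(*j));
    if (!j) return NULL;
    j->msg = track_alloc(n);
    if (!j->msg) { track_free(j); return NULL; }
    memcpy(j->msg, text, n);
    return j;
}

static void job_free(job_t *j) {
    assert(j != NULL);
    track_free(j->msg);
    track_free(j);
}

/* Leaky variant: the early return skips the cleanup entirely. */
static int process_leaky(job_t *j, int fail) {
    if (fail) return -1; /* job and msg are never released */
    job_free(j);
    return 0;
}

/* Fixed variant: every path goes through the same cleanup label. */
static int process_fixed(job_t *j, int fail) {
    int ret = 0;
    if (fail) { ret = -1; goto done; }
    /* ... real work would happen here ... */
done:
    job_free(j);
    return ret;
}

/* Run n jobs that all fail; return how many blocks are still allocated. */
int run_failing_jobs(int n, int use_fixed) {
    live_allocs = 0;
    for (int i = 0; i < n; i++) {
        job_t *j = job_new("replicated contact");
        if (!j) break;
        if (use_fixed) process_fixed(j, 1);
        else process_leaky(j, 1);
    }
    return live_allocs;
}
```

With the leaky variant every failed job leaves two blocks behind, so memory
grows steadily as long as jobs keep failing. With the single cleanup path
nothing is left allocated, whether the job succeeds or not.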

Cheers,

Charles

On Tue, 31 Jul 2018 at 13:05, Daniel-Constantin Mierla <miconda at gmail.com>
wrote:

> Hello,
>
> I don't use dmq much, but from a quick look at the code I noticed that
> there are some cases where the job fields were not released if processing
> did not complete for various reasons.
>
> I pushed the commit a1f5fbe2c18246d4afefa44fd8a52612a5182a46, can you try
> with it and see the results?
>
> Maybe Charles Chance can also do a bit of review here, being the one doing
> most of the work lately for dmq.
>
> Cheers,
> Daniel
>
> On Tue, Jul 31, 2018 at 6:58 AM, Rogelio Perez <rogelio at telnyx.com> wrote:
>
>> Hello,
>>
>> We're running three instances of Kamailio v5.14 as registrars handling
>> registrations from ~2000 SIP clients, with one instance being primary and
>> the other two as backups.
>>
>> The three of them are using the dmq and dmq_usrloc modules to synchronize
>> user locations, however after a couple of days of operation the two
>> failover instances show memory leak behaviors, with mem usage assigned to
>> the core taking all available resources.
>>
>> When this happens we've noticed that:
>>  - The shared memory used by the function "sip_msg_shm_clone" spikes
>> (from 1kb to 1.5GB).
>>  - The shared memory used by the function "dmq:worker.c:job_queue_push"
>> also increases, but not as much (from 1kb to 1MB).
>>  - DMQ requests are not being answered (with a 200 OK) by the affected
>> instance during this memory leak, which makes us think that the DMQ module
>> becomes unresponsive.
>>
>> A few more notes:
>>  - The failover instances are doing nothing except receiving replicated
>> contacts.
>>  - The shared memory grows at the same rate on both instances, but the
>> critical behavior never happens at the same time.
>>  - We are allocating 1GB memory on startup to each instance.
>>  - We store the location DB in a psql DB and we load it at startup.
>>  - We didn't find any errors in syslog, even at debug level.
>>
>> Has anyone experienced a similar issue who can suggest a possible
>> solution?
>>
>> Thanks,
>> Rogelio Perez
>> Telnyx
>>
>> _______________________________________________
>> Kamailio (SER) - Users Mailing List
>> sr-users at lists.kamailio.org
>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>
>>
>
>
> --
> Daniel-Constantin Mierla - http://www.asipto.com
> http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>
-- 
*Charles Chance*
Managing Director

t. 0330 120 1200    m. 07932 063 891

-- 
Sipcentric Ltd.
Company registered in England & Wales no. 7365592. Registered office:
Faraday Wharf, Innovation Birmingham Campus, Holt Street, Birmingham
Science Park, Birmingham B7 4BB.