[Kamailio-Users] Can openser.cfg lead to pkg memory problem?

mayamatakeshi mayamatakeshi at gmail.com
Mon Oct 6 12:22:11 CEST 2008


On Sun, Oct 5, 2008 at 5:28 PM, Daniel-Constantin Mierla
<miconda at gmail.com> wrote:

> Hello,
>
> mayamatakeshi wrote:
>
>>
>> On Fri, Sep 26, 2008 at 6:24 PM, mayamatakeshi
>> <mayamatakeshi at gmail.com> wrote:
>>
>>
>>    On Tue, Sep 23, 2008 at 6:00 PM, Daniel-Constantin Mierla
>>    <miconda at gmail.com> wrote:
>>
>>        Hello,
>>
>>
>>        On 09/23/08 10:31, mayamatakeshi wrote:
>>
>>            Hello,
>>            we have openser 1.3.3 running in production (current rev.:
>>            4943).
>>            Three times in 50 days we had to restart openser to
>>            correct a pkg memory problem.
>>
>>        openser 1.3.3 was released 3 weeks ago, so I guess you were
>>        running a previous version before, but it happened again since
>>        you upgraded to 1.3.3, right?
>>
>>
>>            After some time logging messages like this:
>>            /openser.log:Aug 19 10:39:18 ipx022
>>            /usr/local/sbin/openser[16991]:
>>            ERROR:core:new_credentials: no pkg memory left,
>>            openser will eventually run out of pkg memory and refuse
>>            all subsequent requests.
>>
>>            We are trying to recreate this in our lab so that we can
>>            follow memory troubleshooting instructions at
>>            http://kamailio.net/dokuwiki/doku.php/troubleshooting:memory,
>>            but so far we were unable to do it even when generating
>>            millions of calls and registration transactions (we are
>>            using SIPp to generate normal call flows and even abnormal
>>            call flows detected when reading openser.log, like
>>            'invalid cseq for aor', malformed SIP messages etc).
>>
>>        We can spot memory leaks even if the "out of memory" message is
>>        not printed. Just archive the logs (the most important part is the
>>        shutdown time) and make them available for download so they
>>        can be investigated.
>>
>>        There could be two reasons:
>>        - there is a memory leak, but it happens in cases that you
>>        don't reproduce in the lab but that occur in the production environment
>>        - you get memory fragmentation
>>
>>        Let's see first the debug messages...
>>
>>
>>    Hello,
>>    here is the link to the openser.log and cfg files:
>>    http://www.yousendit.com/download/bVlEV0o4R3NoeWJIRGc9PQ
>>
>>    After compilation with debug flags for memory manager, I left
>>    openser running in production for 24 hours. Then, I moved all
>>    traffic to another host and waited for more than 30 minutes before
>>    stopping openser.
>>    In the openser.cfg, I set debug=2. If you need, I can run it again
>>    with a higher value (but I hope it doesn't have to be too high,
>>    due to overhead concerns).
>>
>>  Sorry, I forgot to mention one thing: the last revision that showed this
>> problem was 4809, so we reverted back to that revision before performing the
>> above.
>>
> am I to understand that you couldn't reproduce it with the latest svn
> version? So you had to go back to a previous version?
>

Hi,
no, the reason for reverting is that the latest version running in
production will not show the problem, because we adopted preventive restarts
to minimize the impact on customer calls. So I don't know yet whether it has
this problem or not.
So I collected the logs using a revision that I was sure could reproduce the
problem.

But here are some developments in my investigation:
Up to now, I was trying to reproduce the problem using virtual machines
running the same OS (Fedora 5) as in production. It never happened there,
even after 30 million calls.
But we were eventually able to test openser 1.3 on a production machine
with the same spec as the ones showing the problem, and we were able to
trigger the pkg memory problem using a simple outgoing SIPp scenario. The
problem always happens after we reach around 28,000 calls, and we confirmed
that the number of calls needed to cause the problem grows linearly with the
amount of pkg memory (after increasing the pkg memory pool by a factor of 4,
the problem started to happen only after around 128,000 calls).
However, we also ran the same tests with kamailio 1.4 (rev. 5017) on that
machine and could not reproduce the problem after 1.5 million calls, so we
are thinking of just upgrading to 1.4 once other scenarios show that
everything else is working.
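
As a rough sanity check on the figures above, the average pkg memory lost per
call can be estimated by dividing the pool size by the number of calls needed
to hit the error. The sketch below is only an illustration: the 1 MiB default
pool size is an assumption (it is not stated in this thread and depends on how
openser was built, so check PKG_MEM_POOL_SIZE in your own config.h), and
leak_per_call is a hypothetical helper name.

```python
# Rough estimate of pkg memory lost per call, from the figures in this thread.
# ASSUMPTION: the default per-process pkg pool is 1 MiB. This value is not
# stated in the thread; check PKG_MEM_POOL_SIZE in config.h of your build.
ASSUMED_DEFAULT_POOL_BYTES = 1024 * 1024


def leak_per_call(pool_bytes: int, calls_until_failure: int) -> float:
    """Average bytes of pkg memory that become unusable per call."""
    return pool_bytes / calls_until_failure


# (label, pool size in bytes, approximate calls before "no pkg memory left")
observations = [
    ("default pool", ASSUMED_DEFAULT_POOL_BYTES, 28_000),
    ("pool x4", 4 * ASSUMED_DEFAULT_POOL_BYTES, 128_000),
]

for label, pool_bytes, calls in observations:
    print(f"{label}: ~{leak_per_call(pool_bytes, calls):.1f} bytes per call")
```

If both estimates come out in the same range (a few tens of bytes per call
under the assumed pool size), that would be consistent with a small, fixed
amount of memory being lost or fragmented on every call, rather than with a
single large allocation.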

But I don't know why the problem cannot be recreated using the VMs: the only
significant difference is that the production machines have 4 NICs that are
bound in 2 pairs (1 for private ip and another for public ip) while the VMs
have just one NIC.

I hope upgrading to 1.4 will solve everything, however, since nobody is
complaining about openser stopping after 28,000 calls, I still
believe we have some problem in the openser.cfg itself. I'll check it after
we put kamailio 1.4 in production.

regards,
takeshi