[Kamailio-Users] Can openser.cfg lead to pkg memory problem?

Daniel-Constantin Mierla miconda at gmail.com
Mon Oct 6 17:08:51 CEST 2008


Hello,

On 10/06/08 13:22, mayamatakeshi wrote:
> [...]
>
>
>
>                    Hello,
>                    we have openser 1.3.3 running in production
>                    (current rev.: 4943).
>                    Three times in 50 days we had to restart openser to
>                    correct a pkg memory problem.
>
>                openser 1.3.3 was released 3 weeks ago, so I guess you were
>                running a previous version before, but it has happened
>                again since you upgraded to 1.3.3, right?
>
>
>                    After some time logging messages like this:
>                    /openser.log:Aug 19 10:39:18 ipx022 /usr/local/sbin/openser[16991]: ERROR:core:new_credentials: no pkg memory left
>                    openser eventually runs out of pkg memory and refuses
>                    all subsequent requests.
>
>                    We are trying to recreate this in our lab so that we can
>                    follow the memory troubleshooting instructions at
>                    http://kamailio.net/dokuwiki/doku.php/troubleshooting:memory,
>                    but so far we have been unable to do so, even when
>                    generating millions of calls and registration transactions
>                    (we are using SIPp to generate normal call flows and even
>                    the abnormal call flows we found when reading openser.log,
>                    such as 'invalid cseq for aor', malformed SIP messages, etc.).
>
>                We can spot memory leaks even if the "out of memory" message
>                is not printed. Just archive the logs (the most important part
>                is the shutdown time) and make them available for download so
>                they can be investigated.
>
>                There could be two reasons:
>                - there is a memory leak, but it happens only in cases that
>                you don't reproduce in the lab and that do occur in the
>                production environment
>                - you get memory fragmentation
>
>                Let's see first the debug messages...
>
>
>            Hello,
>            here is the link to the openser.log and cfg files:
>            http://www.yousendit.com/download/bVlEV0o4R3NoeWJIRGc9PQ
>
>            After compiling with the debug flags for the memory manager, I
>            left openser running in production for 24 hours. Then I moved all
>            traffic to another host and waited more than 30 minutes before
>            stopping openser.
>            In openser.cfg, I set debug=2. If you need, I can run it again
>            with a higher value (but I hope it doesn't have to be too high,
>            due to overhead concerns).
>
>          Sorry, I forgot to mention one thing: the last revision that
>          showed this problem was 4809, so we reverted to that
>          revision before performing the above.
>
>     Am I to understand that you couldn't reproduce it with the latest
>     svn version, so you had to go back to a previous version?
>
>
> Hi,
> no, the reason for reverting is that the latest version running in
> production will not show the problem, because we adopted preventive
> resets to minimize the impact on customer calls. So I don't know yet
> whether it shows this problem or not.
> So I collected the logs using a revision that I was sure could
> recreate the problem.
OK, I understand now. I was looking at the logs and there seems to be a
leak in db operations: something does not free a db result. I will go
over the modules that you are using and try to spot any issue. I will
also check the change log to see whether anything changed recently
regarding such an issue.
>
> But here are some developments in my investigation:
> Up to now, I was trying to recreate the problem using virtual machines
> running the same OS (Fedora 5) as in production. It never happened
> there, even after 30 million calls.
> But we were eventually able to test openser 1.3 on a production
> machine with the same spec as the ones showing the problem, and we were
> able to trigger the pkg memory problem using a simple outgoing SIPp
> scenario. The problem always happens after we reach around 28,000
> calls, and we confirmed that the number of calls needed to cause the
> problem grows linearly with the amount of pkg memory (after increasing
> the pkg memory pool by a factor of 4, the problem started to happen
> only after around 128,000 calls).
> However, we also ran the same tests with kamailio 1.4 (rev. 5017) on
> that machine and could not recreate the problem after 1.5 million
> calls, so we are thinking of just upgrading to 1.4 once other scenarios
> show that everything else is working.
OK, 1.4 is recommended, it has a lot of new features and many fixes.
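
A rough back-of-the-envelope reading of the figures above, as a sketch only:
if each call leaks a constant number of bytes and a fixed part of the pool is
already in use after startup, the two measurements (about 28,000 calls with
the original pool, about 128,000 with a pool four times larger) are enough to
estimate both. The original pool size is not stated in this thread, so the
1 MB below is purely an assumed value, and the function and variable names
are made up for illustration.

```python
def estimate_leak(pool_bytes, calls_small, calls_big, factor):
    """Solve the two-equation model
           pool - base          = leak * calls_small
           factor * pool - base = leak * calls_big
    for the per-call leak and the fixed baseline usage (both in bytes)."""
    leak = (factor - 1) * pool_bytes / (calls_big - calls_small)
    base = pool_bytes - leak * calls_small
    return leak, base


# ASSUMPTION: a 1 MB pkg pool; the real size is not given in the thread.
POOL = 1024 * 1024

leak, base = estimate_leak(POOL, calls_small=28_000, calls_big=128_000, factor=4)
print(f"leak per call : {leak:.1f} bytes")
print(f"baseline usage: {base / POOL:.0%} of the original pool")
```

Under this model the baseline comes out at about 16% of the original pool
regardless of the assumed pool size, while the per-call leak scales with it.
This only shows that the reported numbers are consistent with a steady
per-call leak; by itself it does not rule out fragmentation.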
>
> But I don't know why the problem cannot be recreated using the VMs:
> the only significant difference is that the production machines have
> 4 NICs that are bonded in 2 pairs (one for the private IP and another
> for the public IP), while the VMs have just one NIC.
I see no relation with the NICs.
>
> I hope upgrading to 1.4 will solve everything. However, since nobody
> is complaining about openser stopping after 28,000 calls, I
> still believe we have some problem in the openser.cfg itself. I'll
> check it after we put kamailio 1.4 in production.
OK, I will dig in further, though I might be a bit slow these days.

Cheers,
Daniel

-- 
Daniel-Constantin Mierla
http://www.asipto.com




