[sr-dev] Crash bug

Fri Mar 27 15:00:53 CET 2015

It would be good to add some SIP keepalive monitoring (e.g., a cron job
running sipsak to send OPTIONS requests) that alerts and/or restarts
Kamailio when there is no response. The monit tool can also send SIP
keepalives and take action when there is no response.
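
For example, a cron check along those lines could look roughly like the
minimal sh sketch below. It assumes sipsak is installed and that your
routing script answers an OPTIONS request sent to the probe URI with a
200 reply. The script name, the probe URI sip:127.0.0.1:5060, the
SIPSAK/RESTART_CMD variables and the /etc/init.d/kamailio restart command
are hypothetical placeholders, not taken from any existing setup.

```shell
#!/bin/sh
# sip-keepalive.sh (hypothetical name): send one SIP OPTIONS request and
# restart Kamailio if no successful reply comes back.
# All defaults are placeholders; override them via the environment.
SIP_URI="${SIP_URI:-sip:127.0.0.1:5060}"                    # proxy to probe
SIPSAK="${SIPSAK:-sipsak}"                                  # probe command
RESTART_CMD="${RESTART_CMD:-/etc/init.d/kamailio restart}"  # restart action

# sipsak sends an OPTIONS request when only a target (-s) is given.
# Any non-zero exit status (no reply, error reply, local error) is
# treated as "the proxy is not answering".
sip_alive() {
    "$SIPSAK" -s "$SIP_URI" >/dev/null 2>&1
}

# Alert via syslog, then restart, but only when the probe fails.
check_and_restart() {
    if sip_alive; then
        return 0
    fi
    logger -t sip-keepalive "no SIP reply from $SIP_URI, restarting" 2>/dev/null
    $RESTART_CMD
}

# Act only when invoked with --run (e.g. from cron), so the functions
# can also be sourced and tested without side effects.
if [ "${1:-}" = "--run" ]; then
    check_and_restart
fi
```

A crontab entry such as `* * * * * /usr/local/sbin/sip-keepalive.sh --run`
(path hypothetical) would probe once a minute. Check the sipsak man page
for its exact exit codes and retransmission timeout before relying on the
"non-zero means dead" rule.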

In a deadlock, checking the process table is not enough, because the
processes are still there; they just stop handling traffic. There should
have been high CPU usage, though, if you were monitoring that.
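
Such a CPU check could look roughly like the sketch below, assuming Linux
/proc plus pgrep, awk and getconf. The script name and the PROC_NAME,
INTERVAL and THRESHOLD values are made-up placeholders to tune. It samples
/proc twice rather than using the %cpu column of ps, because ps reports
CPU time divided by the whole lifetime of the process, which hides a spin
that starts after days of normal operation.

```shell
#!/bin/sh
# cpu-watch.sh (hypothetical name): flag Kamailio processes that burn CPU
# continuously over a sampling interval. All defaults are placeholders.
PROC_NAME="${PROC_NAME:-kamailio}"
INTERVAL="${INTERVAL:-10}"     # seconds between the two samples
THRESHOLD="${THRESHOLD:-90}"   # percent of one CPU, summed over all processes

# Print the user+system clock ticks consumed so far by all $PROC_NAME
# processes (prints 0 if none are running).
total_ticks() {
    for pid in $(pgrep -x "$PROC_NAME" 2>/dev/null); do
        # Fields 14 (utime) and 15 (stime) of /proc/PID/stat, in clock
        # ticks; assumes the process name contains no spaces.
        awk '{ print $14 + $15 }' "/proc/$pid/stat" 2>/dev/null
    done | awk '{ s += $1 } END { printf "%.0f\n", s }'
}

# Succeed if DELTA ticks over SECS seconds exceed PCT percent of one CPU,
# i.e. DELTA / (SECS * HZ) * 100 > PCT, kept in integer arithmetic.
over_threshold() {  # usage: over_threshold DELTA SECS HZ PCT
    [ $(( $1 * 100 )) -gt $(( $2 * $3 * $4 )) ]
}

# Act only when invoked with --run, so the functions can be tested alone.
if [ "${1:-}" = "--run" ]; then
    hz=$(getconf CLK_TCK)
    t0=$(total_ticks)
    sleep "$INTERVAL"
    t1=$(total_ticks)
    if over_threshold $((t1 - t0)) "$INTERVAL" "$hz" "$THRESHOLD"; then
        logger -t cpu-watch "$PROC_NAME used >${THRESHOLD}% CPU over ${INTERVAL}s"
    fi
fi
```

Run it from cron, e.g. `* * * * * /usr/local/sbin/cpu-watch.sh --run`
(path hypothetical). A busy proxy can legitimately cross the threshold, so
a CPU alert is most telling when it coincides with failed SIP keepalives.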

Cheers,
Daniel

On 27/03/15 12:47, Alex Balashov wrote:
> This was a rather peculiar crash:
>
> From the logs, it would appear that Kamailio simply stopped processing
> messages at some point. There are about 8 minutes of zero log output
> during a period of constant incoming traffic.
>
>
> Eventually, the situation was resolved when all Kamailio processes
> exited on a normal SIGTERM after someone restarted it manually:
>
> Mar 26 20:40:10 Proxy1 /usr/local/sbin/kamailio[27498]: NOTICE: <core> [main.c:739]: handle_sigs(): Thank you for flying kamailio!!!
> Mar 26 20:40:10 Proxy1 /usr/local/sbin/kamailio[27535]: INFO: <core> [main.c:850]: sig_usr(): signal 15 received.
> ...
>
> But there are a few things here that are difficult to explain from the
> log:
>
> 1. Why was there no SIP stack response for 8 minutes, no logging
> activity, etc?
>
> 2. We have a script that checks every second whether Kamailio
> processes are running and restarts Kamailio if they are not. It also
> sends us an e-mail when that happens.
>
> It's a rather naive check:
>
>    ps aux | grep kamailio | grep -v 'grep kamailio' | wc -l
>
> But in this case, the script was not triggered, which would imply that
> some Kamailio processes, perhaps all of them, remained running.
>
> There is no indication in the logs that any process died for any
> reason, except for the 'signal 15' received by all processes at the
> time of manual restart.
>
> 3. Why was a core dump generated at the time of the restart, if
> nothing crashed?
>
> #3 is the most interesting to me, because if it were some other
> problem, e.g. SIP worker processes blocking for some reason, then I
> wouldn't expect a core dump upon service shutdown.
>
> There is no other indication of any child process dying with SIGSEGV
> or SIGABRT.
>
> -- Alex
>
> On 03/27/2015 06:17 AM, Alex Balashov wrote:
>
>> Hello,
>>
>> The system experienced another crash yesterday, but unfortunately the
>> core dump is not very insightful, possibly because it is incomplete:
>>
>> BFD: Warning: /tmp/./core.kamailio.500.1427402410.27498 is truncated: expected core file size >= 8602058752, found: 1769852928.
>> [New Thread 27498]
>> Cannot access memory at address 0x7f52891e3168
>> Cannot access memory at address 0x7f52891e3168
>> Cannot access memory at address 0x7f52891e3168
>> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib64/ld-linux-x86-64.so.2
>> Failed to read a valid object file image from memory.
>> Core was generated by `/usr/local/sbin/kamailio -P /var/run/kamailio.pid -m 8192 -u evaristesys -g eva'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00007f5286d97e45 in ?? ()
>> Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.5.x86_64
>> (gdb) where
>> #0  0x00007f5286d97e45 in ?? ()
>> Cannot access memory at address 0x7fffbe32a210
>>
>>
>> That's not much help at all, so I cannot say whether it happened for
>> the same reason as before.

-- 
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio World Conference, May 27-29, 2015
Berlin, Germany - http://www.kamailioworld.com