[SR-Users] Best practices for troubleshooting deadlocks?

Daniel-Constantin Mierla miconda at gmail.com
Mon Sep 28 23:51:06 CEST 2015


Were you pulling the backtraces based on the script you pasted in your
previous email? That should be a good source of information for
analyzing what kamailio was doing.

As I already said, if there is a mutex deadlock, it will also show up
as high cpu usage. Was that the case, or do you not have access to the
cpu usage history?
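
If there is no cpu usage history, the cpu side can still be checked
while the processes are hung and before the restart. The following is
only a rough sketch, not a tested procedure: it assumes Linux with
/proc, pgrep from procps and a binary named kamailio, and the function
names stat_ticks and kamailio_cpu_delta are made up for this example.

```shell
#!/bin/bash
# Sum utime + stime (in clock ticks) from one /proc/<pid>/stat line on stdin.
# Everything up to the closing ")" of the command name is dropped first,
# so the original fields 14 and 15 become fields 12 and 13.
stat_ticks() {
    sed 's/^.*) //' | awk '{ print $12 + $13 }'
}

# Print "pid ticks_used" for each kamailio process over an interval.
# Assumption: the processes are named exactly "kamailio".
kamailio_cpu_delta() {
    interval="${1:-5}"
    # First sample: one "pid ticks" line per process.
    before=$(for pid in $(pgrep -x kamailio); do
        echo "$pid $(stat_ticks < "/proc/$pid/stat")"
    done)
    sleep "$interval"
    # Second sample: print how many ticks each process consumed meanwhile.
    echo "$before" | while read -r pid start; do
        [ -r "/proc/$pid/stat" ] || continue
        now=$(stat_ticks < "/proc/$pid/stat")
        echo "$pid $(( now - start ))"
    done
}
```

Calling `kamailio_cpu_delta 5` prints one line per process. A delta
close to 5 times the value of `getconf CLK_TCK` means the process kept
a cpu busy for the whole interval; a delta near zero means it was idle.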

If SIP messages are simply no longer routed, but there is no high cpu
usage, then:

- maybe the processes were blocked in a lengthy I/O operation (e.g., a
query to the database)
- maybe someone/something reset the network interface, so the sockets
were still bound to the previous address -- e.g., this can be caused by
some OS upgrades or by dhcp
- maybe some OS limits were reached, or the packets were filtered by the
kernel (if you have centos with selinux, be sure it is properly configured)
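
The three cases above can be narrowed down with a few more commands run
before the restart. Again just a sketch under assumptions: Linux with
procps and iproute2 installed, a binary named kamailio, and the helper
names udp_counters and collect_idle_hang_info invented for this example.

```shell
#!/bin/bash
# Turn the two "Udp:" lines of /proc/net/snmp (names, then values) into
# name=value pairs, e.g. InErrors=7 or RcvbufErrors=5.
udp_counters() {
    awk '$1 == "Udp:" {
        if (!seen) { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1 }
        else       { for (i = 2; i <= NF; i++) print name[i] "=" $i }
    }'
}

# Collect a snapshot for the "no routing, no cpu load" case.
collect_idle_hang_info() {
    # Blocked I/O: process state plus the kernel function it sleeps in.
    # "D" is uninterruptible sleep; a process waiting for a database
    # reply over a socket usually shows "S" with a network-related wchan.
    echo "== process state and wait channel =="
    ps -o pid,stat,wchan:32,args -C kamailio

    # Interface reset: compare the bound UDP sockets with the addresses
    # that are configured right now.
    echo "== UDP sockets currently bound =="
    ss -lunp
    echo "== addresses currently configured =="
    ip -o addr show

    # OS limits: growing InErrors/RcvbufErrors suggest dropped packets.
    echo "== kernel UDP counters =="
    udp_counters < /proc/net/snmp
}
```

Running `collect_idle_hang_info > hang_info.txt` as root (so that ss
can show the owning processes) keeps the output next to the gdb log.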

Cheers,
Daniel

On 28/09/15 19:26, Alex Balashov wrote:
> We just encountered another one of these famed deadlocks. Any
> suggestions for how to analyse them beyond what I've already trotted
> out here?
>
> On 09/14/2015 05:47 PM, Alex Balashov wrote:
>
>> Hello,
>>
>> Very occasionally, we encounter what appear to be deadlocks in all UDP
>> receiver threads. All Kamailio processes are running, but no SIP
>> messages are being processed.
>>
>> On one of our high-volume installations, this happens extremely
>> infrequently -- maybe once every month or two. On these occasions, the
>> operator restarts the proxy before we get a chance to go in and figure
>> out what's going on.
>>
>> So, I'm trying to provide the operator with a procedure to execute prior
>> to restarting the proxy on these occasions, so that we can see a
>> snapshot of where the receiver threads are stuck. As far as I can tell,
>> unless Kamailio itself segfaults, there's no specific PID that one can
>> attach GDB to in order to get an overhead snapshot of all the child
>> processes.
>>
>> Here's what I came up with:
>>
>> ---------------------------------------------
>> #!/bin/bash
>>
>> # Record the process table first, followed by a blank separator line.
>> kamcmd -s /tmp/kamailio_ctl ps > thread_log.txt
>> echo >> thread_log.txt
>>
>> # Attach to each UDP receiver: full backtrace, core file, then detach.
>> while read -r PID;
>> do
>>      gdb --pid="$PID" <<EOF >>thread_log.txt
>> set print elements 0
>> thread apply all bt full
>> generate-core-file
>> detach
>> EOF
>> done < <(kamcmd -s /tmp/kamailio_ctl ps | awk '/udp receiver/ {print $1}')
>> ---------------------------------------------
>>
>> As far as I can tell, this should give me the most ample visibility into
>> the state of the threads, with further core dumps to inspect if
>> follow-up is needed. Hopefully this will result in some fixes back to
>> the project.
>>
>> However, if there are any other suggestions for information to grab in
>> such a scenario, I'm all ears.
>>
>> Thanks in advance!
>>
>> -- Alex
>>
>
>

-- 
Daniel-Constantin Mierla
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Book: SIP Routing With Kamailio - http://www.asipto.com
Kamailio Advanced Training, Sep 28-30, 2015, in Berlin - http://asipto.com/u/kat



