[Kamailio-Users] kamailio / deadlock3

Aymeric Moizard jack at atosc.org
Thu Jan 28 14:41:18 CET 2010



On Thu, 28 Jan 2010, Henning Westerholt wrote:

> On Thursday 28 January 2010, Aymeric Moizard wrote:
>> here is the backtrace I have, unfortunately without debug symbols!
>> I found the same for many of the kamailio processes: "sched_yield"
>> is pending forever. My system is Debian etch.
>>
>> #0  0xffffe424 in __kernel_vsyscall ()
>> #1  0xb7cef4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6
>> #2  0x080a93fd in tcp_send ()
>> #3  0xb7975679 in send_pr_buffer () from /usr/lib/kamailio/modules/tm.so
>> #4  0xb79789ac in t_forward_nonack () from /usr/lib/kamailio/modules/tm.so
>> #5  0xb7974784 in t_relay_to () from /usr/lib/kamailio/modules/tm.so
>> #6  0xb7983a11 in load_tm () from /usr/lib/kamailio/modules/tm.so
>> #7  0x081cf810 in mem_pool ()
>> #8  0x00000000 in ?? ()
>>
>> I guess most t_relay operations towards my "mobipouce.com" domain,
>> which has one IP down, break the kamailio processes one after the
>> other... I'm not sure every such t_relay operation breaks exactly
>> one thread each time.
>>
>> I went through the lock/unlock calls in tcp_main.c, but it seems
>> every lock has at least one matching unlock...
>
> Hi Aymeric,
>
> I remember that we observed these "sched_yield" problems on one old 0.9 system
> after some time (weeks or months). We did not find a solution in that
> case; after a restart it was gone again.
>
> You mentioned in an earlier mail that you see this related to UDP traffic, but
> from the log file and your investigation you now think it's related to TCP?

This is the exact case:
1-> A SUBSCRIBE is sent over UDP and received by kamailio.
2-> kamailio does an SRV record lookup for "mobipouce.com"
3-> kamailio tries sip2.mobipouce.com (91.199.234.47) over TCP first
4-> the connection fails with these logs:
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcp_blocking_connect: poll error: flags 18
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcp_send: connect failed
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:tm:msg_send: tcp_send failed
Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:tm:t_forward_nonack: sending request failed
5-> I guess kamailio is supposed to try the other SRV record value,
     sip2.mobipouce.com (91.199.234.46), but it doesn't

Thus, I'm guessing the issue is related to SRV records with failover OR
just to the TCP failure. It is not related to UDP at all.
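
To make step 5 concrete, here is a minimal sketch of the failover behaviour I would expect: walk the SRV targets in priority order and move on to the next one when a TCP connect is refused. This is not kamailio's code. The names `connect_with_failover` and `tcp_connect` are hypothetical, the weight field is ignored for simplicity, and the connect has a timeout so a dead target cannot block the caller forever.

```python
import socket
from typing import Callable, Iterable, Tuple

# One SRV answer: (priority, weight, host, port). Lower priority is tried first.
SrvTarget = Tuple[int, int, str, int]


def tcp_connect(host: str, port: int, timeout: float = 2.0) -> socket.socket:
    """Open a TCP connection with a timeout (hypothetical helper)."""
    return socket.create_connection((host, port), timeout=timeout)


def connect_with_failover(
    targets: Iterable[SrvTarget],
    connect: Callable[[str, int], object] = tcp_connect,
):
    """Try each SRV target in priority order and return the first that connects.

    Weight-based selection among equal priorities is deliberately left out;
    this sketch only shows the "next target on failure" step.
    """
    last_error = None
    for _priority, _weight, host, port in sorted(targets, key=lambda t: t[0]):
        try:
            return (host, port), connect(host, port)
        except OSError as exc:  # includes "Connection refused" (errno 111)
            last_error = exc
    raise ConnectionError("all SRV targets failed") from last_error
```

With two targets where the first one refuses the connection, the function returns the second one instead of giving up after the first error.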

It's definitely possible to reproduce the issue now!

I guess anyone can try with their own version of kamailio: t_relay a message
to "mobipouce.com" and you'll fall into that case! Sending plenty of
those messages will eventually lock all kamailio processes.
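
For anyone who wants to try, a rough sketch of a test client follows. It only builds a minimal SUBSCRIBE for the "mobipouce.com" domain and sends it over UDP to a proxy. The user names, the "presence" event package, and the addresses are placeholders I made up, and it assumes the proxy's routing script calls t_relay for such requests.

```python
import socket
import uuid


def build_subscribe(target_domain: str, local_ip: str, local_port: int,
                    user: str = "alice") -> bytes:
    """Build a minimal SIP SUBSCRIBE request (placeholder user and event)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]  # RFC 3261 branch magic cookie
    lines = [
        f"SUBSCRIBE sip:{user}@{target_domain} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:tester@{local_ip}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{user}@{target_domain}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_ip}",
        "CSeq: 1 SUBSCRIBE",
        f"Contact: <sip:tester@{local_ip}:{local_port}>",
        "Event: presence",
        "Expires: 3600",
        "Content-Length: 0",
        "",
        "",  # the two empty items produce the blank line that ends the headers
    ]
    return "\r\n".join(lines).encode("ascii")


def send_many(proxy_ip: str, proxy_port: int, local_ip: str, count: int,
              target_domain: str = "mobipouce.com") -> None:
    """Send `count` SUBSCRIBE requests over UDP to the proxy under test."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((local_ip, 0))  # let the OS pick a free local port
        local_port = sock.getsockname()[1]
        for _ in range(count):
            msg = build_subscribe(target_domain, local_ip, local_port)
            sock.sendto(msg, (proxy_ip, proxy_port))
```

Something like `send_many("192.0.2.10", 5060, "192.0.2.20", 50)` (both addresses are placeholders) should then be enough to see whether the processes get stuck one after the other.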

Regards,
Aymeric MOIZARD / ANTISIP
amsip - http://www.antisip.com
osip2 - http://www.osip.org
eXosip2 - http://savannah.nongnu.org/projects/exosip/


> Regards,
>
> Henning
>

