[Kamailio-Users] kamailio / deadlock3

Aymeric Moizard jack at atosc.org
Thu Jan 28 15:13:38 CET 2010


Again, some additional information:

I made a new capture: after the failure, I can see that a TCP
connection is made to the second SRV record: sip.mobipouce.com
(91.199.234.46)

I got:
SYN ACK -> sip.mobipouce.com
ACK <- sip.mobipouce.com
PSH, ACK <- sip.mobipouce.com
ACK -> sip.mobipouce.com

I'm guessing that this is where the process in the stack trace is
deadlocked, because no SUBSCRIBE is sent after that...
-> #2  0x080a93fd in tcp_send ()

Strangely, in this "tcp_send" function there is no
TCPCONN_LOCK/TCPCONN_UNLOCK; instead, there is a
lock_get(&c->write_lock);
...
lock_release(&c->write_lock);

Maybe that is still correct anyway...
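
To illustrate why a process can sit in sched_yield() forever inside
a lock_get()-style call, here is a minimal sketch. It assumes the lock
is a spin lock that yields the CPU while waiting, which is consistent
with the backtrace but is not taken from the Kamailio sources. All
names (demo_lock_t, demo_lock_get, demo_leaky_send, ...) are
hypothetical and exist only for this example.

```c
/* Illustrative sketch only: NOT Kamailio's actual lock code. */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-connection write lock. */
typedef struct {
    atomic_flag held;
} demo_lock_t;

/* Spin until the flag is acquired, yielding the CPU between attempts.
 * There is no timeout: if the owner never releases the lock, the
 * caller loops through sched_yield() forever. */
static void demo_lock_get(demo_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        sched_yield();
}

static void demo_lock_release(demo_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Bounded variant used only so the demo can terminate: gives up
 * after max_spins failed attempts instead of waiting forever. */
static bool demo_lock_try_get(demo_lock_t *l, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            return true;
        sched_yield();
    }
    return false;
}

/* Simulated send path. When fail is true it returns early WITHOUT
 * releasing the lock (a deliberate bug for illustration). */
static int demo_leaky_send(demo_lock_t *l, bool fail)
{
    demo_lock_get(l);
    if (fail)
        return -1;          /* lock is still held here */
    demo_lock_release(l);
    return 0;
}

/* Returns true if, after one failed send, a later waiter can no
 * longer obtain the lock within a bounded number of spins. */
static bool demo_waiter_would_hang(void)
{
    demo_lock_t l = { ATOMIC_FLAG_INIT };

    assert(demo_leaky_send(&l, false) == 0);  /* normal path releases */
    (void)demo_leaky_send(&l, true);          /* error path leaks the lock */
    return !demo_lock_try_get(&l, 100);
}
```

In this sketch every lock has a matching unlock on the normal path,
yet a single early return is enough to leave later callers spinning.
A process that dies or blocks while holding the lock would have the
same effect on the waiters.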

Thanks,
Aymeric MOIZARD / ANTISIP
amsip - http://www.antisip.com
osip2 - http://www.osip.org
eXosip2 - http://savannah.nongnu.org/projects/exosip/


On Thu, 28 Jan 2010, Henning Westerholt wrote:

> On Thursday 28 January 2010, Aymeric Moizard wrote:
>> Here is the backtrace I have, unfortunately without debug symbols!
>> I found the same for many of the kamailio processes: "sched_yield"
>> is pending forever. My system is a debian/etch.
>>
>> #0  0xffffe424 in __kernel_vsyscall ()
>> #1  0xb7cef4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6
>> #2  0x080a93fd in tcp_send ()
>> #3  0xb7975679 in send_pr_buffer () from /usr/lib/kamailio/modules/tm.so
>> #4  0xb79789ac in t_forward_nonack () from /usr/lib/kamailio/modules/tm.so
>> #5  0xb7974784 in t_relay_to () from /usr/lib/kamailio/modules/tm.so
>> #6  0xb7983a11 in load_tm () from /usr/lib/kamailio/modules/tm.so
>> #7  0x081cf810 in mem_pool ()
>> #8  0x00000000 in ?? ()
>>
>> I guess most t_relay operations towards my "mobipouce.com" domain
>> with one IP being down break each kamailio process one after the
>> other... I'm not sure every such t_relay operation always breaks
>> exactly one thread each time.
>>
>> I went through the lock/unlock of tcp_main.c but it seems every
>> lock has an unlock at least...
>
> Hi Aymeric,
>
> I remember that we observed these "sched_yield" problems on one old 0.9 system
> after some time (like weeks or months). We did not find the solution in this
> case; after a restart it was gone again.
>
> You mentioned in an earlier mail that you see this related to UDP traffic, but
> in the log file and also in your investigations you think it's related to TCP?
>
> Regards,
>
> Henning
>

