[sr-dev] [Kamailio-Users] kamailio / deadlock3

Daniel-Constantin Mierla miconda at gmail.com
Thu Jan 28 16:23:00 CET 2010


Hello,

On 1/28/10 3:34 PM, Aymeric Moizard wrote:
>
> some other answer below:
>
> On Thu, 28 Jan 2010, Daniel-Constantin Mierla wrote:
>
>> I am cc-ing sr-dev, since tcp code is from ser and Andrei may have 
>> more insights...
>>
>>
>> On 1/28/10 2:41 PM, Aymeric Moizard wrote:
>>>
>>>
>>> On Thu, 28 Jan 2010, Henning Westerholt wrote:
>>>
>>>> On Thursday 28 January 2010, Aymeric Moizard wrote:
>>>>> here is the backtrace I have, unfortunately without debug symbols!


Can you recompile with debug symbols? Did you install it from a 
package or from sources? Debug symbols will give more hints about the 
exact place in the function...

I will try to reproduce, but right now I do not have the proper 
environment for testing...

Thanks,
Daniel




>>>>> I found the same for many of the kamailio processes: "sched_yield"
>>>>> is pending forever. My system is Debian etch.
>>>>>
>>>>> #0  0xffffe424 in __kernel_vsyscall ()
>>>>> #1  0xb7cef4ac in sched_yield () from /lib/tls/i686/cmov/libc.so.6
>>>>> #2  0x080a93fd in tcp_send ()
>>>>> #3  0xb7975679 in send_pr_buffer () from /usr/lib/kamailio/modules/tm.so
>>>>> #4  0xb79789ac in t_forward_nonack () from /usr/lib/kamailio/modules/tm.so
>>>>> #5  0xb7974784 in t_relay_to () from /usr/lib/kamailio/modules/tm.so
>>>>> #6  0xb7983a11 in load_tm () from /usr/lib/kamailio/modules/tm.so
>>>>> #7  0x081cf810 in mem_pool ()
>>>>> #8  0x00000000 in ?? ()
>>>>>
>>>>> I guess most t_relay operations towards my "mobipouce.com" domain,
>>>>> with one IP being down, break the kamailio processes one after the
>>>>> other... I'm not sure that every such t_relay operation breaks
>>>>> exactly one thread each time.
>>>>>
>>>>> I went through the lock/unlock of tcp_main.c but it seems every
>>>>> lock has an unlock at least...
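
A process that never leaves sched_yield() is consistent with a busy-wait
lock whose holder never releases it. The sketch below is only an
illustration of that pattern, written in Python rather than in the C of
tcp_main.c. The names spin_acquire and send_with_leaky_error_path are
hypothetical, and the spin bound exists only so that the example
terminates. It shows how one error path that returns without unlocking
can leave every later caller spinning, even though the normal path has a
matching unlock. It is not a diagnosis of this backtrace.

```python
import os
import threading


def spin_acquire(lock: threading.Lock, max_spins: int) -> bool:
    """Busy-wait for `lock`, yielding the CPU between attempts.

    Returns False after `max_spins` failed attempts. An unbounded version
    of this loop would stay in sched_yield() forever if the holder never
    released the lock.
    """
    for _ in range(max_spins):
        if lock.acquire(blocking=False):
            return True
        os.sched_yield()  # give up the CPU, then try again
    return False


def send_with_leaky_error_path(lock: threading.Lock, connect_ok: bool) -> bool:
    """Hypothetical sender whose error path forgets to release the lock."""
    lock.acquire()
    if not connect_ok:
        return False  # deliberate bug: early return without lock.release()
    lock.release()
    return True
```

After one call with connect_ok=False the lock stays held, so every
following spin_acquire() on the same lock runs out of spins. With an
unbounded loop, that would show up as one more process stuck in
sched_yield().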
>>>>
>>>> Hi Aymeric,
>>>>
>>>> I remember that we observed these "sched_yield" problems on one old
>>>> 0.9 system after some time (like weeks or months). We did not find
>>>> the solution in this case; after a restart it was gone again.
>>>>
>>>> You mentioned in an earlier mail that you see this related to UDP
>>>> traffic, but in the log file and also in your investigations you
>>>> think it's related to TCP?
>>>
>>> This is the exact case:
>>> 1-> SUBSCRIBE sent over UDP and received by kamailio.
>>> 2-> kamailio does a SRV record lookup for "mobipouce.com"
>>> 3-> kamailio tries sip2.mobipouce.com (91.199.234.47) over TCP first
>>> 4-> connection failed with logs:
>>> Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcp_blocking_connect: poll error: flags 18
>>> Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcp_blocking_connect: failed to retrieve SO_ERROR (111) Connection refused
>>> Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcpconn_connect: tcp_blocking_connect failed
>>> Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:core:tcp_send: connect failed
>>> Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:tm:msg_send: tcp_send failed
>>> Jan 27 12:56:38 ns26829 /usr/sbin/kamailio[9763]: ERROR:tm:t_forward_nonack: sending request failed
>>> 5-> I guess kamailio is supposed to try the other SRV record value,
>>>     sip2.mobipouce.com (91.199.234.46), but it doesn't
>>>
>>> Thus, I'm guessing the issue is related to SRV records with failover 
>>> OR just to the TCP failure. It is not related to UDP at all.
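
The failover expected in step 5 can be sketched as below. This is a
minimal illustration in Python, not Kamailio's implementation: SrvTarget
and relay_with_failover are hypothetical names, the priority values and
port 5060 are assumptions, and SRV weights (RFC 2782) are ignored for
brevity. The try_send callback stands in for the real send path and
returns False when the connect or send fails.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class SrvTarget(NamedTuple):
    priority: int  # lower value is tried first (RFC 2782)
    host: str
    port: int


def relay_with_failover(
    targets: Iterable[SrvTarget],
    try_send: Callable[[SrvTarget], bool],
) -> Optional[SrvTarget]:
    """Try each SRV target in priority order.

    Returns the first target that accepted the send, or None if all failed.
    """
    for target in sorted(targets, key=lambda t: t.priority):
        if try_send(target):
            return target
        # Send failed (for example "Connection refused"): try the next record.
    return None
```

With the two addresses from the steps above, a try_send that fails for
91.199.234.47 and succeeds for 91.199.234.46 makes the function return
the second target after exactly two attempts.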
>>
>> So the TCP connect failed, the tcp worker returned (as it prints the 
>> message) and, to be sure I got it right, the UDP worker (the one that 
>> received the request) got blocked?
>
> 1-> TCP connect failed
> 2-> second SRV is used: TCP connect succeeds, but it locks in tcp_send
>
> That's what I understand.
>
> I have tested a TCP connection to my server: it still seems to be
> working.
>
>>> It's definitely possible to reproduce the issue now!
>>>
>>> I guess anyone can try your version of kamailio and t_relay a message
>>> to "mobipouce.com" and fall into that case! Sending plenty of
>>> those messages will finally lock all kamailio processes.
>>
>> All? tcp and udp?
>
> Only udp!
> Aymeric
>
>> Cheers,
>> Daniel
>>
>>>
>>> Regards,
>>> Aymeric MOIZARD / ANTISIP
>>> amsip - http://www.antisip.com
>>> osip2 - http://www.osip.org
>>> eXosip2 - http://savannah.nongnu.org/projects/exosip/
>>>
>>>
>>>> Regards,
>>>>
>>>> Henning
>>>>
>>>
>>> _______________________________________________
>>> Kamailio (OpenSER) - Users mailing list
>>> Users at lists.kamailio.org
>>> http://lists.kamailio.org/cgi-bin/mailman/listinfo/users
>>> http://lists.openser-project.org/cgi-bin/mailman/listinfo/users
>>
>> -- 
>> Daniel-Constantin Mierla
>> * http://www.asipto.com/
>>
>>
>

-- 
Daniel-Constantin Mierla
* http://www.asipto.com/
