[Users] Re: [Devel] "detached" timer

Bogdan-Andrei Iancu bogdan at voice-system.ro
Fri Mar 30 14:48:06 CEST 2007


wrong again :)

as I mentioned in my previous email, the "detached timer" was more a 
marker that something else was going wrong - there was no amplification.

and as TR clearly said, the problem was with DB connectivity and had 
nothing to do with TM timers.

regards,
bogdan

Jiri Kuthan wrote:
> Actually more likely it has been both. The root problem lies in the timer subsystem
> and may be amplified by other troubles (or amplify those).
>
> -jiri
>
> At 01:35 30/03/2007, T.R. Missner wrote:
>   
>> FYI All
>>
>> This turned out to be a database write (acc) that was blocking due to a RAID card problem.
>>
>>
>>
>> T.R. Missner wrote:
>>     
>>> Is it possible the locked state I am seeing with openser leads to the "detached" timer?
>>> Since the "detached" timer is a race, it would make sense to see the race condition after openser locks up and messages buffer up in the stack.
>>> When a bunch of messages are processed all at once by multiple threads the race condition would occur.
>>> Does this make sense?
>>>
>>> Maybe I have been focusing on the wrong place.
>>>
>>> Ignoring the "detached" timer what could cause openser to hang for a couple seconds then clear every 5 - 10 minutes?
>>>
>>> Ideas?
>>>
>>> We are seeing this on 3 different production servers.
>>>
>>> Thanks
>>>
>>> TR
>>>
>>> using openser1.1.1
>>>
>>>
>>>
>>> T.R. Missner wrote:
>>>       
>>>> Bogdan,
>>>>
>>>> I have been chasing this for days and done lots of debugging.
>>>> using 1.1.1
>>>> While looking at the network trace at the time of these messages (I usually see at least 5 in a row with differing hex values) I see many incoming packets coming into the box and no response from the proxy for somewhere between 5 - 10 seconds, then a flood of responses from the proxy.
>>>> I can email you a sample pcap file if you like.
>>>> As part of my debugging I forced a 100 reply at the very top of my cfg file.
>>>> The forced 100 was not sent during the locked up time leading me to believe openser was not processing incoming packets.
>>>> I have now seen this on multiple servers in different locations. Likely a particular customer call flow is causing this but I have not been able to pin it down to the exact customer. These proxies run pretty fast during the day so finding a pattern leading up to this issue is difficult. What could I add to the Log output to identify the offending sip-callid? Is sip-callid or branch tag or anything similar easily accessible in any of the data structs in timer.c?
>>>>
>>>> TR
>>>>
>>>> Bogdan-Andrei Iancu wrote:
>>>>         
>>>>> Hi TR,
>>>>>
>>>>> it is a race between the expire event (from timer) and inserting again on a timer list.
>>>>>   1 is the final response timer list (fr_timer)
>>>>>   3 is the wait timer list (wt_timer)
>>>>>
>>>>> I would say there is no way this could lead to any kind of lock.
>>>>>
>>>>> what version are you using? what makes you say it locks?
>>>>>
>>>>> regards,
>>>>> bogdan
>>>>>
>>>>> T.R. Missner wrote:
>>>>>           
>>>>>> Does anyone know what causes this?
>>>>>>
>>>>>> set_timer for 1 list called on a "detached" timer -- ignoring
>>>>>>
>>>>>> I also see
>>>>>>
>>>>>> set_timer for 3 list called on a "detached" timer -- ignoring
>>>>>>
>>>>>>
>>>>>>
>>>>>> When this happens Openser seems to lock up for 10 seconds or so.
>>>>>>
>>>>>> From searching it appears this is caused by a race but I am not sure what the race is or why this results in an unresponsive openser instance for multiple seconds.
>>>>>>
>>>>>> Transaction expiration racing reply?
>>>>>>
>>>>>>
>>>>>> Desperately need to understand how this could be triggered so I can get customer to adjust system.
>>>>>>
>>>>>> Any way to adjust?
>>>>>>
>>>>>> tried tweaking fr_inv_timer but no joy.
>>>>>>
>>>>>>
>>>>>>
>>>>>> TR
>>>>>>             
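
The race described in the thread (an expire event from the timer racing a re-insert on list 1 or 3) can be sketched as a small model. This is a Python illustration only, not the tm code from timer.c: the class names, method names and the TIMER_DELETED sentinel are assumptions made for the example, and only the log text is taken from the messages quoted above.

```python
import threading

# Hypothetical sentinel marking a timer that the timer routine has already
# taken off its list ("detached"). The name is an assumption of this sketch.
TIMER_DELETED = object()

FR_TIMER_LIST = 1  # final response timer list (fr_timer), as named in the thread
WT_TIMER_LIST = 3  # wait timer list (wt_timer), as named in the thread


class TimerLink:
    """One timer belonging to a transaction (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.time_out = None  # None means "not on any list yet"


class TimerList:
    """A timer list protected by a lock, with a guard for detached timers."""

    def __init__(self, list_id):
        self.list_id = list_id
        self.lock = threading.Lock()
        self.links = []
        self.log = []

    def set_timer(self, link, now, timeout):
        """Worker side: (re)insert a timer; ignore it if already detached."""
        with self.lock:
            if link.time_out is TIMER_DELETED:
                self.log.append(
                    'set_timer for %d list called on a "detached" timer'
                    " -- ignoring" % self.list_id
                )
                return False
            if link in self.links:
                self.links.remove(link)
            link.time_out = now + timeout
            self.links.append(link)
            return True

    def detach_expired(self, now):
        """Timer side: remove expired timers and mark them as detached."""
        with self.lock:
            expired = [l for l in self.links if l.time_out <= now]
            self.links = [l for l in self.links if l.time_out > now]
            for l in expired:
                l.time_out = TIMER_DELETED
        return expired


if __name__ == "__main__":
    fr = TimerList(FR_TIMER_LIST)
    link = TimerLink("branch-0")
    fr.set_timer(link, now=0, timeout=5)   # worker arms the timer
    fr.detach_expired(now=5)               # timer routine expires it first
    fr.set_timer(link, now=5, timeout=5)   # late re-insert loses the race
    print("\n".join(fr.log))
```

In this model the lock is held only while the list is updated, so the ignored call returns immediately. That fits the conclusion reached in the thread that the message was a marker of messages piling up elsewhere rather than the cause of the multi-second stall.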




