[Users] Re: [Devel] "detached" timer

T.R. Missner trmissner at bandwidth.com
Fri Mar 30 01:35:57 CEST 2007


FYI All

This turned out to be a database write (acc) that was blocking because 
of a RAID card problem.
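
For anyone who finds this thread later, here is a toy model of why a 
blocked write looks like a "lock up" followed by a flood of replies in 
a network trace. It is only a sketch and is not OpenSER code: the 
function name, the tick-based timing, and the single-worker assumption 
are all made up for illustration. Packets keep arriving while the 
worker is stuck in the write, nothing is answered, and the whole 
backlog is answered in one burst once the write returns.

```python
from collections import deque


def simulate(arrivals, stall_start, stall_len):
    """Toy model (hypothetical, not OpenSER code).

    arrivals[t] is the number of packets arriving at tick t.
    The single worker is blocked in a synchronous write during ticks
    stall_start .. stall_start + stall_len - 1 and sends nothing then.
    On every other tick it answers everything that is queued.
    Returns the number of replies sent at each tick.
    """
    backlog = deque()
    replies = []
    for tick, count in enumerate(arrivals):
        # Packets keep arriving whether or not the worker is blocked.
        backlog.extend([tick] * count)
        blocked = stall_start <= tick < stall_start + stall_len
        if blocked:
            # Worker is stuck in the write: no replies leave the box.
            replies.append(0)
        else:
            # Write has returned: the whole backlog is answered at once.
            replies.append(len(backlog))
            backlog.clear()
    return replies


if __name__ == "__main__":
    # 2 packets per tick, write blocked for 4 ticks starting at tick 3.
    print(simulate([2] * 10, stall_start=3, stall_len=4))
```

The zero-reply ticks correspond to the silent period in the trace, and 
the single large value right after them corresponds to the flood of 
responses.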



T.R. Missner wrote:
> Is it possible the locked state I am seeing with openser leads to the 
> "detached" timer?
> Since the "detached" timer is a race, it would make sense to see the 
> race condition after openser locks up and messages buffer up in the 
> stack.
> When a bunch of messages are processed all at once by multiple threads 
> the race condition would occur.
> Does this make sense?
>
> Maybe I have been focusing on the wrong place.
>
> Ignoring the "detached" timer, what could cause openser to hang for a 
> couple of seconds and then clear, every 5 - 10 minutes?
>
> Ideas?
>
> We are seeing this on 3 different production servers.
>
> Thanks
>
> TR
>
> using openser 1.1.1
>
>
>
> T.R. Missner wrote:
>> Bogdan,
>>
>> I have been chasing this for days and done lots of debugging.
>> using 1.1.1
>> While looking at the network trace at the time of these messages ( I 
>> usually see at least 5 in a row with differing hex values ) I see 
>> many incoming packets coming into the box and no response from the 
>> proxy for somewhere between 5 - 10 seconds, then a flood of responses 
>> from the proxy.
>> I can email you a sample pcap file if you like.
>> As part of my debugging I forced a 100 reply at the very top of my 
>> cfg file.
>> The forced 100 was not sent during the locked up time leading me to 
>> believe openser was not processing incoming packets.
>> I have now seen this on multiple servers in different locations. 
>> Likely a particular customer call flow is causing this but I have not 
>> been able to pin it down to the exact customer. These proxies run 
>> pretty fast during the day so finding a pattern leading up to this 
>> issue is difficult. What could I add to the log output to identify 
>> the offending sip-callid? Is sip-callid or branch tag or anything 
>> similar easily accessible in any of the data structs in timer.c?
>>
>> TR
>>
>> Bogdan-Andrei Iancu wrote:
>>> Hi TR,
>>>
>>> it is a race between the expire event (from the timer) and inserting 
>>> again on a timer list.
>>>    1 is the final response timer list (fr_timer)
>>>    3 is the wait timer list (wt_timer)
>>>
>>> I would say there is no way this could lead to any kind of lock.
>>>
>>> what version are you using? what makes you say it locks?
>>>
>>> regards,
>>> bogdan
>>>
>>> T.R. Missner wrote:
>>>> Does anyone know what causes this?
>>>>
>>>> set_timer for 1 list called on a "detached" timer -- ignoring
>>>>
>>>> I also see
>>>>
>>>> set_timer for 3 list called on a "detached" timer -- ignoring
>>>>
>>>>
>>>>
>>>> When this happens Openser seems to lock up for 10 seconds or so.
>>>>
>>>> From searching it appears this is caused by a race but I am not 
>>>> sure what the race is or why this results in an unresponsive 
>>>> openser instance for multiple seconds.
>>>>
>>>> Transaction expiration racing reply?
>>>>
>>>>
>>>> Desperately need to understand how this could be triggered so I can 
>>>> get customer to adjust system.
>>>>
>>>> Any way to adjust?
>>>>
>>>> tried tweaking fr_inv_timer but no joy.
>>>>
>>>>
>>>>
>>>> TR
>>>> ------------------------------------------------------------------------ 
>>>>
>>>>
>>>> _______________________________________________
>>>> Devel mailing list
>>>> Devel at openser.org
>>>> http://openser.org/cgi-bin/mailman/listinfo/devel
>>>>   
>>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at openser.org
>> http://openser.org/cgi-bin/mailman/listinfo/users
