wrong again :)
as I mentioned in my previous email, the "detached timer" was more a
marker that something else was going wrong - there was no amplification.
and as TR clearly said, the problem was with DB connectivity and had
nothing to do with TM timers.
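
to illustrate why a blocking write on its own is enough to produce the symptom TR described (silence for several seconds, then a flood of replies), here is a toy model. it is only a sketch in Python, not OpenSER code: the `simulate` function, the single-worker setup and all the numbers are assumptions chosen for illustration.

```python
# Toy model (not OpenSER code): one worker handles requests in arrival
# order; one request hits a blocking write and stalls the worker.
# All times are integer milliseconds; every number here is made up.

def simulate(arrivals, service_ms, stall_at_ms, stall_ms):
    """Return (arrival, reply_time) pairs for a single sequential worker."""
    clock = 0          # time at which the worker becomes free
    stalled = False    # the blocking write happens only once
    replies = []
    for arrival in arrivals:
        start = max(clock, arrival)   # wait until the worker is free
        cost = service_ms
        if not stalled and start >= stall_at_ms:
            cost += stall_ms          # e.g. an accounting write that blocks
            stalled = True
        clock = start + cost
        replies.append((arrival, clock))
    return replies


# One request every 500 ms, 10 ms to handle each, a 5 s stall at t = 2 s.
replies = simulate(range(0, 10000, 500), 10, 2000, 5000)
reply_times = [done for _, done in replies]

# Replies sent while the worker is stuck, and replies in the 100 ms after.
silent = [t for t in reply_times if 1510 < t < 7010]
burst = [t for t in reply_times if 7010 <= t <= 7110]
print("replies during the stall:", len(silent))
print("replies in the 100 ms after it:", len(burst))
```

with these made-up numbers the worker sends nothing between 1510 ms and 7010 ms and then 11 replies within 100 ms, which is the same shape as the network trace, without any timer being involved.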
regards,
bogdan
Jiri Kuthan wrote:
Actually more likely it has been both. The root
problem lies in the timer subsystem
and may be amplified by other troubles (or amplify those).
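
For reference, the race Bogdan describes further down (timer expiry versus putting the same timer back on a list) can be modelled roughly as below. This is a Python toy model written for this thread, not the tm code: the `TimerLink` class, the `DETACHED` marker and both function bodies are assumptions, and only the list numbers and the warning text are taken from the messages quoted below. The losing interleaving is replayed sequentially, with no threads or locks.

```python
# Toy model of the "detached" timer check (an illustration, not tm code).
# List numbers follow the thread: 1 = fr_timer, 3 = wt_timer.

DETACHED = object()   # hypothetical marker: link was taken off by expiry


class TimerLink:
    def __init__(self):
        self.timer_list = None   # None = not on any list


def expire(link):
    """Timer routine side: take the link off its list and mark it."""
    link.timer_list = DETACHED


def set_timer(link, list_id, log):
    """Re-arm side: refuse to touch a link that expiry already detached."""
    if link.timer_list is DETACHED:
        log.append('set_timer for %d list called on a "detached" timer '
                   '-- ignoring' % list_id)
        return False
    link.timer_list = list_id
    return True


log = []
link = TimerLink()
set_timer(link, 1, log)           # armed on the final response list
expire(link)                      # the timer fires first ...
armed = set_timer(link, 3, log)   # ... so this late re-arm is ignored
print(armed, log)
```

In this model the late call only logs the warning and returns; the link is left untouched.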
-jiri
At 01:35 30/03/2007, T.R. Missner wrote:
> FYI All
>
> This turned out to be a database write (acc) that was blocking due to a
> RAID card problem.
>
>
>
> T.R. Missner wrote:
>
>> Is it possible the locked state I am seeing with openser leads to the
>> "detached" timer?
>> Since the "detached" timer is a race, it would make sense to see the
>> race condition after openser locks up and messages buffer up in the stack.
>> When a bunch of messages are processed all at once by multiple threads
>> the race condition would occur.
>> Does this make sense?
>>
>> Maybe I have been focusing on the wrong place.
>>
>> Ignoring the "detached" timer, what could cause openser to hang for a
>> couple of seconds and then clear, every 5 - 10 minutes?
>>
>> Ideas?
>>
>> We are seeing this on 3 different production servers.
>>
>> Thanks
>>
>> TR
>>
>> using openser1.1.1
>>
>>
>>
>> T.R. Missner wrote:
>>
>>> Bogdan,
>>>
>>> I have been chasing this for days and done lots of debugging.
>>> using 1.1.1
>>> While looking at the network trace at the time of these messages (I
>>> usually see at least 5 in a row with differing hex values) I see many
>>> incoming packets coming into the box and no response from the proxy for
>>> somewhere between 5 - 10 seconds, then a flood of responses from the
>>> proxy.
>>> I can email you a sample pcap file if you like.
>>> As part of my debugging I forced a 100 reply at the very top of my cfg file.
>>> The forced 100 was not sent during the locked up time, leading me to
>>> believe openser was not processing incoming packets.
>>> I have now seen this on multiple servers in different locations. Likely
>>> a particular customer call flow is causing this but I have not been able
>>> to pin it down to the exact customer. These proxies run pretty fast
>>> during the day so finding a pattern leading up to this issue is
>>> difficult. What could I add to the log output to identify the offending
>>> sip-callid? Is sip-callid or branch tag or anything similar easily
>>> accessible in any of the data structs in timer.c?
>>>
>>> TR
>>>
>>> Bogdan-Andrei Iancu wrote:
>>>
>>>> Hi TR,
>>>>
>>>> it is a race between the expire event (from the timer) and inserting
>>>> again on a timer list.
>>>> 1 is the final response timer list (fr_timer)
>>>> 3 is the wait timer list (wt_timer)
>>>>
>>>> I would say there is no way this could lead to any kind of lock.
>>>>
>>>> what version are you using? what makes you say it locks?
>>>>
>>>> regards,
>>>> bogdan
>>>>
>>>> T.R. Missner wrote:
>>>>
>>>>> Does anyone know what causes this?
>>>>>
>>>>> set_timer for 1 list called on a "detached" timer -- ignoring
>>>>>
>>>>> I also see
>>>>>
>>>>> set_timer for 3 list called on a "detached" timer -- ignoring
>>>>>
>>>>>
>>>>>
>>>>> When this happens Openser seems to lock up for 10 seconds or so.
>>>>>
>>>>> From searching it appears this is caused by a race but I am not
>>>>> sure what the race is or why this results in an unresponsive openser
>>>>> instance for multiple seconds.
>>>>>
>>>>> Transaction expiration racing reply?
>>>>>
>>>>>
>>>>> Desperately need to understand how this could be triggered so I can
>>>>> get customer to adjust system.
>>>>>
>>>>> Any way to adjust?
>>>>>
>>>>> tried tweaking fr_inv_timer but no joy.
>>>>>
>>>>>
>>>>>
>>>>> TR
>>>>>