[Devel] Race conditions in TM, SER implementation, etc...

Bogdan-Andrei Iancu bogdan at voice-system.ro
Tue Apr 3 18:57:13 CEST 2007


Hi Jerome,

Here are some technical details on the issue, to shed some light on it.

Most probably your question was about what's behind the "timer detached" 
warning, right? If so, there are two situations in which it can happen:
    1) Due to a race condition on timer ops between "timeout" and "insert" 
events. It would take too long to explain how the timeout is implemented 
and why the race occurs, but in short: the point is to avoid running the 
timeout handler while the timer is locked. At timeout, the expired 
sublist is removed from the timer (under lock) and marked as "detached"; 
outside the lock, the timeout handler is called for the detached 
elements. So the race occurs if a timer link is detached but not 
processed yet and some other process tries to insert it back into the 
list.
    2) Due to another bug/problem somewhere else. The warning acts as an 
indicator that something is not correct (with the timers) in some other 
place.
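
To make situation 1) a bit more concrete, here is a rough sketch in C of 
the "detach under lock, run handlers outside the lock" pattern. It is 
not the TM code: every name in it (timer_link, timer_list, timer_insert, 
timer_detach_expired, timer_run_handlers, the LINK_* states) is made up 
for illustration, and a C11 atomic_flag spinlock stands in for the real 
locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names throughout; this is NOT the TM module's code. */
enum link_state { LINK_IDLE, LINK_LISTED, LINK_DETACHED };

struct timer_link {
    struct timer_link *next;
    unsigned int expires;
    enum link_state state;
};

struct timer_list {
    atomic_flag lock;          /* stand-in for the real timer lock */
    struct timer_link *head;   /* kept sorted by ascending expiry */
};

static void list_lock(struct timer_list *l)
{
    while (atomic_flag_test_and_set(&l->lock)) { /* spin */ }
}

static void list_unlock(struct timer_list *l)
{
    atomic_flag_clear(&l->lock);
}

/* Returns 0 on success (or if already listed), -1 if the link is
 * detached but not processed yet: the "timer detached" warning case. */
static int timer_insert(struct timer_list *l, struct timer_link *tl,
                        unsigned int expires)
{
    int ret = 0;
    list_lock(l);
    if (tl->state == LINK_DETACHED) {
        fprintf(stderr, "WARNING: timer detached\n");
        ret = -1;
    } else if (tl->state == LINK_IDLE) {
        struct timer_link **pp = &l->head;
        while (*pp && (*pp)->expires <= expires)
            pp = &(*pp)->next;
        tl->expires = expires;
        tl->next = *pp;
        *pp = tl;
        tl->state = LINK_LISTED;
    }
    list_unlock(l);
    return ret;
}

/* Step 1, under lock: cut off the expired prefix and mark it detached. */
static struct timer_link *timer_detach_expired(struct timer_list *l,
                                               unsigned int now)
{
    struct timer_link *expired, *last = NULL, *it;
    list_lock(l);
    expired = l->head;
    for (it = l->head; it && it->expires <= now; it = it->next) {
        it->state = LINK_DETACHED;
        last = it;
    }
    if (last) {
        l->head = last->next;
        last->next = NULL;
    } else {
        expired = NULL;
    }
    list_unlock(l);
    return expired;
}

/* Step 2, no lock held: run the handler for each detached link.
 * Returns the number of links processed. A real implementation would
 * have to publish the state change safely to other processes. */
static int timer_run_handlers(struct timer_link *expired,
                              void (*handler)(struct timer_link *))
{
    int n = 0;
    while (expired) {
        struct timer_link *next = expired->next;
        assert(expired->state == LINK_DETACHED);
        expired->next = NULL;
        expired->state = LINK_IDLE;   /* may be inserted again from here on */
        if (handler)
            handler(expired);
        expired = next;
        n++;
    }
    return n;
}
```

The race window is the time between timer_detach_expired() returning and 
timer_run_handlers() reaching a given link: a timer_insert() on that 
link during this window hits the LINK_DETACHED branch and only logs the 
warning. Anything that slows down the handlers makes this window longer.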

Of course the trick is to "read" the warning and figure out which 
situation it is, because 1) is harmless (at least so far) and 2) may 
help to spot other problems. Which timer is involved may also give you 
a hint, as the race may not occur for all of them (only for the ones 
that can be retriggered).

The probability of race (1) can be greatly increased by problems 
(delays) in other parts of the code (not timer related) - see the case 
reported by TR Missner, where the cause was acc blocking on DB access.

I spent some time today analysing the report from TR in depth, to see 
whether it was (1) or (2), and actually it was both. I found a bug in 
the re-triggering of the "delete" timer that could result in mem leaks.

So, as a result, the "detached timer" warning is not a bug, and it is 
not the result of a major flaw in the design of TM's timers (as Jiri was 
trying to imply). It is not even reporting a bug: in 99% of the cases it 
is just a log, but in the other 1% it can provide useful information.

Hope this helped.


regards,
bogdan


Jerome Martin wrote:
> On Mon, 2007-04-02 at 15:22 +0200, Henning Westerholt wrote:
>> On Monday 02 April 2007 14:52, Jerome Martin wrote:
>> > [..]
>> > All this is a bit hard to understand for me right now, if someone wants
>> > to explain to me privately what this is all about, I would be all ears
>> > (and mouth shut).
>>
>> Discussed with private e-mail, perhaps we can now wait for bug reports from 
>> Jiri.
>>
>>     
>
> Agreed.
>
> Best Regards,
> Jerome