[Users] Re: Request for discussion: t_relay() internal error processing

Tue Sep 5 18:05:04 CEST 2006

>>    2) t_relay will return several error codes:
>>          -1 = no transaction created -> need to use sl_* functions
>>          -2 = transaction created, relay failed -> may destroy 

If I understand this correctly, the difference between these two
return codes really comes down to who needs to care about the failure.

-2 relates to this exact call, at this exact moment; chances are the
very next call the openser box handles will be OK.

It looks to me like -1 means serious trouble for the entire
call-switching platform.  It sounds like it could happen because of a
resource shortage (memory), a bug in openser, or a bug in openser.cfg.
If you get -1 once, the chances that you'll get the same code on the
next call are very high.
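
To make that reading concrete, here is a small model in Python (not
openser.cfg syntax).  The constant and function names are made up for
the example, and the scope attached to each code is only the
interpretation given above, not documented openser behaviour:

```python
# Hypothetical model of the two proposed t_relay() error codes.
NO_TRANSACTION = -1   # proposed: no transaction created
RELAY_FAILED = -2     # proposed: transaction created, relay failed


def failure_scope(retcode):
    """Return which part of the system the failure is assumed to affect."""
    if retcode == RELAY_FAILED:
        return "call"       # this one call; the next call is probably fine
    if retcode == NO_TRANSACTION:
        return "platform"   # assumed likely to repeat on following calls
    return "none"           # any other value is treated as no error here
```

With this model, failure_scope(-2) yields "call" and failure_scope(-1)
yields "platform".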

Let's say we've run out of memory.  What tools are available at the
openser.cfg script level to handle this?  Is there a way to force some
internal garbage collection?  Or to do a warm restart (e.g., trash all
transactions, but not registrations)?

My point here is this: if I'm right that -1 signals serious trouble
that is likely to persist from now on, then this is a more general
problem than just t_relay, and much bigger than handling this one call
or calls to one particular neighbour.

It's true that the application may halt and catch fire, but there are
plenty of errors that openser can catch, and the larger question
seems to be:  what do we do then?

If there were script-level tools that could be used to try to manage
the failure, possibly clean up (in a large way) and try to keep going,
then maybe we should have openser_failure route blocks that the
user can register with openser to handle various errors.
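
A minimal sketch of what such registration could look like, written in
Python rather than openser.cfg since this is only a proposal.  The
error class names and the register_failure_route / raise_failure
helpers are hypothetical and do not correspond to anything in openser:

```python
# Hypothetical registry of failure handlers, keyed by error class.
failure_routes = {}


def register_failure_route(error_class, handler):
    """Associate a handler with an error class such as 'out_of_memory'."""
    failure_routes[error_class] = handler


def raise_failure(error_class):
    """Run the registered handler, or report that the error is unhandled."""
    handler = failure_routes.get(error_class)
    if handler is None:
        return "unhandled"
    return handler()


def drop_transactions():
    # Placeholder for a large-scale cleanup that keeps registrations.
    return "transactions dropped, registrations kept"


register_failure_route("out_of_memory", drop_transactions)
```

Here raise_failure("out_of_memory") runs the registered cleanup, while
an error class with no registered handler comes back as "unhandled",
which is the case where the server would fall back to whatever it
does today.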

Thanks,
-mark