[OpenSER-Devel] SF.net SVN: openser: [4242] trunk

Mon May 26 21:47:40 CEST 2008

On Monday 26 May 2008, Juha Heinanen wrote:
> Dan Pascu writes:
>  > When a proxy dies, the others will take over
>  > the resources managed by the dead proxy and redistribute them among
>  > themselves, thus the subscribers of the dead proxy are automatically
>  > moved to a new proxy and become available again when they send their
>  > next registration. Even before they re-register with the new home
>  > proxy, they can still make outgoing calls immediately after their
>  > home proxy failure, because them being mapped to a new proxy is
>  > instantaneous once a dead proxy is removed from the network.
>
> dan,
>
> there is one more issue with this.  if a UA makes or receives a call
> via one of the proxies and this proxy dies before the call ends, how
> can the UA be able to deliver or receive a bye, when the proxy in the
> route set is gone?

It can't. This is accepted as part of the design. Once any of the proxies 
that added a Record-Route header to the dialog is gone, further in-dialog 
messages cannot be delivered. This is an issue with any design, not only 
the distributed design I mentioned.
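To illustrate why no design escapes this: the route set is learned from the 
Record-Route headers when the dialog is established and stays fixed for the 
rest of the dialog (RFC 3261), so a later request such as BYE must pass 
through every proxy in it. The sketch below is a simplified model, not 
OpenSER code; the class, the function and the hostnames are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dialog:
    # Route set taken from the Record-Route headers at dialog setup.
    # It is not rebuilt when a proxy in it dies.
    route_set: list[str]


def can_send_in_dialog(dialog: Dialog, alive: set[str]) -> bool:
    """Return True only if every proxy in the route set is still reachable."""
    return all(hop in alive for hop in dialog.route_set)


dialog = Dialog(route_set=["proxy1.example.com", "proxy3.example.com"])
# proxy1 died after the call was set up, so the BYE has nowhere to go.
print(can_send_in_dialog(dialog, {"proxy2.example.com", "proxy3.example.com"}))
```

The check fails as soon as any single hop is missing. That holds whether the 
dead hop sits behind a loadbalancer or is a peer in the distributed network.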

> this is not an issue with loadbalancer, because its 
> address never changes even if it is replaced by another one.

I'd say you have the same issue with the loadbalancer scheme as well. The 
only difference is that you do not have it with the loadbalancer box 
itself, because it's clustered, but you do have it with every proxy behind 
the loadbalancer. Unless you double every box in your system, you have the 
same problem when one of the proxies behind the loadbalancer dies. The 
distributed scheme can also protect itself against this problem by 
doubling every proxy in the network; however, this is not justified, as 
such failures are rare and the disruption is small.

Moreover, you cannot guarantee 100% resilience even if you make every 
proxy a cluster, because there is a time window in which the master is 
already dead but the slave has not yet taken over. Depending on the 
cluster configuration this window is between 30 and 120 seconds, and all 
in-dialog messages sent during it are lost.
There are also cases where a clustered proxy loses network connectivity 
completely: even though the master is alive, it is not reachable, so 
in-dialog messages are lost. The cluster doesn't help at all in this 
case.

This issue is not new to the distributed design. The purpose of the 
distributed design is to enhance scalability and to eliminate both single 
points of failure and redundant boxes, at the small expense of some 
resilience when catastrophic failures happen, while giving the network 
the ability to recover quickly without human intervention.
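The self-recovery described earlier in the thread works as follows: when a 
proxy dies, the survivors take over its subscribers, and the remapping is 
instantaneous. The post does not say how subscribers are assigned to proxies. 
The sketch below assumes rendezvous (highest-random-weight) hashing purely 
for illustration; `home_proxy` and the proxy names are hypothetical. With 
this scheme, removing a proxy moves only the subscribers that proxy owned, 
and every other mapping stays the same.

```python
import hashlib


def _score(proxy: str, aor: str) -> int:
    # Deterministic pseudo-random weight for a (proxy, subscriber) pair.
    digest = hashlib.sha256(f"{proxy}|{aor}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def home_proxy(aor: str, live_proxies: list[str]) -> str:
    """Return the live proxy with the highest weight for this subscriber."""
    if not live_proxies:
        raise ValueError("no live proxies")
    return max(live_proxies, key=lambda p: _score(p, aor))


proxies = ["p1", "p2", "p3"]
aor = "alice@example.com"
print(home_proxy(aor, proxies))                        # home while all are up
print(home_proxy(aor, [p for p in proxies if p != "p2"]))  # after p2 is removed
```

No coordination step is needed. Each surviving proxy computes the same 
answer from the list of live proxies alone, which matches the "no human 
intervention" property claimed above.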

-- 
Dan