Hi Andreas,
You are probably one of the people on the list with the most experience with
replication. AFAIK, all of your statements below are correct. I assume your
SERs are at different locations, since the TCP connect timeout is a problem?
But I'm not sure why removing the cache would help you, unless you
want to move to cluster or DB-layer replication?
IMHO, there are only two valid paths for replication in SER: either develop
SIP-layer replication with guaranteed delivery, queuing, non-blocking sends,
etc. (which ends up being proprietary to SER), or patch SER so that it can
better handle DB-based replication.
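To make the first path a bit more concrete, here is a rough Python sketch of
the queue idea: the SIP worker only enqueues the REGISTER, and a separate
sender thread does the delivery and retries, so a dead peer cannot stall the
workers. This is not SER code. ReplicationQueue, the send callback, and the
retry and queue limits are all hypothetical names and values chosen for
illustration.

```python
import queue
import threading


class ReplicationQueue:
    """Hypothetical sketch: workers enqueue, one sender thread per peer delivers."""

    def __init__(self, send, max_pending=1000, retries=3):
        # send(message) is an assumed callback that returns True once the
        # peer has acknowledged the message and False otherwise.
        self._send = send
        self._retries = retries
        self._pending = queue.Queue(maxsize=max_pending)
        self.dropped = []  # messages we gave up on, kept for a later resync
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def replicate(self, message):
        """Called by a SIP worker; never waits on the network."""
        try:
            self._pending.put_nowait(message)
            return True
        except queue.Full:
            # The peer has been down long enough to fill the queue.
            self.dropped.append(message)
            return False

    def _run(self):
        while True:
            message = self._pending.get()
            if message is None:  # shutdown marker from close()
                return
            delivered = False
            for _ in range(self._retries):
                if self._send(message):
                    delivered = True
                    break
            if not delivered:
                self.dropped.append(message)

    def close(self):
        """Deliver what is still queued, then stop the sender thread."""
        self._pending.put(None)
        self._thread.join()
```

A real implementation would also need backoff between retries and some way
to resync the dropped registrations when the peer comes back, which is
where the work (and the proprietary part) comes in.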
I lean towards DB-based replication. Two prominent things must be handled:
storing the Path information so that messages to UAs behind NAT are routed
properly, and a cache that checks the DB if a location is not found in
memory.
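As a rough illustration of the cache behaviour I mean, here is a small
Python sketch using an in-memory SQLite database as a stand-in for the
replicated DB. It is not SER's usrloc code. The LocationCache class and the
location table with its aor, contact and path columns are assumptions made
up for this example, not the real schema.

```python
import sqlite3


def open_db():
    """Create a stand-in for the shared (replicated) location database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE location (aor TEXT PRIMARY KEY, contact TEXT, path TEXT)"
    )
    return db


class LocationCache:
    """Hypothetical per-SER memory cache that falls back to the DB on a miss."""

    def __init__(self, db):
        self._db = db
        self._contacts = {}  # AOR -> (contact, path)

    def save(self, aor, contact, path):
        """Store a registration in memory and write it through to the DB."""
        self._contacts[aor] = (contact, path)
        self._db.execute(
            "INSERT OR REPLACE INTO location (aor, contact, path) VALUES (?, ?, ?)",
            (aor, contact, path),
        )
        self._db.commit()

    def lookup(self, aor):
        """Return (contact, path), asking the DB only when memory has no entry."""
        hit = self._contacts.get(aor)
        if hit is not None:
            return hit
        row = self._db.execute(
            "SELECT contact, path FROM location WHERE aor = ?", (aor,)
        ).fetchone()
        if row is None:
            return None
        self._contacts[aor] = (row[0], row[1])
        return self._contacts[aor]
```

With two LocationCache objects sharing one database, a registration saved
through the first can be found through the second, together with the stored
Path value. The sketch ignores expiry and stale cache entries after a
re-registration on another SER, which a real patch would have to deal with.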
I would be very interested in patches for this in the experimental CVS
module ;-)
g-)
Andreas Granig wrote:
Hi all,
we use DNS SRV balancing and forward_tcp() to replicate registrations
from one SER to the other SERs in the system.
Now when one machine crashes completely, all other SER processes on
all other machines hang when processing a REGISTER until the TCP connect
times out. With 16 child processes per SER, this leads to a system load
of ~16 per machine, and no other messages can be processed.
I understand that replicating using UDP would solve this issue, but
then replicated registrations get lost every now and then because of
unreliable transmission. Also, as far as I found out, t_replicate() can
only be used for replicating to *one* other SER.
This really gets me thinking about patching out the internal location
cache and looking up every location from the DB, because this additional
lookup really doesn't hurt given the ~10 other DB queries per call.
IMHO in systems with more than two SERs this cache is just a big pain.
Andy