Hi all,
We use DNS SRV load balancing and forward_tcp() to replicate
registrations from each SER to the other SERs in the system.
When one machine crashes completely, the SER processes on all other
machines hang while processing a REGISTER until the TCP connect times
out. With 16 child processes per SER this leads to a system load of ~16
per machine, and no other messages can be processed.
I understand that replicating over UDP would solve this issue, but then
replicated registrations get lost every now and then because of the
unreliable transport. And as far as I can tell, t_replicate() can only
be used for replicating to *one* other SER.
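
To illustrate the stall outside of SER: a blocking TCP connect to a dead
host only returns once the connect timeout expires, so the time a worker
loses per REGISTER is roughly the timeout times the number of dead peers.
Below is a rough Python sketch of replication with a bounded connect
timeout. This is not SER code; the names replicate and connect_timeout
are hypothetical and only serve the illustration.

```python
import socket


def replicate(payload, peers, connect_timeout=0.5):
    """Send payload to every peer over TCP.

    Returns the list of (host, port) peers that could not be reached.
    Hypothetical helper for illustration, not part of SER.
    """
    failed = []
    for host, port in peers:
        try:
            # The timeout bounds the connect and the following send, so a
            # crashed peer costs at most about connect_timeout seconds.
            with socket.create_connection((host, port),
                                          timeout=connect_timeout) as conn:
                conn.sendall(payload)
        except OSError:
            # Covers timeouts, refused connections and unreachable hosts.
            failed.append((host, port))
    return failed
```

The peers are still tried one after the other, so the worker is blocked
for the sum of the timeouts of all dead peers. A short timeout only
limits the damage, it does not remove it.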
This really gets me thinking about patching out the internal location
cache and looking up every location directly in the database, since one
additional lookup hardly matters next to the ~10 other DB queries per
call. IMHO, in systems with more than two SERs this cache is just a big
pain.
Andy