[Serusers] Replication problem

Mon Aug 1 11:01:49 CEST 2005

:-) Jan participated in a discussion on serusers about a new cache system where 
locations were not loaded at start-up, but rather on demand. I'm not sure 
how this plays with nathelper, but I'm sure he has been thinking about it. 
Maybe we could help him with the cache coding?
BTW, only the SER acting as the SIP registrar for a given UA needs to ping it. 
This makes things easier, as no locations outside the cache will 
have to be pinged.
g-)
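The on-demand cache above, combined with the rule that only a UA's own registrar pings it and the expiry job Klaus asks for below, can be sketched as a read-through location table. This is a minimal Python illustration under stated assumptions, not SER's usrloc or nathelper code: the `Contact` and `LocationCache` names, the `owner` field and the plain dict standing in for the shared location database are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    aor: str        # address-of-record, e.g. "sip:alice@example.com"
    uri: str        # registered contact URI (possibly a NATed address)
    expires: float  # absolute expiry time in seconds
    owner: str      # hypothetical: ID of the SER node that accepted the REGISTER


class LocationCache:
    """Read-through location table: entries are loaded from the DB on a miss, not at start-up."""

    def __init__(self, node_id: str, db: dict):
        self.node_id = node_id
        self.db = db    # hypothetical stand-in for the shared location DB: aor -> Contact
        self.mem = {}   # this node's in-memory cache: aor -> Contact

    def save(self, contact: Contact) -> None:
        # Write-through: the DB stays the source of truth shared by all SERs.
        self.db[contact.aor] = contact
        self.mem[contact.aor] = contact

    def lookup(self, aor: str, now: Optional[float] = None) -> Optional[Contact]:
        now = time.time() if now is None else now
        contact = self.mem.get(aor)
        if contact is None:
            # Cache miss: fall back to the DB (e.g. a UA registered at another SER).
            contact = self.db.get(aor)
            if contact is not None:
                self.mem[aor] = contact
        if contact is None or contact.expires <= now:
            return None
        return contact

    def purge_expired(self, now: Optional[float] = None) -> None:
        # Periodic job: delete outdated entries from both memory and the DB.
        now = time.time() if now is None else now
        for table in (self.mem, self.db):
            for aor in [a for a, c in table.items() if c.expires <= now]:
                del table[aor]

    def contacts_to_ping(self, now: Optional[float] = None) -> list:
        # NAT keepalives: ping only live contacts this node registered itself.
        # Those were saved through this node, so they are normally already in self.mem.
        now = time.time() if now is None else now
        return [c for c in self.mem.values()
                if c.owner == self.node_id and c.expires > now]
```

In this sketch a contact that another SER registered can still be found through a DB lookup on a cache miss, but this node never pings it; the pinging stays with the registrar that owns the contact.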

Klaus Darilion wrote:
> Hi Andy!
>
> Another problem: nathelper uses the in-memory location table to ping
> NATed clients. Thus nathelper would also have to query the database,
> and we would need a process to watch the expires and delete outdated
> entries.
> regards,
> klaus
>
> Greger V. Teigre wrote:
>> Hi Andreas,
>> You are probably one of the people on the list with the most
>> experience with replication.  AFAIK, you are correct on all
>> statements below. I assume you have SERs in different locations,
>> since the TCP connect timeout is a problem?
>>    But I'm not sure why removing the cache would help you?!  Unless
>> you want to move to a cluster or DB layer replication?
>>
>> IMHO, there are only two valid paths for replication in SER: either
>> develop a SIP-layer replication with guaranteed delivery, queueing,
>> non-blocking I/O etc. (which ends up being proprietary to SER), or patch
>> SER so it can better handle DB-based replication.
>> I lean towards DB-based replication. Two prominent things must
>> be handled: storing the Path information for proper routing of
>> messages to UAs behind NAT, and a cache that checks the DB if a
>> location is not found in memory.
>>
>> I would be very interested in patches for this in the experimental
>> CVS module ;-)
>>
>> g-)
>>
>> Andreas Granig wrote:
>>
>>> Hi all,
>>>
>>> we use DNS SRV balancing and forward_tcp() to replicate registrations
>>> from one SER to the other SERs in the system.
>>>
>>> Now when one machine completely crashes, all SER processes on
>>> all other machines hang when processing a REGISTER until the TCP connect
>>> times out, leading to a system load of ~16 per machine (assuming 16
>>> child processes per SER), and no other messages can be processed.
>>>
>>> I understand that replicating using UDP would solve this issue, but
>>> then replicated registrations get lost every now and then because of
>>> unreliable transmission, and as far as I found out t_replicate() can
>>> only be used for replicating to *one* other SER.
>>>
>>> This really gets me thinking about patching out the internal
>>> location cache and looking up every location from the database, because this
>>> additional lookup really doesn't hurt given the ~10 other DB
>>> queries per call. IMHO, in systems with more than two SERs this cache is
>>> just a big pain.
>>> Andy
>>>
>>> _______________________________________________
>>> Serusers mailing list
>>> serusers at lists.iptel.org
>>> http://lists.iptel.org/mailman/listinfo/serusers
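The hang Andreas describes, and the queued, non-blocking SIP-layer replication Greger mentions as one alternative, both come down to keeping network I/O out of the REGISTER path. A minimal Python sketch, assuming a hypothetical `send(peer, msg)` transport hook that uses a short connect timeout and returns False on failure (this is not SER's `forward_tcp()` or `t_replicate()`): the REGISTER handler only enqueues, and a separate worker drains one queue per peer, keeping undelivered messages in order until that peer is back.

```python
from collections import deque
from typing import Callable, Dict, Iterable


class Replicator:
    """One outbound queue per peer, so a dead peer never blocks REGISTER processing."""

    def __init__(self, peers: Iterable[str], send: Callable[[str, str], bool]):
        # send(peer, msg) is a hypothetical transport hook: it should use a short
        # connect timeout and return False instead of blocking when the peer is down.
        self.send = send
        self.queues: Dict[str, deque] = {peer: deque() for peer in peers}

    def replicate(self, msg: str) -> None:
        # Called from the REGISTER path: only enqueues, never touches the network.
        for q in self.queues.values():
            q.append(msg)

    def drain(self, peer: str, max_msgs: int = 100) -> int:
        """Run from a separate worker; returns how many messages were delivered."""
        q = self.queues[peer]
        sent = 0
        while q and sent < max_msgs:
            if not self.send(peer, q[0]):
                break  # peer unreachable: keep the message, in order, for the next retry
            q.popleft()
            sent += 1
        return sent
```

With one queue per peer, a crashed machine only makes its own queue grow while deliveries to the remaining SERs continue. A production version would probably also want to bound queue length and persist queues across restarts.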