[Serusers] Carrier-grade framework for SER

Thu Jan 27 12:53:30 CET 2005

Let me try to sort out the issues we are discussing here, so we can at 
least see whether we agree on the goals:

1. Reliability and scalability issues
-----------
Scenario: Tens of thousands or hundreds of thousands of users require a 
reliable and scalable infrastructure
Goal: Find a good reference scenario for building a reliable and scalable 
infrastructure of ser servers.
Problems: Everybody tries to solve this their own way, and most keep their 
solutions secret because not telling anybody is a competitive advantage.

**** I think that your solution to #1 will dominate the discussions on the 
issues below.  Using RADIUS (and possibly LDAP back-ends) for everything but 
usrloc is one solution that seems to be Juha's scenario (and mine). Andreas 
uses mysql for subscriber info as well.  Do you have one server center with 
load balancing or geographically-distributed server centers?  It will 
influence your needs.
So, let's sort out our scenarios before we discuss what is the "best" 
solution.

2. Usrloc replication across standalone ser servers.
------------
Scenario: Independent servers with independent databases run either with 
some sort of load balancing or DNS SRV.
Goal: Make sure that all ser servers have updated usrloc information, so 
each can handle any SIP message.
Problems: Distribute REGISTER messages to all servers; Make sure that server 
unavailability does not corrupt the usrloc DB state

*** We all have this issue.  It is my understanding that t_replicate: a) 
uses SIP messages b) uses a best-effort algorithm (haven't looked at the 
code...) c) can be used between several servers, but when you introduce a 
new server, you need to change each server's ser.cfg
My suggestion for a simple solution based on the discussion so far:  Extend 
t_replicate with a guaranteed mode of replication.  mysql can be used as a 
queue with replication states (or even a text file, for that matter).  Whether 
SIP messages or a TCP/IP-based FIFO is used really depends on an estimate 
of the network traffic.
Result: The least work and the code is an integrated part of ser.
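
A minimal sketch of what such a guaranteed mode could look like, written in 
Python rather than as ser code. It is not how t_replicate works today: the 
table layout, the state names and the send() callback are assumptions made 
for illustration, and sqlite3 stands in for mysql so the example is 
self-contained. Every REGISTER is queued per peer before delivery is tried, 
and an entry only leaves the "pending" state once the peer has accepted it.

```python
import sqlite3

PENDING, DONE = "pending", "done"  # hypothetical replication states


def open_queue(path=":memory:"):
    """Open (or create) the replication queue; sqlite3 stands in for mysql."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS repl_queue ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " peer TEXT NOT NULL,"
        " message TEXT NOT NULL,"
        " state TEXT NOT NULL)"
    )
    return db


def enqueue(db, peers, message):
    """Record one queue entry per peer before any delivery is attempted."""
    with db:  # commit on success, roll back on error
        db.executemany(
            "INSERT INTO repl_queue (peer, message, state) VALUES (?, ?, ?)",
            [(peer, message, PENDING) for peer in peers],
        )


def flush(db, send):
    """Deliver pending entries oldest first and return how many succeeded.

    send(peer, message) is a caller-supplied callback returning True on
    success. After the first failure for a peer, that peer's later entries
    are skipped in this pass so its messages are never delivered out of order.
    """
    rows = db.execute(
        "SELECT id, peer, message FROM repl_queue WHERE state = ? ORDER BY id",
        (PENDING,),
    ).fetchall()
    blocked = set()
    delivered = 0
    for row_id, peer, message in rows:
        if peer in blocked:
            continue
        if send(peer, message):
            with db:
                db.execute(
                    "UPDATE repl_queue SET state = ? WHERE id = ?",
                    (DONE, row_id),
                )
            delivered += 1
        else:
            blocked.add(peer)
    return delivered


def pending(db, peer):
    """Return the messages still waiting for the given peer, oldest first."""
    rows = db.execute(
        "SELECT message FROM repl_queue"
        " WHERE peer = ? AND state = ? ORDER BY id",
        (peer, PENDING),
    )
    return [message for (message,) in rows]
```

Calling flush() periodically retries whatever is still pending, so a peer 
that was down for a while catches up in order once it is reachable again. 
Adding a new peer only means passing one more name to enqueue().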

3. Network-based provisioning of new users, aliases, etc
------------
Scenario: One server needs to be provisioned from a web server or process 
running on a remote server
Goal: Allow ser to receive TCP/IP based provisioning messages
Problems: ser's FIFO does not have a TCP/IP interface

*** I think this is an extension to ser that would benefit many people.  I 
also believe that a provisioning interface should be SOAP-based due to the 
sheer number of projects that probably will use the interface for provisioning.
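
To illustrate the idea of a network front-end to the FIFO, here is a small 
Python sketch of a TCP-to-FIFO bridge. It is a plain line-based bridge, not 
the SOAP interface suggested above and not an existing ser feature. The 
FIFO path, the reply strings, and the framing (a command is a block of 
lines terminated by an empty line) are assumptions; the bridge also does 
not read anything back from a reply FIFO.

```python
import os
import socketserver

FIFO_PATH = "/tmp/ser_fifo"  # assumption: use the fifo path from your ser.cfg


def read_command(stream):
    """Read lines up to and including the first empty line.

    Returns the raw command bytes, or None if the stream ends before an
    empty line is seen (an incomplete command is never forwarded).
    """
    lines = []
    for line in stream:
        lines.append(line)
        if line in (b"\n", b"\r\n"):
            return b"".join(lines)
    return None


def write_to_fifo(payload, fifo_path=FIFO_PATH):
    """Write one complete command to the FIFO with a single write call.

    The path must already exist. Commands are assumed to be small enough
    that one os.write() call transfers the whole payload.
    """
    fd = os.open(fifo_path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)


class ProvisioningHandler(socketserver.StreamRequestHandler):
    """Accept one command per TCP connection and pass it on to the FIFO."""

    def handle(self):
        command = read_command(self.rfile)
        if command is None:
            self.wfile.write(b"ERROR incomplete command\n")
            return
        write_to_fifo(command)
        self.wfile.write(b"OK queued\n")


def serve(host="127.0.0.1", port=5061):
    """Run the bridge until interrupted; host and port are placeholders."""
    with socketserver.TCPServer((host, port), ProvisioningHandler) as server:
        server.serve_forever()
```

Calling serve() starts the bridge. A real deployment would at least need 
access control on the listening socket, since anything that can connect can 
issue FIFO commands.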

4. Replication of user database, aliases, etc across standalone ser servers.
------------
Scenario: Independent servers with independent databases run either with 
some sort of load balancing or DNS SRV and subscriber information is stored 
in sql tables
Goal: Make sure that each server recognizes all subscribers, aliases, etc
Problems: Make sure that all servers have updated database tables

*** RADIUS/LDAP solutions do not need to do this, as RADIUS servers, LDAP 
replication etc. take care of both reliability and scalability.  However, I 
think ser should support more than one RADIUS server; a defined secondary 
server would be useful.
With SQL-based scenarios however, I see three natural solutions:
a) Rely on sql-based replication. Without checking this, I believe ser 
always writes such FIFO commands directly to the DB, so sql-level replication 
should work
b) Extend ser's FIFO to also have a replication configuration, i.e. in 
ser.cfg you define the peer servers that need replication. If the extension 
to t_replicate uses TCP/IP based FIFO, the code can be re-used.
c) Implement provisioning systems so that each ser server is updated through 
the TCP/IP-based FIFO

To be honest, I'm not sure I see the value of such an effort (b).  Also, 
as usage of sql for storage is just one of several modes, it is probably not 
right to integrate such code into FIFO.  a) and c) are more natural choices.

--------------------------------------------------------------------------

My summary and conclusions:
- I believe a TCP/IP-based FIFO (#3) is a core feature that we all can agree 
would be useful and natural to implement;
- I don't know the details of how t_replicate functions, but Juha's opinion 
is that it takes care of all the issues Andreas points out except one: The 
amount of traffic SIP messages create.  I will not interfere with this 
discussion; of course, if t_replicate can handle unavailable servers etc., 
that would be great. Anyway, reliable replication of usrloc is essential 
to a carrier-grade architecture;
- After this discussion, I now believe we should keep provisioning (#3) and 
the two types of replication (#2 and #4) separate also in implementation.

Well, that was my attempt at sorting out the issues.  Any success, you think? ;-)

g-)

Andreas Granig wrote:
> Juha Heinanen wrote:
>> you can have any number of proxies participating in replication.
>
> What method are you thinking of? t_replicate() reports
>
>   ERROR: t_newtran: transaction already in process 0x4054d5ec
>
> if you call it twice, like
>
>   t_replicate("foohost", "5060");
>   t_replicate("barhost", "5060");
>
> Or do you mean something like
>
>   forward_tcp("foohost", "5060");
>   forward_tcp("barhost", "5060");
>
> and on the receiving hosts
>
>   if(/* register from replicating host */)
>     save_noreply("location");
>
> which would be a possibility, indeed...
>
>>  > Beside that the domain tables (location etc) get out of synch if
>>  > one of the SERs is down for a moment, because retransmission is
>>  > only tried a few times.
>>
>> i don't see why this needs to be the case with db mode 2.  when ser
>> comes back up, it updates its location table from database.
>
> I think mode 1 (Write-Through) should be used because the SER could
> start up while some of the contacts aren't flushed to DB yet.
>
> However, how would you set up your database connections here? Using a
> common usrloc database for all hosts (-> single point of failure)?
> This is the main point. _How_ do you share the contacts as reliable as
> possible so that a host can go down for a while without getting out of
> synch regarding the contacts?
>
> Andy