On Fri, Feb 27, 2015 at 9:59 AM, Olle E. Johansson <oej@edvina.net> wrote:
Actually, it's the latter. Our current high availability setup relies on anycast. And with TCP, this would mean a huge change in our setup.
That is in fact an interesting topic. Can you please elaborate a bit more on this, as I would like to see what we can
do in the software to make things easier.

I guess I can. Maybe Krischan's presentation from last year's Kamailio World will explain a lot:
http://www.kamailio.org/events/2014-KamailioWorld/day2/17-Krischan.Udelhoven-10-Years-Of-Working-With-Kamailio-At-Sipgate.pdf

The setup on slide 9 is basically what our setup still looks like. We have a few loadbalancers in different data centers sharing the same IP. And depending on where the customer hits our network and how the call gets routed, the loadbalancer handling the request and the one handling the reply don't have to be the same. That's why UDP is so comfortable to work with. We don't keep any state, we don't care about the number of open TCP sessions on one machine, and if one machine goes offline, OSPF makes sure the IP is still available.

When using TCP, we would have to make sure every request leaves our network on the same machine where it came in, because that's the machine where the TCP session is open. That would mean having a distinct OSPF weight for each IP, so all packets always get routed to the same machine. And that would mean we would probably have to do a DNS round robin thing to loadbalance the incoming traffic, since we wouldn't want to have all traffic coming in on only one machine.
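
The difference between the two transports can be sketched with a toy model. This is only an illustration of the reasoning above, not a description of the real loadbalancers: the class, the machine names and the client address are all made up.

```python
# Toy model: two loadbalancers share one anycast IP.
# All names and addresses are hypothetical and for illustration only.

class LoadBalancer:
    def __init__(self, name):
        self.name = name
        # (client_ip, client_port) pairs with a TCP session open on THIS machine
        self.tcp_connections = set()

    def accept_tcp(self, client):
        # The TCP session exists only on the machine that accepted it.
        self.tcp_connections.add(client)

    def can_send_udp(self, client):
        # UDP is connectionless: any machine owning the shared IP can send.
        return True

    def can_send_tcp(self, client):
        # TCP needs the session, which lives on exactly one machine.
        return client in self.tcp_connections


lb_a = LoadBalancer("lb-a")
lb_b = LoadBalancer("lb-b")
client = ("198.51.100.7", 5060)

lb_a.accept_tcp(client)              # the request came in via lb-a
udp_ok = lb_b.can_send_udp(client)   # the reply happens to be routed out via lb-b
tcp_ok = lb_b.can_send_tcp(client)
print(udp_ok, tcp_ok)  # True False
```

With UDP it does not matter which machine sends the reply; with TCP only lb-a could do it, which is why the routing would have to pin each flow to one machine.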
 
I had a similar discussion a while ago and it seems like failover handling is easier in UDP and we will need to fix this in order to be able to migrate more users to TLS.

Yes, it for sure is a failover thing. We have a TCP and TLS server, too, but this is not HA and only for test purposes right now. We probably would use something with DNS RR, but aren't sure about how many clients we could handle on one TCP or TLS machine. In our tests, we had about 16k sessions open at the same time, but that would mean a lot of machines for all our customers.
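
As a rough way to reason about the machine count, here is a back-of-the-envelope sketch. The 16k figure is the one from the tests mentioned above; the client count and the number of spare machines are placeholder assumptions, not real numbers.

```python
import math

# Concurrent sessions seen on one machine in the tests mentioned above.
SESSIONS_PER_MACHINE = 16_000


def machines_needed(clients, per_machine=SESSIONS_PER_MACHINE, spare=1):
    """Estimate machines required if every client holds one persistent session.

    `spare` is a hypothetical number of extra machines kept for failover.
    """
    return math.ceil(clients / per_machine) + spare


# Hypothetical example: 100,000 connected clients plus one spare machine.
print(machines_needed(100_000))  # 8
```

This ignores per-session memory, TLS handshake load and uneven DNS RR distribution, so it is a lower bound at best.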
 
I haven't tested how different clients behave with regard to TCP if the server closes a connection.

From what I saw with snom phones connected to the TLS machine, when we restart the Kamailio process, the clients aren't reachable for inbound calls until the next reregistration. They establish a new TCP connection for an outbound call, but this connection can't be used for inbound calls. After the next reregistration, everything is okay again.
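
The behaviour described above can be mimicked with a small model. It assumes the registration is tied to the connection it arrived on and that the registration data survives the restart while the connections do not; that is one plausible explanation for the observation, not something confirmed here. The class is hypothetical and is not Kamailio code.

```python
# Toy registrar: a contact is reachable only through the connection it registered on.
# Hypothetical model for illustration; it does not use any Kamailio API.

class ToyRegistrar:
    def __init__(self):
        self.connections = set()   # ids of currently open TCP connections
        self.bindings = {}         # user -> connection id recorded at REGISTER
        self._next_id = 0

    def connect(self):
        # A client opens a new TCP connection and gets a fresh connection id.
        self._next_id += 1
        self.connections.add(self._next_id)
        return self._next_id

    def register(self, user, conn_id):
        self.bindings[user] = conn_id

    def restart(self):
        # Assumption: open connections are lost, stored bindings are kept.
        self.connections.clear()

    def reachable(self, user):
        # Inbound calls need the connection stored in the binding to still be open.
        return self.bindings.get(user) in self.connections


reg = ToyRegistrar()
conn = reg.connect()
reg.register("alice", conn)
before_restart = reg.reachable("alice")    # True

reg.restart()
new_conn = reg.connect()                   # phone opens a connection for an outbound call
after_outbound = reg.reachable("alice")    # False: binding still points at the dead connection

reg.register("alice", new_conn)            # next reregistration
after_rereg = reg.reachable("alice")       # True
```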
 
Sebastian