[Devel] tcp keepalive question
Dan Pascu
dan at ag-projects.com
Tue Dec 19 10:34:51 CET 2006
On Tuesday 19 December 2006 08:59, Juha Heinanen wrote:
> Dan Pascu writes:
> > When you combine this fact with the blocking nature of OpenSER, it
> > can raise serious issues when many TCP clients are connected and
> > there is a high churn rate among them combined with high SIP
> > traffic.
>
> so your conclusion is that there is nothing openser can do to keep nat
> bindings open to tcp UAs and that they should take care of that
> themselves like the nokia phones do?
Not at all. Sorry if I caused confusion. This is probably because
I was highlighting a different but related issue.
Using a form of keepalive (be it client based or server based) is needed
to prevent the NAT binding from closing; nobody is arguing with that.
Also, using a server based solution like the one with TCP_KEEPALIVE has
the advantage that it works with UAs that do not do TCP keepalives
themselves.
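
To make the server based variant concrete, here is a minimal sketch of how
a server process can ask the kernel to send TCP keepalive probes on a
client connection. This is not OpenSER's code. The helper name
enable_keepalive and the timer values are my own assumptions, and the
three tuning options use the Linux names, so they are applied only where
the platform exposes them.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, probes=3):
    """Turn on kernel TCP keepalive probes for an existing TCP socket.

    idle, interval and probes are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs are platform specific (Linux names), so each one
    # is set only if this Python build exposes it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A nonzero value means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

The idle time has to be shorter than the NAT binding lifetime for this to
help, and that lifetime depends on the NAT device.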
What I was saying is that whether the UA does it or the server does it,
when the moment comes for the server to send a request to the client it
may still block, even if the NAT binding was kept open.
Imagine the case where UA1 uses TCP and the NAT is kept open (whether from
the client side or the server side doesn't matter). Assume the
keepalive procedure was just performed and a few seconds later the
connection with UA1 is severed because of some network connectivity
issue. The server doesn't know this, since the TCP connection was not
deliberately closed and the next TCP keepalive procedure is not due
for a while. If at that moment a request comes for UA1, openser
will block until it times out. It doesn't matter that this timeout may be
small (say 3 seconds); the fact that all TCP workers may be blocked at
some point for 3 seconds and cannot do a thing is bad.
This cannot be solved with the current architecture, and the more TCP
clients one has, the higher the chances that this scenario will happen.
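
To put rough numbers on the scenario above, here is a small simulation of
a fixed pool of blocking workers. It is only a model under my own
assumptions: all requests arrive at time 0, a request to a dead client
costs the full timeout, a request to a healthy client costs 1 ms, and the
function name drain_time is made up for illustration.

```python
import heapq

def drain_time(requests, workers, timeout=3.0, normal=0.001):
    """Return the time (in seconds) at which the last request finishes.

    requests: iterable of booleans, True if the target client is dead.
    Each worker handles one request at a time and blocks for `timeout`
    seconds when the target is dead.
    """
    free_at = [0.0] * workers          # when each worker becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for dead in requests:
        start = heapq.heappop(free_at)  # earliest available worker
        end = start + (timeout if dead else normal)
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish

if __name__ == "__main__":
    # 4 workers, 4 requests to dead clients, then 1 to a healthy client.
    print(drain_time([True] * 4 + [False], workers=4))
    # Same traffic with no dead clients at all.
    print(drain_time([False] * 5, workers=4))
```

With 4 workers and 4 dead targets, the single healthy request has to wait
about 3 seconds for a worker, although serving it takes a millisecond.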
IMO, OpenSER's architecture is well suited for UDP clients and at most
some TCP/TLS peers. But even in this case, if a peer is down and there are
many connections towards that peer, again many TCP workers will be
blocked, and other TCP peers cannot be handled meanwhile. In this case
the worst that can happen is that for a while (tens of seconds), until
the connections to the disconnected peer time out and are closed, there
will be a small gap in which TCP/TLS connections will not work or will
work poorly, but after that it'll resume.
However, with many TCP clients, the chances that many of them get
disconnected and keep TCP workers idle waiting for a timeout, unable
to process anything else, are much higher.
The only way out of this is to use an architecture based around an event
driven reactor, or to use multithreading. Of course, in addition to these
(which only guarantee that openser won't block on a rogue client), a TCP
keepalive mechanism is required for keeping the NAT open; there is no
argument about that.
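
As an illustration of the event driven approach, here is a minimal sketch
using Python's asyncio event loop. It is a simulation, not a SIP stack:
the "send" is just a sleep standing in for the peer's response time, and
the names send_request, main and the ua1..ua4 clients are hypothetical.
What it shows is that the request to the dead client is started first,
yet the healthy clients are served right away while that one waits for
its timeout.

```python
import asyncio
import time

async def send_request(name, delay, timeout, started):
    """Pretend to send a request; `delay` stands in for the peer's
    response time. Returns (name, status, seconds since start)."""
    try:
        await asyncio.wait_for(asyncio.sleep(delay), timeout)
        status = "ok"
    except asyncio.TimeoutError:
        status = "timeout"
    return name, status, time.monotonic() - started

async def main(timeout=0.3):
    # ua1 is "dead" (would take far longer than the timeout to answer),
    # the other three answer quickly.
    clients = {"ua1": 10.0, "ua2": 0.01, "ua3": 0.01, "ua4": 0.01}
    started = time.monotonic()
    # All requests run concurrently on the single event loop thread.
    return await asyncio.gather(
        *(send_request(n, d, timeout, started) for n, d in clients.items()))

if __name__ == "__main__":
    for name, status, finished in asyncio.run(main()):
        print(name, status, round(finished, 2))
```

A multithreaded design gets a similar effect by giving each connection its
own thread; the reactor does it with a single thread and non-blocking I/O.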
--
Dan