Hi Andrei!
Do you think this is also relevant for sip-router's TCP implementation?
regards klaus
-------- Original Message --------
Subject: Re: [Kamailio-Users] TCP supervisor process in Kamailio
Date: Tue, 7 Jul 2009 15:00:49 +0200
From: Pascal Maugeri pascal.maugeri@gmail.com
To: Daniel-Constantin Mierla miconda@gmail.com
CC: kamailio users@lists.kamailio.org
References: 990aed650907070551k594ee26cl3056fa7090742e1b@mail.gmail.com 4A53466F.1070102@gmail.com
On Tue, Jul 7, 2009 at 2:58 PM, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
On 07/07/2009 02:51 PM, Pascal Maugeri wrote:
Hi
I recently read the following in order to optimize OpenSER in handling TCP connections:
"First, the TCP supervisor process must be given an elevated priority level in order to prevent anomalous behavior due to the Linux scheduler."
First of all, as this is quite an old paper
where is this paper?
http://www.cs.rice.edu/CS/Architecture/docs/ram-ispass08.pdf (Section 4.3, page 6). -pascal
Thanks, Daniel
(it refers to OpenSER 1.2), I'm wondering if such a tuning is still needed for the Kamailio 1.5 branch? If yes, how can I do this?
Regards, Pascal
-- Daniel-Constantin Mierla http://www.asipto.com/
On Jul 07, 2009 at 15:53, Klaus Darilion klaus.mailinglists@pernau.at wrote:
Hi Andrei!
Do you think this is also relevant for sip-router's TCP implementation?
The elevated priority for tcp_main is a good idea. I'll add a config option for it (right now there are real-time priority config options, but only for the timer processes). The rest of the paper does not apply to sr or recent ser versions (the fd cache was implemented a long time ago, the tcp timers are much better, there is no known deadlock and so on). I also don't agree with what they are trying to prove (that TCP is comparable with UDP in terms of scalability). While there might have been some performance problems in early TCP implementations (and there are still a few even in sr, at least until I add fd sharing), you could never make SIP over TCP use as few resources as SIP over UDP. It's not only slower (due to the extra complexity in dealing with TCP connections, both in kernel and userspace, and due to the extra parsing needed to find SIP message boundaries on TCP), but it uses way more resources (lots of FDs and lots of memory). The same could be said for SCTP. UDP is still the best option in terms of performance and resources used.
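For illustration, a minimal sketch of how a process on Linux can give itself elevated (real-time) scheduling priority, i.e. the mechanism such a config option would wrap; this is illustrative only, not ser's actual code:

    /* Sketch: raise the calling process (e.g. tcp_main) to real-time
     * priority; requires root or CAP_SYS_NICE. Illustrative only. */
    #include <sched.h>
    #include <stdio.h>

    static int set_rt_prio(int prio)
    {
        struct sched_param sp;

        sp.sched_priority = prio;          /* 1..99 for SCHED_FIFO */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
            perror("sched_setscheduler");
            return -1;
        }
        return 0;
    }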
Andrei
Andrei Pelinescu-Onciul wrote:
The elevated priority for tcp_main is a good idea. I'll add a config option for it (right now there are real-time priority config options, but only for the timer processes). The rest of the paper does not apply to sr or recent ser versions (the fd cache was implemented a long time ago, the tcp timers are much better, there is no known deadlock and so on).
Hi Andrei!
How are incoming TCP messages handled in detail? E.g. if there is incoming data on a TCP connection: which process reads the data and constructs the SIP message? Is this the supervisor (which would hand only full messages over to the TCP workers) or a worker?
What happens if a bad client sends only half a SIP message: is the process blocked until the full message is received, or is it able to read from multiple connections concurrently?
thanks klaus
On Jul 09, 2009 at 10:27, Klaus Darilion klaus.mailinglists@pernau.at wrote:
Hi Andrei!
How are incoming TCP messages handled in detail? E.g. if there is incoming data on a TCP connection: which process reads the data and constructs the SIP message? Is this the supervisor (which would hand only full messages over to the TCP workers) or a worker?
It's the worker ("tcp_receiver" in sercmd ps output). The supervisor ("tcp main") passes entire connections to the workers, not messages. When there is new data on a connection, tcp_main passes it to the workers (round-robin). The worker that gets the connection will read from it until it exhausts all the received data. After that it will start a 5 s timeout. If no new data is received in this interval, it will hand the connection back to tcp_main. If new data is received, the timeout will be extended (this timeout keeps connections with heavy traffic in the same worker all the time, allowing fast handling, and it also accommodates traffic peaks).
So a worker will read the data from the TCP connection, build the SIP message and run the routing script.
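As a rough sketch of that per-connection loop (hypothetical helper names, simplified error handling; ser's real code differs):

    #include <errno.h>
    #include <time.h>
    #include <sys/socket.h>

    #define CON_IDLE_TIMEOUT 5  /* seconds; if it expires with no new data,
                                   the connection goes back to tcp_main */

    struct tcp_conn { int fd; time_t expire; };

    /* Hypothetical helpers: SIP framing/parsing and connection release. */
    void append_and_parse(struct tcp_conn *c, const char *buf, size_t len);
    void release_connection(struct tcp_conn *c);

    /* A worker drains one readable connection without ever blocking. */
    void worker_read(struct tcp_conn *c)
    {
        char buf[4096];
        ssize_t n;

        for (;;) {
            n = recv(c->fd, buf, sizeof(buf), MSG_DONTWAIT);
            if (n > 0) {
                append_and_parse(c, buf, (size_t)n); /* build SIP msgs, run script */
                c->expire = time(NULL) + CON_IDLE_TIMEOUT; /* new data: extend */
            } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                return;                     /* drained; keep polling this fd */
            } else {
                release_connection(c);      /* EOF or hard error */
                return;
            }
        }
    }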
What happens if a bad client sends only half a SIP message: is the process blocked until the full message is received, or is it able to read from multiple connections concurrently?
No, each worker can handle as many concurrent connections as memory allows. If you are thinking about the recently publicized (but quite obvious to people writing proxies) "slowloris" HTTP DoS: it wouldn't work (on sip-router or any other *ser, kamailio and so on).
Note that the tcp code is optimized for lots of connections (e.g. outbound proxy, registrar) and not for very few connections (e.g. inter-domain). Last time I checked (an older ser version, pre 2.1) it could handle 120k tcp connections with traffic on them, on a machine with 4 GB of RAM (2 GB for ser, 2 GB free for the kernel). Beyond that the kernel starts to run out of memory.
Andrei
Andrei Pelinescu-Onciul wrote:
It's the worker ("tcp_receiver" in sercmd ps output). The supervisor ("tcp main") passes entire connections to the workers, not messages. When there is new data on a connection, tcp_main passes it to the workers (round-robin). The worker that gets the connection will read from it until it exhausts all the received data. After that it will start a 5 s timeout. If no new data is received in this interval, it will hand the connection back to tcp_main. If new data is received, the timeout will be extended (this timeout keeps connections with heavy traffic in the same worker all the time, allowing fast handling, and it also accommodates traffic peaks).
So a worker will read the data from the TCP connection, build the SIP message and run the routing script.
So, a worker will read from multiple connections concurrently, and as soon as it has received a full SIP message from any of these connections, it will process this single message. After message processing it continues reading from the connections. E.g.:
             |--------------
tcpcon1 ---->| tcp_receiver
tcpcon2 ---->|
             |--------------
1. data is available on con1: read the data, e.g. a half SIP message
2. data is available on con2: read the data, e.g. a half SIP message
3. data is available on con1: read the data, e.g. the second part of the SIP message
4. a complete message is available (con1), process the message
5. data is available on con2: read the data, e.g. the second part of the SIP message
6. a complete message is available (con2), process the message
Is this description correct?
thanks Klaus
On Jul 09, 2009 at 13:50, Klaus Darilion klaus.mailinglists@pernau.at wrote:
So, a worker will read from multiple connections concurrently, and as soon as it has received a full SIP message from any of these connections, it will process this single message. After message processing it continues reading from the connections. E.g.:
             |--------------
tcpcon1 ---->| tcp_receiver
tcpcon2 ---->|
             |--------------
1. data is available on con1: read the data, e.g. a half SIP message
2. data is available on con2: read the data, e.g. a half SIP message
3. data is available on con1: read the data, e.g. the second part of the SIP message
4. a complete message is available (con1), process the message
5. data is available on con2: read the data, e.g. the second part of the SIP message
6. a complete message is available (con2), process the message
Is this description correct?
Yes, it is.
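A minimal sketch of how such interleaved, non-blocking reading can work: each connection keeps its own buffer, and a message is only processed once the framing (complete headers plus Content-Length body bytes) says it is whole. All names here are hypothetical, not ser's actual code:

    #include <string.h>
    #include <sys/socket.h>

    struct conn_buf { char data[65536]; size_t used; };

    /* Hypothetical helpers: framing check and routing-script execution. */
    int  sip_msg_complete(const char *buf, size_t len, size_t *msg_len);
    void process_sip_msg(const char *msg, size_t len);

    /* Called whenever poll() reports readable data on one connection;
     * never blocks, so one worker can serve many connections. */
    void on_readable(int fd, struct conn_buf *b)
    {
        ssize_t n = recv(fd, b->data + b->used,
                         sizeof(b->data) - b->used, MSG_DONTWAIT);
        size_t msg_len;

        if (n <= 0)
            return;               /* EAGAIN, EOF or error: handled elsewhere */
        b->used += (size_t)n;
        /* process every complete message; keep any trailing partial one */
        while (sip_msg_complete(b->data, b->used, &msg_len)) {
            process_sip_msg(b->data, msg_len);
            memmove(b->data, b->data + msg_len, b->used - msg_len);
            b->used -= msg_len;
        }
    }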
Andrei
Andrei Pelinescu-Onciul wrote:
Is this description correct?
Yes, it is.
So, reading is completely non-blocking? Can the read from the connection (polling) block for any reason (in the supervisor or worker)?
What about sending? If TCP async mode is enabled, who does the sending: the worker or the supervisor process? Is it possible for the sending to block (e.g. due to a sending timeout, window size 0, ...)?
regards klaus
On Jul 09, 2009 at 14:05, Klaus Darilion klaus.mailinglists@pernau.at wrote:
So, reading is completely non-blocking? Can the read from the connection (polling) block for any reason (in the supervisor or worker)?
Yes, it's non-blocking. I can't say it's impossible for it to block (there can always be a bug in some syscall or some unknown new bug), but in theory it shouldn't ever block (it's designed to be non-blocking). If it does => critical bug. So far I've never seen it block.
What about sending? If TCP async mode is enabled, who does the sending: the worker or the supervisor process? Is it possible for the sending to block (e.g. due to a sending timeout, window size 0, ...)?
In async mode the sending can be done directly by a "worker" (a tcp reader process, a udp or sctp receiver) or by the supervisor. On a fresh connection (no write data queued), a worker will attempt to send directly. If that fails, the connection enters async mode and the worker will queue the data. On a connection with data already queued (in "async mode"), the worker will directly queue the data (it will not attempt to send directly anymore). All the "async" queued data is sent by the supervisor (in the future I might add some write workers if it proves to make a difference in tests). When all the async data has been sent, the connection exits "async mode" (the next worker will try again to send directly).
Almost all ser-side initiated connections will enter "async" mode when they are first opened (because connect() takes time, and during the connect phase the kernel does not queue any data on the socket, so we have to do it in ser).
In async mode the send never blocks (with the same disclaimers as for the non-blocking read). If no "real" send happens for tcp_send_timeout (or tcp_connect_timeout if the connection is not yet established), the connection will be closed, a failure will be reported and the destination will be blacklisted. The same happens if the data queued per connection exceeds tcp_conn_wq_max or the total data queued in ser exceeds tcp_wq_max.
Note that there are two data queues: one in the kernel (the socket write buffer) and one in ser. Above, by queued data I meant the data queued in ser, not in the kernel.
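A condensed sketch of that write path (hypothetical names; the real ser code also handles blacklisting, timers and so on): try a direct send on a connection with an empty write queue, and on partial success or EAGAIN queue the remainder for the supervisor to flush:

    #include <errno.h>
    #include <sys/socket.h>

    struct tcp_conn { int fd; /* ... */ };

    /* Hypothetical write-queue helpers. */
    int wq_empty(struct tcp_conn *c);
    int wq_add(struct tcp_conn *c, const char *buf, size_t len);

    int conn_send(struct tcp_conn *c, const char *buf, size_t len)
    {
        ssize_t n = 0;

        if (wq_empty(c)) {                  /* nothing queued: try directly */
            n = send(c->fd, buf, len, MSG_DONTWAIT);
            if (n == (ssize_t)len)
                return 0;                   /* all sent, stay in sync mode */
            if (n < 0) {
                if (errno != EAGAIN && errno != EWOULDBLOCK)
                    return -1;              /* hard error */
                n = 0;                      /* nothing went out */
            }
        }
        /* enter/stay in async mode: queue the rest; the supervisor sends
         * it when poll() reports the socket writable again */
        return wq_add(c, buf + n, len - n);
    }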
Andrei
Hi Andrei!
Thanks for the detailed description, we should put it on the wiki.
Andrei Pelinescu-Onciul wrote:
In async mode the send never blocks (with the same disclaimers as for the non-blocking read). If no "real" send happens for tcp_send_timeout (or tcp_connect_timeout if the connection is not yet established), the connection will be closed, a failure will be reported and the destination will be blacklisted.
So, how does send_timeout work in async mode? The write is non-blocking, which means the kernel accepts the data and the write/send function returns immediately. Now the kernel tries to send the data for send_timeout seconds. If this fails, what happens then? Is there a callback from the kernel to ser, or does ser somehow poll whether the sending was successful?
thanks klaus
On Jul 09, 2009 at 16:18, Klaus Darilion klaus.mailinglists@pernau.at wrote:
So, how does send_timeout work in async mode? The write is non-blocking, which means the kernel accepts the data and the write/send function returns immediately. Now the kernel tries to send the data for send_timeout seconds. If this fails, what happens then? Is there a callback from the kernel to ser, or does ser somehow poll whether the sending was successful?
No, it does not work for data already queued in the kernel, only for data queued in ser. So it is in fact the timeout for moving data from ser buffers to kernel buffers, not an "on-the-wire" timeout. In fact, even in non-async mode send_timeout is the same thing (a timeout for moving the data into the kernel socket buffers, not a real send timeout). We have real send timeouts only for sctp.
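So a sketch of the check is just a comparison against the time ser last managed to push queued bytes into the kernel (hypothetical names, illustrative only):

    #include <time.h>

    struct tcp_conn {
        time_t last_write_ok;  /* refreshed whenever send() accepts >0 bytes */
    };

    extern time_t cfg_tcp_send_timeout;  /* hypothetical config handle */

    /* True if data is still queued in ser and no bytes have moved into
     * the kernel socket buffer for tcp_send_timeout seconds. */
    int wq_timed_out(const struct tcp_conn *c, time_t now)
    {
        return (now - c->last_write_ok) > cfg_tcp_send_timeout;
    }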
Andrei
Andrei Pelinescu-Onciul wrote:
In async mode the sending can be done directly by a "worker" (a tcp reader process, a udp or sctp receiver) or by the supervisor. On a fresh connection (no write data queued), a worker will attempt to send directly. If that fails, the connection enters async mode and the worker will queue the data. On a connection with data already queued (in "async mode"), the worker will directly queue the data (it will not attempt to send directly anymore). All the "async" queued data is sent by the supervisor (in the future I might add some write workers if it proves to make a difference in tests). When all the async data has been sent, the connection exits "async mode" (the next worker will try again to send directly).
Andrei,
Could you please elaborate on SSL-based connections: how are they handled? The same as TCP-based ones?
Thanks Vadim
On Jul 09, 2009 at 18:14, Vadim Lebedev vadim@mbdsys.com wrote:
Andrei,
Could you please elaborate on SSL-based connections: how are they handled? The same as TCP-based ones?
In principle, yes. However, async mode is not yet supported for SSL, so if an SSL write blocks, it will block the whole process. There is also a problem with reads. On SSL a read might block because it wants to write data (SSL_ERROR_WANT_WRITE) if the kernel socket send buffer is full. This can happen due to a key renegotiation. This case is not handled, so a read blocked because it wants to write might never be woken up (this read-waiting-for-write condition ends only if the peer sends more data or some ser process tries to send something on the tls connection; if neither happens, the read stays blocked until the connection lifetime timeout hits and the connection is closed). Luckily, in practice key renegotiation is very rare, and even then there is a very low probability of meeting all the conditions needed to trigger this bug.
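For reference, a sketch of how that WANT_WRITE case would have to be handled on non-blocking sockets (plain OpenSSL calls, not ser's tls module): when SSL_read() reports SSL_ERROR_WANT_WRITE, the caller must wait for the socket to become writable before retrying the read:

    #include <openssl/ssl.h>
    #include <poll.h>

    /* Returns >0 bytes read, 0 if the caller should retry, -1 on fatal error. */
    int tls_read_nb(SSL *ssl, int fd, char *buf, int len)
    {
        struct pollfd p = { .fd = fd, .events = 0 };
        int n = SSL_read(ssl, buf, len);

        if (n > 0)
            return n;
        switch (SSL_get_error(ssl, n)) {
        case SSL_ERROR_WANT_READ:
            p.events = POLLIN;   /* normal case: wait for more input */
            break;
        case SSL_ERROR_WANT_WRITE:
            p.events = POLLOUT;  /* renegotiation: the read needs to write
                                    first; this is the unhandled case */
            break;
        default:
            return -1;           /* fatal TLS or socket error */
        }
        poll(&p, 1, -1);         /* block here instead of inside SSL_read */
        return 0;                /* retry SSL_read() afterwards */
    }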
Support for making tls async and fixing the read problem is partially committed; however, this is low priority, so it might take me a while until I get back to it.
Another big problem is the license of the module. The module is GPL licensed but uses OpenSSL, and since the OpenSSL license adds additional restrictions, we need an OpenSSL exemption granted by all of the (c) holders. However, one of the (c) holders is the FSF, so we might need to remove all that code (I don't think there is much of it remaining in the tls module, but someone would still need to do the checking).
Note that the above problems are common to all *ser versions. The main differences between sip-router/ser 2.* and other versions are:
- proper locking (vs. relying on luck and few simultaneous connections)
- more workarounds for various openssl bugs
- support for a domain config file (certificate and various options on a per-domain basis), which can be reloaded at runtime
- less work in tcp_main (the supervisor) (vs. more work in tcp_main and less in the workers)
Andrei