[sr-dev] [Fwd: Re: [Kamailio-Users] TCP supervisor process in Kamailio]

Andrei Pelinescu-Onciul andrei at iptel.org
Thu Jul 9 14:51:32 CEST 2009

On Jul 09, 2009 at 14:05, Klaus Darilion <klaus.mailinglists at pernau.at> wrote:
> Andrei Pelinescu-Onciul schrieb:
> >On Jul 09, 2009 at 13:50, Klaus Darilion <klaus.mailinglists at pernau.at> 
> >wrote:
> >>
> >>Andrei Pelinescu-Onciul schrieb:
> >>>On Jul 09, 2009 at 10:27, Klaus Darilion <klaus.mailinglists at pernau.at> 
> >>>wrote:
> >>>>Andrei Pelinescu-Onciul schrieb:
> >>>>>On Jul 07, 2009 at 15:53, Klaus Darilion 
> >>>>><klaus.mailinglists at pernau.at> wrote:
> >>>>>>Hi Andrei!
> >>>>>>
> >>>>>>Do you think this is also relevant for sip-router's TCP 
> >>>>>>implementation?
> >>>>>The elevated priority for tcp_main is a good idea. I'll add a config
> >>>>>option for it (right now there are real time prio config options, but
> >>>>>only for the timer processes).
> >>>>>The rest of the paper does not apply to sr or recent ser versions
> >>>>>(the fd cache was implemented a long time ago, the tcp timers are
> >>>>>much better, there is no known deadlock and so on).
> >>>>Hi Andrei!
> >>>>
> >>>>How are incoming TCP messages handled in detail? E.g. if there is
> >>>>incoming data on a TCP connection: which process reads the data and
> >>>>constructs the SIP message? Is this the supervisor (which would then
> >>>>hand only full messages over to the TCP workers) or a worker?
> >>>It's the worker ("tcp_receiver" in sercmd ps output).
> >>>The supervisor ("tcp main") passes entire connections to the workers and
> >>>not messages.
> >>>When there is new data on a connection, tcp_main passes it to the
> >>>workers (round-robin). The worker that gets the connection will read
> >>>from it until it has exhausted all the received data. After that it
> >>>will start a 5 s timeout. If no new data is received in this interval,
> >>>it will give the connection back to tcp_main. If new data is received,
> >>>the timeout will be extended (this timeout keeps connections with
> >>>heavy traffic in the same worker all the time, allowing fast handling,
> >>>and it also accommodates traffic peaks).
> >>>
> >>>So a worker will read the data from the tcp connection, build the sip
> >>>message and run the routing script.
> >>So, a worker will read from multiple connections concurrently, and as
> >>soon as it has received a full SIP message from any of these
> >>connections, it will process this single message. After processing the
> >>message it continues reading from the connections. E.g.:
> >>
> >>            |------
> >>tcpcon1---->|tcp_receiver
> >>tcpcon2---->|
> >>            |------
> >>
> >>1. data is available on con1: read the data, e.g. a half SIP message
> >>2. data is available on con2: read the data, e.g. a half SIP message
> >>3. data is available on con1: read the data, e.g. the second part of the 
> >>SIP message
> >>4. a complete message is available (con1), process the message
> >>5. data is available on con2: read the data, e.g. the second part of the 
> >>SIP message
> >>6. a complete message is available (con2), process the message
> >>
> >>Is this description correct?
> >
> >Yes, it is.
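A rough sketch of the "read until exhausted" step (not the actual ser
code; drain_fd is an invented name). The real worker poll()s its set of
connections and does something like this for each readable fd:

```c
/* Illustrative sketch of a non-blocking drain of one connection.
 * drain_fd is an invented name, not the actual ser/sr function. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read everything currently available on a non-blocking fd into buf.
 * Returns the number of bytes read (possibly 0), or -1 on EOF/error. */
static ssize_t drain_fd(int fd, char *buf, size_t bufsz)
{
    ssize_t total = 0, n;
    for (;;) {
        n = read(fd, buf + total, bufsz - total);
        if (n > 0) {
            total += n;
            if ((size_t)total == bufsz)
                return total;              /* buffer full; caller parses */
            continue;
        }
        if (n == 0)
            return total > 0 ? total : -1; /* peer closed */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;                  /* data exhausted, don't block */
        return -1;                         /* real error */
    }
}
```

Once this returns with no more data, the worker poll()s the connection
with the 5 s timeout; on timeout it hands the connection back to
tcp_main.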
> So, reading is completely non-blocking? Can the read from the
> connection (polling) block for any reason (in the supervisor or
> worker)?

Yes, it's non-blocking. I can't say it's impossible for it to block
(there can always be a bug in some syscall or some unknown new bug),
but in theory it should never block (it's designed to be
non-blocking). If it does => critical bug.
So far I've never seen it block.
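At the OS level this just means every socket is put in non-blocking
mode, along these lines (a sketch; set_nonblocking is an illustrative
name, not the actual ser helper):

```c
/* Sketch: mark a socket non-blocking so read()/write() return
 * EAGAIN/EWOULDBLOCK instead of blocking the process. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```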

> What about sending? If TCP asynch mode is enabled, who is sending - the 
> worker or the supervisor process? Is it possible that the sending blocks 
> (e.g. due to sending timeout, window-size 0, ...)?

In async mode the sending can be done directly by a "worker" (a tcp reader
process, a udp or sctp receiver) or by the supervisor.
On a fresh connection (no write data queued), a worker will attempt to
send directly. If that fails, the connection enters async mode and the
worker will queue the data.
On a connection with data already queued (in "async mode"), the worker
will queue the data directly (it will not attempt to send directly
anymore).
All the "async" queued data is sent by the supervisor (in the future I
might add some write workers if it proves to make a difference in
tests). When all the async data has been sent, the connection exits
"async mode" (the next worker will try again to send directly).
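In pseudo-C, the worker-side logic looks roughly like this (a sketch
under the description above; struct wq and async_send are illustrative
names, not ser's actual write-queue implementation):

```c
/* Sketch of "try direct send, else queue" on a non-blocking fd. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Per-connection write queue; size and names are illustrative. */
struct wq {
    char buf[65536];
    size_t len;
};

/* Try a direct send; on partial send or EAGAIN, queue the remainder
 * ("entering async mode"). Returns 0 on success, -1 on error or
 * queue overflow. */
static int async_send(int fd, struct wq *q, const char *data, size_t len)
{
    ssize_t n = 0;
    if (q->len == 0) {                    /* no queued data: try direct */
        n = send(fd, data, len, 0);
        if (n == (ssize_t)len)
            return 0;                     /* all sent, still synchronous */
        if (n < 0) {
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;                /* real error */
            n = 0;
        }
    }
    /* connection is (now) in async mode: queue what was not sent */
    if (q->len + (len - (size_t)n) > sizeof q->buf)
        return -1;                        /* cf. tcp_conn_wq_max */
    memcpy(q->buf + q->len, data + n, len - (size_t)n);
    q->len += len - (size_t)n;
    return 0;
}
```

The supervisor then flushes q->buf whenever the socket becomes
writable, and the connection leaves async mode when the queue is empty.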

Almost all ser-side initiated connections will enter "async" mode when
they are first opened (because connect() takes time, and during the
connect phase the kernel does not queue any data on the socket, so we
have to do it in ser).
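This is a consequence of how a non-blocking connect() behaves: it
normally returns -1 with EINPROGRESS, and until the handshake finishes
nothing can be written to the socket. A sketch (nb_connect is an
illustrative name):

```c
/* Sketch: open a TCP connection without blocking on connect(). */
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int nb_connect(const struct sockaddr *sa, socklen_t salen)
{
    int fd = socket(sa->sa_family, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);
    if (connect(fd, sa, salen) < 0 && errno != EINPROGRESS) {
        close(fd);                /* immediate failure */
        return -1;
    }
    /* connect pending (or already complete): any data to send now
     * has to sit in ser's own write queue until the socket becomes
     * writable, i.e. the connection starts out in "async" mode */
    return fd;
}
```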

In async mode the send never blocks (with the same disclaimers as for
the non-blocking read). If no "real" send happens for tcp_send_timeout
(or tcp_connect_timeout if the connection is not yet established),
the connection will be closed, a failure will be reported and the
destination will be blacklisted. The same thing happens if the
per-connection queued data exceeds tcp_conn_wq_max or the total queued
data in ser exceeds tcp_wq_max.
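The two limit checks amount to something like this (the variable names
mirror the config options; the logic here is only illustrative):

```c
/* Sketch of the write-queue overflow checks described above. */
#include <assert.h>
#include <stddef.h>

static size_t tcp_conn_wq_max = 32 * 1024;        /* per-connection cap */
static size_t tcp_wq_max = 10 * 1024 * 1024;      /* total cap in ser */
static size_t total_queued;                       /* sum over all connections */

/* Return 0 if len more bytes may be queued on a connection that already
 * has conn_queued bytes pending; -1 if a limit would be exceeded (ser
 * then closes the connection, reports the failure and blacklists the
 * destination). */
static int wq_check_limits(size_t conn_queued, size_t len)
{
    if (conn_queued + len > tcp_conn_wq_max)
        return -1;
    if (total_queued + len > tcp_wq_max)
        return -1;
    return 0;
}
```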

Note that there are two data queues: one in the kernel (the socket write
buffer) and one in ser. Above by queued data I meant the data queued in
ser and not in the kernel.

