[sr-dev] Problem with TCP and EPOLL

Paul Pankhurst paul at crocodile-rcs.com
Fri Feb 17 11:35:36 CET 2012


I now understand what is going wrong.

To make the xcap server work with the size of documents generated by the SIP 
client, I had to significantly increase the size of tcp_rd_buf_size.
Increasing this value is what causes the problem described.
Returning tcp_rd_buf_size to its default size resolves the problem, but 
causes the upload of documents to the xcap server to fail.
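For reference, this buffer is set once in the .cfg file and applies to every TCP connection, which is why it cannot currently be raised for the XCAP uploads alone. A minimal sketch, assuming the core parameter tcp_rd_buf_size is given in bytes with a default of 4096; the enlarged value is purely hypothetical, since the size actually used is not stated in this thread.

```cfg
# Core TCP read buffer size, in bytes (assumed default: 4096).
# Global setting: it applies to every TCP connection, including
# the local RLS <-> Presence loopback connection.
#tcp_rd_buf_size = 4096

# Hypothetical larger value for big XCAP documents (actual value unknown)
tcp_rd_buf_size = 65536
```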

One way of solving this would be to allow the buffer size to be settable on 
a per connection basis, or perhaps separately for local connections.
Does anyone have any thoughts, or other suggestions?

Thanks

Paul

-----Original Message----- 
From: Andrei Pelinescu-Onciul
Sent: Thursday, February 16, 2012 1:53 PM
To: Daniel-Constantin Mierla
Cc: Development mailing list of the sip-router project ; Paul Pankhurst
Subject: Re: [sr-dev] Problem with TCP and EPOLL

On Feb 15, 2012 at 12:05, Daniel-Constantin Mierla <miconda at gmail.com> 
wrote:
> Hello,
>
> I am cc-ing Andrei, since he authored that part, maybe he is
> available these days and can give a quick answer regarding the
> issue.
>
> Cheers,
> Daniel
>
> On 2/14/12 6:06 PM, Paul Pankhurst wrote:
> >Sorry, this was originally posted incorrectly, so I'm reposting.
> >I have been having problems with TCP under load. What I have been seeing
> >is TCP buffers failing to be serviced and, when wr_timeout exceeds the
> >configured value for tcp_send_timeout, kamailio kills the connection.
> >Increasing tcp_send_timeout doesn't help; even setting it to a big value
> >(such as 45 seconds) just delays the disconnection.
> >
> >Putting some tracing into the code shows that wbufq_add() is repeatedly
> >called, but wbufq_run() is called for that connection far less often than
> >I would expect, while it is frequently called for other connections.
> >It looks like wbufq_run() doesn't get called when lots of wbufq_add()s are
> >happening for a connection? wbufq_run() only appears to be called for a
> >connection after some time has passed since the last wbufq_add().

It's called when the kernel says it can write again on the respective
socket.
It might be that your consumer cannot read fast enough and so the
buffers fill on ser/kamailio side.
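The timeout mentioned in the original report is likewise a core .cfg parameter. A sketch, assuming tcp_send_timeout is given in seconds (consistent with the 45 seconds quoted in the thread). Following the explanation above, a longer timeout only gives a slow reader more time before the connection is closed; it does not make the queued data drain faster, which matches the observation that it just delays the disconnection.

```cfg
# Close a TCP connection if it cannot be written to for this long
# while data is queued for it (assumed unit: seconds).
# Raising it only postpones the close when the reading side
# cannot keep up with the traffic.
tcp_send_timeout = 45
```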

> >
> >The connection in question is a local loopback between the RLS and
> >Presence modules (both running in the same Kamailio instance). However, it
> >may just be a coincidence that this is the affected connection, as it is
> >also the one with the most traffic.

You might be doing something much more resource-intensive on the receive
side, and it might not be able to keep up with the traffic (one connection
is handled by one process, so if that process is too slow for some reason
it might not read fast enough, and on the transmit side the send buffers
will fill up).

> >
> >My suspicion is that the bug is in the io_wait_loop_epoll() routine.

You could try changing the poll method and see if that makes any
difference, e.g.:
tcp_poll_method = sigio_rt in the .cfg file.
The default is epoll_lt, so try "epoll_et", "sigio_rt" and maybe "poll"
(slow for lots of connections).
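To compare the suggested methods, only one tcp_poll_method line should be active at a time, with the same load test re-run after each change. A sketch, assuming the method names are spelled with underscores as in the core's list of poll types (epoll_lt, epoll_et, sigio_rt, poll).

```cfg
# Default, per the suggestion above: level-triggered epoll
#tcp_poll_method = epoll_lt

# Alternatives to test, one at a time
tcp_poll_method = epoll_et
#tcp_poll_method = sigio_rt
#tcp_poll_method = poll
# (poll is slow with lots of connections)
```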


Andrei

> >
> >Can anybody with experience of this part of the code help?
> >
> >Paul Pankhurst
> >Engineering Director
> >Crocodile RCS Ltd
> >
> >
> >_______________________________________________
> >sr-dev mailing list
> >sr-dev at lists.sip-router.org
> >http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>
> -- 
> Daniel-Constantin Mierla -- http://www.asipto.com
> http://linkedin.com/in/miconda -- http://twitter.com/miconda
> 


