[sr-dev] Problem with TCP and EPOLL

Daniel-Constantin Mierla miconda at gmail.com
Fri Feb 17 14:10:18 CET 2012


Hello,

On 2/17/12 11:35 AM, Paul Pankhurst wrote:
> I now understand what is going wrong....
>
> To make the xcap server work with the size of documents generated by 
> the SIP client, I had to significantly increase the size of 
> tcp_rd_buf_size.
> Increasing this value is what causes the problem described.
> Returning tcp_rd_buf_size to its default size resolves the problem,
> but causes the upload of documents to the xcap server to fail.
>
> One way of solving this would be to allow the buffer size to be 
> settable on a per connection basis, or perhaps separately for local 
> connections.
> Does anyone have any thoughts, or other suggestions?
Perhaps the size of the buffer has to stay big in order to be able to
receive mixed SIP/XCAP traffic. However, detecting whether it is SIP or
HTTP is done in the TCP read code, so maybe the solution is to have a limit
for the read size and set it lower for SIP and larger for HTTP. I haven't
checked the source code to see if that is possible, though.
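The first idea, a read-size limit picked per protocol once the read code knows whether a connection carries SIP or HTTP, could look roughly like the C sketch below. It is a hypothetical illustration, not Kamailio code: the constants SIP_MAX_READ_SIZE and HTTP_MAX_READ_SIZE, the helper names and the first-line heuristic are all assumptions, and, as noted above, nobody has yet checked whether the real read path allows this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limits; these are NOT existing Kamailio parameters. */
#define SIP_MAX_READ_SIZE  (64u * 1024u)
#define HTTP_MAX_READ_SIZE (1024u * 1024u)

enum proto_kind { PROTO_SIP, PROTO_HTTP };

/* Rough guess from the first line only: HTTP responses start with "HTTP/",
 * HTTP requests end with "HTTP/1.x"; anything else is treated as SIP. */
static enum proto_kind classify_first_line(const char *buf, size_t len)
{
    const char *eol = memchr(buf, '\n', len);
    size_t n = eol ? (size_t)(eol - buf) : len;
    if (n > 0 && buf[n - 1] == '\r')
        n--;                                   /* drop the CR of CRLF */

    if (n >= 5 && memcmp(buf, "HTTP/", 5) == 0)
        return PROTO_HTTP;                     /* e.g. "HTTP/1.1 200 OK" */

    size_t start = n;                          /* find the last token */
    while (start > 0 && buf[start - 1] != ' ')
        start--;
    if (n - start >= 5 && memcmp(buf + start, "HTTP/", 5) == 0)
        return PROTO_HTTP;                     /* e.g. "PUT /xcap-root/x HTTP/1.1" */

    return PROTO_SIP;
}

/* Maximum number of bytes to buffer for one message on this connection. */
size_t max_read_size(const char *buf, size_t len)
{
    return classify_first_line(buf, len) == PROTO_HTTP ? HTTP_MAX_READ_SIZE
                                                       : SIP_MAX_READ_SIZE;
}
```

With this, an XCAP upload whose request line ends in "HTTP/1.1" would get the large limit, while SIP requests such as SUBSCRIBE keep the small one, so only the HTTP side pays for big buffers.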

If not, the only way I see now is to use different listen sockets (e.g.,
ports) and, based on that, set the read buffer size, maybe similar to the
new option I added for worker processes per socket.
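The fallback, choosing the read buffer size from the listen socket a connection arrived on, could be sketched as a lookup keyed by local port. This is an assumption-laden illustration: according to the thread, tcp_rd_buf_size is currently one global value, and the table, port numbers, sizes and helper names (local_port_of(), rd_buf_size_for_port()) are hypothetical. getsockname() on the accepted socket is the standard way to learn which local address and port it belongs to.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical per-listener setting; not an existing Kamailio option. */
struct listener_rd_cfg {
    uint16_t port;         /* local port the connection was accepted on */
    size_t   rd_buf_size;  /* read buffer size for connections on that port */
};

/* Illustrative values only. */
#define DEFAULT_RD_BUF_SIZE 4096u
static const struct listener_rd_cfg listeners[] = {
    { 5060, DEFAULT_RD_BUF_SIZE },  /* SIP: keep a small buffer */
    { 8080, 512u * 1024u },         /* XCAP over HTTP: room for big documents */
};

/* Local port of a bound or accepted socket (0 if it cannot be determined). */
uint16_t local_port_of(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof ss;
    if (getsockname(fd, (struct sockaddr *)&ss, &len) != 0)
        return 0;
    if (ss.ss_family == AF_INET)
        return ntohs(((struct sockaddr_in *)&ss)->sin_port);
    if (ss.ss_family == AF_INET6)
        return ntohs(((struct sockaddr_in6 *)&ss)->sin6_port);
    return 0;
}

/* Buffer size to allocate for a new connection on the given local port. */
size_t rd_buf_size_for_port(uint16_t port)
{
    for (size_t i = 0; i < sizeof listeners / sizeof listeners[0]; i++)
        if (listeners[i].port == port)
            return listeners[i].rd_buf_size;
    return DEFAULT_RD_BUF_SIZE;
}
```

A connection handler would call rd_buf_size_for_port(local_port_of(fd)) right after accept(), so SIP clients keep small buffers while XCAP uploads sent to a separate HTTP port get a large one.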

Cheers,
Daniel

>
> Thanks
>
> Paul
>
> -----Original Message----- From: Andrei Pelinescu-Onciul
> Sent: Thursday, February 16, 2012 1:53 PM
> To: Daniel-Constantin Mierla
> Cc: Development mailing list of the sip-router project ; Paul Pankhurst
> Subject: Re: [sr-dev] Problem with TCP and EPOLL
>
> On Feb 15, 2012 at 12:05, Daniel-Constantin Mierla <miconda at gmail.com> 
> wrote:
>> Hello,
>>
>> I am cc-ing Andrei, since he authored that part, maybe he is
>> available these days and can give a quick answer regarding the
>> issue.
>>
>> Cheers,
>> Daniel
>>
>> On 2/14/12 6:06 PM, Paul Pankhurst wrote:
>> >Sorry, this was originally posted incorrectly, so I'm reposting...
>> >I have been having problems with TCP under load. What I have been
>> >seeing is TCP buffers failing to be serviced and, when wr_timeout
>> >exceeds the configured value for tcp_send_timeout, kamailio kills the
>> >connection. Increasing tcp_send_timeout doesn't help; even setting it
>> >to a big value (such as 45 seconds) just delays the disconnection.
>> >
>> >Putting some tracing into the code shows that wbufq_add() is
>> >repeatedly called, but wbufq_run() is called for that connection far
>> >less often than I would expect, while it is frequently called for
>> >other connections. It looks like wbufq_run() doesn't get called when
>> >lots of wbufq_add()s are happening for a connection? wbufq_run() only
>> >appears to be called for a connection after some time has passed
>> >since the last wbufq_add().
>
> It's called when the kernel says it can write again on the respective
> socket. It might be that your consumer cannot read fast enough, so the
> buffers fill up on the ser/kamailio side.
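The behavior described above, where the flush routine runs only after the kernel reports the socket writable again, follows a common non-blocking pattern: try to write immediately, queue whatever the kernel refuses, register for EPOLLOUT, and flush the queue only when that event arrives. The C sketch below shows the pattern with plain Linux epoll. struct wq, wq_init(), wq_send() and wq_run() are hypothetical names and do not reproduce Kamailio's actual wbufq code. Under this model, Paul's trace is expected: while the peer is not reading, repeated sends only grow the queue, and the flush handler has nothing to do until the reader frees buffer space.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical per-connection write queue (NOT Kamailio's wbufq). */
struct wq {
    int fd;        /* non-blocking connected socket */
    int epfd;      /* epoll instance watching fd */
    char *data;    /* bytes the kernel has not accepted yet */
    size_t len;    /* number of queued bytes */
    int want_out;  /* 1 while EPOLLOUT is registered for fd */
};

/* Write as much as the kernel accepts; returns bytes written or -1. */
static ssize_t write_some(int fd, const char *buf, size_t len)
{
    size_t off = 0;
    while (off < len) {
        /* MSG_NOSIGNAL: get EPIPE instead of a SIGPIPE if the peer is gone. */
        ssize_t n = send(fd, buf + off, len - off, MSG_NOSIGNAL);
        if (n > 0)
            off += (size_t)n;
        else if (n < 0 && errno == EINTR)
            continue;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                       /* kernel buffer full: stop here */
        else
            return -1;
    }
    return (ssize_t)off;
}

static int wq_set_out(struct wq *q, int want_out)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | (want_out ? EPOLLOUT : 0);
    ev.data.fd = q->fd;
    if (epoll_ctl(q->epfd, EPOLL_CTL_MOD, q->fd, &ev) != 0)
        return -1;
    q->want_out = want_out;
    return 0;
}

int wq_init(struct wq *q, int epfd, int fd)
{
    struct epoll_event ev = {0};
    int fl = fcntl(fd, F_GETFL, 0);
    if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0)
        return -1;
    *q = (struct wq){ .fd = fd, .epfd = epfd };
    return 0;
}

/* Send now if nothing is queued; otherwise (or on EAGAIN) queue and wait. */
int wq_send(struct wq *q, const char *buf, size_t len)
{
    size_t done = 0;
    if (q->len == 0) {                  /* never bypass queued data */
        ssize_t n = write_some(q->fd, buf, len);
        if (n < 0)
            return -1;
        done = (size_t)n;
        if (done == len)
            return 0;
    }
    char *p = realloc(q->data, q->len + (len - done));
    if (p == NULL)
        return -1;
    memcpy(p + q->len, buf + done, len - done);
    q->data = p;
    q->len += len - done;
    return q->want_out ? 0 : wq_set_out(q, 1);
}

/* Flush handler: call only when epoll reports EPOLLOUT for q->fd. */
int wq_run(struct wq *q)
{
    ssize_t n = write_some(q->fd, q->data, q->len);
    if (n < 0)
        return -1;
    if (n > 0) {
        memmove(q->data, q->data + n, q->len - (size_t)n);
        q->len -= (size_t)n;
    }
    return (q->len == 0 && q->want_out) ? wq_set_out(q, 0) : 0;
}
```

EPOLLOUT interest is dropped as soon as the queue empties: with level-triggered epoll an idle socket is almost always writable, so leaving it registered would wake the loop on every iteration.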
>
>> >
>> >The connection in question is a local loopback between the RLS and
>> >Presence modules (both running in the same Kamailio instance). However,
>> >it may just be a coincidence that this is the affected connection, as
>> >it is also the one with the most traffic.
>
> You might do something much more resource-intensive on the receive side,
> and it might not be able to keep up with the traffic (one connection is
> handled by one process, so if that process is too slow for some reason
> it might not read fast enough => on the transmit side the send buffers
> will fill up).
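To check the hypothesis that the receiving process is the bottleneck, one can look at how much data sits unread on the receiving socket. The sketch below uses an AF_UNIX socketpair as a stand-in for the RLS-to-Presence loopback connection (an assumption for illustration; the real connection is TCP), fills it from the sending side until write() fails with EAGAIN, and reads the backlog with the FIONREAD ioctl. make_local_pair(), fill_until_eagain() and unread_bytes() are hypothetical helpers. On a live TCP connection the same picture is visible from the shell in the Recv-Q and Send-Q columns of `ss -tn`.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Stand-in for a local connection: sv[0] sends, sv[1] is the reader. */
int make_local_pair(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}

/* Switch fd to non-blocking and write until the kernel refuses more data,
 * i.e. until the reader's unread backlog has used up the buffer space.
 * Returns the number of bytes accepted, or -1 on an unexpected error. */
long fill_until_eagain(int fd)
{
    static const char chunk[4096];
    long total = 0;
    int fl = fcntl(fd, F_GETFL, 0);
    if (fl < 0 || fcntl(fd, F_SETFL, fl | O_NONBLOCK) < 0)
        return -1;
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;                /* sender is now blocked on the reader */
        return -1;
    }
}

/* Bytes that reached fd but have not been read yet by its owning process. */
long unread_bytes(int fd)
{
    int n = 0;
    return ioctl(fd, FIONREAD, &n) == 0 ? n : -1;
}
```

A backlog that keeps growing on the receiving side while the sender's queue grows too would point at a slow reader rather than at the event loop.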
>
>> >
>> >My suspicion is that the bug is in the io_wait_loop_epoll() routine.
>
> You could try changing the poll method and see if that makes any
> difference, e.g.:
> tcp_poll_method = sigio_rt in the .cfg file.
> The default is epoll-lt, so try "epoll-et", "sigio_rt" and maybe "poll"
> (slow for lots of connections).
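The poll methods listed above differ mainly in how readiness is reported. With level-triggered epoll (epoll-lt) an fd is reported on every wait for as long as it stays readable or writable; with edge-triggered epoll (epoll-et) it is reported only when its state changes, so the handler must drain the socket until EAGAIN or it will not hear about that fd again until more data arrives. The C sketch below demonstrates the difference with the raw Linux API; second_wait_events() is a hypothetical helper, not part of Kamailio. It says nothing about whether io_wait_loop_epoll() has a bug, but it is one reason why switching methods can change the observed behavior.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Registers one readable fd, then calls epoll_wait() twice without reading.
 * Returns the event count of the SECOND wait: 1 for level-triggered,
 * 0 for edge-triggered, or -1 on error. */
int second_wait_events(int edge_triggered)
{
    int sv[2], epfd, rc = -1;
    struct epoll_event ev = {0}, out;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    epfd = epoll_create1(0);
    if (epfd < 0)
        goto close_pair;

    ev.events = EPOLLIN | (edge_triggered ? EPOLLET : 0);
    ev.data.fd = sv[1];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[1], &ev) != 0 ||
        write(sv[0], "x", 1) != 1)
        goto close_ep;

    /* First wait: both modes report the readable fd. We do not read it. */
    if (epoll_wait(epfd, &out, 1, 0) != 1)
        goto close_ep;

    /* Second wait: LT reports it again, ET stays silent until new data. */
    rc = epoll_wait(epfd, &out, 1, 0);

close_ep:
    close(epfd);
close_pair:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```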
>
>
> Andrei
>
>> >
>> >Can anybody with experience of this part of the code help?
>> >
>> >Paul Pankhurst
>> >Engineering Director
>> >Crocodile RCS Ltd
>> >
>> >
>> >_______________________________________________
>> >sr-dev mailing list
>> >sr-dev at lists.sip-router.org
>> >http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>
>> -- 
>> Daniel-Constantin Mierla -- http://www.asipto.com
>> http://linkedin.com/in/miconda -- http://twitter.com/miconda
>>
>

-- 
Daniel-Constantin Mierla -- http://www.asipto.com
http://linkedin.com/in/miconda -- http://twitter.com/miconda