[sr-dev] Problem with TCP and EPOLL

14 Feb 2012


I have been having problems with TCP under load. What I am seeing is TCP write
buffers not being serviced, and when a connection's wr_timeout exceeds the
configured tcp_send_timeout, Kamailio kills the connection.
Increasing tcp_send_timeout doesn't help: even setting it to a large value
(such as 45 seconds) only delays the disconnection.
Adding some tracing to the code shows that wbufq_add() is called repeatedly,
but wbufq_run() is called for that connection far less often than I would
expect, while it is called frequently for other connections. It looks as if
wbufq_run() doesn't get called while lots of wbufq_add() calls are happening
for a connection: wbufq_run() only appears to be called for a connection
after some time has passed since the last wbufq_add().
The connection in question is a local loopback connection between the RLS and
Presence modules (both running in the same Kamailio instance). However, it may
just be a coincidence that this is the affected connection, as it is also the
one with the most traffic.
Configuring Kamailio with the NO_EPOLL option enabled (and so, I assume, using
the standard poll() function) seems to work much better. My suspicion is that
the bug is in the io_wait_loop_epoll() routine.
Can anybody with experience of this part of the code help?
Paul
