Sorry, this was originally posted incorrectly, so I’m reposting.

I have been having problems with TCP under load. What I am seeing is
that TCP write buffers are not being serviced and, when wr_timeout
exceeds the configured value for tcp_send_timeout, Kamailio kills the
connection. Increasing tcp_send_timeout doesn't help: even setting it
to a large value (such as 45 seconds) just delays the disconnection.

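To illustrate why a larger timeout only postpones the disconnect, here is a toy model in C. It is a sketch under stated assumptions, not Kamailio's code: the names toy_wq, toy_add, toy_run and toy_first_failure are hypothetical, and the model assumes the deadline is armed when the queue first becomes non-empty and re-armed only when a flush makes progress.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not Kamailio's structures. */
struct toy_wq {
    size_t queued;          /* bytes waiting to be written */
    unsigned deadline;      /* tick after which the queue counts as stuck */
    unsigned send_timeout;  /* stand-in for tcp_send_timeout, in ticks */
};

/* Queue len bytes at tick now. Returns -1 if the queue has stayed
 * non-empty past its deadline (the connection would be dropped). */
int toy_add(struct toy_wq *q, unsigned now, size_t len)
{
    assert(q != NULL);
    if (q->queued == 0)
        q->deadline = now + q->send_timeout; /* arm on first queued byte */
    else if (now > q->deadline)
        return -1;
    q->queued += len;
    return 0;
}

/* Flush up to room bytes at tick now; any progress re-arms the deadline. */
size_t toy_run(struct toy_wq *q, unsigned now, size_t room)
{
    size_t n = room < q->queued ? room : q->queued;
    q->queued -= n;
    if (n > 0)
        q->deadline = now + q->send_timeout;
    return n;
}

/* Add one 100-byte message per tick and flush only every run_every
 * ticks (0 means never). Returns the tick at which toy_add first
 * fails, or -1 if it never fails within the given number of ticks. */
int toy_first_failure(unsigned send_timeout, unsigned run_every,
                      unsigned ticks)
{
    struct toy_wq q = {0, 0, send_timeout};
    for (unsigned t = 0; t < ticks; t++) {
        if (run_every && t % run_every == 0)
            toy_run(&q, t, 1000);
        if (toy_add(&q, t, 100) < 0)
            return (int)t;
    }
    return -1;
}
```

In this model, if the flush never runs, toy_first_failure(10, 0, 100) fails at tick 11 and toy_first_failure(45, 0, 100) fails at tick 46, so raising the timeout just moves the failure later. With a flush every 5 ticks, toy_first_failure(10, 5, 100) never fails. That matches the symptom described above, but it says nothing about why the flush is not being scheduled.
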
Adding some tracing to the code shows that wbufq_add() is called
repeatedly, but wbufq_run() is called for that connection far less
often than I would expect, even though wbufq_run() is frequently
called for other connections. It looks as if wbufq_run() doesn't get
called while lots of wbufq_add()s are happening for a connection.
wbufq_run() only appears to be called for a connection once some time
has passed since the last wbufq_add().

The connection in question is a local loopback between the RLS and
Presence modules (both running in the same Kamailio instance).
However, it may just be a coincidence that this is the affected
connection, as it is also the one with the most traffic.

My suspicion is that the bug is in the io_wait_loop_epoll() routine.
Can anybody with experience of this part of the code help?

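For anyone who wants to check how epoll itself reports writability, independent of Kamailio, the following Linux-only C sketch may help. It is not io_wait_loop_epoll() and does not confirm the suspicion above; all function names are hypothetical. It fills the send buffer of a local socket pair, registers the socket for edge-triggered EPOLLOUT, and records how many events epoll_wait() returns before the peer reads, after the peer reads, and on an immediate second poll.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success. */
int set_nonblock(int fd)
{
    int fl = fcntl(fd, F_GETFL, 0);
    return fl < 0 ? -1 : fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Write until the kernel refuses more data; returns bytes written,
 * or -1 on an unexpected error. */
long fill_send_buffer(int fd)
{
    char chunk[4096] = {0};
    long total = 0;
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) { total += n; continue; }
        if (n < 0 && errno == EINTR) continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;
        return -1;
    }
}

/* Read until nothing is left (EAGAIN, EOF or error); returns bytes read. */
long drain(int fd)
{
    char chunk[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n > 0) { total += n; continue; }
        if (n < 0 && errno == EINTR) continue;
        return total;
    }
}

/* Number of events epoll reports within timeout_ms. */
int poll_once(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    return epoll_wait(epfd, &ev, 1, timeout_ms);
}

/* Fills seen[] with the event counts for: buffer full, after the peer
 * drained, and an immediate re-poll. Returns 0 if the setup worked. */
int edge_triggered_demo(int seen[3])
{
    int sv[2], rc = -1;
    assert(seen != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLOUT | EPOLLET };
    ev.data.fd = sv[0];
    if (ep >= 0 && set_nonblock(sv[0]) == 0 && set_nonblock(sv[1]) == 0
        && fill_send_buffer(sv[0]) > 0
        && epoll_ctl(ep, EPOLL_CTL_ADD, sv[0], &ev) == 0) {
        seen[0] = poll_once(ep, 0);     /* send buffer is full */
        drain(sv[1]);                   /* peer frees buffer space */
        seen[1] = poll_once(ep, 1000);  /* writable edge is reported */
        seen[2] = poll_once(ep, 0);     /* nothing has changed since */
        rc = 0;
    }
    if (ep >= 0)
        close(ep);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

On Linux the three counts should come out as 0, 1 and 0: no event while the buffer is full, one event once the peer has read, and no repeat until writability changes again. A loop that relies on such an event to flush a queue will therefore only flush when the kernel reports a new edge. Whether that is relevant here depends on which poll method the instance uses, which is not stated above.
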
Paul Pankhurst
Engineering Director
Crocodile RCS Ltd