Just to add some info:

netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
...
udp   25167616      0 <local_interface>:5060     0.0.0.0:*                           211759/kamailio
...

So I see a huge receive queue (Recv-Q) on the Kamailio UDP socket, and it is not clearing.
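For reference, a quick way to confirm whether the kernel is actually dropping datagrams on that socket (as opposed to Kamailio just reading slowly) is to look at the UDP error counters and the per-socket memory. A minimal sketch, assuming the socket is on port 5060 (exact counter names can differ a bit between kernel versions):

# kernel-wide UDP counters; growing "receive buffer errors" means
# datagrams are being dropped because the socket buffer is full
netstat -su

# per-socket view: Recv-Q, owning process and skmem
# ("rb" in the skmem column is the SO_RCVBUF value actually in effect)
ss -ulpnm 'sport = :5060'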

On Tue, Aug 29, 2023 at 14:29, Ihor Olkhovskyi <igorolhovskiy@gmail.com> wrote:
Hello,

I've run into a somewhat strange issue, but first a bit of preface. I have one Kamailio as a proxy (TLS/WS <-> UDP) and a second Kamailio as a presence server. At some point the presence server accepts around 5K PUBLISH requests within 1 minute and sends roughly the same number of NOTIFYs to the proxy Kamailio.

Proxy is "transforming" protocol to TLS, but at sime point I'm starting to get these type of errors

tm [../../core/forward.h:292]: msg_send_buffer(): tcp_send failed
tm [t_fwd.c:1588]: t_send_branch(): sending request on branch 0 failed
<script>: [RELAY] Relay to <sip:X.X.X.X:51571;transport=tls> failed!
tm [../../core/forward.h:292]: msg_send_buffer(): tcp_send failed
tm [t_fwd.c:1588]: t_send_branch(): sending request on branch 0 failed

Some of those messages are 100% valid, as the client may simply have gone away. Some are not, because I'm sure the client is alive and connected.

But the problem comes later. At some moment the proxy Kamailio just stops accepting UDP traffic on this interface (where it also receives all the NOTIFYs). At the point where it stops accepting, Kamailio still sends OPTIONS via DISPATCHER but is not able to receive the 200 OK.
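When it gets into that state, one thing worth checking is whether the UDP receiver processes are actually stuck (for example blocked while writing towards a dead TLS connection) rather than the socket itself being broken. A minimal sketch, assuming kamcmd is reachable and <pid> stands for one of the UDP receiver PIDs reported by core.ps (<pid> is a placeholder, not a real value):

# list Kamailio processes; the "udp receiver" entries are the ones
# that should be reading from the affected interface
kamcmd core.ps

# if they look stuck, a backtrace shows where they are blocked
# (needs gdb and debug symbols installed)
gdb -p <pid> -batch -ex 'bt'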

Over TLS on the same interface everything is OK. On the other (loopback) interface UDP is processed fine, so I don't suspect a limit on open files here.

Only a restart of the Kamailio proxy process helps in this case.

I've tuned net.core.rmem_max and net.core.rmem_default to 25 MB, so in theory the buffer should not be the problem.
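Side note: as far as I understand, net.core.rmem_default only applies when the application does not set SO_RCVBUF itself (and Kamailio can size its own receive buffer, capped by the core maxbuffer parameter if I remember correctly), so it's worth double-checking what the socket actually ended up with. A quick sketch:

# effective receive buffer on the 5060 UDP socket
# (the "rb" value in the skmem column is the SO_RCVBUF in use)
ss -ulnm 'sport = :5060'

# what the kernel allows and defaults to
sysctl net.core.rmem_max net.core.rmem_default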

Is there some internal "interface buffer" in Kamailio that is not freed after a failed send, or maybe I've missed something?

Kamailio 5.6.4

fork=yes
children=12
tcp_children=12

enable_tls=yes

tcp_accept_no_cl=yes
tcp_max_connections=63536
tls_max_connections=63536
tcp_accept_aliases=no
tcp_async=yes
tcp_connect_timeout=10
tcp_conn_wq_max=63536
tcp_crlf_ping=yes
tcp_delayed_ack=yes
tcp_fd_cache=yes
tcp_keepalive=yes
tcp_keepcnt=3
tcp_keepidle=30
tcp_keepintvl=10
tcp_linger2=30
tcp_rd_buf_size=80000
tcp_send_timeout=10
tcp_wq_blk_size=2100
tcp_wq_max=10485760
open_files_limit=63536
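
With tcp_async=yes, slow or failing TLS sends end up in Kamailio's own write queues (bounded by tcp_conn_wq_max and tcp_wq_max above), so when the tcp_send errors start it may be worth watching those queues. A minimal sketch, assuming kamcmd is available (core.tcp_info reports, among other things, the opened connections and the bytes queued for async writes):

# connection counts and write-queued bytes for TCP/TLS
kamcmd core.tcp_info

# listening sockets as Kamailio sees them
kamcmd core.sockets_list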

Sysctl

# To increase the amount of memory available for socket input/output queues
net.ipv4.tcp_rmem = 4096 25165824 25165824
net.core.rmem_max = 25165824
net.core.rmem_default = 25165824
net.ipv4.tcp_wmem = 4096 65536 25165824
net.core.wmem_max = 25165824
net.core.wmem_default = 65536
net.core.optmem_max = 25165824

# To limit the maximum number of requests queued to a listen socket
net.core.somaxconn = 128

# Tells TCP to make decisions that prefer lower latency over higher throughput.
net.ipv4.tcp_low_latency=1

# Optional (it will increase performance)
net.core.netdev_max_backlog = 1000
net.ipv4.tcp_max_syn_backlog = 128

# Flush the routing table to make changes happen instantly.
net.ipv4.route.flush=1
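
For completeness, one way to apply these and double-check the values that matter for the receive side afterwards. A small sketch, assuming the settings live in a file like /etc/sysctl.d/99-kamailio.conf (hypothetical path, adjust as needed):

# reload the settings from the file and print what was applied
sysctl -p /etc/sysctl.d/99-kamailio.conf

# verify the current values
sysctl net.core.rmem_max net.core.rmem_default net.core.netdev_max_backlog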
--
Best regards,
Ihor (Igor)


--
Best regards,
Ihor (Igor)