Description

When running a load test, Kamailio eventually becomes unresponsive and stops processing calls.
Kamailio is configured to use DMQ replication for dialog and usrloc. Dialog keepalive is also enabled.

Troubleshooting

From investigation, the freeze occurs sooner and more reliably when network degradation causes packet loss and/or retransmissions, but it eventually happens even without any noticeable network issue.

Reproduction

Run a simple load test making calls at a rate of ~5 cps and keep ~2000 calls connected at all times. A higher cps seems to make the problem easier to reproduce.
Adding network degradation to the environment triggers the problem, but when running the load test with a tool such as SIPp, retransmissions can be forced simply by killing the SIPp instance that receives the calls, which then forces Kamailio to retransmit.
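
The load described above could be generated roughly as follows. This is only a sketch, not the exact commands used for the report: the target address and the UAS port are placeholders, and it assumes SIPp's built-in uac/uas scenarios. The call duration is derived from the numbers above (concurrent calls = rate x hold time, so 2000 / 5 = 400 seconds per call). The script only prints the two command lines so they can be reviewed before being run.

```shell
#!/bin/sh
# Rate and concurrency are taken from the reproduction steps above.
CPS=5
CONCURRENT=2000
# Hypothetical values: replace with the Kamailio address and a free local port.
TARGET="kamailio.example.com:5060"
UAS_PORT=5080

# Hold time needed to keep CONCURRENT calls up at CPS calls per second,
# in milliseconds (the unit SIPp's -d option expects).
HOLD_MS=$(( CONCURRENT / CPS * 1000 ))

# Receiving side. Killing this instance mid-test leaves Kamailio without
# replies, which forces it to retransmit.
UAS_CMD="sipp -sn uas -p $UAS_PORT"

# Calling side: -r call rate, -l max simultaneous calls, -d call duration (ms).
UAC_CMD="sipp -sn uac -r $CPS -l $CONCURRENT -d $HOLD_MS $TARGET"

echo "$UAS_CMD"
echo "$UAC_CMD"
```

Raising CPS while keeping CONCURRENT fixed shortens the computed hold time, so adjust both if a higher call rate is used to speed up reproduction.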

Debugging Data

Output of kamctl trap:
gdb_kamailio_20201028_213030.txt

Log Messages

Locally generated requests show up in the log, but are not sent on the network.

Possible Solutions

None found so far.

Additional Information

# kamailio -v
version: kamailio 5.4.2 (x86_64/linux) c3b91f
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: c3b91f 
compiled on 13:50:37 Oct 27 2020 with gcc 4.8.5
# cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)
# uname -a
Linux hostname 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
