Hi,

I have been load testing Kamailio and have run into an issue.

I am testing at 100 CPS. At the beginning everything works well, but after some time Kamailio stops replying to all SIP messages, although it still processes HTTP requests, some commands, etc. The time to failure looks random: it can be 10 minutes or 40 minutes, but it usually happens within the first 10 minutes.

If I run "kamcmd dlg.list" there is no output, and after that all kamcmd commands just hang with no output.
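
To check for this hang without blocking the terminal, a small wrapper like the one below could be used. This is only a sketch: the "probe" function name is made up for illustration, and it assumes GNU coreutils "timeout" is installed (it exits with status 124 when the time limit is reached).

```shell
# probe LIMIT COMMAND [ARGS...]
# Runs COMMAND with a time limit of LIMIT seconds and prints one word:
#   ok      the command finished successfully within the limit
#   hung    the command was still running when the limit expired
#   failed  the command finished within the limit but returned an error
# "probe" is a hypothetical helper name, not part of Kamailio.
probe() {
    limit="$1"
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0)   echo "ok" ;;
        124) echo "hung" ;;   # GNU timeout exit status on expiry
        *)   echo "failed" ;;
    esac
}
```

For example, "probe 5 kamcmd dlg.list" gives the command 5 seconds to answer before it is reported as hung.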

Kamailio shares dialogs and htables over DMQ with another Kamailio instance that acts as the failover. If I shut down the failover instance (so there is no DMQ replication), the test runs fine at 100 CPS.

I also tried modparam("dmq", "worker_usleep", 1000), but the behavior is the same: it still stops processing traffic.

This is part of the configuration used:

# ----------------- setting module-specific parameters ---------------
modparam("dmq", "server_address", "sip:INTERNAL_INSTANCE_IP:5060")
modparam("dmq", "notification_address", "DMQ_NOTIFICATION_ADDRESS")

modparam("dmq", "multi_notify", 1)
modparam("dmq", "ping_interval", 10)
modparam("dmq", "num_workers", 4)


modparam("htable", "enable_dmq", 1)
modparam("htable", "dmq_init_sync", 1)


# ----- dialog params -----
modparam("dialog", "dlg_flag", FLD_DLG)
modparam("dialog", "dlg_match_mode", 1)
modparam("dialog", "db_url", DBURL_RW)
modparam("dialog", "db_mode", 0)
modparam("dialog", "enable_dmq", 1)
modparam("dialog", "db_update_period", 10)
modparam("dialog", "h_id_start", H_ID_START)
modparam("dialog", "h_id_step", H_ID_STEP)

Memory:

Shared: 4096
Private: 512

kamailio -v
version: kamailio 5.5.3 (x86_64/linux) 473cef
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 473cef
compiled on 09:34:57 Dec 20 2021 with gcc 10.2.1


Attached is the trap collected after the issue happens.
Are there any other logs or configuration details that would help identify or solve the issue?

Thanks for the help,
Regards,
Tiago