What is the length of each call? How many NICs do you have on that server? What network switches are you using? All these variables have a huge impact on the call flow!

Let's say you make 100 CPS and each call lasts 10 minutes (counting only the 64 kbps voice payload, one direction per call):
After 10 seconds you will have 1,000 ongoing calls using G.711 (64 kbps), which is around 64 Mbps of bandwidth.
After 1 minute you get 6,000 ongoing calls using about 384 Mbps!
Near the 10th minute you will have 60,000 ongoing calls using almost 4 Gbps (3.84 Gbps)!!
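The arithmetic above can be reproduced with a small Python sketch. Beyond what the mail states, it assumes that every call lasts exactly 10 minutes, that calls arrive at a steady 100 CPS, and it counts only the 64 kbps G.711 payload of one direction per call. The helper names (`concurrent_calls`, `payload_mbps`, `ip_level_kbps`) are illustrative, not from any library.

```python
# Back-of-the-envelope load model for the 100 CPS / 10-minute scenario.
# Assumptions: constant arrival rate, identical call length, and only the
# G.711 payload of one direction per call (no headers) unless stated otherwise.

CPS = 100                # new calls per second
CALL_SECONDS = 10 * 60   # every call lasts 10 minutes
G711_KBPS = 64           # G.711 payload bit rate per stream
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, Ethernet framing excluded


def concurrent_calls(t_seconds: int, cps: int = CPS, call_seconds: int = CALL_SECONDS) -> int:
    """Calls still up at time t: they pile up until the first ones hang up."""
    return cps * min(t_seconds, call_seconds)


def payload_mbps(calls: int, kbps: int = G711_KBPS) -> float:
    """Aggregate payload bandwidth in Mbit/s (1 Mbit = 1000 kbit)."""
    return calls * kbps / 1000


def ip_level_kbps(ptime_ms: int = 20, payload_kbps: int = G711_KBPS) -> float:
    """Per-stream rate including IPv4/UDP/RTP headers for a given packetization time."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = payload_kbps * 1000 / 8 * ptime_ms / 1000
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for t in (10, 60, 600):
        calls = concurrent_calls(t)
        print(f"t={t:>3}s  calls={calls:>6}  payload={payload_mbps(calls):7.1f} Mbit/s")
```

With 20 ms packetization each G.711 packet carries 160 bytes of voice plus 40 bytes of IPv4/UDP/RTP headers, so `ip_level_kbps()` gives 80 kbps per stream before Ethernet framing. A media relay such as RTPEngine also receives and re-sends every stream, so the real load on the NICs is well above these payload-only figures.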

Maybe the issue is not in Kamailio and/or the media server, but in your network capacity...

On my side, on a server with 4 vCPUs and 4 GB of RAM, Kamailio 5.2 handles around 400 CPS with calls up to 30 seconds long, and RTPEngine copes fine too. I am sure I could do more, but I don't bother much; this is more than enough!
But I have an enormous amount of bandwidth and 3 NICs on the server. NICs are the key. Remember that Kamailio only processes SIP messages.
At 5 minutes, with the call details above, you have already sent at least 30,000 SIP INVITEs, plus many more SIP messages during that time. You need a very good network interface to do that kind of job!
Add 30,000 channels / RTP ports on your media server and you get a whole lot of data going through your network. I bet even the Cat5 UTP cables won't handle the heat =:o)
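The SIP side can be estimated the same way. This sketch assumes a hypothetical minimal call flow of seven messages per call (INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, BYE, 200 OK) and treats the proxy as handling each message roughly twice, once on the caller leg and once on the callee leg. Real traffic adds 183/PRACK, re-INVITEs, OPTIONS keepalives and, in a setup like the one in the question, DMQ replication, so treat the result as a lower bound. The function names are illustrative only.

```python
# Rough SIP message-rate estimate for a proxy such as Kamailio.
# Assumption: a hypothetical minimal flow of 7 messages per call; the proxy
# handles each of them roughly twice (caller leg + callee leg).

BASIC_CALL_FLOW = ("INVITE", "100 Trying", "180 Ringing", "200 OK", "ACK", "BYE", "200 OK")


def invites_sent(t_seconds: int, cps: int = 100) -> int:
    """INVITEs sent since the test started (one per new call)."""
    return cps * t_seconds


def sip_messages_per_second(cps: int, messages_per_call: int = len(BASIC_CALL_FLOW),
                            legs: int = 2) -> int:
    """Steady-state rate, i.e. once calls are ending as fast as they start."""
    return cps * messages_per_call * legs


if __name__ == "__main__":
    print("INVITEs after 5 minutes:", invites_sent(5 * 60))
    print("SIP messages/s at 100 CPS:", sip_messages_per_second(100))
    print("SIP messages/s at 400 CPS:", sip_messages_per_second(400))
```

Note that before the first calls hang up, the BYE transactions are not there yet, so the early rate is somewhat lower than the steady-state figure.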

Also, check the RTP port range your media server is configured with.
On Asterisk, ports 10,000 to 20,000 are usually set, which is about 10,000 ports, so at most about 10,000 calls (fewer in practice, since RTCP normally uses the port next to each RTP port). In the above example, you would stop receiving calls within about 2 minutes (100 CPS reaches 10,000 calls after 100 seconds) because Asterisk has all RTP ports busy!
I don't know about other media servers, but I would bet it is pretty much the same...
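Whether a port range can hold the load is also easy to check. The sketch below assumes the common RTP convention (RFC 3550) of an even port for RTP plus the next odd port for RTCP; pass `ports_per_stream=1` for the optimistic one-port-per-call count used above. It also assumes calls keep accumulating until the first ones end. The function names are hypothetical.

```python
# Port-range capacity check for a media server.
from typing import Optional


def max_media_streams(rtp_start: int, rtp_end: int, ports_per_stream: int = 2) -> int:
    """Streams that fit in the inclusive range [rtp_start, rtp_end]."""
    return (rtp_end - rtp_start + 1) // ports_per_stream


def seconds_until_exhausted(streams: int, cps: int, call_seconds: int = 600,
                            streams_per_call: int = 1) -> Optional[float]:
    """Time until every port is busy, or None if calls end before that happens."""
    call_capacity = streams // streams_per_call
    t = call_capacity / cps
    return t if t < call_seconds else None


if __name__ == "__main__":
    streams = max_media_streams(10000, 20000)  # 5000 with RTP + RTCP
    print("streams:", streams)
    print("full after (s):", seconds_until_exhausted(streams, cps=100))
```

If the media server anchors both legs of each call, set `streams_per_call=2`, which halves the time again.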

Hope this helps!


Sérgio Charrua


www.voip.pt
Tel.: +351 21 130 71 77

Email : sergio.charrua@voip.pt

On Tue, Dec 21, 2021 at 11:54 AM pwerspire <pwerspire@gmail.com> wrote:
Hi,

I have been doing some load testing with Kamailio and having some issues.

I have been trying 100 CPS, and at the beginning everything works well, but after some time (it can be 10 minutes or 40 minutes; it seems random, but it usually happens within the first 10 minutes) Kamailio stops replying to all SIP messages while still processing HTTP requests, some commands, etc.

If I run "kamcmd dlg.list", there is no output, and after that every kamcmd command just hangs without any output.

Kamailio shares dialogs and htables over DMQ with another Kamailio instance that acts as the failover. If I shut down the failover instance (so there is no DMQ replication), the test works well at 100 CPS.

I also tried modparam("dmq", "worker_usleep", 1000), but the behavior is the same: it stops processing traffic.

This is part of the configuration used:

# ----------------- setting module-specific parameters ---------------
modparam("dmq", "server_address", "sip:INTERNAL_INSTANCE_IP:5060")
modparam("dmq", "notification_address", "DMQ_NOTIFICATION_ADDRESS")

modparam("dmq", "multi_notify", 1)
modparam("dmq", "ping_interval", 10)
modparam("dmq", "num_workers",4)


modparam("htable", "enable_dmq", 1)
modparam("htable", "dmq_init_sync", 1)


# ----- dialog params -----
modparam("dialog", "dlg_flag", FLD_DLG)
modparam("dialog", "dlg_match_mode", 1)
modparam("dialog", "db_url", DBURL_RW)
modparam("dialog", "db_mode", 0)
modparam("dialog", "enable_dmq", 1)
modparam("dialog", "db_update_period", 10)
modparam("dialog", "h_id_start", H_ID_START)
modparam("dialog", "h_id_step", H_ID_STEP)

Memory:

Shared: 4096
Private: 512

kamailio -v
version: kamailio 5.5.3 (x86_64/linux) 473cef
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 473cef
compiled on 09:34:57 Dec 20 2021 with gcc 10.2.1


Attached is the trap collected after the issue happened.
Are there any other logs or configuration details that would help identify or solve the issue?

Thanks for the help,
Regards,
Tiago
__________________________________________________________
Kamailio - Users Mailing List - Non Commercial Discussions
  * sr-users@lists.kamailio.org
Important: keep the mailing list in the recipients, do not reply only to the sender!
Edit mailing list options or unsubscribe:
  * https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users