Hello everyone,
I'm doing some TLS performance testing on Kamailio 3.2.1. Here's my setup:
kamailio -V
version: kamailio 3.2.1 (x86_64/linux) 31c991
flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, TLS_HOOKS,
USE_RAW_SOCKS, USE_STUN, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK,
SHM_MEM, SHM_MMAP, PKG_MALLOC, DBG_QM_MALLOC, USE_FUTEX,
FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR,
USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 4MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 31c991
compiled on 19:38:03 Dec 20 2011 with gcc 4.4.6
uname -a
Linux null.null.com 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux
CentOS 6.2 on a Dell PowerEdge R610 with 24 Intel X5650 Cores at
2.67GHz and 12GB of RAM (I could use more).
Kamailio is running the default config with a few changes (a rough
sketch of these settings follows the list):
- WITH_TLS defined
- TLS is using self-generated CAs/certs (essentially openvpn easy-rsa)
with 1024 bit key size
- TLS is *not* configured to verify client OR server certs by default
- I'm using TLS v1 (SSL 3.1)
- TLS cipher suites are set to any (although my simulated UAs only
offer AES 256+SHA)
- Various changes to Kamailio children (up to 256 at times) and memory
sizes (up to 2048 MB and even 4096 MB at times)
- One DNS based alias added
- Maximum TCP connections increased to 65000
- Kamailio is configured to listen only on the tested IP (UDP, TCP, TLS
sockets active)
- Syslog has been configured to log local0 (Kamailio) asynchronously
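Roughly, the non-default pieces look like the sketch below. This is a
minimal sketch rather than my exact file - the IP, alias, and cert
paths are placeholders:

#!define WITH_TLS

children=16                 # raised as high as 256 in some runs
tcp_max_connections=65000   # room for the full simulated UA population

listen=udp:192.0.2.10:5060  # tested IP only (placeholder address)
listen=tcp:192.0.2.10:5060
listen=tls:192.0.2.10:5061

alias="sip.example.com"     # the DNS-based alias (placeholder name)

modparam("tls", "certificate", "/etc/kamailio/tls/server-cert.pem")
modparam("tls", "private_key", "/etc/kamailio/tls/server-key.pem")
modparam("tls", "ca_list", "/etc/kamailio/tls/ca.pem")
modparam("tls", "verify_certificate", 0)  # no client/server cert checks
modparam("tls", "require_certificate", 0)
modparam("tls", "tls_method", "TLSv1")
modparam("tls", "cipher_list", "ALL")

# shared memory is raised on the command line, e.g.: kamailio -m 2048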
My test rig/call generator is an Ixia Xcellon-Ultra NP load module
with IXLoad. My call scenario does the following:
- Registers two simulated user agents (100000, 200000) to Kamailio with TLS
- Places call from 100000 to 200000 via Kamailio with TLS
- Increments both user agents by 1 and continues at whatever rate (cps)
I like, up to a channel limit (also configurable)
- The Ixia generates a valid SDP but no RTP is generated (although
that's certainly possible at these call levels)
Two 1 gig ports on the Ixia are connected to the Broadcom NICs on
the Dell R610 via a Cisco Catalyst 4948 switch. One port on the Ixia
emulates the 100000 agents (A leg) and the other emulates the 200000
agents (B leg). Of course I can provide more information if needed.
Here are some test numbers:
With TLS at 20cps, 120 sec calls, up to a total of 2470 calls (4940
registrations) life is good. Very good - call setup time averages
23ms, the cps rate holds indefinitely, and not a single call or
registration fails over long-term tests.
UDP and TCP numbers are excellent (bordering on ridiculous) - usually
around 500cps with practically no reasonable upper limit on
simultaneous calls. This doesn't need any further discussion :).
The TLS numbers start falling apart pretty quickly after 20cps,
however. If I change the TLS test to 40cps, 120 sec calls, up to a
total of 4940 calls (9880 registrations) Kamailio starts to
(seriously) struggle. The rate starts fluctuating all over the place,
call setup time averages jump to 8000ms (or more) and things just
generally get ugly. Interestingly enough, all of the user agents are
able to register, the logs look fine (to my eye at this log level) and
the system (CPU, network, etc) doesn't appear to be under stress at
all.
I have a few questions:
1) Is there something obviously wrong or stupid I'm doing here?
2) Why are the TLS tests so much worse than TCP and UDP? Am I
missing something here?
With TLS the limit is usually the CPU or memory, due to
encryption/decryption, but you say those look OK.
What log level are you using in the config?
Is the test tool keeping the TLS connections open, or are they closed
and have to be opened for each call? Can you spot whether the delay is
on the incoming side or on the outgoing one? Wireshark can decrypt the
traffic if you provide the server's private key and start the sniffing
before the TLS connections are established.
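If the tool does close them, a full TLS handshake per call could
easily explain the latency. On the Kamailio side you can at least make
sure established connections are kept open between calls - a minimal
sketch, with an illustrative lifetime value:

# keep TCP/TLS connections up well past the 120 sec call duration,
# so the handshake cost is paid once per UA rather than once per call
tcp_connection_lifetime=3605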
Alternatively, set:
modparam("tls", "cipher_list", "NULL")
and the traffic should no longer be encrypted, but this will not show
whether the bottleneck is the encryption process.
Also, the benchmark module can help to see whether execution of the
config file takes longer than usual.
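A minimal sketch of that, with an illustrative timer name and
granularity:

loadmodule "benchmark.so"

modparam("benchmark", "enable", 1)
modparam("benchmark", "granularity", 100)  # report every 100 measurements

route {
    bm_start_timer("main");
    # ... existing request routing ...
    bm_log_timer("main");
}

Note that bm_log_timer() has to run before the route exits, so in a
real config you would wrap the specific block you suspect of being slow.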
Jan Janak did TLS tests some time ago; the summary is part of:
I have some results from a test with 6000 SIP messages/sec over TLS,
where CPU usage reached about 60%. I guess something is becoming a
bottleneck in your case, very likely a blocking operation since the
CPU is OK; it just has to be discovered.
Cheers,
Daniel
--
Daniel-Constantin Mierla