Hello everyone,
I'm doing some TLS performance testing on Kamailio 3.2.1. Here's my setup:
kamailio -V
version: kamailio 3.2.1 (x86_64/linux) 31c991
flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, TLS_HOOKS, USE_RAW_SOCKS, USE_STUN, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 4MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: 31c991
compiled on 19:38:03 Dec 20 2011 with gcc 4.4.6
uname -a
Linux null.null.com 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux
CentOS 6.2 on a Dell PowerEdge R610 with 24 Intel X5650 Cores at 2.67GHz and 12GB of RAM (I could use more).
Kamailio is the default config with a few changes:
- WITH_TLS defined
- TLS is using self-generated CAs/certs (essentially openvpn easy-rsa) with 1024 bit key size
- TLS is *not* configured to verify client OR server certs by default
- I'm using TLS v1 (SSL 3.1)
- TLS cipher suites are set to any (although my simulated UAs only offer AES 256+SHA)
- Various changes to Kamailio children (up to 256 at times) and memory sizes (up to 2048 MB and even 4096 MB at times)
- One DNS based alias added
- Maximum TCP connections increased to 65000
- Kamailio is configured to only listen on the tested IP (UDP, TCP, TLS sockets active)
- Syslog has been configured to log local0 (Kamailio) asynchronously
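For reference, a rough sketch of what those changes look like in kamailio.cfg (the IP address, alias, and certificate paths below are placeholders, not my actual values; shared memory size is set at startup with -m):

```
#!define WITH_TLS

children=8                    # raised as high as 256 in some runs
tcp_max_connections=65000
listen=udp:192.0.2.10:5060    # placeholder IP; TCP and TLS sockets enabled too
alias="sip.example.com"       # placeholder for the DNS-based alias

# tls.cfg (referenced by the tls module): no peer verification, TLSv1, any cipher
# [server:default]
# method = TLSv1
# verify_certificate = no
# require_certificate = no
# certificate = /etc/kamailio/server-cert.pem    (placeholder path)
# private_key = /etc/kamailio/server-key.pem     (placeholder path)
```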
My test rig/call generator is an Ixia Xcellon-Ultra NP load module with IXLoad. My call scenario does the following:
- Registers two simulated user agents (100000, 200000) to Kamailio with TLS
- Places call from 100000 to 200000 via Kamailio with TLS
- Increments both user agents by 1 and continues as quickly (cps) as I like up to a channel limit (also configurable)
- The Ixia generates a valid SDP but no RTP is generated (although that's certainly possible at these call levels)
Two 1 gig ports on the Ixia are connected to the Broadcom NICs on the Dell R610 via a Cisco Catalyst 4948 switch. One port on the Ixia emulates the 100000 agents (A leg) and the other emulates the 200000 agents (B leg). Of course I can provide more information if needed.
Here are some test numbers:
With TLS at 20cps, 120 sec calls, up to a total of 2470 calls (4940 registrations) life is good. Very good - call setup time averages 23ms, the cps rate holds indefinitely, and not a single call or registration fails over long term tests.
UDP and TCP numbers are excellent (bordering ridiculous) - usually around 500cps with practically no reasonable upper limit on simultaneous calls. This doesn't need any further discussion :).
The TLS numbers start falling apart pretty quickly above 20cps, however. If I change the TLS test to 40cps, 120 sec calls, up to a total of 4940 calls (9880 registrations), Kamailio starts to (seriously) struggle. The rate starts fluctuating all over the place, call setup time averages jump to 8000ms (or more), and things just generally get ugly. Interestingly enough, all of the user agents are able to register, the logs look fine (to my eye at this log level), and the system (CPU, network, etc.) doesn't appear to be under stress at all.
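For scale, a back-of-envelope sketch of the steady-state load those numbers imply (my assumption: each call holds both of its TLS legs for the full 120 second duration):

```shell
# Steady-state load at the failing 40cps rate (assumption: every call
# keeps its A-leg and B-leg TLS connections up for the whole call)
cps=40
call_dur=120
echo "concurrent calls:    $((cps * call_dur))"
echo "concurrent TLS legs: $((cps * call_dur * 2))"
```

So even at the failing rate that's under ten thousand established sessions, well inside the 65000 connection limit.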
I have a few questions:
1) Is there something obviously wrong or stupid I'm doing here?
2) Why are the TLS tests so much worse than TCP and UDP? Am I missing something here?
Thanks (in advance) for any advice anyone might be able to offer!
Hello,
With TLS the limit is usually the CPU or memory, due to encryption/decryption, but you say those look OK.
What log level are you using in config?
Is the test tool keeping the TLS connections open, or are they closed and reopened for each call? Can you spot whether the delay is on the incoming side or the outgoing side? Wireshark can decrypt the traffic if you provide the certificate and start sniffing before the TLS connections are established. Alternatively, set:
modparam("tls", "cipher_list", "NULL")
and the traffic should no longer be encrypted, although this will not show whether the bottleneck is the encryption process itself.
Also, the benchmark module can help to see whether execution of the config file takes longer than usual.
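A minimal sketch of wiring that in (assuming the stock benchmark module; the timer name is arbitrary):

```
loadmodule "benchmark.so"

modparam("benchmark", "enable", 1)
modparam("benchmark", "granularity", 100)   # log aggregate timings every 100 messages

request_route {
    bm_start_timer("cfg-total");   # arbitrary timer name
    # ... existing routing logic ...
    bm_log_timer("cfg-total");     # writes the accumulated timings to the log
}
```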
Jan Janak did TLS tests some time ago; the summary is part of:
http://sip-router.org/wiki/performance/v3.0-capacity
I have some results from a test with 6000 SIP messages/sec over TLS, where CPU usage was about 60%. I guess something is becoming a bottleneck in your case, very likely a blocking operation since the CPU is OK; it just has to be discovered.
Cheers, Daniel
Daniel,
Thank you for getting back to me so quickly.
The log level is the default (debug=2). I tried setting cipher_list to NULL on the client and server, and now Kamailio has crashed (producing a core file) during the last seven tests. It seems I can get to about 1000 registered users before this happens. How should I provide a backtrace?
Once I get past this issue I will try your other suggestions.
Thanks again!
Interesting, let's see what is at fault here. For a backtrace:
gdb /path/to/kamailio /path/to/corefile
- for short backtrace:
bt
- for full backtrace:
bt full
Provide both of them.
Cheers, Daniel
Hi Daniel,
I've collected all of them and attached them to this message.
Let me know if there is anything else I can do.
Thanks!
Hello,
can you also send the last log messages (especially the ones right before the crash, printed by the same pid that wrote the core; there should be a 'BUG...' message)?
Looks like a double free in the libssl, what is your libssl version?
Cheers, Daniel
Hi Daniel,
I don't see any relevant lines in the log (surprisingly). The strange part is that there is always at least one kamailio process still running when this happens (or maybe that's not so strange).
This is an up to date CentOS 6 with openssl-1.0.0-20.el6.x86_64 installed. I do get this warning on startup (same on CentOS 5.x):
WARNING: tls [tls_init.c:587]: tls: openssl bug #1491 (crash/mem leaks on low memory) workaround enabled (on low memory tls operations will fail preemptively) with free memory thresholds 271581184 and 135790592 bytes
Hello,
There has to be at least the message printed by:
LOG(L_CRIT, "BUG: qm_free: freeing already freed pointer,"
        " first free: %s: %s(%ld) - aborting\n",
        f->file, f->func, f->line);
Can you grep for it in logs?
The abort in frame #1 of the backtrace is triggered on purpose, at mem/q_malloc.c line 458.
Cheers, Daniel
Hi Daniel,
I'm very sorry for missing this the first time around. There are over 3GB of logs on this machine already (it's only been up for one day).
I've attached the output of grep -i bug kamailio.log | gzip -c > kamailio-bug.log.gz
Thanks again!
Hi,
I'm seeing a similar issue in 3.1.0. How was it fixed?
The back trace from my test is,
#2  0x0819c66c in qm_free (qm=0x7861c000, p=0x798fa2b0, file=0xb6f0b140 "tls: tls_init.c", func=0xb6f0bce0 "ser_free", line=296) at mem/q_malloc.c:462
#3  0xb6eec404 in ser_free (ptr=0x798fa2b0) at tls_init.c:296
#4  0xb72339ba in CRYPTO_free () from /lib/libcrypto.so.1.0.0
#5  0xb72a7f64 in ?? () from /lib/libcrypto.so.1.0.0
#6  0xb72a813e in ERR_get_error () from /lib/libcrypto.so.1.0.0
#7  0xb6f0483e in tls_read_f (c=0x797fedb4, flags=0xbfe236d4) at tls_server.c:1187
#8  0x08171e6e in tcp_read_headers (c=0x797fedb4, read_flags=0xbfe236d4) at tcp_read.c:406
I would appreciate it if someone could point out the reason for the double free.
Thanks Jijo
Hi Daniel,
Please let me know if you are aware of any fix for this issue.
Thanks Jijo