Hi,
Yesterday a rather techy client asked me some interesting questions about Kamailio and RTPEngine performance.
Q1: Say Kamailio has 8 child processes, which means there are 8 threads per listen address to process SIP messages, i.e. Kamailio can process 8 INVITE messages at any instant. If one INVITE takes 200 ms to be processed and sent out to its destination, then we can estimate Kamailio's processing capacity at 40 INVITE requests per second per listen address (8 x 5 per second), right?
Q2: If so, then assuming each INVITE needs to engage RTPEngine, how can we ensure that RTPEngine is capable of handling these 40 INVITE requests? The only relevant parameter I see in the RTPEngine daemon script is "NUM_THREADS" (default value 5). What does this represent? Is it analogous to children in Kamailio?
Q3: How does RTPEngine manage SRTP <=> RTP translation (e.g. if one endpoint is WebRTC and the other is traditional SIP)? My understanding is that RTPEngine has a kernel module (assuming the kernel module is installed) which also handles this conversion besides forwarding the media packets. The Linux kernel already has encoders / decoders for nearly all encryption algorithms, which RTPEngine uses to do the conversion in kernel space, right? If not, then how is it done?
Q4: Continuing from Q2, if these NUM_THREADS process the actual media packets (RTP or SRTP), are they synchronous or asynchronous? I think they are asynchronous, I just want to confirm (otherwise RTPEngine couldn't process more than e.g. 5 calls at a time). So, assuming asynchronous, how many packets can be queued to each thread? This would help in estimating RTPEngine throughput for the various codecs used in calls (e.g. assuming the G.711 codec we have 50 pps; if each thread's queue size is 1000, then each thread can process 1000/50 = 20 concurrent calls and the whole RTPEngine would process 20 x 5 = 100 concurrent calls).
Thank you.
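For reference, the Q1 estimate is plain arithmetic: a worker that needs 200 ms per INVITE completes 5 per second, so 8 workers complete 40 per second. A minimal Python sketch of that upper bound follows. The function and parameter names are illustrative only (they are not Kamailio settings), and it assumes that workers never wait on each other and that every INVITE costs the same.

```python
def max_requests_per_second(workers: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when each worker handles one request at a time."""
    if workers <= 0 or seconds_per_request <= 0:
        raise ValueError("workers and seconds_per_request must be positive")
    # Each worker finishes 1 / seconds_per_request requests every second.
    return workers / seconds_per_request


if __name__ == "__main__":
    # Figures taken from Q1: 8 children, 200 ms per INVITE.
    print(max_requests_per_second(8, 0.2))
```

This is only a ceiling; anything that blocks a worker (DNS, database lookups, waiting for RTPEngine) lowers the real figure.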
On 10/13/2015 05:53 PM, M S wrote:
Q2: If so, then assuming each INVITE needs to engage RTPEngine, how can we ensure that RTPEngine is capable of handling these 40 INVITE requests? The only relevant parameter I see in the RTPEngine daemon script is "NUM_THREADS" (default value 5). What does this represent? Is it analogous to children in Kamailio?
Correct. Kamailio uses separate processes (and shared memory) for multitasking, while rtpengine is a true multithreaded (single-process) application. Each thread handles one request/packet at a time, so as many requests/packets as there are threads can be processed simultaneously. By default rtpengine launches one thread per CPU core present, but this can be overridden.
Q3: How does RTPEngine manage SRTP <=> RTP translation (e.g. if one endpoint is WebRTC and the other is traditional SIP)? My understanding is that RTPEngine has a kernel module (assuming the kernel module is installed) which also handles this conversion besides forwarding the media packets. The Linux kernel already has encoders / decoders for nearly all encryption algorithms, which RTPEngine uses to do the conversion in kernel space, right? If not, then how is it done?
Also correct. The userspace part uses OpenSSL, the kernel part uses the kernel's crypto API. In theory, this should mean that any crypto hardware present is automatically used, but I've never tested this.
Q4: Continuing from Q2, if these NUM_THREADS process the actual media packets (RTP or SRTP), are they synchronous or asynchronous? I think they are asynchronous, I just want to confirm (otherwise RTPEngine couldn't process more than e.g. 5 calls at a time).
If you mean that the threads are multiplexing all open sockets, then yes, of course they are.
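To illustrate what "multiplexing all open sockets" means in general, here is a small Python sketch in which a single thread services several UDP sockets by asking the OS which ones are readable. It is not taken from rtpengine (which is written in C) and says nothing about its internals; the function names and the three loopback sockets are made up for the demonstration.

```python
import selectors
import socket


def serve_once(sockets, expected_packets, timeout=1.0):
    """Read `expected_packets` datagrams from many sockets using one thread."""
    sel = selectors.DefaultSelector()
    for sock in sockets:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)
    received = []
    try:
        while len(received) < expected_packets:
            events = sel.select(timeout)
            if not events:
                break  # nothing arrived in time; stop instead of waiting forever
            for key, _mask in events:
                # Only sockets reported as readable are touched.
                data, _addr = key.fileobj.recvfrom(2048)
                received.append((key.fileobj.getsockname()[1], data))
    finally:
        sel.close()
    return received


def demo():
    # Three local UDP sockets stand in for the media ports of three calls.
    listeners = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(3)]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for sock in listeners:
            sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        for i, sock in enumerate(listeners):
            sender.sendto(f"packet-{i}".encode(), sock.getsockname())
        return serve_once(listeners, expected_packets=len(listeners))
    finally:
        sender.close()
        for sock in listeners:
            sock.close()


if __name__ == "__main__":
    for port, payload in demo():
        print(port, payload)
```

The point is that the number of sockets (and so calls) a thread can serve is not tied to the number of threads; a thread is only busy while a packet is actually being handled.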
So, assuming asynchronous, how many packets can be queued to each thread? This would help in estimating RTPEngine throughput for the various codecs used in calls (e.g. assuming the G.711 codec we have 50 pps; if each thread's queue size is 1000, then each thread can process 1000/50 = 20 concurrent calls and the whole RTPEngine would process 20 x 5 = 100 concurrent calls).
It doesn't quite work like that. The only existing queue is provided by the OS, and it's only used if the incoming requests/packets can't be processed fast enough. That normally only happens when there isn't enough CPU time available, and then you start having serious problems (packet jitter/latency/drops, dropped calls etc). Otherwise, during normal operation, the processing time for a single packet is much less than the interval between packets for any given codec (20 ms at 50 pps), and so a lot more calls can be handled. The last number I've heard from a customer was 12k simultaneous calls without any noticeable performance impact (at which point they started running out of ports and couldn't increase the number any more).
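The reasoning above can be turned into a rough CPU budget instead of a queue-size calculation: what matters is how long one packet takes to process compared with how often packets arrive. The Python sketch below is an illustration under loud assumptions: the 10 microseconds per packet is a made-up placeholder rather than a measurement of rtpengine, each call is assumed to carry two 50 pps streams (one per direction), and the function name is hypothetical.

```python
def max_concurrent_calls(threads: int,
                         microseconds_per_packet: int,
                         pps_per_stream: int = 50,
                         streams_per_call: int = 2) -> int:
    """Rough ceiling on calls when CPU time per packet is the only limit."""
    if min(threads, microseconds_per_packet, pps_per_stream, streams_per_call) <= 0:
        raise ValueError("all arguments must be positive")
    # Packets per second that all threads together can process.
    packets_per_second = threads * 1_000_000 // microseconds_per_packet
    # Packets per second that a single call generates.
    packets_per_call = pps_per_stream * streams_per_call
    return packets_per_second // packets_per_call


if __name__ == "__main__":
    # ASSUMPTION: 10 microseconds per packet is a placeholder, not a benchmark.
    print(max_concurrent_calls(threads=1, microseconds_per_packet=10))
```

Measure the real per-packet cost on your own hardware before relying on any such number; kernel-space forwarding and SRTP will both change it.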
Cheers
Thanks, this is quite useful info indeed.
Have a great weekend.
On Fri, Oct 16, 2015 at 4:55 PM, Richard Fuchs rfuchs@sipwise.com wrote:
sr-dev mailing list sr-dev@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
On 10/16/2015 10:55 AM, Richard Fuchs wrote:
The last number I've heard from a customer was 12k simultaneous calls without any noticeable performance impact (at which point they started running out of ports and couldn't increase the number any more).
Any idea what the hardware and number of CPU cores was here?
Hi Alex, this is the output of "lscpu" on that machine:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               1205.566
BogoMIPS:              4999.92
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-11
I'd also like to correct the number reported by Richard to 16382 simultaneous calls.
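As a side note on "running out of ports", the port demand of a given call count is easy to estimate. The Python sketch below ASSUMES that every call uses two legs and that each leg takes two UDP ports (RTP plus RTCP, no rtcp-mux) on a single address; the thread does not say how that machine was configured, so treat this as one possible reading and not as an explanation of the reported figure. The names are illustrative.

```python
UDP_PORT_SPACE = 2 ** 16  # a UDP port number is a 16-bit value


def ports_needed(calls: int, legs_per_call: int = 2, ports_per_leg: int = 2) -> int:
    """UDP ports consumed by `calls` calls under the stated assumptions."""
    if min(calls, legs_per_call, ports_per_leg) < 0:
        raise ValueError("arguments must not be negative")
    return calls * legs_per_call * ports_per_leg


if __name__ == "__main__":
    needed = ports_needed(16382)
    print(needed, "of", UDP_PORT_SPACE, "possible port numbers")
```

Under these assumptions, 16382 calls would need 65528 ports, which is just under the 65536 values a UDP port number can take.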
Regards, Paul
On 18.10.2015 at 06:12, Alex Balashov wrote: