[sr-dev] Kamailio vs RTPEngine Performance

Fri Oct 16 16:55:07 CEST 2015

On 10/13/2015 05:53 PM, M S wrote:

> Q2:
> If so, then assuming each INVITE requires to engage RTPEngine. How can
> we ensure that RTPEngine would be capable to handle these 40 INVITE
> requests? The only relevant parameter I see in RTPEngine daemon script
> is "NUM_THREADS" (default value 5). What does this represent? Does this
> analogs to children as in Kamailio?

Correct. Kamailio uses separate processes (and shared memory) for
multitasking, while rtpengine is a truly multithreaded (single-process)
application. Each thread handles one request or packet at a time, so N
threads can process N of them simultaneously. By default, one thread is
launched per CPU core present, but this can be overridden.

> Q3:
> How does RTPEngine manages SRTP <=> RTP translation (e.g. if one
> endpoint is WebRTC and other is traditional SIP)? My understanding is
> that RTPEngine has a kernel module (assuming kernel module is installed)
> which also manages this conversion besides forwarding the media packets.
> The Linux kernel already has encoders / decoders for nearly all
> encryption algorithms which are utilized by RTPEngine for doing the
> conversion in kernel space, right? if not then how it is done?

Also correct. The userspace part uses OpenSSL, the kernel part uses the 
kernel's crypto API. In theory, this should mean that any crypto 
hardware present is automatically used, but I've never tested this.

> Q4:
> Continuing to Q2, If these NUM_THREADS process actual media packets (RTP
> or SRTP) then are these synchronous or asynchronous? I think these are
> asynchronous, just want to confirm (otherwise RTPEngine won't process
> more then e.g. 5 calls at a time).

If you mean that the threads are multiplexing all open sockets, then 
yes, of course they are.
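
To illustrate what multiplexing means here, below is a minimal Python
sketch. It is not rtpengine's code (rtpengine is written in C), and the
helper names open_udp_sockets, poll_once and demo are made up for this
example. One loop waits on several UDP sockets at once and reads from
whichever ones have a packet ready, so the number of sockets is not tied
to the number of threads.

```python
import selectors
import socket


def open_udp_sockets(count):
    """Bind `count` non-blocking UDP sockets on loopback, OS-assigned ports."""
    socks = []
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", 0))
        s.setblocking(False)
        socks.append(s)
    return socks


def poll_once(selector, timeout=1.0):
    """Wait for readable sockets and read one packet from each ready one."""
    received = []
    for key, _events in selector.select(timeout):
        data, _addr = key.fileobj.recvfrom(2048)
        received.append((key.fileobj.getsockname()[1], data))
    return received


def demo():
    """Send one packet to each of three sockets and collect them via one loop."""
    socks = open_udp_sockets(3)
    ports = [s.getsockname()[1] for s in socks]
    sel = selectors.DefaultSelector()
    for s in socks:
        sel.register(s, selectors.EVENT_READ)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i, port in enumerate(ports):
        sender.sendto(b"pkt-%d" % i, ("127.0.0.1", port))

    got = {}
    while len(got) < len(socks):
        batch = poll_once(sel)
        if not batch:  # timed out; stop rather than loop forever
            break
        for port, data in batch:
            got[port] = data

    sel.close()
    sender.close()
    for s in socks:
        s.close()
    return [got.get(port) for port in ports]


if __name__ == "__main__":
    print(demo())
```

In a multithreaded daemon, each worker thread would run a loop of this
kind over the shared set of sockets, which is why five threads are not
limited to five calls.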

> So, assuming asynchronous how many
> packets can be queue to each thread? This would help estimating
> RTPEngine throughput using various codecs in calls. (e.g. assuming G.711
> codec we have 50pps, if each thread queue size is 1000 then each thread
> can process 1000/50=20 concurrent calls and whole RTPEngine would
> process 20x5=100 concurrent calls).

It doesn't quite work like that. The only existing queue is provided by
the OS, and it's only used if incoming requests/packets can't be
processed fast enough. That normally only happens when there isn't
enough CPU time available, at which point you start having serious
problems (packet jitter/latency/drops, dropped calls etc). Otherwise,
during normal operation, the processing time for a single packet is much
less than the interval between packets for any given codec (20 ms at
50 pps), and so a lot more calls can be handled. The last number I've
heard from a customer was 12k simultaneous calls without any noticeable
performance impact (at which point they started running out of ports and
couldn't increase the number any more).
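
As a back-of-the-envelope check of the reasoning above, here is a small
Python sketch. Every input is an assumption chosen for illustration, not
a measurement: 10 microseconds of CPU time per packet, two packet streams
per call (one per direction), four threads, four UDP ports per call (RTP
plus RTCP on each of two call legs) and a port range of 1024-65535. The
function names are made up for this example.

```python
def packet_interval_ms(pps):
    """Time between consecutive packets of one stream, in milliseconds."""
    return 1000.0 / pps


def calls_by_cpu(per_packet_us, pps, streams_per_call, threads):
    """Calls that fit into the CPU budget, ignoring all other overhead."""
    per_thread_pps = 1_000_000 / per_packet_us  # packets one thread handles per second
    return int(threads * per_thread_pps / (pps * streams_per_call))


def calls_by_ports(port_min, port_max, ports_per_call):
    """Calls that fit into the available UDP port range."""
    return (port_max - port_min + 1) // ports_per_call


if __name__ == "__main__":
    # G.711 with 20 ms packetization sends 50 packets per second.
    print(packet_interval_ms(50))        # 20.0 ms between packets
    print(calls_by_cpu(10, 50, 2, 4))    # 4000 calls under the assumed CPU cost
    print(calls_by_ports(1024, 65535, 4))  # 16128 calls under the assumed port usage
```

With these assumed inputs, the number of calls is bounded by CPU time and
by available ports, not by any per-thread queue length. Real per-packet
cost depends on the hardware and on whether the kernel module is
forwarding the packets, so the numbers should be measured, not taken from
this sketch.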

Cheers