[Serusers] How rest of call is steered to same rtpproxy instance (was Re: Running multiple instances of rtpproxy...)

Wed Apr 1 07:56:36 CEST 2009

Iñaki Baz Castillo wrote:
 > Very interesting. Let me [ask] a question:
 >
 > You are setting "=1" for all the RtpProxy nodes. This means that
 > when you call "force_rtpproxy()" it will choose one of them randomly.
 >
 > In case an initial INVITE-200 is handled by RtpProxy A, and later a
 > re-INVITE arrives, how do you make "force_rtpproxy()" contact the
 > same RtpProxy A during the re-INVITE transaction?
 >

First off, the default isn't random.  A higher weight increases the
likelihood that calls will be assigned to a given rtpproxy, but the
choice for any given call is not random.  When you run multiple
instances, you will quickly notice that some rtpproxy processes burn
more CPU than others.  That can happen for a variety of reasons,
including some instances getting more calls than others, because the
distribution across rtpproxy instances isn't perfectly even, even if
you ask for it to be.  You'll see why that happens in a moment.

Now, I take it you are asking about the case where the INVITE comes
in from the calling party, later the 183 or 200 comes back from the
called party, and force_rtp_proxy() is called for each of these.  How
do you get both directions of the call, and anything that changes
after the 200 OK (such as a re-INVITE), to go to the same rtpproxy
instance, even though nathelper maintains no call state?  You have
three options.

One, run with only one rtpproxy, which seems to be the most common
choice.  With no choices you have no worries, but you are very
limited in how many calls you can handle.

Two, force_rtp_proxy() offers the primitive capability of using the
"Nn" flag to literally hard-code a specific rtpproxy instance to go
to, without regard to any other factors.  This is only vaguely useful,
because it means you have to pin all the calls matching a given call
condition (based on some criterion: calling source IP, account
credentials, destination IP, last digit of the called number, etc.)
to a specific instance to get a hand-balanced distribution.  If you
have six tiny call sources and one huge one, it doesn't help you
much, unless the last-digit trick works for you.
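To make the last-digit trick concrete, here is a minimal Python sketch of the kind of mapping a script would have to encode by hand before hard-coding the result with the Nn flag.  The function name, the 0-based proxy index, and the fallback for digit-less input are assumptions for illustration only; none of this is nathelper code or ser.cfg syntax.

```python
from collections import Counter


def proxy_for_called_number(called_number: str, proxy_count: int) -> int:
    """Hypothetical 'last digit' rule: map a called number to a fixed
    rtpproxy index (0-based in this sketch) that a script would then
    hard-code via the Nn flag."""
    digits = [ch for ch in called_number if ch.isdigit()]
    if not digits:
        return 0  # arbitrary fallback for input with no digits
    return int(digits[-1]) % proxy_count


if __name__ == "__main__":
    # The same number always maps to the same index, so every
    # force/unforce for the call agrees, assuming the script can
    # recover the same called number each time.
    print(proxy_for_called_number("+1 555 010 4327", 3))
    # Ten digits do not split evenly over three proxies: index 0
    # owns digits 0, 3, 6, 9 and the other two own three digits each.
    print(Counter(proxy_for_called_number(str(d), 3) for d in range(10)))
```

Even if the last digits of your traffic were perfectly uniform, the split is only even when the number of proxies divides ten, which is part of why this approach is hand-balancing rather than real load balancing.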

The limitations implicit in having the script specify the proxy via
the Nn flag, combined with what I consider to be the most
stupid/horrible missing bit of functionality in SER (not being able
to pass variables to functions/modules*), greatly limit the smart
things you can do to scale SER+rtpproxy if you do it via the ser.cfg
file.

*I note OpenSER supposedly allows you to pass variables to functions
 (at least, their lexical scanner appears to have added the rules to
 allow this, but I don't know whether it actually works.  I would
 certainly use it for a large number of other things if it were
 available.)

So choice One limits your ability to scale, and choice Two potentially
means a horribly complex ser.cfg file.  Fortunately, there is
choice Three.

Three, SER quietly (and possibly inadvertently) takes care of this
issue by using the Call-ID as the value that selects the rtpproxy.
The Call-ID string is ground up by a hashing algorithm (see
select_rtpp_node() in modules/nathelper/nathelper.c) into a value
between 0 and N.  That value, combined with the total weight of the
proxies configured at start time, skipping any disabled proxies
(presumably because they became unresponsive in the past), dictates
which rtpproxy the call will be sent to.  So the distribution is not
at all random, nor is it a traditional ascending/descending orderly
assignment to rtpproxy instances like one might find in circuit
assignments in TDM or CAS trunk groups, which is what I thought the
behavior would be when I first read what little documentation there
was on how the weighting system worked.

Even distribution of calls across rtpproxy instances with equal
weights still depends on how good the hashing algorithm is and on
having a good mix of incoming Call-IDs.  In practice, it will usually
be somewhat off balance, favoring one proxy over the others at any
given time.  Just make sure you have enough CPU capacity that any
rtpproxy instance has room to run a little hot.  If you are using an
OS that allows you to lock processes to specific CPUs, don't use that
feature on rtpproxy unless you are giving each rtpproxy 100% of its
own CPU.
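Here is a minimal Python sketch of the selection scheme as described above: hash the Call-ID, reduce the result modulo the total weight of the enabled proxies, then walk the list subtracting each proxy's weight until the value falls inside one.  The byte-sum hash folded to 0..255, the RtppNode type, and the node names are assumptions for illustration, not the actual code in nathelper.c; check select_rtpp_node() for the real hash.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RtppNode:
    name: str               # hypothetical label for one rtpproxy instance
    weight: int             # weight given at start time
    disabled: bool = False  # set when the proxy stopped answering


def callid_hash(call_id: str) -> int:
    # ASSUMPTION: a cheap byte-sum hash folded to 0..255 stands in for
    # the real hash in select_rtpp_node(); only determinism matters here.
    return sum(call_id.encode()) & 0xFF


def select_node(call_id: str, nodes: List[RtppNode]) -> Optional[RtppNode]:
    """Map a Call-ID onto one enabled node, in proportion to weight."""
    enabled = [n for n in nodes if not n.disabled]  # skip disabled proxies
    weight_sum = sum(n.weight for n in enabled)
    if weight_sum == 0:
        return None  # no usable rtpproxy at all
    point = callid_hash(call_id) % weight_sum  # 0 .. weight_sum - 1
    for node in enabled:
        if point < node.weight:
            return node
        point -= node.weight
    return None  # not reached while all weights are positive


if __name__ == "__main__":
    nodes = [RtppNode("rtpp-a", 1), RtppNode("rtpp-b", 1), RtppNode("rtpp-c", 1)]
    # Synthetic Call-IDs; real traffic mixes formats from many user agents.
    counts = Counter(select_node(f"{i}@10.0.0.1", nodes).name for i in range(3000))
    print(counts)
```

Feeding the sketch a different mix of Call-IDs and comparing the per-node counts is a cheap way to see how much the balance depends on both the hash and the incoming Call-IDs, even with equal weights.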

So each time force_rtp_proxy() or unforce_rtp_proxy() is called, the
same hash is computed on the same Call-ID for a given call, and except
in rare cases where a proxy has failed, the force/unforce for a given
Call-ID will go to the same rtpproxy instance every time.  At least,
this is how I read the source code.
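The consequence can be checked with a compact, self-contained Python sketch, again using an assumed stand-in hash and hypothetical node names rather than nathelper's code: three lookups for one Call-ID (offer, answer, teardown) land on the same node, and marking that node disabled sends the next lookup somewhere else.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RtppNode:
    name: str               # hypothetical label
    weight: int
    disabled: bool = False


def pick(call_id: str, nodes: List[RtppNode]) -> Optional[RtppNode]:
    # ASSUMPTION: stand-in byte-sum hash, not nathelper's actual code.
    enabled = [n for n in nodes if not n.disabled]
    weight_sum = sum(n.weight for n in enabled)
    if weight_sum == 0:
        return None
    point = (sum(call_id.encode()) & 0xFF) % weight_sum
    for node in enabled:
        if point < node.weight:
            return node
        point -= node.weight
    return None


if __name__ == "__main__":
    nodes = [RtppNode("rtpp-a", 1), RtppNode("rtpp-b", 1), RtppNode("rtpp-c", 1)]
    call_id = "a84b4c76e66710@pc33.example.com"
    offer = pick(call_id, nodes)     # force on the INVITE
    answer = pick(call_id, nodes)    # force on the 183/200
    teardown = pick(call_id, nodes)  # unforce on the BYE
    print(offer.name, answer.name, teardown.name)
    # The rare failure case: the chosen proxy is marked disabled mid-call,
    # so the next lookup for this Call-ID goes to a node with no session.
    offer.disabled = True
    print(offer.name, "->", pick(call_id, nodes).name)
```

In this sketch, weight_sum shrinks whenever any node is disabled, so calls sitting on healthy nodes can be remapped too, not only those on the failed one.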

I'll point out that if the rtpproxy chosen by the first
force_rtp_proxy() of a given call session had simply been recorded as
an integer somewhere, along with the other trivia maintained for the
duration of the call session, nathelper wouldn't have to burn cycles
and time recomputing the hash as many as three additional times for
the typical call (two more force_rtp_proxy() calls for the 183 and 200
responses, then an unforce_rtp_proxy() to tear things down).  But
that's the limited behavior that exists in there today.
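The alternative suggested above, recording the first choice with the rest of the per-call state, might look like this minimal Python sketch.  CallProxyTable and its methods are hypothetical, and the selector passed in stands for whatever hash-and-weight logic picks the node; the point is only that the hash runs once per call instead of on every force/unforce.

```python
from typing import Callable, Dict, Optional


class CallProxyTable:
    """Hypothetical per-call memory of the rtpproxy index chosen on the
    first force; later forces and the final unforce reuse it."""

    def __init__(self, select: Callable[[str], int]):
        self._select = select               # the hash-based selection
        self._by_call: Dict[str, int] = {}  # Call-ID -> proxy index
        self.selections = 0                 # how many times the hash ran

    def force(self, call_id: str) -> int:
        if call_id not in self._by_call:
            self.selections += 1
            self._by_call[call_id] = self._select(call_id)
        return self._by_call[call_id]

    def unforce(self, call_id: str) -> Optional[int]:
        # Tear down: return the recorded index and forget the call.
        return self._by_call.pop(call_id, None)


if __name__ == "__main__":
    # ASSUMPTION: stand-in selector spreading calls over three proxies.
    table = CallProxyTable(lambda cid: (sum(cid.encode()) & 0xFF) % 3)
    cid = "a84b4c76e66710@pc33.example.com"
    idx = table.force(cid)  # INVITE
    table.force(cid)        # 183
    table.force(cid)        # 200 OK
    table.unforce(cid)      # BYE
    print("proxy index", idx, "hash runs:", table.selections)
```

The trade-off is that this is exactly the kind of per-call state nathelper avoids keeping: entries must be removed on teardown, and calls that end without an unforce would leave stale entries behind unless something expires them.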