Iñaki Baz Castillo wrote:
Very interesting. Let me ask a question:
You are setting "=1" for all of the rtpproxy nodes. This means that when you call force_rtpproxy() it will choose one of them randomly.
If an initial INVITE-200 is handled by RtpProxy A, and later a re-INVITE arrives, how do you ensure that force_rtpproxy() contacts the same RtpProxy A during the re-INVITE transaction?
First off, the default isn't random. A higher weight increases the chance that calls are assigned to a given rtpproxy, but the choice for a given call is not random. When you run multiple instances, you will quickly notice that some rtpproxy processes burn more CPU than others. That can happen for a variety of reasons, including some instances getting more calls than others, because the distribution across rtpproxy instances isn't perfectly even, even if you ask for it to be. You'll see why in a moment.
Now, I take it you are asking about the case where the INVITE comes in from the calling party, the 183 or 200 later comes back from the called party, and a force_rtp_proxy() is performed for each. How do you get both directions of the call (or a change after the 200 OK) to go to the same rtpproxy instance, even though nathelper maintains no call state? You have three options.
One, run with only one rtpproxy, which seems to be the most common choice. With no choices you have no worries, but you are very limited in how many calls you can handle.
Two, force_rtp_proxy() offers the primitive capability of using the "Nn" flag to literally hard-code a specific rtpproxy instance, without regard to any other factors. This is only vaguely useful, because to get a hand-balanced distribution you have to partition all calls by some criterion you pick yourself (calling source IP, account credentials, destination IP, last digit of the called number, etc.). If you have six tiny call sources and one huge one, that doesn't help you much, unless the last-digit trick works for you.
The limitations implicit in having the script specify the proxy via the Nn flag, combined with what I consider to be the most stupid/horrible missing bit of functionality in SER (not being able to pass variables to functions/modules*), greatly limit what smart things you can do to scale SER+rtpproxy from the ser.cfg file.
*I note that OpenSER supposedly allows you to pass variables to functions (at least, reading their lexical scanner, the rules to allow this seem to have been added), but I don't know whether it actually works. I would certainly use it for a large number of other things if it were available.
So choice One limits your ability to scale, and choice Two potentially means a horribly complex ser.cfg file. Fortunately, there is choice Three.
Three, SER quietly (and possibly inadvertently) takes care of this issue by using the Call-ID as the value that selects the rtpproxy. The Call-ID string is ground up by a hashing algorithm (see select_rtpp_node() in modules/nathelper/nathelper.c), which reduces it to a small number; that number, taken together with the total of the weights configured at start time, and skipping any disabled proxies (presumably ones that became unresponsive in the past), dictates which rtpproxy that call will be sent to.

So the distribution is not at all random, nor is it the traditional orderly ascending/descending assignment you might find in circuit assignment for TDM or CAS trunk groups, which is what I assumed the behavior would be when I first read what little documentation there was on the weighting system. How evenly calls spread across equally weighted rtpproxy instances still depends on how good the hashing algorithm is and on a good mix of incoming Call-IDs. In practice it will usually be somewhat off-balance, favoring one proxy over the others at any given time. Just make sure you have enough CPU capacity that any one rtpproxy instance has room to run a little hot. If you are using an OS that allows you to lock processes to specific CPUs, don't use that feature on rtpproxy unless you are giving each rtpproxy 100% of its own CPU.
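To make that concrete, here is a simplified sketch of the selection logic as I read it. The structure, names, and details below are my paraphrase for illustration, not the module source, so consult select_rtpp_node() itself for the real thing:

    /* Paraphrased sketch of the select_rtpp_node() logic; illustrative only. */
    struct rtpp_node {
        struct rtpp_node *next;
        unsigned int      weight;    /* the "=N" weight from the module parameter */
        int               disabled;  /* set when the proxy stops responding */
    };

    static struct rtpp_node *
    pick_rtpp_node(struct rtpp_node *list, const char *callid, int callid_len)
    {
        unsigned int sum, weight_sum;
        struct rtpp_node *node;

        /* Quick-and-dirty hash: sum the bytes of the Call-ID. */
        for (sum = 0; callid_len > 0; callid_len--)
            sum += (unsigned char)callid[callid_len - 1];

        /* Total the weights of the nodes still in service. */
        weight_sum = 0;
        for (node = list; node != NULL; node = node->next)
            if (!node->disabled)
                weight_sum += node->weight;
        if (weight_sum == 0)
            return NULL;                 /* every proxy is disabled */

        /* Map the hash into the weight space and walk the list; each
         * node "owns" a slice of that space equal to its weight. */
        sum %= weight_sum;
        for (node = list; node != NULL; node = node->next) {
            if (node->disabled)
                continue;
            if (sum < node->weight)
                return node;
            sum -= node->weight;
        }
        return NULL;                     /* not reached */
    }

Each node effectively owns a slice of the 0..weight_sum space proportional to its weight; that is all the weighting system really is.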
So each time force_rtp_proxy() or unforce_rtp_proxy() is called, this same hash gets performed on the same Call-ID for that call, and except for the rare case where a proxy has failed in the meantime, you end up sending the force/unforce for a given Call-ID to the same rtpproxy instance every time. At least, that is how I read the source code.
I'll point out that if the rtpproxy selected by the first force_rtp_proxy() of a given call session had simply been recorded somewhere with the other trivia that is maintained for the duration of the session, nathelper wouldn't have to burn cycles recomputing the hash as many as three additional times for a typical call (two more force_rtp_proxy() calls for the 183 and 200 responses, then an unforce_rtp_proxy() to tear things down), but that's the limited behavior that exists in there today. Something like the sketch below would do it.
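Purely to illustrate that point (nathelper has nothing like this today), caching the first selection might look like the following, reusing the hypothetical pick_rtpp_node() from the sketch above:

    /* Hypothetical only -- nathelper does NOT do this today.  If the node
     * chosen by the first force_rtp_proxy() were stashed with the rest of
     * the per-call state, the later force/unforce operations could reuse
     * it instead of rehashing the Call-ID. */
    struct call_state {
        /* ... the other per-call trivia ... */
        struct rtpp_node *rtpp;   /* NULL until the first force_rtp_proxy() */
    };

    static struct rtpp_node *
    cached_pick(struct call_state *cs, struct rtpp_node *list,
                const char *callid, int callid_len)
    {
        if (cs->rtpp == NULL)                /* first force: hash once */
            cs->rtpp = pick_rtpp_node(list, callid, callid_len);
        return cs->rtpp;                     /* later calls: no rehash */
    }

One hash at call setup, none for the 183/200 re-forces or the final unforce.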