[Serusers] Re: Fw: [Users] TM : retransmission timers

Mon Nov 27 17:35:29 CET 2006

Hi Klaus,

As the discussion about ser's improvements has bounced over to openser 
again, I had to do a bit of digging to give openser's users a correct 
answer.

Yes, Andrei made some improvements to TM, mainly in the timer 
implementation. As you were wondering, SER 0.9.6 should be relatively 
close to openser 0.9.4 in TM performance and, after merging Vaclav's 
results with Andrei's tests from before the timer improvement in 
SER 0.9.6, this seems to be correct. See:
    http://lists.iptel.org/pipermail/serdev/2005-December/006583.html

After this improvement, SER 0.9.6's performance increased dramatically, 
but unfortunately, according to our tests, its behaviour is also 
dramatically wrong: TM timers do not work correctly when variable 
timeouts are used in SER.

With the improvement, the following scenario breaks (3 calls only, 
no load, default cfg):

CALL 1: has 60 secs timeout for Final_response_timeout - nobody answers, 
still ringing
in less than 2 secs ->
CALL 2: has 70 secs timeout  - it is immediately answered.
in less than 2 secs ->
CALL 3: has 10 secs timeout for Final_response_timeout - nobody answers, 
ringing.

Of course, everybody expects CALL3 to time out before CALL1 (by more 
than 40 secs), but in SER 0.9.6 (the latest stable for the last 2 
years) this does not happen: CALL3 and CALL1 both time out 
simultaneously when the CALL3 timer hits.

It is a simple test that anybody can easily reproduce.
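
For illustration, here is a minimal Python sketch of the ordering one 
would expect in the scenario above. It is not SER code and says nothing 
about how the TM timer lists are implemented; it only computes an 
absolute expiry time per call. The start offsets (0, 1 and 2 secs) and 
the names expiry_order and calls are assumptions made up for this 
example.

```python
import heapq

def expiry_order(calls):
    """Return (name, expiry_time) pairs in the order the timers should fire.

    calls: list of (name, start_s, timeout_s, answered) tuples.
    Each unanswered call gets its own absolute expiry = start + timeout.
    """
    heap = []
    for name, start, timeout, answered in calls:
        if answered:
            # An answered call no longer waits for a final response,
            # so its timer is treated as cancelled here.
            continue
        heapq.heappush(heap, (start + timeout, name))

    fired = []
    while heap:
        when, name = heapq.heappop(heap)  # earliest expiry first
        fired.append((name, when))
    return fired

# Hypothetical start offsets: the calls arrive less than 2 secs apart.
calls = [
    ("CALL1", 0, 60, False),  # 60 secs timeout, still ringing
    ("CALL2", 1, 70, True),   # 70 secs timeout, answered immediately
    ("CALL3", 2, 10, False),  # 10 secs timeout, still ringing
]

for name, when in expiry_order(calls):
    print(f"{name} times out at t={when}s")
```

Under these assumed offsets CALL3 should fire at t=12s and CALL1 at 
t=60s, i.e. 48 secs apart, whereas SER 0.9.6 fires both together when 
the CALL3 timer hits.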

A lot of people are saying that OpenSER is a less stable, but dynamic 
version of SER. The results here say otherwise.

"Performance" should not come at the cost of "stability", I would say.

This bug was not in the devel tree, but it has been in the current 
SER 0.9.6 stable version for ~1 year.

In this case I would say it is not relevant how many CPS you have if you 
cannot handle them correctly.

regards,
Bogdan

Klaus Darilion wrote:

> This leads to one question:
>
> Are there improvements to ser's stable branch since the fork, or is it 
> a degradation in openser?
>
> regards
> klaus
>
> Vaclav Kubart wrote:
>
>> I'm sorry to nip in, but I reran the tests, adding more info to the
>> output as requested and including stable ser and CVS openser.
>>
>> I know that this test doesn't correspond much to real life (for example
>> the generated callid/branch/tags differ only by a number, etc.) but it
>> can give at least a picture of simple stateful forwarding.
>>
>> So, if anybody is interested:
>>
>> http://www.iptel.org/~vku/performance/tm.serXopenser.correct/
>>
>> I tried the same once more with fewer iterations because there were some
>> errors in the log from openser about low memory (I used -m to specify
>> the shared mem size, but with 768M it still reported errors; might it be
>> a memleak, or did I do anything wrong?). With 1M iterations it ran
>> without errors:
>>
>> http://www.iptel.org/~vku/performance/tm.serXopenser.1M/
>>
>>     Vaclav
>>
>> P.S. I forgot: SIPP was "Sipp v1.1, version 20060829, built Sep
>> 5 2006, 15:07:25". I'm attaching the simple patch which I used.
>>
>> On Wed, Nov 22, 2006 at 12:48:12AM +0200, Daniel-Constantin Mierla 
>> wrote:
>>
>>> I love such "independent" and "very very useful" tests ... one 
>>> selects the versions he likes, the latest development of ser against 
>>> the latest stable version of openser, and the details about the 
>>> testing scenarios are pretty limited. However these details are 
>>> very very insignificant, really.
>>>
>>> What matters is this particular case: what you tested is useless, and 
>>> someone could better implement a tiny kernel module to perform the 
>>> same job much faster, which would get openser/ser trashed instantly 
>>> if that were their only usage. More important is the performance in 
>>> real world cases. I am not going to do comparison tests and reveal 
>>> numbers, I will let you do that and hope you make the results 
>>> available.
>>>
>>> I will exemplify with just two common use cases:
>>> A) ITSP where usrloc is required - to get the throughput from your 
>>> tests one needs to have over a million online users. Let me know 
>>> how SER is doing with loading them, I can bet that it takes several 
>>> minutes to start (so service down for a significant time) and a long 
>>> time to look up a record afterwards, do not forget to mention the 
>>> required memory. Then we will see if the forwarding throughput is 
>>> the bottleneck.
>>> B) carrier - heavy accounting needed - take the latest cvs snapshots 
>>> and test them, look at flexibility at the same time and see if the 
>>> balance of throughput and features is satisfactory. Do not forget 
>>> that the database behind it should be redundant for reliable 
>>> accounting storage.
>>>
>>> My conclusion and the point I wanted to underline is that forwarding 
>>> is by far not the bottleneck in real-world deployments so far -- 
>>> or at least nobody has reported it on the openser mailing lists. 
>>> Once it is, for sure there will be effort and focus to optimize it. 
>>> I don't even bother to check the scenarios, environment and test 
>>> results you had, because it makes no sense today.
>>>
>>> It is more important to look at the results given, for example, here 
>>> by an independent party:
>>> http://openser.org/pipermail/users/2006-November/007777.html
>>>
>>> With a real config and clustering system the performance of a box 
>>> was 300 calls per second -- with at least 5 database accesses!!! 
>>> If you need double that, you can add one more box, without extra 
>>> configuration overhead, just plug and play. And that is the stable 
>>> version of OpenSER since July this year (btw, for those who keep 
>>> saying that OpenSER does not focus on stability, just check the CVS 
>>> and see the number of bugs encountered with this release, maybe you 
>>> will change your opinion), and you can have a safe, geographically 
>>> distributed environment where each box can take over the traffic 
>>> from the others on the fly. With a single box crashing for any of 
>>> several independent reasons (hardware failure, power outages ...) 
>>> you get no service ... with three boxes you can serve a huge 
>>> number of active subscribers in peak hours and have failover 
>>> support, so service availability is 100%. I am sure most people 
>>> are now looking at how to build reliable platforms that scale very 
>>> easily and can be distributed around the world, with a bunch of 
>>> useful features -- simple first line replacement is not the 
>>> business case for VoIP anymore.
>>>
>>> At OpenSER we didn't try to get an airplane when we have to drive 
>>> city streets, we aimed for a feature-rich and reliable application 
>>> for its use cases. I would propose focusing on making one's own 
>>> applications better rather than trying to show the other one is 
>>> worse.
>>>
>>> Cheers,
>>> Daniel
>>>
>>> PS. You can use stateless forwarding to get even better results, the 
>>> usefulness will be the same.
>>>