Hi,

 

We have found the root cause of the problem reported in the async module (see the mail below for details).

Below is a brief description:

 

-          The config calls async_route("RESUME", "1").

-          At time t, async records are stored in slot ‘t+1’ of the async list.

-          Every second, the async records in the current slot of the async list are processed.

-          In function ‘async_timer_exec’, the slot is computed as slot = ticks % ASYNC_RING_SIZE. Because the slot is derived from the ticks, if the previous invocation of ‘async_timer_exec’ does not finish in time, the next tick is missed and the subsequent call to ‘async_timer_exec’ arrives with t+2. Slot t+1 is then skipped and is processed only on the next cycle around the ring. The async records in that slot therefore sit for roughly another 100 seconds before they are actually processed. By then the TM module has dropped those timed-out transactions, hence the error “t_continue: transaction not found”.
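
As a rough illustration of the skipped-slot effect described above (a sketch only, not the actual module code): ASYNC_RING_SIZE is assumed to be 100, inferred from the 100-second delay, and slot_from_ticks() and stranded_records() are hypothetical helper names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, inferred from the 100-second delay described above. */
#define ASYNC_RING_SIZE 100

/* Tick-based slot selection: slot = ticks % ASYNC_RING_SIZE. */
static unsigned int slot_from_ticks(unsigned int ticks)
{
    return ticks % ASYNC_RING_SIZE;
}

/*
 * Hypothetical helper: given the tick values actually delivered to the
 * timer callback (strictly increasing) and a constant load of `cps`
 * records stored per second, count the records left in skipped slots.
 */
static unsigned int stranded_records(const unsigned int *ticks, size_t n,
                                     unsigned int cps)
{
    unsigned int skipped_slots = 0;

    for (size_t i = 1; i < n; i++) {
        assert(ticks[i] > ticks[i - 1]);
        /* A jump of k ticks leaves k - 1 slots unprocessed on this pass. */
        skipped_slots += ticks[i] - ticks[i - 1] - 1;
    }
    return skipped_slots * cps;
}
```

For example, if the timer sees ticks 10, 11, 13, 14 at 20 cps, one slot (12) is skipped and stranded_records() returns 20. That is consistent with the earlier observation that the number of failed calls equals the cps being run.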

 

We have now modified the code so that the slot is incremented sequentially, irrespective of the ticks value passed to ‘async_timer_exec’. With this change we no longer see any call failures, and all our load runs are successful.
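
A minimal sketch of the sequential-slot idea, under stated assumptions: ring_cursor_t and next_slot_sequential() are hypothetical names for illustration, ASYNC_RING_SIZE is assumed to be 100, and this is not the actual patch.

```c
#include <assert.h>

/* Assumed value, inferred from the 100-second delay described above. */
#define ASYNC_RING_SIZE 100

/* Hypothetical state: the next ring slot the timer should process. */
typedef struct ring_cursor {
    unsigned int next_slot;
} ring_cursor_t;

/*
 * Return the slot to process on this invocation and advance the cursor
 * by exactly one, wrapping at the end of the ring. The ticks argument is
 * deliberately ignored, so a missed tick can no longer skip a slot.
 */
static unsigned int next_slot_sequential(ring_cursor_t *cursor,
                                         unsigned int ticks)
{
    unsigned int slot = cursor->next_slot;

    (void)ticks; /* unused on purpose */
    assert(slot < ASYNC_RING_SIZE);
    cursor->next_slot = (slot + 1) % ASYNC_RING_SIZE;
    return slot;
}
```

With the cursor at slot 11, calls made with ticks 11 and then 13 return slots 11 and 12, so the slot that the tick-based computation would have skipped is still processed. One trade-off of this approach is that after a missed tick the cursor runs behind the wall clock by the number of missed ticks.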

 

If anyone is interested, we can share the code as well.

 

Regards,

Shankar

 

From: Shankar [mailto:shankar.rk@plintron.com]
Sent: Thursday, January 23, 2014 12:18 PM
To: 'Jason Penton'; 'SIP Router - Kamailio (OpenSER) and SIP Express Router (SER) - Users Mailing List'
Subject: RE: [SR-Users] FW: Regd. t_suspend() and t_continue()

 

Hi,

 

From our repeated load tests we can conclude that, irrespective of the number of simultaneous calls, the error “t_continue: transaction not found” always occurs.

 

If I run, say, 20 cps, then after 5000 calls we observe exactly 20 calls failing with the above error. We suspect that something happens at a particular point in time (for about a second) that affects the saving of new transactions into shared memory.

 

For a 10 cps run, we observe exactly 10 call failures. We repeated the test with different cps values and found that the number of errors is always exactly equal to the cps being run.

 

Is there any configuration we are missing? Can anyone help?

 

Regards,

Shankar

 

From: Shankar [mailto:shankar.rk@plintron.com]
Sent: Tuesday, January 21, 2014 3:09 PM
To: 'Jason Penton'; 'SIP Router - Kamailio (OpenSER) and SIP Express Router (SER) - Users Mailing List'
Subject: FW: [SR-Users] FW: Regd. t_suspend() and t_continue()

 

Hi Jason,

 

Below is our config,

 

route[LOCATION] {
        if(is_method("INVITE"))
        {
                if(!route(FROMCSCF))
                {
                        setflag(FLT_ACC);
                        setflag(FLT_ACCFAILED);
                        dlg_manage();
                        dlg_setflag("4");

                        async_route("RESUME", "1");

                        exit;
                }
        }
}

route[RESUME]
{
        route(TO_LOCATION);            // here t_relay to REGISTRAR is done for user lookup.
        exit;
}

 

Regards,

Shankar

 

 

Date: Tue, 21 Jan 2014 11:14:21 +0200
From: Jason Penton <jason.penton@smilecoms.com>
To: "Kamailio (SER) - Users Mailing List" <sr-users@lists.sip-router.org>
Subject: Re: [SR-Users] FW: Regd. t_suspend() and t_continue()

 

We use it heavily, but not using the async module - we use it directly from the IMS code.

 

Can you please provide your config file (or a relevant snippet) so I can see exactly what you are testing/trying to do?

 

Cheers

jason

 

From: Shankar [mailto:shankar.rk@plintron.com]
Sent: Tuesday, January 21, 2014 2:25 PM
To: 'SIP Router - Kamailio (OpenSER) and SIP Express Router (SER) - Users Mailing List'
Subject: RE: [SR-Users] FW: Regd. t_suspend() and t_continue()

 

Hi,

 

Can anyone who has used t_suspend() and t_continue() share performance details?

I tried the async module with a one-second sleep time. I ran only 5 calls per second, but it was still not successful. After some time I see the logs below:

 

Jan 21 13:51:55 PLT-RA-RD-W167A PCscf[16520]: ERROR: tm [t_suspend.c:128]: t_continue(): ERROR: t_continue: transaction not found

Jan 21 13:52:49 PLT-RA-RD-W167A last message repeated 15 times

Jan 21 13:59:38 PLT-RA-RD-W167A last message repeated 12 times

Jan 21 14:13:03 PLT-RA-RD-W167A last message repeated 5 times

 

Can any configuration change help here?

 

Regards,

Shankar

 

From: Shankar [mailto:shankar.rk@plintron.com]
Sent: Wednesday, January 15, 2014 1:26 PM
To: 'Jason Penton'
Cc: 'SIP Router - Kamailio (OpenSER) and SIP Express Router (SER) - Users Mailing List'
Subject: RE: [SR-Users] FW: Regd. t_suspend() and t_continue()

 

Hi Jason,

 

I am using 4.0.2

 

Regards,

Shankar

 

From: Jason Penton [mailto:jason.penton@smilecoms.com]
Sent: Wednesday, January 15, 2014 1:21 PM
To: Shankar
Cc: SIP Router - Kamailio (OpenSER) and SIP Express Router (SER) - Users Mailing List
Subject: Re: [SR-Users] FW: Regd. t_suspend() and t_continue()

 

Hi Shankar,

 

What version of Kamailio are you running? (kamailio -V)

 

Cheers

Jason

 

On Wed, Jan 15, 2014 at 6:58 AM, Shankar <shankar.rk@plintron.com> wrote:

Hi Jason,

 

Please find below my response inline,

 

 

I have some questions for you as we have used suspend/continue quite a lot in the IMS code and don't have any leaks.

 

Firstly, why are you using pkg_mem for your hash_id and label? Remember that you will be in 2 different processes in the suspend and continue portions of the code... so pkg_mem will not work - you should use shm_mem instead.

 

[Shankar] We use pkg_mem because we are invoking t_continue from the same process (using a thread).

 

Secondly, how are you using top to tell that you have a leak? Kamailio's memory is internally managed.

 

[Shankar] After running for, say, 20 minutes, we get an out-of-shared-memory error. Also, in the top output we observed a steady increase in the shared memory usage of the process.

 

Cheers

Jason

 

 

On Mon, Jan 13, 2014 at 1:29 PM, Shankar <shankar.rk@plintron.com> wrote:

 

> Re-sending without the attachment.
>
> *From:* Shankar [mailto:shankar.rk@plintron.com]
> *Sent:* Monday, January 13, 2014 4:57 PM
> *To:* 'sr-users@lists.sip-router.org'
> *Subject:* Regd. t_suspend() and t_continue()
>
> Hi,
>
> We are trying out t_suspend() and t_continue() in our test setup.
> We are facing a memory leak (both shm and pkg, as per the top command results).
>
> Please find the scenario below:
>
> 1)      Do a t_newtran()
> 2)      Allocate pkg memory for hashid and label.
> 3)      Call t_suspend()
> 4)      Do t_continue() when the async result is available
> 5)      De-allocate the pkg memory reserved for hashid and label
> 6)      Do a t_relay(), which forwards the SIP message to another SIP node.
>
> In step (6) above, we see that t_newtran() allocates shared memory one more time for the same transaction.
>
> We tried t_release() after step (4) to release the transaction, as t_relay() allocates new shared memory anyway. Nothing helped.
>
> Please let me know which logs you would require to debug this.
> I am attaching the syslog for this run.
>
> Regards,
> Shankar

 



 
