[Devel] Re: [Serusers] Re: Fw: [Users] TM : retransmission timers

Andrei Pelinescu-Onciul andrei at iptel.org
Tue Nov 28 02:03:01 CET 2006


On Nov 27, 2006 at 23:14, Andrei Pelinescu-Onciul <andrei at iptel.org> wrote:
> On Nov 27, 2006 at 19:02, Bogdan-Andrei Iancu <bogdan at voice-system.ro> wrote:
> > small typo below...
> > 
> > Bogdan-Andrei Iancu wrote:
> > 
> > >Hi Klaus,
> > >
> > >as the discussion about ser's improvements bounces again to openser, I 
> > >had to do a bit of digging to provide a correct answer to openser's 
> > >users.
> > >
> > >yes, there were some improvements did by Andrei to TM, mainly in timer 
> > >implementation. As you were wondering, the 0.9.6 SER should be 
> > >relatively close to openser 0.9.4 as TM performance and merging the 
> > >results from Vaclav with Andrei tests before the timer improvement in 
> > >SER 0.9.6, seams to be correct. See:
> > >   http://lists.iptel.org/pipermail/serdev/2005-December/006583.html
> > >
> > >After this improvement, SER's 0.9.6 performances dramatically 
> > >increased, but unfortunately, according to our tests it is also 
> > >dramatically wrong. TM timers are not working correctly when variable 
> > >timeouts are used in SER.
> > >
> > >With the improvement, the following scenario gets broken - 3 calls 
> > >only, no load, default cfg:
> > >
> > >CALL 1: has 60 secs timeout for Final_response_timeout - nobody 
> > >answers, still ringing
> > >in less than 2 secs ->
> > >CALL 2: has 70 secs timeout  - it is immediately answered.
> > >in less than 2 secs ->
> > >CALL 3: has 10 secs timeout for Final_response_timeout - nobody 
> > >answers, ringing.
> > >
> > >Of course, everybody expects that CALL3 will timeout before CALL1 
> > >(with more than 40 secs), but in SER 0.9.6 (latest stable for the last 
> > >2 years), this will not happened - both CALL3 and CALL1 will timeout 
> > >simultaneously when the CALL3 timer hits.
> 
> 
> Thanks a lot for the bug report.
> 

I've made a quick fix attempt (patch attached).
Note that this is work in progress, very lightly tested, not optimized.
It's just a "patch" draft, if it proves not to break something else (it
needs a lot of testing which complicated call scenarios), it can be 
optimized later (e.g. lock only once when deleting the timers for all
 the branches). I've made it available so early so that other people can
  test it, improve it, point obvious problems a.s.o.
The basic ideea is to immediately remove the fr_timer from the fr timer lists,
so that no fr_timer with TIMER_DELETED will remain on the FR*LISTs
(which is what causes problems with the variable timers).


Some quick numbers (don't quote them, just for patched/not patched/old stuff
 comparison):

ser 0.9.7 (no patch)                     ~20000 cps (*)

ser 0.9.7 (patched)                       about the same (*)

ser 0.9.7 with the 0.9.4 fix removed      <4000 cps (**)
(TIMER_SKIP_DELETED defined )

(*) - unfortunately I cannot generate more then 20000-21000 cps right
now, so this is not the maximum ser can handle, is just how much I could
 generate without too much trouble. Memory becomes also a problem,
  on a 64 bit machine the 20000 cps need over 900Mb of memory for
   ser 0.9.7 (ser.cvs needs ~800Mb).

(**) starts @4000 then drops to <1000

Note also that this are in fact non-INVITE transaction/s tests.
INVITEs should be slower (more complex &  more packets).
Normal calls per second (INVITE, then BYE) should be a little faster
then  twice as slow (2 transaction per call).

I've left openser.org lists in the cc in case the openser developers are
also interested in fixing this (which would enable them to use the 0.9.4
fix).


Andrei
-------------- next part --------------
Index: t_cancel.c
===================================================================
RCS file: /cvsroot/ser/sip_router/modules/tm/t_cancel.c,v
retrieving revision 1.13.2.1
diff -u -r1.13.2.1 t_cancel.c
--- t_cancel.c	13 Jun 2006 13:31:48 -0000	1.13.2.1
+++ t_cancel.c	28 Nov 2006 00:27:42 -0000
@@ -147,7 +147,7 @@
 	    
 	    /* stop_rb_timers(&t->uac[i].request); */
 	    reset_timer( &t->uac[i].request.retr_timer );
-	    reset_timer( &t->uac[i].request.fr_timer );
+	    del_fr_timer( &t->uac[i].request.fr_timer );
 	}
 	else {
 
Index: t_funcs.c
===================================================================
RCS file: /cvsroot/ser/sip_router/modules/tm/t_funcs.c,v
retrieving revision 1.175.2.3
diff -u -r1.175.2.3 t_funcs.c
--- t_funcs.c	28 Aug 2006 11:21:02 -0000	1.175.2.3
+++ t_funcs.c	28 Nov 2006 00:27:42 -0000
@@ -133,7 +133,7 @@
 {
 	set_kr(REQ_RLSD);
 
-	reset_timer( & trans->uas.response.fr_timer );
+	del_fr_timer( & trans->uas.response.fr_timer );
 	reset_timer( & trans->uas.response.retr_timer );
 
 	cleanup_uac_timers( trans );
Index: t_fwd.c
===================================================================
RCS file: /cvsroot/ser/sip_router/modules/tm/t_fwd.c,v
retrieving revision 1.61.2.2
diff -u -r1.61.2.2 t_fwd.c
--- t_fwd.c	28 Aug 2006 11:21:02 -0000	1.61.2.2
+++ t_fwd.c	28 Nov 2006 00:27:42 -0000
@@ -349,7 +349,7 @@
 				      * retransmission timers
 				      */
 				reset_timer(&t_invite->uac[i].request.retr_timer);
-				reset_timer(&t_invite->uac[i].request.fr_timer);
+				del_fr_timer(&t_invite->uac[i].request.fr_timer);
 
 				/* Generate faked reply */
 				LOCK_REPLIES(t_invite);
Index: t_reply.c
===================================================================
RCS file: /cvsroot/ser/sip_router/modules/tm/t_reply.c,v
retrieving revision 1.98.2.6
diff -u -r1.98.2.6 t_reply.c
--- t_reply.c	12 Jun 2006 08:22:48 -0000	1.98.2.6
+++ t_reply.c	28 Nov 2006 00:27:43 -0000
@@ -66,6 +66,7 @@
  *              the request (bogdan)
  *  2005-09-01  reverted to the old way of checking response.dst.send_sock
  *               in t_retransmit_reply & reply_light (andrei)
+ *  2006-12-27  replaced reset(fr_timer) with del_fr_timer(...)  (andrei)
  */
 
 
@@ -954,7 +955,7 @@
 	/* reset FR/retransmission timers */
 	for (i=0; i<t->nr_of_outgoings; i++ )  {
 		reset_timer( &t->uac[i].request.retr_timer );
-		reset_timer( &t->uac[i].request.fr_timer );
+		del_fr_timer( &t->uac[i].request.fr_timer );
 	}
 	DBG("DEBUG: cleanup_uac_timers: RETR/FR timers reset\n");
 }
@@ -1282,7 +1283,7 @@
 		     /* ... then just stop timers */
 		reset_timer( &uac->local_cancel.retr_timer);
 		if ( msg_status >= 200 ) {
-				reset_timer( &uac->local_cancel.fr_timer);
+				del_fr_timer( &uac->local_cancel.fr_timer);
 		}
 		DBG("DEBUG: reply to local CANCEL processed\n");
 		goto done;
@@ -1294,7 +1295,7 @@
 	
 	     /* stop final response timer only if I got a final response */
 	if ( msg_status >= 200 ) {
-		reset_timer( &uac->request.fr_timer);
+		del_fr_timer(&uac->request.fr_timer);
 	}
 
 	     /* acknowledge negative INVITE replies (do it before detailed
Index: timer.c
===================================================================
RCS file: /cvsroot/ser/sip_router/modules/tm/timer.c,v
retrieving revision 1.58.2.3
diff -u -r1.58.2.3 timer.c
--- timer.c	6 Dec 2005 13:20:14 -0000	1.58.2.3
+++ timer.c	28 Nov 2006 00:27:43 -0000
@@ -98,6 +98,8 @@
  *  2003-06-27  timers are not unlinked if timerlist is 0 (andrei)
  *  2004-02-13  t->is_invite, t->local, t->noisy_ctimer replaced;
  *              timer_link.payload removed (bogdan)
+ *  2006-11-27  added del_fr_timer(): fr timers are immediately removed
+ *              from the FR* lists (andrei)
  */
 
 #include "defs.h"
@@ -322,7 +324,7 @@
 				"request resending (t=%p, %.9s ... )\n", 
 				r_buf->my_T, r_buf->buffer);
 			if (SEND_BUFFER( r_buf )==-1) {
-				reset_timer( &r_buf->fr_timer );
+				del_fr_timer( &r_buf->fr_timer );
 				fake_reply(r_buf->my_T, r_buf->branch, 503 );
 				return;
 			}
@@ -436,7 +438,7 @@
 	int i;
 	for (i=0; i<t->nr_of_outgoings; i++ )  {
 		reset_timer(  &t->uac[i].local_cancel.retr_timer );
-		reset_timer(  &t->uac[i].local_cancel.fr_timer );
+		del_fr_timer(  &t->uac[i].local_cancel.fr_timer );
 	}
 }
 
@@ -776,6 +778,33 @@
 #endif
 }
 
+
+
+/* remove a timer from the FR_TIMER_LIST or FR_INV_TIMER_LIST
+ *  (it allows immediate delete of a fr timer => solves the race with
+ *   variables timers inserted after longer deleted timers)
+ * WARNING: - don't try to use it to "move" a timer from one list
+ *            to another, you'll run into races
+ */
+void del_fr_timer( struct timer_link *tl)
+{
+	/* the FR lock is common/shared by both FR_INV_TIMER_LIST 
+	 * and FR_TIMER_LIST, so we must lock only one of them */
+	lock(timertable->timers[FR_TIMER_LIST].mutex);
+	/* check first if we  are on  the "detached" timer_routine list (the fr
+	 * handle is executing or  timer_routine prepares to execute it).
+	 * if so do nothing, except reseting the timer to TIMER_DELETED
+	 *  (just to give us a change at racing with timer_routine, if 
+	 *  TIMER_DELETED is set and the fr handle is not already executing =>
+	 *   it will not be called anymore)
+	 */
+	if (tl->timer_list!=DETACHED_LIST){
+		remove_timer_unsafe(tl); /* safe to call for null list */
+	}else{
+		reset_timer(tl);
+	}
+	unlock(timertable->timers[FR_TIMER_LIST].mutex);
+}
 
 
 
Index: timer.h
===================================================================
RCS file: /cvsroot/ser/sip_router/modules/tm/timer.h,v
retrieving revision 1.37
diff -u -r1.37 timer.h
--- timer.h	1 Nov 2004 14:09:09 -0000	1.37
+++ timer.h	28 Nov 2006 00:27:43 -0000
@@ -109,6 +109,8 @@
 */
 
 void reset_timer( struct timer_link* tl );
+/* remove a timer from FR_TIMER_LIST or FR_INV_TIMER_LIST */
+void del_fr_timer( struct timer_link *tl);
 /* determine timer length and put on a correct timer list */
 void set_timer( struct timer_link *new_tl, enum lists list_id, unsigned int* ext_timeout );
 /* similar to set_timer, except it allows only one-time


More information about the Users mailing list