[Devel] Re: [Serusers] Re: Fw: [Users] TM : retransmission timers

Andrei Pelinescu-Onciul andrei at iptel.org
Tue Nov 28 16:08:37 CET 2006


On Nov 28, 2006 at 02:03, Andrei Pelinescu-Onciul <andrei at iptel.org> wrote:
> On Nov 27, 2006 at 23:14, Andrei Pelinescu-Onciul <andrei at iptel.org> wrote:
> > On Nov 27, 2006 at 19:02, Bogdan-Andrei Iancu <bogdan at voice-system.ro> wrote:
> > > small typo below...
> > > 
> > > Bogdan-Andrei Iancu wrote:
> > > 
> > > >Hi Klaus,
> > > >
> > > >as the discussion about ser's improvements bounces again to openser, I 
> > > >had to do a bit of digging to provide a correct answer to openser's 
> > > >users.
> > > >
> > > >yes, there were some improvements did by Andrei to TM, mainly in timer 
> > > >implementation. As you were wondering, the 0.9.6 SER should be 
> > > >relatively close to openser 0.9.4 as TM performance and merging the 
> > > >results from Vaclav with Andrei tests before the timer improvement in 
> > > >SER 0.9.6, seams to be correct. See:
> > > >   http://lists.iptel.org/pipermail/serdev/2005-December/006583.html
> > > >
> > > >After this improvement, SER's 0.9.6 performances dramatically 
> > > >increased, but unfortunately, according to our tests it is also 
> > > >dramatically wrong. TM timers are not working correctly when variable 
> > > >timeouts are used in SER.
> > > >
> > > >With the improvement, the following scenario gets broken - 3 calls 
> > > >only, no load, default cfg:
> > > >
> > > >CALL 1: has 60 secs timeout for Final_response_timeout - nobody 
> > > >answers, still ringing
> > > >in less than 2 secs ->
> > > >CALL 2: has 70 secs timeout  - it is immediately answered.
> > > >in less than 2 secs ->
> > > >CALL 3: has 10 secs timeout for Final_response_timeout - nobody 
> > > >answers, ringing.
> > > >
> > > >Of course, everybody expects that CALL3 will timeout before CALL1 
> > > >(with more than 40 secs), but in SER 0.9.6 (latest stable for the last 
> > > >2 years), this will not happened - both CALL3 and CALL1 will timeout 
> > > >simultaneously when the CALL3 timer hits.
> > 
> > 
> > Thanks a lot for the bug report.
> > 
> 
> I've made a quick fix attempt (patch attached).
> Note that this is work in progress, very lightly tested, not optimized.
> It's just a "patch" draft, if it proves not to break something else (it
> needs a lot of testing which complicated call scenarios), it can be 
> optimized later (e.g. lock only once when deleting the timers for all
>  the branches). I've made it available so early so that other people can
>   test it, improve it, point obvious problems a.s.o.
> The basic ideea is to immediately remove the fr_timer from the fr timer lists,
> so that no fr_timer with TIMER_DELETED will remain on the FR*LISTs
> (which is what causes problems with the variable timers).
> 
> 
> Some quick numbers (don't quote them, just for patched/not patched/old stuff
>  comparison):
> 
> ser 0.9.7 (no patch)                     ~20000 cps (*)
> 
> ser 0.9.7 (patched)                       about the same (*)
> 
> ser 0.9.7 with the 0.9.4 fix removed      <4000 cps (**)
> (TIMER_SKIP_DELETED defined )
> 
> (*) - unfortunately I cannot generate more then 20000-21000 cps right
> now, so this is not the maximum ser can handle, is just how much I could
>  generate without too much trouble. Memory becomes also a problem,
>   on a 64 bit machine the 20000 cps need over 900Mb of memory for
>    ser 0.9.7 (ser.cvs needs ~800Mb).
> 
> (**) starts @4000 then drops to <1000
> 
> Note also that this are in fact non-INVITE transaction/s tests.
> INVITEs should be slower (more complex &  more packets).
> Normal calls per second (INVITE, then BYE) should be a little faster
> then  twice as slow (2 transaction per call).
> 
> I've left openser.org lists in the cc in case the openser developers are
> also interested in fixing this (which would enable them to use the 0.9.4
> fix).


Another completely different version, this time a new member is added to the 
timer_link structure, just  to keep an is_deleted flag.
Advantages: - simpler, much easier to verify (doesn't need so much testing)
Disadvantages:  a little more memory is used (can be seen after testing),
probably a little slower in the normal case (though both versions maxed
 out my testing sipp, I think the other is faster), probably
 significantly slower if a lot of different variable timers are used.



Andrei




> 
> 
> Andrei

> Index: t_cancel.c
> ===================================================================
> RCS file: /cvsroot/ser/sip_router/modules/tm/t_cancel.c,v
> retrieving revision 1.13.2.1
> diff -u -r1.13.2.1 t_cancel.c
> --- t_cancel.c	13 Jun 2006 13:31:48 -0000	1.13.2.1
> +++ t_cancel.c	28 Nov 2006 00:27:42 -0000
> @@ -147,7 +147,7 @@
>  	    
>  	    /* stop_rb_timers(&t->uac[i].request); */
>  	    reset_timer( &t->uac[i].request.retr_timer );
> -	    reset_timer( &t->uac[i].request.fr_timer );
> +	    del_fr_timer( &t->uac[i].request.fr_timer );
>  	}
>  	else {
>  
> Index: t_funcs.c
> ===================================================================
> RCS file: /cvsroot/ser/sip_router/modules/tm/t_funcs.c,v
> retrieving revision 1.175.2.3
> diff -u -r1.175.2.3 t_funcs.c
> --- t_funcs.c	28 Aug 2006 11:21:02 -0000	1.175.2.3
> +++ t_funcs.c	28 Nov 2006 00:27:42 -0000
> @@ -133,7 +133,7 @@
>  {
>  	set_kr(REQ_RLSD);
>  
> -	reset_timer( & trans->uas.response.fr_timer );
> +	del_fr_timer( & trans->uas.response.fr_timer );
>  	reset_timer( & trans->uas.response.retr_timer );
>  
>  	cleanup_uac_timers( trans );
> Index: t_fwd.c
> ===================================================================
> RCS file: /cvsroot/ser/sip_router/modules/tm/t_fwd.c,v
> retrieving revision 1.61.2.2
> diff -u -r1.61.2.2 t_fwd.c
> --- t_fwd.c	28 Aug 2006 11:21:02 -0000	1.61.2.2
> +++ t_fwd.c	28 Nov 2006 00:27:42 -0000
> @@ -349,7 +349,7 @@
>  				      * retransmission timers
>  				      */
>  				reset_timer(&t_invite->uac[i].request.retr_timer);
> -				reset_timer(&t_invite->uac[i].request.fr_timer);
> +				del_fr_timer(&t_invite->uac[i].request.fr_timer);
>  
>  				/* Generate faked reply */
>  				LOCK_REPLIES(t_invite);
> Index: t_reply.c
> ===================================================================
> RCS file: /cvsroot/ser/sip_router/modules/tm/t_reply.c,v
> retrieving revision 1.98.2.6
> diff -u -r1.98.2.6 t_reply.c
> --- t_reply.c	12 Jun 2006 08:22:48 -0000	1.98.2.6
> +++ t_reply.c	28 Nov 2006 00:27:43 -0000
> @@ -66,6 +66,7 @@
>   *              the request (bogdan)
>   *  2005-09-01  reverted to the old way of checking response.dst.send_sock
>   *               in t_retransmit_reply & reply_light (andrei)
> + *  2006-12-27  replaced reset(fr_timer) with del_fr_timer(...)  (andrei)
>   */
>  
>  
> @@ -954,7 +955,7 @@
>  	/* reset FR/retransmission timers */
>  	for (i=0; i<t->nr_of_outgoings; i++ )  {
>  		reset_timer( &t->uac[i].request.retr_timer );
> -		reset_timer( &t->uac[i].request.fr_timer );
> +		del_fr_timer( &t->uac[i].request.fr_timer );
>  	}
>  	DBG("DEBUG: cleanup_uac_timers: RETR/FR timers reset\n");
>  }
> @@ -1282,7 +1283,7 @@
>  		     /* ... then just stop timers */
>  		reset_timer( &uac->local_cancel.retr_timer);
>  		if ( msg_status >= 200 ) {
> -				reset_timer( &uac->local_cancel.fr_timer);
> +				del_fr_timer( &uac->local_cancel.fr_timer);
>  		}
>  		DBG("DEBUG: reply to local CANCEL processed\n");
>  		goto done;
> @@ -1294,7 +1295,7 @@
>  	
>  	     /* stop final response timer only if I got a final response */
>  	if ( msg_status >= 200 ) {
> -		reset_timer( &uac->request.fr_timer);
> +		del_fr_timer(&uac->request.fr_timer);
>  	}
>  
>  	     /* acknowledge negative INVITE replies (do it before detailed
> Index: timer.c
> ===================================================================
> RCS file: /cvsroot/ser/sip_router/modules/tm/timer.c,v
> retrieving revision 1.58.2.3
> diff -u -r1.58.2.3 timer.c
> --- timer.c	6 Dec 2005 13:20:14 -0000	1.58.2.3
> +++ timer.c	28 Nov 2006 00:27:43 -0000
> @@ -98,6 +98,8 @@
>   *  2003-06-27  timers are not unlinked if timerlist is 0 (andrei)
>   *  2004-02-13  t->is_invite, t->local, t->noisy_ctimer replaced;
>   *              timer_link.payload removed (bogdan)
> + *  2006-11-27  added del_fr_timer(): fr timers are immediately removed
> + *              from the FR* lists (andrei)
>   */
>  
>  #include "defs.h"
> @@ -322,7 +324,7 @@
>  				"request resending (t=%p, %.9s ... )\n", 
>  				r_buf->my_T, r_buf->buffer);
>  			if (SEND_BUFFER( r_buf )==-1) {
> -				reset_timer( &r_buf->fr_timer );
> +				del_fr_timer( &r_buf->fr_timer );
>  				fake_reply(r_buf->my_T, r_buf->branch, 503 );
>  				return;
>  			}
> @@ -436,7 +438,7 @@
>  	int i;
>  	for (i=0; i<t->nr_of_outgoings; i++ )  {
>  		reset_timer(  &t->uac[i].local_cancel.retr_timer );
> -		reset_timer(  &t->uac[i].local_cancel.fr_timer );
> +		del_fr_timer(  &t->uac[i].local_cancel.fr_timer );
>  	}
>  }
>  
> @@ -776,6 +778,33 @@
>  #endif
>  }
>  
> +
> +
> +/* remove a timer from the FR_TIMER_LIST or FR_INV_TIMER_LIST
> + *  (it allows immediate delete of a fr timer => solves the race with
> + *   variables timers inserted after longer deleted timers)
> + * WARNING: - don't try to use it to "move" a timer from one list
> + *            to another, you'll run into races
> + */
> +void del_fr_timer( struct timer_link *tl)
> +{
> +	/* the FR lock is common/shared by both FR_INV_TIMER_LIST 
> +	 * and FR_TIMER_LIST, so we must lock only one of them */
> +	lock(timertable->timers[FR_TIMER_LIST].mutex);
> +	/* check first if we  are on  the "detached" timer_routine list (the fr
> +	 * handle is executing or  timer_routine prepares to execute it).
> +	 * if so do nothing, except reseting the timer to TIMER_DELETED
> +	 *  (just to give us a change at racing with timer_routine, if 
> +	 *  TIMER_DELETED is set and the fr handle is not already executing =>
> +	 *   it will not be called anymore)
> +	 */
> +	if (tl->timer_list!=DETACHED_LIST){
> +		remove_timer_unsafe(tl); /* safe to call for null list */
> +	}else{
> +		reset_timer(tl);
> +	}
> +	unlock(timertable->timers[FR_TIMER_LIST].mutex);
> +}
>  
>  
>  
> Index: timer.h
> ===================================================================
> RCS file: /cvsroot/ser/sip_router/modules/tm/timer.h,v
> retrieving revision 1.37
> diff -u -r1.37 timer.h
> --- timer.h	1 Nov 2004 14:09:09 -0000	1.37
> +++ timer.h	28 Nov 2006 00:27:43 -0000
> @@ -109,6 +109,8 @@
>  */
>  
>  void reset_timer( struct timer_link* tl );
> +/* remove a timer from FR_TIMER_LIST or FR_INV_TIMER_LIST */
> +void del_fr_timer( struct timer_link *tl);
>  /* determine timer length and put on a correct timer list */
>  void set_timer( struct timer_link *new_tl, enum lists list_id, unsigned int* ext_timeout );
>  /* similar to set_timer, except it allows only one-time

> _______________________________________________
> Serusers mailing list
> Serusers at lists.iptel.org
> http://lists.iptel.org/mailman/listinfo/serusers

-------------- next part --------------
Index: timer.c
===================================================================
RCS file: /cvsroot/ser/sip_router/modules/tm/timer.c,v
retrieving revision 1.58.2.3
diff -u -r1.58.2.3 timer.c
--- timer.c	6 Dec 2005 13:20:14 -0000	1.58.2.3
+++ timer.c	28 Nov 2006 14:56:28 -0000
@@ -98,6 +98,8 @@
  *  2003-06-27  timers are not unlinked if timerlist is 0 (andrei)
  *  2004-02-13  t->is_invite, t->local, t->noisy_ctimer replaced;
  *              timer_link.payload removed (bogdan)
+ *  2006-11-28  switched to using the new timer_link flag for the timer
+ *               deleted check (andrei)
  */
 
 #include "defs.h"
@@ -536,7 +538,9 @@
 struct timer_table *tm_init_timers()
 {
 	enum lists i;
-
+#ifdef TIMER_DELETED_FLAG
+	LOG(L_CRIT, "timer init, TIMER_DELETED_FLAG suppport\n");
+#endif
 	timertable=(struct timer_table *) shm_malloc(sizeof(struct timer_table));
 	if (!timertable) {
 		LOG(L_ERR, "ERROR: tm_init_timers: no shmem for timer_Table\n");
@@ -660,7 +664,7 @@
 	    ptr = ptr->prev_tl) {
 		if (
 #ifdef TIMER_SKIP_DELETED
-				(ptr->time_out != TIMER_DELETED) && 
+				(!timer_is_deleted(ptr)) && 
 #endif
 				(ptr->time_out <= time_out)) break;
 	}
@@ -770,7 +774,11 @@
 	   but not execute; there is a race condition, though -- see
 	   timer.c for more details
 	*/
+#ifdef TIMER_DELETED_FLAG
+	tl->is_deleted=1;
+#else
 	tl->time_out = TIMER_DELETED;
+#endif
 #ifdef EXTRA_DEBUG
 	DBG("DEBUG: reset_timer (group %d, tl=%p)\n", tl->tg, tl );
 #endif
@@ -854,7 +862,7 @@
 	list= &(timertable->timers[ list_id ]);
 
 	lock(list->mutex);
-	if (!(new_tl->time_out>TIMER_DELETED)) {
+	if (!(timer_is_active(new_tl))) {
 		/* make sure I'm not already on a list */
 		/* remove_timer_unsafe( new_tl ); */
 		/*
@@ -944,7 +952,7 @@
 		(_tl)->next_tl = (_tl)->prev_tl = 0;\
 		DBG("DEBUG: timer routine:%d,tl=%p next=%p\n",\
 			id,(_tl),tmp_tl);\
-		if ((_tl)->time_out>TIMER_DELETED) \
+		if (timer_is_active(_tl)) \
 			(_handler)( _tl );\
 		(_tl) = tmp_tl;\
 	}
Index: timer.h
===================================================================
RCS file: /cvsroot/ser/sip_router/modules/tm/timer.h,v
retrieving revision 1.37
diff -u -r1.37 timer.h
--- timer.h	1 Nov 2004 14:09:09 -0000	1.37
+++ timer.h	28 Nov 2006 14:56:28 -0000
@@ -29,6 +29,8 @@
  * --------
  *  2003-09-12  timer_link.tg exists only if EXTRA_DEBUG (andrei)
  *  2004-02-13  timer_link.payload removed (bogdan)
+ *  2006-11-28  added a special is_deleted flag for the timer_link structure,
+ *               added timer_is_deleted() and timer_is_active()  (andrei)
  */
 
 
@@ -42,9 +44,16 @@
 /* timer timestamp value indicating a timer has been 
    deactivated and shall not be executed
 */
-#define TIMER_DELETED	1
-
 
+#define TIMER_DELETED_FLAG
+#ifdef TIMER_DELETED_FLAG
+	#define timer_is_deleted(tl) (tl->is_deleted)
+	#define timer_is_active(tl) ( ! tl->is_deleted && tl->time_out!=0)
+#else
+	#define TIMER_DELETED	1
+	#define timer_is_deleted(tl) (tl->time_out==TIMER_DELETED)
+	#define timer_is_active(tl) (tl->time_out>TIMER_DELETED)
+#endif
 
 /* identifiers of timer lists;*/
 /* fixed-timer retransmission lists (benefit: fixed timer$
@@ -67,6 +76,9 @@
 	struct timer_link     *prev_tl;
 	volatile unsigned int  time_out;
 	struct timer          *timer_list;
+#ifdef TIMER_DELETED_FLAG
+	volatile char is_deleted;
+#endif
 #ifdef EXTRA_DEBUG
 	enum timer_groups  tg;
 #endif


More information about the Users mailing list