Hello,
I'm seeing an increasing amount of spurious timeouts on my proxies. Even though the proxy receives a "100 Trying" response immediately after relaying the request, tm will send retransmits and times out eventually. Below is a commented log snippet. The timeout occurs on the proxy named 'balancer1'. Both are on the same host.
I verified that the 100 Trying response is received in a t_on_reply block, so it is recognized by tm as belonging to a transaction.
When needed, I'll provide a debug log when it occurs again.
Meanwhile, any idea what is causing this?
This is on kamailio 1.5+svn5768
fr_timer = 10 fr_timer_avp = 2
# Initial request received on balancer May 15 12:32:01 balancer1[24170]: [2 INVITE] - New request. sip:uac@domain => sip:uas@domain May 15 12:32:01 balancer1[24170]: [2 INVITE] Relaying
# Initial request received on proxy May 15 12:32:01 proxy1[23803]: [2 INVITE] - New request. sip:uac@domain => sip:uas@domain
# Balancer receives 100 response May 15 12:32:01 balancer1[24166]: [2 INVITE] - Reply. 100 Trying
# Proxy relays request May 15 12:32:01 proxy1[23803]: [2 INVITE] Relaying
# Proxy receives retransmission from balancer ??? May 15 12:32:02 proxy1[23804]: [2 INVITE] - New request. sip:uac@domain => sip:uas@domain
# Balancer times out ??? May 15 12:32:02 balancer1[24171]: [2 INVITE] Failure (Status: 408) May 15 12:32:02 balancer1[24171]: [2 INVITE] No more gateways
May 15 12:32:02 proxy1[23804]: [2 CANCEL] - New request. sip:uac@domain => sip:uas@domain
Hello,
On 05/15/2009 02:19 PM, Alex Hermann wrote:
Hello,
I'm seeing an increasing amount of spurious timeouts on my proxies. Even though the proxy receives a "100 Trying" response immediately
is this 100 coming very fast? Can you try with r5803, it was a(nother) fix to the race between request/reply.
Cheers, Daniel
after relaying the request, tm will send retransmits and times out eventually. Below is a commented log snippet. The timeout occurs on the proxy named 'balancer1'. Both are on the same host.
I verified that the 100 Trying response is received in a t_on_reply block, so it is recognized by tm as belonging to a transaction.
When needed, I'll provide a debug log when it occurs again.
Meanwhile, any idea what is causing this?
This is on kamailio 1.5+svn5768
fr_timer = 10 fr_timer_avp = 2
# Initial request received on balancer May 15 12:32:01 balancer1[24170]: [2 INVITE] - New request. sip:uac@domain => sip:uas@domain May 15 12:32:01 balancer1[24170]: [2 INVITE] Relaying
# Initial request received on proxy May 15 12:32:01 proxy1[23803]: [2 INVITE] - New request. sip:uac@domain => sip:uas@domain
# Balancer receives 100 response May 15 12:32:01 balancer1[24166]: [2 INVITE] - Reply. 100 Trying
# Proxy relays request May 15 12:32:01 proxy1[23803]: [2 INVITE] Relaying
# Proxy receives retransmission from balancer ??? May 15 12:32:02 proxy1[23804]: [2 INVITE] - New request. sip:uac@domain => sip:uas@domain
# Balancer times out ??? May 15 12:32:02 balancer1[24171]: [2 INVITE] Failure (Status: 408) May 15 12:32:02 balancer1[24171]: [2 INVITE] No more gateways
May 15 12:32:02 proxy1[23804]: [2 CANCEL] - New request. sip:uac@domain => sip:uas@domain
On Friday 15 May 2009, you wrote:
I'm seeing an increasing amount of spurious timeouts on my proxies. Even though the proxy receives a "100 Trying" response immediately
is this 100 coming very fast?
Yes. Proxies on the same host, first thing the proxy does after checking loose_route is sending a 100 response.
Can you try with r5803, it was a(nother) fix to the race between request/reply.
I'll try, thanks.
Weird thing is that the 100 trying is recognized as belonging to a/the transaction as it shows up in the t_onreply route. So I was assuming the transaction is correctly created before the response arrives.
On 05/15/2009 03:17 PM, Alex Hermann wrote:
On Friday 15 May 2009, you wrote:
I'm seeing an increasing amount of spurious timeouts on my proxies. Even though the proxy receives a "100 Trying" response immediately
is this 100 coming very fast?
Yes. Proxies on the same host, first thing the proxy does after checking loose_route is sending a 100 response.
Can you try with r5803, it was a(nother) fix to the race between request/reply.
I'll try, thanks.
Weird thing is that the 100 trying is recognized as belonging to a/the transaction as it shows up in the t_onreply route. So I was assuming the transaction is correctly created before the response arrives.
the transaction is ok, the race is between first request sent out and adding to retransmission timers. sip-router has another implementation here, so next stable is going to do much better.
Cheers, Daniel
On Friday 15 May 2009, Daniel-Constantin Mierla wrote:
On 05/15/2009 02:19 PM, Alex Hermann wrote:
I'm seeing an increasing amount of spurious timeouts on my proxies. Even though the proxy receives a "100 Trying" response immediately
is this 100 coming very fast? Can you try with r5803, it was a(nother) fix to the race between request/reply.
I see you have put this 'fix' away in a define in rev 5868 (quoted below for reference). Could you elaborate on the possible side-effect? I have run a proxy with r5803 enabled for quite some time without noticable side effects. What effect should I've been on the lookout for?
And I have a question on patch 5803: the fix seems to depend on the fact that another process fills the 'last_received' field on an incoming reply. I assume 't' is in shared memory then, but I fail to see any locking around the access to this structure. Where is the locking in tm?
r5801 | miconda | 2009-04-23 11:48:08 +0200 (Thu, 23 Apr 2009) | 4 lines
- narrow reply race time frame - patch by Zappasodi Daniele - note: new devel on svn trunk has stopped and continues on sip-router.org git, tm is the only one K module not imported in SR and this patch is applied to have it when comparing additional features in K and for easier backporting
Index: modules/tm/t_fwd.c =================================================================== --- modules/tm/t_fwd.c (revision 5800) +++ modules/tm/t_fwd.c (revision 5801) @@ -719,7 +719,11 @@ -p_msg->REQ_METHOD); }
- start_retr( &t->uac[i].request ); + if(p_msg->REQ_METHOD==METHOD_INVITE + && t->uac[i].last_received>=100) + LM_DBG("Last received %d\n",t->uac[i].last_received); + else + start_retr( &t->uac[i].request ); set_kr(REQ_FWDED); } }
r5868 | miconda | 2009-06-02 11:25:18 +0200 (Tue, 02 Jun 2009) | 2 lines
- ifdef'ed last recevied checked added last time (possible side effects to be more investigated)
Index: modules/tm/t_fwd.c =================================================================== --- modules/tm/t_fwd.c (revision 5867) +++ modules/tm/t_fwd.c (revision 5868) @@ -718,11 +718,12 @@ run_trans_callbacks( TMCB_REQUEST_BUILT, t, p_msg,0, -p_msg->REQ_METHOD); } - +#ifdef TM_FWD_LRCV_CHECK if(p_msg->REQ_METHOD==METHOD_INVITE && t->uac[i].last_received>=100) LM_DBG("Last received %d\n",t->uac[i].last_received); else +#endif start_retr( &t->uac[i].request ); set_kr(REQ_FWDED); }