From timo.reimann@1und1.de Mon Jun 14 15:52:04 2010 From: Timo Reimann To: sr-dev@lists.kamailio.org Subject: Re: [sr-dev] Dialog module: "CRITICAL bogus event 6 in state 2 Date: Mon, 14 Jun 2010 15:51:57 +0200 Message-ID: <4C1633FD.5090904@1und1.de> In-Reply-To: MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============0932651648==" --===============0932651648== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hi, Ovidiu Sas wrote: > It is a race between the 200ok being sent out and the update of the > dialog state machine. The ACK is received while the dialog state > machine was not updates for the previously sent 200ok. > IIRC, in opensips this was fixed by introducing a new tm callback in > order to update the dialog state machine as soon as the 200ok was sent > out. Taking a look at dlg_handlers.c from 3.0.2, one can see that the function which transitions the dialog state (dlg_onreply) is called back from TMCB_RESPONSE_FWDED or TMCB_RESPONSE_READY. While the former is called just before a response is forwarded (but hasn't been so yet), the latter is mapped to TMCB_RESPONSE_OUT which, according to t_hooks.h and the code in t_reply.c, is called *after* the response has been delivered. In 1.5, however, dialog's reply handler function already relies on TMCB_RESPONSE_PRE_OUT which properly avoids the possibility of receiving an ACK prior to the complete processing of a previous OK response, and in turn race conditions. Consequently, it seems to me that 3.0.x introduced a bug by not using the proper tm callback anymore, specifically not one that finishes for sure before any message has been sent out. I'm not quite sure whether TMCB_RESPONSE_READY has simply been given the wrong value (its name indicates to me that the response is still waiting to be transmitted) or the tm module uses the wrong callback macros: Apparently, 1.5's *_PRE_* callbacks have gone and *_SENT introduced instead in 3.0.2. Not sure if that circumstance is truly reflected by the code. Cheers, --Timo > For now, the only way to avoid this issue is to run with one single > listener. This will keep the ACK into the received queue until the > listener finishes the processing of received 200ok. >=20 >=20 > Regards, > Ovidiu Sas >=20 > On Mon, Jun 14, 2010 at 9:01 AM, Klaus Feichtinger > wrote: >> Hello Ovidiu, >> >> I've been testing reproducablility of the described error and found that i= t is reproducable in "standard" mode onyl. This should mean: forked applicati= on with child processes. As soon as I start the application even as non-forke= d or with only one child (children=3D1), the error was not reproducable until= now. I've fastened reproduction with SIPp the call generating tool and in st= andard mode this error messages are visible after about 1-2 minutes (the call= rate is one call per second with a pause of 1500ms between ACK and the follo= wing BYE). >> >> Under these test circumstances no subscription for presence or dialog stat= e was active; it was tested on a "isolated" server where no other UAs were ac= tive; only SIPp and one other UA were active. >> >> I found that this problem is UserAgent-independent, too. Do you have an id= ea how to find out any more details about this problem? >> >> regards, >> >> Klaus Feichtinger >> >>> Datum: Thu, 10 Jun 2010 22:24:52 +0200 >>> Von: Klaus Feichtinger >>> An: Ovidiu Sas >>> CC: sr-dev(a)lists.sip-router.org >>> Betreff: Re: [sr-dev] Dialog module: "CRITICAL bogus event 6 in state 2 >>> Hello, >>> >>> according tcpdump-traces on the linux machine and log entries in syslog >>> (see below) the 200 OK passed the SIP Router (this means - the call was >>> accounted and afterwards the bogus event was noted): >>> >>> Jun 10 18:29:09 debian /usr/sbin/kamailio[20026]: INFO: