[sr-dev] Dialog module: "CRITICAL bogus event 6 in state 2

Timo Reimann timo.reimann at 1und1.de
Mon Jun 14 15:51:57 CEST 2010


Hi,


Ovidiu Sas wrote:
> It is a race between the 200ok being sent out and the update of the
> dialog state machine.  The ACK is received while the dialog state
> machine was not updates for the previously sent 200ok.
> IIRC, in opensips this was fixed by introducing a new tm callback in
> order to update the dialog state machine as soon as the 200ok was sent
> out.

Taking a look at dlg_handlers.c from 3.0.2, one can see that the
function which transitions the dialog state (dlg_onreply) is called back
from TMCB_RESPONSE_FWDED or TMCB_RESPONSE_READY. While the former is
called just before a response is forwarded (but hasn't been so yet), the
latter is mapped to TMCB_RESPONSE_OUT which, according to t_hooks.h and
the code in t_reply.c, is called *after* the response has been delivered.

In 1.5, however, dialog's reply handler function already relies on
TMCB_RESPONSE_PRE_OUT which properly avoids the possibility of receiving
an ACK prior to the complete processing of a previous OK response, and
in turn race conditions.

Consequently, it seems to me that 3.0.x introduced a bug by not using
the proper tm callback anymore, specifically not one that finishes for
sure before any message has been sent out. I'm not quite sure whether
TMCB_RESPONSE_READY has simply been given the wrong value (its name
indicates to me that the response is still waiting to be transmitted) or
the tm module uses the wrong callback macros: Apparently, 1.5's *_PRE_*
callbacks have gone and *_SENT introduced instead in 3.0.2. Not sure if
that circumstance is truly reflected by the code.


Cheers,

--Timo



> For now, the only way to avoid this issue is to run with one single
> listener.  This will keep the ACK into the received queue until the
> listener finishes the processing of received 200ok.
> 
> 
> Regards,
> Ovidiu Sas
> 
> On Mon, Jun 14, 2010 at 9:01 AM, Klaus Feichtinger
> <klaus.feichtinger at gmx.net> wrote:
>> Hello Ovidiu,
>>
>> I've been testing reproducablility of the described error and found that it is reproducable in "standard" mode onyl. This should mean: forked application with child processes. As soon as I start the application even as non-forked or with only one child (children=1), the error was not reproducable until now. I've fastened reproduction with SIPp the call generating tool and in standard mode this error messages are visible after about 1-2 minutes (the call rate is one call per second with a pause of 1500ms between ACK and the following BYE).
>>
>> Under these test circumstances no subscription for presence or dialog state was active; it was tested on a "isolated" server where no other UAs were active; only SIPp and one other UA were active.
>>
>> I found that this problem is UserAgent-independent, too. Do you have an idea how to find out any more details about this problem?
>>
>> regards,
>>
>> Klaus Feichtinger
>>
>>> Datum: Thu, 10 Jun 2010 22:24:52 +0200
>>> Von: Klaus Feichtinger <klaus.feichtinger at gmx.net>
>>> An: Ovidiu Sas <osas at voipembedded.com>
>>> CC: sr-dev at lists.sip-router.org
>>> Betreff: Re: [sr-dev] Dialog module: "CRITICAL bogus event 6 in state 2
>>> Hello,
>>>
>>> according tcpdump-traces on the linux machine and log entries in syslog
>>> (see below) the 200 OK passed the SIP Router (this means - the call was
>>> accounted and afterwards the bogus event was noted):
>>>
>>> Jun 10 18:29:09 debian /usr/sbin/kamailio[20026]: INFO: <script>:
>>> ---xlog: a call with sip:116202 at 10.16.10.99 is coming in
>>> Jun 10 18:29:12 debian /usr/sbin/kamailio[20027]: NOTICE: acc
>>> [acc.c:275]: ACC: transaction answered:
>>> timestamp=1276208952;method=INVITE;from_tag=3120233282-36939680-1276187634579;to_tag=3338003860-56654088-1276187447718;call_id=741466613-36939680-1276187634579 at 10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116202;dst_domain=10.16.10.90;src_ip=10.16.10.152
>>> Jun 10 18:29:12 debian /usr/sbin/kamailio[20025]: CRITICAL: dialog
>>> [dlg_hash.c:591]: bogus event 6 in state 2 for dlg 0xa7eb3ea8
>>> [2285:1225504260] with clid
>>> '741466613-36939680-1276187634579 at 10.16.10.152' and tags
>>> '3120233282-36939680-1276187634579' ''
>>>
>>>
>>> However, this is not the one and only event / state combination I found
>>> in syslog. There is also an entry with event 6 in state 1 present:
>>>
>>> Jun 10 19:35:49 debian /usr/sbin/kamailio[20026]: INFO: <script>:
>>> ---xlog: a call with sip:116002 at 10.16.10.99 is coming in
>>> Jun 10 19:35:50 debian /usr/sbin/kamailio[20027]: NOTICE: acc
>>> [acc.c:275]: ACC: transaction answered:
>>> timestamp=1276212950;method=INVITE;from_tag=3805046041-37017968-1276191635112;to_tag=befc3f7326747dei0;call_id=789390150-37017968-1276191635112 at 10.16.10.152;code=200;reason=OK;src_user=116201;src_domain=10.16.10.99;dst_user=116002;dst_domain=10.16.10.205;src_ip=10.16.10.152
>>> Jun 10 19:35:50 debian /usr/sbin/kamailio[20026]: CRITICAL: dialog
>>> [dlg_hash.c:591]: bogus event 6 in state 1 for dlg 0xa7e979b8
>>> [1648:1273002173] with clid
>>> '789390150-37017968-1276191635112 at 10.16.10.152' and tags
>>> '3805046041-37017968-1276191635112' ''
>>>
>>> But tomorrow I will test the setting - as recommended by you - to reduce
>>> kamailio to a single worker and check if the problem will keep
>>> repreduceable.
>>>
>>> Regards,
>>>
>>> Klaus Feichtinger
>>>
>>>> Bogus event 6 in state 2 means ACK received in early state.
>>>> This is a little bit bizarre, because it seems that the dialog did not
>>>> see the 200ok passing by.
>>>> Try to use a single worker (children=1) and see if you can reproduce
>>>> the problem.
>>>> Also, check that the 200ok is really going through the proxy.
>>>>
>>>> Regards,
>>>> Ovidiu Sas
>>>>
>>>> On Thu, Jun 10, 2010 at 6:44 AM, Klaus Feichtinger
>>>> <klaus.feichtinger at gmx.net> wrote:
>>>>
>>>>> Hello list,
>>>>>
>>>>> under special circumstances I have a problem with the DIALOG module of
>>> SIP-Router/Kamailio version 3.0.2. The dialog module is used in combination
>>> with presence, presence_xml, pua, pua_usrloc and pua_dialoginfo modules +
>>> db_mysql modules. The db_mode is set to "write_through", because of
>>> redundancy requirements.
>>>>> From my point of view this problem has something to do with timing in
>>> general. E.g. a so called "data call" has a duration of about 50 ms only;
>>> INVITE....BYE). That problem occurs on this short calls only. Because of the
>>> problems in the dialog state machine, NOTIFY messages are incorrect. Even
>>> when the call is already finished, the NOTIFY message includes the state
>>> "confirmed" and causes a wrong status indication....
>>>>> The detailed error message can be seen here:
>>>>> "Jun  9 16:00:39 debian /usr/sbin/kamailio[14992]: CRITICAL: dialog
>>> [dlg_hash.c:591]: bogus event 6 in state 2 for dlg [dialog-ID] with clid
>>> [Call-ID] and tags '1299370188-28358304-1276092068837' ''"
>>>>>
>>>>> I found in old mails of the developer list another error that looked
>>> nearly the same, but it differed in the event and state of the dialog state
>>> machine (the thread can be found under the link
>>> http://www.mail-archive.com/devel@lists.kamailio.org/msg03234.html). That bug should - according the
>>> information that I've found in the list - be solved. Therefore I will ask
>>> you: is the error as displayed above another well known error / bug, which
>>> (maybe) should already be solved? Is it a new bug?
>>>>> Thanks for any information.
>>>>>
>>>>> regards,
>>>>>
>>>>> Klaus Feichtinger
>>>>>
>>>>> _______________________________________________
>>>>> sr-dev mailing list
>>>>> sr-dev at lists.sip-router.org
>>>>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>>>>
>>>>>
>>>
>>> _______________________________________________
>>> sr-dev mailing list
>>> sr-dev at lists.sip-router.org
>>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>> --
>> GMX DSL: Internet-, Telefon- und Handy-Flat ab 19,99 EUR/mtl.
>> Bis zu 150 EUR Startguthaben inklusive! http://portal.gmx.net/de/go/dsl
>>
> 
> _______________________________________________
> sr-dev mailing list
> sr-dev at lists.sip-router.org
> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev


-- 
Timo Reimann
Software-Developer

1&1 Internet AG - Consumer Core
Ernst-Frey-Straße 9 · DE-76135 Karlsruhe
Telefon: +49 721 91374-8122
timo.reimann at 1und1.de

Amtsgericht Montabaur / HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas
Gottschlich, Robert Hoffmann, Markus Huhn, Hans-Henning Kettler, Dr.
Oliver Mauss, Gert Nowotny, Jan Oetjen
Aufsichtsratsvorsitzender: Michael Scheeren



More information about the sr-dev mailing list