[OpenSER-Devel] [Serdev] Possible bug in the tm module in the presence of packet loss/branches

Andrei Pelinescu-Onciul andrei at iptel.org
Fri Mar 7 19:29:22 CET 2008


On Mar 07, 2008 at 10:00, Maxim Sobolev <sobomax at sippysoft.com> wrote:
> Andrei Pelinescu-Onciul wrote:
> >On Mar 06, 2008 at 10:43, Maxim Sobolev <sobomax at sippysoft.com> wrote:
> >>Bogdan-Andrei Iancu wrote:
> >>>Hi Maxim,
> >>>
> >>>You stated:
> >>>
> >>><quote>
> >>>The correct behavior of the tm module in this case would be to continue 
> >>>with INVITE re-transmits until we get provisional response and immediate 
> >>>CANCEL once that response comes in.
> >>></quote>
> >>>
> >>>Is this based on RFC indication or a personal opinion? If RFC based, 
> >>>could you please point me out the relevant section?
> >>>
> >>>I'm asking mainly because, following my own logic, I would rather say 
> >>>that once the transaction is cancelled on UAS side, no further attempts 
> >>>(read retransmissions) should be done on UAC side.
> >>Bogdan,
> >>
> >>It's based on common sense. Unless UAC does number of retransmits 
> >>specified by the RFC it can never be sure whether absence of provisional 
> >>reply has been caused by the dead destination or network packet loss 
> >>issue and the destination is in fact ringing. In the tm module you 
> >>always assume "dead destination", which is IMHO wrong. In my situation 
> >>this problem has been aggravated by the magnitude of packet loss, but in 
> >>general I've seen this issue before once in a while on a network with 
> >>close to zero packet loss rate.
> >>
> >>Another bad decision is to generate 487 locally in the presence of 
> >>unconfirmed active branches. SIP proxy should not do it unless it is 
> >>prepared to generate BYE if 200 OK comes from any of those branches 
> >>(i.e. proxy provides some kind of dialog functionality). Again, in the 
> >>real world, where packets are getting lost from time to time this could 
> >>lead to 200 OK coming from the branch even if you do stop INVITE 
> >>retransmissions. You will get yourself in the situation with originating 
> >>UA already received fake final negative 487 from proxy, while 
> >>terminating UA having dialog established, so that the only way to "fix" 
> >>the issue is to send BYE from the proxy to the terminating UA.
> >>
> >
> >OTOH if you wait for the timeout and you have some unreachable branches
> >(I think this is quite common due to, e.g., a changed IP address), you'll
> > delay possible 6xx answers and use more memory (even a 2xx replied
> > transaction could be kept longer in memory if it has a non-responding
> > branch).
> >
> >I'm also not sure if this would be RFC conformant (although after a
> >quick look I cannot find anything for or against it).
> >
> >I would rather send CANCEL even on branches with no provisional response
> >(although the RFC seems to deny this due to some possible race conditions
> >(?)).
> >
> >
> >I guess we can have yet another tm config param. for specifying cancel
> >non-pending branches behaviour, but what should be its default value
> >(present way, keep retransmitting INV, or send CANCEL)?
> 
> There are three issues, really:
> 
> 1. According to the RFC CANCEL on non-pending branches should be 
> deferred until provisional reply comes. Neither SER nor OpenSER do that. 
> They simply don't relay CANCEL if provisional reply comes later on. It 
> looks like a clear bug to me and it needs to be fixed unconditionally.

SER 2.1 does it: if a provisional response arrives on a canceled branch,
a CANCEL is sent on that branch immediately (or, if a CANCEL was already
sent on that branch, the response forces a CANCEL retransmission).
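
To make the behaviour above concrete, here is a minimal sketch of the
per-branch logic. It is not the tm code: the Branch class, its fields and
its method names are hypothetical and only model the rule "defer the CANCEL
until a provisional reply arrives; force a CANCEL retransmission on any
later provisional reply".

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    """One outgoing INVITE branch (hypothetical model, not the tm structures)."""
    got_provisional: bool = False
    cancel_requested: bool = False
    sent: list = field(default_factory=list)  # messages sent on this branch

    def request_cancel(self):
        """The transaction was canceled; CANCEL this branch if allowed."""
        self.cancel_requested = True
        if self.got_provisional:
            self.sent.append("CANCEL")
        # Otherwise defer: no provisional reply has been seen yet.

    def on_provisional(self):
        """A 1xx reply arrived on this branch."""
        self.got_provisional = True
        if self.cancel_requested:
            # First 1xx after the cancel: send the deferred CANCEL.
            # Any later 1xx: forced CANCEL retransmission.
            self.sent.append("CANCEL")
```

With this model, a branch canceled before any reply sends nothing until the
first 1xx shows up, and a second 1xx triggers one more CANCEL.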


> 
> 2. The questionable issue is whether or not UAC should keep 
> retransmitting INVITE when waiting for (1) to happen. This could be made 
> an option.

Yes, I agree.

> 
> 3. Another questionable issue is whether or not UAS should be sending 
> 487 immediately in the presence of such non-pending branches, it could 
> also be made an option.

I think it's linked to (2). If one sends a 487 immediately, one could
get a 200 after the 487 on the same branch (as you pointed out).
IMHO it makes little sense to have only (2) or only (3); it should be
(2)+(3) or nothing.

I'm also thinking of adding:
4. Always send CANCELs (even if no provisional response was received). It's
not RFC conformant, but I don't see any problems with it (IMHO it won't
break anything), I like it more than (2)+(3), and the code is already
there (only the module option is missing).


There is also the question of whether or not to do the same thing for:

a. cancels because of a received 2xx
b. cancels because of a 6xx
c. cancels because of script t_reply()


I'm thinking of doing it only for (b). (a) and (c) already send a final
reply back, and the only risk is to lose a provisional response, not
send a CANCEL, and later receive a final response on that branch (if we
do receive a provisional response, we automatically send a CANCEL
back). For (a) this won't be a problem: with a 2xx the UAC should be
prepared to accept multiple 2xxs anyway. (c) is rarely used with open
branches and the impact is too small to fix IMHO (the perfect fix for
(c) would include generating BYEs for possible 2xxs).
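
As a rough illustration of that choice, the sketch below maps each cancel
reason to whether the extra handling for branches without a provisional
response would apply. The enum and function names are made up for this
example and do not exist in tm; the mapping only restates the proposal
above.

```python
from enum import Enum


class CancelReason(Enum):
    """Why the remaining branches are being canceled (hypothetical names)."""
    GOT_2XX = "a"         # another branch returned a 2xx
    GOT_6XX = "b"         # another branch returned a 6xx
    SCRIPT_T_REPLY = "c"  # the script called t_reply()


def needs_extra_handling(reason):
    """True if unanswered branches would get the new treatment.

    Only the 6xx case qualifies under the proposal: for a 2xx the UAC
    must cope with multiple 2xxs anyway, and t_reply() with open
    branches is considered too rare to be worth fixing.
    """
    return reason is CancelReason.GOT_6XX
```

A module option could then be consulted only when needs_extra_handling()
returns True, leaving the (a) and (c) paths unchanged.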


Andrei


