Greetings,
Is there a place somewhere that expounds upon or otherwise illustrates the inner workings of call forking for multiple concurrent contacts? I can only find a patchwork of hints by perusing the documentation for the 'tm' and 'registrar' modules, and they don't provide me entirely with the understanding that I am looking for.
Failing that, I am seeking answers and clarifications to the following general questions. All of them concern handling of an INVITE in a situation where the registrar module allows multiple concurrent contacts, and multiple contacts are in fact registered, and the "append_branches" modparam to "registrar" is set to 1.
1. What is officially supposed to happen if multiple contacts to which the INVITE is relayed all answer the call instantaneously, whether via 183 early media or 200 OK -- but not OOB ringing events?
What I get seems a little strange, although perhaps there is an explanation that I am missing.
Both far endpoints send a 200 OK with SDP, and I hear both media streams simultaneously when I call, interleaved by clicks. After a while, the second (most recent) contact's media stream drops off because that endpoint decides the call has timed out, having received no ACKs for its 200 OKs; at the same time, OpenSER appears to generate a CANCEL for that second call leg. The CANCEL is answered with a 200 OK (not a 487 Request Terminated?), which seems a little bizarre since the dialog is already established -- but since the 200 OKs are never ACKed, I suppose a BYE is not necessary to terminate that request. Meanwhile, 200 OKs in response to the INVITE keep coming from the contact that was slower to pick up and was CANCEL'd (why? this is Asterisk 1.4), but the ACKs for them keep being routed to the first contact (the one that remains), which must be understandably confused as to why they are there, although it processes them as retransmissions.
I have a packet capture I'd love to send someone for interpretation, but would rather do it privately off-list.
2. Is there a shared REPLY-ROUTE for replies on each branch? Or is there a way to fragment particular reply routes for each branch?
The goal I am ultimately looking to achieve is surely commonplace enough: I would like multiple concurrent registrants to be rung at one request URI. But at present, some very strange things happen when this occurs, and they confuse the far-end UAs.
That's what governs my intuition that I just need to properly understand how branching is supposed to work.
Thanks,
Alex Balashov wrote:
OK, that was rather incoherent. Let me take a step back and summarise from what I see in my Kamailio debugging output:
1. SBC sends call to Kamailio proxy.
2. Proxy does registrar dip and resolves two contacts - A and B.
3. Proxy bifurcates the call into two branches 'branch A' (to A) and 'branch B' (to B). Rewrites RURI, relays INVITE.
4. A answers with 200 OK.
5. B answers with 200 OK.
6. Proxy passes back 200 OK to SBC for A. Then for B.
7. SBC issues in-dialog end-to-end ACK for that 200 OK; proxy decides to forward it only to A as per the ONREPLY-ROUTE. No replies are forwarded to B. It is here that I think things go wrong.
8. B keeps sending 200 OKs and getting no ACKs for them, and eventually gives up and kills the session.
So, it looks like not all replies are being statefully relayed to both branches.
Additionally, it looks like the following is happening:
- At step #6 above, the 200 OK passed to the SBC is for A only.
- The proxy elects to CANCEL the other branch to B between #6 and #7.
- After sending the CANCEL, the proxy decides to pass back the original 200 OK for the INVITE (with SDP) for B back to the SBC as well.
- After that, B replies with a 200 OK for the CANCEL issued by the proxy. Why does it reply with a 200 OK? Simply because it is after the INVITE was already OK'd? Is that per the RFC? I thought a call leg could not be CANCEL'd at this stage at all and requires a BYE?
- SBC ACKs the 200 OK (for INVITE) from A, and proxy relays to A.
- Meanwhile, B keeps sending 200 OKs for the INVITE (AFTER a CANCEL on that branch!) and the proxy keeps relaying them back to the SBC, which replies with ACKs. But these ACKs keep getting forwarded back to A, not B, presumably because from the proxy's POV the B leg is now CANCEL'd and OK'd (in the penultimate step).
So, I am not really sure what to make of this... I would appreciate any help!
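For reference, a minimal sketch of how the calling side is generally expected to treat several 200 OKs for one forked INVITE, assuming the usual reading of RFC 3261 (sections 12 and 13.2.2.4): each distinct To tag is its own dialog, and the ACK for a 2xx is addressed to that dialog's remote target (the Contact of that 200 OK). The class, tags, and addresses below are hypothetical illustrations, not the SBC's or Kamailio's actual logic, and Record-Route handling is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DialogId:
    call_id: str
    local_tag: str   # From tag chosen by the caller
    remote_tag: str  # To tag chosen by the answering endpoint


class Caller:
    """Hypothetical caller-side bookkeeping for a forked INVITE."""

    def __init__(self):
        # DialogId -> Contact URI taken from the 2xx that created the dialog
        self.remote_target = {}

    def on_2xx(self, call_id, from_tag, to_tag, contact):
        dialog = DialogId(call_id, from_tag, to_tag)
        # The first 2xx with a given To tag creates the dialog;
        # a retransmitted 2xx finds the existing entry.
        self.remote_target.setdefault(dialog, contact)
        # Every 2xx, including retransmissions, is ACKed towards the
        # remote target of the dialog it belongs to.
        return {
            "method": "ACK",
            "ruri": self.remote_target[dialog],
            "to_tag": to_tag,
        }


caller = Caller()
# Placeholder documentation addresses, not taken from the capture.
ack_a = caller.on_2xx("call-1", "sbc-tag", "tag-A", "sip:a@192.0.2.10")
ack_b = caller.on_2xx("call-1", "sbc-tag", "tag-B", "sip:b@192.0.2.20")
ack_b_again = caller.on_2xx("call-1", "sbc-tag", "tag-B", "sip:b@192.0.2.20")
print(ack_a["ruri"], ack_b["ruri"])
```

Under these assumptions, an ACK for B's 200 OK would carry B's contact as its request URI, and the caller would then send a BYE on whichever dialog it does not want to keep.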
Alex Balashov writes:
> 3. Proxy bifurcates the call into two branches 'branch A' (to A) and 'branch B' (to B). Rewrites RURI, relays INVITE.
> 4. A answers with 200 OK.
> 5. B answers with 200 OK.
> 6. Proxy passes back 200 OK to SBC for A. Then for B.
> 7. SBC issues in-dialog end-to-end ACK for that 200 OK; proxy decides to forward it only to A as per the ONREPLY-ROUTE. No replies are forwarded to B. It is here that I think things go wrong.
> 8. B keeps sending 200 OKs and getting no ACKs for them, and eventually gives up and kills the session.
> So, it looks like not all replies are being statefully relayed to both branches.
> Additionally, it looks like the following is happening:
> - At step #6 above, the 200 OK passed to the SBC is for A only.
canceling of b branch should happen already after step 4, but perhaps 4 and 5 take place almost simultaneously and there is some race condition related bug in tm module.
-- juha
Juha Heinanen wrote:
> canceling of b branch should happen already after step 4, but perhaps 4 and 5 take place almost simultaneously and there is some race condition related bug in tm module.
I think it's just the order of events. According to my packet capture:
- Packet 9, time index 7.953711: 200 OK arrives from A.
- Packet 10, time index 7.954636: 200 OK arrives from B.
- Packet 11, time index 7.969227: Proxy passes 200 OK from A back to SBC.
- Packet 12, time index 7.969268: Proxy originates CANCEL for branch B.
- Packet 13, time index 7.970279: Proxy passes 200 OK from B back to SBC.
- Packet 14, time index 7.971508: 200 OK for CANCEL request arrives from B. [1]
- Packet 15, time index 8.018730: SBC originates ACK for branch to A.
- Packet 16, time index 8.018895: Proxy passes ACK for branch A to A.
- Packet 17, time index 8.153957: B retransmits 200 OK for INVITE.
- Packet 18, time index 8.155309: Proxy forwards 200 OK from B to SBC.
- Packet 19, time index 8.155853: SBC sends ACK again to A's contact.
This is really strange because the Contact address in Packet 18 is for B.
Then, the sequence 17-19 repeats itself. B keeps sending 200 OKs, SBC keeps ACKing them back to A, and nothing is actually CANCEL'd.
I'm stumped. Is this the SBC choosing to handle branching that way, or Kamailio?
-- Alex
[1] Again, I would ask, why would a CANCEL be replied to with a 200 OK (by bleeding-edge 1.4 release of Asterisk) rather than a 487 Request Terminated? Is this what the RFC prescribes if the session is already set up (i.e. after the INVITE has been 200 OK'd in packet 10)? Wouldn't a CANCEL just be invalid at that point - in favour of a BYE?
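On the question in [1], a small sketch of how a called endpoint is usually described as handling CANCEL, assuming the common reading of RFC 3261 section 9.2: the 487 is a response to the INVITE, while the CANCEL itself gets a 200 OK whenever it matches an existing transaction, even if a final response to the INVITE has already gone out. The function and the dictionary layout are hypothetical and do not describe Asterisk's implementation.

```python
def uas_handle_cancel(invite_txn):
    """Return the responses to send on CANCEL as (request, status) pairs.

    invite_txn is None when no matching INVITE transaction exists,
    otherwise a dict with a boolean "final_sent" flag.
    """
    if invite_txn is None:
        # No matching transaction: the CANCEL is rejected.
        return [("CANCEL", 481)]

    # A CANCEL that matches a transaction is always answered with 200 OK.
    responses = [("CANCEL", 200)]

    if not invite_txn["final_sent"]:
        # INVITE still pending: terminate it with 487 Request Terminated.
        invite_txn["final_sent"] = True
        responses.append(("INVITE", 487))
    # Otherwise the CANCEL has no effect on the already answered INVITE.
    return responses


pending = uas_handle_cancel({"final_sent": False})
answered = uas_handle_cancel({"final_sent": True})
unknown = uas_handle_cancel(None)
print(pending, answered, unknown)
```

Read this way, a 200 OK for the CANCEL in packet 14 is consistent with B having already answered the INVITE in packet 10; ending that leg would then take a BYE after the 200 OK has been ACKed.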
Alex Balashov wrote:
I should add that this sequence appears to play out in the exact same order every single time.
Alex Balashov writes:
> I think it's just the order of events. According to my packet capture:
> - Packet 9, time index 7.953711: 200 OK arrives from A.
> - Packet 10, time index 7.954636: 200 OK arrives from B.
> - Packet 11, time index 7.969227: Proxy passes 200 OK from A back to SBC.
> - Packet 12, time index 7.969268: Proxy originates CANCEL for branch B.
> - Packet 13, time index 7.970279: Proxy passes 200 OK from B back to SBC.
as you see, 9 and 10 arrive to proxy very close to each other, which may result in a race condition bug causing proxy to send packet 13, which it should not do.
i suggest you file a bug report on this.
-- juha
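To put numbers on "very close to each other", here is a short sketch that computes the gaps between the quoted packets. The timestamps are copied from the capture listing in this thread; the helper name is made up for illustration.

```python
from decimal import Decimal

# Capture time indexes (seconds) for packets 9-13, as quoted in the thread.
capture = {
    9: Decimal("7.953711"),   # 200 OK arrives from A
    10: Decimal("7.954636"),  # 200 OK arrives from B
    11: Decimal("7.969227"),  # proxy passes A's 200 OK to the SBC
    12: Decimal("7.969268"),  # proxy originates CANCEL for branch B
    13: Decimal("7.970279"),  # proxy passes B's 200 OK to the SBC
}


def gap_ms(earlier, later):
    """Elapsed time between two packets, in milliseconds."""
    return (capture[later] - capture[earlier]) * 1000


for first, second in [(9, 10), (10, 11), (11, 12), (12, 13)]:
    print(f"packet {first} -> {second}: {gap_ms(first, second)} ms")
```

The two 200 OKs arrive less than a millisecond apart, and both are already at the proxy roughly 14.6 ms before it relays A's reply and originates the CANCEL.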
Juha Heinanen wrote:
> as you see, 9 and 10 arrive to proxy very close to each other, which may result in a race condition bug causing proxy to send packet 13, which it should not do.
> i suggest you file a bug report on this.
OK, at your suggestion, I will.
Do you think that the CANCEL is being generated by a different thread than the one that passes the 200 OK back to the SBC at packet 13?