Hi guys,
I'm having an issue which I have narrowed down to is_first_hop(). I can
apply a workaround, but I don't know if I'm doing it correctly or if my
problem is caused by a misconfiguration somewhere.
Let's say we have the following flow:
Client (NAT) TLS -> Kamailio1 (only Public IP) UDP -> Kamailio2 (only
Public IP) UDP -> FreeSWITCH (only Public IP).
For this setup, let's focus only on Kamailio1 (Kam1). For the sake of a
clear example, I have made a little draw.io diagram:
From the above flow, let's stick to the *200 OK* that Kam1 is going to
receive from Kam2 (marked in *blue* in the screenshot).
First 200 OK (replying to initial INVITE):
SIP/2.0 200 OK
Via: SIP/2.0/UDP KAM1_PUB_IP;rport=5060;branch=z9hG4bK1e2b.e1307b519c9b5f0015343e13f35aeace.0;i=3
Via: SIP/2.0/TLS [2607:fb90:489b:6e13:85f8:596b:b86b:c831]:54744;received=172.58.17.149;branch=z9hG4bK.jmXSnh-7x;rport=43842
Record-Route: <sip:KAM2_PUB_IP;lr=on;ftag=7vidaJ3Hw;did=36e.d441>
Record-Route: <sip:KAM1_PUB_IP;r2=on;lr=on;ftag=7vidaJ3Hw;did=36e.a792;nat=yes>
Record-Route: <sip:KAM1_PUB_FQDN:443;transport=tls;r2=on;lr=on;ftag=7vidaJ3Hw;did=36e.a792;nat=yes>
From: "Joel Test 1" <sip:8bd2a0aba14541789bb7269800646458@MY_DOMAIN>;tag=7vidaJ3Hw
To: "Joel Test 2" <sip:e78f2617b0d345d3bdb7b6780ece903c@MY_DOMAIN>;tag=vceg6N0m5ypHa
Call-ID: 3ezoQGF1kp
CSeq: 21 INVITE
Contact: <sip:e78f2617b0d345d3bdb7b6780ece903c@FS_PUB_IP:6061;transport=udp>
User-Agent: TP MEDIA 2.0
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, MESSAGE, INFO, UPDATE, REGISTER, REFER, NOTIFY
Supported: timer, path, replaces
Allow-Events: talk, hold, conference, refer
Content-Type: application/sdp
Content-Disposition: session
Content-Length: 358
Remote-Party-ID: "e78f2617b0d345d3bdb7b6780ece903c" <sip:e78f2617b0d345d3bdb7b6780ece903c@MY_DOMAIN>;party=calling;privacy=off;screen=no
v=0
o=TP 1529576021 1529576022 IN IP4 FS_PUB_IP
s=TP
c=IN IP4 FS_PUB_IP
t=0 0
m=audio 25484 RTP/AVP 96 101
a=rtpmap:96 opus/48000/2
a=fmtp:96 useinbandfec=1; maxplaybackrate=8000; sprop-maxcapturerate=8000
a=rtpmap:101 telephone-event/48000
a=fmtp:101 0-16
a=silenceSupp:off - - - -
a=ptime:20
a=rtcp:25485 IN IP4 FS_PUB_IP
Second 200 OK (replying to in-dialog INVITE with updated SDP):
SIP/2.0 200 OK
Via: SIP/2.0/UDP KAM1_PUB_IP;rport=5060;branch=z9hG4bKed2b.0006a6a159e800129a62b4415fdd64e6.0;i=7
Via: SIP/2.0/TLS 192.168.30.63:54752;received=A.B.C.D;branch=z9hG4bK.iKO2iYIgK;rport=27819
From: "Joel Test 1" <sip:8bd2a0aba14541789bb7269800646458@MY_DOMAIN>;tag=7vidaJ3Hw
To: "Joel Test 2" <sip:e78f2617b0d345d3bdb7b6780ece903c@MY_DOMAIN>;tag=vceg6N0m5ypHa
Call-ID: 3ezoQGF1kp
CSeq: 22 INVITE
Contact: <sip:e78f2617b0d345d3bdb7b6780ece903c@FS_PUB_IP:6061;transport=udp>
User-Agent: TP MEDIA 2.0
Accept: application/sdp
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, MESSAGE, INFO, UPDATE, REGISTER, REFER, NOTIFY
Supported: timer, path, replaces
Content-Type: application/sdp
Content-Disposition: session
Content-Length: 358
v=0
o=TP 1529576021 1529576022 IN IP4 FS_PUB_IP
s=TP
c=IN IP4 FS_PUB_IP
t=0 0
m=audio 25484 RTP/AVP 96 101
a=rtpmap:96 opus/48000/2
a=fmtp:96 useinbandfec=1; maxplaybackrate=8000; sprop-maxcapturerate=8000
a=rtpmap:101 telephone-event/48000
a=fmtp:101 0-16
a=silenceSupp:off - - - -
a=ptime:20
a=rtcp:25485 IN IP4 FS_PUB_IP
Now here comes the problem. I have the following in my Kam1 config:
route[NATMANAGE] {
    ...
    if (is_reply()) {
        if (isbflagset(FLB_NATB)) {
            if (is_first_hop()) {
                fix_nated_contact();
            }
        }
    }
    ...
}
So, on the first 200 OK, when it reaches that part of the config:
1- is_reply() -> OK
2- isbflagset(FLB_NATB) -> OK (because NAT was detected on the initial
request)
3- is_first_hop() -> FAIL
fix_nated_contact() is NOT applied.
(This is the correct and the expected behavior).
Now, on the second 200 OK, again in that part of the config:
1- is_reply() -> OK
2- isbflagset(FLB_NATB) -> OK
3- is_first_hop() -> OK
fix_nated_contact() is applied, thus the contact will be changed, and the
client will send the ACK with incorrect information leading to another set
of issues...
modules/siputils.html#siputils.f.is_first_hop
*4.30. is_first_hop()*
*The function returns true if the proxy is first hop after the original
sender. For incoming SIP requests, it means there is only one Via header.
For incoming SIP replies, it means that top Record-Route URI is 'myself'
and source address is not matching it (to avoid detecting in case of local
loops). Note that it does not detect spirals, which can have the condition
for replies true also in the case of additional SIP reply receival.*
So, going back to the examples:
First 200 OK:
1- "top Record-Route URI is 'myself'" -> FAIL (the top Record-Route is
Kam2's).
So we are NOT the first hop, we do nothing and forward the reply to the
client.
Second 200 OK:
1- "top Record-Route URI is 'myself'" -> No Record-Route headers are
present, so is_first_hop() returns true, we enter the condition and modify
the contact with fix_nated_contact().
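The two cases above can be sketched as a small Python model. This is only a toy illustration of the documented rule plus the behavior observed in this post (no Record-Route headers means the check passes); it is not Kamailio's actual implementation. The `Reply` class, the `is_first_hop_reply` function and the `MYSELF` set are hypothetical names made up for the sketch.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Reply:
    # Host part of each Record-Route URI, top-most header first.
    record_route_hosts: List[str] = field(default_factory=list)
    # Address the reply was received from.
    source_host: str = ""


def is_first_hop_reply(reply: Reply, myself: Set[str]) -> bool:
    """Toy model of the reply check, NOT Kamailio's real code."""
    if not reply.record_route_hosts:
        # Assumption taken from the behavior observed in this post:
        # with no Record-Route headers at all, the check passes.
        return True
    top = reply.record_route_hosts[0]
    # Documented rule: the top Record-Route URI is 'myself' and the
    # source address does not match it.
    return top in myself and reply.source_host != top


# Placeholder names as used in the traces above.
MYSELF = {"KAM1_PUB_IP", "KAM1_PUB_FQDN"}

# First 200 OK: three Record-Route headers, Kam2's on top.
first_200 = Reply(["KAM2_PUB_IP", "KAM1_PUB_IP", "KAM1_PUB_FQDN"], "KAM2_PUB_IP")
# Second 200 OK (in-dialog): no Record-Route headers.
second_200 = Reply([], "KAM2_PUB_IP")

print(is_first_hop_reply(first_200, MYSELF))   # False
print(is_first_hop_reply(second_200, MYSELF))  # True
```

In this model the second reply passes only because of the empty-header branch, which is the case the questions below are about.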
Now to the real topic. I have this workaround:
...
if (is_reply()) {
    if (isbflagset(FLB_NATB)) {
        if (is_first_hop()) {
            # Check whether the reply is coming from our internal servers
            if (!ds_is_from_list()) {
                fix_nated_contact();
            }
        }
    }
}
...
And that is working correctly, but I would like to understand the reasons.
I hope I have explained myself clearly, otherwise it's impossible to get
to the point of my questions:
1- Is it correct for is_first_hop() to detect the second 200 OK as a first
hop when it isn't? The behavior matches the documentation, so I don't know.
If we stick to the header checks, it's working as described. If we stick to
the concept of whether Kam1 is actually the first hop for that 200 OK, then
the check would need another condition to exclude the 200 OK (or better,
all replies) of an in-dialog INVITE.
2- Do you guys consider the workaround reasonable? In my opinion, I would
prefer not to have to add it, but I also don't know.
3- Am I missing something super standard that avoids all of this? I'm
starting to go crazy trying and comparing different things to understand
this, and I want to make sure it's not just an is_first_hop() bug.
Sorry for such a long email, but I think that describing the scenario and
flow was required.
Any input is more than welcome!
Thanks!
Joel.