[sr-dev] Inbound NAT detection and fix_nated_register() very poorly documented

Thu Nov 26 03:44:22 CET 2009

Greetings,

I was faced with the following problem in Kamailio 1.5.2:  I was using 
nathelper + rtpproxy + registrar and needed to relay media for calls 
going inbound to a NAT'd registrant through rtpproxy - only if the 
registrant is NAT'd.

Here is the scenario:

    Media gateway          Kamailio
    (on public IP)  ---->  registrar      -----> NAT'd registrant
                           (w/nathelper)

I was handling registrations from NAT'd endpoints like this:

     if(nat_uac_test("1")) {
           force_rport();
           fix_nated_contact();
     }

I was aware of fix_nated_register() but was not clear on its 
relationship to the 'received_param' parameter stored in 'location' and 
set as a modparam to both the 'registrar' and 'nathelper' modules, so I 
was not using it.

Very quickly I ran into an obvious problem:  if the contact stored in 
the 'contact' column in 'location' is already fixed up with the received 
IP:port, there is no way to know that the endpoint to which the call is 
going is behind NAT -- when lookup() is called, the RURI is set to the 
public IP:port.  There are no flags of any kind that one can set during 
save() that persist while the contact binding is present for that AOR 
and can be resurrected on lookup().

Anyway, I eventually figured out how to fix this problem by empirical 
means, with no clear help from the nathelper documentation.  It turns 
out that if I set the 'received_param' as a modparam to 'nathelper' and 
'registrar' and handle registrations from NAT'd endpoints with 
fix_nated_register() instead, it will magically work.
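
For reference, here is a minimal sketch of that kind of setup, not my 
exact configuration.  The AVP name $avp(i:42) is an arbitrary 
placeholder, and I am assuming that the parameter the two modules share 
is the one the module docs list as 'received_avp';  adjust the names to 
whatever your version documents:

     # Assumption: both modules must point at the same AVP
     modparam("nathelper", "received_avp", "$avp(i:42)")
     modparam("registrar", "received_avp", "$avp(i:42)")

     # In the request route:
     if(is_method("REGISTER")) {
           if(nat_uac_test("1")) {
                 force_rport();
                 # Leaves the Contact untouched; the source IP:port
                 # is recorded separately for save() to store
                 fix_nated_register();
           }

           if(!save("location"))
                 sl_reply_error();

           exit;
     }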

I made the following two discoveries to arrive at this conclusion, 
neither of which is documented.  This is why it works:

1) When the original (RFC1918) 'contact' is stored in the 'location' 
table (if using DB, which I am), the 'received' parameter is stored 
alongside it and contains the public IP:port.

When lookup() is called, the RURI host is set to the private address, e.g.

    if(!lookup("location")) {
          # Error handling here
          exit;
    }

    xlog("L_INFO", "[R-2:$ci] -> Registration resolved to RURI: $ru\n");

$ru here will contain something like this: sip:s@10.1.0.2:5060, the 
original and unmodified contact supplied by the UAC.

But somehow, magically, when t_relay() is called the request will be 
relayed to a different destination - one with the 'received' IP:port 
substituted for the host portion.  So the request ends up going to 
the correct place.

This is, of course, because of the way the 'received' parameter is 
supposed to work when appended to a Contact URI.  But the point is that 
it is not documented;  nowhere in the documentation for lookup() or 
t_relay() does it say that this will transpire, and I have no way of 
knowing it except by observation.
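
One way to observe this from the script, as a sketch: log the 
destination URI next to the RURI after lookup().  I am assuming here 
that the stored 'received' value is exposed through the $du 
pseudo-variable, which is what I would expect given the behavior above 
but have not seen spelled out in the docs:

    if(lookup("location")) {
          # $ru: contact as registered by the UAC (private address)
          # $du: destination URI, if one was set (assumed to carry
          #      the 'received' IP:port)
          xlog("L_INFO", "[R-2:$ci] -> RURI: $ru, destination URI: $du\n");
    }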

2) For some reason, nat_uac_test("1") returns a positive result after 
the lookup() and confirms that the destination is NAT'd.

This allows me to set a flag and then mangle the SDP in the reply to use 
rtpproxy for NAT traversal of media as well, e.g.

route[2] {

     ...

     if(!lookup("location")) {
           # Error handling here
           exit;
     }

     if(nat_uac_test("1"))
           setflag(9);

     ...

     t_on_reply("1");

     if(!t_relay())
           sl_reply_error();

     ...
}

...

onreply_route[1] {
     if(t_check_status("(180|183|200)")) {

         if(nat_uac_test("5"))
             fix_nated_contact();

         if(search("Content-Type: application/sdp")) {
              if(isflagset(9)) {
                  set_rtp_proxy_set("1");
                  force_rtp_proxy();
              }
          }

      }
}

But if you examine carefully the documentation for nat_uac_test(), it 
says the following for bit flag 1:

      * 1- Contact header field is searched for occurrence of RFC1918 
addresses.

To me this means that nat_uac_test() should not work after the lookup() 
above.  The only Contact header value that is present in the inbound 
INVITE handler is the Contact URI of the media gateway, which is a 
public IP address!  Yet for some reason it works, as if by magic; 
apparently it is somehow implicit that a hidden "received" attribute of 
the RURI is also part of this check.

This is also not documented, and is completely counterintuitive.

Is there any possibility of clearing up the documentation on this point? 
Perhaps I have unknowingly done something wrong here;  I have no idea. 
I know that my solution works, but I cannot justify why in terms of the 
documentation.  Is it too much to ask that the documentation be 
explicit about hidden but critically important behavior like this?

Thanks!

-- Alex

-- 
Alex Balashov - Principal
Evariste Systems
Web     : http://www.evaristesys.com/
Tel     : (+1) (678) 954-0670
Direct  : (+1) (678) 954-0671