At 5:56 PM +0100 on 3/16/05, Adrian Georgescu wrote:
Hello Juha,
The ENUM prefix for France, +33 (3.3.e164.arpa), does not work: it is delegated, but the servers do not answer and the resolver times out after 12-20 seconds. If you prefer to do ENUM lookups before falling back to the normal PSTN, this affects SIP call flows.
Do you have any idea how to work around this problem? Besides fixing the DNS server side :)
Regards, Adrian
[this post became somewhat non-specific to SER as I wrote it - apologies. However, it is directly relevant to most of the community so I will post it.]
One method would be to write custom resolver code for ENUM lookups that has shorter timeouts for specific zones.
Another "feature" of such a program could be parallel resolution, in an even more heavily modified resolver, for cases where a particular host cares about several ENUM root zones. The requirement for rapid timeouts would also apply to these parallel lookups, since post-dial delay (PDD) is the most important thing to worry about when completing calls (at least currently, IMHO). Perhaps being able to give each zone a preference and a timeout would be interesting, too... let's expand on this.
In theory there should be only one "root" for ENUM (e164.arpa.), or at worst just one ENUM zone lookup per SIP routing engine. In practice, more and more root zones are springing up for various administrative, technical, and political reasons, and it is becoming possible (probable?) that each server will have to look through more than one zone during an ENUM query cycle. I'd love to be able to use those zones all at once, but cascading through multiple rounds of setting domain_suffix and then doing the lookup is impractical - if there are failures or slowness, post-dial delay becomes unacceptable.
So, there are two features that I'd love to see built into a generic resolver: the ability to hand-tune failure intervals, and the ability to parallelize lookups into multiple top-level zones for the same query (and to weight the answers if we get multiple replies).
This seems like it might be useful for the entire VoIP community, and not just for SER users. Michael Haberler at nic.at had someone who was interested in writing this code a while back, but recently (yesterday) he said that the project didn't get wings. I've got $150 to donate towards anyone who comes up with something remotely resembling a generic resolver that handles parallel queries in the format I describe below:
Let's take a hypothetical config file for such a resolver daemon. This resolver assumes that it will be handed a non-qualified lookup like "2.1.2.1.5.5.5.2.1.2.1" without a top-level domain attached. The hacked resolver will then scan through the multiple top-level zones and try to get a match in a parallelized and possibly cascading fashion.
# Which domains do we want to rip apart and handle in specific ways?
# We first look at the IP address of the device that is sending
# the query.  If it matches one or more lines, then see if it
# matches the top-level zone that is being requested.  If there
# is a match, and a "permit", then strip off the zone from the
# query, and hand off to the groups specified in the list of
# groups after the "permit" keyword, in the order the groups
# are listed.
#
# Wildcards can be used for domains or IP addresses.
#
# Rules are interpreted in order of entry, and the first match
# ends the lookup process.
#
# If no group(s) is specified, then hand off resolution to the
# default-forwarder resolver(s).
#
# host [ip address] [zone suffix] [permit,deny] [group, group, ...]
#
host 10.*.*.* e164.arpa permit 1 2
host 10.*.*.* * permit
host *.*.*.* * deny
#
#
#
# Any lookup that doesn't match a "host" line with a group list
# above gets pushed out to a list of normal DNS resolvers.  The
# replies from those resolvers are simply forwarded back
# through this hacked resolver to the querying device.  In
# this example, if 10.10.10.88 asked for the A record for
# "foo.com", then that query would be permitted and handed
# off to the default forwarders for resolution.  (note: the
# default-forwarders aren't parallelized, though I suppose
# they could be, but that would perhaps create unnecessary
# DNS traffic for "non-critical" lookups.)
#
# default-forwarder [ip address] [port]
#
default-forwarder 192.148.33.13 53
default-forwarder 205.11.29.2 53
#
#
#
#
# Group 1
#
# Group 1 is for my internal e164 zones, which have their own
# resolvers and speed assumptions.
#
# forwarder [group] [weight] [ip address] [port]
#
forwarder 1 1 10.10.10.4 53
forwarder 1 1 10.10.22.9 53
#
# zone [group] [zone] [weight] [max ms wait]
#
zone 1 e164.mycompany.com 1 50
zone 1 e164.myothercompany.com 1 70
#
#
#
# Group 2
#
# If all the resolvers in group 1 don't come back with any valid
# answers after 70ms, then we move on to group 2, which is external
# zones and "outside" resolvers...
#
# forwarder [group] [weight] [ip address] [port]
#
forwarder 2 1 4.33.12.94 53
forwarder 2 1 12.39.113.5 53
#
# zone [group] [zone] [weight] [max ms wait]
#
zone 2 e164.arpa 1 100
zone 2 e164.info 2 170
zone 2 e164.org 3 200
#
# end
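To make the file format concrete, here is a minimal parsing sketch in Python. Nothing like this exists yet; the `ResolverConfig` class, the `parse_config` function, and the shortened `SAMPLE` text are all hypothetical names invented for illustration. It assumes one directive per line, "#" starting a comment, and zone lines carrying exactly the four documented fields (group, zone, weight, max ms wait).

```python
from dataclasses import dataclass, field


@dataclass
class ResolverConfig:
    """Hypothetical in-memory form of the config file."""
    hosts: list = field(default_factory=list)               # (ip pattern, zone, action, [groups])
    default_forwarders: list = field(default_factory=list)  # (ip, port)
    forwarders: dict = field(default_factory=dict)          # group -> [(weight, ip, port)]
    zones: dict = field(default_factory=dict)               # group -> [(zone, weight, max_wait_ms)]


def parse_config(text):
    cfg = ResolverConfig()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "host":
            ip, zone, action, *groups = args
            cfg.hosts.append((ip, zone, action, [int(g) for g in groups]))
        elif keyword == "default-forwarder":
            ip, port = args
            cfg.default_forwarders.append((ip, int(port)))
        elif keyword == "forwarder":
            group, weight, ip, port = args
            cfg.forwarders.setdefault(int(group), []).append((int(weight), ip, int(port)))
        elif keyword == "zone":
            group, zone, weight, wait = args
            cfg.zones.setdefault(int(group), []).append((zone, int(weight), int(wait)))
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    return cfg


# Shortened excerpt of the example file, for demonstration only.
SAMPLE = """\
host 10.*.*.* e164.arpa permit 1 2
host 10.*.*.* * permit
host *.*.*.* * deny
default-forwarder 192.148.33.13 53
forwarder 1 1 10.10.10.4 53
zone 1 e164.mycompany.com 1 50
zone 2 e164.arpa 1 100   # trailing comments are ignored
"""

cfg = parse_config(SAMPLE)
print(cfg.hosts)
print(cfg.zones)
```

Keeping forwarders and zones keyed by group number means the cascade from group 1 to group 2 is just a loop over the group list from the matching "host" line.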
If we get a lookup from a host 10.10.10.44 for 2.1.2.1.5.5.5.2.1.2.1.e164.arpa, here's what happens:
The system determines that 10.10.10.44 is a permitted host. Additionally, the zone "e164.arpa" is one of our trigger suffixes. The system permits the lookup, and strips off "e164.arpa" from the lookup, and then hands the lookup to group #1's rules, and then (if no answer) to group #2's rules.
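The host check and suffix stripping described here could be sketched roughly as follows. This is only an illustration: the `route` function and the `HOST_RULES` tuple layout are made up, and I'm assuming shell-style wildcard matching (Python's `fnmatch`) is close enough to what the "host" lines mean.

```python
from fnmatch import fnmatch

# (ip pattern, zone suffix, action, groups), in config-file order.
HOST_RULES = [
    ("10.*.*.*", "e164.arpa", "permit", [1, 2]),
    ("10.*.*.*", "*", "permit", []),
    ("*.*.*.*", "*", "deny", []),
]


def route(client_ip, qname, rules=HOST_RULES):
    """Return (decision, name to look up, groups); the first matching rule wins."""
    name = qname.rstrip(".").lower()
    for ip_pattern, zone, action, groups in rules:
        if not fnmatch(client_ip, ip_pattern):
            continue
        if zone != "*" and not (name == zone or name.endswith("." + zone)):
            continue
        if action == "deny":
            return ("deny", None, [])
        if not groups:
            # Permitted, but no group list: hand off to the default forwarders.
            return ("default", name, [])
        # Strip the trigger suffix so each group can append its own zones.
        stripped = name if zone == "*" else name[: -len(zone)].rstrip(".")
        return ("groups", stripped, groups)
    return ("deny", None, [])  # assumption: nothing matched means refuse


print(route("10.10.10.44", "2.1.2.1.5.5.5.2.1.2.1.e164.arpa"))
print(route("10.10.10.88", "foo.com"))
print(route("192.0.2.7", "foo.com"))
```

The first call comes back with the bare "2.1.2.1.5.5.5.2.1.2.1" and groups 1 and 2, the second goes to the default forwarders untouched, and the third is refused.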
First, we look up the number in our "internal" DNS trees, which we've set up in group #1. There are two zones (representing perhaps subsidiary companies) that we resolve for locally, so we should look up the numbers in those servers first. We allow a very short resolution time (70ms maximum) since those servers are local and have small zones. We have two resolvers for our internal trees, and we send the query to each resolver. We accept answers from either zone equally, and whoever answers first gets the call (thus, the weights for the two zones are identical). If the lookup is successful here, we stop and don't proceed to group #2.
If there was no answer from our queries in group #1, we go to group #2. This is where we do the external ENUM lookups. We have a few external zones in which we're going to look up the number. If we get an answer from more than one of these zones, the weights say which answer we actually forward back to the entity that requested the ENUM lookup: we prefer e164.arpa first, e164.info second, and e164.org third. Even though our original query came in with "e164.arpa." as the suffix, we stripped that off - we'll do e164.arpa. lookups as part of the whole set of other possible roots - it's not "special", nor does the system treat it any differently than any other possible root. In our example, we believe that e164.arpa will get an answer to us in 100ms or less, and we will ignore any answers that come in after that interval (though, usefully, they may be cached in our upstream DNS server even if we ignore the answer). We think that answers from e164.info will come to us in under 170ms, and e164.org in under 200ms.
The "forwarder" lines indicate which DNS forwarders we're going to use for resolution (and their port numbers). Queries would be sent at the same time to every forwarder with the same weight, thus giving some redundancy if one forwarder should go down or become slow, at the cost of more DNS traffic to the authoritative servers at the far end. It would be rare to have forwarders with different weights, but that may be desired in certain circumstances.
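As a rough illustration of sending one query to several forwarders at once and taking whichever usable answer arrives first, here is a Python asyncio sketch. It does no real DNS: `fake_query` is a stand-in that just sleeps, and the delays, the "sip:..." answer strings, and the `first_answer` name are all invented for the example.

```python
import asyncio


async def fake_query(forwarder, name, delay_s, answer):
    """Stand-in for a real DNS query: sleep, then return a canned answer."""
    await asyncio.sleep(delay_s)
    return forwarder, answer


async def first_answer(name, forwarders, timeout_s):
    """Query every forwarder at once; return the first non-empty answer.

    forwarders: list of (address, simulated delay in seconds, canned answer or None).
    Returns (address, answer), or None if nothing useful arrives within timeout_s.
    """
    tasks = [
        asyncio.create_task(fake_query(addr, name, delay, answer))
        for addr, delay, answer in forwarders
    ]
    try:
        for next_done in asyncio.as_completed(tasks, timeout=timeout_s):
            try:
                addr, answer = await next_done
            except asyncio.TimeoutError:
                return None  # window closed with no usable answer
            if answer is not None:
                return addr, answer
        return None          # every forwarder replied, all empty
    finally:
        for task in tasks:
            task.cancel()    # stop waiting on the stragglers


GROUP_1 = [
    ("10.10.10.4", 0.30, "sip:slow@example.invalid"),  # forwarder gone slow
    ("10.10.22.9", 0.01, "sip:fast@example.invalid"),
]

if __name__ == "__main__":
    print(asyncio.run(first_answer("2.1.2.1.5.5.5.2.1.2.1", GROUP_1, timeout_s=0.07)))
```

With the 70ms window from group 1, the slow forwarder is simply abandoned and the answer from 10.10.22.9 is used.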
Comment: "Why don't you just have a universal timeout? Won't the system always wait for the longest interval?" Answer: No. Using the above example file, think about the case where e164.org answers in 40 milliseconds, but e164.arpa hasn't answered after 100 milliseconds. Once e164.arpa's 100ms window closes (and assuming e164.info has already come back empty), we use the e164.org answer without waiting for the longest possible time in the config file (200ms), saving ourselves 100ms of PDD.
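The arithmetic can be checked with a small Python sketch of the selection rule for zones with distinct weights (as in group 2). The `choose_answer` function and the shape of the `replies` dictionary are hypothetical; in particular I'm assuming that a zone which replies "no such record" stops holding up the decision as soon as that reply arrives, and that e164.info does exactly that after 30ms in the example.

```python
def choose_answer(zones, replies):
    """Pick the answer to forward and the time (ms) at which we can decide.

    zones:   list of (zone, weight, max_wait_ms); a lower weight is preferred.
    replies: zone -> (arrival_ms, has_record), or None if nothing came back.
    A less-preferred answer can only be used once every more-preferred
    zone has either replied empty or run out its own window.
    """
    decided_at = 0
    for zone, weight, max_wait in sorted(zones, key=lambda z: z[1]):
        reply = replies.get(zone)
        if reply is None or reply[0] > max_wait:
            # No reply in time: this zone holds us up until its window closes.
            decided_at = max(decided_at, max_wait)
            continue
        arrival, has_record = reply
        decided_at = max(decided_at, arrival)
        if has_record:
            return zone, decided_at
    return None, decided_at


GROUP_2 = [("e164.arpa", 1, 100), ("e164.info", 2, 170), ("e164.org", 3, 200)]

replies = {
    "e164.arpa": None,          # never answers
    "e164.info": (30, False),   # assumed: quick "no such record"
    "e164.org": (40, True),     # has the record
}
print(choose_answer(GROUP_2, replies))
```

Under those assumptions the e164.org answer is used at the 100ms mark, not at 200ms. If e164.info were silent too, the same rule would hold the decision until 170ms, which is still short of the longest window in the file.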
This code would work with almost any ENUM compatible system, since it will strip off the trailing suffix in a configurable way and then force the lookup to be run through several other top level zones of the administrator's choosing. Cisco, Snom, or any SBC could use this without modification, as long as it supports ENUM. It would be important not to make this system a forwarder for other resolvers; that would probably get ugly.
Note that Asterisk has the ability to recursively query many zones automatically, but I don't think it's a parallel lookup, and there are no timeouts other than standard DNS timeouts (which, in my experience, really makes ENUM unusable in a public network). The same functionality can be achieved in SER, but it involves switching the domain_suffix parameter repeatedly and doing iterative enum_query calls. This doesn't fix the case of very long lookup delays if there are DNS failures or slowness. Both systems suffer from poor granular control of lookup timeouts and alternate zone use, so I think some external program is required that solves this problem for all ENUM-capable systems.
Interesting links:
http://www.corpit.ru/mjt/udns.html
http://www.chiark.greenend.org.uk/~ian/adns/
http://daniel.haxx.se/projects/c-ares/
Last note: Yes, this is all a hack. ENUM is a hack. But a lack of options marvelously clears the mind. This is one of those things that has to be built before ENUM becomes actually useful in anything other than the most tightly controlled private networks. A BSD-licensed version of this hypothetical code would do wonders to help VoIP interconnectivity lurch forward. (Better yet would be a real routing protocol, but I'll conserve my wishes to that which might actually happen.)
JT