[Serusers] ENUM timeout to +33 - parallel resolvers, tuneable timeout
John Todd
jtodd at loligo.com
Wed Mar 16 21:03:43 CET 2005
At 5:56 PM +0100 on 3/16/05, Adrian Georgescu wrote:
>Hello Juha,
>
>The ENUM prefix for France +33 (3.3.e164.arpa) does not work, it is
>delegated but the servers do not answer and the resolver times out
>in 12-20 seconds. If you prefer to do ENUM look-ups before going to
>normal PSTN this affects SIP call flows.
>
>Do you have any idea how to work around this problem? Besides fixing
>the DNS server side :)
>
>Regards,
>Adrian
[this post became somewhat non-specific to SER as I wrote it -
apologies. However, it is directly relevant to most of the community
so I will post it.]
One method would be to write custom resolver code for ENUM lookups
that has shorter timeouts for specific zones.
Another "feature" of such a program could be to do parallel
resolution in an even more highly modified resolver, when there are
several ENUM root zones that a particular host may care about. The
requirement for rapid timeout would also apply in these
multi-threaded lookups, since post-dial delay (PDD) is the most
important thing to worry about when completing calls (at least,
currently delay is most important, IMHO.) Perhaps being able to give
each zone a preference and timeout would be interesting, too... let's
expand on this.
There should in theory only be one "root" for ENUM (e164.arpa.) or at
worst just one ENUM zone lookup per SIP routing engine, but in
practice it seems like there are more and more root zones springing
up for various administrative, technical, and political reasons and
it is becoming possible (probable?) that each server should look
through more than one zone during an ENUM query cycle. I'd love to
be able to use those zones all at once, but cascading through
multiple instances of setting the domain_suffix and then doing the
lookup is impractical - if there are failures or slowness, then
post-dial-delay becomes unacceptable.
So, there are two features that I'd love to see built into a generic
resolver: the ability to hand-tune failure intervals, and the ability
to parallelize lookups into multiple top-level zones for the same
query (and to weight the answers if we get multiple replies.)
This seems like it might be useful for the entire VoIP community, and
not just for SER users. Michael Haberler at nic.at had someone who
was interested in writing this code a while back, but recently
(yesterday) he said that the project didn't get wings. I've got $150
to donate towards anyone who comes up with something remotely
resembling a generic resolver that handles parallel queries in the
format I describe below:
Let's take a hypothetical config file for such a resolver daemon.
This resolver assumes that it will be handed a non-qualified lookup
like "2.1.2.1.5.5.5.2.1.2.1" without a top-level domain attached.
The hacked resolver will then scan through the multiple top-level
zones and try to get a match in a parallelized and possibly cascading
fashion.
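To make the "non-qualified lookup" concrete, here is a minimal Python sketch of how an E.164 number maps to that dotted label, and how the resolver could expand it into one fully qualified name per zone. The function names (enum_label, candidate_names) are made up for illustration and are not part of SER or any existing resolver.

```python
def enum_label(e164_number):
    """Reverse the digits of an E.164 number and join them with dots.

    '+1 212 555 1212' becomes '2.1.2.1.5.5.5.2.1.2.1' (no zone suffix).
    """
    digits = [ch for ch in e164_number if ch.isdigit()]
    return ".".join(reversed(digits))


def candidate_names(label, zones):
    """Build one fully qualified query name per ENUM root zone."""
    return ["{}.{}.".format(label, zone.strip(".")) for zone in zones]


label = enum_label("+1 212 555 1212")
names = candidate_names(label, ["e164.arpa", "e164.info", "e164.org"])
```

Each entry in names is what would actually go out on the wire for one zone; the querying device only ever supplies the label plus whatever suffix it was configured with.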
# Which domains do we want to rip apart and handle in specific ways?
# We first look at the IP address of the device that is sending
# the query. If it matches one or more lines, then see if it
# matches the top-level zone that is being requested. If there
# is a match, and a "permit", then strip off the zone from the
# query, and hand off to the groups specified in the list of
# groups after the "permit" keyword, in the order the groups
# are listed.
#
# Wildcards can be used for domains or IP addresses.
#
# Rules are interpreted in order of entry, and the first match
# ends the lookup process.
#
# If no group(s) is specified, then hand off resolution to the
# default-forwarder resolver(s).
#
# host [ip address] [zone suffix] [permit,deny] [group, group, ...]
#
host 10.*.*.* e164.arpa permit 1 2
host 10.*.*.* * permit
host *.*.*.* * deny
#
#
#
# Any lookup that doesn't match a "host" line with a group list
# above gets pushed out to a list of normal DNS resolvers. The
# replies from those resolvers are simply forwarded back
# through this hacked resolver to the querying device. In
# this example, if 10.10.10.88 asked for the A record for
# "foo.com", then that query would be permitted and handed
# off to the default forwarders for resolution. (note: the
# default-forwarders aren't parallelized, though I suppose
# they could be, but that would perhaps create unnecessary
# DNS traffic for "non-critical" lookups.)
#
# default-forwarder [ip address] [port]
#
default-forwarder 192.148.33.13 53
default-forwarder 205.11.29.2 53
#
#
#
#
# Group 1
#
# Group 1 is for my internal e164 zones, which have their own
# resolvers and speed assumptions.
#
# forwarder [group] [weight] [ip address] [port]
#
forwarder 1 1 10.10.10.4 53
forwarder 1 1 10.10.22.9 53
#
# zone [group] [zone] [weight] [max ms wait]
#
zone 1 e164.mycompany.com 1 50
zone 1 e164.myothercompany.com 1 70
#
#
#
# Group 2
#
# If none of the resolvers in group 1 comes back with a valid
# answer within 70ms, then we move on to group 2, which is external
# zones and "outside" resolvers...
#
# forwarder [group] [weight] [ip address] [port]
#
forwarder 2 1 4.33.12.94 53
forwarder 2 1 12.39.113.5 53
#
# zone [group] [zone] [weight] [max ms wait]
#
zone 2 e164.arpa 1 100
zone 2 e164.info 2 170
zone 2 e164.org 3 200
#
# end
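A config in this shape is simple to read. The following Python sketch is one possible parser, assuming the four-field zone format given in the comments ("zone [group] [zone] [weight] [max ms wait]") and treating everything after "#" as a comment. The tuple names and parse_config are hypothetical; no such daemon exists yet.

```python
from collections import namedtuple

Host = namedtuple("Host", "ip_pattern suffix action groups")
Forwarder = namedtuple("Forwarder", "group weight ip port")
Zone = namedtuple("Zone", "group name weight max_wait_ms")

SAMPLE = """
host 10.*.*.* e164.arpa permit 1 2
host 10.*.*.* * permit
host *.*.*.* * deny
default-forwarder 192.148.33.13 53
forwarder 1 1 10.10.10.4 53
zone 1 e164.mycompany.com 1 50
zone 2 e164.org 3 200   # trailing comments are ignored
"""


def parse_config(text):
    """Return (hosts, default_forwarders, forwarders, zones) in file order."""
    hosts, defaults, forwarders, zones = [], [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "host":
            ip_pattern, suffix, action, *groups = args
            hosts.append(Host(ip_pattern, suffix, action,
                              [int(g) for g in groups]))
        elif keyword == "default-forwarder":
            defaults.append((args[0], int(args[1])))
        elif keyword == "forwarder":
            group, weight, ip, port = args
            forwarders.append(Forwarder(int(group), int(weight), ip, int(port)))
        elif keyword == "zone":
            group, name, weight, wait = args
            zones.append(Zone(int(group), name, int(weight), int(wait)))
        else:
            raise ValueError("unknown keyword: " + keyword)
    return hosts, defaults, forwarders, zones


hosts, defaults, forwarders, zones = parse_config(SAMPLE)
```

Keeping the host rules in file order matters, since the first matching rule ends the lookup.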
If we get a lookup from a host 10.10.10.44 for
2.1.2.1.5.5.5.2.1.2.1.e164.arpa, here's what happens:
The system determines that 10.10.10.44 is a permitted host.
Additionally, the zone "e164.arpa" is one of our trigger suffixes.
The system permits the lookup, and strips off "e164.arpa" from the
lookup, and then hands the lookup to group #1's rules, and then (if
no answer) to group #2's rules.
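The host-matching step could look roughly like this in Python. It is a sketch only: wildcard matching is done with the standard fnmatch module, the rules are hard-coded from the example config, and route() with its return values is an invented name for illustration.

```python
from fnmatch import fnmatch

# (ip pattern, zone suffix, action, groups), in config-file order
RULES = [
    ("10.*.*.*", "e164.arpa", "permit", [1, 2]),
    ("10.*.*.*", "*", "permit", []),
    ("*.*.*.*", "*", "deny", []),
]


def route(client_ip, qname, rules=RULES):
    """Return (decision, name, groups); the first matching rule wins.

    decision is 'groups' (suffix stripped, try these groups in order),
    'default' (hand to the default forwarders unchanged) or 'deny'.
    """
    name = qname.rstrip(".").lower()
    for ip_pattern, suffix, action, groups in rules:
        if not fnmatch(client_ip, ip_pattern):
            continue
        if suffix == "*":
            matched, label = True, name
        else:
            matched = name == suffix or name.endswith("." + suffix)
            label = name[:-len(suffix)].rstrip(".") if matched else name
        if not matched:
            continue
        if action == "deny":
            return ("deny", None, [])
        if groups:
            return ("groups", label, groups)
        return ("default", name, [])
    return ("deny", None, [])  # nothing matched: refuse by default
```

With these rules, route("10.10.10.44", "2.1.2.1.5.5.5.2.1.2.1.e164.arpa") yields the bare label and groups [1, 2], while a query for "foo.com" from 10.10.10.88 falls through to the default forwarders.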
First, we look up the number in our "internal" DNS trees, which
we've set up in group #1. There are two zones (representing perhaps
subsidiary companies) that we resolve for locally, so we should look
up the numbers in those servers first. We have a very very short
resolution time (70ms maximum) since those servers are local and have
small zones. We have two resolvers for our internal trees, and we
send the query to each resolver. We prefer answers from either zone,
and whoever answers first gets the call (thus, the weights from each
zone are identical.) If the lookup is successful here, we stop and
don't proceed to group #2.
If there was no answer from our queries in group #1, we go to Group
#2. This is where we look up the external ENUM queries. We have a
few external zones in which we're going to look up the number. If we
get an answer from more than one of these zones, we indicate that we
prefer e164.arpa first, e164.info second, and e164.org third as far
as what answer we actually forward back to the entity that requested
the ENUM lookup. Even though our original query came in with
"e164.arpa." as the suffix, we stripped that off - we'll do
e164.arpa. lookups as part of the whole set of other possible roots -
it's not "special", nor does the system treat it any differently than
any other possible root.
In our example, we believe that e164.arpa will have an answer to us
in 100ms or less, and we will ignore any answers that come in after
that interval (though, usefully they may be cached in our upstream
DNS server even if we ignore the answer.) We think that answers from
e164.info will come to us in under 170ms, and e164.org in under 200ms.
The "forwarder" lines indicate which DNS forwarders we're going to
use for resolution (and their port numbers.) Queries would get split
to each forwarder with the same weight at the same time, thus giving
some redundancy if one forwarder should go down or become latent.
This increases the amount of DNS traffic to the end authoritative
resolvers. It would be rare that there might be forwarders with
different weights, but possibly that may be desired in certain
circumstances.
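The "send to every forwarder at once, take whoever answers first" behaviour can be sketched with Python's asyncio. The forwarders here are faked with sleeps rather than real DNS traffic, and fake_forwarder and first_reply are hypothetical names; a real implementation would send UDP queries instead.

```python
import asyncio


async def fake_forwarder(address, delay_s, answer):
    """Stand-in for a DNS query to one forwarder (no network involved)."""
    await asyncio.sleep(delay_s)
    return address, answer


async def first_reply(queries, timeout_s):
    """Run all queries in parallel; return the first result, or None on timeout."""
    tasks = [asyncio.ensure_future(q) for q in queries]
    done, pending = await asyncio.wait(
        tasks, timeout=timeout_s, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:          # slower forwarders are simply abandoned
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    if not done:
        return None
    return next(iter(done)).result()


def demo():
    # Two equally weighted forwarders from group 1; the second is faster.
    return asyncio.run(first_reply(
        [fake_forwarder("10.10.10.4", 0.05, "NAPTR-answer"),
         fake_forwarder("10.10.22.9", 0.01, "NAPTR-answer")],
        timeout_s=0.07))
```

If one forwarder is down or slow, the caller still gets an answer from the other within the same deadline, at the cost of doubling the query traffic.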
Comment: "Why don't you just have a universal timeout? Won't the
system always wait for the longest interval?" Answer: No. Using the
above example file, think about the case when e164.org answers in 40
milliseconds, but e164.arpa doesn't answer after 100 milliseconds
(and assuming e164.info has already come back with no match).
We will use the e164.org answer as soon as the e164.arpa timer
expires at 100ms, rather than waiting for the longest possible time
(200ms) which is in the config file, saving ourselves 100ms off the
PDD.
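The timing argument can be checked with a small Python simulation. It rests on one reading of the rules, which is an assumption: an answer is used as soon as every more-preferred zone has either replied or run out its own timer, and zones with equal weight are first-come-first-served. pick_answer and its data layout are invented for this sketch.

```python
from itertools import groupby


def pick_answer(zones, replies):
    """Decide which zone's answer to use and when the decision can be made.

    zones:   list of (name, weight, max_wait_ms); lower weight is preferred.
    replies: dict name -> (arrival_ms, answer); answer None means "no match".
             A zone missing from replies never answered.
    Returns (zone_name, answer, decided_at_ms), or (None, None, decided_at_ms).
    """
    blocked_until = 0  # when all more-preferred zones were known to be useless
    ordered = sorted(zones, key=lambda z: z[1])
    for _weight, tier in groupby(ordered, key=lambda z: z[1]):
        tier = list(tier)
        hits = []
        for name, _w, max_wait in tier:
            arrival, answer = replies.get(name, (None, None))
            on_time = arrival is not None and arrival <= max_wait
            if on_time and answer is not None:
                hits.append((arrival, name, answer))
        if hits:
            arrival, name, answer = min(hits)  # earliest reply in this tier
            return name, answer, max(blocked_until, arrival)
        for name, _w, max_wait in tier:
            arrival, _answer = replies.get(name, (None, None))
            on_time = arrival is not None and arrival <= max_wait
            blocked_until = max(blocked_until, arrival if on_time else max_wait)
    return None, None, blocked_until


GROUP2 = [("e164.arpa", 1, 100), ("e164.info", 2, 170), ("e164.org", 3, 200)]
result = pick_answer(GROUP2, {
    "e164.info": (60, None),                 # replied quickly: no such number
    "e164.org": (40, "sip:user@example.org"),
})
```

Here result is the e164.org answer with a decision time of 100ms, half the worst-case 200ms. If e164.info stayed silent instead of replying "no match", the same rules would hold the decision until its 170ms timer ran out.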
This code would work with almost any ENUM compatible system, since
it will strip off the trailing suffix in a configurable way and then
force the lookup to be run through several other top level zones of
the administrator's choosing. Cisco, Snom, or any SBC could use this
without modification, as long as it supports ENUM. It would be
important not to make this system a forwarder for other resolvers;
that would probably get ugly.
Note that Asterisk has the ability to recursively query many zones
automatically, but I don't think it's a parallel lookup and there are
no timeouts other than standard DNS timeouts (which really makes ENUM
unusable in a public network, from my experience.) The same
functionality can be done in SER, but it involves multiple switching
of the domain_suffix parameters and then doing iterative enum_query
calls. This doesn't fix the case of very long lookup delays if there
are DNS failures or slowness. Both systems suffer from poor granular
control of lookup timeouts and alternate zone use, so some external
program I think is required that solves this problem for all
ENUM-capable systems.
Interesting links:
http://www.corpit.ru/mjt/udns.html
http://www.chiark.greenend.org.uk/~ian/adns/
http://daniel.haxx.se/projects/c-ares/
Last note: Yes, this is all a hack. ENUM is a hack. But a lack of
options marvelously clears the mind. This is one of those things
that has to be built before ENUM becomes actually useful in anything
other than the most tightly controlled private networks. A
BSD-licensed version of this hypothetical code would do wonders to
help VoIP interconnectivity lurch forward. (Better yet would be a
real routing protocol, but I'll conserve my wishes to that which
might actually happen.)
JT