Failing-over DNS Server Addresses
1. Problem Summary
The current implementation of NTP uses DNS to retrieve the IP addresses for the server name in the configuration file. However, it copies only the first IP address and discards the rest. For servers in the pool (pool.ntp.org), for example, the number of addresses returned by DNS can be quite large, but only the first is used. This is a problem if the NTP server at that address is unavailable, has been removed, or gets removed after some period of time. The NTP reference implementation never goes back to DNS to look up the address again, nor does it retain the list of IP addresses it previously received, so it keeps retrying the one IP address.
2. Proposed Solution
To fix this problem we will implement the following:
- Save all returned IP addresses from the lookup in a structure associated with the server name so that the NTP server can try the next IP address in the returned list if the first one fails.
- If there are no more entries in the IP address list, run a new DNS query and get a fresh list.
- Allow the configuration to specify how many IP addresses to use simultaneously, avoiding the need to specify the same server name multiple times, which could result in getting the same IP address list each time.
3. Design Issues
In order to implement this, a number of issues need to be decided. Because of the architecture of NTP it is expected that packets will be lost during the usage of the server. Indeed, UDP is meant for such situations.
- For an IP address that has never responded to an NTP packet, how many times should an NTP packet be sent before the client gives up and tries a different server?
- For a server that has been responding to NTP packets, after it stops responding, how many retries of NTP packets should be made before it stops trying and moves on to the next IP address in the list?
Since the TTL of the IP addresses returned by DNS is not easily available to the client we will not rely on it to make decisions on whether or not we can use the IP addresses.
In order to address these issues, there will be a number of configurable variables available to be set in the configuration file. The following is a partial list of parameters that will be available to be set:
There will be defaults for these in case the configuration files do not specify them. The default values have yet to be decided.
The code will perform a DNS lookup of the name and save a copy of all the IP addresses returned in an association list with the name. The first address on the list will be used to try to form an association.
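As a rough illustration of keeping every returned address rather than only the first, here is a minimal Python sketch (ntpd itself is written in C; Python is used here only for brevity). The function name `resolve_all` and the constant `NTP_PORT` are hypothetical and not part of ntpd. The sketch relies on the standard `getaddrinfo()` interface, which returns both IPv4 and IPv6 results.

```python
import socket

NTP_PORT = 123  # standard NTP UDP port


def resolve_all(name, port=NTP_PORT):
    """Return every distinct IP address getaddrinfo() reports for name.

    The order returned by the resolver is preserved, so the first entry is
    the address that the current implementation would have kept.
    """
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_DGRAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first field is the address for both IPv4 and IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Passing a literal IP address instead of a name yields a one-element list, which matches the note that configuring an IP address leaves nothing to fail over to.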
The peer association will need to keep track of both whether it has ever successfully received a packet and whether it has stopped receiving packets. It will also need to reset an address-specific not-received count every time it receives a response from the server.
The code will track the number of consecutive times that the server does not respond. When the number of consecutive failures exceeds the limit the association is demobilised and the next unused address is fetched from the previously fetched list of addresses. If there are no more unused addresses in the list then it will perform another DNS lookup to fetch a new address list and start again.
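The bookkeeping described above could look roughly like the following Python sketch. It is an assumption-laden illustration, not the ntpd implementation: the class `FailoverServer`, the parameters `initial_limit` (for an address that has never responded) and `active_limit` (for an address that had been responding), and the injected `resolver` callable are all hypothetical names. No defaults are given for the limits because the proposal has not decided them.

```python
class FailoverServer:
    """Failover state for one configured server name (hypothetical sketch)."""

    def __init__(self, name, resolver, initial_limit, active_limit):
        self.name = name
        self.resolver = resolver            # callable: name -> list of IP strings
        self.initial_limit = initial_limit  # limit before any response; 0 disables
        self.active_limit = active_limit    # limit after a response; 0 disables
        self.unused = []                    # addresses not yet tried from the last lookup
        self.current = None
        self._next_address()

    def _next_address(self):
        """Take the next unused address, doing a fresh lookup if none remain."""
        if not self.unused:
            self.unused = list(self.resolver(self.name))
        if not self.unused:
            raise LookupError("no addresses for " + self.name)
        self.current = self.unused.pop(0)
        self.ever_responded = False
        self.failures = 0

    def on_response(self):
        """A reply arrived: remember it and reset the not-received count."""
        self.ever_responded = True
        self.failures = 0

    def on_timeout(self):
        """A poll went unanswered; return the address to use for the next poll."""
        self.failures += 1
        limit = self.active_limit if self.ever_responded else self.initial_limit
        if limit != 0 and self.failures > limit:
            self._next_address()  # stands in for demobilising the association
        return self.current
```

Injecting the resolver keeps the sketch independent of how the lookup is actually performed, which matters because the real lookup cannot block ntpd.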
Note that specifying an IP address for the server effectively disables the mechanism, since only that address can be used. Setting any of the above variables to 0 will disable the failover mechanism that the variable controls.
- 05 Aug 2007
Steve Atkins writes:
There are some problems with the proposed solution
that would risk causing operational problems if
they were done.
Some DNS servers will randomize the order they return a
response in (in order to work around buggy applications that
only use the first response) but some don't.
If the response isn't randomised, then every NTP server accessing
a peer in the pool will preferentially connect to the first A record
returned, until that system falls over or degrades service sufficiently
that clients think it's failed. Then they'll try the second one. This
isn't good load balancing. (In practice quite a lot of dns servers
randomize the results, so you may not care. But why implement
a bad algorithm when it's as easy to implement a good one?)
A better algorithm is to use
(or one of its more
modern friends) to get all the A records for the hostname. Try
connecting to those in random order, until you decide you have
a working peer. If none of them are live, start over.
Just as good, better in some respects, is to use
pick one response at random, try and connect. If the peer doesn't
seem live, start over. That requires less state, and is slightly
more responsive to DNS changes.
The first approach extends to trying multiple connections in
parallel a little more easily (shuffle the results, attempt to connect
to the first in parallel, repeat).
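As an illustration of the shuffle-then-try idea in the comment above (this is not code from either commenter), a minimal Python sketch follows. The function name `pick_candidates` and its `count` parameter, the number of addresses to try in parallel, are hypothetical.

```python
import random


def pick_candidates(addresses, count=1, rng=random):
    """Return up to count addresses chosen in random order.

    The input list is left untouched so the caller can still inspect the
    order in which the resolver returned the records.
    """
    shuffled = list(addresses)
    rng.shuffle(shuffled)
    return shuffled[:count]
```

With `count=1` this is the "pick one response at random" variant; if none of the chosen addresses turns out to be live, the caller starts over with a new lookup and a new shuffle.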
Danny Mayer replies:
Most DNS servers do not randomize. BIND, which has something like 90% of the market, uses a rotating
order by default (The behavior can be changed). Randomizing after receipt might not be a bad idea once
the code is working. Getting things working is the main problem.
gethostbyname() is not used anywhere in the code except in the one place where we need to emulate getaddrinfo() for systems without the real function. We couldn't support IPv6 without it. More importantly, we cannot wait on the resolver; resolution has to be done asynchronously, and that's the biggest problem, since on Unix at least it means running a separate process and having that process communicate with ntpd. That is done today, but the problem is the way it then communicates with ntpd: it uses private mode 7 packets to configure the server. That will have to change.
-- DannyMayer - 21 Dec 2007
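To make the "cannot wait on the resolver" constraint concrete, here is a minimal Python sketch of a non-blocking lookup. It swaps in a worker thread for the separate process described above, purely to keep the example short; the class `AsyncResolver` and its methods are hypothetical and say nothing about how ntpd passes results back.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncResolver:
    """Runs lookups on a worker thread so the caller never blocks (sketch)."""

    def __init__(self, resolve):
        self._resolve = resolve                  # blocking callable: name -> list of IPs
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = {}                       # name -> Future for the lookup in flight

    def request(self, name):
        """Start a lookup for name unless one is already in flight."""
        if name not in self._pending:
            self._pending[name] = self._pool.submit(self._resolve, name)

    def poll(self, name):
        """Return the address list if the lookup finished, else None."""
        future = self._pending.get(name)
        if future is None or not future.done():
            return None
        del self._pending[name]
        return future.result()

    def close(self):
        """Wait for outstanding lookups and stop the worker."""
        self._pool.shutdown(wait=True)
```

The main loop would call `request()` when the address list runs out and `poll()` on each timer tick, continuing to serve time in between.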
Ronald F. Guilmette says:
In my (rarely humble) opinion, the "Problem Summary" given above fails to fully list all of the real issues here.
Most particularly, I think that the presentation of the "Problem" fails
to consider one potentially important issue at all.
I am speaking of load balancing.
Although I do not count myself as a serious or hard-core DNS expert,
I am nonetheless a diligent and careful user of DNS, and one who has
tried to understand even its less common modes of usage. And one thing
that I was taught about DNS (by people more knowledgeable than I) some
long time ago is that in cases where a single domain name is associated
with more than one `A' type record, that multiple address association is
often, usually, and customarily done with the intent being that clients
of the service(s) associated with the relevant host name should try,
whenever possible, to treat all of the associated IP addresses as being
essentially equal priority redundant servers providing the exact same
service, and ones that should, ideally, be utilized by the client(s) in
a round-robin fashion, so as to distribute the client load among
this set of multiple redundant servers.
This method of distributing load is certainly used in conjunction with
web (http) service, and also, I believe, with mail (smtp) service. Why
should NTP services be any different?
In short, I think that any NTP client that does a DNS query for a given
host name (i.e. one designating an NTP server) should, ideally, if it gets
back multiple A records, cache all of those A records (along with their
respective TTLs) and should, at each and every point in time, treat all
of the associated A records that are still "live" (according to their
TTLs) as being part of a load-balancing round-robin pool of IP addresses
for the relevant server... a pool which the client should make an effort
to balance/distribute the load among.
P.S. The issue of how/when to declare a given NTP server IP address as
"non-functioning" or "non-responding" is almost completely separate from,
and orthogonal to the issue of load balancing that I've mentioned above,
except for the fact that the simplest way to handle a non-responding
IP address might be to simply delete it from the current round-robin pool
for the relevant NTP server... at least until such time as the client does
the next DNS lookup on the server hostname. (It is a more complex question
of policy as to when, exactly, a fresh DNS query should be performed for
a given NTP server hostname. In my own view, the NTP client should perform
a fresh DNS query for the (server) hostname whenever the size of the current
round-robin pool of IP addresses associated with that server name drops to
zero, i.e. because all IP addresses in the pool have been eliminated, either
due to TTL expiry of the A records, or because the associated server(s)
has/have been deemed "non-responding" by the NTP client.)
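The round-robin pool proposed in this comment can be sketched as follows, in Python for brevity. Everything here is an assumption for illustration: the class `RoundRobinPool`, a `resolver` callable that returns `(ip, ttl_seconds)` pairs, and an injected `clock` callable. In particular, a resolver that reports TTLs is hypothetical.

```python
class RoundRobinPool:
    """Round-robin over the live A records for one server name (sketch)."""

    def __init__(self, name, resolver, clock):
        self.name = name
        self.resolver = resolver  # callable: name -> list of (ip, ttl_seconds)
        self.clock = clock        # callable returning the current time in seconds
        self.entries = []         # list of (ip, expiry_time)
        self.next_index = 0

    def _refresh(self):
        """Run a fresh lookup and restart the rotation."""
        now = self.clock()
        self.entries = [(ip, now + ttl) for ip, ttl in self.resolver(self.name)]
        self.next_index = 0

    def mark_dead(self, ip):
        """Drop an address deemed non-responding until the next lookup."""
        self.entries = [e for e in self.entries if e[0] != ip]

    def next_address(self):
        """Return the next live address, refreshing when the pool is empty."""
        now = self.clock()
        self.entries = [e for e in self.entries if e[1] > now]  # TTL expiry
        if not self.entries:
            self._refresh()
        if not self.entries:
            raise LookupError("no addresses for " + self.name)
        ip, _expiry = self.entries[self.next_index % len(self.entries)]
        self.next_index += 1
        return ip
```

A fresh query happens only when the pool size drops to zero, whether through TTL expiry or through addresses being marked dead, as the comment suggests.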
Danny Mayer replies:
One of the goals of this is load balancing of servers listed in the DNS records. Steve Atkins' suggestion of randomizing the list of resultant IP addresses is a good one once the service is working. One doesn't want to do this during development, since you need to check your results.
While you may not be a DNS expert, I am, having been part of the BIND9 development team. BIND9 at least will by default rotate the order of the returned list of records each time it's asked, so the first record is usually different. Note however that resolving DNS servers are intermediaries: you may go through several layers of DNS servers to get an answer, and each has its own idea of the order in which records get returned.
HTTP servers need A and AAAA records and follow the above rules. SMTP servers use MX records, and the rules are different, since the MX records contain a priority order for contacting the SMTP servers for the domain. Also, the contacts with those servers are short-lived (relatively speaking). This is not true of NTP, which wants a long-term association with the NTP server.
The goal of this design is to cache all of the A and AAAA records returned. However, it cannot get the TTL values of these records using standard function calls like getaddrinfo(), since that information is not returned. The only way to get that information is to write our own DNS packets (using libbind, for example) and do everything ourselves. That's just not realistic. In any case there is a more subtle problem: even if we were able to get the TTL and store it (timestamp of receipt + TTL), this is NTP and we are disciplining the clock, so a jump in the system clock makes those TTLs invalid. I much prefer the solution I outlined in the Implementation section above.
-- DannyMayer - 21 Dec 2007
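The clock-jump objection can be shown with a little arithmetic. The Python sketch below assumes expiry is stored as wall-clock time of receipt plus TTL, as described above; the function names and the numbers in the note after the code are made up for illustration.

```python
def wall_clock_now(received_at, real_elapsed, clock_step):
    """Wall-clock reading after real_elapsed seconds and one step of clock_step seconds."""
    return received_at + real_elapsed + clock_step


def ttl_expired(received_at, ttl, now):
    """True when a record stored as (received_at + ttl) looks expired at time now."""
    return now >= received_at + ttl
```

For a record received at wall-clock time 1000000 with a TTL of 300 seconds: ten real seconds later, with no step, the clock reads 1000010 and the record is still live. If the clock is stepped forward by 3600 seconds in that interval, it reads 1003610 and the record looks expired after only ten real seconds. If instead the clock is stepped back by 3600 seconds, then after 400 real seconds it reads 996800 and the record still looks live although its TTL ran out 100 seconds earlier.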