r3 - 2007-10-06 - 01:07:49 - HarlanStenn

Failing-over DNS Server Addresses

1. Problem Summary

The current implementation of NTP uses DNS to retrieve the IP addresses for each server name in the configuration file. However, it copies only the first IP address and discards the rest. For servers in the pool (pool.ntp.org), for example, DNS can return quite a large number of addresses, but only the first is used. This is a problem if the NTP server at that address is unavailable, has been removed, or is removed at some later time. The NTP reference implementation never goes back to DNS to look up the name again, and it no longer has the list of IP addresses it originally received, so it keeps retrying the one IP address.

2. Proposed Solution

To fix this problem we will implement the following:
  1. Save all returned IP addresses from the lookup in a structure associated with the server name so that the NTP server can try the next IP address in the returned list if the first one fails.
  2. If there are no more entries in the IP address list, run a new DNS query and get a fresh list.
  3. Allow the configuration to specify how many IP addresses to use simultaneously, avoiding the need to specify the same server name multiple times, which could potentially result in getting the same IP address list.
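
Steps 1 and 2 above can be sketched roughly as follows. This is only an illustration in Python, not code from the NTP reference implementation: the class name ServerAddresses, the method next_address and the helper system_resolve are all hypothetical, and the lookup function is passed in so that the list handling can be exercised without touching the network.

```python
import socket
from typing import Callable, List, Optional


class ServerAddresses:
    """Keeps every address returned for one server name (hypothetical helper)."""

    def __init__(self, name: str, resolve: Callable[[str], List[str]]):
        self.name = name
        self._resolve = resolve        # injected lookup function
        self._unused: List[str] = []   # addresses from the last lookup not yet tried

    def next_address(self) -> Optional[str]:
        """Return the next untried address, running a new lookup when the list is empty."""
        if not self._unused:
            self._unused = list(self._resolve(self.name))
        if not self._unused:
            return None                # the lookup itself returned nothing
        return self._unused.pop(0)


def system_resolve(name: str) -> List[str]:
    """One possible lookup function built on the standard resolver (port 123 is NTP)."""
    seen: List[str] = []
    for info in socket.getaddrinfo(name, 123, type=socket.SOCK_DGRAM):
        address = info[4][0]           # sockaddr tuple, first element is the address
        if address not in seen:
            seen.append(address)
    return seen
```

A caller would construct ServerAddresses("pool.ntp.org", system_resolve) and call next_address() each time the current address is given up on. Step 3 (using several addresses at once) is not covered by this sketch.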

3. Design Issues

In order to implement this, a number of issues need to be decided. Because of the architecture of NTP, it is expected that packets will be lost while a server is in use. Indeed, UDP is meant for such situations.
  1. For an IP address that has never responded to an NTP packet, how many times should an NTP packet be sent before the client gives up and tries a different server?
  2. For a server that has been responding to NTP packets, after it stops responding, how many retries of NTP packets should be made before it stops trying and moves on to the next IP address in the list?

Since the TTL of the IP addresses returned by DNS is not easily available to the client we will not rely on it to make decisions on whether or not we can use the IP addresses.

4. Implementation

In order to address these issues, there will be a number of configurable variables available to be set in the configuration file. The following is a partial list of parameters that will be available to be set:
  1. InitialFailoverRetryLimit
  2. ServerFailureRetryLimit
  3. MaxAddresses

There will be defaults for these in case the configuration files do not specify them. The default values have yet to be decided.

The code will perform a DNS lookup of the name and save a copy of all the IP addresses returned in an association list with the name. The first address on the list will be used to try and form an association.

The peer association will need to keep track of both whether or not it's successfully received a packet and whether or not it's stopped receiving packets. It will also need to reset an address-specific not-received count every time it receives a response from the server.

The code will track the number of consecutive times that the server does not respond. When the number of consecutive failures exceeds the limit the association is demobilised and the next unused address is fetched from the previously fetched list of addresses. If there are no more unused addresses in the list then it will perform another DNS lookup to fetch a new address list and start again.

Note that using an IP address for the server effectively disables the mechanism, since only that address can be used. Setting any of the above variables to 0 disables the failover mechanism that the variable controls.
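
The counting rules described in this section might look something like the sketch below. It is written in Python for brevity and is not taken from ntpd. The names FailoverConfig and AssociationState are hypothetical, and the numeric defaults are placeholders chosen only so the example runs, since the real defaults have yet to be decided.

```python
from dataclasses import dataclass


@dataclass
class FailoverConfig:
    # Placeholder values only: the proposal has not decided the defaults.
    initial_failover_retry_limit: int = 4   # InitialFailoverRetryLimit
    server_failure_retry_limit: int = 8     # ServerFailureRetryLimit


class AssociationState:
    """Per-address bookkeeping for one association (hypothetical names)."""

    def __init__(self, config: FailoverConfig):
        self.config = config
        self.ever_received = False   # has this address ever answered?
        self.missed = 0              # consecutive polls without a response

    def on_response(self) -> None:
        self.ever_received = True
        self.missed = 0              # any response resets the count

    def on_timeout(self) -> bool:
        """Record a missed poll; return True when it is time to fail over."""
        self.missed += 1
        if self.ever_received:
            limit = self.config.server_failure_retry_limit
        else:
            limit = self.config.initial_failover_retry_limit
        if limit == 0:
            return False             # 0 disables this failover check
        return self.missed > limit
```

When on_timeout() returns True, the caller would demobilise the association and move on to the next unused address.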

-- DannyMayer - 05 Aug 2007


Steve Atkins writes:

There are some problems with the proposed solution that would risk causing operational problems if they were done.

Some DNS servers will randomize the order they return a response in (in order to work around buggy applications that only use the first response) but some don't.

If the response isn't randomised, then every NTP server accessing a peer in the pool will preferentially connect to the first A record returned, until that system falls over or degrades service sufficiently that clients think it's failed. Then they'll try the second one. This isn't good load balancing. (In practice quite a lot of DNS servers randomize the results, so you may not care. But why implement a bad algorithm when it's as easy to implement a good one?)

A better algorithm is to use gethostbyname() (or one of its more modern friends) to get all the A records for the hostname. Try connecting to those in random order, until you decide you have a working peer. If none of them are live, start over.

Just as good, better in some respects, is to use gethostbyname(), pick one response at random, and try to connect. If the peer doesn't seem live, start over. That requires less state, and is slightly more responsive to DNS changes.

The first approach extends to trying multiple connections in parallel a little more easily (shuffle the results, attempt to connect to the first few in parallel, repeat).
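
As an illustration only (this is not code from the comment above), the shuffle-based approach could be sketched in Python as follows. The function names pick_live_peer and pick_parallel, and the is_live callback that stands in for "decide you have a working peer", are assumptions made for the example.

```python
import random
from typing import Callable, Iterable, List, Optional


def pick_live_peer(addresses: Iterable[str],
                   is_live: Callable[[str], bool],
                   rng: Optional[random.Random] = None) -> Optional[str]:
    """Try the addresses in random order; return the first live one, or None."""
    rng = rng or random.Random()
    candidates = list(addresses)
    rng.shuffle(candidates)          # do not rely on the DNS server's ordering
    for address in candidates:
        if is_live(address):
            return address
    return None                      # none were live: the caller starts over


def pick_parallel(addresses: Iterable[str], count: int,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Shuffle the results and return up to `count` of them to try in parallel."""
    rng = rng or random.Random()
    candidates = list(addresses)
    rng.shuffle(candidates)
    return candidates[:count]
```

Because every client shuffles independently, load spreads across the returned A records even when the DNS server always returns them in the same order.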

Ronald F. Guilmette says:

In my (rarely humble) opinion, the "Problem Summary" given above fails to fully list all of the real issues here.

Most particularly, I think that the presentation of the "Problem" fails to even consider one potentially important issue entirely.

I am speaking of load balancing.

Although I do not count myself as a serious or hard-core DNS expert, I am nonetheless a diligent and careful user of DNS, and one who has tried to understand even its less common modes of usage. And one thing that I was taught about DNS (by people more knowledgeable than I) some long time ago is that in cases where a single domain name is associated with more than one `A' type record, that multiple address association is often, usually, and customarily done with the intent being that clients of the service(s) associated with the relevant host name should try, whenever possible, to treat all of the associated IP addresses as being essentially equal priority redundant servers providing the exact same service, and ones that should, ideally, be utilized by the client(s) in a round-robin fashion, so as to distribute the client load among this set of multiple redundant servers.

This method of distributing load is certainly used in conjunction with web (http) service, and also, I believe, with mail (smtp) service. Why should NTP services be any different?

In short, I think that any NTP client that does a DNS query for a given host name (i.e. one designating an NTP server) should, ideally, if it gets back multiple A records, cache all of those A records (along with their respective TTLs) and should, at each and every point in time, treat all of the associated A records that are still "live" (according to their TTLs) as being part of a load-balancing round-robin pool of IP addresses for the relevant server... a pool which the client should make an effort to balance/distribute the load among.

Regards, rfg

P.S. The issue of how/when to declare a given NTP server IP address as "non-functioning" or "non-responding" is almost completely separate from, and orthogonal to, the issue of load balancing that I've mentioned above, except for the fact that the simplest way to handle a non-responding IP address might be to simply delete it from the current round-robin pool for the relevant NTP server... at least until such time as the client does the next DNS lookup on the server hostname. (It is a more complex question of policy as to when, exactly, a fresh DNS query should be performed for a given NTP server hostname. In my own view, the NTP client should perform a fresh DNS query for the (server) hostname whenever the size of the current round-robin pool of IP addresses associated with that server name drops to zero, i.e. because all IP addresses in the pool have been eliminated, either due to TTL expiry of the A records, or because the associated server(s) has/have been deemed "non-responding" by the NTP client.)
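
For illustration only (this is not part of the comment above), a TTL-aware round-robin pool of the kind described might be sketched in Python like this. The class name RoundRobinPool and its methods are hypothetical. The sketch assumes the caller can supply a TTL in seconds with each address, which, as the proposal notes, the standard resolver interface does not readily provide.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable, Optional, Tuple


class RoundRobinPool:
    """Round-robin over addresses whose TTL has not expired (hypothetical sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Deque[Tuple[str, float]] = deque()  # (address, expiry time)

    def load(self, records: Iterable[Tuple[str, float]]) -> None:
        """Replace the pool with fresh (address, ttl_seconds) records."""
        now = self._clock()
        self._entries = deque((addr, now + ttl) for addr, ttl in records)

    def mark_unresponsive(self, address: str) -> None:
        """Drop an address until the next lookup reloads the pool."""
        self._entries = deque(e for e in self._entries if e[0] != address)

    def _drop_expired(self) -> None:
        now = self._clock()
        self._entries = deque(e for e in self._entries if e[1] > now)

    def needs_lookup(self) -> bool:
        """True once the pool has shrunk to zero live addresses."""
        self._drop_expired()
        return not self._entries

    def next_address(self) -> Optional[str]:
        """Return the next live address in rotation, or None if the pool is empty."""
        self._drop_expired()
        if not self._entries:
            return None
        address, _ = self._entries[0]
        self._entries.rotate(-1)     # move the head to the tail
        return address
```

The caller would check needs_lookup() before each poll and, when it returns True, perform a fresh DNS query and pass the results to load().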

Copyright © 1999-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.