Selecting Offsite NTP Servers
Upstream Time Server Selection Criteria
When choosing off-site NTP servers, keep the following in mind:
Rules of Engagement
- Make sure to read, understand, and follow the access policy for each of the upstream time servers that you are going to use
- You absolutely, positively, MUST NOT use any upstream time servers that you do not have permission to use. They should be:
- Public time servers made available by your service provider or OS vendor to their customers
- And/or, other private time servers for which you have been given explicit permission and instructions on proper use
- And/or, the time servers designated by the pool.ntp.org project
- And/or, on one of the public lists of time servers (e.g., The NTP Public Services Project Time Server Lists)
Upstream Time Server Proximity
- Where possible, you should try to use upstream time servers that are relatively "close" to your network location
- This will help reduce variations in latency and jitter that may reduce your accuracy
- This will help reduce unnecessary network traffic
- This will help spread the load among the various timeservers, so that none of them receive too much traffic
- Keep in mind that servers which are located "close" to you in geographic terms may not be "close" in terms of the network topology, and vice-versa -- the shortest path between you and a server next door might be through networks that are half a world away
- Make sure to check your routing paths and periodically update your list of upstream time servers to suit
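One rough way to compare how "close" candidate servers are in network terms is to compare the round-trip delay of a few NTP exchanges with each. The sketch below assumes you have already collected the four standard timestamps of each exchange (client send, server receive, server send, client receive) by some other means. The function names and the sample server names are hypothetical and are used for illustration only.

```python
from typing import Dict, List, Tuple

# One exchange: (t1 client send, t2 server receive, t3 server send, t4 client receive)
Timestamps = Tuple[float, float, float, float]


def round_trip_delay(t1: float, t2: float, t3: float, t4: float) -> float:
    """Time spent on the network: total elapsed time minus server processing time."""
    return (t4 - t1) - (t3 - t2)


def rank_by_delay(samples: Dict[str, List[Timestamps]]) -> List[Tuple[str, float]]:
    """Return (server, smallest observed delay) pairs, nearest server first.

    Servers with no recorded exchanges are skipped.
    """
    ranked = []
    for server, exchanges in samples.items():
        if not exchanges:
            continue
        best = min(round_trip_delay(*ts) for ts in exchanges)
        ranked.append((server, best))
    return sorted(ranked, key=lambda item: item[1])
```

The smallest delay seen per server is used here because a single slow exchange says more about momentary congestion than about the path itself. Delay alone does not tell you which networks the path crosses, so it complements a check of your routing paths rather than replacing it.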
Time Source Diversity
- Where possible, you should try to use upstream time servers that take their synchronization information from a variety of sources and reference clocks
- If they all sync directly or indirectly from a single source (e.g., GPS) and that source goes down, you're toast
- Note that some sources may be sync'ed to other sources via methods outside the scope of NTP, so they are not necessarily as independent as you may think (e.g., CDMA is dependent on GPS)
Network Routing Diversity
- Where possible, use upstream servers that are routed via different networks
- If one network goes down, you still have the others
- There are many different methods of potential communication between clients and servers (unicast, broadcast, multicast, and manycast)
- Not all clients or servers support all methods
- Some methods place a heavier load on the server because they set up a one-to-one communications model with the client (e.g., unicast, manycast)
- Some methods require protocol support at the network layer, and not all vendors of network equipment ship their machines to handle that kind of traffic by default (e.g., multicast, manycast)
- Some methods will not work beyond the subnet on which the server is located, requiring you to have time servers available on each separate subnet (e.g., broadcast)
- Make sure you understand the differences and select methods that are appropriate for your environment
- There is no security inherent to the base NTP protocol
- Because NTP is based on UDP, it is trivially easy for attackers to spoof
- Virtually all network security assumes that you have good time synchronization between your systems
- If you are concerned about security, make sure that you read and understand the various authentication methods and choose one appropriate to your environment
- Note that authentication in NTP is only server to client -- proving to the client that the server really is who it says it is and that the client can trust the time information that it is given by the server
- There is no client to server authentication in NTP -- if you wish to restrict which clients may send queries to your server, you need to use other techniques, such as the "restrict" option in your configuration file or rules on your router/firewall governing who may send which packets where
Stratum in Time Servers
On the lists of known public time servers, some of the servers are indicated as Stratum 1 (i.e., they are directly connected to a refclock) or Stratum 2 (they directly sync to at least one upstream Stratum 1 time server).
If you are setting up your own network of internal time servers, and you are serving on the order of 100 or more internal clients, and you otherwise comply with all the appropriate policies, it may be appropriate for you to connect directly to Stratum 1 time servers. Otherwise, you should instead connect only to upstream time servers that are at least one level away from Stratum 1.
Upstream Time Server Quantity
Many people wonder how many upstream time servers they should list in their NTP configuration file. The mathematics are complex, and fully understood by very few people. However, we can boil them down to some simple rules-of-thumb:
- If you list just one, there can be no question which will be considered to be "right" or "wrong". But if that one goes down, you are toast.
- With two, it is impossible to tell which one is better, because you don't have any other references to compare them with.
- This is actually the worst possible configuration -- you'd be better off using just one upstream time server and letting the clocks run free if that upstream were to die or become unreachable.
- With three servers, you have the minimum number of time sources needed to allow ntpd to detect if one time source is a "falseticker". However, ntpd will then be in the position of choosing from the two remaining sources. This configuration provides no redundancy.
- With at least four upstream servers, one (or more) can be a "falseticker", or just unreachable, and ntpd will have a sufficient number of sources to choose from.
According to Brian Utterback, the math officially goes like this:
While the general rule is for 2n+1 to protect against "n" falsetickers, this actually isn't true for the case where n=1. It actually takes 2 servers to produce a "candidate" time, which is really an interval. The winner is the shortest interval for which more than half (counting the two that define the interval) have an offset (+/- the dispersion) that lies on the interval and that contains the point of greatest overlap.
So, in the case of four servers, the truechimer with the largest offset defines one end of the interval, the truechimer with the smallest offset defines the other end, and the third truechimer overlaps these two, with a overlap count of at least two and possibly three. The falseticker's interval will overlap few if any of these intervals (or it wouldn't be a falseticker) and will be eliminated.
With only three servers, the interval defined by the two truechimers has no overlap with any other servers, but the interval defined by one of the truechimers and the falseticker overlaps the other truechimer, so this is the interval chosen, and thus the falseticker is still included.
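The idea of looking for the region where most sources overlap can be illustrated with a simplified interval-intersection sketch. This is not ntpd's actual selection code: it ignores details such as the requirement that the interval contain the point of greatest overlap, and the function names are hypothetical. Each source is represented as an interval (offset minus dispersion, offset plus dispersion), in seconds.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (offset - dispersion, offset + dispersion)


def best_overlap(intervals: List[Interval]) -> Tuple[int, Optional[float], Optional[float]]:
    """Return (count, lo, hi): the region covered by the largest number of intervals."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # interval start; sorts before an end at the same point
        edges.append((hi, +1))  # interval end
    edges.sort()

    best = count = 0
    best_lo = best_hi = None
    for i, (point, kind) in enumerate(edges):
        count -= kind  # a start adds one covering interval, an end removes one
        if count > best:
            best = count
            best_lo = point
            best_hi = edges[i + 1][0]  # a start is always followed by another edge
    return best, best_lo, best_hi


def suspected_falsetickers(intervals: List[Interval]) -> List[int]:
    """Return the indexes of intervals that do not touch the best-overlap region."""
    _, lo, hi = best_overlap(intervals)
    if lo is None:
        return []
    return [i for i, (a, b) in enumerate(intervals) if b < lo or a > hi]
```

With three sources clustered near zero and a fourth a quarter of a second away, the sketch reports the overlap of the first three and flags the fourth.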
Excessive Number of Upstream Time Servers
Note that four upstream time servers will protect you against only one falseticker. Using the 2n+1 rule, five upstreams will protect you against two falsetickers, seven will protect you against three falsetickers, and so on.
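The numbers above can be written down as a small helper. This is only a sketch of the rule of thumb as stated in this section (four servers for one falseticker, 2n+1 beyond that). It treats anything below four servers as offering no protection, as the quoted explanation does, and the function name is hypothetical.

```python
def falsetickers_tolerated(servers: int) -> int:
    """Rule-of-thumb number of falsetickers a given number of upstreams can absorb."""
    if servers < 4:
        return 0  # too few sources to outvote a bad one, per the explanation above
    if servers == 4:
        return 1  # the n=1 case needs four servers rather than three
    return (servers - 1) // 2  # general 2n+1 rule, solved for n
```

Note that, by this rule, going from five to six servers does not raise the number of tolerated falsetickers, although the extra server still helps if one upstream becomes unreachable.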
Conventional wisdom is that using at least five upstream time servers would probably be a good idea, and you may want more. Note that ntpd won't use more than ten upstream time servers, although it will continue to monitor as many as you configure.
You may have heard previous guidance which set a specific number and said that using any more than that would be considered "unfriendly" and "abusive". That's not really true. So long as your NTP client is correctly configured, once you're up and running you should not be contacting your upstream servers any more than once every 1024 seconds (or so), and this shouldn't be a problem for the time servers to support.
However, you do need to guarantee that your NTP client is configured correctly and is not doing inappropriate things.
In addition to your various upstream time servers, you may wish to configure your local hardware clock as an input time source for ntpd. In the event that the network goes away completely, the server will be able to use the drift and frequency data that it has collected so far and will continue to run in "free" mode, and should pick up where it left off when the network connectivity is restored. However, doing this may require that your ntpd binary be built with "refclock" support, and this is not guaranteed for all ntpd binaries from all OS vendors or other binary package providers.
When configuring the local hardware clock as an input time source, you should arbitrarily "fudge" the stratum to be a very high number, such as fourteen (14). Note that a stratum of sixteen (16) is the maximum, and if your time server is running at this stratum then it is not synchronized to any upstream time server and will not provide time information to client queries. Using an artificially high stratum for your local clock will help make sure that the system will never choose this refclock as the selected "truechimer" unless there is no other option available.
Note that the local clock cannot be used in the falseticker/truechimer selection process, and does not count toward the minimum number of upstream time servers that you need in order to protect yourself against a given number of falsetickers.
ntpd will resolve the name of a designated time server into an IP address when it starts, and will not attempt to re-resolve that information once it is running (unless/until you tell it to re-read/reload the configuration file).
This means that if you think you're getting higher reliability by having just one server name but multiple IP addresses (which might point to multiple machines), you should think again. When you're managing a very large group of NTP servers (e.g., pool.ntp.org) there are advantages to this, but for anything smaller, you're almost certainly not going to get the behaviour you're probably thinking about.
Also, don't try to give multiple machines the same IP address, or hide them behind a load-balancing device, either -- that will really confuse ntpd and make the situation far worse.
It's much better to have a specified set of servers, each with their own unique IP address, and configure the clients to connect to multiple servers in this set and then let the clients deal with the issues of what happens when one or more of their servers becomes unreachable or unreliable.
Here are some quick tips:
- Start off looking for servers that are specific to your country
- E.g., if you live in Belgium, start by looking for
- If there aren't any servers specific to your country, look for servers in nearby countries
- E.g., check
- Also, check uk.pool.ntp.org (the UK is just across the English Channel) and lu.pool.ntp.org (Luxembourg is much smaller than Belgium, but you never know)
- Finally, check other countries that are nearby and tend to have good high-speed/low-latency connectivity with your country, but may not directly share a border (e.g.,
- If there aren't any servers (or enough servers) specific to your country or in a nearby country, start looking for servers in your world region
- E.g., look for servers in
- If all else fails, select the main
- It probably is not a good idea to use a particular group of pool.ntp.org servers unless there are at least four or five servers in the group, or unless you are combining several separate groups
- Otherwise, you're likely to be better off using a larger group of servers that is more geographically dispersed
- Note that DNS resolvers may or may not implement round-robin internally, so putting the same entry in your configuration file multiple times may or may not result in different IP addresses being selected
- Don't depend on this kind of behaviour -- list different sets of servers independently
- The pool.ntp.org maintainers have developed another method to help you deal with this problem
- For the larger pool.ntp.org groups, they have created three separate subdomains
- For example, for europe.pool.ntp.org, they have created 0.europe.pool.ntp.org, 1.europe.pool.ntp.org, and 2.europe.pool.ntp.org
- You can now list each of these subdomains in your configuration file once, as opposed to listing the parent europe.pool.ntp.org domain three times, and this work-around will help ensure that you get three separate and unique servers
- For smaller pool.ntp.org domains, there may be some overlap between the 0.*, 1.*, and 2.* subdomains, so there is a chance you might get the same server IP address back under different names. This is another reason why you should list multiple different pool.ntp.org subdomains for your site, so as to reduce the potential damage that could be caused if some of them are not operating correctly.
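If you want to see whether several names currently hand back the same address, a short check along the following lines may help. It is a sketch only: the function names are hypothetical, the resolver is passed in as a parameter so the check can be exercised without network access, and a single DNS answer is only a snapshot, since the addresses returned for pool names change over time.

```python
import socket
from typing import Callable, Dict, Iterable, List, Set


def resolve_addresses(name: str, resolver: Callable = socket.getaddrinfo) -> Set[str]:
    """Return the set of IP addresses that a name currently resolves to."""
    try:
        results = resolver(name, 123, type=socket.SOCK_DGRAM)  # 123 is the NTP port
    except socket.gaierror:
        return set()  # treat an unresolvable name as having no addresses
    return {sockaddr[0] for _family, _type, _proto, _canon, sockaddr in results}


def shared_addresses(
    names: Iterable[str], resolver: Callable = socket.getaddrinfo
) -> Dict[str, List[str]]:
    """Map each IP address returned for more than one name to the names sharing it."""
    seen: Dict[str, List[str]] = {}
    for name in names:
        for address in resolve_addresses(name, resolver):
            seen.setdefault(address, []).append(name)
    return {address: owners for address, owners in seen.items() if len(owners) > 1}
```

Calling shared_addresses() with the list of names from your configuration file returns an empty dictionary when no address appears under more than one name at the moment of the check.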
Note that many of the pool.ntp.org servers are running in the homes of private citizens, in their attics or basements, connected via DSL lines to the Internet, and they have been kind enough to share their equipment and resources for the benefit of the rest of the Internet. If you abuse these servers, do not be surprised to find that they are dropping your packets, feeding you bogus time values (in hopes of screwing up your clock so that you decide to go somewhere else), contacting your ISP to try to get your Internet access terminated, putting your IP address on various "black hole lists" around the world in hopes that they will be able to get all carriers to throw away all your packets, and so on.
Generally speaking, once you're past "startup" mode, your system should not be sending time queries to the upstream time servers more often than about once every five to fifteen minutes. Some abusive clients send packets once every second, and if this is done inside an embedded device (such as a SoHo router/firewall) from a popular vendor, it can cause massive problems, the likes of which even large, well-connected universities would be hard-pressed to withstand. See https://www.cs.wisc.edu/~plonka/netgear-sntp/ for one example, and https://people.freebsd.org/~phk/dlink/ for a more recent one.
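To put these polling rates in perspective, the arithmetic can be sketched as follows. The helper name is hypothetical, and the figures count only the client's queries, one packet per poll, ignoring startup bursts.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def queries_per_day(poll_interval_seconds: float, servers: int = 1) -> float:
    """Queries a single client sends per day at a fixed poll interval."""
    if poll_interval_seconds <= 0:
        raise ValueError("poll interval must be positive")
    return servers * SECONDS_PER_DAY / poll_interval_seconds


well_behaved = queries_per_day(1024)  # one query every 1024 seconds
abusive = queries_per_day(1)          # one query every second
ratio = abusive / well_behaved        # how many well-behaved clients one abuser equals
```

A client polling once per second generates as much traffic as 1024 clients polling every 1024 seconds, which is why a single misconfigured firmware image shipped in a popular device can swamp a server.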
With regards to pool.ntp.org, if the abuse of these servers gets to be too much, then many of the server providers are likely to decide that they don't want to be in the pool anymore, and the remaining pool members may get buried. The end result of this would be bad for everyone involved, so you had better make sure that your clients absolutely, positively, DO NOT abuse these servers.
- 03 Oct 2004
- access policy (See NTPAccessPolicy for a possible solution)
- the m*cast thing that may make this moot
- different scripts that help find them
- using different servers, over different physical networks
- 31 Jul 2003