Lecture 9: The Domain Name System (DNS)

IP Addresses Revisited

In lecture 2 we first introduced the concept of the IP Address of a computer. We stated that it's simply another way of identifying a specific machine -- we could equally use the machine's name to identify it. These two different ways of referring to a system come from two semi-conflicting requirements:

The hierarchical domain-based naming system used in the Internet to identify machines is designed for human usage -- in general, we find this system incredibly convenient to use, having a high correlation with how the Real World^™ is organised. We are very comfortable with the use of names like amazon.com and ironbark.bendigo.latrobe.edu.au.
Actual packet delivery in the Internet is based on a separate, fixed-length (four-byte) numeric IP Address. For example, the IP address of ironbark is 149.144.21.60. IP addresses are used by the routers which implement the Internet's delivery service, the Internet Protocol, or IP, to route packets of data through the Internet to their destination.
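
To make the "four byte" point concrete, here is a small Python sketch using the standard socket and struct modules (the function names ipv4_to_int and int_to_ipv4 are our own). It packs ironbark's dotted address into its four raw bytes and reads them back as a single 32-bit number, which is the form the address takes inside an IP packet.

```python
import socket
import struct

def ipv4_to_int(dotted: str) -> int:
    """Dotted-quad text -> the 32-bit number carried in IP packet headers."""
    packed = socket.inet_aton(dotted)        # exactly 4 bytes, network byte order
    (value,) = struct.unpack("!I", packed)   # "!I": unsigned 32-bit, big-endian
    return value

def int_to_ipv4(value: int) -> str:
    """32-bit number -> dotted-quad text (the inverse of ipv4_to_int)."""
    return socket.inet_ntoa(struct.pack("!I", value))

packed = socket.inet_aton("149.144.21.60")
print(len(packed), list(packed))   # 4 [149, 144, 21, 60]
n = ipv4_to_int("149.144.21.60")
print(n, int_to_ipv4(n))           # 2509247804 149.144.21.60
```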

The Internet Domain Name System (DNS) is used to provide a mapping between these two alternative identification approaches: the human-oriented domain name and the delivery-oriented IP address. Its most common usage is to look up the IP address corresponding to a known domain name.

The DNS Hierarchy

The DNS Namespace is based on a "tree" structure, with a small(ish) number of generic Top Level Domains (eg, .com, .edu, .org) and a large number of country-based domains (eg .au, .my, .uk). Each TLD supports a group of "second-level" domains, and so on, all the way down to individual hosts.

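A domain name is a dotted sequence of labels describing a path through the name hierarchy: read from right to left, it leads from the root down to the named node. A fully qualified name may end with a trailing dot, which stands for the root itself, thus: bindi.bendigo.latrobe.edu.au.
An individual name component (a label) must be at most 63 characters long, (traditionally) must begin with a letter, etc...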
Upper and lowercase may be used, although name lookups are case insensitive by definition.
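
The naming rules above can be checked mechanically. The Python sketch below is a simplified illustration with our own function names: it enforces only the rules listed here (a label starts with a letter, contains letters, digits or hyphens, does not end with a hyphen, and has 1 to 63 characters), and compares names case-insensitively. Real DNS software is often more permissive than this.

```python
import re

# Simplified label rule (an assumption based on the rules listed above):
# a letter, then up to 62 letters/digits/hyphens, not ending in a hyphen.
LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def split_labels(name):
    """Split a domain name into its labels, dropping one optional trailing dot."""
    if name.endswith("."):
        name = name[:-1]
    return name.split(".")

def is_valid_name(name):
    """True if every label satisfies the simplified LABEL rule."""
    return all(LABEL.fullmatch(part) for part in split_labels(name))

def same_name(a, b):
    """Names are compared case-insensitively, label by label."""
    return [p.lower() for p in split_labels(a)] == [p.lower() for p in split_labels(b)]

print(is_valid_name("bindi.bendigo.latrobe.edu.au."))  # True
print(is_valid_name("x" * 64 + ".edu.au"))             # False: label too long
print(same_name("Bindi.Bendigo.LaTrobe.edu.au", "bindi.bendigo.latrobe.edu.au."))  # True
```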

Resource Records

Each domain name has one or more resource records associated with it. Resource records are 5-tuples:

Domain_name  TTL  Class  Type  Value

Domain_name

the name of the domain to which this RR applies.

TTL

the Time To Live of this RR. When this RR is returned as a result of a DNS lookup, the remote host normally caches the information for efficiency. The TTL is the time, in seconds, after which the cached information should be regarded as potentially out of date.

Type

there are several types of RR, including:

SOA: Start Of Authority.
A: IP address of a host.
NS: Name Server for a domain.
MX, PTR, CNAME, HINFO and others, described later in this lecture.

Class

Almost always set to "IN", for Internet.

Value

The actual value of this particular RR. Can be, for example, an IP address, a number, some ASCII text or a combination.
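
One way to picture an RR in code is as a small immutable record. The Python sketch below is our own illustration, not a real DNS library: it stores the five fields in the order used by the example records later in this lecture (name, TTL, class, type, value) and parses a simplified one-line textual form of a record.

```python
from typing import NamedTuple

class ResourceRecord(NamedTuple):
    name: str    # domain name this RR applies to
    ttl: int     # seconds a cache may keep this RR before treating it as stale
    rclass: str  # almost always "IN" ("class" is a Python keyword, hence rclass)
    rtype: str   # "A", "NS", "SOA", "MX", ...
    value: str   # meaning depends on rtype: an address, a host name, ...

    @classmethod
    def parse(cls, line):
        """Parse a simplified one-line RR such as 'ironbark 86400 IN A 149.144.21.60'.
        Everything after the type is kept as the value (e.g. '10 ironbark' for MX)."""
        name, ttl, rclass, rtype, value = line.split(None, 4)
        return cls(name, int(ttl), rclass, rtype, value)

rr = ResourceRecord.parse("ironbark 86400 IN A 149.144.21.60")
print(rr.rtype, rr.value, rr.ttl)   # A 149.144.21.60 86400
```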

DNS Servers and Resolvers

A nameserver provides domain-name-to-IP-address mappings (and a few other functions, but "looking up" IP addresses is the most common) for one or more zones, which are sub-trees of the domain name space. For example, sheoak is a nameserver for the zone bendigo.latrobe.edu.au. This means that if I want to look up a particular IP address in that zone, I can ask sheoak.
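
Deciding whether a name falls inside a zone is a question about the tree: the zone's labels must be a suffix of the name's labels. A hypothetical Python helper (our own names) is sketched below. It compares whole labels rather than raw strings, so that a made-up name such as xbendigo.latrobe.edu.au is not mistaken for part of bendigo.latrobe.edu.au.

```python
def labels(name):
    """Lower-cased labels of a domain name; the root ("" or ".") has none."""
    name = name.lower().rstrip(".")
    return name.split(".") if name else []

def in_zone(name, zone):
    """True if `name` lies in the sub-tree of the name space rooted at `zone`."""
    n, z = labels(name), labels(zone)
    return len(n) >= len(z) and n[len(n) - len(z):] == z

zone = "bendigo.latrobe.edu.au"
print(in_zone("ironbark.bendigo.latrobe.edu.au", zone))  # True: sheoak can answer
print(in_zone("amazon.com", zone))                        # False
print(in_zone("xbendigo.latrobe.edu.au", zone))           # False, despite the shared suffix
```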

Each zone of authority begins with a start of authority (SOA) RR, which names the zone's primary nameserver. It also has the email address of the site administrator, a serial number (incremented whenever the zone's data changes) and various other bits and pieces, such as timers telling secondary nameservers how often to refresh their copies. The full set of nameservers that are authoritative for the zone is listed in the zone's NS RRs.

The DNS system forms a distributed database of domain information.

A resolver is a library function^[1] which queries the nameserver when called from a user program. It can check the local cache of names and, if necessary, request a RR from a nameserver (caching the response). In other words, a resolver is software which asks a nameserver for information.
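
The cache-then-ask behaviour of a resolver, together with the TTL rule from the previous section, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how gethostbyname(3) is written: the class name is our own, and `ask_nameserver` is a hypothetical stand-in for the real network query, returning the TTL and the values from the nameserver's reply.

```python
import time

class CachingResolver:
    """Sketch of a resolver with a local cache. `ask_nameserver(name, rtype)` is a
    hypothetical stand-in for the real network query; it must return
    (ttl_seconds, list_of_values) taken from the nameserver's reply."""

    def __init__(self, ask_nameserver, clock=time.monotonic):
        self.ask_nameserver = ask_nameserver
        self.clock = clock    # injectable so the TTL logic can be tested
        self.cache = {}       # (name, rtype) -> (expiry_time, values)

    def lookup(self, name, rtype="A"):
        key = (name.lower().rstrip("."), rtype)
        entry = self.cache.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]                        # still within its TTL: use cache
        ttl, values = self.ask_nameserver(*key)    # missing or stale: ask a nameserver
        self.cache[key] = (self.clock() + ttl, values)
        return values
```

With a TTL of 86400 seconds, for example, repeated lookups of ironbark.bendigo.latrobe.edu.au within a day would reach the nameserver only once.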

^[1] Such as the resolver built into the Unix library function gethostbyname(3).

Nameserver Queries

The resolver sends a question to a name server, of the form:

{query domain name, type, class}
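
On the wire, this question is not sent as text. The sketch below follows the question layout in RFC 1035: the name becomes a series of length-prefixed labels ending in a zero byte, followed by the type and class as 16-bit big-endian numbers. The code table holds a few standard type codes. The function names are our own, and the sketch does not handle a query for the root name itself.

```python
import struct

TYPE_CODES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12, "HINFO": 13, "MX": 15}
CLASS_CODES = {"IN": 1}

def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        data = label.encode("ascii")
        if not 0 < len(data) <= 63:
            raise ValueError(f"bad label: {label!r}")
        out.append(len(data))   # one length byte...
        out += data             # ...then the label itself
    out.append(0)               # zero-length label: the root ends every name
    return bytes(out)

def encode_question(name, rtype="A", rclass="IN"):
    """The {query domain name, type, class} triple in DNS wire format."""
    return encode_name(name) + struct.pack("!HH", TYPE_CODES[rtype], CLASS_CODES[rclass])

print(encode_question("amazon.com"))  # b'\x06amazon\x03com\x00\x00\x01\x00\x01'
```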

The server responds with one or more appropriate RRs. It also sends an ADDITIONAL INFORMATION section, which contains extra RRs which the resolver will probably find useful. For example, if a resolver queries for a particular NS RR, the server will return it, plus additional information giving the IP address of the name server specified in the main body of the reply.
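
The idea of the additional section can be sketched with an in-memory "zone". Everything here is hypothetical: the function and table names are our own, and 192.0.2.53 is a reserved documentation address used as a placeholder, not sheoak's real address. For NS (and MX) queries, the sketch adds A records for the hosts named in the answer, saving the resolver a second query.

```python
# Hypothetical zone data; 192.0.2.53 is a documentation placeholder address.
ZONE = {
    ("bendigo.latrobe.edu.au", "NS"): ["sheoak.bendigo.latrobe.edu.au"],
    ("sheoak.bendigo.latrobe.edu.au", "A"): ["192.0.2.53"],
}

def respond(qname, qtype, zone=ZONE):
    """Return (answer, additional) for a query against `zone`. For NS and MX
    queries, A records for the hosts named in the answer are added to the
    additional section."""
    answer = zone.get((qname, qtype), [])
    additional = []
    if qtype in ("NS", "MX"):
        for value in answer:
            host = value.split()[-1]     # an MX value is "preference host"
            for addr in zone.get((host, "A"), []):
                additional.append((host, "A", addr))
    return answer, additional

print(respond("bendigo.latrobe.edu.au", "NS"))
```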

The most common DNS query is of type A, where the resolver is required to map a domain name to an IP address - that is, "looking up" an IP address. Some typical type A RRs look like:

ironbark  86400  IN  A  149.144.21.60
redgum    86400  IN  A  149.144.21.3
bindi     86400  IN  A  149.144.20.82

Note that the "domain name" part of these RRs has been omitted (leaving only the hostnames) for clarity.
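
Since the domain part was left off the records above, a program reading them has to add it back. The Python sketch below assumes (our assumption, based on ironbark.bendigo.latrobe.edu.au earlier in this lecture) that all three hosts are in bendigo.latrobe.edu.au. It builds a case-insensitive name-to-address table from the three A records.

```python
RECORDS = """\
ironbark  86400  IN  A  149.144.21.60
redgum    86400  IN  A  149.144.21.3
bindi     86400  IN  A  149.144.20.82
"""

ORIGIN = "bendigo.latrobe.edu.au"   # assumed: the domain omitted from the records above

def load_a_records(text, origin):
    """Build a name -> [addresses] table, restoring the omitted domain part."""
    table = {}
    for line in text.splitlines():
        host, _ttl, rclass, rtype, value = line.split()
        if rclass == "IN" and rtype == "A":
            table.setdefault(f"{host}.{origin}".lower(), []).append(value)
    return table

def lookup_a(table, name):
    """Case-insensitive lookup; an optional trailing dot is ignored."""
    return table.get(name.lower().rstrip("."), [])

table = load_a_records(RECORDS, ORIGIN)
print(lookup_a(table, "IronBark.bendigo.latrobe.edu.au."))   # ['149.144.21.60']
```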

Recursive Queries

A host within a specified domain (eg a machine at Bendigo, in domain bendigo.latrobe.edu.au) is configured to "know" the IP address of its local nameserver. What happens when it sends a query for a non-local name (eg amazon.com)? The sequence of events is something like:

The local nameserver will forward the query to its "parent" nameserver -- in our case, the nameserver for domain latrobe.edu.au.
This nameserver, in turn, (usually) forwards the query recursively up the "tree" to a root nameserver. The root nameserver passes it down to a nameserver for the .com domain, and that server passes it on to a nameserver for amazon.com, which has the desired name-to-address mapping.
The result of the query is then passed back through the chain of nameservers (each of whom will normally cache the information), finally arriving at the originating host. This process is called a recursive query.

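It's obvious that recursive queries could be quite slow, since a single query may travel up and back down a long chain of nameservers. The DNS provides a way of "short-circuiting" the whole process. In an iterative query, the nameserver being asked does not forward the query itself; it replies with the answer if it has one, or otherwise with a referral to nameservers closer to the answer, which the querying nameserver then contacts directly. Combined with caching, this avoids many stages: if (for example) the local nameserver already knows the IP address of a nameserver for the .com domain in the above example, it can contact it directly. In practice, root nameservers answer only iterative queries. Every nameserver is configured to know the IP address of at least one root nameserver.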
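
Iterative resolution can be sketched in Python with a hypothetical table of servers that either answer or refer. The server names, the table and the function names are our own, and 192.0.2.10 is a documentation placeholder address. The querying side starts from the closest zone whose server it has cached (falling back to the root) and follows referrals until it gets an answer.

```python
# Hypothetical referral data; 192.0.2.10 is a documentation placeholder.
SERVERS = {
    "root":      {"referrals": {"com": "com-ns"}},
    "com-ns":    {"referrals": {"amazon.com": "amazon-ns"}},
    "amazon-ns": {"answers": {"amazon.com": "192.0.2.10"}},
}

def in_zone(name, zone):
    return name == zone or name.endswith("." + zone)

def ask(server, qname):
    """One iterative query: the server never forwards; it answers or refers."""
    data = SERVERS[server]
    if qname in data.get("answers", {}):
        return "answer", data["answers"][qname]
    matches = [z for z in data.get("referrals", {}) if in_zone(qname, z)]
    if not matches:
        raise LookupError(f"{server} has no referral for {qname}")
    return "referral", data["referrals"][max(matches, key=len)]   # closest zone

def resolve_iteratively(qname, cached_servers=None, max_steps=10):
    """Follow referrals, starting from the closest zone whose server is cached."""
    known = {"": "root"}                       # "" = the root zone, always known
    known.update(cached_servers or {})
    start = max((z for z in known if z == "" or in_zone(qname, z)), key=len)
    server, path = known[start], []
    for _ in range(max_steps):
        path.append(server)
        kind, value = ask(server, qname)
        if kind == "answer":
            return value, path
        server = value                         # referral: contact that server next
    raise LookupError("too many referrals")

print(resolve_iteratively("amazon.com"))
print(resolve_iteratively("amazon.com", {"com": "com-ns"}))   # skips the root
```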

Some DNS Subtleties

Mail eXchange

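The DNS provides the MX type of RR to discover where email is to be delivered. Each MX RR names a mail host for a domain together with a preference number: the host with the lowest number is the primary mailhost, and hosts with higher numbers are less-preferred alternatives where mail for the domain can also be delivered. For example, ironbark has: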

ironbark  IN  MX  10  ironbark
          IN  MX  20  redgum
          IN  MX  40  sheoak
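
A mailer uses these records by trying hosts in increasing order of preference number. In the Python sketch below, `try_host` is a hypothetical callable standing in for an actual delivery attempt. Note that hosts with equal preference are simply taken in name order here, whereas real mailers are expected to choose among them at random.

```python
# The MX records above, as (preference, host) pairs.
MX_RECORDS = [(10, "ironbark"), (20, "redgum"), (40, "sheoak")]

def delivery_order(mx_records):
    """Hosts to try, most preferred (lowest preference number) first."""
    return [host for _pref, host in sorted(mx_records)]

def deliver(mx_records, try_host):
    """Try each mail host in turn. `try_host(host)` is a hypothetical callable
    returning True once a host accepts the message."""
    for host in delivery_order(mx_records):
        if try_host(host):
            return host
    return None   # nobody accepted it: a real mailer would queue and retry later

print(delivery_order(MX_RECORDS))                      # ['ironbark', 'redgum', 'sheoak']
print(deliver(MX_RECORDS, lambda h: h != "ironbark"))  # redgum
```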

Reverse lookups

A special domain (in-addr.arpa) and address format are used to map IP addresses back to domain names: the four bytes of the address are written in reverse order, followed by in-addr.arpa, thus:

60.21.144.149.in-addr.arpa

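The RR stored under such a name is called a PTR RR, and points to the host's domain name. Performing reverse lookups is much more difficult than normal "forward" address lookups: the in-addr.arpa tree is delegated according to who controls each block of addresses rather than according to domain names, so PTR RRs must be maintained separately from the corresponding A RRs and are often missing or out of date.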
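
Building the in-addr.arpa name is simple string manipulation, sketched below in Python (reverse_name is our own function name). Reversing the bytes puts the most significant one next to in-addr.arpa, so that the name follows the usual right-to-left hierarchy. Python's standard ipaddress module can produce the same name via its reverse_pointer attribute.

```python
import ipaddress

def reverse_name(addr):
    """Map a dotted-quad IPv4 address to its in-addr.arpa name."""
    octets = addr.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {addr}")
    return ".".join(reversed(octets)) + ".in-addr.arpa"

print(reverse_name("149.144.21.60"))                         # 60.21.144.149.in-addr.arpa
print(ipaddress.ip_address("149.144.21.60").reverse_pointer)  # same name, via the standard library
```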

CNAME

Often a host may be known by several names: names other than the official host name are called aliases, and a CNAME RR maps an alias name to a host's "real" name.
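
Following an alias means looking up the CNAME RR, then repeating the lookup under the "real" name. The records in this Python sketch are hypothetical (we assume, purely for illustration, that www.bendigo.latrobe.edu.au is an alias for ironbark), and the loop limit guards against a badly configured pair of CNAMEs pointing at each other.

```python
# Hypothetical records: "www" is assumed to be an alias for ironbark.
RECORDS = {
    ("www.bendigo.latrobe.edu.au", "CNAME"): "ironbark.bendigo.latrobe.edu.au",
    ("ironbark.bendigo.latrobe.edu.au", "A"): "149.144.21.60",
}

def resolve_a(name, records, max_aliases=8):
    """Follow CNAME RRs from an alias to the canonical name, then return
    (canonical_name, address) from its A record."""
    for _ in range(max_aliases + 1):
        if (name, "A") in records:
            return name, records[(name, "A")]
        if (name, "CNAME") not in records:
            raise LookupError(f"no A or CNAME record for {name}")
        name = records[(name, "CNAME")]       # step from the alias to its target
    raise LookupError("too many aliases (possible CNAME loop)")

print(resolve_a("www.bendigo.latrobe.edu.au", RECORDS))
```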

HINFO

Describes the host's CPU type and the operating system it is running. Rarely kept up to date.

DNS Implementation Technicalities

The DNS works as a distributed database because of two fundamental ideas: replication and caching. We have already seen how caching works -- at any point in a query, if a nameserver has a current copy of the desired information, it can supply it instead of contacting other nameservers.

The DNS requires that all nameservers be replicated at least once -- that is, for each zone of authority there must be at least two authoritative nameservers. The rules for replication of nameservers make for quite entertaining reading...

DNS queries and responses are an excellent example of an application where the reliable, connection-oriented transport mechanism of TCP is not necessary, and simply has too much overhead. In fact, queries and responses are normally carried in unreliable UDP datagrams (more on UDP later); TCP is reserved for special cases such as zone transfers between nameservers and responses too large for a single datagram. UDP is a connectionless transport service, with the same level of reliability as IP packet delivery itself -- in other words, UDP messages can be lost, delivered out of order and even duplicated. If a resolver does not receive a reply from a nameserver, it usually either tries again, or tries the next nameserver for the same domain.
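
The retry strategy in the last sentence can be sketched in Python. Here `send_query` is a hypothetical callable standing in for sending one UDP datagram and waiting (with a timeout) for the reply; it raises TimeoutError when no reply arrives. The number of retries per server is our own arbitrary choice.

```python
def query_with_retries(send_query, servers, question, retries=2):
    """Try each nameserver in turn, resending to the same server up to
    `retries` extra times when the datagram or its reply appears lost.
    `send_query(server, question)` is hypothetical: it returns the reply
    or raises TimeoutError."""
    for server in servers:
        for _attempt in range(1 + retries):
            try:
                return server, send_query(server, question)
            except TimeoutError:
                continue   # no reply: try this server again, then the next one
    raise TimeoutError("no nameserver answered")
```

Giving up on one server after a few attempts and moving to the next is exactly why the DNS insists on at least two authoritative nameservers per zone.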

Finally, although it is beyond the scope of our subject, DNS messages are NOT simple ASCII strings -- the DNS formats are quite complex and designed for efficient parsing. It's not trivial (for obvious reasons) to write a DNS client. In a sense, DNS is not strictly an application protocol -- it provides support for application protocols, but isn't one itself.
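
To give a taste of what "not simple ASCII strings" means, the Python sketch below decodes only the fixed 12-byte header at the start of every DNS message, using the field layout from RFC 1035: a 16-bit ID, a 16-bit word of flag bits, and four 16-bit section counts. The function name and the dictionary it returns are our own; the question and resource record sections that follow the header (including the name compression scheme) are not handled.

```python
import struct

def parse_header(message):
    """Decode the fixed 12-byte DNS header: six big-endian 16-bit fields."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return {
        "id": ident,                                  # matches a reply to its query
        "is_response": bool(flags & 0x8000),          # QR bit
        "opcode": (flags >> 11) & 0xF,                # 0 = standard query
        "authoritative": bool(flags & 0x0400),        # AA: from an authoritative server
        "truncated": bool(flags & 0x0200),            # TC: reply did not fit
        "recursion_desired": bool(flags & 0x0100),    # RD
        "recursion_available": bool(flags & 0x0080),  # RA
        "rcode": flags & 0xF,                         # 0 = no error, 3 = no such name
        "counts": {"question": qd, "answer": an, "authority": ns, "additional": ar},
    }

# A made-up header: a response (QR) with RD and RA set, 1 question, 2 answers.
print(parse_header(struct.pack("!6H", 0x1234, 0x8180, 1, 2, 0, 0)))
```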

Extra information

Here's a definitive guide to DNS.
This is a nice tutorial on DNS.
Here's a slide show on DNS.
Here's the bare bones of another lecture on DNS, with something of a Linux emphasis.
Finally, here's a good technical tutorial from Connect.com.au on how DNS works in the Real World...