by Cynthia Taylor
My favorite thing about the Domain Name System (DNS), the system used to translate from Domain Names to IP addresses, is that it used to just be a guy named Jon.1 When the internet was young, if you had an IP address that you wanted a name connected to, you would contact Jon, and he would add your name-IP translation to a text file, and the updated text file would be shipped out to all twenty people on the internet. (This text file, hosts.txt, still exists in some form on most operating systems.2) Then, around 1983, there were suddenly more computers on the internet than Jon could reasonably deal with, and they threw together the Domain Name System, using all the assumptions of an internet where all of the users were reasonably nice graduate students, and it had not yet crossed anyone’s mind that maybe one day you could use this thing to steal credit card numbers. Later we figured out that maybe, just maybe, people might lie on the internet, but we never bothered to go back and fix it. Now it’s too big to reasonably change. This, dear readers, is the story of the internet.
Let’s say you want to go to a website. The first thing your computer has to do is translate from the domain name, the nice, human-readable and -rememberable name (e.g., www.tinykittens.com), to the IP address, the series of ugly numbers that lets your computer actually find and download a website (e.g., 126.96.36.199).
Your computer starts by asking the DNS resolver at your ISP what the IP address for www.tinykittens.com is. This resolver will figure out the IP address by asking a series of DNS name servers, servers whose job is to either tell you the IP address you need or tell you the next name server that you should ask.
Your ISP’s resolver needs a place to start asking questions, so it goes to the nearest root server. The 13 named root servers, run by 12 different organizations around the world, are the starting point of every DNS query. Your ISP’s resolver asks the root server what the IP address is for www.tinykittens.com.
There is no way that a single root server could know the IP address for every website. Instead, it just knows the name servers for all of the top level domains. The top level domains are those familiar words that come after the last dot in a domain name: .com, .edu, .net, .biz, and so on. The root server tells the resolver “I don’t know about www.tinykittens.com, but here’s who you should ask all questions about .com.” (This answer is sent in the form of a domain name rather than an IP, potentially requiring its own DNS lookup!)
The resolver next asks the .com name server about www.tinykittens.com. And this name server responds “I don’t know about www.tinykittens.com, but here’s the name server for tinykittens.com.” (Note that the www actually matters, because of subdomains: tinykittens.com could contain not only www.tinykittens.com, but also images.tinykittens.com or mail.tinykittens.com.)
Finally the resolver asks the name server for tinykittens.com for the IP address for www.tinykittens.com, and it responds with the IP address. Then the resolver passes this IP along to your browser, and finally you download some photos of tiny, adorable kittens.
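The chain of questions above can be sketched as a toy, in-memory model. Nothing here touches the network, and the server names ("root", "com-server", "tinykittens-server") are made up for illustration; only the domain name and IP address come from the example above. Real name servers speak a binary protocol and cache heavily, so treat this as a picture of the referral logic rather than of DNS itself.

```python
# Toy model of iterative DNS resolution. Each "name server" maps a name
# suffix either to a referral (the next server to ask) or to a final
# answer (an IP address). All server names are hypothetical.

ROOT = "root"

NAME_SERVERS = {
    "root": {"com": ("referral", "com-server")},
    "com-server": {"tinykittens.com": ("referral", "tinykittens-server")},
    "tinykittens-server": {
        "www.tinykittens.com": ("answer", "126.96.36.199"),
    },
}


def ask(server, name):
    """Return the record for the longest suffix of `name` that `server` knows."""
    labels = name.split(".")
    zone = NAME_SERVERS[server]
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in zone:
            return zone[suffix]
    raise LookupError(f"{server} knows nothing about {name}")


def resolve(name):
    """Follow referrals from the root until some server gives an answer."""
    server = ROOT
    path = [server]
    while True:
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        # A referral: remember who we were sent to and ask them next.
        server = value
        path.append(server)


ip, path = resolve("www.tinykittens.com")
print(ip, path)
```

The returned path lists every server the resolver had to ask, in order: the root, then the .com server, then the server for tinykittens.com.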
All of this work is being done by the resolver of whatever network you’re connected to, which means you are trusting the network of the random coffee shop you’re working at to not lie to you about the IP address for your bank. This is frequently exploited by places that offer public wifi (airports, universities, etc.) to create captive portals, those websites that make you enter your email address or agree to terms of service before you can use the internet. The ISP redirects you to the portal by responding to every DNS query with the IP address of the portal until you log in.
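A captive portal's lying resolver is simple enough to sketch in a few lines. This is a toy model under stated assumptions: the class name, the client identifier, and the portal address (taken from a range reserved for documentation) are all hypothetical, and a real portal would track clients by something like a MAC address.

```python
# Hypothetical portal address, from a range reserved for documentation.
PORTAL_IP = "192.0.2.1"
REAL_RECORDS = {"www.tinykittens.com": "126.96.36.199"}


class CaptivePortalResolver:
    """A resolver that answers every query with the portal until you log in."""

    def __init__(self, records, portal_ip):
        self.records = records
        self.portal_ip = portal_ip
        self.logged_in = set()

    def log_in(self, client):
        self.logged_in.add(client)

    def resolve(self, client, name):
        if client not in self.logged_in:
            # The lie: whatever name was asked for, point at the portal.
            return self.portal_ip
        return self.records[name]


resolver = CaptivePortalResolver(REAL_RECORDS, PORTAL_IP)
before = resolver.resolve("my-laptop", "www.tinykittens.com")
resolver.log_in("my-laptop")
after = resolver.resolve("my-laptop", "www.tinykittens.com")
print(before, after)
```

Notice that nothing in the client's query changes between the two calls. The only difference is what the resolver decides to say, which is exactly why the client has no way to tell a benign lie from a malicious one at this layer.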
Unfortunately, lying DNS servers can also be used to create a man-in-the-middle attack, in which all of the information you send a website, and all of its responses, are secretly intercepted by a third party. Because this third party is forwarding all of the website’s responses, to the user it looks just like a normal web session, except that all of your login information is being saved by the third party, who can then log in as you at their leisure. In fact, a massive malware campaign has recently been discovered that resets your router’s DNS settings to point them at bad-guy-controlled DNS servers, for just this reason.3 (Have you updated your router’s firmware lately? You should definitely do that.)
You may have noticed that when you try to log in to a site you frequently visit while you’re at the airport, or somewhere else with a captive portal, instead of taking you to the captive portal your browser shows you a page that says something like “AHHHHH DANGER SPIES BADNESS UNSAFE!!!” This is your browser trying to protect you from the malicious use of a lying DNS server, rather than the relatively benevolent captive portal use. (Your browser saves some information about websites you visit, in order to detect lying DNS servers.) The problem is that most of us experience these warning pages all the time at places like airports, and almost none of the time when some miscreant is actively man-in-the-middling us. And so we become used to dismissing them with no ill effects. (Research has actually shown that the only way to stop users from clicking through these warnings is to make it relatively difficult to do. This has started something of a browser war in making it hard to figure out how to dismiss these warnings, with Chrome not allowing it at all for certain sites.)
DNS has a clear weakness, in that you trust the DNS resolver of whatever network you’re currently connected to not to lie to you. Evil-doers can use this vulnerability to steal your credit card information, and so we want to alert users when they’re vulnerable. However, at the same time, established institutions that you can trust—airports, universities—are using that same vulnerability to do something trivial (get you to agree to their terms of service). Your browser has no way to tell the difference between these two things, so it gives you the same dire warning regardless, and you become accustomed to clicking through the warning. By taking advantage of these vulnerabilities at all, these institutions are habituating you to these warnings, teaching you that DNS hijacking isn’t a big deal.
DNS fundamentally breaks a lot of ways that we think about the internet. We think of the internet as fundamentally decentralized and massively redundant—lots of different companies and institutions support various small pieces of the internet, and as long as all of those pieces speak the same language, everything is okay. Parts of internet infrastructure can be removed, and redundancy will allow everything to continue to work. But DNS requires centralization: we all need to agree on which domains translate to which IP addresses, and in order to support that, we need to agree on which nameservers we use to do lookups. Because of this centralization, the DNS root servers remain a weak point in internet infrastructure, and there have been multiple distributed denial of service attacks attempting to take them down. So far, none have succeeded.
We also think of internet architecture as being fairly evenly geographically distributed. This is important because of latency—the further away you are from a server, the longer it takes your messages to get there and back. Originally, the DNS root servers were highly concentrated geographically, with ten of the thirteen in the US, two in Europe, and one in Japan. Currently all but two of the servers use anycast4, a networking technique in which an IP address can be mapped to multiple servers and will be resolved to whichever server is physically closest to the computer sending requests to it. Anycast also provides redundancy, helping to defend the root servers against denial of service attacks.
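Both properties of anycast described above, nearest-server delivery and redundancy, show up in a small toy model. Everything in it is an assumption made for illustration: the address comes from a range reserved for documentation, and the client names, site names, and latencies are invented. Real anycast is handled by the routing system, not by a lookup table like this one.

```python
# Hypothetical anycast address, from a range reserved for documentation.
ANYCAST_ADDRESS = "198.51.100.53"

# Hypothetical round-trip times in milliseconds from each client to each
# site that answers on the shared address.
LATENCY_MS = {
    "client-in-chicago": {"chicago": 5, "frankfurt": 110, "tokyo": 150},
    "client-in-berlin": {"chicago": 115, "frankfurt": 8, "tokyo": 240},
}


def route(client, address, sites_up):
    """Pick the closest site that is up and answering on `address`."""
    if address != ANYCAST_ADDRESS:
        raise LookupError(f"no site answers on {address}")
    candidates = {
        site: ms for site, ms in LATENCY_MS[client].items() if site in sites_up
    }
    if not candidates:
        raise ConnectionError(f"no reachable site answers on {address}")
    return min(candidates, key=candidates.get)


all_sites = {"chicago", "frankfurt", "tokyo"}
print(route("client-in-chicago", ANYCAST_ADDRESS, all_sites))
print(route("client-in-berlin", ANYCAST_ADDRESS, all_sites))
# If the Chicago site is knocked offline, its clients land somewhere else.
print(route("client-in-chicago", ANYCAST_ADDRESS, all_sites - {"chicago"}))
```

Two clients sending to the same address reach different machines, and taking one site down moves its traffic to the next closest site instead of breaking the address.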
The end-to-end principle of network design tells us that functionality should be in the endpoints of the network whenever possible, except when necessary for correctness or speed. When you visit a website, the two endpoints of the connection are your computer and the server hosting the website. Because DNS is centralized, there’s no way to offload its functionality into the endpoints. In fact, you need DNS to even figure out where one of the endpoints is. The end-to-end principle is important because the internet fundamentally depends on doing something amazing: creating reliable communication using a bunch of unreliable parts. When you visit a website, you communicate with it by sending data through lots of different machines, all owned by different organizations, running different hardware and software, with no guarantees at all as to their reliability or trustworthiness. The two endpoints are the only machines that we can have any knowledge about.
When I started writing this article, I planned to talk about the many existing security flaws in DNS. There are a lot of working DNS exploits in the wild, some of which have fixes, and some of which do not. However, in the process of writing it, I realized I didn’t really need to. Even without discussing bugs at all, even when just talking about DNS with the assumption that it is implemented correctly and working exactly as it should, DNS is still fundamentally untrustworthy, and breaks just about every networking principle.
So what should we do, given that it’s amazing that URLs ever resolve to the correct IP address? I think that we should acknowledge the fact that DNS is fundamentally untrustworthy, and once we get an IP address from it, we should make the server at that IP address prove to us that it actually is the domain we want. Fortunately, this is exactly what happens when we use TLS.
TLS, or Transport Layer Security (previously known as SSL, or Secure Sockets Layer), is a protocol that both encrypts communication between your computer and a web server, and has the web server cryptographically prove that it actually is at the domain you think it is. It’s used whenever you type https instead of http, and indicated by the little green lock symbol in the address bar of your browser. TLS is not without its own problems, but it gives us two things that we should demand in all of our communication over the web: encrypted communication and the guarantee that we are actually talking to whom we think we are. Furthermore, it gets us back in line with the end-to-end principle: while we still need DNS to find the endpoint we’re going to talk to, the endpoint itself is doing the work of proving itself trustworthy.
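For a sense of what "make the server prove it" looks like in practice, here is a minimal sketch using Python's standard ssl module. The function names are made up for this example, and the two settings shown are already the defaults for ssl.create_default_context(); they are set explicitly only to point at which ones do the work.

```python
import socket
import ssl


def make_verifying_context():
    """Build a TLS context that checks the certificate chain and the hostname."""
    context = ssl.create_default_context()
    # Already the defaults; repeated here to show which settings matter.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def certificate_subject(hostname, port=443, timeout=5.0):
    """Connect to `hostname` and return the subject of its certificate.

    Raises ssl.SSLCertVerificationError if the server at the address we
    were given cannot prove that it is `hostname`.
    """
    context = make_verifying_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["subject"]
```

Calling certificate_subject with a real hostname needs a network connection, so it is left uncalled here. The point is the failure mode: if a lying resolver hands back the wrong IP address, the machine at that address still has to present a certificate valid for the name you asked for, and the connection fails if it can't.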
We, as users of the internet, can’t really do anything to fix DNS. It’s a system that we have no control over. But we can be aware of the fact that it’s insecure, and we can make sure that we guarantee security via other means, like TLS. Fortunately for us, more and more websites and browsers are enabling TLS by default, and plugins like HTTPS Everywhere5 help make it easy to use.
You should think of every message you send over the internet as being passed through the hands of thousands of strangers. The fact that our messages get through, that they actually reach the destination we want them to, that they’re not constantly read and altered by third parties, is basically a miracle made up of math and redundancy and trust.
Cynthia Taylor is a Clinical Assistant Professor at the University of Illinois, Chicago. She is mildly obsessed with the many, many ways in which the internet is broken.
Victoria Wang is a designer/developer and cofounder of Neocities.