After studying this section you should be able to do the following:
- Describe how the technologies of the Internet combine to answer these questions: What are you looking for? Where is it? And how do we get there?
- Interpret a URL, understand what hosts and domains are, describe how domain registration works, describe cybersquatting, and give examples of conditions that constitute a valid and invalid domain-related trademark dispute.
- Describe certain aspects of the Internet infrastructure that are fault-tolerant and support load balancing.
- Discuss the role of hosts, domains, IP addresses, and the DNS in making the Internet work.
The Internet is a network of networks—millions of them, actually. If the network at your university, your employer, or in your home has Internet access, it connects to an Internet service provider (ISP). Many (but not all) ISPs are big telecommunications companies like Verizon, Comcast, and AT&T. These providers connect to one another, exchanging traffic, and ensuring your messages can get to any other computer that’s online and willing to communicate with you.
The Internet has no center and no one owns it. That’s a good thing. The Internet was designed to be redundant and fault-tolerant—meaning that if one network, connecting wire, or server stops working, everything else should keep on running. Rising from military research and work at educational institutions dating as far back as the 1960s, the Internet really took off in the 1990s, when graphical Web browsing was invented, and much of the Internet’s operating infrastructure was transitioned to be supported by private firms rather than government grants.
Enough history—let’s see how it all works! If you want to communicate with another computer on the Internet then your computer needs to know the answer to three questions: What are you looking for? Where is it? And how do we get there? The computers and software that make up Internet infrastructure can help provide the answers. Let’s look at how it all comes together.
The URL: “What Are You Looking For?”
When you type an address into a Web browser (sometimes called a URL for uniform resource locator), you’re telling your browser what you’re looking for. Figure 12.2 “Anatomy of a Web Address” describes how to read a typical URL.
The http:// you see at the start of most Web addresses stands for hypertext transfer protocol. A protocol is a set of rules for communication—sort of like grammar and vocabulary in a language like English. The http protocol defines how Web browser and Web servers communicate and is designed to be independent from the computer’s hardware and operating system. It doesn’t matter if messages come from a PC, a Mac, a huge mainframe, or a pocket-sized smartphone; if a device speaks to another using a common protocol, then it will be heard and understood.
The Internet supports lots of different applications, and many of these applications use their own application transfer protocol to communicate with each other. The server that holds your e-mail uses something called SMTP, or simple mail transfer protocol, to exchange mail with other e-mail servers throughout the world. FTP, or file transfer protocol, is used for—you guessed it—file transfer. FTP is how most Web developers upload the Web pages, graphics, and other files for their Web sites. Even the Web uses different protocols. When you surf to an online bank or when you’re ready to enter your payment information at the Web site of an Internet retailer, the http at the beginning of your URL will probably change to https (the “s” is for secure). That means that communications between your browser and server will be encrypted for safe transmission. The beauty of the Internet infrastructure is that any savvy entrepreneur can create a new application that rides on top of the Internet.
Hosts and Domain Names
The next part of the URL in our diagram holds the host and domain name. Think of the domain name as the name of the network you’re trying to connect to, and think of the host as the computer you’re looking for on that network.
Many domains have lots of different hosts. For example, Yahoo!’s main Web site is served from the host named “www” (at the address http://www.yahoo.com), but Yahoo! also runs other hosts including those named “finance” (finance.yahoo.com), “sports” (sports.yahoo.com), and “games” (games.yahoo.com).
Host and Domain Names: A Bit More Complex Than That
While it’s useful to think of a host as a single computer, popular Web sites often have several computers that work together to share the load for incoming requests. Assigning several computers to a host name offers load balancing and fault tolerance, helping ensure that all visits to a popular site like http://www.google.com won’t overload a single computer, or that Google doesn’t go down if one computer fails.
It’s also possible for a single computer to have several host names. This might be the case if a firm were hosting several Web sites on a single piece of computing hardware.
Some domains are also further broken down into subdomains—many times to represent smaller networks or subgroups within a larger organization. For example, the address http://www.rhsmith.umd.edu is a University of Maryland address with a host “www” located in the subdomain “rhsmith” for the Robert H. Smith School of Business. International URLs might also include a second-level domain classification scheme. British URLs use this scheme, for example, with the BBC carrying the commercial (.co) designation—http://www.bbc.co.uk—and the University of Oxford carrying the academic (.ac) designation—http://www.ox.ac.uk. You can actually go 127 levels deep in assigning subdomains, but that wouldn’t make it easy on those who have to type in a URL that long.
Most Web sites are configured to load a default host, so you can often eliminate the host name if you want to go to the most popular host on a site (the default host is almost always named “www”). Another tip: most browsers will automatically add the “http://” for you, too.
Host and domain names are not case sensitive, so you can use a combination of upper and lower case letters and you’ll still get to your destination.
I Want My Own Domain
You can stake your domain name claim in cyberspace by going through a firm called a domain name registrar. You don’t really buy a domain name; you simply pay a registrar for the right to use that name, with the right renewable over time. While some registrars simply register domain names, others act as Web hosting services that are able to run your Web site on their Internet-connected servers for a fee.
Registrars throughout the world are accredited by ICANN (Internet Corporation for Assigning Names and Numbers), a nonprofit governance and standards-setting body. Each registrar may be granted the ability to register domain names in one or more of the Net’s generic top-level domains (gTLDs), such as “.com,” “.net,” or “.org.” There are dozens of registrars that can register “.com” domain names, the most popular gTLD.
Some generic top-level domain names, like “.com,” have no restrictions on use, while others limit registration. For example, “.edu” is restricted to U.S.-accredited, postsecondary institutions. ICANN has also announced plans to allow organizations to sponsor their own top-level domains (e.g., “.berlin,” or “.coke”).
There are also separate agencies that handle over 250 different two-character country code top-level domains, or ccTLDs (e.g., “.uk” for the United Kingdom and “.jp” for Japan). Servers or organizations generally don’t need to be housed within a country to use a country code as part of their domain names, leading to a number of creatively named Web sites. The URL-shortening site “bit.ly” uses Libya’s “.ly” top-level domain; many physicians are partial to Moldova’s code (“.md”); and the tiny Pacific island nation of Tuvulu might not have a single broadcast television station, but that doesn’t stop it from licensing its country code to firms that want a “.tv” domain name (Maney, 2004). Recent standards also allow domain names in languages that use non-Latin alphabets such as Arabic and Russian.
Domain name registration is handled on a first-come, first-served basis and all registrars share registration data to ensure that no two firms gain rights to the same name. Start-ups often sport wacky names, partly because so many domains with common words and phrases are already registered to others. While some domain names are held by legitimate businesses, others are registered by investors hoping to resell a name’s rights.
Trade in domain names can be lucrative. For example, the “Insure.com” domain was sold to QuinStreet for $16 million in fall 2009 (Bosker, 2010). But knowingly registering a domain name to profit from someone else’s firm name or trademark is known as cybersquatting and that’s illegal. The United States has passed the Anticybersquatting Consumer Protection Act (ACPA), and ICANN has the Domain Name Dispute Resolution Policy that can reach across boarders. Try to extort money by holding a domain name that’s identical to (or in some cases, even similar to) a well-known trademark holder and you could be stripped of your domain name and even fined.
Courts and dispute resolution authorities will sometimes allow a domain that uses the trademark of another organization if it is perceived to have legitimate, nonexploitive reasons for doing so. For example, the now defunct site Verizonreallysucks.com was registered as a protest against the networking giant and was considered fair use since owners didn’t try to extort money from the telecom giant (Streitfeld, 2000). However, the courts allowed the owner of the PETA trademark (the organization People for the Ethical Treatment of Animals) to claim the domain name peta.org from original registrant, who had been using that domain to host a site called “People Eating Tasty Animals” (McCullagh, 2001).
Trying to predict how authorities will rule can be difficult. The musician Sting’s name was thought to be too generic to deserve the rights to Sting.com, but Madonna was able to take back her domain name (for the record, Sting now owns Sting.com) (Knorad & Hansen, 2000). Apple executive Jonathan Ive was denied the right to reclaim domain names incorporating his own name, but that had been registered by another party and without his consent. The publicity-shy design guru wasn’t considered enough of a public figure to warrant protection (Morson, 2009). And sometimes disputing parties can come to an agreement outside of court or ICANN’s dispute resolution mechanisms. When Canadian teenager Michael Rowe registered a site for his part-time Web design business, a firm south of the border took notice of his domain name—Mikerowesoft.com. The two parties eventually settled in a deal that swapped the domain for an Xbox and a trip to the Microsoft Research Tech Fest (Kotadia, 2004).
Path Name and File Name
Look to the right of the top-level domain and you might see a slash followed by either a path name, a file name, or both. If a Web address has a path and file name, the path maps to a folder location where the file is stored on the server; the file is the name of the file you’re looking for.
Most Web pages end in “.html,” indicating they are in hypertext markup language. While http helps browsers and servers communicate, html is the language used to create and format (render) Web pages. A file, however, doesn’t need to be .html; Web servers can deliver just about any type of file: Acrobat documents (.pdf), PowerPoint documents (.ppt or .pptx), Word docs (.doc or .docx), JPEG graphic images (.jpg), and—as we’ll see in Chapter 13 “Information Security: Barbarians at the Gateway (and Just About Everywhere Else)”—even malware programs that attack your PC. At some Web addresses, the file displays content for every visitor, and at others (like amazon.com), a file will contain programs that run on the Web server to generate custom content just for you.
You don’t always type a path or file name as part of a Web address, but there’s always a file lurking behind the scenes. A Web address without a file name will load content from a default page. For example, when you visit “google.com,” Google automatically pulls up a page called “index.html,” a file that contains the Web page that displays the Google logo, the text entry field, the “Google Search” button, and so on. You might not see it, but it’s there.
Butterfingers, beware! Path and file names are case sensitive—amazon.com/books is considered to be different from amazon.com/BOOKS. Mistype your capital letters after the domain name and you might get a 404 error (the very unfriendly Web server error code that means the document was not found).