Skip to main content

What happens when you click this link?

· 12 min read
Fazal
Software Development Engineer

Everything you need to know about what goes under the hood when you click a link or type a URL in the browser.

boom


You just clicked a link and BOOOM! This is what you are seeing and reading. How did this reach your device? Where did you get this from? Let's deep dive into everything——or most of the things to keep it simple——that happens when you click a link or type it in a browser. Let's start with the fundamental building block: The Internet.

Internet

The Internet is a vast network of computers interconnected worldwide. This global network comprises numerous routers and extensive fiber optic cables laid under oceans and underground. Your Internet Service Provider(ISP) is part of this network as well. You pay your ISP to become part of this network, called the internet, and connect with others who must also be part of this network. Connecting here basically means talking to other computers. For example, if you want to use Google Search to search for something, it requires you to connect to Google's computer. That's why, the internet can be summed up as a giant connection between computers which can be located anywhere in the world. One misconception generally people have is with messaging on chat applications like Facebook Messenger. When you are messaging your friend on Messenger, you are not connecting to your friend's computer; you and your friend are actually connecting to Facebook's computer, which acts as a postman.

internet

info

It is realistically possible to connect directly to your friend's computer without using Facebook Messenger, but it is out of the scope of this article.

The question now is, how do you know where, for example, Google's computer is? You simply go to the browser and type Google's URL, google.com, and the rest of the job is done by your computer. URL is commonly known as a link, a website, a web address, and it is important to know its structure and why we need it.

URL

URL stands for Uniform Resource Locator. It is a human-readable address of people and places, which are nothing but a computer in this global network, usually referred to as a host machine. It is used to locate resources. A resource can be anything like text, audio, video, image, etc. In the context of google.com, it is an HTML, CSS, and JavaScript code, which is basically text. Your browser then renders that beautiful Google home page by parsing that code. URL is structured into multiple parts.

url

Consider the URL https://www.google.com.

Networking Protocol

The first part, https:// is the network communication protocol. To communicate between two computers, you first need to agree on how to communicate, and communication protocols are used for that purpose. It tells the other computer that it should follow the HTTPS protocol for talking. There are multiple communication protocols but HTTPS is the most commonly used protocol for accessing the web, that's why you don't need to specify https:// while browsing.

protocol

Subdomain

The second part www, is the subdomain. The full form of www, the World Wide Web, has no meaning or alternatives; everything that is publicly exposed is a World Wide Web. Using "www" is a convention from the early days of the internet, but it is not technically necessary. Many websites now omit it and still function correctly. The subdomains are now actually used for directing to different applications provided by the same service. When Google was first started, it only had one application, Google Search. So www subdomain is used for Google's core and the first application, Google Search. Later, they started other services such as Gmail and Google Drive. If you want to use Google's other applications, such as Gmail and Google Drive, you will have to visit mail.google.com and drive.google.com respectively.

Second Level Domain

The third part, google is the second level domain. It is just a name you want people to identify your service, something they can easily remember and type if they want to use your service. They are like digital real estate and you need to buy them from the Domain Registrars, like GoDaddy, for a decent amount of fee. These services are authorized by the Internet Corporation for Assigned Names and Numbers(ICANN), which is a non-profit organization responsible for overseeing the entire global internet infrastructure.

Top Level Domain

The fourth part is com, called Top Level Domain(TLD). Technically, it matters what the TLD of a website is, but it is just a domain owner's choice to make their website more human memory friendly, and descriptive enough of their service. For example, a university chooses '.edu' as their TLD.

Domain

Subdomain, Second Level Domain, Top Level Domain constitutes a Domain. This is the most important part of URL as we will know in DNS Resolution.

Path and Query Parameters

The path and query parameters are used to request a specific content. For example, if you want to get Google Search results for 'cats', you can directly visit https://www.google.com/search?q=cats. The path prameter is used to specify a specific path of the application, and query is used for specific resource in that path. In the example, search is the path, and q is the query.

Domain Name System

As it is clear so far when you visit some valid website, you are connecting to a computer that is located somewhere in the world. There are probably millions of websites, and you don't need to know where they are located to connect. Your computer does that for you. But how?

Just like every place has an address to make it reachable, every computer, as long as they are part of the internet, will have an address, called Internet Protocol(IP) Address. These addresses are assigned by your Internet Service Provider. But these addresses are cryptic and hard to remember for a layman. That's why the Domain Name System(DNS) was created. DNS serves as a book that maps IP Addresses to domains so that you don't have to remember them. You don't remember all your friends and family contact numbers. You have them saved in your Contacts app and can call or text them by searching with their name. Similarly, you can just type the domain name in your browser and use any online services you want. Your computer will resolve the domain to their IP Address and help you reach wherever you want. But how does your computer know IP Addresses of all domains in the world? That's an interesting part, and it is called Domain Name System(DNS) Resolution.

tip
Your current IP address is:

Switch your network(WiFi/Mobile Data) and reload this page to see how your IP Address changes.

DNS Resolution

DNS Resolution is the most complicated and important part of the internet. Without this, the internet would not have scaled to the level it did. It is a system designed to quickly and efficiently find a domain's mapped IP Address. It is one of the reasons you have a smooth experience browsing the internet because, for various reasons, a modern network-based application is usually set up as multiple services that are served from different computers, leading to different IP Addresses. Imagine having to memorize multiple IP Addresses to just use a single application. DNS Resolution is a true savior here.

dns

To know the IP Address of any domain, you have to connect to their Authoritative Name Server, where their domain mapping is stored. But now the question how do you know where a domain's Authoritative Name Server is located? That's what DNS Resolution does.

DNS Resolution is basically a recursive operation of going through multiple components in a network. Every component checks if they have an IP Address of a domain. If they have it, they return to the previous component, who will cache it for future queries, until it reaches back to the browser.

Resolver

The initial stages of DNS Resolution are very simple. The browser, Operating System, WiFi Router, and Internet Service Provider, all maintain a cache of all previously queried domains. Every component caches the IP Address of a domain for a period defined by the TTL(Time To Live).

cache

If there is no cache found even at the ISP level, the actual DNS Resolution starts here. Your ISP will connect to Root Name Servers. The ISP here acts as the resolver, but your router can be resolver as well. The ISP itself is a giant router, with additional capabilities.

Root Name Servers

There are designated 13 Root Name Servers, named alphabetically 'A' to 'M', all around the world. These servers are owned by a mixed group of private entities and institutions, again authorized by ICANN. The number of Root Name Servers, 13, is just on paper. Technically there are thousands or more around the world, but they are all owned by 13 organizations. The way to reach these servers is a bit complex to explain for non-technical people. Basically, 13 static IP addresses are reserved for these domains of these 13 Root Name Servers. Yes, these servers also have their own domain. When you try to reach any of these domains or IP Addresses, you will be connected to any of the nearest physical servers.

To explain this in detail, at the risk of making it too complex, all the Root Name Servers globally 'claim' to have the same designated static IP Address, based on their service name(13 services, which are named alphabetically). This technique is called anycast. This is how the resolver will connect to the nearest physical Root Name Server.

rns

For all these technicalities, the only job of a Root Name Server is to have all the information about Top Level Domain(TLD) servers. Hence, it just returns the location of the TLD server based on the TLD(.com, .net).

Top Level Domain Servers

So here lies the technical difference in choosing the TLD we discussed earlier. After receiving the TLD server address, your resolver will then connect to it. Again these TLD servers are also managed by different organizations. For example, '.com', and '.net' are managed by Verisign. These TLD servers return the address of the Authoritative Name Server, where the DNS record of a domain is hosted ultimately.

Authoritative Name Server

When somebody buys a domain, they configure it with the IP Address of a computer that is required to connect to use their service. A DNS record is created with all the configuration details, like Time To Live(cache expiry), etc., and will be hosted on something called an Authoritative Name Server(think of it as just another computer) by the Domain Registrar. You can have your own Name Server, like Google does, because they can afford it, and gives them flexibility as well.

The DNS record is added to something called Zone File, which is a text file that contains mappings between domain names and IP addresses along with other resource records for a specific portion of the DNS namespace, known as a DNS zone.

Ultimately, they return the mapped IP Address of a domain, if it is found, else throws an error. All the components in the network, that is browser to ISP, cache it for future queries, reducing the repetitions of this mammoth task.

Now that we know the address of the computer we want to connect, how do we reach the destination? With the help of Routers.

Routers

Routers are the most integral part of the internet. Routers are gateways to the actual internet because your computer is not directly connected to a fiber optic cable that your ISP provides, the router does. You just have a wireless connection to your router. As a result, all the computers that are connected to the same router will have the same IP Address. When a group of computers are connected to the same router, they form a local network. To repeat my earlier statement, the internet is nothing but a giant network of computers that can be connected to each other using routers and fiber optic cables.

When your browser initiates a connection, it sends a network request from your computer to a destination computer. This request contains metadata like the source IP Address, that is your IP Address, and the destination IP Address, among many other things. The first point of contact on this internet is your router. When your router receives this request, it checks the destination IP Address, and whether this computer is in the same local network. If yes, it forwards the request to that computer. Otherwise, it just forwards it to your ISP, which is also a giant router.

This router again follows the same checks. But the catch here is the ISP's router is connected to probably thousands of other routers. They use a routing table, which is a data structure stored in a router or a networked computer that lists the routes to various network destinations. The primary function of a routing table is to guide the router in determining the best path for forwarding requests toward their destination. This router hopping continues until the request reaches a router that has the destined computer in its local area network, and ultimately gets delivered to it. The computer accepts the request, queries for the content requested, and sends back a response after switching the source and destination IP Address. The routing path back doesn't need to be the same as the arrival one.

router

Networking is a very complex and sensitive subject, and I have simplified it for the sake of simplicity. Before the act of request and response, the computers first need to implement the communication protocol——HTTPS for browsers——to establish a connection. The task of a connection is an art in itself. That will be a blog for another day.

Thank you for reading this till the end. Please consider sharing this if you think it will help other people. Connect with me on LinkedIn for more.

Disclaimer: Any public media used on this blog are the property of their respective owners and are used in accordance with the fair use provisions outlined in 17 U.S. Code § 107. This section allows for limited use of copyrighted material without permission for purposes such as commentary, criticism, news reporting, education, and research.