What happens when you type google.com in your browser?

Introduction

This question is a classic and still widely used interview question for many types of software engineering positions. It assesses a candidate's general knowledge of how the web stack works on top of the internet. Before you start answering, ask your interviewer whether they would like you to focus on one specific area of the workflow. For a front-end position, they may want you to talk at length about how the DOM is rendered. For an SRE position, they may want you to go into the load-balancing mechanism.

This question is also a good test of whether you understand DNS. Many software engineering candidates struggle with this concept, so if you do well on this question, you are already way ahead of the curve. A thorough, well-structured answer can make you stand out to future employers. Let's dive in!

What happens when you first type the letter "g"?

When you press the key "g" in your browser's URL bar, the browser receives the event and its auto-complete functions kick in. Depending on your browser's algorithm, and on whether you are in private/incognito mode, various suggestions will be presented to you in the dropdown below the URL bar. Most of these algorithms sort and prioritize results based on your browsing history, bookmarks, cookies, and popular searches from the internet as a whole. As you type "google.com", the suggestions are refined with each keypress. For our case, let's assume that you have just installed your browser and have never used it, so you have no history. Even so, the browser may suggest "google.com" before you finish typing it, because it is one of the most popular sites on the internet.
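The ranking step can be sketched as prefix matching plus a weighted score. This is a minimal Python illustration, not any browser's actual algorithm: the `Candidate` type, the `suggest` function, and the weights are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights; real browsers use their own, undocumented scoring.
WEIGHTS = {"history": 3.0, "bookmark": 2.0, "popular": 1.0}

@dataclass
class Candidate:
    url: str
    source: str      # "history", "bookmark", or "popular"
    visits: int = 1  # how often the user (or everyone, for "popular") visited it

def suggest(prefix: str, candidates: list[Candidate], limit: int = 5) -> list[str]:
    """Return up to `limit` URLs that start with `prefix`, best match first."""
    prefix = prefix.lower()
    matches = [c for c in candidates if c.url.lower().startswith(prefix)]
    # Higher weighted score first: source weight times visit count
    matches.sort(key=lambda c: WEIGHTS[c.source] * c.visits, reverse=True)
    return [c.url for c in matches[:limit]]
```

With a fresh browser only "popular" candidates exist, so typing "g" simply orders them by popularity, and typing "go" narrows the list down to google.com.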

You then hit "Enter"

You type "google.com" and hit "Enter". You have not included the protocol, http or https. The browser first has to figure out whether this is a URL or a search term. In our case, it's a valid URL.

The browser then has to figure out which protocol to use and which port to connect to. Is it http on port 80 or https on port 443? Remember we didn't specify this. All we typed was "google.com". If we had specified this, the browser would not have to go through this step.
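The two decisions in this step, "URL or search term?" and "which port?", can be sketched in Python with `urllib.parse.urlsplit`. The `looks_like_url` rule is a deliberately crude, hypothetical heuristic; real browsers use far more elaborate logic (and, for example, treat `localhost` as a URL).

```python
from urllib.parse import urlsplit

# Well-known default ports for each scheme
DEFAULT_PORTS = {"http": 80, "https": 443}

def looks_like_url(text: str) -> bool:
    """Hypothetical heuristic: no spaces and a dot somewhere in the host part."""
    text = text.strip()
    if not text or " " in text:
        return False
    # Pretend a scheme is present so urlsplit can find the hostname
    host = urlsplit(text if "://" in text else "http://" + text).hostname
    return host is not None and "." in host

def default_port(url: str) -> int:
    """Port implied by the scheme when the URL does not name one explicitly."""
    parts = urlsplit(url)
    return parts.port or DEFAULT_PORTS[parts.scheme]
```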

Check HSTS list

Every major browser ships with an HSTS preload list. HSTS stands for HTTP Strict Transport Security. The list contains websites that have requested to be contacted via HTTPS only.

The browser checks its HSTS list.

If the website is on the list, the browser sends its request via HTTPS instead of HTTP. Otherwise, the initial request is sent via HTTP. (Note that a website can still use the HSTS policy without being on the HSTS list. On a user's first visit, the site redirects the plain HTTP request to HTTPS and sends a Strict-Transport-Security header that tells the browser to use only HTTPS from then on. However, that first HTTP request could leave the user vulnerable to a downgrade attack, which is why the HSTS preload list is built into modern web browsers.)
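Both halves of HSTS can be sketched in Python, assuming a tiny hypothetical preload list (the `.example` hosts are placeholders, not real list entries): checking a host against the list, including parent domains that set `includeSubDomains`, and parsing the `Strict-Transport-Security` response header.

```python
# Hypothetical stand-in for a browser's preload list:
# host -> whether the entry also covers its subdomains (includeSubDomains)
PRELOAD = {"bank.example": True, "shop.example": False}

def must_use_https(host: str, preload: dict[str, bool] = PRELOAD) -> bool:
    host = host.lower().rstrip(".")
    if host in preload:
        return True
    # Walk up the parent domains: www.bank.example -> bank.example -> example
    labels = host.split(".")
    for i in range(1, len(labels)):
        if preload.get(".".join(labels[i:])):  # counts only if includeSubDomains is set
            return True
    return False

def parse_hsts_header(value: str) -> dict:
    """Parse a header value such as 'max-age=31536000; includeSubDomains'."""
    policy = {"max_age": None, "include_subdomains": False}
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        name = name.lower()  # directive names are case-insensitive
        if name == "max-age":
            policy["max_age"] = int(arg.strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy
```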

The Phonebook of the Internet

When we type the website name, more technically called a URL (google.com), into our browser and press Enter, the browser breaks the URL down into its parts. To find the server behind the hostname, it then needs the Domain Name System (DNS).

What’s a URL?

URL stands for Uniform Resource Locator; it is the address used to access a resource such as a website. A URL has several parts: the protocol (scheme), hostname, port, path and file name, and optional extras such as a query string. In https://google.com, https is the protocol and google.com is the hostname.
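Python's standard `urllib.parse` module shows these parts directly. The URL below is an illustrative example with an explicit port, query string, and fragment added:

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("https://www.google.com:443/search?q=dns#results")
print(parts.scheme)           # https          (protocol)
print(parts.hostname)         # www.google.com (hostname)
print(parts.port)             # 443            (port)
print(parts.path)             # /search        (path)
print(parse_qs(parts.query))  # {'q': ['dns']} (decoded query string)
print(parts.fragment)         # results        (fragment, never sent to the server)
```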

DNS

Think of the Domain Name System (DNS) as the phonebook of the internet. DNS servers translate domain names, which are easy for humans to remember, into machine-readable IP addresses. Instead of having to remember a string of numbers such as 142.250.185.78, all we have to type in the browser address bar is google.com. Every device on the internet is reached through an IP address, which other machines use to find it.

When a client needs the address of a system, it sends a DNS request with the name of the desired resource to a DNS server. The DNS server responds with the necessary IP address from its table of names. How does all this work under the hood though? Keep reading!
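To make "sends a DNS request" concrete, here is a hedged Python sketch that builds the bytes of a standard DNS query for an A (IPv4) record, following the RFC 1035 message layout, plus a helper that sends it over UDP port 53. Both function names are hypothetical, and the sketch does not parse the reply.

```python
import random
import socket
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Build a standard DNS query message (RFC 1035 layout) for `name`.
    qtype 1 asks for an A record (IPv4 address); class 1 is IN (internet)."""
    txid = random.randint(0, 0xFFFF)  # transaction ID, echoed back in the reply
    flags = 0x0100                      # standard query with "recursion desired" set
    # Header: ID, flags, 1 question, 0 answers, 0 authority, 0 additional records
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def send_dns_query(packet: bytes, server_ip: str, timeout: float = 2.0) -> bytes:
    """Send the query to `server_ip` on UDP port 53 and return the raw reply.
    Requires network access; the reply is not parsed here."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server_ip, 53))
        # Classic DNS over UDP (without EDNS) limits replies to 512 bytes
        reply, _addr = sock.recvfrom(512)
        return reply
```

Sending the query needs the IP address of your configured DNS server; the reply starts with the same transaction ID so the client can match it to its question.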

How DNS lookup works

The browser checks if the domain is in its own DNS cache. (Chrome exposes tools for this cache at chrome://net-internals/#dns.)
If it is not found, the browser asks the operating system to do the lookup, classically through the gethostbyname library function (modern systems use its replacement, getaddrinfo; the details vary by OS).
The resolver library first checks whether the hostname can be resolved through the local hosts file (for example /etc/hosts on Linux and macOS; the location varies by OS) before trying DNS.
If the name is neither cached nor in the hosts file, the library queries the DNS server configured in the network stack. This is typically the local router or the ISP's caching DNS server. Classic DNS requests are sent unencrypted, usually over UDP port 53, so your ISP (Internet Service Provider) can see which domains you look up.
If the DNS server is on the same subnet, the network library follows the ARP process below for the DNS server's IP address.
If the DNS server is on a different subnet, the network library follows the ARP process below for the default gateway's IP address.
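The lookup order above can be sketched in Python as follows. The cache dictionary, `parse_hosts`, and `resolve` are hypothetical stand-ins for the browser's cache and the OS resolver; the last step calls the real `socket.getaddrinfo`, which on most systems itself consults the hosts file and then the configured DNS servers.

```python
import socket

def parse_hosts(text: str) -> dict[str, str]:
    """Parse hosts-file text ('IP name [aliases...]') into a name -> IP table."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # the first entry for a name wins
    return table

def resolve(name: str, cache: dict[str, str], hosts: dict[str, str]) -> str:
    """Hypothetical lookup order: browser cache, then hosts file, then the OS resolver."""
    name = name.lower()
    if name in cache:
        return cache[name]
    if name in hosts:
        ip = hosts[name]
    else:
        # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
        # sockaddr is (ip, port) for IPv4
        ip = socket.getaddrinfo(name, None, socket.AF_INET)[0][4][0]
    cache[name] = ip  # remember the answer for next time
    return ip
```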

ARP process

To send an ARP (Address Resolution Protocol) broadcast, the network stack library needs the target IP address to look up. It also needs to know the MAC address of the interface it will use to send out the ARP broadcast.

The ARP cache is first checked for an ARP entry for our target IP. If it is in the cache, the library function returns the result: Target IP = MAC.

If the entry is not in the ARP cache:

The route table is looked up to see if the target IP address is on any of the subnets in the local route table. If it is, the library uses the interface associated with that subnet. If it is not, the library uses the interface that has the subnet of our default gateway.
The MAC address of the selected network interface is looked up.
The network library sends a Layer 2 (data link layer of the OSI model) ARP request:

ARP Request:

Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here
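The ARP request above is a small fixed-size message. This Python sketch packs its 28-byte payload (the RFC 826 layout for Ethernet and IPv4) with `struct`; the function name and the addresses used with it are made up. Actually sending it requires a raw socket and administrator privileges, so the sketch stops at building the bytes. Note that the broadcast address FF:FF:FF:FF:FF:FF goes in the Ethernet frame header, while the ARP payload's own target MAC field is conventionally left as zeros.

```python
import ipaddress
import struct

def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Pack a 28-byte ARP request payload for Ethernet/IPv4.
    The Ethernet frame carrying it is addressed to FF:FF:FF:FF:FF:FF (broadcast)."""
    mac = bytes.fromhex(sender_mac.replace(":", ""))
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6,             # hardware (MAC) address length
        4,             # protocol (IP) address length
        1,             # operation: 1 = request, 2 = reply
        mac,           # sender MAC
        ipaddress.IPv4Address(sender_ip).packed,  # sender IP
        b"\x00" * 6,   # target MAC: unknown, which is what we are asking for
        ipaddress.IPv4Address(target_ip).packed,  # target IP
    )
```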

In our case, we assume our computer is directly connected to the router, so we get the following ARP reply:

ARP Reply:

Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: interface:mac:address:here
Target IP: interface.ip.goes.here

Now that the network library has the MAC address of either our DNS server or the default gateway, it can resume its DNS process.

DNS resolution

The resolver, usually run by your ISP (Internet Service Provider), first checks its cache. If it can't find the IP for google.com there, it queries a root server. Every resolver must know one thing in advance: where to find the root servers.

The root server does not know the IP for google.com, but it knows where to find the .com TLD server (TLD stands for Top-Level Domain). It answers the resolver with a referral to the .com TLD servers. The resolver caches this referral so that next time it can contact the .com TLD server directly.

Root servers sit at the top of the DNS hierarchy. There are 13 named root servers, [letter].root-servers.net where [letter] ranges from A to M, run by 12 independent organizations (Verisign operates two of them). This doesn't mean that we have only 13 physical servers to support the whole internet! Each operator runs many physical servers, distributed around the globe, that share the same address.

The coordination of most top-level domains (TLDs) is the responsibility of the Internet Corporation for Assigned Names and Numbers (ICANN). The .com TLD, created in 1985, was one of the first. Today, it is the largest TLD on the internet.

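The .com TLD server does not store the IP for google.com either. Instead, it refers the resolver to the authoritative name servers for google.com (such as ns1.google.com). An authoritative name server holds the actual records for its domain, so it returns the IP address for google.com, which the resolver passes back to the browser.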

Every component in the DNS resolution chain caches the result once a query is resolved. Each DNS record also carries a time-to-live (TTL) that limits how long it may be cached; once the TTL expires, the next lookup has to ask again.
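The referral chain and TTL caching described above can be simulated in a few lines of Python. Everything here is hypothetical: the `SERVERS` table stands in for the real root, .com, and google.com name servers, and the TTL values are illustrative. Real resolvers also handle many record types, several servers per zone, and failures.

```python
import time
from typing import Optional

# Hypothetical zone data: each server either answers or refers us onward.
# Entries are (kind, value, ttl_seconds).
SERVERS = {
    "root":        {"com.": ("referral", "com-tld", 172800)},
    "com-tld":     {"google.com.": ("referral", "google-auth", 172800)},
    "google-auth": {"google.com.": ("answer", "142.250.185.78", 300)},
}

class Resolver:
    def __init__(self) -> None:
        # (kind, zone) -> (value, expiry time)
        self.cache: dict[tuple[str, str], tuple[str, float]] = {}
        self.queries = 0  # how many servers we have asked so far

    def _cached(self, kind: str, zone: str, now: float) -> Optional[str]:
        entry = self.cache.get((kind, zone))
        return entry[0] if entry and entry[1] > now else None  # ignore expired entries

    def lookup(self, name: str, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        answer = self._cached("answer", name, now)
        if answer:
            return answer
        # Start at the root unless we still remember the TLD server for this name
        tld = name.split(".")[-2] + "."  # "google.com." -> "com."
        server = self._cached("referral", tld, now) or "root"
        while True:
            self.queries += 1
            zone_data = SERVERS[server]
            # Pick the most specific zone this server knows that covers the name
            zone = max((z for z in zone_data if name == z or name.endswith("." + z)), key=len)
            kind, value, ttl = zone_data[zone]
            self.cache[(kind, zone)] = (value, now + ttl)
            if kind == "answer":
                return value
            server = value  # follow the referral one level down
```

The first lookup asks three servers; a repeat within the answer's TTL asks none; once the answer expires, the cached .com referral lets the resolver skip the root.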

TCP connection

The browser uses Hypertext Transfer Protocol (HTTP) to transfer data.

HTTP is a high-level protocol. It belongs to the application layer, layer 7 of the Open Systems Interconnection (OSI) model. HTTP/1.1 messages are plain, human-readable text.

Transmission Control Protocol (TCP), by contrast, is a lower-level protocol. It belongs to the transport layer, layer 4 of the OSI model. It detects errors, retransmits lost or corrupted packets, and delivers data in order.

A TCP connection is a bidirectional communication channel, so before any data flows, both sides must agree on their starting sequence numbers. They do this with a three-way handshake.

This is how a browser opens a TCP connection to the server:

The browser sends a SYN segment carrying a random initial sequence number. SYN is short for synchronize.
The server responds with SYN-ACK (synchronize-acknowledge). Its acknowledgment number is the browser's sequence number plus 1, and it carries the server's own random sequence number.
The browser replies with ACK. Its acknowledgment number is the server's sequence number plus 1. The connection is now established.
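In practice the operating system performs this handshake when a program calls `connect()` (for example via Python's `socket.create_connection`), so application code never builds these segments itself. The sketch below only models the sequence and acknowledgment arithmetic; the `Segment` type and function name are hypothetical.

```python
from dataclasses import dataclass

MOD = 2**32  # TCP sequence numbers are 32-bit and wrap around

@dataclass
class Segment:
    flags: str
    seq: int
    ack: int = 0  # only meaningful when the ACK flag is set

def three_way_handshake(client_isn: int, server_isn: int) -> list[Segment]:
    """Model the three segments given each side's initial sequence number (ISN)."""
    syn = Segment("SYN", seq=client_isn)
    # The SYN consumes one sequence number, so the server acknowledges ISN + 1
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=(syn.seq + 1) % MOD)
    ack = Segment("ACK", seq=(client_isn + 1) % MOD, ack=(syn_ack.seq + 1) % MOD)
    return [syn, syn_ack, ack]
```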

TLS handshake

The steps below describe the classic RSA key exchange of TLS 1.2 and earlier. (TLS 1.3, which modern browsers prefer, replaces it with an ephemeral Diffie-Hellman exchange and drops compression, but the goals are the same.)

The client's computer sends a ClientHello message to the server with its Transport Layer Security (TLS) version, list of cipher algorithms, and compression methods available.
The server replies with a ServerHello message to the client with the TLS version, selected cipher, selected compression methods, and the server's certificate signed by a CA (Certificate Authority). The certificate contains the server's public key, which the client uses to encrypt the secret key material it sends next.
The client verifies the server's digital certificate against its list of trusted CAs. If trust can be established, the client generates a string of pseudo-random bytes (the pre-master secret), encrypts it with the server's public key, and sends it. Both sides will derive the symmetric key from these bytes.
The server decrypts the random bytes using its private key and uses these bytes to generate its copy of the symmetric master key.
The client sends a Finished message to the server containing a hash of the handshake messages up to this point, encrypted with the symmetric key.
The server computes its own hash, decrypts the client's, and checks that they match. If they do, it sends its own Finished message to the client, also encrypted with the symmetric key.
From now on the TLS session transmits the application (HTTP) data encrypted with the agreed symmetric key.
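Python's `ssl` module performs this whole handshake, including certificate verification against the trusted CAs, when it wraps a TCP socket. The helper below is a hypothetical sketch that needs network access; it reports the negotiated protocol version, cipher, and certificate issuer.

```python
import socket
import ssl

def tls_handshake_info(host: str, port: int = 443) -> dict:
    """Open a TCP connection, run the TLS handshake, and report what was negotiated."""
    # Default context: trusted CAs from the system store, hostname checking on
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # wrap_socket performs the handshake and raises ssl.SSLCertVerificationError
        # if the certificate chain or the hostname cannot be verified
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            cert = tls_sock.getpeercert()
            return {
                "version": tls_sock.version(),    # e.g. "TLSv1.3"
                "cipher": tls_sock.cipher()[0],   # name of the negotiated cipher suite
                "issuer": dict(rdn[0] for rdn in cert["issuer"]),
            }
```

For example, `tls_handshake_info("google.com")` returns a dictionary describing the session; a certificate that fails verification raises an exception instead of returning.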

Firewall check

Although we discuss it at this point, firewall checks actually apply to every packet, starting with the TCP handshake. As your browser's request travels across the internet to a server, firewalls along the way check it for security violations.

A firewall is a security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules. Its primary purpose is to protect a network from external threats, such as hackers and malware.

When you hit Enter on a URL like “google.com” in your browser, the request your browser makes to Google’s server can pass through several firewalls: one on your own network, which checks outgoing traffic, and one in front of Google’s servers, which checks incoming traffic. Each firewall checks the request against its security rules.

There are two main types of security rules that a firewall uses to check requests:

Rules that allow or block traffic based on the source and destination of the request. For example, a firewall may be configured to block all traffic from certain countries or to allow only certain IP addresses to access the network.
Rules that allow or block traffic based on the type of traffic. For example, a firewall may be configured to block all traffic on certain ports (such as those used by malware) or to allow only certain types of traffic (such as HTTP or HTTPS).
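Both kinds of rules can be sketched as an ordered list evaluated with first-match-wins semantics and a default of "block". This is a hypothetical Python model (the `Rule` type, the `check` function, and the 203.0.113.0/24 documentation range are illustrative); real firewalls also match on source addresses, protocols, and connection state.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    action: str                    # "allow" or "block"
    network: Optional[str] = None  # destination network in CIDR form; None means any
    port: Optional[int] = None     # destination port; None means any

    def matches(self, dst_ip: str, dst_port: int) -> bool:
        if self.network is not None and (
            ipaddress.ip_address(dst_ip) not in ipaddress.ip_network(self.network)
        ):
            return False
        return self.port is None or self.port == dst_port

def check(rules: list[Rule], dst_ip: str, dst_port: int, default: str = "block") -> str:
    """Evaluate rules in order: the first match wins, otherwise apply the default."""
    for rule in rules:
        if rule.matches(dst_ip, dst_port):
            return rule.action
    return default

# Example policy: block one (documentation) network, allow only web ports elsewhere
RULES = [
    Rule("block", network="203.0.113.0/24"),
    Rule("allow", port=443),  # HTTPS
    Rule("allow", port=80),   # HTTP
]
```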

If the request meets the security rules of every firewall along the way, including the one in front of Google’s server, it is allowed through, and the browser can access the website.

However, if the request does not meet the security rules, it is blocked, and the browser is unable to access the website.

Load-balancer

A load balancer is a device that distributes incoming network traffic across a group of servers or resources.

Its primary function is to ensure that the traffic is distributed evenly across the servers to avoid overloading any single server and to increase the overall capacity and reliability of the system.

A company like Google, which receives billions of requests a day, needs a huge number of servers to serve all these users. It therefore uses load balancers to make sure that no server is overburdened while others sit underutilized.

In the case of a browser trying to access google.com, the load balancer receives the incoming request from the browser and forwards it to one of the servers in the Google server network. The particular server chosen depends on the load-balancing algorithm in use, for example round robin (servers take turns) or least connections (the least busy server is picked).
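Two common load-balancing algorithms, round robin and least connections, can be sketched in Python as below. The class and server names are hypothetical; production load balancers also perform health checks and weight servers by capacity.

```python
import itertools

class RoundRobin:
    """Hand out servers in a fixed rotation."""
    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

class LeastConnections:
    """Send each request to the server with the fewest open connections."""
    def __init__(self, servers: list[str]):
        self.active = {s: 0 for s in servers}

    def pick(self) -> str:
        # min() returns the first server among ties, in the order they were listed
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a connection to `server` finishes."""
        self.active[server] -= 1
```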

HTTP Request Response

The browser now sends a GET request for the Google web page, which reaches one of Google's servers through the load balancer.

An HTTP request consists of the following parts:

HTTP method (verb)
Uniform Resource Locator (URL)
HTTP headers
HTTP body (optional)

The HTTP method (Verb) defines the type of action to be performed on the server. The popular HTTP methods are:

GET: retrieves data from a web server.
POST: sends data to a web server.
PUT: updates (replaces) data on a web server.
DELETE: deletes data on a web server.

Every HTTP request that reaches a server gets a response, which at minimum carries a status code indicating whether the request succeeded. In brief, status codes are categorized as:

Informational responses (100–199)
Successful responses (200–299)
Redirection messages (300–399)
Client error responses (400–499)
Server error responses (500–599)
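In HTTP/1.1 a request is plain text: a request line (method, path, version), headers, a blank line, and an optional body. The hypothetical Python helpers below build such a GET request and map a response's status line to the categories listed above; the headers chosen are illustrative. Over HTTPS, these same bytes travel inside the encrypted TLS session.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Raw HTTP/1.1 GET request: request line, headers, blank line, no body."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",  # the Host header is mandatory in HTTP/1.1
        "Accept: text/html",
        "Connection: close",
        "",  # these two empty entries produce the blank line ending the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

STATUS_CLASSES = {
    1: "Informational", 2: "Successful", 3: "Redirection",
    4: "Client error", 5: "Server error",
}

def parse_status_line(line: str) -> tuple[int, str]:
    """'HTTP/1.1 301 Moved Permanently' -> (301, 'Redirection')."""
    _version, code, _reason = line.split(" ", 2)
    status = int(code)
    return status, STATUS_CLASSES[status // 100]  # the first digit gives the class
```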

This is how the server handles an HTTP request:

The server passes the HTTP request to a request handler, a piece of code written in a language such as Python, JavaScript (Node.js), or Java.
The request handler checks the HTTP request headers (content-type, content-encoding, cookies, etc.)
The request handler validates the HTTP request body.
The request handler generates a response in the content type requested by the client (for example HTML for a web page, or JSON or XML for an API).
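The handler steps can be sketched as a plain Python function. The function name, the required "q" field, and the response bodies are hypothetical; real web frameworks wire this up for you, but the checks follow the same order: headers, body validation, then a response in the requested content type.

```python
import json

def handle_request(method: str, headers: dict[str, str], body: bytes) -> tuple[int, dict[str, str], bytes]:
    """Hypothetical handler; returns (status code, response headers, response body)."""
    # 1. Check the request headers (header names are case-insensitive in HTTP)
    headers = {k.lower(): v for k, v in headers.items()}
    wants_json = "application/json" in headers.get("accept", "")

    # 2. Validate the body: here a POST must carry a JSON object with a "q" field
    if method == "POST":
        try:
            data = json.loads(body or b"{}")
        except json.JSONDecodeError:
            return 400, {"Content-Type": "text/plain"}, b"malformed JSON"
        if not isinstance(data, dict) or "q" not in data:
            return 400, {"Content-Type": "text/plain"}, b"missing field: q"
    elif method != "GET":
        return 405, {"Content-Type": "text/plain", "Allow": "GET, POST"}, b"method not allowed"

    # 3. Build the response in the content type the client asked for
    if wants_json:
        return 200, {"Content-Type": "application/json"}, json.dumps({"ok": True}).encode()
    return 200, {"Content-Type": "text/html"}, b"<html><body>ok</body></html>"
```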

Web server

A web server is a computer program that is responsible for handling requests for web pages from clients (such as a browser trying to access google.com). When a client sends a request for a web page to a web server, the server processes the request and returns the appropriate response to the client.

This means that when you access google.com, one of Google’s web servers receives the request from the load balancer. Its response typically contains the HTML document; the CSS, JavaScript, and other static files (such as images) that the page references are fetched with further requests.

The browser would then use the HTML, CSS, and static files to render the web page for the user.

Application server and database

Unlike the web server, the application server handles dynamic content. It is what makes it possible to interact with a website: save information on it, log in with a username and a password, and so on. When using “google.com”, the application server is responsible for generating the search results (which change based on the query you put into the search engine).

When you submit a search query to Google, the request is first sent to the load balancer, which forwards it to one of the web servers in the Google server network. The web server then sends the request to the application server, which processes the request and generates the search results.

Depending on the complexity of the search query, the application server may need to fetch data from a database.

For example, if you are searching for a specific product on an e-commerce website, the application server may need to retrieve information about the product from a database.

Once the application server has obtained the necessary data, it sends it back to the web server, which includes it in the response that is sent back to the browser. The browser then uses this information to display the search results to you.
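Using the e-commerce example, here is a minimal sketch of the application-server side in Python, with an in-memory SQLite database standing in for a real database server. The `products` table, its rows, and both function names are hypothetical.

```python
import sqlite3

def setup_db() -> sqlite3.Connection:
    """Create an in-memory database with a small hypothetical products table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("USB cable", 4.99), ("USB hub", 19.99), ("Keyboard", 49.00)],
    )
    return conn

def search_products(conn: sqlite3.Connection, query: str) -> list[dict]:
    """Application logic: query the database and shape the rows for the web tier."""
    rows = conn.execute(
        # The ? placeholder binds user input safely instead of pasting it into SQL
        "SELECT name, price FROM products WHERE name LIKE ? ORDER BY price",
        (f"%{query}%",),
    ).fetchall()
    return [{"name": name, "price": price} for name, price in rows]
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, so searching for "usb" finds both USB products, cheapest first.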

Rendering the page

When a browser receives a response from a web server, it processes the HTML, CSS, and JavaScript files that are included in the response to render the web page.

The rendering process involves interpreting the HTML and CSS code, rendering any images or other media that are included on the page, and executing any JavaScript code that is present on the page.

In your case, your browser would receive the response from the web server, which includes the HTML, CSS, and JavaScript files that make up the Google web page.

The browser would then use these files to render the page and display it to you. This process typically involves the following:

displaying the text and images on the page in the appropriate positions
formatting the text and layout according to the CSS styles
executing any JavaScript code that is present on the page
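The first of these steps, turning the HTML text into a tree of elements (the DOM), can be sketched with Python's built-in `html.parser`. This is a heavily simplified, hypothetical tree builder: real browser engines follow the HTML standard's parsing algorithm, including error recovery, and then apply CSS and run JavaScript on the resulting tree.

```python
from html.parser import HTMLParser

VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements without a closing tag

class Node:
    def __init__(self, tag: str):
        self.tag = tag
        self.children: list["Node"] = []
        self.text = ""

class TreeBuilder(HTMLParser):
    """Build a simplified DOM-like tree (no error recovery, no CSS, no JavaScript)."""
    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # path from the root to the currently open element

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the matching open element, if there is one
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].text += data.strip()

def outline(node: Node, depth: int = 0) -> list[str]:
    """Indented listing of the tree, one element per line."""
    lines = ["  " * depth + node.tag + (f": {node.text}" if node.text else "")]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```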

Once the page has been fully rendered, you can interact with it by clicking links, entering text, or using other elements on the page.

The diagram below illustrates the entire process described above:

That's it! Thanks for reading this far! Hope you crush that interview! Happy coding!