Written on the 13th of May 2024, last modified on the 19th of May 2024.
How do browsers load websites?
When surfing the internet, you will click on numerous links leading to all sorts of websites, and each time you click a link to a new webpage, you have to wait for everything to load before you can continue. In the background, your browser has to go through a lot of steps to make that happen. This article will break down the entire process so you know what happens when you click on a link. We will start with how the browser gets all the information it needs, then continue with how the browser makes sense of all this data and ultimately figures out how it should all be shown to the end user. And finally, we will close off with how you can influence when the browser loads what, as well as some other methods to improve the loading speed of a webpage.
Knowing this allows you to appreciate all the work that has to happen to make using the internet possible. It also provides you with fundamental knowledge so you can improve the performance of the websites and online applications you build. Learning about these details on how the browser operates has helped me optimise the Content Management System (CMS) my colleagues and I use to build our customers' websites.
Let’s set the scene first: imagine you want to know more about how browsers load websites, so you go to your favourite search engine and type in your question, then hit enter. Bing will then show you an interesting first link to an article exactly about this very subject, perhaps even this one! You go to this first result and click on it. What happens now?
Processing the link
Your browser receives the information that it needs to go to another page, which can be found at the link you just clicked. In order to get that information, the browser first needs to work out what exactly is being requested. This is done using a link, also referred to as a Uniform Resource Locator (URL). A URL is a way to identify a resource somewhere in the world using a simple piece of text. Brilliant, right? The first step is to break the link down into the components a URL can consist of. Let's take a slightly modified version of the link to this section of the page: https://rondekker.nl/en-gb/articles/web-performance/how-browsers-load-webpages/index.html?utm_source=google#processing-the-link. Seems like a lot, but we'll go through it one piece at a time, starting on the left.
- https://: This is the protocol, in this case the Hypertext Transfer Protocol Secure (HTTPS). You might have heard of other protocols as well, such as Secure Shell (SSH) or File Transfer Protocol (FTP). A protocol just means that there is an agreed-upon way to do something. More on what HTTPS means later.
- rondekker.nl: This is the domain name, the part of the address that the website uses and that the browser can use to look up where the computer that hosts the website can be found.
- /en-gb/articles/web-performance/how-browsers-load-webpages/index.html: This is the path. Just like your computer has files and a path that leads to them, this is the path that leads to a file on the computer that serves the website. When the file name is index.html, it is often treated as implicit, and leaving it out means the same thing. Just like this website does.
- ?utm_source=google: This is a query parameter. It allows additional information to be transferred when fetching the resource. Query parameters are used in various ways; a common example is indicating which page of data you want via a page=4 parameter. If multiple parameters are needed, they can be strung together using an ampersand.
- #processing-the-link: This is the fragment. It refers to an element of the file we want to visit, in other words, a section of the webpage. After the page has loaded, the browser will use this to automatically scroll to the correct part of the page.
Now that the URL has been dissected into its component parts, the browser can start to figure out how and where it can find the information that the user has requested. The "how" is, of course, the protocol, in this case, the HTTPS protocol. The next part is to figure out where the information can be retrieved from.
Finding the right server
To find the right computer that serves the information, the browser needs to use the domain name to perform a Domain Name System (DNS) lookup. Currently, the browser has a name, but what it needs is an Internet Protocol (IP) address. This can be compared to a postal address in the physical world: the address tells us where something is, but not in absolute terms. For that, we need the coordinates. A DNS lookup allows us to convert this postal address into the right coordinates. In other words, it lets us link an arbitrary set of numbers to an address we can remember and type, because otherwise, every time we wanted to watch cat videos on the internet, we would have to remember a set of numbers instead of just a simple name.
But why do we need to get the absolute address instead of just using a relative address? Great question. When your computer talks to a server, it needs to establish an exact connection, just like dialling a specific phone number. This way, there can be many computers that use the same domain name, and a different absolute address can be given to the browser. This is the same as when dialling an emergency phone number. In many places in the world, this is 112, but if you were to call this number, you would get your own country's emergency line, not that of a different one. This is done by dynamically looking at which centre is the closest match. The same can be done when visiting websites, since there isn't much use in having a single server in the world for a massive website. If it were restricted to a single location in the world, the latency for everyone far away from it would be incredibly long. There are more reasons why, but this is my favourite.
The domain name itself consists of a hierarchy which starts at the end with a top-level domain (TLD). This is often either a generic top-level domain, such as com and org, or a country code top-level domain, examples being nl and de. These are then prepended with the name that you register with a domain name registrar. In my case, my name is rondekker. Optionally, you can prepend it with more subdomains. You might be used to seeing www a lot in the websites that you visit, which is a good example of a subdomain.
The way your computer figures out the exact IP address is by going through a simple process. If a step does not know where the domain name is located, the process moves on to the next step.
- The browser checks its own temporary list, also called a cache, of domain names that it has recently checked to see if it already knows the address. This is done because accessing the local storage of your computer is orders of magnitude faster than asking the network.
- The browser checks the operating system's cache. Perhaps another programme on the computer has recently requested the domain name, and the operating system still has it stored away.
- The browser asks another server whose address your computer already knows, a so-called DNS resolver. These can be preconfigured in your computer or network router, for example 8.8.8.8, managed by Google, or 1.1.1.1, managed by Cloudflare. This DNS resolver acts as a middleman and tries to hold a list of domains and their addresses in its cache.
- If the DNS resolver does not have the IP address in its cache, it can ask a root name server where information about the top-level domain can be requested, if it doesn't know this already. The DNS resolver can then ask the TLD name server for the authoritative name server for the domain. The authoritative name server can finally respond with the IP address the browser is looking for.
In the end, the DNS resolver will, if all things go right, return to your browser the IP address of where the domain is hosted and the information can be found.
Establishing a secure connection with TLS
Now that we know where the computer is that serves the website, we need to establish a secure connection before asking for the information. This is done using the Transport Layer Security (TLS) protocol, which ensures that the data exchanged between your browser and the server is secure. TLS is a cryptographic protocol designed to provide secure communication over a computer network. This is important because it guarantees the privacy and integrity of the data being transferred, preventing eavesdropping and tampering.
- When your browser connects to a server, it sends a "ClientHello" message. This message includes information like the TLS version and the cipher suites (encryption algorithms) supported by the browser.
- The server responds with a "ServerHello" message, selecting the TLS version and cipher suite that both the server and client support. The server also sends its digital certificate, which contains the server's public key and is issued by a trusted Certificate Authority (CA).
- The browser verifies the server’s certificate against a list of trusted CAs. If the certificate is valid, the browser generates a "pre-master secret" and encrypts it with the server's public key.
- Both the browser and the server use the pre-master secret to generate session keys. These keys are used to encrypt and decrypt the data exchanged during the session.
- Both the browser and server send a message to each other to confirm that future communications will be encrypted using these session keys. This is often referred to as the Finished message in the TLS handshake.
This process, while seemingly complex, occurs in milliseconds, ensuring that your data remains private and secure as it travels over the internet. Understanding TLS is crucial because it underpins the secure connections that protect user data and maintain trust in online communications. Most browsers will show the successful use of TLS with a little lock next to the URL of the webpage you're on. Modern browsers will also show a warning before the page loads if a website does not support a secure connection. Luckily, it has now become standard for most websites to have this set up correctly, thanks to non-profit organisations like Let's Encrypt.
Once the connection is established, we can delve further into the protocol that uses TLS, which is of course the earlier mentioned HTTPS protocol.
Understanding the HTTPS protocol
Now that the data transfer is underway, it's important to understand the protocol being used, the Hypertext Transfer Protocol Secure (HTTPS). HTTPS is the secure version of HTTP, the protocol over which data is sent between your browser and the website that you are connected to. HTTPS uses the aforementioned TLS to encrypt data, ensuring that it cannot be read or altered by third parties. This not only provides privacy but also data integrity and authentication.
HTTP, or Hypertext Transfer Protocol, is the foundation of any data exchange on the Web. It is a protocol used for transmitting hypertext requests and information between servers and browsers. HTTP operates at the application layer of the Internet protocol suite (TCP/IP) and is essential for loading web pages and interacting with web services.
When you enter a URL in your web browser, an HTTP request is sent to the server hosting the website. This request includes various details such as the type of request, the path to the resource, and any additional headers that provide more information about the request. The server processes this request and sends back an HTTP response, which includes the requested resource (such as an HTML page, image, or JSON data) and status information indicating whether the request was successful. An HTTP request, for example, might look like this:
GET /index.html HTTP/1.1
Host: rondekker.nl
And in response a server might respond with this:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 107
<!DOCTYPE html>
<html>
<head><title>Ron Dekker</title></head>
<body>Welcome to rondekker.nl!</body>
</html>
As the example shows, HTTP defines headers which are key-value pairs sent along with both requests and responses. They provide essential information about the request or response, such as content type, content length, server information, and cache directives.
A key concept of HTTP is that each request from a client to a server is independent. This means that the server does not retain any information about previous requests. While this simplifies server design and allows for greater scalability, it requires additional mechanisms, such as cookies or sessions, to maintain stateful interactions when needed.
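For example, a server might ask the browser to remember a session by setting a cookie, which the browser then sends back on subsequent requests; the header values here are purely illustrative:
HTTP/1.1 200 OK
Content-Type: text/html
Set-Cookie: session_id=abc123; Path=/; HttpOnly

GET /articles/ HTTP/1.1
Host: rondekker.nl
Cookie: session_id=abc123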
In addition, HTTP also specifies a set of request methods to indicate the desired action to be performed on the identified resource. These methods can be one of the following:
- The OPTIONS method describes the communication options for the target resource, often used for checking supported HTTP methods or CORS (Cross-Origin Resource Sharing) preflight requests. An OPTIONS request to /api/ might be used to determine the methods supported by the API. This method is safe and idempotent.
- The GET method retrieves data from a specified resource, commonly used for accessing web pages and downloading files. For example, a GET request might be sent to /articles/ to fetch the list of articles. It is considered safe and idempotent, meaning it does not modify the resource and can be called multiple times without changing the outcome.
- The HEAD method retrieves the headers for a resource without the response body, similar to GET. It is useful for checking whether a resource exists or inspecting headers before downloading a large file. An example HEAD request to /articles/ might be used to check the existence of the articles resource. This method is safe and idempotent.
- The POST method submits data to be processed to a specified resource, typically used for form submissions and creating new records in a database. An example POST request could be sent to /users/ to create a new user account. Unlike GET, POST is not idempotent; calling it multiple times may result in different outcomes, such as creating multiple records.
- The PUT method updates or creates a resource at a specified URL. It is often used for updating user profiles or uploading files to a specific location. For instance, a PUT request to /users/123/ might update the information for user 123. This method is idempotent, so multiple identical requests will produce the same result.
- The PATCH method applies partial modifications to a resource. It is used for updating specific fields of a resource without affecting the entire resource. For example, a PATCH request to /users/123/ might update only the email address of user 123. This method is not necessarily idempotent, as it depends on the changes applied.
- The DELETE method removes a specified resource, such as deleting a user account or a file. A DELETE request to /users/123/ would delete user 123. This method is idempotent, meaning multiple identical requests will have the same effect.
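As an illustration, a POST request that creates a new user, following the hypothetical /users/ endpoint mentioned above, might look like this:
POST /users/ HTTP/1.1
Host: rondekker.nl
Content-Type: application/json
Content-Length: 43

{"name": "Ada", "email": "ada@example.com"}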
HTTP has evolved over time to improve performance, security, and capabilities.
- HTTP/1.1 introduced persistent connections, chunked transfer encoding, and enhanced cache control mechanisms. Persistent connections allow multiple requests and responses to be sent over a single connection, reducing the latency associated with establishing new connections.
- HTTP/2 brought significant performance improvements, including multiplexing, which allows multiple requests and responses to be sent simultaneously over a single connection. This reduces latency and improves page load times. HTTP/2 also uses header compression to reduce the overhead of headers, making data transfer more efficient.
- HTTP/3 uses the QUIC transport protocol, which operates over UDP instead of TCP. This reduces connection setup times and improves performance, especially on unreliable networks. QUIC handles packet loss more efficiently, resulting in faster and more reliable connections.
By understanding HTTP and its evolution, you can hopefully appreciate the role it plays in web communication and how advancements in the protocol continue to improve the user experience.
Asking the server for the information
With a secure connection established, the browser can now ask the server for the information. The browser sends a request over this connection, including the URL and additional headers. The server can then process the request and respond with the requested data if available and allowed to do so.
The server can handle the request in any number of ways. For now, let's think of it as simply having a file called index.html in an /articles/web-performance/how-browsers-load-webpages/ directory on the computer. The server can then take this file and simply start sending it over to the browser. If the response is larger than a few kilobytes, it will arrive at the browser in multiple packets rather than all at once. But this does not mean that the browser will keep sitting still; it will immediately start processing the data as it arrives.
The first data to arrive contains the response headers. These are read first to figure out what the server is sending back. The browser will have indicated in its request headers that it can process HTML; the server can then indicate in its response headers that it is, in fact, sending HTML, as well as that the request was successful. This means the browser can take the data that follows and start showing it.
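In terms of the HTTP messages shown earlier, that exchange might look roughly like this, with most headers left out for brevity:
GET /en-gb/articles/web-performance/how-browsers-load-webpages/ HTTP/1.1
Host: rondekker.nl
Accept: text/html

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8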
Rendering content to the screen
The first step in parsing the HTML data is to create the page’s structure. Using this, the browser can start rendering elements to the screen. The browser wants to do this as quickly as possible, and the way it does this is by using the so-called critical rendering path.
The critical rendering path is the sequence of steps that a browser takes to render the initial viewable content of a webpage as quickly as possible. It encompasses the key processes and resources necessary for rendering the portion of the webpage that is initially visible in the viewport without scrolling, known as the above-the-fold content. It is therefore crucial for optimising web performance and user experience, as it determines how quickly users can perceive and interact with the content of a webpage. By prioritising the loading and rendering of critical resources, browsers can minimise perceived load times and enhance the overall experience. The critical rendering path includes the steps and stages as described below.
- HTML parsing: The browser parses the HTML document received from the server to construct the Document Object Model (DOM), representing the structure of the webpage’s content. HTML parsing involves parsing individual elements and building a tree-like structure that can be manipulated and rendered by the browser.
- CSS parsing: As the browser parses the HTML document, it also processes and parses any linked CSS stylesheets or embedded style blocks, representing the styling rules applied to each element on the page. CSS parsing involves resolving selectors, computing specificity, and applying styles to the corresponding elements to determine their visual appearance and layout.
- Render tree construction: Once the document and styles are computed, the browser combines them to create the render tree, which represents the visual hierarchy of the webpage’s content. The render tree consists of only the elements that will be rendered on the screen, excluding non-visible elements such as script tags or other hidden elements.
- Layout: The browser then performs layout calculations to determine the position and size of each element in the render tree relative to the viewport. This process involves computing the dimensions of block-level and inline elements based on their content, styles, and positioning properties. Layout calculations are necessary to establish the initial visual layout of the webpage. When the document changes after this initial pass, this process is often referred to as reflow.
- Painting: Once layout calculations are complete, the browser paints the pixels onto the screen to render the visual representation of the webpage. Painting involves filling in the pixels of each element with the appropriate colours, textures, and effects according to the specified styles and layout properties. The painted elements are then composited together to create the final rendered image.
Afterwards, the page is almost done, and any scripts are parsed and executed. These scripts may manipulate the document structure and styling, affecting the render tree and triggering further reflows and repaints. This behaviour can include dynamic content generation, event handling, and modifying structure or styles based on user interactions or other triggers. The next question to ask ourselves is: how does the browser discover all these resources, and when does it load them in?
Resource discovery and loading
Since there is more to a webpage than only the HTML, it is important to know how and when these resources are discovered and loaded. A webpage can include styling directly, but more commonly, it is split up into a separate file and only referenced using a link tag in the structure of the webpage. For example, a simple page that outputs a funny cat image together with some styling and interaction can look something like the following:
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="style.css">
<script src="script.js"></script>
</head>
<body>
<img src="cat.jpg" alt="A cat’s head poking through a slice of bread.">
</body>
</html>
The way the browser reads this is that it will first look at the head tag and its contents. It starts at the top and will encounter the first resource, which in this case is a link to a stylesheet. The browser will then load this resource and move on to the script tag, seeing that it requires an additional file. When that has loaded, it will continue to the body tag. This, of course, contains the most important asset: our funny cat picture. The important takeaway is that for each external resource, the browser has to make an additional network request, similar to when it requested the HTML of the page.
Improving performance the next time using caching
Any resources loaded can, of course, also be cached, depending on what the server sends back in the headers of the response. This can mean that the next time the user visits the page, the browser already has the stylesheet or any other resource downloaded, making the second visit a lot faster.
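For instance, a response that allows a stylesheet to be cached for up to a year might carry headers like the following; the exact directives are up to the server configuration:
HTTP/1.1 200 OK
Content-Type: text/css
Cache-Control: max-age=31536000, immutable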
In the HTML example from earlier, the user will of course have to wait for the stylesheet and script to load before the image is downloaded and shown. Here we have encountered the first thing we can influence to improve the user experience: since there is nothing to interact with before the body has been parsed and shown, we can move the script down into the body tag.
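As a minimal sketch, the restructured example could look like this, with the script moved to the end of the body so it no longer holds up the image:
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="style.css">
</head>
<body>
<img src="cat.jpg" alt="A cat’s head poking through a slice of bread.">
<script src="script.js"></script>
</body>
</html>
And this is what I want to focus on next: how resources are discovered and how you can influence this process.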
Influencing resource discovery and loading
Tag order, as mentioned before, is the first thing that impacts the discovery of resources. The further down the page a reference sits, the later the resource will be found, and the later it can be retrieved by the browser, which in some scenarios is exactly what you want. But this top-down loading approach can be altered further using several attributes.
Next, we can use the loading attribute to utilise lazy loading. Lazy loading is a technique that defers the loading of non-essential resources until they are needed, typically triggered by user interactions such as scrolling or clicking. By delaying the loading of off-screen or below-the-fold content, lazy loading minimises the initial page load time and reduces the amount of data transferred over the network.
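For example, an image further down the page could be marked with the loading attribute, and the browser will then hold off fetching it until it is about to scroll into view:
<img src="cat.jpg" alt="A cat’s head poking through a slice of bread." loading="lazy">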
An async attribute can be placed on a script tag. It indicates to the browser that the script should be downloaded asynchronously, allowing the HTML parsing and rendering process to continue without blocking while the script is being fetched. It will, however, be executed as soon as it is made available.
Similarly, the defer attribute, when placed on a script tag, instructs the browser to initiate the download of the script file immediately. However, the execution of the script is deferred until after the HTML parsing is complete. If this is placed on a script tag with the async attribute, then it will ignore the defer attribute and act as if only the async attribute has been placed on the script tag.
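Applied to script tags, the two attributes look like this; both let parsing continue while the scripts download, but only the deferred one is guaranteed to run after the document has been parsed (analytics.js is just a hypothetical second script here):
<script src="analytics.js" async></script>
<script src="script.js" defer></script>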
Before a reference to a resource is made, the browser can initiate a connection to the server beforehand using the rel=preconnect attribute on a link tag. It informs the browser to initiate an early connection to a specified domain in anticipation of future resource requests. This preconnect hint allows the browser to establish the necessary network connections pre-emptively, reducing the latency for subsequent resource requests.
A very close relative of this is the rel=dns-prefetch attribute. It will perform a DNS lookup of the specified domain name, but it will not establish a connection with the server just yet; that only happens once a resource from that domain is actually requested.
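Both hints belong in the head of the page. As a sketch, they might look like this, where the domains are placeholders for wherever your assets are actually hosted:
<link rel="preconnect" href="https://fonts.example.com">
<link rel="dns-prefetch" href="https://cdn.example.com">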
Then we have the rel=prefetch attribute. It tells the browser to fetch and cache a specified resource in the background, without rendering or executing it immediately. This prefetching mechanism allows the browser to proactively retrieve resources that are likely to be needed in the future.
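A prefetch hint for a page the user is likely to visit next might look like this; the path is purely illustrative:
<link rel="prefetch" href="/en-gb/articles/">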
Another variation of this is the rel=preload attribute. It instructs the browser to fetch and cache a specified resource as early as possible during the page loading process. Preload is used to indicate resources that are critical for rendering the current page or upcoming interactions and should be prioritised for early fetching and caching.
Resources can also be loaded by other resources. For example, a font might be specified in a stylesheet. This can only be discovered once the stylesheet has been loaded in and the font is used on the webpage. But using the previously mentioned preload attribute, we can speed this up by letting the browser know that we need this resource as soon as possible. The opposite of course also holds true: we can delay the loading of less important resources by putting them behind another resource. For example, you can run a script which modifies the page and adds additional resources, or have it fetch data.
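Preloading such a font might, as a sketch, look like this; the file name is just an example, and the crossorigin attribute is required for font preloads even when the font is served from your own domain:
<link rel="preload" href="font.woff2" as="font" type="font/woff2" crossorigin>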
Service workers and their role in performance
Service workers are a powerful feature that can significantly enhance web performance and user experience. A service worker is a script that runs in the background, separate from the web page, enabling features that do not require a web page or user interaction. Understanding how service workers work is crucial because they can improve the performance, reliability, and capabilities of your web applications. When a user visits a website for the first time, the browser installs and activates the service worker. This process involves downloading the service worker script and registering it with the browser.
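Registering one is typically only a few lines in the page itself; the file name sw.js below is just a conventional placeholder for wherever the service worker script lives:
<script>
if ('serviceWorker' in navigator) {
  navigator.serviceWorker.register('/sw.js');
}
</script>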
Service workers can then intercept network requests and serve responses from a cache. This allows developers to implement caching strategies, such as caching static assets during installation and serving them from the cache for subsequent requests. This reduces the need to fetch resources from the network, improving load times and reducing data usage. By caching critical resources, service workers enable websites to function offline. When the user is offline, the service worker can serve cached content, providing a seamless experience even without an internet connection.
By managing how and when resources are fetched and cached, service workers can significantly reduce load times, especially on repeat visits. This leads to a more responsive and faster website, enhancing overall user satisfaction. Service workers can thus play a significant role in making web applications faster, more reliable, and capable of providing a better user experience.
Optimising web performance
Understanding how the browser loads webpages allows us to optimise web performance effectively. There are several strategies that can be employed to enhance the speed and efficiency of your website.
Firstly, minimising HTTP requests is crucial. Each resource on a webpage requires an HTTP request. Reducing the number of these requests can significantly improve load times. This can be achieved by combining files, such as CSS and JavaScript, using CSS sprites to combine multiple images into a single file, and inlining small CSS and JavaScript files directly into the HTML.
Enabling compression can also reduce the size of HTML, CSS, and JavaScript files. Using Gzip or Brotli compression on your server ensures that compressed versions of files are served, reducing data transfer times.
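On the wire, this is a simple negotiation between the browser and the server through two headers:
GET /style.css HTTP/1.1
Host: rondekker.nl
Accept-Encoding: gzip, br

HTTP/1.1 200 OK
Content-Type: text/css
Content-Encoding: br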
Leveraging browser caching is another important strategy. Caching allows browsers to store resources locally, reducing the need for repeated requests. This can be controlled using HTTP headers such as Cache-Control, which specifies caching policies, Expires, which sets an expiration date for cached resources, and ETag, which allows browsers to validate cached resources with the server.
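The ETag mechanism in particular lets the browser revalidate a cached copy cheaply: if nothing has changed, the server answers with a short 304 instead of sending the file again. The tag value here is illustrative:
GET /style.css HTTP/1.1
Host: rondekker.nl
If-None-Match: "33a64df5"

HTTP/1.1 304 Not Modified
ETag: "33a64df5"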
Prioritising above-the-fold content ensures that the content visible to users without scrolling loads first. This improves perceived performance and user experience. Critical CSS can be inlined, and JavaScript can be loaded asynchronously or deferred until after the HTML parsing is complete.
Optimising images is essential, as they are often the largest assets on a webpage. Using modern image formats like WebP for better compression, and resizing and compressing images to the appropriate dimensions before uploading them, can drastically reduce load times, if the Content Management System or build system does not already do this for you automatically. Additionally, using responsive images and the srcset attribute allows different images to be served based on the device's resolution, ensuring optimal performance.
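A responsive version of the earlier cat image could, as a sketch, look like this, with the browser picking the smallest file that still looks sharp for the current viewport; the file names and widths are made up:
<img src="cat-800.jpg"
  srcset="cat-400.jpg 400w, cat-800.jpg 800w, cat-1600.jpg 1600w"
  sizes="(max-width: 600px) 100vw, 600px"
  alt="A cat’s head poking through a slice of bread.">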
Lastly, monitoring performance regularly is vital to identify and address issues. Tools like Google Lighthouse, PageSpeed Insights, and WebPageTest can be used to analyse performance metrics. Setting up Real User Monitoring (RUM) gathers data on real user experiences, allowing for data-driven optimisations.
Core Web Vitals
Core Web Vitals are metrics introduced by Google to measure and quantify issues related to the user experience when visiting a webpage. The current Core Web Vitals are Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and Interaction to Next Paint (INP), supported by related metrics such as First Contentful Paint (FCP) and Time to Interactive (TTI). Understanding and optimising these metrics is essential for improving web performance and therefore the user experience.
Great, now what?
Now that you understand how the browser loads webpages and how you can influence the resources referenced by it, I recommend opening up your browser's developer tools and reloading some pages. Throttling the network can help you see how the process unfolds, and the network panel gives a breakdown of all the resources that are being loaded as well as when they are fetched. With this knowledge, you should be able to optimise your websites and online applications, providing faster and more efficient experiences for your users.