How the browser renders a webpage

The time it takes the browser to render meaningful content is one of the most important aspects of user experience. To paint initial content, the browser converts HTML, CSS, and JavaScript to pixels, a process known as the critical rendering path. The critical rendering path has four steps (MDN contributors 2022a):

Parsing
Render tree construction
Layout
Painting

Parsing

The browser begins by requesting an HTML file located at a URL. While downloading the file, the browser converts bytes to characters based on the file’s encoding. Depending on the size of the file, network bandwidth, and network latency, the browser may incrementally execute the steps of the rendering path while it downloads the rest of the HTML file on a separate thread (Hiwarale 2019).

DOM

With the HTML downloaded, the work of rendering the webpage begins. The document object model (DOM) represents the structure of a web page and offers an API for programming languages, such as JavaScript (MDN contributors 2022e). It consists of a collection of JavaScript objects arranged in a tree structure; each JavaScript object is a node in the tree. When parsing HTML, the browser creates a JavaScript object whenever it encounters an HTML token, like html, div, or p (Hiwarale 2019). For example, when the browser parses a div tag, it creates a node from the HTMLDivElement class, which has the Node class as one of its ancestors (i.e., prototypes).
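As a rough illustration of the tree-building step described above, the sketch below turns a flat list of open and close tokens into a tree of objects. The Mini* classes, the token shape, and the buildTree function are hypothetical stand-ins invented for this example; a real browser uses its built-in Node, Element, HTMLElement, and HTMLDivElement classes and a far more involved parsing algorithm.

```javascript
// Hypothetical stand-ins for the browser's built-in DOM classes.
class MiniNode {
  constructor() {
    this.parentNode = null;
    this.childNodes = [];
  }
  appendChild(child) {
    child.parentNode = this;
    this.childNodes.push(child);
    return child;
  }
}
class MiniElement extends MiniNode {
  constructor(tagName) {
    super();
    this.tagName = tagName.toUpperCase();
  }
}
class MiniHTMLElement extends MiniElement {}
class MiniHTMLDivElement extends MiniHTMLElement {}

// Map tag names to element classes; unknown tags fall back to MiniHTMLElement.
const ELEMENT_CLASSES = new Map([["div", MiniHTMLDivElement]]);

// Build a tree from tokens shaped like { type: "open", tag } or { type: "close" }.
function buildTree(tokens) {
  const documentNode = new MiniNode();
  const openElements = [documentNode]; // stack of elements not yet closed
  for (const token of tokens) {
    if (token.type === "open") {
      const ElementClass = ELEMENT_CLASSES.get(token.tag) || MiniHTMLElement;
      const element = new ElementClass(token.tag);
      openElements[openElements.length - 1].appendChild(element);
      openElements.push(element);
    } else if (token.type === "close" && openElements.length > 1) {
      openElements.pop();
    }
  }
  return documentNode;
}

const tree = buildTree([
  { type: "open", tag: "html" },
  { type: "open", tag: "div" },
  { type: "open", tag: "p" },
  { type: "close" },
  { type: "close" },
  { type: "close" },
]);
const div = tree.childNodes[0].childNodes[0];
// The div node is an instance of its own class and of every ancestor class.
console.log(div.tagName, div instanceof MiniHTMLDivElement, div instanceof MiniNode);
```

The stack of open elements is what gives the tree its nesting: each new element becomes a child of whichever element is currently on top of the stack.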

When the browser encounters a link, video, or img tag referencing an external file, it downloads the file on a separate thread (Hiwarale 2019). Parsing on the main thread continues uninterrupted.

CSSOM

When the browser encounters CSS, be it directly on an HTML tag, between style tags, or upon downloading it from an external file via a link tag, it begins building the CSS Object Model (CSSOM). The CSSOM is a tree structure containing HTML nodes capable of appearing on the screen (Hiwarale 2019). This means tags like link, script, and title do not exist in the CSSOM.

Each CSSOM node contains all CSS properties applicable to the node. If our CSS does not set a property directly (or indirectly, when inherited from a parent node), the property gets a default value from the browser’s user agent stylesheet.

Within a block of CSS, rules appearing later may override rules defined at the beginning. Therefore, the browser parses an external CSS file in full even when there is network slowness (Hiwarale 2019). This makes CSS render blocking, because the browser cannot determine which styles to render until it completely parses a CSS file.
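The resolution order described above (later rules override earlier ones, unset properties inherit from the parent where applicable, and everything else falls back to a user agent default) can be sketched in a few lines. This is a simplified model: it ignores selector specificity and !important, and the UA_DEFAULTS values, the INHERITED set, and the computeStyle function are illustrative assumptions rather than any browser's actual data.

```javascript
// Illustrative user agent defaults; real browsers define many more properties.
const UA_DEFAULTS = { color: "black", display: "inline", margin: "0" };

// Properties assumed to inherit from the parent node in this sketch.
const INHERITED = new Set(["color"]);

// rules: array of declaration objects in source order, e.g. [{ color: "red" }].
// parentStyle: the already-computed style of the parent node, if any.
function computeStyle(rules, parentStyle = {}) {
  // Object.assign applies rules left to right, so later rules win.
  const declared = Object.assign({}, ...rules);
  const computed = {};
  for (const property of Object.keys(UA_DEFAULTS)) {
    if (property in declared) {
      computed[property] = declared[property];
    } else if (INHERITED.has(property) && property in parentStyle) {
      computed[property] = parentStyle[property];
    } else {
      computed[property] = UA_DEFAULTS[property];
    }
  }
  return computed;
}

const parentStyle = computeStyle([{ color: "red" }, { color: "blue" }]);
const childStyle = computeStyle([{ margin: "8px" }], parentStyle);
console.log(parentStyle, childStyle);
```

Because the final value of a property depends on every rule that mentions it, the function cannot return a trustworthy result until it has seen the whole rule list, which mirrors why a partially downloaded stylesheet cannot be applied safely.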

Along similar lines, the browser waits to execute a script file until it downloads and parses all previously encountered CSS (Hiwarale 2019). This prevents JavaScript from accessing style values that might become stale if changed by a CSS rule later in the stylesheet. The upshot is CSS can block script execution, specifically in the case of inline or async scripts.

One strategy to prevent CSS from blocking rendering or scripting is to set the media attribute on an external stylesheet. For example, the code below indicates that this stylesheet applies only to viewports at least 1,080 pixels wide. On narrower screens, the browser still downloads the stylesheet, but at a lower priority and without blocking rendering or scripting, because the CSS does not apply to the current screen size.

<link href="wide.css" rel="stylesheet" media="(min-width: 1080px)" />

JavaScript

When the browser encounters a script tag, it stops parsing until it downloads, parses, and executes the script. HTML parsing pauses because scripts commonly alter the DOM. If a script altered the DOM on the main thread while HTML parsing continued on a separate thread, race conditions would make the resulting DOM unpredictable (Hiwarale 2019).

Adding a defer or async attribute to a script tag allows the browser to continue parsing on the main thread while the script downloads on a separate thread. According to MDN contributors (2022b):

The defer attribute instructs the browser to download the script on a separate thread and defer executing the script until it finishes parsing all HTML. The browser waits to fire the DOMContentLoaded event until it executes all deferred scripts. Once the browser constructs the entire DOM, the deferred scripts execute in the order their script tags appeared in the HTML. The defer attribute has no effect on inline scripts. It also has no effect on module scripts, because module scripts defer by default.
The async attribute instructs the browser to continue parsing HTML on the main thread until the script finishes downloading. As soon as the download finishes, the browser pauses parsing HTML to execute the script on the main thread. The async attribute has no effect on inline scripts. Unlike the defer attribute, the DOMContentLoaded event fires as soon as the browser finishes parsing the HTML, even if it is still downloading async scripts. Because the browser executes async scripts as soon as they finish downloading, these scripts might not execute in the order they appear in the HTML.
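The ordering rules for defer and async can be modeled with a small simulation. This sketch assumes that every download starts at time zero, that executing a script takes no time, and that the page has no parser-blocking scripts; the executionOrder function and its field names (name, mode, downloadMs) are hypothetical.

```javascript
// scripts: array of { name, mode: "defer" | "async", downloadMs } in document order.
// parseDoneMs: the time at which HTML parsing finishes.
function executionOrder(scripts, parseDoneMs) {
  const runs = [];
  let deferClock = parseDoneMs; // deferred scripts cannot run before parsing ends
  scripts.forEach((script, index) => {
    if (script.mode === "async") {
      // Async scripts run as soon as their own download finishes.
      runs.push({ name: script.name, at: script.downloadMs, index });
    } else if (script.mode === "defer") {
      // A deferred script waits for parsing, its own download,
      // and every deferred script before it in the document.
      deferClock = Math.max(deferClock, script.downloadMs);
      runs.push({ name: script.name, at: deferClock, index });
    }
  });
  return runs
    .sort((a, b) => a.at - b.at || a.index - b.index)
    .map((run) => run.name);
}

const order = executionOrder(
  [
    { name: "a.js", mode: "defer", downloadMs: 300 },
    { name: "b.js", mode: "async", downloadMs: 50 },
    { name: "c.js", mode: "defer", downloadMs: 100 },
    { name: "d.js", mode: "async", downloadMs: 400 },
  ],
  200
);
console.log(order); // [ 'b.js', 'a.js', 'c.js', 'd.js' ]
```

In this run, c.js finishes downloading before a.js but still executes after it, because deferred scripts keep document order. The async scripts, by contrast, run purely in order of download completion.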

When the browser begins parsing a script, it parses the top-level JavaScript into an abstract syntax tree (AST) and adds it to the global execution context (Saini 2020). During parsing, the browser allocates memory to variables, giving each a value of undefined; it allocates memory to functions, setting each function’s value to the unparsed code inside the function (Saini 2020).
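The observable effect of this memory-allocation phase is commonly called hoisting. The sketch below (creationPhaseDemo is a name made up for this example) reads a var variable and calls a function before the lines that declare them, and records what it sees.

```javascript
function creationPhaseDemo() {
  const observed = [];
  observed.push(counter);      // declared below with var: memory exists, value is undefined
  observed.push(typeof greet); // function declarations are available before their line runs
  observed.push(greet());
  var counter = 1;
  function greet() {
    return "hello";
  }
  observed.push(counter);      // the assignment has now executed
  return observed;
}

console.log(creationPhaseDemo()); // [ undefined, 'function', 'hello', 1 ]
```

Variables declared with let or const behave differently: reading them before their declaration throws a ReferenceError rather than yielding undefined.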

Next, the browser executes the code, first compiling the AST into byte code (Hinkelmann 2017). During this phase, the browser recursively parses JavaScript inside the invoked functions. The browser creates a new execution context during each function invocation, just as it did when creating the global execution context (Saini 2020). The browser adds each new execution context to its call stack. When the browser executes a return statement or reaches the end of an execution context, it pops that execution context off the call stack. The browser continues this process until the call stack has unwound back to the global execution context.
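To make the push and pop behavior concrete, the sketch below mirrors the engine's call stack with an ordinary array. The traced wrapper, the callStack array, and the snapshots log are hypothetical helpers for this example; the real call stack lives inside the JavaScript engine and is not exposed as an array.

```javascript
const callStack = ["global"]; // stand-in for the engine's call stack
const snapshots = [];         // what the stack looked like at each function entry

// Wrap a function so each call pushes a frame on entry and pops it on exit.
function traced(name, fn) {
  return (...args) => {
    callStack.push(name);
    snapshots.push(callStack.join(" > "));
    try {
      return fn(...args);
    } finally {
      callStack.pop(); // runs on return or on a thrown error
    }
  };
}

const square = traced("square", (n) => n * n);
const sumOfSquares = traced("sumOfSquares", (a, b) => square(a) + square(b));

const result = sumOfSquares(2, 3);
console.log(result, snapshots, callStack);
```

After the outermost call returns, the array holds only the "global" entry again, matching the unwinding described above.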

If the browser repeatedly executes the same function, it compiles the function into optimized machine code using type information inferred from previous calls (Hinkelmann 2017). As long as the function’s parameters maintain the same types, this optimized code executes faster during future calls. If the parameter types change in a subsequent call, the browser de-optimizes the function back to byte code (Hinkelmann 2017).

Speculative parsing

While the main thread waits for a file to download, the browser performs speculative parsing on a separate thread, looking for external resources later in the document (Hiwarale 2019). On that separate thread, the browser downloads those resources so they will be available when the parser on the main thread reaches them. A speculatively downloaded resource may go unused if a script removes or hides the element that references it.
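A very rough approximation of this look-ahead scan is a pass over the remaining markup that collects resource URLs without building any DOM nodes. The scanForResources function below is a hypothetical sketch: it uses a regular expression, handles only double-quoted src and href attributes on a few tags, and is not how any browser actually implements speculative parsing.

```javascript
// Collect URLs referenced by script, img, link, and video tags in raw markup.
function scanForResources(html) {
  const pattern = /<(script|img|link|video)\b[^>]*?\b(?:src|href)\s*=\s*"([^"]*)"/gi;
  return Array.from(html.matchAll(pattern), (match) => ({
    tag: match[1].toLowerCase(),
    url: match[2],
  }));
}

const remainingMarkup = `
  <p>Some text</p>
  <img src="hero.png">
  <a href="/about">About</a>
  <script src="app.js"></script>
  <link rel="stylesheet" href="site.css">
`;
console.log(scanForResources(remainingMarkup));
```

The anchor tag is skipped because following a link is not a resource the page needs in order to render.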

DOM events

When the browser finishes creating the DOM and CSSOM and executes all parser-blocking scripts, it fires the DOMContentLoaded event (MDN contributors 2022c). When the browser finishes downloading all external files, such as stylesheets, images, videos, and non-async scripts, it fires the window.load event (MDN contributors 2022d).
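Both events can be observed with ordinary event listeners. The recordLifecycle helper below is a hypothetical sketch that accepts any two objects exposing addEventListener; in a page, these would be the real document and window objects.

```javascript
// Record the order in which the two lifecycle events fire.
// documentTarget and windowTarget only need an addEventListener method.
function recordLifecycle(documentTarget, windowTarget) {
  const log = [];
  documentTarget.addEventListener("DOMContentLoaded", () => {
    log.push("DOMContentLoaded");
  });
  windowTarget.addEventListener("load", () => {
    log.push("load");
  });
  return log;
}

// In a browser, assuming the script runs before parsing finishes:
// const lifecycleLog = recordLifecycle(document, window);
```

Listeners must be attached before the events fire; a script that runs after DOMContentLoaded has already fired will never see it.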

Render tree

Once the browser finishes constructing the DOM and CSSOM, it combines them to generate the render tree (MDN contributors 2022a). The browser traverses every node of the DOM and, using the CSSOM, determines which CSS rules to attach to each node. The render tree contains only nodes that take up space on the page, so the browser does not include DOM nodes with display:none. It will contain nodes with visibility:hidden or opacity:0 because these nodes occupy space.
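The filtering rule above can be sketched as a recursive walk over a simplified DOM. The node shape ({ tag, style, children }) and the buildRenderTree function are assumptions made for this example; the style object stands in for the values the browser would look up in the CSSOM.

```javascript
// Return a render-tree node, or null if the DOM node takes up no space.
function buildRenderTree(domNode) {
  const style = domNode.style || {};
  if (style.display === "none") {
    return null; // the node and its whole subtree are left out
  }
  const children = (domNode.children || [])
    .map((child) => buildRenderTree(child))
    .filter((child) => child !== null);
  return { tag: domNode.tag, style, children };
}

const dom = {
  tag: "body",
  children: [
    { tag: "h1" },
    { tag: "div", style: { display: "none" }, children: [{ tag: "span" }] },
    { tag: "p", style: { visibility: "hidden" } },
    { tag: "span", style: { opacity: "0" } },
  ],
};
const renderTree = buildRenderTree(dom);
console.log(renderTree.children.map((node) => node.tag)); // [ 'h1', 'p', 'span' ]
```

The hidden div disappears along with its child span, while the invisible p and the fully transparent span remain because they still occupy space.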

Note that the DOM and CSSOM may undergo further changes if requests for external CSS or JavaScript are still pending, triggering further updates to the render tree during initial page load. JavaScript may also alter either tree after page load by adding, removing, or modifying DOM nodes from event handlers.

Layout

Having constructed the render tree, the browser moves to the layout step. During layout, the browser determines how to position nodes on the page. This consists of calculating the width and height of each node and its position relative to other nodes (MDN contributors 2022a). Nodes in the render tree commonly overlap one another, so the browser creates layers to draw nodes in the correct stacking order (Hiwarale 2019).

The width and height available for layout depend on the viewport meta tag; without it, the browser falls back to a default viewport width, usually 960 px (Hiwarale 2019).
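As a minimal sketch of what layout computes, the function below positions block-level boxes by stacking them vertically inside the available width. It is a deliberate simplification: it ignores margins, padding, inline content, floats, and every other layout mode, and the layoutBlock function and node shape ({ tag, height, children }) are hypothetical.

```javascript
// Lay out a node as a block: it takes the full available width, and its
// children stack vertically starting at the node's own top edge.
function layoutBlock(node, x, y, availableWidth) {
  const box = { tag: node.tag, x, y, width: availableWidth, height: 0, children: [] };
  const kids = node.children || [];
  let cursorY = y;
  for (const child of kids) {
    const childBox = layoutBlock(child, x, cursorY, availableWidth);
    box.children.push(childBox);
    cursorY += childBox.height;
  }
  // A parent is as tall as its stacked children; a leaf uses its own height.
  box.height = kids.length > 0 ? cursorY - y : node.height || 0;
  return box;
}

const page = {
  tag: "body",
  children: [
    { tag: "header", height: 50 },
    { tag: "main", children: [{ tag: "p", height: 20 }, { tag: "p", height: 30 }] },
    { tag: "footer", height: 10 },
  ],
};
const viewportWidth = 960; // the fallback width mentioned above
const bodyBox = layoutBlock(page, 0, 0, viewportWidth);
console.log(bodyBox.height, bodyBox.children.map((box) => box.y));
```

Because each box's position depends on the heights of the boxes before it, changing one node's size can force the positions of many later nodes to be recalculated.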

Because layout occurs towards the end of the rendering path, many events cause the browser to recalculate layout. These include:

Adding, removing, or moving a DOM node
Adding or removing a stylesheet
Changing a node’s class or style attributes
Resizing or changing the orientation of the viewport
Reading layout metrics, such as offsetWidth or offsetHeight (Irish 2020)
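The last item matters most when reads and writes alternate: each read that follows a style change forces the browser to recalculate layout immediately. The sketch below demonstrates the pattern against a fake page so it can run outside a browser. The createFakePage helper is a hypothetical model in which a style write marks layout as dirty and reading offsetWidth while dirty counts as one forced layout; real browsers track this internally.

```javascript
// Hypothetical fake page for counting forced layouts.
function createFakePage(elementCount) {
  const page = { dirty: false, forcedLayouts: 0 };
  const elements = Array.from({ length: elementCount }, () => {
    let width = 100;
    return {
      style: {
        set width(value) {
          width = parseFloat(value);
          page.dirty = true; // a write invalidates the current layout
        },
      },
      get offsetWidth() {
        if (page.dirty) {
          page.forcedLayouts += 1; // a read must recalculate layout first
          page.dirty = false;
        }
        return width;
      },
    };
  });
  return { page, elements };
}

// Read, write, read, write: every read after the first forces a layout.
function halveInterleaved(elements) {
  for (const el of elements) {
    el.style.width = `${el.offsetWidth / 2}px`;
  }
}

// All reads first, then all writes: no read ever sees a dirty layout.
function halveBatched(elements) {
  const widths = elements.map((el) => el.offsetWidth);
  elements.forEach((el, i) => {
    el.style.width = `${widths[i] / 2}px`;
  });
}

const interleaved = createFakePage(3);
halveInterleaved(interleaved.elements);
const batched = createFakePage(3);
halveBatched(batched.elements);
console.log(interleaved.page.forcedLayouts, batched.page.forcedLayouts); // 2 0
```

Both functions leave the elements at the same widths; only the number of forced layouts differs.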

Paint

With the layout calculated, the browser is ready to paint pixels to the screen. The browser paints each layer separately, beginning with the bottom layer and making its way to the top layer (Hiwarale 2019). As it paints each layer, the browser fills individual pixels based on the visible properties of each layer, a process known as rasterization (Hiwarale 2019). The output of this operation is a collection of bitmap images (Hiwarale 2019). The browser sends these bitmaps to the GPU to draw them on the screen.

After painting the initial page, the browser tries to optimize updates during future paints. To update the screen, the browser calculates the difference from what it has already painted and only repaints the changed portion (Hiwarale 2019). To highlight areas of the page the browser is painting, enable Rendering > Paint flashing in Chrome DevTools.
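One way to picture the "difference" calculation is as a comparison of two pixel grids that yields the smallest rectangle containing every change. The dirtyRect function below is an illustrative sketch under that assumption, using small same-sized 2D arrays as stand-ins for bitmaps; actual browsers track invalidated regions as changes occur rather than by comparing pixels afterwards.

```javascript
// Return the bounding rectangle of all cells that differ, or null if none do.
// Both grids are assumed to have identical dimensions.
function dirtyRect(previous, next) {
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -1;
  let maxY = -1;
  for (let y = 0; y < next.length; y++) {
    for (let x = 0; x < next[y].length; x++) {
      if (previous[y][x] !== next[y][x]) {
        minX = Math.min(minX, x);
        minY = Math.min(minY, y);
        maxX = Math.max(maxX, x);
        maxY = Math.max(maxY, y);
      }
    }
  }
  if (maxX === -1) {
    return null; // nothing changed, so nothing needs repainting
  }
  return { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
}

const before = [
  [0, 0, 0],
  [0, 0, 0],
  [0, 0, 0],
];
const after = [
  [0, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
];
console.log(dirtyRect(before, after)); // { x: 1, y: 1, width: 2, height: 2 }
```

Only the four cells inside that rectangle would need repainting, rather than all nine.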

Conclusion

To render a webpage’s content, the browser walks through the steps of parsing, constructing the render tree, performing layout, and painting. By knowing what work the browser performs in each step, we can develop strategies to optimize the critical rendering path.

Sources

Hinkelmann, F. (2017, May 16). Franziska Hinkelmann: JavaScript engines - how do they even? | JSConf EU [Video]. YouTube. https://www.youtube.com/watch?v=p-iiEDtpy6I

Hiwarale, U. (2019, August 2). How the browser renders a web page? —DOM, CSSOM, and Rendering. JsPoint. https://medium.com/jspoint/how-the-browser-renders-a-web-page-dom-cssom-and-rendering-df10531c9969

Holtta, M. (2017, May 18). Marja Hölttä: Parsing JavaScript - better lazy than eager? | JSConf EU 2017 [Video]. YouTube. https://www.youtube.com/watch?v=Fg7niTmNNLg

Irish, P. (2020, April 22). What forces layout / reflow. https://gist.github.com/paulirish/5d52fb081b3570c81e3a

MDN contributors. (2022a, June 5). Critical rendering path. https://developer.mozilla.org/en-US/docs/Web/Performance/Critical_rendering_path

MDN contributors. (2022b, June 5). <script>: The Script element. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/script

MDN contributors. (2022c, June 3). Window: DOMContentLoaded event. https://developer.mozilla.org/en-US/docs/Web/API/Window/DOMContentLoaded_event

MDN contributors. (2022d, June 3). Window: load event. https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event

MDN contributors. (2022e, June 3). Document Object Model (DOM). https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model

Saini, A. (2020, October 20). How JavaScript Code is executed? ❤️& Call Stack | Namaste JavaScript Ep. 2 [Video]. YouTube. https://www.youtube.com/watch?v=iLWTnMzWtj4
