Render tree construction
The browser begins by requesting an HTML file located at a URL. While downloading the file, the browser converts bytes to characters based on the file’s encoding. Depending on the size of the file, network bandwidth, and network latency, the browser may incrementally execute the steps of the rendering path while it downloads the rest of the HTML file on a separate thread (Hiwarale 2019).
When the browser encounters a link, video, or img tag referencing an external file, it downloads the file on a separate thread (Hiwarale 2019). Parsing on the main thread continues uninterrupted.
When the browser encounters CSS, whether inline on an HTML tag, between style tags, or in an external file referenced by a link tag, it begins building the CSS Object Model (CSSOM). The CSSOM is a tree structure containing the HTML nodes capable of appearing on the screen (Hiwarale 2019). This means tags like link, script, and title do not exist in the CSSOM.
Each CSSOM node contains all CSS properties applicable to the node. If our CSS does not set a property directly (or indirectly, when inherited from a parent node), the property gets a default value from the browser’s user agent stylesheet.
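The lookup order described above (a directly set value, then an inherited value, then the browser default) can be sketched as a small function. This is a simplified model, not how any browser implements it: the StyleNode type, the resolveProperty function, and the sample defaults and inherited-property list are all assumptions made for illustration.

```typescript
// Simplified, hypothetical model; real browsers track hundreds of properties.
type Styles = Record<string, string>;

interface StyleNode {
  tag: string;
  declared: Styles; // values set directly by the page's CSS
  parent: StyleNode | null;
}

// Assumed sample values, not a real user agent stylesheet.
const USER_AGENT_DEFAULTS: Styles = {
  color: "black",
  display: "inline",
  "font-size": "16px",
};

// Assumed subset of the properties that inherit from the parent node.
const INHERITED_PROPERTIES: string[] = ["color", "font-size"];

function resolveProperty(node: StyleNode, property: string): string {
  // 1. A value set directly on the node wins.
  const own = node.declared[property];
  if (own !== undefined) {
    return own;
  }
  // 2. Inherited properties fall back to the parent's resolved value.
  if (node.parent !== null && INHERITED_PROPERTIES.indexOf(property) !== -1) {
    return resolveProperty(node.parent, property);
  }
  // 3. Everything else gets the (assumed) user agent default.
  const fallback = USER_AGENT_DEFAULTS[property];
  return fallback !== undefined ? fallback : "";
}

const body: StyleNode = { tag: "body", declared: { color: "navy" }, parent: null };
const paragraph: StyleNode = { tag: "p", declared: { display: "block" }, parent: body };

const paragraphColor = resolveProperty(paragraph, "color"); // inherited from body
const paragraphDisplay = resolveProperty(paragraph, "display"); // set directly
const paragraphFontSize = resolveProperty(paragraph, "font-size"); // browser default
const bodyDisplay = resolveProperty(body, "display"); // not inherited, browser default
```

Every property resolves to some value, which is why each node in this model ends up with a complete set of styles even when the page's CSS sets only a few of them.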
Within a block of CSS, rules appearing later may override rules defined earlier. Therefore, the browser parses an external CSS file in full, even when the network is slow (Hiwarale 2019). This makes CSS render-blocking: the browser cannot determine which styles to render until it has parsed the entire CSS file.
One strategy to prevent CSS from blocking rendering or scripting is to set the media attribute on an external stylesheet. For example, a link tag with media="(min-width: 1080px)" indicates that the stylesheet applies only to viewports with a width of at least 1,080 pixels. On smaller screens, the browser still downloads the stylesheet, but at a lower priority and without blocking rendering or scripting, because the CSS does not apply to the current screen size.
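As a rough sketch of this strategy, the snippet below builds the link tag for a stylesheet and models which stylesheets would block rendering at a given viewport width. The StylesheetLink type, the toLinkTag and renderBlockingAt functions, and the file names base.css and large.css are hypothetical; only min-width conditions are modeled, whereas real media queries are far richer.

```typescript
interface StylesheetLink {
  href: string;
  minWidthPx?: number; // hypothetical field modeling media="(min-width: Npx)"
}

// Produces the markup a page would contain for this stylesheet.
function toLinkTag(sheet: StylesheetLink): string {
  const media =
    sheet.minWidthPx === undefined ? "" : ` media="(min-width: ${sheet.minWidthPx}px)"`;
  return `<link rel="stylesheet" href="${sheet.href}"${media}>`;
}

// A stylesheet blocks rendering only if its media condition matches the viewport.
function isRenderBlocking(sheet: StylesheetLink, viewportWidthPx: number): boolean {
  return sheet.minWidthPx === undefined || viewportWidthPx >= sheet.minWidthPx;
}

function renderBlockingAt(sheets: StylesheetLink[], viewportWidthPx: number): string[] {
  return sheets
    .filter((sheet) => isRenderBlocking(sheet, viewportWidthPx))
    .map((sheet) => sheet.href);
}

const sheets: StylesheetLink[] = [
  { href: "base.css" },
  { href: "large.css", minWidthPx: 1080 },
];

const largeTag = toLinkTag(sheets[1]);
// <link rel="stylesheet" href="large.css" media="(min-width: 1080px)">

const blockingOnPhone = renderBlockingAt(sheets, 375); // only base.css
const blockingOnDesktop = renderBlockingAt(sheets, 1440); // base.css and large.css
```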
When the browser encounters a script tag, it stops parsing until it downloads, parses, and executes the script. HTML parsing pauses because scripts commonly alter the DOM. If a script altered the DOM on the main thread while HTML parsing continued on a separate thread, race conditions would make the resulting DOM unpredictable (Hiwarale 2019).
Adding a defer or async attribute to a script tag allows the browser to continue parsing on the main thread while the script downloads on a separate thread. According to the Mozilla Developer Network (2022b):
The defer attribute instructs the browser to download the script on a separate thread and defer executing the script until it finishes parsing all HTML. The browser waits to fire the DOMContentLoaded event until it executes all deferred scripts. Once the browser constructs the entire DOM, the deferred scripts execute in the order their script tags appeared in the HTML. The defer attribute has no effect on inline scripts. It also has no effect on module scripts, because module scripts defer by default.
The async attribute instructs the browser to continue parsing HTML on the main thread while the script downloads. As soon as the download finishes, the browser pauses parsing HTML to execute the script on the main thread. The async attribute has no effect on inline scripts. Unlike deferred scripts, async scripts do not delay the DOMContentLoaded event: it fires as soon as the browser finishes parsing the HTML, even if async scripts are still downloading. Because the browser executes async scripts as soon as they finish downloading, these scripts might not execute in the order they appear in the HTML.
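The ordering rules for the two attributes can be illustrated with a small simulation. This is a simplified model under stated assumptions: download completion times and the parsing-finished time are given as inputs, script execution is treated as instantaneous, and the ExternalScript type, the executionOrder function, and the file names are hypothetical.

```typescript
type ScriptMode = "defer" | "async";

interface ExternalScript {
  name: string;
  mode: ScriptMode;
  downloadDoneMs: number; // assumed time at which the download completes
}

interface ScheduledRun {
  name: string;
  atMs: number;
  documentOrder: number;
}

// Returns script names in the order this simplified model would execute them.
function executionOrder(scripts: ExternalScript[], parsingDoneMs: number): string[] {
  const runs: ScheduledRun[] = [];
  let earliestDeferMs = parsingDoneMs;

  scripts.forEach((script, index) => {
    if (script.mode === "async") {
      // Async: runs as soon as its own download finishes.
      runs.push({ name: script.name, atMs: script.downloadDoneMs, documentOrder: index });
    } else {
      // Defer: waits for parsing to finish and for every earlier deferred script.
      earliestDeferMs = Math.max(earliestDeferMs, script.downloadDoneMs);
      runs.push({ name: script.name, atMs: earliestDeferMs, documentOrder: index });
    }
  });

  // Sort by start time; ties keep document order.
  runs.sort((a, b) => a.atMs - b.atMs || a.documentOrder - b.documentOrder);
  return runs.map((run) => run.name);
}

const order = executionOrder(
  [
    { name: "analytics.js", mode: "async", downloadDoneMs: 300 },
    { name: "vendor.js", mode: "defer", downloadDoneMs: 400 },
    { name: "app.js", mode: "defer", downloadDoneMs: 100 },
    { name: "ads.js", mode: "async", downloadDoneMs: 50 },
  ],
  200 // parsing finishes at 200 ms
);
// ads.js, analytics.js, vendor.js, app.js
```

In this run, app.js finishes downloading long before vendor.js but still executes after it, because deferred scripts keep document order. The two async scripts execute in download order, which is the reverse of their order in the HTML.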
If the browser repeatedly executes the same function, it compiles the function into optimized machine code using type information inferred from previous calls (Hinkelmann 2017). As long as the function’s parameters maintain the same types, this optimized code executes faster during future calls. If the parameter types change in a subsequent call, the browser de-optimizes the function back to bytecode (Hinkelmann 2017).
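The difference between type-stable and type-changing calls can be sketched as follows. The add and addLoose functions are hypothetical, and whether an engine actually optimizes or de-optimizes them is an internal decision that this code cannot observe; the sketch only shows the two calling patterns.

```typescript
// Every call passes numbers, so the engine always sees the same parameter types.
function add(a: number, b: number): number {
  return a + b;
}

// Loosely typed variant: nothing stops callers from changing the parameter types.
function addLoose(a: any, b: any): any {
  return a + b;
}

// Type-stable hot loop: a candidate for optimized machine code.
let stableTotal = 0;
for (let i = 0; i < 10000; i++) {
  stableTotal = add(stableTotal, 1);
}

// The same function called with numbers, then with strings. A call pattern
// like this is what can trigger the de-optimization described above.
const numericResult = addLoose(1, 2); // 3
const stringResult = addLoose("1", "2"); // "12"
```

One practical consequence is that static type annotations, as in add, make it harder to change parameter types between calls by accident.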
When the browser waits for a file to download on the main thread, it performs speculative parsing on a separate thread, looking for external resources later in the document (Hiwarale 2019). On that separate thread, the browser downloads these resources so they will be available later to the parser on the main thread. A speculatively downloaded resource may go unused if a script invalidates it by removing or hiding the element that references the external resource.
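A rough sketch of what a speculative scan looks for is shown below. Browsers use a real HTML tokenizer for this, not a regular expression, so the scanForResources function is only a hypothetical stand-in that handles double-quoted src and href attributes on a few tags.

```typescript
// Collects the URLs of external resources referenced in the markup, in document order.
function scanForResources(html: string): string[] {
  const pattern = /<(?:img|script|video)\b[^>]*\bsrc="([^"]+)"|<link\b[^>]*\bhref="([^"]+)"/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(html);
  while (match !== null) {
    // Group 1 holds a src value, group 2 an href value.
    const url = match[1] || match[2];
    if (url) {
      urls.push(url);
    }
    match = pattern.exec(html);
  }
  return urls;
}

const markup =
  '<link rel="stylesheet" href="a.css">' +
  '<script src="app.js"></script>' +
  "<p>Hello</p>" +
  '<img src="hero.png">';

const discovered = scanForResources(markup);
// a.css, app.js, hero.png
```

While the main thread is blocked on app.js, a scan like this would already have found hero.png, so its download can start before the parser reaches the img tag.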
When the browser finishes parsing the HTML and executes all parser-blocking and deferred scripts, it fires the DOMContentLoaded event (MDN contributors 2022c). When the browser finishes downloading all external files, such as stylesheets, images, videos, and scripts, it fires the load event on the window object (MDN contributors 2022d).
Once the browser finishes constructing the DOM and CSSOM, it combines them to generate the render tree (MDN contributors 2022a). The browser traverses every node of the DOM and, using the CSSOM, determines which CSS rules to attach to each node. The render tree contains only nodes that take up space on the page, so the browser does not include DOM nodes with display:none. It will contain nodes with visibility:hidden or opacity:0 because these nodes occupy space.
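The filtering rule above can be sketched as a recursive walk over a simplified DOM. The DomNode and RenderNode types and the buildRenderTree function are hypothetical, and style matching is assumed to have already happened, so each node carries its final style values.

```typescript
interface DomNode {
  tag: string;
  style: { display?: string; visibility?: string; opacity?: string };
  children: DomNode[];
}

interface RenderNode {
  tag: string;
  children: RenderNode[];
}

// Copies the DOM into a render tree, skipping display:none nodes and their subtrees.
function buildRenderTree(node: DomNode): RenderNode | null {
  if (node.style.display === "none") {
    return null;
  }
  const children: RenderNode[] = [];
  for (const child of node.children) {
    const rendered = buildRenderTree(child);
    if (rendered !== null) {
      children.push(rendered);
    }
  }
  // visibility:hidden and opacity:0 nodes are kept because they still occupy space.
  return { tag: node.tag, children };
}

const dom: DomNode = {
  tag: "body",
  style: {},
  children: [
    { tag: "h1", style: {}, children: [] },
    {
      tag: "div",
      style: { display: "none" },
      children: [{ tag: "p", style: {}, children: [] }],
    },
    { tag: "span", style: { visibility: "hidden" }, children: [] },
    { tag: "img", style: { opacity: "0" }, children: [] },
  ],
};

const renderTree = buildRenderTree(dom);
const renderedTags = renderTree === null ? [] : renderTree.children.map((child) => child.tag);
// h1, span, img
```

The p element is absent even though it has no display:none of its own, because its parent div was skipped along with its whole subtree.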
Having constructed the render tree, the browser moves to the layout step. During layout, the browser determines how to position nodes on the page. This consists of calculating the width and height of each node and its position relative to other nodes (MDN contributors 2022a). Nodes in the render tree commonly overlap one another, so the browser creates layers to draw nodes in the correct stacking order (Hiwarale 2019).
The available width and height for layout depend on the viewport meta tag; without it, the browser sets the viewport according to the user agent stylesheet, usually to a width of 960 px (Hiwarale 2019).
Because layout occurs toward the end of the rendering path, many events cause the browser to recalculate it. These include:
Adding, removing, or moving a DOM node
Adding or removing a stylesheet
Changing a node’s class or style attributes
Resizing or changing the orientation of the viewport
Calculating layout metrics, such as offsetWidth or offsetHeight (Irish 2020)
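The last item is easy to trigger by accident: reading a layout metric right after a style change forces the browser to recalculate layout immediately. The simulation below counts those recalculations for two ways of doubling the width of several elements. FakeDocument and FakeElement are hypothetical stand-ins for the real DOM, where readOffsetWidth models reading offsetWidth and setWidth models writing a style.

```typescript
// Tracks whether layout is out of date and how often it was recalculated.
class FakeDocument {
  layoutCount = 0;
  private dirty = false;

  invalidate(): void {
    this.dirty = true;
  }

  // Recalculates layout only if something changed since the last calculation.
  flush(): void {
    if (this.dirty) {
      this.layoutCount += 1;
      this.dirty = false;
    }
  }
}

class FakeElement {
  private doc: FakeDocument;
  private widthPx: number;

  constructor(doc: FakeDocument, widthPx: number) {
    this.doc = doc;
    this.widthPx = widthPx;
  }

  // Writing a style only marks layout as out of date.
  setWidth(widthPx: number): void {
    this.widthPx = widthPx;
    this.doc.invalidate();
  }

  // Reading a layout metric forces any pending layout work to run first.
  readOffsetWidth(): number {
    this.doc.flush();
    return this.widthPx;
  }
}

// Read, write, read, write: each read after a write forces a new layout.
function doubleWidthsInterleaved(elements: FakeElement[]): void {
  for (const element of elements) {
    element.setWidth(element.readOffsetWidth() * 2);
  }
}

// All reads first, then all writes: layout is recalculated once.
function doubleWidthsBatched(elements: FakeElement[]): void {
  const pending = elements.map((element) => ({ element, width: element.readOffsetWidth() }));
  pending.forEach((item) => item.element.setWidth(item.width * 2));
}

function countLayouts(strategy: (elements: FakeElement[]) => void): number {
  const doc = new FakeDocument();
  const elements = [100, 200, 300].map((width) => new FakeElement(doc, width));
  strategy(elements);
  doc.flush(); // the layout performed before the next paint
  return doc.layoutCount;
}

const interleavedLayouts = countLayouts(doubleWidthsInterleaved); // 3
const batchedLayouts = countLayouts(doubleWidthsBatched); // 1
```

In this model the interleaved version recalculates layout once per element, while the batched version does so once in total, which is why grouping reads before writes is a common way to reduce layout work.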
With the layout calculated, the browser is ready to paint pixels to the screen. The browser paints each layer separately, beginning with the bottom layer and making its way to the top layer (Hiwarale 2019). As it paints each layer, the browser fills individual pixels based on the visible properties of each layer, a process known as rasterization (Hiwarale 2019). The output of this operation is a collection of bitmap images (Hiwarale 2019). The browser sends these bitmaps to the GPU to draw them on the screen.
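Painting from the bottom layer to the top can be sketched with a one-dimensional row of pixels, where each later layer overwrites whatever lies beneath it. This is a deliberately reduced model: the Layer type, the paintRow function, and the use of a single character per pixel are assumptions for illustration, and it paints into one shared buffer instead of rasterizing a separate bitmap per layer.

```typescript
interface Layer {
  name: string;
  zIndex: number; // assumed stacking order: higher values sit on top
  x: number; // left edge, in pixels
  width: number;
  fill: string; // one character standing in for a pixel color
}

// Paints layers bottom to top into a single row of pixels.
function paintRow(layers: Layer[], canvasWidth: number): string {
  const pixels: string[] = [];
  for (let i = 0; i < canvasWidth; i++) {
    pixels.push("."); // unpainted background
  }

  // Copy before sorting so the caller's array is left untouched.
  const bottomToTop = layers.slice().sort((a, b) => a.zIndex - b.zIndex);

  for (const layer of bottomToTop) {
    const end = Math.min(layer.x + layer.width, canvasWidth);
    for (let x = Math.max(layer.x, 0); x < end; x++) {
      pixels[x] = layer.fill; // later (higher) layers overwrite earlier ones
    }
  }
  return pixels.join("");
}

const row = paintRow(
  [
    { name: "modal", zIndex: 10, x: 3, width: 4, fill: "M" },
    { name: "page", zIndex: 0, x: 0, width: 10, fill: "p" },
    { name: "sidebar", zIndex: 1, x: 0, width: 5, fill: "s" },
  ],
  10
);
// sssMMMMppp
```

The modal is listed first but painted last, so it covers parts of both the sidebar and the page.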
After painting the initial page, the browser tries to optimize updates during future paints. To update the screen, the browser calculates the difference from what it has already painted and only repaints the changed portion (Hiwarale 2019). To highlight areas of the page the browser is painting, enable Rendering > Paint flashing in Chrome DevTools.
To render a webpage’s content, the browser walks through the steps of parsing, constructing the render tree, performing layout, and painting. By knowing what work the browser performs in each step, we can develop strategies to optimize the critical rendering path.