Use HTML Caching to Increase Page Speed

Heavy traffic to a website can result in performance problems, slower page speed, and fewer conversions.

In this article, I’ll share a simple process to speed up page loads. The idea is to cache not just images and scripts, but also the HTML content. After all, if the HTML takes too long to load, it will delay every other request.

Note the screenshot below from Macy’s. It shows the effect of Macys.com not caching the HTML of the home page in its content delivery network. This adds one second of page load time. Every other page resource will load after the HTML content is downloaded and parsed. The page took 16.84 seconds to load, which is slow.

Macys.com is not caching its home page, leading to an extra second of load time.

Now, consider Gap.com, which I addressed in March as an example of a slow site. Since then, The Gap has launched a new website that caches HTML content in its CDN. The HTML request now adds just 78.91 milliseconds, and the home page loads in 3.60 seconds, which is much better than the 15 to 20 seconds it took in March.

The Gap is caching HTML content, leading to faster page loads.

Caching HTML content on ecommerce websites, and dynamic websites in general, is tricky. It doesn’t happen by default in a CDN. Most CDNs normally cache only static page resources such as images, style sheets, and scripts.

Dynamic vs. Static Content

For sites with static page content — i.e., not personalized in any way — page caching creates no problems. But for sites with dynamic content that changes among users, caching HTML content could create errors.

For example, a visitor who adds products to a shopping cart changes the content on all pages to show the number of items in the cart. If an ecommerce merchant cached that visitor’s pages, other users would see an inaccurate number of items in their carts. This concept applies to any type of personalization.

There are at least two solutions to the problem.

  • Implement web page personalization in separate JavaScript files and don’t cache them, or cache them for a short period.
  • Cache HTML only for anonymous users, meaning users who are not logged in and haven’t added any products to their cart.

Let’s review each option.

Personalization in Scripts

The first option, implementing personalization in separate JavaScript files, is what The Gap is doing.

The Gap uses scripts for user personalization so it can still cache the page’s HTML.

(To confirm The Gap’s approach, I disabled JavaScript in my Chrome browser at View > Developer > Developer Tools. Then, I clicked on the three dots to the far right, and selected “Settings.” “Disable JavaScript” is under the “Debugger” preference.)

Implementing user personalization in scripts allows caching of the page’s HTML. Then the scripts can modify the page after loading asynchronously.
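As a rough illustration of this first option, here is a minimal Nginx sketch. It is not The Gap's actual configuration, which the article does not show. The script path /scripts/personalization/, the upstream name backend, and the cache zone my_cache are all assumptions made for the example.

```nginx
# Fragment for a server { } block. Assumes a cache zone named my_cache has been
# declared with proxy_cache_path and that an upstream named "backend" exists.

# Personalized scripts (hypothetical path): always fetched from the application.
location /scripts/personalization/ {
    proxy_pass http://backend;
    proxy_cache off;
    # Replace any upstream Cache-Control so browsers and CDNs do not keep a copy.
    proxy_hide_header Cache-Control;
    add_header Cache-Control "private, no-store";
}

# Everything else, including the page HTML, can be cached.
location / {
    proxy_pass http://backend;
    proxy_cache my_cache;
    # Keep successful responses for 10 minutes (an assumed lifetime).
    proxy_cache_valid 200 10m;
}
```

To cache the scripts for a short period instead, the first block could drop "proxy_cache off" and set a short proxy_cache_valid time, provided the cache key separates users.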

Beyond using JavaScript for personalization, The Gap is caching HTML. How do I know this? Gap.com sets a commonly used (though nonstandard) HTTP response header, x-cache-status, to report the status of cached resources. In the image below, the caching status of the home page’s HTML says “EXPIRED.”

We can tell Gap.com is caching HTML by looking at the documentation for EXPIRED from Nginx, the web server.

The documentation for Nginx (Gap.com’s server) states that EXPIRED means: “The entry in the cache has expired. The response contains fresh content from the origin server.”

After refreshing the page, the x-cache-status changed to HIT.

I refreshed the page a bit later, and the x-cache-status changed to “HIT,” meaning the HTML was served from the CDN’s cache. The page loaded much faster.
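For readers who want the same visibility on their own Nginx server, the sketch below shows one way to expose the cache status in a response header. The cache path, zone name, sizes, lifetime, and upstream address are illustrative assumptions, not values taken from Gap.com.

```nginx
# Fragment for the http { } context of nginx.conf.
# The path, zone name, sizes, and upstream address are illustrative assumptions.
proxy_cache_path /var/cache/nginx/my_cache levels=1:2 keys_zone=my_cache:10m
                 max_size=1g inactive=60m;

upstream backend {
    server 127.0.0.1:8080;  # hypothetical application server
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_cache my_cache;
        # Keep successful responses for 10 minutes (an assumed lifetime).
        proxy_cache_valid 200 10m;
        # Report the cache result (MISS, HIT, EXPIRED, BYPASS, and so on) to the client.
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

With something like this in place, the first request for a page would typically report MISS and a repeat request within the lifetime would report HIT. Header names are case-insensitive, so browsers may display it as x-cache-status.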

Anonymous Users

The Gap launched a new website that utilized the latest technologies. If, however, you need to cache HTML on an existing ecommerce platform, the anonymous user option might work better.

This technique is known as “punching a hole” in the cache. It works in the following way.

The web server or CDN caches every page but skips any request that meets the exclusion criteria. The most common criterion is a session cookie that the application sets when users log in or add items to the cart. The cookie is necessary to track each user individually.

Here are some sample session cookies for popular ecommerce and content platforms.

Platform       Session Cookies (Wildcards * Mean Any Character)
WooCommerce    wp-.*|wordpress.*|comment_.*|woocommerce_.*
WordPress      wp-.*|wordpress.*|comment_.*
Magento 1      external_no_cache|PHPSESSID|adminhtml
Magento 2      admin|PHPSESSID|private_content_version
Drupal         SESS.*|phpsessid

Again, these are cookies for users who have personalized content, such as those who log in or add items to their carts. Because their pages are excluded from the cache, they will not see faster page loads. But they are likely a small percentage of total visitors. The rest will experience fast-loading pages.

Assume your site’s web server is Nginx, and Magento 2 powers your store. Here is the configuration setting to enable caching for anonymous users.

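location / {
    # Use the cache zone named my_cache (declared with proxy_cache_path in the http block).
    proxy_cache my_cache;
    # Do not answer from the cache when any of these cookies is set.
    proxy_cache_bypass $cookie_admin $cookie_PHPSESSID $cookie_private_content_version;
    # Do not store responses to those requests in the cache either.
    proxy_no_cache $cookie_admin $cookie_PHPSESSID $cookie_private_content_version;
    # ...
}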
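The $cookie_NAME variables used above need exact cookie names, so they cannot express the wildcard patterns listed for WooCommerce, WordPress, or Drupal. One possible approach, sketched here for the WooCommerce row, is a map over the whole Cookie header. The variable name $skip_cache, the upstream backend, and the zone my_cache are assumptions for the example.

```nginx
# Fragment for the http { } context. Sets $skip_cache to 1 when the Cookie
# header contains any WooCommerce or WordPress session-cookie prefix.
map $http_cookie $skip_cache {
    default 0;
    "~(wp-|wordpress|comment_|woocommerce_)" 1;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;        # hypothetical upstream
        proxy_cache my_cache;             # zone assumed declared via proxy_cache_path
        # A value of 0 leaves caching on; 1 skips both lookup and storage.
        proxy_cache_bypass $skip_cache;
        proxy_no_cache $skip_cache;
    }
}
```

The regular expression matches anywhere in the Cookie header, including cookie values, so it may occasionally skip the cache for an anonymous visitor. That errs on the safe side: a false match costs speed, not correctness.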

 
Enabling this on a web server or load balancer will increase performance. But the greatest benefit would come from implementing this on the CDN layer.

Here is how to do this for popular CDNs. Be sure to confirm with the CDN, however.

Finally, for some sites it is not possible to find cookies to bypass. In those instances, we can explicitly cache key pages such as the home page, primary category pages, product listing pages, and product detail pages. A disadvantage of this approach is that the rules must be updated for new pages and categories.
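On Nginx, explicit caching of key pages might look like the sketch below. The URL prefixes /category/ and /product/, the upstream backend, the zone my_cache, and the five-minute lifetime are hypothetical, since real paths depend on the platform.

```nginx
# Fragment for a server { } block. Assumes a cache zone named my_cache has been
# declared with proxy_cache_path and that an upstream named "backend" exists.

# Home page only (exact match).
location = / {
    proxy_pass http://backend;
    proxy_cache my_cache;
    proxy_cache_valid 200 5m;
}

# Hypothetical category and product URL prefixes.
location ~ ^/(category|product)/ {
    proxy_pass http://backend;
    proxy_cache my_cache;
    proxy_cache_valid 200 5m;
}

# Everything else (cart, checkout, account) goes to the application uncached.
location / {
    proxy_pass http://backend;
}
```

Each new section of the site needs its own entry in the regular expression or a new location block, which is the maintenance cost mentioned above.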

Hamlet Batista
