SEO: Optimizing Crawl Budget for Ecommerce Sites

Optimizing an ecommerce site for crawl budget can affect how often Google’s web crawler visits a given page, which in turn can mean that new or updated content appears more quickly in Google search results.

For Google, crawl budget describes the number of pages on a particular site that the company’s search spider, Googlebot, can and wants to crawl.

Remember, the pages that Googlebot crawls can be saved, indexed, and ranked. Those pages can then appear in Google’s search results.

Note that how often Googlebot crawls a web page does not directly affect how well that page ranks for a given search query. But optimizing for crawl budget may guide Googlebot to the most important content on a site. This, in turn, may affect how well some of those pages rank, especially pages that would otherwise not have been indexed at all.

How Crawl Budget Is Determined

In 2017, Google’s Gary Illyes described how Google determines crawl budget for a specific website. His explanation had three parts: crawl limit, crawl demand, and other factors.

Crawl limit. Google doesn’t want to overwhelm a site or its server. In this way, “Googlebot is designed to be a good citizen of the web. Crawling is its main priority while making sure it doesn’t degrade the experience of users visiting the site. We call this the ‘crawl rate limit,’ which limits the maximum fetching rate for a given site,” Illyes wrote.

If Googlebot sees indications that it is impacting a site’s performance, it will slow down, effectively visiting pages on the site less frequently. This may mean that some pages are not indexed at all. Conversely, if Googlebot is getting fast responses from the server, it may increase the frequency and intensity of its visits.

Crawl demand. “Even if the crawl rate limit isn’t reached, if there’s no demand from indexing, there will be low activity from Googlebot,” wrote Illyes.

“Demand from indexing” can take a couple of forms. First, for popular websites, Google wants to ensure that it has indexed the most recent and up-to-date content. Second, Google doesn’t want a stale index. So if it has been a while since Googlebot visited a site, even if it’s not popular, there could be relatively greater crawl demand.

Other factors. Content quality and site structure also matter. Illyes suggested avoiding low-quality content, certain kinds of faceted navigation, duplicate content, and similar problems.

“Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site,” wrote Illyes.

For example, a popular supplement retailer may be experiencing this specific problem now. The company has a massive user forum with millions of URLs. This forum is mostly low-value content, but it consumes a significant portion of this particular ecommerce company’s estimated crawl budget.
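
One rough way to see whether a section such as a forum is absorbing crawl activity is to count Googlebot requests by top-level site section. The sketch below assumes you have already pulled the request paths for Googlebot out of your server logs; the `crawl_share` function and the sample paths are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter
from urllib.parse import urlsplit


def crawl_share(paths):
    """Return each top-level section's share of the crawled paths."""
    counts = Counter()
    for path in paths:
        # Drop any query string, then take the first path segment.
        segments = [s for s in urlsplit(path).path.split("/") if s]
        counts[segments[0] if segments else "(home)"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {section: n / total for section, n in counts.items()}


# Hypothetical paths requested by Googlebot, taken from a log extract.
googlebot_paths = [
    "/forum/thread/1",
    "/forum/thread/2?page=3",
    "/forum/user/9",
    "/products/whey-protein",
    "/",
]

shares = crawl_share(googlebot_paths)
for section, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{section}: {share:.0%}")
```

If a low-value section accounts for most of the requests, it is a candidate for the prioritization steps described later in this article.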

Large Sites

Crawl budget limits affect relatively few websites. Google Webmaster Trends Analyst John Mueller wrote in a tweet that “most sites never need to worry about this. It’s an interesting topic, and if you’re crawling the web or running a multi-billion-URL site, it’s important, but for the average site owner less so.”

Thus, crawl budget may be essential for sites such as Lands’ End, Bodybuilding.com, Walmart, or other established ecommerce enterprises. These sites may experience drops in organic traffic if they have crawl budget problems.

Nonetheless, owners and managers of all ecommerce sites, regardless of size, should be aware of crawl budget.

SEO Benefits

Fortunately, the methods used to optimize for crawl budget are also helpful for search engine optimization. By focusing on the things that can improve crawl budget, your company is also implementing measures that can improve organic rankings.

What follows are five tips that could help your ecommerce site optimize for crawl budget. All of these are also beneficial for SEO, even if your site doesn’t have a crawl budget issue.

Prioritize what Googlebot crawls. This may mean blocking portions of your site from Googlebot. For example, does the customer forum mentioned above really need to be on Google? If not, blocking it could free crawl budget for product pages and so increase the number of them included in Google’s index. Thus, robots.txt rules and nofollow attributes on links might go a long way toward crawl budget optimization.
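
As a minimal sketch of this idea, the robots.txt rules below keep Googlebot out of a forum while leaving product pages crawlable, and Python’s standard `urllib.robotparser` module checks that the rules behave as intended before they go live. The `/forum/` path, the example.com URLs, and the helper names are assumptions for illustration, not the retailer’s actual setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block Googlebot from the forum, allow everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /forum/

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawlable(parser, urls, agent="Googlebot"):
    """Map each URL to True if the agent may fetch it under the rules."""
    return {url: parser.can_fetch(agent, url) for url in urls}


parser = build_parser(ROBOTS_TXT)
report = crawlable(parser, [
    "https://www.example.com/forum/thread/123",
    "https://www.example.com/products/whey-protein",
])
for url, allowed in report.items():
    print("allowed" if allowed else "blocked", url)
```

Keep in mind that robots.txt controls crawling, not indexing, so a blocked URL can still appear in search results if other pages link to it.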

Have an excellent sitemap. In 2018, Illyes said that sitemaps were one of the main ways Google discovered URLs. This doesn’t guarantee that Google will crawl or index a particular page, but a sitemap can help. Your sitemaps should be consistent; they should help to eliminate indexation problems; and for large sites, they should be dynamic, according to SEO consultant Michael Cottam.
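
A “dynamic” sitemap is usually one generated from the product catalog rather than maintained by hand. Here is a minimal sketch, assuming the catalog can supply each page’s URL and last-modified date; the `build_sitemap` function and the sample pages are hypothetical, and a real site would pull these rows from its database.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from an iterable of (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod uses the W3C date format, e.g. 2024-01-15.
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical catalog rows; the URLs and dates are placeholders.
catalog = [
    ("https://www.example.com/products/whey-protein", date(2024, 1, 15)),
    ("https://www.example.com/products/creatine", date(2024, 2, 3)),
]

sitemap_xml = build_sitemap(catalog)
print(sitemap_xml)
```

Regenerating the file whenever products are added, changed, or removed keeps the sitemap consistent with what is actually on the site.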

Reduce errors and redirect chains. Crawl budget optimization is often about clean technical SEO. When it visits a page on your site, Googlebot should receive a status code 200 (meaning the request succeeded) or a permanent redirect code 301. Make sure, however, that one redirect does not lead to another in a chain, since each extra hop costs Googlebot another request.
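
To illustrate the check, the sketch below traces redirects through a set of recorded crawl results and labels each starting URL. It assumes you already have a mapping of URL to status code and redirect target, for example from a site crawler export; the `trace` and `classify` functions and the sample data are hypothetical.

```python
REDIRECTS = (301, 302, 307, 308)


def trace(url, responses, max_hops=10):
    """Follow redirects from url; return the list of (url, status) hops."""
    hops = []
    seen = set()
    while url is not None and url not in seen and len(hops) < max_hops:
        seen.add(url)
        # URLs missing from the crawl data are treated as 404s.
        status, location = responses.get(url, (404, None))
        hops.append((url, status))
        url = location if status in REDIRECTS else None
    return hops


def classify(hops):
    """Label a traced path as ok, redirect, chain, or error."""
    redirect_count = sum(1 for _, status in hops if status in REDIRECTS)
    if hops[-1][1] != 200:
        return "error"  # ends on a non-200 status, including loops
    if redirect_count > 1:
        return "chain"
    return "redirect" if redirect_count == 1 else "ok"


# Hypothetical crawl results: url -> (status code, redirect target).
responses = {
    "/old-shoes": (301, "/shoes-2019"),
    "/shoes-2019": (301, "/shoes"),
    "/shoes": (200, None),
    "/sale": (301, "/shoes"),
    "/gone": (404, None),
}

labels = {url: classify(trace(url, responses)) for url in responses}
for url, label in labels.items():
    print(label, url)
```

Any URL labeled “chain” is a candidate for pointing its first redirect straight at the final destination.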

Improve site performance. “According to Google’s crawling patents, the robot budget is exactly matched to a given server’s performance. In other words, if Googlebot is intensively crawling servers and its efficiency is declining, then the bot slows down…in such a situation, the number of crawled URLs during a given period of time certainly decreases,” wrote Wojciech Murawski, a senior SEO specialist at Onely.

Update often. An ecommerce website should be regularly updated and pruned. There should be additions to product detail pages, new blog posts, and updates to aging content.

Crawl Budget Resources

Optimizing crawl budget can be complicated. If you manage a large ecommerce site, it is certainly worth some research. Here are resources to help.

Armando Roggio