How Does Google Find and Index Web Pages?

When a potential customer enters a query on Google, searching for the products your business sells, she is not searching the live internet. Rather, she is looking up web pages on Google’s index of the internet.

In a sense, this consumer is searching the recent, known web rather than the real-time, live web. So even before you worry about how well the pages on your ecommerce site will rank on Google, it’s important to understand how the Google search engine finds and indexes those pages.

Spiders and Sitemaps

Google uses two primary methods for finding ecommerce web pages: sitemaps and software called web spiders or crawlers.

A web spider downloads a copy of a given web page. Imagine for a moment that Googlebot (Google’s name for its web spider) lands on the “Checkerboard Slip-on” page of the Vans website.

This Vans product detail page includes many links. Googlebot will follow these links to discover other pages.

Googlebot will note the content on the page — product name, description, price, images — but it will also track the dozens of links on the page.

Then, unless the link itself or the site’s robots.txt file explicitly tells Googlebot not to follow it, the spider will follow each link and catalog what it finds. In the Vans example, this would take Googlebot through the site’s product catalog, to many of the site’s informational pages, including store locations and gift cards, and even to its content pages about skateboarding, snowboarding, and BMX.
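To make the robots.txt rule concrete, here is a minimal sketch using Python’s standard urllib.robotparser module. The robots.txt contents and the shop.example.com URLs are made up for illustration, and Google’s own parser may treat edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example store (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /checkout/
"""


def allowed_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt lets this user agent fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


# Hypothetical URLs a spider might find as links on a product page.
urls = [
    "https://shop.example.com/shoes/checkerboard-slip-on",
    "https://shop.example.com/cart/",
    "https://shop.example.com/checkout/payment",
]
crawlable = allowed_urls(ROBOTS_TXT, urls)
print(crawlable)
```

In this sketch only the product page remains crawlable, because the cart and checkout paths are disallowed for all user agents.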

Each time Googlebot encounters a link to a new page, it adds the URL to its list of pages to crawl. In this way, Googlebot can discover every page on the Vans website that is reachable through links.
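The discovery loop described here can be sketched as a simple breadth-first crawl. This is a toy illustration, not how Googlebot is actually implemented: the SITE dictionary stands in for a real website, the URLs are hypothetical, and a real crawler would add scheduling, politeness delays, and robots.txt checks.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href> tags, skipping rel="nofollow" links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        rel = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel:
            self.links.append(urljoin(self.base_url, href))


# Hypothetical in-memory site: URL -> HTML. Nothing is fetched over the network.
SITE = {
    "https://shop.example.com/": '<a href="/shoes/slip-on">Slip-on</a> <a href="/stores">Stores</a>',
    "https://shop.example.com/shoes/slip-on": '<a href="/">Home</a> <a href="/gift-cards">Gift cards</a>',
    "https://shop.example.com/stores": '<a href="/">Home</a>',
    "https://shop.example.com/gift-cards": '<a href="/account" rel="nofollow">Account</a>',
    "https://shop.example.com/orphan": "<p>No other page links here.</p>",
}


def discover(start_url, fetch):
    """Breadth-first crawl: follow links and queue each new URL exactly once."""
    seen = {start_url}
    frontier = deque([start_url])  # the spider's list of pages still to crawl
    crawled = []
    while frontier:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:  # page could not be retrieved
            continue
        crawled.append(url)
        extractor = LinkExtractor(url)
        extractor.feed(html)
        for link in extractor.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled


found = discover("https://shop.example.com/", SITE.get)
print(found)
```

Note that the /orphan page is never found in this sketch, because no other page links to it. That is the practical reason internal linking matters for discovery.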

So let’s apply what we now know to help Google discover the pages on your ecommerce site.

First, the better job your site does with internal links (via topic clusters, for example), the easier it is for Googlebot to find all of your pages.

Second, focus on getting other sites to link to your pages. Link building not only helps to boost your rankings in search results, but it can also help with page discovery.

Next, Google also uses sitemaps as a way to find ecommerce pages. A sitemap is a text or XML file that lists all of the pages you want Google to know about from your ecommerce website. You can submit the sitemap via Google Search Console.

Once submitted, the sitemap can help Google work its way through every page on your site. Just be aware that “using a sitemap doesn’t guarantee that all the items in your sitemap will be crawled and indexed, as Google processes rely on complex algorithms to schedule crawling. However, in most cases, your site will benefit from having a sitemap, and you’ll never be penalized for having one,” according to Google.
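For stores that need to generate a sitemap themselves, a minimal XML sitemap can be sketched with Python’s standard library. The build_sitemap helper and the URLs are hypothetical; the urlset, url, and loc elements and the namespace follow the public sitemaps.org protocol. Many ecommerce platforms produce this file automatically, so treat this as an illustration of the format.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return an XML sitemap string with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = page_url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages you want Google to know about.
sitemap_xml = build_sitemap([
    "https://shop.example.com/shoes/slip-on",
    "https://shop.example.com/gift-cards",
])
print(sitemap_xml)
```

The resulting text would typically be saved as sitemap.xml at the site root and then submitted in Google Search Console.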

In short, if you want Google to find your ecommerce pages, (i) develop a good internal linking strategy, (ii) encourage links from other sites to yours, and (iii) submit a sitemap.

Helping Google

As Googlebot works its way through your ecommerce website, it will also take into consideration the page title and the contents of important tags, such as headings. This is why so many SEO experts recommend putting keyword phrases in a page’s title and H1 tags.
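To check what a page exposes in these tags, a short script can pull out the title and H1 text. This is a rough sketch with Python’s built-in html.parser; the sample HTML is invented, and it says nothing about how Google weighs these fields.

```python
from html.parser import HTMLParser


class TitleAndH1Parser(HTMLParser):
    """Collect the text inside <title> and each <h1> element."""

    def __init__(self):
        super().__init__()
        self._current = None  # "title" or "h1" while inside that element
        self.title = ""
        self.h1 = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._current = tag
            if tag == "h1":
                self.h1.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1[-1] += data


# Hypothetical product page markup.
PAGE = """
<html><head><title>Checkerboard Slip-On Shoes | Example Store</title></head>
<body><h1>Checkerboard <span>Slip-On</span></h1><p>Classic canvas shoe.</p></body></html>
"""

page_parser = TitleAndH1Parser()
page_parser.feed(PAGE)
page_title = page_parser.title.strip()
page_h1 = [text.strip() for text in page_parser.h1]
print(page_title, page_h1)
```

Running a check like this across a catalog can surface pages with a missing title, an empty H1, or several competing H1 tags.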

Google puts some weight on structured data, particularly in the JSON-LD format. This structured data markup helps Google understand what sort of page it has in view and may contribute to indexing and ranking.

Structured data markup in the JSON-LD format helps Google properly index a page.
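As an illustration, the snippet below builds JSON-LD for a product page using the schema.org Product and Offer types. The product_json_ld helper, the product details, and the URLs are hypothetical, and Google’s documentation lists which properties it actually requires for rich results.

```python
import json


def product_json_ld(name, description, price, currency, image_url, page_url):
    """Return a <script> tag containing schema.org Product markup as JSON-LD."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "image": image_url,
        "url": page_url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
        },
    }
    # Escape "</" so the JSON cannot close the script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical product data for illustration only.
snippet = product_json_ld(
    name="Checkerboard Slip-On",
    description="Classic canvas slip-on shoe with a checkerboard print.",
    price=59.99,
    currency="USD",
    image_url="https://shop.example.com/images/slip-on.jpg",
    page_url="https://shop.example.com/shoes/slip-on",
)
print(snippet)
```

The generated script tag would be placed in the HTML of the product page it describes.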

Ultimately, Google is trying to figure out what your page is about. The better job you do of making it clear and easy to understand, the more likely that Google will properly index your ecommerce pages. And properly indexed pages are what appear in search results when someone conducts a query.

To improve a page’s indexability, focus first on providing useful information for human visitors. Google wants to ensure that pages will be good for its searchers.

For example, don’t stuff a page with keywords or keyword phrases. Use Google’s guidelines for content and organization.

“As well as matching keywords, algorithms look for clues to measure how well potential search results give users what they are looking for,” Google explained. “When you search for ‘dogs’ you likely don’t want a page with the word ‘dogs’ on it hundreds of times. We try to figure out if the page contains an answer to your query and doesn’t just repeat your query. So Search algorithms analyze whether the pages include relevant content — such as pictures of dogs, videos, or even a list of breeds.”

Armando Roggio