Editor’s note: This post continues our weekly primer in SEO, touching on all the foundational aspects. By the end, you’ll be able to practice SEO more confidently and converse about its challenges and opportunities.
For every far-reaching strategy that can improve your site’s natural search performance, there is an equally important and far more mundane technical task that can derail it, often with a single character. Knowing the technical search-engine-optimization tools available and the reasons to use each can make all the difference.
This is the 12th installment in my “SEO How-to” series. Previous installments are:
- “Part 1: Why Do You Need It?”;
- “Part 2: Understanding Search Engines”;
- “Part 3: Staffing and Planning for SEO”;
- “Part 4: Keyword Research Concepts”;
- “Part 5: Keyword Research in Action”;
- “Part 6: Optimizing On-page Elements”;
- “Part 7: Mapping Keywords to Content”;
- “Part 8: Architecture and Internal Linking”;
- “Part 9: Diagnosing Crawler Issues”;
- “Part 10: Redesigns, Migrations, URL Changes”;
- “Part 11: Mitigating Risk.”
As with any toolbox, the implements inside can seem crude and even dangerous at first glance. Each tool has a purpose, and some can be used for only one thing.
For instance, a saw is handy when you need to cut wood, but it’s of no use if you need to tighten that tiny screw on the arm of your eyeglasses. A screwdriver can tighten that screw, but it can also be used in a pinch for prying things, punching holes, or tapping in nails.
Knowing whether your SEO tool is a saw or a screwdriver, what it was designed to accomplish, and what it can also do for you safely will enable you to nimbly meet the challenges of organic search traffic.
Stopping the Crawl
Crawl tools act as open or shut doors, determining whether search engine crawlers can access and index your pages. If you want to keep reputable search engines away from your site, use one of these tools. If you find you’re not receiving any natural search traffic, one of these tools may be the culprit.
- Robots.txt file. A text file, robots.txt is found at the root directory of your domain. For example, Practical Ecommerce’s file is at https://www.practicalecommerce.com/robots.txt. Robots.txt files tell bots like search engine crawlers which pages not to access by issuing a disallow command that reputable robots will obey. The file’s syntax consists of a user agent name — that’s the bot’s name — and a command to either allow or disallow access.
Asterisks can be used as wildcards to lend flexibility in handling groups of pages rather than listing each one individually. To prevent accidentally blocking search engines from accessing your site, always test changes to your robots.txt file in Google Search Console’s testing tool before going live.
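To illustrate, a minimal robots.txt file might look like the sketch below. The paths and domain are hypothetical examples, not recommendations for any particular site:

```text
# Applies to all reputable bots
User-agent: *
# Block checkout pages from crawling
Disallow: /checkout/
# Wildcard: block any URL containing a session ID parameter
Disallow: /*?sessionid=

# Point bots to the XML sitemap
Sitemap: https://www.example.com/sitemap.xml
```

Note that a stray character here can block an entire site: `Disallow: /` by itself tells every bot to stay away from everything, which is why testing before going live matters.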
- Meta robots tag. These tags can be applied to individual pages on your site to tell search engines whether or not to index each page. Use a NOINDEX value in the robots meta tag in the head of your page’s code to request that reputable search engines not index or rank a page.
Other attributes — NOFOLLOW, NOCACHE, and NOSNIPPET — are also available for use with the robots meta tag that determine the flow of link authority from the current page, whether to cache the page, and whether to display a snippet of it in search results, respectively. See “The Robots Exclusion Protocol,” a 2007 post on Google’s Official Blog, for more information.
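As an example, a page you want excluded from the index might carry a tag like this in its head (a sketch; the exact attributes you need depend on your goals):

```html
<!-- In the <head> of a page you do not want indexed.
     Reputable crawlers treat this as a request, not a guarantee. -->
<meta name="robots" content="noindex">
```

Multiple values can be combined in the content attribute, separated by commas, such as `content="noindex, nofollow"`.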
Driving Indexation
Indexation tools help guide search engines to the content you’d like to have indexed, to rank and drive traffic.
- XML sitemap. Unlike a traditional HTML sitemap that shoppers use to navigate the major pages of a site, the XML sitemap is a stark list of URLs and their attributes in the XML protocol that bots can use as a map to understand which pages you’d like to have indexed.
An XML sitemap does not guarantee indexation. It merely informs bots that a page exists and invites them to crawl it. Sitemaps can contain no more than 50,000 URLs each and a maximum of 50 MB of data. Larger sites can create multiple XML sitemaps and link them together with a sitemap index file for easy bot digestion. To help search engines find your XML sitemap, include a reference in your robots.txt file.
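A bare-bones XML sitemap follows the sitemaps.org protocol. The sketch below shows the structure with hypothetical URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2019-06-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/category/widgets</loc>
  </url>
</urlset>
```

A sitemap index file uses the same pattern, with `<sitemapindex>` and `<sitemap>` elements in place of `<urlset>` and `<url>`, each `<loc>` pointing to one of the child sitemaps.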
- Google Search Console. Once you have an XML sitemap, submit it to Google Search Console to request indexation. The Fetch as Googlebot tool is also available there for more targeted indexation of particularly valuable pages. Fetch allows you to request that Googlebot crawl a page, which can also be rendered so that you can see how it looks to Google. After it’s fetched and rendered, you can click an additional button to request the page’s indexation, as well as the indexation of all the pages to which that single page links.
- Bing Webmaster Tools. Bing also offers a tool to submit your XML sitemaps, and to index individual pages. Yahoo’s webmaster tools are no longer live, but since Bing feeds Yahoo’s search results, Bing Webmaster Tools serves both Bing and Yahoo.
Google Search Console and Bing Webmaster Tools each contain many more helpful tools.
Removing Duplicate Content
Duplicate content wastes crawl equity, slows the discovery of new content, and splits link authority, weakening the ability of the duplicated pages to rank and drive traffic. But duplicate content is also a fact of life: modern ecommerce platforms, as well as the tagging and tracking that marketing programs require, all contribute to the problem. Both canonical tags and 301 redirects can resolve duplicate content.
- 301 redirects. Implemented at the server level, the 301 redirect is a header status code, returned before a page loads, that signals to search engines that the requested page no longer exists. The 301 redirect is particularly powerful because it also tells search engines to transfer all the authority that the old page had gathered to the new page being redirected to.
301 redirects are incredibly useful in canonicalizing duplicate content, and are the preferred method for SEO whenever possible given technical resources and marketing needs. See Google’s Search Console Help page for more.
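On an Apache server, for instance, a 301 can be implemented with a single directive in the server configuration or .htaccess file. The paths below are hypothetical:

```text
# .htaccess: permanently redirect a retired page to its replacement
Redirect 301 /old-page.html https://www.example.com/new-page.html
```

Other servers use their own syntax (nginx, for example, uses `return 301` inside a location block), but the principle is the same: the server answers with a 301 status and the new location before any page content loads.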
- Canonical tags. Another form of metadata found in the head of a page’s code, a canonical tag tells search engine crawlers whether the page it is currently crawling is the canonical or “right” version of the page that it should index. The tag is a request, not a command, for search engines to index only the version identified as the canonical.
For instance, four exact duplicate versions of a page might exist: pages A, B, C, and D. A canonical tag could appear in all four pages that points to page A as the canonical version, and request that search engines please index only page A. The tag also requests that search engines attribute the link authority from all four versions of the page to the canonical: page A. See Google’s Search Console Help page for more.
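In code, the canonical tag is a single link element in the head of each version of the page. Continuing the example above, all four pages would carry the same tag pointing at page A (the URL is hypothetical):

```html
<!-- In the <head> of pages A, B, C, and D alike -->
<link rel="canonical" href="https://www.example.com/page-a">
```

Because the duplicates all name the same canonical URL, search engines can consolidate indexation and link authority onto that one version.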
Deindexing Old Content
Deindexing old content is a way of keeping a clean site in search engines’ indices. When old pages build up in those indices, they needlessly expand the number of pages that the engines feel they need to keep recrawling to maintain an understanding of your site. Search engines hold on to pages in their indices as long as they return a 200 OK status.
The 301 redirect is also an excellent tool for deindexing old content because in addition to alerting bots that the page no longer exists, it prompts them to deindex that old URL. A 404 error is another server header status code that prompts deindexation by alerting bots that the page no longer exists. As long as a URL returns a 301 or a 404 status code, the deindexation will happen. However, when at all possible, use a 301 redirect because it also preserves and transfers the old page’s authority to a new page, strengthening the site instead of weakening it as old pages die off.
Note that not all pages that look like error pages actually return the 404 status code that search engines require to deindex content. Sometimes old URLs redirect to a page that looks like a 404 page but actually returns a 200 OK status, a condition known as a soft 404, which sends the opposite message: that the page should remain indexed. For more, see Google’s Search Console Help page.
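The status-code logic above can be summarized in a short sketch. The function below is illustrative, not part of any SEO toolkit; it simply maps the codes discussed here to their likely indexing outcomes:

```python
# Sketch: map an HTTP status code to the indexing outcome described above.
# Function name and wording are illustrative assumptions.

def indexing_outcome(status_code):
    """Return a short description of how search engines likely react."""
    if status_code == 200:
        # A 200 on a page that merely looks like an error is a soft 404
        return "kept in the index (a 200 on an error page is a soft 404)"
    if status_code == 301:
        return "deindexed; authority transferred to the redirect target"
    if status_code == 404:
        return "deindexed; accumulated authority is lost"
    return "no deindexation signal covered here"

print(indexing_outcome(301))
```

Checking what your old URLs actually return, rather than what their pages look like, is the quickest way to spot soft 404s.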
Structured Data
Lastly, structured data is a tool that helps define content types so that search engines are more likely to understand them. In Google’s case, using structured data can trigger placement in rich snippets and Knowledge Graph cards in search results pages when your content is also the most relevant and authoritative.
Usually coded using JSON-LD, structured data places bits of metadata in the page template around key elements that already exist. For example, you already have a price shown in your category and product detail pages. Structured data, using the price schema, would add a couple of tags in specified formats near the price data in the template to signal to search engines that these numbers are, in fact, a price. This, in turn, can lead to the price being displayed directly in the search result snippet.
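A product's price, for example, might be marked up with a JSON-LD block like the sketch below, using schema.org's Product and Offer types. The product name and values are hypothetical:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "offers": {
    "@type": "Offer",
    "price": "24.99",
    "priceCurrency": "USD"
  }
}
</script>
```

The block sits in the page template alongside the visible price, labeling for search engines the same data that shoppers already see.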
Breadcrumbs, ratings, images, recipe details, concert listings, events, lists, forum threads, and other elements can also be pulled from a page that uses structured data and surfaced directly in the search results page.