Let’s face it, the engines control search. They may keep tight-lipped control of their algorithmic secrets, but they also provide their users with tools that SEO practitioners can use to understand how the engines see their sites. These advanced operators offer a peek behind the curtain, enabling the patient SEO to discover and diagnose site issues. We’ll start by running through some of the queries and the engines they work on, and then dive into some real-world diagnosis using these operators.
These are a tiny fraction of the operators available. I find myself using this handful of operators in different combinations on a daily basis as an SEO practitioner focused on the architectural challenges of a popular ecommerce site. For a rundown on all of Google’s advanced operators, see Google Guide’s exhaustive search operators list.
Applying Search Operators
Like all tools, operators are useless if you don’t know when and how to use them in the real world. Let’s take a look at Practicalecommerce.com itself as our example today. Working from the foundational rule that a page must be indexed before it can be ranked, the central question we’ll be answering with today’s example is, “Are the best URLs indexed on practicalecommerce.com?”
A basic query of [site:practicalecommerce.com] begins to answer that question. Google reports that it has indexed 7,670 pages. That’s a number, but is it a good number? Which pages are indexed? Are they the “right” ones, the ones that will drive subscription conversions? The ones that target valuable keywords? Or are they pages that have little value to searchers and will result in a bounce back out of the site? Are any of them duplicates?
To answer these questions, we need two things: 1) an understanding of which pages are valuable; and 2) a combination of operators to begin to dissect the index and discover whether those valuable pages are indexed.
Let’s say that Practical eCommerce prioritizes its articles as its most valuable content for SEO. We’ll want to find many individual articles in the index, but each one at only one URL. That changes our central question to, “Are the best URLs for articles indexed on practicalecommerce.com?”
Next we’ll narrow the query to articles: [site:practicalecommerce.com/articles], which returns 2,390 results. With 87 pages of 20 articles each, there should in theory be 1,740 articles on the site. Whether or not that estimate is accurate, and whether or not it matches the indexed count, we need to investigate further to be sure that the right pages are indexed, not just the right number of them. Excluding the www subdomain, where some of the articles live, shows where the others live: [site:practicalecommerce.com/articles -inurl:www] returns 232 results, most of which appear to be on the developer subdomain. What do I get if I exclude the developer subdomain as well? [site:practicalecommerce.com/articles -inurl:www -inurl:developer] returns 2,130 results, primarily at the ww and forum subdomains.
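To make this narrowing sequence easy to repeat for other sections or subdomains, it can help to generate the query strings rather than retype them. The sketch below only builds strings: `build_query` and `estimated_articles` are hypothetical helpers, not part of any search engine API, and nothing here sends a query to Google.

```python
# Hypothetical helpers for composing the advanced-operator queries used above.
# They only build query strings; they do not contact any search engine.


def build_query(site, extra_terms=(), exclude_inurl=()):
    """Return a query such as 'site:example.com/path -inurl:www'."""
    parts = [f"site:{site}"]
    parts.extend(extra_terms)  # e.g. 'inurl:2254' or 'intitle:"..."'
    parts.extend(f"-inurl:{term}" for term in exclude_inurl)
    return " ".join(parts)


def estimated_articles(listing_pages, per_page):
    """Rough article count from a paginated archive: pages x items per page."""
    return listing_pages * per_page


if __name__ == "__main__":
    # The walkthrough's queries, each one narrower than the last.
    steps = [
        build_query("practicalecommerce.com/articles"),
        build_query("practicalecommerce.com/articles", exclude_inurl=["www"]),
        build_query("practicalecommerce.com/articles", exclude_inurl=["www", "developer"]),
    ]
    for query in steps:
        print(f"[{query}]")
    print(estimated_articles(87, 20))  # 1740
```

Keeping the exclusions in a list makes it easy to peel away one more subdomain at a time and compare result counts between steps.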
Three interesting observations spring to mind when looking at the results for this query. First, ww is an odd subdomain to find in an index. Second, it’s also a bit odd to find articles in a forum subdomain. And third, why did the number of results increase when we limited the search result set further?
The last is easiest to answer: Google returns more accurate results for more detailed queries. For broader queries, like [site:practicalecommerce.com/articles -inurl:www], Google doesn’t bother to include duplicate content and other low-value pages. But, when the query strips away the higher value content, such as we did with [-inurl:www -inurl:developer], Google turns out its pockets to show the stuff buried deep in the index.
The first two have the same answer: relative URLs and an uncanonicalized site. Confirming the extent of the duplicate content issue requires a closer look with the inurl: and intitle: operators. If I suspect duplicate URLs for the same article, I can run a very specific search for a URI fragment or a title fragment to uncover other URLs with the same URI and title. Be sure to check multiple articles, because some may be duplicated more often, or on different subdomains, than others.
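When checking many articles, the URI-fragment and title queries can be generated from each article’s URL and title. This is a minimal sketch that assumes article URLs follow the /articles/<numeric ID>-<slug> pattern of the practicalecommerce.com article URLs; `article_queries` is a hypothetical helper that only builds query strings, and the pattern may not hold for every page on the site.

```python
import re
from urllib.parse import urlsplit

# Assumed article URL pattern: /articles/<numeric ID>-<slug>
ARTICLE_PATH = re.compile(r"^/articles/(\d+)-")


def article_queries(url, title, site="practicalecommerce.com/articles"):
    """Return the inurl: and intitle: queries used to look for duplicates of one article."""
    match = ARTICLE_PATH.match(urlsplit(url).path)
    if match is None:
        raise ValueError(f"URL does not match the assumed article pattern: {url}")
    article_id = match.group(1)
    return [
        f"site:{site} inurl:{article_id}",  # same URI fragment on any subdomain
        f'site:{site} intitle:"{title}"',   # same title at any URL
    ]


for query in article_queries(
    "http://www.practicalecommerce.com/articles/2254-Performance-Metrics-for-Ecommerce-Part-Three",
    "Performance Metrics for Ecommerce, Part Three",
):
    print(f"[{query}]")
```

Running both queries matters because a duplicate can share the title without sharing the URI fragment, or the other way around.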
For example, the article “Performance Metrics for Ecommerce, Part Three” was originally published at http://www.practicalecommerce.com/articles/2254-Performance-Metrics-for-Ecommerce-Part-Three. Using a couple of different queries ([site:practicalecommerce.com/articles inurl:2254] and [site:practicalecommerce.com/articles intitle:"Performance Metrics for Ecommerce, Part Three"]) I can see that it is also indexed at four other subdomains:
The ww subdomain is likely the result of a typo. Somewhere on or off the site, someone mistyped the www subdomain in an anchor tag, and because the anchor links on practicalecommerce.com are coded as relative URLs, the entire site then became crawlable and indexable at this false ww subdomain. The last three subdomains are probably the result of content from one editorial section, with its own subdomain, being featured in another editorial section with a different subdomain. Rather than forcing each page of content to reside on a single subdomain, the site allows pages to inherit the subdomains of different editorial sections, creating duplicate URLs for the same page of content.
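The relative-link mechanism is easy to see with Python’s standard `urllib.parse.urljoin`, which resolves a link against the URL of the page it appears on using the standard relative-reference rules. In this sketch, `resolve_links` is a hypothetical helper and the /articles/1234-Example-Article link is an illustrative example, not a real page.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Resolve each href against the URL of the page it appears on."""
    return [urljoin(page_url, href) for href in hrefs]


# The article as reached through the mistyped ww host.
typo_page = (
    "http://ww.practicalecommerce.com/articles/"
    "2254-Performance-Metrics-for-Ecommerce-Part-Three"
)

links = [
    "/articles/1234-Example-Article",  # root-relative: inherits the ww host
    "http://www.practicalecommerce.com/articles/1234-Example-Article",  # absolute: stays on www
]

for resolved in resolve_links(typo_page, links):
    print(resolved)
# http://ww.practicalecommerce.com/articles/1234-Example-Article
# http://www.practicalecommerce.com/articles/1234-Example-Article
```

Every relative link on a page reached through ww leads to another ww URL, which is how a single mistyped link can expose the whole site at the false subdomain.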
How Many Articles Are Indexed?
Next, if we agree that articles are the most important pages for the purposes of this example, and we theorize that articles should be hosted by default at the www subdomain, we can determine how many articles are indexed at their default location: [site:www.practicalecommerce.com/articles], which returns 1,870 results. Why didn’t we just do that to start with? If we had, we would have missed the duplicate URL issue, which we discovered only by removing the www. Issues typically hide in the exceptions rather than the norm. And again, this result set contains more exceptions. Not all files in the /articles directory on the www subdomain are articles. Some are image files ([site:www.practicalecommerce.com/articles inurl:/images/ -"Create an account or Sign In"]) and some are “next” pages ([site:www.practicalecommerce.com/articles/page]) that allow navigation to the articles. So to get closer to the real indexed article count, we need to subtract these other page types from the total: 1,870 - 75 - 125 = 1,670.
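The arithmetic is simple, but writing it down keeps track of which result counts were subtracted from which. A minimal sketch using the counts reported in this walkthrough; `net_article_count` is a hypothetical helper, and like all operator counts, its inputs are estimates from Google rather than exact figures.

```python
def net_article_count(total_results, non_article_counts):
    """Subtract the result counts for non-article pages from the directory total."""
    return total_results - sum(non_article_counts)


www_articles = 1870       # [site:www.practicalecommerce.com/articles]
non_articles = [75, 125]  # image-file and "next"-page result counts, as subtracted above

print(net_article_count(www_articles, non_articles))  # 1670
```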
SEO practitioners can drive themselves to distraction trying to pry precise numbers from an imprecise tool like search operators. Know when to stop digging and start improving. For our purposes, 1,670 articles on the www subdomain is accurate enough, even though we estimated 1,740 articles earlier. The difference of 70 articles, about 4 percent, is proportionately small, and the final query revealed no further oddities.
Did advanced search operators help us answer the primary question, “Are the best URLs for articles indexed on practicalecommerce.com?” We have discovered that an appropriate number of articles are indexed at the default www subdomain, and we’ve uncovered evidence of duplicate URLs at other subdomains. So the answer is yes, the best URLs for articles are indexed in appropriate numbers, but so are other undesirable duplicate URLs. For the resolution to this duplicate content issue, read more about duplicate content.