Duplicate content is endemic to ecommerce sites. Seemingly every platform, no matter how SEO-friendly, produces some form of duplicate content, holding a site back from peak performance.
First, let’s look at why duplicate content matters. It may not be the reason you’re thinking.
A Dampening, Not a Penalty
Contrary to popular belief, there is no duplicate content penalty. Google’s blog, all the way back in 2008, said, “Let’s put this to bed once and for all, folks: There’s no such thing as a ‘duplicate content penalty.’”
That said, there is a very real — but less immediately visible — search engine optimization issue with duplicate content. An algorithmic dampening or decrease in performance occurs across the page types that suffer from duplicate content.
Duplicate content introduces self-competition for the same keyword theme and splits link authority between two or more pages. These two issues cut right to the heart of what’s important to search engine rankings: relevance and authority.
Having more than one page targeting the same keyword theme makes them all less uniquely relevant to search engines, because it’s harder to determine which one to rank. And since multiple pages are being linked to internally with the same keyword theme, the links that could all be strengthening one single page are instead weakly supporting multiple pages and giving none of them superiority.
“Dampening,” then, is a weakening of the signals that a site sends to search engine algorithms, which affects the site’s ability to rank.
How is this not a penalty? In Google’s world, a “penalty” is applied manually by a real human on Google’s web quality team when certain pages, or a whole site, meet a predefined definition of spammy. Someone has to physically penalize a site for it to be an actual penalty. A dampening is algorithmic in nature and tends to be more difficult to diagnose, since Google will not alert you to algorithmic issues the way it alerts you to a manual penalty via Google Search Console.
The problem with getting rid of duplicate content is that just killing off the pages can produce a couple of unwanted effects.
- Customer experience. In some cases, your shoppers need to see those pages. Sorted browse grids, wish list pages, print pages, and more can technically be duplicate content. Killing those pages would hurt your customer experience and, potentially, your revenue.
- Link authority. Every indexed URL has at least a smidge of link authority. Just killing the pages off would be wasting link authority, which would ironically hurt your SEO in the service of helping it.
The goal, then, is to precisely identify what you need to accomplish. Do you want to remove the page for search engines but keep it for shoppers? Do you need to eliminate the page for shoppers and search engines both? Is it more important that you get rid of the page immediately (for legal or other reasons), regardless of SEO impact, or are you trying to benefit SEO with the planned action?
The chart below can help you walk through that decision process.
7 Ways to Remove Duplicate Content
| Method | Affects Bot | Affects Shopper | Passes Link Authority | Deindexes URL | Command to Search Engines | Suggestion to Search Engines |
|---|---|---|---|---|---|---|
| 301 Redirect (Permanent) | Yes | Yes | Yes | Yes | Yes | |
| Canonical Tag | Yes | | Yes | Yes | | Yes |
| 302 Redirect (Temporary) | Yes | Yes | Yes, but... | | Yes | |
| Google Search Console: Remove URLs | Yes | | | Yes | Yes | |
| 404 File Not Found | Yes | Yes | | Yes | Yes | |
| Meta Noindex | Yes | | | Yes | | Yes, but... |
| Robots.txt Disallow | Yes | | | Yes | | Yes, but... |
The first option on the list, 301 redirect, is the star of the SEO show. Whenever possible, use the 301 redirect to remove duplicate content because it’s the only one that can accomplish the important combination of redirecting the bot and the customer, passing link authority to the new URL and deindexing the old URL. Unlike some other options, the 301 redirect is a command to search engines, as opposed to a simple request that may be ignored.
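As a minimal sketch, assuming an Apache server with mod_alias enabled (the paths are hypothetical), a 301 redirect for a duplicate URL might look like this:

```apache
# Permanently redirect the duplicate URL to the preferred one.
# Both bots and shoppers land on /shoes/, and link authority follows.
Redirect 301 /shoes/sort-by-price/ /shoes/
```

Nginx and other servers have equivalent directives; what matters is that the server answers with the 301 status code and the new location.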
If your development team balks at 301 redirects, or if shoppers need to continue seeing the page that search engines consider duplicate content, try canonical tags instead. These still require developer support, but they require less testing to implement and use fewer server resources while they’re live. Keep in mind, though, that canonical tags can be ignored if Google thinks you’ve made a mistake, or just doesn’t feel like obeying them for some algorithmic reason.
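A canonical tag is a single line in the `<head>` of the duplicate page pointing at the preferred URL. The URLs below are illustrative:

```html
<!-- In the <head> of https://example.com/shoes/?sort=price -->
<!-- Suggests to search engines that /shoes/ is the version to index -->
<link rel="canonical" href="https://example.com/shoes/" />
```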
Number three on the list is the 302 redirect, which makes the list only because it’s related to the all-powerful 301 redirect. According to Google’s John Mueller, 302 redirects do pass link authority, but in 99 percent of cases there’s no reason to test that theory, because a 301 redirect accomplishes more with the same amount of effort. The reason to use a 302 redirect is when the redirect is truly temporary, and Google should not deindex the page because it is coming back soon.
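On Apache, for example, the only difference from a 301 is the status code; a truly temporary move might be declared like this (the paths are hypothetical):

```apache
# Temporary redirect: tells search engines to keep /summer-sale/
# indexed because the page is coming back soon.
Redirect 302 /summer-sale/ /
```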
Removing Content Is Harmful
The remaining four options only deindex content. They do not redirect the bot or the shopper, and they do not pass link authority to another page. Use them, therefore, only when no other option is viable, because killing pages without redirecting them wastes link authority.
Link authority is the most valuable and difficult-to-earn commodity in natural search. You can create wonderful content. You can optimize your internal linking structure to flow authority where you need it within your own site.
But ethically increasing your link authority from a truly diverse and authoritative collection of external sites takes a rare combination of luck, digital outreach, press relations, social media marketing, offline marketing, and more. Sites that have mastered it are few and far between.
If you have to kill a page, determine whether it needs to be killed purely for SEO reasons (such as duplicate content) or for legal reasons (such as no one should ever see it again). If you only want to exclude it temporarily from Google, you can quickly and easily do that in Google Search Console’s Remove URLs tool (Google Index > Remove URLs). Customers will still see it on the site as they browse, but Google will deindex it immediately. Take care with this tool, though. Used incorrectly, it could deindex your entire site.
The only ways to truly remove a page from both human and bot visibility are to remove it from the servers, thereby forcing the URL to return a 404 “File not found” error, or to 301 redirect it to a new URL.
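If deleting the files themselves isn’t practical, most servers can be configured to return the 404 directly. On Apache 2.4, for instance, a hypothetical retired path could be mapped like this:

```apache
# Serve a 404 for the retired URL without redirecting anywhere;
# no link authority is passed, and the URL drops out of the index.
Redirect 404 /discontinued-product/
```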
Meta robots noindex tags and robots.txt disallow commands are the last options on my list, for a combination of reasons. First, they waste link authority. Noindex and robots.txt disallow commands tell search engines, in different ways, that they should not index certain pages. If a page is already indexed it has some amount, however small, of link authority. Don’t waste that by telling search engines to just ignore the URLs and quietly deindex them.
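For reference, the two directives look like this; the paths and placement are illustrative. The noindex tag goes in the `<head>` of the individual page:

```html
<!-- Asks search engines not to include this page in their index -->
<meta name="robots" content="noindex" />
```

The disallow rule lives in the robots.txt file at the site root and asks crawlers not to fetch matching URLs at all:

```
# robots.txt
User-agent: *
Disallow: /print/
```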
Second, search engines once strictly obeyed noindex and robots.txt disallow commands. But today search engines sometimes view them as suggestions, especially for content that is already indexed. Thus noindex and robots.txt disallow commands are hit and miss, and when search engines do obey them, they can take months to go into effect. If you want something deindexed, quickly and certainly, choose another method.
Meta robots noindex tags and robots.txt disallow commands are helpful as a safety measure once content is deindexed, however. They are more effective at preventing future indexation of content that is not yet indexed than at deindexing content that already is.