SEO: There Is No Duplicate Content Penalty

There is a long-held belief that search engines – namely Google – penalize websites that duplicate content or produce material that is largely the same as material on other sites on the Internet. But I’m here to tell you: the Duplicate Content Penalty is a myth.

[Chart: inbound link examples]

Think about it this way. If a page of content has five links into it and that page only loads at one URL, then all five of those links flow their link popularity to a single URL. But imagine that same page of content, with five links pointing to it, that loads at five different URLs. Each of those duplicate URLs now gets a single link’s worth of passed link popularity. Each is only one-fifth as strong as the single URL with all five links pointing to it.
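
To put numbers on that, here is a back-of-the-envelope Python sketch of the equal-split model described above (a deliberate simplification for illustration, not how any engine actually computes rankings):

    # Simplified model: five inbound links, credited equally to their targets.
    inbound_links = 5

    # Case 1: all five links point at one URL, which receives full credit.
    single_url_equity = inbound_links                        # 5 links' worth

    # Case 2: the same content loads at five URLs, one link each.
    duplicate_urls = 5
    per_duplicate_equity = inbound_links / duplicate_urls    # 1 link's worth each

    print(per_duplicate_equity / single_url_equity)          # 0.2 -> one-fifth as strong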

The Duplicate Content Penalty myth fosters misunderstanding about the real issue: link popularity. The ideal scenario for SEO is one URL for one page of content with one keyword target. I would advise ecommerce merchants to focus their efforts on optimization rather than penalty avoidance.

Causes of Content Duplication

Many different factors result in duplicate content, but one statement is true for them all: duplicate content doesn’t exist unless there’s a link to it. If a site has duplicate content, it’s because at least one link points to the same content at different URLs. Links to duplicate URLs can crop up when tracking parameters are appended in breadcrumbs, when a site doesn’t link consistently to one subdomain, when filtering and sorting options append parameters to the URL, when print versions generate a new URL, and in many other ways. Worse, each of these can compound the others, spawning hundreds of URL variations for the same single page of content.
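
To see how quickly those sources compound, here is a simplified Python sketch; the parameter names and variation sources are hypothetical, and it only counts combinations rather than constructing valid URLs:

    from itertools import product

    # Hypothetical, independent sources of URL variation for one page of content.
    subdomains = ["http://www.example.com", "http://example.com"]
    tracking   = ["", "?src=nav", "?src=breadcrumb", "?src=footer"]
    views      = ["", "&view=print"]

    variants = len(list(product(subdomains, tracking, views)))
    print(variants)   # 2 * 4 * 2 = 16 URLs for one page; real sites stack many more sources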

Home pages are one example. In some cases the domain resolves as the home page, but clicking the navigational links to the home page (the same page of content) results in a different URL. Banana Republic has 18 Google-indexed versions of its home page, plus several others that aren’t indexed.

Each of these home page URLs has at least one page linking to it. Think how much stronger this page could be if every one of those links pointed to http://www.bananarepublic.com/ instead of being spread across the duplicate home page URLs.

Types of Content Duplication

Canonical

Lack of canonicalization is a common source of duplicate content. Canonicalization refers to the removal of duplicate versions; in SEO, it means consolidating link popularity to a single URL for a single page of content. Consider the following 10 example URLs for the same fictional page of content:

  1. Canonical URL: http://www.example.com/directory 4/index.html
  2. Protocol duplication: https://www.example.com/directory 4/index.html
  3. IP duplication: http://62.184.141.58/directory 4/index.html
  4. Subdomain duplication: http://example.com/directory 4/index.html
  5. File path duplication: http://www.example.com/site/directory 4/index.html
  6. File duplication: http://www.example.com/directory 4/
  7. Case duplication: http://www.example.com/Directory 4/Index.html
  8. Special character duplication: http://www.example.com/directory%204/index.html
  9. Tracking duplication: http://www.example.com/directory 4/index.html?tracking=true
  10. Legacy URL duplication: http://www.example.com/site/directory.aspx?directory=4&stuff=more

The URLs may be fictional, but I’ve worked with sites that had every one of these sources of duplicate content and more. In the worst cases, link popularity was split across more than 1,000 URLs for a single product page. That page would be much stronger if every link pointed to a single URL.
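
As an illustration only, here is a minimal Python sketch that normalizes a requested URL toward canonical form. It covers the protocol, subdomain/IP, case, encoded-character, and tracking cases from the list above; file-path and legacy-URL duplication usually need site-specific rewrite rules:

    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, unquote

    CANONICAL_HOST = "www.example.com"     # assumed canonical subdomain
    TRACKING_PARAMS = {"tracking"}         # assumed tracking parameter names

    def canonicalize(url: str) -> str:
        """Normalize one URL toward a single canonical form."""
        scheme, host, path, query, _fragment = urlsplit(url)
        scheme = "http"                    # protocol duplication (example 2)
        host = CANONICAL_HOST              # subdomain and IP duplication (3, 4)
        path = unquote(path).lower()       # case and %20 duplication (7, 8)
        kept = [(k, v) for k, v in parse_qsl(query) if k not in TRACKING_PARAMS]
        return urlunsplit((scheme, host, path, urlencode(kept), ""))

    print(canonicalize("HTTPS://example.com/Directory%204/Index.html?tracking=true"))
    # -> http://www.example.com/directory 4/index.html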

The most effective way to canonicalize duplicate content, consolidate link popularity, and de-index the duplicates is with 301 redirects.
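
The recommendation stops at "use 301s," so here is a minimal sketch of a 301 rule in application code, assuming Flask and a hypothetical "tracking" parameter; in practice many sites implement the same rule in the web server’s rewrite configuration instead:

    from flask import Flask, redirect, request

    app = Flask(__name__)
    CANONICAL_HOST = "www.example.com"     # assumed canonical subdomain

    @app.before_request
    def enforce_canonical_url():
        """Answer duplicate-URL requests with one permanent (301) redirect."""
        canonical = request.url
        if request.host != CANONICAL_HOST:                # wrong subdomain or bare IP
            canonical = canonical.replace(request.host, CANONICAL_HOST, 1)
        if "tracking" in request.args:                    # hypothetical tracking parameter
            canonical = canonical.split("?", 1)[0]        # sketch: drop the whole query
        if canonical != request.url:
            return redirect(canonical, code=301)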

Cannibal

When two or more pages target the same keyword, that’s cannibalization. Ecommerce sites fall into this trap frequently when usability necessities such as pagination, filtering and sorting, email-to-a-friend features, and other functions create unique pages with some or all of the same content. Technically these pages are not exact duplicates, and they need to exist for usability reasons, so they can’t be canonicalized to a single URL with 301 redirects.

Site owners have two options in this case: Either differentiate the content to target different keyword themes, or apply a canonical tag to recommend consolidation of link popularity without redirecting the user.
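
For the tag itself, a minimal sketch: a filtered, sorted, or paginated listing emits a <link rel="canonical"> element recommending its unparameterized parent (the URL and parameter names are hypothetical):

    from urllib.parse import urlsplit, urlunsplit

    def canonical_link_tag(request_url: str) -> str:
        """Point filtered/sorted/paginated variants at the base listing URL
        without redirecting the user."""
        scheme, host, path, _query, _fragment = urlsplit(request_url)
        base = urlunsplit((scheme, host, path, "", ""))   # strip ?sort=, ?page=, etc.
        return '<link rel="canonical" href="%s" />' % base

    print(canonical_link_tag("http://www.example.com/widgets?sort=price&page=3"))
    # -> <link rel="canonical" href="http://www.example.com/widgets" />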

Resolving Duplicate Content

Remember that 301 redirects are an SEO’s best friend when it comes to canonicalizing and resolving duplicate content. If a redirect is off limits because the URL needs to function for humans, a canonical tag is the next best bet for consolidating link popularity. There are other options for suppressing content – such as meta noindex, robots.txt disallow, and 404 errors – but these only de-index the duplicates without consolidating link popularity. For more detailed information on resolving duplicate content, view this tutorial or this video from Google Webmaster Tools on duplicate content.
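
Compressed into code, the decision logic above looks roughly like this; it is my summary as a Python sketch, not an official rule set:

    def choose_remedy(url_must_work_for_users: bool,
                      content_can_be_differentiated: bool) -> str:
        """Prefer 301s; fall back to the canonical tag; treat noindex,
        robots.txt disallow, and 404s as last resorts that de-index
        duplicates without consolidating link popularity."""
        if not url_must_work_for_users:
            return "301 redirect: consolidates links and de-indexes the duplicate"
        if content_can_be_differentiated:
            return "differentiate the content to target a distinct keyword theme"
        return "canonical tag: consolidates links while the page keeps working"

    print(choose_remedy(url_must_work_for_users=True,
                        content_can_be_differentiated=False))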

Jill Kocher


  1. Steve October 15, 2009

    You are right.

    I’m not sure if this is what you meant to say, though: "Duplicate content doesn’t exist unless there’s a link to it."

    Shouldn’t it be: "Duplicate content does exist unless there’s a link to it." ?

  2. Jill Kocher October 15, 2009

    Hi Steve, thanks for the comment. Nope, I definitely meant: "Duplicate content doesn’t exist unless there’s a link to it." The only way these duplicate URLs get generated and indexed is by the presence of at least one link to them.

    For example, if you click the header navigation link for "DVDs and Books" on the Discovery Store, you get http://store.discovery.com/?v=discovery_dvds-books&nvbar=DVDs+%26+Books with an nvbar tracking parameter appended. Other links to the exact same "DVDs and Books" content page point to a tracking-parameter-less URL: http://store.discovery.com/?v=discovery_dvds-books. The v parameter loads the content in both URLs, and the nvbar parameter just tracks when users click to the page from the header navigation bar. In this example, the tracking URL http://store.discovery.com/?v=discovery_dvds-books&nvbar=DVDs+%26+Books could not exist without the link to it in the header navigation.

    Does that help?

  3. BetaScott October 16, 2009

    There is certainly a lot of conflicting information about this out there. Many SEO experts claim that filters are run to determine if something is duplicate content and drop it from the database.

    So, would syndicated content that is fully republished on another site not have any detrimental effect on the original?

  4. Steve October 16, 2009

    Jill,

    I think I understand what you mean. The example you gave me would be considered duplicate content since both URLs have links pointing to them.

    If one of those URLs didn’t have a link pointing to it, then it wouldn’t be duplicate content since one of them would not be indexed.

    Are we on the same page?

  5. Greg Percifield October 20, 2009

    I have been using our robots.txt for issues such as /page/, /sort/, /alpha/, and pagination.

    I’ve done this because there are many cases when there are fewer than 10 products to fill a certain page, and any of the links above could produce the exact same results.

    Thanks to your article, I will be updating these so that we use the canonical tag.

  6. GoogleVictim October 20, 2009

    Jill,

    I believe the duplicate content you are referring to is the smaller part of the duplicate content problem that most webmasters face. I agree that at a single-site level, duplicate content dilutes the link popularity but is not penalized.

    However, the bigger problem has been content shared with other sites, for example: product descriptions on shopping sites, city or hotel descriptions on travel sites, shared content on affiliate sites.

    I have seen Google lash sites just because they share content with other, similar sites.

    What is your take on that?

  7. George Zlatin October 22, 2009

    Yeah, I agree with GoogleVictim. You can’t mention duplicate content without considering content "borrowed" from other sites…in my opinion there is definitely a Google penalty for this type of duplicate content.

  8. Jill Kocher October 23, 2009

    Hi Steve — yes, exactly my point. Do you buy it?

  9. Jill Kocher October 23, 2009

    Hi BetaScott — while it’s true that engines tend to move duplicate content out of their primary index by way of filtering, that doesn’t in the least diminish the problem it causes for sites. My contention is that it’s less about the cluttery crufty blech that duplicate content creates in the index (although that’s still a problem) and more about the waste of link popularity. If Google or Yahoo etc. decide to disregard a URL because it’s a duplicate, then the links pointing to that duplicate URL are wasted in terms of the link popularity benefit they could provide. If a site resolves the duplicate content issue by forcing links to a single URL instead of 10s or 100s of duplicates, all those links that point to the individual URLs now point to one URL that has a stronger chance to rank by virtue of its stronger link popularity.

  10. Jill Kocher October 23, 2009

    GoogleVictim and George, content syndication is a whole other issue, for sure. There are two types of sites here: the content syndicator (the original source) and the content repurposer (receives and reposts syndicated content). Search engines value unique content; that’s a fact. So unless a site is strong enough in other ways (external links, other sources of unique content to offset the duplication, etc.) to overcome the "me too" impact of building a site around syndicated or stock content, it’s not as likely that the site will rank well.

    The holy grail is finding a way to mash up syndicated content with user generated content in a fantastically usable and compelling package that will attract links from other sites based on its sheer awesomeness. Naturally that’s difficult to execute. And it’s hard to create large amounts of unique content in a scalable and cost effective manner, otherwise sites would just do it. But whether we think it’s fair or not, the engines’ preference for unique content and link popularity is not likely to change.

  11. Nat December 10, 2009

    Hi Jill,

    Thanks for posting this article. The article and the link to the video were very helpful.

  12. flackie February 1, 2010

    This article assumes that different URLs will automatically split PageRank. Google is a bit cleverer than this: it will recognize pages that are the same and pool the PageRank for them.

    This is the official Google blog on the matter:
    http://googlewebmastercentral.blogspot.com/2008/09/demystifying-duplicate-content-penalty.html

    " 1. When we detect duplicate content, such as through variations caused by URL parameters, we group the duplicate URLs into one cluster.
    2. We select what we think is the "best" URL to represent the cluster in search results.
    3. We then consolidate properties of the URLs in the cluster, such as link popularity, to the representative URL."

    Also, recommending 301 redirects is missing a far easier and better solution – the canonical tag. Again, here is Google’s official blog:
    http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html

  13. spastic July 26, 2011

    I believe I disagree. I believe there is a duplicate content penalty, but NOT as you characterize it. If you have several "different domains," i.e., http://www.gap.com, http://www.gap.net, http://www.thegapstore.com, and http://www.thegapstore.net, and these all essentially point to the same domain, and you register these domain names as unique to Google, they will penalize you for it. If the duplicate reference is outside of the FQDN as you describe, then you are right, there is no penalty. However, I think you are dealing with a misunderstanding of the "definition" of "duplicate content."

  14. eli September 7, 2011

    Jill is partly correct that Google doesn’t penalize a website for having duplicate content across its internal pages (but there are exceptions). But Spastic is also correct that Google can penalize websites that scrape content from OTHER domains.

    Source: http://www.google.com/support/webmasters/bin/answer.py?answer=66359

    For internal page duplication:

    "Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results."

    For third-party content duplication:

    "If you find that another site is duplicating your content… file a DMCA request… and request removal of the other site from Google’s index."

  15. ionutm October 18, 2011

    Hi Jill,

    Your post made me extremely happy!
    I was searching for some time for a list of the types of content duplication.
    Actually, I am interested in the ways Magento Community Edition generates duplicate content.
    Is there a way to put an end to my search (with a happy ending, of course)?
    Thank you very much in advance!

  16. Iraj Nazarian May 7, 2012

    Dear sirs

    Hi.

    I use a master page on my site, and I added this line to it:

    <link rel="canonical" href="http://www.ahra.ir" />

    Is it enough to get rid of the "Duplicate Content Penalty"?

    Best regards,

    Iraj Nazarian Azad
    http://www.ahra.ir

  17. Azmat December 5, 2016

    I have a website that got a Google penalty for having thin and duplicate content.
    What should I do with these duplicate pages if the original content on the website from which I copied is also penalized for being thin and adding no user value? It is not possible to rewrite 40,000 pages within a short span of time. I have rewritten 400 pages so far. Now what should I do with the remaining pages to lift the Google penalty? Should I do 301 redirection, and if so, where should I redirect?
    Google wants me to remove duplicate and thin content from my site to lift the penalty.

    I want reply at parvaiz13 @ gmail.com

  18. parker April 6, 2017

    I think excessive canonicalization is bad for SEO and that search engines will penalize those pages.
    Please correct me if I’m wrong.

    Thanks
    http://www.mgsionline.com