Practical Ecommerce

SEO: The Duplicate Content Penalty

The question of Google’s supposed “duplicate content penalty” seems to be on everybody’s minds these days. This issue is particularly relevant for dynamic ecommerce websites, as they often have multiple URLs that lead to the same product content (or nearly the same, with only a variation in the product’s color or size).

Several factors magnify the problem of duplicate content. The fact that manufacturers give all their distributors the same product descriptions means those identical phrases end up on numerous sites including your own. Then there are the “scrapers” that run rampant across the web, lifting your content to use on their spam sites. How to foil the scrapers is a big topic that will need to be addressed in a future column.

There has been a lot of talk — misinformation, really — about the “duplicate content penalty” from Google. It’s a myth; it doesn’t exist. You’re hearing it straight from the horse’s (Google’s) mouth here, here, and here. I have it from reliable sources within the Googleplex that Google very rarely penalizes for duplicate content. Instead, it’s a filter.

Duplicate content is copy that is substantively similar to copy elsewhere on the web and that search engine spiders can find. Search engines don’t want to present users with similar-looking listings on the same search results page because that would degrade the search experience. So Google puts a query-time filter in place to avoid displaying multiple copies of the same or very similar content.

Google engineers don’t want to penalize useful websites for the inadvertent creation of duplicate pages — such as when there isn’t a 301 redirect from domain.com to www.domain.com, when the site responds to multiple domain names, or when there are tracking or otherwise superfluous parameters in the URL like session IDs, tracking tags like source=topnav, flags like photos=on and photos=off, etc.
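To make those inadvertent duplicates concrete, here is a minimal Python sketch of how a site might collapse URL variants onto one canonical address. The parameter names `source` and `photos` come from the examples above; `sessionid` and `sid`, the function name, and the preferred host are assumptions for illustration, not anything Google prescribes.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed list of parameters that do not change what the page shows.
SUPERFLUOUS_PARAMS = {"sessionid", "sid", "source", "photos"}


def canonical_url(url, preferred_host="www.domain.com"):
    """Return one canonical form for the URL variants of the same page."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www host as the same site.
    if "www." + host == preferred_host:
        host = preferred_host
    # Keep only the parameters that actually select content.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SUPERFLUOUS_PARAMS
    ]
    return urlunsplit((parts.scheme, host, parts.path or "/", urlencode(kept), ""))


print(canonical_url("http://domain.com/product.html?id=42&source=topnav&photos=on"))
```

Any request whose URL differs from its canonical form is a candidate for a 301 redirect to that canonical form.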

Indeed, even Google’s own Googlestore.com has long-standing duplicate content issues. A previous incarnation of the site had thousands of copies of category pages — e.g., a search for inurl:accessories.html site:googlestore.com returned more than 7,000 pages — due to session IDs in the URLs. Googlestore.com has since corrected that, but it still has, on average, five copies of every product page indexed.

A situation like this leads to PageRank dilution. One definitive version of the product page will receive more PageRank than any one of five versions will, because the votes (links) are split five ways. The end result is that a product page with duplicate copies will never rank as well in Google’s search engine as a single, unique page.
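The vote-splitting argument can be illustrated with a toy count. This sketch treats each inbound link as exactly one vote, which is a deliberate simplification and not Google’s actual PageRank formula; the URLs and link counts are made up for the example.

```python
from collections import Counter

# Hypothetical inbound links: the exact URL each linking site pointed at.
inbound_links = (
    ["/widget.html"] * 2
    + ["/widget.html?sid=a1"] * 2
    + ["/widget.html?sid=b2"] * 2
    + ["/widget.html?source=topnav"] * 2
    + ["/widget.html?photos=on"] * 2
)


def votes_split(links):
    """Count votes per indexed URL, treating every variant as its own page."""
    return Counter(links)


def votes_consolidated(links):
    """Count votes after collapsing variants onto the path without its query string."""
    return Counter(link.split("?", 1)[0] for link in links)


print(votes_split(inbound_links))         # five copies, two votes each
print(votes_consolidated(inbound_links))  # one page holding all ten votes
```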

This is not a penalty, merely the natural consequence of an overlooked problem.

It’s true that even a filter can end up feeling like a penalty if you are filtered out of the search results and your competitor is left to collect the reward. This is more likely to occur if you use the same manufacturer-supplied product copy as everyone else and fold in little or no unique content of your own. If you fail to add unique content, you will automatically garner less authority/PageRank than your competitors.

“I’ve been hit with a duplicate content penalty” seems to be the excuse du jour. A year ago it was “I’m being sandboxed.” I’m tired of hearing either one bandied about by site operators who use the excuse as a crutch. The real problem is a lack of understanding of best practices and a flawed SEO implementation.

Best practice dictates that you should eliminate, to the best of your ability, the occurrence of duplicate pages in the search engines. It requires that you make your content as unique as possible to differentiate it from what’s found on other sites. If the snippet and/or “shingles” on your page are strikingly similar to those on someone else’s, be warned that Google tends to favor the page with greater PageRank. Of course this is an overly simplistic explanation. Suffice it to say, since duplicate results are going to be filtered out, you will be better off if your site is well-endowed with strong PageRank.
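For readers unfamiliar with the term, “shingles” are overlapping runs of consecutive words. The sketch below shows the textbook approach of comparing two pages by the overlap of their shingle sets (Jaccard similarity). It is an illustration only: the shingle size of four words and the function names are assumptions, and nothing here claims to reproduce how Google itself measures similarity.

```python
def shingles(text, size=4):
    """Return the set of overlapping word runs ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=4):
    """Similarity between 0.0 (no shared shingles) and 1.0 (identical shingle sets)."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


manufacturer_copy = "durable outdoor wall lantern with a bronze finish and seeded glass"
rewritten_copy = "this bronze wall lantern pairs seeded glass with a weatherproof build"
print(jaccard(manufacturer_copy, manufacturer_copy))  # identical copy scores 1.0
print(jaccard(manufacturer_copy, rewritten_copy))     # rewritten copy scores far lower
```

The lower the score against the manufacturer’s copy, the more distinct your page looks from every other retailer using the same feed.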

Don’t get overly concerned if spammers scrape your content and post it on their own sites. Google is not going to demote you for that. Similarly, don’t get overly concerned if you have the same product copy as a hundred other retailers who sell the same wares. As I said, you won’t be penalized for that either.

My best advice to you is to augment any non-unique content you have obtained through data feeds. Surround it with unique, relevant content such as customer-contributed (or self-written) product reviews and related product recommendations (up-sells and cross-sells). By all means, tweak the product copy as much as you can: Paraphrase it, incorporate synonyms, revise it to include better keyword choices, and embellish those paragraphs with additional descriptive prose. Don’t stop with descriptions: Make unique product page title tags rather than just using the product name (which is what everybody else does). Incorporate additional bits of information into the title tag, such as the model number, if that is something a lot of people use to search. You can figure this out by using keyword research tools; see my past article on keyword sleuthing.

Many online retailers have implemented URL rewriting to make their URLs more search engine friendly; see my article Avoid Complex URLs. The implementation of URL rewriting may sometimes be a major initiative that must be phased in. Keep in mind that not all the URLs across the site can be replaced with search engine-friendly versions in one go-around. In any event, the final outcome is usually a less duplicate-laden website, because superfluous variables have been removed. The trick is to keep all the old URLs alive and functioning through the use of a 301 (permanent) redirect, even after they have been eliminated from the search indices.
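As a rough sketch of what keeping old URLs alive can look like, the small Python WSGI application below answers a legacy product URL with a 301 pointing at its rewritten address. The legacy path `/product.php`, the `id` parameter, and the mapping table are all hypothetical; a real store would build the mapping from its own catalog and would more likely configure this in the web server or ecommerce platform.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from legacy product IDs to rewritten, friendlier paths.
REWRITTEN_PATHS = {"42": "/widgets/blue-widget/"}


def redirect_target(path, query):
    """Return the new path for a legacy URL, or None if no redirect applies."""
    if path != "/product.php":
        return None
    ids = parse_qs(query).get("id", [])
    # Any other parameters (session IDs, tracking tags) are simply dropped.
    return REWRITTEN_PATHS.get(ids[0]) if ids else None


def app(environ, start_response):
    """Minimal WSGI app: permanently redirect known legacy URLs."""
    target = redirect_target(
        environ.get("PATH_INFO", ""), environ.get("QUERY_STRING", "")
    )
    if target:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    # In a real site the request would be passed on to the normal page handler.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]
```

Because the status is 301 rather than 302, the redirect signals that the move is permanent, which is what lets the old URL’s link value follow the page to its new address.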

Duplicate pages are a reality of the web. Articles are syndicated all over the web. Developers stick session IDs and flags in the URLs that don’t substantially modify the content but create duplicate pages for the spiders when they crawl. Thankfully, Google has learned how to live with it.

Still, the easier we can make it for Googlebot, the better. (Tips for making it easier are here and here.)

Most of all, remember this: Duplicate content isn’t something that should keep you up at night. Now go get some sleep!

Stephan Spencer



Comments ( 15 )

  1. Legacy User March 22, 2007 Reply

    Thanks, Stephan, for clarifying some mud puddles in my thinking. As always, this is another one of your classic, pertinent articles written for us to understand (and apply!).

    – *Andrew Jensen with SozoLogic.com*

  2. Legacy User March 21, 2007 Reply

    Great article! Really enjoyed the helpful information.

    – *rick@theworkwearstore.com*

  3. Legacy User March 21, 2007 Reply

    Lower page rank sounds like a penalty to me.

    – *Leffrey*

  4. Legacy User March 22, 2007 Reply

    Hi. Thanks so much for the article. I was driving myself crazy with the Google thing. I have shut off the ranking tool, and I am now going to concentrate on making sure the content on my page is unique. My server had a major meltdown, and I am considering doing a mirror site. Would this be considered duplicate content? I am not sure what to do to prevent something like this from happening again. I was down for three days. I was told that a mirror site was bad news. So after reading this I am a little more relaxed about it. Thanks for the article.

    – *Sandy Woods artgally.com*

  5. Legacy User March 22, 2007 Reply

    This is the first sensible article on duplicate content I have read. All the others appear to be scare tactics written by SEO companies to get you to purchase their services.

    Joe Rahall
    Domain-Names-R-Us.com

    – *Joe Rahall*

  6. Legacy User March 23, 2007 Reply

    Yes, that's a good article. Thanks.

    – *Jigar Gondalia*

  7. Legacy User April 7, 2007 Reply

    Sandy, I wonder what your purpose is of starting a mirror site? If your site has built up trust and a history, why would you abandon it (or leave it languishing on an unstable server) to start a new site with the same content at a new domain that is not aged (old domains fare better in Google than new ones) and with a URL that has no history (sites with a long history of content in web.archive.org fare better in Google than ones with no history). If your server is unstable, fire your web hosting company and move your site and domain somewhere else. A mirror site is not the answer for what ails your site.

    – *Stephan Spencer*

  8. Legacy User April 11, 2007 Reply

    Stephan, And what of Article Directories? Should authors/writers create 2+ versions (one for the article site and another for their website) for any articles they post? Google Webmaster Central says to, "Syndicate carefully: If you syndicate your content on other sites, make sure they include a link back to the original article on each syndicated article." Is this the easy fix – a simple reference back? If so, great! Nothing to worry about.

    – *MayaAndMarketability.com*

  9. Legacy User April 20, 2007 Reply

    Whenever I post an article on my website, whether it's my own syndicated article or someone else's, I add something of my own such as a short Editor's Note at the beginning of guest articles. I add contextual affiliate links, links to other pages on my website, and/or a different resource box at the end of my own articles. The Editor's Note is a few sentences like a short article review, and the affiliate links are something many if not most article directories forbid. Links to my own web pages add authority to my entire site.

    – *Kathryn Beach*

  10. Legacy User July 9, 2007 Reply

    Is there any way to find out which pages within our domain are detected as duplicate content by Google?

    – *David*

  11. Legacy User September 1, 2007 Reply

    I'm using a CMS called YACS. I've had some good successes so far in being indexed by Google. Probably 1/2 of my useful pages are indexed- I have about 32000 pages of content, and about 14000 are in Google. Happy happy, joy joy!

    BUT — all of this talk about duplicate content has gotten me nervous.

    I'm testing out a new robots.txt addition that will keep googlebot from seeing my pdf, print and msword versions of the documents. Will this help or hurt? I really don't know.

    What do you think? Is a PDF version or a PRINT version harmful to the strength of a particular page? Or is it aesthetics (wanting to point someone to the real page rather than the PRINT-VIEW version)?

    Part of me wonders, though, if having almost duplicated content via a pdf file for example might not actually HELP the ranking rather than hurt them…

    …but I guess I'll find out in a few days :-)

    Ken
    free.naplesplus.us collier county fl news/info/business directory

    – *Kenneth Udut*

  12. Legacy User September 12, 2007 Reply

    Can we use the print version of a page to overcome a duplicate content issue… I have my site blog … and the data on the blog is coming from my database… so every time it sends the same content, or the same with little changes… We faced a great penalty problem for two years … but now it's okay.. but tell me if I should use the print version for my blog.. Thanks in advance..

    bilal

    – *Bilal*

  13. Legacy User October 22, 2007 Reply

    I don't think that duplicate content is a factor on Google these days. For an example Google this term: "Kichler 3849TZ". It's a popular lighting fixture that we sell. Have a look at the natural results, 7 out of the top 10 are duplicate/spam content all from different websites run by CSNstores.

    I wouldn't worry about duplicate content, I'd worry about getting muscled off of page 1 via a huge competitor moving into your vertical and cranking out duplicate content across dozens of websites.

    – *Jerry*

  14. Legacy User October 29, 2007 Reply

    @Jerry

    The keyphrase "Kichler 3849TZ" might be too specific to be an example. There are fewer than 250 results, so Google is just showing the strongest ones which happen to be CSNstores.

    If you type in "Kichler" there are only two CSN results in the top 100 and they are in the 90s.

    That's not to say that CSN has a bad model — They seem to be doing quite well.

    Inbound links will also help overcome duplicate content, if you have enough of them.

    I have seen duplicate content seriously hurt Web sites. Whether it is called a "filter" or "penalty" it can still knock your site down.

    – *Pocket SEO*

  15. Legacy User November 2, 2007 Reply

    I'm happy to see that my comment generated a response and have considered PocketSEO's comments. What I think PocketSEO is missing is that highly targeted (BRAND + MODEL NUMBER) searches are quite common amongst high-end shoppers looking for discounts on premium merchandise. I have hundreds and hundreds of examples of the page 1 domination that I described in my original post.

    One need only look to the success of the shopping comparison engines for proof of this. From my own site's statistics, it's obvious that hits from these searches have a VERY VERY good conversion rate.

    Now the thing is… CSN stores has taken this SEO tactic to the Nth degree. CSN is typically using the (BRAND + MODEL NUMBER) at the beginning and again at the end of each of their product pages' title tags.

    Add to that the dozens of CSNstores web sites that each product appears on and what you have is a very large, very clever merchant who has successfully played Google's current algorithm.

    Obviously my opinion is biased, but I think it's a damn shame that Google's fairness doctrine, duplicate content policy or desire to provide honest results have not stepped in to clean up this spammy situation.

    We've been talking about pumping out a few dozen cloned sites in response to this duplicate content situation, but it just doesn't feel like it's the right thing to do to our customers.

    – *Jerry*
