Practical Ecommerce

Ask an Expert: Why Isn’t Google Indexing All of My Site’s Pages?

“Ask an Expert” is an occasional feature where we ask ecommerce experts questions from online merchants. For this installment, we address a question about getting pages from an ecommerce site fully indexed by Google. It comes from Robert Mobberley, CEO of Performance Motorcare Products, a U.K.-based automotive parts and accessories retailer.

For the answer, we turn to Stephan Spencer. He is vice president of SEO strategies at Covario, an SEO technology firm, and a long-time contributor to Practical eCommerce on the subject of search engines and search engine optimization.

If you’d like to submit a question, email Kate Monteith, staff writer, at kate@practicalecommerce.com and we’ll attempt to address it.

Robert Mobberley

Robert Mobberley: “We have spent a lot of time and effort on our organic SEO and are getting good results on the keywords that have been indexed. However, we are aware that a large number of our pages have not yet been indexed by Google. In order of priority, what are the top five (or more) key actions that our SEO manager should take to ensure these pages are indexed by Google as soon as possible?”

Stephan Spencer

Stephan Spencer: “There isn’t a one-size-fits-all answer to your problem. Incomplete indexation is a very complex issue that requires further investigation and analysis before a course of action can be recommended. How much PageRank have you garnered from other sites linking to you? And how big is your site (i.e., number of pages)? The bigger the site, the more PageRank you need, and the more efficient you need to be in spreading that PageRank around.
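The idea of "spreading PageRank around" can be illustrated with a toy, textbook power-iteration version of PageRank in Python. This is a simplified sketch and not Google's production ranking. The page paths are hypothetical, and the sketch assumes every page links back to the home page, as typical site navigation does.

```python
from __future__ import annotations

DAMPING = 0.85  # commonly used damping factor; an assumption, not a Google setting


def pagerank(links: dict[str, list[str]], iterations: int = 100) -> dict[str, float]:
    """Textbook PageRank by power iteration over a small internal link graph."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page first gets the "random jump" share.
        new_rank = {page: (1.0 - DAMPING) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly across the pages it links to.
                share = DAMPING * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page with no links: spread its rank over all pages.
                for target in pages:
                    new_rank[target] += DAMPING * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: each extra path level is one more click from "/".
site = {
    "/": ["/brakes", "/oils"],
    "/brakes": ["/", "/brakes/pads"],
    "/brakes/pads": ["/", "/brakes/pads/front"],
    "/brakes/pads/front": ["/"],
    "/oils": ["/"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(site).items(), key=lambda kv: -kv[1]):
        print(f"{score:.3f}  {page}")
```

With these links, the home page scores highest, and each additional click of depth lowers a page's share. That is why a large site needs both more inbound PageRank and a flatter internal link structure.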

“A page that’s 10 clicks deep into the site probably won’t fare well in Google. If some of your pages have overly complex URLs (too many dynamic parameters, or too many characters), that could be impeding their inclusion in Google’s index. If there are too many redirects in a row, that could keep the page out of the index. Spamming, or an inordinate amount of duplicate content, could also be to blame. This isn’t as simple as making a comprehensive list of URLs and dropping it into an XML Sitemap file. That’s treating the symptom instead of curing what ails your site.
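You can check the depth and URL-complexity points against a map of your own internal links. The Python sketch below assumes you already have such a map (for example, exported from your CMS or a crawler). It runs a breadth-first search from the home page to find each URL's click depth. It then flags pages that are unreachable, too deep, too long, or carry many query parameters. The thresholds are illustrative assumptions, because Google does not publish hard limits.

```python
from __future__ import annotations

from collections import deque
from urllib.parse import parse_qs, urlsplit

# Illustrative thresholds only; Google does not publish exact limits.
MAX_DEPTH = 3
MAX_PARAMS = 2
MAX_URL_LENGTH = 100


def click_depths(links: dict[str, list[str]], start: str = "/") -> dict[str, int]:
    """Breadth-first search: fewest clicks needed to reach each URL from `start`."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def audit(links: dict[str, list[str]], start: str = "/") -> dict[str, list[str]]:
    """Flag URLs that are unreachable, too deep, too long, or heavy on parameters."""
    depths = click_depths(links, start)
    all_urls = set(links) | {t for targets in links.values() for t in targets}
    problems: dict[str, list[str]] = {}
    for url in sorted(all_urls):
        issues = []
        if url not in depths:
            issues.append("unreachable from home page")
        elif depths[url] > MAX_DEPTH:
            issues.append(f"{depths[url]} clicks deep")
        params = parse_qs(urlsplit(url).query)
        if len(params) > MAX_PARAMS:
            issues.append(f"{len(params)} query parameters")
        if len(url) > MAX_URL_LENGTH:
            issues.append(f"{len(url)} characters long")
        if issues:
            problems[url] = issues
    return problems


# Hypothetical internal link map, not the retailer's real structure.
site = {
    "/": ["/brakes"],
    "/brakes": ["/brakes/pads"],
    "/brakes/pads": ["/brakes/pads/front"],
    "/brakes/pads/front": ["/product?id=7&cat=3&sort=price&session=ab12"],
    "/orphan-page": [],
}

if __name__ == "__main__":
    for url, issues in audit(site).items():
        print(url, "->", "; ".join(issues))
```

In this example, the product page is flagged both for depth and for its four parameters, and the orphan page is flagged because nothing links to it.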

“In your specific case, I believe it’s a combination of factors. I wonder if the keyword-stuffing in your meta tags could be earning you a penalty (see https://www.performancemotorcare.com/acatalog/info3PMC00030.html). You only have 75 domains linking to your site (according to the SEOmoz Linkscape tool), so your site, in my opinion, doesn’t look terribly important to Google.
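A crude way to spot keyword stuffing in a meta keywords tag is to count how often each word repeats. The sketch below uses Python's standard `html.parser`. The minimum repeat count of 3 and the sample tag are assumptions made for illustration, and search engines do not disclose how they actually judge stuffing.

```python
from __future__ import annotations

from collections import Counter
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collects the content attribute of <meta name="keywords">."""

    def __init__(self) -> None:
        super().__init__()
        self.keywords = ""

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            self.keywords = attributes.get("content") or ""


def repeated_terms(page_html: str, min_count: int = 3) -> dict[str, int]:
    """Return words that appear at least min_count times in the meta keywords."""
    parser = MetaKeywordsParser()
    parser.feed(page_html)
    words = parser.keywords.lower().replace(",", " ").split()
    counts = Counter(words)
    return {word: count for word, count in counts.items() if count >= min_count}


# Hypothetical tag, not taken from the retailer's site.
sample = '<meta name="keywords" content="car wax, car polish, car shampoo, car care, wax">'

if __name__ == "__main__":
    print(repeated_terms(sample))  # prints {'car': 4}
```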

“You have canonicalization issues. Specifically, some pages in Google’s index are from www.performancemotorcare.com, some from performancemotorcare.com, and some from https://www.performancemotorcare.com. This creates duplicate content and dilutes your PageRank. It can be solved with 301 redirects or with the canonical tag. None of these issues on its own is the ‘smoking gun,’ but each could be a contributing factor.
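Fixing host-variant duplication comes down to mapping every variant onto one preferred URL, which is what a 301 redirect or a canonical tag tells Google. The Python sketch below assumes that http://www.performancemotorcare.com is the preferred version. It normalizes the scheme and host, leaves the path and query alone, and builds the matching canonical link element.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the http://www. variant is the one you want indexed.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.performancemotorcare.com"
SITE_HOSTS = {"performancemotorcare.com", "www.performancemotorcare.com"}


def canonical_url(url: str) -> str:
    """Rewrite any scheme/host variant of the site onto the preferred one."""
    parts = urlsplit(url)
    if (parts.hostname or "") in SITE_HOSTS:
        parts = parts._replace(scheme=PREFERRED_SCHEME, netloc=PREFERRED_HOST)
    return urlunsplit(parts)


def canonical_link_tag(url: str) -> str:
    """The <link rel="canonical"> element a page at `url` could carry."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


variants = [
    "http://www.performancemotorcare.com/acatalog/info3PMC00030.html",
    "http://performancemotorcare.com/acatalog/info3PMC00030.html",
    "https://www.performancemotorcare.com/acatalog/info3PMC00030.html",
]

if __name__ == "__main__":
    for url in variants:
        print(url, "->", canonical_url(url))
    print(canonical_link_tag(variants[2]))
```

All three variants collapse to a single URL. URLs on other hosts pass through unchanged.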

“Could this possibly be as simple as a mistake in your robots.txt file? I found a disallow directive (‘Disallow: /cgi-bin/’) in your robots.txt. Yet you feature a number of ‘best selling’ and ‘new’ products on the home page, and these URLs all contain ‘cgi-bin,’ so they are all being disallowed. I found that very curious. Is that intentional?”
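Python's standard `urllib.robotparser` module can check whether a robots.txt rule blocks a given URL. The sketch below parses only the single rule quoted above. The cgi-bin product URL is a hypothetical stand-in for the home page's "best selling" and "new" links. The catalog URL is the page cited earlier.

```python
from urllib.robotparser import RobotFileParser

# Only the directive quoted above; the rest of the real robots.txt is not shown.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical product link routed through cgi-bin, plus a normal catalog page.
product_via_cgi = "http://www.performancemotorcare.com/cgi-bin/product.pl?id=123"
catalog_page = "http://www.performancemotorcare.com/acatalog/info3PMC00030.html"

if __name__ == "__main__":
    for url in (product_via_cgi, catalog_page):
        allowed = parser.can_fetch("Googlebot", url)
        print("allowed" if allowed else "blocked", url)
```

Any URL whose path starts with /cgi-bin/ is reported as blocked, no matter which page links to it.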

  1. Rob Mobberley July 15, 2010

    Stephan

    Thank you for your observations and comments. The page example you show is a very interesting one, as it is a simple larger-image page that seems to be taking its keywords from somewhere else in the CMS; we will need to look at this. We do, however, deliberately exclude the larger-image pages from our XML sitemap, as all they generally contain is the name of the product and the image.

    Our Google Webmaster Tools account shows 735 links to the site. Can you advise which tools provide the more accurate or relevant data on this? I know there should still be more links, even at 735; should this be our number one priority?

    We believed we had recently sorted out the canonicalization issues, again in Webmaster Tools, by indicating that the main site was to be http://www.performancemotorcare.com. What line(s) of text do we need to put in the .htaccess file for a 301 redirect to sort this?

    Yes, the disallow of the cgi-bin was deliberate, as this is also where the search, checkout steps, cart pages, etc. are located. Unfortunately, the way the pages are constructed routes the links to the latest and most popular products via the cgi-bin. There may be ways to target the cgi-bin disallow more specifically, and I will have these looked at.

    Again many thanks for your expert comments.

    Kind regards

    Rob

  2. Arturas Kvederis July 22, 2010

    Stephan has already identified the probable indexing problems. To redirect all traffic from the non-www to the www version of your website with a “301 Moved Permanently” response, edit the .htaccess file in your root folder and add the following at the end of the file:

    RewriteEngine On
    # Send requests for the bare domain to the www host with a 301 redirect
    RewriteCond %{HTTP_HOST} ^performancemotorcare\.com$ [NC]
    RewriteRule ^(.*)$ http://www.performancemotorcare.com/$1 [L,R=301]

    Link building should also be part of any SEO strategy, since more backlinks will help get more pages indexed. Keep in mind, too, that adding fresh, quality content regularly can increase your website’s crawl rate.

    Best Regards,
    Arturas
    http://sysiq.com
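Rob's plan to target the cgi-bin disallow more narrowly can be tested before it goes live, using Python's standard `urllib.robotparser`. The script names below (search.pl, cart.pl, checkout.pl, bestsellers.pl) are hypothetical placeholders, not the store's real URLs. The sketch sticks to plain path prefixes and avoids Allow lines and wildcards. Python's parser applies rules in file order, while Google picks the most specific match, so more elaborate rules can be interpreted differently by the two.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical narrower rules: block only the script prefixes that should stay
# out of the index, instead of the whole /cgi-bin/ directory.
NARROW_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/search
Disallow: /cgi-bin/cart
Disallow: /cgi-bin/checkout
"""

parser = RobotFileParser()
parser.parse(NARROW_ROBOTS_TXT.splitlines())

BASE = "http://www.performancemotorcare.com"
# Expected result for each hypothetical URL: True = crawlable, False = blocked.
expectations = {
    BASE + "/cgi-bin/search.pl?q=brake+pads": False,
    BASE + "/cgi-bin/cart.pl": False,
    BASE + "/cgi-bin/checkout.pl?step=2": False,
    BASE + "/cgi-bin/bestsellers.pl": True,
}

if __name__ == "__main__":
    for url, expected in expectations.items():
        actual = parser.can_fetch("Googlebot", url)
        status = "ok" if actual == expected else "UNEXPECTED"
        print(status, "allowed" if actual else "blocked", url)
```

Because Disallow values are prefixes, a rule such as /cgi-bin/cart also matches any other script whose name starts with "cart," so the real script names should be checked before the rules are deployed.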