SEO: How to Identify Low Quality Links
Links are the lifeblood of organic search. But the quality of those links can boost or kill a site’s rankings. This article suggests methods to determine the quality of inbound links to your site. At the end of the article, I’ve attached a downloadable Excel spreadsheet to help you evaluate links to your site.
Importance of Links
Search engine algorithms have traditionally relied heavily on links as a measure of a site’s worthiness to rank. After all, links are, essentially, digital endorsements from the linking site as to the value of the site to which it is linking.
Google was founded on this concept of links indicating value. In addition to the relevance signals that other engines used in their algorithms, Google added PageRank, a method of calculating ranking value similar to the way that citations in the scientific community can indicate the value of a piece of research.
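To make the citation analogy concrete, here is a minimal sketch of the classic PageRank idea as a power iteration over a tiny link graph. It follows the simplified textbook formulation with the commonly cited damping factor of 0.85. It is not Google’s production ranking, and the three example domains are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A link passes an equal slice of the linker's rank to each target.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no outbound links: spread its rank evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical graph: two sites "cite" c.example, which cites a.example.
web = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

The page with the most incoming “citations” ends up with the highest score, and a page that nobody links to keeps only the baseline share.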
When site owners began creating artificial methods of increasing the number of links pointing to their sites to improve their rankings, the search engines retaliated with link quality measures. Google’s Penguin algorithm is one such algorithmic strike intended to remove the ranking benefit sites can derive from poor quality links.
What Makes a Low Quality Link?
Unfortunately, the definition of a poor quality link is murky.
Poor quality links come from poor quality sites. Poor quality sites tend to break the guidelines set by the search engines. Those guidelines increasingly recommend that sites have unique content that real people would get real value from. That’s pretty subjective coming from companies (search engines) whose algorithms are based on rules and data.
The “unique” angle is easy to ascertain: If the content on a site is scraped, borrowed, or lightly repurposed, it is not unique. If the site is essentially a mashup of information available from many other sources with no additional value added, it is not unique. Thus, if links come from a site that does not have unique content (i.e., a site considered low quality), those links would be low quality as well.
Search engines can identify unique content easily because they have records of every bit of content they’ve crawled. Comparing bits and bytes to find copies is just a matter of computing power and time. For site owners, it’s more difficult and requires manual review of individual sites.
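Comparing text for copies can be illustrated with a standard near-duplicate technique: break each text into overlapping five-word “shingles” and measure the overlap with Jaccard similarity. This is a minimal sketch of that general technique, not a description of how any search engine actually detects duplicates; the shingle size and sample sentences are assumptions for illustration.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection / size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, k=5):
    """Share of k-word shingles the two texts have in common (0.0 to 1.0)."""
    return jaccard(shingles(text_a, k), shingles(text_b, k))


original = ("Links are the lifeblood of organic search, but the quality of "
            "those links can boost or kill rankings.")
scraped = ("Links are the lifeblood of organic search, but the quality of "
           "those links can boost or kill a site's rankings.")
unrelated = ("Turn on manual calculation before pasting hundreds of formulas "
             "into the spreadsheet.")

print(round(similarity(original, scraped), 2))
print(round(similarity(original, unrelated), 2))
```

A lightly reworded copy still shares most of its shingles with the source, while unrelated text shares none. Doing this across the whole web is where the computing power comes in.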
There are other known indicators of low-quality sites as well, such as an overabundance of ads at the top of the page, interlinking with low-quality sites, and keyword stuffing and other spam tactics. Again, many of these indicators are difficult to analyze in any scalable fashion. They remain confusing to site owners.
In the absence of hard data to measure link and site quality in a scalable way, search engine optimization professionals can use a variety of data sources that may correlate with poor site quality. Examining those data sources together can identify which sites are likely to cause link quality issues for your site’s link profile.
Data such as Google toolbar PageRank, Alexa rankings, Google indexation and link counts, and other automatable data are unreliable at best in determining quality. In most cases, I wouldn’t even bother looking at some of these data points. However, because link quality data and SEO performance metrics for other sites are not publicly available, we need to make do with what we can collect.
These data should be used to identify potential low-quality sites and links, but not as an immediate indicator of which sites to disavow or request link removal. As we all know, earning links is hard even when you have high quality content, especially for new sites. It’s very possible that some of the sites that look poor quality based on the data signals we’ll be collecting are really just new high-quality sites, or sites that haven’t done a good job of promoting themselves yet.
While a manual review is still the only way to determine site and link quality, these data points can help determine which sites should be flagged for manual review.
A couple of reports in Google Webmaster Tools can provide a wealth of information to sort and correlate. Receiving poor marks in several of the data types could indicate a poor quality site.
Google’s Link Domain Report lists the top 1,000 domains that link to pages on your site. “Links” refers to the total number of links that domain has created pointing to any page on your site. “Linked Pages” refers to the number of pages on your site that domain has linked to. For example, a domain may link to 10 pages on your site from every page of its own site. If the linking site has 100 pages, that’s 1,000 “links” to 10 “linked pages.”
You can also download a report that shows a large sample of the exact pages linking to your site. Some of these links come from domains not listed in the Link Domain Report, so you may want to add those domains to your list as well.
- Red flags. Generally, high numbers of “links” and “linked pages” from a single domain can indicate a poor-quality site.
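If you export both reports, a short script can surface the red flags and the extra domains. This is a minimal sketch under stated assumptions: the CSV layouts and column names (`Domain`, `Links`, `Linked pages`, `Linking page`) are hypothetical stand-ins for whatever the real exports contain, and the links-per-linked-page ratio with its threshold of 50 is my own heuristic for spotting sitewide links, not a Google metric. In the 1,000 links to 10 linked pages example, the ratio is 100, which roughly equals the number of pages on the linking site that carry each link.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical export layouts; the real column names may differ.
DOMAIN_REPORT = """Domain,Links,Linked pages
blog.example,3,2
sitewide.example,1000,10
news.example,12,8
"""

SAMPLE_LINKS = """Linking page
http://directory.example/cat/page1.html
https://www.blog.example/post
"""


def load_domains(report_text):
    """Map each domain to its (links, linked pages) counts."""
    reader = csv.DictReader(io.StringIO(report_text))
    return {row["Domain"].lower(): (int(row["Links"]), int(row["Linked pages"]))
            for row in reader}


def domains_from_sample(sample_text):
    """Collect the bare domains that appear in the sample of linking pages."""
    reader = csv.DictReader(io.StringIO(sample_text))
    found = set()
    for row in reader:
        host = urlparse(row["Linking page"]).hostname or ""
        found.add(host[4:] if host.startswith("www.") else host)
    found.discard("")
    return found


def flag_sitewide(domains, ratio_threshold=50):
    """Flag domains whose links-per-linked-page ratio suggests sitewide links."""
    flagged = []
    for domain, (links, linked_pages) in domains.items():
        ratio = links / linked_pages if linked_pages else 0.0
        if ratio >= ratio_threshold:
            flagged.append((domain, links, linked_pages, ratio))
    return sorted(flagged, key=lambda row: -row[3])


domains = load_domains(DOMAIN_REPORT)
flagged = flag_sitewide(domains)
# Domains seen in the sample report but missing from the domain report.
missing = sorted(domains_from_sample(SAMPLE_LINKS) - set(domains))
print(flagged)
print(missing)
```

Flagged domains are candidates for the manual review described later, not automatic disavow targets.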
The SEO Tools plugin for Excel turns Excel into an SEO data collector, enabling you to enter formulas that gather data from various websites.
- What to use. For link quality I typically use the following.
- Home page Google PageRank. Shows the home page’s Google toolbar PageRank, which is only updated every three months and may not be accurate, but it is useful as a relative comparison. Higher numbers are better.
- Google indexation. The number of pages Google reports as indexed for the domain. The pages reported by Google are widely believed to be a fraction of the actual number, but it’s useful as a relative comparison. It’s the same as doing a site:domain.com search. Higher numbers are better.
- Google link count. The number of links pointing to a domain according to Google. Wildly underreported, but just barely useful as a relative comparison. Same as doing a link:domain.com search. Higher numbers are better.
- Alexa Reach. The number of Alexa toolbar users that visit the domain in a day. Higher numbers are better.
- Alexa Link Count. The number of links to the domain according to Alexa’s data. Higher numbers are better.
- Wikipedia entries. The number of times the domain is mentioned in Wikipedia. Higher numbers are better.
- Facebook Likes. The number of Facebook Likes for the domain. Higher numbers are better.
- Twitter count. The number of Twitter mentions for the domain. Higher numbers are better.
- Cautions. Every cell in the spreadsheet will execute a query to another server. If you have many rows of data, the plugin can cause Excel to stop responding, and you’ll have to force it to quit in your task manager. I recommend the following steps.
- Turn on manual calculation in the Formulas menu: Formulas > Calculation > Calculate Manually. This prevents Excel from executing the formulas every time you press Enter, which saves a lot of time and frustration. Formulas will only execute when you save the document or click Calculate Now in the same menu.
- Paste the formulas down one column at a time in groups of 50 to 100. Excel seems to respond better when the new formulas are all of the same type (only Alexa Reach data, for example) than when you try to execute multiple types of data queries at once.
- Use Paste Special. When a set of data is complete, copy it and do a Paste Special right over the same cells. That removes the formulas so they don’t have to execute again. I’d leave the formulas in the top row so you don’t have to recreate them all if you need to add more domains later.
- Use a Windows PC if you can, because Macs tend to stall out more quickly with this plugin.
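If you collect the same data outside Excel, the same precautions carry over: query one metric at a time, in small batches, and keep finished values so they are never requested again (the script equivalent of Paste Special). This is a minimal sketch; `fetch` is a hypothetical placeholder for whatever lookup you use, and the batch size and pause are assumptions, not limits published by any data provider.

```python
import time


def collect_metric(domains, fetch, cache, batch_size=50, pause=1.0):
    """Fetch one metric for many domains in small batches, caching results.

    fetch is a hypothetical callable (domain) -> value; cache is a dict that
    keeps finished values so they are not requested again on later runs.
    """
    # Skip duplicates and anything already collected.
    pending = [d for d in dict.fromkeys(domains) if d not in cache]
    for start in range(0, len(pending), batch_size):
        for domain in pending[start:start + batch_size]:
            cache[domain] = fetch(domain)
        if start + batch_size < len(pending):
            time.sleep(pause)  # give the remote service a break between batches
    return cache


# Usage with a stand-in fetcher that records every request it receives.
calls = []


def fake_fetch(domain):
    calls.append(domain)
    return len(domain)


cache = {}
domains = [f"site{i}.example" for i in range(120)]
collect_metric(domains, fake_fetch, cache, batch_size=50, pause=0)
collect_metric(domains, fake_fetch, cache, batch_size=50, pause=0)  # served from cache
print(len(calls), len(cache))
```

Running it twice makes only one request per domain, because the second run finds every value already in the cache.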
Manual Quality Review
If a site has high numbers in the Google Webmaster Tools reports and low numbers in the SEO Tools data, it should be manually checked to determine if it’s a poor quality site, sending poor-quality links your way. The following are the quality signals I use for manually reviewing link quality.
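The “high in Webmaster Tools, low in SEO Tools” rule can be sketched as a simple filter. In this minimal sketch, a domain earns a poor mark for each metric where it falls in the bottom quarter of the domains you collected, which keeps the comparison relative as recommended above. The metric keys, sample values, and both cutoffs (100 links, 5 poor marks) are hypothetical choices for illustration, not thresholds from the article.

```python
from statistics import quantiles

# Hypothetical column names for the SEO Tools metrics listed earlier.
METRICS = ["pagerank", "indexed_pages", "google_links", "alexa_reach",
           "alexa_links", "wikipedia", "facebook_likes", "twitter_count"]


def poor_marks(rows, metrics=METRICS):
    """Count how many metrics put each domain in the bottom quarter of the pool."""
    cutoffs = {m: quantiles([row[m] for row in rows], n=4, method="inclusive")[0]
               for m in metrics}  # first quartile of each metric
    return {row["domain"]: sum(row[m] <= cutoffs[m] for m in metrics)
            for row in rows}


def review_queue(rows, gwt_links, link_cutoff=100, marks_cutoff=5):
    """Domains with many links to your site but weak metrics, most links first."""
    marks = poor_marks(rows)
    queue = [d for d, links in gwt_links.items()
             if links >= link_cutoff and marks.get(d, 0) >= marks_cutoff]
    return sorted(queue, key=lambda d: -gwt_links[d])


# Made-up sample data in the order of METRICS.
FIELDS = ["domain"] + METRICS
rows = [dict(zip(FIELDS, values)) for values in [
    ("strong.example", 5, 12000, 300, 900, 450, 3, 2000, 800),
    ("average.example", 3, 800, 40, 60, 70, 0, 150, 90),
    ("new.example", 1, 40, 2, 5, 8, 0, 30, 25),
    ("thin.example", 0, 15, 0, 0, 1, 0, 0, 0),
]]
# "Links" column from the Webmaster Tools domain report (also made up).
gwt_links = {"strong.example": 2000, "average.example": 150,
             "new.example": 20, "thin.example": 450}

queue = review_queue(rows, gwt_links)
print(poor_marks(rows))
print(queue)
```

The queue only tells you where to look first; a domain like `new.example` scores poorly on one metric but is not queued, which matches the caution that new sites can look weak in this data without being low quality.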
- Trust. Would you visit this site again? Do you feel confident about buying from the site or relying on its advice? Would you recommend it to your friends? If not, it’s probably low quality.
- Source. Is this site a source of unique information or products? Does this site pull all of its content from other sites via APIs? Is it scraping its content from other sites with or without a link back to the source site? Does it feel like something you could get from a thousand other sites? If so, it’s probably low quality.
- Ad units in first view. How many paid ad units are visible when you load the page? More than one? Or if it’s only one, does it dominate the page? If you weren’t paying close attention would it be possible to confuse the ads with unpaid content? If so, it’s probably low quality.
- Use Searchmetrics. Enter the domain in the Searchmetrics search box to get search and social visibility, rankings, competitors, and more. It’s free, with an option to subscribe for many more features. I’ve included this in the manual review section because you have to paste each domain in separately. It does, however, provide a balancing analytical approach to the subjective nature of manual review.
Finally, when reviewing sites manually, don’t bother clicking around the site to review multiple pages. If one page is poor quality, it’s likely that they all are. In particular, the home page of a site typically represents the quality of the entire site. Download this Excel spreadsheet to help organize and evaluate links to your site.
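Since one page usually stands in for the whole site, a list of linking URLs can be reduced to a single home page per domain before you start reviewing. This is a minimal sketch; it assumes each URL includes its scheme (`http://` or `https://`) and treats `www.` and bare hostnames as the same site, which is a simplifying assumption.

```python
from urllib.parse import urlparse


def home_pages(linking_urls):
    """Reduce linking URLs to one home page per domain for manual review."""
    seen = {}
    for url in linking_urls:
        parts = urlparse(url)
        host = parts.hostname or ""  # hostname is already lowercased
        key = host[4:] if host.startswith("www.") else host
        if key and key not in seen:
            # Keep the first scheme and hostname seen for this site.
            seen[key] = f"{parts.scheme}://{host}/"
    return list(seen.values())


urls = [
    "http://www.blog.example/post-1",
    "http://blog.example/post-2",
    "https://directory.example/cat/page1.html",
]
print(home_pages(urls))
```

Two posts on the same blog collapse into one home page to review.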