In “SEO for Google Shopping,” I addressed the need to optimize product feeds. I stated that adding keyword-rich product titles and descriptions to the feeds could be scaled with “scraping,” but I didn’t explain it further.
In this post, I’ll explain scraping. I’ll review why it’s useful and how it relates to search engine optimization. Scraping can speed up many tasks, eliminating hours of manual work.
What Is Scraping?
Scraping is the process of extracting items such as text, code, and images from a web page. Scraper applications range from browser extensions to standalone software.
Scraping replaces the manual process of copying and pasting items from a page with your mouse and keyboard. For example, a person could spend hours manually copying the title tags from 500 pages. A good scraper could collect them in about a minute.
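To make the title-tag example concrete, here is a minimal sketch in Python using only the standard library. The page paths and HTML below are hypothetical stand-ins. A real scraper would download each page first, and dedicated tools handle messy markup better than this does.

```python
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collects the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def scrape_title(html):
    """Return the title text of one HTML document."""
    parser = TitleScraper()
    parser.feed(html)
    parser.close()
    return parser.title


# Hypothetical pages (assumption): in practice the HTML would be fetched first.
SAMPLE_PAGES = {
    "/blue-widget": "<html><head><title>Blue Widget | Example Store</title></head></html>",
    "/red-widget": "<html><head><title> Red Widget | Example Store </title></head></html>",
}

# One pass over every page replaces the manual copy-and-paste.
titles = {path: scrape_title(html) for path, html in SAMPLE_PAGES.items()}
```

The same loop works for 2 pages or 500. Only the list of pages changes.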
Scraping is increasingly common. For example, the web crawler Screaming Frog uses scraping to extract data from a website.
Google scrapes websites to display rich snippets in organic search results. The text in Google’s answer boxes comes from scraping.
For years merchants have scraped competitors’ product pages to obtain their prices quickly. Your site may be getting scraped right now, as you read this.
Scraping your own site can be useful. Scraping can quickly gather all of your products and prices into a single spreadsheet for further analysis.
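As a rough illustration of gathering products and prices into a spreadsheet, the sketch below parses a hypothetical category page and writes the rows as CSV text. The markup and the "name" and "price" class names are assumptions, so they would need to match your own templates.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical category-page markup (assumption); real class names vary by site.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Blue Widget</span><span class="price">$19.99</span></li>
  <li class="product"><span class="name">Red Widget</span><span class="price">$24.50</span></li>
</ul>
"""


class ProductScraper(HTMLParser):
    """Collects (name, price) pairs from spans with class 'name' and 'price'."""

    def __init__(self):
        super().__init__()
        self.products = []   # finished rows
        self._field = None   # field currently being read, if any
        self._current = {}   # row under construction

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            self._field = None
            # A row is complete once both fields have been seen.
            if "name" in self._current and "price" in self._current:
                self.products.append(
                    (self._current["name"].strip(), self._current["price"].strip())
                )
                self._current = {}


def products_to_csv(html):
    """Return CSV text with one row per product found in the HTML."""
    scraper = ProductScraper()
    scraper.feed(html)
    scraper.close()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "price"])
    writer.writerows(scraper.products)
    return buffer.getvalue()


csv_text = products_to_csv(SAMPLE_HTML)
```

Saving csv_text to a .csv file gives you something any spreadsheet program can open for further analysis.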
Content thieves use scraping to reproduce articles and images. Spammers rely on scraping tools to impersonate a website and mimic its success. The same tools let spammers scrape select content and “spin” it into new posts. Google doesn’t like this because the result is generally low-value pages. But for spammers, it can be a fast way to trick Google at volume. Sometimes it works, though not nearly as well as it used to.
SEO tools scrape Google’s search results to determine rankings. These tools run millions of searches daily to get updated ranking info. Google has tried to bully rank-tracking companies into stopping. The scraping costs Google money because Google renders each results page for the bot. Plus, the automated searches inflate search volume metrics.
Scraping can perform SEO tasks at scale. Say a competitor’s website often appears on the first page of Google for a handful of terms. You could search each term and write down the results, or you could run a scraper on Google’s results. A good scraper will let you export the data.
Just about anything on the web can be scraped. The fun part is figuring out when and how to do it. For example, recently a client wanted to update all its logos on the internet as part of a branding exercise. Using ScrapeBox and a couple of minutes of setup, I had a full spreadsheet of all the websites Google knew about that contained the outdated logos. Each row had the specific image URL and its actual appearance.
Websites sometimes disallow scraping as part of their terms and conditions. A few years ago, for example, LinkedIn sued 100 people who used scrapers to copy user data. It’s important to know what a website allows (or disallows) in terms of scraping.
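One machine-readable signal of what a site permits is its robots.txt file. Note that this is a different thing from the terms and conditions discussed above, and it does not replace reading them. The sketch below checks two URLs against a hypothetical robots.txt using Python's standard urllib.robotparser module. The rules, the user-agent name, and the example.com URLs are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption); a real one lives at /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /account/
Disallow: /search
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a rule set that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(SAMPLE_ROBOTS_TXT)
USER_AGENT = "example-scraper"  # hypothetical scraper name

# Product pages are not covered by any Disallow rule, so they are permitted.
allowed_product = rules.can_fetch(
    USER_AGENT, "https://www.example.com/products/blue-widget"
)
# Anything under /account/ is disallowed for every user agent.
allowed_account = rules.can_fetch(
    USER_AGENT, "https://www.example.com/account/orders"
)
```

A scraper can run a check like this before each request and skip any URL for which can_fetch returns False.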
Scraping opens up options you may have never considered. “Is there a way to get all that data at once?” A thoughtful scraping strategy could be the answer.