Tutorial: Gaining Control of Search Engine Spiders

Introduction

Most of us are familiar with the term “search engine spider” and at least understand that spiders have something to do with how search engines index websites. The purpose of this article is to help you make your web pages more accessible to spiders. After all, spiders are our friends.

It is accurate to think of a spider as an automated web browser. It will access websites and work through all the pages, indexing the content and following links. A good search engine optimization strategy will ensure that the spider can easily navigate to each page of your website and reach the content on each page. This is often easier said than done. In this tutorial, however, we will outline a simple technique using basic HTML and CSS to gain control over how spiders navigate the individual pages of a website.

Our goal is to create links that will guide the spider through our site, but not be visible to our visitors or mess with our website design. Before getting into the mechanics of this simple trick, let’s examine the path that a search engine spider might take when entering our site.

As the spider accesses our home page, it looks first through the head section of the HTML document. This part of the document contains our meta tags, as well as any CSS styles that have been declared (or a link to an external style sheet). The spider will then move into the body section of our HTML document, which is the part of the document that is displayed visually in a user’s browser. Because the spider does not need to see the website visually, but rather wants to read the code, it has no reason to apply CSS styles to the HTML code, and therefore ignores CSS formatting altogether. This is what makes our little trick work, since by using CSS we can visually hide parts of our HTML code from users, but the spiders will still read the code as they migrate through our site.
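To make the idea of reading the code without applying styles concrete, here is a minimal Python sketch of a link reader. It is only an illustration and not how any particular search engine works. The `LinkCollector` class, the `collect_links` function, and the sample page (including `about.html` and the `hidden` id) are assumptions made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values in document order. No CSS is ever applied."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags that carry an href attribute count as links.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Hypothetical page: the first link sits inside a block hidden with CSS.
SAMPLE_PAGE = """
<html>
  <head><style>#hidden { display: none; }</style></head>
  <body>
    <div id="hidden"><a href="sitemap.html">Website Map</a></div>
    <a href="about.html">About Us</a>
  </body>
</html>
"""


def collect_links(html):
    """Return every link found in the markup, hidden or not."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


print(collect_links(SAMPLE_PAGE))  # ['sitemap.html', 'about.html']
```

Because this reader never interprets the style rule, the link inside the hidden block is found just like the visible one, and it is found first because it comes first in the markup.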

With this information in hand, we can apply a strategy to ensure that the spiders can navigate our website smoothly, and also guide them to the content that we want them to see, in the order that we want them to see it. In our example, we will create a strategy to guide the spiders first to each page’s content, then to our site’s navigation, then to our site’s legal information, and finally to our site map. By doing this on each web page, we ensure that all of our content is indexed and that the spider can easily find its way to every bit of information on our site. Rather than waiting and hoping that our site is accessible to the spiders, we are taking control over how they navigate our website.

Instruction

As we stated before, our goal is to guide the search engine spiders to the relevant information on our site. The tools that we will use to make this happen are actually very simple. We will be using an HTML div element, some HTML anchor tags (otherwise known as hyperlinks), and some very simple CSS. Since our spiders will enter the visible part of our web page right after the <body> tag, we want to place all of our code directly beneath this tag.

To start, we want to create a section of HTML that we can work with. We accomplish this by using div tags, which consist of an opening tag (<div>) and a closing tag (</div>). The div element is known as a block-level element, which means that it displays in a browser as a block of content that appears on a new line. The opposite of this is an inline element, such as a hyperlink, which will not create a new line when rendered on the screen. For our purposes, we are more concerned with the idea that everything inside the div tags will be grouped, and not too concerned with how it will appear on the screen, since our goal is for it not to appear at all. So let’s dive right in and begin our code by creating our div element directly beneath our <body> tag:

<code>&lt;body&gt;  
  &lt;div&gt;&lt;/div&gt;
</code>

Now that we have created a div element, let’s decide how we want to guide the spiders through our pages. For this example, each of our pages will have three distinct sections that we want the spider to be sure to cover. The first is our page content, or the meat of the page, which is the most important. Secondly we want the spider to make sure to follow the navigation of our site, which usually is in the form of a menu of hyperlinks. This will ensure that the spider reaches all the pages of our site that a user might reach. After that, we want to guide the spider to our website’s legal information, which in this example is linked to in the footer of our document. Lastly, we want to send the spider to our website map, which is a page that links to all other pages in our site.

Once we have defined our strategy, we need to put some links into our pages for the spider to follow. HTML links are created using an anchor tag, which has an href attribute assigned to it telling a browser (or spider) where to go. For example, the following link will send us to the Practical eCommerce website when clicked, and will display the text “Go to Practical eCommerce” on the screen:

<code>&lt;a href="https://www.practicalecommerce.com" title="Go to Practical eCommerce"&gt;Go to Practical eCommerce&lt;/a&gt;
</code>

However, the anchor tag can also be used to define an area within a page to link to. Nearly everyone has visited a site that scrolls vertically quite a bit, perhaps with a multitude of news articles on it. Usually there is a link on each article that says “Back to Top”, or something similar that will take us back to the top of the page without us having to scroll manually. This is accomplished by using anchors, which serve as a destination for a hyperlink within an HTML page. Let’s look at the code for our news page example. Since the top of the page will be the destination for our “Back to Top” links, there will need to be an anchor tag at the top of the page that looks something like this:

<code>&lt;a name="top"&gt;&lt;/a&gt;
</code>

As you may have noticed, this is the same tag we used to make our hyperlink, except that it is missing some information. The href attribute has been left out, which makes this element an anchor rather than a link with a destination. Instead, we have given this tag a “name” attribute so that our anchor will be unique and we will have a way of referencing it later. Also notice that there is no text in between the opening and closing tags, which means that this tag will not show up on the screen. Our hyperlink, by contrast, contained the text that we wanted to show in the browser (the text that the user clicks on to follow the link).

Next, in order to link to this anchor, we will need to create our “Back to Top” hyperlink. Let’s take a look at that code snippet:

<code>&lt;a href="#top" title="Back to Top"&gt;Back to Top&lt;/a&gt;
</code>

This one should look familiar to you, as it has the same form as the link we created to the Practical eCommerce website above. As you may have noticed, we have changed the href attribute, or destination of the link, to #top. At first this may seem strange, but the # sign indicates that we are linking to a named anchor, and not to an external web page or website. It is followed by the name of our anchor tag. Clicking on this link will cause the browser to go to the part of our web page where the top anchor sits. Knowing this, we want to place anchor tags at the beginning of each of our special sections: page content, site navigation, and legal information. I will be using the following anchors for this example, each placed just above the code representing that content.

<code>&lt;a name="page_content"&gt;&lt;/a&gt;  
&lt;a name="site_nav"&gt;&lt;/a&gt;  
&lt;a name="legal_info"&gt;&lt;/a&gt;
</code>

As you may have guessed, these will serve as the destinations that we want to guide the spiders to. At this point we are halfway there already, but we need to create some links for the spiders to follow in order to reach these destinations. For this example, I will create those links inside the div element that we created earlier. To make it easier to read, I have separated each link onto a new line, but this isn’t necessary:

<code>&lt;body&gt;  
&lt;div&gt;  
&lt;a href="#page_content"&gt;Page Content&lt;/a&gt;  
&lt;a href="#site_nav"&gt;Site Navigation&lt;/a&gt;  
&lt;a href="#legal_info"&gt;Legal Information&lt;/a&gt;    
&lt;a href="sitemap.html"&gt;Website Map&lt;/a&gt;  
&lt;/div&gt;  
&lt;/body&gt;
</code>

As you can see, I have created links to each of our named anchors, and another that links to our site map page, which I have called “sitemap.html”. So far so good. We could just stop here, and the spiders would enter each of our pages and right away be guided to our page content, then our site navigation, then our legal information, and finally our site map. However, in its current form the links will be displayed on the screen, which may interfere with our site design and confuse our visitors. Since we want to avoid both, we will need one last step in order to hide this code from our users.
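To see where each of these four href values points once it is combined with the address of the page that contains it, here is a small Python sketch using the standard library’s `urljoin`. The page address `https://www.example.com/products/index.html` and the `resolve_all` helper are assumptions made up for illustration. Only the href values come from the example above.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains our div of links.
PAGE_URL = "https://www.example.com/products/index.html"

# The href values from the example, in the order they appear in the div.
HREFS = ["#page_content", "#site_nav", "#legal_info", "sitemap.html"]


def resolve_all(page_url, hrefs):
    """Resolve each href against the page address, keeping document order."""
    return [urljoin(page_url, href) for href in hrefs]


for url in resolve_all(PAGE_URL, HREFS):
    print(url)
```

The three links that begin with # resolve to the same page with the anchor name appended, while “sitemap.html” resolves to a separate page in the same folder.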

We need to declare a CSS style property in order to hide this code. CSS styles can be declared in the head section of each HTML page, or in an external CSS document. Please refer to our past tutorials regarding CSS for information on how to create and apply CSS styles. In our example, we will assume that our HTML pages link to a CSS document. In that document, we will need to create an ID selector and assign style properties to it. For our example, we will create the following style in our CSS document:

<code>#hidden_links { display:none; }
</code>

The # symbol prior to our selector name identifies this as an ID selector. For more information about the types of CSS selectors, check out the links provided at the end of this tutorial. Looking at this code, we have declared a display style for any HTML element with an id of hidden_links. Our display style is set to none, which means that the HTML element that we assign this style to will not show up in a browser, so our users will not see it. As you have probably guessed by now, the last step in the process is to assign this style to our div element. We will do this by putting an id attribute into our opening div tag:

<code>&lt;body&gt;  
&lt;div id="hidden_links"&gt;  
&lt;a href="#page_content"&gt;Page Content&lt;/a&gt;  
&lt;a href="#site_nav"&gt;Site Navigation&lt;/a&gt;  
&lt;a href="#legal_info"&gt;Legal Information&lt;/a&gt;    
&lt;a href="sitemap.html"&gt;Website Map&lt;/a&gt;  
&lt;/div&gt;  
&lt;/body&gt;
</code>

And we have finished. Now when a search engine spider traverses our web pages, it is guided to the content that we want it to index, in the order that we want it indexed. By examining our web pages and determining how we would like them to be indexed, and then inserting only a few lines of HTML code, we have gained control over how a search engine spider will index our website.
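Since the technique depends on every in-page link having a matching named anchor, a quick check can catch a forgotten anchor before the page goes live. The following Python sketch is one possible way to do that. The `AnchorChecker` class and `missing_anchors` function are hypothetical names, and the sample page deliberately leaves out the legal_info anchor so that the check has something to report.

```python
from html.parser import HTMLParser


class AnchorChecker(HTMLParser):
    """Records in-page link destinations and the anchors available for them."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # name/id values that a "#..." link can reach
        self.fragments = []    # destinations of "#..." links, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a":
            if attrs.get("name"):
                self.targets.add(attrs["name"])
            href = attrs.get("href") or ""
            if href.startswith("#") and len(href) > 1:
                self.fragments.append(href[1:])


def missing_anchors(html):
    """Return the in-page link destinations that have no matching anchor."""
    checker = AnchorChecker()
    checker.feed(html)
    checker.close()
    return [name for name in checker.fragments if name not in checker.targets]


# Sample page based on the tutorial, with the legal_info anchor
# left out on purpose (an assumption for this demonstration).
PAGE = """
<body>
<div id="hidden_links">
<a href="#page_content">Page Content</a>
<a href="#site_nav">Site Navigation</a>
<a href="#legal_info">Legal Information</a>
<a href="sitemap.html">Website Map</a>
</div>
<a name="page_content"></a>
<a name="site_nav"></a>
</body>
"""

print(missing_anchors(PAGE))  # ['legal_info']
```

An empty list means every link that begins with # has somewhere to land. Links to other pages, such as “sitemap.html”, are not checked by this sketch.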

February 12, 2007 • Brian Getting