Design & Development

An Introduction to XML

Overview

Something about XML is inherently intimidating, and although most of us hear it mentioned often, or are at least aware that it exists, few people have a clear idea of what it is. There is a good reason for this: XML is a technology that is admittedly difficult to pin down, especially in the context of the web. To an online business owner, XML is the driving force behind Google Sitemaps, RSS feeds, podcasting, and other emerging technologies that are proving to be valuable marketing tools. To a web application developer, it facilitates the exchange of data between websites and applications, and it helps ecommerce sites work every day.

Extensible Markup Language, or XML, is a type of markup language used to structure data for any number of reasons. How is that for vague and meaningless? A markup language is a system of tags that identify discrete units of information for processing. It also helps to understand that XML was created to allow richly structured documents to be used on the web. HTML, which has a predefined set of tags and is primarily focused on the display of data rather than the data itself, does not have the flexibility to allow this. XML, which focuses solely on the data that it contains, allows tags and attributes to be customized based on that data. For this reason, XML documents can range from ecommerce transactions to mathematical equations to vector graphics. The important thing is that XML describes data and isn’t concerned with displaying it.

However, since it was designed to deliver information over the web, XML can be formatted and displayed in a web browser with the use of style sheets. In fact, because XML documents are so explicitly structured, search engines and other automated applications can extract more meaning from them than from HTML documents. So what does this all mean to you? Essentially, it means that XML is the future of HTML, and understanding a little more about it isn’t a bad idea if your business is primarily based on the web.

This series of tutorials will not be a bunch of technical manuals about XML, although we have provided a link to those just in case. Our aim is to introduce XML to the beginner through some simple applications that are most relevant to online business owners. In this tutorial we will introduce the structure of XML and create a simple XML document to hold some product information. In future tutorials we will learn how to format XML for viewing in a web browser using Cascading Style Sheets (CSS), create a Google Sitemap to inform the search engine of our web pages, create our own RSS feed to broadcast content from our website, and finally cover podcasting from recording to publishing. All of these are made possible in part by XML.

Instruction

Our goal for this first tutorial is to introduce XML by writing our first XML document. While it may not be apparent when we are finished why XML is important, the power of XML will become very apparent in future tutorials. However, before we go building Google sitemaps and RSS feeds, we need to understand how to write an XML document.

Our first step is to take a look at the data that we want to structure. In our case, it will be information about products that we sell. We will want to track the names, prices, descriptions, and weights of our products. Since XML allows us to create our own tags to contain information, we aren’t constrained by a predefined set of tags the way we are with HTML. So let’s start by seeing how we would structure this information on our own:

<code>Product List
Product #1
Price: $29.99
Weight: 5 lbs.
Description: This is product one, and it is a beauty.

Product #2
Price: $35.99
Weight: 5.6 ounces
Description: Product two is smaller and weighs less, but is more expensive.
</code>

As you can see, it is common sense to group the information about each product together, creating discrete blocks of information. Using this information as a guide, let’s dive in and create our first XML document. XML documents normally start with an XML declaration, much as HTML documents start with a document type declaration. Below is the first line of our XML document, so fire up a text editor and create a text file called products.xml.

<code>&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;
</code>

This line of code declares the version of XML that we are using (our examples will all use version 1.0), as well as the character encoding. For all practical purposes, this will be the first line of all the XML files you create. Now we are ready to start putting our information in. Just like HTML, pieces of information contained between starting and ending tags are grouped together. Since we want our document to describe our products, let’s start by creating a tag for that:

<code>&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;
&lt;products&gt;
&lt;/products&gt;
</code>
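
The document so far has one beginning tag and one matching ending tag, and that pairing is what a parser looks for. Here is a minimal sketch in Python, which is our own choice for illustration and not something this tutorial requires. It uses the standard library's `xml.etree.ElementTree` parser; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A beginning tag with its matching ending tag.
closed = "<products>\n</products>"
# The same beginning tag with the ending tag left out.
unclosed = "<products>\n"

print(is_well_formed(closed))
print(is_well_formed(unclosed))
```

The first check should pass and the second should fail, because the parser reaches the end of the text while the products tag is still open.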

Notice that we have created a beginning and an ending tag, and left a space between them. This is because all the information about our products will go inside these tags. Look at this “products” tag as being equivalent to the title in our list above. Next we want to describe the first product, which has more information associated with it. Since all the information associated with this product needs to be grouped together, we will create another tag to hold it:

<code>&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;
&lt;products&gt;
&lt;item&gt;
&lt;/item&gt;
&lt;/products&gt;
</code>

Again, we have left a space so that we can insert information inside this “item” which represents our first product. We are now down to the actual information about this product, which is separated into title, price, weight, and description. Since we need to store information about each of these things, we will now create tags for each one to hold the information. We also need to include the information inside the tags that we create:

<code>&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;
&lt;products&gt;
&lt;item&gt;
&lt;title&gt;Product #1&lt;/title&gt;
&lt;price&gt;29.99&lt;/price&gt;
&lt;weight&gt;5&lt;/weight&gt;
&lt;description&gt;This is product one, and it is a beauty.&lt;/description&gt;
&lt;/item&gt;
&lt;/products&gt;
</code>
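
If you would like to see how a program might read this document back, here is a short sketch, again assuming Python and its standard `xml.etree.ElementTree` module (our choice, not a requirement). The variable names `PRODUCTS_XML` and `fields` are our own.

```python
import xml.etree.ElementTree as ET

# The document from above, stored as bytes so the parser can honor
# the encoding named in the XML declaration.
PRODUCTS_XML = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<products>
<item>
<title>Product #1</title>
<price>29.99</price>
<weight>5</weight>
<description>This is product one, and it is a beauty.</description>
</item>
</products>
"""

root = ET.fromstring(PRODUCTS_XML)   # the outermost tag, products
item = root.find("item")             # the first item inside it

# Collect each tag inside the item along with the text it holds.
fields = {child.tag: child.text for child in item}

print(root.tag)
print(fields["title"], fields["price"], fields["weight"])
```

Because every piece of information sits inside a named tag, the program can ask for the price or the title directly instead of guessing where it is in the text.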

As you can see, we are naming the tags to be relevant to what our data is, which makes understanding this document pretty easy. However, the tags could be named anything since the structure of the document maintains the relationships between them. Everything between the <item></item> tags holds information about a single product. To describe our second product as well, we add another <item></item> tag below the first, and put the information about our second product inside:

<code>&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;
&lt;products&gt;
&lt;item&gt;
&lt;title&gt;Product #1&lt;/title&gt;
&lt;price&gt;29.99&lt;/price&gt;
&lt;weight&gt;5&lt;/weight&gt;
&lt;description&gt;This is product one, and it is a beauty.&lt;/description&gt;
&lt;/item&gt;
&lt;item&gt;
&lt;title&gt;Product #2&lt;/title&gt;
&lt;price&gt;35.99&lt;/price&gt;
&lt;weight&gt;5.6&lt;/weight&gt;
&lt;description&gt;Product two is smaller and weighs less, but is more expensive.&lt;/description&gt;
&lt;/item&gt;
&lt;/products&gt;
</code>
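
With two products in the document, each one lives in its own item block, so a program can step through them one at a time. The sketch below assumes Python's standard `xml.etree.ElementTree` module (our choice for illustration) and a document whose products tag is properly closed; the names `titles` and `prices` are our own.

```python
import xml.etree.ElementTree as ET

PRODUCTS_XML = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<products>
<item>
<title>Product #1</title>
<price>29.99</price>
<weight>5</weight>
<description>This is product one, and it is a beauty.</description>
</item>
<item>
<title>Product #2</title>
<price>35.99</price>
<weight>5.6</weight>
<description>Product two is smaller and weighs less, but is more expensive.</description>
</item>
</products>
"""

root = ET.fromstring(PRODUCTS_XML)

# findall("item") returns every item directly inside products, in order.
titles = [item.findtext("title") for item in root.findall("item")]
prices = [float(item.findtext("price")) for item in root.findall("item")]

print(titles)
print(prices)
```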

We have now created a simple XML document that describes our product list. You can see how it is organized in a hierarchical structure, and it contains most of the information that we want about our products, including the relationships between pieces of information. However, you may have noticed that some crucial information has gone missing. If you noticed that our weights were in different units in the original list, then you probably noticed that the units have disappeared, which represents a big problem. Fortunately, XML tags can have attributes just like HTML tags. Rather than holding the actual data, attributes are best used to hold information needed to describe data, such as units of measure in our case. So let’s add an attribute to the weight tags to describe the units of measure that we are using:

<code>&lt;?xml version="1.0" encoding="iso-8859-1"?&gt;
&lt;products&gt;
&lt;item&gt;
&lt;title&gt;Product #1&lt;/title&gt;
&lt;price currency="USD"&gt;29.99&lt;/price&gt;
&lt;weight units="lbs"&gt;5&lt;/weight&gt;
&lt;description&gt;This is product one, and it is a beauty.&lt;/description&gt;
&lt;/item&gt;
&lt;item&gt;
&lt;title&gt;Product #2&lt;/title&gt;
&lt;price currency="USD"&gt;35.99&lt;/price&gt;
&lt;weight units="oz"&gt;5.6&lt;/weight&gt;
&lt;description&gt;Product two is smaller and weighs less, but is more expensive.&lt;/description&gt;
&lt;/item&gt;
&lt;/products&gt;
</code>

As you can see, we have now added attributes to the weight tags to show that the products use different units of measure. In addition, we added “currency” attributes to each of the <price> tags to show that the prices are in US dollars. The type of currency was also missing from our XML document, though it was present in our original list in the form of dollar signs.
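
To show how a program might pick up these attributes, here is a small sketch using Python's standard `xml.etree.ElementTree` module (our choice for illustration). It assumes the second product's weight is recorded in ounces, as in our original list, and that attribute values use plain straight quotes; the value "oz" and the helper name `describe` are our own.

```python
import xml.etree.ElementTree as ET

PRODUCTS_XML = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<products>
<item>
<title>Product #1</title>
<price currency="USD">29.99</price>
<weight units="lbs">5</weight>
<description>This is product one, and it is a beauty.</description>
</item>
<item>
<title>Product #2</title>
<price currency="USD">35.99</price>
<weight units="oz">5.6</weight>
<description>Product two is smaller and weighs less, but is more expensive.</description>
</item>
</products>
"""


def describe(item):
    """Combine an item's data with the attributes that describe that data."""
    price = item.find("price")
    weight = item.find("weight")
    return (f'{item.findtext("title")}: {price.text} {price.get("currency")}, '
            f'{weight.text} {weight.get("units")}')


root = ET.fromstring(PRODUCTS_XML)
lines = [describe(item) for item in root.findall("item")]
for line in lines:
    print(line)
```

The tag text still holds the data itself, while `get("units")` and `get("currency")` return the information that describes it.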

We are now finished with our first XML document. By looking at our final document, we can visually see the structure of our information, as well as the relationships and grouping of the information. Hopefully you can see the power of ordering information in this way. If not, you should at least be able to imagine how structuring information like this allows data to be transported between databases, content to be shared between websites, and ecommerce transactions to be completed between completely different systems seamlessly. Being able to create our own tags and attributes relevant to our data allows us to create highly structured documents with an unprecedented amount of flexibility.
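
As a rough illustration of data moving between systems, the sketch below goes the other way and builds the same kind of document from a plain list of records, such as rows that might come out of a database. It assumes Python's standard `xml.etree.ElementTree` module; the record layout, the function name `build_products_xml`, and the fixed "USD" currency are assumptions of ours, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical records, shaped like rows from a product database.
products = [
    {"title": "Product #1", "price": "29.99", "weight": "5", "units": "lbs",
     "description": "This is product one, and it is a beauty."},
    {"title": "Product #2", "price": "35.99", "weight": "5.6", "units": "oz",
     "description": "Product two is smaller and weighs less, but is more expensive."},
]


def build_products_xml(rows):
    """Build a products document with one item per record and return it as text."""
    root = ET.Element("products")
    for row in rows:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "title").text = row["title"]
        ET.SubElement(item, "price", currency="USD").text = row["price"]
        ET.SubElement(item, "weight", units=row["units"]).text = row["weight"]
        ET.SubElement(item, "description").text = row["description"]
    return ET.tostring(root, encoding="unicode")


xml_text = build_products_xml(products)
print(xml_text)
```

Any other system that can read XML can now take this text and recover the same products, prices, and units without knowing anything about where the records came from.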

While this may not seem very useful yet, our next tutorial will cover how to create a CSS style sheet to format this information for display in a web browser. Then we will use our new understanding of XML to create a Google Sitemap, which is an XML document with a specific structure designed to keep the Google search spiders up-to-date about what pages are on your website.

Brian Getting