An Introduction to XML

Something about Extensible Markup Language (XML) is intimidating. We all hear about it and are aware of it. Yet most people have no idea what it is.

XML facilitates data exchange between websites and applications and allows ecommerce sites to function. It is the driving force behind Google sitemaps, RSS feeds, podcasting, and other emerging technologies.

XML is a “markup language”: a system of tags that identifies discrete units of information for processing. XML was created to allow richly structured documents to be used on the web, something HTML does not have the flexibility to enable. XML focuses solely on the data it contains, allowing tags and attributes to be customized based on that data. Hence XML documents can range from ecommerce transactions to mathematical equations to vector graphics. In other words, XML describes data.

Illustration: XML is a “markup language,” a system of tags that identifies information for processing.

However, since it delivers information over the web, XML can also be formatted and displayed in a web browser using style sheets. Search engines and other automated applications can extract more meaning from XML documents than from HTML documents. XML is the future of HTML and is worth understanding.

In this tutorial, I will introduce the structure of XML and create a simple XML document. Next month, I’ll explain how to format XML for viewing in a web browser using Cascading Style Sheets. I’ll create a sitemap to inform Google of our web pages and an RSS feed to broadcast content from our website — all made possible in part by XML.

XML Document

Let’s start by writing an XML document.

Our first step is to look at the data we want to structure. I’ll use products for sale in this example. I’ll track their names, prices, weights, and descriptions. With XML we can create our own tags to contain information and aren’t constrained by a pre-defined set as with HTML.

Product List  

Product #1  
Price: $29.99  
Weight: 5 lbs.  
Description: This is product one, and it is a beauty.

Product #2
Price: $35.99
Weight: 5.6 ounces
Description: Product two is smaller and weighs less, but is more expensive.

Note that I’ve grouped the info around each product. Let’s use this data to create our first XML document.

XML documents should start with an XML declaration, much as HTML documents start with a document type declaration. Create a text file called products.xml and make this its first line:

<?xml version="1.0" encoding="iso-8859-1"?>

This line of code describes our XML version and the character encoding. Next, we can insert the product data. Let’s start by creating a tag to describe our products.

<?xml version="1.0" encoding="iso-8859-1"?>
<products>
</products>
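As a quick sanity check, the skeleton so far can be fed to an XML parser to confirm that it is well formed. The sketch below assumes Python and its standard-library xml.etree.ElementTree module, neither of which is part of the original tutorial; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The skeleton from the tutorial: a declaration plus an empty root element.
skeleton = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<products>
</products>
"""

root = ET.fromstring(skeleton)
print(root.tag)   # name of the root element
print(len(root))  # number of child elements so far

# Leaving out the closing tag makes the document malformed,
# and the parser refuses it.
broken = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<products>
"""

try:
    ET.fromstring(broken)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)
```

The second half shows why every opening tag needs a matching closing tag: a parser rejects the document outright rather than guessing where the element ends.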

I created beginning and ending tags with space between them. Together they form the root element; an XML document has exactly one, and all information about our products will go inside it. We next want to describe the first product, so I will create another tag to hold it.

<?xml version="1.0" encoding="iso-8859-1"?>
<products>
  <item>
  </item>
</products>

Again, we have left space inside this “item” element, which represents our first product. We are now down to the actual information about this product: title, price, weight, and description. We will create a tag for each of these and put the corresponding information inside it.

<?xml version="1.0" encoding="iso-8859-1"?>
<products>
  <item>
    <title>Product #1</title>
    <price>29.99</price>
    <weight>5</weight>
    <description>This is product one, and it is a beauty.</description>
  </item>
</products>
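To see how a program might read this document, here is a minimal sketch that pulls the values out of the single item. It assumes Python's standard-library xml.etree.ElementTree parser, which the tutorial itself does not use; any XML parser would expose the same hierarchy.

```python
import xml.etree.ElementTree as ET

doc = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<products>
  <item>
    <title>Product #1</title>
    <price>29.99</price>
    <weight>5</weight>
    <description>This is product one, and it is a beauty.</description>
  </item>
</products>
"""

root = ET.fromstring(doc)
item = root.find("item")  # first <item> child of <products>

title = item.findtext("title")          # text inside <title>
price = float(item.findtext("price"))   # element text is a string, so convert
child_tags = [child.tag for child in item]

print(title, price, child_tags)
```

Note that the parser hands back every value as text. Deciding that the price is a number is up to the application reading the document.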

I’ve named the tags to be relevant to our data, which makes understanding this document pretty easy.

However, the tags could be given any valid XML name, since it is the document’s structure that maintains their relationships. Everything between the <item> and </item> tags holds information about a single product. To describe our second product, we add another <item> element below the first and put the information about our second product inside.

<?xml version="1.0" encoding="iso-8859-1"?>
<products>
  <item>
    <title>Product #1</title>
    <price>29.99</price>
    <weight>5</weight>
    <description>This is product one, and it is a beauty.</description>
  </item>
  <item>
    <title>Product #2</title>
    <price>35.99</price>
    <weight>5.6</weight>
    <description>Product two is smaller and weighs less, but is more expensive.</description>
  </item>
</products>
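Because each product lives in its own item element, a program can loop over the items and keep each product's details together. The following is a sketch only, again assuming Python's standard-library xml.etree.ElementTree; the dictionary keys are illustrative names, not anything XML requires. The closing </products> tag is included because a parser will not accept the document without it.

```python
import xml.etree.ElementTree as ET

doc = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<products>
  <item>
    <title>Product #1</title>
    <price>29.99</price>
    <weight>5</weight>
    <description>This is product one, and it is a beauty.</description>
  </item>
  <item>
    <title>Product #2</title>
    <price>35.99</price>
    <weight>5.6</weight>
    <description>Product two is smaller and weighs less, but is more expensive.</description>
  </item>
</products>
"""

root = ET.fromstring(doc)

# One dictionary per <item>, so each product's values stay grouped.
products = []
for item in root.findall("item"):
    products.append({
        "title": item.findtext("title"),
        "price": float(item.findtext("price")),
        "weight": float(item.findtext("weight")),
    })

for product in products:
    print(product)
```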

We have now created a simple XML document that describes our product list. It is organized in a hierarchy and contains most of the information we want about our products, including the relationships between pieces of information.

However, some crucial information is missing: the units for the weights. Fortunately, XML tags can have attributes, just as HTML tags do. Rather than holding the actual data, attributes are best used to describe data, such as units of measure in our case. So let’s add an attribute to the weight tags to describe the units of measure that we are using.

<?xml version="1.0" encoding="iso-8859-1"?>
<products>
  <item>
    <title>Product #1</title>
    <price currency="USD">29.99</price>
    <weight units="lbs">5</weight>
    <description>This is product one, and it is a beauty.</description>
  </item>
  <item>
    <title>Product #2</title>
    <price currency="USD">35.99</price>
    <weight units="ounces">5.6</weight>
    <description>Product two is smaller and weighs less, but is more expensive.</description>
  </item>
</products>

We now have attributes for the weight tags to show the different units of measure between the products. In addition, I added “currency” attributes to each of the <price> tags for U.S. dollars. The type of currency was also missing from our XML document, though it was present in our original list in the form of dollar signs.
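Attributes are read separately from an element's text. The sketch below, which assumes Python's standard-library xml.etree.ElementTree, collects each weight together with its units and the price's currency. The description elements are left out only to keep the example short, and the unit values follow the original product list (pounds for the first product, ounces for the second).

```python
import xml.etree.ElementTree as ET

doc = b"""<?xml version="1.0" encoding="iso-8859-1"?>
<products>
  <item>
    <title>Product #1</title>
    <price currency="USD">29.99</price>
    <weight units="lbs">5</weight>
  </item>
  <item>
    <title>Product #2</title>
    <price currency="USD">35.99</price>
    <weight units="ounces">5.6</weight>
  </item>
</products>
"""

root = ET.fromstring(doc)

weights = []
for item in root.findall("item"):
    weight = item.find("weight")
    price = item.find("price")
    # .text is the element's data; .get() reads an attribute describing it.
    weights.append((weight.text, weight.get("units"), price.get("currency")))
print(weights)

# Attribute values must be wrapped in straight quotes. Typographic quotes
# (U+201D here), which word processors often substitute, are rejected.
curly = "<weight units=\u201dlbs\u201d>5</weight>"
try:
    ET.fromstring(curly)
    curly_ok = True
except ET.ParseError:
    curly_ok = False
print(curly_ok)
```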

We have completed our first XML document. Note the structure of our information, the relationships, and the grouping.
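The same document can also be generated from data rather than typed by hand. Here is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree; the product_list structure and its keys are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory version of the product list from the tutorial.
product_list = [
    {"title": "Product #1", "price": "29.99", "weight": "5", "units": "lbs",
     "description": "This is product one, and it is a beauty."},
    {"title": "Product #2", "price": "35.99", "weight": "5.6", "units": "ounces",
     "description": "Product two is smaller and weighs less, but is more expensive."},
]

root = ET.Element("products")
for product in product_list:
    item = ET.SubElement(root, "item")
    ET.SubElement(item, "title").text = product["title"]
    # Keyword arguments to SubElement become attributes on the new element.
    price = ET.SubElement(item, "price", currency="USD")
    price.text = product["price"]
    weight = ET.SubElement(item, "weight", units=product["units"])
    weight.text = product["weight"]
    ET.SubElement(item, "description").text = product["description"]

# Serialize to bytes in the same encoding the tutorial declares.
xml_bytes = ET.tostring(root, encoding="iso-8859-1")
print(xml_bytes.decode("iso-8859-1"))
```

The output is not indented, but it has the same structure as the hand-written file: one products root, one item per product, and attributes carrying the units and currency.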

My next tutorial will cover how to create a cascading style sheet to format this information for a web browser.

Brian Getting