Design & Development

A Better Way to Store Ecommerce Product Information

Amazon Web Services DynamoDB, MongoDB, Couchbase, and MarkLogic are all examples of NoSQL databases that may offer a better way to store product catalog information.

Amazon Web Services DynamoDB, MongoDB, Couchbase, and MarkLogic are all examples of NoSQL databases that may offer a better way to store product catalog information.

How an ecommerce site stores and accesses product information can have a significant impact on site performance.

Online retail businesses, small and large, may benefit from a popular database technology that could offer a better way to store product information. To help understand this technology, which is called NoSQL, it can help to describe just how complex product information can be.

The Product Attribute Problem

Many — if not most — ecommerce platforms store product information in a relational database, where information is kept in tables similar to a spreadsheet. Imagine, as an example, a product listing for a small, red t-shirt. The shirt might have a unique ID, a stock keeping unit (SKU), a title, a size, and a color.

ID SKU title size color
1 POI0987654321 Awesome Shirt s red

Each of these five attributes would take up a column in the database table. A row in the table represents the complete set of product information.

For a site that sells only t-shirts, this might work well. You could imagine additional rows in the database table representing additional shirts, say the “Awesome Shirt,” in four sizes and two colors.

ID SKU title size color
1 POI0987654321 Awesome Shirt s red
2 POI0987654322 Awesome Shirt m red
3 POI0987654323 Awesome Shirt l red
4 POI0987654324 Awesome Shirt xl red
5 POI0987654325 Awesome Shirt s blue
6 POI0987654326 Awesome Shirt m blue
7 POI0987654327 Awesome Shirt l blue
8 POI0987654328 Awesome Shirt xl blue

The database query to display this product information would be relatively simple to write and fast to load. Not much of a problem. But what if this t-shirt retailer also wants to sell pants?

Pants have a different set of attributes. Size is not measured in small, medium, and large, but typically in waist and inseam.

To solve this problem, one could add columns to the table.

ID SKU type title size waist inseam color
1 POI0987654321 shirt Awesome Shirt s red
2 POI0987654322 shirt Awesome Shirt m red
3 POI0987654323 pants Neat Pants 30 30 black
4 POI0987654324 pants Neat Pants 30 31 black

This will work, but it may not scale well. Imagine what would happen if the retailer added more product types.

  • Belts. Need a length attribute and a material attribute.
  • Hats. Need a hat type attribute (i.e., snap back or elastic).
  • Shoes. Need an attribute for lace length and the number of islets.
  • Stickers. Need a height, width, and adhesive type attribute.
  • Music. Need an artist attribute and a way to store a list of tracks.
  • Movies. Need a director attribute and attributes for actors, the movie’s rating, the year it was made, and more.

All this, and in the example, there is not yet an attribute for price, sale price, cost, weight, shipping dimensions, or similar.

Very quickly, the ecommerce product database table would be hundreds or thousands of columns wide. What’s worse, most of the columns would be empty. The small, red t-shirt would still need just five or so columns, leaving thousands blank. This single table solution can be expensive to host since it wastes memory.

An ecommerce solution could eliminate the wasted space if it assigned each new product type its own table.

There would be a product table just for t-shirts.

T-shirt Product Table
ID SKU title size color
1 POI0987654321 Awesome Shirt s red
2 POI0987654322 Awesome Shirt m red

A separate table would store information for pants.

Pants Product Table
ID SKU title waist inseam color
3 POI0987654323 Neat Pants 30 30 black
4 POI0987654324 Neat Pants 30 31 black

This solution would work, but it would be very hard to manage.

For every new type of product, there would need to be a new, custom database table. The ecommerce platform would need to keep track of all of these tables, including understanding how they relate to each other.

Showing a list of products from a single manufacturer might take several database queries or “joins,” which are combinations of columns from different tables.

A third solution to the product attribute problem would be to implement an entity, attribute, value system — EAV. Popular ecommerce platforms like Magento take this approach.

EAV solutions frequently use multiple tables to store portions of the product information. Here’s the entity portion.

Product Entity Table
product ID SKU title
1 POI0987654321 Awesome Shirt
2 POI0987654322 Awesome Shirt
3 POI0987654323 Neat Pants
4 POI0987654324 Neat Pants

A different table would store the value for individual attributes.

Product Attribute Table
product ID attribute value
1 size s
1 color red
2 size m
2 color red
3 waist 30
3 inseam 30
3 color black
4 waist 30
4 inseam 31
4 color black

Here two tables are storing the product information, but there are EAV database solutions for product catalogs that include three, four, five, or more separate database tables. This approach can make accessing product information quite slow. Nonetheless, EAV can be the best solution for relational databases, to manage a product catalog.

NoSQL for Ecommerce Product Information

NoSQL — non-relational databases — take a different approach to storing information. There are no tables of structured columns and rows. In a sense, NoSQL trades consistency for availability, speed, and flexibility.

Although non-relational databases or key-value pair databases have been around for decades, NoSQL has recently become popular with large Internet businesses, including Google, Facebook, eBay, and Amazon. These companies need to access significant amounts of complex information very quickly.

NoSQL architecture differs among implementations. But in general, NoSQL will store all of an item’s data in a single container.

Thus, one product equals one container (sometimes called an item or a document) in the database.

Sometimes, these items or documents will be in JSON format or similar, meaning that it will look very familiar to web developers.

For an ecommerce product, there will typically be a unique item ID, a set of common product attributes, and some attributes that are unique to a particular product. The small red shirt might look like the following.

{
     unique_id: 'HY8765NBg6yYT77nBhNkln543NjNhYTR',
     sku: 'POI0987654321',
     type: ‘t-shirt’,
     title: 'Amazing Shirt'
     attributes: {
         size: 's',
         color: 'red'
     }
 }

A particular pair of pants would have some things in common with the shirt, but in NoSQL there would be the flexibility to have different attributes.

{
     unique_id: '99iIiGtg6GHGjfj776098JjHGtgfffhhh',
     sku: 'POI0987654323',
     type: 'pants',
     title: 'Neat Pants'
     attributes: {
          waist: '30',
          inseam: '30',
         color: 'black'
     }
 }

With this approach, a retailer could sell any sort of product it wants without having to modify a database table or figure out how to keep track of the different sorts of attributes. A belt would simply be another container in the database.

{
     unique_id: 'jjfJlkjsdfALKJljJLKDJ877KJNBB',
     sku: 'BE1888477',
     type: 'belt',
     title: 'Studded Belt'
     attributes: {
          size: '30',
          length: '30',
          color: 'black',
          material: 'leather'
     }
 }

In this way, NoSQL is much more flexible. It is simply easier to store and manage product data. What’s more, querying for an individual product can be faster. This is especially true as the number of products in the database increases.

NoSQL Considerations

An online store using a SaaS ecommerce platform may not be able to take advantage of NoSQL. But retailers using licensed ecommerce platforms or custom solutions may find that NoSQL is a better choice for product catalog information.

There are plenty of applications for which relational databases make more sense, but NoSQL can be a better solution for product information.

Armando Roggio
Armando Roggio
Bio   •   RSS Feed


x