Ecommerce businesses collect lots of data: newsletter signups, loyalty program enrollments, sales transactions, and even supplier information. Whatever the type, at some point you will need to clean it. But when to clean, and what, differs from company to company.
In this post, I’ll offer guidelines.
Guidelines for Data Cleaning
First, identify each problem you are trying to solve.
When was your last cleaning? If you have never cleaned, when did you start collecting the data? It’s important to clean data roughly once a year. People change physical addresses, email addresses, employers, and lifestyles. Moreover, if you’ve recently added data fields, your older contacts likely lack that information.
Data validation services can confirm physical addresses, email deliverability, death notices, and more. Data append providers, such as Experian, Melissa Data, and Acxiom, can supply missing physical and email addresses and phone numbers, among other fields. If you are on a tight budget, simply flag outdated contacts so that you can update their info the next time they interact with your company.
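Flagging outdated contacts can be as simple as comparing each record’s last-modified date against a cutoff. The sketch below assumes hypothetical field names (`last_modified`, `needs_update`) and uses the one-year interval suggested above; your customer management system will have its own schema.

```python
from datetime import date, timedelta

# Hypothetical contact records; field names are illustrative only.
contacts = [
    {"id": 1, "email": "ann@example.com", "last_modified": date(2021, 3, 14)},
    {"id": 2, "email": "bo@example.com", "last_modified": date(2023, 11, 2)},
]

def flag_outdated(records, today, max_age_days=365):
    """Mark each record as outdated if it hasn't been updated within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    for record in records:
        # Flag rather than delete, so the record can be refreshed at the next interaction.
        record["needs_update"] = record["last_modified"] < cutoff
    return records

flag_outdated(contacts, today=date(2024, 1, 1))
for c in contacts:
    print(c["id"], c["needs_update"])
```

With a cutoff of January 1, 2023, the first contact is flagged and the second is not. A customer service rep or a checkout page could then prompt flagged customers to confirm their details.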
Email bounces and unsubscribes have increased. Open rates have decreased. If your email marketing performance is declining, your list could be outdated, inaccurate, or both. (“List fatigue,” which could also cause declining performance, is beyond the scope of this article. We’ve addressed it recently at “6 Ways to Re-engage Dormant Subscribers.”)
A quick analysis of unsubscribes and bounces can identify the causes. Sometimes the cause is poor data, such as a typo in the domain (gmial.com instead of gmail.com), which you can fix. Other times it’s a bogus address, such as one a shopper made up to get past a signup form, which you can remove. Part of the analysis should identify the source of bogus and bounced emails. You may uncover mistakes made by a representative entering data or by an online form.
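A bounce analysis like this can start with a short script that sorts bounced addresses into likely typos and likely bogus entries, then tallies them by source. This is a minimal sketch: the typo map, the list of suspicious local parts, and the `source` values are all hypothetical and would need to be built from your own bounce data.

```python
from collections import Counter

# Hypothetical map of common domain misspellings to the intended domain.
DOMAIN_TYPOS = {"gmial.com": "gmail.com", "yaho.com": "yahoo.com", "hotmial.com": "hotmail.com"}

# Hypothetical local parts that usually signal a made-up address.
BOGUS_LOCAL_PARTS = {"test", "asdf", "none", "noemail"}

def classify(email):
    """Return ('typo', suggested fix), ('bogus', None), or ('ok', None)."""
    local, _, domain = email.strip().lower().partition("@")
    if not local or not domain:
        return "bogus", None
    if domain in DOMAIN_TYPOS:
        return "typo", f"{local}@{DOMAIN_TYPOS[domain]}"
    if local in BOGUS_LOCAL_PARTS:
        return "bogus", None
    return "ok", None

bounces = [
    {"email": "jane.doe@gmial.com", "source": "web form"},
    {"email": "asdf@example.com", "source": "web form"},
    {"email": "sam@yaho.com", "source": "store rep"},
]

# Tally problems by where the record came from, and print suggested fixes.
problems_by_source = Counter()
for b in bounces:
    status, fix = classify(b["email"])
    if status != "ok":
        problems_by_source[(b["source"], status)] += 1
    if fix:
        print(f"{b['email']} -> {fix}")

print(problems_by_source)
```

Grouping by source is what points you toward the root cause: a cluster of bogus entries from one web form suggests the form needs validation, while typos from a single location suggest a training issue.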
Complaints from customer service. Customer service personnel often encounter data errors. For example, if a rep cannot locate the right person in the system, it could indicate data-entry errors or duplicate records.
De-duping your customer management system is critical. Keep in mind, however, that duplicates may have conflicting email addresses or otherwise inconsistent data, so decide which values to keep before merging records.
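One simple way to find candidate duplicates is to group records on a normalized key and then check whether the grouped records disagree. The sketch below assumes, for illustration only, that name plus ZIP code is a good enough match key; real de-duping tools typically use fuzzier matching. It reports conflicting email addresses rather than merging automatically.

```python
from collections import defaultdict

# Hypothetical customer records pulled from a customer management system.
customers = [
    {"id": 101, "name": "Maria Lopez", "zip": "10001", "email": "maria@example.com"},
    {"id": 245, "name": "maria  lopez", "zip": "10001", "email": "mlopez@example.net"},
    {"id": 310, "name": "Tom Reed", "zip": "60614", "email": "tom@example.com"},
]

def match_key(record):
    """Normalize case and whitespace in the name so trivially different entries group together."""
    name = " ".join(record["name"].lower().split())
    return (name, record["zip"].strip())

groups = defaultdict(list)
for c in customers:
    groups[match_key(c)].append(c)

for key, records in groups.items():
    if len(records) > 1:
        emails = {r["email"].lower() for r in records}
        # Conflicting emails need a human decision (or a rule such as "most recent wins") before merging.
        status = "CONFLICT" if len(emails) > 1 else "safe to merge"
        print(key, [r["id"] for r in records], status)
```

Here records 101 and 245 group together but carry different email addresses, so they are flagged for review instead of being merged blindly.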
Improperly formatted data. A problem that is typically easy to solve is data that is not in a usable form. For example, you may have state names in two formats, such as “NY” and “New York.”
To correct it, standardize your data to a consistent format. Use a lookup table that maps every variant to a single standard value. If the data is buried in comments or other free-form text, use text-mining techniques to extract the correct info.
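A lookup table for the “NY” versus “New York” problem can be a plain dictionary keyed on a normalized version of each variant. This is a minimal sketch; the table entries are illustrative, and a production table would cover every state and the variants actually found in your data.

```python
# Hypothetical lookup table: every known variant maps to the two-letter USPS code.
STATE_LOOKUP = {
    "ny": "NY", "n.y.": "NY", "new york": "NY",
    "ca": "CA", "calif.": "CA", "california": "CA",
}

def standardize_state(value):
    """Return the standard code, or None so unmatched values can be reviewed by hand."""
    key = " ".join(value.strip().lower().split())
    return STATE_LOOKUP.get(key)

raw = ["New York", "NY", " n.y. ", "California", "Cali"]
print([standardize_state(v) for v in raw])  # ['NY', 'NY', 'NY', 'CA', None]
```

Returning `None` for unknown values, rather than guessing, surfaces new variants (like “Cali”) so they can be added to the table.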
Migrating to a new system or database. Changing customer management systems and other platforms is a good time to clean data.
De-duping, flagging outdated accounts, appending missing information, and verifying key data points will help ensure that only quality records make the move. Standardizing all fields for usability is key, too.
Quality data requires ongoing maintenance. The following procedures, in my experience, will help.
- Data reports. Monthly or quarterly reports on data quality can identify problems. Focus on these basic metrics: (a) unsubscribe rates, (b) bounce rates, (c) bogus records, (d) outdated records by last modified date or date created, and (e) incomplete or missing data.
- Data governance. Governance techniques can reduce the need for expensive data cleaning. Common techniques include preventing the same email address from appearing on multiple records and requiring valid responses in online form fields. Governance also means educating all personnel on the importance of quality data and putting internal processes in place to support it.
- Ongoing maintenance. Audit the data in your customer management system yearly. Large companies that run direct-mail campaigns typically send all their data to be validated before each campaign, because validated addresses qualify for postage discounts. So why not add the extra steps before your next campaign? Prior to deploying, validate critical data: physical addresses, email addresses, segmentation fields, you name it.
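The basic metrics in the data reports item can be produced by a short script run monthly or quarterly. This sketch covers four of them (unsubscribe rate, bounce rate, outdated records, and incomplete records); the input numbers, field names, and required-field list are hypothetical. It computes rates against emails sent, which is one common convention; some email platforms compute them against delivered messages instead.

```python
from datetime import date, timedelta

# Hypothetical monthly email stats and contact records; field names are illustrative.
sent, bounced, unsubscribed = 20000, 540, 120
contacts = [
    {"email": "ann@example.com", "phone": "555-0100", "last_modified": date(2024, 5, 1)},
    {"email": "", "phone": "555-0101", "last_modified": date(2022, 2, 9)},
    {"email": "lee@example.com", "phone": None, "last_modified": date(2021, 7, 30)},
]
REQUIRED_FIELDS = ("email", "phone")

def quality_report(contacts, sent, bounced, unsubscribed, today, max_age_days=365):
    cutoff = today - timedelta(days=max_age_days)
    return {
        "bounce_rate_pct": round(100 * bounced / sent, 2),
        "unsubscribe_rate_pct": round(100 * unsubscribed / sent, 2),
        # Records not modified within max_age_days.
        "outdated_records": sum(c["last_modified"] < cutoff for c in contacts),
        # Records missing at least one required field (empty string or None).
        "incomplete_records": sum(any(not c.get(f) for f in REQUIRED_FIELDS) for c in contacts),
    }

print(quality_report(contacts, sent, bounced, unsubscribed, today=date(2024, 6, 1)))
```

Tracking these numbers period over period matters more than any single value: a rising bounce rate or a growing count of incomplete records is the signal to dig into sources, as in the bounce analysis described earlier.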