Analytics & Data

How to Merge Data from Multiple Sources into a Single Platform

Merging data from multiple sources into a new ecommerce or customer-management platform can be confusing. It's doable, however, with careful planning and execution.

Merging data from multiple sources into a new ecommerce or customer-management platform can be confusing. It’s doable, however, with careful planning and execution.

Migrating to a new customer management platform may require merging multiple data sources or adding a new source. In this post, I’ll review how to merge multiple sources into a single database — without errors or duplicates.

Merging Data from Multiple Sources

1. Download all data from each source. When combining multiple data sources, create a new field with the name “Source.” This will enable you to identify where the record came from, which can assist in removing duplicates.

2. Combine all data sources into one list. If you have fewer than 1 million combined rows, you can do it in Excel. A couple of tips:

  • Make sure the date fields are the same. If Excel is not formatting your date fields correctly, you may need to use the “Import” function at File > Import and select the correct date type before merging the data.
  • Map to fields in the new platform. If your new platform does not contain a field that exists in one or more of your sources, add it as a custom field or combine it with other fields or as a note so as not to lose any data.

3. Identify duplicates. Duplicates within the same source or between sources are likely. Use Excel’s Conditional Formatting at Format > Conditional Formatting to identify duplicates in the same column by highlighting that column. In my experience, fields that often have duplicates are:

  • Email address,
  • Phone number,
  • Physical address,
  • Last name,
  • Company name,
  • Website URL.

Use fuzzy logic to find records that are the same but have slightly different spelling, which conditional formatting will not catch. For example, ABC Corporation could also appear as ABC Corp. For fuzzy logic, use machine-learning packages or hire a professional de-duping company. Alternatively, sort the data and manually review line by line.

4. Merge duplicates by identifying the surviving record. Use data completeness, source of data, last modified date, and other criteria to decide which record to move forward. Then:

  • Paste missing information to a record. Identify records with missing fields and then copy and paste (or append) that field from a record that will be deleted.
  • Resolve conflicting records for the same field. For example, you may have two different phone numbers, email addresses, or physical addresses. Mark those fields for validation before deleting one of the records.

5. Verify and validate all fields. Many third-party providers can verify and validate data for a fee. To reduce the expense, consider verifying and validating only a portion of your database. For example, validate only email addresses that have recently bounced or physical addresses that were last updated several years ago.

6. Standardize the data. Your data should match the fields in your new platform and, also, should be consistent regardless of the source. For example, one source could use the 2-digit code of “NY” and another could use the full spelling of “New York.” Make each field the same format. Critical data fields to standardized are:

  • Phone numbers. For businesses operating internationally, phone numbers are tricky as the number of digits varies based on the country.
  • ZIP codes. Some data sources use ZIP+4. Excel removes a 0 in front of a ZIP code. For example, Excel stores “000154” as “154.” Thus, make sure ZIP codes are text fields in Excel. Add zeros that Excel may have deleted in the process.
  • Dates. Again, formatting dates is a common problem. For example, some sources use “mm-dd-yyyy.” Others use “dd-mm-yy” or something entirely different. Make sure the date format is consistent among data sources and, also, with your new platform.
  • Text. Typical text fields to standardize include state, country, and personal and professional titles. Manually reviewing row by row can be time-consuming. A quicker approach is to create a list of all variations for that one field (use a pivot table or copy and de-dupe in a different sheet) and add a column with the correct standardized value. Use Excel’s VLOOKUP function to match the original value to the new standardized version in a new column.

Final Review

Once you complete the above steps, conduct a final review to ensure the data is ready to upload. Look closely at field formats. Review the number of records. Does the number make sense? Did you de-dupe? Is the overall file format, such as .csv, compatible with your new platform? Upload a few records to the new platform to confirm the process and the accuracy. Then upload the entire file.

Anna Kayfitz
Anna Kayfitz
Bio   •   RSS Feed


x