When it comes to your customer database, the saying “garbage in, garbage out” holds true. If you base your customer segmentation or predictive models on bad or incomplete customer profiles, you will get the wrong recommendations for your customers. Data wrangling is therefore a huge part of the job. Data scientists will tell you that preparing data for analysis can make up 95 percent of all the work.
Without a single view of the customer, it is impossible to truly understand any single customer or to draw any conclusions about customer trends. For example, if you can only see a person’s in-store purchases, but that person makes 90 percent of his or her purchases online, you may believe this is an unprofitable customer when in reality this person could be one of your VIPs. Similarly, if somebody frequently browses your website but always ends up buying in the store, you may mistake that website visitor for a “low value” customer. In a different scenario, a customer spends a large amount and buys often, but returns items just as frequently or makes frequent calls to your call center. This customer looks like a “high value” customer, but is in fact unprofitable.
This means that you need to be able to integrate, link, and deduplicate all the information that you have collected. This is not an easy task. Companies often mistake “putting all customer data in one place” for “creating customer 360 profiles.” They are not one and the same. We will go through each of the steps in the data integration process in more detail so you understand what needs to be done. As a marketer, you may not need to get involved this deeply, but you will at least know what questions to ask your team.
Step 1: Cleaning and Validation of Names
After receiving the raw data, the first thing you want to do is to validate the names, postal addresses, email addresses, and phone numbers in the customer files you received. Without this, software algorithms will be unable to link the right activities to the right data records. Examples of common mistakes that need to be corrected before matching records to real people include:
- Middle names or initials might be included or excluded: William L and William Louis might both be variations of the same person—William Morrison.
- Two-person contacts might not be matched unless corrected: “William & Cathy Morrison” should be matched to William Morrison’s customer record.
- Replacing abbreviated names: Wm and Bill and William may all be the same person.
- Removing honorifics: Rev. Bill Morrison and Dr. Bill should both be matched to William Morrison’s record.
- First and last name swapped: Bill Morrison and Morrison Bill are likely the same person, especially if they live at the same address.
- Slight variations in name spelling, like Katie and Cathy.
You can use software algorithms to do what’s called fuzzy name matching. Fuzzy logic will “guess” whether two customer names that are similar, but not exactly the same, might belong to the same person. Some customer attributes, such as a customer’s home address, carry a higher weighting in this guessing game. William Morrison and Bill Morrison are likely the same person if they live at the same address, but are unlikely to be the same person if they live in different states.
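To make the weighting idea concrete, here is a minimal sketch in Python. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for a real fuzzy matching engine. The record layout, the `match_score` function, and the 40/60 weights are assumptions chosen for illustration, not values from any particular product.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a: dict, rec_b: dict,
                name_weight: float = 0.4,
                address_weight: float = 0.6) -> float:
    """Blend name and address similarity; the address counts for more.

    The weights are illustrative assumptions and would need tuning
    against real data.
    """
    name_sim = similarity(rec_a["name"], rec_b["name"])
    addr_sim = similarity(rec_a["address"], rec_b["address"])
    return name_weight * name_sim + address_weight * addr_sim


# Hypothetical records for illustration only.
william = {"name": "William Morrison", "address": "12 Oak St, Austin, TX"}
bill_same_address = {"name": "Bill Morrison", "address": "12 Oak St, Austin, TX"}
bill_other_state = {"name": "Bill Morrison", "address": "98 Pine Ave, Boston, MA"}

# The same name pair scores higher when the addresses agree.
print(match_score(william, bill_same_address) > match_score(william, bill_other_state))  # True
```

In practice you would pick a threshold above which two records are treated as the same person, and send borderline scores to a person for review.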
There are many common mistakes that software can easily correct. Software can automatically normalize names, such as changing Bill to William or vice versa, and recognize and tag men and women. Phonetic (metaphone-style) algorithms can also match names that have similar pronunciation, such as Katherine and Cathy. Software can automatically standardize names so that different records from the same customer will now match and be linked to a single customer ID. For example, Michael is the same as Mike, James is the same as Jim, and so forth. As part of name standardization and verification, software can also change the case of the name. If the name was in all capitals (WILLIAM) or all lowercase (william), it is rewritten with a capital first letter and the remainder in lowercase (William). This may seem trivial, but software algorithms are not people and tend to take things literally. Without correction and normalization, these records would not be matched to the same person.
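The normalization steps described above can be sketched in a few lines of Python. The nickname and honorific tables below are tiny, hypothetical samples; a production system would rely on much larger reference lists and on phonetic matching, which this sketch does not attempt.

```python
import re

# Hypothetical lookup tables; real systems use far larger reference lists.
NICKNAMES = {"bill": "william", "wm": "william", "mike": "michael", "jim": "james"}
HONORIFICS = {"mr", "mrs", "ms", "dr", "rev"}


def normalize_name(raw: str) -> str:
    """Strip honorifics, expand known nicknames, and standardize casing."""
    # Treat periods and commas as separators, then lowercase and split.
    tokens = re.sub(r"[.,]", " ", raw).lower().split()
    tokens = [t for t in tokens if t not in HONORIFICS]
    tokens = [NICKNAMES.get(t, t) for t in tokens]
    # Simple capitalization; names like "McDonald" would need special handling.
    return " ".join(t.capitalize() for t in tokens)


print(normalize_name("Rev. Bill MORRISON"))  # William Morrison
print(normalize_name("wm morrison"))         # William Morrison
```

Once every record passes through the same function, "Rev. Bill MORRISON" and "wm morrison" produce an identical string and can be linked to the same customer ID.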
Step 2: Cleaning and Validation of Addresses
For mailing addresses, validation is important to make sure that each piece of expensive direct mail you are sending is actually deliverable. Address validation can reduce mailing costs by as much as 80 percent. These are some ways to validate mailing addresses:
- Canada and U.S. address coding according to USPS standards: Make sure that the address is complete and correctly written, including a ZIP code with the four-digit add-on (ZIP+4), to have the most accurate address possible.
- NCOA (National Change of Address): Each address in your database can be checked against the NCOA database to make sure that the recipient has not moved since you acquired his or her address.
- CASS certification: CASS certification is a requirement for all mailers in order to receive certain mailing rates from the USPS based on the quality of their addresses.
- DPV (Delivery Point Validation): This is the highest level of address accuracy checking, where each address gets checked against a data file to ensure that it exists as an active delivery point for the USPS.
- Address type flag: If relevant, address parsing can also append an address type flag, such as whether the address is a residence or a business.
- MSA/region append: Based on each address, software can append the longitude and latitude of the location, and can also match this against administrative regions.
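Checks such as NCOA, CASS, and DPV depend on licensed USPS data and cannot be reproduced in a few lines. The simplest check on the list, completeness and ZIP code format, can be sketched as follows. The field names and the `address_problems` function are assumptions for illustration. This only tests the shape of a U.S. address, not whether it is deliverable.

```python
import re

# Five digits, a hyphen, and the four-digit add-on, for example 78701-1234.
ZIP_PLUS_4 = re.compile(r"^\d{5}-\d{4}$")
REQUIRED_FIELDS = ("street", "city", "state", "zip")  # hypothetical field names


def address_problems(address: dict) -> list:
    """Return a list of format problems; an empty list means none were found."""
    problems = [
        f"missing {field}"
        for field in REQUIRED_FIELDS
        if not address.get(field, "").strip()
    ]
    zip_code = address.get("zip", "").strip()
    if zip_code and not ZIP_PLUS_4.match(zip_code):
        problems.append("zip is not in ZIP+4 format")
    return problems


complete = {"street": "12 Oak St", "city": "Austin", "state": "TX", "zip": "78701-1234"}
short_zip = {"street": "12 Oak St", "city": "Austin", "state": "TX", "zip": "78701"}

print(address_problems(complete))   # []
print(address_problems(short_zip))  # ['zip is not in ZIP+4 format']
```

Records that fail this kind of basic check are good candidates to send to a full address validation service before any mail goes out.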
Email address validation is equally important. Email verification services can help you reach more of your customers and reduce the risk of damaging your sender reputation. Checks that can be performed automatically include:
- Syntax correction: Software can automatically remove illegal characters from email addresses and fix host names. For example, it can correct common domain name misspellings (gmai1.com becomes gmail.com). It can also fix illegal characters: if an address says gmail,com instead of gmail.com, the comma is replaced with a period.
- Mail test: As part of validating email addresses, software can automatically “ping” the domain in the email address to confirm that it is available as a mail exchange. Software can also automatically maintain a list of invalid emails.
- Invalid email filtering: Certain common default values, such as email@example.com, can automatically be detected and filtered out.
- Casing standardization: As with names, software can automatically standardize all email addresses to include only lowercase letters.
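Three of these email checks (syntax correction, placeholder filtering, and casing) can be sketched together in Python. The correction table, the placeholder list, and the deliberately loose pattern are assumptions for illustration. The mail test is left out because it requires a live DNS lookup.

```python
import re
from typing import Optional

# Hypothetical tables; real services maintain much longer lists.
DOMAIN_FIXES = {"gmai1.com": "gmail.com"}
PLACEHOLDERS = {"email@example.com"}
# A deliberately loose shape check, not a full email address grammar.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$")


def clean_email(raw: str) -> Optional[str]:
    """Return a cleaned, lowercase address, or None if it should be discarded."""
    email = raw.strip().lower()
    local, sep, domain = email.partition("@")
    if not sep:
        return None  # no @ sign at all
    domain = domain.replace(",", ".")         # gmail,com becomes gmail.com
    domain = DOMAIN_FIXES.get(domain, domain)  # fix known misspellings
    email = f"{local}@{domain}"
    if email in PLACEHOLDERS or not EMAIL_PATTERN.match(email):
        return None
    return email


print(clean_email("Bill@Gmail,com"))     # bill@gmail.com
print(clean_email("bill@gmai1.com"))     # bill@gmail.com
print(clean_email("email@example.com"))  # None
```

Returning `None` for unusable addresses makes it easy to keep a separate list of invalid emails instead of silently dropping them.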
Step 3: Linking and Deduplication
To eliminate duplicate copies of repeating data, you can use a technique called deduplication. It is important because it can increase the accuracy of key performance indicators and metrics (like the lifetime value of a customer). It also helps you avoid targeting the same person twice, which doesn’t look very professional and can be very expensive when it comes to campaigns like direct mail. The attributes of each of your contacts should be merged according to a set of priority rules to obtain a Master Contact List. At this time, you should also associate the right contacts with the right households or the right corporations.
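As a rough illustration of merging by priority rules, the sketch below groups records by a normalized email address and fills each field from the most trusted source that has a value. The source names, their ranking, and the use of email as the only match key are all assumptions. A real pipeline would match on several cleaned attributes, as described in the earlier steps.

```python
from collections import defaultdict

# Hypothetical priority rule: a lower number means a more trusted source.
SOURCE_PRIORITY = {"crm": 0, "ecommerce": 1, "newsletter": 2}


def build_master_list(records: list) -> list:
    """Group records by normalized email and merge each group into one contact."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["email"].strip().lower()].append(rec)

    master = []
    for key, recs in groups.items():
        # Visit the most trusted source first; unknown sources go last.
        ranked = sorted(
            recs, key=lambda r: SOURCE_PRIORITY.get(r["source"], len(SOURCE_PRIORITY))
        )
        merged = {}
        for rec in ranked:
            for field, value in rec.items():
                # Keep the first non-empty value seen for each field.
                if field != "source" and value and not merged.get(field):
                    merged[field] = value
        merged["email"] = key  # store the normalized form of the match key
        master.append(merged)
    return master


# Hypothetical input records for illustration only.
records = [
    {"source": "newsletter", "email": "Bill@Example.org", "name": "Bill Morrison", "phone": ""},
    {"source": "crm", "email": "bill@example.org", "name": "William Morrison", "phone": "555-0100"},
    {"source": "ecommerce", "email": "cathy@example.org", "name": "Cathy Morrison", "phone": ""},
]
master = build_master_list(records)
print(len(master))  # 2
```

Here the two records for the same email address collapse into one contact, with the name and phone number taken from the higher-priority source.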
The Result: Example of a Customer Profile
Now that all the hard work of collecting and cleaning customer data is done, let’s look at an example of the information that may be included in the profile of a single customer. This profile alone holds a lot of value. You could give your sales team, customer success team, call center team, or in-store personnel access to these profiles, and they will surely be able to serve customers better with this type of information at their fingertips. In the next chapters we take the next steps toward actually using this information for customer analytics and for creating unique and meaningful experiences for each and every customer.
||Lifecycle cluster values could include