Unfortunately, many organizations wait until they recognize data quality issues in their existing customer data assets before they begin thinking about how to address the challenge. This realization is usually triggered by a significant problem that surfaces during a business process, in an internal report, or, in the worst case, in a complaint from an unhappy customer. Further analysis typically reveals that the issue is only the tip of the iceberg: one of a long list of customer data problems. Could scenarios like these be prevented? If so, could APIs be the foundation of a solution?
By the time scenarios like these are discovered, an organization’s data quality issues can be deep, far-reaching, and expensive to remedy. Fixing them usually requires extensive and disruptive data re-engineering, database processing, and customized data scrubbing. A more proactive approach that takes advantage of APIs and their real-time nature, however, can address these issues up front at significantly lower cost, avoiding large and costly back-end data cleanup projects. There are probably few use cases for which an API-based solution is better suited than improving the value of customer data assets.
Before we discuss solution approaches, let’s look at the data accuracy challenges that most organizations face to at least some degree.
First, there is the data inconsistency problem. There are many ways to represent a piece of information in a database, such as the name of a company. “Wal-mart”, “Walmart”, “WALMART CORP.”, “Walmart Corporation”, and even misspellings such as “Wallmart” might all appear in a database, each a variant that represents the same business entity. This can cause significant problems when, for example, business intelligence reports are driven by company name. Per-customer analysis, searching a database for organization matches, and matching across internal data sources are other situations where data inconsistency can be problematic and costly to resolve.
Matching new data against existing records across several fields of different content types, such as personal names and company names, can also be a challenge. For example, should a record for “John Snow” from “Northern Wall Industries” be inserted as a new customer into a customer data store when a “Jon Snow” from “North Wall Inc.” is already a customer? A customer who appears in a database multiple times causes another long list of problems, such as incomplete customer views, a poor understanding of the true customer base, and the potential embarrassment of treating someone as a prospect when they are already a customer.
Another data quality scenario is data verification. For example, it can be quite useful to verify whether an address physically exists according to the USPS or another national mailing authority. If invalid data is discovered only in a downstream business process, such as when a returned mailer shows that a customer’s address is incorrect or non-existent, it may be difficult to reach the customer to correct it.
These are just a few examples of data quality issues that can occur within customer data. Resolving them has traditionally required costly, back-office data quality projects. An up-front, API-driven approach, by contrast, shows the power and flexibility of APIs when they are incorporated into a solution.
So how do APIs help? Many of these data quality issues can be addressed at the point of data collection, simply by integrating an API: when a customer or prospect fills out an online form, when data is taken over the phone, or in any of the other ways customer data is collected. Introducing an API data quality check *before* data is ever entered into a customer database prevents problems at the source, so they do not propagate, at great expense, throughout the customer data lifecycle.
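To make the point-of-collection idea concrete, here is a minimal sketch, assuming a simple in-process setup: every check runs before the record is written, and the write happens only if all checks pass. The `CustomerRecord` shape, the `admit` function, the two checks, and the list standing in for the customer database are all hypothetical; in a real integration each check would wrap a call to a data quality API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class CustomerRecord:
    """Hypothetical record captured from a web form or call-center screen."""
    name: str
    company: str
    email: str

# A check returns None when the record passes, or an error message when it fails.
# In a real integration each check would wrap a call to a data quality API.
Check = Callable[[CustomerRecord], Optional[str]]

def name_present(rec: CustomerRecord) -> Optional[str]:
    return None if rec.name.strip() else "name is required"

def email_has_at_sign(rec: CustomerRecord) -> Optional[str]:
    return None if "@" in rec.email else "email looks malformed"

def admit(rec: CustomerRecord, checks: List[Check], store: List[CustomerRecord]) -> List[str]:
    """Run every check first; write to the store only if all of them pass."""
    errors = [msg for check in checks if (msg := check(rec)) is not None]
    if not errors:
        store.append(rec)  # stand-in for the customer database insert
    return errors

store: List[CustomerRecord] = []
checks: List[Check] = [name_present, email_has_at_sign]
print(admit(CustomerRecord("Jon Snow", "North Wall Inc.", "jon@example.com"), checks, store))  # []
print(admit(CustomerRecord("", "North Wall Inc.", "not-an-email"), checks, store))
# ['name is required', 'email looks malformed']
print(len(store))  # 1: only the clean record was stored
```

Returning the error list rather than raising lets a form or call-center screen show every problem at once while the person is still there to correct it.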
For example, a data standardization API can first algorithmically remove generic corporate name components such as “inc”, “corp”, “the”, and other noise words from a company name field, and then compare the result against a master source of standard company names. By incorporating an extensive external database of company name standards, it can converge all permutations of a company name into a single standardized entry. Each newly collected company name can pass through this process via the API and be adjusted before it ever enters the customer database, substantially reducing the incidence of inconsistent company data.
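The two steps described above, stripping noise words and then looking the result up in a standards source, can be sketched as follows. This is a minimal illustration, not any vendor’s algorithm: `NOISE_WORDS`, `COMPANY_STANDARDS`, and both functions are hypothetical, and the tiny dictionary stands in for the extensive external database a real service would use. The misspelling is handled by an explicit dictionary entry; a production service might use fuzzy matching instead.

```python
import re

# Generic corporate name components treated as noise (illustrative, not exhaustive).
NOISE_WORDS = {"inc", "corp", "corporation", "co", "llc", "ltd", "the", "company"}

# Hypothetical standards dictionary: normalized key -> standardized company name.
COMPANY_STANDARDS = {
    "walmart": "Walmart",
    "wallmart": "Walmart",  # known misspelling mapped explicitly
}

def normalize_company(name: str) -> str:
    """Lowercase, split on punctuation and spaces, drop noise words, join tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return "".join(t for t in tokens if t not in NOISE_WORDS)

def standardize_company(name: str) -> str:
    """Return the standardized name if known; otherwise keep the input as entered."""
    return COMPANY_STANDARDS.get(normalize_company(name), name.strip())

for raw in ["Wal-mart", "Walmart", "WALMART CORP.", "Walmart Corporation", "Wallmart"]:
    print(standardize_company(raw))  # prints "Walmart" for every variant
```

Joining the remaining tokens without spaces is what lets “Wal-mart” and “Walmart” produce the same key; unknown names pass through unchanged so that nothing is silently rewritten.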
Another use case is to leverage a matching API to identify pairs or sets of records whose similarity exceeds a threshold and that therefore likely represent the same entity. For example, an individual’s name can be passed to an API that heuristically generates a similarity key. The key is built by mapping known name variations (Bob, Robert, Rob, etc.) to a common form and algorithmically eliminating noise. If a similarity key has also been generated for every existing customer record, these keys can be used to search for possible matches instead of comparing the raw data, casting a much wider net when searching for matching name records.
Similarity Key Name Matching Example:
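A minimal sketch of the idea, assuming a deliberately simple key: titles and suffixes are dropped, the first name is mapped to a canonical form through a small variant table, and only the first four letters of the surname are kept so that minor spelling differences still collide. `NAME_VARIANTS`, `NAME_NOISE`, `similarity_key`, and `build_index` are hypothetical names chosen for illustration; commercial matching APIs use far richer rules, and company names such as “Northern Wall Industries” versus “North Wall Inc.” would need a key of their own.

```python
import re
from collections import defaultdict

# Hypothetical variant table mapping nicknames to a canonical given name.
NAME_VARIANTS = {
    "bob": "robert", "rob": "robert", "robbie": "robert",
    "jon": "john", "johnny": "john",
    "bill": "william", "will": "william",
    "liz": "elizabeth", "beth": "elizabeth",
}
# Titles and suffixes treated as noise.
NAME_NOISE = {"mr", "mrs", "ms", "dr", "jr", "sr", "ii", "iii"}

def similarity_key(full_name: str) -> str:
    """Canonical first name plus the first four letters of the surname."""
    tokens = [t for t in re.findall(r"[a-z]+", full_name.lower()) if t not in NAME_NOISE]
    if not tokens:
        return ""
    first, last = tokens[0], tokens[-1]
    first = NAME_VARIANTS.get(first, first)
    return f"{first}|{last[:4]}"

def build_index(names):
    """Group existing customer names by their similarity key."""
    index = defaultdict(list)
    for name in names:
        index[similarity_key(name)].append(name)
    return index

index = build_index(["John Snow", "Robert Smith", "Elizabeth Bennet"])
print(index.get(similarity_key("Jon Snow"), []))           # ['John Snow']
print(index.get(similarity_key("Dr. Bob Smith Jr."), []))  # ['Robert Smith']
```

Because lookups go through the key rather than the raw text, a new “Jon Snow” is flagged as a likely duplicate of the existing “John Snow” before it is inserted.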
Other examples include a data verification API that compares mailing addresses against a USPS source to ensure that they physically exist. Email addresses can be checked with other purpose-built APIs that perform email syntax checking and active mail server validation, which keeps invalid addresses out and prevents the bounces that hurt a sender’s email reputation score. Again, a simple integration that sends a physical address or email address to a validation API can catch problem data before it becomes part of the customer data assets that must be dealt with later. Another significant benefit is that the reference data sources behind the validation can be replaced with newer, more accurate versions without any change to the API or to the integrations that call it.
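The sketch below illustrates two of these points under clearly stated assumptions. The email check is a deliberately simple syntax pattern, not a full RFC 5322 validator, and it does not confirm that a mail server exists (that would need a DNS MX lookup, which is out of scope here). For addresses, the verifier depends only on a small `AddressReference` interface, so the reference data can be swapped without touching callers. `AddressVerifier`, `InMemoryReference`, and the sample addresses are hypothetical stand-ins for a USPS-style reference source, and real address verification would also standardize abbreviations such as “Street” versus “St”.

```python
import re
from typing import Iterable, Protocol

# Simple syntax check only: local part, "@", then dot-separated domain labels.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")

def email_syntax_ok(email: str) -> bool:
    return EMAIL_PATTERN.fullmatch(email.strip()) is not None

def normalize_address(address: str) -> str:
    """Lowercase and collapse punctuation and whitespace into single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", address.lower()))

class AddressReference(Protocol):
    """Any reference data source that can say whether an address exists."""
    def contains(self, normalized_address: str) -> bool: ...

class InMemoryReference:
    """Hypothetical stand-in for a national mailing authority reference file."""
    def __init__(self, addresses: Iterable[str]):
        self._addresses = {normalize_address(a) for a in addresses}

    def contains(self, normalized_address: str) -> bool:
        return normalized_address in self._addresses

class AddressVerifier:
    """What callers integrate with; the reference source is injected."""
    def __init__(self, reference: AddressReference):
        self.reference = reference

    def verify(self, address: str) -> bool:
        return self.reference.contains(normalize_address(address))

verifier = AddressVerifier(InMemoryReference(["123 Example St, Anytown, ST 00000"]))
print(verifier.verify("123 example st anytown st 00000"))  # True
print(email_syntax_ok("jon@example.com"), email_syntax_ok("jon@@example.com"))  # True False
```

Replacing `verifier.reference` with a newer `InMemoryReference` (or any other object that provides `contains`) updates the validation data while every call to `verify` stays exactly the same.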
Standardizing, matching, and validating data in real time with an integrated API can save substantially compared with waiting to solve these problems downstream through exhaustive batch data processing, especially when the data feeds a data lake, data warehouse, or other business intelligence activity, where the same bad data can reappear after every data load.
When it comes to keeping customer data assets accurate, complete, and valuable, an API-based approach, with all of its associated benefits, is an ideal weapon in the fight against poor customer data quality.