Self-service data prep versus data quality

Many data quality issues are a result of the distance separating data from the real-world object or entity it attempts to describe. This is the case with master data, which describes parties, products, locations and assets. Customer master data (customer being one of the roles within party) is rife with examples of quality issues, especially within the data quality dimension of currency (i.e., whether data is current with the real world it models). The current postal addresses, email addresses, phone numbers and preferred contact method for customers often change faster than updates can be applied, or before the need for updates is even detected. Often the only way a company discovers its customer contact data is out of date is by failing in attempts to contact customers using the data currently on file.

A similar point can be made about accuracy (not to be confused with validity). During some of the data quality projects I've worked on, people joked that the only way to truly know if customer master data was accurate was to call customers and ask them to verify their data over the phone. Not only was this impractical due to the large number of customers most enterprises have, but customers would have found it more annoying than telemarketing calls.

This is why – historically – self-service data preparation wasn’t really a viable option for addressing data quality issues. Nowadays, however, thanks to reliable, high-speed broadband connectivity and widespread wireless networks, the Internet or mobile web is used to provide a self-service portal for customers in most industries. New customers use these portals to create current, accurate master data describing them. And during subsequent log-ins, new and existing customers are periodically prompted to verify their contact information.

I certainly don’t think the rise of self-service data preparation (and ongoing self-service data management) makes automated data quality processes obsolete; they still need to be built and maintained. However, enterprises do need to consider the growing number of self-service options available when choosing the best method for creating and maintaining the high-quality data their critical business applications depend on.