"Tell me and I forget. Teach me and I remember. Involve me and I learn."
—Benjamin Franklin
In my analytics capstone class, my students quickly learn one of the most basic yet important lessons of analytics: Organizations typically struggle with data quality and accuracy. Of course, I tell them this in class, and I cover it in Analytics: The Agile Way. Still, there's really no substitute for their learning this critical lesson on their own. I smile when I hear them say things such as "Garbage in, garbage out" at the end of the semester.
This raises the question: How can organizations improve data accuracy?
Allow me to answer this question in four ways.
Minimize manual intervention...
If a decade of consulting taught me one thing, it's this: Data quality varies inversely with the number of people who touch the data. By touch, I mean edit. Call this Simon's Law of Data Accuracy.™
Sure, there are exceptions, but I've seen this movie many times before.
...but do not completely eliminate the human element.
So no one should ever even look at the data, right? (Cue This Is Spinal Tap reference.)
Hardly. Many business processes benefit from automation, but it's folly to think that all do. Simply letting a machine do its thing can be a recipe for disaster, even death. Human intervention can be a good thing: Routinely auditing processes and creating simple visualizations can expose key errors. Even if those audits merely confirm the accuracy of current automation, isn't the increased confidence worth it?
Increase the frequency of checks
As Dylan Jones wrote on this site five years ago:
A simple technique that you can adopt is to increase the frequency and quality of reality checks. The technique works on the basis that every item of data will experience data quality degradation if there are no reality checks at some point in the future.
Jones' words continue to hold water in 2019. Sure, Social Security numbers and birth dates don't change, but certain fields go stale: addresses, names, phone numbers, e-mail addresses and the like.
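To make the idea of a recurring reality check concrete, here is a minimal Python sketch. It assumes each record stores the date on which a field was last verified; the field names, the MAX_AGE thresholds and the stale_fields helper are all hypothetical, so treat this as one possible approach and not a prescribed method.

```python
from datetime import date, timedelta

# Hypothetical maximum ages before a field should be re-verified.
MAX_AGE = {
    "address": timedelta(days=365),
    "phone": timedelta(days=365),
    "email": timedelta(days=180),
}

def stale_fields(last_verified, today):
    """Return the sorted names of fields that are overdue for a reality check.

    last_verified maps a field name to the date it was last confirmed.
    A field that has never been verified counts as stale.
    """
    stale = []
    for field, max_age in MAX_AGE.items():
        checked = last_verified.get(field)
        if checked is None or today - checked > max_age:
            stale.append(field)
    return sorted(stale)

# Example record: the phone number has never been verified.
record = {"address": date(2018, 1, 15), "email": date(2019, 3, 1)}
print(stale_fields(record, today=date(2019, 6, 1)))  # ['address', 'phone']
```

Running a check like this on a schedule, and tightening the thresholds for fields that change often, is one way to raise the frequency of checks without adding manual edits.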
Increase the number of checks
Aside from the frequency, increasing the number of checks can also pay enormous dividends. On several of my consulting gigs, I wrote scripts that ran database queries and sent the results to my clients. Even if the result set contained zero records, my clients appreciated knowing that errant data would not contaminate their shiny new enterprise systems. Measure twice and cut once.
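A script of that kind might look something like the sketch below. It is not the code I used on any engagement: the employees table, its columns and the three checks are invented for illustration, and it uses an in-memory SQLite database so that it runs on its own. Delivering the report by e-mail is left out.

```python
import sqlite3

# Hypothetical checks: each query should return only the problem rows.
CHECKS = {
    "missing email": "SELECT id FROM employees WHERE email IS NULL OR email = ''",
    "duplicate email": (
        "SELECT email FROM employees WHERE email <> '' "
        "GROUP BY email HAVING COUNT(*) > 1"
    ),
    "negative salary": "SELECT id FROM employees WHERE salary < 0",
}

def run_checks(conn):
    """Run every check and return {check name: number of offending rows}."""
    return {name: len(conn.execute(sql).fetchall()) for name, sql in CHECKS.items()}

def format_report(results):
    """Build a plain-text summary that lists every check, including clean ones."""
    return "\n".join(f"{name}: {count} record(s)" for name, count in results.items())

# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, email TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "a@example.com", 50000), (2, "a@example.com", 62000), (3, None, 48000)],
)
results = run_checks(conn)
print(format_report(results))
```

Note that the report includes checks that found nothing. A line reading "negative salary: 0 record(s)" tells the client that the check ran and came back clean, which is different from hearing nothing at all.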
Simon says: Improving data accuracy is possible but takes time.
Data accuracy isn't nearly as sexy as machine learning, blockchain and any number of other newfangled technologies. Make no mistake, though: The downsides of even a few discrepancies can be enormous.
Feedback
What say you?