The value of data quality


While perusing my news feed, I was intrigued to see a blog post from IT Business Edge's Loraine Lawson titled "Is Data Quality Worth the Cost?" For a yes/no question, this one seemed to scream for an easy and confident "Yes!" That is, until I thought, "Wait, why is she asking? What's the angle here?"

Loraine's blog post, which references an interesting Information Week column by Rajan Chandras, asks a good question: Is data quality worth the time, effort and (to Rajan's point) process interruptions that come with finding and eliminating poor-quality data? It's a valid question that practically every IT staff faces.

Rajan proposes that "not everything needs to wait upon a big-bang data quality initiative. It's not a bad idea to take the occasional step back and ask yourself what business value can be obtained from data as is." This is good advice, especially as big data enters the equation. Some types of data, such as highly repeatable, machine-generated information, may arrive in an "acceptable" state.

As Rajan and Loraine both note, awareness of data quality has increased in recent years. A long time ago (OK, maybe it was just a decade ago), I heard a data quality practitioner explain that he didn't wake up every day anxious to clean up data for the sake of making it clean. There was always a business imperative driving him. He was building a new CRM system, and if bad data was at its core, customer service reps might not know which products a customer owned, or whether that customer was a high-value account eligible for discounts or a higher level of support.

In this data quality practitioner's view, poor-quality data had a material impact on business processes. This was similar to what I saw in the same time frame across many organizations. Ten years ago, a data quality program likely evolved from the failure (or, at least, poor performance) of a business process or a business application.

Today, it's gratifying that people view data quality as an imperative for any data-driven effort. That's a huge step forward because, as I discussed previously, the changing nature of data (read: big data) requires a smart approach to data management.

Rajan also covered an interesting element of data quality – how do you know when you have enough? In one example he cited, social media data had initial accuracy rates of between 70 and 90 percent. That's good, and in some instances, it's good enough for that program. However, some types of data – such as medical records or tax information – require higher degrees of data quality.

This has become an interesting topic among data quality teams. How do you define "good data quality"? To decide what is acceptable, IT can work with the line of business to assign a reasonable level of quality. Companies can use various attributes of data quality, such as completeness or accuracy, to provide a full view of the overall "health" of the data.
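To make that concrete, here is a minimal Python sketch of how a team might score two of those attributes, completeness and accuracy, across a handful of records. The field names and validation rules here are hypothetical, purely for illustration:

```python
import re

# Hypothetical customer records, as a small CRM extract might look.
records = [
    {"customer_id": "C001", "email": "pat@example.com", "postal_code": "27513"},
    {"customer_id": "C002", "email": None,              "postal_code": "27601"},
    {"customer_id": "C003", "email": "lee@example",     "postal_code": "2760"},
]

def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def accuracy(rows, field, is_valid):
    """Share of rows whose field value passes a validation rule."""
    valid = sum(1 for r in rows if r.get(field) and is_valid(r[field]))
    return valid / len(rows)

email_ok = lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None
zip_ok = lambda v: re.fullmatch(r"\d{5}", v) is not None

print(f"email completeness:   {completeness(records, 'email'):.0%}")            # 67%
print(f"email accuracy:       {accuracy(records, 'email', email_ok):.0%}")      # 33%
print(f"postal code accuracy: {accuracy(records, 'postal_code', zip_ok):.0%}")  # 67%
```

A real program would profile many more attributes (validity, consistency, timeliness) across the full data set, but the basic arithmetic is the same.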

Some organizations have even established a service level agreement (SLA) approach to data quality, where business groups can expect a certain percentage of good-quality data according to their SLA with IT. Dylan Jones from Data Quality Pro has a terrific article about data quality SLAs, which explains this much better than I ever could.
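As a rough illustration of the SLA idea (the targets below are invented for this example, not drawn from Dylan's article), the measured scores simply get compared against whatever thresholds the business and IT agreed on:

```python
# Hypothetical data quality SLA: minimum agreed score per attribute.
sla = {"email_completeness": 0.95, "email_accuracy": 0.90, "postal_code_accuracy": 0.98}

# Scores measured by checks like the ones in the previous sketch.
measured = {"email_completeness": 0.67, "email_accuracy": 0.33, "postal_code_accuracy": 0.67}

for metric, target in sla.items():
    score = measured[metric]
    status = "OK" if score >= target else "SLA breach"
    print(f"{metric}: {score:.0%} (target {target:.0%}) -> {status}")
```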

The data quality market has matured, and while organizations have gotten smarter about data quality, I still get the sense that many are struggling. To Rajan's point, there are times when they are trying to fix something that isn't entirely broken. Other times, they may be unsure where to start, even as the amount and complexity of data increases.

Data quality will continue to have a role in today's IT groups. Better data leads to better results, and it removes complexity and some of the rework from a variety of processes. It may not be right for every occasion. But it's worth a look, especially when the data involved can have a substantial impact on the organization.

 


About Author

Daniel Teachey

Managing Editor, SAS Technologies

Daniel is a member of the SAS External Communications team, and in his current role, he works closely with global marketing groups to generate content about data management, analytics and cloud computing. Prior to this, he managed marketing efforts for DataFlux, helping the company go from a niche data quality software provider to a world leader in data management solutions.
