The First Law of Data Quality explained the importance of understanding your Data Usage, which is essential to the proper preparation required before launching your data quality initiative.
The Second Law of Data Quality explained the need for maintaining your Data Quality Inertia, which means a successful data quality initiative requires a program – and not a one-time project.
The Third Law of Data Quality explained that a fundamental root cause of data defects is assuming data quality is someone else’s responsibility, which is why Data Quality is Everyone’s Responsibility.
The Fourth Law of Data Quality explained that Data Quality Standards must include establishing standards for objective data quality and subjective information quality.
The Fifth Law of Data Quality explained that a solid Data Quality Foundation enables all enterprise information initiatives to deliver data-driven solutions to business problems.
The Sixth Law of Data Quality
“Data quality metrics must be aligned with business insight.”
When the correlation between poor data quality and poor business performance isn’t measured in a tangible way, data quality is misperceived as a technical activity performed for the sake of the data, instead of a business activity performed to provide data-driven solutions for business problems.
Business-relevant metrics align data quality with business objectives and measurable outcomes.
There are many data quality metrics, which are alternatively referred to as data quality dimensions. In her great book Executing Data Quality Projects, Danette McGilvray provides a comprehensive list of data quality metrics, which include the following:
- Timeliness and Availability – A measure of the degree to which data are current and available for use as specified and in the time frame in which they are expected.
- Data Coverage – A measure of the availability and comprehensiveness of data compared to the total data universe or population of interest.
- Duplication – A measure of unwanted duplication existing within or across systems for a particular field, record, or data set.
- Presentation Quality – A measure of how information is presented to and collected from those who utilize it. Format and appearance support appropriate use of the information.
- Perception, Relevance, and Trust – A measure of the perception of and confidence in the quality of data, i.e., the importance, value, and relevance of the data to business needs.
Although there are many additional data quality metrics (as well as alternative definitions for them), perhaps the two most common data quality metrics are Completeness and Accuracy.
Completeness is generally a measure of the presence of an actual data value within a field, excluding NULL values and any non-NULL values indicating missing data (e.g., character spaces). Completeness can also be used as a measure of the absence of some of the sub-values that would make a field complete (e.g., a telephone number in the United States missing the area code). Either way, completeness is not a measure of the validity or accuracy of the values present within a field.
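As a rough illustration of both senses of completeness described above, here is a minimal Python sketch. The list of placeholder strings, the function names, and the sample phone values are all assumptions made up for this example, not part of any standard; a real implementation would use the missing-value conventions of your own systems.

```python
from typing import Iterable, Optional

# Hypothetical placeholder strings that indicate missing data even though
# the field is not NULL. Adjust this set to match your own data.
MISSING_MARKERS = {"", "n/a", "na", "none", "null", "unknown"}


def is_present(value: Optional[str]) -> bool:
    """True if the field holds an actual value (not NULL, blank, or a placeholder)."""
    if value is None:
        return False
    return value.strip().lower() not in MISSING_MARKERS


def completeness(values: Iterable[Optional[str]]) -> float:
    """Fraction of values that are present; 0.0 for an empty input."""
    values = list(values)
    if not values:
        return 0.0
    return sum(is_present(v) for v in values) / len(values)


def has_area_code(phone: Optional[str]) -> bool:
    """Sub-value completeness: a US number needs 10 digits (area code + 7)."""
    if phone is None:
        return False
    digits = [c for c in phone if c.isdigit()]
    # Ignore a leading country code of 1 if one was supplied.
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]
    return len(digits) == 10


# Made-up sample values for illustration only.
phones = ["555-867-5309", "867-5309", None, "   ", "N/A"]

field_completeness = completeness(phones)
area_code_completeness = sum(has_area_code(p) for p in phones) / len(phones)
```

In this sample, two of the five values count as present, but only one of them includes an area code. Neither number says anything about whether the phone numbers are valid or accurate.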
There is a subtle, but important, distinction between the related notions of validity and accuracy.
Validity is the correctness of a data value within a limited context such as verification by an authoritative reference. Accuracy is the correctness of a valid data value within an extensive context including other data as well as business processes.
Validity focuses on measuring the real-world alignment of data in isolation from its use. Accuracy focuses on the combination of the real-world alignment of data and its fitness for the purpose of use.
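The distinction can be sketched in Python with a postal code. The reference table, field names, and sample records below are hypothetical stand-ins for an authoritative source and real customer data. The cross-field check is also a deliberate simplification, since accuracy in the full sense also depends on the business processes that use the data.

```python
from typing import Dict, Optional

# Hypothetical lookup standing in for an authoritative reference
# (for example, a postal service file mapping postal codes to states).
ZIP_TO_STATE: Dict[str, str] = {"02108": "MA", "10001": "NY", "60601": "IL"}


def is_valid_zip(zip_code: Optional[str]) -> bool:
    """Validity: the value exists in the reference, checked in isolation."""
    return zip_code in ZIP_TO_STATE


def is_accurate_zip(record: Dict[str, str]) -> bool:
    """Accuracy (simplified): the valid value also agrees with other data
    in the record, here the state field."""
    zip_code = record.get("zip")
    return is_valid_zip(zip_code) and ZIP_TO_STATE[zip_code] == record.get("state")


# Made-up records for illustration only.
records = [
    {"zip": "02108", "state": "MA"},  # valid and consistent with the state
    {"zip": "10001", "state": "MA"},  # valid, but contradicts the state
    {"zip": "99999", "state": "NY"},  # not in the reference at all
]

validity_rate = sum(is_valid_zip(r.get("zip")) for r in records) / len(records)
accuracy_rate = sum(is_accurate_zip(r) for r in records) / len(records)
```

Here the validity rate is higher than the accuracy rate, because the second record passes the reference check but fails once the rest of the record is taken into account.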
A common mistake made by those advocating that data be viewed as a corporate asset is measuring data quality independently of its business use and business relevance, which is why most data quality metrics do a poor job of conveying the business value of data quality. Without data quality metrics that meaningfully represent tangible business relevance, you should neither expect anyone to feel accountable for providing high-quality data, nor expect anyone to view data as a corporate asset.
Therefore, every data quality metric you create must be able to answer two questions:
- How does this data quality metric relate to a specific business context?
- How does this data quality metric provide business insight?
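One way to keep both questions attached to a metric is to store the business context and an estimated business impact alongside the raw defect counts. The Python sketch below is only an illustration under stated assumptions: the class name, its fields, and every figure in the usage example (record counts and cost per defect) are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class BusinessAlignedMetric:
    """A data quality metric that carries its own business context."""
    name: str
    business_context: str    # answers: which business process does this affect?
    defect_count: int
    total_count: int
    cost_per_defect: float   # assumed figure supplied by the business, not measured here

    @property
    def defect_rate(self) -> float:
        """Share of records affected; 0.0 when there are no records."""
        return self.defect_count / self.total_count if self.total_count else 0.0

    @property
    def estimated_impact(self) -> float:
        """Answers: what does this defect level cost the business?"""
        return self.defect_count * self.cost_per_defect

    def summary(self) -> str:
        return (
            f"{self.name}: {self.defect_rate:.1%} of records affected; "
            f"context: {self.business_context}; "
            f"estimated impact: {self.estimated_impact:,.2f}"
        )


# All numbers below are invented for illustration.
duplicates = BusinessAlignedMetric(
    name="Duplicate customer records",
    business_context="quarterly direct-mail campaign",
    defect_count=1200,
    total_count=40000,
    cost_per_defect=0.5,
)
report_line = duplicates.summary()
```

The design choice is simply that a metric cannot be created without stating its business context, so a bare percentage never travels on its own.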
I just came across your set of posts. Thanks for them, Jim. I recently wrote a presentation on this same subject for a local Salesforce users group. It seems lots of folks are talking about data quality, but not so many are jumping on it.
We're actually trying to bring some data quality solutions to market--only have a free app for dedupe ready now (http://www.dupecatcher.com), but hope to have some more sophisticated tools ready by fall. Tools, though, are only half of the equation.
I'll keep following.