The Eighth Law of Data Quality


The First Law of Data Quality explained the importance of understanding your data usage, which is essential to the proper preparation required before launching your data quality initiative.

The Second Law of Data Quality explained the need for maintaining your data quality inertia, which means a successful data quality initiative requires a program – and not a one-time project.

The Third Law of Data Quality explained that a fundamental root cause of data defects is assuming data quality is someone else’s responsibility, which is why data quality is everyone’s responsibility.

The Fourth Law of Data Quality explained that data quality standards must include establishing standards for objective data quality and subjective information quality.

The Fifth Law of Data Quality explained that a solid data quality foundation enables all enterprise information initiatives to deliver data-driven solutions to business problems.

The Sixth Law of Data Quality explained that data quality metrics must meaningfully represent tangible business relevance in order for high-quality data to be viewed as a corporate asset.

The Seventh Law of Data Quality explained that you must prioritize data quality improvement efforts by determining the business impact of data quality issues before taking any corrective action.

The Eighth Law of Data Quality

“Often, big data is messy, varies in quality,” explained Viktor Mayer-Schonberger and Kenneth Cukier in their book Big Data: A Revolution That Will Transform How We Live, Work, and Think.  “With big data, we’ll often be satisfied with a sense of general direction rather than knowing a phenomenon down to the inch, the penny, the atom.  We don’t give up exactitude entirely; we only give up our devotion to it.  What we lose in accuracy at the micro level we gain in insight at the macro level.”

I have previously blogged about how big data forces us to re-think data quality.  For example, there will be times when the macro-level shallow insights of a large group of unqualified strangers will trump the micro-level deep insights of a small group of qualified experts.

According to Mayer-Schonberger and Cukier, data quality is one of the major shifts in mindset required by big data, calling for “a willingness to embrace data’s real-world messiness rather than privilege exactitude.”  When dealing with smaller amounts of data, “reducing errors and ensuring high quality of data was a natural and essential impulse.  Since we only collected a little information, we made sure that the figures we bothered to record were as accurate as possible.”

However, this “obsession with exactness is an artifact of the information-deprived analog era,” Mayer-Schonberger and Cukier argued.  “When data was sparse, every data point was critical, and thus great care was taken to avoid letting any point bias the analysis.  Today we don’t live in such an information-starved situation.  In dealing with ever more comprehensive datasets, which capture not just a small sliver of the phenomenon at hand but much more or all of it, we no longer need to worry so much about individual data points biasing the overall analysis.  Rather than aiming to stamp out every bit of inexactitude at increasingly high cost, we are calculating with messiness in mind. In more areas of technology and society, we are leaning in favor of more and messy over fewer and exact.”

I know that many (if not most) data quality professionals will be screaming in disagreement at the points they raise, but Mayer-Schonberger and Cukier provide several excellent examples in their book (especially in the chapter appropriately titled “Messy”). And please note that they are definitely not suggesting that what works for many big data applications will work for all data applications.

Therefore, the Eighth Law of Data Quality is:

“A smaller data quality emphasis SOMETIMES enables bigger data-driven insights, which means that SOMETIMES using a bigger amount of lower-quality data is better than using a smaller amount of higher-quality data.”
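To make the SOMETIMES a little more concrete, here is a minimal Python simulation. It is a sketch under purely hypothetical assumptions and is not taken from Mayer-Schonberger and Cukier’s book: the goal is to estimate a single true value, the sample sizes, noise levels, and bias are invented for illustration, and the helper `rmse_of_mean` is a made-up name. It compares a small, clean dataset against a large, messy one, and then against a large, messy one that also carries a systematic error.

```python
import math
import random
import statistics

TRUE_VALUE = 50.0  # hypothetical quantity we are trying to estimate


def rmse_of_mean(n, noise_sd, bias=0.0, trials=100, seed=0):
    """Root-mean-square error of the sample mean over repeated trials.

    Each trial draws n observations of TRUE_VALUE, each corrupted by
    Gaussian noise (standard deviation noise_sd) plus a constant bias.
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(trials):
        sample = [TRUE_VALUE + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
        squared_errors.append((statistics.fmean(sample) - TRUE_VALUE) ** 2)
    return math.sqrt(statistics.fmean(squared_errors))


# Fewer and exact: 20 records with small, random measurement noise.
small_clean = rmse_of_mean(n=20, noise_sd=1.0)
# More and messy: 5,000 records with five times the noise, but no systematic bias.
big_messy = rmse_of_mean(n=5_000, noise_sd=5.0)
# More, messy, and biased: the same 5,000 records shifted by a constant error.
big_biased = rmse_of_mean(n=5_000, noise_sd=5.0, bias=1.0)

print(f"small, clean:        {small_clean:.3f}")
print(f"big, messy:          {big_messy:.3f}")
print(f"big, messy, biased:  {big_biased:.3f}")
```

Under these assumptions, random noise largely averages out as volume grows (the error of the mean shrinks roughly with the square root of the sample size), so the bigger, lower-quality dataset beats the smaller, higher-quality one. A systematic bias does not average out no matter how much data you collect, so the bigger dataset loses once its errors all lean in the same direction. That difference is one way to read the SOMETIMES in the Eighth Law.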


About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.
