Big data quality – Part 2


Big data poses some interesting challenges for disciplines such as data integration and data governance, but this blog series addresses some of the most common questions big data raises related to data quality.

How much quality does big data need?

Perspectives about data quality have always been relative and variable. Data quality is relative to a particular business use and can vary by user, with one user’s good data being considered another user’s bad data. Perfect is also the enemy of good, and all data quality improvement efforts eventually reach a point of diminishing returns, where chasing after perfect data gives way to calling data good enough.

Data quality standards for big data will also vary and will very often be lower than the standards for other data (e.g., master data), especially since big data has triggered an explosion in the volume and variety of external data, where bad data is often as good as you can get. It’s important to note that big data has created new use cases, especially in aggregate analytics, where sometimes bigger, lower-quality data is better. In fact, some aggregate analytics do not compensate for, or even check for, data quality issues. This is obviously not advisable for all business applications, but there’s no use denying that it works in some cases, such as the five-star ratings on Netflix.
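To make the aggregate analytics point concrete, here is a minimal Python sketch of why a large pile of noisy ratings can beat a small set of careful ones. Everything in it is an assumption made for illustration: the "true" score of 3.6, the noise levels, the sample sizes, and the function names are hypothetical, and the noise is modeled as purely random and continuous (no rounding to whole stars). It is not a description of how Netflix or any other service computes ratings.

```python
import random
import statistics


def average_rating(true_score, n_ratings, noise_sd, rng):
    """Average n_ratings noisy observations of a hypothetical true score."""
    ratings = [rng.gauss(true_score, noise_sd) for _ in range(n_ratings)]
    return statistics.mean(ratings)


def typical_error(true_score, n_ratings, noise_sd, trials=100, seed=0):
    """Mean absolute error of the averaged rating over repeated trials."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    errors = [
        abs(average_rating(true_score, n_ratings, noise_sd, rng) - true_score)
        for _ in range(trials)
    ]
    return statistics.mean(errors)


# Assumed scenario: 20 careful ratings versus 5,000 sloppy ones.
small_clean = typical_error(3.6, n_ratings=20, noise_sd=0.5)
big_noisy = typical_error(3.6, n_ratings=5000, noise_sd=1.5)

print(f"20 low-noise ratings:     typical error {small_clean:.3f}")
print(f"5,000 high-noise ratings: typical error {big_noisy:.3f}")
```

Under these assumptions the larger, noisier sample lands closer to the true score, because random errors shrink roughly with the square root of the number of ratings. The same averaging does nothing for systematic errors (for example, every rating being skewed in the same direction), which is one reason this approach is not advisable for all business applications.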

Big data didn’t create the concept of accepting lower data quality standards. The history of consumer electronics provides three examples (videotapes, MP3 files and digital photography) that demonstrate how it doesn’t always pay to have better data quality. While less than ideal, poor-quality data can still provide business insights. A comparison I have previously made is that big data is like fast food, which uses neither quality ingredients nor cooking skill, but can still satisfy your hunger. This is analogous to a data-driven decision that is based on data of questionable quality but still produces a good business result. It’s also important to consider that even when big data has high quality, it does not guarantee big insights. Weather prediction, where plentiful data still yields uncertain forecasts, is a great example.

Big data forces us to re-think data quality. However, even when lower data quality standards are applied to it, the value of big data often comes from using it to supplement data held to higher data quality standards.

In Part 3 of this series, I'll wrap up my discussion about big data quality.

About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.
