How big of a deal is big data quality?


Data quality has always been relative and variable: it is relative to a particular business use and can vary by user. Data of sufficient quality for one business use may be insufficient for other business uses, and data considered good by one user may be considered bad by others. This is why we cannot define how much quality data needs in general terms applicable to all data sources, business purposes and users. Perfect is also the enemy of good, so all data quality improvement efforts eventually reach a point of diminishing returns, where chasing after perfect data has to give way to calling data good enough. These data truths are universal, even within the rapidly expanding big data universe.

Big data’s biggest bang has been its explosion in the volume and variety of external data, where you have less control over the quality of the data and, in some cases, only limited ability to verify it. With external data, bad data may be as good as you can get, or you may simply have to use data as-is. It’s important to note, though, that big data has also created new use cases. This is especially true of aggregate analytics, where sometimes bigger, lower-quality data is better. In fact, some aggregate analytics do not compensate for, or even check for, data quality issues. This is obviously not advisable for all business applications, but there's no use denying it works in some cases, such as the five-star ratings on Netflix.
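To see why bigger, lower-quality data can sometimes win in aggregate analytics, here is a minimal simulation sketch. It is not how Netflix or any real system works. The true value (4.0), the sample sizes and the noise levels are all made-up assumptions, and so is the helper name `mean_abs_error`. The key assumption is that errors are random and unbiased, so they cancel out when averaged; systematic errors would not.

```python
import random
import statistics


def mean_abs_error(true_value, n, noise_sd, trials, rng):
    """Average |sample mean - true value| over repeated simulated samples.

    Each record is the true value plus zero-mean Gaussian noise
    (an assumption: errors are random, not systematic).
    """
    errors = []
    for _ in range(trials):
        sample = [rng.gauss(true_value, noise_sd) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - true_value))
    return statistics.fmean(errors)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical scenario: a small, carefully cleansed sample versus
# a much larger sample whose individual records are four times noisier.
small_clean = mean_abs_error(4.0, n=20, noise_sd=0.5, trials=100, rng=rng)
large_noisy = mean_abs_error(4.0, n=5000, noise_sd=2.0, trials=100, rng=rng)

print(f"20 clean records:   average error {small_clean:.3f}")
print(f"5000 noisy records: average error {large_noisy:.3f}")
```

Under these assumptions the large, noisy sample estimates the average more closely than the small, clean one, because the error of an average shrinks with the square root of the number of records. The same trick would not rescue a use case that depends on individual records being right, such as billing a specific customer.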

How big of a deal big data quality is has to be decided on a case-by-case basis. Data quality standards for some big data (e.g., social media data) will be lower than the standards for other data (e.g., master data). However, even when big data is held to lower data quality standards, its value often comes from using it to supplement data held to higher data quality standards (e.g., integrating social media data with master data).


About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.
