How big of a deal is big data quality?

Data quality has always been relative and variable: it is relative to a particular business use, and it varies by user. Data of sufficient quality for one business use may be insufficient for another, and data one user considers good may be considered bad by others. This is why we cannot define, in general terms applicable to all data sources, business purposes and users, how much quality data needs. Perfect is also the enemy of good, so every data quality improvement effort eventually reaches a point of diminishing returns, where chasing perfect data has to give way to calling data good enough. These data truths are universal, even within the rapidly expanding big data universe.

Big data’s biggest bang has been its explosion in the volume and variety of external data, where you have less control over data quality and, in some cases, only limited ability to verify it. With external data, bad data may be as good as you can get, or you may simply have to use the data as-is. It’s important to note, though, that big data has also created new use cases. This is especially true of aggregate analytics, where bigger, lower-quality data is sometimes better. In fact, some aggregate analytics do not compensate for, or even check for, data quality issues. This is obviously not advisable for all business applications, but there’s no denying that it works in some cases, such as the five-star ratings on Netflix.
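To make the "bigger, lower-quality data is sometimes better" point concrete, here is a rough back-of-the-envelope sketch in Python. It is an illustration under stated assumptions, not a description of how Netflix or any other service actually works: it assumes the errors in each rating are random, independent and average out to zero, and the function name and all the numbers are made up for the example.

```python
import math

def standard_error(spread, noise, n):
    """Uncertainty of an average rating computed from n ratings.

    spread: natural variation in opinions (standard deviation, in stars)
    noise:  extra random error from low-quality data (standard deviation)
    ASSUMPTION: the noise is independent and zero-mean; a systematic bias
    would NOT shrink as n grows.
    """
    return math.sqrt(spread**2 + noise**2) / math.sqrt(n)

# Hypothetical numbers: a small, carefully cleaned sample
# versus a large sample with twice the variance per rating.
small_clean = standard_error(spread=1.0, noise=0.0, n=100)
large_noisy = standard_error(spread=1.0, noise=1.0, n=10_000)

print(f"small clean sample: +/- {small_clean:.3f} stars")
print(f"large noisy sample: +/- {large_noisy:.3f} stars")
```

Under these assumptions, the large noisy sample pins down the average rating more tightly than the small clean one. The caveat in the code comment matters: if the errors all lean the same way, piling on more data does not help, which is one reason this approach is not advisable for every business application.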

How big of a deal big data quality is has to be decided on a case-by-case basis. Data quality standards for some big data (e.g., social media data) will be lower than the standards for other data (e.g., master data). However, even when lower standards are applied to it, the value of big data often comes from using it to supplement data held to higher data quality standards (e.g., integrating social media data with master data).
