Is data quality a component of data preparation? Or vice versa?


Critical business applications depend on the enterprise creating and maintaining high-quality data. So whenever new data is received, especially from a new source, it's ideal when that source can provide data free of defects and other data quality issues.

The recent rise in self-service data preparation options has improved the quality of data from some sources. That's especially true of self-service portals that let customers maintain, for example, their current postal addresses, email addresses, phone numbers and preferred contact method. Of course, such options are not available for every source. And even where they are, they don't eliminate the need to build and maintain automated data quality processes, such as those incorporated into enterprise data warehouses and master data management hubs.

Analytics is a great example of when data needs additional preparation before it can be put to a specific use or, more precisely, reused for purposes other than the one that framed its initial preparation.

Consider an enterprise data warehouse that integrates multiple sources into a single data model optimized for querying and standard reporting. To achieve this, a standard set of transformations and data quality rules was applied to the source data. Ad hoc analysis or open-ended discovery of the same data, however, often has to return to the sources to perform customized data preparation. In this way, analytics exemplifies the fitness-for-purpose definition of data quality. It also illustrates what David Loshin has called the problem-solver approach to data preparation, in which the analyst decides which data from which sources will be included in the analysis, and which transformations and data quality rules need to be applied to that data.

All of this raises the question: Is data quality a component of data preparation? Or vice versa? Since preparing data often involves more than just verifying its quality, it could be argued that data quality is a component of data preparation. Conversely, since high-quality data makes everything done with it better, it could be argued that data preparation is a component of data quality.

As someone who has spent most of his career in data quality, I obviously favor the latter view. But I will happily accept the former view as long as data quality is always taken into consideration during data preparation.



About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.
