Big data and data enrichment


Last time we explored consumption and usability as an alternative approach to data governance. In that framework, data stewards can measure the quality of the data and alert users to the potential risks of using the results, but are prevented from changing the data. In this post we look at a different approach that centers on enrichment – namely, the ability to make changes to acquired data sets to achieve the desired analytical results.

This governance perspective focuses on massaging the data to meet the business needs. It involves methods that are often referred to as data cleansing or correction within a completely controlled environment. However, when you do not control the creation of the data, “fixing” the data is actually making it inconsistent with its original form. The degree to which the data can be modified differs depending on the tolerance the users have to this type of inconsistency.

Enrichment is a good way to draw a line in the sand. There is a difference between the corrective actions of cleansing and the “polishing” actions of standardization. For example, cleansing might use entity identification, matching and linkage to identify slight variations of an individual’s name and then change the corresponding data instances to a single consistent name. On the other hand, location descriptions can be subjected to address standardization so that street designation errors or mistyped ZIP codes can be corrected.
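The matching-and-linkage idea can be sketched with a minimal example. The names, threshold, and helper functions below are all hypothetical illustrations (using Python’s standard-library `difflib` for string similarity), not a reference to any particular matching product:

```python
from difflib import SequenceMatcher

# Hypothetical records containing slight variants of the same individual's name.
names = ["John Smith", "Jon Smith", "John Smyth", "Mary Jones"]

def similarity(a, b):
    """String similarity ratio in [0, 1]; higher means more alike."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def canonicalize(records, threshold=0.85):
    """Map each name to the first sufficiently similar name already seen,
    standardizing near-duplicate variants to a single canonical form."""
    canonical = []   # names chosen as the "master" form
    mapping = {}     # each input name -> its canonical form
    for name in records:
        for c in canonical:
            if similarity(name, c) >= threshold:
                mapping[name] = c
                break
        else:
            canonical.append(name)
            mapping[name] = name
    return mapping

print(canonicalize(names))
# "Jon Smith" and "John Smyth" both map to "John Smith";
# "Mary Jones" remains its own canonical record.
```

A production matching process would use far richer comparison logic (phonetic keys, multi-attribute scoring, survivorship rules), but the structure is the same: detect near-duplicates, then rewrite the corresponding instances to one agreed form.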

The goal of an enrichment perspective is to determine how user requirements can direct controlled adjustments to data to increase its usability. More to the point: enrichment goes beyond the approach of reporting levels of acceptability for usability, and adds value to massive data sources by making the data usable when the flaw can be deterministically identified.
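“Deterministically identified” is the key phrase: the correction is applied only when the flaw and its fix are unambiguous, and the record is merely flagged otherwise. As a hypothetical illustration (the rule and function name are my own, not from the post), consider five-digit US ZIP codes that lost leading zeros during a numeric export:

```python
import re

def fix_zip(zip_code):
    """Repair a deterministically identifiable flaw: a ZIP code that
    lost its leading zeros (e.g. stored as a number in a spreadsheet).
    If the value is 1-5 digits, pad it to 5; anything else is not
    deterministically fixable, so return None and leave it to reporting."""
    digits = zip_code.strip()
    if re.fullmatch(r"\d{1,5}", digits):
        return digits.zfill(5)
    return None  # flag for review rather than guess at a correction

print(fix_zip("2138"))   # → "02138"
print(fix_zip("ABCDE"))  # → None
```

The design choice mirrors the governance stance above: enrichment changes the data only where the rule leaves no ambiguity; everything else falls back to the report-and-alert approach from the previous post.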

This post and my previous one looked at two alternative philosophies that can be applied to data governance for big data, specifically when the data creation processes lie beyond the boundaries of the organization. In upcoming posts we will look at some other alternatives for big data governance.


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author regarding data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers, and web seminars on a variety of data management best practices. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com . David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.

