Last time we explored consumption and usability as an alternative approach to data governance. In that framework, data stewards can measure the quality of the data and alert users to the potential risks of using the results, but they are not permitted to change the data. In this post we look at a different approach that centers on enrichment: the ability to make changes to acquired data sets to achieve the desired analytical results.
This governance perspective focuses on massaging the data to meet business needs. It relies on methods that, within a completely controlled environment, are often referred to as data cleansing or correction. However, when you do not control the creation of the data, “fixing” the data actually makes it inconsistent with its original form. How far the data can be modified depends on how tolerant users are of this type of inconsistency.
Enrichment is a good way to draw a line in the sand. There is a difference between the corrective actions of cleansing and the “polishing” actions of standardization. For example, cleansing might use entity identification, matching and linkage to find near-duplicate versions of individual names and then change the corresponding data instances to the same name. Standardization, on the other hand, might subject location descriptions to address standardization so that street designation errors or mistyped zip codes can be corrected.
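To make the distinction between cleansing and standardization concrete, here is a minimal Python sketch. It assumes that name matching can be approximated with the standard library's `difflib.SequenceMatcher` and an arbitrary similarity threshold of 0.85 (a stand-in for a real entity-resolution tool), and that street designations can be normalized with a small, hypothetical lookup table. The function names `unify_names` and `standardize_address` are illustrative only.

```python
from difflib import SequenceMatcher

# Hypothetical lookup table: variant street designations -> one standard abbreviation.
STREET_DESIGNATIONS = {
    "street": "ST", "st": "ST", "str": "ST",
    "avenue": "AVE", "ave": "AVE", "av": "AVE",
    "road": "RD", "rd": "RD",
}

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Crude stand-in for entity matching: case-insensitive string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def unify_names(names: list[str], threshold: float = 0.85) -> list[str]:
    """Cleansing: link near-duplicate names and rewrite each one to its cluster's
    first-seen value (assumed here to be the canonical form)."""
    canonical: list[str] = []
    result: list[str] = []
    for name in names:
        match = next((c for c in canonical if similar(name, c, threshold)), None)
        if match is None:
            canonical.append(name)
            match = name
        result.append(match)
    return result

def standardize_address(address: str) -> str:
    """Standardization ("polishing"): map each street designation to one standard
    abbreviation and upper-case the remaining tokens."""
    tokens = address.replace(",", " ").split()
    return " ".join(
        STREET_DESIGNATIONS.get(t.lower().rstrip("."), t.upper()) for t in tokens
    )

print(unify_names(["Jonathan Smith", "Jonathon Smith", "Mary Jones"]))
print(standardize_address("123 main st."))
```

Note that the first function changes values (two different spellings become one name), while the second only rewrites values into a consistent form. In both cases the output no longer matches the data as acquired, which is exactly the inconsistency users must be willing to tolerate.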
The goal of an enrichment perspective is to determine how user requirements can direct controlled adjustments to data that increase its usability. More to the point, enrichment goes beyond merely reporting whether the data meets acceptable levels of usability: it adds value to massive data sources by making the data usable whenever a flaw can be deterministically identified.
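The idea of correcting only deterministically identifiable flaws, and merely reporting everything else, can be sketched as a small rule. The example below is an illustration under a stated assumption, not a prescribed method: it treats a 4-digit zip code as a 5-digit code whose leading zero was dropped upstream (a fix that can be applied deterministically), and it only flags any other malformed value. The original value is kept alongside the enriched one for traceability. `ZipResult` and `enrich_zip` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ZipResult:
    value: str            # value after enrichment (unchanged if no fix was applied)
    original: str         # value as acquired, kept for traceability
    corrected: bool       # True if a deterministic fix was applied
    issue: Optional[str]  # flaw that was only reported, not fixed

def enrich_zip(raw: str) -> ZipResult:
    z = raw.strip()
    # Already valid: 5-digit zip code or ZIP+4.
    if re.fullmatch(r"\d{5}(-\d{4})?", z):
        return ZipResult(z, raw, False, None)
    # Assumed rule: 4 digits means a leading zero was dropped, so restoring it
    # is a deterministic correction.
    if re.fullmatch(r"\d{4}", z):
        return ZipResult("0" + z, raw, True, None)
    # Ambiguous flaw: report it rather than guess at a correction.
    return ZipResult(z, raw, False, "unrecognized zip code format")
```

For instance, `enrich_zip("2134")` yields the corrected value `"02134"`, whereas `enrich_zip("2134X")` leaves the value untouched and records an issue for data consumers to weigh.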
This post and my previous one looked at two alternative philosophies that can be applied to data governance for big data, specifically when the data creation processes lie beyond the boundaries of the organization. In upcoming posts we will look at other alternatives for big data governance.