Can you govern someone else’s data?


In my last post, we started to look at some of the issues with the concept of “big data governance,” especially when a large part of governance is intended to prevent the introduction of errors into data sets. Many big data analytics applications focus on the intake of numerous, varied data sets acquired from external sources. By the time the data has been brought into the organization, it is too late to have any impact on the data creation process, so preventing errors is out the window.

In fact, the problem is much worse than that, for two reasons. First, in many cases the data sets being used are not only created by parties outside the administrative domain; the internal users may also have no idea where the data came from at all. For example, public US federal transparency data sets published at www.data.gov are created solely for the purpose of posting the data to the web site. The values populating those data sets may have come from numerous internal applications designed and implemented to support specific business functions, yet the resulting data sets are effectively created without any of the original context.

That means that the actual details of the originating system are completely lost, often including the technical/structural metadata (such as data types and lengths) as well as the more important business metadata such as data element definitions and reference data domains. The user of the data is compelled to manufacture the semantics based on intuition and context, but not much else.
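To make the point about lost structural metadata concrete, here is a minimal sketch of what "manufacturing" that metadata from the data alone might look like. Everything in it is an assumption for illustration: the function names `guess_type` and `profile_columns`, the sample columns `agency_cd`, `amount` and `status`, and the crude type rules are hypothetical and do not come from any real data.gov data set.

```python
import csv
import io


def guess_type(value):
    """Guess a type label for one raw string value (a crude heuristic)."""
    if value == "":
        return "empty"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "text"


def profile_columns(csv_text):
    """Infer the observed type labels and maximum length of each CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    profile = {}
    for row in reader:
        for name, value in row.items():
            if name is None:
                continue  # ignore surplus fields that have no header
            value = value or ""  # short rows yield None for missing fields
            stats = profile.setdefault(name, {"max_length": 0, "types": set()})
            stats["max_length"] = max(stats["max_length"], len(value))
            stats["types"].add(guess_type(value))
    return profile


# Hypothetical extract with no accompanying data dictionary.
SAMPLE = "agency_cd,amount,status\n012,1500.50,A\n7,200,\n"

if __name__ == "__main__":
    for column, stats in profile_columns(SAMPLE).items():
        print(column, stats["max_length"], sorted(stats["types"]))
```

Note what the profile cannot tell you. The value `012` is labeled an integer, although the leading zero suggests it may really be a code, and nothing in the data says what status `A` means or whether an empty status is valid. The structural metadata can be approximated, while the business metadata remains guesswork.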

The second reason is that attempting to change the data means that you are potentially introducing biases into the data sets you are about to subject to analysis. This means that your hands are tied when it comes to performing any activity that potentially changes the meaning of the data, such as “cleansing” the data or eliminating duplicate records.

So you cannot control the creation of the data and are limited in correcting the data. But aren’t those the main objectives of data governance? Seems like we have a little bit of a conflict here…


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers, and web seminars. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com. David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.
