Data governance and big data


The data quality and data governance community has a somewhat disconcerting habit of wanting to append the word “quality” to every phrase that contains the word “data.” So it is no surprise that the growing use of the phrase “big data” has been duly followed by claims of the need for “big data quality” and “big data governance.”

Using internal data sources allows you to institute data quality monitoring and measurement within the data production flow. These inspections alert data stewards when a process is detected to be introducing data errors, and at that point the steward can initiate remedial action to address the flaw and ensure the production of high-quality information.
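As a rough illustration of that kind of in-stream inspection, the sketch below applies a couple of simple validation rules to records as they pass through an internal production flow and notifies a data steward when the failure rate crosses a tolerance threshold. The field names, rules, threshold and the alert_steward hook are all illustrative assumptions, not a prescribed design.

```python
# Hypothetical in-stream data quality inspection: validate records inside an
# internal production flow and alert a data steward when the observed error
# rate exceeds a tolerance threshold.
import re

ERROR_RATE_THRESHOLD = 0.02  # assumed tolerance: 2% of records may fail

def validate(record):
    """Return a list of rule violations for a single record."""
    violations = []
    if not record.get("customer_id"):
        violations.append("missing customer_id")
    if not re.match(r"^\d{5}(-\d{4})?$", record.get("zip") or ""):
        violations.append("malformed ZIP code")
    return violations

def alert_steward(message):
    # Placeholder for whatever notification channel the organization uses
    # (email, ticketing, dashboard); shown here as a simple print.
    print(f"[DATA STEWARD ALERT] {message}")

def inspect_batch(records):
    failed = sum(1 for record in records if validate(record))
    error_rate = failed / len(records) if records else 0.0
    if error_rate > ERROR_RATE_THRESHOLD:
        alert_steward(f"error rate {error_rate:.1%} exceeds "
                      f"{ERROR_RATE_THRESHOLD:.0%}; remediation needed")
    return error_rate
```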

However, the characteristics of the data sets and data streams used for big data analytics projects are somewhat different from those of typical data sources created as a byproduct of internal transaction processing or operational systems. These applications often absorb massive data sets from external sources whose creation points are far removed from their various repurposed uses, well beyond the administrative authority of anyone within the company.

That means that the traditional mantras of the data quality experts (such as “validate the data at the source” and “eliminate data defects”) do not apply. There are no places within the data production stream to institute inspection and monitoring, nor does the data steward have any means of influencing the quality of the data production process. Essentially, what you see is what you get.
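In that “what you see is what you get” setting, about the best a consuming organization can do is profile what arrives and judge its fitness for a particular use, rather than trying to fix the data at a source it does not control. The sketch below shows one such profiling pass over an ingested external feed; the field names, sample records and completeness measure are illustrative assumptions only.

```python
# Hypothetical profiling of an externally sourced data set at ingestion:
# since the producer is beyond our control, we can only characterize what
# arrived (e.g., per-field completeness) and assess fitness for purpose.
def profile(records, required_fields):
    populated = {field: 0 for field in required_fields}
    for record in records:
        for field in required_fields:
            if record.get(field) not in (None, ""):
                populated[field] += 1
    total = len(records)
    # Report completeness per field rather than rejecting records outright.
    return {field: (count / total if total else 0.0)
            for field, count in populated.items()}

# Example: a repurposed clickstream feed acquired from an outside provider.
events = [
    {"user_id": "u1", "event": "click", "ts": "2013-05-01T10:00:00Z"},
    {"user_id": "",   "event": "view",  "ts": "2013-05-01T10:00:05Z"},
]
print(profile(events, ["user_id", "event", "ts"]))
```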

So what does it really mean to advocate for “big data governance”? You might say that any scenario in which you cannot exercise control over the production process is not truly amenable to governance, since there is no way to eliminate the root causes of any data failures. On the other hand, perhaps governance, oversight and stewardship need to be redefined for the concepts to be meaningful in the big data context.


About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices, via the expert channel at b-eye-network.com and numerous books, white papers, and web seminars on a variety of data management best practices. His book, Business Intelligence: The Savvy Manager’s Guide (June 2003), has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book, Master Data Management, has been endorsed by data management industry leaders, and his valuable MDM insights can be reviewed at mdmbook.com. David is also the author of The Practitioner’s Guide to Data Quality Improvement. He can be reached at loshin@knowledge-integrity.com.

1 Comment

  1. I would agree. Part of the excitement in big data is its lack of governance, and the skill (value add) organisations can provide is in managing this inherent lack of governance to extract value. However, is this [lack of data governance] something new? Finance, meteorology and engineering have for years had to appraise the quality of multiple data feeds (internal and external to their organisation) to run their business.

