Can big data be governed?


Yes. For those keeping score at home, this is my second post in a row starting with a one-word answer to its questioning title. In this case, it’s a question that’s asked a lot and for good reason since big data raises big questions for all data-related disciplines.

In general, data governance provides the guiding principles and context-specific policies that frame the processes and procedures of data management. One challenge to governing big data, as Phil Simon recently blogged, is that much of it is external to the enterprise. As such, the organization can exert very little, if any, control over many sources of big data, which complicates, if not negates, aspects such as data ownership and data stewardship.

Another challenge, as Faramarz Abedini recently blogged, is that the unstructured nature of most big data sources makes the content definition (e.g., business terminology) and content standards (e.g., data quality rules) aspects of a data governance framework difficult to apply to big data in the same way they are applied to traditional data, such as the master data describing customers and the transaction data describing their purchases of the organization’s products and services. Principles, policies, and procedures proven effective for governing this traditional data might not be applicable, or fully enforceable, with big data, especially when organizations attempt to integrate big data into existing, well-governed applications.

This doesn’t mean that big data cannot be governed. It just means big data often cannot be governed by the same rules that apply to other data, even within the same domain or subject area. For example, a customer’s social media data cannot be governed by the same rules as a customer’s master and transaction data. Governing external sources of big data is more analogous to a treaty than a law, and data governance principles and policies, as well as data management processes and procedures, must accept this reality. Governing big data also requires accepting that data quality standards for big data will vary and will very often be lower than the standards for other data.

Without question, big data can be governed. But no one should expect that governing it will be easy.


About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.


  1. Let’s recall the paradigm that data quality is defined as "fitness for use". Yes, it is difficult to govern big (unstructured) data at the time of collection or while it is stored in the big data lake, as we probably don’t know the usage context yet. In my view, governance should instead apply to the big data refinery process, to make sure the information derived from the data lake is trustworthy. When sentiment is extracted from social media data, one needs to make sure there is appropriate social media data to base decisions on. With this in mind, the governance process simply shifts within the data lifecycle to the usage stage, but I believe it still relies on very similar governance techniques. Data is more structured at this stage, so governance technology and principles similar to those used for traditional data can be applied.

