Yes. For those keeping score at home, this is my second post in a row starting with a one-word answer to its questioning title. In this case, it’s a question that’s asked a lot, and for good reason, since big data raises big questions for all data-related disciplines.
In general, data governance provides the guiding principles and context-specific policies that frame the processes and procedures of data management. One challenge to governing big data, as Phil Simon recently blogged, is that much of it is external to the enterprise. As such, the organization can exert very little, if any, control over many sources of big data, which complicates, if not negates, aspects such as data ownership and data stewardship.
Another challenge, as Faramarz Abedini recently blogged, is that most big data sources are unstructured. This makes the content definition (e.g., business terminology) and content standards (e.g., data quality rules) aspects of a data governance framework difficult to apply to big data in the same way they are applied to traditional data, such as the master data describing customers and the transaction data describing their purchases of the organization’s products and services. Principles, policies, and procedures proven effective for governing traditional data might not be applicable, or fully enforceable, with big data, especially when organizations attempt to integrate big data into existing, well-governed applications.
This doesn’t mean that big data cannot be governed. It just means big data often cannot be governed by the same rules that apply to other data, even within the same domain or subject area. For example, a customer’s social media data cannot be governed by the same rules as a customer’s master and transaction data. Governing external sources of big data is more analogous to a treaty than a law, and data governance principles and policies, as well as data management processes and procedures, must accept this reality. Governing big data also requires accepting that data quality standards for big data will vary and will very often be lower than the standards for other data.
Without question, big data can be governed. But no one should expect that governing big data will be easy.