Traditional data governance is all about establishing a boundary around a specific data domain. In practice, this means establishing the authority to define key business terms within that domain; establishing business-driven decision-making processes for changing that terminology and the rules that apply to it; defining content standards (e.g., metadata and data quality rules); and outlining an ongoing process for measuring and monitoring.
The recent data explosion underscores how critical data governance is to organizations' success. In fact, the need for a mature data governance framework is more widely accepted than ever. But despite this acknowledgement, established methods for governing data have not been challenged or altered.
Think about the role of a data scientist. Data scientists are tasked with exploring new data sources and trying to glean nuggets of gold out of mountains of sludge. In the process, they use any shortcut they can, often bypassing data governance processes. In the big data space, exploration is the norm. As a result, even a well-established business definition may be challenged at the very outset.
Consider a casino and gaming organization. The most valuable customer in this business might be defined as the player who spends the most money at the gaming table. But a quick analysis by a data scientist may reveal flaws in this definition. The data might show that the most valuable customer is actually the one who pays the highest rate for the hotel room, gambles a little (but also attends shows), spends a day at the spa and orders room service three times daily. In this scenario, there must be enough flexibility in the data governance program to quickly review and approve the new definition.
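To make the contrast concrete, the two definitions can be compared side by side. The sketch below is purely illustrative: the guest records, field names and dollar amounts are made up, and the candidate definition is assumed to mean total spend across all revenue lines, which is one possible reading that a governance body would still need to review and approve.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    # All fields and amounts are hypothetical, for illustration only.
    name: str
    gaming: float        # spend at the gaming tables
    hotel: float         # room charges
    shows: float         # show tickets
    spa: float           # spa services
    room_service: float  # room service orders


def value_by_gaming(guest: Guest) -> float:
    """Traditional definition: value equals gaming spend only."""
    return guest.gaming


def value_by_total_spend(guest: Guest) -> float:
    """Candidate definition (assumed): value equals spend across every revenue line."""
    return guest.gaming + guest.hotel + guest.shows + guest.spa + guest.room_service


def most_valuable(guests, definition):
    """Return the guest ranked highest under the given definition."""
    return max(guests, key=definition)


guests = [
    Guest("high_roller", gaming=5000, hotel=0, shows=0, spa=0, room_service=0),
    Guest("resort_guest", gaming=500, hotel=3000, shows=600, spa=800, room_service=900),
]

top_traditional = most_valuable(guests, value_by_gaming)
top_candidate = most_valuable(guests, value_by_total_spend)
print(top_traditional.name, top_candidate.name)
```

With these made-up figures, the two definitions pick different guests, which is exactly the kind of finding a flexible governance program would need to review quickly.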
The unstructured nature of most big data sources can make the content definition aspects of the data governance framework feel cumbersome to apply. In turn, defining content standards such as data quality rules and metadata may seem very challenging.
One new twist introduced with the advent of data science and big data concerns algorithms and models. Data scientists by nature must try new algorithms and models on data to derive meaning. Usually, if not always, these algorithms or models do not go through a rigorous governance process. But the results could be disastrous if business decisions were based on analyses produced by algorithms and models that had not been approved by data stewards or vetted by data governance committees.
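One way to picture such vetting is a simple approval gate between exploration and decision making. The following is a minimal sketch, not a prescribed design: `ModelRegistry`, `UnapprovedModelError`, `results_for_decision` and the model names are all hypothetical, and it assumes approval can be recorded as a plain list of model names maintained by the governance committee.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Hypothetical record of models a governance committee has approved."""
    approved: set = field(default_factory=set)

    def approve(self, model_name: str) -> None:
        self.approved.add(model_name)

    def is_approved(self, model_name: str) -> bool:
        return model_name in self.approved


class UnapprovedModelError(Exception):
    """Raised when an unvetted model is used to feed a business decision."""


def results_for_decision(registry, model_name, model_fn, data):
    """Run model_fn for decision making only if the model has been approved."""
    if not registry.is_approved(model_name):
        raise UnapprovedModelError(f"{model_name} has not been vetted")
    return model_fn(data)


def average_spend(values):
    """Stand-in for an approved model: the mean of the input values."""
    return sum(values) / len(values)


registry = ModelRegistry()
registry.approve("average_spend_v1")

# An approved model may feed a business decision.
approved_result = results_for_decision(
    registry, "average_spend_v1", average_spend, [100, 200, 300]
)

# An experimental model is blocked until the committee approves it.
try:
    results_for_decision(registry, "experimental_v2", max, [100, 200, 300])
    blocked = False
except UnapprovedModelError:
    blocked = True
```

The gate applies only at the point where results feed a decision, so data scientists remain free to experiment with unapproved models during exploration.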
Clearly, it is important to vet all algorithms and models used on big data sets within a governance framework. In this era of big data, it's more important than ever for your data governance framework to be well established. But it needs to be agile enough to accommodate new discoveries made while analyzing that big mountain of data.