Is effective data governance possible in an era of big data?

"A man's got to know his limitations."
—Clint Eastwood as "Dirty" Harry Callahan, Magnum Force

Let's go back in time to 2005, well before the arrival of what we now call Big Data.

A decade ago, YouTube had barely launched. Facebook was still largely limited to college students. No one talked about cloud computing.

Seems like a long time ago, right?

Today, technology moves in dog years, but on steroids. A single year can bring tremendous changes in business models, competition, and external environments, never mind a decade. (Don't believe me? Just ask BlackBerry's management.) I've said it many times: the long term has never been shorter.

This is true in all aspects of business, and data governance is certainly no exception. At the heart of this post lies a simple question: Is "traditional" data governance possible in an era of big data?

Data governance was never easy

Before examining the viability of "complete" data governance today, let's take another brief but more focused stroll down memory lane. As a 2013 Rand survey showed, many large organizations have never even developed a formal data governance plan.

Why is this the case? In short, data governance is not a "solution" that can simply be crammed into an organization. It's rife with cultural and management issues. As such, it is hard. Really hard. As Kimberly Nevala wrote on this site:

'One size doesn’t fit all' is a well-known refrain in the data governance community. Typically, this well-worn but evergreen adage is applied when discussing organizational structures. Two companies in the same industry, of like size and means and with similar objectives can take drastically different approaches for instantiating data governance within their organizations. Culture, organizational maturity and incumbent practices all influence the shape of the program to come.

Think about this for a moment. Many, if not most, organizations fared poorly when attempting to govern Small Data (read: structured information internal to the enterprise). You know, relational database-friendly stuff that they owned, generated, hosted, and controlled.

New data sources, new challenges

Against this discouraging backdrop, consider the following questions: How does an organization effectively "govern" data that is largely external to the enterprise? (That is, data over which it exerts little, if any, control.) And what if that data is largely unstructured and subject to all sorts of biases?

Here I am primarily referring to three types of data, some of which overlap:

Social data
Linked data
Open data (For more on this, see my interview with Joel Gurin on his excellent book Open Data Now.)

For instance, let's say that your organization relies upon data related to airline performance and the causes of flight delays. You could try to collect this data independently, but that cost and effort are hard to justify when the data is available right now for free. But is it accurate? How can you verify that? And what about ensuring that any errors are permanently resolved? At the risk of stating the obvious, certain parties (read: airlines) may not exactly be forthcoming with data related to delays and their causes.

Simon Says: Is it possible to govern the ungovernable?

I'm not discounting the importance of data governance centers of excellence and boards. To be sure, they remain valuable. After all, it's never been more important to govern what you can.

Those last three words are essential. In an era of Big Data, organizations cannot govern all data. With some data sources, they shouldn't even try. Put differently, they need to heed Dirty Harry's dictum.

Feedback

What say you?
