Can data governance stop data lakes from becoming data swamps?


At a time when most of the buzz about big data is focused on the technology and the exciting opportunities that it creates, I find that little attention is being given to the way that big data is impacting the culture of organizations. In particular, I'd like to see more discussions on how organizations should adapt to prevent their data lakes from becoming data swamps!

The adoption of big data technologies has the potential to radically change the way organizations make decisions and how the business and IT collaborate in managing data assets to create useful insights.

New data sources and new business scenarios are calling for an even stronger data governance framework.

Traditionally, organizations use data to make decisions by first defining goals and then gathering the data needed to achieve those goals. For instance, I need to know what customers are buying on my website and how, so I gather navigation logs and combine them with data from the ERP to report on pre-defined metrics.

Figure: Traditional data processing
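As a rough illustration of the traditional approach described above, the sketch below starts from a metric defined up front (the share of product views that end in an order) and only then pulls together the data needed to compute it. The sample rows, field layouts, and the function name `conversion_by_product` are all hypothetical and chosen for illustration; they do not come from any real ERP or web analytics product.

```python
from collections import defaultdict

# Hypothetical navigation-log rows: (session_id, product_id)
navigation_logs = [
    ("s1", "P100"),
    ("s2", "P100"),
    ("s3", "P200"),
    ("s4", "P100"),
]

# Hypothetical ERP order rows: (session_id, product_id, amount)
erp_orders = [
    ("s1", "P100", 25.0),
    ("s3", "P200", 40.0),
]


def conversion_by_product(logs, orders):
    """Pre-defined metric: share of product views that ended in an order."""
    views = defaultdict(int)
    purchases = defaultdict(int)
    for _session, product in logs:
        views[product] += 1
    for _session, product, _amount in orders:
        purchases[product] += 1
    # Products that were viewed but never ordered get a ratio of 0.
    return {product: purchases[product] / views[product] for product in views}


report = conversion_by_product(navigation_logs, erp_orders)
for product, ratio in sorted(report.items()):
    print(f"{product}: {ratio:.0%} of views converted")
```

The point of the sketch is the ordering: the question is fixed first, and the data is gathered and shaped only to answer it.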

Many big data initiatives today are based on the promise that new insight will magically emerge from aggregating and analyzing vast amounts of internal and external data, with the questions or metrics generally unknown at the start of the process.

Figure: Big data processing

But surely this dramatic shift in the decision-making paradigm will not deliver the expected results unless the business first defines some of the key questions it wants to answer to improve the bottom line. The risk is to drown in the big data wave with no shore in sight!

Big data is changing the way the business and IT collaborate to manage data assets.

In the traditional approach, business users determine what questions to ask, and IT prepares the infrastructure and the data to answer those questions. This process is usually iterative: business users are typically unable to explain to IT precisely what they need, so the first attempt rarely leads to the final solution.

In the big data approach, IT delivers a platform that consolidates data sources of interest. Then the business users use the platform to explore data for ideas and questions to ask. This creative discovery also lets the business users assess whether the data set they are using is fit for their specific purpose.

It is worth noting that both approaches are useful and likely to be used in parallel for a very long time. Big data platforms are not going to replace traditional enterprise data warehouses (EDWs). Instead, the two will complement each other: big data platforms bring new capabilities and business opportunities, and they relieve pressure on the expensive EDW.

Figure: How big data is changing IT and business collaboration

In a nutshell, big data brings new data sources (internal or external) into play, to be used as part of new business scenarios, providing organizations with new ways to leverage data as a strategic asset to gain a competitive advantage.

Figure: Big data calling for new data governance

Those new data sources and new use cases are driving new requirements for data governance bodies. The objectives of data governance do not really change but big data requires us to seriously consider a few points:

  • Linking to new data sources, especially external sources and unstructured data, puts that data out of reach of typical data governance programs, which can no longer enforce the standards and data quality controls usually applied at the source.
  • Bringing vast amounts of data into a data lake raises questions around privacy and regulations. Do we have the right to store this data? For how long? Who should have access to it? And how are we allowed to use it?
  • Trying to enforce the same level of quality for big data might wipe out the benefits expected from big data initiatives, namely the speed of data integration and the ability to handle data streams in real time. There is clearly a balance to be found between the data quality imperative and the benefits of big data velocity.
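One way to make the questions above concrete is to attach a small governance record to every data set that lands in the lake and check it routinely. The sketch below is only an illustration under stated assumptions: the `DatasetPolicy` class, its fields, the 90-day retention period, and the role names are all hypothetical and are not taken from any specific governance tool or regulation.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DatasetPolicy:
    """Hypothetical governance record for one data set in the lake."""
    name: str
    source: str                      # "internal" or "external"
    ingested_on: date
    retention_days: int              # how long we are allowed to keep it
    allowed_roles: set = field(default_factory=set)  # who may access it
    quality_checked: bool = False    # has any quality control been applied?

    def is_expired(self, today: date) -> bool:
        return today > self.ingested_on + timedelta(days=self.retention_days)

    def can_access(self, role: str) -> bool:
        return role in self.allowed_roles


def governance_issues(policy: DatasetPolicy, today: date) -> list:
    """Return the open governance questions for a data set."""
    issues = []
    if policy.is_expired(today):
        issues.append("retention period exceeded")
    if not policy.allowed_roles:
        issues.append("no access roles defined")
    if policy.source == "external" and not policy.quality_checked:
        issues.append("external source without quality check")
    return issues


clickstream = DatasetPolicy(
    name="partner_clickstream",
    source="external",
    ingested_on=date(2024, 1, 1),
    retention_days=90,
    allowed_roles={"analyst"},
)
issues = governance_issues(clickstream, today=date(2024, 6, 1))
print(issues)
```

Note that the quality flag is a simple yes or no here; in line with the third point above, a real program would likely accept lighter checks for fast-moving streams rather than blocking ingestion until full quality controls have run.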

Ignore those questions when embarking on a big data journey and you might find yourself in the middle of a big data swamp!



About Author

Olivier Penel

Advisory Business Solutions Manager

With a long-lasting (and quite obsessive) passion for data, Olivier Penel strives to help organizations make the most of data, comply with data-driven regulations, fuel innovation with analytics, and create value from their most valuable asset: data. As a global leader at SAS for everything data management and privacy-related, Penel enjoys providing strategic guidance, and sharing best practices and experiences in using data governance and analytics as a catalyst for digital transformation.
