Big data integration - A good starting point for data governance?


In the UK, technology trends move a little slower than they do for our US counterparts. It was about 5 years ago that I first met a data leader at a conference on this side of the pond who was actively engaged in large-scale big data projects.

This wasn’t a presenter or big-name draw to the event. My "Big Data Scoop" was uncovered during a break-out coffee and danish session – a fertile ground for me to uncover new stories for Data Quality Pro.

The data leader in question worked for a telecom firm in the UK (not BT). Their Hadoop project was intriguing, and he seemed quite puzzled as to why he was embarking on a big data project. He confided that "it just seemed the right thing to do because all the other telcos were doing it."

We chatted further and I discovered that their approach could at best be described as Agile, but could just as easily be classed as haphazard and disjointed.

They were gluing huge data sets of call data records, social media data, demographic data and any other consumer data they could find, in-house or externally, "to see what came out of it."

To me, it was exactly the kind of project I revelled in. The chance to match art and science, to break conventions and uncover some hidden golden nuggets of insight from the disparate spaghetti junction of data sets.

But I could tell that the reality for the data leader was very different.

He came from an architectural background where order, strategy, structure and, dare I say it, governance were the order of the day. Judging by his LinkedIn profile, he appears to have given the project 12 months and moved on.

Speaking to other seasoned big data practitioners, I found that this story typified the early days of big data projects: lots of lofty goals that never really materialized into tangible gains.

But things have clearly changed.

Fast forward to today, and big data technology is far more mature. Data integration capabilities are equally sophisticated and have moved on from the limitations of ETL to cope with the volume, variety and velocity demands of big data.

The 3Vs of big data have been common parlance in recent years, but Gartner analyst Alex Borek told an online audience in a recent Virtual Summit presentation that the discussion has shifted away from the tech and toward how to make tangible and sustainable profit from big data. Alex talks of the shift in focus to the 3Ms: "Make Me Money." This is a sign that the C-level is getting on board with big data but wants to see results.

Still, there are data integration challenges with linking the new world of big data into our traditional data management landscape:

  • Rapidly changing, externally sourced data can lead to difficulty in managing the lineage and quality of data over time.
  • Structured and unstructured data sets need to be understood, managed and integrated.
  • Big data systems are often a moving target, created in Agile cycles as opposed to the slow and steady developments witnessed in traditional databases.
  • The focus of big data is often to manage data flows and rapidly streaming data while connecting it to static, slow-moving data found in legacy databases, so that a complete narrative of the data can be found.

All of these challenges raise the question: can data governance be applied to big data?

Jim Harris answered this same question recently but it’s interesting that the question even needs to be asked. It’s almost as though many people view big data as out of bounds for data governance initiatives. The reality is that big data initiatives need the same governance focus as any traditional data landscape, if not more.

You only need to look at the typical spreadsheet landscape of banking and insurance firms to observe what happens when unstructured and external data sources proliferate. One insurance firm that we featured on Data Quality Pro discovered 2,500+ spreadsheets that had no lineage recorded, no steward assigned and no quality measures monitored. A large proportion of the data also came from external sources.

Big data integration can be a perfect starting point for your data governance initiative. Create a ring-fence around your big data stores and only accept data across the data integration gateways that has a clearly defined lineage, stewardship and quality management process. It's OK to experiment and innovate within your big data sandboxes. But as soon as you start to integrate with conventional data sets that underpin your business, you had better make sure solid governance is in place.
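To make the ring-fence idea a little more concrete, here is a minimal Python sketch of what a gateway check might look like. It is only an illustration under my own assumptions: the `DatasetMetadata` record, its field names and the `admit_to_gateway` function are hypothetical and do not come from any particular data integration tool.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DatasetMetadata:
    """Hypothetical governance record that travels with a data set."""
    name: str
    lineage: Optional[str] = None          # where the data came from
    steward: Optional[str] = None          # who is accountable for it
    quality_checks: Tuple[str, ...] = ()   # names of monitored quality rules


def governance_gaps(meta: DatasetMetadata) -> List[str]:
    """Return the governance attributes that are missing for a data set."""
    gaps = []
    if not meta.lineage:
        gaps.append("lineage")
    if not meta.steward:
        gaps.append("steward")
    if not meta.quality_checks:
        gaps.append("quality_checks")
    return gaps


def admit_to_gateway(meta: DatasetMetadata) -> bool:
    """Admit a data set across the gateway only if nothing is missing."""
    return not governance_gaps(meta)


# A sandbox experiment with no governance metadata stays inside the fence.
sandbox = DatasetMetadata(name="social_media_scrape")

# A governed data set is allowed through to conventional systems.
governed = DatasetMetadata(
    name="call_data_records",
    lineage="billing platform nightly export",
    steward="network data team",
    quality_checks=("completeness", "duplicate_check"),
)

print(admit_to_gateway(sandbox), governance_gaps(sandbox))
print(admit_to_gateway(governed), governance_gaps(governed))
```

The point of the sketch is the design choice rather than the code itself: the sandbox is left free to experiment, and the check only bites at the moment data crosses into the systems that underpin the business.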


SAS is a leader in Gartner Magic Quadrant for data integration tools for the fifth consecutive year.


About Author

Dylan Jones

Founder, Data Quality Pro and Data Migration Pro

Dylan Jones is the founder of Data Quality Pro and Data Migration Pro, popular online communities that provide a range of practical resources and support to their respective professions. Dylan has an extensive information management background and is a prolific publisher of expert articles and tutorials on all manner of data related initiatives.


3 Comments

  1. Rasananda Behera on

    Yes, that's absolutely right.

    There are data integration challenges with linking the new world of big data into our traditional data management landscape

  2. I completely agree. As the (Big) Data Landscape continues to expand and be linked to more traditional data sets, the need to govern and understand this landscape increases, especially if the analysis being performed is going to be used for decisions that have to be audited and reproduced.
