Ready to be disillusioned? Big data and Hadoop enter the next phase


If you work in the software industry, seeing technologies emerge and catch fire is a great spectator sport. Whether it's a programming language, a platform or something like e-commerce, each new wave ripples throughout the industry. Currently, Hadoop is having its time in the sun, and we are all trying to figure out the far-reaching effects from the most notable big data platform.

As I wrote in an earlier post, big data technologies like Hadoop appear to be maturing and spreading very quickly. Why? I think it comes down to the right solution for the right problem. Hadoop met a thorny issue (massive amounts of data, now known as big data) with a relatively inexpensive and uncomplicated solution (open source software running on clusters of commodity hardware).

Recently, InformationWeek reported on Gartner's latest "Hype Cycle for Big Data, 2013." This cycle, the article explains, is a "way of communicating the degree of hyperbole versus productivity associated with emerging technologies."  The full picture of this cycle, which includes a variety of cloud and big data technologies, is available here.

What I like about the Hype Cycle format is that it shows the relationship between the buzz ("This is great!") and the reality of any technology ("Wait, wasn't this supposed to fix everything?"). Currently, the more established big data tools like Hadoop distributions and in-memory database management systems are heading toward the "trough of disillusionment." This rather dour term follows the "peak of inflated expectations," where the buzz is at its highest.

If you think about the news coverage of Hadoop in the past 2-3 years, this makes perfect sense. Issues related to Hadoop, and big data in general, have dominated the news, as organizations begin to try out the tools and see what they can and can't do. This naturally leads to articles about these efforts as well as "what if" articles about the potential of big data.

What's interesting from a careful examination of the big data cycle is that the time between reaching the trough and reaching the "plateau of productivity" is rather short. For Hadoop distributions, it's in the 2-5 year range. For a technology that didn't exist until 2005, that is an impressive rise to maturity.

The 2-5 year range squares with the TDWI "Managing Big Data" report, as well as other surveys I've seen in recent months. Most of these reports have shown that about a quarter of companies have early Hadoop implementations, with a majority of organizations expecting to use these technologies within a few years. These are just measures of adoption, but as more people figure out how to make it work, productivity won't be far behind.

So, what do companies need to do to guide their organizations through – and past – disillusionment? Here are a few things that are emerging.

  • Start focusing on bringing meaning from big data. A hot topic for many early adopters of Hadoop and other big data frameworks is trying to visualize the information in all its bigness. The next phase will be about analyzing that data to a larger degree, performing descriptive or predictive analytics on bigger data sets. At that point, big data can tell you not just where you've been, but where you're going.
  • Investigate the use of data quality or data governance techniques... when necessary. One of the unique things about big data is that data scientists often like to use it in its "raw" state. This is perfectly normal for early big data environments, when aggregation and processing are put to the test. However, as companies start to integrate big data (or the learnings from big data), it will be important to have useful information. For example, if you get massive amounts of social data, this could lead to problems resolving identities across your customer base. At the core, there will be some data quality issues throughout this process.
  • Understand the people behind big data. We've all heard about data scientists in the past few years. This emerging breed is a hot commodity, but as my colleague Steve Putman pointed out, they have a very different viewpoint from other data-focused roles in the organization, like data stewards. It will be important to define roles and start to cultivate the new data scientist role, including the advent of new training programs and majors at colleges around the world.
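To make the identity-resolution problem in the second bullet concrete, here is a minimal sketch in Python. The record fields and the normalization strategy are hypothetical illustrations; production data quality tools use far more sophisticated matching (fuzzy name comparison, address standardization, survivorship rules).

```python
# Hypothetical sketch: collapsing duplicate customer identities that arrive
# from different sources (e.g., CRM vs. social data) under different spellings.

def match_key(record):
    """Build a crude match key by normalizing name and email."""
    name = "".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)

def resolve_identities(records):
    """Group records that share the same normalized match key."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    return list(groups.values())

records = [
    {"name": "Pat Smith", "email": "pat.smith@example.com", "source": "crm"},
    {"name": "pat smith", "email": "Pat.Smith@example.com", "source": "social"},
    {"name": "Lee Jones", "email": "lee@example.com",       "source": "crm"},
]

groups = resolve_identities(records)
# The two "Pat Smith" records collapse into one identity; Lee Jones stays separate.
```

Even this toy version shows why "raw" big data gets harder to integrate over time: the same person can look like two customers until some normalization or governance step says otherwise.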

Look for more on these topics in the future.


About Author

Daniel Teachey

Managing Editor, SAS Technologies

Daniel is a member of the SAS External Communications team, and in his current role, he works closely with global marketing groups to generate content about data management, analytics and cloud computing. Prior to this, he managed marketing efforts for DataFlux, helping the company go from a niche data quality software provider to a world leader in data management solutions.
