How to get started now with Hadoop

What do you get when you put a dozen big data pros around a table to discuss the real (not overinflated) effects of big data on the organization? An honest discussion of what big data is and what it isn’t, and of what Hadoop can and cannot accomplish.

Where did this conversation take place? At the Big Data Innovation Summit in Boston, MA, I had the privilege of sitting in on a roundtable discussion with Kirk Dunn, the Chief Operating Officer of Cloudera. Expecting a larger audience, Kirk had prepared a formal presentation. However, seeing a small group of about 12 eager participants, we gathered around a single table, creating an environment more akin to a fireside chat. The attendees ranged from an investment banker looking for the next big winner to a data scientist working with super collider output, and their backgrounds were as varied as the topics. Still, three big themes emerged:

The same but different: Let the data tell you what questions to ask

Dunn sees similarities between this new era of big data and “data management in the 1980’s.” Exciting, entrepreneurial, but actually not all that different. That is, we’ve always had big data and big data problems. However, systems like Hadoop are beginning to change the way we approach these problems and the way we see our data. For example, we’ve become used to starting an analytic project with a very clear and concise question in order to target the specific data we need. That’s out of necessity, of course, because haphazardly interrogating the big data of traditional systems was too costly, took too long, or drove your DBAs to drink before noon. But now that we’ve tamed the big data beast, we are free to unleash the creative side of our exploration with (relative) impunity.

Of course, we need some sort of context for our questions, says Dunn, but ultimately new business insights are coming from more creative, experimental, and iterative approaches that were never available before. So instead of settling for a sample, we can see everything. Instead of limiting the history to 5 years, we can look back 50 years. Instead of sticking to one data source, we can combine datasets that previously would have sent sparks flying onto the server room floor. This new way of thinking lets the data drive what the questions should be, not the other way around. And the insights that follow can be powerful.

This isn’t a zero-sum game

One participant asked how to view the interplay between Hadoop and other data warehouse and appliance projects. Essentially, why aren’t we dropping our expensive storage and moving it all to Hadoop? It’s true that Hadoop can be much cheaper; however, it doesn’t solve every problem well. Dunn explained that he doesn’t see Hadoop completely replacing existing data warehouse systems in the near term. Rather, he suggests letting Hadoop take on what it does best, which frees the data warehouse or appliance to take on new projects for which there had been no room before.

A good use case involves using Hadoop for early exploration and letting the data drive the subsequent exploration, treating it as a scientific endeavor. A playpen. Then, when you need to operationalize the results and put them into real-time action, an appliance vendor like Teradata can move to the fore. This is a beautiful use case that shows how these pieces complement one another: early exploratory analysis on commodity big data implementations, advanced analytics, and then operationalizing the insights for action. So the game becomes one of using each system for the purpose it serves best, not one system displacing the other. With the explosion of what we can do with them, in the end, “all boats will rise,” Dunn says.

Start small, start now!

Hadoop is a powerful tool, but with powerful tools we tend to think there needs to be a carefully thought-out plan, complete with project managers, coffee-stained shirts, and two years of strategy meetings. That’s not the way to think about your early plans with Hadoop, Dunn explained. He says simply: Do it now, and start small. Get yourself in the game. Commandeer a handful of nodes, grab a distribution, and get moving. From the business perspective, he says to pick a key area near and dear to your organization’s heart, or a particularly sticky business pain, and solve it. This is how Hadoop has gained traction and visibility, and how it has turned ordinary business analysts into heroes (my words, not his). Once you gain a foothold, it’s time to explore broader use cases. In Dunn’s experience, looking for just a few key use cases for Hadoop has often uncovered many more.

It was a great discussion, and it truly felt as if we were solving age-old business problems with a fresh set of eyes. So what are you waiting for? Get started now!

