So you’ve been monitoring Hadoop or are already on the journey with it -- and you’re wondering: where are we on the adoption curve compared to the market in general?
Based on my interactions with numerous companies, I want to share what I think that curve looks like so that you can orient your organization and decide if you’re leading the way or lagging behind. Neither is inherently bad, but you do need to be conscious of where you are and why.
Stage 1: Monitoring
The monitoring stage is where you generally find two types of organizations:
- The first type are those who don’t believe they need to deal with big data. These companies might be struggling with other issues and feel that big data is just too far into the future -- or they believe they’ll never have the need to deal with big data in their market segment.
- The second type are those who believe there are other technologies more mature than Hadoop that can meet their needs. Sometimes a few people in such organizations have played with Hadoop and decided it doesn’t meet their needs or lacks specific capabilities. From that point on, it’s a case of keeping an eye on the market and how it’s developing.
The big questions for these organizations are: What would it take for them to look at Hadoop? Is that well defined and agreed upon? Are they ready to move if there’s a sudden need to spring into action?
Stage 2: Investigating
In this stage, organizations generally have a small group playing with Hadoop technology (when I say Hadoop technology, I mean the ecosystem -- not just MapReduce, etc.). The group experimenting with Hadoop is usually in IT, or is an IT-savvy group in a business unit.
In these organizations, there’s no real Hadoop mandate yet, but the investigations are designed to determine what Hadoop might be useful for or how to begin addressing big data challenges on the horizon.
In this case, companies are either using the free Apache Hadoop distribution download or one of the free downloads from an established commercial Hadoop distribution vendor such as Cloudera, Hortonworks or MapR.
All of the effort at this stage is useful. It’s not a commitment to Hadoop, but it is building the skills and knowledge necessary to consider Hadoop’s IT/business implications -- and also to be ready to quickly move to the next stage should the time come.
The big questions for these organizations are: What would it take for them to move beyond trials and into using Hadoop? Is that well-defined and agreed upon? Are they looking at more than just storing data? Are they also looking at how to utilize that stored data through visualization and analytics?
Stage 3: Implementing
In this stage, organizations have deployed a Hadoop cluster and have at least one project running on it. They’ve largely moved on from the Apache distribution because they needed additional capabilities offered by one of the commercial vendors, such as support, backup, management tools, and other SQL data stores.
The companies in this stage generally have up to three Hadoop projects, either in production or close to it. Initial projects often focus on new business challenges, or on using data that was not previously accessible. Where possible, existing end-user toolsets are used to minimize the need for training and to deliver quick ROI on those early projects.
This phase is the riskiest one. If ROI is not delivered, the value of Hadoop can be undermined. At the same time, skills are at a premium and experimentation is likely still happening as organizations build out production projects. At this stage you could say the bubble is at risk of bursting for some organizations. Few of the organizations I’m working with are past this phase yet.
Stage 4: Established
Established organizations have a number of projects in production with plans for many more and a large Hadoop cluster. Generally speaking, these companies will also be working on a broad enterprise architecture where Hadoop is taking on an increasingly important role in their five-year vision.
They’ll be working with a commercial Hadoop distribution to influence the development of features and functions they require to support their future architecture, and will be growing the size of their clusters rapidly. This is the group hoping for large returns from their investment both in savings, and also perhaps in disruptive market changes they’re trying to enable through the use of Hadoop and big data.
These advanced organizations will help drive requirements for the next generation of Hadoop -- all aligned to the business issues they need solved. If an industry is not well represented in this group, it’s possible that industry’s needs will take a back seat to those of better-represented industries.
There are already many companies publicly moving forward with Hadoop, such as Yahoo, Home Depot, Rogers, Schlumberger, Barclays Bank, Symantec, Verizon, British Telecom, ING, Port of Rotterdam, British Airways, Truecar, EDF, Sanoma, Octo Technology, HSBC, Orange France, Shazam and CERN, to name but a few.
You can see these companies, and more, on the websites of the commercial distributions such as Cloudera, Hortonworks and MapR or by reviewing the various recordings from Strata or Hadoop conferences. If you’re exploring Hadoop, it may be that now is the time to put your foot down and accelerate a little more.
These are some of my initial impressions. If you’re using Hadoop, do you fit into one of these buckets or is there another that I might be missing? Would you name the buckets differently? Any other characteristics you would add to any of the groups?