In 2016, I worked on a research study exploring the challenges organizations faced when moving Hadoop into production use. Many organizations were exploring how Hadoop could be used. And while client conversations had given us the impression that the open source, high-performance platform ecosystem was often evaluated, there were few publicly reported success stories – aside from the case studies provided by "the usual suspects" (Google, Yahoo, Facebook, LinkedIn – you know, the organizations that essentially developed the Hadoop stack).
We learned a lot through the survey about the anticipated benefits of using Hadoop, drivers of adoption, choice of Hadoop distribution, vendor preferences and satisfaction, corporate experience, and staffing levels. But two findings were of particular interest because they shed light on some of the challenges of actually moving a Hadoop prototype or proof of concept into production.
The first challenge – development
The first challenge of moving Hadoop into production has to do with development – in particular, the need to develop staff skills. We segmented respondents by the number of years their organization had been using Hadoop. Across the board, of all the challenges we posed, the most frequently selected one was acquiring or developing skilled staff. When we looked at the results by years of corporate experience, the emphasis on acquiring or developing skilled staff diminished as experience increased, but it never disappeared. This makes sense: the longer an organization has been working with Hadoop, the more likely its team members are to have accumulated the right skills.
A perception remains that because open source tools are easy to obtain, they lower all the barriers to entry for adopting a technology. Yet for a novice, the Hadoop ecosystem can be a complex collection of components, and it takes time to understand how they fit together before designing and implementing a system that's production-ready. That means establishing a skilled team should be front and center when you're considering implementing Hadoop. This remains a challenge well into the adoption process, and it merits special attention as a prerequisite.
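As a rough illustration of how many moving parts even a trivial workload involves, here is a minimal sketch in PySpark (the paths, database, table and column names are hypothetical) of a small pipeline that still touches HDFS for storage, the cluster's resource manager (typically YARN) for execution, Spark for compute, and the Hive metastore so downstream tools can find the result:

```python
from pyspark.sql import SparkSession

# Spark session with Hive metastore integration so the output
# table is visible to other tools on the cluster.
spark = (SparkSession.builder
         .appName("simple-pipeline")
         .enableHiveSupport()
         .getOrCreate())

# Read raw files from HDFS (path and schema are hypothetical).
events = spark.read.json("hdfs:///data/raw/events/")

# A trivial aggregation, persisted as a Hive table for downstream use.
daily_counts = events.groupBy("event_date").count()
daily_counts.write.mode("overwrite").saveAsTable("analytics.daily_event_counts")
```

Putting even something this small into production also raises questions of cluster sizing, security, scheduling and monitoring – exactly the kind of detail that takes a team time to learn.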
The second challenge – integration
In our research, we asked about different facets of integrating Hadoop into the enterprise environment, with respondents rating each on a scale from very challenging to not challenging at all. At least 50% of those answering the integration questions rated these functions as challenging or very challenging. Respondents also reported difficulty integrating the existing enterprise data architecture with Hadoop and with other emerging data management environments – such as columnar and in-memory databases and NoSQL systems – as well as with necessary data management tasks such as ETL and data validation. Taken together, this suggests that the practical aspects of integration and interoperability between Hadoop systems and existing infrastructure still pose challenges for many organizations; for example, many struggle to understand how all the various components fit together.
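To give a sense of what one such integration point can look like in practice, here is a hedged sketch (connection details, table and column names are all hypothetical, and the appropriate JDBC driver is assumed to be on the cluster's classpath) of a basic ETL-and-validation step that pulls a table from an existing relational warehouse into Hadoop using Spark's JDBC reader:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("warehouse-to-hadoop").getOrCreate()

# Extract: read a table from the existing enterprise database over JDBC
# (URL, credentials and table name are hypothetical placeholders).
customers = (spark.read.format("jdbc")
             .option("url", "jdbc:oracle:thin:@//dw-host:1521/DW")
             .option("dbtable", "CRM.CUSTOMERS")
             .option("user", "etl_user")
             .option("password", "********")
             .load())

# Validate: reject rows with missing keys before they land in the data lake.
valid = customers.filter(F.col("CUSTOMER_ID").isNotNull())
rejected_count = customers.count() - valid.count()
print(f"Rejected {rejected_count} rows with null CUSTOMER_ID")

# Load: write the validated data to HDFS in a columnar format.
valid.write.mode("overwrite").parquet("hdfs:///data/curated/customers/")
```

Even this small sketch hints at the interoperability questions respondents raised: drivers and credentials for the source system, a policy for rejected rows, and a storage format the rest of the platform can read.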
I infer from this that most organizations with an existing enterprise infrastructure will face roadblocks when introducing innovative technologies like Hadoop that reflect a very different operational paradigm. Proper training and planning, from both a systems and an information management perspective, will help you balance the evaluation and testing of new technologies like Hadoop with practical production implementation.
In my next post, we'll look at some ideas that may help soften the challenges of migrating from a conventional environment to one based on Hadoop.
Download a free paper – Bringing the Power of SAS to Hadoop