Challenges of moving Hadoop into production


In 2016, I worked on a research study exploring the challenges organizations faced when moving Hadoop into production use. Many organizations were exploring how Hadoop could be used. And while we had gotten the impression from client conversations that the open source, high-performance platform ecosystem was often evaluated, there were few publicly reported success stories – aside, that is, from the case studies provided by “the usual suspects” (Google, Yahoo, Facebook, LinkedIn – you know, the organizations that basically developed the Hadoop stack).

We learned a lot through the survey about the anticipated benefits of using Hadoop, drivers of adoption, choice of Hadoop distribution, vendor preferences and satisfaction, and corporate experience and staffing levels. But two findings were of particular interest because they shed light on the challenges of actually moving a Hadoop prototype or proof of concept into production.

The first challenge – development

The first challenge of moving Hadoop into production has to do with development – in particular, the need to develop staff skills. We divided the pool of respondents according to the number of years of corporate experience their organization had with Hadoop. Across the board, of all the challenges we posed, the most frequently selected one was acquiring or developing skilled staff. When we segmented the results by years of corporate experience, it appeared that as the number of years an organization had been working with Hadoop increased, the emphasis on acquiring or developing skilled staff diminished but did not disappear. This makes sense: the longer an organization has been working with Hadoop, the more likely its team members will have accumulated the right sets of skills.

A perception remains that because of easy accessibility, open source tools lower all the barriers to entry for adopting a technology. Yet for a novice, the Hadoop ecosystem can be a complex collection of components. It takes some time to understand all the details before designing and implementing a system that's production-ready. That means establishing a skilled team should be front and center when you're considering implementing Hadoop. This remains a challenge well into the technology adoption process, and it merits special attention as a prerequisite to the process.

The second challenge – integration

In our research, we asked about different facets of integrating Hadoop into the enterprise environment, with respondents rating each on a scale from very challenging to not challenging at all. At least 50% of those answering the integration questions rated these tasks as either challenging or very challenging. Respondents also reported difficulty integrating their existing enterprise data architecture with Hadoop and with other emerging data management environments, such as columnar and in-memory databases and NoSQL systems – as well as with necessary data management tasks such as ETL and data validation. Together, this suggests that the practical aspects of integration and interoperability between Hadoop systems and existing infrastructure still pose challenges for many organizations. For example, many struggle to understand how all the various components fit together.

I infer from this that most organizations with an existing enterprise infrastructure will face many roadblocks in effectively introducing innovative technologies like Hadoop that reflect a very different operational paradigm. Instituting proper training and planning, from both a systems and an information management perspective, will help you balance the evaluation and testing of new technologies like Hadoop with practical production implementation.

In my next post, we'll look at some ideas that may help soften the challenges of migrating from a conventional environment to one based on Hadoop.

Download a free paper – Bringing the Power of SAS to Hadoop

About Author

David Loshin

President, Knowledge Integrity, Inc.

David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of data quality, master data management and business intelligence. David is a prolific author on data management best practices through numerous books, white papers, and web seminars. His book Business Intelligence: The Savvy Manager’s Guide (June 2003) has been hailed as a resource allowing readers to “gain an understanding of business intelligence, business management disciplines, data warehousing and how all of the pieces work together.” His book Master Data Management has been endorsed by data management industry leaders, and he is also the author of The Practitioner’s Guide to Data Quality Improvement.
