It seems that many companies are only just getting going with Hadoop, and already the prophets of doom are starting to appear. Why is it that this technology, which promises so much, is predicted to cause so much pain?
In a December 2014 Wall Street Journal article, “The Joys and Hype of Software Called Hadoop,” research company Gartner predicts that “... through 2017, 60% of big-data projects will fail to go beyond piloting and experimentation and will be abandoned”.
More recently, a Datanami article entitled “Does Hadoop Need a Reality Check” states:
The truth about Hadoop–and big data analytics in general–is that it’s not easy. In addition to data science skills, which are in notoriously short supply, organizations need the engineering skills to bring all the proper technologies to bear in the proper amounts.
SAS has also been conducting research. One recent survey, sponsored in conjunction with Intel, took place in the Nordics, where we found that 35 percent of the respondents cited resources and competencies as an obstacle to Hadoop adoption.
Clearly, there is a skills issue preventing many companies from best exploiting the Hadoop platform.
In my opinion, large companies can afford to employ the (currently expensive) skilled resources they need to hack together the acquisition of data, the basic manipulation and visualization of that data, and some limited analytics. It is for this reason that many larger businesses are already playing with Hadoop. The challenge for them will come later, when that undocumented plethora of programs and scripts gets handed over from the contractors and becomes the company's problem to decipher. Sound familiar? We saw scripting and coding used for a long time in data warehousing until it became clear that it was very inefficient.
Could it be that many of the Hadoop projects predicted to fail by Gartner will fail due to a lack of transparency into what all these custom-developed programs are doing, along with the ensuing maintenance headache that comes after implementation? Maybe attrition of key staff to competitors will play a role as well.
Organizations that are not so large might try to battle through themselves using existing staff and online learning or temporary consultants. The risk in this scenario is that those organizations will suffer from inefficient development cycles, lots of frustrating missteps and slow time to market. Or they will hire great people at a significantly higher pay rate only to watch them churn to the large companies hungry for talent. Even if they manage to get something up and running, they will suffer from the same issues the large companies are already facing, and they will be at high risk if their few trained staff working around Hadoop leave.
What if your technologists could work with Hadoop without needing to be experts in HiveQL, Pig Latin, Sqoop, Oozie, MapReduce and the many other seemingly competing technologies?
We simply must find a way to make Hadoop development more transparent and to capture what has been done, so that it can be changed without wading through thousands of lines of potentially undocumented code. Finally, we need to lower the skill level needed to use Hadoop, so that its benefits can be realized more widely. Without that, Hadoop is going to fail for many.
SAS already provides numerous products for Hadoop, with a focus on making it easy for all. And now we are pleased to add a new product called SAS® Data Loader for Hadoop. This product is designed to make it easy for you to get data into and out of Hadoop and to transform and check the quality of data in Hadoop. It essentially removes the complexity of getting the right data into the right format to get going with visualization or analytics on your Hadoop data.
The SAS Data Loader for Hadoop leverages technologies such as Oozie, Sqoop and HiveQL through a point-and-click interface that drives the environment. It also adds some SAS smarts that let you do data profiling and other key tasks that none of those technologies provide.
Would you like to try the SAS Data Loader for Hadoop for 90 days totally free? Why not? What have you got to lose? If you are interested in finding out more, visit our SAS Data Loader for Hadoop product page or stop by the stand at the Hadoop Summit in Belgium this week.