The global Hadoop market was valued at $1.5 billion in 2012 and is expected to grow at a compound annual growth rate of 58.2 percent, to reach $50.2 billion by 2020, according to a Hadoop Market Analysis report prepared by Allied Market Research.
There is no doubt that IT teams are often taking the lead in driving the adoption of Hadoop, as they look to optimize spend and prepare for a future featuring a lot more data, but they are not alone. Increasingly, analysts and business users are starting to significantly influence that growth as the potential of Hadoop becomes clear to them.
Based on my interactions with analysts and business leaders, there are four main drivers I often hear users talk about when explaining why they want to see Hadoop adopted:
- Access to more data: Analysts and business users are looking to have quick access to more data, oftentimes not pre-aggregated, in order to improve the accuracy of reports and analytical models. Access to more historical and granular data can help analysts further tailor messages to a specific customer or market segment, for example.
- Access to new data sources: Quick access to new data sources such as Social Media, Open Data and Dark Data has fast become the need of the day in order to better understand consumer behavior and market shifts. These new data sources must be on-boarded quickly to be useful for decision making, and in many cases business users and analysts are frustrated by their lack of access to these emerging data sources.
- Existing data warehouse projects are slower in on-boarding new data sources: Big data sources such as Web logs, sensor and machine data, social data, etc., are being blended with enterprise data for better and more accurate business insights. However, many on-going DWH initiatives were or are focused on dealing with structured data. According to a recent Deloitte article in the Wall Street Journal, “90 percent of the data warehouses process just 20 percent of an enterprise’s data. Consequently, many enterprises have only been able to use their data warehouses for historical analysis and past performance reporting.” Fulfilling the requirements of new big data sources for business insights is seeing longer turnaround cycles, and this is frustrating business users and analysts.
- Self-service data discovery & analysis: Business users are looking to explore new data sources themselves for business insights so that they can quickly operationalize business decisions either for competitive advantage or in order to avoid monetary risks.
Challenges to adopting Hadoop
What is preventing Hadoop adoption? When you ask this simple question of companies that aren't using Hadoop, despite its growing mainstream adoption, you hear about the following four obstacles that business users and analysts face:
- No direct access to data held in Hadoop: Business users or analysts generally don’t have direct access to the data stored in Hadoop. This limits what business users can do when it comes to tasks such as data discovery or building models.
- Need for specialized skills to use data held in Hadoop: Specialized programming and query skills are required in order to retrieve data from environments such as Hadoop. This means that business users or analysts are dependent on IT to turn around their requests. It also makes data discovery or modelling very cumbersome, as data might need to be transferred to an environment that is readily accessible to the analysts, with a structure that they can understand.
- Uncontrolled data quality: Since business users don’t have easy access to data inside a Hadoop environment, the task of validating the accuracy of data becomes even more difficult and has to be left primarily in the hands of IT workers, who often do not understand the business context of all the data.
- Too much reliance on IT: Acquisition of data from new sources such as Hadoop has become entirely an IT exercise. This means that, when it comes to Hadoop, business users are constrained to the existing projects that IT is supporting rather than being able to innovate on their own.
Hadoop for everyone
The first step in resolving these challenges is to allow non-technical and business users to perform lightweight data integration, data quality and data preparation tasks in Hadoop without requiring specialized skills or training. These simple capabilities would bring business users closer to unlocking the benefits of Hadoop. This could be accomplished through the following steps:
- Business user interface: A web-based business user interface would enable a self-service approach towards Hadoop data management. This interface should allow business users to query data, improve data quality, transform data and load it into analytical applications with the click of a button and without any IT involvement or knowledge of programming or SQL.
- Ability to create and run data transformations in Hadoop: This functionality should also let users select data sources and apply business rules or filters, so that they can narrow the data down to what they are most interested in.
- Query access to Hadoop data: This should be supported through the click of a button that generates HiveQL or other code behind the scenes. Such queries would deliver a summarized view of the data that would help business users or analysts make quick business decisions without having to navigate the larger data sets.
- Profiling of Hadoop data: Generating standard data quality reports at the click of a button would boost business confidence in the data inside a Hadoop environment. Metrics such as counts, unique patterns, null values, etc., can help users select the right data sources for analysis purposes without needing to involve IT.
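To make the last two capabilities more concrete, here is a minimal sketch in Python of what might sit behind such buttons. It is an illustration under stated assumptions, not a description of any particular product: the function names (build_summary_query, profile_column, pattern_of), the table and column names, and the choice of COUNT and SUM as the summary are all hypothetical. The first function turns a user's point-and-click selections into a summarizing HiveQL statement; the second computes the profiling metrics mentioned above (counts, null values, distinct values, unique patterns) over a sample of column values.

```python
import re
from collections import Counter

# Hypothetical helpers: names and behavior are assumptions for illustration.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_OPS = {"=", "!=", "<", "<=", ">", ">="}


def _ident(name):
    """Accept only plain identifiers so UI input cannot inject HiveQL."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def _literal(value):
    """Render a Python value as a HiveQL literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_summary_query(table, group_by, measure, filters=()):
    """Turn UI selections into a summarizing HiveQL statement.

    filters is a sequence of (column, operator, value) tuples.
    """
    table, group_by, measure = _ident(table), _ident(group_by), _ident(measure)
    conditions = []
    for column, op, value in filters:
        if op not in _OPS:
            raise ValueError(f"unsupported operator: {op!r}")
        conditions.append(f"{_ident(column)} {op} {_literal(value)}")
    sql = (
        f"SELECT {group_by}, COUNT(*) AS row_count, "
        f"SUM({measure}) AS total_{measure} FROM {table}"
    )
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql + f" GROUP BY {group_by}"


def pattern_of(value):
    """Map digits to 9 and letters to A so same-shaped values share a pattern."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile_column(values):
    """Compute basic data quality metrics for one column's values.

    None and the empty string are both treated as null (an assumption).
    """
    non_null = [v for v in values if v is not None and v != ""]
    patterns = Counter(pattern_of(str(v)) for v in non_null)
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": dict(patterns),
    }


if __name__ == "__main__":
    print(build_summary_query(
        "web_logs", "country", "bytes_sent",
        filters=[("status", "=", 200), ("agent", "!=", "bot")],
    ))
    print(profile_column(["AB-12", "CD-34", None, "", "x1"]))
```

In a real deployment the profiling would more likely be pushed down into Hadoop as generated queries rather than run in Python over data pulled out of the cluster; the in-memory version is used here only to keep the sketch self-contained.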