“What on earth is this elephant doing in our china shop?” This is what a surprised IT manager might utter on discovering a yellow elephant inside his carefully constructed information architecture landscape.
Next, he sees his rational data scientists feeding the new pet with data from social media, unstructured documents and real-time sales data collected from the company’s online shop. The elephant also has a name: Hadoop.
What is this new zoo all about? Let us begin with a quick look at the Enterprise Information Management scene.
Data warehouses have been built for 20 years. They are used for:
- Collecting data from different operational data sources
- Transforming it into a structured format
- Producing new information for businesses and public organizations
The results are then processed with various Business Intelligence and advanced analytics solutions to support decision-making.
Data warehouses are typically based on relational databases that understand SQL. The S stands for “structured”. Hence, it is all about structured, largely numeric information, such as financial figures, sales information, production volumes or raw material costs.
However, we live in a world in which information is churned out from all directions and in all formats. Businesses want to know what consumers are saying about their products and services on Twitter and Facebook. They also want to understand why people prefer one product to another. At the same time, businesses want to handle information in real time and be able to make decisions more swiftly. There is also a need to match operational data with clients’ contract information and other documents.
This is where the new elephant stomps in. Hadoop is a storage solution for information in many different formats. It also provides a powerful platform for advanced analytics.
Despite its name, Big Data does not necessarily stand for large amounts of data, but rather for data in different structures, which has previously been impractical to combine in traditional relational databases. Hadoop is also an open source solution, so its arrival in your own zoo could be very cost-efficient.
Hadoop also distributes data processing across several computers. Hence, it is more flexible to maintain and scale than traditional data warehouse solutions.
In other words, it provides better capacity, cheaper maintenance and more varied data handling.
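To make “distributing the processing” a little more concrete, here is a minimal sketch of the MapReduce pattern that Hadoop popularized, simulated in plain Python on a single machine. It is an illustration under stated assumptions, not Hadoop code: the “nodes” are just lists, and the function names (`split_into_chunks`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) and sample posts are hypothetical. A real Hadoop job would run each map task on a different machine in the cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def split_into_chunks(lines: list[str], num_nodes: int) -> list[list[str]]:
    """Hypothetical stand-in for spreading input across cluster nodes (round-robin)."""
    chunks: list[list[str]] = [[] for _ in range(num_nodes)]
    for i, line in enumerate(lines):
        chunks[i % num_nodes].append(line)
    return chunks


def map_phase(chunk: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one node's chunk."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: list[str], num_nodes: int = 3) -> dict[str, int]:
    # In a real cluster each chunk's map task would run on its own machine;
    # in this sketch they simply run one after another.
    mapped = (
        pair
        for chunk in split_into_chunks(lines, num_nodes)
        for pair in map_phase(chunk)
    )
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    posts = [
        "great product fast delivery",
        "product broke after a week",
        "fast delivery great service",
    ]
    print(word_count(posts))
```

The counts come out the same whatever `num_nodes` is set to, which is the property that lets this style of processing grow by adding machines rather than by buying a bigger server.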
How, then, can the collected Big Data be put to use?
The data contained in Hadoop can be utilized with advanced analytical solutions. By combining, for example, the purchasing behavior data from retailers with comments in social media, it will be easier to understand consumers’ preferences. With this information, marketing campaigns can be targeted more effectively at different geographical areas.
We will also be able to detect changes in the massive volumes of sensor data from industrial equipment and anticipate maintenance requirements. With the aid of text analytics, we could analyze maintenance records stored in Hadoop and use them to optimize resource usage while improving customer service.
Data Mining methods, on the other hand, enable wide-ranging social network analysis to uncover individuals trying to fraudulently obtain social security benefits.
Advanced analytics has enormous potential, reaching far beyond the uses we are currently aware of.
Hadoop is the new generation of data management and warehousing. It is not a direct replacement for relational data warehouses, but it offers more options for organizations that want to take analytics to new levels.
Time will tell whether this elephant is able to move delicately in china shops, or whether it will need to stomp on older methods that no longer serve an organization’s information needs.
Henrikki Hervonen, Professional Services Director, Finland