Having spent a quarter of a century working on databases and database-related technologies, I have developed a healthy skepticism toward any new product that hits the market billed as the best thing we have ever seen. It's not that I love to revel in "I told you so" moments. I have simply seen too many products fly high in the sky only to burn out like meteors.
For many, Hadoop's entrance into the database field meant that technology had finally produced the one framework capable of handling "big data." On top of that, its affordability seemed to signal that the end was in sight for the traditional relational databases that had so far dominated the scene. Today, after spending much time and effort integrating Hadoop into their environments, many of the companies that were quick to jump on the bandwagon are discovering that, despite playing an important role in their infrastructure, Hadoop is not the godsend that many thought it would be.
Why is that? The explanation is simple. At the end of the day, Hadoop is just another technological tool, like its relational database counterparts. Big data, on the other hand, is not about technology but about business needs. This means that Hadoop shouldn't be considered the sole player in the field of data analysis. For example, it makes sense to use Hadoop to run broad exploratory analysis of large data sets, but a relational database is still the better option for performing operational analysis of what that exploration uncovered. Likewise, Hadoop is good for examining the lowest level of detail in a data set, while relational databases make more sense for storing transformed and aggregated data. As Facebook analytics chief Ken Rudin puts it, "you need to use the right technology to fit your business needs."
A recent survey commissioned by an IT company found that more than 30% of the companies interviewed had already deployed Hadoop, with an additional 30% planning to deploy it within 12 months. One interesting finding was that the majority of these companies planned to combine Hadoop's data analysis capabilities with those of the other databases already integrated into their infrastructures. According to the study, the goal was, and still is, to use Hadoop for raw data analysis while relying on traditional databases to handle non-analytic workloads, especially transaction-oriented ones, and to analyze the aggregated data coming out of Hadoop.
Take eBay, for example. The San Jose, Calif.-based company's three-tier data analytics approach shows the kind of role Hadoop can play within an organization alongside traditional relational databases. Structured data resides in the first tier, an enterprise data warehouse used for daily housekeeping tasks such as feeding business intelligence dashboards and reports. The second tier is a Teradata data management platform that stores huge amounts of semi-structured information. Fully unstructured data, such as text, lives in the third tier, a Hadoop cluster reserved for deeper research, analysis and experimentation.
The moral of the story is that Hadoop is not a synonym for big data, but one of the many players you need to mine and analyze your data. A good reason to hang on to those other databases a little longer.
I’ll be talking about big data and Hadoop at Analytics 2014 along with Josh Wills from Cloudera and my SAS colleagues Wayne Thompson and Kelly Hobson. Check out our panel presentation and round table discussion on Hadoop. We hope to see you there!
- Panel discussion with SAS and Cloudera on Big Data and Hadoop: Moving beyond the hype to realize your analytics strategy with SAS® - Monday, October 20, 3:00-3:50 pm
- Round Table discussion on Practical Considerations for SAS Analytics in a Hadoop Environment - Tuesday, October 21, 12:30-1:45 pm
You can also check out our starter services on Visual Analytics and Visual Statistics and the Expert Exchange for Hadoop.