Twenty-five years ago (when I was 12 years old), we realized that data across the corporation was not integrated. Nor could our data help us predict the future by looking at the past. So we started creating stores of historical data that would soon be called “data warehouses.”
Here are some of the problems we faced:
- Database technology could not load, integrate and query data as fast as we needed. So we changed how we designed and indexed the database to get the performance we needed.
- Reporting tool technology was neither easy to use nor intuitive. So we wrote the reports ourselves and did whatever it took to get the “information” into the hands of the business user.
- We programmed a LOT! Our ETL tools were not mature enough to do the job, so we used whatever software was at our disposal.
How does this compare to where Hadoop is today? Let’s see:
- Most things on this platform, such as reporting, still require programmers to make them happen. But as we speak, many vendors are building new tools for use with Hadoop.
- Inserting and loading data into Hadoop is well on its way to maturity, but updating data is not yet as fast.
- Hadoop works really well for long-running queries whose results we don’t need right away, but it does not work as well for small queries.
- Hadoop handles diverse sets of unstructured data very well, and it can combine them with semistructured and structured data.
I believe Hadoop complements the data warehouse with its ability to run large-scale analytics on many types of data, and it will keep moving forward. But for now, keep your data warehouse and bring in Hadoop functionality as your organization needs it.
After all, Hadoop is not just a new toy; it is an up-and-coming component of our enterprise information architecture.