Will Hadoop replace the data warehouse?



Twenty-five years ago (when I was 12 years old), we realized that data across the corporation was not integrated. Nor did our data let us predict the future by looking at the past. So we started creating stores of historical data, soon to be called “data warehouses.”

Here are some of the problems we faced:

  • Database technology could not load, integrate and query as fast as we needed. So we changed how we designed and indexed the database to make it perform.
  • Reporting tool technology was not easy to use, nor was it intuitive. So we wrote those reports, and did what we had to do to get the “information” in the hands of the business user.
  • We programmed a LOT! Our ETL tools were not mature enough to do the job, so we used whatever software was at our disposal.

How does this compare to where Hadoop is today? Let’s see:

  • The technology requires programmers to make most things happen on this platform, including reporting. But many vendors are working on new tools to use with Hadoop as we speak.
  • Inserting and uploading data into Hadoop is well on its way to maturity, but updating data is not so fast yet.
  • Hadoop works really well for long-running queries where we don’t need the results right away, but it does not work so well for small, quick queries.
  • Hadoop works very well with diverse data sets of unstructured data, and can combine that with semistructured and structured data.

I believe Hadoop complements the data warehouse in its ability to do large-scale analytics from various types of data, and will keep moving forward. But for now, keep your data warehouse and bring in Hadoop functionality as deemed necessary by your organization.

After all, this is not just a new toy – it is an up-and-coming component of our enterprise information.

Try SAS Data Loader for Hadoop free for 90 days.


About Author

Joyce Norris-Montanari

President of DBTech Solutions, Inc

Joyce Norris-Montanari, CBIP-CDMP, is president of DBTech Solutions, Inc. Joyce advises clients on all aspects of architectural integration, business intelligence and data management. She also advises clients about technology, including tools for ETL, profiling, databases, data quality and metadata. Joyce speaks frequently at data warehouse conferences and is a contributor to several trade publications. She co-authored Data Warehousing and E-Business (Wiley & Sons) with William H. Inmon and others. Joyce has managed and implemented data integrations, data warehouses and operational data stores in industries like education, pharmaceutical, restaurants, telecommunications, government, health care, financial, oil and gas, insurance, research and development and retail. She can be reached at jmontanari@earthlink.net.
