After Mike Olson sold his startup to Oracle, he wanted to do something other than databases. It was 2008, and a new open-source software project called Hadoop caught his eye. It was aimed at a new class of data problem, one defined by the variety, volume and speed of data – i.e., big data – and by the questions you’d like to ask of it.
Olson co-founded Cloudera to sell a version of Hadoop that’s packed with the features and support businesses need to get value out of big data (learn about Cloudera’s partnership with SAS). It’s a hot topic, and at The Premier Business Leadership Series in Orlando, Olson answered attendees’ questions – read the highlights of the Q&A below:
What is Hadoop?
Olson: This morning, Dr. Goodnight [co-founder and CEO of SAS] talked about scaling out: using a big grid and a lot of processors to attack data. That’s the central idea. Back in the day, you bought the biggest computer you could. You had structured data and a centralized system.
Now, data is doubling (or more) yearly. You can no longer buy one big server and rely on it to keep up with this level of data growth. All you can do is scale out. What people do now is buy a truckload of commodity servers, spread data out among them, and run a scaled-out architecture. It’s what Google, Amazon and Facebook all run on.
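To make "spread data out among them" concrete, here is a minimal Python sketch of one common way to do it: hash partitioning, where each record's key decides which server holds it. This is an illustration only, not how Hadoop itself places data (HDFS splits files into blocks and replicates them across nodes). The function names, the `customer-N` keys and the four-server cluster are all hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_server(key: str, num_servers: int) -> int:
    """Map a record key to a server index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def spread(records, num_servers):
    """Distribute (key, value) records across a set of servers."""
    servers = defaultdict(list)
    for key, value in records:
        servers[assign_server(key, num_servers)].append((key, value))
    return servers


# Hypothetical data set: 1,000 customer records spread over 4 commodity servers.
records = [(f"customer-{i}", i * 10) for i in range(1000)]
cluster = spread(records, num_servers=4)

for server_id in sorted(cluster):
    print(f"server {server_id}: {len(cluster[server_id])} records")
```

Because the hash is stable, the same key always lands on the same server, so no single machine has to hold or scan the whole data set.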
What is big data?
Olson: There are a number of attributes. Volume matters. Or, you may not have that much data, but you want to ask really deep questions and build an aggregated view of your customers from all touch points for targeted offers – even if the data volumes are modest, the processing you want to do is challenging. Or, you may want to aggregate data from multiple sources going back 10 years – those are big data problems.
Is the traditional database dead?
Olson: Absolutely not. Think about your data warehouse – there are things running in that box that only that box can handle. OLAP cubes won’t work on a shared architecture. What you can do is put copies of that data on the scaled-out architecture, where you can store and keep more data, and relieve the pressure on the expensive high-end architecture and the key jobs you need to run there.
We don’t see anyone shutting down those systems, but we’ve seen them rethink where they’re going to do the work. They’re also archiving data to the scaled-out architecture.
If you’re considering Hadoop, make sure you use it the way best practices suggest.
What activities are not appropriate for Hadoop?
Olson: It was never designed for high-performance transactions. Applications that demand that level of integrity and consistency ought to run on a different architecture.
How can I apply this in my organization?
Olson: I am a convert to the church of data. Data in the next 15 years will transform your business and society at large. Important cancers will be cured in our lifetime because we’ll be able to analyze cancers from the outset, and how they spread in the body, in ways we haven’t been able to before. With the world’s growing population we’re going to have to produce more energy and distribute it better – data will allow us to do that.
Big data isn’t new. What’s new is the variety of data and the longer timeframes of data you can collect – that allows for better performance and better insights into all the existing problems you’re solving with data.
What are the challenges from a people and cost perspective?
Olson: Cost is going down: You can build out this infrastructure more easily and cheaply. Instead of tens of thousands of dollars per terabyte of data, you spend hundreds per terabyte. One challenge is that the number and variety of shrink-wrapped apps that can tackle the problems we need to tackle is still too small, but it’s growing. We’re in the early days of this market.
We also need to skill up, but the way we’re going to get better at solving business problems is better software: more apps and tools to aid you in solving these problems. For example, determining the next best action for retail, in real time – that app will be available. More and more apps will be available to make this easy. The fix will be better software, not more people.
What developments are you most excited about?
Olson: What we’re doing with SAS unlocks large amounts of power on lots of data. We’ve done a few things with SAS. For a long time, the SAS client could talk to a wide variety of data sources. Our platform is now in that set – SAS can get data out of Cloudera so that you get instant data at scale.
Also, the SAS High-Performance Analytics server and SAS Visual Analytics software are now able to push analytics down onto that fabric. Why not send the PROCs down to the data and take advantage of all those servers to run analytics in parallel? It’s going to unlock a lot of value in the data.
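The idea of sending the computation to the data, rather than pulling the data to the computation, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SAS or Hadoop code: each in-memory list stands in for the data held on one server, a thread pool stands in for the cluster, and the names `local_summary` and `distributed_mean` are hypothetical. It also assumes at least one non-empty partition.

```python
from concurrent.futures import ThreadPoolExecutor


def local_summary(partition):
    """Runs 'next to the data': reduce one partition to a (sum, count) pair."""
    return sum(partition), len(partition)


def distributed_mean(partitions):
    """Compute a global mean by combining small per-partition summaries."""
    # Each worker processes only its own partition, in parallel with the others.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_summary, partitions))
    # Only the tiny (sum, count) pairs travel back, not the raw data.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical data already spread across three servers.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
mean = distributed_mean(partitions)
print(mean)
```

The design point is that each server returns a summary far smaller than the data it holds, so adding servers adds processing power without adding a central bottleneck.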
Want to know more? Learn how to get started with Hadoop and download the white paper: How to Use Hadoop as a Piece of the Big Data Puzzle