Does it seem like almost everything is a “big data” problem right now? And nearly every vendor is offering big data or big analytics solutions? Is big analytics more important than big data? And what is the difference? I've encountered this confusion in the market a lot over the last year as I’ve traveled the globe talking to business and government leaders about big data.
In the process of explaining the market to others, I've come up with a clearer way to understand the landscape. This explanation has helped a lot of businesses understand what type of analytic problems they actually have, and sometimes it helps them see that their problems are more of the big analytics variety instead of the standard issue of big data alone.
Sometimes, for example, you don’t have that much data but it’s still taking you five hours to run a marketing optimization job because of the number of possible offers. There really aren’t a lot of records but you have to do multiple passes on the data, running complex algorithms with each step. That’s a big analytics problem and not just a big data problem.
Let’s dig into those differences a bit further.
Our first step is to revisit the distinction we’ve made over the years between reactive and proactive analytics. Standard business reports, ad hoc reports, OLAP and even alerts and notifications based on analytics are in the reactive category. Now, reactive analytics can still be very useful. They’re required for a lot of finance and regulatory reporting, and they help business users perform ad hoc analysis every day, but they are ultimately informing you about the past.
Proactive analytics, such as optimization, predictive modeling, forecasting and statistical analysis, are forward looking. They allow you to identify trends, spot weaknesses or determine conditions for making decisions about the future. They cover the optimization of complex problems with many dependencies, regression analysis and other advanced methods for proactive decision making.

FIGURE 1: Reactive and proactive analytics
The next thing we need to define is big data. Put simply, when you have exceeded the capacity of conventional database systems, you’re dealing with big data. Before that, it’s what I like to call “growing data.” It is still a large amount of data but it hasn’t hit the limitations seen with big data.
Today, we can store lots and lots of data, but processing times have become excessive because traditional storage environments are not conducive to proactive analytics. When processing times become unacceptable, you may be dealing with big data sizes, but you may also be dealing with big analytics.
To better understand the difference, let’s create a chart with reactive and proactive analytics on the Y axis and the size of the data on the X axis, like this:
Now we can see the four major types of software solutions available in the analytics market today. They are:
Business Intelligence (BI). If you are dealing with a large amount of data and providing reporting capabilities for end users so they can gain access to information, summarize data and even drill down into that data themselves, you are dealing with business intelligence applications. These solutions provide a strong look at various aspects of the company's past performance. That is BI. That is the lower left quadrant in Figure 2.
Big data BI. Now, when data gets bigger and you’re dealing with outside data sources or, as more companies are starting to see, you’re pulling in unstructured data, your problems are getting bigger. It’s taking users too long to get the information they need, or you’re having a hard time combining data sources fast enough to provide reports like you used to. You need technology that allows quick access to data, but you’re still providing reactive analytics. This is the most common big data scenario in the market right now, and most businesses are trying to solve it with SQL-based solutions. That is big data BI. It is in the lower right quadrant of Figure 2.
Big analytics. As I mentioned before, it takes a different kind of analytics to support forward looking decisions. If you’re looking at customer preferences, markdown optimizations or fraud predictions, you need a different type of architecture. These problems typically involve growing data sizes and proactive analytics. It isn’t the data size that slows you down. It’s the fact that you’re making multiple passes on the data and running advanced analytic calculations, which can take hours and hours to return results. Today, you need those answers in seconds or minutes. This is big analytics. It is located in the upper left quadrant of Figure 2.
Big data analytics. Now, what about organizations that have a whole lot of data and are dealing with proactive decision making? Here, we’re talking about hundreds of millions of SKUs across multiple retail stores. We’re looking at future sources of data too, like telematics data in the auto industry, which can be useful for manufacturers and insurers. These are the types of problems most businesses really haven’t dealt with in the past. And these aren’t small data problems. You don’t want to summarize that information. Manufacturers want to predict safety problems before they impact customers, and insurance companies want to adjust rate plans for the best drivers, for example. This is big data analytics. You’ll find it in the upper right corner of Figure 2.
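To make the grid concrete, here is a minimal sketch in Python that maps a workload onto the four quadrants described above. The names `Quadrant`, `classify`, `exceeds_conventional_db` and `proactive` are my own illustrative assumptions, not part of any product, and the two yes/no inputs are a deliberate simplification of the two axes (data size, and reactive versus proactive analytics).

```python
from enum import Enum


class Quadrant(Enum):
    """The four solution types from the grid (labels taken from the post)."""
    BI = "Business intelligence"
    BIG_DATA_BI = "Big data BI"
    BIG_ANALYTICS = "Big analytics"
    BIG_DATA_ANALYTICS = "Big data analytics"


def classify(exceeds_conventional_db: bool, proactive: bool) -> Quadrant:
    """Place a workload on the grid.

    exceeds_conventional_db: True for "big data", False for "growing data"
        (hypothetical flag standing in for the X axis).
    proactive: True for forward-looking analytics such as optimization or
        forecasting, False for reporting, OLAP and alerts (the Y axis).
    """
    if proactive:
        if exceeds_conventional_db:
            return Quadrant.BIG_DATA_ANALYTICS  # upper right
        return Quadrant.BIG_ANALYTICS           # upper left
    if exceeds_conventional_db:
        return Quadrant.BIG_DATA_BI             # lower right
    return Quadrant.BI                          # lower left


# The five-hour marketing optimization job from earlier in the post:
# not many records, but a proactive, multi-pass calculation.
print(classify(exceeds_conventional_db=False, proactive=True).value)
# prints "Big analytics"
```

In practice both axes are a matter of degree, so treat the two flags as a starting point for the conversation rather than a hard rule.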
My point here is not to say that one is better than the other, but they each do different things and they each require different architectures. As you look at what’s going on in the market and in your business, understand the difference between each of these four areas and how the different problems can be solved.
Analytics continues to be a broad term in the market, but it’s worthwhile to look at the problems you are trying to solve and determine where you fall in this landscape. It will help determine what your next steps are in your big data journey.
I’ll be presenting these concepts in more detail later this week at The Premier Business Leadership Series. If you’re attending, stop by after the presentation and let me know if this is a useful breakdown for you. I’d love to hear your thoughts.



7 Comments
excellent article!
Very good and comprehensive article. This cleared up many misperceptions.
Can you give more concrete examples on what is proactive Big Data analytics?
Another good example is in the area of fraud detection. Consider fraud in health care. It costs Americans billions of dollars. By using big data that contains patient interaction and claims, models can be created that can be used to expose fraudulent activity. There is a big payback here.
Hello Jim,
Great article! Helped to understand the space in an organized manner.
You mention that the four types in the quadrant each do different things and they each require different architectures.
Could you please elaborate that a little? And what should be the best way for a company to move from lower right quadrant to upper left quadrant considering the costs ?
Would appreciate your insight.
Very succinct and clear article. Do you have something written in an equally clear style on architecture and data models for proactive analytics?
Thanks Jim
Hi, Naveen. Thanks for reading and commenting. Our engineers have been working on optimizing data models and architectures for proactive analytics, and here are two important things they’re finding. 1) The fastest architectures for proactive analytics have a separate server that hosts the math and the data models. The data is passed to that server from your DW or your Hadoop clusters in parallel through a series of connections. 2) The models and algorithms that reside on that server are engineered to break hard math problems down into a series of parts, to work on those parts concurrently and to communicate between parts. Our VP of Big Data, Paul Kent, is really good at explaining this architecture in detail, so we’ll work with him to get a longer, more detailed description in print.
It's a very nice article, thank you Jim.