What kind of big data problem do you have?

Does it seem like almost everything is a “big data” problem right now? And nearly every vendor is offering big data or big analytics solutions? Is big analytics more important than big data? And what is the difference? I've encountered this confusion in the market a lot over the last year as I’ve traveled the globe talking to business and government leaders about big data.

In the process of explaining the market to others, I've come up with a clearer way to understand the landscape. This explanation has helped a lot of businesses understand what type of analytic problems they actually have, and sometimes it helps them see that their problems are more of the big analytics variety instead of the standard issue of big data alone.

Sometimes, for example, you don’t have that much data but it’s still taking you five hours to run a marketing optimization job because of the number of possible offers. There really aren’t a lot of records but you have to do multiple passes on the data, running complex algorithms with each step. That’s a big analytics problem and not just a big data problem.

Let’s dig into those differences a bit further.

Our first step is to revisit the distinction we’ve made over the years between reactive and proactive analytics. Standard business reports, ad hoc reports, OLAP and even alerts and notifications based on analytics are in the reactive category. Now, reactive analytics can still be very useful. They’re required for a lot of finance and regulatory reporting, and they help business users perform ad hoc analysis every day, but they are ultimately informing you about the past.

Proactive analytics, such as optimization, predictive modeling, forecasting and statistical analysis, are forward looking. They allow you to identify trends, spot weaknesses or determine conditions for making decisions about the future. They also cover the optimization of complex problems with many dependencies, regression analysis and other advanced methods for proactive decision making.
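To make the distinction concrete, here is a minimal sketch in Python. The sales figures and the function names `reactive_report` and `proactive_forecast` are invented for illustration and are not taken from any product. The first function summarizes what already happened; the second fits a simple least-squares trend line and projects one period ahead.

```python
from statistics import mean

# Hypothetical monthly sales history (illustrative numbers only).
monthly_sales = [100.0, 110.0, 120.0, 130.0]


def reactive_report(sales):
    """Reactive: summarize the past as a total and an average."""
    return {"total": sum(sales), "average": mean(sales)}


def proactive_forecast(sales):
    """Proactive: fit a least-squares line and project the next period."""
    n = len(sales)
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(sales)
    # Slope and intercept of the ordinary least-squares trend line.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    # Period n is the first period that has not happened yet.
    return intercept + slope * n


print(reactive_report(monthly_sales))
print(proactive_forecast(monthly_sales))
```

Both functions read the same small dataset. The difference is in the question being asked, not in the amount of data.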

Figure 1: Reactive and proactive analytics

The next thing we need to define is big data. Put simply, when you have exceeded the capacity of conventional database systems, you’re dealing with big data. Before that, it’s what I like to call “growing data.” It is still a large amount of data but it hasn’t hit the limitations seen with big data.

Today, we can store lots and lots of data, but processing times have become excessive because traditional storage environments are not conducive to proactive analytics. When processing times become unacceptable, you may be dealing with big data sizes, but you may also be dealing with big analytics.

To better understand the difference, let’s create a chart with reactive and proactive analytics on the Y axis and the size of the data on the X axis, like this:

Figure 2: Data size and analytic competence

Now we can see the four major types of software solutions available in the analytics market today. They are:

Business Intelligence (BI). If you are dealing with a large amount of data and providing reporting capabilities for end users so they can gain access to information, summarize data and even drill down into that data themselves, you are dealing with business intelligence applications. These solutions provide a strong look at various aspects of the company’s performance in the past. That is BI. It is in the lower left quadrant of Figure 2.

Big data BI. Now, when data gets bigger and you’re dealing with outside data sources or, as more companies are starting to see, pulling in unstructured data, your problems get bigger. It takes users too long to get the information they need, or you have a hard time combining data sources fast enough to provide reports like you used to. You need technology that allows quick access to data, but you’re still providing reactive analytics. This is the most common big data scenario in the market right now, and most businesses are trying to solve it with SQL-based solutions. That is big data BI. It is in the lower right quadrant of Figure 2.

Big analytics. As I mentioned before, it takes a different kind of analytics to support forward-looking decisions. If you’re looking at customer preferences, markdown optimizations or fraud predictions, you need a different type of architecture. These problems typically involve growing data sizes and proactive analytics. It isn’t the data size that slows you down. It’s that you’re making multiple passes on the data and running advanced analytic calculations, which can take hours and hours to produce results. Today, you need those answers in seconds or minutes. This is big analytics. It is located in the upper left quadrant of Figure 2.

Big data analytics. Now, what about organizations that have a whole lot of data and are dealing with proactive decision making? Here, we’re talking about hundreds of millions of SKUs across multiple retail stores. We’re looking at future sources of data too, like telematics data in the auto industry, which can be useful for manufacturers and insurers. These are the types of problems most businesses really haven’t dealt with in the past. And these aren’t small data problems. You don’t want to summarize that information. Manufacturers want to predict safety problems before they impact customers, and insurance companies want to adjust rate plans for the best drivers, for example. This is big data analytics. You’ll find it in the upper right corner of Figure 2.
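The four quadrants can be sketched as a small lookup in Python. This is only an illustration of the grid described above; the function name `classify_workload` and its two boolean parameters are assumptions made for this example, with "big" data meaning that the data has exceeded the capacity of conventional database systems.

```python
def classify_workload(exceeds_conventional_db: bool, proactive: bool) -> str:
    """Map a workload onto the four quadrants of Figure 2.

    exceeds_conventional_db: True for big data, False for "growing data".
    proactive: True for forward-looking analytics (optimization,
        predictive modeling, forecasting), False for reactive reporting.
    """
    if proactive:
        # Upper half of the grid: forward-looking analytics.
        return "big data analytics" if exceeds_conventional_db else "big analytics"
    # Lower half of the grid: reporting on the past.
    return "big data BI" if exceeds_conventional_db else "BI"


# Example: a marketing optimization job on a modest number of records.
print(classify_workload(exceeds_conventional_db=False, proactive=True))
```

In practice the boundary between growing data and big data is a judgment call, so treat the boolean inputs as a simplification.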

My point here is not to say that one is better than the other, but that they each do different things and they each require different architectures. As you look at what’s going on in the market and in your business, understand the difference between each of these four areas and how the different problems can be solved.

Analytics continues to be a broad term in the market, but it’s worthwhile to look at the problems you are trying to solve and determine where you fall in this landscape. It will help determine what your next steps are in your big data journey.

I’ll be presenting these concepts in more detail later this week at The Premier Business Leadership Series. If you’re attending, stop by after the presentation and let me know if this is a useful breakdown for you. I’d love to hear your thoughts.

tags: big analytics, big data

8 Comments

  1. Posted October 8, 2012 at 3:25 pm | Permalink

    excellent article!

  2. Phil
    Posted November 4, 2012 at 12:40 pm | Permalink

    Very good and comprehensive article. This cleared up many misperceptions.

    Can you give more concrete examples on what is proactive Big Data analytics?

    • Jim
      Posted November 8, 2012 at 6:28 pm | Permalink

      Another good example is in the area of fraud detection. Consider fraud in health care. It costs Americans billions of dollars. By using big data that contains patient interaction and claims, models can be created that can be used to expose fraudulent activity. There is a big payback here.

  3. Arpit
    Posted November 21, 2012 at 9:07 am | Permalink

    Hello Jim,

    Great article! Helped to understand the space in an organized manner.

    You mention that the four types in the quadrant each do different things and they each require different architectures.

    Could you please elaborate on that a little? And what would be the best way for a company to move from the lower right quadrant to the upper left quadrant, considering the costs?

    Would appreciate your insight.

  4. Naveen
    Posted March 20, 2013 at 9:17 am | Permalink

    Very succinct and clear article. Do you have something written in an equally clear style on architecture and data models for proactive analytics?

    Thanks Jim

    • Jim
      Posted March 23, 2013 at 10:42 am | Permalink

      Hi, Naveen. Thanks for reading and commenting. Our engineers have been working on optimizing data models and architectures for proactive analytics, and here are two important things they’re finding. 1) The fastest architectures for proactive analytics have a separate server that hosts the math and the data models. The data is passed to that server from your DW or your Hadoop clusters in parallel through a series of connections. 2) The models and algorithms that reside on that server are engineered to break hard math problems down into a series of parts, to work on those parts concurrently and to communicate between parts. Our VP of Big Data, Paul Kent, is really good at explaining this architecture in detail, so we’ll work with him to get a longer, more detailed description in print.
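As a rough illustration of the second point in the reply above (break a math problem into parts, work on the parts concurrently, then combine the results), here is a minimal Python sketch. It is not the architecture described by the author's engineers. It is a toy example under stated assumptions: the function names `partial_sums` and `fit_line_in_parts` are hypothetical, the problem is a simple straight-line fit, and threads from the standard library stand in for separate workers.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(chunk):
    """Compute the sums needed for a line fit on one chunk of (x, y) pairs."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxy = sum(x * y for x, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    return n, sx, sy, sxy, sxx


def fit_line_in_parts(points, n_parts=4):
    """Fit y = slope * x + intercept by combining per-chunk partial sums."""
    size = max(1, -(-len(points) // n_parts))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    # Each chunk is summarized independently. Threads are a stand-in here;
    # a real system would spread this work across processes or machines.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(partial_sums, chunks))
    # "Communication between parts" reduces to adding the partial sums.
    n, sx, sy, sxy, sxx = (sum(column) for column in zip(*partials))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
points = [(x, 2 * x + 1) for x in range(10)]
print(fit_line_in_parts(points))
```

The design point is that each worker returns only a handful of numbers, so the combining step stays cheap no matter how large each chunk is.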

  5. siva
    Posted March 28, 2013 at 1:54 am | Permalink

    It's a very nice article, thank you Jim.

  6. Tara Kumar
    Posted December 21, 2013 at 2:42 pm | Permalink

    Excellent and clear information. Thanks!!
