Turning big data (volume, variety, and velocity) into value


Today, more companies are offering more products in more markets in more currencies to more customer segments than ever before. The result? An explosion in data covering virtually every aspect of the organization: sales, marketing, finance, manufacturing, legal, HR, and more. Unquestionably, the era of “big data” has arrived. Does that mean the business world has outgrown the value of traditional analytics? Absolutely not. But the emerging discipline of high-performance analytics offers businesses new opportunities to derive greater value from those growing mountains of data: more insight in less time at lower cost.

Before we examine the principles and components of high-performance analytics, it’s helpful to first define what we mean by big data. When most people hear the term, they understandably envision massive data centers that house terabytes and petabytes of transactional data. However, that captures only one (albeit certainly important) dimension. What really defines big data are the three commonly cited V’s:

  • Volume – It’s called “Big Data” for a reason.  Every day, the world creates 2.5 quintillion bytes of data[1] and storage capacity has doubled roughly every three years since the 1980s.[2]  The volume of data is quickly outstripping the compute capacity of many corporate IT departments.
  • Velocity – How quickly does the data move across the enterprise when you need to make a decision?
  • Variety – Big data means much more than rows and columns. It means unstructured text, video, and audio that can have an important impact on company decisions – if it is analyzed properly, and in time.

The three components of high-performance analytics

When the volume, velocity, and variety of big data exceed an organization’s storage or compute capacity, the company cannot transform its data into the information it needs to produce value-creating insights. That’s where high-performance analytics (HPA) enters the picture. The fact is, most organizations today don’t have the infrastructure to process all of their data – they can use only small portions of it. They suffer from data overload because they don’t have the time or capacity to analyze and extract all of the value from their data. Seen this way, big data isn’t really a problem; it’s a symptom – a symptom of an organization’s inability to store data or compute results in a timely fashion. HPA closes that gap by enabling the organization to analyze more data in less time using three key technologies: grid computing, in-database processing, and in-memory processing.

1. Grid computing

Although companies have made – and will continue to make – enormous investments in their computing infrastructures for hardware, software, storage, and networking, there are typically large imbalances in utilization. In many instances, resources are provisioned according to business function (e.g., “the marketing server”) rather than resource needs. As a result, some servers are at capacity while others sit idle. Grid computing tools intelligently divide a compute job into pieces and apportion them across all available resources, regardless of location or nominal “ownership.” Those pieces are processed in parallel by the available computers, resulting in far greater speed and performance. In some cases, grid computing delivers dramatically faster results and makes previously infeasible computing jobs possible.

This means the enterprise can process far more data in its analytics initiatives – such as exploring data histories that stretch back much further, or analyzing more variables to create more accurate predictive models.
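To make the idea concrete, here is a minimal sketch in Python of the principle behind grid computing, with a single machine’s pool of worker processes standing in for a grid and each “scenario” standing in for an independent piece of an analytic job. The workload and timings are illustrative only – this is not SAS’s implementation.

    import time
    from concurrent.futures import ProcessPoolExecutor

    def run_scenario(scenario_id):
        """Stand-in for one independent unit of analytic work (e.g., one candidate model)."""
        total = 0
        for i in range(2_000_000):           # CPU-bound busy work
            total += (i * scenario_id) % 7
        return scenario_id, total

    if __name__ == "__main__":
        scenarios = list(range(1, 17))       # 16 independent pieces of one job

        start = time.perf_counter()
        serial = [run_scenario(s) for s in scenarios]
        print("one server :", round(time.perf_counter() - start, 2), "s")

        start = time.perf_counter()
        with ProcessPoolExecutor() as pool:  # spread the pieces across all available workers
            parallel = list(pool.map(run_scenario, scenarios))
        print("grid-style :", round(time.perf_counter() - start, 2), "s")

        assert serial == parallel            # same answers, delivered sooner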

2. In-database processing

In traditional analytical processing, the IT organization typically takes the data – usually from a database – and brings it to the analytical application(s). With HPA, we reverse that equation and bring the analytical processing to the database itself. Most organizations define in-database processing as database integration: leveraging the DBMS to do what it already does well – counts, sums, averages, minimums, maximums, standard deviations, and other descriptive statistics – by refactoring the analytic workload so that this work is passed to the DBMS.
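As a small illustration of that first level of integration, the Python sketch below uses SQLite (a stand-in for an enterprise DBMS) and a made-up sales table: the descriptive statistics are computed where the data lives, and only the answers travel back to the client.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 120.0), ("east", 80.0), ("west", 200.0), ("west", 150.0), ("west", 90.0)],
    )

    # The DBMS already knows how to count, sum, and average; send it the question,
    # not a request for every row.
    query = "SELECT region, COUNT(*), AVG(amount), MAX(amount) FROM sales GROUP BY region"
    for region, n, avg_amount, max_amount in conn.execute(query):
        print(region, n, round(avg_amount, 2), max_amount)

    conn.close()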

True in-database processing is defined as extending the database to do things it does not inherently do. For example, databases do not have a “propensity to attrite” function. They don’t have a “propensity to pay” function. They don’t have a “next-best-offer” function. Well-developed in-database processing takes these types of predictive analytical models and turns them into functions inside the database, achieving significant improvements in processing time while taking greater advantage of large capital investments in IT infrastructure.
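Continuing the sketch above – and again using SQLite’s user-defined functions as a stand-in for a production DBMS’s scoring or UDF facility – the example below registers a hypothetical “propensity to attrite” model (a small logistic formula with made-up coefficients) as a function inside the database, so every row is scored in place during the query itself.

    import math
    import sqlite3

    def attrition_score(tenure_months, complaints):
        """Hypothetical logistic scoring model; the coefficients are purely illustrative."""
        z = 1.5 - 0.05 * tenure_months + 0.8 * complaints
        return 1.0 / (1.0 + math.exp(-z))

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, tenure_months REAL, complaints INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 36, 0), (2, 6, 2), (3, 18, 1)],
    )

    # Extend the database with a function it does not inherently have ...
    conn.create_function("attrition_score", 2, attrition_score)

    # ... then score every row where the data lives and return only the customers who matter.
    for cid, score in conn.execute(
        "SELECT id, attrition_score(tenure_months, complaints) "
        "FROM customers WHERE attrition_score(tenure_months, complaints) > 0.5"
    ):
        print(cid, round(score, 3))

    conn.close()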

3. In-memory processing

The third component of high-performance analytics is in-memory processing. This method divides analytic processes into easily manageable pieces, with computations distributed in parallel across a dedicated set of processing instances (sometimes called “blades”). With in-memory processing, combining big data with sophisticated analytical models is finally feasible, and we can tackle sophisticated business challenges in practical timeframes.

Now, of course, all computing is done “in-memory.” What’s different with our approach is that we’re tapping into vast amounts of addressable memory distributed across many, many computers. The total amount of memory becomes so large that we can, for the first time, address high-end analytical challenges that were previously beyond anyone’s reach.
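The single-machine Python sketch below gestures at the idea: a synthetic dataset is loaded into (shared) memory once, and several analyses then run in parallel against that resident copy, with only small partial results moving between processes. A real in-memory analytics platform spreads this across the RAM of many nodes; the data, the four-way partitioning, and the thresholds here are illustrative stand-ins.

    import array
    import random
    from multiprocessing import Pool, shared_memory

    def analyze_slice(task):
        """Compute a partial result over one slice of the data already sitting in memory."""
        shm_name, start, stop, threshold = task
        shm = shared_memory.SharedMemory(name=shm_name)
        try:
            values = array.array("d")
            values.frombytes(bytes(shm.buf[start * 8 : stop * 8]))  # 8 bytes per double
            hits = [v for v in values if v > threshold]
            return len(hits), sum(hits)
        finally:
            shm.close()

    if __name__ == "__main__":
        n = 1_000_000
        rng = random.Random(7)
        data = array.array("d", (rng.gauss(100.0, 15.0) for _ in range(n)))
        nbytes = len(data) * data.itemsize

        # Load the data into shared memory once; every question after this is answered from RAM.
        shm = shared_memory.SharedMemory(create=True, size=nbytes)
        shm.buf[:nbytes] = data.tobytes()
        bounds = [(i * n // 4, (i + 1) * n // 4) for i in range(4)]

        try:
            with Pool(processes=4) as pool:
                for threshold in (110.0, 120.0, 130.0):
                    tasks = [(shm.name, lo, hi, threshold) for lo, hi in bounds]
                    partials = pool.map(analyze_slice, tasks)
                    count = sum(c for c, _ in partials)
                    mean = (sum(s for _, s in partials) / count) if count else 0.0
                    print(f"values > {threshold}: {count} rows, mean {mean:.2f}")
        finally:
            shm.close()
            shm.unlink()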

The software is optimized for distributed, multithreaded architectures and scalable processing, so you can run new scenarios or complex analytical computations blazingly fast. You can instantly explore and visualize data and tackle problems you never before considered due to computing constraints.

Using these breakthrough speed advantages, analysts can “crank the wheel” many more times – creating more models, running more scenarios, and analyzing far larger data sets. Instead of waiting days or weeks to run a complex analysis, you can use the power of high-performance analytics – driven by grid computing, in-database processing, and in-memory processing – to perform multiple sophisticated analyses that were previously not possible.

The impact of high-performance analytics

The impact of these complementary technologies is three-fold.

  • The IT Impact:  IT can now provide never-before-seen service levels to both the analyst community and the lines of business.
  • The Analyst Impact:  Analytics is brought to the forefront of the organization, enabling better (and faster) insights.
  • The Line of Business Impact:  Decision-makers have faster access to a more comprehensive and more accurate set of facts and forecasts they can use to steer the business and drive value.

The bottom line is that high-performance analytics allows organizations to:

  • Process BIGGER DATA
  • Derive BETTER ANALYTICS
  • Generate FASTER DECISIONS

You can hear more from me and other "big data" experts in this special 32-page report on high-performance analytics.


 


About Author

Gary Spakes

Enterprise Architecture

As the lead for SAS' Americas Enterprise Architecture Practice, Gary focuses on both the technical aspects and the business implications of the SAS High Performance Analytics platform. An IT veteran, Gary leverages his technology background to help organizations alleviate immediate pains without compromising long-term initiatives.
