Big data defined: It's more than Hadoop


Volume, Variety, Velocity – how many times have you heard that lately? The “3 Vs” are commonly used by vendors and analysts to describe big data.

  • Forrester extends this by introducing “Variability” as a fourth V, which addresses the need to design for agility, because things will come along tomorrow that you can’t anticipate today.
  • Gartner looks at Velocity a little differently: rather than the speed of data coming at you, it’s the change in velocity. In many cases, big data doesn’t arrive in a consistent fashion – it comes in peaks and troughs, and you need to plan for that variability or you will end up with capacity that is either wasted or overwhelmed.

Other definitions focus on volume, citing specific terabyte or petabyte counts, or on the observation that, regardless of technology advances, the ability to store and process data will eventually be overwhelmed.

SAS leverages a fairly simple definition…

Big data: when the volume, velocity, and variety of data exceed an organization’s storage or compute capacity for accurate and timely decision-making.

Big data is a relative term. Every organization has a tipping point, and most will eventually reach the point where volume, variety, and velocity become something they have to address. More importantly, every organization has an opportunity to leverage big data to its advantage – to drive accurate and timely decisions that can materially impact its business or organizational goals. As one of my colleagues puts it, “it’s the value, stupid.” At SAS, we tie that value to analytics: Big Data Analytics. Big data presents an opportunity for every organization – it’s not just for large multinationals, it’s not just about Hadoop, and it’s not one size fits all.

So why is the definition important? The definition can orient you correctly when you start thinking about requirements and design. It can provide a starting point for the big data discussion. Things to consider:

  • “It’s the value, stupid” – although it’s tempting in technical discussions to focus on size, the focus should be on business value. Identify the business challenge or goal: do you need to leverage weblog and social media data to analyze customer churn? Do you need to strengthen your fraud analysis approach by mining clickstream and other forms of content? Focus on the business value to align your technical and solution approach.
  • “Think big” – from a design perspective, think about the big picture. You certainly don’t need a big bang approach to implementation, but leverage enterprise architecture principles so you don’t box yourself into a corner.
  • “It’s not just about the elephant” – Hadoop is getting an amazing amount of attention, and it’s a great technology that could play a big part in your solution, but your big data strategy should not be defined by Hadoop; rather, Hadoop should be one more tool in your big data arsenal.
  • “Analytics is the key” – in most cases, we think about leveraging information management technologies like data integration and data quality to prepare data for analytics. Although this is certainly important, the big game changer is applying analytics across the entire big data process: using analytics up front to determine what to do with big data, which data is relevant, and how or whether data should be stored (a minimal sketch of this idea follows below). This just scratches the surface, so we’ll explore the topic in future blog posts.
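
To make the “analytics up front” idea concrete, here is a minimal, hypothetical Python sketch (not a SAS product example): each incoming clickstream event is scored for relevance before anything is stored, and only events that clear a threshold are kept in full detail. The ClickEvent fields, the scoring logic, and the threshold are all illustrative assumptions, standing in for whatever analytic model and business rules your organization would actually use.

```python
# Hypothetical sketch: apply analytics up front to decide, per record,
# whether the raw event is worth storing or should only feed aggregates.
# All fields, rules, and thresholds below are illustrative placeholders.

from dataclasses import dataclass


@dataclass
class ClickEvent:
    user_id: str
    url: str
    dwell_seconds: float


def relevance_score(event: ClickEvent) -> float:
    """Toy stand-in for an analytic model that rates how useful an
    event is for downstream churn or fraud analysis."""
    score = min(event.dwell_seconds / 60.0, 1.0)  # longer dwell -> more signal
    if "checkout" in event.url or "cancel" in event.url:
        score += 0.5  # pages plausibly tied to churn or fraud signals
    return score


def route(event: ClickEvent, threshold: float = 0.4) -> str:
    """Decide up front what to do with the event."""
    if relevance_score(event) >= threshold:
        return "store"            # keep the full record for analytics
    return "aggregate_only"       # fold into summary counts, drop raw detail


if __name__ == "__main__":
    events = [
        ClickEvent("u1", "/home", 2.0),
        ClickEvent("u2", "/checkout/cancel", 45.0),
        ClickEvent("u3", "/blog/post", 80.0),
    ]
    for e in events:
        print(e.user_id, e.url, "->", route(e))
```

The point is not the toy scoring rule, but where the decision sits: the analytics run before storage, so the pipeline keeps detail only where it carries value instead of landing everything first and sorting it out later.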

What’s your definition of big data?


About Author

Mark Troester

IT / CIO Thought Leader & Strategist

Mark Troester is the IT / CIO Thought Leader & Strategist for SAS. He oversees the company’s market strategy efforts for information management and for the overall CIO and IT vision. He began his career in IT and has worked in product management and product marketing for a number of Silicon Valley start-ups and established software companies. Twitter @mtroester
