Big data defined: It's more than Hadoop

Volume, Variety, Velocity – how many times have you heard that lately? Vendors and analysts commonly use the “3 V’s” to describe big data.

  • Forrester extends this by introducing “Variability” as a 4th V: you need to design for agility because things will come along tomorrow that you can’t anticipate.
  • Gartner looks at Velocity a little differently: instead of the velocity of data coming at you, it’s more the change in velocity. In many cases, big data doesn’t arrive at a consistent rate. You have peaks and troughs, and you need to plan for this variability or you will end up with either wasted or overwhelmed capacity.
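
To make the peaks-and-troughs point concrete, here is a minimal sketch of the arithmetic. The hourly volumes and the `capacity_gap` helper are hypothetical, invented purely for illustration, and the model is deliberately simplified (it ignores any backlog carried from one hour to the next):

```python
def capacity_gap(arrivals, capacity):
    """Return (wasted, overflow) for a fixed per-interval capacity.

    wasted   = capacity left idle during troughs
    overflow = data that exceeded capacity during peaks
    Simplification: overflow is not carried over to the next interval.
    """
    wasted = sum(max(capacity - a, 0) for a in arrivals)
    overflow = sum(max(a - capacity, 0) for a in arrivals)
    return wasted, overflow


# Hypothetical hourly ingest volumes in GB, with a peak in the middle.
hourly_gb = [20, 25, 30, 120, 200, 40, 25, 20]

average = sum(hourly_gb) // len(hourly_gb)  # provision for the average hour
peak = max(hourly_gb)                       # provision for the busiest hour

for label, capacity in (("average", average), ("peak", peak)):
    wasted, overflow = capacity_gap(hourly_gb, capacity)
    print(f"{label}: capacity={capacity} GB/h, "
          f"wasted={wasted} GB, overflow={overflow} GB")
```

With these made-up numbers, sizing for the average hour leaves the peaks unserved, while sizing for the peak leaves most of the capacity idle. That is the trade-off the velocity discussion above is pointing at.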

Other definitions focus on volume, either citing specific terabyte or petabyte counts or arguing that, regardless of technology advances, the ability to store and process big data will eventually be overwhelmed.

SAS uses a fairly simple definition:

Big data: When the volume, velocity and variety of data exceed an organization’s storage or compute capacity for accurate and timely decision-making.

Big data is a relative term. Every organization has a tipping point, and most will reach a point where volume, variety and velocity become something they have to address. More importantly, every organization has an opportunity to use big data to its advantage: to drive accurate and timely decisions that can materially impact business or organizational goals. As one of my colleagues puts it, “it’s the value, stupid”. At SAS, we tie that value to analytics: Big Data Analytics. We realize that big data presents an opportunity for every organization. It’s not just for large multinationals, it’s not just about Hadoop, and it’s not “one size fits all”.

So why is the definition important? The definition can orient you correctly when you start thinking about requirements and design. It can provide a starting point for the big data discussion. Things to consider:

  • “It’s the value, stupid” – although size makes for an interesting technical discussion, the focus should be on business value. Identify the business challenge or goal. Do you need to use weblog and social media data to analyze customer churn? Do you need to strengthen your fraud analysis by mining clickstream and other forms of content? Focus on the business value so that your technical and solution approach aligns with it.
  • “Think big” – from a design perspective, think about the big picture. You certainly don’t need to take a big bang approach to implementation, but apply enterprise architecture principles so that you don’t box yourself in.
  • “It’s not just about the elephant” – Hadoop is getting an amazing amount of attention, and it’s a great technology that could play a big part in your solution. Still, Hadoop should not define your big data strategy; it should be one more tool in your big data arsenal.
  • “Analytics is the key” – in most cases, we think about using information management technologies like data integration and data quality to prepare data for analytics. Although this is certainly important, the big game changer is applying analytics to the entire big data process: using analytics up front to determine what to do with big data, which data is relevant, how or whether data should be stored, and so on. This just scratches the surface, so we’ll explore this topic in future blog posts.

What’s your definition of big data?

tags: analytics, big data, data storage, data volumes, hadoop, Mark Troester
