Big data: What the hellabyte?

Andrew McAfee
Andrew McAfee talking about Yottabytes of data

According to entrepreneur Gilad Elbaz, “The world is one big data problem.” And as a big data researcher and writer, Andrew McAfee is staking his career on it.

McAfee spoke to a crowd of business and industry executives at The Premier Business Leadership Series in Orlando, FL, to put the topic of big data into perspective:

How much data are we talking about?

According to McAfee, the metric system is about to become insufficient to describe the amount of data in the world. In 1979 a company named Teradata was founded – branded to emphasize its ability to help companies deal with unimaginably (for the time) large amounts of data. That prefix, tera, worked for 29 years until 2008 when, according to Wired, we hit the Petabyte age, followed four years later by the proclamation of the Exabyte revolution. That lasted for about six months until Cisco, in analyzing router traffic, pronounced that we will reach the Zettabyte era by the end of 2015. We’re about to run out of prefixes. Next is Yotta -- and after that we're going to have to make up some new ones.
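To put those prefixes on one scale, here is a minimal Python sketch. It assumes the decimal (power-of-ten) meaning of each prefix; the names `SI_PREFIXES` and `describe_bytes` are made up for this illustration and do not come from McAfee's talk.

```python
# Decimal prefixes named in the talk, as powers of ten (assumed SI meaning).
SI_PREFIXES = [
    ("tera", 12),
    ("peta", 15),
    ("exa", 18),
    ("zetta", 21),
    ("yotta", 24),
]


def describe_bytes(n_bytes: int) -> tuple[float, str]:
    """Express a byte count using the largest listed prefix that fits.

    Hypothetical helper for illustration only. Counts below one terabyte
    are returned unscaled, in plain bytes.
    """
    if n_bytes < 0:
        raise ValueError("byte count must be non-negative")
    name, power = "", 0
    for prefix, exponent in SI_PREFIXES:
        if n_bytes >= 10 ** exponent:
            name, power = prefix, exponent
    return n_bytes / 10 ** power, f"{name}byte"


if __name__ == "__main__":
    for count in (5 * 10 ** 12, 10 ** 21, 10 ** 27):
        value, unit = describe_bytes(count)
        print(f"{count:.0e} bytes = {value:g} {unit}(s)")
```

Each step up the list is a factor of 1,000, so a zettabyte is a billion terabytes. Anything past yotta simply piles up as a bigger number in front of "yottabyte," which is the point of the joke about running out of prefixes.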

The point is, we've always had data -- what's new is the amount of data we now have.  And it’s becoming clear: No corner of the business world is going to be untouched by big data. Here are a few examples of problems that big data is helping solve:

Forecasting: We would all like to be able to predict demand, future staffing needs, how our markets will evolve, and more. For example, the National Association of Realtors works hard to predict housing prices using a number of traditional factors and models. But what happens if we try something completely new? Researchers Erik Brynjolfsson and Lynn Wu created a model to predict housing price changes that takes into account more recent and much larger amounts of data…and their model ended up being 23.6% more accurate than the NAR’s analysis.

Talent management: Google's the king of data and the data-driven approach, right? Well, early on, job interviews at Google consisted of brain teasers like, how many golf balls can you fit on a bus? They hired based on how well the interviewee answered the brain teasers. After doing this for years, they decided to analyze how well the brainiacs performed on the job. It turned out that there was no correlation between succeeding in the brain teaser interview and job performance. After that analysis, they decided to take a quantitative approach to hiring.

Troubleshooting: During the cholera outbreak after Haiti's massive earthquake in 2010, it became critical to figure out the root cause of the outbreak. Two streams of data became important: Twitter and genomic data. Twitter provided faster and more accurate detection of the start of the epidemic than government reports. There was a hypothesis that the source was a camp of Nepalese UN workers who came to help. A group of researchers used genome sequencing to study the prevalent strain of cholera, and it was an exact match for the strain of cholera in Nepal, proving the validity of the Twitter data, and ending the controversy.

Challenges ahead?

What will the future challenges be? What will the roadblocks be as we try to become a big data organization? McAfee discusses two in particular:

Technical challenges, skillsets, and cultural change: Academia is failing to produce enough data scientists. In the realm of cultural change, McAfee offers a warning: Beware of HiPPOs. Most companies make their decisions by deferring to the Highest Paid Person’s Opinion. The results of an analysis are presented to the leader, who then still makes a decision based on gut feeling rather than on the data. It is a prevalent approach that is broken and needs to change.

HiPPOs vs. geeks
As a self-identified “geek,” McAfee says these data stewards “go where the data leads them.” HiPPOs, on the other hand, lead and decide based on instinct and experience.

The wine industry, as an example, is HiPPO-dominated.  Professor Orley Ashenfelter, a Princeton economist, devised a mathematical formula for predicting the quality of red wine vintages in France. He based his ratings on nothing but calculations of winter rain, harvest rain, and the average growing season temperature -- no tasting involved.  The wine industry laughed when he published his results. But Ashenfelter predicted the great vintages of the century … and he was “astonishingly accurate.”  The HiPPOs of the wine industry still ridicule Ashenfelter's methods, but they use his data.

So, if Cisco is right and we’ve already reached the Zettabyte era, and the Yottabyte era is within sight, what’s after that? McAfee’s tongue-in-cheek prediction: The Hellabyte?

Visit the Business Analytics Knowledge Exchange to learn about calculating the value of big data.

About Author

Anne-Lindsay Beall

Senior Editor

Anne-Lindsay Beall is a writer and editor for SAS. Since joining the company in 2000, Anne-Lindsay has edited print publications, Web sites, customer success stories, blogs and digital publications. She has a bachelor’s degree in English from the University of North Carolina at Chapel Hill and a master’s degree in English from North Carolina State University. You can find her on LinkedIn at: www.linkedin.com/in/annelindsaybeall
