This is the time of year when we like to make predictions about the upcoming year. Although I am optimistic about the potential of predictive analytics in the era of big data, I am also realistic about the nature of predictability regardless of how much data is used.
For example, in his book Too Big to Know, David Weinberger explained how “Thomas Jefferson and George Washington recorded daily weather observations, but they didn’t record them hourly or by the minute. Not only did they have other things to do, such data didn’t seem useful. Even after the invention of the telegraph enabled the centralization of weather data, the 150 volunteers who received weather instruments from the Smithsonian Institution in 1849 still reported only once a day.”
Nowadays there is, as Weinberger continued, “a literally immeasurable, continuous stream of climate data from satellites circling the earth, buoys bobbing in the ocean, and Wi-Fi-enabled sensors in the rain forest. We are measuring temperatures, rainfall, wind speeds, carbon dioxide levels, and pressure pulses of solar wind.”
Has all of this additional data, and our analysis of it, allowed us to reliably predict the weather?
No, of course not. But why? Does meteorological data suffer from data quality issues? No, the completeness and accuracy (and many other quality dimensions) of this data are astounding. So, is meteorological data not being delivered fast enough to support real-time, data-driven decisions about weather forecasting? No, in fact, the velocity of this data is about as real-time as real-time gets.
So, it must be a decision-quality problem then, right? In other words, meteorologists must not know how to make high-quality decisions using all of that real-time high-quality meteorological data. Well, as much as we all like to complain about the ineptness of our local weather forecasters, meteorologists are actually well-trained, competent scientists performing numerical weather prediction using computer simulations built on complex mathematical models.
“Models this complex,” Weinberger explained, “often fail us, because the world is more complex than our models can capture. But sometimes they can predict accurately how the system will behave. At their most complex these are sciences of emergence and complexity, studying properties of systems that cannot be seen by looking only at the parts, and cannot be well predicted except by looking at what happens.”
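One reason even excellent models fail is sensitive dependence on initial conditions, which was famously discovered in weather simulation itself. As a toy illustration (my sketch, not anything from Weinberger), the Lorenz 1963 convection model shows how a one-in-a-million measurement error grows until two forecasts of the “same” atmosphere no longer resemble each other:

```python
# Lorenz's 1963 convection model: a tiny toy weather simulation that
# demonstrates why forecast accuracy decays -- small errors in the starting
# conditions grow until the prediction is useless.

def lorenz_step(x, y, z, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system by one Euler step."""
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def simulate(state, steps):
    """Run the model forward a given number of steps."""
    for _ in range(steps):
        state = lorenz_step(*state)
    return state

a = (1.0, 1.0, 1.0)        # the "true" atmosphere
b = (1.0, 1.0, 1.000001)   # the same atmosphere, measured with a tiny error

for steps in (100, 1000, 3000):
    sa, sb = simulate(a, steps), simulate(b, steps)
    error = max(abs(p - q) for p, q in zip(sa, sb))
    print(f"after {steps:4d} steps, forecast divergence = {error:.6f}")
```

Early on the two runs agree to several decimal places; run them long enough and they diverge completely, which is why adding more data improves short-range forecasts far more than long-range ones.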
This reminded me of an old statistics joke about mistaking correlation for causation, which says that the best predictor of whether it will rain is a spike in the sale of umbrellas. It’s a joke because, obviously, people buy more umbrellas when it’s already raining, so umbrella sales do not forecast rain.
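The joke can be made concrete with a few lines of code and some made-up numbers (entirely hypothetical data, invented for this sketch): umbrella sales correlate almost perfectly with rain on the same day, yet tell you next to nothing about rain tomorrow.

```python
# Hypothetical toy data (invented for illustration): mm of rain per day and
# umbrellas sold per day over ten days.
rain  = [0, 0, 5, 8, 0, 2, 9, 0, 1, 7]
sales = [1, 1, 40, 55, 2, 12, 60, 1, 6, 48]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

same_day = pearson(rain, sales)           # sales vs. today's rain
next_day = pearson(rain[1:], sales[:-1])  # tomorrow's rain vs. today's sales

print(f"correlation with today's rain:    {same_day:.2f}")
print(f"correlation with tomorrow's rain: {next_day:.2f}")
```

The same-day correlation is strong because sales are a *consequence* of rain; shift the comparison forward a single day, and the “predictor” falls apart.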
Even though the business world will forever remain as predictable as the weather, predictive analytics can help your organization make more sense of big data and more reliably forecast your business performance this year and beyond — just as long as you resolve not to forget that a forecast isn’t a fact and prediction isn’t clairvoyance.