The Big Data Theory


In 1964, when the American radio astronomers Arno Penzias and Robert Wilson were setting up a new radio telescope at AT&T Bell Labs, they pointed it towards deep space, expecting to hear silence they could use to calibrate their equipment. Instead, they heard a persistent noise, a seemingly meaningless background static that they initially mistook for a sign that their telescope was faulty and in need of repair.

For almost a year, they operated on this assumption. At one point, they wondered whether the cause of the static might be the excessive amount of pigeon poop accumulating on the telescope. But even after spending a month meticulously cleaning it, when they pointed the telescope towards deep space, they once again heard the same persistent noise. (At which point, although it is not part of the official scientific record, I like to imagine that much stronger language than “poop” was uttered.)

However, after analyzing what they initially thought was the crappiest possible data produced by a broken telescope, they challenged their own assumptions. By doing so, they discovered that it was actually data of the highest possible quality, and that, in a classic example of mistaking signal for noise, it revealed one of the greatest scientific breakthroughs of twentieth-century physics.

Arno Penzias and Robert Wilson won the 1978 Nobel Prize in Physics for discovering what’s now known as cosmic microwave background radiation. In other words, in the big data raining down from Big Sky, they managed to hear the remnants of the Big Bang. Penzias and Wilson helped the Big Bang Theory defeat its primary rival, the Steady State Theory, as the prevailing scientific model of the universe.

Nowadays, in the era of big data, there is what we could call the Big Data Theory, which is challenging steady state theories that have been the bedrock of the status quo within the data management industry for decades.

Although I don’t doubt the theoretical potential of big data, I remain only cautiously optimistic about big data becoming the prevailing data model of the business universe. After all, when performing analysis on a data set of any size, it’s hard to determine whether what you’ve discovered is a meaningful business insight or a data quality issue.

The reason I like the Penzias and Wilson story so much is that it illustrates that while big data will deliver more signals, not just more noise, we won’t always be able to tell the difference. It also exemplifies how an insight can be resisted when a big data set contradicts the preconceptions of the people performing the analysis.

Even though big data analytics will reveal wonders, I can’t help but wonder how often the tepid response to it will be: “Yeah, well, that might be what big data shows. But it’s just a theory.”


About the Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. An independent consultant, speaker, and freelance writer, he is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.
