The data sharpshooter fallacy


In his book You Are Not So Smart, which I am at least smart enough to highly recommend, David McRaney uses the Texas Sharpshooter Fallacy to explain the human tendency to see patterns in randomness where none actually exist.

“The fallacy gets its name from imagining a cowboy shooting at a barn. Over time, the side of the barn becomes riddled with holes. In some places there are lots of them, in others there are few. If the cowboy later paints a bull’s-eye over a spot where his bullet holes clustered together, it looks like he is pretty good with a gun. By painting a bull’s-eye over a cluster of bullet holes, the cowboy places artificial order over natural random chance.”

A similar issue sometimes arises during data-driven decision making. It could be called the Data Sharpshooter Fallacy: analysts overlay meaning onto a random cluster of data points and thereby convince themselves that they have discovered a seemingly magical business insight. Here, instead of a painted bull’s-eye, it is the human eye that is naturally drawn to any cluster of data points.
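To make the barn concrete, here is a minimal Python sketch that scatters shots uniformly at random over a wall divided into a grid and counts the holes in each cell. The function name, the 10-by-10 grid, the 200 shots, and the seed are all assumptions chosen for illustration, not anything taken from McRaney's book.

```python
import random
from collections import Counter


def bullet_holes_per_cell(n_shots=200, grid=10, seed=1):
    """Fire n_shots uniformly at random at a unit-square wall.

    The wall is split into grid x grid cells (a hypothetical layout);
    the result maps each cell that was hit to its number of holes.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(n_shots):
        x, y = rng.random(), rng.random()  # each value lies in [0, 1)
        cell = (int(x * grid), int(y * grid))
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    counts = bullet_holes_per_cell()
    average = sum(counts.values()) / (10 * 10)
    busiest_cell, busiest_count = counts.most_common(1)[0]
    print(f"average holes per cell: {average:.1f}")
    print(f"busiest cell {busiest_cell}: {busiest_count} holes")
```

No cell is aimed at, yet some cells typically collect several times the average number of holes while others stay empty. Painting the bull’s-eye on the busiest cell after the fact is exactly the cowboy's trick.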

“If you have a human brain,” McRaney explains, “you do this all of the time. Looking at the factors from a distance, you can accept the reality of random chance. You are lulled by the signal. You forget about the noise. With meaning, you overlook randomness, but meaning is a human construction. Picking out clusters of coincidence is a predictable malfunction of normal human logic.”

Perhaps the most predictable malfunction of normal human logic is when clusters of coincidence coincide with what we were looking to find in the data — with the bull’s-eye we drew with our mind’s eye before we started looking at any data.

This happens quite frequently when we are confronted with the challenging signal-to-noise ratio in big data. If we look at enough data, then we can usually find data that supports our preconceptions. However, whether we have found signal or noise still requires a data quality assessment, as well as an honest assessment of where — and when — we painted the data analytical bull’s-eye.
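The claim that looking at enough data will usually turn up support for a preconception can be sketched with a small simulation. The Python below is an illustration under stated assumptions: the target and every candidate metric are pure Gaussian noise, and the function names, the 30 observations, and the metric counts are hypothetical choices, not figures from any real data set.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_noise_correlation(n_metrics, n_obs=30, seed=0):
    """Return the largest |correlation| between a random target and
    n_metrics random candidate metrics. Everything here is noise."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_metrics):
        metric = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(metric, target)))
    return best


if __name__ == "__main__":
    for n_metrics in (1, 10, 100, 1000):
        best = strongest_noise_correlation(n_metrics)
        print(f"{n_metrics:>5} noise metrics -> strongest |r| = {best:.2f}")
```

The more metrics are searched, the stronger the best apparent relationship becomes, even though none of them carries any signal. One hedge against this is to decide where the bull’s-eye goes before looking, then check any finding against data that was not used to discover it.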


About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.
