Lean against bias for accurate analytics


We sometimes describe the potential of big data analytics as letting the data tell its story, casting the data scientist as storyteller. While the journalist has long been a newscaster, in recent years the term data-driven journalism has been adopted to describe the process of using big data analytics to create a news story.

One of the concerns that Daniel Kahneman expressed during a recent interview about journalism is that its stories are too good. “The stories are oversimplified and exaggerate the coherence of the information. This is something that comes naturally for journalists to do but that’s also what the reading public demands. On both sides of this, there’s an eagerness to produce stories that are coherent and to hear stories that are coherent. So the stories are simpler than reality and in some ways better than the true stories.”

This two-sided challenge also exists within companies. Analytical teams are eager to produce data-driven stories that are coherent, and business leaders are eager to hear data-driven stories that are coherent. The distillation of complexity that occurs with big data analytics means that data-driven stories are always simpler than reality.

Whether stories are driven by fact or fiction is open to the interpretation and biases of both the storyteller and the audience. “Quite often,” Kahneman explained, “what makes sense are the things you don’t even know. You speculate, you fill in, this is the way to make sense of it. They sound like facts but in fact they’re guesses. In our own thinking, it is not always easy to distinguish between knowing facts from observation and what we guess is a fact. The boundaries between guesses and truth, and guesses and facts, are blurry. Trying to show that boundary and show that distinction, that’s the way you safeguard against over-interpretation or biased interpretation.”

The gap between what we read in a news story and what we read into it, and between what data shows us and what we see in it, makes accuracy equally important and equally challenging for journalism and analytics.

As Kahneman explained, “what we mean by accuracy, or conveying accurate information, is leaning against people’s biases, which is probably not what comes naturally when you have an audience. Your natural tendency is to lean toward the biases of your audience as opposed to leaning against them, but that’s where the issue of accuracy comes in. You are going to be more accurate and produce more accuracy by leaning against the biases.”

Kahneman acknowledged that there is a strong temptation to tell people what they want to hear, not only in terms of opinion but even in terms of facts. The facts and ideas that are hardest for people to assimilate are the ones they don’t want to hear or understand because they don’t fit their conception of the world. “Leaning against biases,” Kahneman concluded, “means drawing people’s attention to facts that they normally would not pay attention to or might not even want to pay attention to.”

Leaning against bias is a call to action not only for accurate journalism, but also for accurate analytics.


About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.
