Cargo cult data science

Last week, Phil Simon blogged about being wary of snake oil salesmen who claim to be data scientists. In this post, I want to explore a related concept: being wary of thinking that you are performing data science simply by mimicking what data scientists do.

The American theoretical physicist Richard Feynman coined the term cargo cult science to refer to practices that have the semblance of being scientific, but do not in fact follow the scientific method.

As Feynman described his analogy, “in the South Seas there is a cult of people.  During the war they saw airplanes land with lots of materials, and they want the same thing to happen now.  So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he’s the controller—and they wait for the airplanes to land.  They’re doing everything right.  The form is perfect.  But it doesn’t work.  No airplanes land.  So I call these things Cargo Cult Science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.”

With all the hype and hullabaloo about big data and data science, people in your organization have probably read a lot of blog posts, and maybe even a few books. But that doesn’t mean that you can stir up a bunch of data, throw some statistics at it, add some data analysis, sprinkle in some data visualizations, call it data science, and expect the planes to land carrying a cargo of business insights.

“We’ve learned from experience that the truth will come out,” Feynman cautioned. “Other experimenters will repeat your experiment and find out whether you were wrong or right.  Nature’s phenomena will agree or they’ll disagree with your theory.  And, although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven’t tried to be very careful in this kind of work.  And it’s this type of integrity, this kind of care not to fool yourself, that is missing to a large extent in much of the research in Cargo Cult Science.”

Just as Simon cautioned against hiring fake data scientists, I want to caution against performing fake data science, which, with a nod to Feynman, should be called Cargo Cult Data Science.

About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.

2 Comments

  1. Jim - I love the analogy to landing the plane. It's right on target...if the plane does try to land, disaster can follow. The same is very true about conducting fake data science - only in this case disaster WILL follow.
