Last week, Phil Simon blogged about being wary of snake oil salesmen who claim to be data scientists. In this post, I want to explore a related concept, namely being wary of thinking that you are performing data science by mimicking what data scientists do.
The American theoretical physicist Richard Feynman coined the term cargo cult science to refer to practices that have the semblance of being scientific, but do not in fact follow the scientific method.
As Feynman described his analogy, “in the South Seas there is a cult of people. During the war they saw airplanes land with lots of materials, and they want the same thing to happen now. So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he’s the controller—and they wait for the airplanes to land. They’re doing everything right. The form is perfect. But it doesn’t work. No airplanes land. So I call these things Cargo Cult Science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.”
With all the hype and hullabaloo about big data and data science, your organization has probably read a lot of blog posts, and maybe even a few books. But that doesn't mean that you can stir up a bunch of data, throw some statistics at it, add some data analysis, sprinkle in some data visualizations, and then call it data science and expect the planes to land carrying a cargo of business insights.
“We’ve learned from experience that the truth will come out,” Feynman cautioned. “Other experimenters will repeat your experiment and find out whether you were wrong or right. Nature’s phenomena will agree or they’ll disagree with your theory. And, although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven’t tried to be very careful in this kind of work. And it’s this type of integrity, this kind of care not to fool yourself, that is missing to a large extent in much of the research in Cargo Cult Science.”
Just as Simon cautioned against hiring fake data scientists, I want to caution against performing fake data science, which, with a nod to Feynman, should be called Cargo Cult Data Science.