Analytics, statistics, operations research, data science and machine learning - with which term do you prefer to associate? Are you from the House of Capulet or Montague, or do you even care? Shakespeare's Juliet derides excessive identification with names in the famous play, Romeo and Juliet.
"What's in a name? That which we call a rose
By any other name would smell as sweet."
Romeo was from the house of Montague, Juliet from the house of Capulet, and this distinction meant that their families were sworn enemies. The play is a tragedy because by the end the two lovers are dead as a result of this long-running feud. Statistics, data science and machine learning are but a few of the "houses" that feud today over names, and while to my knowledge no deaths have resulted from this debate, the competing camps have nearly come to blows.
"Operations Research? Management Science? Analytics? What’s in a brand name? How has the emerging field of Analytics impacted the Operations Research Profession? Is Analytics part of OR or the other way around? Is it good, bad, relevant, a nuisance or an opportunity for the OR profession? Is OR just Prescriptive or is it something more? In this panel discussion, we will explore these topics in a session with some of the leading thinkers in both OR and Analytics. Be sure to attend to have your questions answered on these highly complementary and valuable fields.” This was the abstract for a panel at the INFORMS Annual Meeting last year that included two past presidents of INFORMS and other long-time members. To many long-time members of INFORMS this abstract was provocative indeed, because not all embrace of the term "analytics" to describe what they do.
Interestingly, the American Statistical Association (ASA) is host to a very similar debate. Last fall the ASA released a statement on the Role of Statistics in Data Science. There’s a whole wing of data science practitioners who are downright hostile to statisticians. This camp makes assertions like "sampling is obsolete," due to computational advances for processing big data. Popular blogger Vincent Granville has even said "Data Science without statistics is possible, even desirable," extending the obsolescence to statisticians themselves. INFORMS member and self-described statistical data scientist Randy Bartlett, who is both a Certified Analytics Professional (the CAP certification offered by INFORMS) and an Accredited Professional Statistician (the PSTAT certification from ASA), has written about this statistics denial in an excellent series of blog posts he publishes from his LinkedIn page. In the face of such direct attacks on their profession, it is no wonder the ASA felt a need to take a stance.
Many statisticians assert "Aren't We Data Science?," as Marie Davidian (professor of statistics at NC State University) did in 2013 in an article published during her tenure as president of the ASA. More recently David Donoho (professor of statistics at Stanford University) makes a similar but complex argument in a long-form piece, "50 Years of Data Science," which he released last fall (after a presentation on it at the Tukey Centennial workshop). Donoho is equally dismayed at much of the current data science movement. As he puts it, "The statistics profession is caught at a confusing moment: the activities which preoccupied it over centuries are now in the limelight, but those activities are claimed to be bright shiny new, and carried out by (although not actually invented by) upstarts and strangers." Donoho points out the harm in the huge oversight of the contributions of statistics while also exhorting academic statisticians to expand beyond a narrow focus on theoretical statistics and "fancy-seeming methods." He proposes a definition of data science, based on people who are "learning from data," drawing upon a remarkably prescient article John Tukey published more than 50 years ago, "The future of data analysis," in The Annals of Mathematical Statistics. In making his case, Donoho also points to subsequent essays by John Chambers (of Bell Labs and co-developer of the S language), William Cleveland (also of Bell Labs and arguably the one who coined the term data science in 2001), and Leo Breiman (of the University of California at Berkeley). These gentlemen together argue for addressing a wider portion of the lifecycle of analysis, such as the preparation of data as well as its presentation, and the importance of prediction (and not just inference). I heartily recommend reading Donoho's excellent analysis.
While I haven’t seen a similar assault on operations research, there are those within the OR/MS community who see terms like analytics as a threat to the survival of OR. Several years ago, when INFORMS began exploring whether to embrace the term analytics, researchers surveyed the INFORMS membership and published their results in an article in Interfaces entitled "INFORMS and the Analytics Movement: The View of the Membership." Member views of the relationship between OR and analytics were roughly divided into three camps: OR is a subset of analytics, analytics is a subset of OR, and analytics is the intersection of OR and analytics. While the numbers may have shifted, I doubt they have yet converged into a clear definition or consensus among members. There will always be naysayers - 6% of the membership surveyed at the time thought there was no relationship between OR and analytics, and for that matter there are statisticians who see no association with data science or analytics at all. Today INFORMS embraces the term analytics, describing itself as "the largest society in the world for professionals in the field of operations research (O.R.), management science, and analytics." By adding the word analytics, instead of replacing operations research, INFORMS has shown that this is not an either/or question - it values its roots while acknowledging a present that includes those who describe their work as "analytics."
Trying to wrangle a clear definition of analytics, statistics, data science, machine learning, and even operations research can be as messy as cleaning up a typical data set! These are distinct disciplines, related but not the same. Increasingly analytics is used synonymously with data science, which is derived in large part from statistics, which also is a foundation for machine learning, which relies upon optimization techniques, which I’d argue are part of analytics. These days I observe increased cross-fertilization, like my operations research peers clamoring to attend machine learning conferences and machine learning talks that are standing-room-only at the Allied Social Sciences Annual Meeting (where the economists gather).
We can parse terminology all day, but instead we should invest our energy in the opportunity at hand and drive towards increased adoption. Some of these buzzy terms get people’s attention and provide an incredible opportunity to use mathematically-based methods to make a significant impact. Last year my friend Jack Levis accepted my invitation to give a keynote at an analytics conference SAS hosted. He spoke about ORION, the OR project he leads at UPS that Tom Davenport has said is "arguably the world's largest operations research project in the world." While few of the conference attendees likely understood in great detail the operations research methods employed, all were amazed at the impact his team has had on saving miles, time, and money for UPS, happily tweeting their excitement during his talk. No doubt this impact is why Jack's team won the 2016 INFORMS Edelman Competition, which some call "the Super Bowl of OR."
The important advances in research presented at conferences like the Joint Statistical Meetings and the INFORMS Annual Meeting pave the way for progress that enables success in practice at places like UPS. We need academics and other researchers to continue to invest in advancing the unique approaches of their disciplines. Most INFORMS members would call what Jack's team does operations research. Most attendees at the conference where Jack spoke probably thought of it as analytics. Does it really matter what we call it, if people value what was done, want to share the story, and pave the path for adoption through their enthusiasm? If it leads to the expansion of OR, I don’t care if this application of operations research is referred to popularly as analytics, because after all, in the words of the bard, "a rose by any other name would smell as sweet." Each of the historic disciplines has a chance to have not only a bigger slice of the pie but a bigger slice of a bigger pie if analytics is embraced at large. I understand the value and pride in disciplines like operations research and statistics, and their contributions to data science and machine learning, which are the knowledge base analytics draws upon. We don't have to throw out older terms to embrace new ones. The houses of Capulet and Montague can celebrate their unique heritage but put an end to their feuds. This is not an either/or proposition, but I do believe it is our opportunity to squander. Instead of feuding, let us learn more, put that learning into practice, and make the world a better, smarter place.