With the 2015 World Series currently underway, there couldn't be a more exciting time to discuss how analytics is transforming the game of baseball. It comes as no surprise that managers and front office executives, not to mention the players, are continuously looking for a competitive advantage over their opponents. Over the years, those competitive advantages have come in various forms, but as Mark McGwire once said to the House Committee on Government Reform back on March 17, 2005, "I'm not here to talk about the past."
I'm here to talk about the future. The future of analytics in baseball is now, and for some teams analytics is playing an increasingly important role in how strategic decisions are made, both on and off the field.
But this shouldn't come as too big of a surprise, right? The idea of incorporating analytics in baseball is far from an original idea. Bill James, often considered the godfather of modern statistical baseball research, has published over two dozen books on the subject dating back to as early as 1977. The Society for American Baseball Research (SABR), an organization dedicated to scientific and historical research on baseball, has been going strong since 1971. In fact, people who use analytics and mathematical tools to analyze the game of baseball have been dubbed "sabermetricians."
You may have read Michael Lewis' book, Moneyball, which highlights how the 2002 Oakland Athletics used analytics to build a playoff-caliber baseball team on a microscopic budget. The book gained so much popularity that it's since been made into a movie, and despite my initial reservations about Brad Pitt and Jonah Hill garnering the leading roles, it was a major hit. (For the record, the movie has earned 94 percent on the Tomatometer on RottenTomatoes.com.)
The fact is, even the tiniest insight into a certain player's patterns and tendencies can lead to a huge competitive advantage on the field. I know all too well the implications it can have, albeit on a smaller scale. As a college starting pitcher, I faced off one Saturday afternoon in the middle of our season against Bridgewater College, a small college in Virginia that we had every intention of annihilating. I was pulled after two and one-third innings, having just given up my eighth run of the game. I was dumbfounded. They had lit me up like a pinball machine. When we shook hands with the opposing team after the game, their coach pulled me aside and shared a very important piece of information.
It turns out that when I was on the pitching rubber, if I was about to throw a fastball, my hand began down by my side. However, if I was about to throw an offspeed pitch, my hand began in my glove. It quickly dawned on me that every batter I faced that afternoon knew exactly what pitch I was about to throw!
Now take that example to scale. If one passing observation by an opposing coach can have that big of an impact on the outcome of a game, how big of an impact can analytics have when incorporating years of historical baseball data on every player across all 30 Major League Baseball teams?
One place we're seeing the effect of analytics right now is in the form of "the shift."
The shift is a defensive realignment that moves players from their original positions in the infield to areas in the infield where the current batter is most likely to hit the ball. Four infielders take part in the shift: the shortstop and third baseman, who normally line up to the pitcher's right side, and the second and first basemen, who normally line up to the pitcher's left side. For example, take a left-handed batter who is most likely to pull the baseball (or hit the ball to the side from which he hits). Knowing this, the defense can slide the shortstop over to the pitcher's left side to help fill the gap between first and second. Now more players are covering the area where the current batter is most likely to hit the baseball.
This poses two questions:
- Does this actually work or does it just sound good on paper?
- If it does actually work, how well does it work?
Last season it was estimated that over 400 net hits were saved by all Major League Baseball teams as a result of deploying the shift. The Houston Astros led the league in net hits saved with 44. So the answer is pretty clear: the shift does work in practice, and for teams that can accurately predict where a player is most likely to hit the baseball, it works very well. Therein lies a competitive advantage for teams that can put big data and analytics to work.
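To make the idea of "predicting where a player is most likely to hit the baseball" a little more concrete, here is a minimal Python sketch of how a shift decision could be driven by data. Everything in it is an assumption for illustration: the `BattedBall` record, the spray-angle sign convention, the 75 percent threshold, the 20-ball minimum sample and the sample data are all invented, and real clubs almost certainly use far richer models.

```python
from dataclasses import dataclass


@dataclass
class BattedBall:
    # Hypothetical convention: negative angles are toward the third-base side,
    # positive angles are toward the first-base side.
    spray_angle_deg: float
    ground_ball: bool


def pull_rate(balls, bats):
    """Share of a batter's ground balls hit to his pull side.

    bats is "L" or "R". A left-handed batter pulls toward the first-base
    side, a right-handed batter toward the third-base side.
    """
    grounders = [b for b in balls if b.ground_ball]
    if not grounders:
        return 0.0
    if bats == "L":
        pulled = [b for b in grounders if b.spray_angle_deg > 0]
    else:
        pulled = [b for b in grounders if b.spray_angle_deg < 0]
    return len(pulled) / len(grounders)


def should_shift(balls, bats, threshold=0.75, min_sample=20):
    """Recommend a shift only with enough ground balls and a high pull rate.

    The threshold and min_sample defaults are illustrative guesses.
    """
    grounder_count = sum(1 for b in balls if b.ground_ball)
    return grounder_count >= min_sample and pull_rate(balls, bats) >= threshold


# Invented data for a left-handed batter: 18 pulled grounders,
# 4 grounders the other way, and 5 fly balls that are ignored.
sample = (
    [BattedBall(20.0, True)] * 18
    + [BattedBall(-15.0, True)] * 4
    + [BattedBall(-30.0, False)] * 5
)

print(should_shift(sample, "L"))
```

The minimum-sample check is there because a pull rate computed from a handful of at-bats says very little; the more historical data a team has on a hitter, the more it can trust the recommendation.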
The shift is just the tip of the iceberg in terms of how analytics is transforming the way baseball is being played. With the abundance of tools and data available today, teams with the ability to harness the power of analytics to extract valuable insights will have a distinct competitive advantage over their competitors.
Watch for more sports analytics posts from me soon. In the meantime, you can learn more about sports analytics in The New Science of Winning research paper, or uncover one of the advantages the Mets currently have on their side.
12 Comments
For those of you who are interested in learning more about analytics and MLB, here's a webcast featuring the director of decision sciences for the Houston Astros: http://www.jmp.com/en_us/events/ondemand/analytically-speaking/analytically-speaking-sig-mejdal.html
I never knew MLB used analytics this way. It is very interesting. I wonder how much data is collected for each player.
Hi, Shirley. This post discusses a different sport, but it gives you a good idea of just how much data can be collected on an individual athlete:
http://blogs.sas.com/content/sascom/2015/07/17/what-do-an-elite-rower-and-a-leading-data-scientist-have-in-common/
I wouldn't really call this an example of "big" data, but I'd guess that analytics are still vastly underutilized in professional sports.
The evaluation question at the end is interesting - how do we really know analytics are working? Not just in baseball, but everywhere.
It is really exciting, seeing how analytics has transformed baseball. In terms of the two questions raised; that is, does this actually work or does it just sound good on paper? And if it does actually work, how well does it work? My question is how can we access the facts to support the findings to these questions?
I've always loved how stats are used in sports; I remember my grandfather, who never had schooling past Grade 2, sitting on the floor with numerous notebooks tracking hockey, soccer and baseball - players, teams, weather, stadiums. If you sat there with the notebooks and were able to follow his logic, it was literally like being there. He even had a "Predictions" notebook, and was convinced the Jays were going to win the World Series both years well before anyone else. It was awesome.
I graduated with a B.Math degree in statistics. Three of my classmates graduated with the same degree that year. One of them was going to become a sports statistician.
Here I was thinking that analytics in sports was more about the front office, ticket sales and retail optimization problems. Of course, analyzing games must be, generally speaking, a very old practice, much older than the 1970s.
I think it might be interesting to see how baseball data is captured. If you used broadcast cameras, you would have big data in the form of millions of pixels, which you would then classify into various plays or player movements like the 'shift'. How much did the player shift, and at exactly what time?
Interesting.
I think you'll really enjoy the article below. It talks about the PITCHf/x system and how its success has led to the HITf/x and FIELDf/x systems to help measure these things on the most granular level.
http://www.hardballtimes.com/wp-content/images/tht/pitchfx_guide.bt.pdf
What can sports analytics do for a more random sport like hockey?
Hi, John. The Analytics in Sports research paper linked above included interviews with NHL teams. You may want to download it for details: http://www.sas.com/en_us/whitepapers/iia-analytics-in-sports-106993.html
It's really interesting to see how much analytics is starting to take off in professional sports. I would say baseball has the longest history of analytics and has led the way, but in the past year or two other sports have really started to incorporate analytics, not only from an internal team decision-making standpoint but also in terms of how the media presents the games.