In Defense of Outliers

If outliers could scream, would we be so cavalier about removing them from our history, and excluding them from our statistical forecasting models? Well, maybe we would – if they screamed all the time, and for no good reason. (This sentiment is adapted from my favorite of the many Deep Thoughts by Jack Handey.)

It is, therefore, in the holiday spirit of peace, love, and understanding – and the avoidance of a common worst practice – that I offer my defense of outliers.


In the practice of business forecasting, we often encounter historical data that contain outliers – data values that are unusually large or small, or that fall well above or below what we would expect for a given time period. The easiest (and most common) thing to do is just filter the outliers (remove them from your data) and ignore them. Aren't they just annoyances that make it harder to construct a good model of the history?

Removing or adjusting the outliers lets you fit a simpler and more aesthetically pleasing model to the time series data. This is the “principle of parsimony” at work. The model, based on smoother data, won’t propagate the crazy spikes and troughs, and you end up with a nicer, smoother view of the future. The future, in fact, starts to look pretty well-behaved and predictable, which is the way we like it! However, the gratuitous masking of outliers can have an ugly downside.
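
To make the cleaning step concrete, here is a minimal sketch of one common approach. The monthly demand series, the median/MAD rule, the 3.5 threshold, and the choice to replace flagged values with the median are all illustrative assumptions on my part, not anything prescribed in this post or in any particular forecasting package.

```python
import numpy as np

# Invented monthly demand history with two obvious spikes (promotions, perhaps)
history = np.array([100, 104, 98, 102, 250, 99, 101, 103, 97, 300, 102, 100],
                   dtype=float)

# Robust "modified z-score": distance from the median, scaled by the MAD,
# so the spikes themselves don't inflate the yardstick used to judge them
median = np.median(history)
mad = np.median(np.abs(history - median))
modified_z = 0.6745 * (history - median) / mad
outliers = np.abs(modified_z) > 3.5   # common rule-of-thumb threshold

# "Cleaned" series: replace the flagged points with the median of the rest
cleaned = history.copy()
cleaned[outliers] = np.median(history[~outliers])

print("flagged periods:", np.where(outliers)[0])
print("cleaned series: ", cleaned)
```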

Unusual and annoying things have happened in the past. Unusual and annoying things will probably happen again in the future. When we ignore the outliers in our historical data, we are actually ignoring a very important source of information on how ill-behaved the world can really be.

There can be merit in removing or adjusting outliers in order to create a better-behaved (and more appropriate) model of the future. However, there is no merit in ignoring the additional risk and uncertainty that outliers scream out to us when we do forecasting. Ignoring outliers can be a very dangerous practice, leading to excessive (and unjustified) confidence in our predictions.
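
That overconfidence is easy to see in a toy comparison. The sketch below again uses an invented demand series, with a naive mean plus-or-minus two standard deviations interval standing in for a real forecast model: the interval quoted from the cleaned history comes out dramatically narrower than the one quoted from the raw history.

```python
import numpy as np

# Invented demand history: mostly stable, with two large shocks
raw = np.array([100, 104, 98, 102, 250, 99, 101, 103, 97, 300, 102, 100],
               dtype=float)

# A cleaned version in which the shocks have been replaced by a typical value
cleaned = raw.copy()
cleaned[raw > 150] = 100.5

# Naive "mean +/- 2 sigma" interval for the next period, from each history
for label, series in [("raw", raw), ("cleaned", cleaned)]:
    center = series.mean()
    halfwidth = 2 * series.std(ddof=1)
    print(f"{label:8s}: [{center - halfwidth:6.1f}, {center + halfwidth:6.1f}] "
          f"(width {2 * halfwidth:6.1f})")
```

A real forecasting model would construct its intervals differently, but the direction of the effect is the same: whatever risk the outliers represented vanishes from the cleaned history, and with it from the forecast.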

Whatever method you use to handle the outliers in your data, remain aware that extreme data points have happened before, and they will almost certainly happen again. Don't become overconfident in your forecasts – you never know when they will go terribly wrong.

Final Reflections on 2010

As we come to the end of another year, I'd like to thank Constance Korol of the Institute of Business Forecasting for the invitation to post my reflections on "What We Learned About Forecasting in 2010" on the IBF blog.

About Author

Mike Gilliland

Product Marketing Manager

Michael Gilliland is a longtime business forecasting practitioner and formerly a Product Marketing Manager for SAS Forecasting. He is on the Board of Directors of the International Institute of Forecasters, and is Associate Editor of their practitioner journal Foresight: The International Journal of Applied Forecasting. Mike is author of The Business Forecasting Deal (Wiley, 2010) and former editor of the free e-book Forecasting with SAS: Special Collection (SAS Press, 2020). He is principal editor of Business Forecasting: Practical Problems and Solutions (Wiley, 2015) and Business Forecasting: The Emerging Role of Artificial Intelligence and Machine Learning (Wiley, 2021). In 2017 Mike received the Institute of Business Forecasting's Lifetime Achievement Award. In 2021 his paper "FVA: A Reality Check on Forecasting Practices" was inducted into the Foresight Hall of Fame. Mike initiated The Business Forecasting Deal blog in 2009 to help expose the seamy underbelly of forecasting practice, and to provide practical solutions to its most vexing problems.
