If outliers could scream, would we be so cavalier about removing them from our history, and excluding them from our statistical forecasting models? Well, maybe we would – if they screamed all the time, and for no good reason. (This sentiment is adapted from my favorite of the many Deep Thoughts by Jack Handey.)
It is, therefore, in the holiday spirit of peace, love, and understanding – and the avoidance of a common worst practice – that I offer my defense of outliers.
In the practice of business forecasting, we often encounter historical data that contain outliers – data values that are unusually large or small, or that fall well above or below what we would expect for a given time period. The easiest (and most common) thing to do is just filter the outliers (remove them from your data) and ignore them. Aren't they just annoyances that make it harder to construct a good model of the history?
Removing or adjusting the outliers lets you fit a simpler and more aesthetically pleasing model to the time series data. This is the “principle of parsimony” at work. The model, based on smoother data, won’t propagate the crazy spikes and troughs, and you end up with a nicer, smoother view of the future. The future, in fact, starts to look pretty well-behaved and predictable, which is the way we like it! However, the gratuitous masking of outliers can have an ugly downside.
Unusual and annoying things have happened in the past. Unusual and annoying things will probably happen again in the future. When we ignore the outliers in our historical data, we are actually ignoring a very important source of information on how ill-behaved the world can really be.
There can be merit in removing or adjusting outliers in order to create a better-behaving (and more appropriate) model of the future. However, there is no merit in ignoring the additional risk and uncertainty that outliers scream out to us when we do forecasting. Ignoring outliers can be a very dangerous practice, leading to excessive (and unjustified) confidence in our predictions.
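To make the overconfidence risk concrete, here is a minimal sketch (my own illustration, not from the original post) using made-up demand data with a single spike. It compares a naive 95% prediction interval for a flat (mean) forecast before and after a crude outlier filter; the filter name, threshold, and data are all hypothetical:

```python
import statistics

# Hypothetical monthly demand history with one outlier (a spike in period 8).
history = [100, 98, 103, 101, 99, 102, 97, 180, 100, 101, 99, 102]

def prediction_interval(data, z=1.96):
    """Naive 95% prediction interval around a flat (mean) forecast."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return (mean - z * sd, mean + z * sd)

# Interval computed from the full history, outlier included.
with_outlier = prediction_interval(history)

# Crude outlier filter: drop anything above an arbitrary threshold.
filtered = [x for x in history if x < 150]
without_outlier = prediction_interval(filtered)

width_with = with_outlier[1] - with_outlier[0]
width_without = without_outlier[1] - without_outlier[0]

# The filtered interval is far narrower: the model now "believes" the
# world is much better behaved than the history actually shows.
print(f"interval width with outlier:    {width_with:.1f}")
print(f"interval width without outlier: {width_without:.1f}")
```

The point of the sketch is not the specific numbers but the shape of the mistake: filtering the spike shrinks the estimated variability several-fold, so the resulting forecast interval no longer covers the kind of extreme value that has already happened once.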
Whatever method you use to handle the outliers in your data, remain aware that extreme data points have happened before, and they will almost certainly happen again. Don’t get overconfident in your forecasts – you never know when they will go terribly wrong.
Final Reflections on 2010
As we come to the end of another year, I'd like to thank Constance Korol of the Institute of Business Forecasting for the invitation to post my reflections on "What We Learned About Forecasting in 2010" on the IBF blog.