My esteemed colleague and recently published author Jared Dean shared some thoughts on how ensemble models help make better predictions. For predictive modeling, Jared explains the value of the two main forms of ensembles: bagging and boosting. It should not be surprising that the idea of combining predictions from more than one model can also be applied to other analytical domains, such as statistical forecasting.
Forecast combinations, also called ensemble forecasting, are the subject of many papers in statistics and forecasting journals; they are a well-established technique for improving forecast accuracy and reducing the variability of the resulting forecasts. In their article “The M3 Competition: Results, Conclusions, and Implications,” published in the International Journal of Forecasting, Spyros Makridakis and Michèle Hibon report the results of a forecasting competition and state as one of their four conclusions: “The accuracy of the combination of various methods outperforms, on average, the specific methods being combined and does well in comparison with other methods.”
The lesson from this statement is that combining forecasts from simple models can add substantial value by enhancing the quality of the forecasts produced. At the same time, the statement concedes that a combination might not always outperform a suitably crafted individual model.
But how do you combine statistical forecasts? As with ensembles for predictive models, the basic idea is to combine the forecasts created by individual models, such as exponential smoothing or ARIMA models. Let’s look at three commonly used combination techniques (a short code sketch illustrating all three follows the list):
- Simple average: every forecast is given equal weight. While this sounds simplistic, practitioners have found it to be very successful, in particular when the individual forecasts differ substantially from one another.
- Ordinary least squares (OLS) weights: an OLS regression is used to combine the individual forecasts, with the aim of assigning higher weights to the more accurate forecasts.
- Restricted least squares weights: extends the idea of OLS weights by imposing constraints on the individual weights. For example, it might make sense to force all weights to be non-negative.
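To make the three schemes concrete, here is a minimal sketch in Python using NumPy and SciPy. Everything in it is an illustrative assumption: the toy series, the stand-in “ESM” and “ARIMA” forecasts, and the choice of scipy.optimize.nnls to implement the non-negativity restriction. An intercept term, often included in OLS combination regressions, is omitted for brevity.

```python
# Minimal sketch of three forecast combination schemes.
# All data here is synthetic and purely illustrative.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(42)

# Toy setup: an actual series plus forecasts from two hypothetical models
# (stand-ins for, say, an exponential smoothing model and an ARIMA model).
n = 60
actual = np.sin(np.linspace(0, 6, n)) + rng.normal(0, 0.3, n)
fcst_esm = actual + rng.normal(0.1, 0.4, n)      # stand-in "ESM" forecast
fcst_arima = actual + rng.normal(-0.05, 0.5, n)  # stand-in "ARIMA" forecast
F = np.column_stack([fcst_esm, fcst_arima])      # one column per forecast

# 1) Simple average: every forecast gets the same weight.
combo_avg = F.mean(axis=1)

# 2) OLS weights: regress the actuals on the individual forecasts,
#    so the more accurate forecast tends to receive the larger weight.
w_ols, *_ = np.linalg.lstsq(F, actual, rcond=None)
combo_ols = F @ w_ols

# 3) Restricted least squares: the same regression, but with the
#    weights constrained to be non-negative (via non-negative LS).
w_rls, _ = nnls(F, actual)
combo_rls = F @ w_rls

def rmse(f):
    return np.sqrt(np.mean((actual - f) ** 2))

for name, f in [("ESM", fcst_esm), ("ARIMA", fcst_arima),
                ("Average", combo_avg), ("OLS", combo_ols),
                ("Restricted LS", combo_rls)]:
    print(f"{name:>14}: RMSE = {rmse(f):.3f}")
```

In practice the combination weights would be estimated on a holdout sample and then applied to future forecasts, rather than fit and evaluated on the same data as in this toy example.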
It is worth noting that estimating the prediction error variance of a combined forecast is a separate consideration. In each case, the prediction error variance of the combination is derived from the prediction error variance estimates of the individual forecasts being combined.
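As a sketch of why this matters (the notation here is mine, not the article’s): if the combined forecast is a weighted sum of the individual forecasts, its error variance depends on the full error covariance structure, not just the individual variances:

$$\hat{y}_c = \sum_i w_i \hat{y}_i, \qquad \operatorname{Var}(e_c) = \mathbf{w}^\top \boldsymbol{\Sigma}\, \mathbf{w} = \sum_i w_i^2 \sigma_i^2 + 2\sum_{i<j} w_i w_j \sigma_{ij},$$

where $\sigma_i^2$ is the estimated prediction error variance of forecast $i$ and $\sigma_{ij}$ is the error covariance between forecasts $i$ and $j$. Only under the simplifying assumption of uncorrelated errors does the cross-term vanish.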
Not every time series forecast benefits from combination. The power of this technique becomes apparent when you consider that modern software such as SAS® Forecast Server can apply combination methods to large-scale forecasting of hierarchically structured time series data. The software can generate combinations and include them in its model selection process in an automated fashion; in all cases, a combined forecast must prove its worth by its performance against the other candidate forecasts in that selection process. If you are interested in more details, this paper provides an extended explanation.