About four weeks ago I attended the ISF forecasting conference in Thessaloniki for the first time. It was a great conference with a strong focus on forecasting methods, and this year one of the hot topics was how to use machine learning in forecasting. A key question was whether machine learning can improve forecast accuracy.
Some of you might have heard about the M4 forecasting competition. For those who have not, you can find information about it here. During the conference I learned a lot about the outcome of the latest M4 competition. In a nutshell: in the first three M competitions, simple forecasting methods did well. In the M4 competition, more sophisticated methods combining machine learning with traditional time series methods improved forecast accuracy.
A closer look at M4
Looking at the outcomes of the competition in more detail there were two surprises:
- A hybrid approach combining statistical and machine learning features was nearly 10 percent more accurate than the benchmark combination of traditional time series models.
- Pure machine learning models performed poorly in this competition.
A different experience with machine learning
What we have seen with our customer base at SAS, however, sends a different message.
By applying pure machine learning models such as gradient boosting or artificial neural networks, we have been able to beat traditional time series models, in some cases seeing double-digit accuracy improvements.
With such great results, we had to ask ourselves: why do machine learning models seem to work better in practical forecasting exercises than they did in the M4 competition?
The difference maker
One of our key conclusions is that whenever we fed causal factors such as promotion plans or event data into the machine learning models, we saw a decent improvement in accuracy. Where datasets consisted only of historical time series/shipment data, the machine learning models usually could not beat the traditional time series forecasting approach. (Note that SAS has long included causal factors and event handling in our time series forecasting software, such as SAS® Forecast Server and SAS® Visual Forecasting.)
A closer look at the dataset used in the M4 competition shows that it did not include any causal factors such as price or promotion data.
It seems that the value of using pure machine learning models in forecasting rises with the amount of covariate and causal information those models can leverage.
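To make that point concrete, here is a minimal, hypothetical sketch (not SAS code, and not the M4 setup) using scikit-learn's gradient boosting on simulated demand data. The simulated promotion flag, uplift size, and noise level are all assumptions chosen for illustration: the same model is fit once with lagged history only, and once with the promotion flag added as a causal feature.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)

# Simulated weekly demand: base level + promotion uplift + noise
# (all values are illustrative assumptions, not real shipment data)
n = 120
promo = rng.integers(0, 2, size=n)                      # causal factor: promo on/off
demand = 100 + 30 * promo + rng.normal(0, 5, size=n)    # promo lifts demand by ~30 units

# Build supervised samples: predict this week from the two previous weeks
lag1, lag2 = demand[1:-1], demand[:-2]
y = demand[2:]
X_history = np.column_stack([lag1, lag2])               # history-only features
X_causal = np.column_stack([lag1, lag2, promo[2:]])     # history + causal factor

split = 90  # simple holdout: last weeks are the test period

def holdout_mae(X):
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[:split], y[:split])
    pred = model.predict(X[split:])
    return float(np.mean(np.abs(pred - y[split:])))

mae_history = holdout_mae(X_history)
mae_causal = holdout_mae(X_causal)
print(f"MAE, lags only:         {mae_history:.1f}")
print(f"MAE, lags + promo flag: {mae_causal:.1f}")
```

Because the lags alone carry no information about whether a promotion runs in the week being forecast, the history-only model misses the uplift, while the model given the promo flag captures it, which mirrors the pattern described above.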
Machine learning techniques show promise in helping improve statistical forecast accuracy, and we should all keep a close eye on the latest developments in machine learning for forecasting. Incorporating covariates and causal effects into the M5 forecasting competition would be a logical next step.
If you would like to learn more about the M4 forecasting competition, a recommended read is the forthcoming Q4 issue of the International Journal of Forecasting, which will be devoted to M4 results, analysis, and commentary. For more information about the International Journal of Forecasting, please visit www.forecasters.org