Emails – it’s hard to imagine that there was a time when people worked without them, at least in our industry. When I started at SAS in 1996, the business email I received on a typical day was almost less than the spam that makes it through our spam filter now. Today things are different, so we have to deal with new challenges – for example, my email system telling me that "your mailbox is almost full."
In a way this reminds me of a discussion I had not long ago with Jack, an analyst at a European retailer. He is in charge of implementing SAS Forecast Server with two major goals in mind:
- Improve on-shelf availability of their products (to serve their consumers better).
- Reduce over-stocking (which can lead to unwanted mark-downs and wastage).
A question that seemed to keep Jack awake at night was: when is a forecast accurate? Jack expected that a good forecasting model should be at least 95 percent accurate (if not 100 percent) when measured against actual values. In fact, he complained that our forecasting engine did not achieve these accuracy levels for all products, especially for so-called slow-movers, i.e. items that don't sell frequently. In my experience this is a very common misperception in statistical forecasting projects.
When I asked how they were currently creating those forecasts, he told me that their existing system is just “horrible and completely off most of the time”. Well, here is the catch: when assessing the performance of your statistical forecasting engine, you should not just compare against actual values – you should also compare against the current way of creating forecasts (which could be a "judgmental" approach, like taking last year's number and adding 10 percent). Example: the actual value might be 100 units. Suppose we forecast 75 units (which is 25 units too low), while the current system forecast 50 units (which is 50 units too low). Isn’t that already an improvement? I think so. Jack kept insisting: but what about the 25 units we under-forecast?
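To make the comparison concrete, here is a minimal sketch in Python using the numbers from the example above. It simply measures both forecasts against the actual value and reports how much of the incumbent's error the new forecast removes (the variable names are my own, purely illustrative):

```python
# Judge a new forecast against the incumbent process,
# not only against a fixed accuracy target.
actual = 100
new_forecast = 75      # the statistical engine's forecast
current_forecast = 50  # the existing "judgmental" forecast

new_error = abs(actual - new_forecast)          # 25 units too low
current_error = abs(actual - current_forecast)  # 50 units too low

# Share of the incumbent's absolute error eliminated by the new forecast.
improvement = 100 * (current_error - new_error) / current_error
print(f"Absolute error reduced by {improvement:.0f}%")
```

The new forecast is still 25 units off, but it halves the error of the current process, which is the relevant yardstick.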
In statistical forecasting, we have to accept that the future values we are trying to predict are subject to random disturbances, often called white noise. Unfortunately, this white noise is not predictable, no matter how hard we try. What can be predicted in many instances is an underlying pattern, often called the signal. This is different from the mailbox example I mentioned earlier: because the total amount of storage is fixed and known, a statement like "full" makes sense. In a statistical forecasting project we are estimating a future value that is completely unknown to us and subject to randomness (i.e. not fixed). If I knew the final outcome beforehand, why would I forecast in the first place?
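A small simulation makes this tangible. The sketch below (illustrative only; the seasonal pattern and noise level are invented for the example) builds a demand series as signal plus white noise, then scores a hypothetical *perfect* forecast that recovers the signal exactly. Even this best-possible forecast misses every observation, because the noise remains:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

weeks = 52
# Known underlying pattern: a simple 4-week demand cycle around 100 units.
signal = [100 + 20 * ((week % 4) - 1.5) for week in range(weeks)]
# Unpredictable white noise on top of the signal.
noise = [random.gauss(0, 10) for _ in range(weeks)]
actuals = [s + n for s, n in zip(signal, noise)]

# A "perfect" model recovers the signal exactly -- the best any method can do.
perfect_forecast = signal

errors = [abs(a - f) for a, f in zip(actuals, perfect_forecast)]
mae = sum(errors) / len(errors)
mean_demand = sum(actuals) / len(actuals)
accuracy = 100 * (1 - mae / mean_demand)
print(f"MAE of the perfect forecast: {mae:.1f} units")
print(f"Implied accuracy: {accuracy:.1f}%")
```

With these (made-up) noise levels, even the perfect forecast lands well short of 100 percent accuracy, which is exactly why a fixed 95-percent target can be unrealistic, particularly for noisy slow-movers.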
My colleague Mike Gilliland, author of The Business Forecasting Deal blog, wrote:
> We should not have too high expectations for forecast accuracy, and should not expend heroic efforts trying to achieve unrealistic levels of accuracy. Instead, by accepting the reality that forecast accuracy is ultimately limited by the nature of what we are trying to forecast, we can instead focus on the efficiency of our forecasting processes... for business forecasting, the objective should be: To generate forecasts as accurate and unbiased as can reasonably be expected – and to do this as efficiently as possible.
I would like to add that setting up a forecasting process that also helps us assess and measure our ability to predict the future is a value in itself. It helps identify areas where we have to buffer against the risk of inaccurate predictions.