Come on Irene: Time series cross-validation

The hurricane didn't get me, but Monday night's thunderstorm sure played a dirty trick. After leaving my car windows cracked open overnight, I drove to work Tuesday morning feeling a little soggier by the minute. Upon arrival at SAS, I was aghast to find the seat of my pants was soaked (very visibly so).

What to do next?

My first reaction was to cut and run, i.e., drive back home and change into some dry clothes. I hadn't been in a situation this dire since '93, when I came to work at Oscar Mayer Foods wearing one black shoe and one brown shoe. (Back then I had no choice but to return home for a change -- you know how poorly the meat packing industry takes to sartorial faux pas.) At least both shoes were mine.

Thankfully, my oversized European Man-Purse saved the day. Thinking quickly, I adjusted the straps until it hung low behind me, fully outing the damned spot from view. I then cantered to my office, seemingly unobserved, and spent the next 30 minutes squatting over an air vent to dry.

Time Series Cross-Validation

Tuesday turned out a little better than it started, when my colleague Udo Sglavo pointed me to a Research Tips blog post about cross-validation for time series by Rob J Hyndman. Time series cross-validation is a method for forecast evaluation "with a rolling origin," analogous to leave-one-out cross-validation for cross-sectional data. Last week Hyndman blogged about implementing time series cross-validation in R. Udo wrote:

Currently SAS Forecast Server does not provide access to cross-validation by default, as it implements a more classical methodology for hold-out sampling, which is designed for large-scale automatic forecasting challenges.

A holdout sample is a subset of the actual time series used for model selection. For each candidate model specified, the holdout sample is excluded from the initial model fit and forecasts are made within the holdout sample time range. Then, for each candidate model, the specified statistic of fit is computed by using only the observations in the holdout sample. Finally, the candidate model that performs best in the holdout sample, based on this statistic, is selected to forecast the actual time series.
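
The holdout procedure described above can be sketched in a few lines of Python. This is only an illustration and not SAS Forecast Server code: the two toy candidate models, the mean absolute error statistic, the made-up series, and every function name are assumptions chosen to keep the example short.

```python
from statistics import mean

def naive_forecast(train, horizon):
    """Toy candidate: repeat the last observed value."""
    return [train[-1]] * horizon

def mean_forecast(train, horizon):
    """Toy candidate: repeat the mean of the fit range."""
    return [mean(train)] * horizon

def mae(actual, predicted):
    """Mean absolute error, used here as the statistic of fit."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def select_by_holdout(series, candidates, holdout_size):
    # Exclude the holdout sample from the fit range.
    train, holdout = series[:-holdout_size], series[-holdout_size:]
    # Score each candidate on the holdout observations only.
    scores = {
        name: mae(holdout, model(train, holdout_size))
        for name, model in candidates.items()
    }
    # The candidate with the smallest error is selected.
    best = min(scores, key=scores.get)
    return best, scores

series = [10, 12, 13, 15, 16, 18, 19, 21]  # made-up data
candidates = {"naive": naive_forecast, "mean": mean_forecast}
best, scores = select_by_holdout(series, candidates, holdout_size=2)
print(best, scores)
```

In practice the winning model would then be refit on the full series before producing the final forecast.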

The cross-validation approach described by Hyndman can be implemented by directly accessing the batch engine of SAS Forecast Server, called SAS High-Performance Forecasting. As certain default functionality of this forecasting engine needs to be “turned off,” some SAS coding is required to replicate the example provided by Hyndman. (I can provide the full code upon request to udo.sglavo@sas.com.) Also, SAS High-Performance Forecasting does not provide access to Hyndman’s ETS method; instead a multiplicative Winters’ method is used (for the sake of the example).
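
For readers who want to see the rolling-origin idea itself, here is a minimal Python sketch. It is neither Hyndman's R code nor the SAS implementation. A naive last-value forecast stands in for ETS or Winters' method, and the function names, the made-up series, and the `min_train` parameter are all assumptions for illustration.

```python
from statistics import mean

def naive_forecast(train, horizon):
    """Stand-in model: repeat the last observed value."""
    return [train[-1]] * horizon

def rolling_origin_errors(series, model, min_train, horizon=1):
    """Absolute h-step-ahead errors, one per forecast origin."""
    errors = []
    # Move the forecast origin forward one observation at a time.
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]            # data up to the origin
        forecast = model(train, horizon)   # forecast from the origin
        actual = series[origin + horizon - 1]
        errors.append(abs(actual - forecast[-1]))
    return errors

series = [10, 12, 13, 15, 16, 18, 19, 21]  # made-up data
errors = rolling_origin_errors(series, naive_forecast, min_train=3)
print(errors, mean(errors))
```

Unlike a single holdout split, every observation after the first `min_train` points serves once as a test point, and the errors are averaged across all origins.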

Over the next two installments of The BFD, Udo will guest-blog his solution to conducting cross-validation using SAS Forecast Server.
