Rolling statistics in SAS/IML

2

Last week I showed how to use PROC EXPAND to compute moving averages and other rolling statistics in SAS. Unfortunately, PROC EXPAND is part of SAS/ETS software and not every SAS site has a license for SAS/ETS. For simple moving averages, you can write a DATA step program, as discussed in previous post. However, for complex rolling statistics, the SAS/IML language, which enables you to easily access previous observations (and even future ones!), is a more powerful tool for computing rolling statistics.

This article shows how to implement various rolling statistics in SAS/IML. To keep the explanations and programs simple, the functions assume that there are no missing values in the data. The article "What is a moving average" explains the mathematical formulas used in this post.

A simple moving average function

The key to computing most rolling statistics is to define a rolling window of observations. At each time point, you extract the observations in the rolling window and use them to compute the statistic. You then move on to the next time point and repeat the computation. You might need to perform special computations at the beginning of the time series.

The following SAS/IML program implements a simple moving average. The arguments to the MA function are a column vector, Y, of time series values and a scalar value, k, which indicates the number of values to use for each moving average computation. The program loops over elements in Y. For each element, the program computes the mean of the current element and previous k-1 values.

proc iml;
/* Simple moving average of k data values.
   First k-1 values are assigned the mean of all previous values.
   Inputs:     y     column vector of length N >= k
               k     number of data points to use to construct each average
*/
start MA(y, k);
   MA = j(nrow(y), 1, .);
   do i = 1 to nrow(y);
      idx = max(1,(i-k+1)):i;   /* rolling window of data points */
      MA[i] = mean( y[idx] );   /* compute average */
   end;
   return ( MA );
finish;

The first k-1 values require special handling because these values have fewer than k-1 prior observations to average. You could handle these special values by using a separate loop. However, I chose to use the expression max(1, (i-k+1)) to select the first element for the rolling mean computation. When i is less than k, this expression returns 1 for the first element, and the program computes the mean of the first i values. Otherwise, this expression returns i minus k-1 (which is i-k+1) for the first element, and the program computes the mean of k values.

The most important part of this computation is enumerating the time points to use in the computation (for example, idx = (i-k+1):i;) followed by extracting the associated data (for example, y[idx]). With these two expressions, you can compute any rolling statistic. For example, by changing the function call from MEAN to STD, you can compute a rolling standard deviation. The rolling median, rolling minimum, and rolling maximum are also trivial to implement. By changing the time points, you can compute rolling statistics for centered windows. If your data contain several variables, you can compute a rolling correlation.

A weighted moving average function

The following function computes a weighted moving average. The arguments to the WMA function are a column data vector, Y, and a vector of weights that has k elements. For each time point, wk (the last weight) is the weight for current data value, wk-1 is for the previous data value, and so forth. The function internally standardizes the weights so that they sum to unity. (This ordering was chosen so that the WMA function uses the same syntax as PROC EXPAND.) This function handles the first few values in a separate loop:

/* Weighted moving average of k data values.
   First k-1 values are assigned the weighted mean of all preceding values.
   Inputs:     y     column vector of length N >= k
               wt    column vector of weights. w[k] is weight for most recent 
                      data; wt[1] is for most distant data value.  The function 
                     internally standardizes the weights so that sum(wt)=1.
   Example call: WMA  = WMA(y, 1:5);
*/
start WMA(y, wt);
   w = colvec(wt) / sum(wt);       /* standardize weights so that sum(w)=1 */
   k = nrow(w);
   MA = j(nrow(y), 1, .);
   /* handle first k values separately */
   do i = 1 to k-1;
      wIdx = k-i+1:k;                 /* index for previous i weights */
      tIdx = 1:i;                     /* index for previous i data values */
      MA[i] = sum(wt[wIdx]#y[tIdx]) / sum(wt[wIdx]);  /* weighted average */
   end;
   /* main computation: average of current and previous k-1 data values */
   do i = k to nrow(y);
      idx = (i-k+1):i;               /* rolling window of k data points */
      MA[i] = sum( w#y[idx] );       /* weighted sum of k data values */
   end;
   return ( MA );
finish;

Notice that the function requires computing a weighted mean, which is described in a previous article.

An exponentially weighted moving average function

An exponentially weighted moving average is defined recursively. The average at time t is a weighted average of the data point at time t and the average from time t-1. The relative weights are determined by the smoothing parameter, α. The following function implements that definition:

/* Exponentially weighted moving average (EWMA) with smoothing parameter alpha.
   REF: http://www.sascommunity.org/sugi/SUGI90/Sugi-90-76%20Brocklebank.pdf
        https://en.wikipedia.org/wiki/Exponential_smoothing
   Inputs:      y     column vector of length N
                alpha scalar value 0 < alpha < 1
*/
start EWMA(y, alpha);
   MA = j(nrow(y), 1, .);
   MA[1] = y[1];              /* initialize first value of smoother */
   do i = 2 to nrow(y);
      MA[i] = alpha*y[i] + (1-alpha)*MA[i-1];
   end;
   return ( MA );
finish;

The three moving average functions are now defined. You can read the time series data into a vector and call the functions. If necessary, you can write the rolling statistics to a SAS data set:

/* read time series data */
use Sashelp.Air;  
   read all var "date" into t;
   read all var "air" into y;
close;
MA   = MA(y, 5);           /* moving average, k=5 */
WMA  = WMA(y, 1:5);        /* weighted moving average */
EWMA = EWMA(y, 0.3);       /* exponentially WMA, alpha=0.3 */
 
create out var{t y MA WMA EWMA};  append;  close out;

You can use the SGPLOT procedure to visualize the rolling statistics, as shown in my previous article.

Vectorizing time series computations

The experienced SAS/IML programmer will notice that these functions are not heavily vectorized. The MA and WMA computations use vectors of length k to compute the means and weighted means, respectively. It is possible to write these functions by using a matrix operation, but if the time series has N data points, the transformation matrix is an N x N lower-triangular banded matrix, which requires a lot of memory for large values of N.

Notice that the EWMA function uses scalar quantities inside a loop. For time series computations that use lagged data values, you can sometimes vectorize the time series computations. However, for operations that are defined recursively, such as the EWMA, the effort required to vectorize the computation might exceed the benefit you gain from the vectorization. In many cases, a function that uses a simple loop is fast and easy to maintain.

Summary

This article presents functions for computing rolling statistics in SAS/IML. Examples included a simple moving average (MA), a weighted moving average (WMA), and an exponentially weighted moving average (EWMA). The article describes how to modify these function to compute other rolling statistics in SAS.

Computations of rolling statistics might not be easy to vectorize. Even when you can vectorize a computation, a simpler approach might run faster.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

2 Comments

  1. Pingback: Compute a moving average in SAS - The DO Loop

  2. Pingback: The running median as a time series smoother - The DO Loop

Leave A Reply

Back to Top