PROC ARIMA: Penetrating the matrix


Justin Smith and William “Gui” Zupko were looking at manufacturing data over time and wanted to know the minimum value in their dataset, and they wanted to pinpoint its exact location – the specific row and column. PROC ARIMA uses the ARIMA (auto-regressive integrated moving average) model or the ARMA (autoregressive moving-average) model to analyze and forecast transfer function data, intervention data and, in this case, time series data.

“PROC ARIMA is one of the main procedures for doing this kind of modeling,” said Smith during his presentation of their paper, “Penetrating the Matrix.”

“When you model time series,” he said, “there are two main types of terms you [will] have. One is auto-regressive, that’s if the current value of sales for the current month is related to past months; and the other types of terms are moving average, that’s if the current value of sales depends on previous lags of random noise.”

Smith explained that a model can have anywhere from 0 to p auto-regressive terms and from 0 to q moving average terms. There are several ways to find the values of p and q, but Smith and his co-author chose PROC ARIMA with the MINIC option (minimum information criterion).

With the MINIC option, PROC ARIMA will build a matrix where the rows equal the values of p and the columns equal the values of q. “That’s a tool that somehow tells you that your p and q are good, in a way,” he said. “What you want to see in the minimum values are the smallest values – the smaller, the better.”
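As a rough sketch of what such a call might look like (this is not the authors' code), the example below uses the SASHELP.AIR sample data as a stand-in for their manufacturing data. The p and q ranges, the differencing, and the output data set name work.minic are all assumptions made for illustration, and the ODS table name MINIC is worth confirming with ODS TRACE in your own session.

```sas
/* Save the MINIC table to a data set; work.minic is a hypothetical name */
ods output minic=work.minic;

proc arima data=sashelp.air;       /* sample data standing in for the real series */
   /* MINIC prints a table of information criterion values,
      one row per AR order (p) and one column per MA order (q).
      The ranges below are illustrative, not the authors' settings. */
   identify var=air(1) minic p=(2:5) q=(1:5);
run;
quit;
```

The smallest entry in the printed table points to the candidate p and q.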

According to Smith, there are several ways to find the information. “We could figure out how to use the PROC MEANS,” he said. “We could at least think about doing this in arrays, and there is also something in PROC IML called index of the minimum.”
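The "index of the minimum" that Smith mentions appears to correspond to the `>:<` subscript reduction operator in SAS/IML. Below is a minimal sketch of that alternative, using a made-up 3 x 3 matrix rather than a real MINIC table. The conversion from a single position to a row and column is ordinary arithmetic added here for illustration.

```sas
proc iml;
   /* hypothetical values standing in for a MINIC table */
   pqmat = {5 4 6,
            3 1 7,
            8 9 2};

   idx = pqmat[>:<];                   /* position of the minimum, counting row by row */
   i   = ceil(idx / ncol(pqmat));      /* row that holds the minimum */
   j   = idx - (i - 1)*ncol(pqmat);    /* column that holds the minimum */

   print idx i j;                      /* here: idx=5, i=2, j=2 */
quit;
```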

But Smith and Zupko decided that, for their purposes, PROC IML would be best. Using the READ ALL INTO statement, they read their dataset into PROC IML as a matrix named pqmat. The observations become the rows of the matrix, and the variables, whose names start with the string “ma”, become its columns.
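A minimal sketch of that read step might look like the following. The data set work.minic and its values are invented so the example runs on its own. Only the name pqmat and the "ma" variable prefix come from the presentation.

```sas
/* hypothetical stand-in for the saved MINIC table */
data work.minic;
   input ma1 ma2 ma3;
   datalines;
5.1 4.8 4.9
4.2 4.6 4.7
4.4 4.5 5.0
;
run;

proc iml;
   use work.minic;
   read all into pqmat;     /* with no VAR clause, the numeric variables are read */
   close work.minic;
   print pqmat;             /* one row per observation, one column per variable */
quit;
```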

Now they can find the number of rows and columns to calculate the minimum:

nrow=nrow(pqmat);
ncol=ncol(pqmat);
minmatrix=min(pqmat);

Smith points out that calculating the minimum of the matrix inside the DO loops would slow the process, because the same value would be recalculated for every cell. Running the calculation before the DO loops means it is calculated only once.

In the DO loops below, Smith and Zupko store the values of i and j to macro variables that correspond to the minimum. “We have these loops where i goes through one and then loops through all of the rows, and j is the counter for the columns,” Smith pointed out. “What we are doing, simply, is checking the cell, and if the cell value is the same as the minimum that we’ve already calculated, we store that to a macro variable.”

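print pqmat nrow ncol;
do i=1 to nrow;                /* i counts the rows */
   do j=1 to ncol;             /* j counts the columns */
      /* when the cell holds the minimum, store its row and column as macro variables */
      if pqmat[i,j]=minmatrix then call symput('i',left(char(i)));
      if pqmat[i,j]=minmatrix then call symput('j',left(char(j)));
   end;
end;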

At this point, they still need to calculate the values of p and q. Smith and Zupko had to correct for a business rule here. The counters i and j both start at one, but their business rule states that future observations must be based on at least two past observations, so p must be at least two. That means the value of p is found by increasing i by one. A correction isn't needed for q because it starts at one, just as j does. In their example, this gives p = 3 and q = 1.

“We used %EVAL to get the correct value of p. We then used these values of p and q for further – lots and lots of further – processing and time series analysis,” said Smith.
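The adjustment Smith describes can be sketched in a few lines of macro code. The values assigned to i and j below are hypothetical placeholders for what the DO loops would store. They were chosen only because they reproduce the p = 3 and q = 1 quoted above.

```sas
%let i = 2;                  /* hypothetical row index of the minimum */
%let j = 1;                  /* hypothetical column index of the minimum */

%let p = %eval(&i + 1);      /* shift by one because p starts at two */
%let q = &j;                 /* q starts at one, so no shift is needed */

%put NOTE: p=&p q=&q;        /* writes p=3 q=1 to the log */
```

%EVAL is needed because the macro language treats `&i + 1` as text unless it is explicitly evaluated as integer arithmetic.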

Download their Coder’s Corner paper, “Penetrating the Matrix,” to learn the full details and sample code for calculating the minimum value of a dataset and finding its location. Smith and Zupko wrote a longer paper that details how the minimum value and its location were used. Download “Creating and Displaying an Econometric Model Automatically” to take a look at the entire project.

*According to the SAS/ETS® 9.3 documentation, “The ARIMA class of time series models is complex and powerful, and some degree of expertise is needed to use them correctly.”


About Author

Waynette Tubbs

Editor, Marketing Editorial

Waynette Tubbs is a seasoned technology journalist specializing in interviewing and writing about how leaders leverage advanced and emerging analytical technologies to transform their B2B and B2C organizations. In her current role, she works closely with global marketing organizations to generate content about artificial intelligence (AI), generative AI, intelligent automation, cybersecurity, data management, and marketing automation. Waynette has a master’s degree in journalism and mass communications from UNC Chapel Hill.
