The macro loop that vanished: How to simplify your SAS/IML programs


I don't use the SAS macro language very often. Because the SAS/IML language has statements for looping and evaluating expressions, I rarely write a macro function as part of a SAS/IML program. Oh, sure, I use the %LET statement to define global constants, but I seldom use the %DO and %EVAL macro statements.

I was therefore momentarily baffled when I tried to decipher the following SAS/IML statements, which include a macro loop:

%macro ExtractSubmatrices;
proc iml; 
use ManyMatrices;  /** 3,747 rows of data **/
read all into X;
 
%do i=0 %to (3747/3)-1;
   %let j = %eval((&i*3) + 1);
   %let k = %eval((&i*3) + 2);
   %let l = %eval((&i*3) + 3);
 
   s&i. = X[{&j.,&k.,&l.},];
   /** do something with s&i. **/
%end;
quit;
%mend;
%ExtractSubmatrices;

The program looks complicated because of the macro syntax, but it is actually fairly simple. The program reads a data set, ManyMatrices, which contains 3,747 rows and 3 variables. The first three rows represent a 3 x 3 matrix, the next three rows represent a different 3 x 3 matrix, and so on. (There are a total of 3,747 / 3 = 1,249 matrices in the data set.) For each 3 x 3 matrix, the programmer wants to compute a quantity (not shown) based on the matrix.
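To make the data layout concrete, here is a minimal sketch that builds a small stand-in for the ManyMatrices data set. The values, the variable names c1-c3, and the choice of four stacked matrices are all assumptions made for illustration; the real data set has 3,747 rows.

```sas
/* Hypothetical toy data: 4 stacked 3 x 3 matrices (12 rows, 3 variables). */
/* The values and the variable names c1-c3 are illustrative assumptions.   */
proc iml;
X = shape(1:36, 12, 3);   /* rows 1-3 = first matrix, rows 4-6 = second, ... */
create ManyMatrices from X[colname={"c1" "c2" "c3"}];
append from X;
close ManyMatrices;
quit;
```

With this layout, rows 1-3 hold the first matrix, rows 4-6 hold the second, and so on, which is the structure that the macro loop walks through three rows at a time.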

The program can be improved by making the following changes:

proc iml;                     /** 1 **/
use ManyMatrices;
   read all into X; 
close ManyMatrices;           /** 2 **/
 
p = ncol(X);                  /** 3 **/
do i=0 to nrow(X)/p-1;        /** 4 **/
   rows = (p*i+1):(p*i+p);    /** 5 **/
   s = X[rows,];              /** 6 **/
   /** do something with s **/
end;
quit;

The numbered comments correspond to the following improvements:

  1. Eliminate the macro function. There is no need to use one.
  2. It is always a good idea to close the input data set when you are finished reading from it.
  3. Why limit yourself to 3 x 3 matrices? Use the NCOL function to assign a variable, p, that contains the number of variables in the data.
  4. Use the NROW function to determine the number of observations in the data set, rather than hard-code this value.
  5. Use the index creation operator (:) to create the sequence of rows that contain the ith p x p matrix.
  6. Subscript the data to extract the relevant rows and all of the columns. There is no need to create and store 1,249 matrices, each with a unique name. It is more efficient to reuse the same matrix name (s) for each computation.
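As a sketch of what "do something with s" might look like, the following program stores one result per matrix in a vector. The determinant is only a placeholder computation (the original quantity is not shown), and the names n and d are introduced here for illustration.

```sas
proc iml;
use ManyMatrices;
   read all into X;
close ManyMatrices;

p = ncol(X);
n = nrow(X)/p;               /* number of stacked p x p matrices */
d = j(n, 1, .);              /* one result per matrix, initialized to missing */
do i = 0 to n-1;
   rows = (p*i+1):(p*i+p);
   s = X[rows,];
   d[i+1] = det(s);          /* placeholder computation (an assumption) */
end;
print d;
quit;
```

Because the results accumulate in d, the matrix s can be overwritten on every iteration without losing anything.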

There are often good reasons to write a macro function. However, the programming statements in the SAS/IML language often make macro functions unnecessary in PROC IML.


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

5 Comments

  1. Jason Secosky on

    I agree that one must take care when coding a %DO loop inside any PROC or DATA step.

    In this case, each iteration of the %DO loop generates more IML statements. More statements translate into increased memory use and more time to compile the program. Using an IML DO loop creates a much smaller program and more efficient compilation.

  2. It is not the macro that is the main problem. It is the lack of familiarity with the PROC IML language. Macro-generated or not, the so-called wallpaper code clearly shows a lack of understanding of the language or of programming fundamentals.

    It seems obvious from your reference to the nonexistent %EVALF macro function that your macro code (if you have to write some) would do a strange thing or two.

  3. There is a very big difference between the two pieces of code. The non-macro code can handle only one sub-matrix at a time. The macro code splits the matrix into sub-matrices to be used in calculations afterwards (for example, s1 * s2). All sub-matrices are available at the same time. You will need macro code to make that happen.
    Depending on the business need, the relevant code should be selected.

    P.S. It was not the lack of familiarity with SAS/IML, just the need for multiplying different sub-matrices afterwards that made use of macro code necessary.

  4. Pingback: Indirect assignment: How to create and use matrices named x1, x2,…, xn - The DO Loop
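
The need raised in comment 3, having every sub-matrix available under its own name, can also be met without macro code. The following is a minimal sketch that uses the SAS/IML VALSET call and VALUE function for indirect assignment. It assumes the data set holds at least two stacked matrices, and the names name and prod are introduced here for illustration.

```sas
proc iml;
use ManyMatrices;
   read all into X;
close ManyMatrices;

p = ncol(X);
do i = 0 to nrow(X)/p-1;
   rows = (p*i+1):(p*i+p);
   name = "s" + strip(char(i));    /* builds the names "s0", "s1", ... */
   call valset(name, X[rows,]);    /* assigns the sub-matrix to that name */
end;

prod = value("s0") * value("s1");  /* named matrices are available together */
print prod;
quit;
```

Whether hundreds of individually named matrices are preferable to subscripting X on demand depends on the computation that follows.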
