Compute statistics for each row by using subscript operators

In a previous blog post, I showed how to use SAS/IML subscript reduction operators to compute the location of the maximum value in each row of a matrix. The subscript reduction operators are useful for computing simple statistics for each row (or column) of a numerical matrix.

If x is a matrix, I primarily use subscript reduction operators to compute the following quantities:

  • the row vector that contains the sum of the elements for each column of a matrix: x[+, ]
  • the row vector that contains the mean of the elements for each column of a matrix: x[:, ]
  • the column vector that contains the sum of the elements for each row of a matrix: x[, +]
  • the column vector that contains the mean of the elements for each row of a matrix: x[, :]

I wrote a 2011 article in which I gave examples of each of these operations and encouraged SAS/IML programmers to use subscript reduction operators to avoid loops over rows or columns.
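
As a quick illustration of the four expressions in the list above, here is a minimal sketch. The matrix m and the variable names are made up for this example and do not come from the 2011 article:

```sas
proc iml;
/* hypothetical 2 x 3 matrix, chosen only for illustration */
m = {1 2 3,
     4 5 6};

colSum  = m[+, ];   /* row vector of column sums:     {5 7 9}       */
colMean = m[:, ];   /* row vector of column means:    {2.5 3.5 4.5} */
rowSum  = m[, +];   /* column vector of row sums:     {6, 15}       */
rowMean = m[, :];   /* column vector of row means:    {2, 5}        */
print colSum, colMean, rowSum, rowMean;
quit;
```

In each case the operator replaces the index of the dimension that is collapsed, so no loop over rows or columns is required.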

Recently a SAS/IML programmer contacted me about how to compute the maximum value of each row in a matrix. He sent the following program, which uses a DO loop and the MAX function to compute a column vector whose ith element is the maximum value of the ith row of a matrix:

proc iml;
x  = {10  0  1  0  2  0  4,
       0  3  9  7 20  8  8,
       4  4 30  9  0  2  1,
       0  1  2  4  6 40  3 };
 
/* Find max of each row. Method 1: DO loop (inefficient) */
y = J(nrow(x), 1, 0);
do i = 1 to nrow(x);
   y[i] = max(x[i, ]);
end;
print y;

You can eliminate the DO loop by using the maximum subscript reduction operator (<>), as follows:

/* Method 2: subscript operator (efficient) */
y = x[, <>];   /* max of each row */
print y;

The expression x[, <>] is read as follows:

  • No subscripts are specified for the row index (before the comma). This means "use all rows." (You could also use the expression x[1:nrow(x), <>], but this is less efficient.)
  • The operator (<>) is specified for the column index (after the comma). This means "find the maximum element across the columns of each row." Because the operator is specified in place of a column index, the column dimension is reduced to one and the result is a column vector.

The hardest part, for me, is remembering where to put the subscript reduction operator. I use the following mnemonics:

  • If you want a column vector, use the operator in place of a column index: x[, <>]
  • If you want a row vector, use the operator in place of a row index: x[<>, ]
  • If you want a scalar value, use the operator as a sole subscript. For example, x[<>] computes the maximum element of an entire matrix, and is equivalent to max(x).
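
The three placements can be compared side by side. This is a small sketch that uses a made-up matrix m (not the x matrix from the program above), with variable names chosen for illustration:

```sas
proc iml;
/* hypothetical 2 x 3 matrix, chosen only for illustration */
m = {1 9 3,
     7 5 6};

rowMax = m[, <>];   /* operator in column position -> column vector {9, 7} */
colMax = m[<>, ];   /* operator in row position    -> row vector {7 9 6}   */
allMax = m[<>];     /* operator as sole subscript  -> scalar 9             */
check  = max(m);    /* MAX function gives the same scalar as m[<>]         */
print rowMax, colMax, allMax check;
quit;
```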

In addition to finding sums (+), means (:), maxima (<>), and minima (><), you can also use subscript reduction operators to compute products (#) and sums of squares (##). These operations are useful for forming simple statistics for each row or for each column of a matrix.
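
For completeness, here is a brief sketch of the minimum, product, and sum-of-squares operators applied to each row. The matrix m and the variable names are assumptions made for this example:

```sas
proc iml;
/* hypothetical 2 x 3 matrix, chosen only for illustration */
m = {1 2 3,
     4 5 6};

rowMin  = m[, ><];   /* minimum of each row:        {1, 4}   */
rowProd = m[, #];    /* product of each row:        {6, 120} */
rowSSQ  = m[, ##];   /* sum of squares of each row: {14, 77} */
print rowMin rowProd rowSSQ;
quit;
```

Moving the operator to the row position (for example, m[##, ]) would give the corresponding statistic for each column instead.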

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
