IF-THEN logic with matrix expressions

In the SAS DATA step, all variables are scalar quantities. Consequently, an IF-THEN/ELSE statement that evaluates a logical expression is unambiguous. For example, the following DATA step statements print "c=5 is TRUE" to the log if the variable c is equal to 5:

if c=5 then put "c=5 is TRUE";
else put "c=5 is FALSE";

In the SAS/IML language, the situation can be more complicated because a variable is usually a matrix or a vector. You can use matrices in logical expressions, but when c is not a scalar quantity, what does the statement if c=5 then... mean? What happens when some of the elements of c are 5, but other elements are not?

The following SAS/IML example illustrates the situation:

proc iml;
c = {5 5 6};
if c=5 then 
   print "c=5 is TRUE";
   print "c=5 is FALSE";

According to the output, the logical expression c=5 evaluates to FALSE. This result can be understood by reading the documentation for the IF-THEN/ELSE statement, which says, "when the condition to be evaluated is a matrix expression, ... this evaluation is equivalent to using the ALL function." In other words, the IF-THEN/ELSE statement is equivalent to the following:

if all(c=5) then 
   print "all(c=5) is TRUE";
   print "all(c=5) is FALSE";

The ALL function evaluates to TRUE when all of the elements in a matrix are nonzero and nonmissing. As I recently mentioned, you can use the RESET PRINTALL statement to see temporary matrices that SAS/IML creates when it evaluates logical expressions. The expression c=5 results in the temporary matrix {1 1 0}. The ALL function returns 0 (FALSE) for that matrix.

If you want a condition to evaluate to TRUE if any element in a matrix satisfies a criterion, then use the ANY function explicitly, as follows:

if any(c=5) then 
   print "any(c=5) is TRUE";
   print "any(c=5) is FALSE";

The takeaway lesson is that a matrix expression can be used in an IF-THEN/ELSE statement in the SAS/IML language. The THEN condition will be executed when all of the elements in the matrix satisfy the expression. The ELSE condition will be executed if one or more elements do not satisfy the expression.

Post a Comment

What is an empty matrix?

At the beginning of my book Statistical Programming with SAS/IML Software I give the following programming tip (p. 25):

Do not confuse an empty matrix with a matrix that contains missing values or with a zero matrix. An empty matrix has no rows and no columns. A matrix that contains missing values has at least one row and column, as does a matrix that contains zeros.

What do those sentences mean? Consider the following SAS/IML program:

proc iml;
z = j(2, 2, 0);       /* zero matrix = {0 0, 0 0} */
m = j(1, 4, .);       /* matrix of missing = {. . . .} */
q = {};               /* empty matrix */

The J function in SAS/IML software creates a matrix that has a specified number of rows and columns and fills the matrix with a constant value. The syntax is result = j(nrows, ncols, value). The first statement in the program creates a matrix named z that has two rows and two columns. Each element of the matrix has the value 0. The matrix is also called a "zero matrix." You can print this matrix and use it in expressions that involve matrix addition, matrix multiplication, logical expressions, and so forth.

The second statement creates a matrix named m that has one row and four columns. Each element is a missing value. The matrix is called a "matrix of missing values." You can print this matrix. You can use it in some elementwise matrix operations, but not in matrix multiplication.

The third statement creates an empty matrix. It has zero rows and zero columns. It does not have any elements. You cannot add with it, multiply with it, or even print it. However, you can ask for its dimensions, concatenate with it, and use it in set operations such as unions and intersections.

The syntax q = {} was introduced in SAS/IML 12.1. Read my previous article for ways to create and detect empty matrices.

In conclusion, do not confuse an empty matrix with a matrix of missing values. They are very different. The former has no rows or columns. The latter has rows and columns, but every element is a missing value.

Post a Comment

Finding matrix elements that satisfy a logical expression

A common task in SAS/IML programming is finding elements of a SAS/IML matrix that satisfy a logical expression. For example, you might need to know which matrix elements are missing, are negative, or are divisible by 2.

In the DATA step, you can use the WHERE clause to subset data. You can also use IF-THEN/ELSE logic to examine each observation in a data set to determine whether it satisfies a criterion. For data that are stored in a SAS/IML matrix, some programmers write a loop that iterates over the rows and/or columns and use IF-THEN/ELSE logic on each scalar element in the matrix. However, there is an easier and more efficient way: Use the LOC function, which finds the LOCation of elements that satisfy a logical expression.

This article provides examples of using the LOC function and a logical expression to find elements in a matrix that satisfy a criterion. It also points out that the SAS/IML language creates temporary 0/1 matrices from logical expressions.

The following SAS/IML statements define a 3 x 4 matrix. The LOC function is used to find the location of the elements that are missing, negative, and even, respectively:

proc iml;
x = {. -5 2  5,
    -2  . 3  4,
     4  . . -1};
missingLoc = loc(x=.);             /* missing values */
negativeLoc = loc(x^=. & x<0);     /* nonmissing and negative  */
evenLoc = loc(mod(x,2)=0);         /* n is even if (n mod 2)=0 */
print missingLoc, negativeLoc, evenLoc;

The output shows the location of the elements for each criterion. The elements of the matrix are enumerated in row-major order: the first row contains elements 1–4, the second row contains elements 5–8, and the third row contains elements 9–1. The program output shows that missing values are stored at locations 1, 6, 10 and 11. Negative values are stored at locations 2, 5, and 12. Even integers occur at locations 3, 5, 8, and 9.

Even if you are familiar with the LOC function, you might not be familiar with the way SAS/IML passes a logical expression like x=. or x<0 into the LOC function. When the SAS/IML parser sees an expression that involves a comparison operator, it evaluates the logical expression, which creates a temporary matrix of 0s and 1s. That temporary 0/1 matrix is sent to the LOC function. You can observe this behavior by using the RESET PRINTALL statement to display all matrices that SAS/IML creates, including the temporary matrices. If you add the statement

reset printall;

as the second line of the previous program, then the following output is displayed when the first LOC function executes:


The output shows that PROC IML creates a temporary matrix named _TEM1001. The matrix is the result of evaluating the logical expression x=.. Notice that the 0/1 matrix contains a 1 in the locations for which the data are missing and a 0 in the other locations. Similarly, the other logical expressions in the program create temporary matrices with names like _TEM1002 and _TEM1003.

There are two takeaways from this example:

  • Use the LOC function to find the location of matrix elements that satisfy a logical condition. The LOC function is more efficient than an explicit loop over the matrix elements.
  • The argument that is passed to the LOC function is always a matrix. When you specify a logical condition, SAS/IML creates temporary variables from the logical expressions. You can use the RESET PRINTALL statement to see the temporary variables that are created.

For more information, here are two related posts:

Post a Comment

Calling a global statement inside a loop

The other day I was creating some histograms inside a loop in PROC IML. It was difficult for me to determine which histogram was associated with which value of the looping variable. "No problem," I said. "I'll just use a TITLE statement inside the loop so that each histogram has a different title."

Easier said than done.

You see, the TITLE statement requires an argument that is a literal string value. Because the TITLE statement is a "global statement" in SAS and not a statement in the SAS/IML language, the argument cannot be a SAS/IML variable. In other words, the following attempt to set the title inside a loop is not correct:

ods graphics / width=400px;
proc iml;
x = j(50,1);
do mu = 0 to 1;
   call randgen(x, "Normal", mu);   /* x ~ N(mu, 1) */
   msg = "mu = " + strip(char(mu)); /* concatenate strings */
   title msg;                       /* try to form title (DOESN'T WORK!) */ 
   call histogram(x);

You might try to use macro variables to change the title within the loop. For example, you might be tempted to set create a macro variable inside the loop and write the title statement as

   title "mu = &mu";               /* use macro variable (DOESN'T WORK!) */

However, that attempt also fails, for reasons that I have described in a previous article about macro variables and SAS/IML loops.

The solution that I came up with requires using the CALL EXECUTE subroutine in SAS/IML. The EXECUTE subroutine enables you to execute any string that represents a valid SAS or SAS/IML statement. In particular, you can create the string dynamically by using string concatenation operations, and then create the TITLE statement in a similar way. You can use the CALL EXECUTE subroutine to run the TITLE statement for each iteration of the loop. Here's one way to do it:

/* module that executes the TITLE statement on an arbitrary string */
start SetTitle(msg);
   cmd = 'TITLE "' + msg + '";';    /* create TITLE stmt by concatenating */
   call execute(cmd);
x = j(50,1);
do mu = 0 to 1;
   call randgen(x, "Normal", mu);   /* x ~ N(mu, 1) */
   msg = "mu = " + strip(char(mu)); /* concatenate strings */
   run SetTitle(msg);
   call histogram(x);

There are a few subtle tricks in this code that are worth explaining:

  • In the SetTitle module, single quotes are used to form the string in the cmd variable. That is because the string itself contains double quotes.
  • The string in the cmd variable contains a semicolon at the end.
  • The EXECUTE subroutine is called from a module. Prior to SAS/IML 13.2, it was not possible to call the EXECUTE subroutine in the main scope of the program. This restriction has been lifted in recent versions of SAS/IML.

Obviously the same technique will use with other statements that require literal strings, including the FOOTNOTE and LIBNAME statements. However, you can use this technique to call SAS/IML statements as well. For example, you can call the MATTRIB statement within a loop to assign matrix attributes to multiple matrices that are enumerated in a list.

Post a Comment

Twelve posts from 2014 that deserve a second look

I began 2015 by compiling a list of popular articles from my blog in 2014. Although this "People's Choice" list contains many interesting articles, some of my favorites did not make the list. Today I present the "Editor's Choice" list of articles that deserve a second look. I've highlighted one article from each month of 2014.

I've grouped the articles into three broad topics: simulating data, statistical graphics, and matrix computations.

Simulating data and resampling

Statistical graphics

Matrix computations


Many of the previous articles use the SAS/IML language. In 2014 SAS made the SAS/IML language freely available. The IML procedure is included as part of the SAS University Edition, which is free for students, professors, researchers, and adult learners.

Two additional articles from 2014 are worth highlighting because of their importance to SAS/IML programmers:

There you have it, 12 topics that I think are worth a second look. What was your favorite article from The DO Loop in 2014? Leave a comment.

Post a Comment

Compute the kth smallest data value in SAS

A SAS programmer recently posted a question to the SAS/IML Support Community about how to compute the kth smallest value in a vector of numbers. In the DATA step, you can use the SMALLEST function to find the smallest value in an array, but there is no equivalent built-in function in the SAS/IML language. However, it is easy to write a short module that performs the computation.

If the data include missing values, there is an ambiguity in the question. Do you want the kth smallest value after omitting the missing values? Or do you want the kth smallest value in the ordered list where missing values appear before nonmissing values? This article considers both cases.

The smallest value in an array

The DATA step has built-in functions for both cases. The SMALLEST function returns the kth smallest nonmissing value. The ORDINAL function sorts the missing and nonmissing values and returns the kth element in the ordered list. These functions act on elements of a DATA step array. Consequently, if you want to find the kth element in a column of a data set, you need to transpose the data by using PROC TRANSPOSE.

The following example creates a data set with 10 observations, two of which are missing. The data set is transposed and the data are read into an array. The small3 variable contains the third smallest nonmissing value, ignoring the missing values. The ord3 variable contains the third smallest ordinal value, where missing values are ordered before nonmissing values. The small7 and ord7 variables contain the seventh smallest values, excluding and including the missing values, respectively.

/* Base SAS: use SMALLEST or ORDINAL with array to compute k_th smallest value */
data A;
input x @@;
. -5 -2 0 2 5 2 2 4 .
proc transpose data=A out=B;
data C;
   set B;
   array x[*] _NUMERIC_;
   ord3   = ordinal(3, of x[*]);      /* include missing values */
   ord7   = ordinal(7, of x[*]);
   small3 = smallest(3, of x[*]);     /* ignore missing values */
   small7 = smallest(7, of x[*]);
proc print noobs data=C;
   var ord3 ord7 small3 small7;

The smallest value in a vector

The SAS/IML language does not have built-in functions that are equivalent to the SMALLEST and ORDINAL functions. However, you can easily sort the data and extract the kth smallest value. The following statements define a function named ORDINAL that takes a SAS/IML vector as an argument:

proc iml;
start ordinal(k, _x);
   x = colvec(_x);       /* make sure it is a column vector */
   call sort(x);
   if k<1 | k>nrow(x) then
      return (.);        /* return missing if index out of range */
   return( x[k] );       /* value of the k_th ordered element */
/* test the module by using the example data */
use A;  read all var {"x"};  close A;
ord1 = ordinal(1, x);
ord3 = ordinal(3, x);
ord7 = ordinal(7, x);
print ord1 ord3 ord7;

You can see that the ORDINAL module returns the kth value of the data, including missing values. In fact, the argument k can be a vector. The statement ord = ordinal({1 3 7}, x) returns a vector that contains the 1st, 3rd, and 7th ordered elements.

If you want to exclude missing values, you can use the LOC function to locate nonmissing values and use the subscript operator to extract them. You can then call the ORDINAL module on the vector of nonmissing values, as shown below:

start smallest(k, _x);
   idx = loc(_x^=.);               /* find nonmissing values */
   if IsEmpty(idx) then return(.); /* all values are missing */
   x = _x[idx];                    /* extract nonmissing */
   return ( ordinal(k, x) );       /* call the previous module */
small1 = smallest(1, x);
small3 = smallest(3, x);
small7 = smallest(7, x);
print small1 small3 small7;

I leave it as an exercise to the reader to modify the definition of the ORDINAL module so that the argument k can be a vector. In other words, modify the function so that the statement s = smallest({1 3 7}, x) returns a vector with three elements.

If x is a matrix and you want the smallest elements of each column, just call the modules in a DO loop.

Do you ever use the SMALLEST or ORDINAL functions in the DATA step? For what purpose?

Post a Comment

Popular posts from The DO Loop in 2014

I published 118 blog posts in 2014. This article presents my most popular posts from 2014 and late 2013.

2014 will always be a special year for me because it was the year that the SAS University Edition was launched. The University Edition means that SAS/IML is available to all students and adult learners all over the world. My article "10 tips for learning the SAS/IML language" has been popular as new programmers take advantage of this free opportunity to learn the SAS/IML language for matrix computations and data analysis.

General math and statistics articles

Although I mostly write about statistical programming, several of my general articles on math and statistics were very popular:

Statistical Graphics and Data Visualization

Who doesn't like to learn better ways to visualize data in SAS? The following posts generated some interesting discussions.

Visualizing Matrices

An important visualization is to visualize the data in a rectangular matrix by using a heat map:

Regression and data smoothers

Everyone loves a good regression tip!

Start your new year by (re-)reading one of these 14 popular posts from 2014. Happy New Year to all my readers!

Post a Comment

Self-similar structures from Kronecker products

I recently posted an article about self-similar structures that arise in Pascal's triangle. Did you know that the Kronecker product (or direct product) can be used to create matrices that have self-similar structure?

The basic idea is to start with a 0/1 matrix and compute a sequence of direct products of the matrix with itself. If M is a matrix that contains only zeros and ones, then M@M is a block matrix where a block of zeros occurs where M has a 0, and a copy of M appears where M has a 1, as shown by the following SAS/IML program:

proc iml;
M = {0 1 0,
     1 1 1,
     0 1 0};
M2 = M @ M;
print M2;

In this example, the original matrix, M, is 3 x 3. The matrix M@M is a 3 x 3 block matrix, where each block is either the zero matrix or a copy of M. If you apply the Kronecker product again and form the matrix M@M@M, you will get a 9 x 9 block matrix, where each block is either the zero matrix or a copy of M.

You can continue this process. The k-fold product of M with itself is a 3k x 3k matrix. Arthur Charpentier provides a nice animation of this iterative process on his Freakonometrics blog, which is where I first read about using the Kronecker product of a 0/1 matrix to create a self-similar structure. If you like to read articles about mathematical aspects of probability and statistics, I highly recommend Charpentier's well-written and always interesting blog.

Printing a 0/1 matrix is not an effective visualization. Instead, I will create a two-color heat map by using the HEATMAPDISC subroutine in SAS/IML 13.1. Because I want to create several heat maps, I wrote the following SAS/IML module that composes a matrix with itself k times and draws a two-color heat map of the result. The following heat map shows the structure of the four-fold Kronecker product M@M@M@M:

start KroneckerPlot(M, Iters);
   N = 1;
   do i = 1 to Iters;
      N = N @ M;
   call heatmapdisc(N) title="Iteration: "+strip(char(Iters)) 
        colorramp={White Blue} displayoutlines=0 ShowLegend=0;
ods graphics / width=800px height=800px;
run KroneckerPlot(M, 4);

Let's say it all together: "Ooooooh!" "Ahhhhh!"

The structures that you can create depend on the arrangement of zeros and ones in the original matrix. Notice that these product matrices get big fast: the k-fold Kronecker product of an n x n matrix is an nk x nk matrix. You might run out of memory if you try to visualize a k-fold product when k or n is greater than 5. The next statements start with a 4 x 4 matrix and display a heat map for the five-fold product, which is a 1024 x 1024 matrix:

M = {1 1 1 1,
     1 1 0 0,
     1 0 1 0,
     1 0 0 1};
run KroneckerPlot(M, 5); /* 5 iterations of 4x4 matrix ==> 1024 x 1024 matrix */

I call this image "Landing at LaGuardia."

Obviously you can create other structures. You can even generate random structures by using the Bernoulli distribution to create a random 0/1 matrix. I leave you with a few additional statements that create self-similar images. Feel free to create your own matrices. Leave a comment and tell me your favorite matrix for creating a self-similar structure.

ods graphics / width=500px height=500px;
M = {1 1 1 1,
     1 0 0 1,
     1 0 0 1,
     1 1 1 1};
run KroneckerPlot(M, 4);
/* Make random fractal plots! */
call randseed(54321);
do i = 1 to 5;
   M = randfun({4 4}, "Bernoulli", 0.75);
   run KroneckerPlot(M, 3);
Post a Comment

Elementwise minimum and maximum operators

Like most programming languages, the SAS/IML language has many functions. However, the SAS/IML language also has quite a few operators. Operators can act on a matrix or on rows or columns of a matrix. They are less intuitive, but can be quite powerful because they enable you perform computations without writing a DO loop. This article describes the elementwise minimum and maximum operators.

Recently I wrote about how to use subscript operators to compute the min or max of a row or column of a data matrix. These min/max operators are two examples of subscript operators that perform row and column operations on a data matrix. Other subscript operators enable you to compute the sum, mean, and sum of squares for rows or columns of a matrix.

But what if you need to compare the values across two different matrices? To paraphrase a popular ad campaign, "there's an op for that!"

The elementwise minimum operator

Here's an example that illustrate how the elementwise minimum operator works. Suppose that there is a teacher who always gives a bonus question on her quizzes and tests. Quizzes are worth 50 points, tests are worth 100 points, and the bonus question is worth 5 points. However, the teacher does not allow any student to score more than 50 points on a quiz or more than 100 points on a test.

The following SAS/IML statements define the raw scores for three students in the class:

proc iml;
Test = {"Quiz1" "Test1" "Quiz2" "Test2"};
Name = {"Rita", "Sam", "Tim"};
x = {55 105 50 105,
     45  95 55  90,
     15 100 55 105};
print x[r=Name c=Test L="Raw Scores"];

You can see that on several occasions a student has answered all questions correctly, including the bonus question. The teacher wants to cap the scores at some maximum value. The elementwise minimum operator (><) is an easy way to perform this operation. To ensure that 100 is the maximum possible score, the teacher could apply the >< operator, just as in the SAS DATA step:

trunc100 = x >< 100;             /* result is minimum of x[i,j] and 100 */
print trunc100[r=Name c=Test L="Max Score = 100"];

Now the maximum value of any cell is 100. However, columns 1 and 3 represent quiz scores, so they need to be truncated at the maximum value of 50. You might be tempted to loop over the columns and use the CHOOSE function to cap the maximum value of each column, as follows:

target = {50 100 50 100};    /* target[i] is max allowed value of column i */
A = j(nrow(x), ncol(x));     /* allocate new matrix for results */
do j = 1 to 4;
   A[,j] = choose(x[,j] > target[j], target[j], x[,j]);

However, the elementwise minimum operator enables you to compute the result without writing a loop. Because the target vector is a row vector with four columns, the following statement returns a matrix where element (i,j) is the minimum of A[i,j] and target[j]:

A = (x >< target);   /* target[j] is max allowed value of column j */
print A[r=Name c=Test L="Adjusted Scores"];

The elementwise maximum operator

In a similar way, the elementwise maximum operator (<>) enables the teacher to set the smallest possible value for a test score. The teacher might decide that extremely low scores are unduly influential when computing the average grade, so she might decide to make 25 the lowest possible score, as follows:

B = A <> 25;             /* result is maximum of A[i,j] and 25 */

The elementwise minimum and maximum operators work when the second matrix is a scalar, a row vector, a column vector, or a matrix.

  • The expression (A >< scalar) returns a matrix that is the same size of A, and no element is smaller than scalar.
  • The expression (A >< row_vector) returns a matrix that is the same size of A, and no element of the jth column is smaller than the jth element of row_vector.
  • The expression (A >< col_vector) returns a matrix that is the same size of A, and no element of the ith row is smaller than the ith element of col_vector.
  • The expression (A >< matrix) returns a matrix that is the same size of A, and the (i,j)th element is no smaller than the (i,j)th element of matrix.

Truncating vector values

The elementwise min/max operators can be used to truncate a vector of values. This often comes up when data has small negative values (possibly because of numerical round off) and you want to truncate all negative values to 0.

Another example is when you want to truncate the values of a function. For example, the sine function returns values in the range [–1, 1]. If you want to truncate the sine function at ±1/2, you can use the following shorthand notation:

ods graphics / width=400px height=200px;
t = do(0,12.56,0.03);
z = (-0.5 <> sin(t) >< 0.5); /* truncate within [-0.5, 0.5] */
title "Truncated Sine Function";
call series(t, z);
Post a Comment

A Christmas tree from Pascal's triangle

O Christmas tree,
O Christmas tree,
One year a fractal made thee!
O Christmas tree,
O Christmas tree,
A heat map can display thee!

From Pascal's matrix we define!
Reflect across, divide by nine.
O Christmas tree,
O Christmas tree,
Self-similar and so divine!

Eventually I will run out of cute ways to use SAS software to create and visualize a Christmas-themed image. But not this year!

My recent article about how to create Pascal's matrix in SAS included a lower triangular matrix that exhibits self-similar patterns. If you take the Pascal matrix modulo 9 and reflect it across the Y axis, you get a triangular array that looks a little bit like a Christmas tree. You can add a "trunk" to the bottom of the tree to improve the resemblance. As shown in my previous post, you can use a heat map to visualize the matrix. In the following SAS/IML program, I use a green palette of colors to visualize the Pascal triangle values modulo 9. The resulting heat map is shown at the top of this article.

proc iml;
start PascalRule(n);
   m = j(n,n,0);              /* initialize with zeros */
   m[,1] = 1;                 /* set first column to 1 */
   j = 2:n;                   /* elements to compute */
   do k = 2 to n;
      /* for kth row, add adjacent elements from previous row */
      m[k,j] = m[k-1,j-1] + m[k-1,j];
/* reflect Pascal's triangle to create left-right symmetry
   and add a tree trunk to create a Christmas tree image */
start PascalTree(n);
   m = PascalRule(n);
   T = m[,n:2] || m[,2:n];    /* reflect; omit column of 1s */
   T[ loc(T=0) ] = .;         /* replace zeros with missing values */
   trunk = j(3,ncol(T),.);    /* add a "tree trunk" with value -1 */
   midpt = ncol(T)/2;         /* note that ncol(T) is even */
   halfwidth = ceil(n/10);
   trunk[,(midpt-halfwidth):(midpt+halfwidth+1)]= -1;
   return( T // trunk );
m = PascalTree(25);
k = 9;
tree = mod(m,k);
ods graphics / width=300px height=380px;
title = "Happy Holidays to All SAS Users!";
ramp = palette("greens", k);
ramp = "CX26261C" || ramp[,k:1];  /* brown (for trunk) and green palette */
call heatmapdisc(tree) colorramp=ramp displayoutlines=0 title=title;

It is remarkable that the Pascal matrix has so many amazing mathematical properties. Now you can add "makes a reasonable facsimile of a Christmas tree" to the list!

Happy holidays and a wonderful New Year to all of my readers. You are the reason that I write this blog.

Post a Comment