Complex assignment statements: CHOOSE wisely

This article describes the SAS/IML CHOOSE function: how it works, how it doesn't work, and how to use it to make your SAS/IML programs more compact. In particular, the CHOOSE function has a potential "gotcha!" that you need to understand if you want your program to perform as expected.

What the CHOOSE function does

The CHOOSE function is like a compact alternative to a compound assignment statement. In other words, it can eliminate an IF-THEN/ELSE statement. The CHOOSE function takes three arguments: a condition to evaluate as true or false, a value to use when the condition is true, and a value to use when the condition is false. Each argument can be vector-valued. For example, the following statements use the CHOOSE function to return the absolute value of a vector:

proc iml;
x = -2:2;
absX = choose( x<=0, -x, x );

Of course, using the ABS function is more efficient, but this illustrative example is easy to understand. The CHOOSE function receives three vectors as arguments: a vector of zeros and ones, a vector that contains the negative of x, and the vector x. The CHOOSE function examines each element of the first argument. If an element is nonzero, it returns the corresponding element of the second argument. If an element is zero, it returns the corresponding element of the third argument.

For this example, if x[i] is not positive, the value -x[i] is returned; otherwise x[i] is returned.
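
To make the element-by-element behavior concrete, the following sketch writes out the same computation as an explicit loop with an IF-THEN/ELSE statement. The names absLoop and i are illustrative and are not part of the original example.

```sas
proc iml;
x = -2:2;                           /* same row vector as in the example above */
absX = choose( x<=0, -x, x );       /* vectorized form */

/* loop form: examine each element of the condition in turn */
absLoop = j(nrow(x), ncol(x), .);   /* allocate a result with the same shape as x */
do i = 1 to ncol(x);
   if x[i] <= 0 then absLoop[i] = -x[i];   /* condition is true: use second argument */
   else absLoop[i] = x[i];                 /* condition is false: use third argument */
end;
print absX, absLoop;                /* both vectors should contain 2 1 0 1 2 */
```

Notice that the loop form computes -x[i] only for the elements that need it, whereas the CHOOSE form receives the entire vector -x as an argument.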

The second and third arguments can also be scalar values. For example, if you want to replace all nonpositive values of x with a missing value, you can use the following statement:

z = choose( x<=0, ., x );

The CHOOSE function is analogous to the ternary C/C++ operator "?:" or to the ifelse function in R.

What the CHOOSE function doesn't do

Notice that the CHOOSE function takes values, not expressions. Each argument of the function is evaluated before it is sent into the CHOOSE function.

This is important, because a common mistake is to expect the CHOOSE function to evaluate the second (or third) expression only for the elements that select it. For example, in a previous post, I described several ways to handle negative values in evaluating a logarithmic data transformation. You might assume that the following statements prevent the LOG function from evaluating negative values in Y:

Y = {-3,1,2,.,5,10,100}; /** notice negative datum **/
LogY = choose(Y<=0, ., log(Y)); /* WRONG */

However, that assumption is not correct. The third argument is evaluated before being sent to the CHOOSE function. It is easy to see the mistake if you use a temporary variable and rewrite the code:

Val2 = log(Y); /* WRONG */
LogY = choose(Y<=0, ., Val2);

However, you can use the CHOOSE function to solve this problem: simply operate on the Y values to replace nonpositive values by missing values:

Y2 = choose(Y<=0, ., Y);
LogY = log(Y2);
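
If you prefer to avoid the CHOOSE function for this task, one alternative is to apply the LOG function only to the positive elements. The following sketch uses the LOC function to find those elements. The name idx is illustrative, and this is only one of several ways to write the computation.

```sas
proc iml;
Y = {-3,1,2,.,5,10,100};
LogY = j(nrow(Y), ncol(Y), .);   /* start with a vector of missing values */
idx = loc(Y > 0);                /* indices of the positive elements */
if ncol(idx) > 0 then            /* LOC returns an empty matrix when nothing is found */
   LogY[idx] = log(Y[idx]);      /* LOG sees only positive values */
print Y LogY;
```

In this version the LOG function is never called with a nonpositive argument, so the elements of LogY that correspond to nonpositive or missing values of Y remain missing.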

In summary, the CHOOSE function is a compact way to implement a complex assignment statement, but it is important to remember that all arguments are evaluated prior to being passed into the function.
About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

6 Comments

  1. Interesting post Rick, I wasn't aware of this function. Is it possible to say something about the efficiency of this function compared to an IF-THEN/ELSE statement?

    In IML, as in any other programming language, there is always more than one way of doing things, and I'm often unsure which way is more efficient. For example, is x[<>] more efficient than MAX(x), or less? (I hope it is okay to ask a question that isn't really related to the article. According to a quick check I did, it doesn't make much of a difference whether you use x[<>] or MAX(x).) There have been blog posts about efficiency, but I would very much appreciate more in this direction, perhaps a summary article with general recommendations.

  2. Here's a very odd glitch with CHOOSE that I discovered recently. Try this:

    a = J(1,6,0);
    b = J(1,6,1);
    i = 2;
    a = choose(i=1,b,a//b);

    In words, if i = 1, I want 'a' to become 'b', otherwise, I want 'b' to be appended to 'a'. Doesn't work.
    Now, replace the CHOOSE statement in the code above with the following equivalent if/then/else statement:
    if i=1 then a = b; else a = a//b;

    It will work. Why? I've heard tons of teachers and manuals tell me the CHOOSE function is just a more compact form of the if/else statement. Doesn't seem to be.

    • Rick Wicklin

      You shouldn't think of CHOOSE as equivalent to the IF-THEN/ELSE statement, although they are similar. The general formulation of the CHOOSE function is to have three vectors as arguments. As I say in this post:
      "The CHOOSE function examines each element of the first argument. If an element is nonzero, it returns the corresponding element of the second argument. If an element is zero, it returns the corresponding element of the third argument."

      In the general case, therefore, the dimensions of the three arguments must be the same. That's why the CHOOSE function is complaining that the second and third arguments have different dimensions. However, in your example the first argument is a scalar. Although, in principle, CHOOSE could handle this case differently than the general case, that's not what it does, since special-casing leads to its own problems.

  3. Pingback: Square root transformations: How to handle negative data values? - The DO Loop
