This article describes the SAS/IML CHOOSE function: how it works, how it doesn't work, and how to use it to make your SAS/IML programs more compact. In particular, the CHOOSE function has a potential "gotcha!" that you need to understand if you want your program to perform as expected.
What the CHOOSE function does
The CHOOSE function is a compact alternative to an IF-THEN/ELSE statement that assigns values. In other words, it can often eliminate an IF-THEN/ELSE statement. The CHOOSE function takes three arguments: a condition to evaluate as true or false, a value to use when the condition is true, and a value to use when the condition is false. Each argument can be vector-valued. For example, the following statements use the CHOOSE function to return the absolute value of a vector:
proc iml;
x = -2:2;
absX = choose( x<=0, -x, x );
Of course, using the ABS function is more efficient, but this illustrative example is easy to understand. The CHOOSE function receives three vectors as arguments: a vector of zeros and ones, a vector that contains the negative of x, and the vector x. The CHOOSE function examines each element of the first argument. If an element is nonzero, it returns the corresponding element of the second argument. If an element is zero, it returns the corresponding element of the third argument.
For this example, if x[i] is not positive, the value -x[i] is returned; otherwise x[i] is returned.
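The following sketch makes the element-by-element rule concrete. It computes the same absolute values with an explicit DO loop and an IF-THEN/ELSE statement, and then checks that the two results agree. The names absLoop and same are illustrative additions, not part of the original example.

```sas
proc iml;
x = -2:2;                          /* row vector {-2 -1 0 1 2} */
absX = choose( x<=0, -x, x );      /* vectorized selection: {2 1 0 1 2} */

/* Equivalent loop: examine each element of the condition in turn */
absLoop = j(1, ncol(x), .);        /* allocate a result vector of missing values */
do i = 1 to ncol(x);
   if x[i] <= 0 then absLoop[i] = -x[i];   /* condition true: use -x[i] */
   else absLoop[i] = x[i];                 /* condition false: use x[i] */
end;

same = all(absX = absLoop);        /* 1 if every element matches */
print absX, absLoop, same;
```

The loop version is longer and, in IML, usually slower, which is why vectorized functions such as CHOOSE are preferred.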
The second and third arguments can also be scalar values. For example, if you want to replace all nonpositive values of x with a missing value, you can use the following statement:
z = choose( x<=0, ., x );
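As a small sketch, both value arguments can be scalars at the same time. In the following statements, the variable s is a hypothetical sign indicator in which zero is mapped to 1. Each scalar is used for every element whose condition selects it.

```sas
proc iml;
x = -2:2;
z = choose( x<=0, ., x );          /* {. . . 1 2}: scalar second argument */
s = choose( x<0, -1, 1 );          /* {-1 -1 1 1 1}: both value arguments are scalars */
print z, s;
```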
The CHOOSE function is analogous to the ternary C/C++ operator "?:" or to the ifelse function in R.
What the CHOOSE function doesn't do
Notice that the CHOOSE function takes values, not expressions. Each argument of the function is evaluated before it is passed to the CHOOSE function.
This distinction is important, because a common mistake is to expect the CHOOSE function to evaluate the second argument only for elements where the condition is true (or the third argument only for elements where the condition is false). For example, in a previous post I described several ways to handle negative values when you apply a logarithmic data transformation. You might assume that the following statements prevent the LOG function from evaluating the negative value in Y:
Y = {-3,1,2,.,5,10,100}; /** notice negative datum **/
LogY = choose(Y<=0, ., log(Y)); /* WRONG */
However, that assumption is not correct. The third argument is evaluated before being sent to the CHOOSE function. It is easy to see the mistake if you use a temporary variable and rewrite the code:
Val2 = log(Y); /* WRONG */
LogY = choose(Y<=0, ., Val2);
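Instead, apply the CHOOSE function to the data first, so that the LOG function never receives a nonpositive value:
Y2 = choose(Y<=0, ., Y);
LogY = log(Y2);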
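Another way to make sure that LOG is called only for valid elements is to find the positive elements with the LOC function (the technique that also appears in the comments) and transform only those. The following is a sketch; the names idx and LogY are illustrative. Elements that are not assigned remain missing.

```sas
proc iml;
Y = {-3,1,2,.,5,10,100};           /* contains a negative value and a missing value */

LogY = j(nrow(Y), 1, .);           /* start with a column of missing values */
idx = loc(Y > 0);                  /* indices of positive elements; a missing value is not > 0 */
if ncol(idx) > 0 then              /* LOC returns an empty matrix when nothing qualifies */
   LogY[idx] = log( Y[idx] );      /* LOG sees only positive values */
print Y LogY;
```

The CHOOSE version is shorter. The LOC version avoids creating the intermediate vector Y2 and calls LOG only on the elements that need it.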
6 Comments
Interesting post, Rick. I wasn't aware of this function. Can you say something about the efficiency of this function compared to an IF-THEN/ELSE statement?
In IML, as in any other programming language, there is always more than one way to do things, and I'm often unsure which way is more efficient. For example, is x[<>] more efficient than MAX(x), or less? (I hope it's okay to ask a question that isn't really related to the article. According to a quick check I did, it doesn't make much difference whether you use x[<>] or MAX(x).) There have been blog posts about efficiency, but I would very much appreciate more in this direction, perhaps a summary article with general recommendations.
Thanks for the suggestion. I certainly intend to continue blogging about efficiency. To quickly answer your specific questions:
1) I haven't run tests either, but a DO loop with an IF-THEN/ELSE statement inside the loop is less efficient. The most efficient approach is to use either the CHOOSE function or the LOC function, such as idx=LOC(Y<=0); Y[idx]=.;
2) In your MAX example, both are equally efficient. However, the subscript reduction operator will be more efficient when finding the max of rows or columns.
See http://blogs.sas.com/iml/index.php?/archives/148-Use-Subscript-Reduction-Operators!.html
and http://blogs.sas.com/content/iml/2010/10/22/looping-versus-loc-ing-revisited/
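To measure the difference yourself, here is a minimal timing sketch. The sample size, random seed, and variable names are arbitrary choices for illustration, and the times depend on your hardware and SAS release. It replaces the nonpositive values of a vector with missing values in three ways: a DO loop that contains an IF-THEN statement, the CHOOSE function, and the LOC function. The LOC version checks for an empty result because LOC returns an empty matrix when no element qualifies.

```sas
proc iml;
call randseed(123);
n = 1e5;
x = j(n, 1);                       /* allocate an n x 1 vector */
call randgen(x, "Normal");         /* fill it with standard normal values */

/* 1. DO loop with an IF-THEN statement */
t0 = time();
z1 = x;
do i = 1 to n;
   if x[i] <= 0 then z1[i] = .;
end;
tLoop = time() - t0;

/* 2. CHOOSE function */
t0 = time();
z2 = choose(x <= 0, ., x);
tChoose = time() - t0;

/* 3. LOC function */
t0 = time();
z3 = x;
idx = loc(x <= 0);
if ncol(idx) > 0 then z3[idx] = .;
tLoc = time() - t0;

print tLoop tChoose tLoc;          /* elapsed seconds for each method */
```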
Here's a very odd glitch with CHOOSE that I discovered recently. Try this:
a = J(1,6,0);
b = J(1,6,1);
i = 2;
a = choose(i=1,b,a//b);
In words: if i = 1, I want 'a' to become 'b'; otherwise, I want 'b' to be appended to 'a'. It doesn't work.
Now, replace the CHOOSE statement in the code above with the following equivalent if/then/else statement:
if i=1 then a = b; else a = a//b;
It will work. Why? I've heard tons of teachers and manuals tell me that the CHOOSE statement is just a more compact form of the IF/ELSE statement. That doesn't seem to be true.
As I say in this post, you shouldn't think of CHOOSE as equivalent to the IF-THEN/ELSE statement, although they are similar. The CHOOSE function is designed to take three vectors as arguments. As the post explains:
"The CHOOSE function examines each element of the first argument. If an element is nonzero, it returns the corresponding element of the second argument. If an element is zero, it returns the corresponding element of the third argument."
In the general case, therefore, the three arguments must have the same dimensions (except that the second or third argument can be a scalar). That's why the CHOOSE function complains that the second and third arguments have different dimensions: b is 1 x 6, but a//b is 2 x 6. In your example, the first argument is a scalar. Although in principle CHOOSE could handle this case differently from the general case, that's not what it does, since special-casing leads to its own problems.
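For the example in the question, the following sketch shows the usual workaround. When the condition is a single scalar and the two outcomes are matrices of different sizes, use an IF-THEN/ELSE statement, which evaluates only the branch that it takes. The last statement, which uses the illustrative variable c, shows that CHOOSE is still a good fit when the condition and the value arguments conform.

```sas
proc iml;
a = j(1, 6, 0);
b = j(1, 6, 1);
i = 2;

/* Scalar condition that selects between matrices of different sizes:
   IF-THEN/ELSE evaluates only the branch that it takes */
if i = 1 then a = b;
else a = a // b;                   /* a is now 2 x 6 */

/* CHOOSE works element by element: the condition (a = 0) is 2 x 6,
   and the scalar -1 is used wherever the condition is true */
c = choose(a = 0, -1, a);          /* replaces the zeros in a with -1 */
print a, c;
```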
Thanks Rick!!
Pingback: Square root transformations: How to handle negative data values? - The DO Loop