The DATA step and the implied AND operator


The SAS DATA step supports a special syntax for determining whether a value is contained in an interval:

y = (-2 < x < 2);
This expression creates an indicator variable with the value 1 if x is in the interval (-2,2) and 0 otherwise.

The documentation for the AND operator states that "two comparisons with a common variable linked by [the AND operator] can be condensed" into a single expression with an implied AND operator. For example, the following two statements are equivalent:

y = (-2 < x < 2);
y = (-2<x & x<2);
The second syntax is more familiar to programmers in languages such as C/C++ that do not support an "implied AND" comparison operator.

The reason that I mention this syntax is that it is NOT available in the SAS/IML language (nor in R, nor in MATLAB). Sometimes experienced DATA step programmers expect the implied AND operator to work in their SAS/IML programs. The syntax does parse and execute, but it gives a different value than in the DATA step! For example, consider the following SAS/IML program:

proc iml;
x = -3:3;   /* the vector {-3 -2 -1 0 1 2 3} */
y = (-2 < x < 2);
There are no errors when you run this program. The expression for y is parsed as
y = ((-2 < x) < 2);
Regardless of the values in x, the variable y is a vector of ones. Why? The previous statement is equivalent to the following two statements:
v = (-2 < x);  /* {0 0 1 1 1 1 1} */
y = (v < 2);   /* {1 1 1 1 1 1 1} */
In the first statement, v is an indicator variable: v[i]=1 when the expression is true, and 0 otherwise. In the second statement, the zeros and ones in v are compared with the value 2. Not surprisingly, all of the zeros and ones are less than 2, so y is a vector of all ones.
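The two-step evaluation can be mimicked outside of SAS. The following is a minimal sketch in plain Python, used here only as an illustration language: it is not SAS/IML code, and the names y_parsed and y_and are made up for this example. It applies the same left-to-right comparisons element by element and contrasts the result with an explicit AND:

```python
# Emulate how SAS/IML evaluates (-2 < x < 2) as ((-2 < x) < 2).
# Plain Python lists stand in for the IML vector; this is an illustration only.
x = list(range(-3, 4))                 # the values -3, -2, ..., 3

# Step 1: indicator for (-2 < x), one 0/1 value per element
v = [int(-2 < xi) for xi in x]

# Step 2: compare each 0/1 indicator with 2 (always true)
y_parsed = [int(vi < 2) for vi in v]

# Explicit AND: the interval test that was actually intended
y_and = [int(-2 < xi and xi < 2) for xi in x]

print(v)         # [0, 0, 1, 1, 1, 1, 1]
print(y_parsed)  # [1, 1, 1, 1, 1, 1, 1]
print(y_and)     # [0, 0, 1, 1, 1, 0, 0]
```

Only the explicit AND marks the elements -1, 0, and 1 as being inside the interval (-2,2).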

Conclusion: Don't use the implied AND syntax from the DATA step in your SAS/IML programs. Instead, write the AND operator explicitly, as in (-2<x & x<2).

I am told that Python and Perl 6 (but not Perl 5) also support this implied AND operator. The SQL procedure in SAS software supports it, although other implementations of SQL do not. Do you know of other languages that also support a compact syntax for testing whether a value is within an interval?
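For Python, the behavior is easy to check. The sketch below assumes standard Python comparison chaining, in which a < b < c means (a < b) and (b < c); the variable names are invented for the example. It also shows that adding parentheses brings back the left-to-right result described earlier for SAS/IML:

```python
# Python chains comparisons: lo < x < hi means (lo < x) and (x < hi).
xs = list(range(-3, 4))                # the values -3, -2, ..., 3

chained  = [int(-2 < x < 2) for x in xs]           # implied AND
explicit = [int(-2 < x and x < 2) for x in xs]     # explicit AND

# Parentheses break the chain: the True/False result is compared with 2.
grouped = [int((-2 < x) < 2) for x in xs]

print(chained)              # [0, 0, 1, 1, 1, 0, 0]
print(chained == explicit)  # True
print(grouped)              # [1, 1, 1, 1, 1, 1, 1]
```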


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

5 Comments

  1. The macro language does NOT support the implied AND either, which catches many new macro programmers by surprise. The results are the same as in IML:

    %put %eval(1<5<2) ;
    1

  2. Peter Lancashire on

    The BETWEEN operator has been part of standard SQL for a long time. For example, SELECT * from mytable WHERE mycolumn BETWEEN 20 AND 30.

  3. I would strongly advise against using this "condensed" syntax for the very reason that it is valid syntax even when it is not supported - this sort of error will generally be very difficult to catch, and makes porting the project to another language more complicated.

    (compare with the ubiquity of the C = vs == operator error, and try fixing that when the compiler doesn't treat it as a warning)

    • That's an interesting point Troll. I'm inclined to agree with you if the implied AND operator doesn't increase efficiency. Does anybody know whether the implied AND is faster?

