In SAS, the DATA step and PROC SQL support mnemonic logical operators. The Boolean operators AND, OR, and NOT are used for evaluating logical expressions. The comparison operators are EQ (equal), NE (not equal), GT (greater than), LT (less than), GE (greater than or equal), and LE (less than or equal). These character-based operators are called mnemonic because their names make it easy to remember what the operator does.
Each mnemonic operator in SAS has an equivalent symbolic operator. The Boolean operators are & (AND), | (OR), and ^ (NOT). The comparison operators are = (EQ), ^= (NE), > (GT), < (LT), >= (GE), and <= (LE). The symbol for the NOT and NE operators can vary according to the computer that you use, and the tilde character (~) can be used in place of the caret (^).
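As a minimal sketch of that equivalence, the following DATA step computes each condition twice, once with mnemonic operators and once with symbols. The data set name OpCompare, the variable names, and the sample values are made up for illustration, and the caret is assumed to be the NOT symbol on your system. Each pair of columns should agree on every row.

```sas
/* Hypothetical example: OpCompare and its variables are invented for illustration. */
data OpCompare;
   input x y;
   /* mnemonic form                  symbolic form                        */
   and_m = (x GT 0 AND y LE 2);      and_s = (x > 0 & y <= 2);
   or_m  = (x EQ 1 OR  y NE 2);      or_s  = (x = 1 | y ^= 2);
   not_m = (NOT (x GE y));           not_s = (^(x >= y));
   datalines;
1 2
0 3
2 2
-1 0
;
run;

/* Print the flags side by side; and_m should match and_s, and so on. */
proc print data=OpCompare noobs;
run;
```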
Mnemonic operators tend to appear in older languages like FORTRAN, whereas symbolic operators are common in more recent languages like C/C++. However, some relatively recent scripting languages like Perl, PHP, and Windows PowerShell also support mnemonic operators. SAS software has supported both kinds of operators in the DATA step since the very earliest days, but the SAS/IML language, which is more mathematically oriented, supports only the symbolic operators.
Functionally, the operators in SAS are equivalent, so which ones you use is largely a matter of personal preference. Since consistency and standards are essential when writing computer programs, which operators should you choose?
The following sections present arguments for using each type of operator. The argument for using the mnemonic operators is summarized by Mnemonic Norman. The argument for using symbols is summarized by Symbolic Sybil. Finally, there is a rejoinder by Practical Priya. Thanks to participants on the SAS-L discussion forum and several colleagues at SAS for sharing their thoughts on this matter. Hopefully Norman, Sybil, and Priya represent your views fairly and faithfully.
Use the mnemonic operators
Hi, I'm Mnemonic Norman, and I've been programming in SAS for more than 30 years. I write a lot of DATA step, SQL, and macro code. I exclusively use the mnemonic operators for the following reasons:
- Easy to type. I can touch-type the main alphabet, but I've never mastered typing symbols without looking down at my fingers. In addition, exotic symbols like | (OR) are not usually located in an easy-to-reach location on my keyboard. By using the mnemonic operators, I can avoid hitting the SHIFT key and can write programs faster.
- Easy to read. Even complex comparisons are easy to read because they form a sentence in English:
if x gt 0 AND sex eq "MALE" then ...
- Easy to remember. There is a reason why these are called mnemonic operators! I program in several different languages, and each one uses a different NE operator. In SAS it is ^=. In Lua the NE operator is ~=, in Java it is !=, and the ANSI standard for SQL is <>. I use NE so I don't have to remember the correct symbol.
- Easy to communicate. My boss and clients are not statisticians. They can understand the mnemonic operators better than abstract symbols.
- Easy to see. I don't want to emphasize my age, but statistics show that most people's eyesight begins to diminish after age 40. I find the symbols | and ^ particularly difficult to see.
- Easy to distinguish assignment from comparison. I like to distinguish between assignment and logical comparison with equality, but SAS uses the = symbol for both. Therefore I use the equal sign for assignment and use EQ for logical comparison. For example, in the statement
b = x EQ y;
it is easy to see that b is a variable that holds a Boolean expression. The equivalent statement
b = x = y;
looks strange. (Furthermore, in the C language, this expression assigns the value of y to both b and x.)
- Easy to use macro variables. I reserve the ampersand for macro variables. If I see an expression like x&n, I immediately assume that the expression resolves to a name like x1 or x17. To avoid confusion with macro variables, I type x AND n when that is what I intend.
- Easy to cut and paste. Because the less-than and greater-than symbols are used to delimit tags in markup languages such as HTML and XML, they can disappear when used in Web pages. In fact, I dare you to try to post this comment to Rick's blog: "I use the expression 0 < x and y > 1." This is what you'll get: "I use the expression 0 1."
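Norman's point about assignment versus comparison can be checked with a short DATA step. This is only a sketch; the variable names b1, b2, and b3 and the values of x and y are made up for illustration. In each statement the first equal sign assigns, and everything to its right is a comparison that evaluates to 1 (true) or 0 (false).

```sas
data _null_;
   x = 3;
   y = 3;
   b1 = x EQ y;     /* mnemonic comparison, then assignment to b1   */
   b2 = x = y;      /* same logic: first = assigns, second compares */
   b3 = (x = y);    /* parentheses make the comparison explicit     */
   put b1= b2= b3=; /* writes the three values to the SAS log       */
run;
```

With these values, all three variables should be 1 in the log. Changing y to any other number should make all three 0.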
Use the symbolic operators
Hi, I'm Symbolic Sybil, and I've been programming in SAS for a few years. In school I studied math, statistics, and computer science. In addition to SAS, I program in C/C++ and R. I use symbolic operators exclusively, and here are the reasons why:
- Consistent with mathematics. When a text book or journal presents an algorithm, the algorithm uses mathematical symbols. If you study Boolean logic, you use symbols. Symbols are a compact mechanism for representing complex logical conditions. Programs that implement mathematical ideas should use mathematical notation.
- Consistent with other modern languages. I don't use FORTRAN or SQL. I might write a DATA step to prepare data, then jump into PROC IML to write an analysis. Sometimes I call a package in R or a library in C++. I use symbols because all the languages that I use support them.
- Distinguish variables from operators. Symbols are not valid variable names, so it is easy to see which tokens are operators and which are variables. Although Norman claims that symbols are hard to see, I argue that they stand out! If a data set has variables named EQ and LT, the expression EQ > LT is more readable than the equivalent expression EQ GT LT.
- Enforce coding discipline. Some of Norman's arguments are the result of lazy programming habits. The only reason he can't remember symbols is because he doesn't use them regularly. If you put spaces around your operators, you will never confuse x&n and x & n. As to remembering which operators are supported by which programming language, that is an occupational hazard. We are highly paid professionals, so learn to live with it. I don't think the solution is to use even more operators!
- Easy to communicate. I disagree with Norman's claim that his non-statistical boss and clients will understand character-based operators more easily. How patronizing! Did they drop out of school in the third grade? Furthermore, in the modern world, we need to be inclusive and respectful of different cultures. The character-based operators are Anglocentric and might not be easy to remember if your client is not a native English speaker. In Spanish, "greater than" is "mayor que" and "equal" is "igual". In contrast, mathematical symbols are universal.
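The disagreement over the ampersand can be made concrete with a small sketch. The macro variable n and the DATA step variables x, n, and x1 are hypothetical names chosen only for this illustration. An ampersand immediately followed by a name is a macro variable reference, whereas an ampersand followed by a blank is the AND operator.

```sas
%let n = 1;          /* hypothetical macro variable */

data _null_;
   x  = 1;
   n  = 0;           /* DATA step variable, unrelated to the macro variable */
   x1 = 5;
   a = x&n;          /* no blanks: the macro reference resolves, so this is a = x1 */
   b = x & n;        /* blanks around the symbol: logical AND of x and n           */
   c = x AND n;      /* mnemonic form of the same logical AND                      */
   put a= b= c=;     /* writes the three values to the SAS log                     */
run;
```

Under these assumptions, a should be 5 while b and c should both be 0. Whether spacing discipline or the mnemonic AND is the better safeguard is exactly what Norman and Sybil disagree about.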
Use them both, but be consistent
Hi, I'm Practical Priya. There is no need to start a flame war or to make this an either/or debate. As a famous computer scientist wrote, "the nice thing about standards is that you have so many to choose from."
I used symbols exclusively until I consulted on a project where the client insisted that we use mnemonic operators. Eventually I gained an appreciation for mnemonic operators. I think they are easier to see and are more readable for experts and non-experts alike.
Today I use a combination of symbols and mnemonic operators. Like Norman, I find the logical operators AND, OR, and NOT easier to type and to read than the symbols &, |, and ^. For the relational (comparison) operators, I always use <, >, and =. I learned these symbols in school and they are universally understood.
I argue that a hybrid approach is best: In the DATA step I use mnemonic Boolean operators but use symbols for comparison operators.
This presents a clear visual separation between clauses that FORM Boolean expressions and clauses that OPERATE ON logical expressions, like this:
if x = 5 AND missing(y) OR y < z then ...
However, I'm embarrassed to admit that I do not consistently use symbols for the comparison operators. I also use NE, which is inconsistent with my scheme but is more readable than ^=. If my keyboard had a "not equals" symbol (≠), I'd use it, but until then I'm sticking with NE.
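Here is a small sketch of the hybrid style in a runnable DATA step. The variable names implied, explicit, and regroup and the sample values are invented for illustration. Because SAS evaluates comparisons before AND, and AND before OR, the unparenthesized condition groups the same way as the one labeled explicit.

```sas
data _null_;
   x = 4;
   y = 1;
   z = 2;
   /* hybrid style: symbols for comparisons, mnemonics for Boolean logic */
   implied  = (x = 5 AND missing(y) OR y < z);
   /* the same grouping written out with parentheses */
   explicit = ((x = 5 AND missing(y)) OR (y < z));
   /* a different grouping, shown only for contrast */
   regroup  = (x = 5 AND (missing(y) OR y < z));
   put implied= explicit= regroup=;   /* writes the values to the SAS log */
run;
```

With these values, implied and explicit should both be 1 and regroup should be 0. Adding parentheses is optional in this style, but it removes any doubt about which grouping was intended.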
Your turn: Which logical operators do you use and why?
Norman, Sybil, and Priya have made some good points. Who do you agree with? What rules do you follow so that your SAS programs use logical operators in a readable, consistent manner? Leave a comment, but as Norman said, be careful typing < and >. You might want to use the HTML entities &lt; and &gt;.