Every programming language has an IF-THEN statement that branches according to whether a Boolean expression is true or false. In SAS, the IF-THEN (or IF-THEN/ELSE) statement evaluates an expression and branches according to whether the expression is nonzero (true) or zero (false). The basic syntax is
if numeric-expression then
do-computation;
else
do-alternative-computation;
One of the interesting features of the SAS language is that it is designed to handle missing values. This brings up the question: What happens if SAS encounters a missing value in an IF-THEN expression? Does the IF-THEN expression treat the missing value as "true" and execute the THEN statement, or does it treat the missing value as "false" and execute the alternative ELSE statement (if it exists)?
The answer is fully documented, but let's run an example to demonstrate the SAS behavior:
data A;
input x @@;
if x then Expr="True ";
else Expr="False";
datalines;
1 0 .
;
proc print noobs; run;
Ah-ha! SAS interprets a missing value as "false." More correctly, here is an excerpt from the SAS documentation:
SAS evaluates the expression in an IF-THEN statement to produce a result that is either non-zero, zero, or missing. A non-zero and nonmissing result causes the expression to be true; a result of zero or missing causes the expression to be false.
This treatment of missing values is handled consistently by other SAS languages and in other conditional statements. For example, the CHOOSE function in the SAS/IML language is a vector alternative to the IF-THEN/ELSE statement, but it handles missing values by using the same rules:
proc iml;
x = {1, 0, .};
Expr = choose(x,"True","False");
print x Expr;
The output is identical to the previous output from the DATA step and PROC PRINT.
If you do not want missing values to be treated as "false," then do not reference a variable directly, but instead use a Boolean expression in the IF-THEN statement. For example, in the following statement a missing value results in the THEN statement being executed, whereas all other numerical values continue to behave as expected:
if x^=0 then ...;
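To see the two forms side by side, here is a small sketch that evaluates both conditions on the same three values. The data set name Compare and the variables Direct and NotZero are made up for this illustration.

```sas
data Compare;
input x @@;
/* referencing the variable directly: a missing value is treated as false */
if x then Direct="True ";
else Direct="False";
/* comparing against zero: a missing value is not equal to 0, so the test is true */
if x^=0 then NotZero="True ";
else NotZero="False";
datalines;
1 0 .
;
proc print data=Compare noobs; run;
```

For the missing value, Direct is "False" but NotZero is "True"; the two columns agree for 1 and 0.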
Have you encountered places in SAS where missing values are handled in a surprising way? Post your favorite example in the comments.
13 Comments
The obvious "surprising way" is that SAS considers missing values to be less than non-missing numeric values (i.e., .<0 is TRUE), which I'm sure screws up almost everyone's program at least once before they learn their lesson.
Also, the missing() function is nice because it can handle either character or numeric.
Yes. I like the CMISS function for the same reason.
A very simple way to address this is to include, at the top of your code, a tight condition that specifies what to do when a missing value is encountered. This can also be done when you are using more than one variable, but for the example I'll keep it simple.
Example:
Data revised;
set mydata;
length age_group $8; /* age_group holds text, so declare it as character */
if age = . or age gt 120 then age_group=' ';
else if 0<=age<=5 then age_group='0-5 yrs';
else if 6<=age<=18 then age_group='6-18 yrs';
else if age gt 18 then age_group='19+ yrs';
Sorry I forgot to add the run; at end but I think that would be obvious to most.
Thanks for the comment. Missing values and out-of-range values come up a lot. An alternative approach for your example is to use PROC FORMAT to define a user-defined format on AGE, rather than create a new variable.
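A minimal sketch of that approach, assuming the same cutoffs as in the comment above. The format name AGEGRP is invented for this example, and MYDATA is the data set named in the comment.

```sas
proc format;
value agegrp
   0 - 5     = '0-5 yrs'
   6 - 18    = '6-18 yrs'
   18 <- 120 = '19+ yrs'   /* greater than 18, up to and including 120 */
   other     = ' ';        /* missing, negative, and out-of-range ages */
run;

/* AGE itself is unchanged; only its displayed value is grouped */
proc print data=mydata;
format age agegrp.;
run;
```

Because the grouping lives in the format, the cutoffs can be changed in one place without rebuilding the data set.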
Short memory tip: You have to get to one to be true. Sorta like dating: missing a date doesn't get you a date, and being a big fat zero doesn't get you a date.
The problem with code like
if age = . or age gt 120 then age_group=.;
is that it doesn't handle other missing values such as .A or ._, so I greatly prefer
if missing(age) or age gt 120 then age_group=.;
I also object to the commonly-used
if . < age < 0 then ...
or the slightly more correct
if .z < age < 0 then ...
preferring the explicit
if not missing(age) and age < 0 then ...
which doesn't require arcane knowledge of the internal representation of missing values in SAS (including the ordering of special missing values).
In short, if you want to test for missingness, do not test for equality to . but instead test for missingness.
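To make the difference concrete, here is a small sketch that applies each of these tests to an ordinary missing value and to a few special missing values. The data set name Special and the indicator variable names are made up for this illustration.

```sas
data Special;
do x = 1, ., .A, ._, .Z;
   EqualsDot = (x = .);      /* 1 only for the ordinary missing value . */
   IsMissing = missing(x);   /* 1 for every kind of missing value */
   GtDot     = (. < x);      /* 1 for x=1, but also for .A and .Z, which sort after . */
   GtDotZ    = (.Z < x);     /* 1 only for the nonmissing value */
   output;
end;
run;
proc print data=Special noobs; run;
```

Only the MISSING function flags all four missing values, and it does so without relying on where each special missing value falls in the sort order.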
Hi! Please note the effect of negative values.
data A;
input x @@;
if x then Expr="True ";
else Expr="False";
datalines;
1 -1 0 .
;
proc print noobs; run;
gives the result
x Expr
1 True
-1 True
0 False
. False
Perhaps the line -1 True is a surprise to some persons.
Hi! Here is a mistake I made recently. Question: How can it be found easily?
This is not a SAS error - it is a programming mistake, which is perhaps not easy to see if you are tired. The effect in this case is "radical".
data A;
input x @@;
if x then; Expr="True "; /* The left-most semicolon added by mistake. */
datalines;
1 0 .
;
proc print noobs; run;
result:
x Expr
1 True
0 True
. True
/ Br Anders
data A;
input x @@;
if not x then Expr="True ";
else Expr="False";
datalines;
1 0 .
;
proc print noobs;
run;
Here the answer is: False True True.
Can't we say that SAS interprets a missing value as True?
Please clarify.
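No. A missing value is treated as FALSE, as shown and explained in the blog post. In your example, the expression (not x) is true for the missing value precisely because x itself evaluates as false.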
This is not always true. Sometimes SAS interprets missing values as true. Example:
data A;
input x @@;
if x then Expr="True ";
else Expr="False";
datalines;
1 0 .
;
proc print noobs; run;
data b;
set a;
y=x;
if y<=0 then y=10;
run;
proc print data=b; run;
Obs x Expr y
1 1 True 1
2 0 False 10
3 . False 10
In this case, the second observation (x=0) is true (as it is <=0) and is given a value of y=10, but the third observation is also considered true (because it is given a value of y=10) even though the variable is missing.
No, you are confusing the sort order for missing values with the logical evaluation of a missing value. The sort order of missing values is that a missing value is less than any representable floating point number. Thus the expression (y <= 0) evaluates as TRUE when y is missing.
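If the intent in an example like this is to leave missing values untouched, one option is the explicit guard suggested in an earlier comment. The following is only a sketch that reuses the same three data values.

```sas
data A;
input x @@;
datalines;
1 0 .
;
data B;
set A;
y = x;
/* the MISSING guard keeps a missing y from passing the (y <= 0) comparison */
if not missing(y) and y <= 0 then y = 10;
run;
proc print data=B noobs; run;
```

With the guard in place, y is 1, 10, and missing for the three observations, so only the true zero is replaced.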