How does the IF-THEN statement in SAS treat a missing value?

Nearly every programming language has an IF-THEN statement that branches according to whether a Boolean expression is true or false. In SAS, the IF-THEN (or IF-THEN/ELSE) statement evaluates an expression and branches according to whether the expression is nonzero (true) or zero (false). The basic syntax is

if numeric-expression then
   do-computation;
else
   do-alternative-computation;

One of the interesting features of the SAS language is that it is designed to handle missing values. This brings up the question: What happens if SAS encounters a missing value in an IF-THEN expression? Does the IF-THEN expression treat the missing value as "true" and execute the THEN statement, or does it treat the missing value as "false" and execute the alternative ELSE statement (if it exists)?

The answer is fully documented, but let's run an example to demonstrate the SAS behavior:

data A;
input x @@;
if x then Expr="True "; 
     else Expr="False";
datalines;
1 0 .
;
 
proc print noobs; run;

Ah-ha! The output shows that Expr is "True" for x=1 and "False" for both x=0 and the missing value, so SAS interprets a missing value as "false." More precisely, here is an excerpt from the SAS documentation:

SAS evaluates the expression in an IF-THEN statement to produce a result that is either non-zero, zero, or missing. A non-zero and nonmissing result causes the expression to be true; a result of zero or missing causes the expression to be false.
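
As a quick check of that rule, here is a small sketch (not taken from the documentation) that adds a negative value and two special missing values to the earlier test. It assumes the MISSING statement so that the INPUT statement reads A and _ as the special missing values .A and ._; the data set name B is arbitrary.

data B;
missing A _;            /* read A and _ as special missing values .A and ._ */
input x @@;
if x then Expr="True ";
     else Expr="False";
datalines;
1 -1 0 . A _
;

proc print noobs; run;

By the rule quoted above, the two nonzero values (1 and -1) should be reported as "True," and zero and all three missing values should be reported as "False."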

This treatment of missing values is handled consistently by other SAS languages and in other conditional statements. For example, the CHOOSE function in the SAS/IML language is a vector alternative to the IF-THEN/ELSE statement, but it handles missing values by using the same rules:

proc iml;
x  = {1, 0, .};
Expr = choose(x,"True","False");
print x Expr;

The output is identical to the previous output from the DATA step and PROC PRINT.

If you do not want missing values to be treated as "false," then do not reference a variable directly, but instead use a Boolean expression in the IF-THEN statement. For example, in the following statement a missing value results in the THEN statement being executed, whereas all other numerical values continue to behave as expected:

if x^=0 then ...;
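
To see the difference side by side, the following sketch applies three tests to the same data. The variable names Direct, Compare, and Guarded are made up for this illustration. The third test uses the MISSING function to handle missing values in a separate branch, which is one option if you do not want a missing value to be classified as either true or false.

data C;
input x @@;
length Direct Compare Guarded $8;
/* 1. reference the variable directly: missing is treated as false */
if x then Direct="True"; else Direct="False";
/* 2. Boolean comparison: a missing value is not equal to 0, so the test is true */
if x^=0 then Compare="True"; else Compare="False";
/* 3. handle missing values explicitly before testing the value */
if missing(x) then Guarded="Missing";
else if x then Guarded="True";
else Guarded="False";
datalines;
1 0 .
;

proc print noobs; run;

For the missing observation, Direct should be "False," Compare should be "True," and Guarded should be "Missing."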

Have you encountered places in SAS where missing values are handled in a surprising way? Post your favorite example in the comments.

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

13 Comments

  1. The obvious "surprising way" is that SAS considers missing values to be less than nonmissing numeric values (i.e., .<0 is TRUE), which I'm sure screws up almost everyone's program at least once before they learn their lesson.

    Also, the missing() function is nice because it can handle either character or numeric.

  2. A very simple way to address this is at the top of your code: include a tight argument that addresses what to do when a missing value is encountered. This can also be done when you are using more than one variable, but for the example I'll keep it simple.

    Example:

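    data revised;
    set mydata;
    length age_group $8;   /* character variable: declare it before the first assignment */
    if age = . or age gt 120 then age_group=' ';   /* blank is the character missing value */
    else if 0<=age<=5 then age_group='0-5 yrs';
    else if 6<=age<=18 then age_group='6-18 yrs';
    else if age gt 18 then age_group='19+ yrs';
    run;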

      • Rick Wicklin

        Thanks for the comment. Missing values and out-of-range values come up a lot. An alternative approach for your example is to use PROC FORMAT to define a user-defined format on AGE, rather than create a new variable.
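
        A minimal sketch of that PROC FORMAT approach, reusing the MYDATA data set and AGE variable from the comment above. The format name AgeGrp and the label 'Unknown' are assumptions made for this illustration. The OTHER keyword catches every value not listed in a range, including missing values, negative ages, and ages greater than 120.

        proc format;
        value AgeGrp
           0 - 5      = '0-5 yrs'
           6 - 18     = '6-18 yrs'
           18 <- 120  = '19+ yrs'    /* greater than 18, up to and including 120 */
           other      = 'Unknown';   /* missing and out-of-range values */
        run;

        proc print data=mydata;
        format age AgeGrp.;          /* display AGE by using the age groups */
        run;

        Because the grouping lives in the format, the original AGE values stay in the data set and the groups can be changed without rerunning a DATA step.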

  3. Short memory tip: You have to get to one to be true. Sorta like dating: missing a date doesn't get you a date, and being a big fat zero doesn't get you a date.

  4. The problem with code like
    if age = . or age gt 120 then age_group=.;
    is that it doesn't handle other missing values such as .A or ._, so I greatly prefer
    if missing(age) or age gt 120 then age_group=.;

    I also object to the commonly-used
    if . < age < 0 then ...
    or the slightly more correct
    if .z < age < 0 then ...
    preferring the explicit
    if not missing(age) and age < 0 then ...
    which doesn't require arcane knowledge of the internal representation of missing values in SAS (including the ordering of special missing values).

    In short, if you want to test for missingness, do not test for equality to . but instead test for missingness.
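
    The differences among these tests can be checked with a short sketch. The data set name MissTest and the variable names EqualsDot, IsMissing, and RangeTest are made up for this illustration, and the MISSING statement is assumed so that the letter A in the data is read as the special missing value .A.

    data MissTest;
    missing A;                    /* read A as the special missing value .A */
    input age @@;
    EqualsDot = (age = .);        /* 1 only for the ordinary missing value */
    IsMissing = missing(age);     /* 1 for every kind of missing value */
    RangeTest = (. < age < 0);    /* intended to mean "negative and nonmissing" */
    datalines;
    . A -5 30
    ;

    proc print noobs; run;

    Because .A sorts above the ordinary missing value but below every number, the .A observation should have EqualsDot=0, IsMissing=1, and RangeTest=1. In other words, the test for equality to . does not detect it, and the range test counts it as a negative age.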

  5. Anders Sköllermo on

    Hi! Please note the effect of Minus values.
    data A;
    input x @@;
    if x then Expr="True ";
    else Expr="False";
    datalines;
    1 -1 0 .
    ;

    proc print noobs; run;

    gives the result

    x Expr
    1 True
    -1 True
    0 False
    . False

    Perhaps the line -1 True is a surprise to some persons.

  6. Anders Sköllermo on

    Hi! A mistake I made recently. Question: How to find it easily ?
    This is not a SAS error - it is a programming mistake, which is perhaps not easy to see if you are tired. The effect in this case is "radical".

    data A;
    input x @@;
    if x then; Expr="True "; /* The left-most semicolon added by mistake. */
    datalines;
    1 0 .
    ;

    proc print noobs; run;

    result:
    x Expr
    1 True
    0 True
    . True

    / Br Anders

  7. data A;
    input x @@;
    if not x then Expr="True ";
    else Expr="False";
    datalines;
    1 0 .
    ;
    proc print noobs;
    run;

    Here the answer is: False True True.

    Can't we say that SAS interprets a missing value as True?

    Please clarify.

  8. This is not always true. Sometimes SAS interprets missing values as true. Example:

    data A;
    input x @@;
    if x then Expr="True ";
    else Expr="False";
    datalines;
    1 0 .
    ;
    proc print noobs; run;

    data b;
    set a;
    y=x;
    if y<=0 then y=10;
    run;
    proc print data=b; run;

    Obs x Expr y

    1 1 True 1
    2 0 False 10
    3 . False 10
    In this case, the second observation (x=0) is true (as it is <=0) and is given a value of y=10, but the third observation is also considered as true (because it is given a value of y=10) albeit the variable is missing.
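
    This result follows from the ordering mentioned in comment 1: SAS treats a missing value as smaller than any number, so the comparison y<=0 is true when y is missing. If that is not what you want, one possible sketch is to exclude missing values explicitly, assuming the data set A that was created above:

    data b;
    set a;
    y=x;
    if not missing(y) and y<=0 then y=10;   /* a missing y is left unchanged */
    run;
    proc print data=b; run;

    With this change, y should be 10 for the second observation and should remain missing for the third.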
