The SELECT statement in the SAS DATA step

9

Every beginning SAS programmer learns the simple IF-THEN/ELSE statement for conditional processing in the SAS DATA step. The basic If-THEN statement handles two cases: if a condition is true, the program does one thing, otherwise the program does something else.

Of course, you can handle more cases by using multiple ELSE IF statements. I have seen SAS programs that contain contains dozens of ELSE clauses. Sometimes a long sequence of IF-THEN/ELSE statements is necessary, such as when you are testing complex logical conditions.

Flow chart diagram for a switch statement (SELECT statement)

An alternative control statement in SAS is the SELECT-WHEN statement. The SELECT-WHEN statement (sometimes simply called the SELECT statement) enables you to conditionally execute statements based on the value of a single categorical variable. Usually the variable can have three or more valid values that you want to handle.

The following example uses the Sashelp.Heart data set, which contains data for 5,167 patients in a medical study. The Smoking_Status variable is a categorical variable that encodes the average number of cigarettes that each patient smokes per day. The following DATA step view implements a recoding scheme, which is sometimes the easiest way to force levels of a nominal variable to appear in a natural order during a SAS analysis.

/* example of using the SELECT statement */
data Heart / view=Heart;
set sashelp.heart;
select (Smoking_Status);
   when ('Non-smoker')        Smoking_Cat=1;
   when ('Light (1-5)')       Smoking_Cat=2;
   when ('Moderate (6-15)')   Smoking_Cat=3;
   when ('Heavy (16-25)')     Smoking_Cat=4;
   when ('Very Heavy (> 25)') Smoking_Cat=5;
   otherwise                  Smoking_Cat=.;
end;
run;

The SELECT-WHEN statement is easy to read. You specify the name of a variable on the SELECT statement. You then list a sequence of WHEN statements. Each WHEN statement specifies a particular value for the variable. If the variable has that value, the program conditionally executes a statement, which in this example assigns a value to the Smoking_Cat variable.

Notice that you can use the OTHERWISE keyword to handle missing values, invalid data, or default actions.

You can also combine categories in a WHEN statement. For example, in a statistical analysis you might want to combine the 'Heavy' and 'Very Heavy' categories into a single group. In the WHEN statement you can specify multiple values in a comma-separated list:

   /* combine the 'Heavy' and 'Very Heavy' categories */
   when ('Heavy (16-25)', 'Very Heavy (> 25)') Smoking_Cat=4;

If the WHEN condition is true, the program will execute one statement. This is the same rule that the IF-THEN statement follows. To execute more than one statement, use a DO-END block, which groups statements together:

   when ('Non-smoker') do;       /* execute multiple statements */
      Smoking_Cat=1;
      IsSmoker = 0;
   end;

I use the SELECT-WHEN statement as a "table lookup" when a program needs to branch according to the value of a single categorical variable that has three or more valid values. The basic SELECT-WHEN statement is not as flexible as the IF-THEN/ELSE statement, but, when applicable, it results in very clean and easy-to-read programs.

Other languages have similar branching statements. The SQL language supports a CASE-WHEN statement. The C/C++ and Java/Javascript languages support a switch-case statement. Whereas the CASE-WHEN statement in SAS executes one statement, the switch-case statement implements fallthrough, so C-programmers often use the break statement to exit the switch block.

Some languages do not support a special switch statement, but instead require that you use IF-THEN/ELSE statements. Python and the SAS/IML language fall into this category.

There is an alternative syntax for the SELECT-WHEN statement that does not specify an expression in the SELECT statement. Instead, you specify logical conditions in the WHEN statements. This alternate syntax is essentially equivalent to an IF-THEN/ELSE statement, so which syntax you use is a matter of personal preference. Personally, I use SELECT-WHEN for branching on a known set of discrete values, and I use the IF-THEN/ELSE statement to handle more complex situations.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

9 Comments

  1. Michelle Homes

    Useful tip to prepare data for analysis!

    I find the SELECT statement useful when writing data dependent code using SAS macro and look ups. Instead of hard-coding the possible values, you can write SAS macro code to create them and if the category values change or new categories are added, the SAS code is automatically updated with the new values.

  2. In addition, you can lay out values in two columns of Excel, if you are starting with data in a tabular form. Then use concatenation commands to write each clause of the statement, substituting the values. It's what I call a 'poor man's macro'.

  3. Note there is a variation of the SELECT statement where you are not restricted to one single variable or expression that is tested against a series of values, but where you can use as many expressions as there are WHEN clauses. In the example below there are 3 different clauses: one for the first day of a month, one for the months Jan, Apr, Jul and Oct, an one for februrary 29. If none of these conditions is true, the OTHERWISE clause will be executed.

    SELECT; /* no variable or expression here !!! */
    WHEN (Day (Date) eq 1)                        Do; ... ; End;
    WHEN (Month (Date) in (1, 4, 7, 10))          Do; ... ; End;
    WHEN (Month (Date) eq 2 and Day (Date) eq 29) Do; ... ; End;
    OTHERWISE                                     Do; ... ; End;
    END;

    Note however that the second clause (Jan, Apr, Jul, Oct) will not be executed on the first day of these months, as the first clause already processes the observation.

  4. I want to select only patients with more than 3 times of given diagnosis . I have data set with repeated measurements for each observation but in the column of diagnosis, I have binary outcome with (0 and 1) where 1 indicates positive diagnosis. I want to select individuals who have positive diagnosis (1) for at least 3 times.

    I will be glad to be assisted.

  5. Stanley Fogleman on

    I wrote a Global Forum paper back in 2010 entitled "New Dogs and Old Tricks Part II: Using the SELECT statement and FLAGS to Streamline your Code" (paper 087-2010) which ALSO mentions using booleans in combination with the SELECT statement. I should also mention that the "otherwise" clause has saved my bacon on a number of occasions when "dirty data" (not to be confused with the Jennifer Grey - Patrick Swayze movie) has come my way.
    Here is a link to the paper:
    http://support.sas.com/resources/papers/proceedings10/087-2010.pdf

  6. Pingback: Using FILENAME ZIP and FINFO to list the details in your ZIP files - The SAS Dummy

Leave A Reply

Back to Top