Two types of syntax for the SELECT-WHEN statement in SAS

0

The SELECT-WHEN statement in the SAS DATA step is an alternative to using a long sequence of IF-THEN/ELSE statements. Although logically equivalent to IF-THEN/ELSE statements, the SELECT-WHEN statement can be easier to read. This article discusses the two distinct ways to specify the SELECT-WHEN statement. You can use the first syntax when you are making logical decisions based on the possible values of a discrete variable. The second syntax enables you to use more complex logical expressions. In brief:

  • Use the first syntax when the possible values of a single discrete variable are known in advance.
  • Use the second syntax when the logic involves multiple variables or complex logic.

The first syntax for the SELECT-WHEN statement

A previous article shows the first syntax for the SELECT-WHEN statement and includes several examples. However, I will provide only a simple example here. The first syntax requires that you specify an expression on the SELECT statement. On the WHEN statement, you specify one of more values that the expression can take. Following the WHEN statement, you specify a statement that you want to perform when the WHEN expression is true. If you want to perform more than one statement, you can use a DO-END block to perform multiple operations.

It sounds more difficult than it is, so let's see an example. The following SAS data set contains dates in 2020. Some of them are well-known religious holidays, others are US holidays, and others are "personal holidays" that I like to celebrate with my friends, such as Leap Day, Pi Day, Cinco de Mayo, and Halloween. The data set contains the date of the holiday and a name. You can use a SELECT-WHEN statement to classify each holiday as Religious, USA, or Personal, as follows:

data Dates;
length Holiday $11;
input Holiday 1-12 Date date9.;
format Date date9.;
datalines;
MLK         20JAN2020 
Leap        29FEB2020 
Pi          14MAR2020 
StPatrick   17MAR2020 
Easter      12APR2020 
CincoDeMayo 05MAY2020 
Memorial    25MAY2020 
Labor       07SEP2020 
Halloween   31OCT2020 
Christmas   25DEC2020 
;
 
data Holidays;
length Type $12;
set Dates;
select(upcase(Holiday));                             /* specify expression on the SELECT statement */
  when('MLK','MEMORIAL','LABOR')   Type="USA";       /* handle possible values of expression */
  when('EASTER','CHRISTMAS')       Type="Religious";
  otherwise                        Type="Fun";       /* handle any value not previously handled */
end;
run;
 
proc print data=Holidays noobs;
   var Date Holiday Type;
run;

Notice that the SELECT statement branches on the expression UPCASE(Holiday). The new data set has a Type variable that classifies each holiday according to the value of UPCASE(Holiday).

In the SELECT-WHEN syntax, the OTHERWISE statement is optional but is recommended. If you do not have an OTHERWISE statement, and all the WHEN statements are false, the DATA step will stop with an error. You can place use a null statement after the OTHERWISE keyword to specify that the program should do nothing for unhandled values, as follows:
OTHERWISE ; /* semicolon ==> null statement ==> do nothing */

By the way, if you have only the dates, you can still form the Type variable by using the HOLIDAYNAME function in SAS. This is left as an exercise.

As stated previously, the SELECT-WHEN statement provides the same functionality as an IF-THEN/ELSE statement, but it is a simpler way to handle multiple values. For example, the equivalent IF-THEN/ELSE logic is as follows:

if upcase(Holiday)='MLK' |
   upcase(Holiday)='MEMORIAL' |
   upcase(Holiday)='LABOR' then Type="USA";
else if ...

The second syntax for the SELECT-WHEN statement

The first SELECT-WHEN syntax enables you to perform an action for any value of an expression. Usually, you will branch on values of a discrete variable.

You can use the second SELECT-WHEN syntax when you need more complex logical processing. In the second syntax, you do not specify an expression on the SELECT statement. Instead, you specify a logical expression on each WHEN statement. The WHEN statements are executed in order until one is true. The corresponding action is then executed. If no WHEN statements are true, the OTHERWISE statement is processed.

The previous example uses the name of a holiday to assign values to the Type variable. Many holidays do not have fixed dates but rather dates that vary from year to year. For example, US Thanksgiving is the fourth Thursday of November, and MLK Day is the third Monday of January. Easter is even more complicated. Luckily, SAS provides the HOLIDAY function, which enables you to obtain the dates of common holidays for each year. For each date in the data set, you can use the HOLIDAY function to determine whether the date corresponds to a religious or USA holiday. The following SELECT-WHEN statement shows how to implement the second syntax:

data Want;
length Type $12;
set Dates;    
Year = year(Date);                            /* get the year for the date */
select;                                       /* no expression on SELECT statement */
  when(holiday('MLK',           Year)=Date |  /* logic to determine type of holiday */
       holiday('Memorial',      Year)=Date |
       holiday('USIndependence',Year)=Date |
       holiday('Labor',         Year)=Date |
       mdy(9, 11, Year)              =Date |  /* Patriot Day 9/11 */
       holiday('Veterans',      Year)=Date |
       holiday('Thanksgiving',  Year)=Date) 
          Type="USA";
  when(holiday('Easter',     Year)=Date |
       holiday('Christmas',  Year)=Date )
          Type="Religious";
  otherwise                  
          Type="Fun";
end;
run;
 
proc print data=Want noobs;
   var Date Holiday Type;
run;

This syntax for the SELECT-WHEN statement looks even more like an IF-THEN/ELSE statement.

Consider PROC FORMAT for recoding variables

On discussion forums, I see a lot of questions like this example in which programmers are creating a new variable from the values of an existing variable. In simple situations, you can use PROC FORMAT to perform this task. For more complicated situations, you can even define your own function to use as a format.

Summary

This article shows two ways to specify the SELECT-WHEN statement in SAS. The SELECT-WHEN statement is an alternative to using multiple IF-THEN/ELSE statements. For simple logical processing, you can specify an expression on the SELECT statement and specify values for the expression on the WHEN statements. For more complicated logic, you can put all the logic into the WHEN statements.

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Leave A Reply

Back to Top