The difference between the Subsetting IF and the IF—THEN—ELSE—IF statement

11

Subsetting IF and the IF—THEN—ELSE—IF statementMy river walk last week turned into a spectacular fall show. But if it rains this week in San Antonio, like the weatherman predicts, what will I do?

In the coming days, I’ll be presenting at two user groups,  one in eastern Canada in Halifax, and the other all the way south west in San Antonio, Texas. I’m really praying for great weather. I’ll carry an umbrella if I have to, and wear a scarf if I have to. But I don’t’ really want to, at least not in San Antonio in Texas.

Regardless of what the weather turns out to be, at least I can use the weather forecast to provide the data I need to explain IF and IF-THEN-ELSE statements to my Programming 2 students. This was their question:

Tell me the difference between the Subsetting IF and the IF—THEN—ELSE—IF statement?

Yes, at first glance they do appear similar.

However the Subsetting IF is a filter. If you are familiar with the WHERE clause, its role is similar to the WHERE, it filters or subsets rows based on a condition. It’s a bit different from the WHERE because it has access to even new variables that you are constructing in the data step.

The IF—THEN—ELSE statement is familiar to most programmers, regardless of their background. It performs an action based on a condition, but its role is not to subset data.

To really see the difference here’s some code I wrote up:

1 - First, I construct my weather dataset by pulling in data from a Google search

data weather;
infile datalines dlm=',';
input city :	$12.	day $	date: date9.	high	low	weather :$9.;
datalines;
Toronto,Thu,3-Nov-16,12,3,rain
Halifax,Thu,3-Nov-16,11,7,cloudy
Halifax,Fri,4-Nov-16,10,4,rain
San Antonio,Sat,5-Nov-16,25,14,sun cloud
San Antonio,Sun,6-Nov-16,23,16,rain
San Antonio,Mon,7-Nov-16,24,18,rain
San Antonio,Tue,8-Nov-16,24,17,rain
Toronto,Tue,8-Nov-16,13,7,rain
;
run;
 
NOTE: The data set WORK.WEATHER has 8 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00

2 - Next, I'll do some conditional processing. Notice my log note, how all rows are written to the output dataset:

data what;
set weather;
if  city ='Halifax' and weather='rain'
then whattotake="Sigh! I knew it, I'll take an umbrella";
else if city ='San Antonio' and weather='rain'
then whattotake="No way! ok I'll take an umbrella";
flag='SAS user group conferences 3-8 november 2016';
run;
 
NOTE: There were 8 observations read from the data set WEATHER.
NOTE: The data set WORK.WHAT has 8 observations and 8 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

3 - Finally, I’ll use a Subsetting If.

If the condition is true, only then the next action takes place, (i.e. flag is set to 'SAS etc.’), same as in the conditional processing scenario.

But the key driving difference is this. Only if the subsetting IF condition is true, (i.e. only if city is either Halifax or San Antonio) will SAS hit the RUN statement.

Implicit in the RUN statement are 2 actions, IMPLICIT output, and IMPLICIT return.

data halifaxSanAnt;
set weather;
if city in ('Halifax', 'San Antonio');
flag='SAS user group conferences 3-8 november 2016';
run;
 
NOTE: There were 8 observations read from the data set WEATHER.
NOTE: The data set WORK.HALIFAXSANANT has 6 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

There you go, the big differentiator between the IF..THEN..ELSE and Subsetting IF statements.

What do you think? Will it rain or will it shine?

Hope you enjoyed this blog post. I can’t wait to share more SAS tips at the two user groups in Halifax and San Antonio. If you are in the neighbourhood, stop by. And wish me fair weather.

Share

About Author

Charu Shankar

Technical Training Specialist

Charu Shankar has been a Technical Training Specialist with SAS since 2007. She started as a programmer, and has taught computer languages, business and English Language skills. At SAS, Charu teaches the SAS language, SQL, SAS Enterprise guide and Business Intelligence. She interviews clients to recommend the right SAS training to help them meet their needs. She is helping build a center for special needs kids in this project. http://www.handicareintl.org/pankaja/pankaja.swf

Related Posts

11 Comments

  1. Pingback: Debugging the difference between WHERE and IF in SAS - The SAS Dummy

  2. I got your script, it is really well and good , thus how can i get good manageable notes to know about those things and where i can get those links to work out on the programs.

    • thanks Shalini, glad you found the script useful. The notes are printed in the log by default at the end of every step processed.. were you referring to the log notes by any chance? or was it something else that you are asking about, let me know. thanks

  3. Anders Sköllermo on

    Hi! I think that you should add:
    if city in ('Halifax', 'San Antonio');
    is interpreted by the SAS System as:
    if city NOT in ('Halifax', 'San Antonio') then DELETE;

    • thanks Anders, for pointing out that implicit delete in a NOT in statement. I teach this in class, but left it out here as sometimes folks can interpret that as SAS actually deletes rows and we know that's not what happens in a then DELETE. the record simply doesn't get written out.

  4. Kudos to Michelle for pointing out the important distinction between IF and WHERE, because although some users might think they are functionally equivalent, using the wrong one can either hinder performance or even lead to errors sometimes.

    As for the IF statement, there is an implied "THEN..." portion that is still being executed. When you filter by submitting:
    if city in ('Halifax', 'San Antonio');

    that's really the same as:
    if city in ('Halifax', 'San Antonio') then output halifaxSanAnt;

    Although adding an explicit OUTPUT isn't always necessary, it can be helpful if you want to split a single data set into two or more output data sets based on a condition while performing a single pass through the base data. Good luck on the weather!

  5. Wishing you fair weather Charu! Nice example on the differences on the if statements.

    I think it is also important for readers to understand the differences between an if and where statement and to know you can use both too. As we are processing larger volumes of data, it is helpful to use a where statement to process only the data you need. Even though an if and where statement may produce the same end results, depending on what you're doing, the amount of data read and processing time can be dramatically reduced. Think "Read what you need"! See this blog post for details http://blogs.sas.com/content/sastraining/2013/08/06/read-what-you-need/

    • Hi. "Read what you need" seems to make sense. However some experimenting with subsetting (choosing one half) a large (50 million observations, 11 numeric variables) data set in different ways shows that a subsetting IF compares favorably with three varieties of WHERE ...

      SET BIG; IF Z LE .5;
      real time 35.94 seconds
      cpu time 5.75 seconds

      SET BIG: WHERE Z LE .5;
      real time 37.46 seconds
      cpu time 6.02 seconds

      SET BIG (WHERE=(Z LE .5));
      real time 35.80 seconds
      cpu time 5.77 seconds

      PROC SQL;
      CREATE TABLE WHERE3 AS
      SELECT * FROM BIG WHERE Z <= .5;
      QUIT;
      real time 37.39 seconds
      cpu time 7.58 seconds

Back to Top