My river walk last week turned into a spectacular fall show. But if it rains this week in San Antonio, like the weatherman predicts, what will I do?
In the coming days, I’ll be presenting at two user groups, one in eastern Canada in Halifax, and the other all the way south west in San Antonio, Texas. I’m really praying for great weather. I’ll carry an umbrella if I have to, and wear a scarf if I have to. But I don’t’ really want to, at least not in San Antonio in Texas.
Regardless of what the weather turns out to be, at least I can use the weather forecast to provide the data I need to explain IF and IF-THEN-ELSE statements to my Programming 2 students. This was their question:
Tell me the difference between the Subsetting IF and the IF—THEN—ELSE—IF statement?
Yes, at first glance they do appear similar.
However the Subsetting IF is a filter. If you are familiar with the WHERE clause, its role is similar to the WHERE, it filters or subsets rows based on a condition. It’s a bit different from the WHERE because it has access to even new variables that you are constructing in the data step.
The IF—THEN—ELSE statement is familiar to most programmers, regardless of their background. It performs an action based on a condition, but its role is not to subset data.
To really see the difference here’s some code I wrote up:
1 - First, I construct my weather dataset by pulling in data from a Google search
data weather; infile datalines dlm=','; input city : $12. day $ date: date9. high low weather :$9.; datalines; Toronto,Thu,3-Nov-16,12,3,rain Halifax,Thu,3-Nov-16,11,7,cloudy Halifax,Fri,4-Nov-16,10,4,rain San Antonio,Sat,5-Nov-16,25,14,sun cloud San Antonio,Sun,6-Nov-16,23,16,rain San Antonio,Mon,7-Nov-16,24,18,rain San Antonio,Tue,8-Nov-16,24,17,rain Toronto,Tue,8-Nov-16,13,7,rain ; run; NOTE: The data set WORK.WEATHER has 8 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 |
2 - Next, I'll do some conditional processing. Notice my log note, how all rows are written to the output dataset:
data what; set weather; if city ='Halifax' and weather='rain' then whattotake="Sigh! I knew it, I'll take an umbrella"; else if city ='San Antonio' and weather='rain' then whattotake="No way! ok I'll take an umbrella"; flag='SAS user group conferences 3-8 november 2016'; run; NOTE: There were 8 observations read from the data set WEATHER. NOTE: The data set WORK.WHAT has 8 observations and 8 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds |
3 - Finally, I’ll use a Subsetting If.
If the condition is true, only then the next action takes place, (i.e. flag is set to 'SAS etc.’), same as in the conditional processing scenario.
But the key driving difference is this. Only if the subsetting IF condition is true, (i.e. only if city is either Halifax or San Antonio) will SAS hit the RUN statement.
Implicit in the RUN statement are 2 actions, IMPLICIT output, and IMPLICIT return.
data halifaxSanAnt; set weather; if city in ('Halifax', 'San Antonio'); flag='SAS user group conferences 3-8 november 2016'; run; NOTE: There were 8 observations read from the data set WEATHER. NOTE: The data set WORK.HALIFAXSANANT has 6 observations and 7 variables. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds |
There you go, the big differentiator between the IF..THEN..ELSE and Subsetting IF statements.
What do you think? Will it rain or will it shine?
Hope you enjoyed this blog post. I can’t wait to share more SAS tips at the two user groups in Halifax and San Antonio. If you are in the neighbourhood, stop by. And wish me fair weather.
11 Comments
Pingback: Debugging the difference between WHERE and IF in SAS - The SAS Dummy
I got your script, it is really well and good , thus how can i get good manageable notes to know about those things and where i can get those links to work out on the programs.
thanks Shalini, glad you found the script useful. The notes are printed in the log by default at the end of every step processed.. were you referring to the log notes by any chance? or was it something else that you are asking about, let me know. thanks
Hi! I think that you should add:
if city in ('Halifax', 'San Antonio');
is interpreted by the SAS System as:
if city NOT in ('Halifax', 'San Antonio') then DELETE;
thanks Anders, for pointing out that implicit delete in a NOT in statement. I teach this in class, but left it out here as sometimes folks can interpret that as SAS actually deletes rows and we know that's not what happens in a then DELETE. the record simply doesn't get written out.
Kudos to Michelle for pointing out the important distinction between IF and WHERE, because although some users might think they are functionally equivalent, using the wrong one can either hinder performance or even lead to errors sometimes.
As for the IF statement, there is an implied "THEN..." portion that is still being executed. When you filter by submitting:
if city in ('Halifax', 'San Antonio');
that's really the same as:
if city in ('Halifax', 'San Antonio') then output halifaxSanAnt;
Although adding an explicit OUTPUT isn't always necessary, it can be helpful if you want to split a single data set into two or more output data sets based on a condition while performing a single pass through the base data. Good luck on the weather!
Thanks Mike and good points about the implicit output.
thanks Mike for the reminder about the option of the explicit OUTPUT statement. big benefit of the data step in its ability to write to multiple output data sets in one read of the input data set..
Wishing you fair weather Charu! Nice example on the differences on the if statements.
I think it is also important for readers to understand the differences between an if and where statement and to know you can use both too. As we are processing larger volumes of data, it is helpful to use a where statement to process only the data you need. Even though an if and where statement may produce the same end results, depending on what you're doing, the amount of data read and processing time can be dramatically reduced. Think "Read what you need"! See this blog post for details http://blogs.sas.com/content/sastraining/2013/08/06/read-what-you-need/
Hi. "Read what you need" seems to make sense. However some experimenting with subsetting (choosing one half) a large (50 million observations, 11 numeric variables) data set in different ways shows that a subsetting IF compares favorably with three varieties of WHERE ...
SET BIG; IF Z LE .5;
real time 35.94 seconds
cpu time 5.75 seconds
SET BIG: WHERE Z LE .5;
real time 37.46 seconds
cpu time 6.02 seconds
SET BIG (WHERE=(Z LE .5));
real time 35.80 seconds
cpu time 5.77 seconds
PROC SQL;
CREATE TABLE WHERE3 AS
SELECT * FROM BIG WHERE Z <= .5;
QUIT;
real time 37.39 seconds
cpu time 7.58 seconds
thanks Michelle! weather turned out to be great overall. Yes, the WHERE clause is so powerful isn't it. thanks for your link. I had also compared the performance between the WHERE and the subsetting IF here...http://blogs.sas.com/content/sastraining/2011/02/23/top-5-sas-programming-best-practices/