Debugging the difference between WHERE and IF in SAS

16

In the DATA step, the WHERE statement and the IF statement (a.k.a. the "subsetting IF") have similar functions. In many scenarios, they produce identical results. But new SAS programmers are taught early on that these two statements work very differently, and in important ways. To understand the differences, it helps to step through the program line-by-line to see how SAS "thinks." Fortunately, the new DATA step debugger in SAS Enterprise Guide 7.13 makes this really easy to do.

Difference between WHERE statement and IF statement

Here are the basics: the WHERE statement rules are determined when the DATA step is compiled. As the DATA step runs, incoming data (from a SET or MERGE statement) is filtered to just those records that match the WHERE condition, so only those records are ever loaded into the program data vector (PDV). This results in fewer iterations through DATA step code, but provides no opportunity for "dynamic" decisions about which records to examine.

In contrast, the IF statement is evaluated at run time, and operates on the variables after they are already in the PDV. When the IF condition is met, the current observation is kept for eventual output. Unlike the WHERE statement, the IF statement can examine values of new variables that are defined within the step.

Consider these two DATA steps. They produce identical output of 10 records, but the first one processes only those 10 records whereas the second step processes all 19 records from the input.

data results1;
  set sashelp.class;
  /* WHERE applied at compile time  */
  /* Processes ONLY matching obs    */
  where sex='M';
run;
 
data results2;
  set sashelp.class;
  /* IF evaluated at run time  */
  /* Processes EVERY obs       */
  if sex='M';
run;

Using the DATA step debugger to understand the DATA step

The new DATA step debugger in SAS Enterprise Guide makes it very easy to illustrate how WHERE is processed differently from IF. I loaded each of the above programs into my session, then clicked the new "bug" toolbar icon to activate the debugger. Once activated, you can click the bug in the left "gutter" of the program editor to begin a debug session. (You can also press F5 to debug the active DATA step.)
Starting the Debugger
Watch this first animation of a debugger session and see what you notice about the WHERE statement.

Debugger with WHERE
Watching this little movie, I see a few things that reveal some insights.

  • The statement pointer never lands on Line 5 (the WHERE statement). That's because the WHERE statement isn't processed at run time.
  • Even though the CLASS data contains 19 records, the value of the _N_ automatic variable reaches only 11, indicating that only 10 records were processed.
  • The variable watch window uses red to indicate when a variable changes between iterations. The Sex variable never changes from 'M', and thus stays colored black through the entire session.

Let's compare that to the IF statement. Study this animation and see what stands out to you.

Debugger with IF
Here's what I see:

  • The statement pointer begins at Line 2, then 5, and moves to Line 6 (the RUN statement) only when the record has made it past the IF condition and into the output. For each observation where Sex='F', the DATA step stops processing the record and the RUN statement is skipped.
  • In this program, _N_ reaches 20 -- that's because all 19 records in SASHELP.CLASS are processed and the step exits at the end-of-file condition.

Learning more about subsetting IF, IF-THEN, WHERE, and debugging

There are several good articles about how the IF statement works, on its own and in combination with IF-THEN-ELSE constructs. Here's a recent article by SAS trainer Charu Shankar. And here's another reference that's included in a piece about the Top 10 SAS coding efficiencies.

The new DATA step debugger in SAS Enterprise Guide opens a new world of understanding for beginner and veteran SAS programmers. It has all of the functions of the "classic" debugger available in the Base SAS windowing environment, but with a much friendlier user interface, keyboard shortcuts, and useful watch windows. In my next post, I've covered the debugging functions in more detail.

Share

About Author

Chris Hemedinger

Senior Manager, SAS Online Communities

+Chris Hemedinger is the manager of SAS Online Communities. He's also co-author of the popular SAS for Dummies book, author of Custom Tasks for SAS Enterprise Guide using Microsoft .NET, and a frequent participant on the SAS Enterprise Guide discussion forum.

Related Posts

16 Comments

  1. Pingback: Tracking progress in your program with SAS Enterprise Guide: another trick - The SAS Dummy

  2. As the kids say, OMG! DATA step debugger appears in EG, I never expected it. Very exciting. Look forward to your post on this subject. I think the DATA step debugger has been among the most under-used tools in SAS (including by me) happy to see it reborn, strengthened, and given heightened visibility in EG.

  3. Another nice enhancement for SAS EG ! Thanks for sharing, Chris.

    The difference about if/where in terms of data step run time can also be observed by enabling DSOPTIONS=PROFILE (hidden) system options (TM : Peter Crawford, UK SAS Wizard).

    options dsoptions=profile;

    Source : https://www.sas.com/content/dam/SAS/en_gb/doc/other1/sas-forum-2016/presentations/peter-crawford-crawford-software-tuesday-27th-september.pdf

    NB : A SAS profile is needed to access the presentation.

  4. Nice to know the debugger is back.

    But I have some problems with the phrase "WHERE applied at compile time". The rules for the filter are set at compile time, but those rules are applied at run time.
    The difference between WHERE and IF is that WHERE filters before the observation enters the data step, and IF filters within the data step.

    The difference between things happening at compile time and at run time is important in understanding how macro code works in relation to the SAS code it produces. IMHO the way 'compile time' is used here, is a bit confusing.

    • Chris Hemedinger
      Chris Hemedinger on

      Frank, I think you caught an earlier draft of my post. Another colleague had the same observation when I initially published, so I changed the wording to be very close to your suggestion: "the WHERE statement rules are determined when the DATA step is compiled".

  5. I think it's also worth to mention the dataset option (where=(...)).
    It can be used on the "set" dataset as well as on the created dataset.

    It can be used in proc sql and a lot of other procs as well. (like all dataset options)

  6. Pingback: Using the DATA step debugger in SAS Enterprise Guide - The SAS Dummy

Leave A Reply

Back to Top