A DATA step compiler trick to get the record count

9

Years ago I saw a line of SAS code that was really puzzling. It was a statement that started with:

if 0 then … ;

What? This was a statement that would always be evaluated as false. Why would anyone write such a statement? Recently, I was discussing with a good friend of mine a macro I wrote where I used an IF statement that was always false.

DATA Step compiler shortcut
In some programming languages, the "IF 0" trick ensures that the code that follows would never be executed.  But with the DATA step, SAS actually does still evaluate the "zeroed-out" code during the compile phase.

Let's see why this can actually be useful.

The problem I was trying to solve was to assign the number of observations in a data set to a macro variable. First, the code, then the explanation.

if 0 then set Oscar Nobs=Number_of_Obs; call symputx('Number',Number_of_Obs); stop; run;

The SET option Nobs= places the number of observations in data set Oscar in a variable I called Number_of_Obs. This happens at compile time, before any observations are read from data set Oscar. The CALL SYMPUTX routine takes two arguments. The first is the name of a macro variable (Number) and the second is the name of a SAS variable (Number_of_Obs). The call routine assigns the value of the SAS variable to the Macro variable.

Titles by Ron CodyMost data sets stop executing when you reach the end-of-file on any data set. Because you are not reading any values from data set Oscar, you need a STOP statement to prevent the Data Step from looping. When this Data Step executes, the number of observations in data set Oscar is assigned to the macro variable called Number.

Now you, too, can write programs using this useful trick. The macro where I used this trick can be found in my book, Cody's Data Cleaning techniques, 3rd edition. You can see a copy of this macro (and all the other macros described in the book) by going to my author web site: support.sas.com/cody, scrolling down to the Data Cleaning book and clicking on the link to see example and data. One of the macros that use this trick is called HighLow.sas. As an alternative, you can buy the book!

By the way, you can download programs and data from any of my books by using this same method and it is free!

Books by Ron Cody See them all!
Share

About Author

Ron Cody

Private Consultant

Ron Cody, EdD is a retired professor from the Robert Wood Johnson Medical School. He now works as a private consultant and a national instructor for SAS Institute Inc. A SAS user since 1977, Ron's extensive knowledge and innovative style have made him a popular presenter at local, regional, and national SAS conferences. He has authored or co-authored numerous books, as well as countless articles in medical and scientific journals.

9 Comments

  1. Leonid Batkhan

    Hi Ron,
    This is one of my favorite tricks in SAS. However, not only it can be used for grabbing a number of observations in a data set, it can also be used for injecting a whole data structure into a SAS data step without disturbing the data step flow. Try running the following code:

    data ABC;
      make='Toyota';
    run;
     
    data MYDATA;
       if 0 then set SASHELP.CARS;
       set ABC;
    run;
    

    I call it applying a SAS data template. For more details, see my blog post Simplify data preparation using SAS data templates.

  2. Thanks! Very interesting trick that I have never seen. It's amazing how you can have been a SAS programmer for so many years and still learn something new!

    • Data abc;
      Attrib make length=$1. Format=$1.;
      Set sashelp.class.
      Stop!
      Run

      We use this technique to make sure our data match the db2. Create an empty template by retrieving definition from db2, append sas data, insert to database 🙂

  3. Subhash mantha on

    The tip on this email is part of my interview question on what are the different ways to get number of observations in a data set. If somebody uses this method I would consider them an advanced user as they have a deeper understanding of SAS datastep.

  4. The 9.1 macro reference book contains a macro to find the number of obs and the number of variables. In fact, the attrn function can return a number of other attributes of a dataset. I found that it was not necessary to create a macro variable and that the macro call would act as a macro variable.

    The nice thing about having this as a macro is that it can be used anywhere: data step, title statement, SQL query, etc. without problems with step boundary conditions.

    %macro getnobs(dsname=);
    %*---------------------------------------------------------------------------;
    %* ;
    %* Name: getnobs.sas ;
    %* Author: Robert Drozd (RED) ;
    %* Purpose: Return the number of obs in a dataset as either a number when ;
    %* action=code otherwise as a NOTE:. ;
    %* This method is the preferred method to find the number of obs ;
    %* in a dataset since it can be used anywhere and does not run ;
    %* or compile a proc or data step. ;
    %*---------------------------------------------------------------------------;
    %let dsnum=%sysfunc(open(&dsname));
    %sysfunc(attrn(&dsnum,nlobs))
    %let dsnum=%sysfunc(close(&dsnum));
    %mend getnobs;

    I placed a macro like this in our autocall macro library and it has served us well. The call to the macro say, %getnobs(dsname=sashelp.shoes) simply resolves to a string of 395. I try to use global macro vars only when necessary so I especially like this one.

    Doesn't look like this technique spread very far. I wonder why.

  5. I usually use proc sql for that. (Proc sql; select count(*) into: xx from mydata; quit;
    But also proc contents works. If you had access on the dataset right before you want to know the nr of records then %sysnobs also provides this info.

    • Robert Drozd on

      If the dataset is very large, this could take some time since it is actually reading each observation (the data portion) of the dataset. The dictionary.tables SQL facility provides the number of observations by reading the descriptor portion of the dataset. There is a myth that it uses PROC FREQ to get the counts.

      The only problem with the dictionary.tables or sashelp.vtable is that PROC SQL has to "discover" the datasets in each defined libname. However, their usefulness generally is worth it.

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top