Years ago I saw a line of SAS code that was really puzzling. It was a statement that started with:
if 0 then … ;
What? This was a statement that would always be evaluated as false. Why would anyone write such a statement? Recently, I was discussing with a good friend of mine a macro I wrote where I used an IF statement that was always false.
In some programming languages, the "IF 0" trick ensures that the code that follows would never be executed. But with the DATA step, SAS actually does still evaluate the "zeroed-out" code during the compile phase.
Let's see why this can actually be useful.
The problem I was trying to solve was to assign the number of observations in a data set to a macro variable. First, the code, then the explanation.
if 0 then set Oscar Nobs=Number_of_Obs;
call symputx('Number',Number_of_Obs);
stop;
run;
The SET option Nobs= places the number of observations in data set Oscar in a variable I called Number_of_Obs. This happens at compile time, before any observations are read from data set Oscar. The CALL SYMPUTX routine takes two arguments. The first is the name of a macro variable (Number) and the second is the name of a SAS variable (Number_of_Obs). The call routine assigns the value of the SAS variable to the Macro variable.
Most data sets stop executing when you reach the end-of-file on any data set. Because you are not reading any values from data set Oscar, you need a STOP statement to prevent the Data Step from looping. When this Data Step executes, the number of observations in data set Oscar is assigned to the macro variable called Number.
Now you, too, can write programs using this useful trick. The macro where I used this trick can be found in my book, Cody's Data Cleaning techniques, 3rd edition. You can see a copy of this macro (and all the other macros described in the book) by going to my author web site: support.sas.com/cody, scrolling down to the Data Cleaning book and clicking on the link to see example and data. One of the macros that use this trick is called HighLow.sas. As an alternative, you can buy the book!
By the way, you can download programs and data from any of my books by using this same method and it is free!
Books by Ron Cody See them all!
10 Comments
Sum = 0, Count = 0
while (Pile 1 has more cards)
{
Read the top card X in Pile 1
Sum, Count = addIfNoun(X, Sum, Count)
Move card X to Pile 2
}
AvgT = Sum / Count
CountB = 0
while (Pile 2 has more card)
{
Read the top card X in Pile 2
if (X.LetterCount > AvgT and X.PartOfSpeed == "Noun")
{
CountB = CountB+1
}
Procedure addIf Noun (Y, SumT, CountT)
if(Y.PartOf Speech == "Noun")
{
SumT = SumT + Y.LetterCount
CountT = CountT +1
}
return ([SumT, CountT])
End addIfNoun
I usually use proc sql for that. (Proc sql; select count(*) into: xx from mydata; quit;
But also proc contents works. If you had access on the dataset right before you want to know the nr of records then %sysnobs also provides this info.
If the dataset is very large, this could take some time since it is actually reading each observation (the data portion) of the dataset. The dictionary.tables SQL facility provides the number of observations by reading the descriptor portion of the dataset. There is a myth that it uses PROC FREQ to get the counts.
The only problem with the dictionary.tables or sashelp.vtable is that PROC SQL has to "discover" the datasets in each defined libname. However, their usefulness generally is worth it.
Data abc;
Attrib make length=$1. Format=$1.;
Set sashelp.class.;
Stop;
Run;
Lost ‘;’ - please replace
The 9.1 macro reference book contains a macro to find the number of obs and the number of variables. In fact, the attrn function can return a number of other attributes of a dataset. I found that it was not necessary to create a macro variable and that the macro call would act as a macro variable.
The nice thing about having this as a macro is that it can be used anywhere: data step, title statement, SQL query, etc. without problems with step boundary conditions.
%macro getnobs(dsname=);
%*---------------------------------------------------------------------------;
%* ;
%* Name: getnobs.sas ;
%* Author: Robert Drozd (RED) ;
%* Purpose: Return the number of obs in a dataset as either a number when ;
%* action=code otherwise as a NOTE:. ;
%* This method is the preferred method to find the number of obs ;
%* in a dataset since it can be used anywhere and does not run ;
%* or compile a proc or data step. ;
%*---------------------------------------------------------------------------;
%let dsnum=%sysfunc(open(&dsname));
%sysfunc(attrn(&dsnum,nlobs))
%let dsnum=%sysfunc(close(&dsnum));
%mend getnobs;
I placed a macro like this in our autocall macro library and it has served us well. The call to the macro say, %getnobs(dsname=sashelp.shoes) simply resolves to a string of 395. I try to use global macro vars only when necessary so I especially like this one.
Doesn't look like this technique spread very far. I wonder why.
The tip on this email is part of my interview question on what are the different ways to get number of observations in a data set. If somebody uses this method I would consider them an advanced user as they have a deeper understanding of SAS datastep.
Thanks! Very interesting trick that I have never seen. It's amazing how you can have been a SAS programmer for so many years and still learn something new!
Data abc;
Attrib make length=$1. Format=$1.;
Set sashelp.class.
Stop!
Run
We use this technique to make sure our data match the db2. Create an empty template by retrieving definition from db2, append sas data, insert to database 🙂
Data abc;
Attrib make length=$1. Format=$1.;
Set sashelp.class.;
Stop;
Run;
Lost ‘;’ - please replace
Hi Ron,
This is one of my favorite tricks in SAS. However, not only it can be used for grabbing a number of observations in a data set, it can also be used for injecting a whole data structure into a SAS data step without disturbing the data step flow. Try running the following code:
I call it applying a SAS data template. For more details, see my blog post Simplify data preparation using SAS data templates.