Multi-purpose macro function for getting information about data sets

20

SAS toolbox: macro functions
Did you know you could have a single universal function that can replace all the functions in the world? All those sin(x), log(x), … whatever(x) can all be replaced by a single super function f(x). Don’t believe me? Just make those functions names – "sin", "log", … "whatever" to be another argument to that all-purpose function f, just like that: f(x, sin), f(x, log), … f(x, whatever). Now, we only must deal with a single function instead of many, and its second argument will define what transformation needs to be done with the first argument in order to arrive at this almighty function’s value.

How many functions there are in SAS

Last time I counted there were more than 600 SAS functions, and that is excluding call routines and macro functions. But even that huge number grossly under-represents the actual number of functions available in SAS. That is because there are some functions that are built like the universal multi-purpose super function described above. For example, look at the following functions:

finance() function represents several dozen various financial functions;

finfo() function represents multiple functions returning various information items about files (file size, date created, date modified, access permission, etc.);

dinfo() function returns similar information items about directories;

attrn() function returns numeric attributes of a data set (number of observations, number of variables, etc.)

attrc() function returns character attributes of a data set (engine name, encoding name, character set, etc.)

Each of these functions represents not a single function, but a group of functions, and one of their arguments stipulates specific functionality (an information item or an attribute) that is being requested. You can think of this argument as a function modifier.

%sysfunc SAS macro function

%sysfunc() is a super macro function that brings a wealth of SAS functions into SAS macro language. With very few exceptions, most SAS functions are available in SAS macro language thanks to the %sysfunc().

Moreover, we can build our own user-defined macro functions using SAS-supplied macro functions (such as %eval, %length, %quote, %scan, etc.), as well as hundreds of the SAS non-macro functions wrapped into the %sysfunc() super macro function.

Building a super macro function to retrieve information about data sets

Armed with such a powerful arsenal, let’s build a multi-purpose macro function that taps into the data tables’ metadata and extracts various information items about those tables.

Let’s make this macro function return any of the following most frequently used values:

  • Number of observations
  • Number of variables
  • Variables list (positional, separated by spaces)
  • Variables list (positional, separated by commas)

Obviously, we can create much more of these information items and attributes, but here I am just showing how to do this so that you can create your own list depending on your needs.

In my earlier blog post, How to create and use SAS macro functions, we had already built a macro function for getting the number of observations; let’s expand on that.

Here is the SAS Macro code that handles extraction of all four specified metadata items:

%macro dsinfo(dset,info);
/* dset - data set name                             */
/* info - modifier (NOBS, NVARS, VARLIST, VARLISTC) */      
   %local dsid result infocaps i;
   %let infocaps = %upcase(&info);
   %let dsid = %sysfunc(open(&dset));
   %if &dsid %then
   %do;
      %if &infocaps=NOBS %then %let result = %sysfunc(attrn(&dsid,nlobs));
      %else %if &infocaps=NVARS %then %let result = %sysfunc(attrn(&dsid,nvars));
      %else %if &infocaps=VARLIST %then
      %do i=1 %to %sysfunc(attrn(&dsid,nvars));
         %let result = &result %sysfunc(varname(&dsid,&i));
      %end;
      %else %if &infocaps=VARLISTC %then
      %do i=1 %to %sysfunc(attrn(&dsid,nvars));
         %if &i eq 1 %then %let result = %sysfunc(varname(&dsid,&i));
         %else %let result = &result,%sysfunc(varname(&dsid,&i));
      %end;
      %let dsid = %sysfunc(close(&dsid));
   %end;
   %else %put %sysfunc(sysmsg());
   &result
%mend dsinfo;

The SAS log will show:

%put NOBS=***%dsinfo(SASHELP.CARS,NOBS)***;
NOBS=***428***
%put NVARS=***%dsinfo(SASHELP.CARS,NVARS)***;
NVARS=***15***
%put VARLIST=***%dsinfo(SASHELP.CARS,VARLIST)***;
VARLIST=***Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length***
%put VARLISTC=***%dsinfo(SASHELP.CARS,VARLISTC)***;
VARLISTC=***Make,Model,Type,Origin,DriveTrain,MSRP,Invoice,EngineSize,Cylinders,Horsepower,MPG_City,MPG_Highway,Weight,Wheelbase,Length***

Macro function code highlights

We used the following statement to make our macro function case-insensitive regarding the info argument:

%let infocaps = %upcase(&info);

Then depending on the up-cased second argument of our macro function (modifier) we used the attrn(), varnum() and varname() functions within %sysfunc() to retrieve and construct our result macro variable.

We stick that result macro variable value, &result, right before the %mend statement so that the value is returned to the calling environment.

While info=VARLIST (space-separated variable list) is useful in DATA steps, info=VARLISTC (comma-separated variable list) is useful in PROC SQL.

Usage example

Having this %dsinfo macro function at hands, we can use it in multiple programming scenarios. For example:

/* ending SAS session if no observations to process */
%if %dsinfo(SASHELP.CARS,NOBS)=0 %then %do; endsas; %end;
 
/* further processing */
data MYNEWDATA (keep=%dsinfo(SASHELP.CARS,VARLIST));
   retain %dsinfo(SASHELP.CARS,VARLIST);
   set SASHELP.CARS;
   if _n_=1 then put %dsinfo(SASHELP.CARS,VARLIST);
   /* ... */
run;

Here we first check if there is at least one observation in a data set. If not (0 observations) then we stop the SAS session and don’t do any further processing. Otherwise, when there are some observations to process, we continue.

If SAS code needs multiple calls to the same macro function with the same argument, we can shorten the code by first assigning that macro function’s result to a macro variable and then reference that macro variable instead of repeating macro function invocation. Here is an example:

/* further processing */
%let vlist = %dsinfo(SASHELP.CARS,VARLIST);
data MYNEWDATA (keep=&vlist);
   retain &vlist;
   set SASHELP.CARS;
   if _n_=1 then put &vlist;
   /* ... */
run;

Additional resources

Your thoughts?

Do you see the benefits of these multi-purpose SAS macro functions? Can you suggest other scenarios of their usage? Please share your thoughts in the comments section below.

Share

About Author

Leonid Batkhan

Leonid Batkhan is a long-time SAS consultant and blogger. Currently, he is a Lead Applications Developer at F.N.B. Corporation. He holds a Ph.D. in Computer Science and Automatic Control Systems and has been a SAS user for more than 25 years. From 1995 to 2021 he worked as a Data Management and Business Intelligence consultant at SAS Institute. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries. All posts by Leonid Batkhan >>>

20 Comments

  1. Hi Leonid, Can I put the macro dsinfo into my toolbox and use as a start and expand or modify your code? Really great that macro. 🙂 Best Regards from Germany Jens

    • Leonid Batkhan

      Thank you, Jens, for asking this question. It's very classy of you. Yes, you (and everybody else) may freely use, expand and modify this code. Please feel free to add to it as well as build other multi-purpose macro functions. I also invite you to share your addition ideas and code implementations here - I am sure our readers will appreciate it.

      • Hi Leonid,

        I modified it slightly in that way that everybody gets 4 macro vars which can be used afterwards with only one call %GetDSinfo(SASHELP.CARS);

        Really nice 🙂

        %macro GetDSinfo(dset);
        /* dset - data set name                             */
           %local dsid result infocaps i;
           %let infocaps = ALL;
           %let dsid = %sysfunc(open(&dset));
           %global _NOBS _NVARS _VARLIST _VARLISTC;
           
        %if &dsid %then
           %do;
              %if &infocaps=ALL %then %let _NOBS  = %sysfunc(attrn(&dsid,nlobs));
              %if &infocaps=ALL %then %let _NVARS = %sysfunc(attrn(&dsid,nvars));
              %if &infocaps=ALL %then
              %do i=1 %to %sysfunc(attrn(&dsid,nvars));
                 %let result = &result %sysfunc(varname(&dsid,&i));
                 %let _VARLIST=&result;
              %end;
              %if &infocaps=ALL %then
              %do i=1 %to %sysfunc(attrn(&dsid,nvars));
                 %if &i eq 1 %then %let result = %sysfunc(varname(&dsid,&i));
                 %else %let result = &result,%sysfunc(varname(&dsid,&i));
                 %let _VARLISTC=&result;
              %end;
              %let dsid = %sysfunc(close(&dsid));
           %end;
           %else %put %sysfunc(sysmsg());
          
        %mend GetDSinfo;
        
        %GetDSinfo(SASHELP.CARS);
        
        %put _NOBS:  &_NOBS;
        %put _NVARS: &_NVARS;
        %put _VARLIST:  &_VARLIST;
        %put _VARLISTC: &_VARLISTC;
        

        • Leonid Batkhan

          Hi Jens,
          Thank you for sharing your implementation. Your main point is to call this macro once and get everything it can provide at once. It makes sense in some scenarios, but there are some caveats:

             1. With this implementation this code stops being a macro function as it does not return a value. It is a macro, but not a macro function. I addressed that in my previous blog post How to create and use SAS macro functions, and my goal for this post was to show how to create multi-purpose macro functions.

             2. I strongly advise not to define hard-coded global macro variable names within a SAS macro as this is prone to wreaking havoc in your SAS program invoking such a macro in case that program uses the same macro variable names for other purposes. Such a practice contradicts the principle of foolproof software development (see Are you developing foolproof solutions?) For your implementation, instead of hard-coding your global macro variable names inside the macro, I would make them parameters to allow macro users to assign their own names. Also, by making them named parameters (instead of positional) you would effectively make them optional parameters, so you can only request calculating those that you need. Below is how I would write this code:

          %macro GetDSinfo(dset=, nobs=, nvars=, varlist=, varlistc=);
          /* dset - data set name                                           */
          /* nobs, nvars, varlist, varlistc - optional macro variable names */
             %local dsid result i;
             %let dsid = %sysfunc(open(&dset));
             %global &nobs &nvars &varlist &varlistc;
             
             %if &dsid %then
             %do;
                %if &nobs ne %then %let &nobs = %sysfunc(attrn(&dsid,nlobs));
                %if &nvars ne %then %let &nvars = %sysfunc(attrn(&dsid,nvars));
                %if &varlist ne %then
                %do i=1 %to %sysfunc(attrn(&dsid,nvars));
                   %let result = &result %sysfunc(varname(&dsid,&i));
                   %let &varlist = &result;
                %end;
                %if &varlistc ne %then
                %do i=1 %to %sysfunc(attrn(&dsid,nvars));
                   %if &i eq 1 %then %let result = %sysfunc(varname(&dsid,&i));
                   %else %let result = &result,%sysfunc(varname(&dsid,&i));
                   %let &varlistc = &result;
                %end;
                %let dsid = %sysfunc(close(&dsid));
             %end;
             %else %put %sysfunc(sysmsg());
            
          %mend GetDSinfo;
          
          %GetDSinfo(dset=SASHELP.CARS, nobs=_NOBS1, varlist=_VARLIST1); 
          %put &=_NOBS1;
          %put &=_VARLIST1;
          
          %GetDSinfo(dset=SASHELP.CARS, nobs=_NOBS2, nvars=_NVARS2, varlist=_VARLIST2, varlistc=_VARLISTC2); 
          %put &=_NOBS2;
          %put &=_NVARS2;
          %put &=_VARLIST2;
          %put &=_VARLISTC2;
          

  2. Bartosz Jabłoński on

    Yesterday, during my lecture, I was showing almost exactly the same macro to my students 🙂

    • Leonid Batkhan

      This is quite a remarkable coincidence as I also published this blog post yesterday. It just confirms that this topic is trendy and timely. Great minds think alike! 🙂

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top