Data-driven SAS macro loops

16

In my previous post, Introducing data-driven loops, I suggested a way of implementing programming loops with a list of index variables pulled from an external data table. These ordinary programming loops iterate during code execution while processing some data elements of an input data table.

Handpicked Related Content: CALL EXECUTE made easy for SAS data-driven programming

SAS macro loops, on the other hand, are completely different creatures as they do not iterate during execution time, but rather during code compilation time. That makes SAS macro loops a powerful code generator tool allowing to produce multiple variations of SAS code snippets with actually writing them just once.

Syntactically speaking, while SAS programming loops or do-loops always reside within SAS data step, SAS macro loops or %do-loops are located within SAS macros. They can be either within a data step (or proc step) generating multiple data/proc step statements, or outside data/proc step generating multiple data/proc steps or global statements.

Implementing SAS macro loops

To make macro loop driven by data we can use two index macro variables: the first one (primary index) iterates from 1 to n incrementing by 1 effectively going through the observations of a driver table, the other macro variable (secondary index) gets its values from the driver variable and is being a true data-driven index for our macro loop.  The following figure illustrates this concept.

SAS macro data-driven loop concept

SAS macro loops containing data or proc steps

For example, we have data table sashelp.prdsale that looks like this:

data-driven loops source table

Suppose, we need to produce in separate HTML files for each country - charts showing actual product sales by years.

Here is how this can be done the data-driven way without any hard-coding:

/* output files location */
filename odsout "C:\PROJECTS\_BLOG_SAS\data-driven-macro-loops\html";

/* get unique countries */
proc sort data=sashelp.prdsale(keep=COUNTRY) out=work.countries nodupkey;
  by COUNTRY;
run;

%macro loop;

  %local num i cntry;

  /* get number of countries */
  %let dsid = %sysfunc(open(work.countries));
  %let num  = %sysfunc(attrn(&dsid,nlobs));
  %let rc   = %sysfunc(close(&dsid));

  %do i=1 %to #

    data _null_;
      p = &i;
      set work.countries point=p;
      call symputx('cntry',COUNTRY);
      stop;
    run;

    ods html path=odsout file="report_&cntry..html"  style=styles.seaside;
    goptions reset=all device=actximg colors=() htext=9pt hsize=5in vsize=3in;
    title1 "Product sales by year for &cntry";
    axis1 minor=none label=('Actual Sales');

    proc gchart data=sashelp.prdsale(where=(COUNTRY eq "&cntry"));
      vbar YEAR /
      sumvar = ACTUAL
      width = 10
      outside = sum
      raxis = axis1
      cframe = white nozero discrete
      ;
      format ACTUAL dollar12.0;
    run;
    quit;

    ods html close;

  %end;

%mend loop;
%loop;

The highlights of this code are:

  1. Using proc sort with nodupkey option we create a table work.countries of unique COUNTRY values. This can be done using proc sql as well.
  2. We determine the number of unique COUNTRY values, &num.
  3. Within macro called %loop, we use primary index – macro variable &i – to iterate from 1 to &num with increment 1.
  4. We use data _null_ step within that loop to sequentially read values of COUNTRY using direct access to observations of work.countries table by means of point= option. For each iteration &i of %do-loop, we create a secondary index – macro variable &cntry, which is used as a true index for our loop.
  5. During the code compilation, SAS macro processor loops through the %do-loop &i times repeatedly generating SAS code within it, each time with a new value &cntry, thus accomplishing our task.

This implementation of the macro %do-loop works perfectly fine, except in the situations when we need to use it within a data/proc step. The problem is the data _null_ statement that converts primary index &i to the secondary index &cntry, since we can’t use a data step within another data step.

SAS macro loop within data or proc step

Let’s solve the following coding problem. Suppose we have to create SALE_CATEGORY variable on our sashelp.prdsale table, something that you would routinely code like this:

data work.prdsale;
  set sashelp.prdsale;
  if ACTUAL < 50 then SALE_CATEGORY = 'A'; else
  if ACTUAL < 200 then SALE_CATEGORY = 'B'; else
  if ACTUAL < 500 then SALE_CATEGORY = 'C'; else
  if ACTUAL < 700 then SALE_CATEGORY = 'D'; else
  if ACTUAL < 900 then SALE_CATEGORY = 'E'; else
  if ACTUAL < 2000 then SALE_CATEGORY = 'F';
run;

What is wrong with this code? Nothing. Except when category definition changes you would have to find every place in your code where to apply that change. Besides, if a number of categories is large, the code becomes large too.

Let’s implement this the data-driven way, without any hard-coded values. Notice, that in the code above we have multiple if-then-else statements of a certain pattern that are repeated multiple times and thus they can be generated via %do-loop.

Let’s create the following driver table that contains boundary and sale category definitions that match the above hard-coded data step:

data-driven loops illustration

The data-driven macro loop can be implemented using the following code:

%macro mloop;

  /* get observations number - num, variable numbers - vnum1, vnum2, */
  /* variable type - vtype2, getfunc = getvarC or getvarN            */
  %let dsid = %sysfunc(open(work.salecategory));
  %let num  = %sysfunc(attrn(&dsid,nlobs));
  %let vnum1 = %sysfunc(varnum(&dsid,upboundary));
  %let vnum2 = %sysfunc(varnum(&dsid,salecat));
  %let vtype2 = %sysfunc(vartype(&dsid,&vnum2));
  %let getfunc = getvar&vtype2;

data work.prdsale;
  set sashelp.prdsale;

  %do i=1 %to &num;

    /* get upboundary and salecat values from driver table salecategory */
    /* and assign them to upper and categ macro variables               */
    %let rc = %sysfunc(fetchobs(&dsid,&i));
    %let upper = %sysfunc(getvarn(&dsid,&vnum1));
    %let categ = %sysfunc(&getfunc(&dsid,&vnum2));
    %if &vtype2 eq C %then %let categ = "&categ";

    /* generate if ... then ...; else statements */  
    if ACTUAL < &upper then SALE_CATEGORY = &categ;
    %if (&i ne &num) %then %str(else);

  %end;
  %let rc = %sysfunc(close(&dsid));

run;

%mend mloop;
%mloop;

With a little overhead of several %sysfunc() functions and SAS Component Language (SCL) functions we effectively generate a set of if-then-else statements based on the values in the driver table. Notably, even if the number of categories increases to hundreds the code does not have to be changed a bit.

Of course, this approach can be used for any SAS code generating efforts where there is a repetition of SAS code pattern.

Bonus

As a bonus to those who was patient enough to bear with me to the end, here is a macro equivalent of the data _null_ statement used in the first section - SAS macro loops containing data or proc step:

Data step:

    data _null_;
      p = &i;
      set work.countries point=p;
      call symputx('cntry',COUNTRY);
      stop;
    run;

Macro equivalent:

    %let dsid = %sysfunc(open(work.countries));
    %let vnum = %sysfunc(varnum(&dsid,COUNTRY));
    %let rc = %sysfunc(fetchobs(&dsid,&i));
    %let cntry = %sysfunc(getvarc(&dsid,&vnum));
    %let rc = %sysfunc(close(&dsid));

Thoughts?

Please share your thoughts and comments.

Handpicked Related Content: CALL EXECUTE made easy for SAS data-driven programming
Share

About Author

Leonid Batkhan

Leonid Batkhan is a long-time SAS consultant and blogger. Currently, he is a Lead Applications Developer at F.N.B. Corporation. He holds a Ph.D. in Computer Science and Automatic Control Systems and has been a SAS user for more than 25 years. From 1995 to 2021 he worked as a Data Management and Business Intelligence consultant at SAS Institute. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries. All posts by Leonid Batkhan >>>

16 Comments

  1. Hi Leonid,

    Thank you for your useful and informative post. I use a similar, but different, approach in my own code. I hope you don’t mind if I share that approach.

    I have a number of SAS utility macros at https://github.com/scottbass/SAS/tree/master/Macro, one of which is %loop: https://github.com/scottbass/SAS/blob/master/Macro/loop.sas.

    %loop is a “wrapper” macro, which loops over a delimited list of items, tokenizing that list into individual “words”, and calling a user-defined inner macro for each “word” in the list.

    Extracting the looping and parsing logic into this utility macro has allowed me to “think” differently/more clearly when approaching coding issues as you’ve described in your post. I just have to think about 1) what is the list I want to loop over, and 2) what code do I want to run for each item. For the inner macro, I know to just reference the macro variable &word at some point.

    Because this is “pure macro”, i.e. executes to completion during code compilation, the approach is the same whether called in global code, data step, procs, etc.

    One caveat is that, in *very* rare cases, creating the __LIST__ parameter can overflow the maximum size of a macro variable (IIRC it’s 64K bytes). In that scenario, I would use the “macro array” approach you’ve illustrated (&mvar1, &mvar2, &mvar3, etc, then referencing those variables via &&mvar&i). I may write a %loop2 version using this approach in the future.

    You can simulate a multidimensional array using two (or more) delimiters. A simplistic example:

    proc sql noprint;
    select catx(“|”,name,sex,age) into :list separated by "^" from sashelp.class;
    quit;
    %macro code;
    %let name=%scan(&word,1,|);
    %let sex=%scan(&word,2,|);
    %let age=%scan(&word,3,|);
    %put &=name &=sex &=age;
    %mend;
    %loop(&list,dlm=^);

    One last example: rename each variable in a dataset (real world example, used to load a hash object):

    data work.foo;
    set sashelp.shoes;
    run;
    proc contents data=work.foo out=contents noprint;
    run;
    proc sql noprint;
    select name into :vars separated by " " from contents;
    drop table contents;
    quit;
    %macro code;
    &word=_&word <<< no trailing semi-colon
    %mend;
    proc datasets lib=work nowarn nolist;
    modify foo;
    rename %loop(&vars);
    quit;

    I use my %loop macro all the time in my own code; %loop and %seplist are probably my two most used utility macros.

    I hope some of your readers may find this approach useful.

    Kind Regards,
    Scott

  2. Peter Lancashire on

    Thanks for these examples. For generating code I tend to use PROC SQL to create a pseudo-array of macro variables from the data and then use those in a macro loop. For pure data manipulation I use pure SQL; for example your SALE_CATEGORY example can be done as a join with a table of recodings. You could have used PROC FORMAT there too. I find non-procedural code easier to write and maintain.

    Could your first example be done with a BY statement?

    Please could you explain the advantages and disadvantages of the these various approaches?

    • Leonid Batkhan

      Thank you, Peter, for your comments. Of course, the same problem can be solved in a variety different ways: by using different PROCs, statements, techniques, approaches, languages. In most cases, it is matter of personal background and preferences. As for advantages and disadvantages, it would take a dissertation to analyze them in detail for different scenarios. My shortcut take on this: as long as two different codes produce intended results within given constrains (run time, computing resources, etc.), the more advantageous code is the one that is clearer and easier to understand by your peers.

  3. Gerry Nelson on

    Good stuff Leonid. I have found the scl functions in macro and datastep very useful over the years.

    If you use the "fetch" function to loop through your driver table you can simplify the code. You no longer need the data _null_.

    /* output files location */
    filename odsout "C:\temp";

    /* get unique countries */
    proc sort data=sashelp.prdsale(keep=COUNTRY) out=work.countries nodupkey;
    by COUNTRY;
    run;

    %macro loop;

    %local num i cntry;

    /* get number of countries */
    %let dsid = %sysfunc(open(work.countries));
    %let num = %sysfunc(attrn(&dsid,nlobs));

    %do %while(%sysfunc(fetch(&dsid)) = 0);

    %let cntry=%sysfunc(getvarc(&dsid,%sysfunc(varnum(&dsid,COUNTRY))));

    ods html path=odsout file="report_&cntry..html" style=styles.seaside;
    goptions reset=all device=actximg colors=() htext=9pt hsize=5in vsize=3in;
    title1 "Product sales by year for &cntry";
    axis1 minor=none label=('Actual Sales');

    proc gchart data=sashelp.prdsale(where=(COUNTRY eq "&cntry"));
    vbar YEAR /
    sumvar = ACTUAL
    width = 10
    outside = sum
    raxis = axis1
    cframe = white nozero discrete
    ;
    format ACTUAL dollar12.0;
    run;
    quit;

    %end;

    %let rc = %sysfunc(close(&dsid));

    ods html close;

    %mend loop;
    %loop;

    • Leonid Batkhan

      Thank you, Gerry. Isn't your suggestion to simplify code by replacing data _null_ with SCL functions essentially the same as described in the Bonus section of this blog post?

      • Gerry Nelson on

        Yes similar, but you don't need to build the indexes for the loop, just use the fetch to fetch each row from the table.

        You could also simplify further and remove the getvarc if you use %syscall set(dsid);. Like this:

        /* get number of countries */
        %let dsid = %sysfunc(open(work.countries));

        %syscall set(dsid);

        %do %while(%sysfunc(fetch(&dsid)) = 0);

        %let cntry=&country;

        ods html path=odsout file="report_&cntry..html" style=styles.seaside;
        goptions reset=all device=actximg colors=() htext=9pt hsize=5in vsize=3in;
        title1 "Product sales by year for &cntry";
        axis1 minor=none label=('Actual Sales');

        proc gchart data=sashelp.prdsale(where=(COUNTRY eq "&cntry"));
        vbar YEAR /
        sumvar = ACTUAL
        width = 10
        outside = sum
        raxis = axis1
        cframe = white nozero discrete
        ;
        format ACTUAL dollar12.0;
        run;
        quit;

        %end;

        %let rc = %sysfunc(close(&dsid));

  4. Thanks for great content, just shared on my twitter wall. We also have some courses relevant to SAS- BIG DATA, PYTHON, Angular and other programming languages. Some of them are completely free. You can view it here Big Data, SAS, Python, AngularJS. Also would like to ask if i could republish some of your article on my blog? i think our readers would be happy to get technical knowledge and tips that you provide.

    • Leonid Batkhan

      Thanks for your feedback and your interest in our content. To your question regarding republishing some of our articles, we prefer not to republish our posts on your blog. In most cases, we do not allow duplicating content on other sites because it has a negative impact on the SEO for our site as well as the republishing sites. You are free to reference our articles in your blog.

  5. I am quite interested in your use of %SYSFUNC to capture counts from the data. Did you intend to make &AMP an unspecified macro variable? Thank you. Rick

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top