In my previous post, Introducing data-driven loops, I suggested a way of implementing programming loops with a list of index variables pulled from an external data table. These ordinary programming loops iterate during code execution while processing some data elements of an input data table.
Handpicked Related Content: CALL EXECUTE made easy for SAS data-driven programmingSAS macro loops, on the other hand, are completely different creatures as they do not iterate during execution time, but rather during code compilation time. That makes SAS macro loops a powerful code generator tool allowing to produce multiple variations of SAS code snippets with actually writing them just once.
Syntactically speaking, while SAS programming loops or do-loops always reside within SAS data step, SAS macro loops or %do-loops are located within SAS macros. They can be either within a data step (or proc step) generating multiple data/proc step statements, or outside data/proc step generating multiple data/proc steps or global statements.
Implementing SAS macro loops
To make macro loop driven by data we can use two index macro variables: the first one (primary index) iterates from 1 to n incrementing by 1 effectively going through the observations of a driver table, the other macro variable (secondary index) gets its values from the driver variable and is being a true data-driven index for our macro loop. The following figure illustrates this concept.
SAS macro loops containing data or proc steps
For example, we have data table sashelp.prdsale that looks like this:
Suppose, we need to produce in separate HTML files for each country - charts showing actual product sales by years.
Here is how this can be done the data-driven way without any hard-coding:
/* output files location */ filename odsout "C:\PROJECTS\_BLOG_SAS\data-driven-macro-loops\html"; /* get unique countries */ proc sort data=sashelp.prdsale(keep=COUNTRY) out=work.countries nodupkey; by COUNTRY; run; %macro loop; %local num i cntry; /* get number of countries */ %let dsid = %sysfunc(open(work.countries)); %let num = %sysfunc(attrn(&dsid,nlobs)); %let rc = %sysfunc(close(&dsid)); %do i=1 %to # data _null_; p = &i; set work.countries point=p; call symputx('cntry',COUNTRY); stop; run; ods html path=odsout file="report_&cntry..html" style=styles.seaside; goptions reset=all device=actximg colors=() htext=9pt hsize=5in vsize=3in; title1 "Product sales by year for &cntry"; axis1 minor=none label=('Actual Sales'); proc gchart data=sashelp.prdsale(where=(COUNTRY eq "&cntry")); vbar YEAR / sumvar = ACTUAL width = 10 outside = sum raxis = axis1 cframe = white nozero discrete ; format ACTUAL dollar12.0; run; quit; ods html close; %end; %mend loop; %loop;
The highlights of this code are:
- Using proc sort with nodupkey option we create a table work.countries of unique COUNTRY values. This can be done using proc sql as well.
- We determine the number of unique COUNTRY values, &num.
- Within macro called %loop, we use primary index – macro variable &i – to iterate from 1 to &num with increment 1.
- We use data _null_ step within that loop to sequentially read values of COUNTRY using direct access to observations of work.countries table by means of point= option. For each iteration &i of %do-loop, we create a secondary index – macro variable &cntry, which is used as a true index for our loop.
- During the code compilation, SAS macro processor loops through the %do-loop &i times repeatedly generating SAS code within it, each time with a new value &cntry, thus accomplishing our task.
This implementation of the macro %do-loop works perfectly fine, except in the situations when we need to use it within a data/proc step. The problem is the data _null_ statement that converts primary index &i to the secondary index &cntry, since we can’t use a data step within another data step.
SAS macro loop within data or proc step
Let’s solve the following coding problem. Suppose we have to create SALE_CATEGORY variable on our sashelp.prdsale table, something that you would routinely code like this:
data work.prdsale; set sashelp.prdsale; if ACTUAL < 50 then SALE_CATEGORY = 'A'; else if ACTUAL < 200 then SALE_CATEGORY = 'B'; else if ACTUAL < 500 then SALE_CATEGORY = 'C'; else if ACTUAL < 700 then SALE_CATEGORY = 'D'; else if ACTUAL < 900 then SALE_CATEGORY = 'E'; else if ACTUAL < 2000 then SALE_CATEGORY = 'F'; run;
What is wrong with this code? Nothing. Except when category definition changes you would have to find every place in your code where to apply that change. Besides, if a number of categories is large, the code becomes large too.
Let’s implement this the data-driven way, without any hard-coded values. Notice, that in the code above we have multiple if-then-else statements of a certain pattern that are repeated multiple times and thus they can be generated via %do-loop.
Let’s create the following driver table that contains boundary and sale category definitions that match the above hard-coded data step:
The data-driven macro loop can be implemented using the following code:
%macro mloop; /* get observations number - num, variable numbers - vnum1, vnum2, */ /* variable type - vtype2, getfunc = getvarC or getvarN */ %let dsid = %sysfunc(open(work.salecategory)); %let num = %sysfunc(attrn(&dsid,nlobs)); %let vnum1 = %sysfunc(varnum(&dsid,upboundary)); %let vnum2 = %sysfunc(varnum(&dsid,salecat)); %let vtype2 = %sysfunc(vartype(&dsid,&vnum2)); %let getfunc = getvar&vtype2; data work.prdsale; set sashelp.prdsale; %do i=1 %to # /* get upboundary and salecat values from driver table salecategory */ /* and assign them to upper and categ macro variables */ %let rc = %sysfunc(fetchobs(&dsid,&i)); %let upper = %sysfunc(getvarn(&dsid,&vnum1)); %let categ = %sysfunc(&getfunc(&dsid,&vnum2)); %if &vtype2 eq C %then %let categ = "&categ"; /* generate if ... then ...; else statements */ if ACTUAL < &upper then SALE_CATEGORY = &categ; %if (&i ne &num) %then %str(else); %end; %let rc = %sysfunc(close(&dsid)); run; %mend mloop; %mloop;
With a little overhead of several %sysfunc() functions and SAS Component Language (SCL) functions we effectively generate a set of if-then-else statements based on the values in the driver table. Notably, even if the number of categories increases to hundreds the code does not have to be changed a bit.
Of course, this approach can be used for any SAS code generating efforts where there is a repetition of SAS code pattern.
Bonus
As a bonus to those who was patient enough to bear with me to the end, here is a macro equivalent of the data _null_ statement used in the first section - SAS macro loops containing data or proc step:
Data step:
data _null_; p = &i; set work.countries point=p; call symputx('cntry',COUNTRY); stop; run;
Macro equivalent:
%let dsid = %sysfunc(open(work.countries)); %let vnum = %sysfunc(varnum(&dsid,COUNTRY)); %let rc = %sysfunc(fetchobs(&dsid,&i)); %let cntry = %sysfunc(getvarc(&dsid,&vnum)); %let rc = %sysfunc(close(&dsid));
Thoughts?
Please share your thoughts and comments.
Handpicked Related Content: CALL EXECUTE made easy for SAS data-driven programming
16 Comments
For those interested, here is an alternative to macro loops way of implementing data-driven looping: CALL EXECUTE made easy for SAS data-driven programming.
Very interesting post, Leonid. Thank you for sharing.
Kirk
Great! I am glad you find it interesting, Kirk.
Hi Leonid,
Thank you for your useful and informative post. I use a similar, but different, approach in my own code. I hope you don’t mind if I share that approach.
I have a number of SAS utility macros at https://github.com/scottbass/SAS/tree/master/Macro, one of which is %loop: https://github.com/scottbass/SAS/blob/master/Macro/loop.sas.
%loop is a “wrapper” macro, which loops over a delimited list of items, tokenizing that list into individual “words”, and calling a user-defined inner macro for each “word” in the list.
Extracting the looping and parsing logic into this utility macro has allowed me to “think” differently/more clearly when approaching coding issues as you’ve described in your post. I just have to think about 1) what is the list I want to loop over, and 2) what code do I want to run for each item. For the inner macro, I know to just reference the macro variable &word at some point.
Because this is “pure macro”, i.e. executes to completion during code compilation, the approach is the same whether called in global code, data step, procs, etc.
One caveat is that, in *very* rare cases, creating the __LIST__ parameter can overflow the maximum size of a macro variable (IIRC it’s 64K bytes). In that scenario, I would use the “macro array” approach you’ve illustrated (&mvar1, &mvar2, &mvar3, etc, then referencing those variables via &&mvar&i). I may write a %loop2 version using this approach in the future.
You can simulate a multidimensional array using two (or more) delimiters. A simplistic example:
proc sql noprint;
select catx(“|”,name,sex,age) into :list separated by "^" from sashelp.class;
quit;
%macro code;
%let name=%scan(&word,1,|);
%let sex=%scan(&word,2,|);
%let age=%scan(&word,3,|);
%put &=name &=sex &=age;
%mend;
%loop(&list,dlm=^);
One last example: rename each variable in a dataset (real world example, used to load a hash object):
data work.foo;
set sashelp.shoes;
run;
proc contents data=work.foo out=contents noprint;
run;
proc sql noprint;
select name into :vars separated by " " from contents;
drop table contents;
quit;
%macro code;
&word=_&word <<< no trailing semi-colon
%mend;
proc datasets lib=work nowarn nolist;
modify foo;
rename %loop(&vars);
quit;
I use my %loop macro all the time in my own code; %loop and %seplist are probably my two most used utility macros.
I hope some of your readers may find this approach useful.
Kind Regards,
Scott
Thank you, Scott, for sharing your approach to implementing data-driven SAS macro loops.
Thanks for these examples. For generating code I tend to use PROC SQL to create a pseudo-array of macro variables from the data and then use those in a macro loop. For pure data manipulation I use pure SQL; for example your SALE_CATEGORY example can be done as a join with a table of recodings. You could have used PROC FORMAT there too. I find non-procedural code easier to write and maintain.
Could your first example be done with a BY statement?
Please could you explain the advantages and disadvantages of the these various approaches?
Thank you, Peter, for your comments. Of course, the same problem can be solved in a variety different ways: by using different PROCs, statements, techniques, approaches, languages. In most cases, it is matter of personal background and preferences. As for advantages and disadvantages, it would take a dissertation to analyze them in detail for different scenarios. My shortcut take on this: as long as two different codes produce intended results within given constrains (run time, computing resources, etc.), the more advantageous code is the one that is clearer and easier to understand by your peers.
Good stuff Leonid. I have found the scl functions in macro and datastep very useful over the years.
If you use the "fetch" function to loop through your driver table you can simplify the code. You no longer need the data _null_.
/* output files location */
filename odsout "C:\temp";
/* get unique countries */
proc sort data=sashelp.prdsale(keep=COUNTRY) out=work.countries nodupkey;
by COUNTRY;
run;
%macro loop;
%local num i cntry;
/* get number of countries */
%let dsid = %sysfunc(open(work.countries));
%let num = %sysfunc(attrn(&dsid,nlobs));
%do %while(%sysfunc(fetch(&dsid)) = 0);
%let cntry=%sysfunc(getvarc(&dsid,%sysfunc(varnum(&dsid,COUNTRY))));
ods html path=odsout file="report_&cntry..html" style=styles.seaside;
goptions reset=all device=actximg colors=() htext=9pt hsize=5in vsize=3in;
title1 "Product sales by year for &cntry";
axis1 minor=none label=('Actual Sales');
proc gchart data=sashelp.prdsale(where=(COUNTRY eq "&cntry"));
vbar YEAR /
sumvar = ACTUAL
width = 10
outside = sum
raxis = axis1
cframe = white nozero discrete
;
format ACTUAL dollar12.0;
run;
quit;
%end;
%let rc = %sysfunc(close(&dsid));
ods html close;
%mend loop;
%loop;
Thank you, Gerry. Isn't your suggestion to simplify code by replacing data _null_ with SCL functions essentially the same as described in the Bonus section of this blog post?
Yes similar, but you don't need to build the indexes for the loop, just use the fetch to fetch each row from the table.
You could also simplify further and remove the getvarc if you use %syscall set(dsid);. Like this:
/* get number of countries */
%let dsid = %sysfunc(open(work.countries));
%syscall set(dsid);
%do %while(%sysfunc(fetch(&dsid)) = 0);
%let cntry=&country;
ods html path=odsout file="report_&cntry..html" style=styles.seaside;
goptions reset=all device=actximg colors=() htext=9pt hsize=5in vsize=3in;
title1 "Product sales by year for &cntry";
axis1 minor=none label=('Actual Sales');
proc gchart data=sashelp.prdsale(where=(COUNTRY eq "&cntry"));
vbar YEAR /
sumvar = ACTUAL
width = 10
outside = sum
raxis = axis1
cframe = white nozero discrete
;
format ACTUAL dollar12.0;
run;
quit;
%end;
%let rc = %sysfunc(close(&dsid));
Ah, yes, indeed, Gerry. Using FETCH function instead of FETCHOBS effectively eliminates the outer loop with its primary index. Good CATCH, thank you!
Thanks for great content, just shared on my twitter wall. We also have some courses relevant to SAS- BIG DATA, PYTHON, Angular and other programming languages. Some of them are completely free. You can view it here Big Data, SAS, Python, AngularJS. Also would like to ask if i could republish some of your article on my blog? i think our readers would be happy to get technical knowledge and tips that you provide.
Thanks for your feedback and your interest in our content. To your question regarding republishing some of our articles, we prefer not to republish our posts on your blog. In most cases, we do not allow duplicating content on other sites because it has a negative impact on the SEO for our site as well as the republishing sites. You are free to reference our articles in your blog.
I am quite interested in your use of %SYSFUNC to capture counts from the data. Did you intend to make & an unspecified macro variable? Thank you. Rick
Thank you Rick. No it was not intentional, somehow WordPress replaced all my ampersands
&
with&
which is a mnemonic for&
in HTML. It's been fixed now.Great, thank you!