Programming loops are a staple of any computer programming language. In essence, loops are what make programming languages so powerful.
In all programming loops, a group of statements is executed repeatedly based on the value of an index variable that changes from one iteration to another. The iterations are continued until a certain condition is met. However, in ordinary loops that index variable’s values are at least partially embedded in the programming code.
For example in the SAS’ iterative do statement embedded (hard coded) in the code are initial value 1, upper boundary 21 and increment 3:
do i=1 to 21 by 3;
Still, it is very convenient in case of linear pattern of incrementing or decrementing the index, and convenience trumps anathema of the programming – hardcoding.
If we don’t have a pattern of index variable alteration, SAS graciously provides a list of index variable values without any pattern in its iterative do-statement. For example:
do i=1, 5, 10, 8, 7;
do c=’Volvo’, ’GM’, ’Honda’, ’Ford’, ’Toyota’;
Even though the index variables can be of any data type (numeric or character), embedding their values into the do statement makes a program hard-coded (which is a curse word in programming). That requires change of the code every time when the index variable values or their order change.
SAS goes even further, allowing us to use variable names in the list of index values:
do c=var1, var2, var3, var4;
But if your data step is processing a long table, adding new variables to it with duplicate values across its observations is not the most efficient way of doing it.
While loops in general and SAS do-loops in particular belong to the code class, index variable values are data and as such ideally should not belong to the code.
Wouldn’t it be nice to have those index variable values removed from the code altogether and have them pulled from some external data source? By answering “yes” to this question we arrive at the notion of a data-driven loop.
What is a data-driven loop?
A data-driven loop is one in which the index variable changes from one iteration to another based on the value of a variable in a data table; let’s call them a driver table and a driver variable correspondingly.
The following figure illustrates this concept.
In the data-driven loop, its index-variable value in each iteration is assigned based on the table variable value in each observation. In this case, the data table controls index-variable values, their order and number of iterations which is equal to the number of observations (records) in the data table. If your data step with a data-driven loop processes some input data table, the driver data table is a separate data table, not the one that is being processed by a data step where you have the do-loop. Programming code becomes completely separated from data and free of any hard-coded values, and that makes it data-driven.
Data-driven code for iterative loops does not require any change when your iteration data changes. All you need is to change your data table that controls the looping.
The syntax for the data-driven do-loop can be something like this:
do (index=(driver-table, driver-variable)); end; |
where index changes its value from iteration to iteration according to the values of driver-variable values in the driver-table; iterations continue until the last record is read.
SAS implementation of data-driven loop
Since I am not a software developer who can affect software syntax, but rather a SAS user as are most of the readers of this blog, I implemented the above idea of the data-driven iterative DO-loop in SAS by using SAS macro language.
I used SAS’ %sysfunc macro function to execute various SAS Component Language functions in order to read the external data table from within a data step where our do-loop resides.
Here is the single SAS macro that you would need:
/* generate a list of comma-separated values */ %macro ddlist (dataset,datavar); %local dsid rc vnum vtype getfunc val outlst; %let dsid = %sysfunc(open(&dataset)); %if &dsid %then %do; %let n = %sysfunc(attrn(&dsid,nlobs)); %if &n > 0 %then %do i=1 %to &n; %let rc = %sysfunc(fetchobs(&dsid,&i)); %let vnum = %sysfunc(varnum(&dsid,&datavar)); %let vtype = %sysfunc(vartype(&dsid,&vnum)); %let getfunc = getvar&vtype; %let val = %sysfunc(&getfunc(&dsid,&vnum)); %if &vtype eq C %then %let val = "&val"; %let outlst = &outlst &val; %if &i ne &n %then %let outlst = &outlst,; %end; %let rc = %sysfunc(close(&dsid)); %end; %else %put %sysfunc(sysmsg()); &outlst %mend ddlist; |
This macro, ddlist (abbreviation of data-driven list), takes 2 parameters:
1) dataset – name of the driver table,
and
2) datavar – name of the driver variable.
Depending on the type of variable datavar in the driver table, the ddlist macro returns a comma-separated list of numeric values or quoted character values in compliance with the SAS syntax of the iterative do-loop with a list of constant values.
SAS data-driven loop usage
Here are a couple simple examples:
1) Driver variable is numeric:
/* driver table */ data work.driver_tbl; input driver_var @@; datalines; 1 5 67 78 53 37 0 ; /* data step with data-driven do-loop */ data outtable; do x=%ddlist(work.driver_tbl, driver_var); y = sin(x); output; end; run; |
The output table is produced in the order specified by the driver table/variable. As you can see in the code, the data step with the data-driven do-loop is completely decoupled from the data values. Any change to the data in the driver table will change the behavior of the do-loop without any change to the code. That is the beauty of the data-driven method.
2) Driver variable is character:
/* driver table */ data work.driver_tblc; length driver_var $6; input driver_var @@; datalines; Volvo GM Honda Ford Toyota ; /* data step with data-driven do-loop */ data _null_; do car=%ddlist(work.driver_tblc, driver_var); put car=; end; run; |
The output produced in the SAS Log:
Instead of conclusion
Besides the ddlist macro, that implements the data-driven loop in SAS, I provided just very rudimentary samples of its usage. I would like to hear from you – SAS users – any feedback that you might have regarding the concept, its implementation, usage and possible alternative solutions.
14 Comments
Leonid, once again a disclaimer - I don't have the code handy, and I hope nothing has been lost trying to cut it down to the essentials.
data outtable;
if _n_ eq 1 then do; /* set up hash table */
declare hash lookup(dataset:'work.tbl2');
dts.DefineKey('keyvar1','keyvar2');
dts.DefineData('othvar1','othvar2','othvar3','othvar4');
dts.DefineDone();
call missing(keyvar1,keyvar2, othvar1,othvar2,othvar3,othvar4);
end; /* of set up hash table */
***;
set intable;
by byvar;
do i=1 to 50 until word eq '';
word = scan(InpLine,i);
if word ne '' then do;
rc = lookup.find(KEY: byvar, KEY:word);
if (rc = 0) then do; /* word found in lookup */
/* some processing of InpLine from intable and using the other vars */
end; /* of word found in lookup */
end; /* of word not blank */
end; /* of scan loop */
run;
As can be seen, I devised it to solve a 'parsing' type of problem. One issue being whether to search the InpLine for each of the keywords applicable to the byvar (varied between 3 to over 270 ! ), or to check whether any word from InpLine was an applicable keyword.
Thank you, David, for sharing this as well.
Thanks for an interesting addition to data driven coding techniques. One minor drawback is that the driver table has to be a native SAS data set, as views and dbms tables can't use nlobs. Like many others, I've generally used SQL Select Into: to populate my driver list. But I've also used double SET statements like e.g.:
data outtable;
set intable;
do until endDriver eq 1;
set driver_table end=endDriver;
/* some processing of intable */
end;
run;
I first devised this to replace two hard-code do loops, in which lots of extra coding was needed to check whether driver_var2 (i.e. for the inner loop) was valid for driver_var1 (the outer loop).
To give different loops for different by variables I also devised something like (I don't have the exact code handy, so it may be off a little):
data outtable;
set intable;
by byvar;
do until endDriver eq 1;
set driver_table end=endDriver key=byvar;
/* some processing of intable */
end;
run;
But ultimately I opted for a hash-table solution.
Thank you, David, for sharing your technique of implementing data-driven loops. Would love to see your hash-table solution.
The way I have been solving this is through a %scan e.g
proc sql;
select count(var), var into :no_rows, :varlist separated by ' '
from driver_tbl;
quit;
data outtable;
set intable;
%do i = 1 %to &no_rows;
%let current = %scan(&varlist,&i);
/* some procesing of intable using ¤t */
%end;
run;
This should work with all types of variables.
Thank you, Olle, for sharing your approach. You, however, solve slightly different problem. Yes, your code will generate either a numeric value or an unquoted character string through %scan. But I was mimicking do-loop with listed values; for listed character values they need to be quoted. Your unquoted character string can be used as variable name, but not as a character value.
I've been eschewing hard-coded programming for almost forty years now. The most egregious example I ever came across was in COBOL:
77 FOUR PIC S9(4) VALUE 5.
It took me ages to debug that.
I like your solution - very much. Especially the completely generic don't-care-about-type nature.
If I were to do it from scratch, I'd do it via an iterative hash table, which could also cope with typing. Do you see any problem with that?
Kia ora,
Laurie
Laurie, thank you for your feedback. I share your distaste for hard-coding. I don't see any problem with you implementing it using hash table or any other way. Please feel free to share your code sample - I would be delighted to see any alternative implementation of the data-driven approach.
I guess I'm confused by your proposal. Your examples are more easily accomplished by using the usual implicit loop in the DATA step. For example, your first example is equivalent to
Furthermore, the behavior you propose is already available in a simpler manner by using the SELECT INTO statement of PROC SQL. The following program creates a macro variable that is a comma-separated list of values, which you can use on the iterative DO statement in the DATA step:
Rick, sorry if I am being confusing. Acknowledged, my usage examples are too simplistic, which makes it possible to replicate usage case 1 with your first data step above. But what I wanted to accomplish by the ddlist macro is to be able to use it in cases where a data table other than driver table is processed by data step, e.g.:
Your second example also quite valid and simple, but it would only work for numeric values. See above Wes Patton's comments and my replies. ddmacro works for both, numeric and character values.
The idea was to create a single macro that simulates do loop with a list of values stripping all hard-coded values from the code.
Yes, you'd need to add cats with quotes.
data work.driver_tblc;
length driver_var $6;
input driver_var @@;
datalines;
Volvo GM Honda Ford Toyota
;
run;
proc sql;
select cats('"',driver_var,'"') into: dv_listc separated by ','
from driver_tblc;
quit;
%put &dv_listc.;
data _null_;
do car=&dv_listc.;
put car=;
end;
run;
Thanks again. Now we have a complete set.
DDLIST macro covers both, numeric and character cases in one piece.
I'm wondering if there's an advantage to this over a select into?
data work.driver_tbl;
input driver_var @@;
datalines;
1 5 67 78 53 37 0
;
run;
proc sql;
select driver_var into: dv_list separated by ','
from driver_tbl;
quit;
%put &dv_list.;
data outtable2;
do x=&dv_list.;
y = sin(x);
output;
end;
run;
Thank you, Wes, for the alternative solution. It would work just fine for numeric values. Same idea - to create a comma-separated list of values. Your code, however, would need some modification to accommodate character values as do-loop syntax requires character values to be quoted.