Programming loops are a staple of any computer programming language. In essence, loops are what make programming languages so powerful.
In all programming loops, a group of statements is executed repeatedly based on the value of an index variable that changes from one iteration to another. The iterations are continued until a certain condition is met. However, in ordinary loops that index variable’s values are at least partially embedded in the programming code.
For example in the SAS’ iterative do statement embedded (hard coded) in the code are initial value 1, upper boundary 21 and increment 3:
do i=1 to 21 by 3;
Still, it is very convenient in case of linear pattern of incrementing or decrementing the index, and convenience trumps anathema of the programming – hardcoding.
If we don’t have a pattern of index variable alteration, SAS graciously provides a list of index variable values without any pattern in its iterative do-statement. For example:
do i=1, 5, 10, 8, 7;
do c=’Volvo’, ’GM’, ’Honda’, ’Ford’, ’Toyota’;
Even though the index variables can be of any data type (numeric or character), embedding their values into the do statement makes a program hard-coded (which is a curse word in programming). That requires change of the code every time when the index variable values or their order change.
SAS goes even further, allowing us to use variable names in the list of index values:
do c=var1, var2, var3, var4;
But if your data step is processing a long table, adding new variables to it with duplicate values across its observations is not the most efficient way of doing it.
While loops in general and SAS do-loops in particular belong to the code class, index variable values are data and as such ideally should not belong to the code.
Wouldn’t it be nice to have those index variable values removed from the code altogether and have them pulled from some external data source? By answering “yes” to this question we arrive at the notion of a data-driven loop.
What is a data-driven loop?
A data-driven loop is one in which the index variable changes from one iteration to another based on the value of a variable in a data table; let’s call them a driver table and a driver variable correspondingly.
The following figure illustrates this concept.
In the data-driven loop, its index-variable value in each iteration is assigned based on the table variable value in each observation. In this case, the data table controls index-variable values, their order and number of iterations which is equal to the number of observations (records) in the data table. If your data step with a data-driven loop processes some input data table, the driver data table is a separate data table, not the one that is being processed by a data step where you have the do-loop. Programming code becomes completely separated from data and free of any hard-coded values, and that makes it data-driven.
Data-driven code for iterative loops does not require any change when your iteration data changes. All you need is to change your data table that controls the looping.
The syntax for the data-driven do-loop can be something like this:
do (index=(driver-table, driver-variable)); end;
where index changes its value from iteration to iteration according to the values of driver-variable values in the driver-table; iterations continue until the last record is read.
SAS implementation of data-driven loop
Since I am not a software developer who can affect software syntax, but rather a SAS user as are most of the readers of this blog, I implemented the above idea of the data-driven iterative DO-loop in SAS by using SAS macro language.
I used SAS’ %sysfunc macro function to execute various SAS Component Language functions in order to read the external data table from within a data step where our do-loop resides.
Here is the single SAS macro that you would need:
/* generate a list of comma-separated values */ %macro ddlist (dataset,datavar); %local dsid rc vnum vtype getfunc val outlst; %let dsid = %sysfunc(open(&dataset)); %if &dsid %then %do; %let n = %sysfunc(attrn(&dsid,nlobs)); %if &n > 0 %then %do i=1 %to &n; %let rc = %sysfunc(fetchobs(&dsid,&i)); %let vnum = %sysfunc(varnum(&dsid,&datavar)); %let vtype = %sysfunc(vartype(&dsid,&vnum)); %let getfunc = getvar&vtype; %let val = %sysfunc(&getfunc(&dsid,&vnum)); %if &vtype eq C %then %let val = "&val"; %let outlst = &outlst &val; %if &i ne &n %then %let outlst = &outlst,; %end; %let rc = %sysfunc(close(&dsid)); %end; %else %put %sysfunc(sysmsg()); &outlst %mend ddlist;
This macro, ddlist (abbreviation of data-driven list), takes 2 parameters:
1) dataset – name of the driver table,
2) datavar – name of the driver variable.
Depending on the type of variable datavar in the driver table, the ddlist macro returns a comma-separated list of numeric values or quoted character values in compliance with the SAS syntax of the iterative do-loop with a list of constant values.
SAS data-driven loop usage
Here are a couple simple examples:
1) Driver variable is numeric:
/* driver table */ data work.driver_tbl; input driver_var @@; datalines; 1 5 67 78 53 37 0 ; run; /* data step with data-driven do-loop */ data outtable; do x=%ddlist(work.driver_tbl, driver_var); y = sin(x); output; end; run;
The output table is produced in the order specified by the driver table/variable. As you can see in the code, the data step with the data-driven do-loop is completely decoupled from the data values. Any change to the data in the driver table will change the behavior of the do-loop without any change to the code. That is the beauty of the data-driven method.
2) Driver variable is character:
/* driver table */ data work.driver_tblc; length driver_var $6; input driver_var @@; datalines; Volvo GM Honda Ford Toyota ; run; /* data step with data-driven do-loop */ data _null_; do car=%ddlist(work.driver_tblc, driver_var); put car=; end; run;
The output produced in the SAS Log:
Instead of conclusion
Besides the ddlist macro, that implements the data-driven loop in SAS, I provided just very rudimentary samples of its usage. I would like to hear from you – SAS users – any feedback that you might have regarding the concept, its implementation, usage and possible alternative solutions.