Looping is essential to statistical programming. Whether you need to iterate over parameters in an algorithm or indices in an array, a loop is often one of the first programming constructs that a beginning programmer learns.

Today is the first anniversary of this blog, which is named The DO Loop, so it seems appropriate to blog about DO loops in SAS. I'll describe looping in the SAS DATA step and compare it with looping in the SAS/IML language.

### Loops in SAS

Loops are fundamental to programming because they enable you to repeat a computation for various values of parameters. Different languages use different keywords to define the iteration statement. The most well-known statement is the "for loop," which is used by C/C++, MATLAB, R, and other languages. Older languages, such as FORTRAN and SAS, call the iteration statement a "do loop," but it is exactly the same concept.

### DO loops in the DATA step

The basic iterative DO statement in SAS has the syntax `DO value = start TO stop`. An END statement marks the end of the loop, as shown in the following example:

data A; do i = 1 to 5; y = i**2; /* values are 1, 4, 9, 16, 25 */ output; end; run; |

By default, each iteration of a DO statement increments the value of the counter by 1, but you can use the BY option to increment the counter by other amounts, including non-integer amounts. For example, each iteration of the following DATA step increments the value `i` by 0.5:

data A; do i = 1 to 5 by 0.5; y = i**2; /* values are 1, 2.25, 4, 6.25, ..., 25 */ output; end; run; |

You can also iterate "backwards" by using a negative value for the BY option: `do i=5 to 1 by -0.5`.

### DO loops in SAS/IML Software

A basic iterative DO statement in the SAS/IML language has exactly the same syntax as in the DATA step, as shown in the following PROC IML statements:

proc iml; x = 1:4; /* vector of values {1 2 3 4} */ do i = 1 to 5; z = sum(x##i); /* 10, 30, 100, 354, 1300 */ end; |

In the body of the loop, `z` is the sum of powers of the elements of `x`.
During the `i`th iteration, the elements of `x` are raised to the `i`th power. As mentioned in the previous section, you can also use the BY option to increment the counter by non-unit values and by negative values.

### Variations on the DO loop: DO WHILE and DO UNTIL

On occasion, you might want to stop iterating if a certain condition occurs.
There are two ways to do this: you can use the WHILE clause to iterate *as long as* a certain condition holds, or you can use the UNTIL clause to iterate *until* a certain condition holds.

You can use the DO statement with a WHILE clause to iterate while a condition is true. The condition is checked before each iteration, which implies that you should intialize the stopping condition prior to the loop. The following statements extend the DATA step example and iterate as long as the value of `y` is less than 20:

data A; y = 0; do i = 1 to 5 by 0.5 while(y < 20); y = i**2; /* values are 1, 2.25, 4, 6.25, ..., 16 */ output; end; run; |

You can use the iterative DO statement with an UNTIL clause to iterate until a condition becomes true. The UNTIL condition is evaluated at the end of the loop, so you do not have to initialize the condition prior to the loop. The following statements extend the PROC IML example. The iteration stops after the value of `z` exceeds 200.

proc iml; x = 1:4; do i = 1 to 5 until(z > 200); z = sum(x##i); /* 10, 30, 100, 354 */ end; |

In these examples, the iteration stopped because the WHILE or UNTIL condition was satisfied. If the condition is not satisfied when `i=5` (the last value for the counter), the loop stops anyway. Consequently, the examples have two stopping conditions: a maximum number of iterations and the WHILE or UNTIL criterion. SAS also supports a DO WHILE and DO UNTIL syntax that does not involve using a counter variable.

It is worth noting that a DO loop with an UNTIL clause *always executes at least one time* because the condition is evaluated at the end of the loop. To prevent this behavior, use a DO loop with a WHILE clause.

### Looping over a set of items (foreach)

Some languages support a "foreach loop" that iterates over objects in a collection. SAS doesn't support that syntax directly, but there is a variant of the DO loop in which you can iterate over values in a specified list. The syntax in the DATA step is to specify a list of values (numeric or character) after the equal sign. The following example iterates over a few terms in the Fibonacci sequence:

data A; do v = 1, 1, 2, 3, 5, 8, 13, 21; y = v/lag(v); output; end; run; |

The ratio of adjacent values in a Fibonacci sequence converges to the golden ratio, which is 1.61803399....

The SAS/IML language does not support this syntax, but does enable you to iterate over values that are contained in a vector (or matrix). The following statements create a vector, `v`, that contains the Fibonacci numbers. An ordinary DO loop is used to iterate over the elements of the vector. At the end of the loop, the vector `z` contains the same values as the variable Y that was computed in the DATA step.

proc iml; v = {1, 1, 2, 3, 5, 8, 13, 21}; z = j(nrow(v),1,.); /* initialize ratio to missing values */ do i = 2 to nrow(v); z[i] = v[i]/v[i-1]; end; |

### Avoid unnecessary loops in the SAS/IML Language

I have some advice on using DO loops in SAS/IML language: look carefully to determine if you really need a loop. The SAS/IML language is a matrix/vector language, so statements that operate on a few long vectors run much faster than equivalent statements that involve many scalar quantities. Experienced SAS/IML programmers rarely operate on each element of a vector. Rather, they manipulate the vector as a single quantity. For example, the previous SAS/IML loop can be eliminated:

proc iml; v = {1, 1, 2, 3, 5, 8, 13, 21}; idx = 2:nrow(v); z = v[idx]/v[idx-1]; |

This computation, which computes the nonmissing ratios, is more efficient than looping over elements. For other tips and techniques that make your SAS/IML programs more efficient, see my book *Statistical Programming with SAS/IML Software*.

## 40 Comments

it have an algorithm likely the last one but for DO WHILE?.

i.e.

proc iml);

v = {1, 1, 2, 3, 5, 8, 13, 21};

idx = 2:nrow(v);

z = v[idx]/v[idx-1];

but for DO WHILE. For example i think using the LOC function.... Note: In the algorithm the ")" is wrong...

Thaks.

Hello;

I have a small problem which I think can be solved by loops in SAS, but I don't know how, I appreciate if you help me with that:

I have a data set in which if a value is missing then it shall be assigned the nearest non missing value in the previous observations, for example, 6 rows of my data looks like this:

1

.

.

2

.

4

I want to assign the missing values so that it would read like this

1

1

1

2

2

4

Can SAS do that for me?

Yes, of course. Direct questions like this to the SAS Community Forum on DATA step programming..

@Oscar: Here is the solution to your question.

*data set b4 with variable a as you have;

data b4;

a=1;output;

a=.;output;

a=.;output;

a=2;output;

a=.;output;

a=4;output;

run;

data after;

retain b;

set b4;

if not missing(a) then b=a;

run;

*data set after has variable b as you required;

In sas Using an array how to find the sum of higest two HWs from HW1--HW4

For help with SAS DATA step programming, post your questions to the SAS DATA step community.

sir, i have a question like cost of vehicle is 5000 and income every day is 100..how many days he will purchase the vehicle ..can we do this by using do until?

Yes, you could do that. It might look like this:

do day=1 to 1000 until(pay>=cost);

...

How do I create a column sequential as counter lines in SAS programming?

I don't understand what you are asking, but you can post questions and have discussions about SAS programming at the SAS Support Community.

Try with:

DATA b;

SET a;

SEQ+1;

RUN;

i have a question:

data _null_;

whereishi=0;

do until(whereishi=0);

if whereishi=0 then put "The End";

end;

run;

even though the do loop is satisfying intially with zero still it is looping,i could see in log The end .How come?

As the article says, "the UNTIL condition is evaluated at the end of the loop." This implies that an UNTIL loop always executes at least once. In contrast, a WHILE loop will not execute if the condition is false.

Why this code is not working, Basically I am splitting the data, When I was trying to print one of them, its showing the data set is not created, could anyone help on the same.

%MACRO DO_SQL;

%DO i = 1 %TO 10;

PROC SQL;

CREATE TABLE itemdef_&i AS

SELECT var2

FROM three

WHERE unique_key = "itemdef.&i";

QUIT;

%END;

%MEND DO_SQL;

PROC PRINT DATA = itemdef_1;

RUN;

You can ask questiones about SAS programming at the SAS Support Communities.

I have a column for each employee start date (startdt1 thru startdt12) in a new position, and a column for their last promotion date. How can I create a loop that will tell me when the promotion date is between startdt(i) and startdt(i+1) and output to a new variable with "Start_DT(i)" ? Thanks.

I suspect you'll want to use an ARRAY and loop over the 12 variables. Ask this question at a SAS Support Community such as https://communities.sas.com/community/support-communities/sas_macro_facility_data_step_and_sas_language_elements

Thank you

Perhaps this is not something the DO LOOP command does, but I am looking for an elegant way to repeat an analysis, though changing the outcome in each analysis. I'm using proc glimmix and need to model ~50 outcomes and would like to avoid the tedium of waiting for the run to complete, changing the outcome, running again... If the loop command doesn't do this, can anyone refer me to the command (or macro) that does (if there is one)? Thanks, Mark

1) It sounds like you want to use the BY statement, which loops over all values of a discrete variable such as US states or the numbers 1-50.

2) If you are doing a simulation, you might want to read about the advantage of the BY statement over macro loops.

3) If I haven't answered your question, post your question to the SAS Support Community

Hi Rick - Thank you for your prompt reply. I don't think the BY statement will do what I'm looking for: I need to run models across different outcome variables, not different values of one outcome variable. Thanks for the reference to the SAS Support Community; I'll try that next. Best, M

OK, good luck. If the explanatory variables and form of the model are the same for each, you can use a macro %DO loop, but I still think the most efficient option is to concatenate the data set and use a BY statement. Instead of N obsevations that look like

Y1 Y2 ... Y50 X1 X2 .. Xk

you will stack the data and have 50*N observations that look like this:

1 Y1 X1 X2 .. Xk

...

2 Y2 X1 X2 .. Xk

...

50 Y50 X1 X2 .. Xk

Rick I am wondering (1) can I use the do -loop when i have got n variables, lets say X1,X2,...,Xn, and i would like to introduce a new variable Z s.t. Z=X1+X2+X3+...+Xn

(2) If there is any hidden(or obvious that I have not notice) problem when I delete the continue word and the whole leave statement in the following programme written:

"data q1;

i = 100;

do while (1);

i = i - 2;

if mod(i,6)=0 then do;

output;

continue;

end;

if mod(i,4)=0 then do;

output;

continue;

end;

if mod(i,7)=0 and mod(i,5)=0 then leave;

end;

run;"

the thoughts are driving me crazy!

When SAS questions are driving you crazy, post your question to one of the many SAS Support Communities.

Hi Rick ,

my query is i have table ABC in which one column name is table_name which contains n different tables names. I want to create a separate data sets for the corresponding table names. can you please help me with this.

See my previous response. Go to the SAS Support Communities to get answers to questions like this.

Hi Rick,

I have a dataset with 3 columns, Customer ID, Product Key and Sales. Could you please advise how I can convert the table into something like the following:

Customer ID, Key1 _Sales, Key2_Sales, Key3_Sales, and so on.

Thank you for your help in advance.

Amy

Ask questions like this at the SAS Support Communities.

This particular question will require PROC TRANSPOSE, so it is suitable for the SAS Procedures Community.

I am a new user in SAS. When I am using DO-LOOP in SAS, I get a hesitation with the use in SET dataset. Simply speaking, I would like to know the following difference in the result:

**** input data source;

data a;

input name $1-10 test 12-15 score 17-22;

datalines;

Kitty 1 70

David 1 80

May 1 95

David 3 .

May 3 95

David 2 85

Kitty 3 .

Kitty 2 90

May 2 95

;

proc sort; by name test;

run;

*e.g. 1 : set dataset outside do-loop, and 9 observations are included in data temp;

data temp;

set a;

do x=1 to 10;

y=x;

end;

run;

*e.g. 2 : set dataset inside do-loop, and 1 observation (_N_=10) is included only in data temp;

data temp;

do x=1 to 10;

set a;

y=x;

end;

run;

May I thank those who can solve my weakness in SAS.

That's a good question. Understanding these cases requires a discussion of the implicit loop in the DATA step, and how SET, DO loops, and the OUTPUT statement interact. I suggest you post this and future questions to the SAS Support Community.

Rick,

I want to read a file sequentially, and then for each row on that file, using data from that row, perform an SQL proc to retrieve data associated with that data, to write out all of the data.

IE: how do I loop through the SQL proc for each row on the input sequential file?

You can ask questions and post example data at the SAS Support Communities. For SQL questions, post to the community for "Macro, DATA Step, and SAS Language Elements."

Hi all,

I have data for stocks (stock prices, returns, abnormal returns and cumulative abnormal returns) for the period 31 Jan 2007 and 31 Dec 2013. I have further given each date a stock Month Number (StockMonthNumber).

So, I now want sas to given me the mean, median, t-stat and p-value for my CumulativeAbnormalReturns only for up to month 6. How do I do this?

Do I say:

Proc univariate date=work.thokozani;

var CummulativeAbnormalReturns;

Where StockMonthNumber<=6;

run;

You can ask questions like this at the SAS Support Communities.

I am trying to calculate the following.

price=lag(price)*cpi/100. I have hand-calculated the price for the year 1979: 102.6133=101.19x100.07/100. The next year (1980) should be 102.6133*107.5/100. What is the SAS algorithm?

year cpi price

1978 100.07 100.07

1979 101.19 102.6133

1980 107.5 ?

1981

...

You can ask statistical ad programming questions at the SAS Support Communities.

I want to simulate a bivariate dataset (say weight vs blood pressure) with normal distribution, after which I would like to create a missing completely at random scenario with certain observations in the blood pressure variables missing. I would then like to perform mean imputation (or multiple imputation) for the missing observations to compare the mean and variance between the original (before missingness was induced) and the imputed dataset. I intend to do this multiple of times to get the average mean and variance for the imputed data, for the comparison with the original. I feel I could do this using SAS but can't figure out how, because I am a very basic user.

Could any one help figure out how I could do that please?

Yes, you should post your question to the SAS Support Communities.