# Loops in SAS

Looping is essential to statistical programming. Whether you need to iterate over parameters in an algorithm or indices in an array, a loop is often one of the first programming constructs that a beginning programmer learns.

Today is the first anniversary of this blog, which is named The DO Loop, so it seems appropriate to blog about DO loops in SAS. I'll describe looping in the SAS DATA step and compare it with looping in the SAS/IML language.

### Loops in SAS

Loops are fundamental to programming because they enable you to repeat a computation for various values of parameters. Different languages use different keywords to define the iteration statement. The most well-known statement is the "for loop," which is used by C/C++, MATLAB, R, and other languages. Older languages, such as FORTRAN and SAS, call the iteration statement a "do loop," but it is exactly the same concept.

### DO loops in the DATA step

The basic iterative DO statement in SAS has the syntax DO value = start TO stop. An END statement marks the end of the loop, as shown in the following example:

```data A; do i = 1 to 5; y = i**2; /* values are 1, 4, 9, 16, 25 */ output; end; run;```

By default, each iteration of a DO statement increments the value of the counter by 1, but you can use the BY option to increment the counter by other amounts, including non-integer amounts. For example, each iteration of the following DATA step increments the value i by 0.5:

```data A; do i = 1 to 5 by 0.5; y = i**2; /* values are 1, 2.25, 4, 6.25, ..., 25 */ output; end; run;```

You can also iterate "backwards" by using a negative value for the BY option: do i=5 to 1 by -0.5.

### DO loops in SAS/IML Software

A basic iterative DO statement in the SAS/IML language has exactly the same syntax as in the DATA step, as shown in the following PROC IML statements:

```proc iml; x = 1:4; /* vector of values {1 2 3 4} */ do i = 1 to 5; z = sum(x##i); /* 10, 30, 100, 354, 1300 */ end;```

In the body of the loop, z is the sum of powers of the elements of x. During the ith iteration, the elements of x are raised to the ith power. As mentioned in the previous section, you can also use the BY option to increment the counter by non-unit values and by negative values.

### Variations on the DO loop: DO WHILE and DO UNTIL

On occasion, you might want to stop iterating if a certain condition occurs. There are two ways to do this: you can use the WHILE clause to iterate as long as a certain condition holds, or you can use the UNTIL clause to iterate until a certain condition holds.

You can use the DO statement with a WHILE clause to iterate while a condition is true. The condition is checked before each iteration, which implies that you should intialize the stopping condition prior to the loop. The following statements extend the DATA step example and iterate as long as the value of y is less than 20:

```data A; y = 0; do i = 1 to 5 by 0.5 while(y < 20); y = i**2; /* values are 1, 2.25, 4, 6.25, ..., 16 */ output; end; run;```

You can use the iterative DO statement with an UNTIL clause to iterate until a condition becomes true. The UNTIL condition is evaluated at the end of the loop, so you do not have to initialize the condition prior to the loop. The following statements extend the PROC IML example. The iteration stops after the value of z exceeds 200.

```proc iml; x = 1:4; do i = 1 to 5 until(z > 200); z = sum(x##i); /* 10, 30, 100, 354 */ end;```

In these examples, the iteration stopped because the WHILE or UNTIL condition was satisfied. If the condition is not satisfied when i=5 (the last value for the counter), the loop stops anyway. Consequently, the examples have two stopping conditions: a maximum number of iterations and the WHILE or UNTIL criterion. SAS also supports a DO WHILE and DO UNTIL syntax that does not involve using a counter variable.

It is worth noting that a DO loop with an UNTIL clause always executes at least one time because the condition is evaluated at the end of the loop. To prevent this behavior, use a DO loop with a WHILE clause.

### Looping over a set of items (foreach)

Some languages support a "foreach loop" that iterates over objects in a collection. SAS doesn't support that syntax directly, but there is a variant of the DO loop in which you can iterate over values in a specified list. The syntax in the DATA step is to specify a list of values (numeric or character) after the equal sign. The following example iterates over a few terms in the Fibonacci sequence:

```data A; do v = 1, 1, 2, 3, 5, 8, 13, 21; y = v/lag(v); output; end; run;```

The ratio of adjacent values in a Fibonacci sequence converges to the golden ratio, which is 1.61803399....

The SAS/IML language does not support this syntax, but does enable you to iterate over values that are contained in a vector (or matrix). The following statements create a vector, v, that contains the Fibonacci numbers. An ordinary DO loop is used to iterate over the elements of the vector. At the end of the loop, the vector z contains the same values as the variable Y that was computed in the DATA step.

```proc iml; v = {1, 1, 2, 3, 5, 8, 13, 21}; z = j(nrow(v),1,.); /* initialize ratio to missing values */ do i = 2 to nrow(v); z[i] = v[i]/v[i-1]; end;```

### Avoid unnecessary loops in the SAS/IML Language

I have some advice on using DO loops in SAS/IML language: look carefully to determine if you really need a loop. The SAS/IML language is a matrix/vector language, so statements that operate on a few long vectors run much faster than equivalent statements that involve many scalar quantities. Experienced SAS/IML programmers rarely operate on each element of a vector. Rather, they manipulate the vector as a single quantity. For example, the previous SAS/IML loop can be eliminated:

```proc iml; v = {1, 1, 2, 3, 5, 8, 13, 21}; idx = 2:nrow(v); z = v[idx]/v[idx-1];```

This computation, which computes the nonmissing ratios, is more efficient than looping over elements. For other tips and techniques that make your SAS/IML programs more efficient, see my book Statistical Programming with SAS/IML Software.

1. Gustavo Cano
Posted September 9, 2011 at 1:37 pm | Permalink

it have an algorithm likely the last one but for DO WHILE?.

i.e.
proc iml);
v = {1, 1, 2, 3, 5, 8, 13, 21};
idx = 2:nrow(v);
z = v[idx]/v[idx-1];

but for DO WHILE. For example i think using the LOC function.... Note: In the algorithm the ")" is wrong...

Thaks.

2. oscar
Posted May 14, 2012 at 5:36 pm | Permalink

Hello;
I have a small problem which I think can be solved by loops in SAS, but I don't know how, I appreciate if you help me with that:

I have a data set in which if a value is missing then it shall be assigned the nearest non missing value in the previous observations, for example, 6 rows of my data looks like this:
1
.
.
2
.
4
I want to assign the missing values so that it would read like this
1
1
1
2
2
4
Can SAS do that for me?

• Posted May 14, 2012 at 7:41 pm | Permalink

Yes, of course. Direct questions like this to the SAS Community Forum on DATA step programming..

• Luke Du
Posted December 12, 2012 at 12:29 pm | Permalink

@Oscar: Here is the solution to your question.
*data set b4 with variable a as you have;
data b4;
a=1;output;
a=.;output;
a=.;output;
a=2;output;
a=.;output;
a=4;output;
run;
data after;
retain b;
set b4;
if not missing(a) then b=a;
run;
*data set after has variable b as you required;

3. sirisha pannala
Posted October 1, 2012 at 5:09 pm | Permalink

In sas Using an array how to find the sum of higest two HWs from HW1--HW4

• Posted October 1, 2012 at 5:44 pm | Permalink

For help with SAS DATA step programming, post your questions to the SAS DATA step community.

• sushma
Posted April 5, 2013 at 10:39 am | Permalink

sir, i have a question like cost of vehicle is 5000 and income every day is 100..how many days he will purchase the vehicle ..can we do this by using do until?

• Posted April 5, 2013 at 10:47 am | Permalink

Yes, you could do that. It might look like this:
do day=1 to 1000 until(pay>=cost);
...

4. Romulo Freitas
Posted May 13, 2013 at 4:59 pm | Permalink

How do I create a column sequential as counter lines in SAS programming?

• Posted May 14, 2013 at 5:45 am | Permalink

I don't understand what you are asking, but you can post questions and have discussions about SAS programming at the SAS Support Community.

• cargoltreubanya
Posted September 23, 2013 at 11:42 am | Permalink

Try with:

DATA b;
SET a;
SEQ+1;
RUN;

5. sureshkumar
Posted May 29, 2013 at 8:05 am | Permalink

i have a question:
data _null_;
whereishi=0;
do until(whereishi=0);
if whereishi=0 then put "The End";
end;
run;

even though the do loop is satisfying intially with zero still it is looping,i could see in log The end .How come?

• Posted May 29, 2013 at 8:18 am | Permalink

As the article says, "the UNTIL condition is evaluated at the end of the loop." This implies that an UNTIL loop always executes at least once. In contrast, a WHILE loop will not execute if the condition is false.

6. Apoorv Mishra
Posted June 5, 2013 at 12:01 pm | Permalink

Why this code is not working, Basically I am splitting the data, When I was trying to print one of them, its showing the data set is not created, could anyone help on the same.

%MACRO DO_SQL;
%DO i = 1 %TO 10;
PROC SQL;
CREATE TABLE itemdef_&i AS
SELECT var2
FROM three
WHERE unique_key = "itemdef.&i";
QUIT;
%END;
%MEND DO_SQL;

PROC PRINT DATA = itemdef_1;
RUN;

7. Brian
Posted June 13, 2013 at 2:48 pm | Permalink

I have a column for each employee start date (startdt1 thru startdt12) in a new position, and a column for their last promotion date. How can I create a loop that will tell me when the promotion date is between startdt(i) and startdt(i+1) and output to a new variable with "Start_DT(i)" ? Thanks.

8. Mark
Posted October 10, 2013 at 1:09 pm | Permalink

Perhaps this is not something the DO LOOP command does, but I am looking for an elegant way to repeat an analysis, though changing the outcome in each analysis. I'm using proc glimmix and need to model ~50 outcomes and would like to avoid the tedium of waiting for the run to complete, changing the outcome, running again... If the loop command doesn't do this, can anyone refer me to the command (or macro) that does (if there is one)? Thanks, Mark

• Posted October 10, 2013 at 2:04 pm | Permalink

1) It sounds like you want to use the BY statement, which loops over all values of a discrete variable such as US states or the numbers 1-50.
2) If you are doing a simulation, you might want to read about the advantage of the BY statement over macro loops.

• Mark
Posted October 11, 2013 at 7:46 am | Permalink

Hi Rick - Thank you for your prompt reply. I don't think the BY statement will do what I'm looking for: I need to run models across different outcome variables, not different values of one outcome variable. Thanks for the reference to the SAS Support Community; I'll try that next. Best, M

• Posted October 11, 2013 at 8:06 am | Permalink

OK, good luck. If the explanatory variables and form of the model are the same for each, you can use a macro %DO loop, but I still think the most efficient option is to concatenate the data set and use a BY statement. Instead of N obsevations that look like
Y1 Y2 ... Y50 X1 X2 .. Xk
you will stack the data and have 50*N observations that look like this:
1 Y1 X1 X2 .. Xk
...
2 Y2 X1 X2 .. Xk
...
50 Y50 X1 X2 .. Xk

9. Chung
Posted November 7, 2013 at 9:15 pm | Permalink

Rick I am wondering (1) can I use the do -loop when i have got n variables, lets say X1,X2,...,Xn, and i would like to introduce a new variable Z s.t. Z=X1+X2+X3+...+Xn
(2) If there is any hidden(or obvious that I have not notice) problem when I delete the continue word and the whole leave statement in the following programme written:
"data q1;
i = 100;
do while (1);
i = i - 2;
if mod(i,6)=0 then do;
output;
continue;
end;
if mod(i,4)=0 then do;
output;
continue;
end;
if mod(i,7)=0 and mod(i,5)=0 then leave;
end;
run;"

the thoughts are driving me crazy!

10. nikhil
Posted February 3, 2014 at 9:25 am | Permalink

Hi Rick ,

my query is i have table ABC in which one column name is table_name which contains n different tables names. I want to create a separate data sets for the corresponding table names. can you please help me with this.

• Posted February 3, 2014 at 9:33 am | Permalink

See my previous response. Go to the SAS Support Communities to get answers to questions like this.

11. Amy
Posted May 6, 2014 at 9:56 am | Permalink

Hi Rick,

I have a dataset with 3 columns, Customer ID, Product Key and Sales. Could you please advise how I can convert the table into something like the following:

Customer ID, Key1 _Sales, Key2_Sales, Key3_Sales, and so on.

Amy

12. momoe
Posted June 20, 2014 at 1:24 am | Permalink

I am a new user in SAS. When I am using DO-LOOP in SAS, I get a hesitation with the use in SET dataset. Simply speaking, I would like to know the following difference in the result:

**** input data source;
data a;
input name \$1-10 test 12-15 score 17-22;

datalines;
Kitty 1 70
David 1 80
May 1 95
David 3 .
May 3 95
David 2 85
Kitty 3 .
Kitty 2 90
May 2 95
;
proc sort; by name test;
run;

*e.g. 1 : set dataset outside do-loop, and 9 observations are included in data temp;
data temp;
set a;
do x=1 to 10;
y=x;
end;
run;

*e.g. 2 : set dataset inside do-loop, and 1 observation (_N_=10) is included only in data temp;
data temp;
do x=1 to 10;
set a;
y=x;
end;
run;

• momoe
Posted June 20, 2014 at 1:25 am | Permalink

May I thank those who can solve my weakness in SAS.

• Posted June 20, 2014 at 6:32 am | Permalink

That's a good question. Understanding these cases requires a discussion of the implicit loop in the DATA step, and how SET, DO loops, and the OUTPUT statement interact. I suggest you post this and future questions to the SAS Support Community.

13. Dusty Simmons
Posted July 9, 2014 at 10:18 am | Permalink

Rick,
I want to read a file sequentially, and then for each row on that file, using data from that row, perform an SQL proc to retrieve data associated with that data, to write out all of the data.
IE: how do I loop through the SQL proc for each row on the input sequential file?

• Posted July 9, 2014 at 10:29 am | Permalink

You can ask questions and post example data at the SAS Support Communities. For SQL questions, post to the community for "Macro, DATA Step, and SAS Language Elements."

14. Thokozani
Posted October 9, 2014 at 7:19 am | Permalink

Hi all,

I have data for stocks (stock prices, returns, abnormal returns and cumulative abnormal returns) for the period 31 Jan 2007 and 31 Dec 2013. I have further given each date a stock Month Number (StockMonthNumber).

So, I now want sas to given me the mean, median, t-stat and p-value for my CumulativeAbnormalReturns only for up to month 6. How do I do this?

15. Thokozani
Posted October 9, 2014 at 7:21 am | Permalink

Do I say:

Proc univariate date=work.thokozani;
var CummulativeAbnormalReturns;
Where StockMonthNumber<=6;
run;

16. bash
Posted March 24, 2015 at 5:29 pm | Permalink

I want to simulate a bivariate dataset (say weight vs blood pressure) with normal distribution, after which I would like to create a missing completely at random scenario with certain observations in the blood pressure variables missing. I would then like to perform mean imputation (or multiple imputation) for the missing observations to compare the mean and variance between the original (before missingness was induced) and the imputed dataset. I intend to do this multiple of times to get the average mean and variance for the imputed data, for the comparison with the original. I feel I could do this using SAS but can't figure out how, because I am a very basic user.
Could any one help figure out how I could do that please?

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, statistical graphics, statistical simulation, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.