Little known secrets of DO-loops with index variables

By Leonid Batkhan on SAS Users July 6, 2021 Topics | Learn SAS Programming Tips

Iterative loops are one of the most powerful and imperative features of any programming language, allowing blocks of code to be automatically executed repeatedly with some variations. In SAS we call them DO-loops because they are defined by the iterative DO statements. These statements come in three distinct forms:

DO with index variable
DO UNTIL
DO WHILE

In this blog post we will focus on the versatile iterative DO loops with index variable as they apply to SAS DATA steps, as opposed to the more modest subset of DO loops available in SAS/IML.

Iterative DO statement with index variable

The syntax of the DATA step’s iterative DO statement with index variable is remarkably simple yet powerful:

DO index-variable=specification-1 <, ...specification-n>;

...more SAS statements...

END;

It executes a block of code between the DO and END statements repeatedly, controlled by the value of an index variable. Given that angle brackets (< and >) denote “optional”, notice how index-variable requires at least one specification (specification-1) yet allows for multiple additional optional specifications (<, ...specification-n>) separated by commas.

Now, let’s look into the DO statement’s index-variable specifications.

Index-variable specification

Each specification denotes an expression, or a series of expressions as follows:

start-expression <TO stop-expression> <BY increment-expression> <WHILE (expression) | UNTIL (expression)>

Note that only start-expression is required here whereas <TO stop-expression>, <BY increment-expression>, and <WHILE (expression) or UNTIL (expression)> are optional.

Start-expression may be either Numeric or Character, while stop-expression and increment-expression must be Numeric and are valid only with a Numeric start-expression.

Expressions in <WHILE (expression) | UNTIL (expression)> are Boolean Numeric expressions (numeric value other than 0 or missing is TRUE and a value of 0 or missing is FALSE).
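As a minimal sketch of how the optional parts combine (the literal values and labels here are illustrative, not from the syntax definition), the following step writes the index values to the log for three specifications:

```sas
data _null_;
   /* start TO stop BY increment: i takes 1, 4, 7, 10 */
   do i=1 to 10 by 3;
      put 'TO/BY:    ' i=;
   end;

   /* negative increment counts down: i takes 10, 6, 2 */
   do i=10 to 1 by -4;
      put 'BY -4:    ' i=;
   end;

   /* no TO; WHILE is checked before each iteration: i takes 1, 3, 5, 7 */
   do i=1 by 2 while (i < 8);
      put 'BY/WHILE: ' i=;
   end;
run;
```

The last loop has no TO at all; it ends only because the WHILE expression becomes false when i reaches 9.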

Other iterative DO statements

For comparison, here is a brief description of the other two forms of iterative DO statement:

The DO UNTIL statement executes statements in a DO loop repetitively until a condition is true, checking the condition after each iteration of the DO loop. In other words, if the condition is true at the end of the current iteration, the loop will not iterate anymore, and processing continues with the next statement after END. Otherwise, it will iterate again.
The DO WHILE statement executes statements in a DO loop repetitively while a condition is true, checking the condition before each iteration of the DO loop. That is, if the condition is true at the beginning of an iteration, the loop will iterate; otherwise it will not, and processing continues with the next statement after END.
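The practical consequence of checking before versus after is that a DO UNTIL body always runs at least once, while a DO WHILE body may not run at all. A small sketch, using an arbitrary variable n and threshold chosen only for illustration:

```sas
data _null_;
   n = 5;
   /* condition is false on entry, so the body never runs */
   do while (n < 5);
      put 'WHILE body ran: ' n=;
      n = n + 1;
   end;

   n = 5;
   /* condition is already true, but it is checked only after the body */
   do until (n >= 5);
      put 'UNTIL body ran: ' n=;
      n = n + 1;
   end;
run;
```

Only the UNTIL message appears in the log, and it appears exactly once.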

Looping over a list of index variable values/expressions

DO loops can iterate over a list of index variable values. For example, the following DO-loop will iterate its index variable values over a list of 7, 13, 5, 1 in the order they are specified:

data A; 
   do i=7, 13, 5, 1;
      put i=;
      output;
   end;
run;

This is not yet another syntax of iterative DO loop as it is fully covered by the iterative DO statement with index variable definition. In this case, the first value (7) is the required start expression of the required first specification, and all subsequent values (13, 5 and 1) are required start expressions of the additional optional specifications.

Similarly, the following example illustrates looping over a list of index variable character values:

data A1;
   length j $4;
   do j='a', 'bcd', 'efgh', 'xyz';
      put j=;
      output;
   end;
run;

Note: For character indexes, make sure to explicitly define a length for the character variable. Otherwise, SAS will determine it implicitly from the variable's first occurrence. In this case, that is j='a', so variable j would be assigned a length of 1, which would truncate the longer values that follow. That is why we have the length j $4; statement before the DO-loop.
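To see the truncation for yourself, here is a sketch of the same kind of loop with the LENGTH statement deliberately left out:

```sas
data _null_;
   /* no LENGTH statement: the length of j is taken from 'a', i.e. 1 */
   do j='a', 'bcd', 'efgh';
      put j=;
   end;
run;
```

Following the note above, the log should show j=a, j=b and j=e, each value cut to a single character.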

Since DO loop specifications denote expressions (values are just instances or subsets of expressions), we can expand our example to a list of actual expressions:

data B;
   p = constant('pi');
   do i=round(sin(p)), sin(p/2), sin(p/3);
      put i=;
      output;
   end;
run;

In this code, the DO-loop will iterate its index variable over a list of values defined by the following expressions: round(sin(p)), sin(p/2), sin(p/3).

Infinite loops

Since <TO stop> is optional for the index-variable specification, the following code is perfectly syntactically correct:

data C;
   do j=1 by 1;
      output;
   end;
run;

It will result in an infinite (endless) loop in which the resulting data set will grow indefinitely.

While unintentional infinite looping is considered to be a bug and programmers’ anathema, sometimes it may be used intentionally. For example, to find out what happens when data set size reaches the disk space capacity… Or instead of supplying a “big enough” hard-coded number (which is not a good programming practice) for the loop’s TO expression, we may want to define an infinite DO-loop and take care of its termination and exit inside the loop. For example, you can use IF exit-condition THEN LEAVE; or IF exit-condition THEN STOP; construct.

The LEAVE statement immediately stops processing the current DO-loop and resumes with the next statement after its END.

The STOP statement immediately stops execution of the current DATA step, and SAS resumes processing statements after the end of the current DATA step.
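The difference is easy to observe with a statement placed right after the loop. In this sketch (the exit condition k gt 3 is arbitrary), the first PUT is reached because LEAVE only exits the loop, while the last PUT is never reached because STOP ends the whole DATA step:

```sas
data _null_;
   do k=1 by 1;
      if k gt 3 then leave;    /* exits the loop only */
   end;
   put 'Reached after LEAVE: ' k=;   /* k=4 */

   do k=1 by 1;
      if k gt 3 then stop;     /* ends the DATA step */
   end;
   put 'Never reached after STOP';
run;
```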

The exit-condition may be unrelated to the index-variable and may be based on the occurrence of some event. For instance, the following code runs a syntactically “infinite” loop, but the IF-THEN-LEAVE statement limits it to 200 seconds:

data D;
   start = datetime();
   do k=1 by 1;
      if datetime()-start gt 200 then leave;
      /* ... some processing ...*/
      output; 
   end;
run;

You can also create an endless loop using a DO UNTIL(0); or DO WHILE(1); statement, but again you would need to take care of its termination inside the loop.
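For instance, a DO WHILE(1) loop with its own exit might look like the following sketch, where count and the limit of 3 are illustrative:

```sas
data _null_;
   count = 0;
   do while (1);                    /* always true: no exit at the top */
      count = count + 1;
      if count ge 3 then leave;     /* termination handled inside the loop */
   end;
   put count=;                      /* count=3 */
run;
```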

Changing “TO stop” within DO-loop will not affect the number of iterations

If you think you can break out of your DO loop prematurely by adjusting the value of the TO stop expression from within the loop, you may want to run the following code snippet to prove to yourself that it’s not going to happen:

data E;
   n = 4;
   do i=1 to n;
      put i=;
      output;
      if i eq 2 then n = 2;
   end;
run;

This code will execute the DO-loop 4 times even though you change the value of n from 4 to 2 within the loop.
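One way to picture this behavior is to imagine that the stop value is copied once, before the first iteration. The following hand-written equivalent is only a mental model for a positive increment, not a description of how SAS implements the loop; the variable _stop is hypothetical:

```sas
data _null_;
   n = 4;
   _stop = n;                 /* hypothetical temporary: TO value captured once */
   i = 1;
   do while (i <= _stop);
      put i=;
      if i eq 2 then n = 2;   /* n changes, but _stop still holds 4 */
      i = i + 1;
   end;
run;
```

Like the original loop, this writes i=1 through i=4 to the log.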

According to the iterative DO statement documentation, any changes to stop made within the DO group do not affect the number of iterations. Instead, in order to stop iteration of a DO-loop before the index variable surpasses stop, change the value of index-variable so that it becomes equal to the value of stop, or use the LEAVE statement to jump out of the loop. The following two examples will do just that:

data F;
   do i=1 to 4;
      put i=;
      output;
      if i eq 2 then i = 4;
   end;
run;
 
data G;
   do i=1 to 4;
      put i=;
      output;
      if i eq 2 then leave;
   end;
run;
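Both data sets end up with the same two observations, but the two techniques leave the index variable in different states. A short sketch that inspects i after each loop:

```sas
data _null_;
   do i=1 to 4;
      if i eq 2 then i = 4;    /* END still increments i once more */
   end;
   put 'After setting i to stop: ' i=;   /* i=5 */

   do i=1 to 4;
      if i eq 2 then leave;    /* exits immediately, no increment */
   end;
   put 'After LEAVE: ' i=;               /* i=2 */
run;
```

This matters if code after the loop relies on the final value of the index variable.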

Know thy DO-loop specifications

Here is a little attention/comprehension test for you.

How many times will the following DO-loop iterate?

data H;
   do i=1, 7, 3, 6, 2 until (i>3);
      put i=;
      output;
   end;
run;

If your answer is 2, you need to re-read the whole post from the beginning (I am only partly joking here).

You may easily find out the correct answer by running this code snippet in SAS. If you are surprised by the result, just take a closer look at the DO statement: there are 5 specifications for the index variable here (separated by commas), whereas UNTIL (expression) belongs only to the last specification, where i=2. Thus, UNTIL applies to the single value i=2 and not to any of the previous specifications (i=1, 7, 3, 6). It has no effect here: it is evaluated at the end of that one iteration, after which the loop is finished anyway.
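If the intent really was to stop at the first value greater than 3, one option (a sketch, not the only way) is to test the condition inside the loop and LEAVE, which exits the whole loop including any remaining specifications:

```sas
data _null_;
   do i=1, 7, 3, 6, 2;
      put i=;
      if i > 3 then leave;   /* skips the remaining values 3, 6, 2 */
   end;
run;
```

This version iterates twice, writing i=1 and i=7 to the log.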

Now consider the following DO-loop definition:

data Z;
   pi = constant('pi');
   do x=3 while(x>pi), 10 to 1 by -pi*3, 20, 30 to 35 until(pi);
      put x=;
      output;
   end;
run;

I hope after reading this blog post you can easily identify the index variable list of values the DO-loop will iterate over. Feel free to share your solution and explanation in the comments section below.

Additional resources

The Magnificent DO (SGF paper, by Paul M. Dorfman)
Loops in SAS (blog post, by Rick Wicklin)
Introducing data-driven loops (blog post, by Leonid Batkhan)
Data-driven SAS macro loops (blog post, by Leonid Batkhan)

Questions? Thoughts? Comments?

Do you find this post useful? Do you have questions, other secrets, tips or tricks about the DO loop? Please share with us below.

About Author

Leonid Batkhan

Leonid Batkhan is a long-time SAS consultant and blogger. Currently, he is a Lead Applications Developer at F.N.B. Corporation. He holds a Ph.D. in Computer Science and Automatic Control Systems and has been a SAS user for more than 25 years. From 1995 to 2021 he worked as a Data Management and Business Intelligence consultant at SAS Institute. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries.

22 Comments

Murat Polat on June 16, 2022 5:57 pm

Thanks for the great summary! Happy to learn something new and interesting!

- Leonid Batkhan on June 17, 2022 9:24 am
  
  You are welcome! I am glad to hear that you are "happy to learn" and that you find "something new and interesting" here.
  
Nate on February 24, 2022 4:01 pm

Thanks for the article, I typically use DO loops in SAS. The one I have never used is the DO OVER. Have you ever found a use for it?

- Leonid Batkhan on February 24, 2022 4:29 pm
  Great question Nate! DO OVER loop is used to perform the operations in the DO loop over all elements in an array. For example, if you have an array A defined in a data step, you can loop through all its elements by either this indexed loop:
```
do i=1 to dim(A);
   /* within this loop you reference array elements as A[i], for example: */
   A[i] = A[i] + 3;
end;
```
  or by this "do over" loop:
```
do over A;
   /* within this loop you use just array name, not references A[i] by elements, */
   /* for example:                                                               */
   A = A + 3;
end;
```
  For more details and examples, take a look at this article: How to Use ARRAYs and DO Loops: Do I DO OVER or Do I DO i?.
  
Kelley Weston on September 28, 2021 8:53 pm

Please keep in mind the definition of "iterate". It means to repeat. Therefore, if a loop (or data step) iterates once, it has executed twice. It seems that "iterate" is habitually misused in SAS documentation & articles.

- Leonid Batkhan on September 28, 2021 9:38 pm
  
  I’d agree with you, but then we’d both be wrong. In computer science, not just "in SAS documentation and articles", "iterate" is habitually used in conjunction with "single iteration" as a single pass of a repetitive process. Therefore, if a loop (or a data step) iterates once, it means it has executed once.
  
Bruce Gilsen on August 24, 2021 6:53 pm

Thanks Leonid, excellent stuff as usual. I hadn't realized that you could loop over a list of character values and I'll add that to my mental toolkit.

- Leonid Batkhan on August 25, 2021 10:47 am
  
  You are welcome, Bruce. I am super happy that such a SAS wizard as yourself finds useful bits in my posts. Thank you for your feedback.
  
Deb Summons on July 31, 2021 1:05 am

Fascinating comments to a great article on Do Loops.

Learned something new. Will keep as a reference.

Thank you for posting

- Leonid Batkhan on July 31, 2021 11:26 am
  
  You are welcome and thank you for one more fascinating comment.
  
Peter Lancashire on July 14, 2021 10:39 am

You have shown that the DO statement is powerful, and also that it can be confusing, particularly for those more familiar with other programming languages. For maintainability I prefer simple, easily understood code. For that reason I avoid the DATA step and loops if possible and use PROC SQL.

The picture of loops is Tiger and Turtle. It is one of my pandemic cycling destinations - I was there on Monday. More: https://de.wikipedia.org/wiki/Tiger_and_Turtle_%E2%80%93_Magic_Mountain (also available, more briefly, in English).

- Leonid Batkhan on July 14, 2021 11:05 am
  
  Thank you, Peter, I appreciate your feedback. It is normal for those who are quite comfortable with one programming language to be reasonably confused when switching to another programming language, at least in the beginning. That is why I wrote this post - to make SAS data step iterative DO loops abundantly clear for all. Making some effort to absorb it may totally switch one's perception of code simplicity and maintainability.
  
Paul Choate on July 13, 2021 10:13 am

Interesting stuff, thanks!
I’d like to add a reference to look-ahead techniques where two do-loops read a single sorted dataset stopping at each level of the by-groups.
Here is a paper on the topic: http://support.sas.com/resources/papers/proceedings12/052-2012.pdf
This is a technique I learned early in my sales career and have used many times!

- Leonid Batkhan on July 13, 2021 10:33 am
  
  Thank you, Paul, for your comment and sharing this resource. The technique you are referring to is called "DOW-loop", a "non-standard industry term", "W" standing for Ian Whitlock who introduced it. It is also described in SGF paper by Paul M. Dorfman The Magnificent DO which is referenced at the end of this blog post; also see The DOW-Loop Unrolled.
  
Nicole Fox on July 8, 2021 9:32 am

This is a fantastic summary, Leonid. I didn't know about the leave statement--very cool.

- Leonid Batkhan on July 8, 2021 9:52 am
  
  Great! Thank you for the feedback, Nicole. I am glad you learned something new from my post.
  LEAVE statement is not necessarily used in conjunction with infinite loops. In any loop, e.g. if you search an array for some value and found it, you can cut that loop short by LEAVE-ing it.
  
Anton Meshcheryakov on July 7, 2021 1:40 pm

The ultimate little known secret of SAS loops: the best loop is NONE loop. SAS provides a plethora of implicit looping constructs (DATA step implicit loop, BY groups in most PROCs, GROUP BY in SQL) that eliminate the majority of use cases of loops in other languages. While some bona fide uses of explicit loops do remain, one should always ponder whether there is really a need for the loop at all, or whether it is just a bad data structure. Implicit loops completely avoid any starting, stopping and iteration rules and thus eliminate the inherent potential for errors and maintenance, should changed data require the parameters to be updated.

- Leonid Batkhan on July 7, 2021 2:27 pm
  
  Wow, Anton, this is by far the most profound and unorthodox programming comment I have ever read. It shakes the very foundation of all past and present programming languages as we know them. Not only does it eliminate iterative loops in one fell swoop, it also gets rid of explicit programming in favor of implicit (and I naively thought that explicit programming is less prone to errors…) 🙂
  
  Aside from being ironic, I am looking forward to you coming up with a new generation programming language based on your proposed principles.
  
Bartosz Jabłoński on July 7, 2021 12:05 pm
The DO-loop can also iterate over variables, e.g.
```
data _null_;
   X1=1;
   X2=2;
   X3=3;
   do i = X1, X2, X3;
      put i=;
   end;
run;
```
But of course it is a special case of "specification".

All the best
Bart

- Leonid Batkhan on July 7, 2021 12:22 pm
  Of course! Variables are subsets (or instances) of "expressions". I thought I covered it by this more general example:
  data B;
     p = constant('pi');
     do i=round(sin(p)), sin(p/2), sin(p/3);
        put i=;
        output;
     end;
  run;
Rick Wicklin on July 7, 2021 7:21 am
The syntax
do i=7, 13, 5, 1; end;
is known as the DOLIST syntax. It is supported not only in the DATA step, but by many other SAS procedures. For more about the DOLIST syntax, see "The DOLIST syntax: Specify a list of numerical values in SAS."

- Leonid Batkhan on July 7, 2021 10:35 am
  Thank you, Rick, for showing the “DOLIST” usage outside DATA steps. What you call a “DOLIST” is an implicit DO-loop denoted as (index-variable=specification-1 <, ...specification-n>) or a part of the DO statement denoted as index-variable=specification-1 <, ...specification-n>.
  Therefore, I would clarify that
  do i=7, 13, 5, 1; end;
  is not the “DOLIST” per se, but a DO-loop with the “DOLIST”:
  do "DOLIST"; end;
  where "DOLIST" is i=7, 13, 5, 1.
  
  In your blog post that you reference in your comment, the “DOLIST syntax” goes beyond just listing the values; it also covers start, to, and by (also, commas are optional):
  proc sgplot data=sashelp.cars;
     scatter x=Weight y=Mpg_City;
     yaxis grid values=(10 to 40 by 5, 50 60); /* DOLIST; commas optional */
  run;
  The point I am making in this post for the DATA step is that a comma-separated list of values (expressions) is not a separate (additional) syntax, since it is fully covered by the general definition of DO Statement: Iterative.
  
  Also, in contrast with your “DOLIST” example pertaining to the PROC SGPLOT, commas are not optional in the explicit DO statement, they are required in the DATA step.
  