Iterative loops are one of the most powerful and essential features of any programming language, allowing a block of code to be executed repeatedly, with variations from one pass to the next. In SAS we call them DO-loops because they are defined by the iterative DO statements. These statements come in three distinct forms:
- DO with index variable
- DO UNTIL
- DO WHILE
In this blog post we will focus on the versatile iterative DO loops with an index variable in the SAS DATA step, as opposed to the more modest subset of DO loops available in IML.
Iterative DO statement with index variable
The syntax of the DATA step’s iterative DO statement with index variable is remarkably simple yet powerful:
DO index-variable=specification-1 <, ...specification-n>;
...more SAS statements...
END;
It executes a block of code between the DO and END statements repeatedly, controlled by the value of an index variable. Given that angle brackets (< and >) denote “optional”, notice how index-variable requires at least one specification (specification-1) yet allows for multiple additional optional specifications (<, ...specification-n>) separated by commas.
Now, let’s look into the DO statement’s index-variable specifications.
Each specification denotes an expression, or a series of expressions as follows:
start-expression <TO stop-expression> <BY increment-expression> <WHILE (expression) | UNTIL (expression)>
Note that only start-expression is required here whereas <TO stop-expression>, <BY increment-expression>, and <WHILE (expression) or UNTIL (expression)> are optional.
Start-expression may be of either Numeric or Character type, whereas stop-expression and increment-expression must be Numeric and can accompany only a Numeric start-expression.
Expressions in <WHILE (expression) | UNTIL (expression)> are Boolean Numeric expressions (numeric value other than 0 or missing is TRUE and a value of 0 or missing is FALSE).
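To make the optional parts of a specification concrete, here is a minimal sketch (the data step and the labels in the PUT statements are illustrative, not from the original post). It shows TO and BY, a negative BY, and how WHILE and UNTIL differ in when they are checked:

```sas
data _null_;
   /* TO and BY: i takes the values 1, 4, 7, 10 */
   do i = 1 to 10 by 3;
      put 'TO/BY: ' i=;
   end;

   /* A negative BY counts down: 10, 6, 2 */
   do i = 10 to 1 by -4;
      put 'Negative BY: ' i=;
   end;

   /* WHILE is checked before each iteration: 1, 2, 3 */
   do i = 1 to 10 while (i < 4);
      put 'WHILE: ' i=;
   end;

   /* UNTIL is checked after each iteration: 1, 2, 3, 4 */
   do i = 1 to 10 until (i >= 4);
      put 'UNTIL: ' i=;
   end;
run;
```

Note that the UNTIL loop runs one more time than the WHILE loop, because its condition is tested only after the body has executed.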
Other iterative DO statements
For comparison, here is a brief description of the other two forms of iterative DO statement:
- The DO UNTIL statement executes statements in a DO loop repetitively until a condition is true, checking the condition after each iteration of the DO loop. In other words, if the condition is true at the end of the current loop it will not iterate anymore, and processing continues with the next statement after END. Otherwise, it will iterate.
- The DO WHILE statement executes statements in a DO loop repetitively while a condition is true, checking the condition before each iteration of the DO loop. That is, if the condition is true at the beginning of the current loop it will iterate; otherwise it will not, and processing continues with the next statement after the END.
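The practical difference between the two shows up when the condition is already decided before the loop starts. A minimal sketch, assuming a hypothetical variable n:

```sas
data _null_;
   n = 5;

   /* Condition is false at the top, so the body never runs */
   do while (n < 5);
      put 'DO WHILE body: ' n=;
      n = n + 1;
   end;

   /* Condition is checked at the bottom, so the body runs once */
   do until (n >= 5);
      put 'DO UNTIL body: ' n=;
      n = n + 1;
   end;
run;
```

Only the DO UNTIL line should appear in the log: a DO UNTIL loop always executes at least once, whereas a DO WHILE loop may execute zero times.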
Looping over a list of index variable values/expressions
DO loops can iterate over a list of index variable values. For example, the following DO-loop will iterate its index variable values over a list of 7, 13, 5, 1 in the order they are specified:
data A;
   do i=7, 13, 5, 1;
      put i=;
      output;
   end;
run;
This is not yet another syntax of iterative DO loop as it is fully covered by the iterative DO statement with index variable definition. In this case, the first value (7) is the required start expression of the required first specification, and all subsequent values (13, 5 and 1) are required start expressions of the additional optional specifications.
Similarly, the following example illustrates looping over a list of index variable character values:
data A1;
   length j $4;
   do j='a', 'bcd', 'efgh', 'xyz';
      put j=;
      output;
   end;
run;
Note: For character indexes, make sure to explicitly define a length for the character variable. Otherwise, SAS will determine it implicitly from the variable's first occurrence. In this case that is j=’a’, so the length of variable j would be set to 1, which would truncate the longer values that follow. That is why we have the length j $4; statement before the DO-loop.
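To see the truncation for yourself, here is the same loop as a sketch without the LENGTH statement (the data set name A2 is made up for illustration):

```sas
data A2;
   /* No LENGTH statement: j gets length 1 from its first value 'a' */
   do j='a', 'bcd', 'efgh', 'xyz';
      put j=;   /* log shows j=a, j=b, j=e, j=x */
      output;
   end;
run;
```

Every value is cut down to its first character, which is easy to miss because SAS does not treat it as an error.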
Since DO loop specifications denote expressions (values are just instances or subsets of expressions), we can expand our example to a list of actual expressions:
data B;
   p = constant('pi');
   do i=round(sin(p)), sin(p/2), sin(p/3);
      put i=;
      output;
   end;
run;
In this code, the DO-loop will iterate its index variable over a list of values defined by the following expressions: round(sin(p)), sin(p/2), sin(p/3).
Since <TO stop> is optional for the index-variable specification, the following code is perfectly syntactically correct:
data C;
   do j=1 by 1;
      output;
   end;
run;
It will result in an infinite (endless) loop in which the resulting data set will grow indefinitely.
While unintentional infinite looping is considered to be a bug and programmers’ anathema, sometimes it may be used intentionally. For example, to find out what happens when data set size reaches the disk space capacity… Or instead of supplying a “big enough” hard-coded number (which is not a good programming practice) for the loop’s TO expression, we may want to define an infinite DO-loop and take care of its termination and exit inside the loop. For example, you can use IF exit-condition THEN LEAVE; or IF exit-condition THEN STOP; construct.
The LEAVE statement immediately stops processing the current DO-loop and resumes with the next statement after its END.
The STOP statement immediately stops execution of the current DATA step, and SAS resumes processing statements after the end of the current DATA step.
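The difference between the two is easiest to see when there is a statement after the loop. A minimal sketch, with illustrative messages in the PUT statements:

```sas
data _null_;
   do k = 1 by 1;
      if k > 3 then leave;   /* exits the loop only */
      put 'LEAVE demo: ' k=;
   end;
   put 'Reached after LEAVE: ' k=;   /* executes, with k=4 */
run;

data _null_;
   do k = 1 by 1;
      if k > 3 then stop;    /* ends the whole DATA step */
      put 'STOP demo: ' k=;
   end;
   put 'Never reached after STOP';   /* does not execute */
run;
```

Both loops run their body for k = 1, 2, 3. Only the first data step continues to the statement that follows END.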
The exit-condition may be unrelated to the index-variable and be based on the occurrence of some event. For instance, the following code runs a syntactically “infinite” loop, but the IF-THEN-LEAVE statement limits it to 200 seconds:
data D;
   start = datetime();
   do k=1 by 1;
      if datetime()-start gt 200 then leave;
      /* ... some processing ...*/
      output;
   end;
run;
You can also create an endless loop using a DO UNTIL(0); or DO WHILE(1); statement, but again you would need to take care of its termination inside the loop.
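As a minimal sketch of that pattern, assuming a hypothetical accumulator named total, a DO WHILE(1) loop can be ended from the inside with LEAVE:

```sas
data _null_;
   total = 0;
   do while (1);                    /* condition is always true */
      total = total + 10;
      if total >= 50 then leave;    /* termination handled inside the loop */
   end;
   put total=;                      /* total=50 */
run;
```

Unlike DO k=1 BY 1, this form has no index variable, so any counter you need has to be maintained by hand.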
Changing “TO stop” within DO-loop will not affect the number of iterations
If you think you can break out of your DO loop prematurely by adjusting TO stop expression value from within the loop, you may want to run the following code snippet to prove to yourself it’s not going to happen:
data E;
   n = 4;
   do i=1 to n;
      put i=;
      output;
      if i eq 2 then n = 2;
   end;
run;
This code will execute the DO-loop 4 times even though the value of n changes from 4 to 2 within the loop.
According to the iterative DO statement documentation, any changes to stop made within the DO group do not affect the number of iterations. Instead, in order to stop iteration of DO-loop before index variable surpasses stop, change the value of index-variable so that it becomes equal to the value of stop, or use LEAVE statement to jump out of the loop. The following two examples will do just that:
data F;
   do i=1 to 4;
      put i=;
      output;
      if i eq 2 then i = 4;
   end;
run;

data G;
   do i=1 to 4;
      put i=;
      output;
      if i eq 2 then leave;
   end;
run;
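Both approaches stop after the second iteration, but they leave the index variable in different states, which matters if you use it after the loop. A minimal sketch that prints the index once each loop has ended:

```sas
data _null_;
   /* Approach 1: push the index to the stop value */
   do i=1 to 4;
      if i eq 2 then i = 4;
   end;
   put 'After setting i to stop: ' i=;   /* i=5: END still increments it */

   /* Approach 2: jump out with LEAVE */
   do i=1 to 4;
      if i eq 2 then leave;
   end;
   put 'After LEAVE: ' i=;               /* i=2: no increment happens */
run;
```

If the code after the loop needs to know where the loop stopped, LEAVE is usually the more convenient choice, since the index keeps the value it had at the time of exit.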
Know thy DO-loop specifications
Here is a little attention/comprehension test for you.
How many times will the following DO-loop iterate?
data H;
   do i=1, 7, 3, 6, 2 until (i>3);
      put i=;
      output;
   end;
run;
If your answer is 2, you need to re-read the whole post from the beginning (I am only partly joking here).
You may easily find out the correct answer by running this code snippet in SAS. If you are surprised by the result, just take a closer look at the DO statement: there are 5 specifications for the index variable here (separated by commas), and UNTIL (expression) belongs only to the last specification, where i=2. Thus, UNTIL applies to the single value i=2 (not to any of the previous specifications i = 1, 7, 3, 6); therefore, it has no effect, since it is evaluated only at the end of that single iteration.
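If the intent really is to stop as soon as any value in the list exceeds 3, one option is to move the test inside the loop and use LEAVE. This is a sketch of that alternative, not a rewrite of the quiz above (the data set name H2 is made up):

```sas
data H2;
   do i=1, 7, 3, 6, 2;
      put i=;
      output;
      if i > 3 then leave;   /* applies to every value in the list */
   end;
run;
```

Here the loop body runs for i=1 and i=7 and then exits, because the condition is tested after each value rather than being attached to one specification.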
Now consider the following DO-loop definition:
data Z;
   pi = constant('pi');
   do x=3 while(x>pi), 10 to 1 by -pi*3, 20, 30 to 35 until(pi);
      put x=;
      output;
   end;
run;
I hope after reading this blog post you can easily identify the index variable list of values the DO-loop will iterate over. Feel free to share your solution and explanation in the comments section below.
- The Magnificent DO (SGF paper, by Paul M. Dorfman)
- Loops in SAS (blog post, by Rick Wicklin)
- Introducing data-driven loops (blog post, by Leonid Batkhan)
- Data-driven SAS macro loops (blog post, by Leonid Batkhan)
Questions? Thoughts? Comments?
Do you find this post useful? Do you have questions, other secrets, tips or tricks about the DO loop? Please share with us below.
Thanks for the great summary! Happy to learn something new and interesting!
You are welcome! I am glad to hear that you are "happy to learn" and that you find "something new and interesting" here.
Thanks for the article, I typically use DO loops in SAS. The one I have never used is the DO OVER. Have you ever found a use for it?
Great question Nate! DO OVER loop is used to perform the operations in the DO loop over all elements in an array. For example, if you have an array A defined in a data step, you can loop through all its elements by either this indexed loop:
or by this "do over" loop:
For more details and examples, take a look at this article: How to Use ARRAYs and DO Loops: Do I DO OVER or Do I DO i?.
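A minimal sketch of the two loop forms, assuming a hypothetical array A over three numeric variables x1-x3 (names and values are illustrative). The two forms are kept in separate data steps, since SAS does not allow mixing implicit and explicit subscripting of the same array within one step:

```sas
/* Indexed loop: explicit subscript A[i] */
data _null_;
   x1 = 10; x2 = 20; x3 = 30;
   array A x1-x3;
   do i = 1 to dim(A);
      A[i] = A[i] * 2;
   end;
   put x1= x2= x3=;
run;

/* DO OVER loop: A refers to the current element */
data _null_;
   x1 = 10; x2 = 20; x3 = 30;
   array A x1-x3;
   do over A;
      A = A * 2;
   end;
   put x1= x2= x3=;
run;
```

Both steps double every element. DO OVER works only with arrays declared without an explicit dimension, as A is here.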
Please keep in mind the definition of "iterate". It means to repeat. Therefore, if a loop (or data step) iterates once, it has executed twice. It seems that "iterate" is habitually misused in SAS documentation & articles.
I’d agree with you, but then we’d both be wrong. In computer science, not just "in SAS documentation and articles", "iterate" is habitually used in conjunction with "single iteration" as a single pass of a repetitive process. Therefore, if a loop (or a data step) iterates once, it means it has executed once.
Thanks Leonid, excellent stuff as usual. I hadn't realized that you could loop over a list of character values and I'll add that to my mental toolkit.
You are welcome, Bruce. I am super happy that such a SAS wizard as yourself finds useful bits in my posts. Thank you for your feedback.
Fascinating comments to a great article on Do Loops.
Learned something new. Will keep as a reference.
Thank you for posting
You are welcome and thank you for one more fascinating comment.
You have shown that the DO statement is powerful, and also that it can be confusing, particularly for those more familiar with other programming languages. For maintainability I prefer simple, easily understood code. For that reason I avoid the DATA step and loops if possible and use PROC SQL.
The picture of loops is Tiger and Turtle. It is one of my pandemic cycling destinations - I was there on Monday. More: https://de.wikipedia.org/wiki/Tiger_and_Turtle_%E2%80%93_Magic_Mountain (also available, more briefly, in English).
Thank you, Peter, I appreciate your feedback. It is normal for those who are quite comfortable with one programming language to be reasonably confused when switching to another programming language, at least in the beginning. That is why I wrote this post - to make SAS data step iterative DO loops abundantly clear for all. Making some effort to absorb it may totally switch one's perception of code simplicity and maintainability.
Interesting stuff, thanks!
I’d like to add a reference to look-ahead techniques where two do-loops read a single sorted dataset stopping at each level of the by-groups.
Here is a paper on the topic: http://support.sas.com/resources/papers/proceedings12/052-2012.pdf
This is a technique I learned early in my sales career and have used many times!
Thank you, Paul, for your comment and sharing this resource. The technique you are referring to is called "DOW-loop", a "non-standard industry term", "W" standing for Ian Whitlock who introduced it. It is also described in SGF paper by Paul M. Dorfman The Magnificent DO which is referenced at the end of this blog post; also see The DOW-Loop Unrolled.
This is a fantastic summary, Leonid. I didn't know about the leave statement--very cool.
Great! Thank you for the feedback, Nicole. I am glad you learned something new from my post.
LEAVE statement is not necessarily used in conjunction with infinite loops. In any loop, e.g. if you search an array for some value and found it, you can cut that loop short by LEAVE-ing it.
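A minimal sketch of that array search, assuming a hypothetical temporary array v and a target value of 15:

```sas
data _null_;
   array v[5] _temporary_ (3 8 15 4 23);
   do i = 1 to dim(v);
      if v[i] = 15 then leave;   /* stop at the first match */
   end;
   /* If nothing matched, the loop ran to completion and i = dim(v)+1 */
   if i <= dim(v) then put 'Found at position ' i;
   else put 'Not found';
run;
```

Because LEAVE skips the final increment, the index variable still points at the matching element after the loop.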
The ultimate little-known secret of SAS loops: the best loop is NO loop. SAS provides a plethora of implicit looping constructs (DATA step implicit loop, BY groups in most PROCs, GROUP BY in SQL) that eliminate the majority of use cases for loops in other languages. While some bona fide uses of explicit loops do remain, one should always ponder whether there is really a need for the loop at all, or whether it is just a bad data structure. Implicit loops completely avoid any starting, stopping and iteration rules and thus eliminate the inherent potential for errors and maintenance, should changed data require the parameters to be updated.
Wow, Anton, this is by far the most profound and unorthodox programming comment I have ever read. It shakes the very foundation of all past and present programming languages as we know them. Not only does it eliminate iterative loops in one fell swoop, it also gets rid of explicit programming in favor of implicit (and I naively thought that explicit programming is less prone to errors…) 🙂
Aside from being ironic, I am looking forward to you coming up with a new generation programming language based on your proposed principles.
The DO-loop can also iterate over variables, e.g.
But of course it is a special case of "specification".
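As a minimal sketch of looping over variables, assuming three hypothetical numeric variables a, b and c:

```sas
data _null_;
   a = 2; b = 5; c = 11;
   do i = a, b, c;   /* each variable is a start-expression */
      put i=;        /* log shows i=2, i=5, i=11 */
   end;
run;
```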
All the best
Of course! Variables are subsets (or instances) of "expressions". I thought I covered it by this more general example:
is known as the DOLIST syntax. It is supported not only in the DATA step, but by many other SAS procedures. For more about the DOLIST syntax, see "The DOLIST syntax: Specify a list of numerical values in SAS."
Thank you, Rick, for showing the “DOLIST” usage outside DATA steps. What you call a “DOLIST” is an implicit DO-loop denoted as (index-variable=specification-1 <, ...specification-n>) or a part of the DO statement denoted as index-variable=specification-1 <, ...specification-n>.
Therefore, I would clarify that
is not the “DOLIST” per se, but a DO-loop with the “DOLIST”:
where "DOLIST" is i=7, 13, 5, 1.
In your blog post that you reference in your comment, the “DOLIST syntax” goes beyond just listing the values; it also covers start, to, and by (also, commas are optional):
The point I am making in this post for the DATA step is that a comma-separated list of values (expressions) is not a separate (additional) syntax, since it falls under, and is fully covered by, the general definition of DO Statement: Iterative.
Also, in contrast with your “DOLIST” example pertaining to the PROC SGPLOT, commas are not optional in the explicit DO statement, they are required in the DATA step.