Loops in SAS

Looping is essential to statistical programming. Whether you need to iterate over parameters in an algorithm or indices in an array, a loop is often one of the first programming constructs that a beginning programmer learns.

Today is the first anniversary of this blog, which is named The DO Loop, so it seems appropriate to blog about DO loops in SAS. I'll describe looping in the SAS DATA step and compare it with looping in the SAS/IML language.

Loops in SAS

Loops are fundamental to programming because they enable you to repeat a computation for various values of parameters. Different languages use different keywords to define the iteration statement. The most well-known statement is the "for loop," which is used by C/C++, MATLAB, R, and other languages. Older languages, such as FORTRAN and SAS, call the iteration statement a "do loop," but it is exactly the same concept.

DO loops in the DATA step

The basic iterative DO statement in SAS has the syntax DO value = start TO stop. An END statement marks the end of the loop, as shown in the following example:

data A;
do i = 1 to 5;
   y = i**2; /* values are 1, 4, 9, 16, 25 */
   output;
end;
run;

By default, each iteration of a DO statement increments the value of the counter by 1, but you can use the BY option to increment the counter by other amounts, including non-integer amounts. For example, each iteration of the following DATA step increments the value i by 0.5:

data A;
do i = 1 to 5 by 0.5;
   y = i**2; /* values are 1, 2.25, 4, 6.25, ..., 25 */
   output;
end;
run;

You can also iterate "backwards" by using a negative value for the BY option: do i=5 to 1 by -0.5.

DO loops in SAS/IML Software

A basic iterative DO statement in the SAS/IML language has exactly the same syntax as in the DATA step, as shown in the following PROC IML statements:

proc iml;
x = 1:4; /* vector of values {1 2 3 4} */
do i = 1 to 5;
   z = sum(x##i); /* 10, 30, 100, 354, 1300 */
end;

In the body of the loop, z is the sum of powers of the elements of x. During the ith iteration, the elements of x are raised to the ith power. As mentioned in the previous section, you can also use the BY option to increment the counter by non-unit values and by negative values.

Variations on the DO loop: DO WHILE and DO UNTIL

On occasion, you might want to stop iterating if a certain condition occurs. There are two ways to do this: you can use the WHILE clause to iterate as long as a certain condition holds, or you can use the UNTIL clause to iterate until a certain condition holds.

You can use the DO statement with a WHILE clause to iterate while a condition is true. The condition is checked before each iteration, which implies that you should intialize the stopping condition prior to the loop. The following statements extend the DATA step example and iterate as long as the value of y is less than 20:

data A;
y = 0;
do i = 1 to 5 by 0.5 while(y < 20);
   y = i**2; /* values are 1, 2.25, 4, 6.25, ..., 16 */
   output;
end;
run;

You can use the iterative DO statement with an UNTIL clause to iterate until a condition becomes true. The UNTIL condition is evaluated at the end of the loop, so you do not have to initialize the condition prior to the loop. The following statements extend the PROC IML example. The iteration stops after the value of z exceeds 200.

proc iml;
x = 1:4;
do i = 1 to 5 until(z > 200);
   z = sum(x##i); /* 10, 30, 100, 354 */
end;

In these examples, the iteration stopped because the WHILE or UNTIL condition was satisfied. If the condition is not satisfied when i=5 (the last value for the counter), the loop stops anyway. Consequently, the examples have two stopping conditions: a maximum number of iterations and the WHILE or UNTIL criterion. SAS also supports a DO WHILE and DO UNTIL syntax that does not involve using a counter variable.

It is worth noting that a DO loop with an UNTIL clause always executes at least one time because the condition is evaluated at the end of the loop. To prevent this behavior, use a DO loop with a WHILE clause.

Looping over a set of items (foreach)

Some languages support a "foreach loop" that iterates over objects in a collection. SAS doesn't support that syntax directly, but there is a variant of the DO loop in which you can iterate over values in a specified list. The syntax in the DATA step is to specify a list of values (numeric or character) after the equal sign. The following example iterates over a few terms in the Fibonacci sequence:

data A;
do v = 1, 1, 2, 3, 5, 8, 13, 21;
   y = v/lag(v);
   output;
end;
run;

The ratio of adjacent values in a Fibonacci sequence converges to the golden ratio, which is 1.61803399....

The SAS/IML language does not support this syntax, but does enable you to iterate over values that are contained in a vector (or matrix). The following statements create a vector, v, that contains the Fibonacci numbers. An ordinary DO loop is used to iterate over the elements of the vector. At the end of the loop, the vector z contains the same values as the variable Y that was computed in the DATA step.

proc iml;
v = {1, 1, 2, 3, 5, 8, 13, 21};
z = j(nrow(v),1,.); /* initialize ratio to missing values */
do i = 2 to nrow(v);
   z[i] = v[i]/v[i-1];
end;

Avoid unnecessary loops in the SAS/IML Language

I have some advice on using DO loops in SAS/IML language: look carefully to determine if you really need a loop. The SAS/IML language is a matrix/vector language, so statements that operate on a few long vectors run much faster than equivalent statements that involve many scalar quantities. Experienced SAS/IML programmers rarely operate on each element of a vector. Rather, they manipulate the vector as a single quantity. For example, the previous SAS/IML loop can be eliminated:

proc iml;
v = {1, 1, 2, 3, 5, 8, 13, 21};
idx = 2:nrow(v);
z = v[idx]/v[idx-1];

This computation, which computes the nonmissing ratios, is more efficient than looping over elements. For other tips and techniques that make your SAS/IML programs more efficient, see my book Statistical Programming with SAS/IML Software.

tags: SAS Programming, Statistical Programming

33 Comments

  1. Gustavo Cano
    Posted September 9, 2011 at 1:37 pm | Permalink

    it have an algorithm likely the last one but for DO WHILE?.

    i.e.
    proc iml);
    v = {1, 1, 2, 3, 5, 8, 13, 21};
    idx = 2:nrow(v);
    z = v[idx]/v[idx-1];

    but for DO WHILE. For example i think using the LOC function.... Note: In the algorithm the ")" is wrong...

    Thaks.

  2. oscar
    Posted May 14, 2012 at 5:36 pm | Permalink

    Hello;
    I have a small problem which I think can be solved by loops in SAS, but I don't know how, I appreciate if you help me with that:

    I have a data set in which if a value is missing then it shall be assigned the nearest non missing value in the previous observations, for example, 6 rows of my data looks like this:
    1
    .
    .
    2
    .
    4
    I want to assign the missing values so that it would read like this
    1
    1
    1
    2
    2
    4
    Can SAS do that for me?

    • Posted May 14, 2012 at 7:41 pm | Permalink

      Yes, of course. Direct questions like this to the SAS Community Forum on DATA step programming..

    • Luke Du
      Posted December 12, 2012 at 12:29 pm | Permalink

      @Oscar: Here is the solution to your question.
      *data set b4 with variable a as you have;
      data b4;
      a=1;output;
      a=.;output;
      a=.;output;
      a=2;output;
      a=.;output;
      a=4;output;
      run;
      data after;
      retain b;
      set b4;
      if not missing(a) then b=a;
      run;
      *data set after has variable b as you required;

  3. sirisha pannala
    Posted October 1, 2012 at 5:09 pm | Permalink

    In sas Using an array how to find the sum of higest two HWs from HW1--HW4

    • Posted October 1, 2012 at 5:44 pm | Permalink

      For help with SAS DATA step programming, post your questions to the SAS DATA step community.

      • sushma
        Posted April 5, 2013 at 10:39 am | Permalink

        sir, i have a question like cost of vehicle is 5000 and income every day is 100..how many days he will purchase the vehicle ..can we do this by using do until?

        • Posted April 5, 2013 at 10:47 am | Permalink

          Yes, you could do that. It might look like this:
          do day=1 to 1000 until(pay>=cost);
          ...

  4. Romulo Freitas
    Posted May 13, 2013 at 4:59 pm | Permalink

    How do I create a column sequential as counter lines in SAS programming?

    • Posted May 14, 2013 at 5:45 am | Permalink

      I don't understand what you are asking, but you can post questions and have discussions about SAS programming at the SAS Support Community.

    • cargoltreubanya
      Posted September 23, 2013 at 11:42 am | Permalink

      Try with:

      DATA b;
      SET a;
      SEQ+1;
      RUN;

  5. sureshkumar
    Posted May 29, 2013 at 8:05 am | Permalink

    i have a question:
    data _null_;
    whereishi=0;
    do until(whereishi=0);
    if whereishi=0 then put "The End";
    end;
    run;

    even though the do loop is satisfying intially with zero still it is looping,i could see in log The end .How come?

    • Posted May 29, 2013 at 8:18 am | Permalink

      As the article says, "the UNTIL condition is evaluated at the end of the loop." This implies that an UNTIL loop always executes at least once. In contrast, a WHILE loop will not execute if the condition is false.

  6. Apoorv Mishra
    Posted June 5, 2013 at 12:01 pm | Permalink

    Why this code is not working, Basically I am splitting the data, When I was trying to print one of them, its showing the data set is not created, could anyone help on the same.

    %MACRO DO_SQL;
    %DO i = 1 %TO 10;
    PROC SQL;
    CREATE TABLE itemdef_&i AS
    SELECT var2
    FROM three
    WHERE unique_key = "itemdef.&i";
    QUIT;
    %END;
    %MEND DO_SQL;

    PROC PRINT DATA = itemdef_1;
    RUN;

  7. Brian
    Posted June 13, 2013 at 2:48 pm | Permalink

    I have a column for each employee start date (startdt1 thru startdt12) in a new position, and a column for their last promotion date. How can I create a loop that will tell me when the promotion date is between startdt(i) and startdt(i+1) and output to a new variable with "Start_DT(i)" ? Thanks.

  8. Mark
    Posted October 10, 2013 at 1:09 pm | Permalink

    Perhaps this is not something the DO LOOP command does, but I am looking for an elegant way to repeat an analysis, though changing the outcome in each analysis. I'm using proc glimmix and need to model ~50 outcomes and would like to avoid the tedium of waiting for the run to complete, changing the outcome, running again... If the loop command doesn't do this, can anyone refer me to the command (or macro) that does (if there is one)? Thanks, Mark

    • Posted October 10, 2013 at 2:04 pm | Permalink

      1) It sounds like you want to use the BY statement, which loops over all values of a discrete variable such as US states or the numbers 1-50.
      2) If you are doing a simulation, you might want to read about the advantage of the BY statement over macro loops.
      3) If I haven't answered your question, post your question to the SAS Support Community

      • Mark
        Posted October 11, 2013 at 7:46 am | Permalink

        Hi Rick - Thank you for your prompt reply. I don't think the BY statement will do what I'm looking for: I need to run models across different outcome variables, not different values of one outcome variable. Thanks for the reference to the SAS Support Community; I'll try that next. Best, M

        • Posted October 11, 2013 at 8:06 am | Permalink

          OK, good luck. If the explanatory variables and form of the model are the same for each, you can use a macro %DO loop, but I still think the most efficient option is to concatenate the data set and use a BY statement. Instead of N obsevations that look like
          Y1 Y2 ... Y50 X1 X2 .. Xk
          you will stack the data and have 50*N observations that look like this:
          1 Y1 X1 X2 .. Xk
          ...
          2 Y2 X1 X2 .. Xk
          ...
          50 Y50 X1 X2 .. Xk

  9. Chung
    Posted November 7, 2013 at 9:15 pm | Permalink

    Rick I am wondering (1) can I use the do -loop when i have got n variables, lets say X1,X2,...,Xn, and i would like to introduce a new variable Z s.t. Z=X1+X2+X3+...+Xn
    (2) If there is any hidden(or obvious that I have not notice) problem when I delete the continue word and the whole leave statement in the following programme written:
    "data q1;
    i = 100;
    do while (1);
    i = i - 2;
    if mod(i,6)=0 then do;
    output;
    continue;
    end;
    if mod(i,4)=0 then do;
    output;
    continue;
    end;
    if mod(i,7)=0 and mod(i,5)=0 then leave;
    end;
    run;"

    the thoughts are driving me crazy!

  10. nikhil
    Posted February 3, 2014 at 9:25 am | Permalink

    Hi Rick ,

    my query is i have table ABC in which one column name is table_name which contains n different tables names. I want to create a separate data sets for the corresponding table names. can you please help me with this.

    • Posted February 3, 2014 at 9:33 am | Permalink

      See my previous response. Go to the SAS Support Communities to get answers to questions like this.

  11. Amy
    Posted May 6, 2014 at 9:56 am | Permalink

    Hi Rick,

    I have a dataset with 3 columns, Customer ID, Product Key and Sales. Could you please advise how I can convert the table into something like the following:

    Customer ID, Key1 _Sales, Key2_Sales, Key3_Sales, and so on.

    Thank you for your help in advance.
    Amy

  12. momoe
    Posted June 20, 2014 at 1:24 am | Permalink

    I am a new user in SAS. When I am using DO-LOOP in SAS, I get a hesitation with the use in SET dataset. Simply speaking, I would like to know the following difference in the result:

    **** input data source;
    data a;
    input name $1-10 test 12-15 score 17-22;

    datalines;
    Kitty 1 70
    David 1 80
    May 1 95
    David 3 .
    May 3 95
    David 2 85
    Kitty 3 .
    Kitty 2 90
    May 2 95
    ;
    proc sort; by name test;
    run;

    *e.g. 1 : set dataset outside do-loop, and 9 observations are included in data temp;
    data temp;
    set a;
    do x=1 to 10;
    y=x;
    end;
    run;

    *e.g. 2 : set dataset inside do-loop, and 1 observation (_N_=10) is included only in data temp;
    data temp;
    do x=1 to 10;
    set a;
    y=x;
    end;
    run;

    • momoe
      Posted June 20, 2014 at 1:25 am | Permalink

      May I thank those who can solve my weakness in SAS.

    • Posted June 20, 2014 at 6:32 am | Permalink

      That's a good question. Understanding these cases requires a discussion of the implicit loop in the DATA step, and how SET, DO loops, and the OUTPUT statement interact. I suggest you post this and future questions to the SAS Support Community.

  13. Dusty Simmons
    Posted July 9, 2014 at 10:18 am | Permalink

    Rick,
    I want to read a file sequentially, and then for each row on that file, using data from that row, perform an SQL proc to retrieve data associated with that data, to write out all of the data.
    IE: how do I loop through the SQL proc for each row on the input sequential file?

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>