Loops in SAS

90
Looping is essential to statistical programming. Whether you need to iterate over parameters in an algorithm or indices in an array, a loop is often one of the first programming constructs that a beginning programmer learns. Today is the first anniversary of this blog, which is named The DO Loop, so it seems appropriate to blog about DO loops in SAS. I'll describe looping in the SAS DATA step and compare it with looping in the SAS/IML language.

Loops in SAS

Loops are fundamental to programming because they enable you to repeat a computation for various values of parameters. Different languages use different keywords to define the iteration statement. The most well-known statement is the "for loop," which is used by C/C++, MATLAB, R, and other languages. Older languages, such as FORTRAN and SAS, call the iteration statement a "do loop," but it is exactly the same concept.

DO loops in the DATA step

The basic iterative DO statement in SAS has the syntax DO value = start TO stop. An END statement marks the end of the loop, as shown in the following example:
data A;
do i = 1 to 5;
   y = i**2; /* values are 1, 4, 9, 16, 25 */
   output;
end;
run;
By default, each iteration of a DO statement increments the value of the counter by 1, but you can use the BY option to increment the counter by other amounts, including non-integer amounts. For example, each iteration of the following DATA step increments the value i by 0.5:
data A;
do i = 1 to 5 by 0.5;
   y = i**2; /* values are 1, 2.25, 4, ..., 16, 20.25, 25 */
   output;
end;
run;
You can also iterate "backwards" by using a negative value for the BY option: do i=5 to 1 by -0.5.

DO loops in SAS/IML Software

A basic iterative DO statement in the SAS/IML language has exactly the same syntax as in the DATA step, as shown in the following PROC IML statements:
proc iml;
x = 1:4; /* vector of values {1 2 3 4} */
do i = 1 to 5;
   z = sum(x##i); /* 10, 30, 100, 354, 1300 */
end;
In the body of the loop, z is the sum of powers of the elements of x. During the ith iteration, the elements of x are raised to the ith power. As mentioned in the previous section, you can also use the BY option to increment the counter by non-unit values and by negative values.

Variations on the DO loop: DO WHILE and DO UNTIL

On occasion, you might want to stop iterating if a certain condition occurs. There are two ways to do this: you can use the WHILE clause to iterate as long as a certain condition holds, or you can use the UNTIL clause to iterate until a certain condition holds. You can use the DO statement with a WHILE clause to iterate while a condition is true. The condition is checked before each iteration, which implies that you should intialize the stopping condition prior to the loop. The following statements extend the DATA step example and iterate as long as the value of y is less than 20. When i=4, the WHILE condition is not satisfied, so the loop iterates again.
data A;
y = 0;
do i = 1 to 5 by 0.5 while(y < 20);
   y = i**2; /* values are 1, 2.25, 4, ..., 16, 20.5 */
   output;
end;
run;
You can use the iterative DO statement with an UNTIL clause to iterate until a condition becomes true. The UNTIL condition is evaluated at the end of the loop, so you do not have to initialize the condition prior to the loop. The following statements extend the PROC IML example. The iteration stops after the value of z exceeds 200.
proc iml;
x = 1:4;
do i = 1 to 5 until(z > 200);
   z = sum(x##i); /* 10, 30, 100, 354 */
end;
In these examples, the iteration stopped because the WHILE or UNTIL condition was satisfied. If the condition is not satisfied when i=5 (the last value for the counter), the loop stops anyway. Consequently, the examples have two stopping conditions: a maximum number of iterations and the WHILE or UNTIL criterion. SAS also supports a DO WHILE and DO UNTIL syntax that does not involve using a counter variable. It is worth noting that a DO loop with an UNTIL clause always executes at least one time because the condition is evaluated at the end of the loop. To prevent this behavior, use a DO loop with a WHILE clause.

Looping over a set of items (foreach)

Some languages support a "foreach loop" that iterates over objects in a collection. SAS doesn't support that syntax directly, but there is a variant of the DO loop in which you can iterate over values in a specified list. The syntax in the DATA step is to specify a list of values (numeric or character) after the equal sign. The following example iterates over a few terms in the Fibonacci sequence:
data A;
do v = 1, 1, 2, 3, 5, 8, 13, 21;
   y = v/lag(v);
   output;
end;
run;
The ratio of adjacent values in a Fibonacci sequence converges to the golden ratio, which is 1.61803399.... The SAS/IML language does not support this syntax, but does enable you to iterate over values that are contained in a vector (or matrix). The following statements create a vector, v, that contains the Fibonacci numbers. An ordinary DO loop is used to iterate over the elements of the vector. At the end of the loop, the vector z contains the same values as the variable Y that was computed in the DATA step.
proc iml;
v = {1, 1, 2, 3, 5, 8, 13, 21};
z = j(nrow(v),1,.); /* initialize ratio to missing values */
do i = 2 to nrow(v);
   z[i] = v[i]/v[i-1];
end;

Avoid unnecessary loops in the SAS/IML Language

I have some advice on using DO loops in SAS/IML language: look carefully to determine if you really need a loop. The SAS/IML language is a matrix/vector language, so statements that operate on a few long vectors run much faster than equivalent statements that involve many scalar quantities. Experienced SAS/IML programmers rarely operate on each element of a vector. Rather, they manipulate the vector as a single quantity. For example, the previous SAS/IML loop can be eliminated:
proc iml;
v = {1, 1, 2, 3, 5, 8, 13, 21};
idx = 2:nrow(v);
z = v[idx]/v[idx-1];
This computation, which computes the nonmissing ratios, is more efficient than looping over elements. For other tips and techniques that make your SAS/IML programs more efficient, see my book Statistical Programming with SAS/IML Software.
Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

90 Comments

  1. it have an algorithm likely the last one but for DO WHILE?.

    i.e.
    proc iml);
    v = {1, 1, 2, 3, 5, 8, 13, 21};
    idx = 2:nrow(v);
    z = v[idx]/v[idx-1];

    but for DO WHILE. For example i think using the LOC function.... Note: In the algorithm the ")" is wrong...

    Thaks.

  2. Hello;
    I have a small problem which I think can be solved by loops in SAS, but I don't know how, I appreciate if you help me with that:

    I have a data set in which if a value is missing then it shall be assigned the nearest non missing value in the previous observations, for example, 6 rows of my data looks like this:
    1
    .
    .
    2
    .
    4
    I want to assign the missing values so that it would read like this
    1
    1
    1
    2
    2
    4
    Can SAS do that for me?

  3. sureshkumar on

    i have a question:
    data _null_;
    whereishi=0;
    do until(whereishi=0);
    if whereishi=0 then put "The End";
    end;
    run;

    even though the do loop is satisfying intially with zero still it is looping,i could see in log The end .How come?

    • Rick Wicklin

      As the article says, "the UNTIL condition is evaluated at the end of the loop." This implies that an UNTIL loop always executes at least once. In contrast, a WHILE loop will not execute if the condition is false.

  4. Apoorv Mishra on

    Why this code is not working, Basically I am splitting the data, When I was trying to print one of them, its showing the data set is not created, could anyone help on the same.

    %MACRO DO_SQL;
    %DO i = 1 %TO 10;
    PROC SQL;
    CREATE TABLE itemdef_&i AS
    SELECT var2
    FROM three
    WHERE unique_key = "itemdef.&i";
    QUIT;
    %END;
    %MEND DO_SQL;

    PROC PRINT DATA = itemdef_1;
    RUN;

  5. I have a column for each employee start date (startdt1 thru startdt12) in a new position, and a column for their last promotion date. How can I create a loop that will tell me when the promotion date is between startdt(i) and startdt(i+1) and output to a new variable with "Start_DT(i)" ? Thanks.

  6. Perhaps this is not something the DO LOOP command does, but I am looking for an elegant way to repeat an analysis, though changing the outcome in each analysis. I'm using proc glimmix and need to model ~50 outcomes and would like to avoid the tedium of waiting for the run to complete, changing the outcome, running again... If the loop command doesn't do this, can anyone refer me to the command (or macro) that does (if there is one)? Thanks, Mark

      • Hi Rick - Thank you for your prompt reply. I don't think the BY statement will do what I'm looking for: I need to run models across different outcome variables, not different values of one outcome variable. Thanks for the reference to the SAS Support Community; I'll try that next. Best, M

        • Rick Wicklin

          OK, good luck. If the explanatory variables and form of the model are the same for each, you can use a macro %DO loop, but I still think the most efficient option is to concatenate the data set and use a BY statement. Instead of N obsevations that look like
          Y1 Y2 ... Y50 X1 X2 .. Xk
          you will stack the data and have 50*N observations that look like this:
          1 Y1 X1 X2 .. Xk
          ...
          2 Y2 X1 X2 .. Xk
          ...
          50 Y50 X1 X2 .. Xk

  7. Rick I am wondering (1) can I use the do -loop when i have got n variables, lets say X1,X2,...,Xn, and i would like to introduce a new variable Z s.t. Z=X1+X2+X3+...+Xn
    (2) If there is any hidden(or obvious that I have not notice) problem when I delete the continue word and the whole leave statement in the following programme written:
    "data q1;
    i = 100;
    do while (1);
    i = i - 2;
    if mod(i,6)=0 then do;
    output;
    continue;
    end;
    if mod(i,4)=0 then do;
    output;
    continue;
    end;
    if mod(i,7)=0 and mod(i,5)=0 then leave;
    end;
    run;"

    the thoughts are driving me crazy!

  8. Hi Rick ,

    my query is i have table ABC in which one column name is table_name which contains n different tables names. I want to create a separate data sets for the corresponding table names. can you please help me with this.

  9. Hi Rick,

    I have a dataset with 3 columns, Customer ID, Product Key and Sales. Could you please advise how I can convert the table into something like the following:

    Customer ID, Key1 _Sales, Key2_Sales, Key3_Sales, and so on.

    Thank you for your help in advance.
    Amy

  10. I am a new user in SAS. When I am using DO-LOOP in SAS, I get a hesitation with the use in SET dataset. Simply speaking, I would like to know the following difference in the result:

    **** input data source;
    data a;
    input name $1-10 test 12-15 score 17-22;

    datalines;
    Kitty 1 70
    David 1 80
    May 1 95
    David 3 .
    May 3 95
    David 2 85
    Kitty 3 .
    Kitty 2 90
    May 2 95
    ;
    proc sort; by name test;
    run;

    *e.g. 1 : set dataset outside do-loop, and 9 observations are included in data temp;
    data temp;
    set a;
    do x=1 to 10;
    y=x;
    end;
    run;

    *e.g. 2 : set dataset inside do-loop, and 1 observation (_N_=10) is included only in data temp;
    data temp;
    do x=1 to 10;
    set a;
    y=x;
    end;
    run;

  11. Dusty Simmons on

    Rick,
    I want to read a file sequentially, and then for each row on that file, using data from that row, perform an SQL proc to retrieve data associated with that data, to write out all of the data.
    IE: how do I loop through the SQL proc for each row on the input sequential file?

  12. Hi all,

    I have data for stocks (stock prices, returns, abnormal returns and cumulative abnormal returns) for the period 31 Jan 2007 and 31 Dec 2013. I have further given each date a stock Month Number (StockMonthNumber).

    So, I now want sas to given me the mean, median, t-stat and p-value for my CumulativeAbnormalReturns only for up to month 6. How do I do this?

  13. Do I say:

    Proc univariate date=work.thokozani;
    var CummulativeAbnormalReturns;
    Where StockMonthNumber<=6;
    run;

  14. I want to simulate a bivariate dataset (say weight vs blood pressure) with normal distribution, after which I would like to create a missing completely at random scenario with certain observations in the blood pressure variables missing. I would then like to perform mean imputation (or multiple imputation) for the missing observations to compare the mean and variance between the original (before missingness was induced) and the imputed dataset. I intend to do this multiple of times to get the average mean and variance for the imputed data, for the comparison with the original. I feel I could do this using SAS but can't figure out how, because I am a very basic user.
    Could any one help figure out how I could do that please?

  15. Arijit Chaki on

    If i want to run the do loop such that instead of i=1 to 12 i want to make it i=jan to dec. Is it possible to do this. If yes then how? Cz i have tried a lot but not happening. Plz do rply sir.

  16. Hi,

    I am new to SAS and need help with properly using the DO loop. I am doing a simulation in SAS, and I'm using R to simulate the DATA and then bring it back to SAS to use specific functions.

    Here is my code. My problem is I am finding it difficult using the DO loop to reiterate over the code as many times I would like while saving specific parameters every time it goes through a single iteration. Any help with this would help. Thanks!

    data Meta_Results5;
    do i= 1 to 3;

    proc iml; /* using the SAS/IML environment to run the R code */
    submit / R;
    ...

  17. My turn to ask a question:
    I need to write the line below for each year from 1997 to 2010:
    filename r1997 pipe ' R:\7za.exe e "R:\Data\registry1997.B.dat.gz" "registry1997.dat" -y -so '; data r1997; infile r1997; &i; run;

    Is there a way to write it only once in a loop? I really appreciate any help.

  18. Hi!
    I often follow your advises on sas community. Here I see you are posting about do loops. Today I was playing with some data and found out a problem. I have 5 data sets in txt format named roh_1.txt, roh_2.txt, roh_4.txt, roh_8.txt and roh_16.txt
    I would like to make macro which will allow me to import those into SAS. I am familiar with infile option, but having issues with making macro. Of course I could rename files to be 1 to 5 but... Well I don't want to. Can you please give some hints.
    Thanks

  19. Pingback: Do Loops, Probabilities and Simulations | Analytics

  20. i have a question
    there are some missing in data , i calculate -((xi-xj)^2)^0.5 for x2&x3 (contain complete data)
    x1 x2 x3
    1 3 2
    2 1
    3 2 1
    4 1

    how i can write this on sas?

    • Rick Wicklin

      It's not clear if you are asking about how to represent missing values or how to write the computation. The dot (.) is the missing value in SAS, so you can create the data set like this:

      data A;
      input x1 x2 x3;
      datalines;
       1 3 2
       . 2 1
       3 2 1
       . 4 1
       ;

      If you have a question about the computation, post it to a SAS Support Community

  21. I have a sas dataset as below.
    fix var1 var2 var3
    1 a b c
    2 p q r

    But I want to create a sas dataset as below
    1 a
    1 b
    1 c
    2 p
    2 q
    2 r
    Anybody can help me that to write a sas code

  22. Do you have a template for a do loop to do the following: I need to filter a large file by ID and then export the files into a folder, the files named with ID.

    • Rick Wicklin

      The basic template does not require a DO loop:

      data out1 out2 out3;
       set bigfile;
       if ID=1 then output out1;
       else if ID=2 then output out2;
       else if ID=3 then output out3;
       run;

      However, there are many reasons why you should NOT split data sets. For some of the reasons (and other techniques that split data sets), see "Split data into subsets."

  23. Confusing.
    data A;
    y = 0;
    do i = 1 to 5 by 0.5 while(y < 20);
    y = i**2; /* values are 1, 2.25, 4, 6.25, ..., 16 */
    output;
    end;
    run;

    but the output reads:
    y i
    1 1
    2,25 1,5
    4 2
    6,25 2,5
    9 3
    12,25 3,5
    16 4
    20,25 4,5

    • Rick Wicklin

      Since the numbers are correct, I assume that you are confused that the Y variable is printed first and the i variable is second?
      The DATA step creates variables in the order that it sees them. The first statement sets y=0, so that that is why Y appears first in the data.
      You can use PROC PRINT to print the data in any order:
      proc print data=A; var i y; run;
      You can also delete the "y=0;" statement and the i variable will appear first.

      • Hi,
        No, what might confuse is:
        /* values are 1, 2.25, 4, 6.25, ..., 16 */ in the comment.
        The output data set contains the value 20.25 as well.

        I see I used: , as comma instead of: . in my first comment.
        Maybe that confused you :).

  24. James Walker on

    I use this type of code frequently for making changes in place on a large dataset in reverse, since the obs I'm normally updating are at the bottom and then I stop once the record has been found, but I can't get the code to work for just removing a record any thoughts?

    data CRRG.CRRG;
    do k = nobs to 1 by -1;
    modify CRRG.CRRG nobs= nobs point=k;
    if arrangement_key ='yyyyy' and as_of_date = '30Jun2016'd then do;
    CUST_NUM = 'xxxxxx';
    replace;
    stop;
    end;
    end;
    run;

    I tried

    data CRRG.CRRG;
    do k = nobs to 1 by -1;
    modify CRRG.CRRG nobs= nobs point=k;
    if arrangement_key ='yyyy' and as_of_date = '30Jun2016'd then remove;
    replace;
    stop;
    end;
    run;

  25. Question:

    When using two or more Do Loops should they always be NESTED or can they OVERLAP?

    DO /* Start of Do Loop One */
    SAS Statements for Loop One ;

    DO /* Start of Do Loop Two */
    SAS Statements for Loop Two ;
    END ; /* End of Do Loop Two */

    More SAS Statements for Loop One ;
    END ; /* End of Do Loop One */

    /*******************************************************************************/
    DO /* Start of Do Loop One */
    SAS Statements for Loop One ;

    DO /* Start of Do Loop Two */
    SAS Statements for Loop Two ;

    SAS Statements for Loop One ;

    END ; /* End of Do Loop One */

    More SAS Statements for Loop Two ;

    END ; /* End of Do Loop Two */

    If it is legal to overlap Do Loop structures, when would you need to do so?

    Regards

    • Rick Wicklin

      The END statement always matches the closest DO staetment, so loops are always nested and never "overlap." This same fact is true in all computer languages that I am familiar with.

  26. I have to use the value of a formula calculated in one row and then use the answer to calculate the value for the next row.
    Foe eg: when the first instance is encountered it calculates the formalue and after that keeps re-using the pred value in the next few time - How can do it?
    data starting_point_3;
    set starting_point_2;
    by key;
    firstKey=first.key;
    if firstKey then pred=(intercept+coeff_lag*pred_last_week);
    run;

  27. Please i have ten tables created from
    %n=1 %to &nobs
    data test&n. ..........

    So now i want to have all tables in one table,
    I use proc append. it works but i some variables aren't found in the base.
    So can i think i can use proc iml and do loops to create matrix and concatenate horizontally.
    I don't know how proceed.
    Can you help me please?

      • %macro te();
        %do n=1 %to 10;
        data test&n. ( keep = id &idn. y1 yd yd1 - yd&p.);
        set mytable (where=(id=&n.)) nobs=nobs;
        N=nobs;
        array un {*} yd yd1 - yd&p.;
        do j=2 to dim (un);
        un {j}= lag(un{j-1});
        end;drop j;
        run;
        %end;
        %macro te();

        I get 10 datasets but i don't know how proceed to concatenate without lose variables because with proc append some variables aren't added.

        Thanks,

  28. i have a problem to solve this
    A= {1 2 3 4 5}
    B= {1 2 1}

    step 1: A= {1 2 3 4 5}
    B= {1 2 1}
    step 2: A= {1 2 3 4 5}
    B= {1 2 1}
    step 3: A= {1 2 3 4 5}
    B= {1 2 1}

    please tell me the method & which loop i used to solve this.
    if you get that method kindly reply soon & after that tell me if the length of A & B is increasing means we dont know the length then which method should i use?

  29. Hello, Rick,

    I tried transforming the followings into a more efficient code using DO LOOP, but i failed. Can you please give me some guide for this?

    data Input1;
    set Input1;
    if no= 2 then no= 2 + 0.1;
    run;
    data Input2;
    set Input2;
    if no= 2 then no= 2 + 0.2;
    run;
    data Input3;
    set Input3;
    if no= 2 then no= 2 + 0.3;
    run;

    I switched the above to the below.

    DO i= 1 to 3;
    data Input&i;
    set Input&i;
    if no= 2 then no= 2 + 0.1 * &i;
    run;

    End;

  30. Hello I would like to do a do loop over a list selected from a proc sql small data set. (in this case a 103 obs)

    proc sql;
    create table TERM_TBL_DIST as
    select DISTINCT strm||'*'||acad_career as key, STRM, ACAD_CAREER
    from &snapshotlib..PS_TERM_TBL
    WHERE ACAD_YEAR BETWEEN "&year_four." and "&year.";
    quit;
    proc print data = TERM_TBL_DIST noobs n;run;

    where my 'key' is the control and the data looks like:

    ACAD_
    key STRM CAREER

    2153*UGRD 2153 UGRD
    2153*VETM 2153 VETM
    2155*BUSN 2155 BUSN
    2155*GRAD 2155 GRAD
    2155*IALC 2155 IALC
    2155*MEDI 2155 MEDI
    2155*PHAR 2155 PHAR
    2155*UGRD 2155 UGRD
    2155*VETM 2155 VETM
    2157*BUSN 2157 BUSN
    2157*GRAD 2157 GRAD
    2157*IALC 2157 IALC
    2157*MEDI 2157 MEDI
    2157*PHAR 2157 PHAR
    2157*UGRD 2157 UGRD
    2157*VETM 2157 VETM
    2160*UGRD 2160 UGRD
    2163*BUSN 2163 BUSN
    2163*GRAD 2163 GRAD
    2163*IALC 2163 IALC
    2163*MEDI 2163 MEDI
    2163*PHAR 2163 PHAR
    2163*UGRD 2163 UGRD
    2163*VETM 2163 VETM
    2165*BUSN 2165 BUSN
    2165*GRAD 2165 GRAD

    I notice a part in the above information that made it looks like any random txt can make up the list of control var, but can I auto populate this list or do I have to prerun my list and stuff the data into a written out list in my code? If this is possible can someone show me how to get my data in to this list? TIA. -KJ

  31. Pingback: LEAVE and CONTINUE: Two ways to control the flow in a SAS DO loop - The DO Loop

  32. Hi,

    Suppose I have

    PROC MEANS data=s100 NOPRINT;
    VAR s100;
    CLASS school;
    OUTPUT OUT=s100(drop = _type_ _freq_) SUM=S100;
    RUN;
    data S100;
    set S100;
    population=100;
    if school=' ' THEN school='999999';
    RUN;

    And I want to loop from S100 to S200. How?

  33. Manjil Khambe on

    Hi

    I have employee code, date, totalhours(time format), weekno, now i want is to calculate sum of hours per week per person
    i.e in sql sum(hours) group by(week_no) ,
    please help.

    regards,

    • Rick Wicklin

      The best way to learn is by doing. I suggest you run the program here and look at the resulting data set. Then delete the OUTPUT statement, run it again, and look at the result. If you have further questions, ask at the SAS Support Communities.

  34. Hello,

    How could I iterate through all obs per obs in a SAS dataset? Example, I have a dataset with five obs. I'd like to take the first one and compare that one against all next observations. Then take the second obs, and compare it against all others, then the third compare against all, etc..

    I guess from a C stand point it'd be somehting like this:

    for (i=0;i<5;i++)
    {
    for (j=0; j <5; j++)
    {
    if (pin[i] == pin[j])
    {
    aux += 1;
    }
    }
    }

    • Rick Wicklin

      You can ask questions like this at the SAS Support Communities. Be sure to provide sample data in the form of a DATA step. In general, the SAS DATA step processes one observation at a time, so solutions to problems like this often involve transposing the data and using arrays in the DATA step, or using a procedure such as PROC SQL or PROC IML.

Leave A Reply

Back to Top