Friends don't let friends concatenate results inside a loop

1

Friends have to look out for each other.

Sometimes this can be slightly embarrassing. At lunch you might need to tell a friend that he has some tomato sauce on his chin. Or that she has a little spinach stuck between her teeth. Or you might need to tell your friend discreetly that he should examine his zipper before he presents at the staff meeting.

Yes, it can be awkward to speak up. However, it would be more embarrassing to my friends if I don't tell them these things. What kind of a friend would I be if I allow them to walk around all day with food on their face or an unzipped fly?

I consider many of my readers to be friends, so let me whisper a little secret in your ear: "When you program in a matrix-vector language such as SAS/IML, do not concatenate results inside a loop."

It is the programming equivalent of having spinach in your teeth. People will see it and think "how could he have missed such an obvious thing?" It makes you look bad. Some will snicker.

Concatenation inside a loop is a "rookie mistake" that experienced statistical programmers avoid. The better way to accumulate a vector or matrix of results is to allocate an array for the results prior to the loop and fill the array inside the loop.

The following example illustrates why it is more efficient to use the "allocate and fill" method instead of dynamically growing an array by using concatenation. The programmer wants to call the ComputeIt function 10,000 times inside a loop. The function returns a row vector with 100 elements. The first group of statements times how long it takes to dynamically grow the matrix of results. The second group of statements times how long it takes to allocate the matrix outside the loop and fill each row inside the loop:

proc iml;
NumIters = 10000;
start ComputeIt(n);
   return( n:(n+99) );
finish;
 
/* time how long it takes to dynamically grow an array */
t0 = time();
free result;
do i = 1 to NumIters;
   result = result // ComputeIt(i);     /* inefficient */
end;
tConcat = time() - t0;
 
/* same computation, but allocate array and fill each row */
t0 = time();
result = j(NumIters, 100);
do i = 1 to nrow(result);
   result[i,] = ComputeIt(i);           /* more efficient */
end;
tAlloc = time() - t0;
 
print tConcat tAlloc;
concatalloc

For this example, forming the result matrix by using concatenation takes almost 3 seconds. Filling rows of a pre-allocated array takes about 0.02 seconds for the same computation. The computations and the results are identical, but one method is one hundred times faster.

The results are not always this dramatic. If the ComputeIt function returns a tiny vector (such as return(2#n);), then the difference in run time is not as extreme. Similarly, if the ComputeIt function takes a long time to run, then it won't matter much whether you concatenate or pre-allocate the result. If each iteration of the loop takes 1 second, then the difference between 10,003 seconds and 10,000 seconds is inconsequential.

Nevertheless, I recommend that you adopt the "allocate and fill" technique as a good programming habit. Good habits pay off and make you look professional. And if someone points to your program and asks, "Why are you doing it this way?", you can pull them aside and whisper, "Friend, let me tell you a secret...."

Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Back to Top