Way back when I learned to program, I remember a computer instructor explaining that an IF-THEN statement can be a relatively slow operation. He said "If a multiplication takes one unit of time, an IF statement requires about 70 units." I don't know where his numbers came from, or even if they were correct, but I know that I spent a lot of time avoiding IF statements in order to make my programs run as fast as possible. I learned to write expressions such as y = (x>0) rather than the equivalent statement if (x>0) then y=1; else y=0.
Modern computers are much faster than they were back then, but old habits die hard. The following program was written by a colleague and examines the relative performances of a DO loop (=slow) versus a call to the CUSUM function (=fast) on a vector with one million elements. When I saw the program, I could hear my old instructor's voice whispering in my ear....watch out for those IF statements....
proc iml; N = 1e6; z = rannorm(j(N,1)); /* generate random values */ y = j(N, 1, .); /* time DO loop with compute cumulative sum (inefficient) */ t0 = time(); do i = 1 to n; if i = 1 then y[i] = z[i]; else y[i] = y[i-1] + z[i]; end; tDO_IF = time() - t0; /* time vectorized computation (fast) */ t0 = time(); x = cusum(z); tCUSUM = time() - t0;
As expected, the CUSUM function is much faster than writing a DO loop (the times are in the table that follows), which is why I always try to vectorize computations. That result was not surprising, but what caught my eye was the IF-THEN statement sitting inside of the DO loop. It occurred to me that part of the time spent by the first computation was because the IF-THEN statement is called one million times. To make a fairer comparison between the time taken by the DO loop and the CUSUM function, you can eliminate the IF-THEN statement entirely:
t0 = time(); y=z; /* initialize */ do i = 2 to n; y[i] = y[i-1] + z[i]; end; tDO = time() - t0; print tDO_IF tDO tCUSUM;
The table shows that omitting the IF-THEN statement results in the DO loop running 20% faster.
What if the IF-THEN statement is unavoidable?
You can't always eliminate an IF-THEN statement, but when you have an IF-THEN/ELSE statement inside a loop, you can usually restructure your program by writing two loops, one for when the IF condition is TRUE, and one for when it is FALSE. For example, consider the following statements that (inefficiently) compute two sums: the sum of the positive elements in a vector, and the sum of the negative elements:
/* loop that "needs" an IF-THEN statement (?) */ /* IF-THEN inside loop...SLOW! */ t0 = time(); posSum = 0; negSum = 0; do i = 1 to n; if z[i] < 0 then negSum = negSum + z[i]; else posSum = posSum + z[i]; end; tDO_IF = time() - t0; /* rewrite as two loops: one over positive elements and one over negative */ t0 = time(); negIdx = loc(z<0); posIdx = loc(z>0);; negSum = 0; do i = 1 to ncol(negIdx); negSum = negSum + z[negIdx[i]]; end; posSum = 0; do i = 1 to ncol(posIdx); posSum = posSum + z[posIdx[i]]; end; t2DO_IF = time() - t0;
Of course, neither of these examples are as efficient as using a vectorized computation that calls the SUM function (see below). However, once again we observe a 20% improvement in the performance of the code by removing IF-THEN statements from inside the DO loop, as shown by the following table:
/* vectorized computation (fast) */ t0 = time(); negIdx = loc(z<0); negSum = sum(z[negIdx]); posIdx = loc(z>0); posSum = sum(z[posIdx]); tCUSUM = time() - t0; print tDO_IF t2DO_IF tCUSUM;
Avoiding IF-THEN statements in a loop is just one of several ways to optimize loop performance. In general, you should remove any conditions that do not change inside of the loop. This is sometimes called "loop hoisting." For some compiled programming languages such as FORTRAN and C/C++, an optimizing compiler can sometimes determine that an expression is constant and move it outside of the loop for you.
So remember this little efficiency tip: avoid unnecessary statements inside loops. If it is easy to restructure the code to remove an IF-THEN statement inside a loop, or to move a constant expression outside of a loop, do so. It will pay off in terms of efficiency.
Do you have an efficiency tip to share? Post a comment.