Programming tip: Avoid testing floating-point values for equality

No matter what statistical programming language you use, be careful of testing for an exact value of a floating-point number. This is known in the world of numerical analysis as "10.0 times 0.1 is hardly ever 1.0" (Kernighan and Plauger, 1974, The Elements of Programming Style).

There are many examples of arithmetic operations with finite precision numbers that can serve as cautionary tales, but the one that I saw recently was a DO loop in SAS that looked something like the following:

data A;
x = 0;
do i = 1 to 20 until(x=1); /* sometimes WRONG */
   x = x + 0.1;
   output;
end;
run;

Of course, the original code didn't have a comment that shows where the error is! The programmer clearly intends for the loop to exit after 10 iterations when x is 1. However, because of finite precision arithmetic and the fact that 0.1 is not representable in binary, the value of x is never exactly equal to 1.

Practice "defensive programming": don't use exact comparisons in IF-THEN, DO-UNTIL, or DO-WHILE statements. A better way to code the previous loop is to use an inequality. In this example, the iteration could stop when x is greater than 1 – 0.1 / 2 = 0.95. If you don't know the exact amount that x will be incremented, use some quantity related to "machine epsilon," as follows:

data A;
x = 0;
eps = constant("SQRTMACEPS"); /* or 100*constant("MACEPS") ... */
do i = 1 to 20 until(x > 1-eps);
   x = x + 0.1;
   output;
end;
run;

As I've said, these examples are ubiquitous in scientific programming, and this is a popular topic on "tip of the day" services such as @RLangTip on Twitter. So regardless of the programming language that you use, avoid the exact comparison of floating-point numbers.

tags: Getting Started, SAS Programming, Tips and Techniques

5 Comments

  1. Posted June 25, 2012 at 9:25 am | Permalink

    eps = constant("MACEPS") works. Why would you use 100*constant("MACEPS")? I'm guessing it depends on the physical machine / OS?

    • Posted June 25, 2012 at 9:33 am | Permalink

      The constant you should use depends on how close the approximate and exact answers should be before you declare "success" for the numerical approximation. For a simple addition, MACEPS will often work. For something like a numerical solution of a system of linear equations, A*x=b, you might want a factor of 1000*MACEPS or, even better, a factor of k*MACEPS where k depends on the condition number of the matrix A. There is no single constant that will work for all situations.

  2. raisins25
    Posted June 27, 2012 at 5:33 pm | Permalink

    Hi Rick,
    I am struggling with a floating point issue right now..I am trying to flag a set of records using their meidan value as the cutoff and one record which has exact value as the median is falling in Greater than median value bucket..to understand what is going on, I tried to do the difference betwen the two vars and am getting it as -2.22045E-16..
    Thanks,
    R

    • Posted June 27, 2012 at 6:33 pm | Permalink

      If the median values are random (like in a simulation), it won't matter where any one value falls, but you can prevent the situation you describe by using the ideas in this blog post. Instead of comparing
      if x < value then...
      you can compare
      if x < value+epsilon then...
      where epsilon is a suitable small number.

  3. raisins25
    Posted June 28, 2012 at 12:31 pm | Permalink

    Thank you! it worked!!

Post a Comment

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <p> <pre lang="" line="" escaped=""> <q cite=""> <strike> <strong>