Why am I surprised that people found this surprising?

Must be something in the water. I dunno.

I reported on a simple program written to demonstrate roundoff error accumulation for a class I taught (the day job does teach parallel programming with OpenMP/MPI/… as we are contracted to), and why people should pay very careful attention to it when using floating point arithmetic.

Before I go into this, it is worth stealing some text from one of my slides on this. Computers know integers. They do not know real numbers. They know floating point numbers, which are approximations to real numbers. This distinction is important, and if you don’t pay attention to it, it will bite you. Hard.

That said, it is time to go down this short rabbit hole. It seems to be surprising for a fair chunk of programmers, so it is worth a read just to see it if you haven’t. This is by the way, one of a group of potential pitfalls to be avoided in parallel programming, or any programming system that allows you to re-order the way you perform mathematical operations.

Lets start out with a very simple C program

#include <stdio .h>
int main()
float x=0.0, y=0.0, one=1.0;
int i,N;

N = 100000000;

for(i=1;i< =N;i++) { x += one /(float)i; } for(i=N;i>=1;i--)
y += one /(float)i;

printf("[index increasing] sum = %10.7f\n",x);
printf("[index decreasing] sum = %10.7f\n",y);

Notice the two inner loops. The C syntax for the first one reads approximately

start at i=1, continue to loop while i is less than or equal to N, and increment i on each loop.

The second loop is the same as the first, though we reverse the index (sum it in the opposite order). Same series.

(Yes, it is a divergent series, but for these purposes, it should not matter as we are taking the same group of terms, and summing them … the grouping isn’t changing, but the order of summation is).

Now compile and run it with your favorite compiler.

$ gcc sum_bad.c -o sum_bad.exe
$ ./sum_bad.exe
[index increasing] sum = 15.4036827
[index decreasing] sum = 18.8079185

Neat, huh.

This is roundoff error. Remember, floating point numbers are not real numbers. You can infinitely subdivide the real number line. It has “infinite” resolution. You cannot do that with floating point numbers. They have finite resolution.

All we did is change the order of summation. Which, when you write parallel code, you are effectively doing. Any time you change the order, you will impact roundoff error.

I have been showing this to researchers for years, and most have not seen/heard of it. Many people think that the numbers out of computers are exact (they aren’t). Many aren’t aware of the side effects of the design decisions that go into machines. Designs are compromises. It helps to have a really good sense of what is being compromized so you understand what you are dealing with.

These are not corner cases. These are cases people deal with on a daily basis.

Sure, some people will note that this is single precision calculation. What if you change it to double precision? The answers will be identical then … right? Just use double precision?

#include <stdio .h>

int main()
double x=0.0, y=0.0, one=1.0;
int i,N;

N = 100000000;

for(i=1;i< =N;i++) { x += one /(double)i; } for(i=N;i>=1;i--)
y += one /(double)i;

printf("[index increasing] sum = %20.17f\n",x);
printf("[index decreasing] sum = %20.17f\n",y);


$ gcc sum_bad2.c -o sum_bad2.exe
$ ./sum_bad2.exe
[index increasing] sum = 18.99789641385255479
[index decreasing] sum = 18.99789641385344652

You still see the roundoff error, though it is much less of a factor than before. More precision does help, but it also helps to know something about why it helps.

This is not saying run everything double precision, rather, you need to pay attention to your algorithms, all of your algorithms, any time you change order of operations. The results can, and will surprise you.

Viewed 12776 times by 3019 viewers