Yesterday I was tracking down a bug in my project, which, after several hours, I narrowed down to a piece of code that was more or less doing something like this:
#include <iostream>
#include <cmath>
#include <cassert>

volatile float r = -0.979541123;
volatile float alpha = 0.375402451;

int main()
{
    float sx = r * cosf(alpha); // -0.911326
    float sy = r * sinf(alpha); // -0.359146
    float ex = r * cosf(alpha); // -0.911326
    float ey = r * sinf(alpha); // -0.359146
    float mx = ex - sx;         // should be 0
    float my = ey - sy;         // should be 0
    float distance = sqrtf(mx * mx + my * my) * 57.2958f; // should be 0, gives 1.34925e-06

    // std::cout << "sv: {" << sx << ", " << sy << "}" << std::endl;
    // std::cout << "ev: {" << ex << ", " << ey << "}" << std::endl;
    // std::cout << "mv: {" << mx << ", " << my << "}" << std::endl;
    std::cout << "distance: " << distance << std::endl;

    assert(distance == 0.f);
    // assert(sx == ex && sy == ey);
    // assert(mx == 0.f && my == 0.f);
}
After compilation and execution:
$ g++ -Wall -Wextra -Wshadow -march=native -O2 vfma.cpp && ./a.out
distance: 1.34925e-06
a.out: vfma.cpp:23: int main(): Assertion `distance == 0.f' failed.
Aborted (core dumped)
From my point of view something is wrong here: I asked for two subtractions of bitwise-identical pairs (so I expected two zeroes), then for squaring them (two zeroes again) and adding them together (zero).
It turns out that the root cause of the problem is the use of the fused multiply-add operation, which somewhere along the line makes the result inexact (from my point of view). Generally I have nothing against this optimization, as it promises results that are more exact, but in this case 1.34925e-06 is really far from the 0 I was expecting.
The test case is very "fragile" - if you enable more prints or more asserts, it stops asserting, because the compiler doesn't use fused multiply-add anymore. For example, if I uncomment all the lines:
$ g++ -Wall -Wextra -Wshadow -march=native -O2 vfma.cpp && ./a.out
sv: {-0.911326, -0.359146}
ev: {-0.911326, -0.359146}
mv: {0, 0}
distance: 0
Since I considered this to be a bug in the compiler, I reported it, but the report got closed with the explanation that this is correct behaviour:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79436
So I'm wondering - how should one code such calculations to avoid the problem? I was thinking about a generic solution, something better than:
mx = ex != sx ? ex - sx : 0.f;
I would like to fix or improve my code - if there's anything to fix/improve - instead of setting -ffp-contract=off for my whole project, as fused multiply-add is used internally in the compiler libraries anyway (I see a lot of it in sinf() and cosf()), so it would be a "partial work-around", not a solution... I would also like to avoid solutions like "don't use floating-point" (;
In general, no: this is exactly the price you pay for using -ffp-contract=fast (coincidentally, it is precisely this example that William Kahan notes in the problems with automatic contraction).
Theoretically, if you were using C (not C++), and your compiler supported C99 pragmas (i.e. not gcc), you could use
#pragma STDC FP_CONTRACT OFF
// non-contracted code
#pragma STDC FP_CONTRACT ON
Interestingly, thanks to fma, the floats mx and my give you the rounding error that was made when multiplying r and cos:
fma(r, cos, -r*cos) = theoretical(r*cos) - float(r*cos)
So the result you get somehow indicates how far the computed (sx,sy) is from the theoretical (sx,sy), due to the multiplication of floats (but not accounting for the rounding errors in the computation of cos and sin themselves).
So the question is: how can your program rely on a difference (ex-sx, ey-sy) that lies within the uncertainty interval caused by floating-point rounding?
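To make the size of that uncertainty concrete, the rounding error of a single product can be recovered explicitly with std::fma. A minimal sketch (the variable names are mine, not from the question's code):

#include <cmath>
#include <iostream>

int main()
{
    volatile float r = -0.979541123f;
    volatile float alpha = 0.375402451f;

    float c = cosf(alpha);
    float product = r * c;                  // r*c rounded to the nearest float
    // fma evaluates r*c - product with a single rounding at the end, so
    // (barring underflow) this is exactly the rounding error of the product.
    float error = std::fma(r, c, -product);
    std::cout << "rounding error of r*cos(alpha): " << error << std::endl;
}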
I can see this question has been around for a while, but in case other folks come across it looking for an answer, I figured I'd mention a couple of points.
First, it's difficult to tell exactly without analyzing the resulting assembly code, but I suspect the reason the FMA gives a result so far outside expectations is not just the FMA itself. You are also assuming that all of the calculations are done in the order you specified them, but with optimizing C/C++ compilers this is often not the case. This is also likely why uncommenting the print statements changes the results.
If mx and my were being calculated as the comments suggest, then even if the final mx*mx + my*my were done with an FMA, it would still produce the expected 0. The problem is that since none of the sx/sy/ex/ey/mx/my variables are used by anything else, there is a good possibility that the compiler never actually evaluates them as independent variables at all, and simply mushes all the math together into one big mass of multiplications, adds, and subtracts to calculate distance in a single step. That one big calculation can then be represented any number of different ways in machine code (in any order, potentially with multiple FMAs, etc.), however the compiler figures it will get the best performance.
However, if something else (like a print statement) references mx and my, then it's much more likely the compiler will calculate them separately, before calculating distance as a second step. In that case, the math does work out the way the comments suggest, and even an FMA in the final distance calculation doesn't change the results (because the inputs are all exactly 0).
The Answer
But that doesn't actually answer the real question. The most robust (and generally recommended) way to avoid this sort of problem is: never assume floating point operations will produce an exact number, even if that number is 0. This means that it's generally a bad idea to use == to compare floating point numbers. Instead, choose a small number (often referred to as epsilon) which is larger than any possible/likely accumulated error, but still smaller than any significant result (for example, if you know that the distances you care about are only really significant to a couple of decimal places, you could choose EPSILON = 0.01, which would mean "any difference less than 0.01 is considered the same as zero"). Then, instead of saying:
assert(distance == 0.f);
you would say:
assert(distance < EPSILON);
(the exact value for your epsilon will likely depend on the application, and may even be different for different types of calculations, of course)
Likewise, instead of saying something like if (a == b) for floating point numbers, you would instead say something like if (abs(a - b) < EPSILON), etc.
Another way to reduce (but not necessarily eliminate) this problem is to implement "fail-fast" logic in your application. For example, in the above code, instead of going all the way through, calculating distance, and then seeing if it's 0 at the end, you could "short-circuit" some of the math by testing if (fabs(mx) < EPSILON && fabs(my) < EPSILON) before you even get to the point of calculating distance, and skip the rest if they're both effectively zero (since you know the result will be zero in that case). The quicker you catch the situation, the less opportunity there is for errors to accumulate (and sometimes you can also avoid doing some more costly calculations in cases where you don't need them).
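Put together for the code in the question, that might look something like this sketch (the helper names and the EPSILON value are mine and purely illustrative; pick a tolerance that fits your data):

#include <cmath>

constexpr float EPSILON = 1e-5f;  // application-specific tolerance

bool nearlyEqual(float a, float b)
{
    return std::fabs(a - b) < EPSILON;
}

float angularDistance(float sx, float sy, float ex, float ey)
{
    float mx = ex - sx;
    float my = ey - sy;
    // Fail fast: if both components are effectively zero, skip the
    // sqrt and the scaling entirely.
    if (nearlyEqual(mx, 0.f) && nearlyEqual(my, 0.f))
        return 0.f;
    return std::sqrt(mx * mx + my * my) * 57.2958f;
}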
Related
I have this example code:
#include <Eigen/Eigen>
#include <iostream>

int main() {
    Eigen::MatrixXf M = Eigen::MatrixXf::Random(1000, 1000);
    std::cout.precision(17);
    std::cout << M.colwise().sum().sum() << std::endl;
    std::cout << M.rowwise().sum().sum() << std::endl;
    std::cout << M.sum() << std::endl;
}
I compile with the following command: (g++ version 7.3, but I have seen this with other compilers too)
g++ -O0 -o test -Ieigen-3.3.7 test.cc
And the output is
13.219823837280273
13.220325469970703
13.217720031738281
Shouldn't all these 3 values be the same? I am using no optimizations after all.
Your additions are basically a random walk, and the error you make is a different random walk (because you have roundoff error at almost every step). (Note that Eigen::MatrixXf::Random fills the matrix with random values in [-1, 1].)
Let's assume that you are, on average, at a float value of 10.0 (estimated only from that single data point you provided). Your epsilon (how much absolute rounding error you will probably make with any addition) is thus around 10.0 * 6e-8 (float epsilon is 2^-23 or about 6e-8), or about 6e-7.
If you do N = 1000000 random error-accumulation steps of step size +6e-7 (or -6e-7), you have a good chance of ending up at around sqrt(N) * stepSize = 1000 * 6e-7 = 6e-4 (see here), which is not-too-coincidentally close to your 0.01%.
I would similarly estimate an absolute error of 1000 * 10 * 1e-16 = 1e-12 for the addition of 1 million random doubles between -1 and 1 due to floating point precision.
This is obviously not a rigorous mathematical treatment. It just shows that the error is certainly in the right ballpark.
The common way to reduce this issue is to sort the floats in ascending order of magnitude before adding them, but you can still be arbitrarily imprecise when doing so. (Example: keep adding the number 1.0f to itself - the sum will stop increasing at 2^24, where the gap between adjacent floats becomes larger than 1.0f.)
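A minimal sketch of that sort-then-add idea (compensated summation, e.g. Kahan's algorithm, is another common way to push the error down further):

#include <algorithm>
#include <cmath>
#include <vector>

// Sum floats from smallest to largest magnitude, so that small terms are
// not swallowed by an already-large partial sum. This reduces, but does
// not eliminate, the accumulated rounding error.
float sortedSum(std::vector<float> values)
{
    std::sort(values.begin(), values.end(),
              [](float a, float b) { return std::fabs(a) < std::fabs(b); });
    float sum = 0.f;
    for (float v : values)
        sum += v;
    return sum;
}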
OK, so I am writing a program where I am trying to get the result of the right side to be equivalent to the left side with 0.0001% accuracy:
sin x = x - (x^3)/3! + (x^5)/5! - (x^7)/7! + ...
#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;

long int fact(long int n)
{
    if (n == 1 || n == 0)
        return 1;
    else
        return n * fact(n - 1);
}

int main()
{
    int n = 1, counts = 0; // for sin
    cout << "Enter value for sin" << endl;
    long double x, value, next = 0, accuracy = 0.0001;
    cin >> x;
    value = sin(x);
    do
    {
        if (counts % 2 == 0)
            next = next + (pow(x, n) / fact(n));
        else
            next = next - (pow(x, n) / fact(n));
        counts++;
        n = n + 2;
    } while ((fabs(next - value)) > 0);
    cout << "The value of sin " << x << " is " << next << endl;
}
and let's say I enter 45 for x.
I get the result:
The value of sin 45 is nan
Can anyone help me figure out where I went wrong?
First your while condition should be
while((fabs(next - value))> accuracy) and fact should return long double.
When you change that, it still won't work for a value of 45. The reason is that this Taylor series converges too slowly for large values.
Here is the (Lagrange) error term of the formula:
|R_k(x)| <= max|f^(k+1)| * |x - a|^(k+1) / (k+1)!
Here k is the number of iterations, a = 0, and the function is sin. In order for the condition to become false, 45^(k+1)/(k+1)! times the absolute value of some sine or cosine (whichever the (k+1)-th derivative is), which is between 0 and 1, has to become less than 0.0001.
Well, in this formula, for k = 50 the number is still very large (we should expect an error of around 1.3*10^18, which means we will do more than 50 iterations for sure).
Eventually, for large enough n, both 45^n and n! overflow (even as long double), and then dividing them gives you infinity/infinity = NaN, which is what finally stops the loop.
In your original version, the value of fact doesn't fit into a long int (it eventually overflows to 0), and then the division by 0 gives you infinity, which after subtracting another infinity gives you NaN.
I quote from here in regard to pow:
Return value
If no errors occur, base raised to the power of exp (or iexp) (base^exp) is returned.
If a domain error occurs, an implementation-defined value is returned (NaN where supported).
If a pole error or a range error due to overflow occurs, ±HUGE_VAL, ±HUGE_VALF, or ±HUGE_VALL is returned.
If a range error occurs due to underflow, the correct result (after rounding) is returned.
Reading further:
Error handling
...
except where specified above, if any argument is NaN, NaN is returned
So basically, since n keeps increasing and you go through many loop iterations, the powers and factorials eventually overflow, and the arithmetic on the resulting infinities produces NaN (which your implementation obviously supports). The rest is arithmetic: you are calculating with overflowing values.
I believe you are trying to approximate sin(x) by using its Taylor series. I am not sure if that is the way to go.
Maybe you can try to stop the loop as soon as you hit NaN and not update the variable next and simply output that. That's the closest you can get I believe with your algorithm.
If the choice of 45 implies you think the input is in degrees, you should rethink that and likely should reduce mod 2 Pi.
First fix two bugs:
long double fact(long int n)
...
}while((fabs(next - value))> accuracy);
The return value of fact will overflow quickly if it is long int, and it will overflow eventually even for long double. When you compare to 0 instead of accuracy, the answer is never "close enough", so only NaN can stop the while loop.
Because of rounding error, you still never converge: while pow is giving values bigger than fact, you are computing differences between big numbers, which accumulates significant rounding error that is then never removed. So you might instead stop by computing long double m = pow(x, n) / fact(n); before increasing n in each step of the loop, and use:
} while (m > accuracy * .5);
At that point, either the answer has the specified accuracy or the remaining error is dominated by rounding error and iterating further won't help.
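Putting the long double fix for fact and the m-based stopping test together, the loop might look like this sketch (same structure as the original code, just with the changes described above; for large x the result is still dominated by rounding error, as noted, so the range reduction discussed in the next answer is the real fix):

#include <iostream>
#include <math.h>
using namespace std;

// fact now returns long double so it does not overflow a long int
long double fact(long int n)
{
    if (n == 1 || n == 0)
        return 1;
    else
        return n * fact(n - 1);
}

int main()
{
    int n = 1, counts = 0;
    cout << "Enter value for sin" << endl;
    long double x, next = 0, m, accuracy = 0.0001;
    cin >> x;
    do
    {
        m = pow(x, n) / fact(n);   // size of the term being added
        if (counts % 2 == 0)
            next = next + m;
        else
            next = next - m;
        counts++;
        n = n + 2;
    } while (m > accuracy * .5);   // stop once further terms cannot matter
    cout << "The value of sin " << x << " is " << next << endl;
}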
If you had compiled your code with any reasonable level of warnings enabled, you would have immediately seen that you are not using the variable accuracy. This, and the fact that your fact function returns a long int, are but a small part of your problem. You will never get a good result for sin(45) using your algorithm, even if you correct those issues.
The problem is that with x = 45, the terms in the Taylor expansion of sin(x) won't start decreasing until n = 45. This is a big problem because 45^45/45! is a very large number, 2428380447472097974305091567498407675884664058685302734375 / 1171023117375434566685446533210657783808, or roughly 2*10^18. Your algorithm initially adds and subtracts huge numbers that only start decreasing after 20+ additions/subtractions, with the eventual hope that the result will be somewhere between -1 and +1. That is an unrealizable hope given an input value of 45 and using a native floating point type.
You could use some BigNum type (the internet is chock-full of them) with your algorithm, but that's extreme overkill when you only want four place accuracy. Alternatively, you could take advantage of the cyclical nature of sin(x), sin(x+2*pi)=sin(x). An input value of 45 is equivalent to 1.017702849742894661522992634... (modulo 2*pi). Your algorithm works quite nicely for an input of 1.017702849742894661522992634.
You can do much better than that, but taking the input value modulo 2*pi is the first step toward a reasonable algorithm for computing sine and cosine. Even better, you can use the fact that sin(x+pi) = -sin(x). This lets you reduce the range from -infinity to +infinity down to 0 to pi. Even better, you can use the fact that between 0 and pi, sin(x) is symmetric about pi/2. You can do even better than that. The implementations of the trigonometric functions take extreme advantage of these behaviors, but they typically do not use Taylor approximations.
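As a rough illustration of that first reduction step, here is a sketch of a naive fmod-based reduction into [-pi, pi] (fine for moderate inputs; real library implementations do the reduction far more carefully):

#include <cmath>

// Reduce x into [-pi, pi] before feeding it to a Taylor approximation.
// sin() is periodic with period 2*pi, so the value is unchanged.
// For an input of 45 this yields roughly 1.0177028, where the series
// converges after only a handful of terms.
long double reduceArgument(long double x)
{
    const long double pi = 3.141592653589793238L;
    x = std::fmod(x, 2.0L * pi);     // now in (-2*pi, 2*pi)
    if (x > pi)
        x -= 2.0L * pi;
    else if (x < -pi)
        x += 2.0L * pi;
    return x;                        // now in [-pi, pi]
}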
I have this big homework assignment and I got unexpected results, which I traced down to the following code:
for (int i = 0; i < 4; i++)
    cout << (int)((7163 / (int) pow (10, 4 - i - 1))) % 10;
With this, 7263 appears on the screen instead of 7163! It does not happen with every 4-digit number, and it leaves me confused. Is there something wrong with my logic, or has the compiler gone nuts? Any ideas how to fix it?
The problem here is not with the compiler, but rather with the standard library implementation of the pow function.
But it is really not advisable to use (int)(pow(n, k)) to compute n^k with two integers.
pow is not guaranteed to produce an exact answer; it may be out by a very small amount. (Actually, its accuracy is not guaranteed at all, but most implementations will try to not be wrong by more than the value of the low order bit of the result.) Since casting to (int) truncates rather than rounds, even a tiny error can result in the result being off by 1. And in this case, if the result of pow(10,2) ends up being 99.999999999999, then converting it to an int will make it 99, and 7163/99 is 72.
So if you insist on using pow, you need to ensure that the result is rounded rather than truncated (see the round standard library function). But it would be better to stick to integer arithmetic. For example:
for (int i = 1000; i > 0; i /= 10)
    std::cout << 7163 / i % 10;
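And if you do want to keep pow, rounding its result instead of truncating it (as suggested above) might look like this sketch:

#include <cmath>
#include <iostream>

int main()
{
    for (int i = 0; i < 4; i++)
    {
        // lround rounds to the nearest integer instead of truncating, so a
        // pow result like 99.999999999999 becomes 100 rather than 99.
        long divisor = std::lround(std::pow(10, 4 - i - 1));
        std::cout << 7163 / divisor % 10;
    }
    std::cout << std::endl;   // prints 7163
}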
The problem, as I understand it, is that the result at i = 1 yields a "2" where you would expect a "1".
7163 / (10^2) = 71.63..., so it is pretty reasonable to assume you are simply running into a rounding error. How the calculations are done will depend on your environment, which you haven't specified; however, it seems apparent that the assumptions your code makes about order of operations and data types are incorrect.
A heavy-handed approach would be to cast your types more strictly and define your order of operations explicitly, leaving nothing to chance:
cout << ((int) (((int) 7163) / ((int) pow (10, 4 - i - 1)))) % 10;
Even still, you may need to incorporate a math library and perform a truncate operation on your division result if the environment insists on providing a floating point result.
Given this example C++ code snippet:
#include <iostream>
using namespace std;

void floatSurprise()
{
    // these come from some sort of calculation
    int a = 18680, b = 3323524, c = 121;
    float m = float(a) / c;

    // variant 1: calculate result from single expression
    float r1 = b - (2.0f * m * a) + (m * m * c);
    cout << "r1 = " << r1 << endl;

    // variant 2: break up the expression into intermediate parts,
    // then calculate
    float
        r2_p1 = 2.0f * m * a,
        r2_p2 = m * m * c,
        r2 = b - r2_p1 + r2_p2;
    cout << "r2 = " << r2 << endl;
}
The output is:
r1 = 439703
r2 = 439702
When viewed in the debugger, the values are actually 439702.50 and 439702.25, respectively, which is interesting in itself - not sure why iostream prints floats without the fractional part by default. EDIT: The reason for this was that the default precision setting for cout was too low, needed cout << setprecision(7) at least to see the decimal point for numbers of this magnitude.
But I'm even more interested in why am I getting different results. I suppose it has to do with rounding and some subtle interplay of ints with the required float output type, but I can't put my finger on it. Which value is the correct one?
I was amazed that it was so easy to shoot myself in the foot with such a simple piece of code. Any insight will be greatly appreciated! The compiler was VC++2010.
EDIT2: I did some more investigating using a spreadsheet to generate "correct" values for the intermediate variables and found (via tracing) that indeed they were being trimmed, contributing to the precision loss in the ultimate result. I also found a problem with the single expression, because I actually used a handy function for calculating squares instead of m * m there:
template<typename T> inline T sqr(const T &arg) { return arg*arg; }
Even though I asked nicely, the compiler apparently didn't inline this, and calculated the value separately, trimming the result before returning the value to the expression, skewing the result yet again. Ouch.
You should read my long, long answer about why the same thing happens in C#:
(.1f+.2f==.3f) != (.1f+.2f).Equals(.3f) Why?
Summing up: first of all, you only get about seven decimal digits of accuracy with float. The correct answer, were you to do the calculation with exact arithmetic throughout, is about 439702.51239669..., so you are getting darn close to the correct answer considering the limitations of a float, in either case.
But that doesn't explain why you are getting different results with what looks like exactly the same calculations. The answer is: the compiler is permitted wide latitude to make your math more accurate, and apparently you have hit upon two cases where the optimizer takes what is logically the same expression and does not optimize them down to the same code.
Anyway, read my answer regarding C# carefully; everything in there applies to C++ just as well.
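As a side illustration of that seven-digit limit (my own sketch, not part of the answer above): if the same single-precision m is kept but the rest of the expression is carried in double, the result lands essentially on the exact value quoted above.

#include <iomanip>
#include <iostream>

int main()
{
    int a = 18680, b = 3323524, c = 121;
    float m = float(a) / c;                    // same single-precision m as before
    // Carrying the remaining arithmetic in double gives roughly 439702.512...,
    // while the all-float variants land on 439702.25 or 439702.5.
    double r = b - (2.0 * m * a) + (double(m) * m * c);
    std::cout << std::setprecision(10) << r << std::endl;
}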
What is 1.#INF, and why does casting to a float or double prevent a division by 0 from crashing?
Also, any great ideas on how to prevent division by 0? (Like any macro or template?)
int nQuota = 0;
int nZero = 3 / nQuota; //crash
cout << nZero << endl;
float fZero = 2 / nQuota; //crash
cout << fZero << endl;
if I use instead:
int nZero = 3 / (float)nQuota;
cout << nZero << endl;
//Output = -2147483648
float fZero = 2 / (float)nQuota;
cout << fZero << endl;
//Output = 1.#INF
1.#INF is positive infinity. You will get it when you divide a positive float by zero (if you divide the float zero itself by zero, then the result will be "not a number").
On the other hand, if you divide an integer by zero, the program will crash.
The reason float fZero = 2 / nQuota; crashes is because both operands of the / operator are integers, so the division is performed on integers. It doesn't matter that you then store the result in a float; C++ has no notion of target typing.
Why positive infinity cast to an integer is the smallest integer, I have no idea.
Why does using (float) or (double) prevent a division by 0 from crashing?
It doesn't necessarily. The standard is amazingly sparse when it comes to floating point. Most systems nowadays use the IEEE floating point standard, and that says that the default action for division by zero is to return ±infinity rather than crash. You can make it crash by enabling the appropriate floating point exceptions.
Note well: The only thing the floating point exception model and the C++ exception model have in common is the word "exception". On every machine I work on, a floating point exception does not throw a C++ exception.
Also, any great ideas of how to prevent division by 0?
Simple answer: Don't do it.
This is one of those "Doctor, doctor it hurts when I do this!" kinds of situations. So don't do it!
Make sure that the divisor is not zero.
Do sanity checks on divisors that are user inputs. Always filter your user inputs for sanity. A user input value of zero when the number is supposed to be in the millions will cause all kinds of havoc besides overflow. Do sanity checks on intermediate values.
Enable floating point exceptions.
Making the default behavior one that lets errors (and these are almost always errors) go through unchecked was, IMHO, a big mistake on the part of the standards committee. Use the default and those infinities and not-a-numbers will eventually turn everything into an Inf or a NaN.
The default should have been to stop floating point errors in their tracks, with an option to allow things like 1.0/0.0 and 0.0/0.0 to take place. That isn't the case, so you have to enable those traps yourself (one way to do so is sketched at the end of this answer). Do that and you can oftentimes find the cause of the problem in short order.
Write custom divide, custom multiply, custom square root, custom sine, ... functions.
This unfortunately is the route that many safety critical software systems must take. It is a royal pain. Option #1 is out because it's just wishful thinking. Option #3 is out because the system cannot be allowed to crash. Option #2 is still a good idea, but it doesn't always work because bad data always has a way of sneaking in. It's Murphy's law.
BTW, the problem is a bit worse than just division by zero. 10^200/10^-200 will also overflow.
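As an illustration of option #3 (enabling floating point exceptions), here is a sketch for Linux/glibc; feenableexcept is a GNU extension, and other platforms have their own mechanisms (e.g. _controlfp_s on MSVC):

#include <fenv.h>   // feenableexcept: glibc extension, not standard C++
#include <cstdio>

int main()
{
    // Trap division by zero, invalid operations, and overflow so the program
    // stops with SIGFPE at the offending operation instead of silently
    // propagating Inf/NaN through the rest of the computation.
    feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);

    volatile double zero = 0.0;
    double oops = 1.0 / zero;     // raises SIGFPE here instead of giving Inf
    std::printf("%f\n", oops);    // never reached
}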
You usually check to make sure you aren't dividing by zero. The code below isn't particularly useful unless nQuota has a legitimate value, but it does prevent crashes:
int nQuota = 0;
int nZero = 0;
float fZero = 0;

if (nQuota)
    nZero = 3 / nQuota;
cout << nZero << endl;

if (nQuota)
    fZero = 2 / nQuota;
cout << fZero << endl;
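As for the "any macro or template?" part of the question, one option is to wrap the check in a small helper. A sketch using C++17's std::optional (the name checkedDivide is mine; older code could return a bool plus an out-parameter instead):

#include <optional>

// Returns std::nullopt instead of dividing by zero; the caller decides
// what a missing result should mean. Note that this only catches an
// exactly-zero divisor, not a tiny divisor that overflows the quotient.
template <typename T>
std::optional<T> checkedDivide(T numerator, T denominator)
{
    if (denominator == T(0))
        return std::nullopt;
    return numerator / denominator;
}

// Usage:
//   if (auto q = checkedDivide(2.0f, static_cast<float>(nQuota)))
//       cout << *q << endl;
//   else
//       cout << "division by zero avoided" << endl;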