Why do these two code variants produce different floating-point results? - c++

Given this example C++ code snippet:
#include <iostream>
using namespace std;

void floatSurprise()
{
    // these come from some sort of calculation
    int a = 18680, b = 3323524, c = 121;
    float m = float(a) / c;

    // variant 1: calculate result from a single expression
    float r1 = b - (2.0f * m * a) + (m * m * c);
    cout << "r1 = " << r1 << endl;

    // variant 2: break up the expression into intermediate parts,
    // then calculate
    float r2_p1 = 2.0f * m * a,
          r2_p2 = m * m * c,
          r2 = b - r2_p1 + r2_p2;
    cout << "r2 = " << r2 << endl;
}
The output is:
r1 = 439703
r2 = 439702
When viewed in the debugger, the values are actually 439702.50 and 439702.25, respectively, which is interesting in itself - not sure why iostream prints floats without the fractional part by default. EDIT: The reason for this was that the default precision setting for cout was too low; I needed cout << setprecision(7) at least to see the decimal point for numbers of this magnitude.
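For reference, the one-line fix looked like this (assuming <iomanip> is included for setprecision):

cout << setprecision(7) << r1 << endl; // prints 439702.5 instead of 439703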
But I'm even more interested in why I am getting different results. I suppose it has to do with rounding and some subtle interplay between the ints and the required float output type, but I can't put my finger on it. Which value is the correct one?
I was amazed that it was so easy to shoot myself in the foot with such a simple piece of code. Any insight will be greatly appreciated! The compiler was VC++2010.
EDIT2: I did some more investigating using a spreadsheet to generate "correct" values for the intermediate variables and found (via tracing) that indeed they were being trimmed, contributing to the precision loss in the ultimate result. I also found a problem with the single expression, because I actually used a handy function for calculating squares instead of m * m there:
template<typename T> inline T sqr(const T &arg) { return arg*arg; }
Even though I asked nicely, the compiler apparently didn't inline this, and calculated the value separately, trimming the result before returning the value to the expression, skewing the result yet again. Ouch.

You should read my long, long answer about why the same thing happens in C#:
(.1f+.2f==.3f) != (.1f+.2f).Equals(.3f) Why?
Summing up: first of all, you only get about seven significant decimal digits of accuracy with float. The correct answer, were you to do it with exact arithmetic throughout the entire calculation, is about 439702.51239669..., so you are getting darn close to the correct answer considering the limitations of a float, in either case.
But that doesn't explain why you are getting different results with what looks like exactly the same calculation. The answer is: the compiler is permitted wide latitude to make your math more accurate, and apparently you have hit upon two cases where the optimizer takes what is logically the same expression and does not optimize them down to the same code.
Anyway, read my answer regarding C# carefully; everything in there applies to C++ just as well.
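To see the effect of intermediate precision in isolation, here is a minimal sketch (my own illustration, not the code VC++ actually emits) contrasting float-rounded intermediates with double intermediates, under the assumption that the optimizer kept variant 1's temporaries in a wider type:

#include <cstdio>

int main()
{
    int a = 18680, b = 3323524, c = 121;
    float m = float(a) / c;

    // Every intermediate rounded to float, as variant 2 forces explicitly:
    float p1 = 2.0f * m * a;
    float p2 = m * m * c;
    float narrow = b - p1 + p2;

    // Intermediates kept in double and rounded once at the end, which is the
    // kind of freedom the optimizer may exercise for variant 1:
    float wide = float((double)b - 2.0 * m * a + (double)m * m * c);

    std::printf("narrow = %.2f\n", narrow); // e.g. 439702.25
    std::printf("wide   = %.2f\n", wide);   // e.g. 439702.50
}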

Related

pow() function gives an error [duplicate]

Recently I wrote a block of code:
const int sections = 10;
for (int t = 0; t < 5; t++) {
    int i = pow(sections, 5 - t - 1);
    cout << i << endl;
}
And the result is wrong:
9999
1000
99
10
1
If I use just this code instead:
for (int t = 0; t < 5; t++) {
    cout << pow(sections, 5 - t - 1) << endl;
}
The problem doesn't occur anymore:
10000
1000
100
10
1
Can anyone give me an explanation? Thank you very much!
Due to the representation of floating-point values, pow(10.0, 5) could be 9999.9999999 or something like that. When you assign that to an integer, it gets truncated.
EDIT: In the case of cout << pow(10.0, 5);, it looks like the output is rounded, but I don't have any supporting document confirming that right now.
EDIT 2: The comment made by BoBTFish and this question confirm that when pow(10.0, 5) is used directly in cout, it gets rounded.
When used with fractional exponents, pow(x,y) is commonly evaluated as exp(log(x)*y); such a formula would be mathematically correct if evaluated with infinite precision, but may in practice result in rounding errors. As others have noted, a value of 9999.999999999 when cast to an integer will yield 9999. Some languages and libraries use such a formulation all the time when using an exponentiation operator with a floating-point exponent; others try to identify when the exponent is an integer and use iterated multiplication when appropriate. Looking up documentation for the pow function, it appears that it's supposed to work when x is negative and y has no fractional part (when x is negative and y is even, the result should be pow(-x,y); when y is odd, the result should be -pow(-x,y)). It would seem logical that when y has no fractional part, a library which is going to go through the trouble of dealing with a negative x value should use iterated multiplication, but I don't know of any spec dictating that it must.
In any case, if you are trying to raise an integer to a power, it is almost certainly best to use integer maths for the computation or, if the integer to be raised is a constant or will always be small, simply use a lookup table (raising numbers from 0 to 15 by any power that would fit in a 64-bit integer would require only a 4,096-item table).
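For instance, here is a minimal sketch of the iterated-multiplication approach (ipow is a name of my own choosing; it assumes the result fits in a long long):

long long ipow(int base, unsigned exp)
{
    long long result = 1;
    while (exp-- > 0)
        result *= base; // exact integer multiplication, no rounding anywhere
    return result;
}

// usage in the question's loop: int i = (int)ipow(sections, 5 - t - 1); // yields exactly 10000, 1000, ...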
From Here
Looking at the pow() function, double pow(double base, double exponent);, we know the parameters and return value are all of double type. But the variables sections, t, and i in the code above are all of int type, and when converting int to double or double to int, precision may be lost. For example (maybe not rigorous), the floating-point unit (FPU) calculates pow(10, 4) = 9999.99999999, and then int(9999.9999999) = 9999 by type conversion in C++.
How to solve it?
Solution1
Change the code:
const int num = 10;
for (int i = 0; i < 5; ++i) {
    double res = pow(num, i);
    cout << res << endl;
}
Solution2
Use a floating-point unit with higher calculation precision in double type instead of the x87 FPU; for example, use SSE on an x86 Windows CPU. In Code::Blocks 13.12, we can take these steps to reach the goal: Settings -> Compiler settings -> GNU GCC Compiler -> Other options, and add
-mfpmath=sse -msse3
What happens is that the pow function returns a double, so when you do this:
int i = pow(sections, 5 - t - 1);
the .99999 after the decimal point gets cut off and you get 9999, while printing it directly or comparing it with 10000 is not a problem because it gets rounded, in a sense.
If the code in your first example is the exact code you're running, then you have a buggy library. Regardless of whether you're picking up std::pow or C's pow which takes doubles, even if the double version is chosen, 10 is exactly representable as a double. As such the exponentiation is exactly representable as a double. No rounding or truncation or anything like that should occur.
With g++ 4.5 I couldn't reproduce your (strange) behavior even using -ffast-math and -O3.
Now what I suspect is happening is that sections is not being assigned the literal 10 directly but instead is being read or computed internally such that its value is something like 9.9999999999999, which when raised to the fourth power generates a number like 9999.9999999. This is then truncated to the integer 9999 which is displayed.
Depending on your needs you may want to round either the source number or the final number prior to assignment into an int. For example: int i = pow(sections, 5 - t - 1) + 0.5; // add 0.5 and truncate to round to nearest
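If C++11 is available, std::lround from <cmath> expresses the same intent more directly (a sketch equivalent to the +0.5 trick for non-negative results):

int i = static_cast<int>(std::lround(pow(sections, 5 - t - 1)));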
There must be some broken pow function in the global namespace. Then std::pow is "automatically" used instead in your second example because of ADL.
Either that or t is actually a floating-point quantity in your first example, and you're running into rounding errors.
You're assigning the result to an int. That coerces it, truncating the number.
This should work fine:
for (int t = 0; t < 5; t++) {
    double i = pow(sections, 5 - t - 1);
    cout << i << endl;
}
What happens is that your answer is actually 9999.9999... and not exactly 10000. This is because pow returns a double. So, you can fix this by using i = ceil(pow(...)).
Your code should be:
const int sections = 10;
for (int t = 0; t < 5; t++) {
    int i = ceil(pow(sections, 5 - t - 1));
    cout << i << endl;
}

Generic way of handling fused-multiply-add floating-point inaccuracies

Yesterday I was tracking a bug in my project, which - after several hours - I've narrowed down to a piece of code which more or less was doing something like this:
#include <iostream>
#include <cmath>
#include <cassert>

volatile float r = -0.979541123;
volatile float alpha = 0.375402451;

int main()
{
    float sx = r * cosf(alpha); // -0.911326
    float sy = r * sinf(alpha); // -0.359146
    float ex = r * cosf(alpha); // -0.911326
    float ey = r * sinf(alpha); // -0.359146
    float mx = ex - sx;         // should be 0
    float my = ey - sy;         // should be 0
    float distance = sqrtf(mx * mx + my * my) * 57.2958f; // should be 0, gives 1.34925e-06

    // std::cout << "sv: {" << sx << ", " << sy << "}" << std::endl;
    // std::cout << "ev: {" << ex << ", " << ey << "}" << std::endl;
    // std::cout << "mv: {" << mx << ", " << my << "}" << std::endl;
    std::cout << "distance: " << distance << std::endl;

    assert(distance == 0.f);
    // assert(sx == ex && sy == ey);
    // assert(mx == 0.f && my == 0.f);
}
After compilation and execution:
$ g++ -Wall -Wextra -Wshadow -march=native -O2 vfma.cpp && ./a.out
distance: 1.34925e-06
a.out: vfma.cpp:23: int main(): Assertion `distance == 0.f' failed.
Aborted (core dumped)
From my point of view something is wrong, as I've asked for two subtractions of bitwise-identical pairs (I expected to get two zeroes), then squared them (two zeroes again) and added them together (zero).
It turns out that the root cause of the problem is the use of the fused multiply-add operation, which somewhere along the line makes the result inexact (from my point of view). Generally I have nothing against this optimization, as it promises to give results which are more exact, but in this case 1.34925e-06 is really far from the 0 that I was expecting.
The test case is very "fragile" - if you enable more prints or more asserts, it stops asserting, because the compiler doesn't use fused multiply-add anymore. For example, if I uncomment all the lines:
$ g++ -Wall -Wextra -Wshadow -march=native -O2 vfma.cpp && ./a.out
sv: {-0.911326, -0.359146}
ev: {-0.911326, -0.359146}
mv: {0, 0}
distance: 0
As I've considered this to be a bug in the compiler, I've reported that, but it got closed with the explanation that this is correct behaviour.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79436
So I'm wondering - how should one code such calculations to avoid the problem? I was thinking about a generic solution, but something better than:
mx = ex != sx ? ex - sx : 0.f;
I would like to fix or improve my code - if there's anything to fix/improve - instead of setting -ffp-contract=off for my whole project, as fused-multiply-add is used internally in the compiler libraries anyway (I see a lot of that in sinf() and cosf()), so it would be a "partial work-around", not a solution... I would also like to avoid solutions like "don't use floating-point" (;
In general no: this is exactly the price you pay for using -ffp-contract=fast (coincidentally, it is precisely this example that William Kahan notes in the problems with automatic contraction).
Theoretically, if you were using C (not C++) and your compiler supported C99 pragmas (i.e. not gcc), you could use
#pragma STDC FP_CONTRACT OFF
// non-contracted code
#pragma STDC FP_CONTRACT ON
Interestingly, thanks to fma, the floats mx and my give you the rounding error that was made when multiplying r and cos:
fma(r, cos, -r*cos) = theoretical(r*cos) - float(r*cos)
So the result you get somehow indicates how far the computed (sx,sy) was from the theoretical (sx,sy), due to the multiplication of floats (but not accounting for rounding errors in the computation of cos and sin).
So the question is how can your program rely on a difference (ex-sx,ey-sy) which is within the uncertainty interval related to floating point rounding?
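As a side check, that identity is easy to observe directly; this small sketch (my own illustration, using std::fmaf from <cmath>) recovers the rounding error of a single float multiplication:

#include <cmath>
#include <cstdio>

int main()
{
    float r = -0.979541123f;
    float c = std::cos(0.375402451f);
    float prod = r * c;                 // r*c rounded to float
    float err = std::fmaf(r, c, -prod); // theoretical(r*c) - float(r*c), computed exactly
    std::printf("prod = %.9g, rounding error = %.9g\n", prod, err);
}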
I can see this question has been around for a while, but in case other folks come across it looking for an answer, I figured I'd mention a couple of points.
First, it's difficult to tell exactly without analyzing the resulting assembly code, but I suspect the reason that the FMA gives a result that is so far outside expectations is not just the FMA itself, but also that you are assuming all of the calculations are being done in the order you have specified them, but with optimizing C/C++ compilers this is often not the case. This is also likely why uncommenting the print statements changes the results.
If mx and my were being calculated as the comments suggest, then even if the final mx*mx + my*my were done with an FMA, it would still result in the expected 0 result. The problem is that since none of the sx/sy/ex/ey/mx/my variables are used by anything else, there is a good possibility that the compiler never actually evaluates them as independent variables at all, and simply mushes all the math together into a big mass of multiplications, adds, and subtracts to calculate distance in one single step, which can then be represented any number of different ways in machine code (in any order, potentially with multiple FMAs, etc) however it figures it will get the best performance for that one big calculation.
However, if something else (like a print statement) references mx and my, then it's much more likely the compiler will calculate them separately, before calculating distance as a second step. In that case, the math does work out the way the comments suggest, and even an FMA in the final distance calculation doesn't change the results (because the inputs are all exactly 0).
The Answer
But that doesn't actually answer the real question. In answer to that, the most robust (and generally recommended) way to avoid this sort of problem in general is: Never assume floating point operations will ever produce an exact number, even if that number is 0. This means that, in general, it's a bad idea to ever use == to compare floating point numbers. Instead, you should choose a small number (often referred to as epsilon), which is larger than any possible/likely accumulated error, but still smaller than any significant result (for example, if you know that the distances you care about are only really significant to a couple of decimal places, then you could choose EPSILON = 0.01, which will mean "any difference less than 0.01 we'll consider to be the same as zero"). Then, instead of saying:
assert(distance == 0.f);
you would say:
assert(distance < EPSILON);
(the exact value for your epsilon will likely depend on the application, and may even be different for different types of calculations, of course)
Likewise, instead of saying something like if (a == b) for floating point numbers, you would instead say something like if (abs(a - b) < EPSILON), etc.
Another way to reduce (but not necessarily eliminate) this problem is to implement "fail-fast" logic in your application. For example, in the above code, instead of going all the way through and calculating distance and then seeing if it's 0 at the end, you could "short-circuit" some of the math by testing if (mx < EPSILON && my < EPSILON) before you even get to the point of calculating distance and skipping the rest if they're both zero (since you know the result will be zero in that case). The quicker you catch the situation the less opportunity there is for errors to accumulate (and sometimes you can also avoid doing some more costly calculations in cases you don't need to).
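Putting both ideas together, a minimal sketch (the EPSILON value is a hypothetical placeholder that you would tune per application):

#include <cmath>

constexpr float EPSILON = 1e-5f; // hypothetical threshold; tune to the scale of your data

// Fail fast: treat negligible components as zero before they feed further
// calculations, and skip the sqrt entirely in that case.
float safeDistance(float mx, float my)
{
    if (std::fabs(mx) < EPSILON && std::fabs(my) < EPSILON)
        return 0.0f;
    return std::sqrt(mx * mx + my * my) * 57.2958f; // same scale factor as the question
}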

C++ Modulus returning wrong answer

Here is my code :
#include <iostream>
#include <cmath>
using namespace std;

int main()
{
    int n, i, num, m, k = 0;
    cout << "Enter a number :\n";
    cin >> num;
    n = log10(num);
    while (n > 0) {
        i = pow(10, n);
        m = num / i;
        k = k + pow(m, 3);
        num = num % i;
        --n;
        cout << m << endl;
        cout << num << endl;
    }
    k = k + pow(num, 3);
    return 0;
}
When I input 111 it gives me this:
1
12
1
2
I am using Code::Blocks. I don't know what is wrong.
Whenever I use pow expecting an integer result, I add .5 so I use (int)(pow(10,m)+.5) instead of letting the compiler automatically convert pow(10,m) to an int.
I have read in many places that others have done exhaustive tests of some of the situations in which I add that .5, and found zero cases where it makes a difference. But accurately identifying the conditions in which it isn't needed can be quite hard, and using it when it isn't needed does no real harm.
If it makes a difference, it is a difference you want. If it doesn't make a difference, it had a tiny cost.
In the posted code, I would adjust every call to pow that way, not just the one I used as an example.
There is no equally easy fix for your use of log10, but it may be subject to the same problem. Since you expect a non-integer answer and want that non-integer answer truncated down to an integer, adding .5 would be very wrong. So you may need a more complicated workaround for the fundamental problem of working with floating point. I'm not certain, but assuming 32-bit integers, I think adding 1e-10 to the result of log10 before converting to int is never enough to change log10(10^n - 1) into log10(10^n), yet always enough to correct the error that might have done the reverse.
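Applied to the posted code, that heuristic would look something like this (a sketch of the workaround described above, not a guaranteed fix):

n = (int)(log10((double)num) + 1e-10); // nudge upward so a log10 result a few ulps below the true integer still truncates correctly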
pow does floating-point exponentiation.
Floating point functions and operations are inexact, you cannot ever rely on them to give you the exact value that they would appear to compute, unless you are an expert on the fine details of IEEE floating point representations and the guarantees given by your library functions.
(and furthermore, floating-point numbers might even be incapable of representing the integers you want exactly)
This is particularly problematic when you convert the result to an integer, because the result is truncated to zero: int x = 0.999999; sets x == 0, not x == 1. Even the tiniest error in the wrong direction completely spoils the result.
You could round to the nearest integer, but that has problems too; e.g. with sufficiently large numbers, your floating point numbers might not have enough precision to be near the result you want. Or if you do enough operations (or unstable operations) with the floating point numbers, the errors can accumulate to the point you get the wrong nearest integer.
If you want to do exact, integer arithmetic, then you should use functions that do so. e.g. write your own ipow function that computes integer exponentiation without any floating-point operations at all.
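For example, here is a sketch of such a function (the name ipow and the choice of exponentiation by squaring are mine; it assumes the result fits in a long long):

long long ipow(long long base, unsigned exp)
{
    long long result = 1;
    while (exp > 0) {
        if (exp & 1)        // odd exponent: fold one factor into the result
            result *= base;
        exp >>= 1;          // halve the exponent
        if (exp > 0)
            base *= base;   // square the base only while more bits remain
    }
    return result;
}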

std::exp giving different result than MATLAB exp for complex number

I'm copying a script from MATLAB into a C++ function. However, I constantly get different results from the exp function. For example, the following snippet:
std::complex<double> final_b = std::exp(std::complex<double>(0, 1 * pi));
should be equivalent to the MATLAB code
final_b = exp(1i * pi);
But it isn't. From MATLAB, I receive -1 + 0i (which is correct), and from C++, I get -1 + -2.068231e-013*i.
Now, at first I thought this was just a rounding error of sorts, but for the actual script I'm using, which has bigger complex exponentials, I get completely different numbers. What is the cause of this? How do I fix it?
Edit: I've manually tried calculating the exponential with Euler's formula
exp(x+iy) = exp(x) * (cos(y) + i*sin(y))
and I get the same wonky results in C++.
That is called floating-point approximation (or imprecision):
If you include the header cfloat there are some useful definitions. In particular, DBL_EPSILON, the smallest number such that 1.0 + DBL_EPSILON != 1.0, which is about 2.22e-16 for double. Your imaginary part of -2.068231e-013 is somewhat larger than that, but still vanishingly small relative to the magnitude of the result; the right approach is to compare against a small tolerance rather than test for exact zero. With the following piece of code, you can check whether the number is effectively zero:
// The complete formula is std::abs(a - b), but since b is zero, I am omitting it.
// The tolerance is a placeholder; choose one suited to the scale of your values.
if (std::abs(number.imag()) < tolerance) {
    // The number is either zero or very close to zero
}
For example, you can see the working code here: http://ideone.com/2OzNZm
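A self-contained sketch of that idea applied to the question's computation (the tolerance value here is my own placeholder):

#include <complex>
#include <cmath>
#include <iostream>

int main()
{
    const double pi = std::acos(-1.0);
    std::complex<double> final_b = std::exp(std::complex<double>(0.0, pi));

    // pi is not exactly representable, so the imaginary part is sin(nearest-double-to-pi),
    // a tiny nonzero value rather than exactly 0.
    const double tol = 1e-9; // hypothetical tolerance; scale it to your magnitudes
    if (std::abs(final_b.imag()) < tol)
        final_b = std::complex<double>(final_b.real(), 0.0); // snap to the real axis
    std::cout << final_b << '\n'; // prints (-1,0)
}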
