I'm wondering whether x/y, when x and y are integers stored in a floating-point type, is guaranteed to yield the same floating-point value as kx/ky, where k is an integer.
So, for example, do 1.0/3, 2.0/6, 3.0/9, ... all yield the same exact floating-point number (one that would compare equal with the == operator)?
In case this is different per language/platform, I am specifically interested in C++ on Linux.
As long as k*x and k*y operations are exact (the result fits in a floating point), then IEEE754 standard guarantees that you'll get the nearest floating point to the exact division result.
Obviously, since (k*x)/(k*y)=(x/y) in exact math, the nearest floating point will be the same for both.
If k*x or k*y is inexact (the product does not fit in the floating-point type), then you get no such guarantee.
As for the bare minimum guaranteed by the C++ standard, I don't know, but you can assume that most platforms comply with these basic IEEE 754 properties.
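As a small illustration (assuming IEEE 754 double arithmetic with round-to-nearest), scaling by a factor that keeps both products exact preserves the quotient:
#include <cassert>
#include <cstdio>

int main() {
    double x = 1.0, y = 3.0;
    // 2*x, 2*y, 3*x, 3*y are all exact, so each division rounds
    // the same real number x/y and must give the same result.
    assert(x / y == (2.0 * x) / (2.0 * y));
    assert(x / y == (3.0 * x) / (3.0 * y));
    printf("1/3, 2/6 and 3/9 compare equal as doubles\n");
}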
If the calculations are done in the same precision, I think they'll end up the same. If not, both float->double and double->float conversions will create discrepancies. And that's not an impossible scenario (at least without fp:strict), since the compiler can mix x87 FPU and SSE code (for instance, when it needs to call a function that isn't implemented in SSE, or pass the value as an argument or return value of a cdecl function).
That said, you can also create a quotient (x/y) class and use it as the key. You can define all arithmetic for it, for instance
q0+q1 = (q0.x*q1.y+q1.x*q0.y)/(q0.y*q1.y)
q0<q1 = q0.x*q1.y*(q0.y*q1.y) < q1.x*q0.y*(q0.y*q1.y)
(in the latter case the extra factor (q0.y * q1.y) accounts for the fact that we've multiplied the original inequality, q0.x/q0.y < q1.x/q1.y, by q0.y*q1.y; if that product were negative, < would flip to >, so multiplying both sides by it once more keeps the direction correct). You can also get rid of some divisions that way; a sketch of such a class is below.
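A minimal sketch of such a quotient class (the names Quot, x and y just follow the notation above; it ignores overflow of the cross products and assumes nonzero denominators):
#include <cstdio>

struct Quot {
    double x; // numerator
    double y; // denominator, assumed nonzero
};

Quot operator+(Quot a, Quot b) {
    // a.x/a.y + b.x/b.y = (a.x*b.y + b.x*a.y) / (a.y*b.y)
    return { a.x * b.y + b.x * a.y, a.y * b.y };
}

bool operator<(Quot a, Quot b) {
    // Multiply a.x/a.y < b.x/b.y by (a.y*b.y)^2 > 0: no division,
    // and the comparison direction is independent of the signs.
    double p = a.y * b.y;
    return a.x * b.y * p < b.x * a.y * p;
}

int main() {
    Quot oneThird{1, 3}, twoSixths{2, 6};
    printf("%d\n", twoSixths < oneThird); // 0: 2/6 is not less than 1/3
    Quot s = oneThird + oneThird;         // 2/3, stored as 6/9
    printf("%g/%g\n", s.x, s.y);
}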
I don't know about guarantees, but compiling this
#include <stdio.h>

int main() {
    int i = 0;
    float
        x = 1e0f,
        y = 3e0f,
        f = 1e3f;
    while ( ++i <= 14 ) {
        printf(" %d) %f = %f / %f\n", i, x / y, x, y );
        x *= f;
        y *= f;
    }
}
with gcc -O0 (on Debian GNU/Linux on an Intel(R) Xeon(R) CPU E3-1246) produces
1) 0.333333 = 1.000000 / 3.000000
2) 0.333333 = 1000.000000 / 3000.000000
3) 0.333333 = 1000000.000000 / 3000000.000000
4) 0.333333 = 1000000000.000000 / 3000000000.000000
5) 0.333333 = 999999995904.000000 / 3000000053248.000000
6) 0.333333 = 999999986991104.000000 / 3000000028082176.000000
7) 0.333333 = 999999984306749440.000000 / 3000000159078678528.000000
8) 0.333333 = 999999949672133165056.000000 / 3000000271228864561152.000000
9) 0.333333 = 999999941790833817157632.000000 / 3000000329775659716968448.000000
10) 0.333333 = 999999914697178458896728064.000000 / 3000000186813393145719422976.000000
11) 0.333333 = 999999939489602493962365435904.000000 / 3000000196258126111458713403392.000000
12) 0.333333 = 999999917124474831091725703839744.000000 / 3000000060858434314620245836300288.000000
13) 0.333333 = 999999882462153731101078006664265728.000000 / 3000000043527273764624921987712548864.000000
14) -nan = inf / inf
See the program below
#include <stdio.h>
int main()
{
    float x = 0.1;
    if (x == 0.1)
        printf("IF");
    else if (x == 0.1f)
        printf("ELSE IF");
    else
        printf("ELSE");
}
And another program here
#include <stdio.h>
int main()
{
    float x = 0.5;
    if (x == 0.5)
        printf("IF");
    else if (x == 0.5f)
        printf("ELSE IF");
    else
        printf("ELSE");
}
From both programs we expect similar results, because structurally nothing has changed; everything is the same, and the comparison constants were changed correspondingly.
BUT the two programs produce different results:
1st Program
ELSE
2nd Program
IF
Why do these two programs behave differently?
The behavior of these two programs will vary between computers and operating systems - you are testing for exact equality of floats.
In memory, floats are stored as a string of bits in binary - i.e. 0.1 in binary (0.1b) represents 0.5 in decimal (0.5d).
Similarly,
Binary | Decimal
0.1 | 2^-1 = 1/2
0.01 | 2^-2 = 1/4
0.001 | 2^-3 = 1/8
0.11 | 2^-1 + 2^-2 = 3/4
The problem is that some decimals don't have nice floating point representations.
0.1d = 0.0001100110011001100110011...
which is infinitely long.
So, 0.5 is really nice in binary
0.5d = 0.1000000000000000...b
but 0.1 is really nasty
0.1d = 0.00011001100110011...
Now, the unsuffixed literal 0.1 in the comparison x == 0.1 is a double, which stores more of the infinite sequence 0.0001100110011001100110011001100110011...
so it is not equal to the float version x, which truncates the sequence much earlier (x is promoted to double for the comparison, but the bits it already lost don't come back).
On the other hand, 0.5f is the same regardless of how many decimal places are stored, since it has all zeroes after the first place.
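You can see the difference directly by printing the constants with extra digits (a small sketch; the exact digits shown assume IEEE 754 float and double):
#include <stdio.h>

int main()
{
    // %.17g shows enough digits to distinguish the two values.
    printf("0.1f = %.17g\n", (double)0.1f); // 0.10000000149011612
    printf("0.1  = %.17g\n", 0.1);          // 0.10000000000000001
    printf("0.5f = %.17g\n", (double)0.5f); // 0.5
    printf("0.5  = %.17g\n", 0.5);          // 0.5
}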
The accepted way to compare floats or doubles in C++ or C is to #define a very small number (I like to call it EPS, short for EPSILON) and replace
float a = 0.1f;
if (a == 0.1f) {
    printf("IF\n");
} else {
    printf("ELSE\n");
}
with
#include <math.h>
#define EPS 0.0000001f

float a = 0.1f;
if (fabsf(a - 0.1f) < EPS) { // fabsf, not abs: abs() takes an int
    printf("IF\n");
} else {
    printf("ELSE\n");
}
Effectively, this tests if a is 'close enough' to 0.1f instead of exact equality. For 99% of applications, this approach works just fine, but for super-sensitive calculations some stranger tricks are needed that involve using long double, or defining a custom data type.
You are using two data types: in if (x == 0.1), the literal 0.1 is automatically a double, while x is a float. These types differ in how they store the value: 0.1 is not 0.1f. As a double it is stored as roughly 0.10000000000000000555, while as a float it is roughly 0.10000000149011612.
Say
#include <cstdint>
#include <iostream>
int main() {
    int64_t x = (1ULL << 53);
    std::cout << x << std::endl;
    x += 1.0;  // the addition is done in double
    std::cout << x << std::endl;
}
The value printed for x is the same both times: 9007199254740992.
However, x += 1; does increment x correctly.
Moreover, starting from 1ULL << 52, adding 1.0 also gives the correct result.
I think it could be floating-point imprecision. Could someone give me more details on that?
The line x += 1.0 is evaluated as
x = (int64_t)((double)x + (double)1.0);
The number 2^53 + 1 = 9007199254740993 can't be represented exactly as an IEEE double, so it's rounded to 2^53 = 9007199254740992 (this depends on the current rounding mode, actually), which is then (losslessly) converted back to an int64_t.
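A short sketch making the rounding step explicit (assuming IEEE 754 doubles, which have a 53-bit significand):
#include <cstdio>

int main() {
    double d53 = (double)(1LL << 53) + 1.0; // 2^53 + 1 rounds back to 2^53
    double d52 = (double)(1LL << 52) + 1.0; // still exact below 2^53
    printf("%lld\n", (long long)d53); // 9007199254740992
    printf("%lld\n", (long long)d52); // 4503599627370497
}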
x += 1.0;
The expression x + 1.0 is done with floating-point arithmetic.
Assuming IEEE-754 is used, the double precision floating-point type can represent integers up to 2^53 exactly.
Code:
#include <cmath>
#include <cstdio>
#include <iostream>
int main() {
    double v = (180*9.8)/(42*42); // v should be 1.000000
    printf("%f ", v);
    std::cout << std::asin(v);
}
Output:
1.000000
nan
I am using 64-bit mingw (win 7).
This is because v is greater than 1 (when the (180*9.8)/(42*42) is evaluated using double precision floating point).
double v = (180*9.8)/(42*42);
std::cout.precision(20);
std::cout << std::fixed << v << std::endl;
std::cout << std::asin(v) << std::endl; // nan: v > 1
Output:
1.00000000000000022204
nan
To guard against this finite-precision problem, you can clamp the value before calling asin():
if (v > 1)
    v = 1;
else if (v < -1)
    v = -1;
9.8 is a value that can't be represented exactly in floating point. That means, the actual value stored is equal to 9.8 + delta, where delta is a small value which may be positive or negative.
If delta is positive for your floating point representation (presumably IEEE), then 180*9.8 will be greater than 1764, so the value of v will exceed 1. The only valid inputs for asin() are in the range -1 to 1. Although the return value from asin() is not specified for values outside that range, a NaN is one way of reporting that.
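A compact version of the same guard, as a sketch assuming C++17 for std::clamp:
#include <algorithm> // std::clamp (C++17)
#include <cmath>
#include <cstdio>

int main() {
    double v = (180 * 9.8) / (42 * 42);             // slightly above 1 in double
    double a = std::asin(std::clamp(v, -1.0, 1.0)); // force v into asin's domain
    printf("%f\n", a);                              // ~1.570796, i.e. pi/2
}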
With float a = ...; and float inva = 1/a; is x / a the same as x * inva?
And what about this case:
unsigned i = ...;
float v1 = static_cast<float>(i) / 4294967295.0f;
float scl = 1.0f / 4294967295.0f;
float v2 = static_cast<float>(i) * scl;
Is v1 equal to v2 for all unsigned integers?
is v1 equal to v2 for all unsigned integers?
Yes, because 4294967295.0f is, perhaps surprisingly, a power of two: 4294967295 = 2^32 - 1 needs more significand bits than a float has, so the literal rounds to 4294967296.0f = 2^32. Division and multiplication by the reciprocal are equivalent when the divisor is a power of two (assuming the computation of the reciprocal does not overflow or underflow to zero).
Division and multiplication by the reciprocal are not equivalent in general, only in the particular case of powers of two. The reason is that for (almost all) powers of two y, the computation of 1 / y is exact, so that x * (1 / y) only rounds once, just like x / y only rounds once.
No, the result will not always be the same. The way you group the operands in floating point multiplication, or division in this case, has an effect on the numerical accuracy of the answer. Thus, the product a*(1/b) might differ from a/b. Check the wikipedia article http://en.wikipedia.org/wiki/Floating_point#Accuracy_problems.
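A quick experiment supporting the general-case answer (a sketch; the exact mismatch count varies by platform and rounding mode, and the divisor 3.0f is an arbitrary non-power-of-two choice):
#include <cstdio>

int main() {
    const float a = 3.0f;        // not a power of two: 1/a is inexact
    const float inva = 1.0f / a; // rounded reciprocal
    int mismatches = 0;
    for (int i = 1; i <= 1000000; ++i) {
        float x = (float)i;
        if (x / a != x * inva)   // double rounding in x * inva can differ
            ++mismatches;
    }
    printf("%d of 1000000 values differ\n", mismatches);
}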
Vincent answered Fast Arc Cos algorithm by suggesting this function.
const float pi = 3.14159265f; // assumed definition; not shown in the original answer

float arccos(float x)
{
    x = 1 - (x + 1);
    return pi * x / 2;
}
The question is, why x = 1 - (x + 1) and not x = -x?
It returns a different result only when (x + 1) causes a loss of precision, that is, x is many orders of magnitude larger or smaller than one.
But I don't think this is tricky or sleight of hand; I think it's just plain wrong:
cos(0) = 1 but f(1) = -pi/2
cos(pi/2) = 0 but f(0) = 0
cos(pi) = -1 but f(-1) = pi/2
where f(x) is Vincent's arccos implementation. All of them are off by pi/2; a linear approximation that gets at least these three points correct would be
g(x) = (1 - x) * pi / 2
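For what it's worth, here is a quick sketch comparing this linear approximation g(x) with std::acos at the three points above (the names pi and g follow the discussion):
#include <cmath>
#include <cstdio>

const float pi = 3.14159265f;

float g(float x) { return (1 - x) * pi / 2; } // linear approximation

int main() {
    for (float x : {1.0f, 0.0f, -1.0f})
        printf("x=%5.1f  g=%f  acos=%f\n", x, g(x), std::acos(x));
}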
I don't see the details instantly, but think about what happens as x approaches 1 or -1 from either side, and consider roundoff error.
Addition forces both operands to be normalized (in this case, that matters for x). IIRC, in Knuth's Volume 2, in the chapter on floating-point arithmetic, you can even see an expression like x+0 used for exactly this purpose.