Why comparing double and float leads to unexpected result? [duplicate] - c++

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
strange output in comparision of float with float literal
float f = 1.1;
double d = 1.1;
if(f == d) // returns false!
Why is it so?

The important factors under consideration with float or double numbers are:
Precision & Rounding
Precision:
The precision of a floating point number is how many digits it can represent without losing any information it contains.
Consider the fraction 1/3. Its decimal representation is 0.33333333333333… with the 3s going on to infinity. An infinitely long number would require infinite memory to be stored with exact precision, but float and double typically occupy only 4 or 8 bytes. Float and double numbers can therefore store only a certain number of digits, and the rest are lost. Consequently, a number that needs more precision than the type provides cannot be represented exactly.
Rounding:
There is a non-obvious difference between binary and decimal (base 10) numbers: a fraction that terminates in one base may repeat forever in the other.
Consider the fraction 1/10. In decimal, this can be easily represented as 0.1, and 0.1 can be thought of as an easily representable number. However, in binary, 0.1 is represented by the infinite sequence: 0.00011001100110011…
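The repeating pattern can be reproduced without any floating point at all. The sketch below is an illustration, not a library routine: binary_fraction_digits is a hypothetical helper that performs long division of num by den in base 2 using unsigned integers.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the first `count` binary digits of num/den, computed by
// exact long division in base 2 (assumes 0 <= num < den, and den small enough
// that 2 * den fits in an unsigned int).
std::string binary_fraction_digits(unsigned num, unsigned den, int count) {
    assert(den != 0 && num < den);
    std::string out = "0.";
    for (int i = 0; i < count; ++i) {
        num *= 2;            // double the remainder to produce the next binary digit
        if (num >= den) {    // the current binary place holds a 1
            out += '1';
            num -= den;
        } else {             // the current binary place holds a 0
            out += '0';
        }
    }
    return out;
}
```

binary_fraction_digits(1, 10, 20) yields "0.00011001100110011001": the remainder cycles through 2, 4, 8, 6, 2, ..., so the 0011 group repeats forever. For 1/2 the digits terminate after one 1, since 1/2 is a power of two.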
An example:
#include <iomanip>
#include <iostream>
int main()
{
using namespace std;
cout << setprecision(17);
double dValue = 0.1;
cout << dValue << endl;
}
This output is:
0.10000000000000001
And not
0.1.
This is because the double had to round the infinite binary expansion to fit its limited memory, which results in a number that is not exactly 0.1. Such a discrepancy is called a rounding error.
Whenever two close float or double values are compared, such rounding errors can make the comparison yield unexpected results. This is why you should never compare floating point numbers (float or double) using ==.
The best you can do is to take their difference and check if it is less than an epsilon.
abs(x - y) < epsilon
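A slightly more robust version of that check, sketched below under the assumption that a mixed relative/absolute tolerance suits your data, scales the allowed difference with the magnitude of the operands. approx_equal and its default tolerances are illustrative choices, not a standard API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if x and y are equal within a relative tolerance
// (scaled by the larger magnitude) or within an absolute tolerance (for values near 0).
bool approx_equal(double x, double y,
                  double rel_tol = 1e-6, double abs_tol = 1e-12) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    const double diff = std::fabs(x - y);
    if (diff <= abs_tol) {
        return true;  // covers comparisons against (or near) zero
    }
    const double largest = std::max(std::fabs(x), std::fabs(y));
    return diff <= largest * rel_tol;
}
```

With these defaults, approx_equal(1.1f, 1.1) is true while 1.1f == 1.1 is false, and approx_equal(1.0, 1.1) is false. A fixed absolute epsilon only behaves well for values near 1.0, which is why the tolerance here is scaled by the larger operand.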

Try running this code; the results make the reason obvious.
#include <iomanip>
#include <iostream>
int main()
{
std::cout << std::setprecision(100) << (double)1.1 << std::endl;
std::cout << std::setprecision(100) << (float)1.1 << std::endl;
std::cout << std::setprecision(100) << (double)((float)1.1) << std::endl;
}
The output:
1.100000000000000088817841970012523233890533447265625
1.10000002384185791015625
1.10000002384185791015625
Neither float nor double can represent 1.1 exactly. When you do the comparison, the float is implicitly converted to double. A double can represent every float value exactly, so the converted value is still 1.10000002384185791015625, which differs from the double's 1.100000000000000088817841970012523233890533447265625, and the comparison yields false.
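The promotion can be made explicit. This sketch (the function names are hypothetical) spells out what f == d does and contrasts it with narrowing d to float instead, which assumes you are happy to discard the double's extra precision:

```cpp
#include <cassert>

// Mirrors `if (f == d)` from the question: the float operand is promoted to
// double (exactly), so two different roundings of 1.1 are compared.
bool equal_as_double(float f, double d) {
    return static_cast<double>(f) == d;
}

// Alternative: narrow d to float first. This rounds d to the nearest float,
// which for 1.1 is the same value f already holds, at the cost of discarding
// the double's extra precision.
bool equal_as_float(float f, double d) {
    return f == static_cast<float>(d);
}
```

equal_as_double(1.1f, 1.1) is false and equal_as_float(1.1f, 1.1) is true, matching the printed values 1.10000002384185791015625 and 1.100000000000000088817841970012523233890533447265625 above.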

Generally you shouldn't compare floats to floats, doubles to doubles, or floats to doubles using ==.
The best practice is to subtract them, and check if the absolute value of the difference is less than a small epsilon.
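// needs <cmath> (std::fabs) and <limits> (std::numeric_limits)
if(std::fabs(f - d) < std::numeric_limits<float>::epsilon())
{
// f and d differ by less than float's epsilon; a fixed absolute tolerance like this only suits values near 1.0
}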
One reason is that floating point numbers are (more or less) binary fractions, and can only approximate many decimal numbers. Many decimal numbers necessarily become repeating binary fractions when converted, and cutting those off introduces a rounding error.
From wikipedia:
For instance, 1/5 cannot be represented exactly as a floating point number using a binary base but can be represented exactly using a decimal base.
In your particular case, float and double round the repeating binary fraction for 1.1 differently. You will be hard pressed to get them to be "equal" after their respective conversions have introduced different amounts of rounding error.
The code I gave above solves this by simply checking whether the values are within a small delta. Your comparison changes from "are these values equal?" to "are these values within a small margin of error of each other?"
Also, see this question: What is the most effective way for float and double comparison?
There are also a lot of other oddities about floating point numbers that break a simple equality comparison. Check this article for a description of some of them:
http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm

The IEEE 754 32-bit float can store: 1.1000000238...
The IEEE 754 64-bit double can store: 1.1000000000000000888...
See why they're not "equal"?
In IEEE 754, the fractional part is stored as a sum of negative powers of 2:
2^(-1), 2^(-2), 2^(-3), ...
1/2, 1/4, 1/8, ...
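Now we need a way to represent the 0.1 part of 1.1. In binary it is the never-ending sum 2^(-4) + 2^(-5) + 2^(-8) + 2^(-9) + 2^(-12) + 2^(-13) + ..., but the 32-bit IEEE 754 representation (float) keeps only 23 fraction bits after the leading 1, rounded to nearest. For 1.1 those bits are:
00011001100110011001101
i.e. 1 + 2^(-4) + 2^(-5) + ... + 2^(-20) + 2^(-21) + 2^(-23), which is exactly:
1.10000002384185791015625
(Stored on its own, 0.1 gets all 24 significant bits starting at 2^(-4): 2^(-4) + 2^(-5) + ... + 2^(-24) + 2^(-25) + 2^(-27).)
With a 64-bit double it's more accurate: there are 52 fraction bits instead of 23, so the sum continues down to 2^(-52) and, after rounding, ends in 2^(-48) + 2^(-49) + 2^(-51).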
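Those fraction bits can be checked directly by copying the raw bits of 1.1f and 1.1 into integers. This sketch assumes IEEE 754 binary32/binary64 types (checked with static_assert); the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "assumes IEEE 754 binary32 floats");
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "assumes IEEE 754 binary64 doubles");

// Raw bit patterns; std::memcpy avoids the strict-aliasing problems of pointer casts.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

std::uint64_t double_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

// The low 23 bits of a float and the low 52 bits of a double hold the fraction field.
std::uint32_t float_fraction(float f)   { return float_bits(f) & 0x7FFFFFu; }
std::uint64_t double_fraction(double d) { return double_bits(d) & 0xFFFFFFFFFFFFFull; }
```

float_fraction(1.1f) is 0x0CCCCD, i.e. 00011001100110011001101; double_fraction(1.1) is 0x199999999999A, whose final digits 9A (binary 1001 1010) end in the 2^(-48) + 2^(-49) + 2^(-51) tail.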
Resources
IEEE 754 Converter (32-bit)

Floats and doubles are stored in a binary format that cannot represent every number exactly (it's impossible to represent infinitely many different numbers in a finite space).
As a result, values are rounded. A float has to round more than a double because it has fewer bits, so 1.1 rounded to the nearest valid float is different from 1.1 rounded to the nearest valid double.
To see what numbers are valid floats and doubles see Floating Point

Related

Why is the difference between 2 double values wrongly calculated? [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 2 years ago.
I need to calculate the difference between 2 numbers given as strings, taking only the first digit after the decimal point. I convert them to double first and then calculate the difference as below:
#include <cstdlib>
#include <iostream>
#include <math.h>
#include <string>
using namespace std;
int main()
{
string v1 = "1568678435.244555";
string v2 = "1568678435.300111";
double s1 = atof(v1.substr(0,12).c_str()); // keep up to the first decimal digit and convert to double
double s2 = atof(v2.substr(0,12).c_str()); // keep up to the first decimal digit and convert to double
std::cout<<s1<<" "<<s2<<" "<<s2-s1<<endl;
if (s2-s1 >= 0.1)
cout<<"bigger";
else
cout<<"smaller";
return 0;
}
I expect the calculation to be 1568678435.3 - 1568678435.2 = 0.1. But this program prints:
1.56868e+09 1.56868e+09 0.0999999
smaller
Why is that and how to get the value that I want properly?
Floating point format has limited precision. Not all values are representable. For example, the number 1568678435.2 is not representable (in IEEE-754 binary64 format). The closest representable value is:
1568678435.2000000476837158203125
1568678435.3 is also not a representable value. The closest representable value is:
1568678435.2999999523162841796875
Given that the floating point values that you start with are not precise, it should hardly be surprising that the result of the calculation is also not precise. The floating point result of subtracting these numbers is:
0.099999904632568359375
Which is very close to 0.1, but not quite. The error of the calculation was:
0.000000095367431640625
Also note that 0.1 is itself not a representable number, so there is no way to get that as the result of a floating point operation no matter what your inputs are.
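These exact values can be reproduced by printing with enough fixed decimal places. exact_fixed is a hypothetical helper; it relies on the standard library printing the full decimal expansion, which mainstream implementations such as glibc do.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format x with `decimals` digits after the point.
// With enough digits, the text shows the exact binary value the double stores.
std::string exact_fixed(double x, int decimals) {
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << x;
    return out.str();
}
```

exact_fixed(1568678435.2, 22) gives "1568678435.2000000476837158203125", and exact_fixed(1568678435.3 - 1568678435.2, 21) gives "0.099999904632568359375", matching the numbers above.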
how to get the value that I want properly?
To print the value 0.1, simply round the output to a sufficiently coarse precision:
std::cout << std::fixed << std::setprecision(1) << s2-s1;
This works as long as the error of the calculation doesn't exceed half of the desired precision.
If you don't want to deal with any accuracy error in your calculation, then you mustn't use floating point numbers.
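One way to avoid floating point here, since only one decimal digit matters, is fixed-point arithmetic on integer tenths. The sketch assumes non-negative inputs made of digits with an optional '.', and to_tenths is a hypothetical helper that truncates like the question's substr(0, 12):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: parse a non-negative decimal string such as
// "1568678435.244555" into an integer count of tenths, dropping any digits
// after the first decimal place. No floating point is involved.
std::int64_t to_tenths(const std::string& s) {
    std::int64_t tenths = 0;
    std::size_t i = 0;
    for (; i < s.size() && s[i] != '.'; ++i) {
        assert(s[i] >= '0' && s[i] <= '9');
        tenths = tenths * 10 + (s[i] - '0');   // accumulate whole units
    }
    tenths *= 10;                              // whole units -> tenths
    if (i + 1 < s.size()) {                    // a digit follows the '.'
        assert(s[i + 1] >= '0' && s[i + 1] <= '9');
        tenths += s[i + 1] - '0';
    }
    return tenths;
}
```

to_tenths("1568678435.300111") - to_tenths("1568678435.244555") is exactly 1 (one tenth), so a comparison such as diff >= 1 gives the intended "bigger".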
You should round the difference between the values.
if (round((s2-s1) * 10) >= 1)
cout<<"bigger";
else
cout<<"smaller";

Convert float to string with precision and without tolerance

How can you convert float to string with specified precision without tolerance?
For example, with precision 6 get the following result.
40.432 -> 40.432000.
In a string the only value that I can get is 40.431999.
The problem is that you're using a float data type which only has a precision of at most 7.22 total digits and sometimes as little as 6 digits, and you're trying to display 8 total digits (2 before the decimal and 6 after). As noted in the comments, the closest possible binary float to 40.432 is 40.43199920654296875, the second closest would be 40.432003021240234375.
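The two candidates quoted above can be reproduced with std::nextafter. In this sketch float_neighbors is a hypothetical helper; it assumes IEEE 754 float and the default round-to-nearest mode:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: the float nearest to x and the next float above it,
// both returned as double (widening a float to double is exact).
struct FloatNeighbors {
    double nearest;
    double next_up;
};

FloatNeighbors float_neighbors(double x) {
    const float f = static_cast<float>(x);  // round x to the nearest float
    const float up = std::nextafter(f, std::numeric_limits<float>::infinity());
    return {f, up};
}
```

float_neighbors(40.432) gives nearest == 40.43199920654296875 and next_up == 40.432003021240234375; they differ by 2^(-18), the spacing of floats between 32 and 64.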
You can get more digits by converting to the larger double type. Once you've done that you can round to the nearest 6-digit number. Note that if the float was generated by a calculation, rounding may actually create a less accurate result.
If you always know your numbers will be between 10 and 100, this simple code will work. Otherwise you'll need a more complex process to determine the appropriate amount of rounding.
float f = 40.432;
double d = f;
double r = std::round(d * 10000.0) / 10000.0; // 2 digits before the decimal, 4 after
std::cout << std::fixed << std::setprecision(6) << r;
Note that the last 2 digits will always be zero because of the rounding.
See it in action: http://coliru.stacked-crooked.com/a/f085e56c03ebeb73
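Putting the rounding step and the string conversion together, a sketch could look like this. Both helper names are hypothetical, and the 4-decimal rounding assumes the value lies in [10, 100), as the answer states:

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: fixed notation with 6 digits after the point.
std::string format_fixed6(double x) {
    std::ostringstream strs;
    strs << std::fixed << std::setprecision(6) << x;
    return strs.str();
}

// Hypothetical helper: widen to double, round to 4 decimals (6 significant
// digits for values in [10, 100)), then format with 6 decimals.
std::string format_float_10_to_100(float f) {
    assert(f >= 10.0f && f < 100.0f);
    double d = f;                                  // exact: every float fits in a double
    double r = std::round(d * 10000.0) / 10000.0;  // nearest double to the 4-decimal value
    return format_fixed6(r);
}
```

format_fixed6(40.432f) yields "40.431999", the float's stored value rounded to 6 decimals, while format_float_10_to_100(40.432f) yields "40.432000".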
How can you convert float to string with specified precision
You can use a string stream:
std::ostringstream strs;
strs << std::fixed << std::setprecision(6) << the_value;
std::string str = strs.str();
In a string the only value that I can get is 40.431999.
Your problem may be that there exists no exact representation for 40.432 in the floating point format that your system uses. Since you can never have the floating point value 40.432, you can never convert such a value to a string.
It just so happens that the closest representable value to 40.432 is closer to 40.431999 than it is to 40.432.
You need to:
Either accept that 40.432 ~~ 40.431999
Or use a floating point format that is precise enough to have a representation for 40.432 that is closer to 40.432 than it is to 40.431999, and is also precise enough for all other numbers for which you have a specific expected value. IEEE 754 double precision floating point happens to have a representable value closer to 40.432 than 40.431999.
Or stop using floating point. You won't have problems like this if you use fixed point or arbitrary precision data types.

How to convert string to double with specified number of precision in c++

code snippet as below:
double value;
CString thestring(_T("1234567890123.4"));
value = _tcstod(thestring,NULL);
The value comes out as: 1234567890123.3999
expected value is:1234567890123.4
Basically you can use strtod or std::stod for the conversion and then round to your desired precision. For the actual rounding, a web search will provide lots of code examples.
But the details are more complicated than you might think: the string is (probably) a decimal representation of the number while the double is binary. I guess that you want to round to a specified number of decimal digits. The problem is that most decimal numbers cannot be exactly represented in binary. Even for numbers like 0.1 it is not possible.
You also need to define what kind of precision you are interested in. Is it the total number of digits (relative precision) or the number of digits after the decimal point (absolute precision).
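For relative precision, the stored double does not change, but the text can be rounded. In this sketch to_significant is a hypothetical helper, and 14 significant digits is assumed because that is how many the input string has:

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format x with `digits` significant digits in the default
// (not fixed) notation. The stored double is unchanged; only the text is rounded.
std::string to_significant(double x, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << x;
    return out.str();
}
```

to_significant(std::stod("1234567890123.4"), 14) gives "1234567890123.4", while 17 significant digits (std::numeric_limits<double>::max_digits10) reveal "1234567890123.3999".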
The floating-point double type cannot exactly represent the value 1234567890123.4; 1234567890123.3999 is the best it can represent and that is what the result is. Note that floating point types (e.g. IEEE-754) cannot exactly represent all real numbers, hence they use approximations in most cases.
To be more precise, in the IEEE-754 double-precision format 1234567890123.4 is stored as the hexadecimal value 4271F71FB04CB666: the sign bit is 0, and the 11 exponent bits and 52 significand bits are 10000100111 and 0001111101110001111110110000010011001011011001100110 respectively. The exponent field 10000100111 is 1063, i.e. 40 once the bias of 1023 is subtracted, so the stored value is $(-1)^{\text{sign}} \times 2^{\text{exponent} - 1023} \times (1.\text{significand bits})_2 = 1 \times 2^{40} \times 1.1228329550462148311851251492043957114219\ldots = 1234567890123.39990234375$.
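The field values above can be verified by copying the double's bits into an integer. This sketch assumes IEEE 754 binary64 doubles; decompose and from_fields are hypothetical names, and from_fields only handles normal numbers:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "assumes IEEE 754 binary64 doubles");

struct DoubleFields {
    unsigned sign;             // 1 bit
    unsigned biased_exponent;  // 11 bits; subtract 1023 for the actual exponent
    std::uint64_t fraction;    // 52 bits following the implicit leading 1
};

DoubleFields decompose(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return {static_cast<unsigned>(bits >> 63),
            static_cast<unsigned>((bits >> 52) & 0x7FF),
            bits & 0xFFFFFFFFFFFFFull};
}

// Rebuild a normal (finite, non-zero, non-subnormal) double from its fields:
// (-1)^sign * 2^(biased_exponent - 1023) * 1.fraction
double from_fields(const DoubleFields& f) {
    const double significand = 1.0 + std::ldexp(static_cast<double>(f.fraction), -52);
    const double magnitude = std::ldexp(significand, static_cast<int>(f.biased_exponent) - 1023);
    return f.sign ? -magnitude : magnitude;
}
```

decompose(1234567890123.4) gives sign 0, biased_exponent 1063 and fraction 0x1F71FB04CB666, and from_fields rebuilds exactly 1234567890123.39990234375.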
Note that not even a 128-bit floating point would store the value exactly. It would still result in 1234567890123.39999999999999999999991529670527456996609316774993203580379486083984375. Maybe you should instead use a fixed-point or rational number type.
std::stod is generic and doesn't give this kind of manipulation. Thus, you have to craft something of your own, like I did below using std::stringstream and <iomanip> facilities:
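#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
double stodpre(std::string const &str, std::size_t const p) {
std::stringstream sstrm;
// Note: std::setprecision and std::fixed only affect inserting floating-point values;
// inserting a std::string ignores them, so d ends up parsed exactly as std::stod would parse str.
sstrm << std::setprecision(p) << std::fixed << str << std::endl;
double d;
sstrm >> d;
return d;
}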
Live Demo
You cannot control the precision with which a decimal number is stored.
Decimal numbers are stored in binary using the floating point notation.
What you can do is to control the precision of what is displayed on outputting the number.
For example, do this to control the precision of the output to 2 digits -
std::cout << std::fixed;
std::cout << std::setprecision(2);
std::cout << value;
You can give any number for the precision.

IEEE 754 floating point, what is the largest number < 1?

When using IEEE 754 floating point representation (double type in c++), numbers that are very close to (representable) integers are rounded to their closest integer and represented exactly. Is that true?
Exactly how close does a number have to be to the nearest representable integer before it is rounded?
Is this distance constant?
For example, given that 1 can be represented exactly, what is the largest double less than 1?
When using IEEE 754 floating point representation (double type in c++), numbers that are very close to (representable) integers are rounded to their closest integer and represented exactly.
This depends on whether the number is closer to the integer than to any other representable value. 0.99999999999999994 is not equal to 1, but 0.99999999999999995 is.
Is this distance constant?
No, the distance grows with larger magnitudes, in particular with larger exponents in the representation. Larger exponents imply larger intervals to be covered by the same number of mantissa bits, which in turn implies less absolute precision.
For example, what is the largest double less than 1?
std::nexttoward(1.0, 0.0). E.g. 0.999999999999999889 on Coliru.
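A small sketch of that answer (function names hypothetical, IEEE 754 doubles assumed) uses std::nextafter, which steps to the adjacent representable double just like std::nexttoward:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559, "assumes IEEE 754 doubles");

// The largest double below 1.0: one representable step from 1.0 toward 0.0.
// (std::nexttoward(1.0, 0.0), as in the answer, gives the same value.)
double largest_below_one() { return std::nextafter(1.0, 0.0); }

// The gap from 1.0 to the next double above it; this equals
// std::numeric_limits<double>::epsilon().
double gap_above_one() { return std::nextafter(1.0, 2.0) - 1.0; }
```

largest_below_one() equals 1 - 2^(-53), and 1.0 - largest_below_one() is half of gap_above_one(). The literal 0.99999999999999994 is converted to that value, while 0.99999999999999995 is converted to 1.0, because the midpoint 1 - 2^(-54) lies between them.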
You will find much more definitive statements regarding the opposite direction from 1.0
The difference between 1.0 and the next larger number is documented here:
std::numeric_limits<double>::epsilon()
The way floating point works, the next smaller number should be exactly half as far away from 1.0 as the next larger number.
The first IEEE double below 1 can be written unambiguously as 0.99999999999999989, but is exactly 0.99999999999999988897769753748434595763683319091796875.
The distance is not constant; it depends on the exponent (and thus the magnitude) of the number. Eventually the gap becomes larger than 1, meaning that even integers (in the sense of "integers too", not as opposed to odd; odd integers are the first to get rounded) will get rounded somewhat (or, eventually, a lot).
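The growing gap can be measured with std::nextafter. In this sketch gap_above is a hypothetical helper and IEEE 754 doubles are assumed:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: distance from a positive finite x to the next larger
// double (returns infinity for the largest finite double).
double gap_above(double x) {
    assert(x > 0 && std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}
```

gap_above(1.0) is 2^(-52), gap_above(2^52) is 1 and gap_above(2^53) is 2; accordingly 2^53 + 1 computed in double rounds back to 2^53, making it the first odd integer that cannot be stored.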
For positive values, the binary representations of increasing IEEE floating point numbers can be seen as increasing integer representations:
Sample Hack (Intel):
#include <cstdint>
#include <cstring>
#include <iostream>
#include <limits>
int main() {
double one = 1;
// Copy the bytes with std::memcpy; dereferencing a reinterpret_cast pointer violates strict aliasing.
std::uint64_t one_representation;
std::memcpy(&one_representation, &one, sizeof one);
std::uint64_t lesser_representation = one_representation - 1;  // previous bit pattern
double lesser;
std::memcpy(&lesser, &lesser_representation, sizeof lesser);
std::cout.precision(std::numeric_limits<double>::digits10 + 1);
std::cout << std::hex;  // affects the integers only, not the doubles
std::cout << lesser
<< " [" << lesser_representation
<< "] < " << one
<< " [" << one_representation
<< "]\n";
}
Output:
0.9999999999999999 [3fefffffffffffff] < 1 [3ff0000000000000]
As the integer representation advances toward its limits, the difference between consecutive floating point numbers increases each time the exponent bits change.
See also: http://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
When using IEEE 754 floating point representation (double type in c++), numbers that are very close to exact integers are rounded to the closest integer and represented exactly. Is that true?
This is false.
Exactly how close does a number have to be to the nearest int before it is rounded?
When you do a binary to string conversion the floating point number gets rounded to the current precision (for printf family of functions the default precision is 6) using the current rounding mode.
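The effect of the output precision can be seen by formatting the largest double below 1 two ways. printf_format is a hypothetical wrapper around std::snprintf:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format one double with a printf-style conversion.
std::string printf_format(const char* fmt, double x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, fmt, x);
    return buf;
}
```

printf_format("%f", 0.99999999999999989) yields "1.000000" (default precision 6), whereas "%.17g" yields "0.99999999999999989"; the stored value itself was never rounded to 1.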

0.1 float is greater than 0.1 double. I expected it to be false [duplicate]

This question already has answers here:
If operator< works properly for floating-point types, why can't we use it for equality testing?
(5 answers)
Closed 9 years ago.
Let:
double d = 0.1;
float f = 0.1;
should the expression
(f > d)
return true or false?
Empirically, the answer is true. However, I expected it to be false.
Since 0.1 cannot be represented exactly in binary, and double has 15 to 16 decimal digits of precision while float has only about 7, I expected both to be less than 0.1, with the double closer to 0.1.
I need an exact explanation for the true.
I'd say the answer depends on the rounding mode when converting the double to float. float has 24 binary bits of precision, and double has 53. In binary, 0.1 is:
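0.1₁₀ = 0.0001100110011001100110011001100110011001100110011…₂
Counting significant bits from the first 1 (the 2^(-4) place), bits 1, 10, 20 and 24 fall at the 2^(-4), 2^(-13), 2^(-23) and 2^(-27) places; a float keeps bits 1 through 24.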
So if we round up at the 24th digit, we'll get
0.1₁₀ ~ 0.000110011001100110011001101
which is greater than the exact value and the more precise approximation at 53 digits.
The number 0.1 will be rounded to the closest floating-point representation with the given precision. This approximation might be either greater than or less than 0.1, so without looking at the actual values, you can't predict whether the single precision or double precision approximation is greater.
Here's what the double precision value gets rounded to (using a Python interpreter):
>>> "%.55f" % 0.1
'0.1000000000000000055511151231257827021181583404541015625'
And here's the single precision value:
>>> "%.55f" % numpy.float32("0.1")
'0.1000000014901161193847656250000000000000000000000000000'
So you can see that the single precision approximation is greater.
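The same comparison in C++, written as a sketch with hypothetical function names and assuming IEEE 754 float and double:

```cpp
#include <cassert>

// Same comparison as the question: f is promoted to double (exactly) before
// the comparison, so the float rounding of 0.1 is compared with the double
// rounding of 0.1.
bool float_tenth_greater_than_double_tenth() {
    const double d = 0.1;
    const float f = 0.1f;
    return f > d;
}

// The float rounding of 0.1, widened to double so it can be inspected.
double float_tenth_as_double() { return static_cast<double>(0.1f); }
```

float_tenth_greater_than_double_tenth() returns true, and float_tenth_as_double() is exactly 0.100000001490116119384765625, the single-precision value printed above.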
If you convert .1 to binary you get:
0.000110011001100110011001100110011001100110011001100...
repeating forever
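Mapping to data types, note that the bit strings below are the 23- and 52-bit fraction fields of 1.1, that is, the 0.1 part stored after the leading 1:
float(1.1) = 1.00011001100110011001101₂ (note the rounding up in the last bit)
double(1.1) = 1.0001100110011001100110011001100110011001100110011010₂
Convert that to base 10:
float(1.1) = 1.10000002384185791015625
double(1.1) = 1.100000000000000088817841970012523233890533447265625
0.1 on its own rounds the same way: float(0.1) = 0.100000001490116119384765625 is greater than double(0.1) = 0.1000000000000000055511151231257827021181583404541015625.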
This was taken from an article written by Bruce Dawson. It can be found here:
Doubles are not floats, so don’t compare them
I think Eric Lippert's comment on the question is actually the clearest explanation, so I'll repost it as an answer:
Suppose you are computing 1/9 in 3-digit decimal and 6-digit decimal. 0.111 < 0.111111, right?
Now suppose you are computing 6/9. 0.667 > 0.666667, right?
You can't have it that 6/9 in three digit decimal is 0.666 because that is not the closest 3-digit decimal to 6/9!
Since it can't be exactly represented, comparing 1/10 in base 2 is like comparing 1/7 in base 10.
1/7 = 0.142857142857... but comparing at different base 10 precisions (3 versus 6 decimal places) we have 0.143 > 0.142857.
Just to add to the other answers talking about IEEE-754 and x86: the issue is even more complicated than they make it seem. There is not "one" representation of 0.1 in IEEE-754 - there are two. Either rounding the last digit down or up would be valid. This difference can and does actually occur, because x86 does not use 64-bits for its internal floating-point computations; it actually uses 80-bits! This is called double extended-precision.
So, even among x86 compilers alone, it sometimes happens that the same number is represented in two different ways, because some compute its binary representation with 64 bits, while others use 80.
In fact, it can happen even with the same compiler, even on the same machine!
#include <iostream>
#include <cmath>
void foo(double x, double y)
{
if (std::cos(x) != std::cos(y)) {
std::cout << "Huh?!?\n"; //← you might end up here when x == y!!
}
}
int main()
{
foo(1.0, 1.0);
return 0;
}
See Why is cos(x) != cos(y) even though x == y? for more info.
The rank of double is greater than that of float in conversions, so in the comparison f is converted to double. Unsuffixed floating literals are double; suffixing the literal with f makes it a float literal, but float f = 0.1 already rounds to the same float as 0.1f, so the result of f > d does not change. Also note that in C, (f > d) has type int, so it must be printed with %d; printing it with %f is undefined behavior, which is how a misleading 0.00 can appear. The corrected program prints 1:
#include <stdio.h>
#include <float.h>
int main()
{
double d = 0.1;
float f = 0.1f;
printf("%d\n", (f > d)); /* int result: prints 1 */
return 0;
}