C++ floating point precision [duplicate] - c++

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Floating point inaccuracy examples
double a = 0.3;
std::cout.precision(20);
std::cout << a << std::endl;
result: 0.2999999999999999889
double a, b;
a = 0.3;
b = 0;
for (char i = 1; i <= 50; i++) {
b = b + a;
};
std::cout.precision(20);
std::cout << b << std::endl;
result: 15.000000000000014211
So 'a' is smaller than it should be, but if we add 'a' 50 times, the result is bigger than it should be.
Why is this?
And how can I get the correct result in this case?

To get the correct results, don't request more precision than is actually available for this numeric type:
#include <iostream>
#include <limits>

int main()
{
    double a = 0.3;
    std::cout.precision(std::numeric_limits<double>::digits10);
    std::cout << a << std::endl;

    double b = 0;
    for (char i = 1; i <= 50; i++) {
        b = b + a;
    }

    std::cout.precision(std::numeric_limits<double>::digits10);
    std::cout << b << std::endl;
}
Although if that loop runs for 5000 iterations instead of 50, the accumulated error will show up even with this approach -- it's just how floating-point numbers work.
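For example, here is a quick sketch of that effect (the exact digits depend on your platform, but with ordinary IEEE 754 doubles the sum will typically not come out as exactly 1500):
#include <iostream>
#include <limits>

int main()
{
    double a = 0.3;
    double b = 0;
    for (int i = 1; i <= 5000; i++) {  // int, not char: 5000 doesn't fit in a char
        b = b + a;                     // every addition contributes a tiny rounding error
    }
    std::cout.precision(std::numeric_limits<double>::digits10);
    std::cout << b << std::endl;       // typically prints something close to, but not exactly, 1500
}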

Why is this?
Because floating-point numbers are stored in binary, in which 0.3 is 0.01001100110011001... repeating, just like 1/3 is 0.333333... repeating in decimal. When you write 0.3, you actually get 0.299999999999999988897769753748434595763683319091796875 (the infinite binary representation rounded to 53 significant bits).
Keep in mind that for the applications for which floating-point is designed, it's not a problem that you can't represent 0.3 exactly. Floating-point was designed to be used with:
Physical measurements, which are often measured to only 4 sig figs and never to more than 15.
Transcendental functions like logarithms and the trig functions, which are only approximated anyway.
For both of these, binary-decimal conversion error is pretty much irrelevant compared to other sources of error.
Now, if you're writing financial software, for which $0.30 means exactly $0.30, it's different. There are decimal arithmetic classes designed for this situation.
And how to get correct result in this case?
Limiting the precision to 15 significant digits is usually enough to hide the "noise" digits. Unless you actually need an exact answer, this is usually the best approach.

Computers store floating point numbers in binary, not decimal.
Many numbers that look ordinary in decimal, such as 0.3, have no exact representation of finite length in binary.
Therefore, the compiler picks the closest number that has an exact binary representation, just like you write 0.33333 for 1⁄3.
If you add many floating-point numbers, these tiny differences add up, and you get unexpected results.

It's not that it's bigger or smaller, it's just that it's physically impossible to store "0.3" as an exact value inside a binary floating point number.
The way to get the "correct" result is to not display 20 decimal places.

To get the "correct" result, try
List of Arbitrary-precision arithmetic Libraries from Wikipedia:
http://en.wikipedia.org/wiki/Arbitrary-precision
or
http://speleotrove.com/decimal
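For instance, here is a sketch using one such library, Boost.Multiprecision's decimal floating-point type (this assumes Boost is available; the value is constructed from a string so it is stored as the decimal 0.3 rather than as an already-rounded double):
#include <iostream>
#include <boost/multiprecision/cpp_dec_float.hpp>

int main()
{
    using boost::multiprecision::cpp_dec_float_50;  // 50 decimal digits of precision

    cpp_dec_float_50 a("0.3");  // parsed as a decimal value, so it really is 0.3
    cpp_dec_float_50 b = 0;
    for (int i = 1; i <= 50; i++) {
        b += a;
    }
    std::cout << b << std::endl;  // prints 15
}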

Related

Errors in Casting Doubles to Integers [duplicate]

This question already has answers here:
Round a float to a regular grid of predefined points
(11 answers)
Closed 4 years ago.
I am calculating the number of significant numbers past the decimal point. My program discards any numbers that are spaced more than 7 orders of magnitude apart after the decimal point. Expecting some error with doubles, I accounted for very small numbers popping up when subtracting ints from doubles, even when it looked like it should equal zero (To my knowledge this is due to how computers store and compute their numbers). My confusion is why my program does not handle this unexpected number given this random test value.
Having put in many cout statements, it would seem that it messes up when it tries to cast the final 2. Whenever it casts, it casts to 1 instead.
bool flag = true;
long double test = 2029.00012;
int count = 0;

while (flag)
{
    test = test - static_cast<int>(test);
    if (test <= 0.00001)
    {
        flag = false;
    }
    test *= 10;
    count++;
}
The solution I found was to cast only once at the beginning, as rounding may produce a negative and terminate prematurely, and to round thenceforth. The interesting thing is that both trunc and floor also had this issue, seemingly turning what should be a 2 into a 1.
My Professor and I were both quite stumped as I fully expected small numbers to appear (most were in the 10^-10 range), but was not expecting that casting, truncing, and flooring would all also fail.
It is important to understand that not all rational numbers are representable in finite precision. Also, it is important to understand that set of numbers which are representable in finite precision in decimal base, is different from the set of numbers that are representable in finite precision in binary base. Finally, it is important to understand that your CPU probably represents floating point numbers in binary.
2029.00012 in particular happens to be a number that is not representable in a double precision IEEE 754 floating point (and it indeed is a double precision literal; you may have intended to use long double instead). It so happens that the closest number that is representable is 2029.000119999999924402800388634204864501953125. So, you're counting the significant digits of that number, not the digits of the literal that you used.
If the intention of 0.00001 was to stop counting digits when the number is close to a whole number, it is not sufficient to check whether the value is less than the threshold, but also whether it is greater than 1 - threshold, as the representation error can go either way:
if(test <= 0.00001 || test >= 1 - 0.00001)
After all, you could take a number like 0.99999999999999999999999999, subtract its integer part and multiply by 10 many times before the result becomes close to zero, even though that number is very close to a whole number.
As multiple people have already commented, that won't work because of limitations of floating-point numbers. You had a somewhat correct intuition when you said that you expected "some error" with doubles, but that is ultimately not enough. Running your specific program on my machine, the closest representable double to 2029.00012 is 2029.0001199999999244 (this is actually a truncated value, but it shows the series of 9's well enough). For that reason, when you multiply by 10, you keep finding new significant digits.
Ultimately, the issue is that you are manipulating a base-2 real number like it's a base-10 number. This is actually quite difficult. The most notorious use cases for this are printing and parsing floating-point numbers, and a lot of sweat and blood went into that. For example, it wasn't that long ago that you could trick the official Java implementation into looping endlessly trying to convert a String to a double.
Your best shot might be to just reuse all that hard work. Print to 7 digits of precision, and subtract the number of trailing zeroes from the result:
#include <iostream>
#include <sstream>
#include <iomanip>
#include <string>

int main() {
    long double d = 2029.00012;

    std::stringstream stream;
    stream << std::fixed << std::setprecision(7) << d;
    auto double_string = stream.str();

    auto first_decimal_index = double_string.find('.') + 1;
    auto last_nonzero_index = double_string.find_last_not_of('0');
    if (last_nonzero_index == std::string::npos) {
        std::cout << "7 significant digits\n";
    } else if (last_nonzero_index < first_decimal_index) {
        // All printed decimals were zero; the negation needs a signed type.
        std::cout << -static_cast<long long>(first_decimal_index - last_nonzero_index + 1)
                  << " significant digits\n";
    } else {
        std::cout << (last_nonzero_index - first_decimal_index + 1) << " significant digits\n";
    }
}
It feels unsatisfactory, but:
it correctly prints 5;
the "satisfactory" alternative is possibly significantly harder to implement.
It seems to me that your second-best alternative is to read up on floating-point printing algorithms and implement just enough of one to get the length of the value that you're going to print, and that's not exactly an introductory-level task. If you decide to go this route, the current state of the art is the Grisu2 algorithm. Grisu2 has the notable benefit that it will always print the shortest base-10 string that will produce the given floating-point value, which is what you seem to be after.
If you want sane results, you can't just truncate the digits, because sometimes the floating point number will be a hair less than the rounded number. If you want to fix this via a fluke, change your initialization to be
long double test = 2029.00012L;
If you want to fix it for real,
bool flag = true;
long double test = 2029.00012;
int count = 0;

while (flag)
{
    test = test - static_cast<int>(test + 0.000005);
    if (test <= 0.00001)
    {
        flag = false;
    }
    test *= 10;
    count++;
}
My apologies for butchering your haphazard indentation; I can't abide it. According to one of my CS professors, "ideally, a computer scientist never has to worry about the underlying hardware." I'd guess your CS professor might have similar thoughts.

Different rounding results in terms of floating number ending with 5 in C++

I was looking for a method to round float numbers in C++ this morning, and I found this answer, which solved my problem.
However, I noticed something unusual. When I try to round certain float numbers to two decimal places, numbers such as 1.075 and 1.895 seem to follow different rounding rules. Specifically, with the following simple code:
#include <iostream>
#include <iomanip>

int main(int argc, char** argv)
{
    float testme[] = { 1.07500, 1.89500, 2.70500, 3.47500 };
    std::cout << std::setprecision(2) << std::fixed;
    for (int i = 0; i < 4; ++i)
    {
        std::cout << testme[i] << std::endl;
    }
    return 0;
}
The result I have is
1.08
1.89
2.70
3.47
So 1.075 turns to 1.08 while 1.895 becomes 1.89. This really confused me. I would appreciate some explanation. Thanks!
I believe this is an issue with floating-point numbers not being able to precisely represent the number 1.895. The closest floating point value to 1.895 that the computer can store is actually
1.894999980926513671875
which, if rounded to two decimal places, actually should be 1.89, since after looking at the next digit you'd round down.
I managed to get the above number using this tool, which might also come in handy for explaining other values.
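If you prefer to see this without an external tool, printing with far more digits than a float actually carries reveals the exact stored value (a small sketch; it assumes the usual IEEE 754 32-bit float):
#include <iostream>
#include <iomanip>

int main()
{
    float f = 1.895f;
    // Ask for many more digits than a float holds so the full stored value becomes visible.
    std::cout << std::setprecision(30) << f << std::endl;
    // Typically prints 1.894999980926513671875 (some standard libraries stop printing digits earlier).
}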
Most decimal numbers cannot be represented exactly in binary format, so they have to be rounded. IEEE 754 defines several rounding modes for this; I think you are seeing this one (quoted from Wikipedia):
Round to nearest, ties to even – rounds to the nearest value; if the number falls midway it is rounded to the nearest value with an even (zero) least significant bit; this is the default for binary floating-point and the recommended default for decimal.
Basically, decimal numbers which end with digit 5 are sometimes rounded up and sometimes rounded down to avoid a statistical drift.
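As a side note, you can watch the round-half-to-even rule itself in action by printing halves that are exactly representable in binary, so the only rounding happens during output formatting (a sketch; it assumes the common round-to-nearest-even behaviour of the standard library's formatting):
#include <iostream>
#include <iomanip>

int main()
{
    // 0.5, 1.5 and 2.5 are all exactly representable doubles.
    std::cout << std::fixed << std::setprecision(0)
              << 0.5 << ' ' << 1.5 << ' ' << 2.5 << std::endl;
    // Typically prints: 0 2 2 -- ties go to the even neighbour.
}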

Why comparing double and float leads to unexpected result? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
strange output in comparision of float with float literal
float f = 1.1;
double d = 1.1;
if(f == d) // returns false!
Why is it so?
The important factors under consideration with float or double numbers are:
Precision & Rounding
Precision:
The precision of a floating point number is how many digits it can represent without losing any information it contains.
Consider the fraction 1/3. The decimal representation of this number is 0.33333333333333… with 3's going out to infinity. A number of infinite length would require infinite memory to store with exact precision, but float and double types have only 4 or 8 bytes. Thus float and double values can only store a certain number of digits, and the rest are bound to get lost. There is simply no accurate way to represent, in a float or a double, a number that requires more precision than the type can hold.
Rounding:
There are non-obvious differences between binary and decimal (base 10) numbers.
Consider the fraction 1/10. In decimal, this can be easily represented as 0.1, and 0.1 can be thought of as an easily representable number. However, in binary, 0.1 is represented by the infinite sequence: 0.00011001100110011…
An example:
#include <iomanip>
int main()
{
using namespace std;
cout << setprecision(17);
double dValue = 0.1;
cout << dValue << endl;
}
This output is:
0.10000000000000001
And not
0.1.
This is because the double had to truncate the approximation due to its limited memory, which results in a number that is not exactly 0.1. Such a scenario is called a rounding error.
Whenever you compare two close float or double values, such rounding errors kick in, and the comparison can yield incorrect results. This is the reason you should never compare floating-point values using ==.
The best you can do is to take their difference and check if it is less than an epsilon.
abs(x - y) < epsilon
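A complete, minimal version of that check (the helper name and the tolerance of 1e-6 are just for illustration; pick a tolerance that suits the magnitudes you are comparing):
#include <cmath>
#include <iostream>

// True when x and y differ by less than the given tolerance.
bool nearly_equal(double x, double y, double epsilon = 1e-6)
{
    return std::fabs(x - y) < epsilon;
}

int main()
{
    float f = 1.1f;
    double d = 1.1;
    std::cout << (f == d) << std::endl;           // 0: the float and double roundings differ
    std::cout << nearly_equal(f, d) << std::endl; // 1: they agree to within the tolerance
}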
Try running this code, the results will make the reason obvious.
#include <iomanip>
#include <iostream>

int main()
{
    std::cout << std::setprecision(100) << (double)1.1 << std::endl;
    std::cout << std::setprecision(100) << (float)1.1 << std::endl;
    std::cout << std::setprecision(100) << (double)((float)1.1) << std::endl;
}
The output:
1.100000000000000088817841970012523233890533447265625
1.10000002384185791015625
1.10000002384185791015625
Neither float nor double can represent 1.1 accurately. When you try to do the comparison the float number is implicitly upconverted to a double. The double data type can accurately represent the contents of the float, so the comparison yields false.
Generally you shouldn't compare floats to floats, doubles to doubles, or floats to doubles using ==.
The best practice is to subtract them, and check if the absolute value of the difference is less than a small epsilon.
if(std::fabs(f - d) < std::numeric_limits<float>::epsilon())
{
// ...
}
One reason is that floating-point numbers are (more or less) binary fractions, and can only approximate many decimal numbers. Many decimal numbers have repeating, non-terminating binary expansions, which must be cut off at some point. This introduces a rounding error.
From wikipedia:
For instance, 1/5 cannot be represented exactly as a floating point number using a binary base but can be represented exactly using a decimal base.
In your particular case, a float and a double will have different rounding for the repeating binary fraction that must be used to represent 1.1. You will be hard pressed to get them to be "equal" after their corresponding conversions have introduced different levels of rounding error.
The code I gave above solves this by simply checking if the values are within a very short delta. Your comparison changes from "are these values equal?" to "are these values within a small margin of error from each other?"
Also, see this question: What is the most effective way for float and double comparison?
There are also a lot of other oddities about floating point numbers that break a simple equality comparison. Check this article for a description of some of them:
http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
The IEEE 754 32-bit float can store: 1.1000000238...
The IEEE 754 64-bit double can store: 1.1000000000000000888...
See why they're not "equal"?
In IEEE 754, fractions are stored in powers of 2:
2^(-1), 2^(-2), 2^(-3), ...
1/2, 1/4, 1/8, ...
Now we need a way to represent 1.1. This is (a simplified version of) its 32-bit IEEE 754 representation (float):
1 + 2^(-4) + 2^(-5) + 2^(-8) + 2^(-9) + 2^(-12) + 2^(-13) + 2^(-16) + 2^(-17) + 2^(-20) + 2^(-21) + 2^(-23)
The 23 stored fraction bits are
00011001100110011001101
which gives
1.10000002384185791015625
With a 64-bit double it's even more accurate: the fraction doesn't stop at 2^(-23), it keeps going for about twice as many bits (down to 2^(-52)), so the stored value is much closer to 1.1.
Resources
IEEE 754 Converter (32-bit)
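If you want to inspect the bits locally rather than with an online converter, here is a rough sketch (it assumes float is an IEEE 754 binary32 value, which is true on virtually every current platform):
#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
    float f = 1.1f;

    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);           // copy the float's raw bytes

    std::uint32_t sign     = bits >> 31;           // 1 sign bit
    std::uint32_t exponent = (bits >> 23) & 0xFF;  // 8 exponent bits, biased by 127
    std::uint32_t fraction = bits & 0x7FFFFF;      // 23 fraction bits

    std::cout << "sign = " << sign
              << ", exponent = " << static_cast<int>(exponent) - 127
              << ", fraction bits = ";
    for (int i = 22; i >= 0; --i) {
        std::cout << ((fraction >> i) & 1);
    }
    std::cout << std::endl;
    // For 1.1f this typically prints:
    // sign = 0, exponent = 0, fraction bits = 00011001100110011001101
}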
Floats and doubles are stored in a binary format that cannot represent every number exactly (it's impossible to represent the infinitely many possible different numbers in a finite space).
As a result, they are rounded. A float has to round more than a double, because it is smaller, so 1.1 rounded to the nearest valid float is different from 1.1 rounded to the nearest valid double.
To see which numbers are valid floats and doubles, see Floating Point.

Why do simple doubles like 1.82 end up being 1.819999999645634565360? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Why does Visual Studio 2008 tell me .9 - .8999999999999995 = 0.00000000000000055511151231257827?
Hey, so I'm making a function to return the number of digits in a given number data type, but I'm having some trouble with doubles.
I figure out how many digits are in it by multiplying it by 10 billion and then taking away digits one by one until the double ends up being 0. However, when putting in a double such as .7904, I never exit the function, as it keeps taking away digits that never end up being 0, because the result of .7904 ends up being 7,903,999,988 and not 7,904,000,000.
How can I solve this problem? Thanks =) ! Oh, and any other feedback on my code is WELCOME!
Here's the code of my function:
/////////////////////// Numb_Digits() ////////////////////////////////////////////////////
#include <iostream>
#include <cmath>
using namespace std;

enum { DECIMALS = 10, WHOLE_NUMBS = 20, ALL = 30 };

template<typename T>
unsigned long int Numb_Digits(T numb, int scope)
{
    unsigned long int length = 0;
    switch (scope) {
    case DECIMALS:
        numb -= (int)numb;      // keep only the fractional part
        numb *= 10000000000;    // 10 bil (10 zeros)
        for (; numb != 0; length++)
            numb -= ((int)(numb / pow((double)10, (double)(9 - length)))) * pow((double)10, (double)(9 - length));
        break;
    case WHOLE_NUMBS:
        numb = (int)numb;       // keep only the whole part
        numb *= 10000000000;
        for (; numb != 0; length++)
            numb -= ((int)(numb / pow((double)10, (double)(9 - length)))) * pow((double)10, (double)(9 - length));
        break;
    case ALL:
        numb *= 10000000000;
        for (; numb != 0; length++)
            numb -= ((int)(numb / pow((double)10, (double)(9 - length)))) * pow((double)10, (double)(9 - length));
        break;
    default:
        break;
    }
    return length;
}

int main()
{
    double test = 345.6457;
    cout << Numb_Digits(test, ALL) << endl;
    cout << Numb_Digits(test, DECIMALS) << endl;
    cout << Numb_Digits(test, WHOLE_NUMBS) << endl;
    return 0;
}
It's because of their binary representation, which is discussed in depth here:
http://en.wikipedia.org/wiki/IEEE_754-2008
Basically, when a number can't be represented as is, an approximation is used instead.
To compare floats for equality, check whether their difference is less than an arbitrarily small tolerance.
The easy summary about floating-point arithmetic:
http://floating-point-gui.de/
Read this and you'll see the light.
If you're more on the math side, Goldberg's paper is always a nice read:
http://cr.yp.to/2005-590/goldberg.pdf
Long story short: real numbers are stored with finite precision, and the representable values are unevenly spaced, which leads to non-obvious behavior. This is not specific to the language; it's a consequence of how real numbers are handled as a whole.
This is because C++ (like most other languages) cannot store floating-point numbers with infinite precision.
Floating-point values are stored like this:
sign * coefficient * 10^exponent if you're using base 10 (computers actually use base 2, but the idea is the same).
The problem is that both the coefficient and the exponent are stored as finite integers.
This is a common problem when storing floating point in computer programs: you usually get a tiny rounding error.
The most common ways of dealing with this are:
Store the number as a fraction of two integers (x/y); a sketch of this approach follows below
Use a delta that allows small deviations (if abs(x - y) < delta)
Use a third-party library such as GMP that supports arbitrary-precision arithmetic
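Here is a minimal sketch of the fraction idea (the Fraction type is purely illustrative; it ignores overflow and normalisation):
#include <iostream>

// A value such as 1.82 kept as an exact fraction of integers: 182/100.
struct Fraction {
    long long num;  // numerator
    long long den;  // denominator
};

bool fraction_equals(Fraction a, Fraction b)
{
    // a/b == c/d  <=>  a*d == c*b, using only exact integer arithmetic
    return a.num * b.den == b.num * a.den;
}

int main()
{
    Fraction a{182, 100};  // exactly 1.82
    Fraction b{91, 50};    // also exactly 1.82
    std::cout << fraction_equals(a, b) << std::endl;  // prints 1
}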
Regarding your question about counting decimals:
There is no way of dealing with this if you get a double as input. You cannot be sure that the user actually sent 1.819999999645634565360 and not 1.82.
Either you have to change your input or change the way your function works.
More info on floating point can be found here: http://en.wikipedia.org/wiki/Floating_point
This is a consequence of how the IEEE floating-point standard represents numbers: stored values are approximations, and the error varies depending on the operations performed. Never write logic like if(float == float), ever!
Floating-point numbers are represented in the form significand × base^exponent (IEEE 754). In your case, float 1.82 = 1 + 0.5 + 0.25 + 0.0625 + ...
Since only a limited number of digits can be stored, there will be a rounding error whenever the number cannot be represented as a terminating expansion in the relevant base (base 2 in this case).
You should always check relative differences with floating point numbers, not absolute values.
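A sketch of such a relative comparison (the helper name and tolerance are illustrative; a production version also needs to handle values near zero, infinities and NaN):
#include <algorithm>
#include <cmath>
#include <iostream>

// True when x and y differ by at most rel_tol of the larger magnitude.
bool relatively_equal(double x, double y, double rel_tol = 1e-12)
{
    return std::fabs(x - y) <= rel_tol * std::max(std::fabs(x), std::fabs(y));
}

int main()
{
    double big = 1.0e15;
    std::cout << relatively_equal(big, big + 0.125) << std::endl;  // 1: negligible relative difference
    std::cout << relatively_equal(1.82, 1.83) << std::endl;        // 0: genuinely different values
}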
You need to read this, too.
Computers don't store floating point numbers exactly. To accomplish what you are doing, you could store the original input as a string, and count the number of characters.
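A minimal sketch of that idea, assuming the value arrives as text (for example from std::cin) rather than as an already-converted double:
#include <cstddef>
#include <iostream>
#include <string>

// Count the digits after the decimal point of a number given as text.
// Illustrative helper only; it does no input validation.
std::size_t decimal_digits(const std::string& text)
{
    auto dot = text.find('.');
    if (dot == std::string::npos)
        return 0;                  // no fractional part at all
    return text.size() - dot - 1;  // characters after the '.'
}

int main()
{
    std::string input;
    std::cin >> input;                                // e.g. "0.7904"
    std::cout << decimal_digits(input) << std::endl;  // prints 4 for "0.7904"
}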