Losing precision with floating point numbers (double) in c++ - c++

I'm trying to assign a big double value to a variable and print it on the console. The number I supply is different from what is displayed as output. Is it possible to get the double value correctly assigned and output without loss of precision? Here is the C++ code:
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <limits>
int main( int argc, char *argv[] ) {
// turn off scientific notation on floating point numbers
std::cout << std::fixed << std::setprecision( 3 );
// maximum double value on my machine
std::cout << std::numeric_limits<double>::max() << std::endl;
// string representation of the double value I want to get
std::cout << "123456789123456789123456789123456789.01" << std::endl;
// value I supplied
double d = 123456789123456789123456789123456789.01;
// it's printing 123456789123456784102659645885120512.000 instead of 123456789123456789123456789123456789.01
std::cout << d << std::endl;
return EXIT_SUCCESS;
}
Could you please help me understand the problem?

C++ built-in floating point types are finite in precision. double is usually implemented as IEEE-754 double precision, meaning it has 53 bits of significand (mantissa, the "value") precision, of which 52 are stored explicitly, plus 11 bits of exponent and 1 sign bit.
The number 123456789123456789123456789123456789 requires roughly 117 bits to represent, far more than 53, meaning a typical double cannot possibly represent it accurately. If you want such large numbers with perfect precision, you need to use some sort of a "big number" library.
For more information on floating point formats and their inaccuracies, you can read What Every Programmer Should Know About Floating-Point Arithmetic.

Related

How to set precision of a float?

For a number a = 1.263839, we can do -
float a = 1.263839;
cout << fixed << setprecision(2) << a <<endl;
output :- 1.26
But what if I want to set the precision of a number and store it, for example,
convert 1.263839 to 1.26 without printing it?
"But what if I want to set the precision of a number and store it"
You can store the desired precision in a variable:
int precision = 2;
You can then later use this stored precision when converting the float to a string:
std::cout << std::setprecision(precision) << a;
I think OP wants to convert from 1.263839 to 1.26 without printing the number.
If this is your goal, then you first must realise, that 1.26 is not representable by most commonly used floating point representation. The closest representable 32 bit binary IEEE-754 value is 1.2599999904632568359375.
So, assuming such representation, the best that you can hope for is some value that is very close to 1.26. In best case the one I showed, but since we need to calculate the value, keep in mind that some tiny error may be involved beyond the inability to precisely represent the value (at least in theory; there is no error with your example input using the algorithm below, but the possibility of accuracy loss should always be considered with floating point math).
The calculation is as follows:
Let P be the number of digits after the decimal point that you want to round to (2 in this case).
Let D be 10^P (100 in this case).
Multiply the input by D.
std::round to the nearest integer.
Divide by D.
P.S. Sometimes you might not want to round to the nearest, but instead want std::floor or std::ceil to the precision. This is slightly trickier. Simply std::floor(val * D) / D is wrong. For example 9.70 floored to two decimals that way would become 9.69, which would be undesirable.
What you can do in this case is multiply with one extra order of magnitude of precision, round to nearest, then divide out the extra magnitude and proceed:
Let P be the number of digits after the decimal point that you want to round to (2 in this case).
Let D be 10^P (100 in this case).
Multiply the input by D * 10.
std::round to the nearest integer.
Divide by 10.
std::floor or std::ceil.
Divide by D.
You would need to truncate it. Possibly the easiest way is to multiply it by a factor (in case of 2 decimal places, by a factor of 100), then truncate or round it, and lastly divide by the very same factor.
Now, mind you, that floating-point precision issues might occur, and that even after those operations your float might not be 1.26, but 1.26000000000003 instead.
If your goal is to store a number with a small, fixed number of digits of precision after the decimal point, you can do that by storing it as an integer with an implicit power-of-ten multiplier:
#include <stdio.h>
#include <math.h>
// Given a floating point value and the number of digits
// after the decimal-point that you want to preserve,
// returns an integer encoding of the value.
int ConvertFloatToFixedPrecision(float floatVal, int numDigitsAfterDecimalPoint)
{
return (int) roundf(floatVal*powf(10.0f, numDigitsAfterDecimalPoint));
}
// Given an integer encoding of your value (as returned
// by the above function), converts it back into a floating
// point value again.
float ConvertFixedPrecisionBackToFloat(int fixedPrecision, int numDigitsAfterDecimalPoint)
{
return ((float) fixedPrecision) / powf(10.0f, numDigitsAfterDecimalPoint);
}
int main(int argc, char ** arg)
{
const float val = 1.263839;
int fixedTwoDigits = ConvertFloatToFixedPrecision(val, 2);
printf("fixedTwoDigits=%i\n", fixedTwoDigits);
float backToFloat = ConvertFixedPrecisionBackToFloat(fixedTwoDigits, 2);
printf("backToFloat=%f\n", backToFloat);
return 0;
}
When run, the above program prints this output:
fixedTwoDigits=126
backToFloat=1.260000
If you're talking about storing exactly 1.26 in your variable, chances are you can't (there may be an off chance that exactly 1.26 works, but let's assume it doesn't for a moment) because floating point numbers don't work like that. There are always little inaccuracies because of the way computers handle floating point decimal numbers. Even if you could get 1.26 exactly, the inaccuracies can come back the moment you try to use it in a calculation.
That said, you can use some math and truncation tricks to get very close:
#include <iostream>
using namespace std;
int main()
{
// our float
float a = 1.263839;
// the precision we're trying to accomplish
int precision = 100; // 2 decimal places
// because we're an int, this will keep the 126 but lose everything else
int truncated = a * precision; // multiplying by the precision ensures we keep that many digits
// convert it back to a float
// Of course, we need to ensure we're doing floating point division
float b = static_cast<float>(truncated) / precision;
cout << "a: " << a << "\n";
cout << "b: " << b << "\n";
return 0;
}
Output:
a: 1.26384
b: 1.26
Note that this is not really 1.26 here. But it is very close.
This can be demonstrated by using setprecision():
cout << "a: " << std::setprecision(10) << a << "\n";
cout << "b: " << std::setprecision(10) << b << "\n";
Output:
a: 1.263839006
b: 1.25999999
So again, it's not exactly 1.26, but it is very close, and closer to 1.26 than the value you started with.
Using a stringstream would be an easy way to achieve that:
#include <iostream>
#include <iomanip>
#include <sstream>
using namespace std;
int main() {
stringstream s("");
s << fixed << setprecision(2) << 1.263839;
float a;
s >> a;
cout << a; //Outputs 1.26
return 0;
}

Multiplying doubles in C++ error

I have a seemingly simple c++ issue that's bothering me. The output of the code
#include <iostream>
using namespace std;
int main() {
// your code goes here
double c = 9.43827 * 0.105952 ;
cout << c << endl ;
return 0;
}
is 1. Just 1. I guess this is due to precision loss based on how doubles are stored in c++ but surely there must be a way in c++ to get some sort of precision (2 or 3 decimal places) in the result.
It's not precision loss in storage, it's precision loss in converting to text. The stream inserter for double defaults to six significant digits. The product here, 1.000003583, rounded to six significant digits, is 1.00000. In addition, if you haven't set showpoint, the trailing zeros and the decimal point will be suppressed, so you'll see a bare 1. To get the decimal point to show, use std::cout << std::showpoint << c << '\n';. To see more significant digits, use std::cout << std::setprecision(whatever) << c << '\n';, where whatever is the number of digits you want the formatter to use.
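#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main() {
    double c = 9.43827 * 0.105952;
    // Copy the bits out with memcpy: casting &c to long* breaks aliasing
    // rules, and long is only 32 bits on some platforms.
    uint64_t bits;
    memcpy(&bits, &c, sizeof bits);
    for (int i = 63; i >= 0; i--) {
        printf("%d", (int)((bits >> i) & 1));
    }
    printf("\n");
    return 0;
}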
If you run that, you can clearly see the bit representation of your double is not the integer value 1. You're not losing any data.
0011111111110000000000000000001111000001110100001010001001001001
but it is very close to 1, so that's what gets printed out.
Try using cout<<setprecision(12)<<c<<endl;
setprecision sets the decimal precision to be used to format floating-point values on output operations.

Why is the output different from what I expected?

I ran this code, but the output was different from what I expected.
The output:
c = 1324
v = 1324.99
I expected the output for v to be 1324.987. Why is the printed value different from the data in v?
I'm using CodeLite on 32-bit Windows 8.
#include <iostream>
using namespace std;
int main()
{
double v = 1324.987;
int n;
n = int (v);
cout << "c = " << n << endl;
cout << "v = " << v << endl;
return 0;
}
Floating point types inherently have rounding errors as a result of their fixed-width representations. For more information, see What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The default precision when printing with cout is 6, so only 6 significant digits will be displayed (not 6 decimal places). The number is rounded to the nearest such value, which is why you saw 1324.99. You need to set a higher precision to see the more "correct" value.
However, setting the precision too high may print out a lot of garbage digits behind, because binary floating-point types cannot store all decimal floating-point values exactly.

How to prevent rounding error in c++?

How can I prevent rounding error in C++ or fix it?
Example:
float SomeNumber = 999.9999;
cout << SomeNumber << endl;
It prints out 1000!
You can alter the rounding done by cout by setting the precision.
cout.precision(7);
float SomeNumber = 999.9999;
cout << SomeNumber << endl;
Alternatively, you can use printf from cstdio.
By default, formatted output via std::ostream rounds floating-point values to six significant decimal figures. You need seven to avoid your number being rounded to 1000:
cout << setprecision(7) << SomeNumber << endl;
        ^^^^^^^^^^^^^^^
Also, be aware that you're close to the limit of the precision of float, assuming the commonly-used 32-bit IEEE representation. If you need more than seven significant figures then you'll need to switch to double. For example, the following prints 1000, no matter how much precision you specify:
float SomeNumber = 999.99999; // 8 significant figures
cout << setprecision(10) << SomeNumber << endl;
To prevent your output being rounded, use setprecision in iomanip.
float SomeNumber = 999.9999;
std::cout << SomeNumber << std::endl; //outputs 1000
std::cout << std::setprecision (7) << SomeNumber << std::endl; //outputs 999.9999
return 0;
The value stored in SomeNumber does not change with the output precision; it is always the float closest to 999.9999 (approximately 999.99988), so you don't need to worry about the value itself (unless you need more precision than float provides).
As mentioned previously, if you're only looking for a fix for the rounding done by cout, use the .precision function. If you're referring to the inability of floating point numbers to represent every possible fraction, read on:
You can't avoid such rounding errors using floating point numbers. You need to represent your data in a different way. For example, if you want 5 digits of precision, just store it as a long which represents the number of your smallest units.
For example, 5.23524 with a precision of 0.00001 can be represented in a long (or an int if your range of values fits) as 523524. You know the units are 0.00001, so you can easily make it work.

C++ Precision: String to Double

I am having a problem with precision of a double after performing some operations on a converted string to double.
#include <iostream>
#include <sstream>
#include <math.h>
using namespace std;
// conversion function
void convert(const char * a, const int i, double &out)
{
double val;
istringstream in(a);
in >> val;
cout << "char a -- " << a << endl;
cout << "val ----- " << val << endl;
val *= i;
cout << "modified val --- " << val << endl;
cout << "FMOD ----- " << fmod(val, 1) << endl;
out = val;
}
This isn't the case for all numbers entered as a string, so the error isn't constant.
It only affects some numbers (34.38 seems to be constant).
At the moment, it returns this when I pass in a = 34.38 and i = 100:
char a -- 34.38
val ----- 34.38
modified val --- 3438
FMOD ----- 4.54747e-13
This works if I change val to a float, as there is lower precision, but I need a double.
This also reproduces when I use atof, sscanf and strtod instead of sstream.
In C++, what is the best way to correctly convert a string to a double, and actually return an accurate value?
Thanks.
This is almost an exact duplicate of so many questions here - basically there is no exact representation of 34.38 in binary floating point, so your 34 + 19/50 is represented as a 34 + k/n where n is a power of two, and there is no exact power of two which has 50 as a factor, so there is no exact value of k possible.
If you set the output precision, you can see that the best double representation is not exact:
cout << fixed << setprecision ( 20 );
gives
char a -- 34.38
val ----- 34.38000000000000255795
modified val --- 3438.00000000000045474735
FMOD ----- 0.00000000000045474735
So in answer to your question, you are already using the best way to convert a string to a double (though boost lexical cast wraps up your two or three lines into one line, so might save you writing your own function). The result is due to the representation used by doubles, and would apply to any finite representation based on binary floating point.
With floats, the multiplication happens to be rounded down rather than up, so you happen to get an exact result. This is not behaviour you can depend on.
The "problem" here is simply that 34.38 cannot be exactly represented in double-precision floating point. You should read this article which describes why it's impossible to represent decimal values exactly in floating point.
If you were to examine "34.38 * 100" in hex (as per "format hex" in MATLAB for example), you'd see:
40aadc0000000001
Notice the final digit.