C++ precision issue in storing floating point numbers

I'm handling some mathematical calculations and I'm losing precision, but I need extreme precision.
I used the code below to check the precision issue.
Is there any way to get the precision I need?
#include <iostream>
#include <stdlib.h>
#include <cstdio>
#include <sstream>
#include <iomanip>
using namespace std;
int main(int argc, char** argv)
{
    float f = 1.00000001;
    cout << "f: " << std::setprecision(20) << f << endl;
    return 0;
}
Output is
f: 1

If you truly want a precise representation of these sorts of numbers (i.e., numbers with very small fractional components many places beyond the decimal point), then floating point types like float, or even the much more precise double, may still not give you the exact results you are looking for in all circumstances. Floating point types can only approximate many values with small fractional components.
You may need to use some sort of high-precision fixed point C++ type in order to get an exact representation of very small fractions in your values, and accurate results when you perform mathematical operations on such numbers. The following question and its answers may provide you with some useful pointers: C++ fixed point library?

In C++, a float supports only about 6 significant digits after the decimal point, so
float f = 1.00000001;
cannot be stored exactly; the best a float can distinguish near 1.0 is roughly
float f = 1.000001;
If you want more precise calculations, use double.

Related

boost int 128 division to floating point number

I am new to Boost. I have a 128-bit integer (int128_t from boost/multiprecision/cpp_int.hpp) in my project, which I need to divide by a floating point number. On my current platform I have limitations and can't use boost/multiprecision/float128.hpp; it's still not supported in Clang: https://github.com/boostorg/math/issues/181
Is there any way to do this with the Boost math library?
Although you can't use float128, Boost has several other implementations of long floating-point types:
cpp_bin_float
cpp_dec_float
gmp_float
mpfr_float
In particular, if you need binary high-precision floating-point type without dependencies on external libraries like GMP, you can use cpp_bin_float. Example:
#include <iomanip>
#include <iostream>
#include <boost/multiprecision/cpp_int.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>
int main()
{
    using LongFloat = boost::multiprecision::cpp_bin_float_quad;
    const auto x = boost::multiprecision::int128_t(1234123521);
    const auto y = LongFloat(34532.52346246234);
    const auto z = LongFloat(x) / y;
    std::cout << "Ratio: " << std::setprecision(10) << z << "\n";
}
Here we've used a built-in typedef for a 113-bit floating-point number, which has the same precision and range as IEEE 754 binary128. You can choose other parameters for the precision and range; see the docs linked above for details.
Note, though, that int128_t can hold more significant bits than any kind of float128, because some bits of the latter are used to store its exponent. If that's an issue, be sure to use a type with higher precision.
Perhaps split the int128 into two 64-bit numbers?
i128 = h64 * 2^64 + l64
Then you could load the two halves, scale the high one, and sum them in 64-bit floating point to get the equivalent number.
Or, since the floating point hardware only gives you 64-bit (double) precision anyway, you could just shift your int128 down until it fits in 64 bits, convert that to floating point, and then scale it back up; but the former may actually be faster because it is simpler.

Why does it seem like my floating point value is not equal to itself? [duplicate]

This question already has answers here:
strange output in comparison of float with float literal
(8 answers)
Closed 4 years ago.
#include <iostream>
using namespace std;
int main()
{
    float x = 1.1;
    if (x == 1.1)
        cout << "yes";
    else
        cout << "no";
    return 0;
}
I assigned the value 1.1 to x and then checked whether the value of x equals 1.1.
You've wandered into an interesting area of almost all programming languages. Floating point values are tricky things, and testing them for equality is very rarely recommended. The basic problem is that floating point values on modern computers are represented as binary fractions with a finite number of digits of precision.
To make this simpler to understand, let's work with base-10 decimals and use a number that can't be accurately represented in them. Take 1/3. If you represent it as a base-10 decimal you get:
0.3333… (the threes repeat forever)
There is no finite number of digits that can represent 1/3 as a base-ten decimal with perfect accuracy. So, if you only have so many digits, you chop it off and approximate:
0.333333
That's actually 333333/1000000, which is really close to 1/3, but not quite.
C++ has a few different floating point types, and these types usually (it depends on the platform the program is being compiled for) have different numbers of significant digits. By default, a floating point constant is of type double, which usually has more digits than a float (and never has fewer). Again using base 10 as an example: since you stored your value in a float but compared it against a double constant, you were effectively doing something like this:
0.333333 == 0.3333333333333333333
which of course is false.
If you wrote your code this way:
#include <iostream>
using namespace std;
int main()
{
    float x = 1.1f;
    if (x == 1.1f)
        cout << "yes";
    else
        cout << "no";
    return 0;
}
you would likely get the expected result. Putting an f at the end of a bare floating point value (aka, a floating point literal) tells C++ that it's of type float.
This is all very fascinating of course, and there's a lot to get into. If you would like to learn a lot more about how floating point numbers are really represented, there is a nice Wikipedia page on IEEE 754 floating point representation, which is how most modern processors represent floating point numbers nowadays.
From a practical standpoint, you should rarely (if ever) compare floating point numbers for equality. Usually, a desire to do so indicates some sort of design flaw in your program. And if you really must, then use an 'epsilon' comparison: test whether your number is 'close enough'. Determining what 'close enough' means in any given situation isn't necessarily trivial, which is why needing exact equality at all usually points to a design flaw. In your case, it could look like this:
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
    float x = 1.1;
    if (fabs(x - 1.1) < 0.000001)
        cout << "yes";
    else
        cout << "no";
    return 0;
}
The reason the comparison fails is that you're comparing a double value (the literal 1.1) to a float variable.
Some compilers will issue a warning when you assign a double value to a float variable.
To get the desired output, you could try this:
double x = 1.1;
if (x == 1.1)
or this:
float x = 1.1f;
if (x == 1.1f)

C++: boost multiprecision printing [duplicate]

I am running a simulation of physical experiments, so I need really high floating point precision (more than 16 digits). I use Boost.Multiprecision, but I can't get a precision higher than 16 digits no matter what I try. I build the simulation with C++ in Eclipse. For example:
#include <boost/math/constants/constants.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iostream>
#include <limits>
using boost::multiprecision::cpp_dec_float_50;
int main()
{
    cpp_dec_float_50 my_num = cpp_dec_float_50(0.123456789123456789123456789);
    std::cout.precision(std::numeric_limits<cpp_dec_float_50>::digits10);
    std::cout << my_num << std::endl;
}
The output is:
0.12345678912345678379658409085095627233386039733887
but it should be:
0.123456789123456789123456789
As you can see, everything after the 16th digit is incorrect. Why?
Your issue is here:
cpp_dec_float_50 my_num = cpp_dec_float_50(0.123456789123456789123456789); // <-- this literal is a double!
The compiler does not use arbitrary-precision floating point literals, and instead uses IEEE-754 doubles, which have finite precision. In this case, the closest double to the number you have written is:
0.1234567891234567837965840908509562723338603973388671875
And printing it to the 50th decimal does indeed give the output you are observing.
What you want is to construct your arbitrary-precision float from a string instead (demo):
#include <boost/math/constants/constants.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iostream>
#include <limits>
using boost::multiprecision::cpp_dec_float_50;
int main() {
    cpp_dec_float_50 my_num = cpp_dec_float_50("0.123456789123456789123456789");
    std::cout.precision(std::numeric_limits<cpp_dec_float_50>::digits10);
    std::cout << my_num << std::endl;
}
Output:
0.123456789123456789123456789
The problem is that the C++ compiler converts the literal to a double at compile time (I also learned this a while ago). You have to construct the value from a string (or use other special functions) to keep more decimal digits. See the Boost documentation or other answers here on SO for examples.
That said, it is rare that there is any real need for such high precision. If you are losing precision, you should consider other floating point algorithms instead of blindly increasing the number of decimals.


Loss of precision while working with double

Can we work with big numbers up to 10^308? How can I calculate 11^105 using just a double?
The answer for 11^105 is:
22193813979407164354224423199022080924541468040973950575246733562521125229836087036788826138225193142654907051
Is it possible to get the correct result of 11^105? As far as I know, a double can hold values up to about 10^308, which is much bigger than 11^105.
I know that this code is wrong:
#include <iostream>
#include <cstdio>
#include <cmath>
#include <iomanip>
using namespace std;
int main()
{
    double n, p, x;
    cin >> n >> p;
    x = exp(log(n) * p);
    cout << x << endl;
    return 0;
}
Thanks.
A double usually has 11 bits for the exponent (−1022 to 1023 when normalized), 52 bits for the fraction (mantissa), and 1 bit for the sign. Thus 11^105, which needs about 365 bits of mantissa, cannot be represented exactly.
For more explanation, see IEEE 754 on Wikipedia.
A double can hold very large results, but not with full precision. In contrast to fixed point numbers, double is a floating point real number. This means that, for the same number of significant digits, a double can shift the radix point to handle different ranges of numbers, which is why you see such a large range.
For your purpose, you need some home-cooked big-num library, or you can find one readily available that was written by someone else (e.g. Boost.Multiprecision's cpp_int or GMP).
By the way, my home-cooked recipe gives a different answer for 11^105; I confirmed it with some Haskell code, since Haskell's built-in Integer type has arbitrary precision.