Loss of precision while working with double

Can we work with big numbers up to 10^308?
How can I calculate 11^105 using just double?
The value of 11^105 is:
22193813979407164354224423199022080924541468040973950575246733562521125229836087036788826138225193142654907051
Is it possible to get the correct result for 11^105?
As far as I know, double can handle values up to about 10^308, which is much bigger than 11^105.
I know that this code is wrong:
#include <iostream>
#include <cstdio>
#include <cmath>
#include <iomanip>
using namespace std;
int main()
{
    double n, p, x;
    cin >> n >> p;
    //scanf("%lf %lf", &n,&p);
    x = exp(log((double)n)*p);
    //printf("%lf\n", x);
    cout << x <<endl;
    return 0;
}
Thanks.

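A double usually has 11 bits for the exponent (-1022 to 1023 for normalized values), 52 bits for the fraction, and 1 bit for the sign. That gives about 15 to 17 significant decimal digits, so the 110-digit value of 11^105 cannot be represented exactly.
For more explanation, see IEEE 754 on Wikipedia.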

A double can hold very large results, but not with high precision. In contrast to fixed-point numbers, double is a floating-point type: for the same number of significant digits, it can shift the radix point to cover different magnitudes, which is why you see such a large range.
For your purpose, you need a home-cooked big-number library, or you can use one that is readily available and written by someone else.
BTW, my home-cooked recipe gives a different answer for 11^105.
Confirmed with this Haskell code.

Related

C++ float vs double cout setprecision oddities(newbie)

Can anyone explain why these two variables of the same value can output different values when I use setprecision()?
#include <iostream>
#include <iomanip>
int main()
{
    float a=98.765;
    double b = 98.765;
    //std::cout<<std::setprecision(2)<<a<<std::endl;
    std::cout<<std::fixed;
    std::cout<<std::setprecision(2)<<a<<std::endl;
    std::cout<<std::setprecision(2)<<b<<std::endl;
}
The output for a will be 98.76 while the output for b will be 98.77.

Those variables don't have the same value. When you shoehorn the double literal 98.765 into the float, it has to do a best fit, and some precision is lost.
You can see this quite easily if you change the precision to 50. You'll also see that not even the double can represent that value exactly:
98.76499938964843750000000000000000000000000000000000
98.76500000000000056843418860808014869689941406250000
The important thing is that the float's stored value lies just below 98.765, so it rounds down to 98.76, while the double's stored value lies just above 98.765, so it rounds up to 98.77.
See also the IEEE754 online converter.

c++ precision issue in storing floating point numbers

I'm doing some mathematical calculations.
I'm losing precision, but I need extreme precision.
I checked the precision issue with the code given below.
Is there any way to get the precision I need?
#include <iostream>
#include <stdlib.h>
#include <cstdio>
#include <sstream>
#include <iomanip>
using namespace std;
int main(int argc,char** arvx)
{
    float f = 1.00000001;
    cout << "f: " <<std::setprecision(20)<< f <<endl;
    return 0;
}
Output is
f: 1

If you truly want a precise representation of these sorts of numbers (i.e., with very small fractional components many places beyond the decimal point), then floating-point types like float, or even the much more precise double, may still not give you the exact results you are looking for in all circumstances. Floating-point types can only approximate some values with small fractional components.
You may need some sort of high-precision fixed-point C++ type to represent very small fractions exactly and to get accurate results when you perform mathematical operations on such numbers. The following question and its answers may provide some useful pointers: C++ fixed point library?

In C++,
float f = 1.00000001;
cannot be stored exactly, because float supports only about 6 to 7 significant decimal digits, as in
float f = 1.000001;
If you want more precise calculations, use double.

Why doesn't my C++ program calculate more digits of 'e'?

I recently picked up the C++ programming language, and I'm trying to calculate the digits of 'e' for a calculus project at school. I'll paste the program that I've written below. It's based on e = lim (1+1/x)^x as x -> infinity. In this program, I set x=100,000. I also set x=1,000,000 and noticed that the answers are somehow being subjected to a round-off error, instead of becoming longer in length.
Program:
#include <iostream>
#include <math.h>
using namespace std;
long double sum;
int main()
{
    long double x=100000;
    sum= (pow((1+(1/x)),(x)));
    cout<<sum;
}
Any tips/ advice in making it print out more digits would be great. Thanks in advance.

First, long double is limited in the number of digits it can produce, and because of how real numbers are implemented, it won't produce exact results.
But to answer your question, you can set cout's precision by doing:
cout.precision(15);
cout << sum;
For more explanation and details, see:
How do I print a double value with full precision using cout?

double in C++ is a binary floating-point number. For accurate calculations like this, you need to use a decimal number type.
See this answer about decimal in cpp

C++ calculating more precise than double or long double

I'm teaching myself C++ and on this practice question it asks to write code that can calculate PI to >30 digits. I learned that double / long double are both 16 digits precise on my computer.
I think the lesson of this question is to be able to calculate precision beyond what is available. Therefore how do I do this? Is it possible?
My code for calculating pi right now is:
#include "stdafx.h"
#include <iostream>
#include <math.h>
#include <iomanip>
using namespace std;
int main(){
    double pi;
    pi = 4*atan(1.0);
    cout<<setprecision(30)<<pi;
    return 0;
}
Output is to 16 digits and pi to 30 digits is listed below for comparison.
3.1415926535897931
3.141592653589793238462643383279
Any suggestions for increasing precision or is this something that won't matter ever? Alternatively if there is another lesson you think I should be learning here feel free to offer it. Thank you!

You will need to perform the calculation using some other method than floating point. There are libraries for doing "long math" such as GMP.
If that's not what you're looking for, you can also write code to do this yourself. The simplest way is to just use a string, and store a digit per character. Do the math just like you would do if you did it by hand on paper. Adding numbers together is relatively easy, so is subtracting. Doing multiplication and division is a little harder.
For non-integer numbers, you'll need to make sure you line up the decimal point for add/subtract...
It's a good learning experience to write that, but don't expect it to be something you knock up in half an hour without much thought [add and subtract, perhaps!]

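You can use quad-precision math: the built-in type __float128 and the q/Q literal suffixes in GCC/clang, together with libquadmath (link with -lquadmath).
#include <stdio.h>
#include <quadmath.h>
int main ()
{
    __float128 x = strtoflt128("1234567891234567891234567891234566", nullptr);
    auto y = 1.0q;
    char buf[128];
    // printf with the Q modifier works only where the C library installs hooks for it;
    // quadmath_snprintf is the portable way to format a __float128
    quadmath_snprintf(buf, sizeof buf, "%.0Qf", x + y);
    printf("%s\n", buf);
    return 0;
}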

Using a long double or just a double for calculating pi?

I'm calculating pi using a long winded formula. I'm trying to get more familiar with floating point numbers etc. I have a working program that uses doubles. The problem with my code is:
If I use a double, pi is only accurate to the 7th decimal place. I can't get it to be any more accurate.
If I use a long double, pi is accurate up to the 9th decimal place however the code takes much longer to run. If I check for precision for less than 0.00000001 using a long double, pi returns a value of 9.4246775. I assume that this is due to the long double.
My question is what is the most accurate variable type? How could I change my code to improve the precision of pi?
Here is my code:
#include <iomanip>
#include <cstdlib>
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
    double arctan;
    double pi;
    double precision;
    double previous=0;
    int y=3;
    int loopcount=0;
    cout<<"Start\n";
    arctan=1-(pow(1,y)/y);
    do
    {
        y=y+2;
        arctan=arctan+(pow(1,y)/y);
        y=y+2;
        arctan=arctan-(pow(1,y)/y);
        pi=4*(arctan);
        // cout<<"Pi is: ";
        // cout<<setprecision(12)<<pi<<endl;
        precision=(pi*(pow(10,10)/10));
        loopcount++;
        if(precision-previous<0.000000001)
            break;
        previous=precision;
    }
    while(true);
    cout<<"Pi is:"<<endl;
    cout<<setprecision(11)<<pi<<endl;
    cout<<"Times looped:"<<endl;
    cout<<loopcount<<endl;
    return 0;
}

You can get the max limits of doubles/long doubles from std::numeric_limits:
#include <iostream>
#include <limits>
int main()
{
    std::cout << " Double::digits10: " << std::numeric_limits<double>::digits10 << "\n";
    std::cout << "Long Double::digits10: " << std::numeric_limits<long double>::digits10 << "\n";
}
On my machine this gives:
Double::digits10: 15
Long Double::digits10: 18
So I expect long double to be accurate to 18 digits.
The definition of this term can be found here:
http://www.cplusplus.com/reference/std/limits/numeric_limits/
Standard quote: 18.3.2 Numeric limits [limits]
Also note, as the comment is way down in the above list:
@sarnold is incorrect in his assertions on pow() (though mysteriously he has two silly people up-voting his comment without checking). What he states is only applicable to C. C++ provides overloads of pow() for float, double, and long double in <cmath>. See: http://www.cplusplus.com/reference/clibrary/cmath/pow/ and, for the complex template versions, the standard at 26.4.7 complex value operations [complex.value.ops]

The predefined floating-point type with the greatest precision is long double.
There are three predefined floating-point types:
float has at least 6 decimal digits of precision
double has at least 10, and at least as many as float
long double has at least 10, and at least as many as double
These are minimum requirements; any or all of these types could have more precision.
If you need more precision than long double can provide, you might look at GMP, which supports arbitrary precision (at considerable expense in speed and memory usage).

Or, you could just hard-code the digits of PI and see what happens. ^_^
http://www.joyofpi.com/pi.html
