I have a program and I'm trying to calculatecos(M_PI*3/2) and instead of getting 0, as I should, I get -1.83691e-016
What am I missing here? I am in radians as I need to be.
First, M_PI is not a very portable macro and is usually good to about 15 decimal places, depending on the compiler you use - my guess is you're using Microsoft's C++ compiler.
Second, if you want a more accurate (and portable) version, use the Boost Math library:
http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/math_toolkit/tutorial/non_templ.html
Third, as Kay has pointed out, pi in itself is an irrational number and therefore no amount of bits (or digits in base 10) would be enough to accurately represent it. Therefore, What you're actually calculating is not cos(3*pi/2) exactly, but "the cosine of 3/2 times the closest approximation of pi given the bits required", which will NOT be 3 *pi/2 and therefore won't be zero.
Finally, if you want custom precision for your mathematical constants, read this: http://www.boost.org/doc/libs/1_55_0/libs/math/doc/html/math_toolkit/tutorial/user_def.html
The number M_PI is only an approximation of π. The cosine that you get back is also an approximation, and it's a pretty good one - it has fifteen correct digits after the decimal point.
Given the discrete nature of double values, the standard margin of error against which to test for numerical equality is numeric_limits<double>::epsilon():
#include <iostream>
#include <limits>
#include <cmath>
using namespace std;
int main()
{
double x = cos(M_PI*3/2);
cout << "x = << " << x << endl;
cout << "numeric_limits<double>::epsilon() = "
<< numeric_limits<double>::epsilon() << endl;
cout << "Is x sufficiently close to 0? "
<< (abs(x) < numeric_limits<double>::epsilon() ? "yes" : "no") << endl;
return 0;
}
Output:
x = << -1.83697e-16
numeric_limits<double>::epsilon() = 2.22045e-16
Is x sufficiently close to 0? yes
As you can see, the absolute value of -1.83697e-16 is within the margin of error given by epsilon 2.22045e-16.
Pi is irrational, the computer cannot represent the number perfectly. The small error to the "correct" value of pi causes the error in the output. Being 1.83691 × 10-16 off is still pretty good.
If you want to learn more about the restrictions of actual system and the impact of little errors in the input, then refer to http://en.wikipedia.org/wiki/Numerical_stability.
Related
I have the following piece of code
#include <iostream>
#include <iomanip>
int main()
{
double x = 7033753.49999141693115234375;
double y = 7033753.499991415999829769134521484375;
double z = (x+ y)/2.0;
std::cout << "y is " << std::setprecision(40) << y << "\n";
std::cout << "x is " << std::setprecision(40) << x << "\n";
std::cout << "z is " << std::setprecision(40) << z << "\n";
return 0;
}
When the above code is run I get,
y is 7033753.499991415999829769134521484375
x is 7033753.49999141693115234375
z is 7033753.49999141693115234375
When I do the same in Wolfram Alpha the value of z is completely different
z = 7033753.4999914164654910564422607421875 #Wolfram answer
I am familiar with floating point precision and that large numbers away from zero can not be exactly represented. Is that what is happening here? Is there anyway in c++ where I can get the same answer as Wolfram without any performance penalty?
large numbers away from zero can not be exactly represented. Is that what is happening here?
Yes.
Note that there are also infinitely many rational numbers that cannot be represented near zero as well. But the distance between representable values does grow exponentially in larger value ranges.
Is there anyway in c++ where I can get the same answer as Wolfram ...
You can potentially get the same answer by using long double. My system produces exactly the same result as Wolfram. Note that precision of long double varies between systems even among systems that conform to IEEE 754 standard.
More generally though, if you need results that are accurate to many significant digits, then don't use finite precision math.
... without any performance penalty?
No. Precision comes with a cost.
Just telling IOStreams to print to 40 significant decimal figures of precision, doesn't mean that the value you're outputting actually has that much precision.
A typical double takes you up to 17 significant decimal figures (ish); beyond that, what you see is completely arbitrary.
Per eerorika's answer, it looks like the Wolfram Alpha answer is also falling foul of this, albeit possibly with some different precision limit than yours.
You can try a different approach like a "bignum" library, or limit yourself to the precision afforded by the types that you've chosen.
when I use fmod(0.6,0.2) in c++ it returns 0.2
I know this is caused by floating point accuracy but it seems I have to get remainder of two double this moment
thanks very much for any solutions for this kind of problem
The mathematical values 0.6 and 0.2 cannot be represented exactly in binary floating-point.
This demo program will show you what's going on:
#include <iostream>
#include <iomanip>
#include <cmath>
int main() {
const double x = 0.6;
const double y = 0.2;
std::cout << std::setprecision(60)
<< "x = " << x << "\n"
<< "y = " << y << "\n"
<< "fmod(x, y) = " << fmod(x, y) << "\n";
}
The output on my system (and very likely on yours) is:
x = 0.59999999999999997779553950749686919152736663818359375
y = 0.200000000000000011102230246251565404236316680908203125
fmod(x, y) = 0.1999999999999999555910790149937383830547332763671875
The result returned by fmod() is correct given the arguments you passed it.
If you need some other result (I presume you were expecting 0.0), you'll have to do something different. There are a number of possibilities. You can check whether the result differs from 0.2 by some very small amount, or perhaps you can do the computation using integer arithmetic (if, for example, all the numbers you're dealing with are multiples of 0.1 or of 0.01).
I think your best bet here would be to use the remainder function instead of fmod. For the given example, it will return a very small number rather than 0.2. You can use this fact to assume a remainder of 0 by rounding to a certain level of precision.
You're right, the problem is a rounding error. Try this code:
#include <stdio.h>
#include <math.h>
int main()
{
double d6 = 0.6;
double d2 = 0.2;
double d = fmod (d6, d2);
printf ("%30.20e %30.20e %30.20e\n", d6, d2, d);
}
When I run it in gcc 4.4.7 the output is:
5.99999999999999977796e-01 2.00000000000000011102e-01 1.99999999999999955591e-01
As for how to "fix" it, I don't know enough about exactly what you are trying to do to know what to suggest. There will always be numbers that cannot be exactly represented in floating point, so this kind of behavior is unavoidable.
More information about the problem domain would be helpful to get better suggestions. For example, if the numbers you are working with always just have one digit after the decimal point, and are small enough, just multiply by 10, round to the nearest integer (or long or long long), and use the % operator rather than the fmod function. If you are trying to see whether the result of fmod is 0.0, simply accept a result that is close to 0.2 (in this case) as if it were close to 0.
I'm trying to find the minimum value (closest to zero) that I can store in a single precission floating point number. Using the <limits> header I can get the value, but if I make it much smaller, the float can still hold it and it gives the right result. Here is a test program, compiled with g++ 5.3.0.
#include <limits>
#include <iostream>
#include <math.h>
using namespace std;
int main()
{
float a = numeric_limits<float>::max();
float b = numeric_limits<float>::min();
a = a*2;
b = b/pow(2,23);
cout << a << endl;
cout << b << endl;
}
As I expected, "a" gives infinity, but "b" keeps holding the good result even after dividing the minimum value by 2^23, after that it gives 0.
The value that gives numeric_limits<float>::min() is 2^(-126) which I belive is the correct answer, but why is the float on my progam holding such small numbers?
std::numeric_limits::min for floating-point types gives the smallest non-zero value that can be represented without loss of precision. std::numeric_limits::lowest gives the smallest representable value. With IEEE representations that's a subnormal value (previously called denormalized).
From wikipedia https://en.wikipedia.org/wiki/Single-precision_floating-point_format:
The minimum positive normal value is 2^−126 ≈ 1.18 × 10^−38 and the
minimum positive (denormal) value is 2^−149 ≈ 1.4 × 10^−45.
So, for
cout << (float)pow(2,-149)
<< "-->" << (float)pow(2,-150)
<< "-->" << (float)pow(2,-151) << endl;
I'm getting:
1.4013e-45-->0-->0
I'm trying to find the minimum value (closest to zero) that I can
store in a single precission floating point number
0 is the closest value to 0 that you can store in any precision float. In fact, you can store it two ways, as there is a positive and negative 0.
I'm on Manjaro 64 bit, latest edition. HP pavilion g6, Codeblocks
Release 13.12 rev 9501 (2013-12-25 18:25:45) gcc 5.2.0 Linux/unicode - 64 bit.
There was a discussion between students on why
sn = 1/n diverges
sn = 1/n^2 converges
So decided to write a program about it, just to show them what kind of output they can expect
#include <iostream>
#include <math.h>
#include <fstream>
using namespace std;
int main()
{
long double sn =0, sn2=0; // sn2 is 1/n^2
ofstream myfile;
myfile.open("/home/Projects/c++/test/test.csv");
for (double n =2; n<100000000;n++){
sn += 1/n;
sn2 += 1/pow(n,2);
myfile << "For n = " << n << " Sn = " << sn << " and Sn2 = " << sn2 << endl;
}
myfile.close();
return 0;
}
Starting from n=9944 I got sn2 = 0.644834, and kept getting it forever. I did expect that the compiler would round the number and ignore the 0s at some point, but this is just too early, no?
So at what theoretical point does 0s start to be ignored? And what to do if you care about all 0s in a number? If long double doesn't do it, then what does?
I know it seems like a silly question but I expected to see a longer number, since you can store big part of pi in long doubles. By the way same result for double too.
The code that you wrote suffers from a classic programming mistake: it sums a sequence of floating-point numbers by adding larger numbers to the sum first and smaller numbers later.
This will inevitably lead to precision loss during addition, since at some point in the sequence the sum will become relatively large, while the next member of the sequence will become relatively small. Adding a sufficiently small floating-point value to a sufficiently large floating-point sum does not affect the sum. Once you reach that point, it will look as if the addition operation is "ignored", even though the value you attempt to add is not zero.
You can observe the same effect if you try calculating 100000000.0f + 1 on a typical machine: it still evaluates to 100000000. This does not happen because 1 somehow gets rounded to zero. This happens because the mathematically-correct result 100000001 is rounded back to 100000000. In order to force 100000000.0f to change through addition, you need to add at least 5 (and the result will be "snapped" to 100000008).
So, the issue here is not that the compiler "rounds the number when it gets so small", as you seem to believe. Your 1/pow(n,2) number is probably fine and sufficiently precise (not rounded to 0). The issue here is that at some iteration of your cycle the small non-zero value of 1/pow(n,2) just cannot affect the sum anymore.
While it is true that adjusting output precision will help you to see better what is going on (as stated in the comments), the real issue is what is described above.
When calculating sums of floating-point sequences with large differences in member magnitudes, you should do it by adding smaller members of the sequence first. Using my 100000000.0f example again, you can easily see that 4.0f + 4.0f + 100000000.0f correctly produces 100000008, while 100000000.0f + 4.0f + 4.0f is still 100000000.
You're not running into precision issues here. The sum doesn't stop at 0.644834; it keeps going to roughly the correct value:
#include <iostream>
#include <math.h>
using namespace std;
int main() {
long double d = 0;
for (double n = 2; n < 100000000; n++) {
d += 1/pow(n, 2);
}
std::cout << d << endl;
return 0;
}
Result:
0.644934
Note the 9! That's not 0.644834 any more.
If you were expecting 1.644934, you should have started the sum at n=1. If you were expecting visible changes between successive partial sums, you didn't see those because C++ is truncating the representation of the sums to 6 significant digits. You can configure your output stream to display more digits with std::setprecision from the iomanip header:
myfile << std::setprecision(9);
I am unable to understand why C++ division behaves the way it does. I have a simple program which divides 1 by 10 (using VS 2003)
double dResult = 0.0;
dResult = 1.0/10.0;
I expect dResult to be 0.1, However i get 0.10000000000000001
Why do i get this value, whats the problem with internal representation of double/float
How can i get the correct value?
Thanks.
Because all most modern processors use binary floating-point, which cannot exactly represent 0.1 (there is no way to represent 0.1 as m * 2^e with integer m and e).
If you want to see the "correct value", you can print it out with e.g.:
printf("%.1f\n", dResult);
Double and float are not identical to real numbers, it is because there are infinite values for real numbers, but only finite number of bits to represent them in double/float.
You can further read: what every computer scientist should know about floating point arithmetics
The ubiquitous IEEE754 floating point format expresses floating point numbers in scientific notation base 2, with a finite mantissa. Since a fraction like 1/5 (and hence 1/10) does not have a presentation with finitely many digits in binary scientific notation, you cannot represent the value 0.1 exactly. More generally, the only values that can be represented exactly are those that fit precisely into binary scientific notation with a mantissa of a few (e.g. 24 or 53 or 64) binary digits, and a suitably small exponent.
Working with integers, floats, and doubles could be tricky. Depends on what is your purpose. If you only want to display in nice format, then you can play with the C++ iomanipulator, precision, showpint, noshowpint. If you are trying to do precise computing with numeric methods, you may have to use some library for accurate representation. If you are multiplying a lots of small and large number, you may have to resole to use log transformations. Here is a small test:
float x=1.0000001;
cout << x << endl;
float y=9.9999999999999;
cout << "using default io format " << y/x << endl;
cout << showpoint << "using showpoint " << y/x << endl;
y=9.9999;
cout << "fewer 9 default C++ " << y/x << endl;
cout << showpoint << "fewer 9 showpoint" << y/x << endl;
1
using default io format 10
using showpoint 10.0000
fewer 9 default C++ 9.99990
fewer 9 showpoint9.99990
In special cases you want to use double (which may be the result of some complicated algorithm) to represent integer numbers, you have to figure out the proper conversion method. Once I had a situation where I want to use a single double value to store two type of values: -1, +1, or (0-1) to make my code more memory efficient (and speed, large memory tends to reduce performance). It is a little tricky to distinguish between +1 and val < 1. In this case I know that the values < 1 has a resolution say only 1/500, Then I can safely use floor(val+0.000001) to get back the 1 value that I initially stored.