Same floating point operation, different results

I really can't wrap my head around the fact that this code gives 2 results for the same formula:
#include <iostream>
#include <cmath>
int main() {
// std::cout.setf(std::ios::fixed, std::ios::floatfield);
std::cout.precision(20);
float a = (exp(M_PI) - M_PI);
std::cout << (exp(M_PI) - M_PI) << "\n";
std::cout << a << "\n";
return (0);
}
I don't really think that the IEEE 754 floating point representation is playing a significant role here ...

The first expression, (exp(M_PI) - M_PI), is a double; the second, a, is a float. Neither has 20 decimal digits of precision, but a float (about 7 significant decimal digits) has far less precision than a double (about 16).

Because M_PI is of type double, the whole expression is evaluated as a double. Change a to double and you will get the same result:
#include <iostream>
#include <cmath>
int main() {
// std::cout.setf(std::ios::fixed, std::ios::floatfield);
std::cout.precision(20);
double a = (exp(M_PI) - M_PI);
std::cout << (exp(M_PI) - M_PI) << "\n";
std::cout << a << "\n";
return (0);
}
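To make the precision gap concrete, here is a minimal sketch that checks whether a double survives a round trip through float. The helper names expPiMinusPi and losesPrecisionAsFloat are hypothetical, and std::acos(-1.0) stands in for M_PI, which is not part of standard C++. The digit counts in the comments assume IEEE 754 types.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: evaluates e^pi - pi in double precision.
double expPiMinusPi() {
    const double pi = std::acos(-1.0);  // portable stand-in for M_PI
    return std::exp(pi) - pi;
}

// Returns true if converting the double to float changes its value.
bool losesPrecisionAsFloat(double value) {
    const float narrowed = static_cast<float>(value);
    return static_cast<double>(narrowed) != value;
}

// Number of decimal digits each type is guaranteed to preserve.
constexpr int floatDigits = std::numeric_limits<float>::digits10;    // 6 for IEEE 754 binary32
constexpr int doubleDigits = std::numeric_limits<double>::digits10;  // 15 for IEEE 754 binary64
```

Calling losesPrecisionAsFloat(expPiMinusPi()) should return true, which is exactly why the two printed lines in the question differ after the first several digits.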

Related

How do you use setprecision() when declaring a double variable in C++?

So I'm trying to learn more about C++ and I'm practicing by making a calculator class for the quadratic equation. This is the code for it down below.
#include "QuadraticEq.h"
string QuadraticEq::CalculateQuadEq(double a, double b, double c)
{
double sqrtVar = sqrt(pow(b, 2) - (4 * a * c));
double eqPlus = (-b + sqrtVar)/(2 * a);
double eqMinus = (-b - sqrtVar) / (2 * a);
return "Your answers are " + to_string(eqPlus) + " and " + to_string(eqMinus);
}
I'm trying to make the double variables eqPlus and eqMinus show only two decimal places. I've seen people say to use setprecision(), but I've only seen it used in cout statements, and there are none in this class because I'm returning a string rather than printing one. So what should I do here? I also remember learning about a setiosflags() method a while ago; is there anything I can do with that?

You can use stringstream instead of the usual std::cout with setprecision().
#include <iostream>
#include <string>
#include <sstream>
#include <iomanip>
std::string adjustDP(double value, int decimalPlaces) {
// change the number of decimal places in a number
std::stringstream result;
result << std::setprecision(decimalPlaces) << std::fixed << value;
return result.str();
}
int main() {
std::cout << adjustDP(2.25, 1) << std::endl; //2.2
std::cout << adjustDP(0.75, 1) << std::endl; //0.8
std::cout << adjustDP(2.25213, 2) << std::endl; //2.25
std::cout << adjustDP(2.25, 0) << std::endl; //2
}
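However, as the output shows, the rounding can be surprising: 2.25 becomes 2.2 while 0.75 becomes 0.8. Both values are exactly representable in binary, so these are exact ties, and the output above is consistent with ties being rounded to the nearest even digit. A value that cannot be represented exactly as a binary floating-point number is rounded according to its stored value, which may lie slightly above or below the decimal literal.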
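Applied to the quadratic calculator from the question, the same idea might look like the sketch below. formatQuadraticRoots is a hypothetical free-function stand-in for QuadraticEq::CalculateQuadEq, and it assumes a is non-zero and the discriminant is non-negative.

```cpp
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical free-function version of QuadraticEq::CalculateQuadEq.
// Assumes a != 0 and real roots (b*b - 4*a*c >= 0).
std::string formatQuadraticRoots(double a, double b, double c) {
    const double sqrtVar = std::sqrt(b * b - 4 * a * c);
    const double eqPlus = (-b + sqrtVar) / (2 * a);
    const double eqMinus = (-b - sqrtVar) / (2 * a);

    // std::fixed plus std::setprecision(2) prints exactly two decimal places.
    std::ostringstream out;
    out << std::fixed << std::setprecision(2)
        << "Your answers are " << eqPlus << " and " << eqMinus;
    return out.str();
}
```

For example, formatQuadraticRoots(1, -3, 2) returns "Your answers are 2.00 and 1.00". The doubles themselves keep full precision; only the text is limited to two decimals.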

C++ wrong output for little numbers for function (1-exp(-x))/x

I need to evaluate the function (1-exp(-x))/x for x from 10^-30 to 10^9. For very small x the output is 0, but it should be 1. x is correct, but there is some issue with fx, and I have no idea how to solve it.
#include <math.h>
#include <iomanip>
#include <iostream>
int main()
{
double fx ,x;
x=pow(10.0L,-30);
std::cout<<"fx\t\t\t\t"<<"x\t\t"<<std::endl;
for(double i=0;x<pow(10,9);i+=0.01)
{
std::cout << std::setprecision(20); // the manipulator must be streamed to take effect
x=pow(10.0L,-30+i);
fx =(double)((1-exp(-pow(10,-30+i)))/pow(10,-30+i));
std::cout<<x<<"\t\t\t\t\t"<<fx<<std::endl;
}
}
My expected output is:
for x == 10^-30, fx == 1
But the actual output is:
for x == 10^-30, fx == 0

You are right that fx should be equal to about 1 for small x.
The issue is that double has finite precision. Its relative rounding error (about 10^{-16}) is much larger than 10^{-30}, so exp(-x) rounds to exactly 1.
A mantissa of at least 100 bits would be needed to distinguish 1 - 10^{-30} from 1.
This is illustrated by the following relation:
(1 - (1-x)) = 0
for very small x. See the code hereafter.
In other words, because of rounding errors, addition is no longer associative.
A workaround is to use a different formula for very small x, for example when x is less than 10^{-10} in absolute value.
#include <iostream>
#include <cmath>
#include <iomanip>
int main()
{
double fx ,x;
x = pow(10.0L,-30);
// note: std::setprecision only takes effect when streamed into std::cout; the default precision is used here
fx = (1.0 - exp(-x)) / x;
std::cout << x << "\t" << fx << std::endl;
fx = (1.0 - (1.0 - x))/x;
std::cout << x << "\t" << fx << std::endl;
// for very small x:
fx = 1 - x/2 + (x*x)/6;
std::cout << x << "\t" << fx << std::endl;
}
Output:
1e-30 0
1e-30 0
1e-30 1
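An alternative to switching formulas by hand is std::expm1 from <cmath> (available since C++11), which computes exp(y) - 1 accurately even when y is close to zero. The sketch below is one possible way to use it; the function name oneMinusExpOverX is hypothetical, and the x == 0 branch returns the mathematical limit.

```cpp
#include <cmath>

// Evaluates (1 - exp(-x)) / x without catastrophic cancellation.
// std::expm1(y) returns exp(y) - 1 accurately for y close to zero,
// so -std::expm1(-x) equals 1 - exp(-x).
double oneMinusExpOverX(double x) {
    if (x == 0.0) {
        return 1.0;  // limit of the expression as x -> 0
    }
    return -std::expm1(-x) / x;
}
```

With this form a single expression should cover the whole range from 10^-30 to 10^9, so no threshold between two formulas has to be chosen.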

Infinity vs NAN values

I noticed a difference in the floating point values represented for Infinity and NAN. Is this specified somewhere in the standard?
#include <cmath>
#include <iostream>
#include <limits>
#include <stdint.h>
union Double
{
double value;
uint64_t repr;
};
int main()
{
Double d;
d.value = std::numeric_limits<double>::infinity();
std::cout << std::hex << "inf: " << d.repr << std::endl;
d.value = std::numeric_limits<double>::quiet_NaN();
std::cout << std::hex << "NAN: " << d.repr << std::endl;
return 0;
}
Output:
inf: 0x7ff0000000000000
NAN: 0x7ff8000000000000

I noticed a difference in the floating point values represented for Infinity and NAN.
Yes, this is not surprising. These values differ, so their representation should differ too.
Is this specified somewhere in the standard?
In the C++ standard? No.
In some floating-point standard, like IEEE-754? Yes.
Note: in C++, your union trick has undefined behavior. Use memcpy instead.
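A minimal sketch of the memcpy approach follows. The helper name bitsOf is hypothetical, and the bit patterns mentioned afterwards assume that double is an IEEE 754 binary64 type.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Copies the object representation of a double into an integer.
// Unlike reading an inactive union member, this is well-defined in C++.
std::uint64_t bitsOf(double value) {
    std::uint64_t repr;
    std::memcpy(&repr, &value, sizeof repr);
    return repr;
}
```

On an IEEE 754 platform, bitsOf(std::numeric_limits<double>::infinity()) yields 0x7ff0000000000000, while a NaN has the same all-ones exponent field but a non-zero fraction field. In C++20, std::bit_cast from <bit> expresses the same conversion without the explicit copy.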

What does double store?

I'm sending the value 4 *cos( fmod( acos(2.0/4.0), 2*3.14159265) ) as double to this function but I get output as
2
1k1
What is wrong here?
void convert_d_to_f(double n)
{
cout<<n<<" ";
double mantissa;
double fractional_part;
fractional_part = modf(n,&mantissa);
double x = fractional_part;
cout<<mantissa<<"k"<<fractional_part<<'\n';
}

The problem is that cout rounds a double to 6 significant digits by default when printing. You can print the desired number of decimal places using the <iomanip> library.
#include <iostream>
#include <cmath>
#include <iomanip>
void convert_d_to_f(double n)
{
std::cout<<std::fixed<<std::setprecision(20); //number of decimal places you need to print to
std::cout<<n<<" ";
double mantissa; // receives the integral part of n (not a true mantissa)
double fractional_part;
fractional_part = std::modf(n,&mantissa); // returns the fractional part of n
std::cout<<mantissa<<"k"<<fractional_part<<'\n';
}
int main() {
convert_d_to_f(4 *cos( fmod( acos(2.0/4.0), 2*3.14159265) ));
return 0;
}

For all practical purposes, your number n evaluates to 2. If you want it to display as 1.9999999... then follow Kapil's solution and set the floating-point precision of std::cout to many decimal places. Keep in mind the difference between precision and accuracy if you go that route.
That said, the variable names in your void convert_d_to_f(double n) function suggest that you want what std::frexp(double arg, int* exp) provides, namely a mantissa and an exponent, whereas std::modf splits a number into its integral and fractional parts. Your function also has the limitation that its results go out of scope once they are printed to the screen. If you want to use the exponent and mantissa values after computing them, you can do it like this.
#include <iostream>
#include <cmath>
int main()
{
double n = 4 *cos( fmod( acos(2.0/4.0), 2*3.14159265) );
std::cout << "Given the number " << n << std::endl;
// convert the given floating point value `n` into a
// normalized fraction and an integral power of two
int exp;
double mantissa = std::frexp(n, &exp);
// display results as Mantissa x 2^Exponent
std::cout << "We have " << n << " = "
<< mantissa << " * 2^" << exp << std::endl;
return 0;
}
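To keep the two decompositions apart, the sketch below wraps std::modf and std::frexp in small helpers that return their results instead of printing them. The struct and function names are hypothetical.

```cpp
#include <cmath>

// Hypothetical result types for the two decompositions.
struct IntFrac { double integral; double fractional; };
struct MantExp { double mantissa; int exponent; };

// n == integral + fractional, both with the same sign as n.
IntFrac splitIntegerFraction(double n) {
    IntFrac r{};
    r.fractional = std::modf(n, &r.integral);
    return r;
}

// n == mantissa * 2^exponent, with |mantissa| in [0.5, 1) for non-zero finite n.
MantExp splitMantissaExponent(double n) {
    MantExp r{};
    r.mantissa = std::frexp(n, &r.exponent);
    return r;
}
```

For n = 6.5, splitIntegerFraction gives 6 and 0.5, while splitMantissaExponent gives 0.8125 and 3, since 0.8125 * 2^3 = 6.5.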

Small numerical error when calculating Weight Average

Here is part of a physics engine.
The simplified function centerOfMass calculates the 1D center of mass of two rigid bodies (demo):
#include <iostream>
#include <iomanip>
float centerOfMass(float pos1,float m1, float pos2,float m2){
return (pos1*m1+pos2*m2)/(m1+m2);
}
int main(){
float a=5.55709743f;
float b= centerOfMass(a,50,0,0);
std::cout << std::setprecision(9) << a << '\n'; //5.55709743
std::cout << std::setprecision(9) << b << '\n'; //5.55709696
}
I need b to be precisely 5.55709743.
The tiny difference can sometimes (about 5% of the time in my real case) introduce a nasty physics divergence.
There are some ways to solve it, e.g. by adding a lot of conditional checks.
However, that is very error-prone for me.
Question: How can I eliminate the calculation error while keeping the code clean, fast, and easy to maintain?
By the way, if it can't be done elegantly, I would probably need to make the caller more robust against such numerical errors.
Edit
(clarify duplicate question)
Yes, the cause is the precision error from the storage/computing format (mentioned in Is floating point math broken?).
However, this question asks about how to neutralize its symptom in a very specific case.

You are trying to get 9 decimal digits of precision, but the float datatype has a precision of only about 7 decimal digits.
Use double instead. (demo)

Use double, not float. IEEE 754 double has about 16 significant decimal digits of precision.
#include <iostream>
#include <iomanip>
double centerOfMass(double pos1, double m1, double pos2, double m2) {
return (pos1*m1 + pos2 * m2) / (m1 + m2);
}
int main() {
double a = 5.55709743;
double b = centerOfMass(a, 50, 0, 0);
std::cout << std::setprecision(16) << a << '\n'; //5.55709743
std::cout << std::setprecision(16) << b << '\n'; //5.55709743
std::cout << std::setprecision(16) << (b - a) << '\n'; // 0
}
For the example given, centerOfMass(a, 50, 0, 0), the following will give exact results for all values of a, but of course the example does not look realistic.
double centerOfMass(double pos1, double m1, double pos2, double m2) {
double divisor = m1 + m2;
return pos1*(m1/divisor) + pos2*(m2/ divisor);
}
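The weight-first form also works without leaving float, at least for the case in the question where one mass is zero. The sketch below uses the hypothetical name centerOfMassWeighted and assumes m1 + m2 is non-zero.

```cpp
// float variant of the weight-first formula.
// Assumes m1 + m2 != 0.
float centerOfMassWeighted(float pos1, float m1, float pos2, float m2) {
    const float divisor = m1 + m2;
    // When m2 == 0, m1 / divisor is exactly 1 and m2 / divisor is exactly 0,
    // so the result is exactly pos1 (for finite pos2).
    return pos1 * (m1 / divisor) + pos2 * (m2 / divisor);
}
```

With a = 5.55709743f, centerOfMassWeighted(a, 50, 0, 0) compares equal to a. When both masses are non-zero, the result is still rounded, so this addresses the degenerate case rather than rounding error in general, and it costs one extra division.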
