Simple Floating Point Calculations Produce Different Results

I'm using Visual Studio 2008 and have encountered a case where the same code in two different versions of the same class library apparently produces different answers.
The following code reflects my attempts to reproduce the problem. It simply takes two pairs of numbers, (cx, cy) and (rx, ry), subtracts them, and displays the results (tx, ty) in decimal and hex format.
#include <iostream>
#include <fstream>
#include <sstream>
#include <iomanip>
using namespace std;
string dhex(double x) { // double to hex
union {
unsigned long long n;
double d;
} value;
value.d = x;
std::ostringstream buf;
buf << "0x" << std::hex << std::setfill('0') << std::setw(16) << value.n;
return buf.str();
}
double i64tod(unsigned long long n) { // hex to double
double *DP = (double *) &n;
return *DP;
}
int main(int argc, char **argv) {
double tx, ty, cx, cy, rx, ry;
cx = i64tod(0x3fb63f141205bc02); cy = i64tod(0x40019eb851eb851f);
rx = i64tod(0x3fa222fa84a5161c); ry = i64tod(0x40011f8441720667);
tx = cx - rx;
ty = cy - ry;
cout << setprecision(22);
cout << " cx = " << setw(22) << cx << ", cy = " << setw(22) << cy
<< " (" << dhex(cx) << ", " << dhex(cy) << ")" << endl;
cout << " rx = " << setw(22) << rx << ", ry = " << setw(22) << ry
<< " (" << dhex(rx) << ", " << dhex(ry) << ")" << endl;
cout << " tx = " << setw(22) << tx << ", ty = " << setw(22) << ty
<< " (" << dhex(tx) << ", " << dhex(ty) << ")" << endl;
return 0;
}
The output from this code is:
cx = 0.086900000000000005, cy = 2.2025000000000001 (0x3fb63f141205bc02, 0x40019eb851eb851f)
rx = 0.035423115436554492, ry = 2.1403889763758346 (0x3fa222fa84a5161c, 0x40011f8441720667)
tx = 0.051476884563445513, ty = 0.06211102362416554 (0x3faa5b2d9f6661e8, 0x3fafcd041e5fae00)
The application with the problem has a class library which is a "pool engine", ie. it simulates the physics of pool/snooker shots.
A recent set of changes to that library resulted in a failure of a verification test, which involves a particular sequence of shots that should produce a particular table state.
I tracked it down to that simple subtraction operation. When the new version executes the subtraction, it produces:
cx = 0.086900000000000005, cy = 2.2025000000000001 (0x3fb63f141205bc02, 0x40019eb851eb851f)
rx = 0.035423115436554492, ry = 2.1403889763758346 (0x3fa222fa84a5161c, 0x40011f8441720667)
tx = 0.051476884563445513, ty = 0.062111023624165547 (0x3faa5b2d9f6661e8, 0x3fafcd041e5fae01)
As you can see, the difference here is only in the least significant bit of ty, but sometimes it is larger. This is the result for another pair of point values:
cx = 1.0641, cy = -0.0545 (0x3ff1068db8bac711, 0xbfabe76c8b439581)
rx = 1.0878512271746064, ry = 0.022594280641953058 (0x3ff167d6b03a0dee, 0x3f9722f481bc3f0d)
tx = -0.023751227174606315, ty = -0.077094280641953061 (0xbf98523ddfd1b740, 0xbfb3bc736610da84)
The program above and the older library version agree on this case too, but the new library version gives a different answer for tx:
tx = -0.023751227174606343, ty = -0.077094280641953061 (0xbf98523ddfd1b748, 0xbfb3bc736610da84)
Both versions of the library have the same compiler options, and the function in question (a collision handler) was not altered by the changes (I was just simplifying the library by replacing a class that had no methods with a struct).
I should add that when the new version gets a different result, at least it is consistent, ie. it ALWAYS produces those different answers.
I could simply accept the new version and replace the verification test with one that matches, but I would really like to know why this is happening.
For the record, I've tried to get the standalone program above to give the "alternative" answers by changing compiler options, etc, but no luck. It only fails in situ, in the new library version.

Related

Creating a C++ program to solve an equation of motion using Euler's method

I am trying to compute the time history of the velocity described by the equation:
dV/dt = g − (C_d/m) * V^2, where g = 9.81, m = 1.0, and C_d = 1.5.
To do this I need to create a program in C++ that uses the explicit Euler method to solve the equation numerically. I am trying to find the velocity from t = 0 to t = 1 seconds with three different step sizes: delta_t = 0.05, 0.1, and 0.2 seconds. The program is then supposed to show the percent error relative to the analytical solution, which is given as: V(t) = sqrt((m*g)/C_d) * tanh(sqrt((g*C_d)/m) * t).
My problem is I am not sure how to iterate through Euler's method multiple times with different time intervals. So far I have solved the analytical equation, but am unsure where to go from here. If anyone could help point me in the right direction it would be greatly appreciated.
#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;
int main() {
double m = 1.0; // units in [kg]
double g = 9.81; // units in [m/s^2]
double C_d = 1.5; // units in [kg/m]
double t; // units in [s]
double v; // units in [m/s]
cout << "The velocity will be examined from the time t = 0 to t = 1 seconds." << endl;
cout << "Please select either 0.05, 0.1, or 0.2 to be the time interval:" << endl;
cin >> t;
cout << "You have chosen the time interval of: " << t << " seconds." << endl;
v = sqrt((m * g) / C_d) * tanh(sqrt((g * C_d) / m) * t);
cout << "The velocity at a time of "<< t << " seconds is equal to: " << v << " m/s." << endl;
return 0;
}

If you want to iterate over t with increments of A, calculating the result of the formula with each t, you would write a for loop.
#include <cmath>
#include <iostream>
int main()
{
double m = 1.0; // units in [kg]
double g = 9.81; // units in [m/s^2]
double C_d = 1.5; // units in [kg/m]
std::cout << "The velocity will be examined from the time t = 0 to t = 1 seconds." << std::endl;
std::cout << "Please select the time interval:" << std::endl;
std::cout << "1: 0.05" << std::endl;
std::cout << "2: 0.1" << std::endl;
std::cout << "3: 0.2" << std::endl;
double A = 0; // increment in for loop
int x;
std::cin >> x;
switch (x) { // check what the input is equal to
case 1: A = 0.05; break;
case 2: A = 0.1; break;
case 3: A = 0.2; break;
default: std::cout << "Unknown option!" << std::endl; return 1;
}
std::cout << "You have chosen the time interval of: " << A << " seconds." << std::endl;
std::cout << "Results of V(t):" << std::endl;
// Step through t = 0, A, 2A, ... up to t = 1 using an integer counter.
// Repeatedly adding A to a double accumulates rounding error, which can
// add or drop an iteration near t = 1 (for example with A = 0.1).
const int steps = static_cast<int>(1.0 / A + 0.5); // number of intervals in [0, 1]
for (int i = 0; i <= steps; ++i) {
double t = i * A;
std::cout << "at t = " << t << ": " << sqrt((m*g) / C_d) * tanh(sqrt((g*C_d) / m) * t) << std::endl;
}
return 0;
}
Refer to https://beginnersbook.com/2017/08/cpp-for-loop/ for more information. Note: I've also introduced a switch statement into the code to prevent unknown values from being input. https://beginnersbook.com/2017/08/cpp-switch-case/

sparse matrix-matrix / matrix-vector multiplication c++

I'm using the Eigen 3 library in C++ to do some linear algebra, but one part of the code, which includes some matrix-matrix and matrix-vector multiplications, takes too long. My matrices and vectors are fairly large (on the order of 20k x 20k), but some are sparse. From what I read in the Eigen documentation, it is designed to work efficiently with sparse matrices. I don't know what I'm doing wrong or how I can improve it. I would appreciate any help.
Here is part of the code. We have n input data points; for each one we calculate 'k' from a function and then need to find a 'mean' value, defined in the code:
#pragma omp parallel for ordered schedule(dynamic)
for (unsigned long n = 0; n < nNew; n++) {
SparseVector<double> kJ(totalJ);
double k = something; // calculated using a function
for(int i=0; i<totalJ; i++) {
double covTmp = xxx; // calculated using a function
kJ.insert(i) = covTmp;
}
SparseVector<double> CJikJ(totalJ);
CJikJ = CJi * kJ;
double kJTCJikJ = kJ.transpose().dot(CJikJ);
double mu = 1. / (k - kJTCJikJ);
SparseVector<double> mJ(totalJ);
mJ= -mu * CJikJ;
SparseMatrix<double> MJi(totalJ, totalJ);
MJi = CJ - kJ*kJ.transpose()*mu / (1. + mu * kJTCJikJ);
SparseMatrix<double> VNGJMJiGJTi(nstars, nstars);
VNGJMJiGJTi = invertMatrix(VN + GJ * (MJi * GJT), nstars);
SparseMatrix<double> RJi(totalJ, totalJ);
RJi = MJi - MJi * GJT * (VNGJMJiGJTi) * (GJ * MJi); // this line takes too long
RJi.prune(prunelim);
SparseVector<double> RJimJ;
RJimJ = RJi*mJ;
double alpha = mu - mJ.dot(RJimJ);
double beta = AN.dot((VNi * GJ) * RJimJ);
double mean = -beta / alpha;
outfile << setprecision(8) << newposm[n][0] << ", " << newposm[n][1] << ", " << newposm[n][2] << ", " << alpha << ", " << beta << ", " << mean << ", " << variance << "\n";
if(params.vb) {
cout << setprecision(8) << "# l, b, dist, alpha, beta, mean, var" << endl;
cout << setprecision(8) << newposm[n][0] << ", " << newposm[n][1] << ", " << newposm[n][2] << ", " << alpha << ", " << beta << ", " << mean << ", " << variance << "\n";
}
}

How can I justify my cursor to line up with my output columns in a terminal program?

#include <iostream>
#include <iomanip>
using namespace std;
int main()
{
const double PI = 3.14159;
double rad = 0;
double area = 0;
double vol = 0;
int areaPi = 0;
int volPi = 0;
cout << setprecision(5) << fixed;
cout << setw(38) << left << "Enter radius for the sphere: " << right;
cin >> rad;
area = (4 * PI * (rad * rad));
vol = ((4.0/3.0) * PI * (rad * rad * rad));
areaPi = (4 * (rad *rad));
volPi = (4 * (rad * rad * rad));
cout << right << "Surface area of the sphere: " << setw(12) << area << " (" << areaPi << "\u03C0)";
cout << "\n";
cout << "The volume of the sphere: " << setw(14) << vol << " (" << volPi << "π/3)";
cout << "\n";
return 0;
}
Hi guys. The problem I'm having is that when you enter a value for the radius (rad), the typed digits grow from left to right, so a number with two or more digits ends up extending past the right edge of the output columns.
It looks like this when the program runs and you enter anything longer than one digit:
//Enter radius for the sphere: 17
//Surface area of the sphere: 3631.67804 (1156π)
//The volume of the sphere: 20579.50889 (19652π/3)
I would like the 7 to line up with the column below it. I tried setting the width to one less than I had before & single digits end up one space too far to the left like so:
//Enter radius for the sphere: 4
//Surface area of the sphere: 201.06176 (64π)
//The volume of the sphere: 268.08235 (256π/3)

I would store the output into a set of strings. Then you could check and manipulate the data as needed. Alternatively you could calculate the offset of spaces you'd need before printing
// convert to string for digit count
std::string output_1 = std::to_string(x);
std::string output_2 = std::to_string(y);
int o_1_2_dist = static_cast<int>(output_1.size()) - static_cast<int>(output_2.size()); // signed difference in digits
std::string padding_1, padding_2;
if (o_1_2_dist < 0)
padding_1 = std::string(abs(o_1_2_dist), ' ');
else
padding_2 = std::string(o_1_2_dist, ' ');
std::cout << padding_1 << output_1 << '\n' << padding_2 << output_2;
You'd want to adjust one of the output strings so that it doesn't count the parts of the number you don't care about. Maybe do output_1 = std::to_string(floor(x)); or something like that, so you don't count the digits after the decimal point.

This can be solved by calculating the length of the input. I used C++11's to_string to convert the resulting values to strings and find their lengths. I haven't checked how portable that is. It seems to work under Linux with GCC 6.1.1, but for some reason it did not work with the input, so I changed that part as well: the user now enters a std::string, which is converted to a double afterwards.
#include <iostream>
#include <iomanip>
#include <string>
using namespace std;
int main()
{
const double PI = 3.14159;
double rad = 0;
double area = 0;
double vol = 0;
int areaPi = 0;
int volPi = 0;
int width_col1 = 40;
//cout.fill('.');
cout << setprecision(5) << fixed;
cout << left << setw(width_col1) << "Enter radius for the sphere: " << right;
std::string input;
cin >> input;
rad = stod(input);
area = (4 * PI * (rad * rad));
vol = ((4.0/3.0) * PI * (rad * rad * rad));
areaPi = (4 * (rad *rad));
volPi = (4 * (rad * rad * rad));
int indent = width_col1 + input.length() + 1;
cout << left << setw(indent - to_string(area).length()) << "Surface area of the sphere: " << area << " (" << areaPi << "\u03C0)" << std::endl;
cout << left << setw(indent - to_string(vol).length()) << "The volume of the sphere: " << vol << " (" << volPi << "π/3)" << std::endl;
return 0;
}
This solution resembles what C programmers would have done with printf.
I would love to learn why this did not work with the input.

C++ Simple calculation outputting 0.0000000000000000 instead of 0.003333

The calculation for dx and dy is returning 0 and I don't see what the issue is. The console seems to show all the correct values are being used.
void drawBackground()
{
double r, g, b, dx, dy, Wx, Wy, Wz;
Ray ray;
cout << "xmax: " << sceneDescription::imagePlaneXmax << " xmin: " << sceneDescription::imagePlaneXmin << endl;
cout << "ymax: " << sceneDescription::imagePlaneYmax << " ymin: " << sceneDescription::imagePlaneYmin << endl;
cout << "Iw: " << sceneDescription::Iw << " Ih: " << sceneDescription::Ih << endl;
cout << " " << endl;
dx = (sceneDescription::imagePlaneXmax - (sceneDescription::imagePlaneXmin))/sceneDescription::Iw;
dy = (sceneDescription::imagePlaneYmax - (sceneDescription::imagePlaneYmin))/sceneDescription::Ih;
std::cout << "dx: "<< boost::format("%1$.16f") % dx << " dy: "<< boost::format("%1$.16f") % dy << endl;
}
sceneDescription.h
#include <glm/glm.hpp>
using namespace glm;
class sceneDescription{
public:
static const int imagePlaneXmin = -1;
static const int imagePlaneXmax = 1;
static const int imagePlaneYmin = -1;
static const int imagePlaneYmax = 1;
static const int Iw = 600;
static const int Ih = 800;
};
Console output:
xmax: 1 xmin: -1
ymax: 1 ymin: -1
Iw: 600 Ih: 800
dx: 0.0000000000000000 dy: 0.0000000000000000

The problem is that the statement:
dx = (sceneDescription::imagePlaneXmax -
(sceneDescription::imagePlaneXmin))/sceneDescription::Iw;
will give the following result:
(1 - (-1)) / 600 = 2 / 600 = 0 (since both operands are int, this is integer division, and the result is truncated before it is assigned to the double).
You may want to cast the number to double.
Something like this would work:
dx = (double)(sceneDescription::imagePlaneXmax -
(sceneDescription::imagePlaneXmin)) / sceneDescription::Iw;
Since the cast operator has higher precedence than division, the numerator is converted to double first. The denominator is then implicitly converted to double as well, so the division is carried out in floating point.
Hope that helps!

Dividing a Float by Itself Produces Very Large Integers

So I'm having what seems to me to be a very bizarre problem. I've got a crude system for applying forces to objects on 2D planes, and one of the simplest calculations seems to be causing one of my variables to overflow. I have the following line:
int ySign = m_Momentum.y / abs(m_Momentum.y);
Here m_Momentum has two data members, x and y (it is an SFML sf::Vector2 of floats). Normally the formula should always return either 1 or -1, depending on the sign of m_Momentum.y (unless I'm grossly mistaken).
However, it occasionally returns insanely high numbers such as -2147483648. In that particular case, the value of m_Momentum.y was 0.712165 (both values were obtained by sending to std::cout); I tried again, m_Momentum.y was -0.578988 and ySign was still -2147483648. There is a corresponding xSign that also flips out sometimes, often with the same final value. I can't confirm 100% that this is always the result, but at the moment that seems to be the case.
I'm sort of stumped as to why this is happening, and when it does, it basically invalidates my program (it instantly sends objects millions of pixels in the wrong direction). It seems logically impossible that the line above is returning such strange results.
Below is the function I am working on. Probably the wrong way to do it, but I didn't expect it to go so horribly wrong. The printout it produces reveals that all numbers look normal until the signs are printed out; one of them is invariably massive, and afterwards you see numbers like -2.727e+008 (which, as far as I'm aware, is scientific notation - i.e. -2.727 * 10 ^ 8).
///MODIFY MOMENTUM
//Reset, if necessary
if (Reset == true)
{
m_Momentum.x = 0;
m_Momentum.y = 0;
}
sf::Vector2<float> OldMoment = m_Momentum;
//Apply the force to the new momentum.
m_Momentum.x += Force.x;
m_Momentum.y += Force.y;
sf::Vector2<float> NewMoment = m_Momentum;
//Calculate total momentum.
float sqMomentum = m_Momentum.x * m_Momentum.x + m_Momentum.y * m_Momentum.y;
float tMomentum = sqrt(sqMomentum);
//Preserve signs for later use.
int xSign = m_Momentum.x / abs(m_Momentum.x);
int ySign = m_Momentum.y / abs(m_Momentum.y);
//Determine more or less the ratio of importance between x and y components
float xProp;
float yProp;
if (abs(tMomentum) > m_MaxVelocity)
{
//Get square of maximum velocity
int sqMax = m_MaxVelocity * m_MaxVelocity;
//Get proportion of contribution of each direction to velocity
xProp = (m_Momentum.x * m_Momentum.x) / sqMomentum;
yProp = (m_Momentum.y * m_Momentum.y) / sqMomentum;
//Reset such that the total does not exceed maximum velocity.
m_Momentum.x = sqrt(sqMax * xProp) * xSign;
m_Momentum.y = sqrt(sqMax * yProp) * ySign;
}
///SANITY CHECK
//Preserve old tMomentum
float tOld = tMomentum;
//Calculate current tMomentum
sqMomentum = m_Momentum.x * m_Momentum.x + m_Momentum.y * m_Momentum.y;
tMomentum = sqrt(sqMomentum);
//If it's still too high, print a report.
if (tMomentum > m_MaxVelocity)
{
std::cout << "\n\nSANITY CHECK FAILED\n";
std::cout << "-\n";
std::cout << "Old Components: " << OldMoment.x << ", " << OldMoment.y << "\n";
std::cout << "Force Components: " << Force.x << ", " << Force.y << "\n";
std::cout << "-\n";
std::cout << "New Components: " << NewMoment.x << ", " << NewMoment.y << "\n";
std::cout << "Which lead to...\n";
std::cout << "tMomentum: " << tOld << "\n";
std::cout << "-\n";
std::cout << "Found these proportions: " << xProp << ", " << yProp << "\n";
std::cout << "Using these signs: " << xSign << ", " << ySign << "\n";
std::cout << "New Components: " << m_Momentum.x << ", " << m_Momentum.y << "\n";
std::cout << "-\n";
std::cout << "Current Pos: " << m_RealPosition.x << ", " << m_RealPosition.y << "\n";
std::cout << "New Pos: " << m_RealPosition.x + m_Momentum.x << ", " << m_RealPosition.y + m_Momentum.y << "\n";
std::cout << "\n\n";
}
///APPLY FORCE
//To the object's position.
m_RealPosition.x += m_Momentum.x;
m_RealPosition.y += m_Momentum.y;
//To the sprite's position.
m_Sprite.Move(m_Momentum.x, m_Momentum.y);
Can somebody explain what's going on here?
EDIT: RedX helpfully directed me to the following post: Is there a standard sign function (signum, sgn) in C/C++? Which led me to write the following lines of code:
//Preserve signs for later use.
//int xSign = m_Momentum.x / abs(m_Momentum.x);
//int ySign = m_Momentum.y / abs(m_Momentum.y);
int xSign = (m_Momentum.x > 0) - (m_Momentum.x < 0);
int ySign = (m_Momentum.y > 0) - (m_Momentum.y < 0);
Thanks to the above, I no longer have the strange problem. For an explanation/alternative solution, see Didier's post below.

You should use fabs() instead of abs() to get the absolute value of a floating point number. If you use the integer absolute function, then the result is an integer ...
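For instance, -0.5 / abs(-0.5) is treated as -0.5 / 0, which results in negative infinity (as a floating point value). Converting that infinity to int is what produces the minimum int value, 0x80000000 = -2147483648. Formally, the result of such an out-of-range conversion is undefined, but this is the value x86 hardware typically yields, for positive infinity as well.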

Taking absolute values and dividing sounds like an awful waste of cycles to me. What's wrong with
x > 0 ? 1 : -1
which you could always put in a function
template <class T>
inline int sgn(const T &x) { return x > 0 ? 1 : -1; } // note: returns -1 for x == 0
