Getting different output from seemingly identical calculations - c++

Can anyone tell me why the two calculations below, which seem to be identical, produce different outputs? I know the difference isn't that great, but I am using these values to draw lines with OpenGL and the difference is noticeable.
#include <iostream>
#include <cmath>
int main()
{
int ypos=400;
/// Output: 410.
std::cout << 400+(sin((90*3.14159)/180)*10) << std::endl;
ypos=ypos+(sin((90*3.14159)/180)*10);
/// Output: 409.
std::cout << ypos << std::endl;
return 0;
}

This outputs a floating-point number:
std::cout << 400+(sin((90*3.14159)/180)*10) << std::endl;
But this outputs an integer, so the value will have been truncated:
std::cout << ypos << std::endl;

The real answer is somewhere around 409.9999999.
This outputs a double, which std::cout rounds to 410 because by default it prints only 6 significant digits:
std::cout << 400+(sin((90*3.14159)/180)*10) << std::endl;
Since ypos is declared as an int, the double value is truncated to 409 when it is assigned (truncation is the defined behavior when converting a double to an int, provided the result fits):
ypos=ypos+(sin((90*3.14159)/180)*10);
/// Output: 409.
std::cout << ypos << std::endl;
Note that you could also increase the accuracy by using a better constant for PI:
const double PI = 3.141592653589793238463;
std::cout << 400+(sin((90*PI)/180)*10) << std::endl;
But I would still store the result in a double instead of an int to avoid truncation. If you need an integer result, then I would round first:
ypos += round(sin((90*PI)/180)*10);

This is an instance of GCC's "most common non-bug".
That's how floating-point arithmetic behaves. Because of rounding error you don't get an exact result; the value is extremely close to 410, for example 409.99999923. When you print it as a floating-point value, C++ rounds to 6 significant digits by default and thus shows 410. The second time, you assign it to an integer. In that case C++ doesn't round but truncates, which is why you get 409.

To keep your code stable so your OpenGL output behaves nicely, you should probably do all (or as much as possible) of your calculations in doubles, so that truncation effects do not accumulate. At the end (where I assume you need an integer pixel number), either always round or always truncate to generate the integer value, but do not mix the two.

Related

Avoid rounding with std::setprecision

Can I control how setprecision rounds double values when streaming them to standard output?
ofile << std::setprecision(12) << total_run_time/TIME << "\n";
Output:
0.756247615801
ofile << std::setprecision(6)<< total_run_time/TIME << "\n";
Output:
0.756248
But I need the output as 0.756247
Thanks
There is also std::fesetround from <cfenv>, which sets the rounding direction:
#include <iostream>
#include <iomanip>
#include <cmath>
#include <cfenv>
int main () {
double runtime = 0.756247615801;
// Set rounding direction and output with some precision:
const auto prev_round = std::fegetround();
std::fesetround(FE_DOWNWARD);
std::cout << "desired: " << std::setprecision(6) << runtime << "\n";
// Restore previous rounding direction and output for testing:
std::fesetround(prev_round);
std::cout << "default: " << std::setprecision(6) << runtime << "\n";
}
(Note that these are not the kind of comments I recommend; they are just for tutoring purposes.)
Output:
desired: 0.756247
default: 0.756248
Important note, though: I did not find any mention in the standard that the operator<< overloads for floating-point types have to honour the rounding direction.
Another approach is to defeat the rounding by subtracting, in your second case, half a unit of the sixth significant digit (0.0000005 for this value) from the double before outputting it:
total_run_time / TIME - 0.0000005
In many ways I prefer this as it avoids the potential for integer overflow.
Multiply the result of your division by a million, convert to an integer, and divide by a million (as a double). This has the side effect that std::setprecision is not needed for the output.
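std::ostringstream os; os << std::setprecision(12) << 0.756247615801; // enough digits that nothing rounds up (needs <sstream>, <iomanip>)
std::cout.write(os.str().c_str(), 8); // prints only the first 8 characters: 0.756247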
It looks really dirty, but it works!

c++ half even rounding to x digits

Given a float, I want to round the result to 4 decimal places using half-even rounding, i.e., rounding ties to the nearest even digit. For example, when I have the following code snippet:
#include <iostream>
#include <iomanip>
int main(){
float x = 70.04535;
std::cout << std::fixed << std::setprecision(4) << x << std::endl;
}
The output is 70.0453, but I want it to be 70.0454. I could not find anything in the standard library. Is there any function to achieve this? If not, what would a custom function look like?
If you use float, you're kind of screwed here. There is no such value as 70.04535, because it's not representable in IEEE 754 binary floating point.
An easy demonstration uses Python's decimal.Decimal class, which reproduces the exact value of the float (well, a Python float is a C double, but it's the same principle):
>>> import decimal
>>> decimal.Decimal(70.04535)
Decimal('70.0453499999999991132426657713949680328369140625')
So your actual value doesn't end in a 5; after 70.0453 it continues 4999... (the closest to 70.04535 a C double can get; a C float is even less precise), so even banker's rounding would round it down. If this is important to your program, you need a C or C++ library that matches "human" (base-10) math expectations, e.g. libmpdec (which is what Python's decimal.Decimal uses under the hood).
I'm sure someone can improve this, but it gets the job done.
#include <cmath>
#include <iostream>
// Rounds x to p decimal places; ties go toward +infinity (half-up), not to even.
double round_p( double x, int p ){
double d = std::pow(10,p);
return std::floor(x*d + 0.5)/d;
}
int main(){
double x = 70.04535;
std::cout << "value " << x << " rounded " << round_p(x,4) << std::endl;
std::cout << "CHECK " << (bool)(round_p(x,4) == 70.0454) << std::endl;
}

float to int conversion going wrong (even though the float is already an int)

I was writing a little function to calculate the binomial coefficient using the tgamma function provided by C++. tgamma returns float values, but I wanted to return an integer. Please take a look at this example program comparing three ways of converting the float back to an int:
#include <iostream>
#include <cmath>
int BinCoeffnear(int n,int k){
return std::nearbyint( std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)) );
}
int BinCoeffcast(int n,int k){
return static_cast<int>( std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)) );
}
int BinCoeff(int n,int k){
return (int) std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1));
}
int main()
{
int n = 7;
int k = 2;
std::cout << "Correct: " << std::tgamma(7+1) / (std::tgamma(2+1)*std::tgamma(7-2+1)); //returns 21
std::cout << " BinCoeff: " << BinCoeff(n,k); //returns 20
std::cout << " StaticCast: " << BinCoeffcast(n,k); //returns 20
std::cout << " nearby int: " << BinCoeffnear(n,k); //returns 21
return 0;
}
Why is it that, even though the calculation returns a float equal to 21, 'normal' conversion fails and only nearbyint returns the correct value? What is the nicest way to implement this?
EDIT: according to the C++ documentation, tgamma(int) returns a double.
From this std::tgamma reference:
If arg is a natural number, std::tgamma(arg) is the factorial of arg-1. Many implementations calculate the exact integer-domain factorial if the argument is a sufficiently small integer.
It seems that the compiler you're using is doing that, calculating the factorial of 7 for the expression std::tgamma(7+1).
The result might differ between compilers, and also between optimization levels. As demonstrated by Jonas, there is a big difference between optimized and unoptimized builds.
The remark by @nos is on point. Note that the first line
std::cout << "Correct: " <<
std::tgamma(7+1) / (std::tgamma(2+1)*std::tgamma(7-2+1));
prints a double value and does not perform a floating-point-to-integer conversion.
The result of your calculation in floating point is indeed less than 21, yet this double precision value is printed by cout as 21.
On my machine (x86_64, gnu libc, g++ 4.8, optimization level 0) setting cout.precision(18) makes the results explicit.
Correct: 20.9999999999999964 BinCoeff: 20 StaticCast: 20 nearby int: 21
In this case it is practical to replace integer operations with floating-point operations, but one has to keep in mind that the result must be an integer. The idea is to use std::round.
The problem with std::nearbyint is that, depending on the rounding mode, it may produce different results.
std::fesetround(FE_DOWNWARD);
std::cout << " nearby int: " << BinCoeffnear(n,k);
would return 20.
So with std::round the BinCoeff function might look like
int BinCoeffRound(int n,int k){
return static_cast<int>(
std::round(
std::tgamma(n+1) /
(std::tgamma(k+1)*std::tgamma(n-k+1))
));
}
Floating-point numbers have rounding errors associated with them. Here is a good article on the subject: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
In your case the floating-point number holds a value very close to, but less than, 21. The rules for implicit floating-integral conversions say:
The fractional part is truncated, that is, the fractional part is discarded.
Whereas std::nearbyint:
Rounds the floating-point argument arg to an integer value in floating-point format, using the current rounding mode.
In this case the floating-point number will be exactly 21 and the following implicit conversion would return 21.
The first cout outputs 21 because of the rounding that cout does by default. See std::setprecision.
What is the nicest way to implement this?
Use the exact integer factorial function that takes and returns unsigned int instead of tgamma.
The problem is in how the floats are handled.
A computed float result that should be 2 may be stored not as 2 but as something like 1.99999.
So converting it to int drops the fractional part.
So instead of converting to int immediately, first round it by calling the ceil function, which is declared in cmath (or math.h).
This code returns 21 in all cases:
#include <iostream>
#include <cmath>
int BinCoeffnear(int n,int k){
return std::nearbyint( std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)) );
}
int BinCoeffcast(int n,int k){
return static_cast<int>( ceil(std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1))) );
}
int BinCoeff(int n,int k){
return (int) ceil(std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)));
}
int main()
{
int n = 7;
int k = 2;
std::cout << "Correct: " << (std::tgamma(7+1) / (std::tgamma(2+1)*std::tgamma(7-2+1))); //returns 21
std::cout << " BinCoeff: " << BinCoeff(n,k); //returns 21
std::cout << " StaticCast: " << BinCoeffcast(n,k); //returns 21
std::cout << " nearby int: " << BinCoeffnear(n,k); //returns 21
std::cout << "\n" << (int)(2.9995) << "\n"; // prints 2: the cast truncates
}

Function Returning Negative Value

I have not run it through enough tests yet; however, for some reason, with certain non-negative values this function sometimes passes back a negative value. I have done a lot of manual testing in a calculator with different values, but I have yet to see it display this same behavior.
I was wondering if someone would take a look and see if I am missing something.
float calcPop(int popRand1, int popRand2, int popRand3, float pERand, float pSRand)
{
return ((((((23000 * popRand1) * popRand2) * pERand) * pSRand) * popRand3) / 8);
}
The variables all contain randomly generated values:
popRand1: between 1 and 30
popRand2: between 10 and 30
popRand3: between 50 and 100
pSRand: between 1 and 1000
pERand: between 1.0f and 5500.0f which is then multiplied by 0.001f before being passed to the function above
Edit:
Alright, after following the execution a bit more closely, it is not the fault of this function directly. It produces a very large positive float, which then flips negative when I use this code later on:
pPMax = (int)pPStore;
pPStore is a float that holds the return value of calcPop.
So the question now is, how do I stop the formula from doing this? Testing even with very high values in Calculator has never displayed this behavior. Is there something in how the compiler processes the order of operations that is causing this or are my values simply just going too high?
In this case it seems that when you convert back to an int after the function returns, you can exceed the maximum value of an int. My suggestion is to use a type that can represent a greater range of values.
#include <iostream>
#include <limits>
#include <boost/multiprecision/cpp_int.hpp>
int main(int argc, char* argv[])
{
std::cout << "int min: " << std::numeric_limits<int>::min() << std::endl;
std::cout << "int max: " << std::numeric_limits<int>::max() << std::endl;
std::cout << "long min: " << std::numeric_limits<long>::min() << std::endl;
std::cout << "long max: " << std::numeric_limits<long>::max() << std::endl;
std::cout << "long long min: " << std::numeric_limits<long long>::min() << std::endl;
std::cout << "long long max: " << std::numeric_limits<long long>::max() << std::endl;
boost::multiprecision::cpp_int bigint = 113850000000;
int smallint = 113850000000; // does not fit in a 32-bit int, so the stored value is not 113850000000
std::cout << bigint << std::endl;
std::cout << smallint << std::endl;
std::cin.get();
return 0;
}
As you can see here, there are other types which have a bigger range. If these do not suffice I believe the latest boost version has just the thing for you.
Throw an exception:
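// Requires <climits> and <stdexcept>. static_cast<float>(INT_MAX) rounds up to 2^31,
// so use >= : a value equal to 2^31 does not fit in an int either.
if (pPStore >= static_cast<float>(INT_MAX)) {
throw std::overflow_error("exceeds integer size");
} else {
pPMax = static_cast<int>(pPStore);
}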
or use float instead of int.
When you multiply the maximum values of each term together, you get a value around 1.42312e+12, which is far larger than a 32-bit integer can hold, so let's see what the standard has to say about floating-point-to-integer conversions, in 4.9/1:
A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.
So we learn that for a large segment of the possible result values your function can generate, the conversion back to a 32-bit integer is undefined behavior, which can include producing negative numbers.
You have a few options here. You could use a 64 bit integer type (long or long long possibly) to hold the value instead of truncating down to int.
Alternately you could scale down the results of your function by a factor of around 1000 or so, to keep the maximal results within the range of values that a 32 bit integer could hold.

Output precision is higher than double precision

I am printing some data from a C++ program to be processed/visualized by ParaView, but I am having a problem with floating-point numbers. ParaView supports both Float32 and Float64 data types. Float64 is equivalent to double, with the typical limits +/-1.7e+/-308. But my code is printing numbers like 6.5e-318, which throws errors in ParaView when reading the data. I have verified that rounding those small numbers to zero makes the errors in ParaView disappear. I am not sure why I get such "high precision" output; maybe it is because some numbers are stored in higher precision than double. For example, the following code reproduces the same behavior on my system:
#include <iostream>
int main(void)
{
const double var1 = 1.0e-318, var2 = 1.5e-318;
std::cout << 1.0e-318 << std::endl;
std::cout << var1 << std::endl;
std::cout << var1 - var2 << std::endl;
std::cout.setf(std::ios_base::fixed | std::ios_base::scientific, std::ios_base::floatfield); // since C++11, fixed | scientific selects hexfloat output
std::cout << 1.0e-318 << std::endl;
std::cout << var1 << std::endl;
std::cout << var1 - var2 << std::endl;
return 0;
}
My output is:
9.99999e-319
9.99999e-319
-4.99999e-319
9.99999e-319
9.99999e-319
-4.99999e-319
My system is Mac OS X Snow Leopard, and I tested the above with GCC 4.2 and GCC 4.6 with the flags -m32, -m64 and -ffloat-store (not sure if this is useful).
Actually the output is fine for me, but not for ParaView. I just want to know why I get this difference. I am very likely ignoring something related to floating-point numbers that could be important. Could you please give me some clue about this output/numerical behavior for doubles?
Subnormal numbers, i.e. numbers with the smallest possible exponent and leading zeros in the fraction, can be smaller than the smallest normal double (about 2.2E-308), down to about 4.9E-324. You can probably filter them out using std::numeric_limits.