I still have not run it through enough tests, but for some reason, with certain non-negative inputs, this function will sometimes pass back a negative value. I have done a lot of manual testing in a calculator with different values, but I have yet to see it display this same behavior.
I was wondering if someone would take a look and see if I am missing something.
float calcPop(int popRand1, int popRand2, int popRand3, float pERand, float pSRand)
{
    return ((((((23000 * popRand1) * popRand2) * pERand) * pSRand) * popRand3) / 8);
}
The variables all contain randomly generated values:
popRand1: between 1 and 30
popRand2: between 10 and 30
popRand3: between 50 and 100
pSRand: between 1 and 1000
pERand: between 1.0f and 5500.0f which is then multiplied by 0.001f before being passed to the function above
Edit:
Alright, so after following the execution a bit more closely, it is not the fault of this function directly. It produces a very large positive float, which then flips negative when I use this code later on:
pPMax = (int)pPStore;
pPStore is a float that holds calcPop's return value.
So the question now is, how do I stop the formula from doing this? Testing even with very high values in a calculator has never displayed this behavior. Is there something in how the compiler processes the order of operations that is causing this, or are my values simply going too high?
In this case it seems that when you convert back to an int after the function returns, it is possible to exceed the maximum value of an int. My suggestion is to use a type that can represent a greater range of values.
#include <iostream>
#include <limits>
#include <boost/multiprecision/cpp_int.hpp>

int main(int argc, char* argv[])
{
    std::cout << "int min: " << std::numeric_limits<int>::min() << std::endl;
    std::cout << "int max: " << std::numeric_limits<int>::max() << std::endl;
    std::cout << "long min: " << std::numeric_limits<long>::min() << std::endl;
    std::cout << "long max: " << std::numeric_limits<long>::max() << std::endl;
    std::cout << "long long min: " << std::numeric_limits<long long>::min() << std::endl;
    std::cout << "long long max: " << std::numeric_limits<long long>::max() << std::endl;
    boost::multiprecision::cpp_int bigint = 113850000000;
    int smallint = 113850000000; // does not fit in 32 bits; the compiler warns and the value wraps
    std::cout << bigint << std::endl;
    std::cout << smallint << std::endl;
    std::cin.get();
    return 0;
}
As you can see here, there are other types which have a bigger range. If these do not suffice, I believe the latest Boost version has just the thing for you.
Throw an exception:
// needs <climits> for INT_MAX and <stdexcept> for std::overflow_error
if (pPStore > static_cast<float>(INT_MAX)) {
    throw std::overflow_error("exceeds integer size");
} else {
    pPMax = static_cast<int>(pPStore);
}
or use float instead of int.
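For completeness, here is a minimal self-contained sketch of that guard (the name toIntChecked is made up; note that static_cast<float>(INT_MAX) rounds up to exactly 2^31, so >= is the safer comparison):
#include <climits>   // INT_MAX
#include <iostream>
#include <stdexcept> // std::overflow_error

int toIntChecked(float pPStore) {
    // float(INT_MAX) == 2^31 exactly, so use >= to catch that boundary value too
    if (pPStore >= static_cast<float>(INT_MAX)) {
        throw std::overflow_error("exceeds integer size");
    }
    return static_cast<int>(pPStore);
}

int main() {
    try {
        std::cout << toIntChecked(1.42312e12f) << std::endl;
    } catch (const std::overflow_error& e) {
        std::cout << "caught: " << e.what() << std::endl;
    }
    return 0;
}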
When you multiply the maximum values of each term together you get a value around 1.42312e+12, which is somewhat larger than a 32-bit integer can hold, so let's see what the standard has to say about floating-point-to-integer conversions, in 4.9/1:
A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.
So we learn that for a large segment of the possible result values your function can generate, the conversion back to a 32-bit integer is undefined, which includes producing negative numbers.
You have a few options here. You could use a 64-bit integer type (long or long long, possibly) to hold the value instead of truncating down to int.
Alternatively, you could scale down the results of your function by a factor of around 1000 or so, to keep the maximal results within the range of values that a 32-bit integer can hold.
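For instance, a quick sketch of the first option (the value below is the worst case computed above):
#include <cstdint>
#include <iostream>

int main() {
    float pPStore = 1.42312e12f;  // worst-case calcPop result
    std::int64_t pPMax = static_cast<std::int64_t>(pPStore); // fits comfortably in 64 bits
    std::cout << pPMax << std::endl; // no wrap to a negative value
    return 0;
}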
Consider the following:
#include <iostream>
#include <cstdint>
#include <cstdlib> // std::strtoull

int main() {
    std::cout << std::hex
        << "0x" << std::strtoull("0xFFFFFFFFFFFFFFFF",0,16) << std::endl
        << "0x" << uint64_t(double(std::strtoull("0xFFFFFFFFFFFFFFFF",0,16))) << std::endl
        << "0x" << uint64_t(double(uint64_t(0xFFFFFFFFFFFFFFFF))) << std::endl;
    return 0;
}
Which prints:
0xffffffffffffffff
0x0
0xffffffffffffffff
The first number is just the result of converting ULLONG_MAX from a string to a uint64_t, which works as expected.
However, if I cast the result to double and then back to uint64_t, it prints 0, the second number.
Normally, I would attribute this to the precision inaccuracy of floats, but what further puzzles me is that if I cast the ULLONG_MAX from uint64_t to double and then back to uint64_t, the result is correct (third number).
Why the discrepancy between the second and the third result?
EDIT (by #Radoslaw Cybulski)
For another what-is-going-on-here try this code:
#include <iostream>
#include <cstdint>
#include <cstdlib> // std::strtoull
using namespace std;

int main() {
    uint64_t z1 = std::strtoull("0xFFFFFFFFFFFFFFFF",0,16);
    uint64_t z2 = 0xFFFFFFFFFFFFFFFFull;
    std::cout << z1 << " " << uint64_t(double(z1)) << "\n";
    std::cout << z2 << " " << uint64_t(double(z2)) << "\n";
    return 0;
}
which happily prints:
18446744073709551615 0
18446744073709551615 18446744073709551615
The number that is closest to 0xFFFFFFFFFFFFFFFF and is representable by double (assuming 64-bit IEEE) is 18446744073709551616, i.e. 2^64. You'll find that this is a bigger number than 0xFFFFFFFFFFFFFFFF. As such, the number is outside the representable range of uint64_t.
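A short check (assuming 64-bit IEEE doubles) confirms this rounding:
#include <cmath>
#include <cstdint>
#include <iomanip>
#include <iostream>

int main() {
    // the nearest double to 2^64 - 1 is 2^64 itself, one rounding step up
    double d = static_cast<double>(UINT64_MAX);
    std::cout << std::setprecision(20) << d << "\n";  // 18446744073709551616
    std::cout << (d == std::ldexp(1.0, 64)) << "\n";  // 1: compares equal to 2^64
    return 0;
}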
Of the conversion back to integer, the standard says (quoting latest draft):
[conv.fpint]
A prvalue of a floating-point type can be converted to a prvalue of an integer type.
The conversion truncates; that is, the fractional part is discarded.
The behavior is undefined if the truncated value cannot be represented in the destination type.
Why the discrepancy between the second and the third result?
Because the behaviour of the program is undefined.
Although it is mostly pointless to analyse the reasons for differences in UB, because the scope of variation is limitless, my guess at the reason for the discrepancy in this case is that in one case the value is a compile-time constant, while in the other there is a call to a library function that is invoked at runtime.
I am facing the following issue: when I multiply two numbers, I get different results depending on the values of those numbers. I tried to experiment with types but didn't get the expected result.
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <iomanip>
#include <math.h>

int main()
{
    const double value1_39 = 1.39;
    const long long m_100000 = 100000;
    const long long m_10000 = 10000;
    const double m_10000double = 10000;
    const long long longLongResult_1 = value1_39 * m_100000;
    const double doubleResult_1 = value1_39 * m_100000;
    const long long longLongResult_2 = value1_39 * m_10000;
    const double doubleResult_2 = value1_39 * m_10000;
    const long long longLongResult_3 = value1_39 * m_10000double;
    const double doubleResult_3 = value1_39 * m_10000double;
    std::cout << std::setprecision(6) << value1_39 << '\n';
    std::cout << std::setprecision(6) << longLongResult_1 << '\n';
    std::cout << std::setprecision(6) << doubleResult_1 << '\n';
    std::cout << std::setprecision(6) << longLongResult_2 << '\n';
    std::cout << std::setprecision(6) << doubleResult_2 << '\n';
    std::cout << std::setprecision(6) << longLongResult_3 << '\n';
    std::cout << std::setprecision(6) << doubleResult_3 << '\n';
    return 0;
}
Result seen in the debugger:
Variable Value
value1_39 1.3899999999999999
m_100000 100000
m_10000 10000
m_10000double 10000
longLongResult_1 139000
doubleResult_1 139000
longLongResult_2 13899
doubleResult_2 13899.999999999998
longLongResult_3 13899
doubleResult_3 13899.999999999998
Result seen in cout:
1.39
139000
139000
13899
13900
13899
13900
I know that the problem is in the nature of how floating-point values are stored: the computer keeps the data as fractions in base 2.
My question is: how can I get 1.39 * 10000 to come out as 13900? (I do get exactly 139000 when multiplying the same value by 100000.) Is there any trick that can help achieve this?
I have some ideas in mind but I am not sure whether they are good enough:
1) parse the string to get the digits to the left and right of the decimal point;
2) multiply the number by 100 and divide by 100 when the calculation is done (see the sketch below).
Each of these solutions has its drawbacks, so I am wondering whether there is a nicer trick.
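For illustration, a minimal sketch of idea 2, keeping the value scaled by 100 in integer arithmetic (the variable names are made up):
#include <cstdint>
#include <iostream>

int main() {
    // 1.39 is stored as 139 "hundredths"; all arithmetic stays exact
    const std::int64_t value_scaled = 139;                    // 1.39 * 100
    const std::int64_t result_scaled = value_scaled * 10000;  // 1390000
    std::cout << result_scaled / 100 << "\n";                 // 13900, exactly
    return 0;
}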
As the comments already said, no, there is no general solution. This problem is due to the nature of floating-point numbers being stored in base 2 (as you already said). The floating-point types are defined in IEEE 754. Any value that is not representable as a finite sum of powers of two can't be stored exactly in base 2.
To be more specific
You CAN store:
1.25 (2^0 + 2^-2)
0.75 (2^-1 + 2^-2)
because there is an exact representation.
You CAN'T store:
1.1
1.4
because these produce an infinitely repeating fraction in the base-2 system. You can try to round, or use an arbitrary-precision floating-point library (though even those have their limits in memory and speed) with much greater precision than float, and then cast back to float after the multiplication.
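A small demonstration of the difference, printing both kinds of value at high precision:
#include <iomanip>
#include <iostream>

int main() {
    std::cout << std::setprecision(20)
              << 1.25 << "\n"  // 1.25: exactly representable in base 2
              << 1.1 << "\n";  // 1.1000000000000000888...: nearest double to 1.1
    return 0;
}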
There are also a lot of other related problems when it comes to floating point. You will find that the result of 10^20 + 2 is just 10^20, because you have a fixed number of significant digits (6-7 digits for float and 15-16 digits for double). When you calculate with numbers that differ hugely in magnitude, the smaller ones simply "disappear".
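A one-line check of this absorption effect:
#include <iostream>

int main() {
    double big = 1e20;
    std::cout << (big + 2.0 == big) << "\n"; // prints 1: the 2 disappears
    return 0;
}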
Question: Why does multiplying 1.39 by 10^5 give exactly 139000, while multiplying by 10^4 does not give exactly 13900?
This could be because of the order of magnitude. 10000 has 5 digits and 1.39 has 3 digits (a distance of 7, just within reach), so both could be near enough to "show" the problem. When it comes to 100000 you have 6 digits, one more order of magnitude away from 1.39 (a distance of 8), so one of the trailing digits gets cut off and you get a more "natural" result. (This is just one possible reason; the compiler, OS and other factors might play a role.)
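Whatever the exact cause, printing both products at full precision reproduces the discrepancy from the question:
#include <iomanip>
#include <iostream>

int main() {
    std::cout << std::setprecision(17)
              << 1.39 * 100000.0 << "\n"  // 139000: the product rounds to exactly 139000
              << 1.39 * 10000.0 << "\n";  // 13899.999999999998
    return 0;
}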
I was writing a little function to calculate the binomial coefficient using the tgamma function provided by C++. tgamma returns floating-point values, but I wanted to return an integer. Please take a look at this example program comparing three ways of converting the float back to an int:
#include <iostream>
#include <cmath>

int BinCoeffnear(int n,int k){
    return std::nearbyint( std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)) );
}
int BinCoeffcast(int n,int k){
    return static_cast<int>( std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)) );
}
int BinCoeff(int n,int k){
    // note: the C-style cast binds only to std::tgamma(n+1); the division is
    // still done in double and truncated on return
    return (int) std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1));
}

int main()
{
    int n = 7;
    int k = 2;
    std::cout << "Correct: " << std::tgamma(7+1) / (std::tgamma(2+1)*std::tgamma(7-2+1)); //returns 21
    std::cout << " BinCoeff: " << BinCoeff(n,k); //returns 20
    std::cout << " StaticCast: " << BinCoeffcast(n,k); //returns 20
    std::cout << " nearby int: " << BinCoeffnear(n,k); //returns 21
    return 0;
}
Why is it that, even though the calculation returns a floating-point value equal to 21, 'normal' conversion fails and only nearbyint returns the correct value? What is the nicest way to implement this?
EDIT: according to the C++ documentation here, tgamma(int) returns a double.
From this std::tgamma reference:
If arg is a natural number, std::tgamma(arg) is the factorial of arg-1. Many implementations calculate the exact integer-domain factorial if the argument is a sufficiently small integer.
It seems that the compiler you're using is doing that, calculating the factorial of 7 for the expression std::tgamma(7+1).
The result might differ between compilers, and also between optimization levels. As demonstrated by Jonas there is a big difference between optimized and unoptimized builds.
The remark by @nos is on point. Note that the first line
std::cout << "Correct: " <<
std::tgamma(7+1) / (std::tgamma(2+1)*std::tgamma(7-2+1));
prints a double value and does not perform a floating-point-to-integer conversion.
The result of your calculation in floating point is indeed less than 21, yet this double precision value is printed by cout as 21.
On my machine (x86_64, gnu libc, g++ 4.8, optimization level 0) setting cout.precision(18) makes the results explicit.
Correct: 20.9999999999999964 BinCoeff: 20 StaticCast: 20 nearby int: 21
In cases like this it is practical to replace integer operations with floating-point operations, but one has to keep in mind that the result must be an integer. The right tool here is std::round.
The problem with std::nearbyint is that, depending on the rounding mode, it may produce different results.
std::fesetround(FE_DOWNWARD); // needs <cfenv>
std::cout << " nearby int: " << BinCoeffnear(n,k);
would return 20.
So with std::round the BinCoeff function might look like
int BinCoeffRound(int n,int k){
    return static_cast<int>(
        std::round(
            std::tgamma(n+1) /
            (std::tgamma(k+1)*std::tgamma(n-k+1))
        ));
}
Floating-point numbers have rounding errors associated with them. Here is a good article on the subject: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
In your case the floating-point number holds a value very close but less than 21. Rules for implicit floating–integral conversions say:
The fractional part is truncated, that is, the fractional part is
discarded.
Whereas std::nearbyint:
Rounds the floating-point argument arg to an integer value in floating-point format, using the current rounding mode.
In this case the floating-point number will be exactly 21 and the following implicit conversion would return 21.
The first cout outputs 21 because of the rounding that cout does by default. See std::setprecision.
Here's a live example.
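As a small sketch of that default rounding, using a value close to the one computed above:
#include <iostream>

int main() {
    double v = 20.9999999999999964; // roughly what the tgamma expression yields
    std::cout << v << "\n";         // prints 21: default is 6 significant digits
    std::cout.precision(18);
    std::cout << v << "\n";         // prints 20.9999999999999964
    return 0;
}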
What is the nicest way to implement this?
Use the exact integer factorial function that takes and returns unsigned int instead of tgamma.
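For instance, a sketch of such an exact integer implementation (the name binCoeff is made up; it uses the multiplicative formula, where each intermediate division is exact):
#include <iostream>

unsigned int binCoeff(unsigned int n, unsigned int k) {
    if (k > n - k) k = n - k;              // C(n,k) == C(n,n-k); use the smaller k
    unsigned long long result = 1;
    for (unsigned int i = 1; i <= k; ++i)
        result = result * (n - k + i) / i; // the division here is always exact
    return static_cast<unsigned int>(result);
}

int main() {
    std::cout << binCoeff(7, 2) << "\n"; // 21
    return 0;
}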
The problem is in the handling of floats. A float can't always store a value like 2 exactly as 2; it may hold something like 1.99999 instead. Converting to int then drops the decimal part. So instead of converting to int immediately, first round it up by calling the ceil function, which is declared in <cmath> (or math.h).
This code will return 21 in all three cases:
#include <iostream>
#include <cmath>

int BinCoeffnear(int n,int k){
    return std::nearbyint( std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)) );
}
int BinCoeffcast(int n,int k){
    return static_cast<int>( ceil(std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1))) );
}
int BinCoeff(int n,int k){
    return (int) ceil(std::tgamma(n+1) / (std::tgamma(k+1)*std::tgamma(n-k+1)));
}

int main()
{
    int n = 7;
    int k = 2;
    std::cout << "Correct: " << (std::tgamma(7+1) / (std::tgamma(2+1)*std::tgamma(7-2+1))); //returns 21
    std::cout << " BinCoeff: " << BinCoeff(n,k); //now returns 21
    std::cout << " StaticCast: " << BinCoeffcast(n,k); //now returns 21
    std::cout << " nearby int: " << BinCoeffnear(n,k); //returns 21
    std::cout << "\n" << (int)(2.9995) << "\n"; //prints 2: plain truncation
}
Can anyone tell me why the two calculations below (see the output comments), which seem to be identical, produce two different outputs? I know the difference isn't that great, but I am using these values to draw lines with OpenGL and the difference is noticeable.
#include <iostream>
#include <cmath>

int main()
{
    int ypos=400;
    /// Output: 410.
    std::cout << 400+(sin((90*3.14159)/180)*10) << std::endl;
    ypos=ypos+(sin((90*3.14159)/180)*10);
    /// Output: 409.
    std::cout << ypos << std::endl;
    return 0;
}
This is outputting a floating point number
std::cout << 400+(sin((90*3.14159)/180)*10) << std::endl;
But this is outputting an integer, so the value will have been truncated:
std::cout << ypos << std::endl;
The real answer is somewhere around 409.9999999.
This outputs a double, which cout rounds to 410, because the whole expression stays in floating point:
std::cout << 400+(sin((90*3.14159)/180)*10) << std::endl;
Since ypos is declared as an int, the double value is truncated to 409 (which is the defined behavior when converting from double to int):
ypos=ypos+(sin((90*3.14159)/180)*10);
/// Output: 409.
std::cout << ypos << std::endl;
Note that you could also increase the accuracy by using a better constant for PI:
const double PI = 3.141592653589793238463;
std::cout << 400+(sin((90*PI)/180)*10) << std::endl;
but I would still store the result in a double instead of an int to avoid truncating. If you need an integer result then I would round first:
ypos += round(sin((90*PI)/180)*10);
This is an instance of GCC's most common non-bug.
That's how floating point behaves. Because of rounding error you don't get an exact result; it is extremely close to 410, for example 409.99999923. If you print it as a floating-point value, by default C++ rounds to 6 significant figures and thus shows you 410. The second time, you assign it to an integer, and in that case C++ doesn't perform rounding but truncation. This is why you get 409.
To keep your code stable so your OpenGL output behaves nicely, you should do all (or as much as possible) of your calculations in doubles so that truncation effects do not accumulate, and at the end (where I assume you need an integer pixel number) either always round to generate an integer value or always truncate, but do not mix the two.
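A minimal sketch of that pattern, with made-up names, based on the question's calculation:
#include <cmath>

const double PI = 3.141592653589793238463;

// accumulate the position in double; convert to an integer pixel only at the end
int finalPixelY(double ypos) {
    ypos += std::sin((90.0 * PI) / 180.0) * 10.0; // stays in double, no truncation
    return static_cast<int>(std::lround(ypos));   // round once, at the very end
}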
Out of nowhere I get quite a big result for this function... It should be very simple, but I can't see it now.
double prob_calculator_t::pimpl_t::B_full_term() const
{
    double result = 0.0;
    for (uint32_t j=0, j_end=U; j<j_end; j++)
    {
        uint32_t inhabited_columns = doc->row_sums[j];
        // DEBUG
        cout << "inhabited_columns: " << inhabited_columns << endl;
        cout << "log_of_sum[j]: " << log_of_sum[j] << endl;
        cout << "sum_of_log[j]: " << sum_of_log[j] << endl;
        // end DEBUG
        result += ( -inhabited_columns * log( log_of_sum[j] ) + sum_of_log[ j ] );
        cout << "result: " << result << endl;
    }
    return result;
}
and here is the trace:
inhabited_columns: 1
log_of_sum[j]: 110.56
sum_of_log[j]: -2.81341
result: 2.02102e+10
inhabited_columns: 42
log_of_sum[j]: 110.56
sum_of_log[j]: -143.064
result: 4.04204e+10
Thanks for the help!
inhabited_columns is unsigned and I see a unary - just before it: -inhabited_columns.
(Note that unary - has a really high operator precedence; higher than * etc).
That is where your problem is! To quote Mike Seymour's answer:
When you negate it, the result is still unsigned; the value is reduced modulo 2^32 to give a large positive value.
One fix would be to write
-(inhabited_columns * log(log_of_sum[j]))
as then the negation will be carried out in floating point.
inhabited_columns is an unsigned type. When you negate it, the result is still unsigned; the value is reduced modulo 2^32 to give a large positive value.
You should change it to a sufficiently large signed type (maybe int32_t, if you're not going to have more than a couple of billion columns), or perhaps double since you're about to use it in double-precision arithmetic.
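For example, a sketch of the second suggestion applied to the term from the question (the helper's name and signature are made up):
#include <cmath>
#include <cstdint>

// convert to double before negating, so the minus sign is applied in floating point
double bTerm(std::uint32_t inhabited_columns, double log_sum_j, double sum_log_j) {
    return -static_cast<double>(inhabited_columns) * std::log(log_sum_j) + sum_log_j;
}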