#include <iostream>
using namespace std;
int main()
{
cout.precision(32);
float val = 268433072;
float add = 13.5;
cout << "result =" << (val + add) << endl;
}
I'm compiling the above program with standard g++ main.cc
and running it with ./a.out
The ouput I receive however, is,
result =268433088
Clearly, this is not the correct answer..Why is this happening?
EDIT: This does not occur when using double in place of float
You can reproduce your "float bug" with an even simpler piece of code
#include <iostream>
using namespace std;
int main()
{
cout.precision(32);
float val = 2684330722;
cout << "result =" << val << endl;
}
The output
result =2684330752
As you can see the output does not match the value val was initialized with.
As it has been stated many times, floating-point types have limited precision. Your code sample simply exceeded that precision, so the result got rounded. There's no "bug" here.
Aside from the reference to (1991, PDF) What Every Computer Scientist Should Know About Floating-Point Arithmetic
The short answer is, that because float has limited storage (like the other primitives too) the engineers had to make a choice: which numbers to store with which precision. For the floating point formats they decided to store numbers of small magnitude precisely (some decimal digits), but numbers of large magnitude very imprecisely, in fact starting with +-16,777,217 the representable numbers are becoming so thin that not even all integers are represented which is the thing you noticed.
Related
Using the GSL (GNU Scientific Library), I'm trying to understand why gsl_vector_view_array() returns a slighly modified value after assignment.
In the code below, I declare a vector_view 'qview_test' which is linked to table q_test[0]=0.0 and display its value which is 0.0. Then, I change the value of q_test[0]=1.12348 and expecting the same value for qview_test, but it gets alterated to qview_test=1.1234800000000000341771055900608189.
How do you explain such a result ? How to replicate the result without GSL ?
#include <iostream>
#include <gsl/gsl_blas.h>
using namespace std;
double q_test[1]={0.0};
gsl_vector_view qview_test;
int nb_variable = 1;
int main()
{
qview_test=gsl_vector_view_array(q_test,nb_variable);
cout.precision(35);
cout << "qview before: " << gsl_vector_get(&qview_test.vector,0)<< endl;
// Assign value
q_test[0]=1.12348;
cout << "qview after: " << gsl_vector_get(&qview_test.vector,0) << endl;
return 0;
}
Thanks for any help,
H.Nam
This looks like floating point rounding to me.
Basically any decimal number can only have a finite precision and all numbers in between get rounded to the nearest float.
I am not familiar with gsl so I don't know why it's displaying so many digits.
In other words, to get more precise give your number more bits (128 bit float or something like that) to be represented. This will give you more precision, however you most likely won't need it.
I am trying to get the fixed points for the tent equation. With the given initial conditions, the solution must be 0.6. Everything works fine when I use float for x0, but when I define x0 as a double, the solution changes to 0.59999 in 55th iteration, which causes further changes in the next iteration and so on. Why is there such a difference while choosing the data types?
using namespace std;
#include <iostream>
main()
{
double x0=.6;
for (int i=0;i<100;i++)
{
if(x0<.5)
x0=1.5*x0;
else
x0=1.5*(1-x0);
cout << i << "\t" << x0 << endl;
}
}
I have posted an image of the results. Comparison of solutions - Float and Double
The real value of the 55-th iteration is 0.5999994832150543633275674437754787504673004150390625 when a double is used and 0.60000002384185791015625 for a float (on my system).
The difference between the two is in how precise the numbers are and the rounding is what throws you off.
BTW, neither of the two values is absolutely accurate, they are just close enough approximations and nothing more. With the double being "more precise".
UPDATE
After a few comments back and forth it turned out that there was no need for floating-point arithmetic to be gin with. Integers (slightly modified) do just fine and do not introduce any rounding.
#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;
int main() {
int t;
double n;
cin>>t;
while(t--)
{
cin>>n;
double x;
for(int i=1;i<=10000;i++)
{
x=n*i;
if(x==ceilf(x))
{
cout<<i<<endl;
break;
}
}
}
return 0;
}
For I/P:
3
5
2.98
3.16
O/P:
1
If my code is:
#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;
int main() {
int t;
double n;
cin>>t;
while(t--)
{
cin>>n;
double x;
for(int i=1;i<=10000;i++)
{
x=n*i;
cout<<"";//only this statement is added;
if(x==ceilf(x))
{
cout<<i<<endl;
break;
}
}
}
return 0;
}
For the same input O/P is:
1
50
25
The only extra line added in 2nd code is: cout<<"";
Can anyone please help in finding why there is such a difference in output just because of the cout statement added in the 2nd code?
Well this is a veritable Heisenbug. I've tried to strip your code down to a minimal replicating example, and ended up with this (http://ideone.com/mFgs0S):
#include <iostream>
#include <math.h>
using namespace std;
int main()
{
float n;
cin >> n; // this input is needed to reproduce, but the value doesn't matter
n = 2.98; // overwrite the input value
cout << ""; // comment this out => y = z = 149
float x = n * 50; // 149
float y = ceilf(x); // 150
cout << ""; // comment this out => y = z = 150
float z = ceilf(x); // 149
cout << "x:" << x << " y:" << y << " z:" << z << endl;
}
The behaviour of ceilf appears to depend on the particular sequence of iostream operations that occur around it. Unfortunately I don't have the means to debug in any more detail at the moment, but maybe this will help someone else to figure out what's going on. Regardless, it seems almost certain that it's a bug in gcc-4.9.2 and gcc-5.1. (You can check on ideone that you don't get this behaviour in gcc-4.3.2.)
You're probably getting an issue with floating point representations - which is to say that computers cannot perfectly represent all fractions. So while you see 50, the result is probably something closer to 50.00000000001. This is a pretty common problem you'll run across when dealing with doubles and floats.
A common way to deal with it is to define a very small constant (in mathematical terms this is Epsilon, a number which is simply "small enough")
const double EPSILON = 0.000000001;
And then your comparison will change from
if (x==ceilf(x))
to something like
double difference = fabs(x - ceilf(x));
if (difference < EPSILON)
This will smooth out those tiny inaccuracies in your doubles.
"Comparing for equality
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precisions, so results will differ depending on the details of your environment. If you do a calculation and then compare the results against some expected value it is highly unlikely that you will get exactly the result you intended.
In other words, if you do a calculation and then do this comparison:
if (result == expectedResult)
then it is unlikely that the comparison will be true. If the comparison is true then it is probably unstable – tiny changes in the input values, compiler, or CPU may change the result and make the comparison be false."
From http://www.cygnus-software.com/papers/comparingfloats/Comparing%20floating%20point%20numbers.htm
Hope this answers your question.
Also you had a problem with
if(x==ceilf(x))
ceilf() returns a float value and x you have declared as a double.
Refer to problems in floating point comparison as to why that wont work.
change x to float and the program runs fine,
I made a plain try on my laptop and even online compilers.
g++ (4.9.2-10) gave the desired output (3 outputs), along with online compiler at geeksforgeeks.org. However, ideone, codechef did not gave the right output.
All I can infer is that online compilers name their compiler as "C++(gcc)" and give wrong output. While, geeksforgeeks.org, which names the compiler as "C++" runs perfectly, along with g++ (as tested on Linux).
So, we could arrive at a hypothesis that they use gcc to compile C++ code as a method suggested at this link. :)
Why does a1=72 instead of 73 in this (terrible) snippet of C++ code ?
#include <iostream>
#include <string>
using namespace std;
int main (int argc, char* argv[])
{
double a = 136.73;
unsigned int a1 = (100*(a-(int)a));
cout << (a-(int)a) << endl; // 0.73
cout << 100*(a-(int)a) << endl; // 73
cout << a1 << endl; // 72 !
}
You can execute it at http://codepad.org/HhGwTFhw
If you increase the output precision, you'll see that (a - (int) a) prints 0.7299999999999898.
Therefore, the truncation of this value (which you obtain when you cast it to an int) is indeed 72.
(Updated codepad here.)
This is a common precision issue. The literal 136.73 actually stands for the number
136.729999999999989768184605054557323455810546875
and the result of a-(int)a is not 0.73 (even though that is what is displayed), but rather
0.729999999999989768184605054557323455810546875
When you multiply that by 100, you get
72.9999999999989768184605054557323455810546875
And since converting from double to int cuts off everything after the decimal point, you get 72.
0.73 cannot be represented exactly so it is rounded to a number close to it, which in this example is lower then 0.73. when multiplying by 100, you get 72.[something], which is later trimmed to 72.
more info
It might become clear if you read the following article: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The others have already shown that the result of (a - (int)a) is not exactly 0.73, but a little lower. The article explains why that is so. It is not exactly an easy read, but really something everyone working with FP should read and try to understand, IMO.
here is a quite simple question(I think), is there a STL library method that provides the limit of a variable type (e.g integer) ? I know these limits differ on different computers but there must be a way to get them through a method, right?
Also, would it be really hard to write a method to calculate the limit of a variable type?
I'm just curious! :)
Thanks ;).
Use std::numeric_limits:
// numeric_limits example
// from the page I linked
#include <iostream>
#include <limits>
using namespace std;
int main () {
cout << boolalpha;
cout << "Minimum value for int: " << numeric_limits<int>::min() << endl;
cout << "Maximum value for int: " << numeric_limits<int>::max() << endl;
cout << "int is signed: " << numeric_limits<int>::is_signed << endl;
cout << "Non-sign bits in int: " << numeric_limits<int>::digits << endl;
cout << "int has infinity: " << numeric_limits<int>::has_infinity << endl;
return 0;
}
I see that the 'correct' answer has already been given: Use <limits> and let the magic happen. I happen to find that answer unsatisfying, since the question is:
would it be really hard to write a method to calculate the limit of a variable type?
The answer is : easy for integer types, hard for float types. There are 3 basic types of algorithms you would need to do this. signed, unsigned, and floating point. each has a different algorithm for how you get the min and max, and the actual code involves some bit twiddling, and in the case of floating point, you have to loop unless you have a known integer type that is the same size as the float type.
So, here it is.
Unsigned is easy. the min is when all bits are 0's, the max is when all bits are 1's.
const unsigned type unsigned_type_min = (unsigned type)0;
const unsigned type unsigned_type_max = ~(unsigned type)0;
For signed, the min is when the sign bit is set but all of the other bits are zeros, the max is when all bits except the sign bit are set. with out knowing the size of the type, we don't know where the sign bit is, but we can use some bit tricks to get this to work.
const signed type signed_type_max = (signed type)(unsigned_type_max >> 1);
const signed type signed_type_min = (signed type)(~(signed_type_max));
for floating point, there are 4 limits, although knowning only the positive limits is sufficient, the negative limits are just sign inverted positive limits. There a potentially many ways to represent floating point numbers, but for those that use binary (rather than base 10) floating point, nearly everyone uses IEEE representations.
For IEEE floats, The smallest positive floating point value is when the low bit of the exponent is 1 and all other bits are 0's. The largest negative floating point value is the bitwise inverse of this. However, without an integer type that is known to be the same size as the given floating point type, there isn't any way to do this bit manipulation other than executing a loop. if you have an integer type that you know is the same size as your floating point type, you can do this as a single operation.
const float_type get_float_type_smallest() {
const float_type float_1 = (float_type)1.0;
const float_type float_2 = (float_type)0.5;
union {
byte ab[sizeof(float_type)];
float_type fl;
} u;
for (int ii = 0; ii < 0; ++ii)
u.ab[ii] = ((byte*)&float_1)[ii] ^ ((byte*)&float_2)[ii];
return u.fl;
}
const float_type get_float_type_largest() {
union {
byte ab[sizeof(float_type)];
float_type fl;
} u;
u.fl = get_float_type_smallest();
for (int ii = 0; ii < 0; ++ii)
u.ab[ii] = ~u.ab[ii];
return -u.fl; // Need to re-invert the sign bit.
}
(related to C, but I think this also applies for C++)
You can also try "enquire", which is a script which can re-create limits.h for your compiler. A quote from the projetc's home page:
This is a program that determines many
properties of the C compiler and
machine that it is run on, such as
minimum and maximum [un]signed
char/int/long, many properties of
float/ [long] double, and so on.
As an option it produces the ANSI C
float.h and limits.h files.
As a further option, it even checks
that the compiler reads the header
files correctly.
It is a good test-case for compilers,
since it exercises them with many
limiting values, such as the minimum
and maximum floating-point numbers.
#include <limits>
std::numeric_limits<type>::max() // min() etc