float overflow?

The following code always seems to produce the wrong result. I have tested it with gcc and with Visual Studio on Windows. Is it because of float overflow or something else? Thanks in advance :)
#include <stdio.h>
#define N 51200000
int main()
{
float f = 0.0f;
for(int i = 0; i < N; i++)
f += 1.0f;
fprintf(stdout, "%f\n", f);
return 0;
}

float has only 24 bits of significand precision (23 of them stored explicitly). 51200000 requires 26. Simply put, you do not have the precision required for a correct answer.

For more information on the precision of data types in C, please refer to this.
Your code is expected to misbehave once you exceed the available precision.

Unreliable things to do with floating-point arithmetic include adding two numbers that are very different in magnitude, and subtracting two numbers that are similar in magnitude. The first is what you are doing here: 1 is far smaller than 51200000. The CPU aligns the two operands so that they share the same exponent; when the other operand is large, that shifts the actual value (1) off the end of the available precision, so partway through the calculation, adding one has the same effect as adding zero.

Your problem is the unit of least precision (ULP). In short: a big float value cannot be incremented by a small value, because the result is rounded to the nearest valid float. While 1.0 is enough to increment small values, the minimal increment for 16777216 seems to be 2.0 (checked with Java's Math.ulp, but the same should hold for C++).
Boost has some functions for this.

A float has only about 7 decimal digits of precision. Adding 1 to a float that has reached 2^24 gives a wrong result. If you use double instead of float, you will get the correct result.

Whilst editing the code in your question, I came across an unblocked for loop:
#include <stdio.h>
#define N 51200000
int main() {
float f = 0.0f;
for(int i = 0; i < N; i++) {
f += 1.0f;
fprintf(stdout, "%f\n", f);
}
return 0;
}

Related

Casting float to int in C++

int uniquePaths(int m, int n) {
int num = m+n-2;
int den=1;
double ans = 1;
while(den<=m-1) {
ans = ans*(num--)/(den++);
}
cout<<ans;
return (int)ans;
}
The expected answer for m=53, n=4 as input to the above piece of code is 26235 but the code returns 26234. However, the stdout shows 26235.
Could you please help me understand this behavior?

Due to floating-point rounding, your code computes ans to be 26,234.999999999985448084771633148193359375. When it is printed with cout<<ans, the default formatting does not show the full value and rounds it to “26235”. However, when the actual value is converted to int, the result is 26,234.
After setting num to m+n-2, your code is computing num! / ((m-1)!(num-m+1)!), which of course equals num! / ((num-m+1)!(m-1)!). Thus, you can use either m-1 or num-m+1 as the limit. So you can change the while line to these two lines:
int limit = m-1 < num-m+1 ? m-1 : num-m+1;
while(den<=limit) {
and then your code will run to the lower limit, which will avoid dividing ans by factors that are not yet in it. All results will be exact integer results, with no rounding errors, unless you try to calculate a result that exceeds the range of your double format where it is able to represent all integers (up to 2^53 in the ubiquitous IEEE-754 binary64 format used for double).

pow() function gives an error [duplicate]

I recently wrote this block of code:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = pow(sections, 5- t -1);
cout << i << endl;
}
And the result is wrong:
9999
1000
99
10
1
If I use just this code:
for(int t = 0; t < 5; t++){
cout << pow(sections,5-t-1) << endl;
}
The problem doesn't occur anymore:
10000
1000
100
10
1
Can anyone give me an explanation? Thank you very much!

Due to the representation of floating-point values, pow(10.0, 4) could be 9999.9999999 or something like that. When you assign that to an integer, it gets truncated.
EDIT: In the case of cout << pow(10.0, 4); it looks like the output is rounded, but I don't have any supporting document right now confirming that.
EDIT 2: The comment made by BoBTFish and this question confirm that when pow(10.0, 4) is used directly in cout, the output gets rounded.

When used with fractional exponents, pow(x,y) is commonly evaluated as exp(log(x)*y); such a formula would be mathematically correct if evaluated with infinite precision, but may in practice result in rounding errors. As others have noted, a value of 9999.999999999 when cast to an integer will yield 9999. Some languages and libraries use such a formulation all the time when using an exponentiation operator with a floating-point exponent; others try to identify when the exponent is an integer and use iterated multiplication when appropriate. Looking up documentation for the pow function, it appears that it's supposed to work when x is negative and y has no fractional part (when x is negative and y is even, the result should be pow(-x,y); when y is odd, the result should be -pow(-x,y)). It would seem logical that when y has no fractional part, a library which is going to go through the trouble of dealing with a negative x value should use iterated multiplication, but I don't know of any spec dictating that it must.
In any case, if you are trying to raise an integer to a power, it is almost certainly best to use integer maths for the computation or, if the integer to be raised is a constant or will always be small, simply use a lookup table (raising numbers from 0 to 15 by any power that would fit in a 64-bit integer would require only a 4,096-item table).

From Here
Looking at the signature of pow(), double pow (double base, double exponent);, we know the parameters and return value are all of type double. But the variables num, i and res are all of type int in the code above, and converting int to double or double to int may cause precision loss. For example (maybe not rigorous), the floating-point unit (FPU) calculates pow(10, 4)=9999.99999999, and then int(9999.9999999)=9999 after type conversion in C++.
How to solve it?
Solution 1
Change the code:
const int num = 10;
for(int i = 0; i < 5; ++i){
double res = pow(num, i);
cout << res << endl;
}
Solution 2
Switch to a floating-point unit (FPU) that computes double values with higher precision. For example, use SSE on an x86 CPU under Windows. In Code::Blocks 13.12, follow these steps: Settings -> Compiler settings -> GNU GCC Compiler -> Other options, and add
-mfpmath=sse -msse3

What happens is that the pow function returns a double, so
when you do this
int i = pow(sections, 5- t -1);
the decimal part .99999 is cut off and you get 9999.
Printing it directly or comparing it with 10000 is not a problem, because the value is rounded off in a sense.

If the code in your first example is the exact code you're running, then you have a buggy library. Regardless of whether you're picking up std::pow or C's pow which takes doubles, even if the double version is chosen, 10 is exactly representable as a double. As such the exponentiation is exactly representable as a double. No rounding or truncation or anything like that should occur.
With g++ 4.5 I couldn't reproduce your (strange) behavior even using -ffast-math and -O3.
Now what I suspect is happening is that sections is not being assigned the literal 10 directly but instead is being read or computed internally such that its value is something like 9.9999999999999, which when raised to the fourth power generates a number like 9999.9999999. This is then truncated to the integer 9999 which is displayed.
Depending on your needs you may want to round either the source number or the final number prior to assignment into an int. For example: int i = pow(sections, 5- t -1) + 0.5; // Add 0.5 and truncate to round to nearest.

There must be some broken pow function in the global namespace. Then std::pow is "automatically" used instead in your second example because of ADL.
Either that or t is actually a floating-point quantity in your first example, and you're running into rounding errors.

You're assigning the result to an int. That coerces it, truncating the number.
This should work fine:
for(int t= 0; t < 5; t++){
double i = pow(sections, 5- t -1);
cout << i << endl;
}

What happens is that your answer is actually 99.9999 and not exactly 100. This is because pow returns a double. So, you can fix this by using i = ceil(pow()).
Your code should be:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = ceil(pow(sections, 5- t -1));
cout << i << endl;
}

Using data type other than int as index value in C++ for loop

I am accepting input from the user as a float value.
This value can go up to 10 raised to the power 18.
The next step involves finding all the divisors of this number, for which I am doing the following:
for(i=2; i<=n/2 ; i++)
{
if(n%i==0)
v.push_back(i);
}
Here, n is the number entered by the user.
The problem is that n is a float, and using it in the for loop limits the index to about 10 raised to the power 9.
Hence, is there any way to use a data type other than int to handle values in the range of 10 raised to the power 18?

You can use an unsigned long long, which is 2^64 or roughly 10^19.
This assumes that your compiler supports 64-bit integers.

The question has been answered (use long long int), but wanted to point out that floats are called "floating point" for a reason. They incorporate an exponent, basically the position of the decimal point, which determines the precision of the mantissa. This conveniently allows you to both represent small numbers with high precision and large numbers with low precision, but not both at the same time.
For more details: http://en.wikipedia.org/wiki/IEEE_754-2008
Try this:
#include <stdio.h>
int main(void)
{
float i = 16777217.0f; /* not representable as a float; stored as 16777216 */
printf("i = %f\n", i);
i++;
printf("i+1 = %f\n", i);
return 0;
}
With 32-bit floats this prints:
i = 16777216.000000
i+1 = 16777216.000000
So question of the day: what do you think will happen if you have a loop like this?
for(float f = 0.0f; f < 20000000; ++f)
{
// do stuff
}

Sure, you can use other data types for a loop; use any of the types mentioned here.

I think a long double should be sufficient.

Why does this double to int conversion not work?

I've been thoroughly searching for a proper explanation of why this is happening, but still don't really understand, so I apologize if this is a repost.
#include <iostream>
int main()
{
double x = 4.10;
double j = x * 100;
int k = (int) j;
std::cout << k;
}
Output: 409
I can't seem to replicate this behavior with any other number. That is, replace 4.10 with any other number in that form and the output is correct.
There must be some sort of low level conversion stuff I'm not understanding.
Thanks!

4.1 cannot be exactly represented by a double; it gets approximated by something ever so slightly smaller:
double x = 4.10;
printf("%.16f\n", x); // Displays 4.0999999999999996
So j will be something ever so slightly smaller than 410 (i.e. 409.99...). Casting to int discards the fractional part, so you get 409.
(If you want another number that exhibits similar behaviour, you could try 8.2, or 16.4, or 32.8... see the pattern?)
Obligatory link: What Every Computer Scientist Should Know About Floating-Point Arithmetic.

The fix
int k = (int)(j+(j<0?-0.5:0.5));
The logic
You're experiencing a problem with number bases.
Although 4.10 is written as a decimal in the source, after compilation it is stored as a binary floating-point number. The fraction .10 doesn't convert exactly into binary, so you end up with 4.099999....
Casting 409.999... to int just drops the fractional digits. If you add 0.5 before casting to int, the value is effectively rounded to the nearest integer, here 410 (409.49 would become 409.99, which casts to 409).

Try this.
#include <iostream>
#include <cmath>
int main()
{
double x = 4.10;
double j = x * 100;
std::cout << std::trunc(j) << '\n'; // truncates toward zero: 409
std::cout << std::round(j) << '\n'; // rounds to nearest: 410
}
