Why does this double to int conversion not work? - c++

I've been thoroughly searching for a proper explanation of why this is happening, but still don't really understand, so I apologize if this is a repost.
#include <iostream>
int main()
{
double x = 4.10;
double j = x * 100;
int k = (int) j;
std::cout << k;
}
Output: 409
I can't seem to replicate this behavior with any other number. That is, replace 4.10 with any other number in that form and the output is correct.
There must be some sort of low level conversion stuff I'm not understanding.
Thanks!

4.1 cannot be exactly represented by a double; it gets approximated by something ever so slightly smaller:
double x = 4.10;
printf("%.16f\n", x); // Displays 4.0999999999999996
So j will be something ever so slightly smaller than 410 (i.e. 409.99...). Casting to int discards the fractional part, so you get 409.
(If you want another number that exhibits similar behaviour, you could try 8.2, or 16.4, or 32.8... see the pattern?)
Obligatory link: What Every Computer Scientist Should Know About Floating-Point Arithmetic.

The fix
int k = (int)(j+(j<0?-0.5:0.5));
The logic
You're experiencing a problem with number bases.
Although 4.10 is a decimal number on-screen, after compilation it is stored as a binary floating point number. .10 doesn't convert exactly into binary, so you end up with 4.099999....
Casting 409.999... to int just drops the fractional digits. If you add 0.5 before casting to int, it effectively rounds to the nearest integer, here 410 (whereas 409.49 would go to 409.99, which casts to 409).

Try this.
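#include <cmath>
#include <iostream>
int main()
{
double x = 4.10;
double j = x * 100;
// Apply trunc/round to the double itself; casting to int first would already discard the fraction.
std::cout << std::trunc(j) << '\n'; // drops the fractional part
std::cout << std::round(j) << '\n'; // rounds to the nearest whole number
}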

Infinite loop of zero in table

This code divides a circle into X parts and displays a table of the radian value of each part,
but
I get an infinite loop of 0s when I use a value greater than 6.
With a value under 6 I get '0 1 2 3 4 5 6'.
It seems that the displayed values are not floats either.
I have the same code using degrees and it works fine.
#include <iostream>
using namespace std;
#define pi 3.14159265359
#define pi2 6.28318530718
int nbObjets = 0;
void objetsPositionRadian(int tab[], int nbObjets);
int main(){
int tabRadian[] = {0};
std::cout << "Nombre d'objets ? ";
std::cin >> nbObjets;
objetsPositionRadian(tabRadian, nbObjets);
return 0;
}
void objetsPositionRadian(int tab[], int nbObjets){
float radian = (360/nbObjets) * (pi/180);
for (int i = 0; i < pi2; i+=radian){
std::cout << i << " ";
}
std::cout << endl;
}
The obvious move here is to change i in your loop to be a double (or float) instead of an int.
Along with that, when computing radian, 360/nbObjets does integer division, so if nbObjets is > 360, it'll give a result of 0. Changing 360 to 360.0 fixes that problem.
But that leaves another problem: depending on the vagaries of floating point math, if you ask for a lot of objects, there's a good chance you'll end up computing positions for one more object than you asked for (and conceivably even more than that if you asked for a really large number of objects).
This problem arises from cumulative errors as you add radian to i in the loop. Rather than doing that, you almost certainly want something on this order:
for (int i=0; i<nbObjets; i++)
cout << i * radian;
This way, you always get exactly the number of objects you asked for, and any possible errors in the value don't accumulate from one iteration to the next.
You are doing two dangerous things in the following two lines:
float radian = (360/nbObjets) * (pi/180);
for (int i = 0; i < pi2; i+=radian)
First: nbObjets is defined as an integer. As a result, 360/nbObjets will be calculated as an integer (e.g. for 7, the result will be 51, not a floating point number).
Next, you define i as an integer. When you add a number smaller than 1 to it, the result is truncated back to an integer, so i always stays the same.
Therefore I advise you to use floating point numbers where needed, as in this proposal:
// First cast 360 to a floating point type, in order to enforce floating point arithmetic.
double radian = ((double)360/nbObjets) * (pi/180);
for (double i = 0; i < pi2; i+=radian)
This should work better.

pow() function gives an error [duplicate]

Recently I wrote a block of code:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = pow(sections, 5- t -1);
cout << i << endl;
}
And the result is wrong:
9999
1000
99
10
1
If I use just this code:
for(int t = 0; t < 5; t++){
cout << pow(sections,5-t-1) << endl;
}
The problem doesn't occur anymore:
10000
1000
100
10
1
Can anyone give me an explanation? Thank you very much!
Due to the representation of floating point values, pow(10.0, 4) could be 9999.9999999 or something like that. When you assign that to an integer, it gets truncated.
EDIT: In the case of cout << pow(10.0, 4); it looks like the output is rounded, but I don't have any supporting document right now confirming that.
EDIT 2: The comment made by BoBTFish and this question confirm that when pow(10.0, 4) is passed directly to cout, the output is rounded.
When used with fractional exponents, pow(x,y) is commonly evaluated as exp(log(x)*y); such a formula would be mathematically correct if evaluated with infinite precision, but may in practice result in rounding errors. As others have noted, a value of 9999.999999999 when cast to an integer will yield 9999. Some languages and libraries use such a formulation all the time when using an exponentiation operator with a floating-point exponent; others try to identify when the exponent is an integer and use iterated multiplication when appropriate. Looking up documentation for the pow function, it appears that it's supposed to work when x is negative and y has no fractional part (when x is negative and y is even, the result should be pow(-x,y); when y is odd, the result should be -pow(-x,y)). It would seem logical that when y has no fractional part, a library which is going to go through the trouble of dealing with a negative x value should use iterated multiplication, but I don't know of any spec dictating that it must.
In any case, if you are trying to raise an integer to a power, it is almost certainly best to use integer maths for the computation or, if the integer to be raised is a constant or will always be small, simply use a lookup table (raising numbers from 0 to 15 by any power that would fit in a 64-bit integer would require only a 4,096-item table).
From Here
Looking at the pow() function: double pow (double base, double exponent); we know the parameters and return value are all of type double. But the variables num, i and res are all of type int in the code above, and converting int to double or double to int may cause precision loss. For example (maybe not rigorous), the floating point unit (FPU) calculates pow(10, 4)=9999.99999999, and then int(9999.9999999)=9999 after the type conversion in C++.
How to solve it?
Solution1
Change the code:
const int num = 10;
for(int i = 0; i < 5; ++i){
double res = pow(num, i);
cout << res << endl;
}
Solution2
Switch to a floating point unit (FPU) mode that computes double values more precisely. For example, use SSE on an x86 CPU under Windows. In Code::Blocks 13.12, follow these steps: Settings -> Compiler settings -> GNU GCC Compiler -> Other options, and add
-mfpmath=sse -msse3
What happens is that the pow function returns a double, so
when you do this
int i = pow(sections, 5- t -1);
the decimal part .99999 is cut off and you get 9999.
Printing it directly or comparing it with 10000 is not a problem because the value is rounded off, in a sense.
If the code in your first example is the exact code you're running, then you have a buggy library. Regardless of whether you're picking up std::pow or C's pow which takes doubles, even if the double version is chosen, 10 is exactly representable as a double. As such the exponentiation is exactly representable as a double. No rounding or truncation or anything like that should occur.
With g++ 4.5 I couldn't reproduce your (strange) behavior even using -ffast-math and -O3.
Now what I suspect is happening is that sections is not being assigned the literal 10 directly but instead is being read or computed internally such that its value is something like 9.9999999999999, which when raised to the fourth power generates a number like 9999.9999999. This is then truncated to the integer 9999 which is displayed.
Depending on your needs you may want to round either the source number or the final number prior to assignment into an int. For example: int i = pow(sections, 5- t -1) + 0.5; // Add 0.5 and truncate to round to nearest.
There must be some broken pow function in the global namespace. Then std::pow is "automatically" used instead in your second example because of ADL.
Either that or t is actually a floating-point quantity in your first example, and you're running into rounding errors.
You're assigning the result to an int. That coerces it, truncating the number.
This should work fine:
for(int t= 0; t < 5; t++){
double i = pow(sections, 5- t -1);
cout << i << endl;
}
What happens is that your answer is actually 99.9999 and not exactly 100. This is because pow returns a double. So, you can fix this by using i = ceil(pow()).
Your code should be:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = ceil(pow(sections, 5- t -1));
cout << i << endl;
}

precision error in nth root of a number in C++

I know from previous threads on this topic that floating point arithmetic causes precision anomalies. Interestingly, though, I observed the same expression behaving in two different ways: printed with cout the output is 4, but if I save the result into a variable, the result is 3!
#include <iostream>
#include <cmath>
using namespace std;
#define mod 1000000007
long long int fastPower(long long int a, int n){
long long int res = 1;
while (n) {
if (n & 1) res = (res * a) % mod;
n >>= 1; a = (a * a) % mod;
}
return res;
}
int main() {
int j = 3;
cout << pow(64, (double)1.0/(double)j) << endl; // Outputs 4
int root = pow(64, (double)1.0/(double)j);
cout << root << endl; // Outputs 3
/* As said by "pts", i tried including this condition in my code but including this line in my code resulted in TimeLimitExceeded(TLE). */
if (fastPower(root+1,j) <= 64) root++;
cout << root << endl; // Outputs 4 :)
return 0;
}
Code output on Ideone.com
Now, how can we avoid such errors in a programming contest?
I do not want to use the round function because I need only the integer part of the root, i.e.
63^(1/6) = 1, 20^(1/2) = 4, etc.
How should I modify my code so that the correct result is stored in the root variable?
pow returns a double. When it is printed with cout, it is rounded for display (thus, 4). When you cast it to int, the fractional part is simply truncated. pow returns something like 4 - eps (because of precision issues), and truncating that gives 3.
Dirty hack useful in programming contests: int root = (int)(pow(...) + 1e-7)
As far as I know, there is no single-line answer in C and C++ for getting the ath root of b rounded down.
As a quick workaround, you can do something like:
int root(int a, int b) {
return floor(pow(b, 1.0 / a) + 0.001);
}
This doesn't work for every value, but by adjusting the constant (0.001), you may get lucky and it would work for the test input.
As a workaround, use pow as you use it already, and if it returns r, then try r - 1, r and r + 1 by multiplying it back (using fast exponentiation of integers). This will work most of the time.
If you need a solution which works 100% of the time, then don't use floating point numbers. Use for example binary search with exponentiation. There are faster algorithms (such as Newton iteration), but if you use them on integers then you need to write custom logic to find the exact solution as soon as they stop converging.
There are two problems with your program:
The pow(int, int) overload is no longer available. To avoid this problem, cast the first parameter to double, float, or long double.
Also, cout rounds the value for display (3.999... is shown as 4), whereas storing it in an int discards the fractional part and keeps only the integer part.

float overflow?

The following code always seems to generate a wrong result. I have tested it with gcc and Visual Studio on Windows. Is it because of float overflow or something else? Thanks in advance :)
#include <stdio.h>
#define N 51200000
int main()
{
float f = 0.0f;
for(int i = 0; i < N; i++)
f += 1.0f;
fprintf(stdout, "%f\n", f);
return 0;
}
float has only 24 bits of significand precision (23 of them stored explicitly). 51200000 requires 26. Simply put, you do not have the precision required for a correct answer.
For more information on the precision of data types in C, please refer to this.
Your code is expected to misbehave once you exceed that precision.
Unreliable things to do with floating point arithmetic include adding two numbers together when they are very different in magnitude, and subtracting them when they are similar in magnitude. The first is what you are doing here: 1 is much smaller than 51200000. The CPU aligns the two numbers so they both have the same exponent; that shifts the smaller value (1) off the end of the available precision when the other operand is large, so by the time you are part way through the calculation, the 1 being added has effectively become zero.
Your problem is the unit of least precision (ULP). In short: big float values cannot be incremented by small values, because the result is rounded to the nearest valid float. While 1.0 is enough to increment small values, the minimal increment for 16777216 seems to be 2.0 (checked with Java's Math.ulp, but the same should hold for C++).
Boost has some functions for this.
The precision of float is only about 7 decimal digits. Adding 1 to a float that has reached 2^24 gives a wrong result. If you use double instead of float, you will get a correct result.
Whilst editing the code in your question, I came across an unblocked for loop:
#include <stdio.h>
#define N 51200000
int main() {
float f = 0.0f;
for(int i = 0; i < N; i++) {
f += 1.0f;
fprintf(stdout, "%f\n", f);
}
return 0;
}