Casting float to int in C++

int uniquePaths(int m, int n) {
int num = m+n-2;
int den=1;
double ans = 1;
while(den<=m-1) {
ans = ans*(num--)/(den++);
}
cout<<ans;
return (int)ans;
}
The expected answer for m=53, n=4 as input to the above piece of code is 26235 but the code returns 26234. However, the stdout shows 26235.
Could you please help me understand this behavior?

Due to floating-point rounding, your code computes ans to be 26,234.999999999985448084771633148193359375. When it is printed with cout<<ans, the default formatting (six significant digits) does not show the full value and rounds it to “26235”. However, converting the actual value to int discards the fractional part (the conversion truncates toward zero), so the result is 26,234.
After setting num to m+n-2, your code is computing num! / ((m-1)!(num-m+1)!), which of course equals num! / ((num-m+1)!(m-1)!). Thus, you can use either m-1 or num-m+1 as the limit. So you can change the while line to these two lines:
int limit = m-1 < num-m+1 ? m-1 : num-m+1;
while(den<=limit) {
and then your code will run only up to the smaller limit. After k iterations, ans equals the binomial coefficient C(m+n-2, k), an integer, and each product ans*num is exactly divisible by den; stopping at the smaller limit keeps these intermediate values small. All results will be exact integer results, with no rounding errors, unless an intermediate product exceeds the range in which your double format can represent all integers (up to 2^53 in the ubiquitous IEEE-754 binary64 format used for double).
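A minimal sketch of the suggested change, assuming the rest of the function stays as in the question (the cout line is dropped so the function only returns the value). With m = 53 and n = 4 the loop now runs three times (55, then 1485, then 26235), so every intermediate value is an exact integer.

```cpp
// Sketch of the fix: iterate only up to the smaller of m-1 and n-1, so every
// intermediate ans is the exact integer C(m+n-2, k) and the products stay small.
int uniquePaths(int m, int n) {
    int num = m + n - 2;
    int den = 1;
    int limit = m - 1 < num - m + 1 ? m - 1 : num - m + 1;  // min(m-1, n-1)
    double ans = 1;
    while (den <= limit) {
        ans = ans * (num--) / (den++);
    }
    return static_cast<int>(ans);  // ans is exact here, so truncation loses nothing
}
```

For example, uniquePaths(53, 4) returns 26235 and uniquePaths(3, 7) returns 28.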

Related

Floating points in C++ (float and double)

I know that we shouldn't use floating-point numbers in loops. But could someone explain to me what happens when we have a loop and we add a small number to a large number until we reach a certain value that allows the loop to terminate?
I guess it might cause potential errors. But apart from that?
What would it look like with single-precision (float) and double-precision (double) floating-point numbers? I guess more rounding errors would appear in the double type. Could someone give me an example (preferably in C++), because I have no idea how to start with it...
I would be very grateful if you could provide me with a hint. Thanks!

In a C++ implementation using IEEE-754 arithmetic and the “single” (binary32) format for float, this code prints “count = 3”:
int count = 0;
for (float f = 0; f < .3f; f += .1f)
++count;
std::cout << "count = " << count << ".\n";
but this code prints “count = 4”:
int count = 0;
for (float f = 0; f < .33f; f += .11f)
++count;
std::cout << "count = " << count << ".\n";
In the first example, the source text .1f is converted to 0.100000001490116119384765625, which is the value representable in float that is closest to .1. The source text .3f is converted to 0.300000011920928955078125, the float value closest to .3. Adding this converted value for .1f to f produces 0.100000001490116119384765625, then 0.20000000298023223876953125, and then 0.300000011920928955078125, at which point f < .3f is false, and the loop stops.
In the second example, .11f is converted to 0.10999999940395355224609375, and .33f is converted to 0.3300000131130218505859375. In this case, adding the converted value of .11f to f produces 0.10999999940395355224609375, then 0.2199999988079071044921875, and then 0.329999983310699462890625. Note that, due to rounding, this result of adding .11f three times is 0.329999983310699462890625, which is less than .33f (0.3300000131130218505859375), so f < .33f is true, and the loop continues for another iteration.
This is similar to adding ⅓ in a two-digit decimal format with a loop bound of three-thirds (which is 1). If we had for (f = 0; f < 1; f += ⅓), the ⅓ in the source text would have to be converted to .33 (two-digit decimal). Then f would be stepped through .33, .66, and .99. The loop would not stop until it reached 1.32. The same rounding issues occur in binary floating-point arithmetic.
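The two loops above can be wrapped in one function to check the counts on your own platform. This is a sketch that assumes IEEE-754 binary32 float and that the compiler evaluates float arithmetic in float precision without extra precision (FLT_EVAL_METHOD == 0, the usual case for x86-64 SSE builds); the helper name countIterations is made up for illustration.

```cpp
// Counts the iterations of: for (float f = 0; f < limit; f += step)
// Assumes IEEE-754 binary32 float with no excess precision (FLT_EVAL_METHOD == 0).
int countIterations(float limit, float step) {
    int count = 0;
    for (float f = 0; f < limit; f += step)
        ++count;
    return count;
}
```

Under those assumptions, countIterations(.3f, .1f) gives 3 and countIterations(.33f, .11f) gives 4, matching the two loops.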
When the amount added in the loop is small relative to the large number, these rounding issues are greater. First, there will be more additions, so there will be more rounding errors, and they may accumulate. Second, since larger numbers require a larger exponent to scale them in the floating-point format, they have less absolute precision than smaller numbers. This means the roundings become larger relative to the small number that is being added, so the rounding errors are larger in magnitude.
Then, even if the loop eventually terminates, the values of f in each iteration may be far from the desired values, due to the accumulated errors. If f is used for calculations inside the loop, the calculations might not be using the desired values and may produce incorrect results.
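To see the accumulation effect, and to compare float with double as asked in the question, a sketch like the following adds 0.1 a million times and measures how far the sum drifts from the exact value 100000. The helper name is hypothetical, and the same no-excess-precision assumption applies. Because double carries far more significant bits than float, its accumulated error is many orders of magnitude smaller, not larger.

```cpp
#include <cmath>

// Adds 0.1 (converted to T) to a T accumulator `times` times and returns the
// absolute difference from the exact mathematical result times / 10.
template <typename T>
double accumulatedError(int times) {
    const T step = static_cast<T>(0.1);
    T sum = 0;
    for (int i = 0; i < times; ++i)
        sum += step;
    return std::fabs(static_cast<double>(sum) - times / 10.0);
}
```

Calling accumulatedError<float>(1000000) and accumulatedError<double>(1000000) shows an error larger than 1 for float but well below 0.001 for double.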

As values increase, the gap between two adjacent floating-point values increases too. At some point, i + 1 rounds back to the same value as i.
Consider this code:
#include <iostream>
int main()
{
float i = 0;
while (i != i + 1) i++;
std::cout << i << std::endl;
return 0;
}
while (i != i + 1) should be an endless loop, but for floating point variables, it is not.
The code above prints 1.67772e+07 (that is, 2^24 = 16777216) on https://godbolt.org/z/7xf8n8
So, for (float f = 0; f < 2e7; f++) is an endless loop.
You can try it with double yourself; the value is much bigger (2^53), so counting up to it one step at a time takes far too long in practice.
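Because the threshold is a power of two, it can be found by doubling instead of counting. This sketch (the function name is hypothetical; it assumes IEEE-754 float and double evaluated without excess precision) returns 16777216 (2^24) for float and 9007199254740992 (2^53) for double.

```cpp
// Returns the smallest power of two p for which p + 1 == p in type T.
// Doubling reaches it in a few dozen steps instead of counting up by 1.
template <typename T>
T firstAbsorbingPowerOfTwo() {
    T p = 1;
    while (p + T(1) != p) {
        p *= 2;
    }
    return p;
}
```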

Output is NaN, how?

I am trying to code a Taylor series, but I am getting 'nan' as output for large values of n (n = 100).
Where am I doing things wrong?
#include<iostream>
#include<cmath>
using namespace std;
int main(){
int n;
double x;
cin >> n;
cin >> x;
long double temp_val = 1;
int sign = 1;
int power = 1;
long long int factorial = 1;
for(int i = 1 ; i < n ; i++){
sign = sign* -1 ;
power = 2*i;
factorial = factorial*(2*i)*(2*i-1);
temp_val += sign*pow(x,power)/factorial;
}
cout<<temp_val;
}

For large n your program has undefined behavior.
You are calculating the factorial of roughly 2n (198! for n = 100) in factorial. For scale, 200! is, according to Wolfram Alpha:
788657867364790503552363213932185062295135977687173263294742533244359449963403342920304284011984623904177212138919638830257642790242637105061926624952829931113462857270763317237396988943922445621451664240254033291864131227428294853277524242407573903240321257405579568660226031904170324062351700858796178922222789623703897374720000000000000000000000000000000000000000000000000
For comparison, the typical largest value that a long long int can hold is
9223372036854775807
(assuming it is 64 bits wide; this is 2^63 - 1)
Clearly you will not be able to fit 200! into that. When you overflow a signed integer variable your program will have undefined behavior. That means that there will be no guarantee how it will behave.
But even if you change the variable type to be unsigned, not much will change. The program won't have undefined behavior anymore, but factorial will not hold the correct value. Instead it will keep wrapping around modulo 2^64 (for a 64-bit type), and once the product contains 64 factors of two, the stored value becomes exactly zero.
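The wrap-around also suggests where the nan in the question comes from. In the question's loop, factorial holds (2i)! after iteration i. A sketch with an unsigned 64-bit type (so the wrap-around is well defined) shows that k! modulo 2^64 first becomes 0 at k = 66, because 66! is the first factorial with 64 factors of two. In the question's loop that would happen at i = 33; dividing a nonzero value by zero then gives an infinity, and the next term adds an infinity of the opposite sign, which produces NaN. The signed version in the question is formally undefined behavior, so this is only what typically happens on two's complement hardware. The function name is hypothetical.

```cpp
#include <cstdint>

// Returns the first k for which k! computed in std::uint64_t (i.e. modulo 2^64)
// is exactly zero. Unsigned arithmetic wraps around without undefined behavior.
int firstZeroFactorialU64() {
    std::uint64_t factorial = 1;
    for (int k = 1; ; ++k) {
        factorial *= static_cast<std::uint64_t>(k);
        if (factorial == 0)
            return k;
    }
}
```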
Even if you change factorial to type double, that is not enough with a typical IEEE-754 double, whose largest finite value is about 1.8e308, while 200! is about 7.9e374. Your platform might have a long double type with a larger range than double that is able to hold this value.
You will have similar problems with pow(x, power) if x is not close to 1.
As mentioned in the answer by idclev463035818, the Taylor series, if evaluated straightforwardly, is numerically very ill-behaved and cannot really be used in this form for large n.

Calculating the Taylor series has a trap that also occurs in other situations: both the numerator and the denominator of the terms grow very fast and overflow easily, but their quotient converges to zero (otherwise adding them up to infinity would not converge to a finite number).
Instead of keeping track of both parts individually, you need to update the result and the increment. I won't provide a full solution. In pseudo-code:
double res = 0;
double delta = x;
int n = 1;
double sign = -1;
while ( ! stop_condition ) {
delta *= (x / n);
res += sign*delta;
++n;
sign *= -1;
}
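The question's series is the Maclaurin series of cos(x), the sum of (-1)^i x^(2i)/(2i)! for i = 0 to n-1. One way to apply the incremental idea to it, offered as a sketch rather than the answerer's solution, is to derive each term from the previous one by multiplying with -x*x/((2i-1)(2i)), so that neither the power nor the factorial is ever formed on its own. The function name cosTaylor is hypothetical.

```cpp
#include <cmath>

// Sums the first n terms of the Maclaurin series of cos(x).
// Each term is updated from the previous one:
//   term_i = term_{i-1} * (-x*x) / ((2i-1) * (2i))
double cosTaylor(double x, int n) {
    double term = 1.0;  // i = 0 term: x^0 / 0!
    double sum = term;
    for (int i = 1; i < n; ++i) {
        term *= -x * x / ((2.0 * i - 1.0) * (2.0 * i));
        sum += term;
    }
    return sum;
}
```

For moderate |x| this stays finite and close to std::cos(x) even with n = 100, because the terms simply shrink towards zero. For large |x| the alternating terms still cancel badly, which is the ill-conditioning mentioned earlier.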

pow() function gives an error [duplicate]

Recently I wrote a block of code:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = pow(sections, 5- t -1);
cout << i << endl;
}
And the result is wrong:
9999
1000
99
10
1
If I use just this code:
for(int t = 0; t < 5; t++){
cout << pow(sections,5-t-1) << endl;
}
The problem doesn't occur anymore:
10000
1000
100
10
1
Can anyone give me an explanation? Thank you very much!

Due to the representation of floating-point values, pow(10.0, 4) could be 9999.9999999 or something like this. When you assign that to an integer, it gets truncated.
EDIT: In the case of cout << pow(10.0, 4); it looks like the output is rounded, but I don't have any supporting document right now confirming that.
EDIT 2: The comment made by BoBTFish and this question confirm that when pow(10.0, 4) is used directly in cout, it is rounded: by default the stream prints only 6 significant digits, so 9999.9999999 is shown as 10000.
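The difference between printing and converting can be checked directly. The sketch below uses 9999.9999999 as a stand-in for an inexact pow result (it is not what any particular library returns); the helper names are hypothetical. Streaming a double uses general notation with 6 significant digits by default, so it shows 10000, while converting to int truncates toward zero and gives 9999.

```cpp
#include <sstream>
#include <string>

// Formats a double the way `std::cout << value` does with default settings:
// general notation with 6 significant digits.
std::string defaultFormat(double value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// What `int i = value;` does for in-range values: truncation toward zero.
int truncateToInt(double value) {
    return static_cast<int>(value);
}
```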

When used with fractional exponents, pow(x,y) is commonly evaluated as exp(log(x)*y); such a formula would be mathematically correct if evaluated with infinite precision, but in practice it may produce rounding errors. As others have noted, a value of 9999.999999999 converted to an integer yields 9999. Some languages and libraries use such a formulation all the time when using an exponentiation operator with a floating-point exponent; others try to identify when the exponent is an integer and use iterated multiplication when appropriate. Looking up documentation for the pow function, it appears that it's supposed to work when x is negative and y has no fractional part (when x is negative and y is even, the result should be pow(-x,y); when y is odd, the result should be -pow(-x,y)). It would seem logical that when y has no fractional part, a library which is going to go through the trouble of dealing with a negative x value should use iterated multiplication, but I don't know of any spec dictating that it must.
In any case, if you are trying to raise an integer to a power, it is almost certainly best to use integer maths for the computation or, if the integer to be raised is a constant or will always be small, simply use a lookup table (raising numbers from 0 to 15 by any power that would fit in a 64-bit integer would require only a 4,096-item table).
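As one way to do the integer maths suggested above, here is a sketch of exponentiation by squaring on std::uint64_t (the name ipow is hypothetical). It is exact whenever the result fits in 64 bits; it does not detect overflow.

```cpp
#include <cstdint>

// Integer exponentiation by squaring. Exact as long as the result fits in
// 64 bits; overflow is not checked in this sketch.
std::uint64_t ipow(std::uint64_t base, unsigned exp) {
    std::uint64_t result = 1;
    while (exp > 0) {
        if (exp & 1u)
            result *= base;  // include this power of base
        base *= base;        // next power: base^2, base^4, ...
        exp >>= 1;
    }
    return result;
}
```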

From Here
Looking at the pow() function, double pow(double base, double exponent);, we know the parameters and return value are all of type double. But the variables num, i and res in the code above are all of type int, and converting between int and double may lose precision. For example (perhaps not rigorously), the floating-point unit (FPU) calculates pow(10, 4) = 9999.99999999, and then int(9999.9999999) = 9999 by type conversion in C++.
How to solve it?
Solution1
Change the code:
const int num = 10;
for(int i = 0; i < 5; ++i){
double res = pow(num, i);
cout << res << endl;
}
Solution2
Use SSE instead of the x87 floating-point unit (FPU) for double calculations, for example on a Windows x86 CPU. In Code::Blocks 13.12, you can do this as follows: Settings -> Compiler settings -> GNU GCC Compiler -> Other options, and add
-mfpmath=sse -msse3

What happens is that the pow function returns a double, so when you do this:
int i = pow(sections, 5- t -1);
the fractional part (.99999) is cut off and you get 9999.
Printing the double directly is not a problem, because the output is rounded.

If the code in your first example is the exact code you're running, then you have a buggy library. Regardless of whether you're picking up std::pow or C's pow, which takes doubles, 10 is exactly representable as a double, and so is 10^4. No rounding or truncation or anything like that should be needed, so a good-quality pow returns exactly 10000 (although the C standard does not formally require pow to be correctly rounded).
With g++ 4.5 I couldn't reproduce your (strange) behavior even using -ffast-math and -O3.
Now what I suspect is happening is that sections is not being assigned the literal 10 directly but instead is being read or computed internally such that its value is something like 9.9999999999999, which when raised to the fourth power generates a number like 9999.9999999. This is then truncated to the integer 9999 which is displayed.
Depending on your needs you may want to round either the source number or the final number prior to assignment into an int. For example: int i = pow(sections, 5- t -1) + 0.5; // Add 0.5 and truncate to round to nearest.
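A variant of the same idea, sketched with std::lround from cmath (the wrapper names are hypothetical): rounding to nearest before the conversion fixes both 9999.999... and 10000.000...1. The "+ 0.5 and truncate" trick works for non-negative results but not for negative ones, because the conversion truncates toward zero: -1000.0 + 0.5 = -999.5 becomes -999.

```cpp
#include <cmath>

// Rounds the double result of std::pow to the nearest integer before the
// conversion, so a result like 9999.9999999 becomes 10000 rather than 9999.
int roundedPow(int base, int exponent) {
    return static_cast<int>(std::lround(std::pow(base, exponent)));
}

// The "+ 0.5 and truncate" trick: fine for non-negative values, but the
// conversion truncates toward zero, so negative values round the wrong way.
int plusHalfThenTruncate(double value) {
    return static_cast<int>(value + 0.5);
}
```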

There must be some broken pow function in the global namespace. Then std::pow is "automatically" used instead in your second example because of ADL.
Either that or t is actually a floating-point quantity in your first example, and you're running into rounding errors.

You're assigning the result to an int. That coerces it, truncating the number.
This should work fine:
for(int t= 0; t < 5; t++){
double i = pow(sections, 5- t -1);
cout << i << endl;
}

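What happens is that your answer is actually 99.9999 and not exactly 100. This is because pow returns a double. So, you can fix this by using i = ceil(pow()). (Note that ceil only helps when the error is below the true value; a result such as 100.0000001 would become 101, so rounding to nearest is the safer choice.)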
Your code should be:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = ceil(pow(sections, 5- t -1));
cout << i << endl;
}

Calculating Probability C++ Bernoulli Trials

The program asks the user for the number of times to flip a coin (n; the number of trials).
A success is considered a heads.
This part works: the program generates a random number, 0 or 1, and a 0 is considered heads, i.e. a success.
Then, the program is supposed to output the expected values of getting x amount of heads. For example if the coin was flipped 4 times, what are the following probabilities using the formula
nCk * p^k * (1-p)^(n-k)
Expected 0 heads with n flips: xxx
Expected 1 heads with n flips: xxx
...
Expected n heads with n flips: xxx
When doing this with "larger" numbers, the results come out as weird values. It happens if 15 or 20 is entered. I have been getting 0's and negative values where xxx should be.
While debugging, I noticed that nCk comes out negative and incorrect for the upper values, and I believe this is the issue. I use this formula for my combination:
double combo = fact(n)/fact(r)/fact(n-r);
Here is the pseudocode for my fact function:
long fact(int x)
{
int e; // local counter
factor = 1;
for (e = x; e != 0; e--)
{
factor = factor * e;
}
return factor;
}
Any thoughts? My guess is my factorial or combo functions are exceeding the max values or something.

You haven't mentioned how factor is declared. I think you are getting integer overflows. I suggest you use double: since you are calculating expected values and probabilities, you shouldn't need to be too concerned about precision.
Try changing your fact function to:
double fact(int x)
{
int e; // local counter
double factor = 1; // double has the range for large factorials
for (e = x; e > 0; e--) // e > 0 also terminates for negative x
{
factor = factor * e;
}
return factor;
}
EDIT:
Also to calculate nCk, you need not calculate factorials 3 times. You can simply calculate this value in the following way.
If k > n/2, set k = n - k. Then:
$$\binom{n}{k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!}$$
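A sketch of that approach, together with the probability formula from the question (the function names are hypothetical). It uses the same symmetry and the multiplicative formula, but interleaves the multiplications and divisions so that after step i the running value is the integer C(n-k+i, i); in double this stays exact as long as the intermediate products stay below 2^53.

```cpp
#include <cmath>

// nCk without full factorials: symmetry C(n,k) = C(n,n-k) plus the
// multiplicative formula; after step i, result == C(n-k+i, i).
double choose(int n, int k) {
    if (k < 0 || k > n) return 0.0;
    if (k > n / 2) k = n - k;
    double result = 1.0;
    for (int i = 1; i <= k; ++i) {
        result = result * (n - k + i) / i;
    }
    return result;
}

// Probability of exactly k successes in n Bernoulli trials with success
// probability p: nCk * p^k * (1-p)^(n-k).
double binomialPmf(int n, int k, double p) {
    return choose(n, k) * std::pow(p, k) * std::pow(1.0 - p, n - k);
}
```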

You're exceeding the maximum value of a long. Factorial grows so quickly that you need the right type of number--what type that is will depend on what values you need.
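long is a signed integer type; where it is 32 bits wide (as on Windows), the value overflows as soon as it would exceed 2^31 - 1, and in practice it becomes negative (two's complement arithmetic), although signed overflow is formally undefined behavior.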
Using an unsigned long will buy you a little time (one more bit), but for factorial, it's probably not worth it. If your compiler supports long long, then try an "unsigned long long". That will (usually, depends on compiler and CPU) double the number of bits you're using.
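To see how much room each unsigned type gives, here is a sketch that multiplies until the next step would overflow (the function name is hypothetical). It returns 12 for a 32-bit type and 20 for a 64-bit type, so moving to unsigned long long only raises the limit from 12! to 20!.

```cpp
#include <cstdint>
#include <limits>

// Returns the largest n such that n! fits in the unsigned type U, checking
// before each multiplication whether the next factor would overflow.
template <typename U>
int largestFactorialArg() {
    U fact = 1;  // holds n!
    int n = 0;
    while (fact <= std::numeric_limits<U>::max() / static_cast<U>(n + 1)) {
        ++n;
        fact = static_cast<U>(fact * static_cast<U>(n));
    }
    return n;
}
```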
You can also try switching to use double. The problem you'll face there is that you'll lose accuracy as the numbers increase. A double is a floating point number, so you'll have a fixed number of significant digits. If your end result is an approximation, this may work okay, but if you need exact values, it won't work.
If none of these solutions will work for you, you may need to resort to using an "infinite precision" math package, which you should be able to search for. You didn't say if you were using C or C++; this is going to be a lot more pleasant with C++ as it will provide a class that acts like a number and that would use standard arithmetic operators.
