Why does pow() return 999... in C++? [duplicate] - c++

While running the following lines of code:
int i, a;
for (i = 0; i <= 4; i++)
{
    a = pow(10, i);
    printf("%d\t", a);
}
I was surprised to see the output: it comes out as 1 10 99 1000 9999 instead of 1 10 100 1000 10000.
What could be the possible reason?
Note
You might think it's a floating point inaccuracy, since in the above for loop, when i = 2, the value stored in variable a is 99.
But if you write instead
a=pow(10,2);
now the value of a comes out to be 100. How is that possible?

You have set a to be an int. pow() generates a floating point number, which in SOME cases may be just a hair less than 100 or 10000 (as we see here).
Then you stuff that into the integer, which TRUNCATES to an integer. So you lose that fractional part. Oops. If you really needed an integer result, round may be a better way to do that operation.
Be careful even there, as for large enough powers, the error may actually be large enough to still cause a failure, giving you something you don't expect. Remember that floating point numbers only carry so much precision.

The function pow() returns a double. You're assigning it to variable a, of type int. Doing that doesn't "round off" the floating point value, it truncates it. So pow() is returning something like 99.99999... for 10^2, and then you're just throwing away the .9999... part. Better to say a = round(pow(10, i)).
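As a rough illustration of the rounding advice in the answers above, here is a minimal sketch. The helper name rounded_pow is hypothetical (it does not appear in the thread), and the sketch assumes the true result fits in a long and is small enough (well below 2^53) that pow()'s error stays far under 0.5.

```cpp
#include <cmath>

// Hypothetical helper, not from the original answers: computes base^exp
// through pow() and rounds to the nearest integer instead of truncating.
// Assumes the exact result fits in a long and is small enough that the
// floating point error of pow() is far below 0.5.
long rounded_pow(int base, int exp) {
    return std::lround(std::pow(static_cast<double>(base), exp));
}
```

With this, rounded_pow(10, 2) yields 100 whether pow() returns exactly 100.0 or something like 99.99999999.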

This is to do with floating point inaccuracy. Although you are passing in ints they are being implicitly converted to a floating point type since the pow function is only defined for floating point parameters.

Mathematically, the integer power of an integer is an integer.
In a good quality pow() routine this specific calculation should NOT produce any round-off errors. I ran your code on Eclipse/Microsoft C and got the following output:
1 10 100 1000 10000
This test does NOT indicate if Microsoft is using floats and rounding or if they are detecting the type of your numbers and choosing the appropriate method.
So, I ran the following code:
#include <stdio.h>
#include <math.h>
int main(void)
{
    double i, a;
    for (i = 0.0; i <= 4.0; i++)
    {
        a = pow(10, i);
        printf("%lf\t", a);
    }
    return 0;
}
And got the following output:
1.000000 10.000000 100.000000 1000.000000 10000.000000

No one spelt out how to actually do it correctly: instead of the pow function, just have a variable that tracks the current power:
int i, a;
for (i = 0, a = 1; i <= 4; i++, a *= 10) {
    printf("%d\t", a);
}
This repeated multiplication by ten is guaranteed to give you the correct answer (as long as the result fits in an int), and it is quite OK (and much better than pow, even if it were giving the correct results) for tasks like converting decimal strings into integers.
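The string-conversion task mentioned above could be sketched roughly as follows. The function name parse_decimal is hypothetical, and the sketch assumes the input contains only the characters '0' to '9' and that the value fits in an int; it does no validation or overflow checking.

```cpp
#include <string>

// Hypothetical helper: converts a string of decimal digits to an int using
// only integer multiplication and addition, so no floating point is involved.
// Assumes the input holds only '0'..'9' and the value fits in an int.
int parse_decimal(const std::string& digits) {
    int value = 0;
    for (char c : digits) {
        // Shift the digits read so far one decimal place left, then append.
        value = value * 10 + (c - '0');
    }
    return value;
}
```

Each step multiplies the running value by ten, which is the same idea as tracking the current power in the loop above.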

Related

Problem in conversion of decimal to binary number by using bit manipulation [duplicate]

For some values (like 9) it works perfectly but, for most (like 7, 19 or 6), it subtracts 1 from the return (binary) value.
#include <iostream>
#include <cmath>
using namespace std;
int decimaltobinary(int);
int main()
{
    int num;
    cout << "Enter the number: ";
    cin >> num;
    cout << num << " in decimal = " << decimaltobinary(num) << " in binary.";
    return 0;
}
int decimaltobinary(int num)
{
    int remainder, i = 0, binary = 0;
    while (num != 0)
    {
        remainder = num % 2;
        num = num / 2;
        binary = binary + remainder * pow(10, i);
        i++;
    }
    return binary;
}
There are two main problems with the shown code:
The shown code attempts to build a binary version of the input number in decimal, producing, for example, the result 111 for the number 7. That's an integer value of one hundred and eleven.
With a 32 bit int, this means that the largest number that can be "converted" to binary this way is 1023. 1024 is 10000000000 in binary, and ten billion exceeds the capacity of a 32 bit integer. An unsigned 32 bit integer's maximum value is 4294967295 (and half of that for its plain, signed, int value, but either signed or unsigned, you're out of gas at this point).
Any use of pow() with two values that are integers is automatically broken by default, because floating point math is broken. Repeated integer multiplication is not what pow() really does. Conceptually, here's what pow() does: a) it takes the natural logarithm of its first parameter, b) multiplies the result from step a by its 2nd parameter, c) raises e to the power resulting from step b. Does this sound like something you expected to do here?
And since pow() takes floating point parameters, and the result is a floating point, the end result of the shown code is a bunch of needless conversions between floating point and integer values, and non-specific rounding errors as a result of imprecise floating point exponential math.
But the main flaw in the shown code is an attempt to use plain ints to assemble a decimal number that represents a binary value; an int simply doesn't have enough digits for this. Switching to long long int won't be much of a help. Counting things off on my fingers, you'll be able to go up only to somewhere around half a million that way (a signed 64 bit value has room for at most 19 binary digits written like this). A completely different approach must be taken for the described programming task.
Your problem is that binary+remainder*pow(10,i); is all done in floating-point arithmetic and only converted to int at the assignment. Since pow is not exact, you may get a result slightly below the exact value, in which case the conversion truncates it and you end up with 1 less than the desired result.
While there are various better ways to achieve your goal, the immediate fix is to use std::round() and then cast the result to int:
binary=binary+remainder*int(round(pow(10,i)));
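One "different approach" hinted at in the answers above is to build the binary digits as text rather than as a decimal-looking int. The following is only a sketch under that assumption; the name to_binary is hypothetical, and it takes an unsigned value to sidestep the question of negative input.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns the binary digits of num as a string, so the
// result is not limited by how many decimal digits fit in an int.
std::string to_binary(unsigned int num) {
    if (num == 0) {
        return "0";
    }
    std::string bits;
    while (num != 0) {
        bits.push_back(static_cast<char>('0' + num % 2));  // lowest bit first
        num /= 2;
    }
    std::reverse(bits.begin(), bits.end());  // most significant bit first
    return bits;
}
```

No floating point arithmetic is involved, so inputs such as 7, 19 or 6 cannot come out one too small.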

Why does hardcoded variable (2^62 + 1) subtracted by pow(2, 62) evaluate to 0 instead of 1? [duplicate]


Difference in behaviour of pow from math.h for same input [duplicate]

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
    int n, i, ele;
    n = 5;
    ele = pow(n, 2);
    printf("%d", ele);
    return 0;
}
The output is 24.
I'm using GNU/GCC in Code::Blocks.
What is happening?
I know the pow function returns a double, but 25 fits an int type, so why does this code print 24 instead of 25? With n = 2, 3, 4 or 6 the code works, but with 5 it doesn't.
Here is what may be happening. You should be able to confirm this by looking at your compiler's implementation of the pow function:
Assuming you have the correct #include's, (all the previous answers and comments about this are correct -- don't take the #include files for granted), the prototype for the standard pow function is this:
double pow(double, double);
and you're calling pow like this:
pow(5,2);
The pow function goes through an algorithm (probably using logarithms), thus uses floating point functions and values to compute the power value.
The pow function does not go through a naive "multiply the value of x a total of n times", since it has to also compute pow using fractional exponents, and you can't compute fractional powers that way.
So more than likely, the computation of pow using the parameters 5 and 2 resulted in a slight rounding error. When you assigned to an int, you truncated the fractional value, thus yielding 24.
If you are using integers, you might as well write your own "intpow" or similar function that simply multiplies the value the requisite number of times. The benefits of this are:
You won't get into the situation where you may get subtle rounding errors using pow.
Your intpow function will more than likely run faster than an equivalent call to pow.
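The "intpow" idea from the answer above might look roughly like this. The answer only suggests the name and the approach; the signature here is an assumption, and the sketch assumes the result fits in a long long (it does no overflow checking).

```cpp
// Sketch of the "intpow" suggestion: multiply base by itself exp times using
// integer arithmetic only. Assumes the result fits in a long long.
long long intpow(long long base, unsigned int exp) {
    long long result = 1;
    for (unsigned int k = 0; k < exp; ++k) {
        result *= base;
    }
    return result;
}
```

Calling intpow(5, 2) gives exactly 25, because no conversion to or from double takes place.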
You want an int result from a function meant for doubles.
You should perhaps use
ele = (int)(0.5 + pow(n, 2));  /* add 0.5 to round, then cast to int */
Floating-point arithmetic is not exact.
Although small values can be added and subtracted exactly, the pow() function normally works by multiplying logarithms, so even if the inputs are both exact, the result is not. Assigning to int always truncates, so if the inexactness is negative, you'll get 24 rather than 25.
The moral of this story is to use integer operations on integers, and be suspicious of <math.h> functions when the actual arguments are to be promoted or truncated. It's unfortunate that GCC doesn't warn unless you add -Wfloat-conversion (it's not in -Wall -Wextra, probably because there are many cases where such conversion is anticipated and wanted).
For integer powers, it's always safer and faster to use multiplication (division if negative) rather than pow() - reserve the latter for where it's needed! Do be aware of the risk of overflow, though.
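The overflow warning above can be handled with a check before each multiplication. This is only a sketch: the name checked_pow and the convention of returning -1 on overflow are assumptions made for illustration, and it handles non-negative bases only.

```cpp
#include <limits>

// Hypothetical helper: integer power with an overflow check.
// Assumes base >= 0. Returns -1 (a value no valid result can take here)
// when the exact result would not fit in an int.
int checked_pow(int base, unsigned int exp) {
    int result = 1;
    for (unsigned int k = 0; k < exp; ++k) {
        // result * base would overflow exactly when result > INT_MAX / base.
        if (base != 0 && result > std::numeric_limits<int>::max() / base) {
            return -1;
        }
        result *= base;
    }
    return result;
}
```

The division-based test avoids performing the overflowing multiplication itself, which would be undefined behaviour for signed integers.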
When you use pow with variables, its result is a double. Assigning it to an int truncates it.
So you can avoid this error by assigning the result of pow to a double or float variable.
So basically it translates to exp(log(x) * y), which will produce a result that isn't precisely the same as x^y, just a near approximation as a floating point value. So, for example, 5^2 may become 24.9999996 or 25.00002.
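To make the exp/log description concrete, here is a small sketch. The function names are hypothetical, and naive_pow only illustrates the identity quoted above; real pow() implementations are more careful than this and may well return exactly 25 for pow(5, 2).

```cpp
#include <cmath>

// Illustration of the identity x^y = exp(log(x) * y). Valid only for x > 0.
// This is NOT how any particular library implements pow().
double naive_pow(double x, double y) {
    return std::exp(std::log(x) * y);
}

// What assignment to an int does: drop the fractional part.
int truncated(double v) {
    return static_cast<int>(v);
}

// Round to the nearest integer instead.
int rounded(double v) {
    return static_cast<int>(std::lround(v));
}
```

A value such as 24.9999996 truncates to 24 but rounds to 25, which is the whole difference between the surprising output and the expected one.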

Using scientific notation in for loops

I've recently come across some code which has a loop of the form
for (int i = 0; i < 1e7; i++){
}
I question the wisdom of doing this since 1e7 is a floating point type, and will cause i to be promoted when evaluating the stopping condition. Should this be a cause for concern?
The elephant in the room here is that the range of an int could be as small as -32767 to +32767, and the behaviour on assigning a larger value than this to such an int is undefined.
But, as for your main point, indeed it should concern you as it is a very bad habit. Things could go wrong as yes, 1e7 is a floating point double type.
The fact that i will be converted to a floating point due to type promotion rules is somewhat moot: the real damage is done if there is unexpected truncation of the apparent integral literal. By way of a "proof by example", consider first the loop
for (std::uint64_t i = std::numeric_limits<std::uint64_t>::max() - 1024; i ++< 18446744073709551615ULL; ){
    std::cout << i << "\n";
}
This outputs every consecutive value of i in the range, as you'd expect. Note that std::numeric_limits<std::uint64_t>::max() is 18446744073709551615ULL, which is 1 less than the 64th power of 2. (Here I'm using a slide-like "operator" ++< which is useful when working with unsigned types. Many folk consider --> and ++< as obfuscating but in scientific programming they are common, particularly -->.)
Now on my machine, a double is an IEEE754 64 bit floating point. (Such a scheme is particularly good at representing powers of 2 exactly: an IEEE754 double can represent powers of 2 up to 2^1023 exactly.) So 18,446,744,073,709,551,616 (the 64th power of 2) can be represented exactly as a double. The nearest representable number below that is 18,446,744,073,709,549,568 (which is 2048 less).
So now let's write the loop as
for (std::uint64_t i = std::numeric_limits<std::uint64_t>::max() - 1024; i ++< 1.8446744073709551615e19; ){
    std::cout << i << "\n";
}
On my machine that will only output one value of i: 18,446,744,073,709,550,592. The literal 1.8446744073709551615e19 cannot be represented as a double and is rounded to the 64th power of 2, and i is converted to a double before each comparison, so by the second test both sides have become the 64th power of 2 and the comparison fails. This proves that 1.8446744073709551615e19 is a floating point type. If the compiler was allowed to treat the literal as an integral type then the output of the two loops would be equivalent.
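The implicit conversion behind that comparison can be written out explicitly. This is a sketch assuming double is an IEEE 754 64 bit type with the default round-to-nearest mode; the function name less_than_as_double is hypothetical.

```cpp
#include <cstdint>

// Spells out what `i < bound` does when i is an unsigned 64 bit integer and
// bound is a double: i is converted to double first, then compared.
// Assumes IEEE 754 binary64 doubles with round-to-nearest.
bool less_than_as_double(std::uint64_t i, double bound) {
    return static_cast<double>(i) < bound;
}
```

Under those assumptions, a value 1000 below the maximum of std::uint64_t is rounded up to the 64th power of 2 by the conversion, so it no longer compares as less than 1.8446744073709551615e19, even though the same comparison done in integer arithmetic against 18446744073709551615ULL would succeed.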
It will work, assuming that your int is at least 32 bits.
However, if you really want to use exponential notation, it is better to define an integer constant outside the loop and use proper casting, like this:
const int MAX_INDEX = static_cast<int>(1.0e7);
...
for (int i = 0; i < MAX_INDEX; i++) {
...
}
Considering this, I'd say it is much better to write
const int MAX_INDEX = 10000000;
or if you can use C++14
const int MAX_INDEX = 10'000'000;
1e7 is a literal of type double, and usually double is 64-bit IEEE 754 format with a 52-bit mantissa. Roughly every tenth power of 2 corresponds to a third power of 10, so double should be able to represent integers up to at least 10^(5*3) = 10^15, exactly. And if int is 32-bit then int has roughly 10^(3*3) = 10^9 as max value (asking Google search it says that "2**31 - 1" = 2 147 483 647, i.e. twice the rough estimate).
So, in practice it's safe on current desktop systems and larger.
But C++ allows int to be just 16 bits, and on e.g. an embedded system with that small int, one would have Undefined Behavior.
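One way to guard against the small-int case is to let the compiler reject the code when the bound does not fit. This is a sketch assuming C++11 or later; the names kLoopBound, kMaxIndex and count_iterations are hypothetical.

```cpp
#include <limits>

// The loop bound from the question, kept as a double on purpose.
constexpr double kLoopBound = 1e7;

// Compilation fails on a platform whose int cannot hold the bound
// (for example one with a 16 bit int), instead of invoking undefined behaviour.
static_assert(kLoopBound <= std::numeric_limits<int>::max(),
              "loop bound does not fit in int");

// Convert once, outside any loop, so the loop condition compares int with int.
constexpr int kMaxIndex = static_cast<int>(kLoopBound);

// Counts how many times a loop with this bound runs.
int count_iterations() {
    int n = 0;
    for (int i = 0; i < kMaxIndex; ++i) {
        ++n;
    }
    return n;
}
```

The loop itself then involves no floating point at all, which also addresses the promotion concern raised in the question.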
If the intention is to loop for an exact integer number of iterations, for example when iterating over exactly all the elements in an array, then comparing against a floating point value is maybe not such a good idea, solely for accuracy reasons. Since the implicit cast of an integer to float will truncate integers toward zero, there's no real danger of out-of-bounds access; it will just abort the loop short.
Now the question is: when do these effects actually kick in? Will your program experience them? The floating point representation usually used these days is IEEE 754. As long as the exponent is 0 a floating point value is essentially an integer. C double precision floats have 52 bits for the mantissa, which gives you integer precision up to a value of 2^52, which is on the order of 1e15. Unless you specify with a suffix f that you want a floating point literal to be interpreted as single precision, the literal will be double precision and the implicit conversion will target that as well. So as long as your loop end condition is less than 2^52 it will work reliably!
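The precision limit can be probed with a round trip through double. This is a sketch assuming IEEE 754 64 bit doubles; the function name survives_double_round_trip is hypothetical. Because of the implicit leading bit, the exact range actually extends to 2^53, slightly beyond the conservative 2^52 figure used above.

```cpp
#include <cstdint>

// Returns true if n converts to double and back without changing.
// Assumes IEEE 754 binary64 doubles.
bool survives_double_round_trip(std::int64_t n) {
    const double d = static_cast<double>(n);
    // 2^63 does not fit in int64_t, so converting it back would be undefined.
    if (d >= 9223372036854775808.0) {
        return false;
    }
    return static_cast<std::int64_t>(d) == n;
}
```

A loop bound such as 1e7 is far inside the exact range, while 2^53 + 1 is the first positive integer that does not survive the trip.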
Now one question you have to think about on the x86 architecture is efficiency. The very first 80x87 FPUs came in a different package, and later a different chip, and as a result getting values into the FPU registers is a bit awkward on the x86 assembly level. Depending on what your intentions are it might make the difference in runtime for a realtime application; but that's premature optimization.
TL;DR: Is it safe to do? Most certainly yes. Will it cause trouble? It could cause numerical problems. Could it invoke undefined behavior? Depends on how you use the loop end condition, but if i is used to index an array and for some reason the array length ended up in a floating point variable, always truncating toward zero, it's not going to cause a logical problem. Is it a smart thing to do? Depends on the application.

C++ integer floor function

I want to implement greatest integer function. [The "greatest integer function" is a quite standard name for what is also known as the floor function.]
int x = 5/3;
My question is: with greater numbers, could there be a loss of precision, as 5/3 would produce a double?
EDIT: Greatest integer function is integer less than or equal to X.
Example:
4.5 = 4
4 = 4
3.2 = 3
3 = 3
What I want to know is: is 5/3 going to produce a double? Because if so I will have loss of precision when converting to int.
Hope this makes sense.
You will lose the fractional portion of the quotient. So yes, with greater numbers you will have more relative precision, such as compared with 5000/3000.
However, 5 / 3 will return an integer, not a double. To force it to divide as double, typecast the dividend as static_cast<double>(5) / 3.
Integer division gives integer results, so 5 / 3 is 1 and 5 % 3 is 2 (the remainder operator). However, this doesn't necessarily hold with negative numbers. In the original C++ standard, -5 / 3 could be either -1 (rounding towards zero) or -2 (the floor), but -1 was recommended. In the latest C++0x draft (which is almost certainly very close to the final standard), it is -1, so finding the floor with negative numbers is more involved.
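A floor that also works for negative operands can be built from / and % alone. This is a minimal sketch assuming C++11 or later (where / truncates toward zero) and a non-zero divisor; the name floor_div is hypothetical.

```cpp
// Hypothetical helper: floor of num / den using integer arithmetic only.
// Assumes C++11 semantics (quotient truncated toward zero) and den != 0.
long floor_div(long num, long den) {
    long q = num / den;  // truncated quotient
    long r = num % den;  // remainder, has the sign of num (or is zero)
    // Truncation rounded up exactly when there is a remainder and its sign
    // differs from the divisor's; step down by one in that case.
    if (r != 0 && ((r < 0) != (den < 0))) {
        --q;
    }
    return q;
}
```

For example, floor_div(-5, 3) gives -2 where plain -5 / 3 gives -1, while floor_div(5, 3) and 5 / 3 both give 1.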
5/3 will always produce 1 (an integer), if you do 5.0/3 or 5/3.0 the result will be a double.
As far as I know, there is no predefined function for this purpose.
It might be necessary to use such a function, if for some reason floating-point calculations are out of question (e.g. int64_t has a higher precision than double can represent without error)
We could define this function as follows:
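#include <cstdlib>  // ldiv, ldiv_t
inline long
floordiv (long num, long den)
{
    if (0 < (num^den))
        return num/den;  // same sign: truncation already equals the floor
    else
    {
        ldiv_t res = ldiv(num,den);
        // step down by one when the truncated quotient has a remainder
        return (res.rem)? res.quot-1
                        : res.quot;
    }
}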
The idea is to use the normal integer division, but adjust for negative results to match the behaviour of the double floor(double) function. The point is to truncate always towards the next lower integer, irrespective of the position of the zero point. This can be very important if the intention is to create even sized intervals.
Timing measurements show that this function here only creates a small overhead compared with the built-in / operator, but of course the floating point based floor function is significantly faster....
Since in C and C++, as others have said, / is integer division, it will return an int. In particular, for non-negative operands it will return the floor of the exact quotient (C99 and C++11 onwards always truncate toward zero). So, basically, 5/3 is exactly what you want.
It may get a little weird with negatives, as -5/3 => -1 (not the floor, which is -2), which may or may not be what you want...