Calculation using power function gave wrong answers for large numbers - c++

I have two values p=19, q=14. I want to calculate p^q using the power function pow(p, q).
Here is my code:
long long p=19,q=14;
cout<<pow(p,q);
The correct answer is: 799006685782884121 but my code gives me 799006685782884096 which is incorrect.
I have also tried doing these calculations using unsigned long long instead of long long, but this didn't help.

The pow function is defined as:
double pow(double x, double y);
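Which means that it takes floating-point arguments and returns a floating-point result. Due to the nature of floating-point numbers, some numbers cannot be represented exactly. A typical double has a 53-bit significand, so near 8e17 only multiples of 128 are representable; 799006685782884096 is the representable value closest to 799006685782884121, which is why you get it.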
Note also that you're doing two (probably lossy) conversions:
converting the arguments from long long to double, and
converting the result of the function from double to long long.
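If both operands are integers and the result fits in 64 bits, the conversions can be avoided by staying in integer arithmetic. A minimal sketch, assuming a hypothetical helper named ipow (it is not a standard library function) that uses exponentiation by squaring and does not check for overflow:

```cpp
#include <cstdint>

// Hypothetical helper (not part of the standard library): raises base to
// the power exp using only integer arithmetic (exponentiation by squaring).
// Overflow is NOT detected: unsigned arithmetic silently wraps around, so
// the caller must make sure the true result fits in 64 bits.
std::uint64_t ipow(std::uint64_t base, unsigned exp) {
    std::uint64_t result = 1;
    while (exp > 0) {
        if (exp & 1u) {
            result *= base;  // fold in the current power when the bit is set
        }
        exp >>= 1;
        if (exp > 0) {
            base *= base;    // square for the next bit of the exponent
        }
    }
    return result;
}
```

With p = 19 and q = 14, ipow(p, q) yields 799006685782884121, the value the question expects.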

Related

why c++ is rounding of big numbers to ceil and small numbers to floor [duplicate]

"x","y" are two long long type variables in c++ to which i have assigned two different numbers.
The variable type is long long, but I assigned values with a decimal fraction to them,
so I expected the fractional part to be trimmed and only the integer part to be displayed.
It did trim the digits after the decimal point and returned an integer.
Output:
I was expecting floor(x), but it returned an integer ending in 6 instead of 5, that is, ceil(x). In the second case, however, it returned floor(y).
This only occurs when the integer is very long.
So what might be the reason for this?
I am using MinGW with C++17 in Visual Studio Code, but the same happens with an online compiler.
Each initialization involves two conversions, first from the decimal numeral in the source text to double, and then from double to long long.
Let’s discuss the second declaration first. Because 2.001 is a double constant, the decimal source text 2.001 must be converted to double. Assuming your C implementation uses IEEE-754 binary64, the result is 2.000999999999999889865875957184471189975738525390625. Then, for the initialization, this double value is converted to long long. This conversion to an integer type discards the fraction, so the result is 2.
In the first declaration, when 9223372036854775.001 is converted to double, the result is 9223372036854776. This is because the two double numbers nearest 9223372036854775.001 are 9223372036854774 and 9223372036854776. The latter one is closer, so it is chosen. Then this double value is converted to long long. There is no fraction part, so the result is simply 9223372036854776.
Thus, the first conversion rounds up because it is not simply converting to the nearest long long value. It first has to round to the nearest double value. And, at the scale of that number, the double format does not have enough resolution to represent every integer. It is representing only every second integer: 9223372036854770, …772, …774, …776, …778, and so on. So 9223372036854775 is not a candidate.
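The spacing described above can be inspected directly. This is a small sketch, assuming IEEE-754 binary64 doubles; the helper names are made up for illustration. It uses std::nextafter to find the neighbouring doubles and performs the same double to long long conversion as the initialization:

```cpp
#include <cmath>

// Largest double strictly below x (for finite, positive x).
double lower_neighbor(double x) {
    return std::nextafter(x, 0.0);
}

// Smallest double strictly above x (for finite x).
double upper_neighbor(double x) {
    return std::nextafter(x, INFINITY);
}

// Same conversion as the initialization: the fraction is discarded.
long long to_long_long(double d) {
    return static_cast<long long>(d);
}
```

Under that assumption, lower_neighbor(9223372036854776.0) is 9223372036854774.0, so there is no double for ...775, and to_long_long(9223372036854775.001) gives 9223372036854776 while to_long_long(2.001) gives 2.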

Difference in behaviour of pow from math.h for same input [duplicate]

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main()
{
    int n,i,ele;
    n=5;
    ele=pow(n,2);
    printf("%d",ele);
    return 0;
}
The output is 24.
I'm using GNU/GCC in Code::Blocks.
What is happening?
I know the pow function returns a double, but 25 fits in an int, so why does this code print 24 instead of 25? With n=4, n=6, n=3, or n=2 the code works, but with five it doesn't.
Here is what may be happening. You should be able to confirm this by looking at your compiler's implementation of the pow function:
Assuming you have the correct #include's, (all the previous answers and comments about this are correct -- don't take the #include files for granted), the prototype for the standard pow function is this:
double pow(double, double);
and you're calling pow like this:
pow(5,2);
The pow function goes through an algorithm (probably using logarithms), thus uses floating point functions and values to compute the power value.
The pow function does not go through a naive "multiply the value of x a total of n times", since it has to also compute pow using fractional exponents, and you can't compute fractional powers that way.
So more than likely, the computation of pow using the parameters 5 and 2 resulted in a slight rounding error. When you assigned to an int, you truncated the fractional value, thus yielding 24.
If you are using integers, you might as well write your own "intpow" or similar function that simply multiplies the value the requisite number of times. The benefits of this are:
You won't get into the situation where you may get subtle rounding errors using pow.
Your intpow function will more than likely run faster than an equivalent call to pow.
You want an int result from a function meant for doubles.
You should perhaps use
ele=(int)(0.5 + pow(n,2));
/* the (int) cast truncates; adding 0.5 first makes it round to nearest for positive values */
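In C++ the same idea can be written with std::lround, which rounds to the nearest integer and also behaves sensibly for negative values. A minimal sketch; rounded_pow is a hypothetical name, and it is only meaningful while the true result fits in an int:

```cpp
#include <cmath>

// Hypothetical helper: computes base^exp with std::pow and rounds the
// double result to the nearest integer instead of truncating it.
// Assumes the true result fits in an int and that pow's rounding error
// is well below 0.5, which holds for small integer inputs.
int rounded_pow(int base, int exp) {
    return static_cast<int>(std::lround(std::pow(base, exp)));
}
```

Here rounded_pow(5, 2) returns 25 whether pow happens to produce a value slightly below or slightly above 25.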
Floating-point arithmetic is not exact.
Although small values can be added and subtracted exactly, the pow() function normally works by multiplying logarithms, so even if the inputs are both exact, the result is not. Assigning to int always truncates, so if the inexactness is negative, you'll get 24 rather than 25.
The moral of this story is to use integer operations on integers, and be suspicious of <math.h> functions when the actual arguments are to be promoted or truncated. It's unfortunate that GCC doesn't warn unless you add -Wfloat-conversion (it's not in -Wall -Wextra, probably because there are many cases where such conversion is anticipated and wanted).
For integer powers, it's always safer and faster to use multiplication (division if negative) rather than pow() - reserve the latter for where it's needed! Do be aware of the risk of overflow, though.
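As a sketch of the multiplication approach with the overflow caveat handled, the following hypothetical checked_pow (the name and the -1 sentinel are choices made for this example) multiplies step by step and reports when the result would not fit in a long long. It is restricted to non-negative bases and non-negative exponents to keep the check simple:

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: base^exp by repeated integer multiplication.
// Returns -1 if the result would overflow long long. Since base is
// required to be non-negative, a valid result is never negative, so -1
// is an unambiguous error value in this sketch.
long long checked_pow(long long base, unsigned exp) {
    assert(base >= 0);  // sketch restricted to non-negative bases
    const long long max = std::numeric_limits<long long>::max();
    long long result = 1;
    for (unsigned i = 0; i < exp; ++i) {
        // result * base fits exactly when result <= max / base
        if (base != 0 && result > max / base) {
            return -1;
        }
        result *= base;
    }
    return result;
}
```

For example, checked_pow(5, 2) is exactly 25, while checked_pow(19, 15) returns -1 because 19^15 does not fit in 64 bits.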
When you use pow with variables, its result is double. Assigning to an int truncates it.
So you can avoid this error by assigning the result of pow to a double or float variable.
So basically, it translates to exp(log(x) * y), which produces a result that isn't precisely x^y, just a near approximation as a floating-point value. So, for example, 5^2 may come out as something like 24.9999996 or 25.00002.
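To see the effect of the logarithm route in isolation, here is an illustrative sketch. It is not how any particular library implements pow (real implementations are typically more careful); pow_via_logs and error_vs_exact are hypothetical names:

```cpp
#include <cmath>

// Illustration only: computes x^y as exp(log(x) * y), valid for x > 0.
// This is NOT claimed to be what a given library's pow() does.
double pow_via_logs(double x, double y) {
    return std::exp(std::log(x) * y);
}

// Absolute distance between the log-based result and a known exact value.
double error_vs_exact(double x, double y, double exact) {
    return std::fabs(pow_via_logs(x, y) - exact);
}
```

Printing pow_via_logs(5.0, 2.0) with enough digits (for instance with std::setprecision(17)) shows whether the result landed exactly on 25 or slightly to one side. If it lands below, truncating to int gives 24.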

Very large differences using float and double

#include <iostream>
using namespace std;
int main() {
    int steps=1000000000;
    float s = 0;
    for (int i=1;i<(steps+1);i++){
        s += (i/2.0) ;
    }
    cout << s << endl;
}
Declaring s as float: 9.0072e+15
Declaring s as double: 2.5e+17 (same result as implementing it in Julia)
I understand double has double precision than float, but float should still handle numbers up to 10^38.
I did read similar topics where the results were not the same, but in those cases the differences were very small; here the difference is 25x.
I also add that using long double instead gives me the same result as double. If the matter is the precision, I would have expected to have something a bit different.
The problem is the lack of precision: https://en.wikipedia.org/wiki/Floating_point
After 100 million numbers you are adding 1e8 to 1e16 (or at least numbers of that magnitude), but single precision numbers are only accurate to 7 digits - so it is the same as adding 0 to 1e16; that's why your result is considerably lower for float.
Prefer double over float in most cases.
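The float total reported in the question, 9.0072e+15, matches 2^53 = 9007199254740992. Assuming IEEE-754 binary32 floats, adjacent values at that magnitude are 2^30 (about 1.07e9) apart, so an addend below half of that is rounded away, and i/2.0 never exceeds 5e8 in the loop. A minimal sketch of that absorption, with a hypothetical helper name:

```cpp
// Returns true if adding small to big leaves big unchanged in float
// arithmetic. The volatile store forces the sum to be rounded to float
// even on platforms that compute with excess precision.
bool is_absorbed(float big, float small) {
    volatile float sum = big + small;
    return sum == big;
}
```

Under that assumption, is_absorbed(9007199254740992.0f, 5.0e8f) is true, whereas an addend of 1.0e9f is large enough to move the sum to the next representable value.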
The problem is floating-point precision! Infinitely many real numbers cannot all be represented in the finite memory of a computer. Floats, in general, are just approximations of the numbers they are meant to represent.
For more details, please check the following documentation:
https://softwareengineering.stackexchange.com/questions/101163/what-causes-floating-point-rounding-errors
You didn't mention what type of floating point numbers you are using, but I'm going to assume that you use IEEE 754, or similar.
I understand double has double precision
To be more precise with the terminology, double uses twice as many bits. That's not double the number of representable values; it's 4294967296 times as many representable values, despite being named "double precision".
but float should still handle numbers up to 10^38.
Float can handle a few numbers up to that magnitude. But that doesn't mean that float values in that range are precise. For example, 3.4028235E+38 can be represented as a single-precision float. How large would you imagine the difference is between it and the previous value representable by float? Is it the machine epsilon? Perhaps 0.1? Maybe 1? No. The difference is about 2E+31.
Now, your numbers aren't quite in that range. But they're outside the contiguous range of integers that can be precisely represented by float. The highest value in that range happens to be 16777216 (2^24), or about 1.7E+7, which is way less than 2.5E+17. So, every addition beyond that range may add some error to the result. You perform a billion calculations, so those errors add up.
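Both gaps mentioned above can be measured with std::nextafter. A short sketch, assuming IEEE-754 binary32 floats; the function names are invented for this example:

```cpp
#include <cfloat>
#include <cmath>

// Distance from x to the next representable float above it.
float gap_above(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Distance between FLT_MAX and the largest float below it.
float gap_below_max() {
    return FLT_MAX - std::nextafter(FLT_MAX, 0.0f);
}
```

Under that assumption, gap_above(16777216.0f) is 2, so 16777217 has no float representation, and gap_below_max() is 2^104, which is roughly 2E+31.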
Conclusions:
Understand that single precision is way less precise than double precision.
Avoid long sequences of calculations where precision errors can accumulate.

tgamma() long long typecasting

I am writing a function in which I have to calculate factorials of numbers and do operations on them. The return value of the function should be long long, so I think it would be better to do all operations in long long format. If I am wrong, please correct me.
The tgamma() function by itself returns the correct value in scientific notation. But the value returned by tgamma() is sometimes 1 less than the actual answer when it is typecast to long long.
#include <math.h>
#include <iostream>
int main()
{
    std::cout<<"11!:"<<tgamma(12)<<std::endl;
    std::cout<<"12!"<<tgamma(13)<<std::endl;
    std::cout<<"13!"<<tgamma(14)<<std::endl;
    std::cout<<"14!"<<tgamma(15)<<std::endl;
    std::cout<<"15!"<<tgamma(16)<<std::endl;
    std::cout<<"16!"<<tgamma(17)<<std::endl;
    std::cout<<"********************************"<<std::endl;
    std::cout<<"11!:"<<(long long)tgamma(12)<<std::endl;
    std::cout<<"12!"<<(long long)tgamma(13)<<std::endl;
    std::cout<<"13!"<<(long long)tgamma(14)<<std::endl;
    std::cout<<"14!"<<(long long)tgamma(15)<<std::endl;
    std::cout<<"15!"<<(long long)tgamma(16)<<std::endl;
    std::cout<<"16!"<<(long long)tgamma(17)<<std::endl;
    return 0;
}
I am getting the following output:
11!:3.99168e+07
12!4.79002e+08
13!6.22702e+09
14!8.71783e+10
15!1.30767e+12
16!2.09228e+13
********************************
11!:39916800
12!479001599
13!6227020799
14!87178291199
15!1307674367999
16!20922789888000
The actual value of 15! according to this site is 1307674368000 but when I typecast tgamma(16) to long long, I get only 1307674367999. The thing is this discrepancy only appears for some numbers. The typecasted answer for 16! is correct - 20922789888000.
This function is for a competitive programming problem which is currently going on, so I can't paste the function and the solution I am developing to it here.
I would roll my own factorial function but I want to reduce the number of characters in my program to get bonus points.
Any tips on how to detect this discrepancy in typecasted value and correct it? Or maybe some other function that I can use?
Obviously, unless we have a very unusual implementation, not all long long numbers can be represented exactly as double. Therefore, tgamma cannot always return a double that produces the exact value when cast to long long. There are simply more long long values than double values within the long long range.
If you want exact long long factorial, you should implement it yourself.
On top of this, if you want precision, you transform double to long long not as (long long)x, but as (long long)round(x), or (long long)(x+0.5), assuming x is positive.
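An exact factorial in integer arithmetic is short. A minimal sketch, assuming a hypothetical function named factorial and a 64-bit unsigned long long; 20! is the largest factorial that fits in 64 bits, so larger arguments are rejected:

```cpp
#include <cassert>

// Hypothetical helper: exact n! using only integer multiplication.
// 20! = 2432902008176640000 still fits in 64 bits; 21! does not.
unsigned long long factorial(unsigned n) {
    assert(n <= 20);  // larger arguments would overflow
    unsigned long long result = 1;
    for (unsigned i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}
```

This gives factorial(15) == 1307674368000 and factorial(16) == 20922789888000 with no rounding step involved.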
Casting from a floating point type to an integral type truncates. Try (long long) roundl(tgammal(xxx)) to get rid of integer truncation error. This is also using long doubles so it may give you more digits.
#include <math.h>
#include <iostream>
int main(){
    std::cout<<"11!:"<<(long long)roundl(tgammal(12))<<std::endl;
    std::cout<<"12!"<<(long long)roundl(tgammal(13))<<std::endl;
    std::cout<<"13!"<<(long long)roundl(tgammal(14))<<std::endl;
    std::cout<<"14!"<<(long long)roundl(tgammal(15))<<std::endl;
    std::cout<<"15!"<<(long long)roundl(tgammal(16))<<std::endl;
    std::cout<<"16!"<<(long long)roundl(tgammal(17))<<std::endl;
    std::cout<<"********************************"<<std::endl;
    std::cout<<"11!:"<<(long long)roundl(tgammal(12))<<std::endl;
    std::cout<<"12!"<<(long long)roundl(tgammal(13))<<std::endl;
    std::cout<<"13!"<<(long long)roundl(tgammal(14))<<std::endl;
    std::cout<<"14!"<<(long long)roundl(tgammal(15))<<std::endl;
    std::cout<<"15!"<<(long long)roundl(tgammal(16))<<std::endl;
    std::cout<<"16!"<<(long long)roundl(tgammal(17))<<std::endl;
    return 0;
}
Gives:
11!:39916800
12!479001600
13!6227020800
14!87178291200
15!1307674368000
16!20922789888000
********************************
11!:39916800
12!479001600
13!6227020800
14!87178291200
15!1307674368000
16!20922789888000

C++ calculation with type "long"

I have an inline function that does a frequency-to-period conversion. The calculation has to be done using type long, not type double; otherwise, it may cause some rounding errors. The function then converts the result back to double. I was wondering which line in the code below would keep the calculation in type long, no matter whether the parameter bar is 100, 100.0 or 33.3333.
double foo(long bar)
{
    return 1000000/bar;
    return 1000000.0/bar;
    return (long)1000000/bar;
    return (long)1000000.0/bar;
}
I tried it myself, and the 4th line works. But I am just wondering about the concept of type conversion in this case.
EDIT:
One of the errors is 1000000/37038 = 26, not 26.9993.
return 1000000/bar;
This will do the math as a long.
return 1000000.0/bar;
This will do the math as a double.
return (long)1000000.0/bar;
This is equivalent to the first -- 1000000.0 is a double, but then you cast it to long before the division, so the division will be done on longs.
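The type in which each division is carried out is fixed at compile time, so it can be checked with decltype. A small sketch; the function names are invented for this example and each one wraps one of the expressions discussed above:

```cpp
#include <type_traits>

// The operand types decide the arithmetic: int / long is done as long,
// double / long is done as double, and a cast applies before the division.
static_assert(std::is_same<decltype(1000000 / 1L), long>::value,
              "int / long is computed as long");
static_assert(std::is_same<decltype(1000000.0 / 1L), double>::value,
              "double / long is computed as double");
static_assert(std::is_same<decltype((long)1000000.0 / 1L), long>::value,
              "the cast to long happens before the division");

double divide_as_long(long bar)   { return 1000000 / bar; }          // integer division
double divide_as_double(long bar) { return 1000000.0 / bar; }        // floating-point division
double cast_then_divide(long bar) { return (long)1000000.0 / bar; }  // integer division again
```

With bar = 37038, divide_as_long and cast_then_divide both return 26, while divide_as_double returns roughly 26.9993, matching the numbers in the question's edit.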
This problem, as you posed it, doesn't make sense.
bar is of an integral type, so 1000000/bar will surely be no larger in magnitude than 1000000, which can be represented exactly by a double [1], so there's no way in which performing the calculation entirely in integral arithmetic can give better precision. Actually, you will get integer division, which in this case is less precise for any value of bar, since it truncates the fractional part. The only way you can have a problem in a long to double conversion here is in the conversion of bar to double, but if it exceeds the range of double the final result of the division will be 0, as it would be anyway in integer arithmetic.
Still:
1000000/bar
performs a division between longs: 1000000 is an int or a long, depending on the platform, bar is a long; the first operand gets promoted to a long if necessary and then an integer division is performed.
1000000.0/bar
performs a division between doubles: 1000000.0 is a double literal, so bar gets promoted to double before the division.
(long)1000000/bar
is equivalent to the first one: the cast has precedence over the division, and forces 1000000 (which is either a long or an int) to be a long; bar is a long, division between longs is performed.
(long)1000000.0/bar
is equivalent to the previous one: 1000000.0 is a double, but you cast it to a long and then integer division is performed.
[1] The C standard, to which the C++ standard delegates the matter, requires at least 10 decimal digits of precision for doubles (DBL_DIG) and a representable power of ten of at least 10^37 (DBL_MAX_10_EXP) (C99, annex E, ¶4).
The first line (and the third, more verbosely) will do the math as long (which in C++ always truncates the result toward zero) and then return the integral value as a double. I don't understand what you're saying in your question about bar being 33.3333, because that's not a possible long value.