Loss of precision when using pow in C++ - c++

10^1.64605 = 44.2639330165
However in C++ using pow:
double p = pow(10,1.64605) returns 44.2641.
Is there a way to increase the precision here? I tried casting both sides to long double but that didn't help either.
More interesting is:
cout<<p;
double a = -1.64605;
cout<<pow(10,-a);
p = pow(10, (-p));
cout<<p;
the output is:
-1.64605
44.2639
44.2641
Why?

cout is truncating your double for display, but the value calculated by pow is probably at least as precise as you expect. For how to get more precision displayed in the console see:
How do I print a double value with full precision using cout?
I'll elaborate given the prodding by David.
You stated that double p = pow(10,1.64605) returns 44.2641 but this is incorrect. It returns 44.26393301653639156; without any formatting specifiers this displays as 44.2639 (as you see later).
When you cout the original value of p in the second code snippit it displays -1.64605 (due to reduced precision formatting) and you are assuming that it is exactly -1.64605 whereas it is actually somewhere between -1.64605115... and -1.64605213..., which does evaluate to 44.2641 in the expression cout << pow(10, (-p));

The answer will be found by examining p. You did not show how it was initialized.
You will find that p != a, even though they appear to be the same when you print them to the console. When you printed to the console you only printed the first 6 significant decimal digits. Print both values to greater precision and you will see that they are not equal.
You said that:
double p = pow(10,1.64605);
evaluates to 44.2641. But that is not true. If you actually execute that code you will see that.
double p = pow(10,1.64605);
cout << p;
outputs 44.2639.
Your code differs slightly from that above. It is:
cout << p;
p = pow(10, (-p));
cout << p;
Output the original value of p to full precision and all will be revealed.

Related

Floating points in C++ (float and double)

I know that we shouldn't use floating points in the loops. But could someone explain it to me what happens when we have a loop and we add a small number to a large number until we reach a certain value that allows the loop to terminate?
I guess it might cause potential errors. But apart from that?
What would it look like with a single-precision (float) and double-precision (double) floating-point numbers? I guess more rounding errors would appear in the double type. Could someone give me an example (the best in C ++) because I have no idea how to start with it...
I would be very grateful if you could provide me with a hint. Thanks!
In a C++ implementation using IEEE-754 arithmetic and the “single” (binary32) format for float, this code prints “count = 3”:
int count = 0;
for (float f = 0; f < .3f; f += .1f)
++count;
std::cout << "count = " << count << ".\n";
but this code prints “count = 4”:
int count = 0;
for (float f = 0; f < .33f; f += .11f)
++count;
std::cout << "count = " << count << ".\n";
In the first example, the source text .1f is converted to 0.100000001490116119384765625, which is the value representable in float that is closed to .1. The source text .3f is converted to 0.300000011920928955078125, the float value closest to .3. Adding this converted value for .1f to f produces 0.100000001490116119384765625, then 0.20000000298023223876953125, and then 0.300000011920928955078125, at which point f < .3f is false, and the loop stops.
In the second example, .11f is converted to 0.10999999940395355224609375, and .33f is converted to 0.3300000131130218505859375. In this case, adding the converted value of .11f to f produces 0.10999999940395355224609375, then 0.2199999988079071044921875, and then 0.329999983310699462890625. Note that, due to rounding, this result of adding .11f three times is 0.329999983310699462890625, which is less than .33f (0.3300000131130218505859375), so f < .33f is true, and the loop continues for another iteration.
This is similar to adding ⅓ in a two-digit decimal format with a loop bound of three-thirds (which is 1). If we had for (f = 0; f < 1; f += ⅓), the ⅓ in the source text would have to be converted to .33 (two-digit decimal). Then f would be stepped through .33, .66, and .99. The loop would not stop until it reached 1.32. The same rounding issues occur in binary floating-point arithmetic.
When the amount added in the loop is a small number relative to the large number, these rounding issues are greater. First, there will be more additions, so there will be more rounding errors, and they may accumulate. Second, since larger numbers require a larger exponent to scale them in the floating-point format, they have less absolute precision than smaller numbers. This means the roundings have to be come larger relative to the small number that is being added. So the rounding errors are larger in magnitude.
Then, even if the loop eventually terminates, the values of f in each iteration may be far from the desired values, due to the accumulated errors. If f is used for calculations inside the loop, the calculations might not be using the desired values and may produce incorrect results.
With increasing values the difference between 2 floating point values increases too. There is a point where i+1 results in the same value.
Consider this code:
#include <iostream>
int main()
{
float i = 0;
while (i != i + 1) i++;
std::cout << i << std::endl;
return 0;
}
while (i != i + 1) should be an endless loop, but for floating point variables, it is not.
The code above prints 1.67772e+07 on https://godbolt.org/z/7xf8n8
So, for (float f = 0; f < 2e7; f++) is an endless loop.
You can try it with double yourself, the value is bigger.

pow() function gives an error [duplicate]

Recently i write a block of code:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = pow(sections, 5- t -1);
cout << i << endl;
}
And the result is wrong:
9999
1000
99
10
1
If i using just this code:
for(int t = 0; t < 5; t++){
cout << pow(sections,5-t-1) << endl;
}
The problem doesn't occur anymore:
10000
1000
100
10
1
Does anyone give me an explaination? thanks you very much!
Due to the representation of floating point values pow(10.0, 5) could be 9999.9999999 or something like this. When you assign that to an integer that got truncated.
EDIT: In case of cout << pow(10.0, 5); it looks like the output is rounded, but I don't have any supporting document right now confirming that.
EDIT 2: The comment made by BoBTFish and this question confirms that when pow(10.0, 5) is used directly in cout that is getting rounded.
When used with fractional exponents, pow(x,y) is commonly evaluated as exp(log(x)*y); such a formula would mathematically correct if evaluated with infinite precision, but may in practice result in rounding errors. As others have noted, a value of 9999.999999999 when cast to an integer will yield 9999. Some languages and libraries use such a formulation all the time when using an exponentiation operator with a floating-point exponent; others try to identify when the exponent is an integer and use iterated multiplication when appropriate. Looking up documentation for the pow function, it appears that it's supposed to work when x is negative and y has no fractional part (when x is negative and `y is even, the result should be pow(-x,y); when y is odd, the result should be -pow(-x,y). It would seem logical that when y has no fractional part a library which is going to go through the trouble of dealing with a negative x value should use iterated multiplication, but I don't know of any spec dictating that it must.
In any case, if you are trying to raise an integer to a power, it is almost certainly best to use integer maths for the computation or, if the integer to be raised is a constant or will always be small, simply use a lookup table (raising numbers from 0 to 15 by any power that would fit in a 64-bit integer would require only a 4,096-item table).
From Here
Looking at the pow() function: double pow (double base, double exponent); we know the parameters and return value are all double type. But the variable num, i and res are all int type in code above, when tranforming int to double or double to int, it may cause precision loss. For example (maybe not rigorous), the floating point unit (FPU) calculate pow(10, 4)=9999.99999999, then int(9999.9999999)=9999 by type transform in C++.
How to solve it?
Solution1
Change the code:
const int num = 10;
for(int i = 0; i < 5; ++i){
double res = pow(num, i);
cout << res << endl;
}
Solution2
Replace floating point unit (FPU) having higher calculation precision in double type. For example, we use SSE in Windows CPU. In Code::Block 13.12, we can do this steps to reach the goal: Setting -> Compiler setting -> GNU GCC Compile -> Other options, add
-mfpmath=sse -msse3
The picture is as follows:
(source: qiniudn.com)
Whats happens is the pow function returns a double so
when you do this
int i = pow(sections, 5- t -1);
the decimal .99999 cuts of and you get 9999.
while printing directly or comparing it with 10000 is not a problem because it is runded of in a sense.
If the code in your first example is the exact code you're running, then you have a buggy library. Regardless of whether you're picking up std::pow or C's pow which takes doubles, even if the double version is chosen, 10 is exactly representable as a double. As such the exponentiation is exactly representable as a double. No rounding or truncation or anything like that should occur.
With g++ 4.5 I couldn't reproduce your (strange) behavior even using -ffast-math and -O3.
Now what I suspect is happening is that sections is not being assigned the literal 10 directly but instead is being read or computed internally such that its value is something like 9.9999999999999, which when raised to the fourth power generates a number like 9999.9999999. This is then truncated to the integer 9999 which is displayed.
Depending on your needs you may want to round either the source number or the final number prior to assignment into an int. For example: int i = pow(sections, 5- t -1) + 0.5; // Add 0.5 and truncate to round to nearest.
There must be some broken pow function in the global namespace. Then std::pow is "automatically" used instead in your second example because of ADL.
Either that or t is actually a floating-point quantity in your first example, and you're running into rounding errors.
You're assigning the result to an int. That coerces it, truncating the number.
This should work fine:
for(int t= 0; t < 5; t++){
double i = pow(sections, 5- t -1);
cout << i << endl;
}
What happens is that your answer is actually 99.9999 and not exactly 100. This is because pow is double. So, you can fix this by using i = ceil(pow()).
Your code should be:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = ceil(pow(sections, 5- t -1));
cout << i << endl;
}

Why does these two code variants produce different floating-point results?

Given this example C++ code snippet:
void floatSurprise()
{
// these come from some sort of calculation
int a = 18680, b = 3323524, c = 121;
float m = float(a) / c;
// variant 1: calculate result from single expression
float r1 = b - (2.0f * m * a) + (m * m * c);
cout << "r1 = " << r1 << endl;
// variant 2: break up the expression into intermediate parts,
/// then calculate
float
r2_p1 = 2.0f * m * a,
r2_p2 = m * m * c,
r2 = b - r2_p1 + r2_p2;
cout << "r2 = " << r2 << endl;
}
The output is:
dev1 = 439703
dev2 = 439702
When viewed in the debugger, the values are actually 439702.50 and 439702.25, respectively, which is interesting in itself - not sure why iostream prints floats without the fractional part by default. EDIT: The reason for this was that the default precision setting for cout was too low, needed cout << setprecision(7) at least to see the decimal point for numbers of this magnitude.
But I'm even more interested in why am I getting different results. I suppose it has to do with rounding and some subtle interplay of ints with the required float output type, but I can't put my finger on it. Which value is the correct one?
I was amazed that it was so easy to shoot myself in the foot with such a simple piece of code. Any insight will be greatly appreciated! The compiler was VC++2010.
EDIT2: I did some more investigating using a spreadsheet to generate "correct" values for the intermediate variables and found (via tracing) that indeed they were being trimmed, contributing to the precision loss in the ultimate result. I also found a problem with the single expression, because I actually used a handy function for calculating squares instead of m * m there:
template<typename T> inline T sqr(const T &arg) { return arg*arg; }
Even though I asked nicely, the compiler apparently didn't inline this, and calculated the value separately, trimming the result before returning the value to the expression, skewing the result yet again. Ouch.
You should read my long, long answer about why the same thing happens in C#:
(.1f+.2f==.3f) != (.1f+.2f).Equals(.3f) Why?
Summing up: first of all, you only get about seven decimal places of accuracy with float. The correct answer were you do to it with exact arithmetic throughout the entire calculation is about 439702.51239669... so you are getting darn close to the correct answer considering the limitations of a float, in either case.
But that doesn't explain why you are getting different results with what looks like exactly the same calculations. The answer is: the compiler is permitted wide lattitude to make your math more accurate, and apparently you have hit upon two cases where the optimizer takes what is logically the same expression and does not optimize them down to the same code.
Anyway, read my answer regarding C# carefully; everything in there applies to C++ just as well.

Why pow(10,5) = 9,999 in C++

Recently i write a block of code:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = pow(sections, 5- t -1);
cout << i << endl;
}
And the result is wrong:
9999
1000
99
10
1
If i using just this code:
for(int t = 0; t < 5; t++){
cout << pow(sections,5-t-1) << endl;
}
The problem doesn't occur anymore:
10000
1000
100
10
1
Does anyone give me an explaination? thanks you very much!
Due to the representation of floating point values pow(10.0, 5) could be 9999.9999999 or something like this. When you assign that to an integer that got truncated.
EDIT: In case of cout << pow(10.0, 5); it looks like the output is rounded, but I don't have any supporting document right now confirming that.
EDIT 2: The comment made by BoBTFish and this question confirms that when pow(10.0, 5) is used directly in cout that is getting rounded.
When used with fractional exponents, pow(x,y) is commonly evaluated as exp(log(x)*y); such a formula would mathematically correct if evaluated with infinite precision, but may in practice result in rounding errors. As others have noted, a value of 9999.999999999 when cast to an integer will yield 9999. Some languages and libraries use such a formulation all the time when using an exponentiation operator with a floating-point exponent; others try to identify when the exponent is an integer and use iterated multiplication when appropriate. Looking up documentation for the pow function, it appears that it's supposed to work when x is negative and y has no fractional part (when x is negative and `y is even, the result should be pow(-x,y); when y is odd, the result should be -pow(-x,y). It would seem logical that when y has no fractional part a library which is going to go through the trouble of dealing with a negative x value should use iterated multiplication, but I don't know of any spec dictating that it must.
In any case, if you are trying to raise an integer to a power, it is almost certainly best to use integer maths for the computation or, if the integer to be raised is a constant or will always be small, simply use a lookup table (raising numbers from 0 to 15 by any power that would fit in a 64-bit integer would require only a 4,096-item table).
From Here
Looking at the pow() function: double pow (double base, double exponent); we know the parameters and return value are all double type. But the variable num, i and res are all int type in code above, when tranforming int to double or double to int, it may cause precision loss. For example (maybe not rigorous), the floating point unit (FPU) calculate pow(10, 4)=9999.99999999, then int(9999.9999999)=9999 by type transform in C++.
How to solve it?
Solution1
Change the code:
const int num = 10;
for(int i = 0; i < 5; ++i){
double res = pow(num, i);
cout << res << endl;
}
Solution2
Replace floating point unit (FPU) having higher calculation precision in double type. For example, we use SSE in Windows CPU. In Code::Block 13.12, we can do this steps to reach the goal: Setting -> Compiler setting -> GNU GCC Compile -> Other options, add
-mfpmath=sse -msse3
The picture is as follows:
(source: qiniudn.com)
Whats happens is the pow function returns a double so
when you do this
int i = pow(sections, 5- t -1);
the decimal .99999 cuts of and you get 9999.
while printing directly or comparing it with 10000 is not a problem because it is runded of in a sense.
If the code in your first example is the exact code you're running, then you have a buggy library. Regardless of whether you're picking up std::pow or C's pow which takes doubles, even if the double version is chosen, 10 is exactly representable as a double. As such the exponentiation is exactly representable as a double. No rounding or truncation or anything like that should occur.
With g++ 4.5 I couldn't reproduce your (strange) behavior even using -ffast-math and -O3.
Now what I suspect is happening is that sections is not being assigned the literal 10 directly but instead is being read or computed internally such that its value is something like 9.9999999999999, which when raised to the fourth power generates a number like 9999.9999999. This is then truncated to the integer 9999 which is displayed.
Depending on your needs you may want to round either the source number or the final number prior to assignment into an int. For example: int i = pow(sections, 5- t -1) + 0.5; // Add 0.5 and truncate to round to nearest.
There must be some broken pow function in the global namespace. Then std::pow is "automatically" used instead in your second example because of ADL.
Either that or t is actually a floating-point quantity in your first example, and you're running into rounding errors.
You're assigning the result to an int. That coerces it, truncating the number.
This should work fine:
for(int t= 0; t < 5; t++){
double i = pow(sections, 5- t -1);
cout << i << endl;
}
What happens is that your answer is actually 99.9999 and not exactly 100. This is because pow is double. So, you can fix this by using i = ceil(pow()).
Your code should be:
const int sections = 10;
for(int t= 0; t < 5; t++){
int i = ceil(pow(sections, 5- t -1));
cout << i << endl;
}

Discrepancy between the values computed by Fortran and C++

I would have dared say that the numeric values computed by Fortran and C++ would be way more similar. However, from what I am experiencing, it turns out that the calculated numbers start to diverge after too few decimal digits. I have come across this problem during the process of porting some legacy code from the former language to the latter. The original Fortran 77 code...
INTEGER M, ROUND
DOUBLE PRECISION NUMERATOR, DENOMINATOR
M = 2
ROUND = 1
NUMERATOR=5./((M-1+(1.3**M))**1.8)
DENOMINATOR = 0.7714+0.2286*(ROUND**3.82)
WRITE (*, '(F20.15)') NUMERATOR/DENOMINATOR
STOP
... outputs 0.842201471328735, while its C++ equivalent...
int m = 2;
int round = 1;
long double numerator = 5.0 / pow((m-1)+pow(1.3, m), 1.8);
long double denominator = 0.7714 + 0.2286 * pow(round, 3.82);
std::cout << std::setiosflags(std::ios::fixed) << std::setprecision(15)
<< numerator/denominator << std::endl;
exit(1);
... returns 0.842201286195064. That is, the computed values are equal only up to the sixth decimal. Although not particularly a Fortran advocator, I feel inclined to consider its results as the 'correct' ones, given its legitimate reputation of number cruncher. However, I am intrigued about the cause of this difference between the computed values. Does anyone know what the reason for this discrepancy could be?
In Fortran, by default, floating point literals are single precision, whereas in C/C++ they are double precision.
Thus, in your Fortran code, the expression for calculating NUMERATOR is done in single precision; it is only converted to double precision when assigning the final result to the NUMERATOR variable.
And the same thing for the expression calculating the value that is assigned to the DENOMINATOR variable.