Does long double round the number when it gets so small? - c++

I'm on Manjaro 64 bit, latest edition. HP pavilion g6, Codeblocks
Release 13.12 rev 9501 (2013-12-25 18:25:45) gcc 5.2.0 Linux/unicode - 64 bit.
There was a discussion between students on why
sn = sum of 1/n diverges
sn2 = sum of 1/n^2 converges
So I decided to write a program about it, just to show them what kind of output they can expect.
#include <iostream>
#include <math.h>
#include <fstream>
using namespace std;

int main()
{
    long double sn = 0, sn2 = 0; // sn2 is 1/n^2
    ofstream myfile;
    myfile.open("/home/Projects/c++/test/test.csv");
    for (double n = 2; n < 100000000; n++) {
        sn += 1/n;
        sn2 += 1/pow(n, 2);
        myfile << "For n = " << n << " Sn = " << sn << " and Sn2 = " << sn2 << endl;
    }
    myfile.close();
    return 0;
}
Starting from n = 9944 I got sn2 = 0.644834, and I kept getting that value forever. I did expect that the number would get rounded and the smallest digits ignored at some point, but this is just too early, no?
So at what theoretical point do the digits start to be ignored? And what do you do if you care about all the digits in a number? If long double doesn't keep them, then what does?
I know it seems like a silly question, but I expected to see a longer number, since you can store a big part of pi in a long double. By the way, I get the same result with double too.

The code that you wrote suffers from a classic programming mistake: it sums a sequence of floating-point numbers by adding larger numbers to the sum first and smaller numbers later.
This will inevitably lead to precision loss during addition, since at some point in the sequence the sum will become relatively large, while the next member of the sequence will become relatively small. Adding a sufficiently small floating-point value to a sufficiently large floating-point sum does not affect the sum. Once you reach that point, it will look as if the addition operation is "ignored", even though the value you attempt to add is not zero.
You can observe the same effect if you try calculating 100000000.0f + 1 on a typical machine: it still evaluates to 100000000. This does not happen because 1 somehow gets rounded to zero. This happens because the mathematically-correct result 100000001 is rounded back to 100000000. In order to force 100000000.0f to change through addition, you need to add at least 5 (and the result will be "snapped" to 100000008).
So, the issue here is not that the compiler "rounds the number when it gets so small", as you seem to believe. Your 1/pow(n,2) number is probably fine and sufficiently precise (not rounded to 0). The issue here is that at some iteration of your cycle the small non-zero value of 1/pow(n,2) just cannot affect the sum anymore.
While it is true that adjusting output precision will help you to see better what is going on (as stated in the comments), the real issue is what is described above.
When calculating sums of floating-point sequences with large differences in member magnitudes, you should do it by adding smaller members of the sequence first. Using my 100000000.0f example again, you can easily see that 4.0f + 4.0f + 100000000.0f correctly produces 100000008, while 100000000.0f + 4.0f + 4.0f is still 100000000.
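As a minimal sketch of both effects described above (the absorbed addition and the order dependence), assuming IEEE-754 single-precision float; the printed values are what a typical machine produces:

#include <iostream>

int main() {
    float big = 100000000.0f;                  // 1e8 is exactly representable as a float
    std::cout << (big + 1.0f == big) << '\n';  // prints 1: adding 1 is absorbed, the sum stays 1e8

    // Order matters: small terms added first survive, small terms added last are lost.
    float smallFirst = 4.0f + 4.0f + 100000000.0f;  // 100000008
    float largeFirst = 100000000.0f + 4.0f + 4.0f;  // 100000000
    std::cout.precision(9);
    std::cout << smallFirst << '\n' << largeFirst << '\n';
}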

You're not running into precision issues here. The sum doesn't stop at 0.644834; it keeps going to roughly the correct value:
#include <iostream>
#include <math.h>
using namespace std;

int main() {
    long double d = 0;
    for (double n = 2; n < 100000000; n++) {
        d += 1/pow(n, 2);
    }
    std::cout << d << endl;
    return 0;
}
Result:
0.644934
Note the 9! That's not 0.644834 any more.
If you were expecting 1.644934, you should have started the sum at n=1. If you were expecting visible changes between successive partial sums, you didn't see those because the output stream rounds the displayed sums to 6 significant digits by default. You can configure your output stream to display more digits with std::setprecision from the iomanip header:
myfile << std::setprecision(9);
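For instance, a small sketch of the difference this makes (the value here is just a stand-in for the partial sum):

#include <iomanip>
#include <iostream>

int main() {
    long double d = 0.6449340668L;                    // stand-in for the partial sum
    std::cout << d << '\n';                           // 0.644934 (default: 6 significant digits)
    std::cout << std::setprecision(12) << d << '\n';  // 0.6449340668
}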

Related

Loss of precision with pow function when surpassing 10^10 limit?

I'm doing one of my first homework assignments at uni, and have run into this problem:
Task: Find the sum of all numbers with n digits (for n = 1 that means 1, 2, 3, ..., 8, 9, for example, and the answer is 45).
Problem: The code I wrote has gotten all the test answers correct up to 10 to the power of 9, but when it reaches 10 to the power of 10 territory, the answers start being wrong. They're really close to what I should be getting, but not quite there (for example, my output = 49499999995499995136, expected result = 49499999995500000000).
I would really appreciate some help/insights. I'm guessing it's something to do with the variable types, but I'm not quite sure of a possible solution.
#include <iostream>
#include <cmath>
#include <iomanip>
using namespace std;

int main()
{
    int n;
    double ats = 0, maxi, mini;
    cin >> n;
    maxi = pow(10, n) - 1;
    mini = pow(10, n-1) - 1;
    ats = (maxi * (maxi + 1)) / 2 - (mini * (mini + 1)) / 2;
    cout << setprecision(0) << fixed << ats;
}
The main reason for the problems is the pow() function. It works with double, not int. Loss of accuracy is the price of representing huge numbers.
There are 3 ways to solve the problem:
For small n you can write your own long long int pow(int x, int pow) function (see the sketch after this list). But there is a problem: we can overflow even long long int.
Use long (arbitrary-precision) arithmetic, as #rustyx said. You can write your own with vector, or find and include a library.
There is a math solution specific to this task. It avoids the big-numbers problem.
You can write your formula like
((10^n - 1) * 10^n - (10^m - 1) * 10^m) / 2, (here m = n-1)
Then multiply the numbers in the numerator, regroup them, and extract the common factor 10^(n-1). You can then see that the answer has the structure:
X9...9Y0...0 for big enough n, where the letters X and Y are constants.
So, you can just print the answer as a string without calculating.
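Here is a hedged sketch of the first approach (the ipow name and the use of long long are illustrative choices); as noted above, the triangular-number products overflow long long once n reaches about 10:

#include <iostream>

// Integer power by repeated multiplication; no floating point involved.
long long ipow(long long base, int exp) {
    long long result = 1;
    for (int i = 0; i < exp; ++i)
        result *= base;
    return result;
}

int main() {
    int n;
    std::cin >> n;
    long long maxi = ipow(10, n) - 1;      // largest n-digit number
    long long mini = ipow(10, n - 1) - 1;  // largest (n-1)-digit number
    // Sum of all n-digit numbers via the triangular-number formula.
    long long ats = maxi * (maxi + 1) / 2 - mini * (mini + 1) / 2;
    std::cout << ats << '\n';              // exact for small n; overflows around n = 10
}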
I think you're stretching floating points beyond their precision. Let me explain:
The C pow() function takes doubles as arguments. You're passing ints, so the compiler adds code to convert them to doubles before they reach pow(). (And in any case you're storing the result as a double when you get the return value, since you declared it that way.)
Floating-point numbers are called that way precisely because the point "floats". Inside a double there's a sign bit, a few bits for the mantissa and a few bits for the exponent. In binary, multiplying by a power of two is equivalent to moving the fractional point to the right (or to the left when dividing by one). So basically the exponent says where the fractional point is, in binary. The great advantage of this in-memory representation for doubles is that you get a lot of precision for numbers close to 0, and gradually lose precision as numbers become bigger.
That last thing is exactly what's happening to you. Your number is too large to be stored exactly, so it's being rounded to the closest representable value, i.e. the closest sum of powers of two that the mantissa can express.
Quick experiment: press F12 in your browser, open the JavaScript console and type 49499999995499995136. In my case, in Chrome, I reproduce the same problem.
If you really, really, really want precision with such big numbers then you can try some of these libraries, but that's too advanced for a student program; you don't need it. Just add an if block and print an error message if the number that the user typed is too big (professors love that, and it's actually quite correct).

C++ Modulus returning wrong answer

Here is my code:
#include <iostream>
#include <cmath>
using namespace std;

int main()
{
    int n, i, num, m, k = 0;
    cout << "Enter a number :\n";
    cin >> num;
    n = log10(num);
    while (n > 0) {
        i = pow(10, n);
        m = num / i;
        k = k + pow(m, 3);
        num = num % i;
        --n;
        cout << m << endl;
        cout << num << endl;
    }
    k = k + pow(num, 3);
    return 0;
}
When I input 111 it gives me this
1
12
1
2
I am using Code::Blocks. I don't know what is wrong.
Whenever I use pow expecting an integer result, I add .5 so I use (int)(pow(10,m)+.5) instead of letting the compiler automatically convert pow(10,m) to an int.
I have read in many places that others have done exhaustive tests of some of the situations in which I add that .5 and found zero cases where it makes a difference. But accurately identifying the conditions in which it isn't needed can be quite hard. Using it when it isn't needed does no real harm.
If it makes a difference, it is a difference you want. If it doesn't make a difference, it had a tiny cost.
In the posted code, I would adjust every call to pow that way, not just the one I used as an example.
There is no equally easy fix for your use of log10, but it may be subject to the same problem. Since you expect a non-integer answer and want that non-integer answer truncated down to an integer, adding .5 would be very wrong. So you may need to find some more complicated workaround for the fundamental problem of working with floating point. I'm not certain, but assuming 32-bit integers, I think adding 1e-10 to the result of log10 before converting to int is never enough to change log10(10^n - 1) into log10(10^n), yet always enough to correct an error that might have done the reverse.
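A hedged sketch of what the posted code might look like with those adjustments applied (the +.5 on each pow call and the 1e-10 nudge on log10 are the only changes; printing k at the end is added here just to see the result):

#include <cmath>
#include <iostream>
using namespace std;

int main()
{
    int num, k = 0;
    cout << "Enter a number :\n";
    cin >> num;
    int n = (int)(log10((double)num) + 1e-10);  // nudge so the truncation can't land one too low
    while (n > 0) {
        int i = (int)(pow(10, n) + .5);         // round to nearest instead of truncating
        int m = num / i;
        k = k + (int)(pow(m, 3) + .5);
        num = num % i;
        --n;
        cout << m << endl;
        cout << num << endl;
    }
    k = k + (int)(pow(num, 3) + .5);
    cout << k << endl;
    return 0;
}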
pow does floating-point exponentiation.
Floating point functions and operations are inexact, you cannot ever rely on them to give you the exact value that they would appear to compute, unless you are an expert on the fine details of IEEE floating point representations and the guarantees given by your library functions.
(and furthermore, floating-point numbers might even be incapable of representing the integers you want exactly)
This is particularly problematic when you convert the result to an integer, because the result is truncated to zero: int x = 0.999999; sets x == 0, not x == 1. Even the tiniest error in the wrong direction completely spoils the result.
You could round to the nearest integer, but that has problems too; e.g. with sufficiently large numbers, your floating point numbers might not have enough precision to be near the result you want. Or if you do enough operations (or unstable operations) with the floating point numbers, the errors can accumulate to the point you get the wrong nearest integer.
If you want to do exact, integer arithmetic, then you should use functions that do so. e.g. write your own ipow function that computes integer exponentiation without any floating-point operations at all.
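A minimal sketch of such an ipow (exponentiation by squaring; the name and signature are choices made here, not a standard function):

#include <iostream>

// Integer exponentiation by squaring: O(log exp) multiplications, no floating point at all.
long long ipow(long long base, unsigned exp) {
    long long result = 1;
    while (exp > 0) {
        if (exp & 1)         // lowest exponent bit set: fold this power of base into the result
            result *= base;
        base *= base;        // square the base for the next bit
        exp >>= 1;
    }
    return result;
}

int main() {
    std::cout << ipow(10, 5) << '\n';  // 100000, exactly
}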

Is there a bug in numeric_limits or am I just confused?

I encountered some queer behavior, at least in my own mind, while debugging some code involved with determining if an addition operation would underflow a double. Here is an example program demonstrating what I found.
#include <iostream>
#include <limits>

using std::cout;
using std::endl;
using std::numeric_limits;

int main()
{
    double lowest = numeric_limits<double>::lowest();
    bool truth = (lowest + 10000) == lowest;
    cout << truth << endl;
}
When I execute this code, I get true as a result. Is this a bug or am I just sleep deprived?
The lowest (most negative) finite double is:
-1.7976931348623157e+308
Adding 10,000, or 1e4, to this would only have a noticeable effect if doubles had 300+ digits of precision, which they most definitely do not. Doubles can only hold 15-17 significant digits.
The difference in magnitude between these two numbers is so great that adding 10,000 does not produce a new number. In fact, the minimum double is such a huge number (so to speak) that you could add a googol to it—that's 1 followed by a hundred zeros—and it wouldn't change.
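A small sketch that makes those magnitudes concrete (assuming IEEE-754 doubles; std::nextafter from <cmath> gives the neighbouring representable value):

#include <cmath>
#include <iostream>
#include <limits>

int main()
{
    double lowest = std::numeric_limits<double>::lowest();
    // Gap between lowest and the next representable double toward zero.
    double gap = std::nextafter(lowest, 0.0) - lowest;
    std::cout << gap << '\n';                           // roughly 2e292 for IEEE-754 doubles
    std::cout << ((lowest + 1e100) == lowest) << '\n';  // 1: even 1e100 is well below half that gap
}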

about float number comparison in C++

I have a loop that steps a floating-point number through a given min and max range, as follows:
#include <iostream>
using namespace std;

int main()
{
    for (double x = 0.012; x <= 0.013; x += 0.001)
    {
        cout << x << endl;
    }
}
It is pretty simple code, but as I understand it, in a programming language we need to compare two floating-point numbers with an EPS (epsilon) taken into account. Hence, the above code doesn't work (we expect it to loop two times, for 0.012 and 0.013, but it only loops once). So I manually added an EPS to the upper limit.
#include <iostream>
using namespace std;

#define EPS 0.0000001

int main()
{
    for (double x = 0.012; x <= 0.013 + EPS; x += 0.001)
    {
        cout << x << endl;
    }
}
and it works now. But it looks ugly to do that manually, since EPS should really depend on the machine. I am porting my code from MATLAB to C++, and I don't have this problem in MATLAB since there is an eps command. But is there anything like that in C/C++?
Fudging the comparison is the wrong technique to use. Even if you get the comparison “right”, a floating-point loop counter will accumulate error from iteration to iteration.
You can eliminate accumulation of error by using exact arithmetic for the loop counter. It may still have floating-point type, but you use exactly representable values, such as:
for (double i = 12; i <= 13; ++i)
Then, inside the loop, you scale the counter as desired:
for (double i = 12; i <= 13; ++i)
{
    double x = i / 1000.;
    …
}
Obviously, there is not much error accumulating in a loop with two iterations. However, I expect your values are just one example, and there may be longer loops in practice. With this technique, the only error in x is in the scaling operation, so there is just one rounding error in each x, rather than an error that accumulates from iteration to iteration.
Note that dividing by 1000 is more accurate than scaling by .001. Division by 1000 has only one error (in the division). However, since .001 is not exactly representable in binary floating point, multiplying by it has two errors (in the conversion of .001 to floating point and in the multiplication). On the other hand, division is typically a very slow operation, so you might choose the multiplication.
Finally, although this technique guarantees the desired number of iterations, the scaled value might be slightly outside the ideal target interval in the first or the last iteration, due to the rounding error in the scaling. If this matters to your application, you must adjust the value in these iterations.
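Put together, a minimal sketch of the original loop rewritten this way (the 12..13 endpoints and the /1000. scaling match the example above):

#include <iostream>

int main()
{
    // The counter takes exactly the values 12 and 13, with no accumulated error.
    for (double i = 12; i <= 13; ++i)
    {
        double x = i / 1000.;    // the only rounding error is this single scaling step
        std::cout << x << '\n';  // prints 0.012, then 0.013
    }
}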

Why pow(10,5) = 9,999 in C++

Recently I wrote a block of code:
const int sections = 10;
for (int t = 0; t < 5; t++) {
    int i = pow(sections, 5 - t - 1);
    cout << i << endl;
}
And the result is wrong:
9999
1000
99
10
1
If I use just this code:
for (int t = 0; t < 5; t++) {
    cout << pow(sections, 5 - t - 1) << endl;
}
The problem doesn't occur anymore:
10000
1000
100
10
1
Can anyone give me an explanation? Thank you very much!
Due to the representation of floating-point values, pow(10.0, 5) could be 9999.9999999 or something like that. When you assign that to an int, it gets truncated.
EDIT: In the case of cout << pow(10.0, 5); it looks like the output is rounded, but I don't have any supporting document confirming that right now.
EDIT 2: The comment made by BoBTFish and this question confirm that when pow(10.0, 5) is used directly in cout, it gets rounded.
When used with fractional exponents, pow(x,y) is commonly evaluated as exp(log(x)*y); such a formula would be mathematically correct if evaluated with infinite precision, but in practice it may result in rounding errors. As others have noted, a value of 9999.999999999 when cast to an integer will yield 9999. Some languages and libraries use such a formulation all the time when using an exponentiation operator with a floating-point exponent; others try to identify when the exponent is an integer and use iterated multiplication when appropriate. Looking up documentation for the pow function, it appears that it's supposed to work when x is negative and y has no fractional part (when x is negative and y is even, the result should be pow(-x,y); when y is odd, the result should be -pow(-x,y)). It would seem logical that when y has no fractional part, a library which is going to go through the trouble of dealing with a negative x value should use iterated multiplication, but I don't know of any spec dictating that it must.
In any case, if you are trying to raise an integer to a power, it is almost certainly best to use integer maths for the computation or, if the integer to be raised is a constant or will always be small, simply use a lookup table (raising numbers from 0 to 15 by any power that would fit in a 64-bit integer would require only a 4,096-item table).
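As a sketch of the lookup-table idea for this particular case (a table of the powers of 10 that fit in 64 bits; the POW10 name is just illustrative):

#include <cstdint>
#include <iostream>

// All powers of 10 that fit in an unsigned 64-bit integer (10^0 through 10^19).
const std::uint64_t POW10[] = {
    1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL, 1000000ULL,
    10000000ULL, 100000000ULL, 1000000000ULL, 10000000000ULL,
    100000000000ULL, 1000000000000ULL, 10000000000000ULL,
    100000000000000ULL, 1000000000000000ULL, 10000000000000000ULL,
    100000000000000000ULL, 1000000000000000000ULL, 10000000000000000000ULL
};

int main() {
    std::cout << POW10[5] << '\n';  // 100000, with no floating point anywhere
}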
From Here
Looking at the pow() function: double pow (double base, double exponent); we know the parameters and the return value are all of double type. But the variables in the code above are of int type; when converting int to double or double to int, precision may be lost. For example (maybe not rigorously), the floating-point unit (FPU) calculates pow(10, 4) = 9999.99999999, and then int(9999.9999999) = 9999 by type conversion in C++.
How to solve it?
Solution1
Change the code:
const int num = 10;
for (int i = 0; i < 5; ++i) {
    double res = pow(num, i);
    cout << res << endl;
}
Solution2
Switch the floating-point unit (FPU) used for double-precision calculations. For example, we can use SSE on x86 CPUs. In Code::Blocks 13.12, we can follow these steps to reach the goal: Settings -> Compiler settings -> GNU GCC Compiler -> Other options, and add
-mfpmath=sse -msse3
What happens is that the pow function returns a double, so when you do this:
int i = pow(sections, 5 - t - 1);
the decimal .99999 gets cut off and you get 9999.
Printing it directly, or comparing it with 10000, is not a problem because it gets rounded off, in a sense.
If the code in your first example is the exact code you're running, then you have a buggy library. Regardless of whether you're picking up std::pow or C's pow which takes doubles, even if the double version is chosen, 10 is exactly representable as a double. As such the exponentiation is exactly representable as a double. No rounding or truncation or anything like that should occur.
With g++ 4.5 I couldn't reproduce your (strange) behavior even using -ffast-math and -O3.
Now what I suspect is happening is that sections is not being assigned the literal 10 directly but instead is being read or computed internally such that its value is something like 9.9999999999999, which when raised to the fourth power generates a number like 9999.9999999. This is then truncated to the integer 9999 which is displayed.
Depending on your needs you may want to round either the source number or the final number prior to assignment into an int. For example: int i = pow(sections, 5- t -1) + 0.5; // Add 0.5 and truncate to round to nearest.
There must be some broken pow function in the global namespace. Then std::pow is "automatically" used instead in your second example because of ADL.
Either that or t is actually a floating-point quantity in your first example, and you're running into rounding errors.
You're assigning the result to an int. That coerces it, truncating the number.
This should work fine:
for (int t = 0; t < 5; t++) {
    double i = pow(sections, 5 - t - 1);
    cout << i << endl;
}
What happens is that your answer is actually 99.9999 and not exactly 100. This is because pow works with doubles. So, you can fix this by using i = ceil(pow(...)).
Your code should be:
const int sections = 10;
for (int t = 0; t < 5; t++) {
    int i = ceil(pow(sections, 5 - t - 1));
    cout << i << endl;
}