Doing one of my first homeworks at uni, and I have run into this problem:
Task: Find the sum of all n-digit numbers, where n is the count of digits (for n=1 the numbers are 1, 2, 3, ..., 8, 9, so the answer is 45).
Problem: The code I wrote gets all the test answers correct up to 10 to the power of 9, but when it reaches 10 to the power of 10 territory, the answers start being wrong. They are really close to what I should be getting, but not quite there (for example, my output = 49499999995499995136, expected result = 49499999995500000000).
Would really appreciate some help/insights. I'm guessing it's something to do with the variable types, but I'm not sure of a possible solution.
#include <iostream>
#include <cmath>
#include <iomanip>
using namespace std;

int main()
{
    int n;
    double ats = 0, maxi, mini;
    cin >> n;
    maxi = pow(10, n) - 1;      // largest n-digit number
    mini = pow(10, n - 1) - 1;  // largest (n-1)-digit number
    ats = (maxi * (maxi + 1)) / 2 - (mini * (mini + 1)) / 2;
    cout << setprecision(0) << fixed << ats;
}
The main reason for the problems is the pow() function. It works with double, not int; loss of accuracy is the price of representing huge numbers.
There are three ways to solve the problem:
For small n you can write your own long long int pow(int x, int pow) function. But there is the problem that we can overflow even long long int.
Use long-arithmetic functions, as @rustyx said. You can write your own with vector, or find and include a library.
There is a math solution specific to this task. It solves the big-numbers problem.
You can write your formula like
(((10^n) - 1) * (10^n) - ((10^m) - 1) * (10^m)) / 2, (here m = n-1)
Then multiply the numbers in the numerator, regroup them, and extract the common multiple 10^(n-1). You can then see that the answer has the structure
X9...9Y0...0 for big enough n, where X and Y are constants.
So you can just print the answer "string" without calculating anything.
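For illustration, here is a sketch of that closed form (my own reconstruction, not the answerer's code): simplifying the numerator gives 9M(11M - 1)/2 with M = 10^(n-1), whose decimal digits for n >= 3 are exactly "494", then n-3 nines, then "55", then n-2 zeros, so the answer can be printed as a string with no big-number arithmetic at all:

#include <iostream>
#include <string>

int main() {
    int n;
    std::cin >> n;
    if (n < 3) {                        // the small cases don't fit the pattern
        std::cout << (n == 1 ? "45" : "4905") << '\n';
    } else {                            // digits of 9M(11M - 1)/2, M = 10^(n-1)
        std::cout << "494" << std::string(n - 3, '9')
                  << "55" << std::string(n - 2, '0') << '\n';
    }
}

For n = 10 this prints 49499999995500000000, the expected result from the question.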
I think you're stretching floating points beyond their precision. Let me explain:
The C pow() function takes doubles as arguments. You're passing ints; the compiler adds code to convert them to doubles before they reach pow(). (And in any case you're storing the result as a double, since that's how you declared the variable.)
Floating points are called that way precisely because the point "floats". Inside a double there's a sign bit, some bits for the mantissa and some bits for the exponent. In binary, multiplying by a power of two is equivalent to moving the fractional point to the right (or to the left for negative powers). So basically the exponent says where the fractional point is, in binary. The great advantage of this in-memory representation is that you get a lot of absolute precision for numbers close to 0, and gradually lose it as numbers become bigger.
That last thing is exactly what's happening to you. Your number is too large to be stored exactly, so it's being rounded to the closest representable value (a sum of a limited number of powers of two).
Quick experiment: press F12 in your browser, open the JavaScript console and type 49499999995499995136. In my case, in Chrome, I reproduce the same problem.
If you really, really want precision with such big numbers then you can try one of the arbitrary-precision libraries, but that's too advanced for a student program; you don't need it. Just add an if block and print an error message if the number the user typed is too big (professors love that, and it's actually quite correct).
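The same experiment in C++ terms (assuming IEEE-754 64-bit doubles): the exact answer from the question is not representable, so the literal is rounded to the nearest double before anything is even printed.

#include <cstdio>

int main() {
    double d = 49499999995500000000.0;  // the mathematically exact answer
    std::printf("%.0f\n", d);           // prints the nearest representable double instead
}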
Here's the bank of tests I'm running while learning how the basic FP operations (+, -, *, /) introduce errors:
#include <iostream>
#include <math.h>

int main() {
    std::cout.precision(100);
    double a = 0.499999999999999944488848768742172978818416595458984375;
    double original = 47.9;
    double target = original * a;
    double back = target / a;
    std::cout << original << std::endl;
    std::cout << back << std::endl;
    std::cout << fabs(original - back) << std::endl; // it's always 0.0 for the tests I did
}
Can you show me two values (original and a) that, once multiplied (or divided), introduce an error due to FP math?
And if such values exist, is it possible to establish whether that error is introduced by * or by /? And how? (since you need both operations to come back to the original value; 80 bit?)
With + it's easy (just add 0.499999999999999944488848768742172978818416595458984375 to 0.5, and you get 1.0, the same as for 0.5 + 0.5).
But I'm not able to do the same with * or /.
The output of:
#include <cstdio>

int main(void)
{
    double a = 1000000000000.;
    double b = 1000000000000.;
    std::printf("a = %.99g.\n", a);
    std::printf("b = %.99g.\n", b);
    std::printf("a*b = %.99g.\n", a*b);
}
is:
a = 1000000000000.
b = 1000000000000.
a*b = 999999999999999983222784.
assuming IEEE-754 basic 64-bit binary floating-point with correct rounding to nearest, ties to even.
Obviously, 999999999999999983222784 differs from the exact mathematical result of 1000000000000 • 1000000000000, which is 1000000000000000000000000.
Multiply any two large† numbers, and there is likely going to be error, because representable values are spaced far apart in the high range.
While this error can be great in absolute terms, it is still small relative to the size of the number itself, so if you perform the reverse division, the error of the first operation is scaled down in the same ratio and disappears completely. As such, this sequence of operations is stable.
If the result of the multiplication would be greater than the maximum representable value, it overflows to infinity (this may depend on configuration), in which case the reverse division won't recover the original value; the result remains infinity.
Similarly, if you divide by a great number, you may underflow past the smallest representable value, resulting in either zero or a subnormal value.
† Numbers do not necessarily have to be huge. It's just easier to perceive the issue when considering huge values. The problem applies to quite small values as well. For example:
2.100000000000000088817841970012523233890533447265625 ×
2.100000000000000088817841970012523233890533447265625
Correct result:
4.410000000000000373034936274052605470949292688633679117285...
Example floating point result:
4.410000000000000142108547152020037174224853515625
Error:
2.30926389122032568296724439173008679117285652827862296732064351090230047702789306640625
× 10^-16
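You can reproduce the small-number example directly; printing the product at high precision shows the rounded result rather than the mathematically correct one:

#include <cstdio>

int main() {
    double x = 2.1;                 // stored as 2.100000000000000088817...
    std::printf("%.48f\n", x * x);  // 4.410000000000000142108547152020037174224853515625
}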
Do there exist two numbers that, when multiplied (or divided) by each other, introduce an error?
This is much easier to see with "%a".
When the precision of the result is insufficient, rounding occurs. Typically double has 53 bits of binary precision. Multiplying the two 27-bit numbers below results in an exact 53-bit answer, but two 28-bit ones cannot form an exact 55-bit answer.
Division is easy to demo: just try 1.0/n*n.
#include <stdio.h>
#include <math.h>

int main(void) {
    double a = 1 + 1.0/pow(2,26);
    printf("%.15a, %.17e\n", a, a);
    printf("%.15a, %.17e\n", a*a, a*a);
    double b = 1 + 1.0/pow(2,27);
    printf("%.15a, %.17e\n", b, b);
    printf("%.15a, %.17e\n", b*b, b*b);
    for (int n = 47; n < 52; n += 2) {
        volatile double frac = 1.0/n;
        printf("%.15a, %.17e %d\n", frac, frac, n);
        printf("%.15a, %.17e\n", frac*n, frac*n);
    }
    return 0;
}
Output
//v-------v 27 significant bits.
0x1.000000400000000p+0, 1.00000001490116119e+00
//v-------------v 53 significant bits.
0x1.000000800000100p+0, 1.00000002980232261e+00
//v-------v 28 significant bits.
0x1.000000200000000p+0, 1.00000000745058060e+00
//v--------------v not 55 significant bits.
0x1.000000400000000p+0, 1.00000001490116119e+00
// ^^^ all zeros here, not the expected mathematical answer.
0x1.5c9882b93105700p-6, 2.12765957446808505e-02 47
0x1.000000000000000p+0, 1.00000000000000000e+00
0x1.4e5e0a72f053900p-6, 2.04081632653061208e-02 49
0x1.fffffffffffff00p-1, 9.99999999999999889e-01 <==== Not 1.0
0x1.414141414141400p-6, 1.96078431372549017e-02 51
0x1.000000000000000p+0, 1.00000000000000000e+00
Problem description
During my fluid simulation, the physical time marches as 0, 0.001, 0.002, ..., 4.598, 4.599, 4.6, 4.601, 4.602, .... Now I want to pick time = 0.1, 0.2, ..., 4.5, 4.6, ... out of this time series and do further analysis on it, so I wrote the following code to judge whether the fractional part hits zero.
But I was surprised to find that the following two division methods produce different results. What should I do?
double param, fractpart, intpart;
double org = 4.6;
double ddd = 0.1;
// This is the correct one I need. I got intpart=46 and fractpart=0
// param = org*(1/ddd);
// This is not what I want. I got intpart=45 and fractpart=1
param = org/ddd;
fractpart = modf(param, &intpart);
Info<< "\n\nfractpart\t=\t"
    << fractpart
    << "\nAnd intpart\t=\t"
    << intpart
    << endl;
Why does it happen this way?
And if you'll tolerate me a little, may I shout loudly: "Could the C++ committee do something about this? Because this is confusing." :)
What is the best way to get a correct remainder and avoid the cut-off error effect? Is fmod a better solution? Thanks.
Response to the answer of David Schwartz:
double aTmp = 1;
double bTmp = 2;
double cTmp = 3;
double AAA = bTmp/cTmp;
double BBB = bTmp*(aTmp/cTmp);
Info<< "\n2/3\t=\t"
    << AAA
    << "\n2*(1/3)\t=\t"
    << BBB
    << endl;
And I got both:
2/3 = 0.666667
2*(1/3) = 0.666667
Floating point values cannot exactly represent every possible number, so your numbers are being approximated. This results in different results when used in calculations.
If you need to compare floating point numbers, you should always use a small epsilon value rather than testing for equality. In your case I would round to the nearest integer (not round down), subtract that from the original value, and compare the abs() of the result against an epsilon.
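In code, that test might look like this (a sketch; the epsilon value is up to you):

#include <cmath>

// True when 'param' is within 'eps' of a whole number: round to the
// NEAREST integer (not down), subtract, and compare the magnitude.
bool hitsWholeNumber(double param, double eps = 1e-9) {
    return std::fabs(param - std::round(param)) < eps;
}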
If the question is, why does the sum differ, the simple answer is that they are different sums. For a longer explanation, here are the actual representations of the numbers involved:
org: 4.5999999999999996 = 0x12666666666666 * 2^-50
ddd: 0.10000000000000001 = 0x1999999999999a * 2^-56
1/ddd: 10 = 0x14000000000000 * 2^-49
org * (1/ddd): 46 = 0x17000000000000 * 2^-47
org / ddd: 45.999999999999993 = 0x16ffffffffffff * 2^-47
You will see that neither input value is exactly represented in a double, each having been rounded up or down to the nearest value. org has been rounded down, because the next bit in the sequence would be 0. ddd has been rounded up, because the next bit in that sequence would be a 1.
Because of this, when mathematical operations are performed the rounding can either cancel, or accumulate, depending on the operation and how the original numbers have been rounded.
In this case, 1/0.1 happens to round neatly back to exactly 10.
Multiplying org by 10 happens to round up.
Dividing org by ddd happens to round down (I say 'happens to', but you're dividing a rounded-down number by a rounded-up number, so it's natural that the result is less).
Different inputs will round differently.
It's only a single bit of error, which can be easily ignored with even a tiny epsilon.
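For reference, printing both results at full precision (17 significant digits) shows the two values from the table above:

#include <cstdio>

int main() {
    double org = 4.6;
    double ddd = 0.1;
    std::printf("%.17g\n", org * (1 / ddd));  // 46
    std::printf("%.17g\n", org / ddd);        // 45.999999999999993
}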
If I understand your question correctly, it's this: Why, with limited-precision arithmetic, is X/Y not the same is X * (1/Y)?
And the reason is simple: Consider, for example, using six digits of decimal precision. While this is not what doubles actually do, the concept is precisely the same.
With six decimal digits, 1/3 is .333333. But 2/3 is .666667. So:
2 / 3 = .666667
2 * (1/3) = 2 * .333333 = .666666
That's just the nature of fixed-precision math. If you can't tolerate this behavior, don't use limited-precision types.
Hm, I'm not really sure what you want to achieve, but if you want to get a value and then do some refinement in the range of 1/1000, why not use integers instead of floats/doubles?
You would have a divisor, which is 1000, and the values you iterate over, which you multiply by your divisor.
So you would get something like:
double org = ... // comes from somewhere
int divisor = 1000;
int referenceValue = org * divisor;
for (size_t step = referenceValue - 10; step < referenceValue + 10; ++step) {
    // use (double) step / divisor to feed to your algorithm
}
You can't represent 4.6 precisely: http://www.binaryconvert.com/result_double.html?decimal=052046054
Use rounding before separating integer and fraction parts.
UPDATE
You may wish to use rational class from Boost library: http://www.boost.org/doc/libs/1_52_0/libs/rational/rational.html
CONCERNING YOUR TASK
To find required double take precision into account, for example, to find 4.6 calculate "closeness" to it:
double time;
...
double epsilon = 0.001;
if (fabs(time - 4.6) <= epsilon) {  // fabs, not the integer abs
    // found!
}
I have to find 4-digit numbers of the form XXYY that are perfect squares of an integer. I have written this code, but it gives the square root of all the numbers, when I need to keep only the perfect squares.
I want to show sqrt(z) only when it is an integer.
#include <cmath>
#include <iostream>

int main()
{
    int x, y = 4, z;
    for (x = 1; x <= 9; x++)
    {
        z = 11 * (100 * x + y);
        std::cout << "\n" << sqrt(z);
    }
}
I'd probably check it like this, because my policy is to be paranoid about the accuracy of math functions:
double root = sqrt(z);
int introot = floor(root + 0.5); // round to nearby integer
if ((introot * introot) == z) { // integer arithmetic, so precise
cout << introot << " == sqrt(" << z << ")\n";
}
double can exactly represent all the integers we care about (for that matter, on most implementations it can exactly represent all the values of int). It also has enough precision to distinguish between sqrt(x) and sqrt(x+1) for all the integers we care about: sqrt(10000) - sqrt(9999) is about 0.005, so we only need 5 decimal places of accuracy to avoid false positives, because a non-integer square root can't be any closer than that to an integer. A good sqrt implementation therefore can be accurate enough that (int)root == root on its own would do the job.
However, the standard doesn't specify the accuracy of sqrt and other math functions. In C99 this is explicitly stated in 5.2.4.2.2/5; I'm not sure whether C89 or C++ makes it explicit. So I'm reluctant to rule out that the result could be off by an ulp or so; (int)root == root would give a false negative if sqrt(7744) came out as 87.9999999999999-ish.
Also, there are much larger numbers where sqrt can't be exact (around the limit of what double can represent exactly). So I think it's easier to write the extra two lines of code than to write the comment explaining why I think the math functions will be exact in the case I care about :-)
#include <iostream>

int main(int argc, char** argv) {
    for (int i = 32; i < 100; ++i) {
        // anything less than 32 or greater than 100
        // doesn't result in a 4-digit number
        int s = i*i;
        if (s/100%11 == 0 && s%100%11 == 0) {
            std::cout << i << '\t' << s << std::endl;
        }
    }
}
http://ideone.com/1Bn77
We can notice that
1 + 3 = 2^2,
1 + 3 + 5 = 3^2,
1 + 3 + 5 + 7 = 4^2,
i.e. the sum 1 + 3 + ... + (2N + 1) is a square, namely (N + 1)^2, for any N (it is pretty easy to prove).
Now we can generate all squares in [0000, 9999] this way and check each square to see whether it is of the form XXYY, as sketched below.
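A sketch of that approach (my code, not the answerer's):

#include <cstdio>

// Generate the squares 1, 4, 9, ... up to 9999 by adding successive odd
// numbers (1 + 3 + ... + (2N + 1) = (N + 1)^2), then test the digit pattern.
int main() {
    int square = 0;
    for (int odd = 1; square + odd <= 9999; odd += 2) {
        square += odd;                             // the next perfect square
        if (square >= 1000 &&
            square / 1000 == square / 100 % 10 &&  // first two digits equal
            square / 10 % 10 == square % 10)       // last two digits equal
            std::printf("%d\n", square);           // prints 7744
    }
}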
There is absolutely no need to involve floating point math in this task at all. Here's an efficient piece of code that will do this job for you.
Since your number has to be a perfect square, it's quicker to check only perfect squares up front rather than checking all four-digit numbers and filtering out non-squares (as you would do in a first-cut naive solution).
It's also probably safer to do it with integers rather than floating point values since you don't have to worry about all those inaccuracy issues when doing square root calculations.
#include <stdio.h>

int main (void) {
    int d1, d2, d3, d4, sq, i = 32;
    while ((sq = i * i) <= 9999) {
        d1 = sq / 1000;
        d2 = (sq % 1000) / 100;
        d3 = (sq % 100) / 10;
        d4 = (sq % 10);
        if ((d1 == d2) && (d3 == d4))
            printf (" %d\n", i * i);
        i++;
    }
    return 0;
}
It relies on the fact that the first four-digit perfect square is 32 * 32, or 1024 (31^2 is 961). So it checks 32^2, 33^2, 34^2, and so on until you exceed the four-digit limit (that one being 100^2), for a total of 69 possibilities, whereas the naive solution would check about 9000 possibilities.
Then, for every possibility, it checks the digits for your final XXYY requirement, giving you the single answer:
7744
While I smell a homework question, here is a bit of guidance.
The problem with this solution is that you are taking the square root, which introduces floating-point arithmetic and the problems it causes in precise mathematics. You can get close by doing something like:
double root = sqrt(z);
double frac = fmod(root, 1.0); // % doesn't work on doubles; fmod gives the fractional part
double epsilon = 0.00001;
if (frac < epsilon || frac + epsilon > 1) {
    // it's probably an integer
}
It might be worth your while to rewrite this algorithm to just check if the number conforms to that format by testing the squares of ever increasing numbers. The highest number you'd have to test is short of the square root of the highest perfect square you're looking for. i.e. sqrt(9988) = 99.93... so you'd only have to test at most 100 numbers anyway. The lowest number you might test is 1122 I think, so you can actually start counting from 34.
There are even better solutions that involve factoring (and the use of the modulo operator)
but I think those are enough hints for now. ;-)
To check whether sqrt(x) is an integer, compare it to its floored value:
sqrt(x) == (int) sqrt(x)
However, this is actually a bad way to compare floating-point values, due to precision issues. You should always factor in a small error component:
fabs(sqrt(x) - ((int) sqrt(x))) < 0.0000001
Even if you make this correction, though, your program will still output sqrt(z), when it sounds like what you want to output is z. You should also loop through all y values, instead of just considering y = 4 (note that y can also be 0, unlike x).
I want to show the sqrt(z) only when it is integer.
double result = sqrt(25); // Took 25 as an example. Run this in a loop, varying the parameter.
int checkResult = result;
if (checkResult == result)
    std::cout << "perfect square";
else
    std::cout << "not perfect square";
The way you are generating the numbers is indeed correct (my bad), so all you need is the right way to check for a perfect square. :)
loop x: 1 to 9
    if (check_for_perfect_square(x*1100 + 44))
        print: x*1100 + 44
See here for how to find an appropriate square: Perfect square and perfect cube
You don't need to take square roots. Notice that you can easily generate all integer squares, and all numbers XXYY, in increasing order. So you just have to make a single pass through each sequence, looking for matches:
int n = 0;
int X = 1, Y = 0; // Set X=0 here to allow the solution 0000
while (X < 10) {
    int nSquared = n * n;
    int XXYY = 1100 * X + 11 * Y;
    // Output them if they are equal
    if (nSquared == XXYY) cout << XXYY << endl;
    // Increment the smaller of the two
    if (nSquared <= XXYY) n++;
    else if (Y < 9) Y++;
    else { Y = 0; X++; }
}
The standard representation of the constant e as the sum of an infinite series is very inefficient for computation because of the many division operations. Are there alternative ways to compute the constant efficiently?
Since it's not possible to calculate every digit of 'e', you're going to have to pick a stopping point.
double precision: 16 decimal digits
For practical applications, "the 64-bit double precision floating point value that is as close as possible to the true value of 'e' -- approximately 16 decimal digits" is more than adequate.
As KennyTM said, that value has already been pre-calculated for you in the math library.
If you want to calculate it yourself, as Hans Passant pointed out, factorial already grows very fast.
The first 22 terms of the series are already overkill for calculating to that precision; adding further terms from the series won't change the result if it's stored in a 64-bit double-precision floating-point variable.
I think it will take you longer to blink than for your computer to do 22 divides. So I don't see any reason to optimize this further.
thousands, millions, or billions of decimal digits
As Matthieu M. pointed out, this value has already been calculated, and you can download it from Yee's web site.
If you want to calculate it yourself, that many digits won't fit in a standard double-precision floating-point number.
You need a "bignum" library.
As always, you can either use one of the many free bignum libraries already available, or re-invent the wheel by building yet another bignum library with its own special quirks.
The result -- a long file of digits -- is not terribly useful, but programs to calculate it are sometimes used as benchmarks to test the performance and accuracy of "bignum" library software, and as stress tests to check the stability and cooling capacity of new machine hardware.
One page very briefly describes the algorithms Yee uses to calculate mathematical constants.
The Wikipedia "binary splitting" article goes into much more detail.
I think the part you are looking for is the number representation:
instead of internally storing all numbers as a long series of digits before and after the decimal point (or a binary point),
Yee stores each term and each partial sum as a rational number -- as two integers, each of which is a long series of digits.
For example, say one of the worker CPUs was assigned the partial sum,
... 1/4! + 1/5! + 1/6! + ... .
Instead of doing the division first for each term, and then adding, and then returning a single million-digit fixed-point result to the manager CPU:
// extended to a million digits
1/24 + 1/120 + 1/720 => 0.0416666 + 0.0083333 + 0.00138888
that CPU can add all the terms in the series together first with rational arithmetic, and return the rational result to the manager CPU: two integers of perhaps a few hundred digits each:
// faster
1/24 + 1/120 + 1/720 => 1/24 + 840/86400 => 106560/2073600
After thousands of terms have been added together in this way, the manager CPU does the one and only division at the very end to get the decimal digits after the decimal point.
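A toy version of that idea (using machine integers where a real implementation would use bignums): sum 1/4! + 1/5! + 1/6! as an exact rational first, and divide only once at the end. The intermediate fraction matches the 106560/2073600 in the example above.

#include <cstdio>

int main() {
    long long num = 0, den = 1;                        // running sum num/den
    for (long long k = 4; k <= 6; ++k) {
        long long kfact = 1;
        for (long long i = 2; i <= k; ++i) kfact *= i; // k!
        num = num * kfact + den;                       // num/den += 1/k!
        den *= kfact;
    }
    std::printf("%lld/%lld = %.17g\n", num, den, (double)num / den);
}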
Remember to avoid PrematureOptimization, and
always ProfileBeforeOptimizing.
If you're using double or float, there is an M_E constant in math.h already.
#define M_E 2.71828182845904523536028747135266250 /* e */
There are other representations of e at http://en.wikipedia.org/wiki/Representations_of_e#As_an_infinite_series; all of them involve division.
I'm not aware of any "faster" computation than the Taylor expansion of the series, i.e.:
e = 1/0! + 1/1! + 1/2! + ...
or
1/e = 1/0! - 1/1! + 1/2! - 1/3! + ...
Considering that these were used by A. Yee, who calculated the first 500 billion digits of e, I guess there's not much optimising to be done (or rather, it could be optimised, but nobody has yet found a way, AFAIK).
EDIT
A very rough implementation:
#include <iostream>
#include <iomanip>
using namespace std;

double gete(int nsteps)
{
    // Let's skip the first two terms
    double res = 2.0;
    double fact = 1;
    for (int i = 2; i < nsteps; i++)
    {
        fact *= i;
        res += 1/fact;
    }
    return res;
}

int main()
{
    cout << setprecision(50) << gete(10) << endl;
    cout << setprecision(50) << gete(50) << endl;
}
Outputs
2.71828152557319224769116772222332656383514404296875
2.71828182845904553488480814849026501178741455078125
This page has a nice rundown of different calculation methods.
This is a tiny C program from Xavier Gourdon to compute 9000 decimal digits of e on your computer. A program of the same kind exists for π and for some other constants defined by means of hypergeometric series.
[degolfed version from https://codereview.stackexchange.com/a/33019 ]
#include <stdio.h>

int main() {
    int N = 9009, a[9009], x = 0;  // note: x and a[0] must start at 0
    for (int n = N - 1; n > 0; --n) {
        a[n] = 1;
    }
    a[1] = 2, a[0] = 0;
    while (N > 9) {
        int n = N--;
        while (--n) {
            a[n] = x % n;
            x = 10 * a[n-1] + x/n;
        }
        printf("%d", x);
    }
    return 0;
}
This program [when code-golfed] has 117 characters. It can be changed to compute more digits (change the value 9009 to something larger) and to be faster (change the constant 10 to another power of 10 and adjust the printf command). A not so obvious question is to find the algorithm used.
I gave this answer on Code Review on a question about computing e by its definition via Taylor series (so other methods were not an option). The cross-post here was suggested in the comments. I've removed my remarks relevant to that other topic; those interested in further explanations might want to check the original post.
The solution in C (should be easy enough to adapt to C++):
#include <stdio.h>
#include <math.h>

int main ()
{
    long double n = 0, f = 1;
    int i;
    for (i = 28; i >= 1; i--) {
        f *= i; // f = 28*27*...*i = 28! / (i-1)!
        n += f; // n = 28 + 28*27 + ... + 28! / (i-1)!
    }           // n = 28! * (1/0! + 1/1! + ... + 1/27!), f = 28!
    n /= f;
    printf("%.64Lf\n", n);  // %Lf is the conversion for long double (%llf is not standard)
    printf("%.64Lf\n", expl(1));
    printf("%Lg\n", n - expl(1));
    printf("%d\n", n == expl(1));
}
Output:
2.7182818284590452354281681079939403389289509505033493041992187500
2.7182818284590452354281681079939403389289509505033493041992187500
0
1
There are two important points:
This code doesn't compute 1, 1*2, 1*2*3, ... separately, which would be O(n^2) multiplications, but builds the product up in one pass, which is O(n).
It starts from the smaller numbers. If we tried to compute
1/1! + 1/2! + 1/3! + ... + 1/20!
and then tried to add 1/21! to it, we'd be adding
1/21! = 1/51090942171709440000 ≈ 2E-20
to 2.something, which has no effect on the result (a double holds about 16 significant digits). The tiny term is simply absorbed by rounding; this effect is (loosely) called underflow.
However, when we start with the small numbers, i.e., if we compute 1/28! + 1/27! + ..., they all have some impact.
This solution agrees with what C computes with its expl function, on my 64-bit machine, compiled with gcc 4.7.2 20120921.
You may be able to gain some efficiency: since each term involves the next factorial, you can remember the last value of the factorial instead of recomputing it.
e = 1 + 1/1! + 1/2! + 1/3! + ...
Expanding the equation:
e = 1 + 1/1 + 1/(1 * 2) + 1/(1 * 2 * 3) + ...
Instead of computing each factorial from scratch, the denominator is multiplied by the next integer at each step. So keeping the denominator in a variable and multiplying it as you go produces some optimization, as in the sketch below.
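A sketch of that optimization (targeting the double precision discussed earlier, where about 22 terms suffice):

#include <cstdio>

int main() {
    // Keep the running denominator (k!) in a variable and grow it with one
    // multiplication per term instead of recomputing the factorial each time.
    double e = 1.0, denom = 1.0;
    for (int k = 1; k <= 22; ++k) {
        denom *= k;       // denom is now k!
        e += 1.0 / denom; // add the term 1/k!
    }
    std::printf("%.17g\n", e); // full double precision; further terms change nothing
}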
If you're OK with an approximation up to seven digits, use
3 - sqrt(5/63)
2.7182819
If you want the exact value:
e = (-1)^(1/(j*pi))
where j is the imaginary unit and pi is the well-known mathematical constant (this follows from Euler's identity).
There are several "spigot" algorithms which compute digits sequentially in an unbounded manner. This is useful because you can simply calculate the "next" digit through a constant number of basic arithmetic operations, without defining beforehand how many digits you wish to produce.
These apply a series of successive transformations such that the next digit comes to the 1's place, so that they are not affected by float rounding errors. The efficiency is high because these transformations can be formulated as matrix multiplications, which reduce to integer addition and multiplication.
In short, the Taylor series expansion
e = 1/0! + 1/1! + 1/2! + 1/3! ... + 1/n!
Can be rewritten by factoring out fractional parts of the factorials (note that to make the series regular we've moved 1 to the left side):
(e - 1) = 1 + (1/2)*(1 + (1/3)*(1 + (1/4)...))
We can define a series of functions f1(x) ... fn(x) thus:
f1(x) = 1 + (1/2)x
f2(x) = 1 + (1/3)x
f3(x) = 1 + (1/4)x
...
The value of e is found from the composition of all of these functions:
(e-1) = f1(f2(f3(...fn(x))))
We can observe that the value of x in each function is determined by the next function, and that each of these values is bounded on the range [1,2]; that is, for any of these functions, the value of x will satisfy 1 <= x <= 2.
Since this is the case, we can set a lower and upper bound for e by using the values 1 and 2 for x respectively:
lower(e-1) = f1(1) = 1 + (1/2)*1 = 3/2 = 1.5
upper(e-1) = f1(2) = 1 + (1/2)*2 = 2
We can increase precision by composing the functions defined above, and when a digit matches in the lower and upper bound, we know that our computed value of e is precise to that digit:
lower(e-1) = f1(f2(f3(1))) = 1 + (1/2)*(1 + (1/3)*(1 + (1/4)*1)) = 41/24 = 1.708333
upper(e-1) = f1(f2(f3(2))) = 1 + (1/2)*(1 + (1/3)*(1 + (1/4)*2)) = 7/4 = 1.75
Since the 1s and 10ths digits match, we can say that an approximation of (e-1) with precision of 10ths is 1.7. When the first digit matches between the upper and lower bounds, we subtract it off and then multiply by 10 - this way the digit in question is always in the 1's place where floating-point precision is high.
The real optimization comes from the technique in linear algebra of describing a linear function as a transformation matrix. Composing functions maps to matrix multiplication, so all of those nested functions can be reduced to simple integer multiplication and addition. The procedure of subtracting the digit and multiplying by 10 also constitutes a linear transformation, and therefore can also be accomplished by matrix multiplication.
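Here is a minimal sketch of that matrix formulation (my own illustration, not code from the paper): each f_k and the digit-extraction step are 2x2 integer matrices composed by multiplication, with a gcd reduction to keep the 64-bit entries from overflowing too quickly. It streams out the first digits of e - 1 = 1.71828...

#include <cstdio>
#include <numeric>  // std::gcd (C++17)

// Mobius (linear fractional) transform (a*x + b) / (c*x + d).
struct LFT { long long a, b, c, d; };

LFT compose(LFT m, LFT n) {  // (m . n)(x) = m(n(x))
    LFT r = { m.a * n.a + m.b * n.c, m.a * n.b + m.b * n.d,
              m.c * n.a + m.d * n.c, m.c * n.b + m.d * n.d };
    long long g = std::gcd(std::gcd(r.a, r.b), std::gcd(r.c, r.d));
    if (g > 1) { r.a /= g; r.b /= g; r.c /= g; r.d /= g; }
    return r;
}

long long floorAt(LFT t, long long x) {  // floor(t(x)); numerators stay >= 0 here
    return (t.a * x + t.b) / (t.c * x + t.d);
}

int main() {
    LFT t = { 1, 0, 0, 1 };  // identity: no series terms absorbed yet
    long long k = 2;         // next map to absorb: f(x) = 1 + x/k = (x + k) / k
    for (int digits = 0; digits < 15; ) {
        long long lo = floorAt(t, 1), hi = floorAt(t, 2);
        if (lo == hi) {                             // bounds agree: digit settled
            std::printf("%lld", lo);
            if (digits++ == 0) std::printf(".");
            t = compose({ 10, -10 * lo, 0, 1 }, t); // strip digit, shift left
        } else {
            t = compose(t, { 1, k, 0, k });         // absorb one more term
            ++k;
        }
    }
    std::printf("\n");
}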
Another explanation of the method:
http://www.hulver.com/scoop/story/2004/7/22/153549/352
The paper that describes the algorithm:
http://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/spigot.pdf
A quick intro to performing linear transformations via matrix arithmetic:
https://people.math.gatech.edu/~cain/notes/cal6.pdf
NB: this algorithm makes use of Möbius transformations, which are a type of linear fractional transformation, described briefly in the Gibbons paper.
From my point of view, the most efficient way to compute e up to a desired precision is to use the following representation:
e := lim (n -> inf) (1 + 1/n)^n
Especially if you choose n = 2^x, you can compute the power with just x multiplications, since:
a^n = (a^2)^(n/2), if n % 2 = 0
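A sketch of that idea with n = 2^26, so 26 squarings compute the power. One caveat worth knowing: (1 + 1/n)^n approaches e only linearly (the error is roughly e/(2n)), so the series-based methods above reach double precision in far fewer operations.

#include <cstdio>

int main() {
    const int x = 26;       // n = 2^x
    double n = 67108864.0;  // 2^26, so 1 + 1/n is exactly representable
    double a = 1.0 + 1.0 / n;
    for (int i = 0; i < x; ++i)
        a *= a;             // squaring x times yields (1 + 1/n)^(2^x)
    std::printf("%.15f\n", a);  // ~2.718281808, off by about e/(2n) ~ 2e-8
}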
The binary splitting method lends itself nicely to a template metaprogram which produces a type representing a rational approximation of e. 13 iterations seems to be the maximum: any more will produce an "integral constant overflow" error.
#include <iostream>
#include <iomanip>

template<int NUMER = 0, int DENOM = 1>
struct Rational
{
    enum {NUMERATOR = NUMER};
    enum {DENOMINATOR = DENOM};
    static double value;
};

template<int NUMER, int DENOM>
double Rational<NUMER, DENOM>::value = static_cast<double> (NUMER) / DENOM;

template<int ITERS, class APPROX = Rational<2, 1>, int I = 2>
struct CalcE
{
    typedef Rational<APPROX::NUMERATOR * I + 1, APPROX::DENOMINATOR * I> NewApprox;
    typedef typename CalcE<ITERS, NewApprox, I + 1>::Result Result;
};

template<int ITERS, class APPROX>
struct CalcE<ITERS, APPROX, ITERS>
{
    typedef APPROX Result;
};

int main (int argc, char* argv[])
{
    std::cout << std::setprecision (9);
    // ExpType is the type containing our approximation to e.
    typedef CalcE<13>::Result ExpType;
    // Read ExpType::value to get the double value.
    std::cout << "e ~ " << ExpType::value << std::endl;
    return 0;
}
Another (non-metaprogram) templated variation will, at compile-time, calculate a double approximating e. This one doesn't have the limit on the number of iterations.
#include <iostream>
#include <iomanip>

template<int ITERS, long long NUMERATOR = 2, long long DENOMINATOR = 1, int I = 2>
struct CalcE
{
    static double result ()
    {
        return CalcE<ITERS, NUMERATOR * I + 1, DENOMINATOR * I, I + 1>::result ();
    }
};

template<int ITERS, long long NUMERATOR, long long DENOMINATOR>
struct CalcE<ITERS, NUMERATOR, DENOMINATOR, ITERS>
{
    static double result ()
    {
        return (double)NUMERATOR / DENOMINATOR;
    }
};

int main (int argc, char* argv[])
{
    std::cout << std::setprecision (16);
    std::cout << "e ~ " << CalcE<16>::result () << std::endl;
    return 0;
}
In an optimised build the expression CalcE<16>::result () will be replaced by the actual double value.
Both are arguably quite efficient since they calculate e at compile time :-)
@nico Re:
..."faster" computation than the Taylor expansion of the series, i.e.:
e = 1/0! + 1/1! + 1/2! + ...
or
1/e = 1/0! - 1/1! + 1/2! - 1/3! + ...
Here are ways to algebraically improve the convergence of Newton’s method:
https://www.researchgate.net/publication/52005980_Improving_the_Convergence_of_Newton's_Series_Approximation_for_e
It appears to be an open question as to whether they can be used in conjunction with binary splitting to computationally speed things up. Nonetheless, here is an example from Damian Conway using Perl that illustrates the improvement in direct computational efficiency for this new approach. It’s in the section titled “𝑒 is for estimation”:
http://blogs.perl.org/users/damian_conway/2019/09/to-compute-a-constant-of-calculusa-treatise-on-multiple-ways.html
(This comment is too long to post as a reply to the answer of Jun 12 '10 at 10:28.)
From Wikipedia's series e^x = 1 + x + x^2/2! + x^3/3! + ..., replace x with 1.