The standard representation of constant e as the sum of the infinite series is very inefficient for computation, because of many division operations. So are there any alternative ways to compute the constant efficiently?
Since it's not possible to calculate every digit of 'e', you're going to have to pick a stopping point.
double precision: 16 decimal digits
For practical applications, "the 64-bit double precision floating point value that is as close as possible to the true value of 'e' -- approximately 16 decimal digits" is more than adequate.
As KennyTM said, that value has already been pre-calculated for you in the math library.
If you want to calculate it yourself, as Hans Passant pointed out, factorial already grows very fast.
The first 22 terms in the series are already overkill for calculating to that precision -- adding further terms from the series won't change the result if it's stored in a 64 bit double-precision floating point variable.
I think it will take you longer to blink than for your computer to do 22 divides. So I don't see any reason to optimize this further.
thousands, millions, or billions of decimal digits
As Matthieu M. pointed out, this value has already been calculated, and you can download it from Yee's web site.
If you want to calculate it yourself, that many digits won't fit in a standard double-precision floating-point number.
You need a "bignum" library.
As always, you can either use one of the many free bignum libraries already available, or re-invent the wheel by building yet another bignum library with its own special quirks.
The result -- a long file of digits -- is not terribly useful, but programs to calculate it are sometimes used as benchmarks to test the performance and accuracy of "bignum" library software, and as stress tests to check the stability and cooling capacity of new machine hardware.
One page very briefly describes the algorithms Yee uses to calculate mathematical constants.
The Wikipedia "binary splitting" article goes into much more detail.
I think the part you are looking for is the number representation:
instead of internally storing all numbers as a long series of digits before and after the decimal point (or a binary point),
Yee stores each term and each partial sum as a rational number -- as two integers, each of which is a long series of digits.
For example, say one of the worker CPUs was assigned the partial sum,
... 1/4! + 1/5! + 1/6! + ... .
Instead of doing the division first for each term, and then adding, and then returning a single million-digit fixed-point result to the manager CPU:
// extended to a million digits
1/24 + 1/120 + 1/720 => 0.0416666 + 0.0083333 + 0.00138888
that CPU can add all the terms in the series together first with rational arithmetic, and return the rational result to the manager CPU: two integers of perhaps a few hundred digits each:
// faster
1/24 + 1/120 + 1/720 => 1/24 + 840/86400 => 106560/2073600
After thousands of terms have been added together in this way, the manager CPU does the one and only division at the very end to get the decimal digits after the decimal point.
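The rational-arithmetic idea can be sketched on a small scale, with ordinary 64-bit integers standing in for the big integers a real bignum library would provide. The names `Fraction`, `add` and `partial_sum` are hypothetical and are not taken from Yee's code. Unlike the hand-worked example, this sketch also reduces by the greatest common divisor after every addition so that the integers stay small.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal rational type; a real program would use bignum integers.
struct Fraction {
    std::uint64_t num;
    std::uint64_t den;
};

// Euclid's algorithm for the greatest common divisor.
std::uint64_t gcd_u64(std::uint64_t a, std::uint64_t b) {
    while (b != 0) {
        std::uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Add two fractions without ever producing a fractional (decimal) value.
Fraction add(Fraction x, Fraction y) {
    assert(x.den != 0 && y.den != 0);
    Fraction r{x.num * y.den + y.num * x.den, x.den * y.den};
    std::uint64_t g = gcd_u64(r.num, r.den);
    r.num /= g;
    r.den /= g;
    return r;
}

// Sum 1/first! + ... + 1/last! as one fraction (overflows for large ranges).
Fraction partial_sum(unsigned first, unsigned last) {
    assert(first >= 1 && first <= last);
    std::uint64_t fact = 1;
    for (unsigned k = 2; k < first; ++k) fact *= k;  // (first - 1)!
    Fraction sum{0, 1};
    for (unsigned k = first; k <= last; ++k) {
        fact *= k;  // k!
        sum = add(sum, Fraction{1, fact});
    }
    return sum;
}
```

Calling `partial_sum(4, 6)` yields 37/720, which is 106560/2073600 in lowest terms. The single division into decimal digits is left to the caller.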
Remember to avoid PrematureOptimization, and
always ProfileBeforeOptimizing.
If you're using double or float, there is an M_E constant in math.h already.
#define M_E 2.71828182845904523536028747135266250 /* e */
There are other representations of e at http://en.wikipedia.org/wiki/Representations_of_e#As_an_infinite_series; all of them involve division.
I'm not aware of any "faster" computation than the Taylor expansion of the series, i.e.:
e = 1/0! + 1/1! + 1/2! + ...
or
1/e = 1/0! - 1/1! + 1/2! - 1/3! + ...
Considering that these were used by A. Yee, who calculated the first 500 billion digits of e, I guess that there's not much optimising to do (or better, it could be optimised, but nobody yet found a way, AFAIK)
EDIT
A very rough implementation
#include <iostream>
#include <iomanip>
using namespace std;
double gete(int nsteps)
{
// Let's skip the first two terms
double res = 2.0;
double fact = 1;
for (int i=2; i<nsteps; i++)
{
fact *= i;
res += 1/fact;
}
return res;
}
int main()
{
cout << setprecision(50) << gete(10) << endl;
cout << setprecision(50) << gete(50) << endl;
}
Outputs
2.71828152557319224769116772222332656383514404296875
2.71828182845904553488480814849026501178741455078125
This page has a nice rundown of different calculation methods.
This is a tiny C program from Xavier Gourdon to compute 9000 decimal digits of e on your computer. A program of the same kind exists for π and for some other constants defined by mean of hypergeometric series.
[degolfed version from https://codereview.stackexchange.com/a/33019 ]
#include <stdio.h>
int main() {
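int N = 9009, a[9009], x = 0; /* x is the running carry; it must start out defined */
for (int n = N - 1; n > 0; --n) {
a[n] = 1;
}
a[0] = 0; /* read as a[n-1] when n reaches 1 */
a[1] = 2;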
while (N > 9) {
int n = N--;
while (--n) {
a[n] = x % n;
x = 10 * a[n-1] + x/n;
}
printf("%d", x);
}
return 0;
}
This program [when code-golfed] has 117 characters. It can be changed to compute more digits (change the value 9009 to more) and to be faster (change the constant 10 to another power of 10 and the printf command). A not so obvious question is to find the algorithm used.
I gave this answer at CodeReviews on the question regarding computing e by its definition via Taylor series (so other methods were not an option). The cross-post here was suggested in the comments. I've removed my remarks relevant to that other topic; those interested in further explanations might want to check the original post.
The solution in C (should be easy enough to adapt to C++):
#include <stdio.h>
#include <math.h>
int main ()
{
long double n = 0, f = 1;
int i;
for (i = 28; i >= 1; i--) {
f *= i; // f = 28*27*...*i = 28! / (i-1)!
n += f; // n = 28 + 28*27 + ... + 28! / (i-1)!
} // n = 28! * (1/0! + 1/1! + ... + 1/27!), f = 28!
n /= f;
printf("%.64Lf\n", n); // L is the standard length modifier for long double
printf("%.64Lf\n", expl(1));
printf("%Lg\n", n - expl(1));
printf("%d\n", n == expl(1));
}
Output:
2.7182818284590452354281681079939403389289509505033493041992187500
2.7182818284590452354281681079939403389289509505033493041992187500
0
1
There are two important points:
This code doesn't compute 1, 1*2, 1*2*3,... which is O(n^2), but computes 1*2*3*... in one pass (which is O(n)).
It starts from the smallest terms. If we tried to compute
1/1 + 1/2 + 1/6 + ... + 1/20!
and then tried to add 1/21! to it, we'd be adding
1/21! = 1/51090942171709440000 ≈ 2E-20,
to 2.something, which has no effect on the result (double holds about 16 significant digits). This effect is called absorption (a form of round-off error, distinct from underflow).
However, when we start with the smallest terms, i.e., if we compute 1/32!+1/31!+..., they all have some impact.
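The effect described above, where a small term vanishes when it is added to a much larger one, is easy to check directly. This is only a sketch; `is_absorbed` and `inverse_factorial` are hypothetical helper names.

```cpp
#include <cassert>

// Returns true when adding `small` to `big` leaves `big` unchanged,
// i.e. the small term is lost to rounding.
bool is_absorbed(double big, double small) {
    assert(big > 0.0 && small > 0.0);
    volatile double sum = big + small;  // volatile forces rounding to double
    return sum == big;
}

// 1/n! computed by repeated division.
double inverse_factorial(int n) {
    assert(n >= 0);
    double r = 1.0;
    for (int k = 2; k <= n; ++k) r /= k;
    return r;
}
```

With these, `is_absorbed(2.0, inverse_factorial(21))` is true, while `is_absorbed(inverse_factorial(20), inverse_factorial(21))` is false. That is why adding the tiny terms to each other first preserves their contribution.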
This solution seems in accordance to what C computes with its expl function, on my 64bit machine, compiled with gcc 4.7.2 20120921.
You may be able to gain some efficiency. Since each term involves the next factorial, some efficiency may be obtained by remembering the last value of the factorial.
e = 1 + 1/1! + 1/2! + 1/3! ...
Expanding the equation:
e = 1 + 1/(1 * 1) + 1/(1 * 1 * 2) + 1/(1 * 2 * 3) ...
Instead of computing each factorial, the denominator is multiplied by the next increment. So keeping the denominator as a variable and multiplying it will produce some optimization.
If you're ok with an approximation up to seven digits, use
3-sqrt(5/63)
2.7182819
If you want the exact value:
e = (-1)^(1/(j*pi))
where j is the imaginary unit and pi the well-known mathematical constant (Euler's Identity)
There are several "spigot" algorithms which compute digits sequentially in an unbounded manner. This is useful because you can simply calculate the "next" digit through a constant number of basic arithmetic operations, without defining beforehand how many digits you wish to produce.
These apply a series of successive transformations such that the next digit comes to the 1's place, so that they are not affected by float rounding errors. The efficiency is high because these transformations can be formulated as matrix multiplications, which reduce to integer addition and multiplication.
In short, the Taylor series expansion
e = 1/0! + 1/1! + 1/2! + 1/3! ... + 1/n!
can be rewritten by factoring out fractional parts of the factorials (note that to make the series regular we've moved 1 to the left side):
(e - 1) = 1 + (1/2)*(1 + (1/3)*(1 + (1/4)...))
We can define a series of functions f1(x) ... fn(x) thus:
f1(x) = 1 + (1/2)x
f2(x) = 1 + (1/3)x
f3(x) = 1 + (1/4)x
...
The value of e is found from the composition of all of these functions:
(e-1) = f1(f2(f3(...fn(x))))
We can observe that the value of x in each function is determined by the next function, and that each of these values is bounded on the range [1,2] - that is, for any of these functions, the value of x will be 1 <= x <= 2
Since this is the case, we can set a lower and upper bound for e by using the values 1 and 2 for x respectively:
lower(e-1) = f1(1) = 1 + (1/2)*1 = 3/2 = 1.5
upper(e-1) = f1(2) = 1 + (1/2)*2 = 2
We can increase precision by composing the functions defined above, and when a digit matches in the lower and upper bound, we know that our computed value of e is precise to that digit:
lower(e-1) = f1(f2(f3(1))) = 1 + (1/2)*(1 + (1/3)*(1 + (1/4)*1)) = 41/24 = 1.708333
upper(e-1) = f1(f2(f3(2))) = 1 + (1/2)*(1 + (1/3)*(1 + (1/4)*2)) = 7/4 = 1.75
Since the 1s and 10ths digits match, we can say that an approximation of (e-1) with precision of 10ths is 1.7. When the first digit matches between the upper and lower bounds, we subtract it off and then multiply by 10 - this way the digit in question is always in the 1's place where floating-point precision is high.
The real optimization comes from the technique in linear algebra of describing a linear function as a transformation matrix. Composing functions maps to matrix multiplication, so all of those nested functions can be reduced to simple integer multiplication and addition. The procedure of subtracting the digit and multiplying by 10 also constitutes a linear transformation, and therefore can also be accomplished by matrix multiplication.
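As a rough illustration of the matrix idea, each function fk(x) = 1 + x/(k+1) can be written as (1*x + (k+1)) / (0*x + (k+1)) and stored as the integer matrix [[1, k+1], [0, k+1]]. Multiplying the matrices composes the functions. The names `Mat2`, `multiply`, `f`, `compose` and `apply` are made up for this sketch. It covers only the composition step, not the digit extraction of the full spigot algorithm.

```cpp
#include <cassert>

// Represents x -> (a*x + b) / (c*x + d) as the matrix [[a, b], [c, d]].
struct Mat2 {
    long long a, b, c, d;
};

// Matrix product; corresponds to applying n first and then m.
Mat2 multiply(const Mat2& m, const Mat2& n) {
    return Mat2{m.a * n.a + m.b * n.c, m.a * n.b + m.b * n.d,
                m.c * n.a + m.d * n.c, m.c * n.b + m.d * n.d};
}

// f_k(x) = 1 + x/(k+1) = (x + (k+1)) / (k+1)
Mat2 f(int k) {
    assert(k >= 1);
    return Mat2{1, k + 1, 0, k + 1};
}

// Matrix for f1(f2(...fn(x)...)); overflows long long once (n+1)! gets too big.
Mat2 compose(int n) {
    assert(n >= 1);
    Mat2 m = f(1);
    for (int k = 2; k <= n; ++k) m = multiply(m, f(k));
    return m;
}

// Evaluate the transformation at an integer x (the one division, done last).
double apply(const Mat2& m, long long x) {
    return static_cast<double>(m.a * x + m.b) /
           static_cast<double>(m.c * x + m.d);
}
```

`compose(3)` gives [[1, 40], [0, 24]], so the lower bound is `apply(m, 1)` = 41/24 and the upper bound is `apply(m, 2)` = 42/24 = 7/4, matching the values worked out by hand.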
Another explanation of the method:
http://www.hulver.com/scoop/story/2004/7/22/153549/352
The paper that describes the algorithm:
http://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/spigot.pdf
A quick intro to performing linear transformations via matrix arithmetic:
https://people.math.gatech.edu/~cain/notes/cal6.pdf
NB: this algorithm makes use of Möbius transformations, which are a type of linear transformation described briefly in the Gibbons paper.
From my point of view, the most efficient way to compute e up to a desired precision is to use the following representation:
e := lim (n -> inf): (1 + (1/n))^n
Especially if you choose n = 2^x, you can compute the power with just x multiplications, since:
a^n = (a^2)^(n/2), if n % 2 = 0
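A minimal sketch of that repeated-squaring scheme in double precision follows. The function name `e_by_squaring` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximates e as (1 + 1/n)^n with n = 2^x, using x squarings.
double e_by_squaring(int x) {
    assert(x >= 0 && x <= 50);
    double a = 1.0 + std::ldexp(1.0, -x);  // 1 + 1/2^x, exact for this range of x
    for (int i = 0; i < x; ++i) {
        a *= a;  // after this step a holds (1 + 1/n)^(2^(i+1))
    }
    return a;
}
```

Note that the limit converges slowly: the error is roughly e/(2n), so x = 20 (n about a million) gives only about six correct digits. In floating point, each squaring also amplifies the rounding error.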
The binary splitting method lends itself nicely to a template metaprogram which produces a type which represents a rational corresponding to an approximation of e. 13 iterations seems to be the maximum - any higher will produce an "integral constant overflow" error.
#include <iostream>
#include <iomanip>
template<int NUMER = 0, int DENOM = 1>
struct Rational
{
enum {NUMERATOR = NUMER};
enum {DENOMINATOR = DENOM};
static double value;
};
template<int NUMER, int DENOM>
double Rational<NUMER, DENOM>::value = static_cast<double> (NUMER) / DENOM;
template<int ITERS, class APPROX = Rational<2, 1>, int I = 2>
struct CalcE
{
typedef Rational<APPROX::NUMERATOR * I + 1, APPROX::DENOMINATOR * I> NewApprox;
typedef typename CalcE<ITERS, NewApprox, I + 1>::Result Result;
};
template<int ITERS, class APPROX>
struct CalcE<ITERS, APPROX, ITERS>
{
typedef APPROX Result;
};
int test (int argc, char* argv[])
{
std::cout << std::setprecision (9);
// ExpType is the type containing our approximation to e.
typedef CalcE<13>::Result ExpType;
// Read the static member 'value' to get the double value.
std::cout << "e ~ " << ExpType::value << std::endl;
return 0;
}
Another (non-metaprogram) templated variation will, at compile-time, calculate a double approximating e. This one doesn't have the limit on the number of iterations.
#include <iostream>
#include <iomanip>
template<int ITERS, long long NUMERATOR = 2, long long DENOMINATOR = 1, int I = 2>
struct CalcE
{
static double result ()
{
return CalcE<ITERS, NUMERATOR * I + 1, DENOMINATOR * I, I + 1>::result ();
}
};
template<int ITERS, long long NUMERATOR, long long DENOMINATOR>
struct CalcE<ITERS, NUMERATOR, DENOMINATOR, ITERS>
{
static double result ()
{
return (double)NUMERATOR / DENOMINATOR;
}
};
int main (int argc, char* argv[])
{
std::cout << std::setprecision (16);
std::cout << "e ~ " << CalcE<16>::result () << std::endl;
return 0;
}
In an optimised build the expression CalcE<16>::result () will be replaced by the actual double value.
Both are arguably quite efficient since they calculate e at compile time :-)
@nico Re:
..."faster" computation than the Taylor expansion of the series, i.e.:
e = 1/0! + 1/1! + 1/2! + ...
or
1/e = 1/0! - 1/1! + 1/2! - 1/3! + ...
Here are ways to algebraically improve the convergence of Newton's series approximation for e:
https://www.researchgate.net/publication/52005980_Improving_the_Convergence_of_Newton's_Series_Approximation_for_e
It appears to be an open question as to whether they can be used in conjunction with binary splitting to computationally speed things up. Nonetheless, here is an example from Damian Conway using Perl that illustrates the improvement in direct computational efficiency for this new approach. It’s in the section titled “𝑒 is for estimation”:
http://blogs.perl.org/users/damian_conway/2019/09/to-compute-a-constant-of-calculusa-treatise-on-multiple-ways.html
(This comment is too long to post as a reply for answer on Jun 12 '10 at 10:28)
From wikipedia replace x with 1
Related
Say I have some integer n and would like to subdivide it into two other integers according to some ratio. I have an approach, but I am not sure whether it always works.
For example: 20 with ratio 70% should be subdivided into 14,6.
The obvious solution would be:
int n = 20;
double ratio = .7;
int n1 = static_cast<int>(n * ratio);
int n2 = static_cast<int>(n * (1 - ratio));
Since the cast always truncates, however, the sum usually comes out too low. If I use std::round, there are still cases that do not work. For example, if the first decimal place is a 5, then both numbers will be rounded up.
Some colleagues suggested: Ceil the first number and floor the second one. In most of my tests, this works, however:
1) Does it really always work, also taking into account possible rounding errors that naturally occur when multiplying numbers? What I am thinking of: 20*.7 could be 14, while 20*.3 could be 5.999999. So my sum might be 14 + 5 = 19. This is just a guess, however; I do not know whether this kind of result can occur (if it can, the answer is simply that this rounding scheme does not work).
2) Even if it does work... Why?
(I have in mind that I could just calculate number 1 by n * ratio and calculate number 2 by n - n * ratio, but I would still be interested in the answer to this question)
How about this?
int n = 20;
double ratio = .7;
int n1 = static_cast<int>(n * ratio);
int n2 = n - n1;
Here is an example that confirms your suspicion and shows that the ceil+floor method doesn't always work. It is caused by the finite precision of floating point numbers on a computer:
#include <iostream>
#include <cmath>
int main() {
int n = 10;
double ratio = 0.7;
int n1 = static_cast<int>(floor(n * ratio));
int n2 = static_cast<int>(ceil(n * (1.0 - ratio)));
std::cout << n1 << " " << n2 << std::endl;
}
Output:
7 4
7 + 4 is 11, so it's wrong.
Your solution doesn't always work: take a ratio of 77% and you'll get 15 and 4 (See on coliru).
Welcome to the domain of numerical analysis.
First, your computer can't always store a floating-point number perfectly. As you can see in the example, .77 is stored as 0.77000000000000001776 (it is an approximation of the number by a sum of powers of 2).
When doing floating point calculation, you will always have a loss in precision. You can get this precision with std::numeric_limits<double>::epsilon().
Moreover, you'll get even more precision loss when converting from a floating-point number to an integer, and in your case the difference is big enough to give you an incoherent result.
The solution provided by @ToniBig and your last sentence has the advantage of "hiding" this loss and keeping the data coherent.
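The subtraction approach can be wrapped in a small helper. This is a sketch, and the name `split_by_ratio` is hypothetical. It rounds to the nearest integer with `std::lround` instead of truncating, which is a design choice on top of the suggestion above. The second part comes from integer subtraction, so the two parts always add up to n.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Splits n into (n1, n2) with n1 close to n * ratio and n1 + n2 == n.
std::pair<int, int> split_by_ratio(int n, double ratio) {
    assert(n >= 0 && ratio >= 0.0 && ratio <= 1.0);
    int n1 = static_cast<int>(std::lround(n * ratio));  // nearest integer
    int n2 = n - n1;  // exact integer arithmetic, no second rounding
    return std::make_pair(n1, n2);
}
```

For example, `split_by_ratio(20, 0.7)` returns (14, 6). Any floating-point error only affects which integer n1 lands on, never the sum.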
In this StackOverflow question:
Generating random integer from a range
the accepted answer suggests the following formula for generating a random integer in between given min and max, with min and max being included into the range:
output = min + (rand() % (int)(max - min + 1))
But it also says that
This is still slightly biased towards lower numbers ... It's also
possible to extend it so that it removes the bias.
But it doesn't explain why it's biased towards lower numbers or how to remove the bias. So, the question is: is this the most optimal approach to generation of a random integer within a (signed) range while not relying on anything fancy, just rand() function, and in case if it is optimal, how to remove the bias?
EDIT:
I've just tested the while-loop algorithm suggested by @Joey against floating-point extrapolation:
static const double s_invRandMax = 1.0/((double)RAND_MAX + 1.0);
return min + (int)(((double)(max + 1 - min))*rand()*s_invRandMax);
to see how uniformly "balls" are distributed among a number of "buckets", with one test for the floating-point extrapolation and another for the while-loop algorithm. But the results varied depending on the number of "balls" (and "buckets"), so I couldn't easily pick a winner. The working code can be found at this Ideone page. For example, with 10 buckets and 100 balls the maximum deviation from the ideal probability among buckets is smaller for the floating-point extrapolation than for the while-loop algorithm (0.04 and 0.05 respectively), but with 1000 balls the maximum deviation of the while-loop algorithm is smaller (0.024 and 0.011 respectively), and with 10000 balls the floating-point extrapolation is again doing better (0.0034 and 0.0053 respectively), and so on without much consistency. The possibility that neither algorithm consistently produces a more uniform distribution than the other makes me lean towards the floating-point extrapolation, since it appears to run faster than the while-loop algorithm. So is it fine to choose the floating-point extrapolation algorithm, or are my tests/conclusions not completely correct?
The problem is that you're doing a modulo operation. This would be no problem if the number of possible rand() results (RAND_MAX + 1) were evenly divisible by your modulus, but usually that is not the case. As a very contrived example, assume rand() has 11 possible results (that is, RAND_MAX is 10) and your modulus is 3. You'll get the following possible random numbers and the following resulting remainders:
0 1 2 3 4 5 6 7 8 9 10
0 1 2 0 1 2 0 1 2 0 1
As you can see, 0 and 1 are slightly more probable than 2.
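The table above can be reproduced by counting how many generator outputs land on each remainder. The helper name `residue_counts` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Counts how many of the values 0 .. num_values-1 map to each remainder.
std::vector<int> residue_counts(int num_values, int modulus) {
    assert(num_values > 0 && modulus > 0);
    std::vector<int> counts(modulus, 0);
    for (int v = 0; v < num_values; ++v) {
        ++counts[v % modulus];
    }
    return counts;
}
```

For the 11 outputs 0 through 10 and modulus 3, `residue_counts(11, 3)` gives 4, 4, 3. With 12 outputs every remainder would get exactly 4, and the bias disappears.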
One option to solve this is rejection sampling: By disallowing the numbers 9 and 10 above you can cause the resulting distribution to be uniform again. The tricky part is figuring out how to do so efficiently. A very nice example (one that took me two days to understand why it works) can be found in Java's java.util.Random.nextInt(int) method.
The reason why Java's algorithm is a little tricky is that they avoid slow operations like multiplication and division for the check. If you don't care too much you can also do it the naïve way:
int n = (int)(max - min + 1);
int remainder = RAND_MAX % n;
int x, output;
do {
x = rand();
output = x % n;
} while (x >= RAND_MAX - remainder);
return min + output;
EDIT: Corrected a fencepost error in above code, now it works as it should. I also created a little sample program (C#; taking a uniform PRNG for numbers between 0 and 15 and constructing a PRNG for numbers between 0 and 6 from it via various ways):
using System;
class Rand {
static Random r = new Random();
static int Rand16() {
return r.Next(16);
}
static int Rand7Naive() {
return Rand16() % 7;
}
static int Rand7Float() {
return (int)(Rand16() / 16.0 * 7);
}
// corrected
static int Rand7RejectionNaive() {
int n = 7, remainder = 16 % n, x, output;
do {
x = Rand16();
output = x % n;
} while (x >= 16 - remainder);
return output;
}
// adapted to fit the constraints of this example
static int Rand7RejectionJava() {
int n = 7, x, output;
do {
x = Rand16();
output = x % n;
} while (x - output + 6 > 15);
return output;
}
static void Test(Func<int> rand, string name) {
var buckets = new int[7];
for (int i = 0; i < 10000000; i++) buckets[rand()]++;
Console.WriteLine(name);
for (int i = 0; i < 7; i++) Console.WriteLine("{0}\t{1}", i, buckets[i]);
}
static void Main() {
Test(Rand7Naive, "Rand7Naive");
Test(Rand7Float, "Rand7Float");
Test(Rand7RejectionNaive, "Rand7RejectionNaive");
}
}
The result is as follows (pasted into Excel and added conditional coloring of cells so that differences are more apparent):
Now that I fixed my mistake in above rejection sampling it works as it should (before it would bias 0). As you can see, the float method isn't perfect at all, it just distributes the biased numbers differently.
The problem occurs when the number of outputs from the random number generator (RAND_MAX+1) is not evenly divisible by the desired range (max-min+1). Since there will be a consistent mapping from a random number to an output, some outputs will be mapped to more random numbers than others. This is regardless of how the mapping is done - you can use modulo, division, conversion to floating point, whatever voodoo you can come up with, the basic problem remains.
The magnitude of the problem is very small, and undemanding applications can generally get away with ignoring it. The smaller the range and the larger RAND_MAX is, the less pronounced the effect will be.
I took your example program and tweaked it a bit. First I created a special version of rand that only has a range of 0-255, to better demonstrate the effect. I made a few tweaks to rangeRandomAlg2. Finally I changed the number of "balls" to 1000000 to improve the consistency. You can see the results here: http://ideone.com/4P4HY
Notice that the floating-point version produces two tightly grouped probabilities, near either 0.101 or 0.097, nothing in between. This is the bias in action.
I think calling this "Java's algorithm" is a bit misleading - I'm sure it's much older than Java.
int rangeRandomAlg2 (int min, int max)
{
int n = max - min + 1;
int remainder = RAND_MAX % n;
int x;
do
{
x = rand();
} while (x >= RAND_MAX - remainder);
return min + x % n;
}
It's easy to see why this algorithm produces a biased sample. Suppose your rand() function returns uniform integers from the set {0, 1, 2, 3, 4}. If I want to use this to generate a random bit 0 or 1, I would say rand() % 2. The set {0, 2, 4} gives me 0, and the set {1, 3} gives me 1 -- so clearly I sample 0 with 60% and 1 with 40% likelihood, not uniform at all!
To fix this you have to either make sure that your desired range divides the range of the random number generator, or otherwise discard the result whenever the random number generator returns a number that's larger than the largest possible multiple of the target range.
In the above example, the target range is 2, the largest multiple that fits into the random generation range is 4, so we discard any sample that is not in the set {0, 1, 2, 3} and roll again.
By far the easiest solution is std::uniform_int_distribution<int>(min, max).
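A minimal sketch of how that might look, assuming C++11 or later. The wrapper name `random_in_range` and the choice of `std::mt19937` as the engine are illustrative assumptions.

```cpp
#include <cassert>
#include <random>

// Draws one integer uniformly from the closed range [min, max].
int random_in_range(std::mt19937& gen, int min, int max) {
    assert(min <= max);
    std::uniform_int_distribution<int> dist(min, max);
    return dist(gen);
}
```

A typical caller would create the engine once, for instance `std::mt19937 gen(std::random_device{}());`, and then reuse it for every draw.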
You have touched on two points involving a random integer algorithm: Is it optimal, and is it unbiased?
Optimal
There are many ways to define an "optimal" algorithm. Here we look at "optimal" algorithms in terms of the number of random bits it uses on average. In this sense, rand is a poor method to use for randomly generated numbers because, among other problems with rand(), it need not necessarily produce random bits (because RAND_MAX is not exactly specified). Instead, we will assume we have a "true" random generator that can produce unbiased and independent random bits.
In 1976, D. E. Knuth and A. C. Yao showed that any algorithm that produces random integers with a given probability, using only random bits, can be represented as a binary tree, where random bits indicate which way to traverse the tree and each leaf (endpoint) corresponds to an outcome. (Knuth and Yao, "The complexity of nonuniform random number generation", in Algorithms and Complexity, 1976.) They also gave bounds on the number of bits a given algorithm will need on average for this task. In this case, an optimal algorithm to generate integers in [0, n) uniformly, will need at least log2(n) and at most log2(n) + 2 bits on average.
There are many examples of optimal algorithms in this sense. See the following answer of mine:
How to generate a random integer in the range [0,n] from a stream of random bits without wasting bits?
Unbiased
However, any optimal integer generator that is also unbiased will, in general, run forever in the worst case, as also shown by Knuth and Yao. Going back to the binary tree, each one of the n outcomes labels leaves in the binary tree so that each integer in [0, n) can occur with probability 1/n. But if 1/n has a non-terminating binary expansion (which will be the case if n is not a power of 2), this binary tree will necessarily either:
have an "infinite" depth, or
include "rejection" leaves at the end of the tree.
In either case, the algorithm won't run in constant time and will run forever in the worst case. (On the other hand, when n is a power of 2, the optimal binary tree will have a finite depth and no rejection nodes.)
And for general n, there is no way to "fix" this worst case time complexity without introducing bias. For instance, modulo reductions (including the min + (rand() % (int)(max - min + 1)) in your question) are equivalent to a binary tree in which rejection leaves are replaced with labeled outcomes — but since there are more possible outcomes than rejection leaves, only some of the outcomes can take the place of the rejection leaves, introducing bias. The same kind of binary tree — and the same kind of bias — results if you stop rejecting after a set number of iterations. (However, this bias may be negligible depending on the application. There are also security aspects to random integer generation, which are too complicated to discuss in this answer.)
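To make the "rejection leaves" idea concrete, here is a plain bit-by-bit rejection sampler. It is unbiased, but it is not one of the optimal algorithms in the Knuth and Yao sense, because it discards all of its bits on every rejection. The function name `uniform_from_bits` and the `next_bit` callback are hypothetical. The callback is assumed to return unbiased, independent bits.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Returns a uniform integer in [0, n) built from individual random bits.
// `next_bit` is a hypothetical callback that returns 0 or 1.
std::uint32_t uniform_from_bits(std::uint32_t n,
                                const std::function<int()>& next_bit) {
    assert(n > 0);
    // Number of bits needed to represent n - 1.
    int bits = 0;
    while (bits < 32 && ((n - 1) >> bits) != 0) ++bits;
    for (;;) {
        std::uint32_t candidate = 0;
        for (int i = 0; i < bits; ++i) {
            candidate = (candidate << 1) |
                        static_cast<std::uint32_t>(next_bit() & 1);
        }
        if (candidate < n) return candidate;  // a labeled leaf: accept
        // Otherwise this is a rejection leaf: start over with fresh bits.
    }
}
```

The loop has no fixed upper bound on iterations, which is exactly the unbounded worst case described above. Capping the number of retries would bring the bias back.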
Without loss of generality, the problem of generating random integers on [a, b] can be reduced to the problem of generating random integers on [0, s). The state of the art for generating random integers on a bounded range from a uniform PRNG is represented by the following recent publication:
Daniel Lemire,"Fast Random Integer Generation in an Interval." ACM Trans. Model. Comput. Simul. 29, 1, Article 3 (January 2019) (ArXiv draft)
Lemire shows that his algorithm provides unbiased results, and motivated by the growing popularity of very fast high-quality PRNGs such as Melissa O'Neill's PCG generators, shows how the results can be computed fast, avoiding slow division operations almost all of the time.
An exemplary ISO-C implementation of his algorithm is shown in randint() below. Here I demonstrate it in conjunction with George Marsaglia's older KISS64 PRNG. For performance reasons, the required 64×64→128 bit unsigned multiplication is typically best implemented via machine-specific intrinsics or inline assembly that map directly to appropriate hardware instructions.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
/* PRNG state */
typedef struct Prng_T *Prng_T;
/* Returns uniformly distributed integers in [0, 2**64-1] */
uint64_t random64 (Prng_T);
/* Multiplies two 64-bit factors into a 128-bit product */
void umul64wide (uint64_t, uint64_t, uint64_t *, uint64_t *);
/* Generate in bias-free manner a random integer in [0, s) with Lemire's fast
algorithm that uses integer division only rarely. s must be in [0, 2**64-1].
Daniel Lemire, "Fast Random Integer Generation in an Interval," ACM Trans.
Model. Comput. Simul. 29, 1, Article 3 (January 2019)
*/
uint64_t randint (Prng_T prng, uint64_t s)
{
uint64_t x, h, l, t;
x = random64 (prng);
umul64wide (x, s, &h, &l);
if (l < s) {
t = (0 - s) % s;
while (l < t) {
x = random64 (prng);
umul64wide (x, s, &h, &l);
}
}
return h;
}
#define X86_INLINE_ASM (0)
/* Multiply two 64-bit unsigned integers into a 128-bit unsigned product. Return
the least significant 64 bits of the product to the location pointed to by
lo, and the most significant 64 bits of the product to the location pointed
to by hi.
*/
void umul64wide (uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
#if X86_INLINE_ASM
uint64_t l, h;
__asm__ (
"movq %2, %%rax;\n\t" // rax = a
"mulq %3;\n\t" // rdx:rax = a * b
"movq %%rax, %0;\n\t" // l = (a * b)<63:0>
"movq %%rdx, %1;\n\t" // h = (a * b)<127:64>
: "=r"(l), "=r"(h)
: "r"(a), "r"(b)
: "%rax", "%rdx");
*lo = l;
*hi = h;
#else // X86_INLINE_ASM
uint64_t a_lo = (uint64_t)(uint32_t)a;
uint64_t a_hi = a >> 32;
uint64_t b_lo = (uint64_t)(uint32_t)b;
uint64_t b_hi = b >> 32;
uint64_t p0 = a_lo * b_lo;
uint64_t p1 = a_lo * b_hi;
uint64_t p2 = a_hi * b_lo;
uint64_t p3 = a_hi * b_hi;
uint32_t cy = (uint32_t)(((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32);
*lo = p0 + (p1 << 32) + (p2 << 32);
*hi = p3 + (p1 >> 32) + (p2 >> 32) + cy;
#endif // X86_INLINE_ASM
}
/* George Marsaglia's KISS64 generator, posted to comp.lang.c on 28 Feb 2009
https://groups.google.com/forum/#!original/comp.lang.c/qFv18ql_WlU/IK8KGZZFJx4J
*/
struct Prng_T {
uint64_t x, c, y, z, t;
};
struct Prng_T kiss64 = {1234567890987654321ULL, 123456123456123456ULL,
362436362436362436ULL, 1066149217761810ULL, 0ULL};
/* KISS64 state equations */
#define MWC64 (kiss64->t = (kiss64->x << 58) + kiss64->c, \
kiss64->c = (kiss64->x >> 6), kiss64->x += kiss64->t, \
kiss64->c += (kiss64->x < kiss64->t), kiss64->x)
#define XSH64 (kiss64->y ^= (kiss64->y << 13), kiss64->y ^= (kiss64->y >> 17), \
kiss64->y ^= (kiss64->y << 43))
#define CNG64 (kiss64->z = 6906969069ULL * kiss64->z + 1234567ULL)
#define KISS64 (MWC64 + XSH64 + CNG64)
uint64_t random64 (Prng_T kiss64)
{
return KISS64;
}
int main (void)
{
int i;
Prng_T state = &kiss64;
for (i = 0; i < 1000; i++) {
printf ("%llu\n", (unsigned long long) randint (state, 10));
}
return EXIT_SUCCESS;
}
If you really want a perfect generator, assuming the rand() function that you have is perfect, you need to apply the method explained below.
We will create a random number, r, from 0 to max-min=b-1, which is then easy to move to the range that you want: just take r+min.
We will create a random number where b < RAND_MAX, but the procedure can easily be adapted to produce a random number for any base.
PROCEDURE:
Take a random number r in its original RAND_MAX size without any truncation
Display this number in base b
Take first m=floor(log_b(RAND_MAX)) digits of this number for m random numbers from 0 to b-1
Shift each by min (i.e. r+min) to get them into the range (min,max) as you wanted
Since log_b(RAND_MAX) is not necessarily an integer, the last digit in the representation is wasted.
The original approach of just using mod (%) is mistaken exactly by
(log_b(RAND_MAX) - floor(log_b(RAND_MAX)))/ceil(log_b(RAND_MAX))
which you might agree is not that much, but if you insist on being precise, that is the procedure.
I've got a fixed point class (10.22) and I have a need of a pow, a sqrt, an exp and a log function.
Alas I have no idea where to even start on this. Can anyone provide me with some links to useful articles or, better yet, provide me with some code?
I'm assuming that once I have an exp function then it becomes relatively easy to implement pow and sqrt as they just become:
pow( x, y ) => exp( y * log( x ) )
sqrt( x ) => pow( x, 0.5 )
It's just those exp and log functions that I'm finding difficult (though I remember a few of my log rules, I can't remember much else about them).
Presumably, there would also be a faster method for sqrt and pow, so any pointers on that front would be appreciated, even if it's just to say use the methods I outline above.
Please note: This HAS to be cross platform and in pure C/C++ code so I cannot use any assembler optimisations.
A very simple solution is to use a decent table-driven approximation. You don't actually need a lot of data if you reduce your inputs correctly. exp(a)==exp(a/2)*exp(a/2), which means you really only need to calculate exp(x) for 1 < x < 2. Over that range, a Runge-Kutta approximation would give reasonable results with ~16 entries IIRC.
Similarly, sqrt(a) == 2 * sqrt(a/4) == sqrt(4*a) / 2 which means you need only table entries for 1 < a < 4. Log(a) is a bit harder: log(a) == 1 + log(a/e). This is a rather slow iteration, but log(1024) is only 6.9 so you won't have many iterations.
You'd use a similar "integer-first" algorithm for pow: pow(x,y)==pow(x, floor(y)) * pow(x, frac(y)). This works because pow(double, int) is trivial (divide and conquer).
[edit] For the integral component of log(a), it may be useful to store a table 1, e, e^2, e^3, e^4, e^5, e^6, e^7 so you can reduce log(a) == n + log(a/e^n) by a simple hardcoded binary search of a in that table. The improvement from 7 to 3 steps isn't so big, but it means you only have to divide once by e^n instead of n times by e.
[edit 2]
And for that last log(a/e^n) term, you can use log(a/e^n) = log((a/e^n)^8)/8 - each iteration produces 3 more bits by table lookup. That keeps your code and table size small. This is typically code for embedded systems, and they don't have large caches.
[edit 3]
That's still not too smart on my side. log(a) = log(2) + log(a/2). You can just store the fixed-point value log2=0.6931471805599, count the number of leading zeroes, shift a into the range used for your lookup table, and multiply that shift (integer) by the fixed-point constant log2. Can be as low as 3 instructions.
Using e for the reduction step just gives you a "nice" log(e)=1.0 constant but that's false optimization. 0.6931471805599 is just as good a constant as 1.0; both are 32 bits constants in 10.22 fixed point. Using 2 as the constant for range reduction allows you to use a bit shift for a division.
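A rough sketch of that power-of-two range reduction for Q10.22, in portable C. The names (Q22_ONE, Q22_LN2, q22_normalize, q22_ln_coarse) are invented for this example, and plain shift loops stand in for a count-leading-zeroes instruction. Only the coarse n * log(2) term is computed; the log of the normalized mantissa would still come from the lookup table.

```c
#include <stdint.h>

#define Q22_ONE (UINT32_C(1) << 22)   /* 1.0 in Q10.22 */
#define Q22_LN2 INT32_C(2907270)      /* round(0.6931471805599 * 2^22) */

/* Reduce a > 0 (Q10.22) to a mantissa in [1.0, 2.0) and return the shift n
   such that a = mantissa * 2^n. The caller must not pass a == 0. */
static int q22_normalize(uint32_t a, uint32_t *mantissa)
{
    int n = 0;
    while (a < Q22_ONE)      { a <<= 1; n--; }
    while (a >= 2 * Q22_ONE) { a >>= 1; n++; }
    *mantissa = a;
    return n;
}

/* Coarse part of ln(a) in Q10.22: n * ln(2). The remaining ln(mantissa)
   term lies in [0, ln 2) and is left to the table lookup. */
static int32_t q22_ln_coarse(uint32_t a)
{
    uint32_t mantissa;
    return q22_normalize(a, &mantissa) * Q22_LN2;
}
```

For example, 8.0 normalizes to mantissa 1.0 with n = 3, so the coarse term is 3 * log(2) and the table contributes nothing.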
[edit 5]
And since you're storing it in Q10.22, you can better store log(65536)=11.09035488. (16 x log(2)). The "x16" means that we've got 4 more bits of precision available.
You still get the trick from edit 2, log(a/2^n) = log((a/2^n)^8)/8. Basically, this gets you a result (a + b/8 + c/64 + d/512) * 0.6931471805599 - with b,c,d in the range [0,7]. a.bcd really is an octal number. Not a surprise since we used 8 as the power. (The trick works equally well with power 2, 4 or 16.)
[edit 4]
Still had an open end. pow(x, frac(y)) is just pow(sqrt(x), 2 * frac(y)) and we have a decent 1/sqrt(x). That gives us the far more efficient approach. Say frac(y)=0.101 binary, i.e. 1/2 plus 1/8. Then that means x^0.101 is (x^(1/2) * x^(1/8)). But x^(1/2) is just sqrt(x) and x^(1/8) is sqrt(sqrt(sqrt(x))). Saving one more operation, Newton-Raphson NR(x) gives us 1/sqrt(x) so we calculate 1.0/(NR(x)*NR(NR(NR(x)))). We only invert the end result, and don't use the sqrt function directly.
Below is an example C implementation of Clay S. Turner's fixed-point log base 2 algorithm[1]. The algorithm doesn't require any kind of look-up table. This can be useful on systems where memory constraints are tight and the processor lacks an FPU, such as is the case with many microcontrollers. Log base e and log base 10 are then also supported by using the property of logarithms that, for any base n:
logₙ(x) = logₘ(x) / logₘ(n)
where, for this algorithm, m equals 2.
A nice feature of this implementation is that it supports variable precision: the precision can be determined at runtime, at the expense of range. The way I've implemented it, the processor (or compiler) must be capable of doing 64-bit math for holding some intermediate results. It can be easily adapted to not require 64-bit support, but the range will be reduced.
When using these functions, x is expected to be a fixed-point value scaled according to the specified precision. For instance, if precision is 16, then x should be scaled by 2^16 (65536). The result is a fixed-point value with the same scale factor as the input. A return value of INT32_MIN represents negative infinity. A return value of INT32_MAX indicates an error and errno will be set to EINVAL, indicating that the input precision was invalid.
#include <errno.h>
#include <stddef.h>
#include "log2fix.h"
#define INV_LOG2_E_Q1DOT31 UINT64_C(0x58b90bfc) // Inverse log base 2 of e
#define INV_LOG2_10_Q1DOT31 UINT64_C(0x268826a1) // Inverse log base 2 of 10
int32_t log2fix (uint32_t x, size_t precision)
{
int32_t b;
int32_t y = 0;
if (precision < 1 || precision > 31) {
errno = EINVAL;
return INT32_MAX; // indicates an error
}
b = 1U << (precision - 1); // shift only once precision is known to be valid
if (x == 0) {
return INT32_MIN; // represents negative infinity
}
while (x < 1U << precision) {
x <<= 1;
y -= 1U << precision;
}
while (x >= UINT64_C(2) << precision) { // 64-bit, so 2 << 31 does not wrap to 0
x >>= 1;
y += 1U << precision;
}
uint64_t z = x;
for (size_t i = 0; i < precision; i++) {
z = z * z >> precision;
if (z >= UINT64_C(2) << precision) {
z >>= 1;
y += b;
}
b >>= 1;
}
return y;
}
int32_t logfix (uint32_t x, size_t precision)
{
uint64_t t;
t = log2fix(x, precision) * INV_LOG2_E_Q1DOT31;
return t >> 31;
}
int32_t log10fix (uint32_t x, size_t precision)
{
uint64_t t;
t = log2fix(x, precision) * INV_LOG2_10_Q1DOT31;
return t >> 31;
}
The code for this implementation also lives on GitHub, along with a sample/test program that illustrates how to use this function to compute and display logarithms from numbers read from standard input.
[1] C. S. Turner, "A Fast Binary Logarithm Algorithm", IEEE Signal Processing Mag., pp. 124,140, Sep. 2010.
A good starting point is Jack Crenshaw's book, "Math Toolkit for Real-Time Programming". It has a good discussion of algorithms and implementations for various transcendental functions.
Check my fixed point sqrt implementation using only integer operations.
It was fun to invent. Quite old now.
https://groups.google.com/forum/?hl=fr%05aacf5997b615c37&fromgroups#!topic/comp.lang.c/IpwKbw0MAxw/discussion
Otherwise check the CORDIC set of algorithms. That's the way to implement all the functions you listed and the trigonometric functions.
EDIT : I published the reviewed source on GitHub here
I have to find four-digit numbers of the form XXYY that are perfect squares of an integer. I have written this code, but it prints the square root of every candidate, when I need to keep only the candidates whose square root is an integer.
I want to show sqrt(z) only when it is an integer.
#include<math.h>
#include<iostream.h>
#include<conio.h>
void main()
{
int x,y=4,z;
clrscr();
for(x=1;x<=9;x++)
{
z=11*(100*x+y);
cout<<"\n"<<sqrt(z);
}
getch();
}
I'd probably check it like this, because my policy is to be paranoid about the accuracy of math functions:
double root = sqrt(z);
int introot = floor(root + 0.5); // round to nearby integer
if ((introot * introot) == z) { // integer arithmetic, so precise
cout << introot << " == sqrt(" << z << ")\n";
}
double can exactly represent all the integers we care about (for that matter, on most implementations it can exactly represent all the values of int). It also has enough precision to distinguish between sqrt(x) and sqrt(x+1) for all the integers we care about. sqrt(10000) - sqrt(9999) is 0.005, so we only need 5 decimal places of accuracy to avoid false positives, because a non-integer square root can't be any closer than that to an integer. A good sqrt implementation therefore can be accurate enough that (int)root == root on its own would do the job.
However, the standard doesn't specify the accuracy of sqrt and other math functions. In C99 this is explicitly stated in 5.2.4.2.2/5: I'm not sure whether C89 or C++ make it explicit. So I'm reluctant to rule out that the result could be out by a ulp or so. int(root) == root would give a false negative if sqrt(7744) came out as 87.9999999999999-ish
Also, there are much larger numbers where sqrt can't be exact (around the limit of what double can represent exactly). So I think it's easier to write the extra two lines of code than to write the comment explaining why I think math functions will be exact in the case I care about :-)
#include <iostream>
int main(int argc, char** argv) {
for (int i = 32; i < 100; ++i) {
// anything less than 32 or greater than 100
// doesn't result in a 4-digit number
int s = i*i;
if (s/100%11==0 && s%100%11==0) {
std::cout << i << '\t' << s << std::endl;
}
}
}
http://ideone.com/1Bn77
We can notice that
1 + 3 = 2^2
1 + 3 + 5 = 3^2,
1 + 3 + 5 + 7 = 4^2,
i.e. 1 + 3 + ... + (2N + 1) is a square, namely (N + 1)^2, for any N (it is pretty easy to prove).
Now we can generate all squares in [0000, 9999] and check each square if it is XXYY.
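A small C sketch of that idea, with illustrative function names (is_xxyy, find_xxyy_squares) that are not from the answer: walk the squares by adding successive odd numbers, and test each one for the XXYY digit pattern.

```c
/* Returns 1 if n is a four-digit number of the decimal form XXYY. */
static int is_xxyy(int n)
{
    if (n < 1000 || n > 9999)
        return 0;
    return (n / 1000 == (n / 100) % 10) && ((n / 10) % 10 == n % 10);
}

/* Walks the squares 1, 4, 9, ... up to 9999 by adding successive odd
   numbers, stores the XXYY ones in out and returns how many were found. */
static int find_xxyy_squares(int *out, int max_out)
{
    int count = 0;
    int square = 0;
    for (int odd = 1; square + odd <= 9999; odd += 2) {
        square += odd;                 /* next perfect square, no multiplication */
        if (is_xxyy(square) && count < max_out)
            out[count++] = square;
    }
    return count;
}
```

Run over the whole range, this finds a single match, 7744.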
There is absolutely no need to involve floating point math in this task at all. Here's an efficient piece of code that will do this job for you.
Since your number has to be a perfect square, it's quicker to only check perfect squares up front rather than all four digit numbers, filtering out non-squares (as you would do in the first-cut naive solution).
It's also probably safer to do it with integers rather than floating point values since you don't have to worry about all those inaccuracy issues when doing square root calculations.
#include <stdio.h>
int main (void) {
int d1, d2, d3, d4, sq, i = 32;
while ((sq = i * i) <= 9999) {
d1 = sq / 1000;
d2 = (sq % 1000) / 100;
d3 = (sq % 100) / 10;
d4 = (sq % 10);
if ((d1 == d2) && (d3 == d4))
printf (" %d\n", i * i);
i++;
}
return 0;
}
It relies on the fact that the first four-digit perfect square is 32 * 32 or 1024 (31^2 is 961). So it checks 32^2, 33^2, 34^2, and so on until you exceed the four-digit limit (that one being 100^2, for a total of 69 possibilities, whereas the naive solution would check about 9000 possibilities).
Then, for every possibility, it checks the digits for your final XXYY requirement, giving you the single answer:
7744
While I smell a homework question, here is a bit of guidance.
The problem with this solution is you are taking the square root, which introduces floating point arithmetic and the problems that causes in precise mathematics. You can get close by doing something like:
double epsilon = 0.00001;
double frac = fmod(sqrt(z), 1.0); // fractional part; % is not defined for double
if (frac < epsilon || frac + epsilon > 1) {
// it's probably an integer
}
It might be worth your while to rewrite this algorithm to just check if the number conforms to that format by testing the squares of ever increasing numbers. The highest number you'd have to test is short of the square root of the highest perfect square you're looking for. i.e. sqrt(9988) = 99.93... so you'd only have to test at most 100 numbers anyway. The lowest number you might test is 1122 I think, so you can actually start counting from 34.
There are even better solutions that involve factoring (and the use of the modulo operator)
but I think those are enough hints for now. ;-)
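One possible reading of the factoring hint, sketched in C (the function name is made up, and this may not be exactly what the answer had in mind): XXYY = 1100*X + 11*Y = 11 * (100*X + Y), so a square of that form is divisible by 11, hence by 121, and its root must be a multiple of 11. That leaves only seven roots to try.

```c
/* Tries only the roots 33, 44, ..., 99 (multiples of 11 whose squares have
   four digits) and keeps the squares whose two halves are both of the
   form "same digit twice". Returns the number of matches stored in out. */
static int find_xxyy_by_factoring(int *out, int max_out)
{
    int count = 0;
    for (int root = 33; root <= 99; root += 11) {
        int sq = root * root;
        int hi = sq / 100;             /* the "XX" half, 10..99 */
        int lo = sq % 100;             /* the "YY" half, 0..99 */
        if (hi % 11 == 0 && lo % 11 == 0 && count < max_out)
            out[count++] = sq;
    }
    return count;
}
```

A two-digit value is a repeated digit exactly when it is a multiple of 11, which is why the modulo test is enough.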
To check if sqrt(x) is an integer, compare it to its floored value:
sqrt(x) == (int) sqrt(x)
However, this is actually a bad way to compare floating point values due to precision issues. You should always factor in a small error component, and compare against the nearest integer so that a result such as 87.9999999 is still accepted:
fabs(sqrt(x) - floor(sqrt(x) + 0.5)) < 0.0000001
Even if you make this correction though, your program will still be outputting sqrt(z) when it sounds like what you want to do is output z. You should also loop through all y values, instead of just considering y=4 (note that y can also be 0, unlike x).
I want to show the sqrt(z) only when it is integer.
double result = sqrt( 25); // Took 25 as an example. Run this in a loop varying sqrt
// parameter
int checkResult = result;
if ( checkResult == result )
std::cout << "perfect Square" ;
else
std::cout << "not perfect square" ;
The way you are generating numbers is indeed correct (my bad), so all you need is the right way to detect a perfect square. :)
loop x: 1 to 9
if(check_for_perfect_square(x*1100 + 44))
print: x*1100 + 44
See "Perfect square and perfect cube" for how to test for a perfect square.
You don't need to take square roots. Notice that you can easily generate all integer squares, and all numbers XXYY, in increasing order. So you just have to make a single pass through each sequence, looking for matches:
int n = 0 ;
int X = 1, Y = 0 ; // Set X=0 here to allow the solution 0000
while (X < 10) {
int nSquared = n * n ;
int XXYY = 1100 * X + 11 * Y ;
// Output them if they are equal
if (nSquared == XXYY) cout << XXYY << endl ;
// Increment the smaller of the two
if (nSquared <= XXYY) n++ ;
else if (Y < 9) Y++ ;
else { Y = 0 ; X++ ; }
}
I was always wondering how I can make a function which calculates the power (e.g. 2^3) myself. In most languages these are included in the standard library, mostly as pow(double x, double y), but how can I write it myself?
I was thinking about for loops, but I think my brain got in a loop (when I wanted to do a power with a non-integer exponent, like 5^4.5, or a negative one, like 2^-21) and I went crazy ;)
So, how can I write a function which calculates the power of a real number? Thanks
Oh, maybe important to note: I cannot use functions which use powers (e.g. exp), which would make this ultimately useless.
Negative powers are not a problem, they're just the inverse (1/x) of the positive power.
Floating point powers are just a little bit more complicated; as you know, a fractional power is equivalent to a root (e.g. x^(1/2) == sqrt(x)), and you also know that multiplying powers with the same base is equivalent to adding their exponents.
With all the above, you can:
Decompose the exponent into an integer part and a fractional part.
Calculate the integer power with a loop (you can optimise it decomposing in factors and reusing partial calculations).
Calculate the root with any algorithm you like (any iterative approximation like bisection or Newton method could work).
Multiply the result.
If the exponent was negative, apply the inverse.
Example:
2^(-3.5) = (2^3 * 2^(1/2))^-1 = 1 / (2*2*2 * sqrt(2))
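The steps above can be sketched in C roughly as follows. This is one way to do it, not the only one: the fractional part is handled by walking its binary digits and multiplying in x^(1/2), x^(1/4), ... as needed, with each root taken by Newton's method. The names sqrt_newton and pow_sketch are invented for the example, and x is assumed to be positive.

```c
/* Square root by Newton's method; assumes x >= 0. */
static double sqrt_newton(double x)
{
    if (x == 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0;      /* start at or above the root */
    for (;;) {
        double next = 0.5 * (r + x / r);
        if (!(next < r))               /* no longer decreasing: converged */
            break;
        r = next;
    }
    return r;
}

/* x^y for x > 0: integer part by a loop, fractional part via repeated
   square roots, inverse at the end if the exponent was negative. */
static double pow_sketch(double x, double y)
{
    int negative = y < 0.0;
    if (negative)
        y = -y;

    long whole = (long)y;              /* integer part of the exponent */
    double frac = y - (double)whole;   /* fractional part, in [0, 1) */

    double result = 1.0;
    for (long i = 0; i < whole; i++)   /* integer power */
        result *= x;

    double root = x;
    for (int bit = 0; bit < 52 && frac > 0.0; bit++) {
        root = sqrt_newton(root);      /* x^(1/2), x^(1/4), x^(1/8), ... */
        frac *= 2.0;
        if (frac >= 1.0) {             /* this binary digit of frac is 1 */
            result *= root;
            frac -= 1.0;
        }
    }
    return negative ? 1.0 / result : result;
}
```

For the example above, pow_sketch(2.0, -3.5) multiplies 2*2*2 by sqrt(2) and then inverts the product. The integer loop is deliberately naive; it takes one multiplication per unit of the exponent.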
A^B = Log^-1(Log(A)*B)
Edit: yes, this definition really does provide something useful. For example, on an x86, it translates almost directly to FYL2X (Y * Log2(X)) and F2XM1 (2^x - 1):
fyl2x
fld st(0)
frndint
fsubr st(1),st
fxch st(1)
fchs
f2xm1
fld1
faddp st(1),st
fscale
fstp st(1)
The code ends up a little longer than you might expect, primarily because F2XM1 only works with numbers in the range -1.0..1.0. The fld st(0)/frndint/fsubr st(1),st piece subtracts off the integer part, so we're left with only the fraction. We apply F2XM1 to that, add the 1 back on, then use FSCALE to handle the integer part of the exponentiation.
Typically the implementation of the pow(double, double) function in math libraries is based on the identity:
pow(x,y) = pow(a, y * log_a(x))
Using this identity, you only need to know how to raise a single number a to an arbitrary exponent, and how to take a logarithm base a. You have effectively turned a complicated multi-variable function into two functions of a single variable, and a multiplication, which is pretty easy to implement. The most commonly chosen values of a are e or 2 -- e because e^x and log_e(1+x) have some very nice mathematical properties, and 2 because it has some nice properties for implementation in floating-point arithmetic.
The catch of doing it this way is that (if you want to get full accuracy) you need to compute the log_a(x) term (and its product with y) to higher accuracy than the floating-point representation of x and y. For example, if x and y are doubles, and you want to get a high accuracy result, you'll need to come up with some way to store intermediate results (and do arithmetic) in a higher-precision format. The Intel x87 format is a common choice, as are 64-bit integers (though if you really want a top-quality implementation, you'll need to do a couple of 96-bit integer computations, which are a little bit painful in some languages). It's much easier to deal with this if you implement powf(float,float), because then you can just use double for intermediate computations. I would recommend starting with that if you want to use this approach.
The algorithm that I outlined is not the only possible way to compute pow. It is merely the most suitable for delivering a high-speed result that satisfies a fixed a priori accuracy bound. It is less suitable in some other contexts, and is certainly much harder to implement than the repeated-square[root]-ing algorithm that some others have suggested.
If you want to try the repeated square[root] algorithm, begin by writing an unsigned integer power function that uses repeated squaring only. Once you have a good grasp on the algorithm for that reduced case, you will find it fairly straightforward to extend it to handle fractional exponents.
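As a starting point for that exercise, here is a minimal sketch of an unsigned integer power by repeated squaring in C (the name upow is just for illustration):

```c
/* base^exp using O(log exp) multiplications. */
static double upow(double base, unsigned exp)
{
    double result = 1.0;
    while (exp > 0) {
        if (exp & 1u)          /* lowest bit set: this power of base is needed */
            result *= base;
        base *= base;          /* base, base^2, base^4, base^8, ... */
        exp >>= 1;
    }
    return result;
}
```

Each loop iteration consumes one binary digit of the exponent, which is the property the fractional extension reuses with square roots in place of squares.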
There are two distinct cases to deal with: Integer exponents and fractional exponents.
For integer exponents, you can use exponentiation by squaring.
def pow(base, exponent):
    if exponent == 0:
        return 1
    elif exponent < 0:
        return 1 / pow(base, -exponent)
    elif exponent % 2 == 0:
        half_pow = pow(base, exponent // 2)
        return half_pow * half_pow
    else:
        return base * pow(base, exponent - 1)
The second "elif" is what distinguishes this from the naïve pow function. It allows the function to make O(log n) recursive calls instead of O(n).
For fractional exponents, you can use the identity a^b = C^(b*log_C(a)). It's convenient to take C=2, so a^b = 2^(b * log2(a)). This reduces the problem to writing functions for 2^x and log2(x).
The reason it's convenient to take C=2 is that floating-point numbers are stored in base-2 floating point. log2(a * 2^b) = log2(a) + b. This makes it easier to write your log2 function: You don't need to have it be accurate for every positive number, just on the interval [1, 2). Similarly, to calculate 2^x, you can multiply 2^(integer part of x) * 2^(fractional part of x). The first part is trivial to store in a floating point number, for the second part, you just need a 2^x function over the interval [0, 1).
The hard part is finding a good approximation of 2^x and log2(x). A simple approach is to use Taylor series.
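A sketch of how the range reduction and the series could fit together, in C. Everything here is an assumption made for illustration: the names exp2_sketch and log2_sketch, the fixed count of 20 terms, and the choice of the series ln(m) = 2 * (z + z^3/3 + z^5/5 + ...) with z = (m-1)/(m+1) for the logarithm (it converges faster on [1, 2) than the plain Taylor series around 1). Inputs are assumed finite, and log2_sketch needs x > 0.

```c
#define LN2 0.6931471805599453

/* 2^x: the integer part by repeated doubling or halving, the fractional
   part f in [0, 1) from the Taylor series of e^(f * ln 2). */
static double exp2_sketch(double x)
{
    long n = (long)x;
    if ((double)n > x)
        n--;                            /* floor, also for negative x */
    double f = x - (double)n;

    double t = f * LN2, term = 1.0, sum = 1.0;
    for (int k = 1; k <= 20; k++) {     /* t < 0.7, so 20 terms is plenty */
        term *= t / k;
        sum += term;
    }
    for (; n > 0; n--) sum *= 2.0;
    for (; n < 0; n++) sum *= 0.5;
    return sum;
}

/* log2(x) for x > 0: scale x into [1, 2) while counting the shifts, then
   use the series for ln of the mantissa and convert to base 2. */
static double log2_sketch(double x)
{
    int e = 0;
    while (x >= 2.0) { x *= 0.5; e++; }
    while (x < 1.0)  { x *= 2.0; e--; }

    double z = (x - 1.0) / (x + 1.0), z2 = z * z;
    double power = z, sum = 0.0;
    for (int k = 0; k < 20; k++) {
        sum += power / (2 * k + 1);
        power *= z2;
    }
    return e + 2.0 * sum / LN2;
}
```

With these two, a^b for a > 0 is exp2_sketch(b * log2_sketch(a)). A production version would extract the exponent bits directly instead of looping, and would use a tuned polynomial rather than a truncated series.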
Per definition:
a^b = exp(b ln(a))
where exp(x) = 1 + x + x^2/2 + x^3/3! + x^4/4! + x^5/5! + ...
where n! = 1 * 2 * ... * n.
In practice, you could store an array of the first 10 values of 1/n!, and then approximate
exp(x) = 1 + x + x^2/2 + x^3/3! + ... + x^10/10!
because 10! is a huge number, so 1/10! is very small (2.7557319224⋅10^-7).
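A minimal C sketch of that table-based approximation. The names INV_FACT and exp_taylor10 are invented here, and evaluating the polynomial by Horner's rule is a choice made for this example rather than something stated above.

```c
/* 1/n! for n = 0..10 */
static const double INV_FACT[11] = {
    1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24, 1.0 / 120, 1.0 / 720,
    1.0 / 5040, 1.0 / 40320, 1.0 / 362880, 1.0 / 3628800
};

/* exp(x) truncated after the x^10/10! term, evaluated by Horner's rule:
   (((c10*x + c9)*x + c8)*x + ...)*x + c0. */
static double exp_taylor10(double x)
{
    double sum = INV_FACT[10];
    for (int n = 9; n >= 0; n--)
        sum = sum * x + INV_FACT[n];
    return sum;
}
```

The truncation error grows quickly with |x|: at x = 1 it is a few times 10^-8, so this is only a reasonable approximation for small arguments.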
Wolfram functions gives a wide variety of formulae for calculating powers. Some of them would be very straightforward to implement.
For positive integer powers, look at exponentiation by squaring and addition-chain exponentiation.
Using three self-implemented functions iPow(x, n), Ln(x) and Exp(x), I'm able to compute fPow(x, a), x and a being doubles. None of the functions below use library functions, just iteration.
Some explanation about functions implemented:
(1) iPow(x, n): x is double, n is int. This is a simple iteration, as n is an integer.
(2) Ln(x): This function uses the Taylor Series iteration. The series used in iteration is Σ (from int i = 0 to n) {(1 / (2 * i + 1)) * ((x - 1) / (x + 1)) ^ (2 * i + 1)}, and the sum is doubled at the end. The symbol ^ denotes the power function Pow(x, n) implemented in the 1st function, which uses simple iteration.
(3) Exp(x): This function, again, uses the Taylor Series iteration. The series used in iteration is Σ (from int i = 0 to n) {x^i / i!}. Here, the ^ denotes the power function, but it is not computed by calling the 1st Pow(x, n) function; instead it is implemented within the 3rd function, concurrently with the factorial, using d *= x / i. I felt I had to use this trick, because in this function, iteration takes some more steps relative to the other functions and the factorial (i!) overflows most of the time. In order to make sure the iteration does not overflow, the power function in this part is iterated concurrently with the factorial. This way, I overcame the overflow.
(4) fPow(x, a): x and a are both doubles. This function does nothing but just call the other three functions implemented above. The main idea in this function depends on some calculus: fPow(x, a) = Exp(a * Ln(x)). And now, I have all the functions iPow, Ln and Exp with iteration already.
n.b. I used a constant MAX_DELTA_DOUBLE in order to decide in which step to stop the iteration. I've set it to 1.0E-15, which seems reasonable for doubles. So, the iteration stops if (delta < MAX_DELTA_DOUBLE). If you need some more precision, you can use long double and decrease the constant value for MAX_DELTA_DOUBLE, to 1.0E-18 for example (1.0E-18 would be the minimum).
Here is the code, which works for me.
#include <stdio.h>  // printf
#include <stdlib.h> // exit
#define MAX_DELTA_DOUBLE 1.0E-15
#define EULERS_NUMBER 2.718281828459045
double MathAbs_Double (double x) {
return ((x >= 0) ? x : -x);
}
int MathAbs_Int (int x) {
return ((x >= 0) ? x : -x);
}
double MathPow_Double_Int(double x, int n) {
double ret;
if ((x == 1.0) || (n == 1)) {
ret = x;
} else if (n < 0) {
ret = 1.0 / MathPow_Double_Int(x, -n);
} else {
ret = 1.0;
while (n--) {
ret *= x;
}
}
return (ret);
}
double MathLn_Double(double x) {
double ret = 0.0, d;
if (x > 0) {
int n = 0;
do {
int a = 2 * n + 1;
d = (1.0 / a) * MathPow_Double_Int((x - 1) / (x + 1), a);
ret += d;
n++;
} while (MathAbs_Double(d) > MAX_DELTA_DOUBLE);
} else {
printf("\nerror: x <= 0 in ln(x)\n");
exit(-1);
}
return (ret * 2);
}
double MathExp_Double(double x) {
double ret;
if (x == 1.0) {
ret = EULERS_NUMBER;
} else if (x < 0) {
ret = 1.0 / MathExp_Double(-x);
} else {
int n = 2;
double d;
ret = 1.0 + x;
do {
d = x;
for (int i = 2; i <= n; i++) {
d *= x / i;
}
ret += d;
n++;
} while (d > MAX_DELTA_DOUBLE);
}
return (ret);
}
double MathPow_Double_Double(double x, double a) {
double ret;
if ((x == 1.0) || (a == 1.0)) {
ret = x;
} else if (a < 0) {
ret = 1.0 / MathPow_Double_Double(x, -a);
} else {
ret = MathExp_Double(a * MathLn_Double(x));
}
return (ret);
}
It's an interesting exercise. Here are some suggestions, which you should try in this order:
Use a loop.
Use recursion (not better, but interesting nonetheless)
Optimize your recursion vastly by using divide-and-conquer techniques
Use logarithms
You can write the pow function like this (it only handles positive integer exponents):
static double pows (double p_nombre, double p_puissance)
{
double nombre = p_nombre;
double i=0;
for(i=0; i < (p_puissance-1);i++){
nombre = nombre * p_nombre;
}
return (nombre);
}
You can write the floor function like this:
static double floors(double p_nomber)
{
double x = p_nomber;
long partent = (long) x; // truncates toward zero
if (x < 0 && x != partent) // negative non-integer: truncation rounded up
{
return (partent-1);
}
else
{
return (partent);
}
}
Best regards
A better algorithm to efficiently calculate positive integer powers is to repeatedly square the base, while keeping track of the extra remainder multiplicands. Here is a sample solution in Python that should be relatively easy to understand and translate into your preferred language:
def power(base, exponent):
    if exponent == 0:
        return 1
    remaining_multiplicand = 1
    result = base
    while exponent > 1:
        remainder = exponent % 2
        if remainder > 0:
            remaining_multiplicand = remaining_multiplicand * result
        exponent = (exponent - remainder) // 2  # integer division keeps exponent an int
        result = result * result
    return result * remaining_multiplicand
To make it handle negative exponents, all you have to do is calculate the positive version and divide 1 by the result, so that should be a simple modification to the above code. Fractional exponents are considerably more difficult, since it means essentially calculating an nth-root of the base, where n = 1/abs(exponent % 1) and multiplying the result by the result of the integer portion power calculation:
power(base, exponent - (exponent % 1))
You can calculate roots to a desired level of accuracy using Newton's method. Check out the Wikipedia article on the algorithm.
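For illustration, a small C sketch of Newton's method for an n-th root. The function name nth_root and the stopping rule are choices made for this example; it assumes a > 0 and n >= 1.

```c
/* n-th root of a > 0 by Newton's method on f(r) = r^n - a, which gives
   r_next = ((n - 1) * r + a / r^(n-1)) / n. */
static double nth_root(double a, unsigned n)
{
    double r = a > 1.0 ? a : 1.0;          /* start at or above the root */
    for (;;) {
        double rp = 1.0;                   /* r^(n-1) */
        for (unsigned i = 1; i < n; i++)
            rp *= r;
        double next = ((n - 1) * r + a / rp) / n;
        if (!(next < r))                   /* stopped decreasing: converged */
            break;
        r = next;
    }
    return r;
}
```

Starting above the root makes the iterates decrease steadily, so the loop can simply stop at the first step that no longer gets smaller. An exponent with fractional part 0.25 would use nth_root(base, 4), matching n = 1/abs(exponent % 1); fractional parts that are not of the form 1/n need a different decomposition.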
I am using fixed-point long arithmetic and my pow is log2/exp2 based. Numbers consist of:
int sig = { -1; +1 } signum
DWORD a[A+B] number
A is number of DWORDs for integer part of number
B is number of DWORDs for fractional part
My simplified solution is this:
//---------------------------------------------------------------------------
longnum exp2 (const longnum &x)
{
int i,j;
longnum c,d;
c.one();
if (x.iszero()) return c;
i=x.bits()-1;
for(d=2,j=_longnum_bits_b;j<=i;j++,d*=d)
if (x.bitget(j))
c*=d;
for(i=0,j=_longnum_bits_b-1;i<_longnum_bits_b;j--,i++)
if (x.bitget(j))
c*=_longnum_log2[i];
if (x.sig<0) {d.one(); c=d/c;}
return c;
}
//---------------------------------------------------------------------------
longnum log2 (const longnum &x)
{
int i,j;
longnum c,d,dd,e,xx;
c.zero(); d.one(); e.zero(); xx=x;
if (xx.iszero()) return c; //**** error: log2(0) = infinity
if (xx.sig<0) return c; //**** error: log2(negative x) ... no result possible
if (d.geq(x,d)==0) {xx=d/xx; xx.sig=-1;}
i=xx.bits()-1;
e.bitset(i); i-=_longnum_bits_b;
for (;i>0;i--,e>>=1) // integer part
{
dd=d*e;
j=dd.geq(dd,xx);
if (j==1) continue; // dd> xx
c+=i; d=dd;
if (j==2) break; // dd==xx
}
for (i=0;i<_longnum_bits_b;i++) // fractional part
{
dd=d*_longnum_log2[i];
j=dd.geq(dd,xx);
if (j==1) continue; // dd> xx
c.bitset(_longnum_bits_b-i-1); d=dd;
if (j==2) break; // dd==xx
}
c.sig=xx.sig;
c.iszero();
return c;
}
//---------------------------------------------------------------------------
longnum pow (const longnum &x,const longnum &y)
{
//x^y = exp2(y*log2(x))
int ssig=+1; longnum c; c=x;
if (y.iszero()) {c.one(); return c;} // ?^0=1
if (c.iszero()) return c; // 0^?=0
if (c.sig<0)
{
c.overflow(); c.sig=+1;
if (y.isreal()) {c.zero(); return c;} //**** error: negative x ^ noninteger y
if (y.bitget(_longnum_bits_b)) ssig=-1;
}
c=exp2(log2(c)*y); c.sig=ssig; c.iszero();
return c;
}
//---------------------------------------------------------------------------
where:
_longnum_bits_a = A*32
_longnum_bits_b = B*32
_longnum_log2[i] = 2 ^ (1/(2^i)) ... precomputed sqrt table
_longnum_log2[0]=sqrt(2)
_longnum_log2[1]=sqrt(tab[0])
_longnum_log2[i]=sqrt(tab[i-1])
longnum::zero() sets *this=0
longnum::one() sets *this=+1
bool longnum::iszero() returns (*this==0)
bool longnum::isnonzero() returns (*this!=0)
bool longnum::isreal() returns (true if fractional part !=0)
bool longnum::isinteger() returns (true if fractional part ==0)
int longnum::bits() return num of used bits in number counted from LSB
longnum::bitget()/bitset()/bitres()/bitxor() are bit access
longnum.overflow() rounds number if there was a overflow X.FFFFFFFFFF...FFFFFFFFF??h -> (X+1).0000000000000...000000000h
int longnum::geq(x,y) compares |x| and |y|; returns 0,1,2 for (<,>,==)
All you need to understand this code is that a number in binary form is a sum of powers of 2, so when you need to compute 2^num it can be rewritten as
2^(b(-n)*2^(-n) + ... + b(+m)*2^(+m))
where n is the number of fractional bits and m the number of integer bits. Multiplication or division by 2 in binary form is a simple bit shift, so if you put it all together you get code for exp2 similar to mine. log2 is based on binary search: the result bits are set from MSB to LSB until the result matches the searched value (a very similar algorithm to the one for fast sqrt computation). Hope this helps clarify things...
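The same exp2 idea, scaled down to a toy C sketch so the bit walk is easier to follow. It is not the longnum code: the exponent is assumed to be an unsigned Q16.16 value in a uint32_t, the result is a double, and the names exp2_q16 and build_table are invented. The table plays the role of _longnum_log2, with table[i] = 2^(1/2^(i+1)).

```c
#include <stdint.h>

/* Fill table[i] with 2^(1/2^(i+1)): sqrt(2), sqrt(sqrt(2)), ...
   Each square root is taken with a fixed number of Newton steps. */
static void build_table(double table[16])
{
    double v = 2.0;
    for (int i = 0; i < 16; i++) {
        double r = v;
        for (int k = 0; k < 60; k++)
            r = 0.5 * (r + v / r);
        table[i] = v = r;
    }
}

/* 2^x for an unsigned Q16.16 exponent x. A set integer bit j contributes
   a factor 2^(2^j); a set fractional bit contributes one table entry. */
static double exp2_q16(uint32_t x, const double table[16])
{
    double c = 1.0;
    double d = 2.0;                          /* 2^(2^0), squared each step */
    for (int j = 16; j < 32; j++, d *= d)    /* integer bits, LSB first */
        if ((x >> j) & 1u)
            c *= d;
    for (int i = 0; i < 16; i++)             /* fractional bits, MSB first */
        if ((x >> (15 - i)) & 1u)
            c *= table[i];
    return c;
}
```

For instance, x = 0x00018000 (1.5 in Q16.16) has integer bit 0 and the top fractional bit set, so the result is 2 * sqrt(2). Exponents of 1024 and above overflow a double.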
A lot of approaches are given in other answers. Here is something that I thought may be useful in case of integral powers.
In the case of an integer power n^x, the straightforward approach takes x-1 multiplications. In order to optimize this, we can use dynamic programming and reuse an earlier multiplication result to avoid doing all of those multiplications. For example, for 5^9 we can make batches of 3, i.e. calculate 5^3 once to get 125, and then cube 125 using the same logic, taking only 4 multiplications in the process instead of 8 multiplications the straightforward way.
The question is what the ideal number of batches b is so that the number of multiplications is minimal. So let's write the equation for this. If f(x,b) is the function representing the number of multiplications entailed in calculating n^x using the above method, then
f(x,b) = (x/b - 1) + (b - 1)
Explanation: A product of a batch of p numbers will take p-1 multiplications. If we divide x multiplications into b batches, there would be (x/b)-1 multiplications required inside each batch, and b-1 multiplications required for all b batches.
Now we can calculate the first derivative of this function with respect to b and equate it to 0 to get the b for the least number of multiplications.
df/db = -x/b^2 + 1 = 0, so b = sqrt(x)
Now put back this value of b into the function f(x,b) to get the least number of multiplications:
f(x, sqrt(x)) = 2*sqrt(x) - 2
For all positive x, this value is no greater than the x-1 multiplications of the straightforward way, since (x - 1) - (2*sqrt(x) - 2) = (sqrt(x) - 1)^2 >= 0.
Maybe you can use a Taylor series expansion. The Taylor series of a function is an infinite sum of terms that are expressed in terms of the function's derivatives at a single point. For most common functions, the function and the sum of its Taylor series are equal near this point. Taylor series are named after Brook Taylor, who introduced them in 1715.