I am looking to implement Fermat's little theorem for primality testing. Here's the code I have written:
typedef long long lld;
lld expo(lld n, lld p) // computes 2^p mod n
{
    if (p == 0)
        return 1;
    lld exp = expo(n, p / 2);
    if (p % 2 == 0)
        return (exp * exp) % n;
    else
        return (((exp * exp) % n) * 2) % n;
}
bool ifPseudoPrime(lld n)
{
    return expo(n, n) == 2;
}
NOTE: I took the base a (with a <= n-1) to be 2.
Now, the number n can be as large as 10^18. This means that the variable exp can reach values near 10^18, which in turn means that the expression (exp*exp) can reach about 10^36 and overflow. How do I avoid this?
I tested this and it runs fine up to 10^9. I am using C++.
If the modulus is close to the limit of the largest integer type you can use, things get somewhat complicated. If you can't use a library that implements big-integer arithmetic, you can roll a modular multiplication yourself by splitting the factors into low-order and high-order parts.
If the modulus m is so large that 2*(m-1) overflows, things get really fussy, but if 2*(m-1) doesn't overflow, it's bearable.
Let us suppose you have and use a 64-bit unsigned integer type.
You can calculate the modular product by splitting the factors into their low and high 32 bits; the product then splits into:
a = a1 + (a2 << 32) // 0 <= a1, a2 < (1 << 32)
b = b1 + (b2 << 32) // 0 <= b1, b2 < (1 << 32)
a*b = a1*b1 + (a1*b2 << 32) + (a2*b1 << 32) + (a2*b2 << 64)
To calculate a*b (mod m) with m <= (1 << 63), reduce each of the four products modulo m,
p1 = (a1*b1) % m;
p2 = (a1*b2) % m;
p3 = (a2*b1) % m;
p4 = (a2*b2) % m;
and the simplest way to incorporate the shifts is
for(i = 0; i < 32; ++i) {
p2 *= 2;
if (p2 >= m) p2 -= m;
}
Do the same for p3, and use 64 iterations for p4. Then
s = p1+p2;
if (s >= m) s -= m;
s += p3;
if (s >= m) s -= m;
s += p4;
if (s >= m) s -= m;
return s;
That way is not very fast, but for the few multiplications needed here, it may be fast enough. A small speedup should be obtained by reducing the number of shifts; first calculate (p4 << 32) % m,
for(i = 0; i < 32; ++i) {
p4 *= 2;
if (p4 >= m) p4 -= m;
}
then all of p2, p3 and the current value of p4 need to be multiplied by 2^32 modulo m. Since multiplication distributes over addition, add them first and do the 32 doublings only once:
p4 += p3;
if (p4 >= m) p4 -= m;
p4 += p2;
if (p4 >= m) p4 -= m;
for(i = 0; i < 32; ++i) {
p4 *= 2;
if (p4 >= m) p4 -= m;
}
s = p4+p1;
if (s >= m) s -= m;
return s;
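Putting the pieces together, here is a sketch of the split-and-double multiplication as complete C++ functions, plus a base-2 Fermat test built on it. It assumes unsigned 64-bit arithmetic and a modulus m <= 2^63 (so that 2*(m-1) cannot overflow, as discussed above); the names add_mod, shift32_mod, mul_mod, pow_mod and is_fermat_probable_prime are illustrative, not from the question. It checks 2^(n-1) ≡ 1 (mod n), which for odd n is equivalent to the question's 2^n ≡ 2 (mod n) check because 2 is invertible modulo an odd n.

```cpp
#include <cassert>
#include <cstdint>

using u64 = std::uint64_t;

// (x + y) mod m for x, y < m; cannot overflow because m <= 2^63.
u64 add_mod(u64 x, u64 y, u64 m) {
    u64 s = x + y;
    return s >= m ? s - m : s;
}

// x * 2^32 mod m for x < m, done as 32 modular doublings.
u64 shift32_mod(u64 x, u64 m) {
    for (int i = 0; i < 32; ++i) x = add_mod(x, x, m);
    return x;
}

// a * b mod m by splitting both factors into 32-bit halves.
// Sketch only: requires 1 <= m <= 2^63.
u64 mul_mod(u64 a, u64 b, u64 m) {
    assert(m >= 1 && m <= (u64(1) << 63));
    a %= m;
    b %= m;
    const u64 mask = 0xFFFFFFFFu;
    u64 a1 = a & mask, a2 = a >> 32;
    u64 b1 = b & mask, b2 = b >> 32;
    u64 p1 = (a1 * b1) % m;  // each 32x32-bit product fits in 64 bits
    u64 p2 = (a1 * b2) % m;
    u64 p3 = (a2 * b1) % m;
    u64 p4 = (a2 * b2) % m;
    // a*b = p1 + ((p4 * 2^32 + p2 + p3) * 2^32), everything reduced mod m
    p4 = shift32_mod(p4, m);
    p4 = add_mod(p4, p3, m);
    p4 = add_mod(p4, p2, m);
    p4 = shift32_mod(p4, m);
    return add_mod(p4, p1, m);
}

// base^e mod m by square-and-multiply, using mul_mod for every product.
u64 pow_mod(u64 base, u64 e, u64 m) {
    u64 result = 1 % m;
    base %= m;
    while (e > 0) {
        if (e & 1) result = mul_mod(result, base, m);
        base = mul_mod(base, base, m);
        e >>= 1;
    }
    return result;
}

// Fermat test with base 2: true for primes and for base-2 pseudoprimes.
bool is_fermat_probable_prime(u64 n) {
    if (n < 2) return false;
    if (n == 2) return true;
    if (n % 2 == 0) return false;
    return pow_mod(2, n - 1, n) == 1;
}
```

Since n <= 10^18 < 2^63, this covers the range in the question. Like any Fermat test, it also accepts pseudoprimes such as 341 = 11 * 31.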
You can perform your multiplications in several stages. For example, say you want to compute X*Y mod n. Write X and Y as X = 10^9*X_1 + X_0 and Y = 10^9*Y_1 + Y_0. Then compute all four products X_i*Y_j mod n, and finally compute X*Y mod n = (10^18*(X_1*Y_1 mod n) + 10^9*((X_0*Y_1 + X_1*Y_0) mod n) + X_0*Y_0) mod n. Note that in this case, you are operating with numbers half the size of the maximum allowed.
If splitting into two parts does not suffice (I suspect this is the case), split into three parts using the same scheme. Splitting into three should work.
A simpler approach is just to multiply the schoolbook way. It corresponds to the previous approach, but writes one number in as many parts as it has digits.
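Done in base 2, the schoolbook approach becomes "double and add": walk through the bits of b, doubling a modulo m at each step and adding it to the result whenever the current bit is 1. A minimal sketch under the same assumption as the first answer (unsigned 64-bit integers with m <= 2^63, so 2*(m-1) does not overflow); mul_mod_binary is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Schoolbook multiplication in base 2 ("double and add"), reducing mod m
// after every step. Assumes 1 <= m <= 2^63 so adding two residues cannot overflow.
std::uint64_t mul_mod_binary(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    assert(m >= 1 && m <= (std::uint64_t(1) << 63));
    a %= m;
    std::uint64_t result = 0;
    while (b > 0) {
        if (b & 1) {           // this binary digit of b is 1: add the current a
            result += a;
            if (result >= m) result -= m;
        }
        a += a;                // next digit is worth twice as much: a = 2a mod m
        if (a >= m) a -= m;
        b >>= 1;
    }
    return result;
}
```

It needs at most 64 iterations and never forms a 64x64-bit product; it only ever adds residues below m, which is why m <= 2^63 is enough.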
Good luck!
Related
I am calculating combination(15, 7) in C++.
I first used the following code and got the wrong answer due to a type promotion error.
#include <iostream>
int main()
{
int a = 15;
double ans = 1;
for(int i = 1; i <= 7; i++)
ans *= (a + 1 - i) / i;
std::cout << (int) ans;
return 0;
}
Output: 2520
So I changed ans *= (a + 1 - i) / i; to ans *= (double)(a + 1 - i) / i; and still got the wrong answer.
#include <iostream>
int main()
{
int a = 15;
double ans = 1;
for(int i = 1; i <= 7; i++)
ans *= (double) (a + 1 - i) / i;
std::cout << (int) ans;
return 0;
}
Output: 6434
Finally, I tried ans = ans * (a + 1 - i) / i, which gives the right answer.
#include <iostream>
int main()
{
int a = 15;
double ans = 1;
for(int i = 1; i <= 7; i++)
ans = ans * (a + 1 - i) / i;
std::cout << (int) ans;
return 0;
}
Output: 6435
Could someone tell me why the second one did not work?
If you print out ans without casting it to (int), you'll see the second result is 6434.9999999999990905052982270717620849609375. That's pretty darn close to the right answer of 6435, so it's clearly not a type promotion error any more.
No, this is classic floating point inaccuracy. When you write ans *= (double) (a + 1 - i) / i you are doing the equivalent of:
ans = ans * ((double) (a + 1 - i) / i);
Compare this to the third version:
ans = ans * (a + 1 - i) / i;
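The former performs the division first, followed by the multiplication. The latter operates left to right, so the multiplication precedes the division. This change in the order of operations makes the two results slightly different: in the second version (a + 1 - i) / i is usually not exactly representable (e.g. 14/3) and gets rounded, while in the third version ans * (a + 1 - i) is an exact integer and dividing it by i yields the next binomial coefficient, so every intermediate value is exact. Floating-point results are sensitive to the order of operations because each intermediate result is rounded.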
Quick fix: Don't truncate the result; round it.
Better fix: Don't use floating point for integral arithmetic. Save the divisions until after all the multiplications are done. Use long, long long, or even a big number library.
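A sketch of the integer-only route, assuming the intermediate products fit in 64 bits (binomial_exact is an illustrative name). It keeps the third version's multiply-then-divide order: before step i the running value is C(n, i - 1), and C(n, i - 1) * (n + 1 - i) == i * C(n, i), so every division is exact. Interleaving the divisions like this, rather than deferring all of them, also keeps the intermediates much smaller.

```cpp
#include <cassert>
#include <cstdint>

// C(n, k) using integer arithmetic only; no rounding is involved anywhere.
std::uint64_t binomial_exact(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;  // C(n, k) == C(n, n - k): fewer steps
    std::uint64_t ans = 1;
    for (unsigned i = 1; i <= k; ++i) {
        assert(ans * (n + 1 - i) % i == 0);  // exactness check (debug builds)
        ans = ans * (n + 1 - i) / i;         // multiply first, then divide
    }
    return ans;
}
```

For the question's case, binomial_exact(15, 7) gives 6435 with no cast or rounding needed.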
First one did not work because you have integer division there.
The difference between the second and third versions is this:
ans = ans * (double(a + 1 - i) / i); // second is equal to this
vs:
ans = (ans * (a + 1 - i)) / i; // third is equal to this
So the difference is in the order of multiplication and division. If you round the double to an integer instead of simply dropping the fractional part, you will get the same result:
std::cout << int( ans + 0.5 ) << std::endl;
The answer to this problem turns out to be computing large binomial coefficients modulo a prime number using Lucas' theorem. Here's the solution to that problem using this technique: here.
Now my questions are:
My code seems to fail on larger inputs because variables overflow. Is there any way to handle this?
Are there any ways to do this without using this theorem?
EDIT: note that as this is an OI or ACM problem, external libraries other than the standard ones are not permitted.
Code below:
#include <iostream>
#include <string.h>
#include <stdio.h>
using namespace std;
#define N 100010
long long mod_pow(int a,int n,int p)
{
long long ret=1;
long long A=a;
while(n)
{
if (n & 1)
ret=(ret*A)%p;
A=(A*A)%p;
n>>=1;
}
return ret;
}
long long factorial[N];
void init(long long p)
{
factorial[0] = 1;
for(int i = 1;i <= p;i++)
factorial[i] = factorial[i-1]*i%p;
//for(int i = 0;i < p;i++)
//ni[i] = mod_pow(factorial[i],p-2,p);
}
long long Lucas(long long a,long long k,long long p)
{
long long re = 1;
while(a && k)
{
long long aa = a%p;long long bb = k%p;
if(aa < bb) return 0;
re = re*factorial[aa]*mod_pow(factorial[bb]*factorial[aa-bb]%p,p-2,p)%p;
a /= p;
k /= p;
}
return re;
}
int main()
{
int t;
cin >> t;
while(t--)
{
long long n,m,p;
cin >> n >> m >> p;
init(p);
cout << Lucas(n+m,m,p) << "\n";
}
return 0;
}
This solution assumes that p^2 fits into an unsigned long long. Since an unsigned long long has at least 64 bits per the standard, this works at least for p up to 4 billion, much more than the question specifies.
typedef unsigned long long num;
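/* x such that a*x = 1 mod p, via the extended Euclidean algorithm */
/* assumes gcd(a, p) == 1 (true when p is prime and a % p != 0) */
num modinv(num a, num p)
{
long long t = 0, new_t = 1;
long long r = (long long)p, new_r = (long long)(a % p);
while (new_r != 0) {
long long q = r / new_r;
long long tmp = t - q * new_t;
t = new_t;
new_t = tmp;
tmp = r - q * new_r;
r = new_r;
new_r = tmp;
}
if (t < 0)
t += (long long)p;
return (num)t;
}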
/* n chose m mod p */
/* computed with the theorem of Lucas */
num modbinom(num n, num m, num p)
{
num i, result, divisor, n_, m_;
if (m == 0)
return 1;
/* check for the likely case that the result is zero */
if (n < m)
return 0;
for (n_ = n, m_ = m; m_ > 0; n_ /= p, m_ /= p)
if (n_ % p < m_ % p)
return 0;
for (result = 1; n >= p || m >= p; n /= p, m /= p) {
result *= modbinom(n % p, m % p, p);
result %= p;
}
/* avoid unnecessary computations */
if (m > n - m)
m = n - m;
divisor = 1;
for (i = 0; i < m; i++) {
result *= n - i;
result %= p;
divisor *= i + 1;
divisor %= p;
}
result *= modinv(divisor, p);
result %= p;
return result;
}
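For reference, the theorem of Lucas used above states that, for a prime p and base-p digits n_i and m_i of n and m:

```latex
\binom{n}{m} \equiv \prod_{i=0}^{k} \binom{n_i}{m_i} \pmod{p},
\qquad n = \sum_{i=0}^{k} n_i p^i,\quad m = \sum_{i=0}^{k} m_i p^i,\quad 0 \le n_i, m_i < p
```

where \binom{n_i}{m_i} = 0 whenever m_i > n_i. This is why the code above can return 0 as soon as some base-p digit of m exceeds the corresponding digit of n, and why only binomials with arguments below p ever need to be computed.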
An infinite-precision integer seems like the way to go.
If you are in C++, the PicklingTools library has an "infinite precision" integer (similar to Python's long type). Someone else suggested Python; that's a reasonable answer if you know Python. If you want to do it in C++, you can use the int_n type:
#include "ocval.h"
int_n n="012345678910227836478627843";
n = n + 1; // Can combine with other plain ints as well
Take a look at the documentation at:
http://www.picklingtools.com/html/usersguide.html#c-int-n-and-the-python-arbitrary-size-ints-long
and
http://www.picklingtools.com/html/faq.html#c-and-otab-tup-int-un-int-n-new-in-picklingtools-1-2-0
The download for the C++ PicklingTools is here.
You want a bignum (a.k.a. arbitrary precision arithmetic) library.
First, don't write your own bignum (or bigint) library, because efficient algorithms (more efficient than the naive ones you learned at school) are difficult to design and implement.
Then, I would recommend GMPlib. It is free software, well documented, often used, quite efficient, and well designed (with perhaps some imperfections, in particular the inability to plugin your own memory allocator in replacement of the system malloc; but you probably don't care, unless you want to catch the rare out-of-memory condition ...). It has an easy C++ interface. It is packaged in most Linux distributions.
If it is a homework assignment, perhaps your teacher is expecting you to think more on the math, and find, with some proof, a way of solving the problem without any bignums.
Let's suppose that we need to compute (a / b) mod p where p is a prime number. Since p is prime, every b that is not divisible by p has an inverse mod p, so (a / b) mod p = ((a mod p) * (b mod p)^-1) mod p. We can use the extended Euclidean algorithm to compute the inverse.
To get (n over k) we need to compute n! mod p, (k!)^-1 and ((n - k)!)^-1 mod p. This requires n < p, because for n >= p the factorials can be divisible by p and then have no inverse mod p. Total time complexity is O(n).
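UPDATE: Here is the code in C++. I didn't test it extensively though.
#include <cassert>
#include <cstdint>
#include <vector>
int64_t fastPow(int64_t a, int64_t exp, int64_t mod)
{
int64_t res = 1;
a %= mod; // reduce first; products fit in int64_t as long as mod < 3*10^9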
while (exp)
{
if (exp % 2 == 1)
{
res *= a;
res %= mod;
}
a *= a;
a %= mod;
exp >>= 1;
}
return res;
}
// This inverse works only for primes p, it uses Fermat's little theorem
int64_t inverse(int64_t a, int64_t p)
{
assert(p >= 2);
return fastPow(a, p - 2, p);
}
// Assumes p is prime and n < p (otherwise the factorials may be 0 mod p)
int64_t binomial(int64_t n, int64_t k, int64_t p)
{
if (k < 0 || k > n)
return 0;
std::vector<int64_t> fact(n + 1);
fact[0] = 1;
for (int64_t i = 1; i <= n; ++i)
fact[i] = (fact[i - 1] * i) % p;
return ((((fact[n] * inverse(fact[k], p)) % p) * inverse(fact[n - k], p)) % p);
}
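The prose mentions the Euclidean algorithm, while the code above uses Fermat's little theorem, which requires p to be prime. For comparison, a sketch of the extended Euclidean inverse; inverse_euclid is a hypothetical name, it assumes 1 <= m < 2^62 so the intermediate values fit in int64_t, and it returns -1 when gcd(a, m) != 1. It also works for composite moduli, as long as a and m are coprime.

```cpp
#include <cassert>
#include <cstdint>

// Modular inverse by the extended Euclidean algorithm (sketch).
// Returns x in [0, m) with a * x = 1 (mod m), or -1 if gcd(a, m) != 1.
std::int64_t inverse_euclid(std::int64_t a, std::int64_t m) {
    assert(m >= 1);
    std::int64_t r0 = m, r1 = ((a % m) + m) % m;  // remainder sequence
    std::int64_t t0 = 0, t1 = 1;                  // invariant: r_i = t_i * a (mod m)
    while (r1 != 0) {
        std::int64_t q = r0 / r1;
        std::int64_t r2 = r0 - q * r1;
        r0 = r1;
        r1 = r2;
        std::int64_t t2 = t0 - q * t1;
        t0 = t1;
        t1 = t2;
    }
    if (r0 != 1) return -1;  // r0 is now gcd(a, m): no inverse exists
    return ((t0 % m) + m) % m;
}
```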
I want to find (n choose r) for large integers, and I also have to find out the mod of that number.
long long int choose(int a,int b)
{
if (b > a)
return (-1);
if(b==0 || a==1 || b==a)
return(1);
else
{
long long int r = ((choose(a-1,b))%10000007+(choose(a-1,b- 1))%10000007)%10000007;
return r;
}
}
I am using this piece of code, but I am getting TLE. If there is some other method to do that please tell me.
I don't have the reputation to comment yet, but I wanted to point out that the answer by rock321987 works pretty well:
It is fast and correct up to and including C(62, 31)
but cannot handle all inputs that have an output that fits in a uint64_t. As proof, try:
C(67, 33) = 14,226,520,737,620,288,370 (verify correctness and size)
Unfortunately, the other implementation spits out 8,829,174,638,479,413 which is incorrect. There are other ways to calculate nCr which won't break like this, however the real problem here is that there is no attempt to take advantage of the modulus.
Notice that the approach below requires p to be prime: then every integer not divisible by p has an inverse mod p, that inverse is unique, and we can find it quickly. (Be careful: 10000007 itself is not prime, since 10000007 = 941 * 10627, so you need a prime modulus such as 1000000007.) Another question has an answer on how to compute the inverse here, which I've replicated below.
This is handy since:
x/y mod p == x*(y inverse) mod p; and
xy mod p == ((x mod p)(y mod p)) mod p
Modifying the other code a bit and generalizing the problem, we have the following:
#include <cassert>
#include <cstdint>
#include <iostream>
// p MUST be prime and less than 2^32, so that result*a fits in 64 bits
uint64_t inverseModp(uint64_t a, uint64_t p) {
assert(p < (1ull << 32));
assert(a < p);
assert(a != 0);
uint64_t ex = p-2, result = 1;
while (ex > 0) {
if (ex % 2 == 1) {
result = (result*a) % p;
}
a = (a*a) % p;
ex /= 2;
}
return result;
}
// p MUST be prime
uint32_t nCrModp(uint32_t n, uint32_t r, uint32_t p)
{
assert(r <= n);
if (r > n-r) r = n-r;
if (r == 0) return 1;
uint64_t result = 1; //intermediary results may overflow 32 bits
int64_t pExponent = 0; // net power of p in the numerator over the denominator
for (uint32_t i = n, x = 1; i > r; --i, ++x) {
// strip factors of p instead of skipping multiples of p entirely
uint32_t num = i, den = x;
while (num % p == 0) { num /= p; ++pExponent; }
while (den % p == 0) { den /= p; --pExponent; }
result = result * (num % p) % p;
result = result * inverseModp(den % p, p) % p;
}
// a leftover power of p means p divides C(n, r)
return pExponent > 0 ? 0 : (uint32_t)result;
}
int main() {
uint32_t smallPrime = 17;
uint32_t medNum = 3001;
uint32_t halfMedNum = medNum >> 1;
std::cout << nCrModp(medNum, halfMedNum, smallPrime) << std::endl;
uint32_t bigPrime = 4294967291ul; // 2^32-5 is largest prime < 2^32
uint32_t bigNum = 1ul << 24;
uint32_t halfBigNum = bigNum >> 1;
std::cout << nCrModp(bigNum, halfBigNum, bigPrime) << std::endl;
}
This should produce results for any set of 32-bit inputs if you are willing to wait. To prove a point, I've included the calculation for a 24-bit n and the largest 32-bit prime. My modest PC took ~13 seconds to calculate this. Check the answer against Wolfram Alpha, but beware that it may exceed the 'standard computation time' there.
There is still room for improvement if p is much smaller than (n-r) where r <= n-r. For example, we could precalculate all the inverses mod p instead of doing it on demand several times over.
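One way to precalculate them, sketched here under stated assumptions rather than taken from the answer: writing p = (p / i) * i + (p % i) and reducing mod p gives i^-1 ≡ -(p / i) * (p % i)^-1 (mod p), and since p % i < i, every inverse up to a limit follows from a smaller one in O(1). The sketch assumes p is prime, limit < p and p < 2^32; inverse_table is a hypothetical name. When p is small enough to tabulate, the inverseModp calls could then become table lookups.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// inv[i] = i^(-1) mod p for 1 <= i <= limit, in O(limit) total.
// From p = (p / i) * i + (p % i):  i^(-1) = -(p / i) * (p % i)^(-1)  (mod p).
// Sketch; assumes p is prime, limit < p and p < 2^32 (products fit in 64 bits).
std::vector<std::uint64_t> inverse_table(std::uint64_t limit, std::uint64_t p) {
    assert(limit < p);
    std::vector<std::uint64_t> inv(limit + 1, 0);
    if (limit >= 1) inv[1] = 1;
    for (std::uint64_t i = 2; i <= limit; ++i)
        inv[i] = (p - (p / i) * inv[p % i] % p) % p;  // p % i < i: already known
    return inv;
}
```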
nCr = n! / (r! * (n-r)!) {! = factorial}
Now use whichever of r and n - r is smaller.
#include <algorithm>
#include <cstdio>
#define MOD 10000007
int main()
{
int n, r, i, x = 1;
long long int res = 1;
scanf("%d%d", &n, &r);
int mini = std::min(r, n - r); // minimum of r, n-r
for (i = n; i > n - mini; i--) { // only the mini largest factors
res = (res * i) / x; // exact: res becomes C(n, x)
x++;
}
printf("%lld\n", res % MOD);
return 0;
}
It will work for most cases in programming competitions, as long as n and r are not too large.
Time complexity: O(min(r, n - r))
Limitation: in languages like C/C++, the intermediate product res * i overflows a 64-bit integer once n exceeds roughly 60 (for r near n/2), since no built-in type can hold it.
The expansion of nCr can always be reduced to a product of integers by cancelling out the terms of the denominator. This approach is applied in the function given below.
This function has a time complexity of O(n^2 * log(n)) and calculates nCr % m for n <= 10000 in under 1 second.
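#include <algorithm>
#include <numeric>
#include <vector>
const int M = 1e7 + 7;
// nCr % M by cancelling every denominator factor against the numerator
// (std::gcd needs C++17; the original used the GCC-specific __gcd)
int ncr(int n, int r)
{
r = std::min(r, n - r);
std::vector<int> A(r), B(r);
std::iota(A.begin(), A.end(), n - r + 1); // A = n-r+1, ..., n
std::iota(B.begin(), B.end(), 1); // B = 1, ..., r
for (int i = 0; i < r; i++)
for (int j = 0; j < r; j++)
{
if (B[i] == 1)
break;
int g = std::gcd(B[i], A[j]);
A[j] /= g;
B[i] /= g;
}
long long ans = 1;
for (int i = 0; i < r; i++)
ans = (ans * A[i]) % M;
return ans;
}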
I want an efficient implementation of Faulhaber's formula.
I want the answer as
F(N,K) % P
where F(N,K) is Faulhaber's formula (the sum 1^K + 2^K + ... + N^K) and P is a prime number.
Note: N is very large, up to 10^16, and K is up to 3000.
I tried the double series implementation on the given site, but it is too slow for very large N and K. Can anyone help make this implementation more efficient, or describe another way to implement the formula?
How about using Schultz' (1980) idea, outlined below the double series implementation (mathworld.wolfram.com/PowerSum.html) that you mentioned?
From Wolfram MathWorld:
Schultz (1980) showed that the sum S_p(n) can be found by writing
S_p(n) = \sum_{i=1}^{p+1} a_i n^i
and solving the system of p+1 equations
\sum_{i=j+1}^{p+1} (-1)^{i-j+1} \binom{i}{j} a_i = \delta_{j,p}
obtained for j=0, 1, ..., p (Guo and Qi 1999), where \delta_{j,p} is the Kronecker delta.
Below is an attempt in Haskell that seems to work. It returns a result for n=10^16, p=1000 in about 36 seconds on my old laptop PC.
{-# OPTIONS_GHC -O2 #-}
import Math.Combinatorics.Exact.Binomial
import Data.Ratio
import Data.List (foldl')
kroneckerDelta a b | a == b = 1 % 1
| otherwise = 0 % 1
g a b = ((-1)^(a - b +1) * choose a b) % 1
coefficients :: Integral a => a -> a -> [Ratio a] -> [Ratio a]
coefficients p j cs
| j < 0 = cs
| otherwise = coefficients p (j - 1) (c:cs)
where
c = f / g (j + 1) j
f = foldl h (kroneckerDelta j p) (zip [j + 2..p + 1] cs)
h accum (i,cf) = accum - g i j * cf
trim r = let n = numerator r
d = denominator r
l = div n d
in (mod l (10^9 + 7),(n - d * l) % d)
s n p = numerator (a % 1 + b) where
(a,b) = foldl' (\(i',r') (i,r) -> (mod (i' + i) (10^9 + 7),r' + r)) (0,0)
(zipWith (\c i -> trim (c * n^i)) (coefficients p p []) [1..p + 1])
main = print (s (10^16) 1000)
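Since the question only asks for F(N, K) mod P, Schultz's system can also be solved directly in modular arithmetic: equation j only involves a_{j+1}, ..., a_{K+1}, and the coefficient of a_{j+1} is C(j+1, j) = j + 1, so the matrix is triangular and back-substitution takes O(K^2) operations with no rationals at all. The C++ sketch below follows that idea; it is not the Haskell answer's code. It assumes P is a prime with K + 1 < P < 2^32, and power_sum_mod and pow_mod_p are illustrative names. Reducing N modulo P first is valid under that assumption, because every division in the back-substitution is by some j + 1 <= K + 1, which is invertible mod P.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// b^e mod P by square-and-multiply; assumes P < 2^32 so products fit in 64 bits.
u64 pow_mod_p(u64 b, u64 e, u64 P) {
    u64 r = 1 % P;
    b %= P;
    while (e > 0) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

// 1^K + 2^K + ... + N^K (mod P) from Schultz's system
//   sum_{i=j+1}^{K+1} (-1)^(i-j+1) C(i, j) a_i = delta(j, K),  j = 0..K,
// solved by back-substitution directly mod P.
u64 power_sum_mod(u64 N, unsigned K, u64 P) {
    assert(P > K + 1 && P < (u64(1) << 32));
    const unsigned top = K + 1;  // degree of the polynomial
    // factorials and inverse factorials mod P for the binomials C(i, j), i <= top
    std::vector<u64> fact(top + 1), inv_fact(top + 1);
    fact[0] = 1;
    for (unsigned i = 1; i <= top; ++i) fact[i] = fact[i - 1] * i % P;
    inv_fact[top] = pow_mod_p(fact[top], P - 2, P);  // Fermat inverse
    for (unsigned i = top; i > 0; --i) inv_fact[i - 1] = inv_fact[i] * i % P;
    auto binom = [&](unsigned i, unsigned j) {
        return fact[i] * inv_fact[j] % P * inv_fact[i - j] % P;
    };

    // a[i] is the coefficient of n^i. Equation j starts with (j + 1) * a[j + 1],
    // so solving j = K, K-1, ..., 0 yields a[K+1], a[K], ..., a[1].
    std::vector<u64> a(top + 1, 0);
    for (unsigned j = top; j-- > 0;) {
        u64 rhs = (j == K) ? 1 : 0;  // Kronecker delta
        for (unsigned i = j + 2; i <= top; ++i) {
            u64 term = binom(i, j) * a[i] % P;
            // move (-1)^(i-j+1) * term to the right-hand side
            if ((i - j + 1) % 2 == 0)
                rhs = (rhs + P - term) % P;
            else
                rhs = (rhs + term) % P;
        }
        a[j + 1] = rhs * pow_mod_p(j + 1, P - 2, P) % P;
    }

    // evaluate the polynomial at N mod P
    u64 n = N % P, n_pow = 1, sum = 0;
    for (unsigned i = 1; i <= top; ++i) {
        n_pow = n_pow * n % P;
        sum = (sum + a[i] * n_pow) % P;
    }
    return sum;
}
```

The work is dominated by roughly K^2/2 binomial evaluations, independent of the size of N.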
I've discovered my own algorithm to calculate the coefficients of the polynomial obtained from Faulhaber's formula; it, its proof and several implementations can be found at github.com/fcard/PolySum. This question inspired me to include a c++ implementation (using the GMP library for arbitrary precision numbers), which, as of the time of writing and minus several usability features, is:
#include <gmpxx.h>
#include <vector>
namespace polysum {
typedef std::vector<mpq_class> mpq_row;
typedef std::vector<mpq_class> mpq_column;
typedef std::vector<mpq_row> mpq_matrix;
mpq_matrix make_matrix(size_t n) {
mpq_matrix A(n+1, mpq_row(n+2, 0));
A[0] = mpq_row(n+2, 1);
for (size_t i = 1; i < n+1; i++) {
for (size_t j = i; j < n+1; j++) {
A[i][j] += A[i-1][j];
A[i][j] *= (j - i + 2);
}
A[i][n+1] = A[i][n-1];
}
A[n][n+1] = A[n-1][n+1];
return A;
}
void reduced_row_echelon(mpq_matrix& A) {
size_t n = A.size() - 1;
for (size_t i = n; i+1 > 0; i--) {
A[i][n+1] /= A[i][i];
A[i][i] = 1;
for (size_t j = i-1; j+1 > 0; j--) {
auto p = A[j][i];
A[j][i] = 0;
A[j][n+1] -= A[i][n+1] * p;
}
}
}
mpq_column sum_coefficients(size_t n) {
auto A = make_matrix(n);
reduced_row_echelon(A);
mpq_column result;
for (auto row: A) {
result.push_back(row[n+1]);
}
return result;
}
}
We can use the above like so:
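#include <gmpxx.h>
#include "polysum.h"
// N is unsigned long (64 bits on typical Linux) so that N = 1e16 fits
mpq_class power_sum(size_t K, unsigned long N) {
auto coeffs = polysum::sum_coefficients(K);
mpq_class result(0);
mpz_class n_pow(1);
for (size_t i = 0; i <= K; i++) {
n_pow *= N; // exact N^(i+1), instead of the lossy floating-point pow()
result += coeffs[i] * mpq_class(n_pow);
}
return result;
}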
The full implementation provides a Polynomial class that is printable and callable, as well as a polysum function to construct one as a sum of another polynomial.
#include <iostream>
#include "polysum.h"
void power_sum_print(size_t K, unsigned int N) {
auto F = polysum::polysum(K);
std::cout << "polynomial: " << F << "\n";
std::cout << "result: " << F(N) << "\n";
}
As for efficiency, the above calculates the result for K=1000 and N=1e16 in about 1.75 seconds on my computer, compared to the much more mature and optimized SymPy implementation, which takes about 90 seconds on the same machine, and Mathematica, which takes 30 seconds. For K=3000 the above takes about 4 minutes and Mathematica took almost 20 minutes (but uses much less memory); I left SymPy running all night but it didn't finish, maybe because it ran out of memory.
Among the optimizations that can be done here are making the matrix sparse and taking advantage of the fact that only half of the rows and columns need to be calculated. The Rust version in the linked repository implements the sparse and rows optimizations and takes about 0.7 seconds to calculate K=1000 and about 45 seconds to calculate K=3000 (using 105 MB and 2.9 GB of memory respectively). The Haskell version implements all three optimizations (sparse matrix, half the rows, half the columns) and takes about 1 second for K=1000 and about 34 seconds for K=3000 (using 60 MB and 880 MB of memory respectively). The completely unoptimized Python implementation takes about 12 seconds for K=1000 but runs out of memory for K=3000.
It's looking like this method is the fastest regardless of the language used, but the research is ongoing. Since Schultz's method also boils down to solving a system of n+1 equations and should be able to be optimized the same way, it will depend on whether his matrix is faster to calculate or not. Also, memory usage is not scaling well at all, and Mathematica is still the clear winner here, using only 80 MB for K=3000. We'll see.
Cheers,
I know you can get the number of combinations with the following formula (without repetition, and order is not important):
// Choose r from n
n! / (r!(n - r)!)
However, I don't know how to implement this in C++, since for instance with
n = 52
n! = 8.0658175170943878571660636856404e+67
the number gets way too big even for unsigned __int64 (or unsigned long long). Is there some workaround to implement the formula without any third-party "bigint" libraries?
Here's an ancient algorithm which is exact and doesn't overflow unless the result is too big for a long long:
unsigned long long
choose(unsigned long long n, unsigned long long k) {
if (k > n) {
return 0;
}
unsigned long long r = 1;
for (unsigned long long d = 1; d <= k; ++d) {
r *= n--;
r /= d;
}
return r;
}
This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.
UPDATE: There's a small possibility that the algorithm will overflow on the line:
r *= n--;
for very large n. A naive upper bound is sqrt(std::numeric_limits<unsigned long long>::max()), which means an n less than roughly 4,000,000,000.
Regarding the algorithm in Andreas' answer above:
Consider n == 67 and k == 33. The above algorithm overflows with a 64 bit unsigned long long. And yet the correct answer is representable in 64 bits: 14,226,520,737,620,288,370. And the above algorithm is silent about its overflow, choose(67, 33) returns:
8,829,174,638,479,413
A believable but incorrect answer.
However the above algorithm can be slightly modified to never overflow as long as the final answer is representable.
The trick is in recognizing that at each iteration, the division r/d is exact. Temporarily rewriting:
r = r * n / d;
--n;
For this to be exact, it means if you expanded r, n and d into their prime factorizations, then one could easily cancel out d, and be left with a modified value for n, call it t, and then the computation of r is simply:
// compute t from r, n and d
r = r * t;
--n;
A fast and easy way to do this is to find the greatest common divisor of r and d, call it g:
unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
--n;
Now we can do the same thing with d_temp and n (find the greatest common divisor). However, since we know a priori that r * n / d is exact, we also know that gcd(d_temp, n) == d_temp, and therefore we don't need to compute it. So we can divide n by d_temp:
unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
// now one can divide n by d/g without truncation
unsigned long long t = n / d_temp;
r = r * t;
--n;
Cleaning up:
#include <limits>
#include <stdexcept>
unsigned long long
gcd(unsigned long long x, unsigned long long y)
{
while (y != 0)
{
unsigned long long t = x % y;
x = y;
y = t;
}
return x;
}
unsigned long long
choose(unsigned long long n, unsigned long long k)
{
if (k > n)
throw std::invalid_argument("invalid argument in choose");
unsigned long long r = 1;
for (unsigned long long d = 1; d <= k; ++d, --n)
{
unsigned long long g = gcd(r, d);
r /= g;
unsigned long long t = n / (d / g);
if (r > std::numeric_limits<unsigned long long>::max() / t)
throw std::overflow_error("overflow in choose");
r *= t;
}
return r;
}
Now you can compute choose(67, 33) without overflow. And if you try choose(68, 33), you'll get an exception instead of a wrong answer.
The following routine computes n-choose-k using the recursive definition and memoization. It is exact whenever the result fits in 64 bits (no memoized entry exceeds the final result), though it allocates an n*n table:
#include <algorithm>
#include <cstddef>
inline unsigned long long n_choose_k(const unsigned long long& n,
const unsigned long long& k)
{
if (n < k) return 0;
if (0 == k) return 1;
if (n == k) return 1;
if (1 == k) return n;
typedef unsigned long long value_type;
value_type* table = new value_type[static_cast<std::size_t>(n * n)];
std::fill_n(table,n * n,0);
class n_choose_k_impl
{
public:
n_choose_k_impl(value_type* table,const value_type& dimension)
: table_(table),
dimension_(dimension)
{}
inline value_type& lookup(const value_type& n, const value_type& k)
{
return table_[dimension_ * n + k];
}
inline value_type compute(const value_type& n, const value_type& k)
{
if ((0 == k) || (k == n))
return 1;
value_type v1 = lookup(n - 1,k - 1);
if (0 == v1)
v1 = lookup(n - 1,k - 1) = compute(n - 1,k - 1);
value_type v2 = lookup(n - 1,k);
if (0 == v2)
v2 = lookup(n - 1,k) = compute(n - 1,k);
return v1 + v2;
}
value_type* table_;
value_type dimension_;
};
value_type result = n_choose_k_impl(table,n).compute(n,k);
delete [] table;
return result;
}
Remember that
n! / (n - r)! = n * (n - 1) * ... * (n - r + 1)
so it's much smaller than n!. So the solution is to evaluate n * (n - 1) * ... * (n - r + 1) (and then divide by r!) instead of first calculating n! and then dividing it.
Of course it all depends on the relative magnitude of n and r - if r is relatively big compared to n, then it still won't fit.
Well, I have to answer my own question. I was reading about Pascal's triangle and noticed by accident that we can calculate the number of combinations with it:
#include <iostream>
#include <boost/cstdint.hpp>
boost::uint64_t Combinations(unsigned int n, unsigned int r)
{
if (r > n)
return 0;
if (r == 0)
return 1;
/** We can use Pascal's triangle to determine the number
* of combinations. Each value in a row follows from the previous one:
*
* v(r) = v(r - 1) * (n - r + 1) / r
*
* Since the triangle is symmetrical, we only need to calculate
* up to column r.
*/
boost::uint64_t v = n--;
for (unsigned int i = 2; i < r + 1; ++i, --n)
v = v * n / i;
return v;
}
int main()
{
std::cout << Combinations(52, 5) << std::endl;
}
Getting the prime factorization of the binomial coefficient is probably the most efficient way to calculate it, especially if multiplication is expensive. This is certainly true of the related problem of calculating factorials (see here for an example).
Here is a simple algorithm based on the Sieve of Eratosthenes that calculates the prime factorization. The idea is basically to go through the primes as you find them using the sieve, but then also to calculate how many of their multiples fall in the ranges [1, k] and [n-k+1,n]. The Sieve is essentially an O(n \log \log n) algorithm, but there is no multiplication done. The actual number of multiplications necessary once the prime factorization is found is at worst O\left(\frac{n \log \log n}{\log n}\right) and there are probably faster ways than that.
prime_factors = []
n = 20
k = 10
composite = [True] * 2 + [False] * n
for p in range(n + 1):
    if composite[p]:
        continue
    q = p
    m = 1
    total_prime_power = 0
    prime_power = [0] * (n + 1)
    while True:
        # exponent of p in q = p * m is one more than its exponent in m
        prime_power[q] = prime_power[m] + 1
        if q <= k:
            total_prime_power -= prime_power[q]
        if q > n - k:
            total_prime_power += prime_power[q]
        m += 1
        q += p
        if q > n:
            break
        composite[q] = True
    prime_factors.append([p, total_prime_power])
print(prime_factors)
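The same exponents can also be read off from Legendre's formula, e_p = sum over i >= 1 of (floor(n/p^i) - floor(k/p^i) - floor((n-k)/p^i)); this is a swapped-in technique, not the answer's walk over multiples. A C++ sketch that sieves the primes up to n and multiplies out the result (binom_by_factorization is an illustrative name). Because every partial product divides C(n, k), it never overflows as long as C(n, k) itself fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// C(n, k) from its prime factorization, with exponents from Legendre's formula.
std::uint64_t binom_by_factorization(std::uint64_t n, std::uint64_t k) {
    if (k > n) return 0;
    std::vector<bool> composite(n + 1, false);
    std::uint64_t result = 1;
    for (std::uint64_t p = 2; p <= n; ++p) {
        if (composite[p]) continue;          // p is prime
        for (std::uint64_t q = p * p; q <= n; q += p) composite[q] = true;
        // exponent of p in n! / (k! (n-k)!)
        std::uint64_t e = 0;
        for (std::uint64_t pk = p; pk <= n; pk *= p) {
            e += n / pk - k / pk - (n - k) / pk;
            if (pk > n / p) break;           // pk * p would exceed n
        }
        for (std::uint64_t i = 0; i < e; ++i) result *= p;  // stays <= C(n, k)
    }
    return result;
}
```

For example, it returns 184756 for n = 20, k = 10, the same case the Python script factors, and handles C(67, 33) without overflow.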
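Using a dirty trick with a long double, it is possible to match the accuracy of Howard Hinnant's answer for many inputs, provided long double has more precision than double on your platform (it does not everywhere):
#include <cmath>
unsigned long long n_choose_k(int n, int k)
{
long double f = 1;
for (int i = 1; i <= k; i++)
f /= i; // f = 1 / k!
for (int i = 0; i < k; i++)
f *= n - i; // f = n (n-1) ... (n-k+1) / k!
unsigned long long f_2 = std::round(f);
return f_2;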
}
The idea is to divide first by k! and then to multiply by n(n-1)...(n-k+1). The approximation through the double can be avoided by inverting the order of the for loop.
This improves Howard Hinnant's answer (in this question) a little bit:
Calling gcd() in every loop iteration seems a bit slow.
We could aggregate the gcd() calls into a single call at the end, while making the most use of the standard algorithm from Knuth's book "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms":
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
// gcd() is the function defined in Howard Hinnant's answer
const uint64_t u64max = std::numeric_limits<uint64_t>::max();
uint64_t choose(uint64_t n, uint64_t k)
{
if (k > n)
throw std::invalid_argument(std::string("invalid argument in ") + __func__);
if (k > n - k)
k = n - k;
uint64_t r = 1;
uint64_t d;
for (d = 1; d <= k; ++d) {
if (r > u64max / n)
break;
r *= n--;
r /= d;
}
if (d > k)
return r;
// Let N be the original n,
// n is the current n (when we reach here)
// We want to calculate C(N,k),
// Currently we already calculated the r value so far:
// r = C(N, n) = C(N, N-n) = C(N, d-1)
// Note that N-n = d-1
// In addition we know the following identity formula:
// C(N,k) = C(N,d-1) * C(N-d+1, k-d+1) / C(k, k-d+1)
// = C(N,d-1) * C(n, k-d+1) / C(k, k-d+1)
// Using this formula, we effectively reduce the calculation,
// while recursively use the same function.
uint64_t b = choose(n, k-d+1);
if (b == u64max) {
return u64max; // overflow
}
uint64_t c = choose(k, k-d+1);
if (c == u64max) {
return u64max; // overflow
}
// Now, the combinatorial should be r * b / c
// We can use gcd() to calculate this:
// We Pick b for gcd: b < r almost (if not always) in all cases
uint64_t g = gcd(b, c);
b /= g;
c /= g;
r /= c;
if (r > u64max / b)
return u64max; // overflow
return r * b;
}
Note that the recursion depth is normally 2 (I haven't seen a case that goes to 3; the reduction is quite effective), i.e. choose() is called 3 times in non-overflow cases.
Replace uint64_t with unsigned long long if you prefer it.
One of the shortest ways (though it takes exponential time, so it is only practical for small n):
int nChoosek(int n, int k){
if (k > n) return 0;
if (k == 0) return 1;
return nChoosek(n - 1, k) + nChoosek(n - 1, k - 1);
}
If you want to be 100% sure that no overflows occur so long as the final result is within the numeric limit, you can sum up Pascal's Triangle row-by-row:
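// assumes prev_row and current_row are std::vector<unsigned long long>
// of size n + 1, initialized to 0
if (r > n - r) r = n - r; // symmetry: keeps every computed entry <= C(n, r)
for (int i = 0; i <= n; i++) {
for (int j = 0; j <= std::min(i, r); j++) {
if (j == 0) current_row[j] = 1;
else current_row[j] = prev_row[j] + prev_row[j-1];
}
prev_row = current_row;
}
// result C(n, r) is now in current_row[r]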
However, this algorithm is much slower than the multiplication one. So perhaps you could use multiplication to generate all the cases you know that are 'safe' and then use addition from there. (.. or you could just use a BigInt library).