I would like to write this formula in C++ language:
(2<=n<=1e5), (1<=k<=n), (2<=M<=1e9).
I would like to do this without using special structures.
Unfortunately this formula has many special cases that make the modular computation difficult. For example, ((n-k)!) mod M can be equal to 0, or ((n-1)(n-2))/4 may not be an integer. I will be very grateful for any help.
(n−1)!/(n−k)! can be handled by computing the product (n−k+1)…(n−1).
(n−1)!·(n−1)(n−2)/4 can be handled by treating n ≤ 2 (where that factor is 0) and n ≥ 3 (where the term equals (3…(n−1))·(n−1)(n−2)/2, since 3…(n−1) = (n−1)!/2) separately.
Untested C++:
#include <cassert>
#include <cstdint>
class Residue {
public:
// Accept int64_t for convenience.
explicit Residue(int64_t rep, int32_t modulus) : modulus_(modulus) {
assert(modulus > 0);
rep_ = rep % modulus;
if (rep_ < 0)
rep_ += modulus;
}
// Return int64_t for convenience.
int64_t rep() const { return rep_; }
int32_t modulus() const { return modulus_; }
private:
int32_t rep_;
int32_t modulus_;
};
Residue operator+(Residue a, Residue b) {
assert(a.modulus() == b.modulus());
return Residue(a.rep() + b.rep(), a.modulus());
}
Residue operator-(Residue a, Residue b) {
assert(a.modulus() == b.modulus());
return Residue(a.rep() - b.rep(), a.modulus());
}
Residue operator*(Residue a, Residue b) {
assert(a.modulus() == b.modulus());
return Residue(a.rep() * b.rep(), a.modulus());
}
Residue QuotientOfFactorialsMod(int32_t a, int32_t b, int32_t modulus) {
assert(modulus > 0);
assert(b >= 0);
assert(a >= b);
Residue result(1, modulus);
// Don't initialize with b + 1 because it could overflow.
for (int32_t i = b; i < a; i++) {
result = result * Residue(i + 1, modulus);
}
return result;
}
Residue FactorialMod(int32_t a, int32_t modulus) {
assert(modulus > 0);
assert(a >= 0);
return QuotientOfFactorialsMod(a, 0, modulus);
}
Residue Triangular(int32_t a, int32_t modulus) {
assert(modulus > 0);
return Residue((static_cast<int64_t>(a) + 1) * a / 2, modulus);
}
Residue F(int32_t n, int32_t k, int32_t m) {
assert(n >= 2);
assert(n <= 100000);
assert(k >= 1);
assert(k <= n);
assert(m >= 2);
assert(m <= 1000000000);
Residue n_res(n, m);
Residue n_minus_1(n - 1, m);
Residue n_minus_2(n - 2, m);
Residue k_res(k, m);
Residue q = QuotientOfFactorialsMod(n - 1, n - k, m);
return q * (k_res - n_res) * n_minus_1 +
(FactorialMod(n - 1, m) - q) * k_res * n_minus_1 +
(n > 2 ? QuotientOfFactorialsMod(n - 1, 2, m) *
(n_res * n_minus_1 + Triangular(n - 2, m))
: Residue(1, m));
}
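A possible driver for the snippet above, just to show how it might be called (my addition, not part of the original answer):
#include <iostream>
// Assumes the Residue class and F() defined above are in scope
// (and therefore <cstdint> and <cassert> are already included).
int main() {
  int32_t n, k, m;
  std::cin >> n >> k >> m;
  std::cout << F(n, k, m).rep() << "\n";  // rep() is already reduced into [0, m)
  return 0;
}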
As mentioned in the other answer, the quotient of factorials can be evaluated directly, without division. You also need 64-bit arithmetic to store the intermediate results, and you should reduce modulo M after each multiplication; otherwise you would need huge numbers, which would take forever to compute.
You also mention that ((n-1)(n-2))/4 can be a non-integer; how to deal with that is questionable, as we do not have any context for what you are doing. However, you can move a factor of /2 outside the brackets and apply it to (n-1)! (compute the product without the factor 2, as modpi(3,n-1,m); beware not to divide the already-reduced factorial!). Then there is no remainder, since (n-1)*(n-2)/4 becomes (n-1)*(n-2)/2 and (n-1)*(n-2) is always even (divisible by 2). The only "problem" is n=2, because n*(n-1)/2 is 1 but the /2 moved outside the brackets would round (n-1)! down, so you should handle that as a special case by not moving the /2 outside the brackets (not included in the code below).
I see it like this:
typedef unsigned __int64 u64;
u64 modpi(u64 x0,u64 x1,u64 p) // ( x0*(x0+1)*(x0+2)*...*x1 ) mod p
{
u64 x,y;
if (x0>x1){ x=x0; x0=x1; x1=x; }
for (y=1,x=x0;x<=x1;x++){ y*=x; y%=p; }
return y;
}
int main()
{
u64 n=100,k=20,m=123456789,a,b,b2,c,y;
a =modpi(n-k+1,n-1,m); // (n-1)!/(n-k)!
b =modpi(1,n-1,m); // (n-1)! mod m
b2=modpi(3,n-1,m); // (n-1)!/2 mod m
c =((n*(n-1)))%m; // 2*( n*(n-1)/2 + (n-1)*(n-2)/4 ) mod m
c+=(((n-1)*(n-2))/2)%m;
y =(((a*(k-n))%m)*(n-1))%m; // ((n-1)!/(n-k)!)*(k-n)*(n-1) mod m
y+=b; // (n-1)! mod m
y-=(((a*k)%m)*(n-1))%m; // ((n-1)!/(n-k)!)*k*(n-1) mod m
y+=(b2*c)%m; // (n-1)!*( n*(n-1)/2 + (n-1)*(n-2)/4 ) mod m
// here y should hold your answer
}
However, be careful: older compilers do not have full support for 64-bit integers and can produce wrong results or even fail to compile. In such a case use a big-integer library, or compute with two 32-bit variables, or look for a 32-bit modmul implementation.
The expression implies the use of a floating point type. Therefore, use the function fmod to get the remainder of the division.
How to calculate a!/(b1! b2! ... bm!) modulo p, where p is a prime number? The factorials of a and the b's can be very big (long long int is not sufficient), so I need to work modulo p.
If a, the b's and p are fairly small, prefer @KellyBundy's approach of cancelling factors, or counting prime factors.
Multiplication and modular arithmetic
Given integers m and n and some other integer k:
(m * n) mod k = ((m mod k) * (n mod k)) mod k
This allows a large product to be calculated modulo p without worrying about overflow, since we can always keep the arguments in the range [0, k).
For example to compute the factorial a! modulo k, in python:
def fact(a, k):
    if a == 0:
        return 1
    else:
        return ((a % k) * fact(a - 1, k)) % k
Division and modular arithmetic
If p is a prime then for any integer n that is not divisible by p, we can find an integer which I'll call inv(n) such that:
(n * inv(n)) modulo p = 1
This number is called the modular inverse of n. There are various algorithms to find modular inverses, which I won't describe here (but see e.g. here).
Now, given integers n and m, and assuming that m / n is an integer, we can apply the rule:
(m / n) modulo p = (m * inv(n)) modulo p
So provided we can calculate modular inverses, we can convert division to multiplication, and then apply the previous rule.
Another way, listing the factors 1 to a, then canceling with all divisors, then multiplying modulo p:
#include <iostream>
#include <vector>
int gcd(int a, int b) {
return b ? gcd(b, a % b) : a;
}
int main() {
int a = 60;
std::vector<int> bs = {13, 7, 19};
int p = 10007;
std::vector<int> factors(a);
for (int i=0; i<a; i++)
factors[i] = i + 1;
for (int b : bs) {
while (b > 1) {
int d = b--;
for (int& f : factors) {
int g = gcd(f, d);
f /= g;
d /= g;
}
}
}
int result = 1;
for (int f : factors)
result = result * f % p;
std::cout << result;
}
Prints 5744, same as this Python code:
from math import factorial, prod
a = 60
bs = [13, 7, 19]
p = 10007
num = factorial(a)
den = prod(map(factorial, bs))
print(num // den % p)
I want a function
int rounded_division(const int a, const int b) {
return round(1.0 * a/b);
}
So we have, for example,
rounded_division(3, 2) // = 2
rounded_division(2, 2) // = 1
rounded_division(1, 2) // = 1
rounded_division(0, 2) // = 0
rounded_division(-1, 2) // = -1
rounded_division(-2, 2) // = -1
rounded_division(-3, -2) // = 2
Or in code, where a and b are 32 bit signed integers:
int rounded_division(const int a, const int b) {
return ((a < 0) ^ (b < 0)) ? ((a - b / 2) / b) : ((a + b / 2) / b);
}
And here comes the tricky part: how to implement this efficiently (without using larger 64-bit values) and without conditional or logical operators such as ?:, &&, ...? Is it possible at all?
The reason I want to avoid such operators is that the processor I have to implement this function for has no conditional instructions (more about missing conditional instructions on ARM).
a/b + a%b/(b/2 + b%2) works quite well - it has not failed in over a billion test cases. It meets all of the OP's goals: no overflow, no long long, no branching, and it works over the entire range of int whenever a/b is defined.
No 32-bit dependency. If using C99 or later, no reliance on implementation-defined behavior.
int rounded_division(int a, int b) {
int q = a / b;
int r = a % b;
return q + r/(b/2 + b%2);
}
This works with two's complement, ones' complement and sign-magnitude, as all the operations involved are purely mathematical.
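A quick sanity check against the examples from the question (my own test harness, not part of the answer; it relies on C99/C++11 truncated integer division):
#include <cstdio>

int rounded_division(int a, int b) {
    int q = a / b;
    int r = a % b;
    return q + r / (b / 2 + b % 2);
}

int main(void) {
    // Expected output: 2 1 1 0 -1 -1 2, matching the question's examples.
    printf("%d %d %d %d %d %d %d\n",
           rounded_division(3, 2), rounded_division(2, 2),
           rounded_division(1, 2), rounded_division(0, 2),
           rounded_division(-1, 2), rounded_division(-2, 2),
           rounded_division(-3, -2));
    return 0;
}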
How about this:
int rounded_division(const int a, const int b) {
return (a + b/2 + b * ((a^b) >> 31))/b;
}
(a ^ b) >> 31 should evaluate to -1 if a and b have different signs and 0 otherwise, assuming int has 32 bits and the leftmost is the sign bit.
EDIT
As pointed out by @chux in his comments, this method is wrong due to integer division. This new version evaluates the same as the OP's example, but contains a few more operations.
int rounded_division(const int a, const int b) {
return (a + b * (1 + 2 * ((a^b) >> 31)) / 2)/b;
}
This version still however does not take into account the overflow problem.
What about
...
return ((a + (a*b)/abs(a*b) * b / 2) / b);
}
Without overflow:
...
return ((a + ((a/abs(a))*(b/abs(b))) * b / 2) / b);
}
This is a rough approach that you may use, applying a mask to subtract one when a*b < 0.
Please note that I did not test this appropriately.
int function(int a, int b){
int tmp = float(a)/b + 0.5;
int mask = (a*b) >> 31; // shift sign bit to set rest of the bits
return tmp - (1 & mask);//minus one if a*b was < 0
}
The following rounded_division_test1() meets OP's requirement of no branching - if one counts sign(int a), nabs(int a), and cmp_le(int a, int b) as non-branching. See here for ideas of how to do sign() without compare operators. These helper functions could be rolled into rounded_division_test1() without explicit calls.
The code demonstrates the correct functionality and is useful for testing various answers. When a/b is defined, this answer does not overflow.
#include <limits.h>
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
int nabs(int a) {
return (a < 0) * a - (a >= 0) * a;
}
int sign(int a) {
return (a > 0) - (a < 0);
}
int cmp_le(int a, int b) {
return (a <= b);
}
int rounded_division_test1(int a, int b) {
int q = a / b;
int r = a % b;
int flag = cmp_le(nabs(r), (nabs(b) / 2 + nabs(b % 2)));
return q + flag * sign(b) * sign(r);
}
// Alternative that uses long long
int rounded_division_test1LL(int a, int b) {
int c = (a^b)>>31;
return (a + (c*2 + 1)*1LL*b/2)/b;
}
// Reference code
int rounded_division(int a, int b) {
return round(1.0*a/b);
}
int test(int a, int b) {
int q0 = rounded_division(a, b);
//int q1 = function(a,b);
int q1 = rounded_division_test1(a, b);
if (q0 != q1) {
printf("%d %d --> %d %d\n", a, b, q0, q1);
fflush(stdout);
}
return q0 != q1;
}
void tests(void) {
int err = 0;
int const a[] = { INT_MIN, INT_MIN + 1, INT_MIN + 1, -3, -2, -1, 0, 1, 2, 3,
INT_MAX - 1, INT_MAX };
for (unsigned i = 0; i < sizeof a / sizeof a[0]; i++) {
for (unsigned j = 0; j < sizeof a / sizeof a[0]; j++) {
if (a[j] == 0) continue;
if (a[i] == INT_MIN && a[j] == -1) continue;
err += test(a[i], a[j]);
}
}
printf("Err %d\n", err);
}
int main(void) {
tests();
return 0;
}
Let me give my contribution:
What about:
int rounded_division(const int a, const int b) {
return a/b + (2*(a%b))/b;
}
No branch, no logical operators, only mathematical operators. But it could fail if b is greater than INT_MAX/2 or less than INT_MIN/2.
But if 64-bit arithmetic is allowed for computing the 32-bit rounding, it will not fail:
int rounded_division(const int a, const int b) {
return a/b + (2LL*(a%b))/b;
}
Code that I came up with for use on ARM M0 (no floating point, slow divide).
It only uses one divide instruction and no conditionals, but will overflow if numerator + (denominator/2) > INT_MAX.
Cycle count on ARM M0 = 7 cycles + the divide (the M0 has no divide instruction, so it is toolchain dependent).
int32_t Int32_SignOf(int32_t val)
{
return (+1 | (val >> 31)); // if v < 0 then -1, else +1
}
uint32_t Int32_Abs(int32_t val)
{
int32_t tmp = val ^ (val >> 31);
return (tmp - (val >> 31));
// the following code looks like it should be faster, using subexpression elimination
// except on arm a bitshift is free when performed with another operation,
// so it would actually end up being slower
// tmp = val >> 31;
// dst = val ^ (tmp);
// dst -= tmp;
// return dst;
}
int32_t Int32_DivRound(int32_t numerator, int32_t denominator)
{
// use the absolute (unsigned) denominator in the fudge value
// as the divide by 2 then becomes a bitshift
int32_t sign_num = Int32_SignOf(numerator);
uint32_t abs_denom = Int32_Abs(denominator);
return (numerator + sign_num * ((int32_t)(abs_denom / 2u))) / denominator;
}
Since the function seems to be symmetric, how about sign(a/b) * floor(abs(a/b) + 0.5)?
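A direct transcription of that one-liner might look like the sketch below (my illustration, not the answerer's code). It uses floating point, and std::copysign stands in for the sign() factor so there is no explicit branch:
#include <cmath>

// Sketch only: exact for 32-bit int inputs, since a, b and any half-integer
// quotient are representable in a double.
int rounded_division_fp(int a, int b) {
    double q = static_cast<double>(a) / b;
    return static_cast<int>(std::copysign(std::floor(std::fabs(q) + 0.5), q));
}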
This problem turns out to require calculating large binomial coefficients modulo a prime number using Lucas' theorem. Here's the solution to that problem using this technique: here.
Now my questions are:
It seems like my code fails when the input grows, due to overflow of the variables. Are there any ways to handle this?
Are there any ways to do this without using this theorem?
EDIT: note that as this is an OI/ACM-style problem, external libraries beyond the standard ones are not permitted.
Code below:
#include <iostream>
#include <string.h>
#include <stdio.h>
using namespace std;
#define N 100010
long long mod_pow(int a,int n,int p)
{
long long ret=1;
long long A=a;
while(n)
{
if (n & 1)
ret=(ret*A)%p;
A=(A*A)%p;
n>>=1;
}
return ret;
}
long long factorial[N];
void init(long long p)
{
factorial[0] = 1;
for(int i = 1;i <= p;i++)
factorial[i] = factorial[i-1]*i%p;
//for(int i = 0;i < p;i++)
//ni[i] = mod_pow(factorial[i],p-2,p);
}
long long Lucas(long long a,long long k,long long p)
{
long long re = 1;
while(a && k)
{
long long aa = a%p;long long bb = k%p;
if(aa < bb) return 0;
re = re*factorial[aa]*mod_pow(factorial[bb]*factorial[aa-bb]%p,p-2,p)%p;
a /= p;
k /= p;
}
return re;
}
int main()
{
int t;
cin >> t;
while(t--)
{
long long n,m,p;
cin >> n >> m >> p;
init(p);
cout << Lucas(n+m,m,p) << "\n";
}
return 0;
}
This solution assumes that p² fits into an unsigned long long. Since an unsigned long long has at least 64 bits per the standard, this works at least for p up to 4 billion, much more than the question specifies.
typedef unsigned long long num;
/* x such that a*x = 1 mod p */
num modinv(num a, num p)
{
/* implement this one on your own */
/* you can use the extended Euclidean algorithm */
}
/* n chose m mod p */
/* computed with the theorem of Lucas */
num modbinom(num n, num m, num p)
{
num i, result, divisor, n_, m_;
if (m == 0)
return 1;
/* check for the likely case that the result is zero */
if (n < m)
return 0;
for (n_ = n, m_ = m; m_ > 0; n_ /= p, m_ /= p)
if (n_ % p < m_ % p)
return 0;
for (result = 1; n >= p || m >= p; n /= p, m /= p) {
result *= modbinom(n % p, m % p, p);
result %= p;
}
/* avoid unnecessary computations */
if (m > n - m)
m = n - m;
divisor = 1;
for (i = 0; i < m; i++) {
result *= n - i;
result %= p;
divisor *= i + 1;
divisor %= p;
}
result *= modinv(divisor, p);
result %= p;
return result;
}
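The modinv() stub above is left as an exercise; one possible way to fill it in, following the extended Euclidean algorithm the comment mentions, is the sketch below (mine, not the original author's):
/* Iterative extended Euclidean algorithm. Assumes gcd(a, p) == 1, which holds
 * whenever p is prime and p does not divide a. "num" is the unsigned long long
 * typedef from the answer above; p is assumed to fit in a signed 64-bit value. */
num modinv(num a, num p)
{
    long long t = 0, newt = 1;
    long long r = (long long)p, newr = (long long)(a % p);
    while (newr != 0) {
        long long q = r / newr;
        long long tmp;
        tmp = t - q * newt; t = newt; newt = tmp;
        tmp = r - q * newr; r = newr; newr = tmp;
    }
    /* here r == gcd(a, p) == 1 and t is the Bezout coefficient of a */
    if (t < 0)
        t += (long long)p;
    return (num)t;
}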
An infinite precision integer seems like the way to go.
If you are in C++, the PicklingTools library has an "infinite precision" integer (similar to Python's long type). Someone else suggested Python; that's a reasonable answer if you know Python. If you want to do it in C++, you can use the int_n type:
#include "ocval.h"
int_n n="012345678910227836478627843";
n = n + 1; // Can combine with other plain ints as well
Take a look at the documentation at:
http://www.picklingtools.com/html/usersguide.html#c-int-n-and-the-python-arbitrary-size-ints-long
and
http://www.picklingtools.com/html/faq.html#c-and-otab-tup-int-un-int-n-new-in-picklingtools-1-2-0
The download for the C++ PicklingTools is here.
You want a bignum (a.k.a. arbitrary precision arithmetic) library.
First, don't write your own bignum (or bigint) library, because efficient algorithms (more efficient than the naive ones you learned at school) are difficult to design and implement.
Then, I would recommend GMPlib. It is free software, well documented, often used, quite efficient, and well designed (with perhaps some imperfections, in particular the inability to plug in your own memory allocator in place of the system malloc; but you probably don't care, unless you want to catch the rare out-of-memory condition ...). It has an easy C++ interface. It is packaged in most Linux distributions.
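For a concrete taste, a minimal sketch of computing a binomial coefficient with GMP's C++ interface might look like this (my example, not from the answer; it assumes gmpxx.h is available and that you link with -lgmpxx -lgmp):
#include <gmpxx.h>
#include <iostream>

// mpz_class grows as needed, so the intermediate products never overflow.
mpz_class choose(unsigned n, unsigned r) {
    mpz_class result = 1;
    for (unsigned d = 1; d <= r; ++d) {
        result *= n - d + 1;  // multiply by the next factor of n!/(n-r)!
        result /= d;          // this division is exact at every step
    }
    return result;
}

int main() {
    std::cout << choose(52, 5) << "\n";  // prints 2598960
    return 0;
}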
If it is a homework assignment, perhaps your teacher is expecting you to think more on the math, and find, with some proof, a way of solving the problem without any bignums.
Let's suppose that we need to compute a value of (a / b) mod p where p is a prime number. Since p is prime, every number b not divisible by p has an inverse mod p. So (a / b) mod p = (a mod p) * ((b mod p)^-1) mod p. We can use the extended Euclidean algorithm (or, as in the code below, Fermat's little theorem) to compute the inverse.
To get (n over k) we need to compute n! mod p, (k!)^-1, and ((n - k)!)^-1. The total time complexity is O(n).
UPDATE: Here is the code in C++. I didn't test it extensively, though.
#include <cassert>
#include <cstdint>
#include <vector>

int64_t fastPow(int64_t a, int64_t exp, int64_t mod)
{
    int64_t res = 1;
    while (exp)
    {
        if (exp % 2 == 1)
        {
            res *= a;
            res %= mod;
        }
        a *= a;
        a %= mod;
        exp >>= 1;
    }
    return res;
}

// This inverse works only for primes p, it uses Fermat's little theorem
int64_t inverse(int64_t a, int64_t p)
{
    assert(p >= 2);
    return fastPow(a, p - 2, p);
}

int64_t binomial(int64_t n, int64_t k, int64_t p)
{
    std::vector<int64_t> fact(n + 1);
    fact[0] = 1;
    for (auto i = 1; i <= n; ++i)
        fact[i] = (fact[i - 1] * i) % p;
    return ((((fact[n] * inverse(fact[k], p)) % p) * inverse(fact[n - k], p)) % p);
}
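A quick sanity check of the code above (my snippet, not the answerer's; it assumes the functions above are in scope):
#include <iostream>

int main() {
    const int64_t p = 1000000007;  // any prime modulus works here
    std::cout << binomial(10, 3, p) << "\n";  // C(10,3) = 120
    return 0;
}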
I want to find (n choose r) for large integers, and I also have to find out the mod of that number.
long long int choose(int a,int b)
{
if (b > a)
return (-1);
if(b==0 || a==1 || b==a)
return(1);
else
{
long long int r = ((choose(a-1,b))%10000007+(choose(a-1,b- 1))%10000007)%10000007;
return r;
}
}
I am using this piece of code, but I am getting TLE. If there is some other method to do that please tell me.
I don't have the reputation to comment yet, but I wanted to point out that the answer by rock321987 works pretty well:
It is fast and correct up to and including C(62, 31)
but cannot handle all inputs that have an output that fits in a uint64_t. As proof, try:
C(67, 33) = 14,226,520,737,620,288,370 (verify correctness and size)
Unfortunately, the other implementation spits out 8,829,174,638,479,413 which is incorrect. There are other ways to calculate nCr which won't break like this, however the real problem here is that there is no attempt to take advantage of the modulus.
Notice that this approach requires the modulus p to be prime (10000007 itself is not prime, since 10000007 = 941 × 10627, so presumably a prime such as 1000000007 was intended), which allows us to leverage the fact that every integer not divisible by p has an inverse mod p, and that inverse is unique. Furthermore, we can find that inverse quite quickly. Another question has an answer on how to do that here, which I've replicated below.
This is handy since:
x/y mod p == x*(y inverse) mod p; and
x*y mod p == ((x mod p) * (y mod p)) mod p
Modifying the other code a bit, and generalizing the problem we have the following:
#include <iostream>
#include <assert.h>
// p MUST be prime and less than 2^63
uint64_t inverseModp(uint64_t a, uint64_t p) {
assert(p < (1ull << 63));
assert(a < p);
assert(a != 0);
uint64_t ex = p-2, result = 1;
while (ex > 0) {
if (ex % 2 == 1) {
result = (result*a) % p;
}
a = (a*a) % p;
ex /= 2;
}
return result;
}
// p MUST be prime
uint32_t nCrModp(uint32_t n, uint32_t r, uint32_t p)
{
assert(r <= n);
if (r > n-r) r = n-r;
if (r == 0) return 1;
if(n/p - (n-r)/p > r/p) return 0;
uint64_t result = 1; //intermediary results may overflow 32 bits
for (uint32_t i = n, x = 1; i > r; --i, ++x) {
if( i % p != 0) {
result *= i % p;
result %= p;
}
if( x % p != 0) {
result *= inverseModp(x % p, p);
result %= p;
}
}
return result;
}
int main() {
uint32_t smallPrime = 17;
uint32_t medNum = 3001;
uint32_t halfMedNum = medNum >> 1;
std::cout << nCrModp(medNum, halfMedNum, smallPrime) << std::endl;
uint32_t bigPrime = 4294967291ul; // 2^32-5 is largest prime < 2^32
uint32_t bigNum = 1ul << 24;
uint32_t halfBigNum = bigNum >> 1;
std::cout << nCrModp(bigNum, halfBigNum, bigPrime) << std::endl;
}
Which should produce results for any set of 32-bit inputs if you are willing to wait. To prove a point, I've included the calculation for a 24-bit n, and the maximum 32-bit prime. My modest PC took ~13 seconds to calculate this. Check the answer against wolfram alpha, but beware that it may exceed the 'standard computation time' there.
There is still room for improvement if p is much smaller than (n-r) where r <= n-r. For example, we could precalculate all the inverses mod p instead of doing it on demand several times over.
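For example, when p is small enough that a table of p entries is cheap, all the inverses 1..p-1 can be produced in O(p) total with the classic recurrence inv[i] = -(p/i) * inv[p mod i] mod p. A sketch of that idea (mine, not part of the answer):
#include <cstdint>
#include <vector>

// Precompute the modular inverses of 1..p-1 for a prime p (p assumed < 2^32
// so the intermediate product below fits in 64 bits).
std::vector<uint64_t> precompute_inverses(uint64_t p) {
    std::vector<uint64_t> inv(p);
    inv[1] = 1;
    for (uint64_t i = 2; i < p; ++i)
        inv[i] = (p - (p / i) * inv[p % i] % p) % p;  // from p = (p/i)*i + p%i
    return inv;
}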
nCr = n! / (r! * (n-r)!) {! = factorial}
Now choose r or n - r, whichever is smaller:
#include <cstdio>
#include <cmath>
#define MOD 10000007
int main()
{
int n, r, i, x = 1;
long long int res = 1;
scanf("%d%d", &n, &r);
int mini = fmin(r, (n - r));//minimum of r,n-r
for (i = n;i > mini;i--) {
res = (res * i) / x;
x++;
}
printf("%lld\n", res % MOD);
return 0;
}
It will work for most cases required by programming competitions, as long as the values of n and r are not too high.
Time complexity: O(min(r, n - r))
Limitation: for languages like C/C++ there will be overflow if n > 60 (approximately), as no built-in integer type can store the final value.
The expansion of nCr can always be reduced to a product of integers, by canceling out the terms of the denominator. This approach is applied in the function given below.
This function has time complexity O(n^2 * log(n)). It will calculate nCr % M for n <= 10000 in under a second.
#include <numeric>
#include <algorithm>
using namespace std;   // for min, iota and the GCC extension __gcd

int M = 1e7 + 7;

int ncr(int n, int r)
{
    r = min(r, n - r);
    int A[r], i, j, B[r];
    iota(A, A + r, n - r + 1); // initializing A starting from n-r+1 to n
    iota(B, B + r, 1);         // initializing B starting from 1 to r
    int g;
    for (i = 0; i < r; i++)
        for (j = 0; j < r; j++)
        {
            if (B[i] == 1)
                break;
            g = __gcd(B[i], A[j]);
            A[j] /= g;
            B[i] /= g;
        }
    long long ans = 1;
    for (i = 0; i < r; i++)
        ans = (ans * A[i]) % M;
    return ans;
}
Cheers,
I know you can get the amount of combinations with the following formula (without repetition and order is not important):
// Choose r from n
n! / r!(n - r)!
However, I don't know how to implement this in C++, since for instance with
n = 52
n! = 8.0658175170943878571660636856404e+67
the number gets way too big even for unsigned __int64 (or unsigned long long). Is there some workaround to implement the formula without any third-party "bigint" libraries?
Here's an ancient algorithm which is exact and doesn't overflow unless the result is too big for a long long
unsigned long long
choose(unsigned long long n, unsigned long long k) {
if (k > n) {
return 0;
}
unsigned long long r = 1;
for (unsigned long long d = 1; d <= k; ++d) {
r *= n--;
r /= d;
}
return r;
}
This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.
UPDATE: There's a small possibility that the algorithm will overflow on the line:
r *= n--;
for very large n. A naive upper bound is sqrt(std::numeric_limits<long long>::max()), which means an n less than roughly 4,000,000,000.
From Andreas' answer:
Here's an ancient algorithm which is exact and doesn't overflow unless the result is too big for a long long
unsigned long long
choose(unsigned long long n, unsigned long long k) {
if (k > n) {
return 0;
}
unsigned long long r = 1;
for (unsigned long long d = 1; d <= k; ++d) {
r *= n--;
r /= d;
}
return r;
}
This algorithm is also in Knuth's "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms" I think.
UPDATE: There's a small possibility that the algorithm will overflow on the line:
r *= n--;
for very large n. A naive upper bound is sqrt(std::numeric_limits<long long>::max()), which means an n less than roughly 4,000,000,000.
Consider n == 67 and k == 33. The above algorithm overflows with a 64 bit unsigned long long. And yet the correct answer is representable in 64 bits: 14,226,520,737,620,288,370. And the above algorithm is silent about its overflow, choose(67, 33) returns:
8,829,174,638,479,413
A believable but incorrect answer.
However the above algorithm can be slightly modified to never overflow as long as the final answer is representable.
The trick is in recognizing that at each iteration, the division r/d is exact. Temporarily rewriting:
r = r * n / d;
--n;
For this to be exact, it means if you expanded r, n and d into their prime factorizations, then one could easily cancel out d, and be left with a modified value for n, call it t, and then the computation of r is simply:
// compute t from r, n and d
r = r * t;
--n;
A fast and easy way to do this is to find the greatest common divisor of r and d, call it g:
unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
--n;
Now we can do the same thing with d_temp and n (find the greatest common divisor). However since we know a-priori that r * n / d is exact, then we also know that gcd(d_temp, n) == d_temp, and therefore we don't need to compute it. So we can divide n by d_temp:
unsigned long long g = gcd(r, d);
// now one can divide both r and d by g without truncation
r /= g;
unsigned long long d_temp = d / g;
// now one can divide n by d/g without truncation
unsigned long long t = n / d_temp;
r = r * t;
--n;
Cleaning up:
unsigned long long
gcd(unsigned long long x, unsigned long long y)
{
while (y != 0)
{
unsigned long long t = x % y;
x = y;
y = t;
}
return x;
}
unsigned long long
choose(unsigned long long n, unsigned long long k)
{
if (k > n)
throw std::invalid_argument("invalid argument in choose");
unsigned long long r = 1;
for (unsigned long long d = 1; d <= k; ++d, --n)
{
unsigned long long g = gcd(r, d);
r /= g;
unsigned long long t = n / (d / g);
if (r > std::numeric_limits<unsigned long long>::max() / t)
throw std::overflow_error("overflow in choose");
r *= t;
}
return r;
}
Now you can compute choose(67, 33) without overflow. And if you try choose(68, 33), you'll get an exception instead of a wrong answer.
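For example (my test snippet; it assumes the gcd() and choose() above are in scope, together with <iostream>, <limits> and <stdexcept>):
int main() {
    std::cout << choose(67, 33) << "\n";  // prints 14226520737620288370
    try {
        std::cout << choose(68, 33) << "\n";
    } catch (const std::overflow_error&) {
        std::cout << "choose(68, 33) does not fit in 64 bits\n";
    }
    return 0;
}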
The following routine will compute the n-choose-k, using the recursive definition and memoization. The routine is extremely fast and accurate:
inline unsigned long long n_choose_k(const unsigned long long& n,
const unsigned long long& k)
{
if (n < k) return 0;
if (0 == n) return 0;
if (0 == k) return 1;
if (n == k) return 1;
if (1 == k) return n;
typedef unsigned long long value_type;
value_type* table = new value_type[static_cast<std::size_t>(n * n)];
std::fill_n(table,n * n,0);
class n_choose_k_impl
{
public:
n_choose_k_impl(value_type* table,const value_type& dimension)
: table_(table),
dimension_(dimension)
{}
inline value_type& lookup(const value_type& n, const value_type& k)
{
return table_[dimension_ * n + k];
}
inline value_type compute(const value_type& n, const value_type& k)
{
if ((0 == k) || (k == n))
return 1;
value_type v1 = lookup(n - 1,k - 1);
if (0 == v1)
v1 = lookup(n - 1,k - 1) = compute(n - 1,k - 1);
value_type v2 = lookup(n - 1,k);
if (0 == v2)
v2 = lookup(n - 1,k) = compute(n - 1,k);
return v1 + v2;
}
value_type* table_;
value_type dimension_;
};
value_type result = n_choose_k_impl(table,n).compute(n,k);
delete [] table;
return result;
}
Remember that
n! / (n - r)! = n * (n - 1) * ... * (n - r + 1)
so it's much smaller than n!. The solution is therefore to evaluate n * (n - 1) * ... * (n - r + 1) instead of first calculating n! and then dividing it.
Of course it all depends on the relative magnitude of n and r - if r is relatively big compared to n, then it still won't fit.
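A sketch of that evaluation order (my illustration; as the previous paragraph warns, it still overflows once the true value no longer fits):
#include <cstdint>

// Computes n * (n-1) * ... * (n-r+1), i.e. n!/(n-r)!, without ever forming n!.
uint64_t falling_factorial(uint64_t n, uint64_t r) {
    uint64_t result = 1;
    for (uint64_t i = 0; i < r; ++i)
        result *= n - i;   // wraps around once the true value exceeds 2^64 - 1
    return result;
}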
Well, I have to answer my own question. I was reading about Pascal's triangle and by accident noticed that we can calculate the number of combinations with it:
#include <iostream>
#include <boost/cstdint.hpp>
boost::uint64_t Combinations(unsigned int n, unsigned int r)
{
if (r > n)
return 0;
/** We can use Pascal's triange to determine the amount
* of combinations. To calculate a single line:
*
* v(r) = v(r - 1) * (n - r + 1) / r
*
* Since the triangle is symmetrical, we only need to calculate
* until r -column.
*/
boost::uint64_t v = n--;
for (unsigned int i = 2; i < r + 1; ++i, --n)
v = v * n / i;
return v;
}
int main()
{
std::cout << Combinations(52, 5) << std::endl;
}
Getting the prime factorization of the binomial coefficient is probably the most efficient way to calculate it, especially if multiplication is expensive. This is certainly true of the related problem of calculating the factorial (see, for example, this question).
Here is a simple algorithm based on the Sieve of Eratosthenes that calculates the prime factorization. The idea is basically to go through the primes as you find them using the sieve, but then also to calculate how many of their multiples fall in the ranges [1, k] and [n-k+1, n]. The Sieve is essentially an O(n log log n) algorithm, but there is no multiplication done. The actual number of multiplications necessary once the prime factorization is found is at worst O(n log log n / log n), and there are probably faster ways than that.
prime_factors = []
n = 20
k = 10
composite = [True] * 2 + [False] * n
for p in range(n + 1):
    if composite[p]:
        continue
    q = p
    m = 1
    total_prime_power = 0
    prime_power = [0] * (n + 1)
    while True:
        prime_power[q] = prime_power[m] + 1
        r = q
        if q <= k:
            total_prime_power -= prime_power[q]
        if q > n - k:
            total_prime_power += prime_power[q]
        m += 1
        q += p
        if q > n:
            break
        composite[q] = True
    prime_factors.append([p, total_prime_power])
print(prime_factors)
Using a dirty trick with a long double, it is possible to get the same accuracy as Howard Hinnant (and probably more):
unsigned long long n_choose_k(int n, int k)
{
long double f = n;
for (int i = 1; i<k+1; i++)
f /= i;
for (int i=1; i<k; i++)
f *= n - i;
unsigned long long f_2 = std::round(f);
return f_2;
}
The idea is to divide first by k! and then to multiply by n(n-1)...(n-k+1). The approximation through the double can be avoided by inverting the order of the for loop.
This improves Howard Hinnant's answer (in this question) a little bit:
Calling gcd() in every loop iteration seems a bit slow.
We could aggregate the gcd() call into the last one, while making the most use of the standard algorithm from Knuth's book "The Art of Computer Programming, 3rd Edition, Volume 2: Seminumerical Algorithms":
const uint64_t u64max = std::numeric_limits<uint64_t>::max();
uint64_t choose(uint64_t n, uint64_t k)
{
if (k > n)
throw std::invalid_argument(std::string("invalid argument in ") + __func__);
if (k > n - k)
k = n - k;
uint64_t r = 1;
uint64_t d;
for (d = 1; d <= k; ++d) {
if (r > u64max / n)
break;
r *= n--;
r /= d;
}
if (d > k)
return r;
// Let N be the original n,
// n is the current n (when we reach here)
// We want to calculate C(N,k),
// Currently we already calculated the r value so far:
// r = C(N, n) = C(N, N-n) = C(N, d-1)
// Note that N-n = d-1
// In addition we know the following identity formula:
// C(N,k) = C(N,d-1) * C(N-d+1, k-d+1) / C(k, k-d+1)
// = C(N,d-1) * C(n, k-d+1) / C(k, k-d+1)
// Using this formula, we effectively reduce the calculation,
// while recursively use the same function.
uint64_t b = choose(n, k-d+1);
if (b == u64max) {
return u64max; // overflow
}
uint64_t c = choose(k, k-d+1);
if (c == u64max) {
return u64max; // overflow
}
// Now, the combinatorial should be r * b / c
// We can use gcd() to calculate this:
// We Pick b for gcd: b < r almost (if not always) in all cases
uint64_t g = gcd(b, c);
b /= g;
c /= g;
r /= c;
if (r > u64max / b)
return u64max; // overflow
return r * b;
}
Note that the recursion depth is normally 2 (I don't really see a case that goes to 3; the combinatorial reduction is quite effective), i.e. choose() is called 3 times in total for non-overflow cases.
Replace uint64_t with unsigned long long if you prefer it.
One of the shortest ways:
int nChoosek(int n, int k){
if (k > n) return 0;
if (k == 0) return 1;
return nChoosek(n - 1, k) + nChoosek(n - 1, k - 1);
}
If you want to be 100% sure that no overflows occur so long as the final result is within the numeric limit, you can sum up Pascal's Triangle row-by-row:
for (int i=0; i<n; i++) {
    for (int j=0; j<=i; j++) {
        if (j == 0) current_row[j] = 1;
        else current_row[j] = prev_row[j] + prev_row[j-1];
    }
    prev_row = current_row; // assume they are vectors
}
// result is now in current_row[r-1]
However, this algorithm is much slower than the multiplicative one. So perhaps you could use multiplication to generate all the cases you know are 'safe' and then use addition from there (or you could just use a BigInt library).
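A runnable version of the row-by-row idea might look like the sketch below (my code; the function name, the use of std::vector, and the choice of uint64_t are mine):
#include <cstdint>
#include <iostream>
#include <vector>

// Builds rows 0..n of Pascal's triangle, so the answer is row n, entry r.
uint64_t choose_by_rows(unsigned n, unsigned r) {
    if (r > n) return 0;
    std::vector<uint64_t> row{1};              // row 0
    for (unsigned i = 1; i <= n; ++i) {
        std::vector<uint64_t> next(i + 1, 1);  // both ends of every row are 1
        for (unsigned j = 1; j < i; ++j)
            next[j] = row[j - 1] + row[j];     // additions only
        row = next;
    }
    return row[r];
}

int main() {
    std::cout << choose_by_rows(52, 5) << "\n";  // prints 2598960
    return 0;
}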