Question Description : Given an array arr[] of length N, the task is to find the XOR of the pairwise sums of all possible unordered pairs of the array.
I solved this question using the method described in this post.
My Code :
int xorAllSum(int a[], int n)
{
int curr, prev = 0;
int ans = 0;
for (int k = 0; k < 32; k++) {
int o = 0, z = 0;
for (int i = 0; i < n; i++) {
if (a[i] & (1 << k)) {
o++;
}
else {
z++;
}
}
curr = o * z + prev;
if (curr & 1) {
ans = ans | (1 << k);
}
prev = o * (o - 1) / 2;
}
return ans;
}
Code Description : For each bit position, I determine whether the answer has that bit set. To do this, I count the numbers that have a set bit at that position (represented by 'o' in the code) and the numbers that have an unset bit at that position (represented by 'z').
Now if we pair up these numbers (a number with the bit set together with a number with the bit unset), we get a set bit in their sum (because we need the XOR of all pair sums).
The 'prev' term is included to account for the carry-over bits. The answer has a set bit at the current position only if the number of set bits is odd, since we are doing an XOR operation.
But I am not getting the correct output. Can anyone please help me?
Test Cases :
n = 3, a[] = {1, 2, 3} => (1 + 2) ^ (1 + 3) ^ (2 + 3)
=> 3 ^ 4 ^ 5 = 2
=> Output : 2
n = 6
a[] = {1 2 10 11 18 20}
Output : 50
n = 8
a[] = {10 26 38 44 51 70 59 20}
Output : 182
Constraints : 2 <= n <= 10^8
Also, here we need to consider UNORDERED PAIRS and not Ordered Pairs for the answer
PS : I know that the same question has been asked before but I couldn't explain my problem with this much detail in the comments so I created a new post. I am new here, so please pardon me and give me your feedback :)
I suspect that the idea in the post you referred to is missing important details, if it could work at all with the stated complexity. (I would be happy to better understand and be corrected should that author wish to clarify their method further.)
Here's my understanding of at least one author's intention for an O(n * log n * w) solution, where w is the number of bits in the largest sum, as well as JavaScript code with a random comparison to brute force to show that it works (easily translatable to C or Python).
The idea is to examine the contribution of each bit one at a time. Since in any one iteration we are only interested in whether the kth bit of the sums is set, we can remove all parts of the numbers that include higher bits by taking each of them modulo 2^(k + 1).
Now the sums that would necessarily have the kth bit set are in the intervals, [2^k, 2^(k + 1)) (that's when the kth bit is the highest) and [2^(k+1) + 2^k, 2^(k+2) − 2] (when we have both the kth and (k+1)th bits set). So in the iteration for each bit, we sort the input list (modulo 2^(k + 1)), and for each left summand, we decrement a pointer to the end of each of the two intervals, and binary search the relevant start index.
// https://stackoverflow.com/q/64082509
// Returns the lowest index of a value
// greater than or equal to the target
function lowerIdx(a, val, left, right){
if (left >= right)
return left;
const mid = left + ((right - left) >> 1);
if (a[mid] < val)
return lowerIdx(a, val, mid+1, right);
else
return lowerIdx(a, val, left, mid);
}
function bruteForce(A){
let answer = 0;
for (let i=1; i<A.length; i++)
for (let j=0; j<i; j++)
answer ^= A[i] + A[j];
return answer;
}
function f(A, W){
const n = A.length;
const _A = new Array(n);
let result = 0;
for (let k=0; k<W; k++){
for (let i=0; i<n; i++)
_A[i] = A[i] % (1 << (k + 1));
_A.sort((a, b) => a - b);
let pairs_with_kth_bit = 0;
let l1 = 1 << k;
let r1 = 1 << (k + 1);
let l2 = (1 << (k + 1)) + (1 << k);
let r2 = (1 << (k + 2)) - 2;
let ptr1 = n - 1;
let ptr2 = n - 1;
for (let i=0; i<n-1; i++){
// Interval [2^k, 2^(k+1))
while (ptr1 > i+1 && _A[i] + _A[ptr1] >= r1)
ptr1 -= 1;
const idx1 = lowerIdx(_A, l1-_A[i], i+1, ptr1);
let sum = _A[i] + _A[idx1];
if (sum >= l1 && sum < r1)
pairs_with_kth_bit += ptr1 - idx1 + 1;
// Interval [2^(k+1)+2^k, 2^(k+2)−2]
while (ptr2 > i+1 && _A[i] + _A[ptr2] > r2)
ptr2 -= 1;
const idx2 = lowerIdx(_A, l2-_A[i], i+1, ptr2);
sum = _A[i] + _A[idx2];
if (sum >= l2 && sum <= r2)
pairs_with_kth_bit += ptr2 - idx2 + 1;
}
if (pairs_with_kth_bit & 1)
result |= 1 << k;
}
return result;
}
var As = [
[1, 2, 3], // 2
[1, 2, 10, 11, 18, 20], // 50
[10, 26, 38, 44, 51, 70, 59, 20] // 182
];
for (let A of As){
console.log(JSON.stringify(A));
console.log(`DP, brute force: ${ f(A, 10) }, ${ bruteForce(A) }`);
console.log('');
}
var numTests = 500;
for (let i=0; i<numTests; i++){
const W = 8;
const A = [];
const n = 12;
for (let j=0; j<n; j++){
const num = Math.floor(Math.random() * (1 << (W - 1)));
A.push(num);
}
const fA = f(A, W);
const brute = bruteForce(A);
if (fA != brute){
console.log('Mismatch:');
console.log(A);
console.log(fA, brute);
console.log('');
}
}
console.log("Done testing.");
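For readers who prefer Python, the per-bit interval counting described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from the answer: the names `xor_pair_sums`, `xor_pair_sums_brute` and the `width` parameter are made up for illustration, and it replaces the moving pointers with two binary searches per interval, which keeps the same O(n * log n * w) bound.

```python
from bisect import bisect_left
from itertools import combinations


def xor_pair_sums_brute(values):
    """Reference answer: XOR of a[i] + a[j] over all unordered pairs."""
    result = 0
    for x, y in combinations(values, 2):
        result ^= x + y
    return result


def xor_pair_sums(values, width=32):
    """Per-bit counting; width must cover every bit of the largest sum."""
    result = 0
    for k in range(width):
        mod = 1 << (k + 1)
        # Only the low k+1 bits of each number can influence bit k of a sum.
        reduced = sorted(v % mod for v in values)
        count = 0
        for i, x in enumerate(reduced):
            # Bit k of x + y is set when the reduced sum lies in
            # [2^k, 2^(k+1)) or in [2^(k+1) + 2^k, 2^(k+2)).
            for lo, hi in ((1 << k, mod), (3 << k, 2 * mod)):
                left = bisect_left(reduced, lo - x, i + 1)
                right = bisect_left(reduced, hi - x, i + 1)
                count += right - left
        if count & 1:
            result |= 1 << k
    return result


print(xor_pair_sums([1, 2, 3]))  # 2
```

Searching only from index i + 1 onward is what restricts the count to unordered pairs.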
I'm trying to understand the FFT algorithm.
Here's the code:
void fft(double *a, double *b, double *w, int m, int l)
{
int i, i0, i1, i2, i3, j;
double u, v, wi, wr;
for (j = 0; j < l; j++) {
wr = w[j << 1];
wi = w[(j << 1) + 1]; /* parenthesized: + binds tighter than <<, so j << 1 + 1 would mean j << 2 */
for (i = 0; i < m; i++) {
i0 = (i << 1) + (j * m << 1);
i1 = i0 + (m * l << 1);
i2 = (i << 1) + (j * m << 2);
i3 = i2 + (m << 1);
u = a[i0] - a[i1];
v = a[i0 + 1] - a[i1 + 1];
b[i2] = a[i0] + a[i1];
b[i2 + 1] = a[i0 + 1] + a[i1 + 1];
b[i3] = wr * u - wi * v;
b[i3 + 1] = wr * v + wi * u;
}
}
}
If I understand correctly, the array w is input, where every odd element is real and every even element is imaginary. a and b are the imaginary and real parts of the complex result.
Also I found that l = 2**m
But when I try to do this:
double a[4] = { 0, 0, 0, 0 };
double b[4] = { 0, 0, 0, 0 };
double w[8] = { 1, 0, 0, 0, 0, 0, 0, 0 };
int m = 3;
int l = 8;
fft(a, b, w, m, l);
There's an error.
This code is only part of an FFT. a is input. b is output. w contains precomputed weights. l is a number of subdivisions at the current point in the FFT. m is the number of elements per division. The data in a, b, and w is interleaved complex data—each pair of double elements from the array consists of the real part and the imaginary part of one complex number.
The code performs one radix-two butterfly pass over the data. To use it to compute an FFT, it must be called multiple times with specific values for l, m, and the weights in w. Since, for each call, the input is in a and the output is in b, the caller must use at least two buffers and alternate between them for successive calls to the routine.
From the indexing performed in i0 and i2, it appears the data is being rearranged slightly. This may be intended to produce the final results of the FFT in “natural” order instead of the bit-reversed order that occurs in a simple implementation.
But when I try to do this:
double a[4] = { 0, 0, 0, 0 };
double b[4] = { 0, 0, 0, 0 };
double w[8] = { 1, 0, 0, 0, 0, 0, 0, 0 };
int m = 3;
int l = 8;
fft(a, b, w, m, l);
There's an error.
From for (j = 0; j < l; j++), we see the maximum value of j in the loop is l-1. From for (i = 0; i < m; i++), we see the maximum value of i is m-1. Then in i0 = (i << 1) + (j * m << 1), we have i0 = ((m-1) << 1) + ((l-1) * m << 1) = (m-1)*2 + (l-1) * m * 2 = 2*m - 2 + l*m*2 - m*2 = 2*m*l - 2. And in i1 = i0 + (m * l << 1), we have i1 = 2*m*l - 2 + (m * l * 2) = 4*m*l - 2. When the code uses a[i1 + 1], the index is i1 + 1 = 4*m*l - 2 + 1 = 4*m*l - 1.
Therefore a must have an element with index 4*m*l - 1, so it must have at least 4*m*l elements. The required size for b can be computed similarly and is the same.
When you call fft with m set to 3 and l set to 8, a must have 4•3•8 = 96 elements. Your sample code shows four elements. Thus, the array is overrun, and the code fails.
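The bound derived above can be double-checked by replaying the index arithmetic without touching any arrays. This is only an illustrative sketch in Python: the helper name `max_indices` is an assumption, and the loop body copies just the index expressions from the C routine.

```python
def max_indices(m, l):
    """Replay the index arithmetic of fft() and return the largest
    index read from a and the largest index written to b."""
    max_a = max_b = -1
    for j in range(l):
        for i in range(m):
            i0 = (i << 1) + ((j * m) << 1)
            i1 = i0 + ((m * l) << 1)
            i2 = (i << 1) + ((j * m) << 2)
            i3 = i2 + (m << 1)
            max_a = max(max_a, i1 + 1)  # a[i1 + 1] is the last read
            max_b = max(max_b, i3 + 1)  # b[i3 + 1] is the last write
    return max_a, max_b


print(max_indices(3, 8))  # (95, 95), so both arrays need 96 elements
```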
I do not believe it is correct that l should equal 2^m. More likely, 4*m*l should not vary between calls to fft in the same complete FFT computation, and, since a and b contain two double elements for every complex number, 4*m*l should be twice the number of complex elements in the signal being transformed.
Problem
Given an array A = a0,a1,...an, with size up to N ≤ 10^5, and 0 ≤ ai ≤ 10^9.
And a number 0 < M ≤ 10^9.
The task is to find the maximum ∑(k=i, j) ak % M = (ai + ai+1 + a(i+2) + ⋯ + a(j−1) + a(j)) % M, and how many different range(i,j) get that sum.
The complexity has to be less than O(N^2), the latter is too slow.
Example
N = 3, M = 5
A = {2, 4, 3}
The Maximum Sum mod M is 4 and there are 2 ranges, which are a0 to a2 and a1
My attempt
Let's define s[j] = (a0 + a1 + ... + aj) % M. If you want the best sum that ends at j, you have to choose an s[i], i < j, such that s[i] is the smallest prefix sum greater than s[j].
Because if s[i] > s[j], we can write s[i] = M - K with K < M - s[j]; then the range sum is (s[j] - s[i] + M) % M = (s[j] + K) % M, and because K < M - s[j] this increases the result mod M. The closer s[i] gets to s[j], the larger the result mod M.
The idea of my attempt: first calculate all the prefix sums that start at 0 and end at an index i. Then you can quickly find the smallest value greater than the current one with a binary search on the values the map already holds (lower_bound), and count how many times you can form the sum with the value you found. You have to keep the sums stored somewhere to count how many times each one occurs.
#include <iostream>
#include <map>
#define optimizar_io ios_base::sync_with_stdio(false);cin.tie(NULL);
using namespace std;
const int LN = 1e5;
long long N, M, num[LN];
map < long long, int > sum;
int main() {
optimizar_io
cin >> N >> M;
sum[0]++;
long long cont = 0, tmax = 0, res = 1, val;
map < long long, int > :: iterator best;
for (int i = 0; i < N; i++)
{
cin >> num[i];
cont = (cont + num[i]) % M;
if (tmax == cont)
res += sum[0];
if (tmax < cont)
tmax = cont, res = sum[0];
best = sum.lower_bound(cont + 1);
if (best != sum.end())
{
val = cont - (*best).first + M;
if (tmax == val)
res += (*best).second;
if (tmax < val)
tmax = val, res = (*best).second;
}
sum[cont]++;
}
cout << tmax << " " << res;
return 0;
}
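To make the prefix-sum idea above easier to test, here is a hedged Python sketch together with a brute-force reference. The function names are assumptions made for illustration. The sorted Python list stands in for the std::map, so insertion is O(N) in the worst case; a balanced tree would be needed to guarantee O(N log N).

```python
from bisect import bisect_right, insort
from collections import Counter


def max_mod_sum_brute(a, m):
    """O(N^2) reference: best range sum mod m and how many ranges reach it."""
    best, count = -1, 0
    for i in range(len(a)):
        s = 0
        for j in range(i, len(a)):
            s = (s + a[j]) % m
            if s > best:
                best, count = s, 1
            elif s == best:
                count += 1
    return best, count


def max_mod_sum(a, m):
    seen = [0]               # sorted distinct prefix sums seen so far
    freq = Counter({0: 1})   # how often each prefix sum occurred
    best, count = -1, 0
    prefix = 0
    for x in a:
        prefix = (prefix + x) % m
        pos = bisect_right(seen, prefix)
        if pos < len(seen):
            # Smallest earlier prefix strictly greater than the current one.
            value, ways = prefix - seen[pos] + m, freq[seen[pos]]
        else:
            # Otherwise the best choice is an earlier prefix equal to 0.
            value, ways = prefix, freq[0]
        if value > best:
            best, count = value, ways
        elif value == best:
            count += ways
        if freq[prefix] == 0:
            insort(seen, prefix)
        freq[prefix] += 1
    return best, count


print(max_mod_sum([2, 4, 3], 5))  # (4, 2)
```

Unlike the C++ attempt, this sketch starts with no ranges counted, so the empty range never contributes to the result.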
I have an array with the elements {7,2,1} and the idea is to do 7 * 2 + 7 * 1 + 2 * 1 which is basically this algorithm:
for(int i=0;i<n-1;++i)
for(int k=i+1;k<n;++k)
sum += a[i] * a[k];
Here a is the array holding the numbers and n is the number of elements. I need a more efficient algorithm for this and have no clue how to do it. Can someone give me a hand?
Thank you!
You can do better in the general case. Time to do some math. Let's look at the 3-element version, we have:
ab + ac + bc
= 1/2 * (2ab + 2ac + 2bc)
= 1/2 * (2ab + 2ac + 2bc + a^2 + b^2 + c^2 - (a^2 + b^2 + c^2))
= 1/2 * ((a+b+c)^2 - (a^2 + b^2 + c^2))
That is:
int sum = 0;
int sum_sq = 0;
for (int i : arr) {
sum += i;
sum_sq += i*i;
}
int result = (sum*sum - sum_sq) / 2;
This is O(n) multiplications, instead of O(n^2). This'll certainly be better than the naive implementation at some point. Whether or not it's better for just 3 elements is something I haven't timed.
@chux's suggestion is essentially to redistribute operations:
a_i * a_(i+1) + a_i * a_(i+2) + ... + a_i * a_n
-->
a_i * (a_(i+1) + ... + a_n)
combined with avoiding unnecessary recomputation of the partial sums (a_(i+1) + ... + a_n), by leveraging the fact that each differs from the next by the value of one element of the input array.
Here's a one-pass implementation with O(1) overhead:
int psum(size_t n, int array[n]) {
int result = 0;
int rsum = array[n - 1];
for (int i = n - 2; i >= 0; i--) {
result += array[i] * rsum;
rsum += array[i];
}
return result;
}
The sum of all elements to the right of index i is maintained from iteration to iteration in variable rsum. It's unnecessary to track its various values in an array, because we need each value only for one iteration of the loop.
This scales linearly with the number of elements in the input array. You'll see that the number and type of operations is quite similar to @Barry's answer, but nothing analogous to his final step is required, which saves a few operations.
As @Barry observes in comments, the iteration can also be run in the other direction, in conjunction with tracking the left-hand partial sums instead of the right-hand ones. That would diverge a bit more from @chux's description, but it relies on exactly the same principles.
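The left-to-right variant mentioned above can be sketched like this. It is an illustrative Python version, not code from any of the answers, and the name `pair_product_sum` is an assumption. Python integers do not overflow, which a C translation using int would have to watch for.

```python
def pair_product_sum(values):
    """Sum of values[i] * values[k] over all i < k, in one pass."""
    total = 0
    left_sum = 0  # sum of all elements to the left of the current one
    for v in values:
        total += left_sum * v
        left_sum += v
    return total


print(pair_product_sum([7, 2, 1]))  # 23 = 7*2 + 7*1 + 2*1
```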
We have (a + b + c + ...)^2 = (a^2 + b^2 + c^2 + ...) + 2(ab + bc + ca + ...)
You want the sum S = ab + bc + ca + ..., which has O(n^2) pairs (using 2 nested loops)
You can do 2 separate loops: one calculates P = a^2 + b^2 + c^2 + ... in O(n) time, and another calculates Q = (a + b + c + ...)^2, also in O(n) time. Then take S = (Q - P) / 2.
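As a quick check of the identity, here is a small Python sketch. The function names are assumptions made for illustration, and the brute-force version is included only for comparison.

```python
from itertools import combinations


def pair_product_sum_identity(values):
    """S = (Q - P) / 2 with Q = (sum of values)^2 and P = sum of squares."""
    q = sum(values) ** 2
    p = sum(v * v for v in values)
    return (q - p) // 2  # Q - P is always even, so this division is exact


def pair_product_sum_naive(values):
    """Direct O(n^2) sum over all unordered pairs."""
    return sum(x * y for x, y in combinations(values, 2))


print(pair_product_sum_identity([7, 2, 1]))  # 23
```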
Make 1 pass: walk from the end of a[] to the front and form, for each index, the sum of all the elements "to the right".
2nd pass: multiply a[i] * sums[i] and accumulate the products.
O(n).
#include <stdio.h>
#include <stdlib.h>
long sum0(int a[], int n) {
long sum = 0;
for (int i = 0; i < n - 1; ++i)
for (int k = i + 1; k < n; ++k)
sum += a[i] * a[k];
return sum;
}
long sum1(int a[], int n) {
int long sums[n];
sums[n - 1] = 0;
for (int i = n - 2; i >= 0; i--) {
sums[i] = a[i+1] + sums[i + 1];
}
long sum = 0;
for (int i = 0; i < n - 1; ++i)
sum += a[i] * sums[i];
return sum;
}
void test(int a[], int n) {
long s0 = sum0(a, n);
long s1 = sum1(a, n);
if (s0 != s1) printf("%9ld %9ld\n", s0, s1);
}
void tests(int k) {
while (k--) {
int n = rand() % 10 + 2;
int a[n + 1];
for (int m = 0; m < n; m++)
a[m] = rand() % 256;
test(a, n);
}
}
int main() {
int a[3] = { 7, 2, 1 };
printf("%ld\n", sum1(a, 3));
tests(1000000);
puts("Done");
}
As it turns out, the sums[] array is not needed either, as the running sum needs only 1 location. This effectively makes this answer similar to the others.
long sum1(int a[], int n) {
int long sums = 0;
long sum = 0;
for (int i = n - 2; i >= 0; i--) {
sums = a[i+1] + sums;
sum += a[i] * sums;
}
return sum;
}
What I mean by "large n" is something in the millions. p is prime.
I've tried
http://apps.topcoder.com/wiki/display/tc/SRM+467
But the function seems to be incorrect (I tested it with 144 choose 6 mod 5 and it gives me 0 when it should give me 2)
I've tried
http://online-judge.uva.es/board/viewtopic.php?f=22&t=42690
But I don't understand it fully
I've also made a memoized recursive function that uses the logic (combinations(n-1, k-1, p)%p + combinations(n-1, k, p)%p) but it gives me stack overflow problems because n is large
I've tried Lucas Theorem but it appears to be either slow or inaccurate.
All I'm trying to do is create a fast/accurate n choose k mod p for large n. If anyone could help show me a good implementation for this I'd be very grateful. Thanks.
As requested, the memoized version that hits stack overflows for large n:
std::map<std::pair<long long, long long>, long long> memo;
long long combinations(long long n, long long k, long long p){
if (n < k) return 0;
if (0 == n) return 0;
if (0 == k) return 1;
if (n == k) return 1;
if (1 == k) return n;
map<std::pair<long long, long long>, long long>::iterator it;
if((it = memo.find(std::make_pair(n, k))) != memo.end()) {
return it->second;
}
else
{
long long value = (combinations(n-1, k-1,p)%p + combinations(n-1, k,p)%p)%p;
memo.insert(std::make_pair(std::make_pair(n, k), value));
return value;
}
}
So, here is how you can solve your problem.
Of course you know the formula:
comb(n,k) = n!/(k!*(n-k)!) = (n*(n-1)*...(n-k+1))/k!
(See http://en.wikipedia.org/wiki/Binomial_coefficient#Computing_the_value_of_binomial_coefficients)
You know how to compute the numerator:
long long res = 1;
for (long long i = n; i > n- k; --i) {
res = (res * i) % p;
}
Now, as p is prime, the reciprocal of each integer that is coprime with p is well defined, i.e. a^(-1) can be found. And this can be done using Fermat's theorem: a^(p-1) ≡ 1 (mod p) => a * a^(p-2) ≡ 1 (mod p), and so a^(-1) ≡ a^(p-2) (mod p).
Now all you need to do is to implement fast exponentiation(for example using the binary method):
long long degree(long long a, long long k, long long p) {
long long res = 1;
long long cur = a;
while (k) {
if (k % 2) {
res = (res * cur) % p;
}
k /= 2;
cur = (cur * cur) % p;
}
return res;
}
And now you can add the denominator to our result:
long long res = 1;
for (long long i = 1; i <= k; ++i) {
res = (res * degree(i, p - 2, p)) % p;
}
Please note I am using long long everywhere to avoid type overflow. Of course you don't need to do k exponentiations - you can compute k!(mod p) and then divide only once:
long long denom = 1;
for (long long i = 1; i <= k; ++i) {
denom = (denom * i) % p;
}
res = (res * degree(denom, p - 2, p)) % p;
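For comparison, the same steps (numerator product, k! modulo p, one Fermat inverse) can be sketched in Python, where the built-in three-argument `pow` does the fast exponentiation. The function name `choose_mod_small_k` is an assumption, and the sketch is only valid for k < p, since otherwise k! is 0 modulo p and has no inverse.

```python
def choose_mod_small_k(n, k, p):
    """n choose k modulo a prime p, assuming 0 <= k < p."""
    if k < 0 or k > n:
        return 0
    num = 1
    for i in range(n, n - k, -1):   # n * (n-1) * ... * (n-k+1)
        num = num * i % p
    den = 1
    for i in range(1, k + 1):       # k! modulo p
        den = den * i % p
    # Fermat: den^(p-2) is the inverse of den modulo the prime p.
    return num * pow(den, p - 2, p) % p


print(choose_mod_small_k(10, 3, 13))  # 3, since C(10, 3) = 120 = 9*13 + 3
```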
EDIT: as per @dbaupp's comment, if k >= p then k! will be equal to 0 modulo p and (k!)^-1 will not be defined. To avoid that, first compute the degree with which p divides n*(n-1)...(n-k+1) and k!, and compare them:
int get_degree(long long n, long long p) { // returns the degree with which p is in n!
int degree_num = 0;
long long u = p;
long long temp = n;
while (u <= temp) {
degree_num += temp / u;
u *= p;
}
return degree_num;
}
long long combinations(int n, int k, long long p) {
int num_degree = get_degree(n, p) - get_degree(n - k, p);
int den_degree = get_degree(k, p);
if (num_degree > den_degree) {
return 0;
}
long long res = 1;
for (long long i = n; i > n - k; --i) {
long long ti = i;
while(ti % p == 0) {
ti /= p;
}
res = (res * ti) % p;
}
for (long long i = 1; i <= k; ++i) {
long long ti = i;
while(ti % p == 0) {
ti /= p;
}
res = (res * degree(ti, p-2, p)) % p;
}
return res;
}
EDIT: There is one more optimization that can be added to the solution above: instead of computing the inverse of each factor in k!, we can compute k! (mod p) and then compute the inverse of that number. Thus we pay the logarithm for the exponentiation only once. Of course, we again have to discard the factors of p from each term. We only have to replace the last loop with this:
long long denom = 1;
for (long long i = 1; i <= k; ++i) {
long long ti = i;
while(ti % p == 0) {
ti /= p;
}
denom = (denom * ti) % p;
}
res = (res * degree(denom, p-2, p)) % p;
For large k, we can reduce the work significantly by exploiting two fundamental facts:
If p is a prime, the exponent of p in the prime factorisation of n! is given by (n - s_p(n)) / (p-1), where s_p(n) is the sum of the digits of n in the base p representation (so for p = 2, it's popcount). Thus the exponent of p in the prime factorisation of choose(n,k) is (s_p(k) + s_p(n-k) - s_p(n)) / (p-1), in particular, it is zero if and only if the addition k + (n-k) has no carry when performed in base p (the exponent is the number of carries).
Wilson's theorem: p is a prime, if and only if (p-1)! ≡ (-1) (mod p).
The exponent of p in the factorisation of n! is usually calculated by
long long factorial_exponent(long long n, long long p)
{
long long ex = 0;
do
{
n /= p;
ex += n;
}while(n > 0);
return ex;
}
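The two ways of getting the exponent, the repeated division above and the digit-sum formula from the first fact, can be compared with a short Python sketch. The helper names `digit_sum` and `binomial_exponent` are assumptions made for illustration.

```python
def factorial_exponent(n, p):
    """Exponent of the prime p in n!, by repeated division."""
    ex = 0
    while n > 0:
        n //= p
        ex += n
    return ex


def digit_sum(n, p):
    """s_p(n): sum of the digits of n written in base p."""
    s = 0
    while n > 0:
        n, d = divmod(n, p)
        s += d
    return s


def binomial_exponent(n, k, p):
    """Exponent of p in choose(n, k): the number of carries in k + (n-k)."""
    return (digit_sum(k, p) + digit_sum(n - k, p) - digit_sum(n, p)) // (p - 1)


print(factorial_exponent(144, 5), (144 - digit_sum(144, 5)) // 4)  # 34 34
```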
The check for divisibility of choose(n,k) by p is not strictly necessary, but it's reasonable to have that first, since it will often be the case, and then it's less work:
long long choose_mod(long long n, long long k, long long p)
{
// We deal with the trivial cases first
if (k < 0 || n < k) return 0;
if (k == 0 || k == n) return 1;
// Now check whether choose(n,k) is divisible by p
if (factorial_exponent(n, p) > factorial_exponent(k, p) + factorial_exponent(n-k, p)) return 0;
// If it's not divisible, do the generic work
return choose_mod_one(n,k,p);
}
Now let us take a closer look at n!. We separate the numbers ≤ n into the multiples of p and the numbers coprime to p. With
n = q*p + r, 0 ≤ r < p
The multiples of p contribute p^q * q!. The numbers coprime to p contribute the product of (j*p + k), 1 ≤ k < p for 0 ≤ j < q, and the product of (q*p + k), 1 ≤ k ≤ r.
For the numbers coprime to p we will only be interested in the contribution modulo p. Each of the full runs j*p + k, 1 ≤ k < p is congruent to (p-1)! modulo p, so altogether they produce a contribution of (-1)^q modulo p. The last (possibly) incomplete run produces r! modulo p.
So if we write
n = a*p + A
k = b*p + B
n-k = c*p + C
we get
choose(n,k) = p^a * a!/ (p^b * b! * p^c * c!) * cop(a,A) / (cop(b,B) * cop(c,C))
where cop(m,r) is the product of all numbers coprime to p which are ≤ m*p + r.
There are two possibilities, a = b + c and A = B + C, or a = b + c + 1 and A = B + C - p.
In our calculation, we have eliminated the second possibility beforehand, but that is not essential.
In the first case, the explicit powers of p cancel, and we are left with
choose(n,k) = a! / (b! * c!) * cop(a,A) / (cop(b,B) * cop(c,C))
= choose(a,b) * cop(a,A) / (cop(b,B) * cop(c,C))
Any powers of p dividing choose(n,k) come from choose(a,b) - in our case, there will be none, since we've eliminated these cases before - and, although cop(a,A) / (cop(b,B) * cop(c,C)) need not be an integer (consider e.g. choose(19,9) (mod 5)), when considering the expression modulo p, cop(m,r) reduces to (-1)^m * r!, so, since a = b + c, the (-1) cancel and we are left with
choose(n,k) ≡ choose(a,b) * choose(A,B) (mod p)
In the second case, we find
choose(n,k) = choose(a,b) * p * cop(a,A)/ (cop(b,B) * cop(c,C))
since a = b + c + 1. The carry in the last digit means that A < B, so modulo p
p * cop(a,A) / (cop(b,B) * cop(c,C)) ≡ 0 = choose(A,B)
(where we can either replace the division with a multiplication by the modular inverse, or view it as a congruence of rational numbers, meaning the numerator is divisible by p). Anyway, we again find
choose(n,k) ≡ choose(a,b) * choose(A,B) (mod p)
Now we can recur for the choose(a,b) part.
Example:
choose(144,6) (mod 5)
144 = 28 * 5 + 4
6 = 1 * 5 + 1
choose(144,6) ≡ choose(28,1) * choose(4,1) (mod 5)
≡ choose(3,1) * choose(4,1) (mod 5)
≡ 3 * 4 = 12 ≡ 2 (mod 5)
choose(12349,789) ≡ choose(2469,157) * choose(4,4)
≡ choose(493,31) * choose(4,2) * choose(4,4)
≡ choose(98,6) * choose(3,1) * choose(4,2) * choose(4,4)
≡ choose(19,1) * choose(3,1) * choose(3,1) * choose(4,2) * choose(4,4)
≡ 4 * 3 * 3 * 1 * 1 = 36 ≡ 1 (mod 5)
Now the implementation:
// Preconditions: 0 <= k <= n; p > 1 prime
long long choose_mod_one(long long n, long long k, long long p)
{
// For small k, no recursion is necessary
if (k < p) return choose_mod_two(n,k,p);
long long q_n, r_n, q_k, r_k, choose;
q_n = n / p;
r_n = n % p;
q_k = k / p;
r_k = k % p;
choose = choose_mod_two(r_n, r_k, p);
// If the exponent of p in choose(n,k) isn't determined to be 0
// before the calculation gets serious, short-cut here:
/* if (choose == 0) return 0; */
choose *= choose_mod_one(q_n, q_k, p);
return choose % p;
}
// Preconditions: 0 <= k <= min(n,p-1); p > 1 prime
long long choose_mod_two(long long n, long long k, long long p)
{
// reduce n modulo p
n %= p;
// Trivial checks
if (n < k) return 0;
if (k == 0 || k == n) return 1;
// Now 0 < k < n, save a bit of work if k > n/2
if (k > n/2) k = n-k;
// calculate numerator and denominator modulo p
long long num = n, den = 1;
for(n = n-1; k > 1; --n, --k)
{
num = (num * n) % p;
den = (den * k) % p;
}
// Invert denominator modulo p
den = invert_mod(den,p);
return (num * den) % p;
}
To calculate the modular inverse, you can use Fermat's (so-called little) theorem
If p is prime and a not divisible by p, then a^(p-1) ≡ 1 (mod p).
and calculate the inverse as a^(p-2) (mod p), or use a method applicable to a wider range of arguments, the extended Euclidean algorithm or continued fraction expansion, which give you the modular inverse for any pair of coprime (positive) integers:
long long invert_mod(long long k, long long m)
{
if (m == 0) return (k == 1 || k == -1) ? k : 0;
if (m < 0) m = -m;
k %= m;
if (k < 0) k += m;
int neg = 1;
long long p1 = 1, p2 = 0, k1 = k, m1 = m, q, r, temp;
while(k1 > 0) {
q = m1 / k1;
r = m1 % k1;
temp = q*p1 + p2;
p2 = p1;
p1 = temp;
m1 = k1;
k1 = r;
neg = !neg;
}
return neg ? m - p2 : p2;
}
Like calculating a^(p-2) (mod p), this is an O(log p) algorithm. For some inputs it's significantly faster (it's actually O(min(log k, log p)), so for small k and large p it's considerably faster); for others it's slower.
Overall, this way we need to calculate at most O(log_p k) binomial coefficients modulo p, where each binomial coefficient needs at most O(p) operations, yielding a total complexity of O(p*log_p k) operations.
When k is significantly larger than p, that is much better than the O(k) solution. For k <= p, it reduces to the O(k) solution with some overhead.
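The digit-by-digit recursion and the two worked examples can be cross-checked with a compact Python sketch. This is not a translation of the C++ above: the name `lucas` is an assumption, and it leans on `math.comb` (Python 3.8+) for the small per-digit coefficients instead of a modular inverse.

```python
from math import comb


def lucas(n, k, p):
    """choose(n, k) modulo a prime p, one base-p digit at a time."""
    result = 1
    while n or k:
        n, a = divmod(n, p)   # a = current base-p digit of n
        k, b = divmod(k, p)   # b = current base-p digit of k
        if b > a:             # a carry occurs, so p divides choose(n, k)
            return 0
        result = result * comb(a, b) % p
    return result


print(lucas(144, 6, 5))      # 2
print(lucas(12349, 789, 5))  # 1
```

Because each digit is below p, `comb(a, b)` stays small here; for a large p the per-digit coefficient would need the modular computation shown in choose_mod_two.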
If you're calculating it more than once, there's another way that's faster. I'm going to post code in python because it'll probably be the easiest to convert into another language, although I'll put the C++ code at the end.
Calculating Once
Brute force:
def choose(n, k, m):
    ans = 1
    for i in range(k): ans *= (n-i)
    # Divide by 1..k (starting at 0 would divide by zero)
    for i in range(1, k+1): ans //= i
    return ans % m
But the calculation can get into very big numbers, so we can use modular arithmetic tricks instead:
(a * b) mod m = (a mod m) * (b mod m) mod m
(a / (b*c)) mod m = (a mod m) / ((b mod m) * (c mod m) mod m)
(a / b) mod m = (a mod m) * (b mod m)^-1
Note the ^-1 at the end of the last equation. This is the multiplicative inverse of b mod m. It basically means that ((b mod m) * (b mod m)^-1) mod m = 1, just like how a * a^-1 = a * 1/a = 1 with (non-zero) integers.
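A small numeric illustration of the last rule, as a sketch: `div_mod` is a hypothetical helper name, and the three-argument `pow` with exponent -1 requires Python 3.8 or later and an n that is coprime to m.

```python
def div_mod(a, b, m):
    """(a / b) mod m, assuming b divides a and gcd(b, m) == 1."""
    inverse_b = pow(b, -1, m)            # multiplicative inverse of b mod m
    assert (b % m) * inverse_b % m == 1  # defining property of the inverse
    return (a % m) * inverse_b % m


print(div_mod(12, 3, 7))  # 4, the same as (12 // 3) % 7
```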
This can be calculated in a few ways, one of which is the extended euclidean algorithm:
def multinv(n, m):
    ''' Multiplicative inverse of n mod m '''
    if m == 1: return 0
    m0, y, x = m, 0, 1
    while n > 1:
        y, x = x - n//m*y, y
        m, n = n%m, m
    return x+m0 if x < 0 else x
Note that another method, exponentiation, works only if m is prime. If it is, you can do this:
def powmod(b, e, m):
    ''' b^e mod m '''
    # Note: If you use python, there's a built-in pow(b, e, m) that's probably faster
    # But that's not in C++, so you can convert this instead:
    P = 1
    while e:
        if e&1: P = P * b % m
        e >>= 1; b = b * b % m
    return P
def multinv(n, m):
    ''' Multiplicative inverse of n mod m, only if m is prime '''
    return powmod(n, m-2, m)
But note that the Extended Euclidean Algorithm tends to still run faster, even though they technically have the same time complexity, O(log m), because it has a lower constant factor.
So now the full code:
def multinv(n, m):
    ''' Multiplicative inverse of n mod m in log(m) '''
    if m == 1: return 0
    m0, y, x = m, 0, 1
    while n > 1:
        y, x = x - n//m*y, y
        m, n = n%m, m
    return x+m0 if x < 0 else x
def choose(n, k, m):
    num = den = 1
    for i in range(k): num = num * (n-i) % m
    # Multiply 1..k (starting at 0 would zero out the denominator)
    for i in range(1, k+1): den = den * i % m
    return num * multinv(den, m) % m
Querying Multiple Times
We can calculate the numerator and denominator separately, and then combine them. But notice that the product we're calculating for the numerator is n * (n-1) * (n-2) * (n-3) ... * (n-k+1). If you've ever learned about something called prefix sums, this is awfully similar. So let's apply it.
Precalculate fact[i] = i! mod m for i up to whatever the max value of n is, maybe 1e7 (ten million). Then, the numerator is (fact[n] * fact[n-k]^-1) mod m, and the denominator is fact[k]. So we can calculate choose(n, k, m) = fact[n] * multinv(fact[n-k], m) % m * multinv(fact[k], m) % m.
Python code:
MAXN = 1000 # Increase if necessary
MOD = 10**9+7 # A common mod that's used, change if necessary
fact = [1]
for i in range(1, MAXN+1):
    fact.append(fact[-1] * i % MOD)
def multinv(n, m):
    ''' Multiplicative inverse of n mod m in log(m) '''
    if m == 1: return 0
    m0, y, x = m, 0, 1
    while n > 1:
        y, x = x - n//m*y, y
        m, n = n%m, m
    return x+m0 if x < 0 else x
def choose(n, k, m):
    return fact[n] * multinv(fact[n-k] * fact[k] % m, m) % m
C++ code:
#include <iostream>
using namespace std;
const int MAXN = 1000000; // Must be at least the largest n queried (main uses 1e6)
const int MOD = 1e9+7; // A common mod that's used, change if necessary
int fact[MAXN+1];
int multinv(int n, int m) {
/* Multiplicative inverse of n mod m in log(m) */
if (m == 1) return 0;
int m0 = m, y = 0, x = 1, t;
while (n > 1) {
t = y;
y = x - n/m*y;
x = t;
t = m;
m = n%m;
n = t;
}
return x<0 ? x+m0 : x;
}
int choose(int n, int k, int m) {
return (long long) fact[n]
* multinv((long long) fact[n-k] * fact[k] % m, m) % m;
}
int main() {
fact[0] = 1;
for (int i = 1; i <= MAXN; i++) {
fact[i] = (long long) fact[i-1] * i % MOD;
}
cout << choose(4, 2, MOD) << '\n';
cout << choose(1e6, 1e3, MOD) << '\n';
}
Note that I'm casting to long long to avoid overflow.
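If many queries are expected, one further variation is to precompute the inverse factorials as well, so that each query needs no inverse at all. This is a different technique from the code above, sketched here in Python under the assumptions that MOD is prime and larger than MAXN; the `inv_fact` table and this two-argument `choose` are illustrative names.

```python
MOD = 10**9 + 7   # assumed prime and larger than MAXN
MAXN = 1000       # assumed upper bound on n; increase if necessary

# fact[i] = i! mod MOD
fact = [1] * (MAXN + 1)
for i in range(1, MAXN + 1):
    fact[i] = fact[i - 1] * i % MOD

# inv_fact[i] = (i!)^-1 mod MOD: one Fermat inverse, then walk downward
# using (i-1)!^-1 = i!^-1 * i.
inv_fact = [1] * (MAXN + 1)
inv_fact[MAXN] = pow(fact[MAXN], MOD - 2, MOD)
for i in range(MAXN, 0, -1):
    inv_fact[i - 1] = inv_fact[i] * i % MOD


def choose(n, k):
    """n choose k mod MOD in O(1) per query, for 0 <= n <= MAXN."""
    if k < 0 or k > n:
        return 0
    return fact[n] * inv_fact[k] % MOD * inv_fact[n - k] % MOD


print(choose(4, 2))  # 6
```

The tables cost O(MAXN) time and memory once, plus a single modular exponentiation.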