sum multiples in a given range - c++

Well, I want to sum up the multiples of 3 and 5. That is not too hard if I just want the sum up to a given number, e.g. up to 60 the sum is 870.
But what if I want just the first 15 multiples?
Well, one way is:
void summation (const unsigned long number_n, unsigned long &summe,unsigned int &counter );
void summation (const unsigned long number_n, unsigned long &summe,unsigned int &counter )
{
unsigned int howoften = 0;
summe = 0;
for( unsigned long i = 1; i <=number_n; i++ )
if (howoften <= counter-1)
{
if( !( i % 3 ) || !( i % 5 ) )
{
summe += i;
howoften++;
}
}
counter = howoften;
return;
}
But as expected, the runtime is not acceptable for a counter like 1,500,000 :-/
Hm, I tried a lot of things but I cannot find a solution on my own.
I also tried a faster summation algorithm like this (don't care about overflow at this point):
int sum(int N);
int sum(int N)
{
int S1, S2, S3;
S1 = ((N / 3)) * (2 * 3 + (N / 3 - 1) * 3) / 2;
S2 = ((N / 5)) * (2 * 5 + (N / 5 - 1) * 5) / 2;
S3 = ((N / 15)) *(2 * 15 + (N / 15 - 1) * 15) / 2;
return S1 + S2 - S3;
}
or even
unsigned long sum1000 (unsigned long target);
unsigned long sum1000 (unsigned long target)
{
unsigned long summe = 0; // same type as the return value
for (unsigned long i = 0; i<=target; i+=3) summe+=i;
for (unsigned long i = 0; i<=target; i+=5) summe+=i;
for (unsigned long i = 0; i<=target; i+=15) summe-=i;
return summe;
}
But I'm not smart enough to set up an algorithm which is fast enough (I'd say 5-10 seconds are OK).
The sum of all multiples up to a limit is not my problem; the sum of the first N multiples is :)
Thanks for reading, and if you have any ideas, that would be great.

Some prerequisites:
(don't care about overflow at this point)
OK, so let's ignore that completely.
Next, the sum of all numbers from 1 to n can be calculated with the well-known formula n*(n+1)/2:
int sum(int n) {
return (n * (n+1)) / 2;
}
Note that n*(n+1) is an even number for any n, so using integer arithmetic for /2 is not an issue.
How does this help to get the sum of numbers divisible by 3? Let's start with even numbers (divisible by 2). We write out the long form of the sum above:
1 + 2 + 3 + 4 + ... + n
multiply each term by 2:
2 + 4 + 6 + 8 + ... + 2*n
Now I hope you see that this sum contains all numbers that are divisible by 2 up to 2*n. Those numbers are the first n numbers that are divisible by 2.
Hence, the sum of the first n numbers that are divisible by 2 is 2 * sum(n). We can generalize that to write a function that returns the sum of the first n numbers that are divisible by m:
int sum_div_m( int n, int m) {
return sum(n) * m;
}
First I want to reproduce your initial example "up to 60 the sum is 870". For that we consider that
60/3 == 20 -> there are 20 numbers divisible by 3 and we get their sum from sum_div_m(20,3)
60/5 == 12 -> there are 12 numbers divisible by 5 and we get their sum from sum_div_m(12,5)
We cannot simply add the above two results, because then we would count some numbers twice. Those are the numbers divisible by both 3 and 5, i.e. divisible by 15
60/15 == 4 -> there are 4 numbers divisible by 3 and 5 and we get their sum from sum_div_m(4,15).
Putting it together, the sum of all numbers divisible by 3 or 5 up to 60 is
int x = sum_div_m( 20,3) + sum_div_m( 12,5) - sum_div_m( 4,15);
Finally, back to your actual question:
But what if I want just the first 15 multiples?
Above we saw that there are
n == x/3 + x/5 - x/15
numbers that are divisible by 3 or 5 in the range 0...x. All divisions use integer arithmetic. We already had the example of 60 with 20+12-4 == 28 divisible numbers. Another example is x=10, where there are n = 3 + 2 - 0 = 5 numbers divisible by 3 or 5 (3, 5, 6, 9, 10). Ignoring the rounding of the integer divisions for a moment, we can multiply by 15 and solve for x:
15*n == 5*x + 3*x - x
-> 15*n == 7*x
-> x == 15*n/7
Quick test: 15*28/7 == 60, looks correct.
Putting it all together the sum of the first n numbers divisible by 3 or 5 is
int sum_div_3_5(int n) {
int x = (15*n)/7;
return sum_div_m(x/3, 3) + sum_div_m(x/5, 5) - sum_div_m(x/15, 15);
}
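To check that this is correct we can again try sum_div_3_5(28) to see that it returns 870 (because there are 28 numbers divisible by 3 or 5 up to 60 and that was the initial example). Note, however, that x == 15*n/7 is only exact when n is a multiple of 7 (every block of 15 consecutive integers contains exactly 7 multiples of 3 or 5). For other n the integer division can put x below the n-th multiple, so the function sums fewer than n numbers: sum_div_3_5(1) computes x == 2 and returns 0 instead of 3.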
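The block structure gives a way to handle every n, not just multiples of 7. The following is a minimal sketch under that assumption: it uses the closed form for all complete blocks of 15 and adds the at most 6 leftover multiples one by one. The names sum_multiples_up_to and sum_first_n_div_3_5 are made up for this illustration, and overflow is still ignored.

```cpp
#include <cstdint>

// Sum of all multiples of m that are <= limit: m * (1 + 2 + ... + limit/m).
std::uint64_t sum_multiples_up_to(std::uint64_t limit, std::uint64_t m) {
    const std::uint64_t count = limit / m;
    return m * (count * (count + 1) / 2);
}

// Sum of the first n positive integers divisible by 3 or 5.
// Hypothetical helper, not from the original answer; overflow is ignored.
std::uint64_t sum_first_n_div_3_5(std::uint64_t n) {
    // The 7 multiples of 3 or 5 inside one block of 15 consecutive integers.
    static const std::uint64_t offsets[7] = {3, 5, 6, 9, 10, 12, 15};
    const std::uint64_t blocks = n / 7;   // complete blocks of 15
    const std::uint64_t rest = n % 7;     // multiples taken from the next block
    const std::uint64_t x = 15 * blocks;  // end of the last complete block

    // Inclusion-exclusion over the complete blocks.
    std::uint64_t total = sum_multiples_up_to(x, 3) + sum_multiples_up_to(x, 5)
                        - sum_multiples_up_to(x, 15);
    // Add the leftover multiples one by one (at most 6 of them).
    for (std::uint64_t i = 0; i < rest; ++i) {
        total += x + offsets[i];
    }
    return total;
}
```

For the question's example, sum_first_n_div_3_5(15) covers 3, 5, 6, 9, 10, 12, 15, 18, 20, 21, 24, 25, 27, 30 and 33, and the work is constant regardless of n.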
PS: It turned out that the question is really only about doing the maths, though that isn't a big surprise. When you want to write efficient code you should primarily take care to use the right algorithm. Optimizations within a given algorithm are often less effective than choosing a better algorithm. Once you have chosen an algorithm, it often does not pay off to try to be "clever", because compilers are much better at optimizing. For example this code:
int main(){
int x = 0;
int n = 60;
for (int i=0; i <= n; ++i) x += i;
return x;
}
will be optimized by most compilers to a simple return 1830; when optimizations are turned on, because compilers do know how to add all numbers from 1 to n.

If your value is known at compile time, you can do it recursively at compile time using class templates (metafunctions), so there is no runtime cost.
Example:
template<int n>
struct Sum{
static const int value = n + Sum<n-1>::value;
};
template<>
struct Sum<0>{
static constexpr int value = 0;
};
int main()
{
constexpr auto x = Sum<100>::value;
// x is known (5050) in compile time
return 0;
}
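As an alternative sketch, assuming C++11 or later, a constexpr function with the closed formula gives the same compile-time result without instantiating one template per value. The name sum_to is made up for this example.

```cpp
// Sum 1 + 2 + ... + n, usable in constant expressions (C++11 and later).
constexpr int sum_to(int n) {
    return n * (n + 1) / 2;
}

// Evaluated by the compiler; compilation fails if the value were different.
static_assert(sum_to(100) == 5050, "sum_to(100) should be 5050");
```

The same function can also be called with a runtime argument, which the template version cannot do.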

Find minimum number of digits required to make a given number

We have to find the minimum number of digits required to make a given number, for example: 14 => 95 (9 + 5 = 14) is two digits which is the minimum to form 14.
int moves(int n) {
int m = 0; // Minimum count
while (n-9 >= 0) { // To place maximum number of 9's
n -= 9;
m++;
}
if (n == 0) { // If only nines made up the number
return m;
}
else {
m++;
return m;
}
}
I am getting a TLE (runtime time limit exceeded) by an online judge. How can I improve it or is there a better approach?
Your code starts by looking at how many times 9 fits into that number. This can be done way more easily:
int m = n/9;
This suffices since we do an integer division, in which the remainder is thrown away. Note that if n were a float or another floating-point type, this would not work.
The question left is if it is divisible by 9 or not. If not, we have one additional digit. This can be done by the modulo operator (made it verbose for ease of understanding):
bool divisible_by_nine = (n % 9 == 0);
Assuming that you might not know the modulo operator, it returns the remainder of an integer division, 47 % 9 = 2 since 47 / 9 = 5 remainder 2.
Without it, you would go with
int remainder = n - 9*m;
bool divisible = (remainder == 0);
Combined:
int required_digits(int number)
{
bool divisible = (number % 9 == 0);
return number/9 + (divisible ? 0 : 1);
}
Or in a single line, depending on how verbose you want it to be:
int required_digits(int number)
{
return number/9 + (number % 9 == 0 ? 0 : 1);
}
Since there isn't any loop, this is in Θ(1) and thus should work in your required time limit.
(Technically, the processor might as well handle the division somewhat like you did internally, but it is very efficient at that. To be absolutely correct, I'd have to add "assuming that division is a constant time operation".)
Your solution works fine. You can try the shorter:
return (n%9==0)? n/9 : n/9 +1 ;
Shorter, but less easy to read...
Or a compromise:
if (n%9==0) // n can be divided by 9
return n/9;
else
return n/9+1;
Explanation
We know that every number a can be represented as
(a_n * 10 ^ n) + ... + (a_2 * 10 ^ 2) + (a_1 * 10) + (a_0)
where a_k are digits
and 10^n = 11...11 * 9 + 1 (n digits 1).
This means that the number 10^n can be written as a sum of 11...11 nines plus a single 1, i.e. 11...11 + 1 digits.
Now we can write a as (a_n * 11..11 * 9 + a_n) + ...
After factoring out the 9:
(a_n * 11..11 + a_n-1 * 11..11 + ... a_1) * 9 + (a_n + a_n-1 + ... + a_1 + a_0)
Which I'll write as b_9 * 9 + b_1.
This means that number a can be represented as the sum of b_9 digits 9 + how much is needed for b_1 (this is recursive by the way)
To recapitulate:
Let's call function f
If -10 < digit < 10, the result is 1.
Two counters are needed, c1 and c2.
Iterate over digits
For every ith digit, multiply by i digit number 11..11 and add the result to c1
Add the ith digit to c2
The result is c_1 + f(c_2)
And for practice, implement this in a non-recursive way.
As you can guess, you need to search from smaller numbers to bigger ones: 111119 has the right digit sum, but we want the lowest number. Your answer is wrong. The lowest would be 59, not 95!
You can brute force it and it will work, but for a bigger number you will struggle, so first work out how many digits you need at minimum.
For instance, if you want to reach 42, just add as many 9s as you need to reach or exceed the target.
9 + 9 + 9 + 9 + 9 = 45. Once the sum reaches the target, you know that the answer is at most 99999.
Now, how much do I need to decrease the value to get the correct answer? 45 - 42 = 3, as expected.
So 99996, 99969, etc. are all valid! But you want the lowest number, so you have to decrease the most significant digit (the leftmost one, of course!).
The answer would be 69999 (6 + 9 + 9 + 9 + 9 = 42)!
int n = 14;
int r = 0;
for (int i = 1; i < 10 /*if you play with long or long long*/; i++)
if (i * 9 >= n)
{
for (int j = 0; j < i; j++)
r = r * 10 + 9;
while (is_correct(r, n) == false)
{
// Code it yourself!!
}
return (r);
}
Here is_correct(r, n) returns true or false depending on whether the digits of r sum to n. You could instead make it return the amount by which r still has to be decreased. It's not the fastest way possible, and there is always a faster way, with a binary shift, but this algorithm would work just fine!
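The reasoning above (as many 9s as needed, then lower the leftmost digit) can also be written without any search loop. This is only a sketch: the function name smallest_with_digit_sum is made up, n is assumed to be at least 1, and the result is returned as a string so that large n does not overflow an integer type.

```cpp
#include <string>

// Smallest positive integer whose decimal digits sum to n (assumes n >= 1).
// Hypothetical helper for illustration; not part of the original answer.
std::string smallest_with_digit_sum(int n) {
    const int digits = (n + 8) / 9;            // minimum digit count, ceil(n / 9)
    const int leading = n - 9 * (digits - 1);  // left over for the leftmost digit (1..9)
    std::string result(1, static_cast<char>('0' + leading));
    result.append(digits - 1, '9');            // every remaining digit is a 9
    return result;
}
```

For n = 14 this yields "59" and for n = 42 it yields "69999", matching the examples worked out by hand.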

Implement Euclidean division to write greatest common divisor of two positive integers in terms of a linear combination of those integers

I'm trying to write a program that can write the greatest common divisor of two positive integers in terms of those integers. For example, let's say I have the numbers 353 and 15, I would find the gcd using the following steps:
353 = 23*15 + 8
15 = 1*8 + 7
8 = 1*7 + 1
7 = 7*1
so the gcd is 1. I have implemented this as:
//div_algo always takes int1 >= int2
int div_algo(int int1, int int2)
{
if (int2 == 0) //we are done
return int1;
int factor = 0;
int remainder = 0;
factor = int1/int2; //this is useful for linear combination
remainder = int1 % int2;
return div_algo(int2, remainder);
}
The problem is, if I want to find the linear combination, I basically work backwards. So, continuing with my example:
1 = 1*8 - 1*7 (substitute 7 = 15 - 1*8)
1 = 1*8 - 1*15 + 1*8 = 2*8 - 1*15 (substitute 8 = 353 - 23*15)
1 = 2*353 - 46*15 - 1*15 = 2*353 - 47*15
and there we go. The problem I am experiencing is that I don't know how to "store" the previous equations so that I can back substitute.
Add another parameter that will store the factors you are looking for. You can implement it like this:
int div_algo(int int1, int int2, vector<int>& factors)
{
if (int2 == 0) //we are done
return int1;
int factor = 0;
int remainder = 0;
factors.push_back(int1/int2); //this is useful for linear combination
remainder = int1 % int2;
return div_algo(int2, remainder, factors);
}
Note the use of & for factors. You don't want to copy the vector, just pass along a reference to the same original vector. You can replace int in the vector with a struct that can keep whatever data you consider necessary.
To call it you can do:
vector<int> factors;
div_algo(353, 15, factors);
for (int x : factors) cout << x << " ";
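If the goal is only the final linear combination, the back substitution can be folded into the recursion itself, so no equations need to be stored. The sketch below is the standard extended Euclidean algorithm rather than the approach of the answer above. The name extended_gcd is made up, and non-negative inputs are assumed.

```cpp
#include <tuple>

// Returns (g, x, y) with g = gcd(a, b) and a*x + b*y == g.
// Hypothetical name; assumes a >= 0 and b >= 0.
std::tuple<int, int, int> extended_gcd(int a, int b) {
    if (b == 0) {
        return std::make_tuple(a, 1, 0);  // a*1 + 0*0 == a
    }
    int g, x1, y1;
    std::tie(g, x1, y1) = extended_gcd(b, a % b);
    // g == b*x1 + (a % b)*y1 and a % b == a - (a/b)*b, hence
    // g == a*y1 + b*(x1 - (a/b)*y1).
    return std::make_tuple(g, y1, x1 - (a / b) * y1);
}
```

For 353 and 15 this returns g = 1, x = 2 and y = -47, which is the combination 1 = 2*353 - 47*15 derived by hand in the question.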

c++ - hackerrank project euler #1 terminated due to timeout

There are many discussions on this topic. I went through them, but none helped.
The question seems fairly simple:
If we list all the natural numbers below 10 that are multiples of 3 or
5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below N.
Input Format First line contains T that denotes the number of test
cases. This is followed by T lines, each containing an integer, N.
Output Format For each test case, print an integer that denotes the
sum of all the multiples of 3 or 5 below N.
Constraints 1≤T≤10^5 1≤N≤10^9
However, for two test cases, most probably the ones with a large input, my code results in terminated due to timeout.
Here is my code:
int main() {
unsigned long long int n,t;
unsigned long long int sum;
cin>>t;
while(t--)
{
sum=0;
cin>>n;
for(unsigned long long int i=3;i<n;i++){
if(i%3==0 || i%5==0){
sum+=i;
}
}
cout<<sum<<"\n";
}
return 0;
}
Why is it not working for large inputs even with unsigned long long int?
I suggest using two loops of addition and eliminating the expensive % operator.
The numbers divisible by 3 are exactly the numbers you get by repeatedly adding 3, starting from 0. So rather than testing every number for divisibility by 3, step through the multiples of 3 directly and sum them.
For example:
for (int i = 0; i < n; i = i + 3)
{
sum += i;
}
If you add a second loop for the multiples of 5, every value is summed, but the multiples of 15 are counted twice.
So subtract the values that are multiples of 15 once.
On the other hand, applying a little algebra, you could simplify this to a closed formula, then implement it.
Some Analysis
The number of multiples of 3 that are at most N is N/3 (integer division). So for N = 13, there are 13/3 = 4 multiples of 3: 3, 6, 9, 12. (The problem asks for multiples strictly below N, so apply the same reasoning to N - 1.)
Breaking down algebraically, we see that the numbers for N = 13, are:
[1] (3 * 1) + (3 * 2) + (3 * 3) + (3 * 4)
Factoring out the common multiplying of 3 yields:
[2] 3 * ( 1 + 2 + 3 + 4)
Looking at equation [2], this is 3 * sum(1..N/3).
Using the formula for summation:
(x * (x + 1)) / 2
the equation can be simplified to:
[3] 3 * ( 4 * (4 + 1) ) / 2
Or, replacing the count 4 by N/3 (integer division), the formula becomes:
[4] 3 * ((N/3) * ((N/3) + 1) ) / 2
The simplification of equation [4] is left as an exercise for the reader.
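I tried it in Python (3); you can check:
def multiple(n):
    # 1 + 2 + ... + n
    return n * (n + 1) // 2
def sum_multiples(n):
    # multiples of 3 or 5 that are <= n (inclusion-exclusion)
    return multiple(n // 3) * 3 + multiple(n // 5) * 5 - multiple(n // 15) * 15
for _ in range(int(input())):
    n = int(input()) - 1  # the task asks for multiples below N
    print(sum_multiples(n))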
The problem timeout is probably set at a value that disallows brute force algorithms like yours. You can calculate the sum for any given value of N in constant time using the closed formula for the sum of successive integers and the inclusion-exclusion principle (add the multiples of 3 and of 5, then subtract the multiples of 15 that were counted twice).
#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
using namespace std;
int main() {
/* Enter your code here. Read input from STDIN. Print output to STDOUT */
long long int t, N, i=0, sum3=0, sum5=0, sum35=0;
cin >> t;
while(i<t){
cin >> N;
N=N-1;
sum3 = 3 * ((N/3) * ((N/3) + 1) ) / 2;
sum5 = 5 * ((N/5) * ((N/5) + 1) ) / 2;
sum35 = 15 * ((N/15) * ((N/15) + 1) ) / 2;
cout << sum3 + sum5 - sum35 << endl;
sum3=sum5=sum35=0;
i++;
}
return 0;
}

How to reduce execution time in C++ for the following code?

I have written this code which has an execution time of 3.664 sec but the time limit is 3 seconds.
The question is this-
N teams participate in a league cricket tournament on Mars, where each
pair of distinct teams plays each other exactly once. Thus, there are a total
of (N × (N-1))/2 matches. An expert has assigned a strength to each team,
a positive integer. Strangely, the Martian crowds love one-sided matches
and the advertising revenue earned from a match is the absolute value of
the difference between the strengths of the two teams. Given the
strengths of the N teams, find the total advertising revenue earned from all
the matches.
Input format
Line 1 : A single integer, N.
Line 2 : N space-separated integers, the strengths of the N teams.
#include<iostream>
using namespace std;
int main()
{
int n;
cin>>n;
int stren[200000];
for(int a=0;a<n;a++)
cin>>stren[a];
long long rev=0;
for(int b=0;b<n;b++)
{
int pos=b;
for(int c=pos;c<n;c++)
{
if(stren[pos]>stren[c])
rev+=(long long)(stren[pos]-stren[c]);
else
rev+=(long long)(stren[c]-stren[pos]);
}
}
cout<<rev;
}
Can you please give me a solution??
Rewrite your loop as:
std::sort(stren, stren + n); // needs #include <algorithm>
for(int b=0;b<n;b++)
{
rev += (2 * b - n + 1) * static_cast<long long>(stren[b]);
}
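Why does it work? Your loops form every pair of two numbers and add their absolute difference to rev. In a sorted array, the b-th item (0-based) is subtracted (n-1-b) times, once for each larger item after it, and added b times, once for each smaller item before it. Hence the factor 2 * b - n + 1.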
There can be 1 micro optimization that possibly is not needed:
std::sort(stren, stren + n); // needs #include <algorithm>
for(int b = 0, m = 1 - n; b < n; b++, m += 2)
{
rev += m * static_cast<long long>(stren[b]);
}
In place of the if statement, use
rev += std::abs(stren[pos]-stren[c]);
abs returns the absolute value of its argument, so this gives the positive difference between the two strengths. This avoids the if test and the ensuing branching, which may be quicker. The (long long) cast is also unnecessary, although the compiler will probably optimise that out.
There are other optimisations you could make, but this one should do it. If your abs function is poorly implemented on your system, you could always make use of this fast version for computing the absolute value of i:
(i + (i >> 31)) ^ (i >> 31) for a 32 bit int.
This has no branching at all and would beat even an inline ternary! (But you should use int32_t as your data type; if you have 64 bit int then you'll need to adjust my formula.) But we are in the realms of micro-optimisation here.
for(int b = 0; b < n; b++)
{
for(int c = b; c < n; c++)
{
rev += abs(stren[b]-stren[c]);
}
}
This should give you a speed increase, might be enough.
An interesting approach might be to collapse down the strengths from an array - if that distribution is pretty small.
So:
std::unordered_map<int, int> strengths;
for (int i = 0; i < n; ++i) {
int next;
cin >> next;
++strengths[next];
}
This way, we can reduce the number of things we have to sum:
long long rev = 0;
for (auto a = strengths.begin(); a != strengths.end(); ++a) {
for (auto b = std::next(a); b != strengths.end(); ++b) {
// strength difference times the number of pairs with these two strengths
rev += static_cast<long long>(abs(a->first - b->first)) * a->second * b->second;
}
}
cout << rev;
If the strengths tend to be repeated a lot, this could save a lot of cycles.
What exactly we are doing in this problem is: For all combinations of pairs of elements, we are adding up the absolute values of the differences between the elements of the pair. i.e. Consider the sample input
3 10 3 5
Ans (Take only absolute values) = (3-10) + (3-3) + (3-5) + (10-3) + (10-5) + (3-5) = 7 + 0 + 2 + 7 + 5 + 2 = 23
Notice that I have fixed 3, iterated through the remaining elements, found the differences and added them to Ans, then fixed 10, iterated through the remaining elements and so on till the last element
Unfortunately, N(N-1)/2 iterations are required for the above procedure, which wouldn't be ok for the time limit.
Could we better it?
Let's sort the array and repeat this procedure. After sorting, the sample input is now 3 3 5 10
Let's start by fixing the greatest element, 10 and iterating through the array like how we did before (of course, the time complexity is the same)
Ans = (10-3) + (10-3) + (10-5) + (5-3) + (5-3) + (3-3) = 7 + 7 + 5 + 2 + 2 = 23
We could rearrange the above as
Ans = (10)(3)-(3+3+5) + 5(2) - (3+3) + 3(1) - (3)
Notice a pattern? Let's generalize it.
Suppose we have an array of strengths arr[N] of size N indexed from 0
Ans = (arr[N-1])(N-1) - (arr[0] + arr[1] + ... + arr[N-2]) + (arr[N-2])(N-2) - (arr[0] + arr[1] + ... + arr[N-3]) + (arr[N-3])(N-3) - (arr[0] + arr[1] + ... + arr[N-4]) + ... and so on
Right. So let's put this new idea to work. We'll introduce a 'sum' variable. Some basic DP to the rescue.
For i=0 to N-2
sum = sum + arr[i]
Ans = Ans + (arr[i+1]*(i+1)-sum)
That's it, you just have to sort the array and iterate only once through it. Excluding the sorting part, it's down to about N iterations from N(N-1)/2, which is O(N) time. Including the sort, it is O(N log N) time overall.
Hope it helped!
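The sort-then-running-sum idea could be sketched in C++ as follows. The function name total_revenue and the use of std::vector are choices made for this illustration, not taken from the question's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sum of |a - b| over all unordered pairs, in O(N log N).
// Hypothetical helper; takes the strengths by value so it can sort a copy.
long long total_revenue(std::vector<int> strengths) {
    std::sort(strengths.begin(), strengths.end());
    long long total = 0;
    long long prefix = 0;  // sum of all elements before index i
    for (std::size_t i = 0; i < strengths.size(); ++i) {
        // strengths[i] is >= each of the i earlier elements, so the i
        // differences add up to strengths[i] * i - prefix.
        total += static_cast<long long>(strengths[i]) * static_cast<long long>(i) - prefix;
        prefix += strengths[i];
    }
    return total;
}
```

With the sample strengths 3 10 3 5 this gives 23, the same value as the pair-by-pair calculation.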

Calculating Binomial Coefficient (nCk) for large n & k

I just saw this question and have no idea how to solve it. can you please provide me with algorithms , C++ codes or ideas?
This is a very simple problem. Given the value of N and K, you need to tell us the value of the binomial coefficient C(N,K). You may rest assured that K <= N and the maximum value of N is 1,000,000,000,000,000. Since the value may be very large, you need to compute the result modulo 1009.
Input
The first line of the input contains the number of test cases T, at most 1000. Each of the next T lines consists of two space separated integers N and K, where 0 <= K <= N and 1 <= N <= 1,000,000,000,000,000.
Output
For each test case, print on a new line, the value of the binomial coefficient C(N,K) modulo 1009.
Example
Input:
3
3 1
5 2
10 3
Output:
3
10
120
Notice that 1009 is a prime.
Now you can use Lucas' Theorem.
Which states:
Let p be a prime.
If n = a1a2...ar when written in base p and
if k = b1b2...br when written in base p
(pad with zeroes if required)
Then
(n choose k) modulo p = (a1 choose b1) * (a2 choose b2) * ... * (ar choose br) modulo p.
i.e. remainder of n choose k when divided by p is same as the remainder of
the product (a1 choose b1) * .... * (ar choose br) when divided by p.
Note: if bi > ai then ai choose bi is 0.
Thus your problem is reduced to finding the product modulo 1009 of roughly log N/log 1009 numbers (the number of digits of N in base 1009) of the form a choose b where a < 1009 and b < 1009.
This should make it easier even when N is close to 10^15.
Note:
For N=10^15, N choose N/2 is more than 2^(100000000000000), which is way beyond an unsigned long long.
Also, the algorithm suggested by Lucas' theorem is O(log N), which is exponentially faster than trying to compute the binomial coefficient directly (even if you did a mod 1009 to take care of the overflow issue).
Here is some code for Binomial I had written long back, all you need to do is to modify it to do the operations modulo 1009 (there might be bugs and not necessarily recommended coding style):
class Binomial
{
public:
Binomial(int Max)
{
max = Max+1;
table = new unsigned int * [max]();
for (int i=0; i < max; i++)
{
table[i] = new unsigned int[max]();
for (int j = 0; j < max; j++)
{
table[i][j] = 0;
}
}
}
~Binomial()
{
for (int i =0; i < max; i++)
{
delete[] table[i]; // allocated with new[], so release with delete[]
}
delete[] table;
}
unsigned int Choose(unsigned int n, unsigned int k);
private:
bool Contains(unsigned int n, unsigned int k);
int max;
unsigned int **table;
};
unsigned int Binomial::Choose(unsigned int n, unsigned int k)
{
if (n < k) return 0;
if (k == 0 || n==1 ) return 1;
if (n==2 && k==1) return 2;
if (n==2 && k==2) return 1;
if (n==k) return 1;
if (Contains(n,k))
{
return table[n][k];
}
table[n][k] = Choose(n-1,k) + Choose(n-1,k-1);
return table[n][k];
}
bool Binomial::Contains(unsigned int n, unsigned int k)
{
if (table[n][k] == 0)
{
return false;
}
return true;
}
Binomial coefficient is one factorial divided by two others, although the k! term on the bottom cancels in an obvious way.
Observe that if 1009, (including multiples of it), appears more times in the numerator than the denominator, then the answer mod 1009 is 0. It can't appear more times in the denominator than the numerator (since binomial coefficients are integers), hence the only cases where you have to do anything are when it appears the same number of times in both. Don't forget to count multiples of (1009)^2 as two, and so on.
After that, I think you're just mopping up small cases (meaning small numbers of values to multiply/divide), although I'm not sure without a few tests. On the plus side 1009 is prime, so arithmetic modulo 1009 takes place in a field, which means that after casting out multiples of 1009 from both top and bottom, you can do the rest of the multiplication and division mod 1009 in any order.
Where there are non-small cases left, they will still involve multiplying together long runs of consecutive integers. This can be simplified by knowing 1008! (mod 1009). It's -1 (1008 if you prefer) by Wilson's theorem, since 1 ... 1008 are the p-1 non-zero elements of the prime field over p: they consist of 1, -1, and then (p-3)/2 pairs of multiplicative inverses, and each pair multiplies to 1.
So for example consider the case of C((1009^3), 200).
Imagine that the number of 1009s are equal (don't know if they are, because I haven't coded a formula to find out), so that this is a case requiring work.
On the top we have 201 ... 1008, which we'll have to calculate or look up in a precomputed table, then 1009, then 1010 ... 2017, 2018, 2019 ... 3026, 3027, etc. The ... ranges are all -1, so we just need to know how many such ranges there are.
That leaves 1009, 2018, 3027, which once we've cancelled them with 1009's from the bottom will just be 1, 2, 3, ... 1008, 1010, ..., plus some multiples of 1009^2, which again we'll cancel and leave ourselves with consecutive integers to multiply.
We can do something very similar with the bottom to compute the product mod 1009 of "1 ... 1009^3 - 200 with all the powers of 1009 divided out". That leaves us with a division in a prime field. IIRC that's tricky in principle, but 1009 is a small enough number that we can manage 1000 of them (the upper limit on the number of test cases).
Of course with k=200, there's an enormous overlap which could be cancelled more directly. That's what I meant by small cases and non-small cases: I've treated it like a non-small case, when in fact we could get away with just "brute-forcing" this one, by calculating ((1009^3-199) * ... * 1009^3) / 200!
I don't think you want to calculate C(n,k) and then reduce mod 1009. The biggest one, C(1e15,5e14) will require something like 1e16 bits ~ 1000 terabytes
Moreover executing the loop in snakiles answer 1e15 times seems like it might take a while.
What you might use is, if
n = n0 + n1*p + n2*p^2 ... + nd*p^d
m = m0 + m1*p + m2*p^2 ... + md*p^d
(where 0<=mi,ni < p)
then
C(n,m) = C(n0,m0) * C(n1,m1) * ... * C(nd,md) mod p
see, eg http://www.cecm.sfu.ca/organics/papers/granville/paper/binomial/html/binomial.html
One way would be to use Pascal's triangle to build a table of all C(n,m) for 0<=m<=n<1009.
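The digit-by-digit product from Lucas' theorem might be sketched as below. Instead of a Pascal's triangle table, this version assumes a table of factorials modulo 1009 and divides via Fermat's little theorem (x^(p-2) is the inverse of x modulo a prime p). The names P, pow_mod and binomial_mod_1009 are made up for the example.

```cpp
#include <cstdint>
#include <vector>

const std::uint64_t P = 1009;  // the prime modulus from the problem

// base^exp mod P by repeated squaring.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t result = 1;
    base %= P;
    while (exp > 0) {
        if (exp & 1) result = result * base % P;
        base = base * base % P;
        exp >>= 1;
    }
    return result;
}

// C(n, k) mod 1009 via Lucas' theorem. Hypothetical helper, not from the answers.
std::uint64_t binomial_mod_1009(std::uint64_t n, std::uint64_t k) {
    // Factorials 0! .. 1008! modulo P, built once on first use.
    static const std::vector<std::uint64_t> fact = [] {
        std::vector<std::uint64_t> f(P, 1);
        for (std::uint64_t i = 1; i < P; ++i) f[i] = f[i - 1] * i % P;
        return f;
    }();

    std::uint64_t result = 1;
    while (n > 0 || k > 0) {
        const std::uint64_t a = n % P;  // current base-P digit of n
        const std::uint64_t b = k % P;  // current base-P digit of k
        if (b > a) return 0;            // C(a, b) == 0 when b > a
        // C(a, b) = a! / (b! * (a-b)!); the division uses the Fermat inverse.
        const std::uint64_t denom = fact[b] * fact[a - b] % P;
        result = result * fact[a] % P * pow_mod(denom, P - 2) % P;
        n /= P;
        k /= P;
    }
    return result;
}
```

Each query touches at most five base-1009 digits for N up to 10^15, so 1000 test cases are cheap.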
Pseudocode for calculating nCk:
result = 1
for i=1 to min{K,N-K}:
result *= N-i+1
result /= i
return result
Time Complexity: O(min{K,N-K})
The loop goes from i=1 to min{K,N-K} instead of from i=1 to K, and that's OK because
C(n,k) = C(n, n-k)
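A direct C++ rendering of that pseudocode might look like this. It is a sketch for values that fit in 64 bits, with no overflow check and no modular reduction, so it does not by itself solve the 10^15 case. The name binomial is an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// C(n, k) by the multiplicative formula. Exact as long as the
// intermediate product fits in 64 bits (no overflow check here).
std::uint64_t binomial(std::uint64_t n, std::uint64_t k) {
    if (k > n) return 0;
    k = std::min(k, n - k);  // C(n, k) == C(n, n - k)
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= k; ++i) {
        // After this step result == C(n, i), so the division is exact.
        result = result * (n - i + 1) / i;
    }
    return result;
}
```

Multiplying before dividing keeps every intermediate value an integer, which is why the order of the two operations matters.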
And you can calculate the thing even more efficiently if you use the GammaLn function.
nCk = exp(GammaLn(n+1)-GammaLn(k+1)-GammaLn(n-k+1))
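The GammaLn function is the natural logarithm of the Gamma function. I know there's an efficient algorithm to calculate the GammaLn function, but that algorithm isn't trivial at all. Note that this gives a floating-point approximation, so it does not help when you need the exact value modulo 1009.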
The following code shows how to obtain all the binomial coefficients for a given size 'n'. You could easily modify it to stop at a given k in order to determine nCk. It is computationally very efficient, it's simple to code, and works for very large n and k.
binomial_coefficient = 1
output(binomial_coefficient)
col = 0
n = 5
do while col < n
binomial_coefficient = binomial_coefficient * (n + 1 - (col + 1)) / (col + 1)
output(binomial_coefficient)
col = col + 1
loop
The output of binomial coefficients is therefore:
1
1 * (5 + 1 - (0 + 1)) / (0 + 1) = 5
5 * (5 + 1 - (1 + 1)) / (1 + 1) = 10
10 * (5 + 1 - (2 + 1)) / (2 + 1) = 10
10 * (5 + 1 - (3 + 1)) / (3 + 1) = 5
5 * (5 + 1 - (4 + 1)) / (4 + 1) = 1
I had found the formula once upon a time on Wikipedia but for some reason it's no longer there :(