Sieve of Eratosthenes on a segment - c++

Sieve of Eratosthenes on the segment:
Sometimes you need to find all the primes that are in the range
[L...R] and not in [1...N], where R is a large number.
Conditions:
You are allowed to create an array of integers with size
(R−L+1).
Implementation:
bool isPrime[r - l + 1]; //filled by true
for (long long i = 2; i * i <= r; ++i) {
for (long long j = max(i * i, (l + (i - 1)) / i * i); j <= r; j += i) {
isPrime[j - l] = false;
}
}
for (long long i = max(l, 2); i <= r; ++i) {
if (isPrime[i - l]) {
//then i is prime
}
}
What is the logic behind setting the lower limit of 'j' in second for loop??
Thanks in advance!!

Think about what we want to find. Ignore the i*i part. We have only
(L + (i - 1)) / i * i) to consider. (I wrote the L capital since l and 1 look quite similar)
What should it be? Obviously it should be the smallest number within L..R that is divisible by i. That's when we want to start to sieve out.
The last part of the formula, / i * i finds the next lower number that is divisible by i by using the properties of integer division.
Example: 35 div 4 * 4 = 8 * 4 = 32, 32 is the highest number that is (equal or) lower than 35 which is divisible by 4.
The L is where we want to start, obviously, and the + (i-1) makes sure that we don't find the highest number equal or lower than but the smallest number equal or bigger than L that is divisible by i.
Example: (459 + (4-1)) div 4 * 4 = 462 div 4 * 4 = 115 * 4 = 460.
460 >= 459, 4 | 460 (4 divides 460), and it is the smallest number with that property.
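(The max(i*i, ...) keeps i itself from being sieved out if it lies within L..R. It does not need to be 2 * i: every multiple of i below i*i also has a prime factor smaller than i, so an earlier pass of the outer loop has already crossed it out.)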
For reasons of readability, I'd make this an inline function next_divisible(number, divisor) or the like. And I'd make it clear that integer division is used. If not, somebody clever might change it to regular division, with which it wouldn't work.
Also, I strongly recommend wrapping the array. It is not obvious to the outside that the property for a number X is stored at position X - L. Something like a class RangedArray that does that shift for you, allowing you a direct input of X instead of X - L, could easily take the responsibility. If you don't do that, at least make it a vector; outside of an innermost class, you shouldn't use raw arrays in C++.
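To make the suggestions above concrete, here is a minimal sketch of the same sieve with the rounding pulled out into a helper and the raw array replaced by a vector. The names next_divisible and primes_in_range are only illustrative, and the sketch assumes 1 <= l <= r with a range small enough to fit in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Smallest multiple of divisor that is >= number (both positive).
// Relies on integer division discarding the remainder.
inline long long next_divisible(long long number, long long divisor) {
    assert(number > 0 && divisor > 0);
    return (number + divisor - 1) / divisor * divisor;
}

// Returns all primes in [l, r], assuming 1 <= l <= r.
std::vector<long long> primes_in_range(long long l, long long r) {
    assert(l >= 1 && l <= r);
    // The flag for number x lives at index x - l.
    std::vector<bool> is_prime(r - l + 1, true);
    for (long long i = 2; i * i <= r; ++i) {
        // Start at the first multiple of i inside the range, but never below i * i.
        for (long long j = std::max(i * i, next_divisible(l, i)); j <= r; j += i) {
            is_prime[j - l] = false;
        }
    }
    std::vector<long long> primes;
    for (long long x = std::max(l, 2LL); x <= r; ++x) {
        if (is_prime[x - l]) primes.push_back(x);
    }
    return primes;
}
```

For example, primes_in_range(10, 30) yields 11, 13, 17, 19, 23, 29, and next_divisible(459, 4) is 460 as in the worked example.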

Related

Find minimum number of digits required to make a given number

We have to find the minimum number of digits required to make a given number, for example: 14 => 95 (9 + 5 = 14) is two digits which is the minimum to form 14.
int moves(int n) {
int m = 0; // Minimum count
while (n-9 >= 0) { // To place maximum number of 9's
n -= 9;
m++;
}
if (n == 0) { // If only nines made up the number
return m;
}
else {
m++;
return m;
}
}
I am getting a TLE (runtime time limit exceeded) by an online judge. How can I improve it or is there a better approach?
Your code starts by looking at how many times 9 fits into that number. This can be done way more easily:
int m = n/9;
This suffices since we do an integer division, in which the remainder is thrown away. Note that if n were a float or another floating-point type, this would not work.
The question left is if it is divisible by 9 or not. If not, we have one additional digit. This can be done by the modulo operator (made it verbose for ease of understanding):
bool divisible_by_nine = (n % 9 == 0);
Assuming that you might not know the modulo operator, it returns the remainder of an integer division, 47 % 9 = 2 since 47 / 9 = 5 remainder 2.
Without it, you would go with
int remainder = n - 9*m;
bool divisible = (remainder == 0);
Combined:
int required_digits(int number)
{
bool divisible = (number % 9 == 0);
return number/9 + (divisible ? 0 : 1);
}
Or in a single line, depending on how verbose you want it to be:
int required_digits(int number)
{
return number/9 + (number % 9 == 0 ? 0 : 1);
}
Since there isn't any loop, this is in Θ(1) and thus should work in your required time limit.
(Technically, the processor might as well handle the division somewhat like you did internally, but it is very efficient at that. To be absolutely correct, I'd have to add "assuming that division is a constant time operation".)
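As a side note, the "divide, then add one if there is a remainder" pattern is a ceiling division, which can also be written without the conditional. This is a small sketch of that alternative, not the code from the answer above; the name required_digits_ceil is made up, and it assumes n is non-negative and far enough below INT_MAX that n + 8 does not overflow.

```cpp
#include <cassert>

// Minimum count of decimal digits whose sum is n, for n >= 0.
// (n + 8) / 9 equals the ceiling of n / 9 for non-negative n.
int required_digits_ceil(int n) {
    assert(n >= 0);
    return (n + 8) / 9;
}
```

It agrees with the versions above, for example 14 gives 2 and 18 gives 2.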
Your solution works fine. You can try the shorter:
return (n%9==0)? n/9 : n/9 +1 ;
Shorter, but less easy to read...
Or a compromise:
if (n%9==0) // n can be divided by 9
return n/9;
else
return n/9+1;
Explanation
We know that every number a can be represented as
(a_n * 10 ^ n) + ... + (a_2 * 10 ^ 2) + (a_1 * 10) + (a_0)
where a_k are digits
and 10^n = 11...11 * 9 + 1 (n digits 1).
This means that 10^n can be written as a sum of 11...11 nines plus a single 1.
Now we can write a as (a_n * 11..11 * 9 + a_n) + ...
After factoring out the 9:
(a_n * 11..11 + a_n-1 * 11..11 + ... a_1) * 9 + (a_n + a_n-1 + ... + a_1 + a_0)
Which I'll write as b_9 * 9 + b_1.
This means that number a can be represented as the sum of b_9 digits 9 + how much is needed for b_1 (this is recursive by the way)
To recapitulate:
Call the function f.
If -10 < a < 10, the result is 1.
Two counters are needed, c1 and c2.
Iterate over the digits.
For every i-th digit, multiply it by the i-digit number 11..11 and add the result to c1.
Add the i-th digit to c2.
The result is c1 + f(c2).
And for practice, implement this in a non-recursive way.
As you guessed, you need to iterate from lower numbers to bigger ones: 111119 has the right digit sum, but we want the lowest number. Your answer of 95 is not the lowest; that would be 59!
You can brute force it and it will work, but for a bigger number you will struggle, so first work out: what is the minimum number of digits the solution needs?
For instance, if you want to find 42, just add as many 9s as you need to reach or exceed the target!
9 + 9 + 9 + 9 + 9 = 45. Once you reach or exceed the target, you know that the answer is at most 99999.
Now how much do I need to decrease the digit sum to hit the target? 45 - 42 = 3.
So 99996, 99969, etc. will be valid! But you want the lowest number, so you have to decrease the most significant digit (the leftmost one, of course!).
The answer would be 69999 (6 + 9 + 9 + 9 + 9 = 42)!
int n = 14;
int r = 0;
for (int i = 1; i < 10 /*if you play with long or long long*/; i++)
if (i * 9 >= n)
{
for (int j = 0; j < i; j++)
r = r * 10 + 9;
while (is_correct(r, n) == false)
{
// Code it yourself!!
}
return (r);
}
Here is_correct(r, n) returns true or false depending on whether the digits of r sum to n. You could instead make it return how much r still has to be decreased. It's not the fastest way possible, and there is always a faster way, with a binary shift, but this algorithm would work just fine!
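The reasoning above (use as few digits as possible, fill with 9s, and put whatever is left in the leftmost digit) can also be written directly, without searching. This is only a sketch under the assumption that n >= 1; the function name is invented for illustration, and the result is a string so that large n cannot overflow an integer type.

```cpp
#include <cassert>
#include <string>

// Smallest positive integer whose decimal digits sum to n (n >= 1).
std::string smallest_with_digit_sum(int n) {
    assert(n >= 1);
    int digits = (n + 8) / 9;            // fewest digits that can reach n
    int leading = n - 9 * (digits - 1);  // remainder for the leftmost digit, in 1..9
    std::string result(1, static_cast<char>('0' + leading));
    result.append(digits - 1, '9');      // all remaining digits are 9
    return result;
}
```

With this, 14 gives "59" and 42 gives "69999", matching the examples in the text.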

Why M = L + ((R - L) / 2) instead of M=(L+R)/2 avoid overflow in C++?

Hello I was looking at the C++ solution to the question "Suppose a sorted array is rotated at some pivot unknown to you beforehand. (i.e., 0 1 2 4 5 6 7 might become 4 5 6 7 0 1 2). How do you find an element in the rotated array efficiently? You may assume no duplicate exists in the array."
int rotated_binary_search(int A[], int N, int key) {
int L = 0;
int R = N - 1;
while (L <= R) {
// Avoid overflow, same as M=(L+R)/2
int M = L + ((R - L) / 2);
if (A[M] == key) return M;
// the bottom half is sorted
if (A[L] <= A[M]) {
if (A[L] <= key && key < A[M])
R = M - 1;
else
L = M + 1;
}
// the upper half is sorted
else {
if (A[M] < key && key <= A[R])
L = M + 1;
else
R = M - 1;
}
}
return -1;
}
and saw that the comment says that using M = L + ((R - L) / 2) instead of M=(L+R)/2 avoids overflow. Why is that? Thanks in advance.
Because it does...
Let's assume for a minute you're using unsigned chars (same applies to larger integers of course).
If L is 100 and R is 200, the first version is:
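M = (100 + 200) / 2 = 44 / 2 = 22
100+200 overflows (because the largest unsigned char is 255), and you get 100+200=44 (unsigned addition wraps around modulo 256). Strictly speaking, C++ promotes unsigned char operands to int before adding, so picture the arithmetic being carried out entirely in 8 bits.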
The second, on the other hand:
M = 100 + (200-100) / 2 = 100 + 100 / 2 = 150
No overflow.
As #user2357112 pointed out in a comment, there are no free lunches. If L is negative, the second version might not work while the first will.
Not sure, but suppose the max limit of int is 100.
R = 80 and L = 40
then,
M = (L + R) / 2
M = (120) / 2; here 120 is outside the limits of our integer type, so this causes overflow.
However,
M = L + ((R - L) / 2)
M = 40 + ((40) / 2)
M = 40 + 20
M = 60.
So in this case we never encounter a value that exceeds the limits of our integer type, so this approach will never overflow, theoretically.
I hope this analogy will help
It avoids overflow in this specific implementation, which operates under the guarantees that L and R are non-negative and L <= R. Under these guarantees it should be obvious that R - L does not overflow and L + ((R - L) / 2) does not overflow either.
In general case (i.e. for arbitrary values of L and R) R - L is as prone to overflow as L + R, meaning that this trick does not achieve anything.
The comment is wrong, for a number of reasons.
For the particular problem the risk of overflow is probably nil.
Reordering calculations does not guarantee that the compiler will perform them in that order.
If there is a range of values for which an ordering can cause overflow, then there is another range of values for which the reordered calculation will cause overflow.
If overflow could be a problem then it should be controlled explicitly, not implicitly.
This is an excellent place for an assert. In this case the algorithm is only valid if N is less than half the maximum positive range of int, so say it in an assert.
If the algorithm is required to work for the whole positive range of signed int then the range should be explicitly tested in an assert, and the calculation should be ordered by introducing a sequence point (eg broken into two statements).
Doing this right is hard. Numerical computation is full of this stuff. Best to avoid, if possible. And don't accept random advice (even this!) without doing your own research.
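To tie the answers together, here is a small sketch that states the guarantee with an assert and shows how to check, in a wider type, whether the naive sum would have overflowed. The function names are made up for this example, and it assumes long long is wider than int, which holds on mainstream platforms but is not promised by the standard.

```cpp
#include <cassert>
#include <climits>

// Midpoint of [lo, hi] for array indices, assuming 0 <= lo <= hi.
// Under that assumption hi - lo cannot overflow and the result never exceeds hi.
int midpoint_index(int lo, int hi) {
    assert(0 <= lo && lo <= hi);
    return lo + (hi - lo) / 2;
}

// True when lo + hi would not fit in an int. The sum is formed in a wider
// type so that the check itself cannot overflow.
bool naive_sum_overflows(int lo, int hi) {
    long long sum = static_cast<long long>(lo) + hi;
    return sum > INT_MAX || sum < INT_MIN;
}
```

For lo = INT_MAX - 2 and hi = INT_MAX the naive sum overflows, while midpoint_index still returns INT_MAX - 1.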

Is there an expression using modulo to do backwards wrap-around ("reverse overflow")?

For any whole number input W restricted by the range R = [x,y], the "overflow," for lack of a better term, of W over R is (W - x) % (y-x+1) + x. This causes it to wrap back around if W exceeds y.
As an example of this principle, suppose we iterate over a calendar's months:
int this_month = 5;
int next_month = (this_month + 1) % 12;
where both integers will be between 0 and 11, inclusive. Thus, the expression above "clamps" the integer to the range R = [0,11]. This approach of using an expression is simple, elegant, and advantageous as it omits branching.
Now, what if we want to do the same thing, but backwards? The following expression works:
int last_month = ((this_month - 1) % 12 + 12) % 12;
but it's abstruse. How can it be beautified?
tl;dr - Can the expression ((x-1) % k + k) % k be simplified further?
Note: C++ tag specified because other languages handle negative operands for the modulo operator differently.
Your expression should be ((x-1) + k) % k. This will properly wrap x=0 around to 11. In general, if you want to step back more than 1, you need to make sure that you add enough so that the first operand of the modulo operation is >= 0.
Here is an implementation in C++:
int wrapAround(int v, int delta, int minval, int maxval)
{
const int mod = maxval + 1 - minval;
if (delta >= 0) {return (v + delta - minval) % mod + minval;}
else {return ((v + delta) - delta * mod - minval) % mod + minval;}
}
This also allows months labeled from 0 to 11 or from 1 to 12 to be used, by setting minval and maxval accordingly.
Since this answer is so highly appreciated, here is an improved version without branching, which also handles the case where the initial value v is smaller than minval. I keep the other example because it is easier to understand:
int wrapAround(int v, int delta, int minval, int maxval)
{
const int mod = maxval + 1 - minval;
v += delta - minval;
v += (1 - v / mod) * mod;
return v % mod + minval;
}
The only issue remaining is if minval is larger than maxval. Feel free to add an assertion if you need it.
k % k will always be 0. I'm not 100% sure what you're trying to do but it seems you want the last month to be clamped between 0 and 11 inclusive.
(this_month + 11) % 12
Should suffice.
The general solution is to write a function that computes the value that you want:
//Returns floor(a/n) (with the division done exactly).
//Let ÷ be mathematical division, and / be C++ division.
//We know
// a÷b = a/b + f (f is the remainder, not all
// divisions have exact Integral results)
//and
// (a/b)*b + a%b == a (from the standard).
//Together, these imply (through algebraic manipulation):
// sign(f) == sign(a%b)*sign(b)
//We want the remainder (f) to always be >=0 (by definition of flooredDivision),
//so when sign(f) < 0, we subtract 1 from a/n to make f > 0.
template<typename Integral>
Integral flooredDivision(Integral a, Integral n) {
Integral q(a/n);
if ((a%n < 0 && n > 0) || (a%n > 0 && n < 0)) --q;
return q;
}
//flooredModulo: Modulo function for use in the construction of
//looping topologies. The result will always be between 0 and the
//denominator, and will loop in a natural fashion (rather than swapping
//the looping direction over the zero point (as in C++11),
//or being unspecified (as in earlier C++)).
//Returns x such that:
//
//Real a = Real(numerator)
//Real n = Real(denominator)
//Real r = a - n*floor(a/n)
//x = Integral(r)
template<typename Integral>
Integral flooredModulo(Integral a, Integral n) {
return a - n * flooredDivision(a, n);
}
Easy peasy: do not use the first modulo operator, it is superfluous:
int last_month = (this_month - 1 + 12) % 12;
which is the general case.
In this instance you can write + 11, but I would still write - 1 + 12, as it more clearly states what you want to achieve.
Note that normal mod causes the pattern 0...11 to repeat at 12...23, 24...35, etc. but doesn't wrap on -11...-1. In other words, it has two sets of behaviors. One from -infinity...-1, and a different set of behavior from 0...infinity.
Adding k once before taking the remainder, as in ((x-1) + k) % k, fixes -11...-1 but has the same problem as normal mod with -23...-12. I.e. while it fixes 12 additional numbers, it doesn't wrap around infinitely. It still has one set of behavior from -infinity...-12, and a different behavior from -11...+infinity.
This means that if you're using the function for offsets, it could lead to buggy code.
If you want a truly wrap around mod, it should handle the entire range, -infinity...infinity in exactly the same way.
There is probably a better way to implement this, but here is an easy to understand implementation:
// n must be greater than 0
func wrapAroundMod(a: Int, n: Int) -> Int {
var offsetTimes: Int = 0
if a < 0 {
offsetTimes = (-a / n) + 1
}
return (a + n * offsetTimes) % n
}
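Since the question is tagged C++, here is a rough C++ counterpart of the same idea. It is a sketch rather than anyone's posted code: wrap_mod and wrap_step are hypothetical names, and it assumes C++11 or later (where a % n takes the sign of a) and values small enough that a % n + n does not overflow.

```cpp
#include <cassert>

// Remainder of a modulo n that always lands in [0, n), for n > 0.
// In C++11 and later a % n lies strictly between -n and n, so adding n once
// after the first % is enough for any a.
int wrap_mod(int a, int n) {
    assert(n > 0);
    return (a % n + n) % n;
}

// Moves value by delta inside the inclusive range [min_val, max_val],
// wrapping around at either end.
int wrap_step(int value, int delta, int min_val, int max_val) {
    assert(min_val <= max_val);
    return wrap_mod(value - min_val + delta, max_val - min_val + 1) + min_val;
}
```

For months numbered 0 to 11, wrap_step(0, -1, 0, 11) gives 11; for months numbered 1 to 12, wrap_step(1, -1, 1, 12) gives 12.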
Not sure if you were having the same problem as me, but my problem was essentially that I wanted to constrain all numbers to a certain range. Say that range was 0-6, so using %7 means that any number higher than 6 will wrap back around to 0 or above. The actual problem is that numbers less than zero didn't wrap back around to 6. I have a solution to that (where X is the number of values in your range, i.e. 7 for 0-6, and 0 is the minimum):
int result;
if(inputNumber <0)//If this is a negative number
{
result = (X-(-inputNumber%X))%X;
}
else
{
result = inputNumber%X;
}

Porting optimized Sieve of Eratosthenes from Python to C++

Some time ago I used the (blazing fast) primesieve in python that I found here: Fastest way to list all primes below N
To be precise, this implementation:
def primes2(n):
""" Input n>=6, Returns a list of primes, 2 <= p < n """
n, correction = n-n%6+6, 2-(n%6>1)
sieve = [True] * (n/3)
for i in xrange(1,int(n**0.5)/3+1):
if sieve[i]:
k=3*i+1|1
sieve[ k*k/3 ::2*k] = [False] * ((n/6-k*k/6-1)/k+1)
sieve[k*(k-2*(i&1)+4)/3::2*k] = [False] * ((n/6-k*(k-2*(i&1)+4)/6-1)/k+1)
return [2,3] + [3*i+1|1 for i in xrange(1,n/3-correction) if sieve[i]]
Now I can slightly grasp the idea of the optimizing by automatically skipping multiples of 2, 3 and so on, but when it comes to porting this algorithm to C++ I get stuck (I have a good understanding of python and a reasonable/bad understanding of C++, but good enough for rock 'n roll).
What I currently have rolled myself is this (isqrt() is just a simple integer square root function):
template <class T>
void primesbelow(T N, std::vector<T> &primes) {
T sievemax = (N-3 + (1-(N % 2))) / 2;
T i;
T sievemaxroot = isqrt(sievemax) + 1;
boost::dynamic_bitset<> sieve(sievemax);
sieve.set();
primes.push_back(2);
for (i = 0; i <= sievemaxroot; i++) {
if (sieve[i]) {
primes.push_back(2*i+3);
for (T j = 3*i+3; j <= sievemax; j += 2*i+3) sieve[j] = 0; // filter multiples
}
}
for (; i <= sievemax; i++) {
if (sieve[i]) primes.push_back(2*i+3);
}
}
This implementation is decent and automatically skips multiples of 2, but if I could port the Python implementation I think it could be much faster (50%-30% or so).
To compare the results (in the hope this question will be successfully answered), the current execution time with N=100000000, g++ -O3 on a Q6600 Ubuntu 10.10 is 1230ms.
Now I would love some help with either understanding what the above Python implementation does or that you would port it for me (not as helpful though).
EDIT
Some extra information about what I find difficult.
I have trouble with the techniques used like the correction variable and in general how it comes together. A link to a site explaining different Eratosthenes optimizations (apart from the simple sites that say "well you just skip multiples of 2, 3 and 5" and then slam you with a 1000 line C file) would be awesome.
I don't think I would have issues with a 100% direct and literal port, but since after all this is for learning that would be utterly useless.
EDIT
After looking at the code in the original numpy version, it actually is pretty easy to implement and with some thinking not too hard to understand. This is the C++ version I came up with. I'm posting it here in full version to help further readers in case they need a pretty efficient primesieve that is not two million lines of code. This primesieve does all primes under 100000000 in about 415 ms on the same machine as above. That's a 3x speedup, better than I expected!
#include <cstdint>
#include <vector>
#include <boost/dynamic_bitset.hpp>
// http://vault.embedded.com/98/9802fe2.htm - integer square root
// The shifts below assume a 32-bit operand, hence the fixed-width type.
unsigned short isqrt(std::uint32_t a) {
std::uint32_t rem = 0;
std::uint32_t root = 0;
for (short i = 0; i < 16; i++) {
root <<= 1;
rem = ((rem << 2) + (a >> 30));
a <<= 2;
root++;
if (root <= rem) {
rem -= root;
root++;
} else root--;
}
return static_cast<unsigned short> (root >> 1);
}
// https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
// https://stackoverflow.com/questions/5293238/porting-optimized-sieve-of-eratosthenes-from-python-to-c/5293492
template <class T>
void primesbelow(T N, std::vector<T> &primes) {
T i, j, k, l, sievemax, sievemaxroot;
sievemax = N/3;
if ((N % 6) == 2) sievemax++;
sievemaxroot = isqrt(N)/3;
boost::dynamic_bitset<> sieve(sievemax);
sieve.set();
primes.push_back(2);
primes.push_back(3);
for (i = 1; i <= sievemaxroot; i++) {
if (sieve[i]) {
k = (3*i + 1) | 1;
l = (4*k-2*k*(i&1)) / 3;
for (j = k*k/3; j < sievemax; j += 2*k) {
sieve[j] = 0;
if (j+l < sievemax) sieve[j+l] = 0; // the second slice can run past the end
}
primes.push_back(k);
}
}
for (i = sievemaxroot + 1; i < sievemax; i++) {
if (sieve[i]) primes.push_back((3*i+1)|1);
}
}
I'll try to explain as much as I can. The sieve array has an unusual indexing scheme; it stores a bit for each number that is congruent to 1 or 5 mod 6. Thus, a number 6*k + 1 will be stored in position 2*k and k*6 + 5 will be stored in position 2*k + 1. The 3*i+1|1 operation is the inverse of that: it takes numbers of the form 2*n and converts them into 6*n + 1, and takes 2*n + 1 and converts it into 6*n + 5 (the +1|1 thing converts 0 to 1 and 3 to 5). The main loop iterates k through all numbers with that property, starting with 5 (when i is 1); i is the corresponding index into sieve for the number k. The first slice update to sieve then clears all bits in the sieve with indexes of the form k*k/3 + 2*m*k (for m a natural number); the corresponding numbers for those indexes start at k^2 and increase by 6*k at each step. The second slice update starts at index k*(k-2*(i&1)+4)/3 (number k * (k+4) for k congruent to 1 mod 6 and k * (k+2) otherwise) and similarly increases the number by 6*k at each step.
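The index scheme described above is easy to check in isolation. The two helpers below are a sketch with made-up names; number_to_index is simply the observation that integer division by 3 undoes the (3*i + 1) | 1 mapping for numbers congruent to 1 or 5 mod 6.

```cpp
#include <cassert>

// Sieve index -> the number it stands for (index >= 1):
// 1 -> 5, 2 -> 7, 3 -> 11, 4 -> 13, ...
long long index_to_number(long long i) {
    assert(i >= 1);
    return (3 * i + 1) | 1;
}

// Inverse mapping, valid only for numbers congruent to 1 or 5 mod 6.
long long number_to_index(long long n) {
    assert(n % 6 == 1 || n % 6 == 5);
    return n / 3;
}
```

This also shows why the first slice starts at index k*k/3: for k = 5 the number 25 sits at index 8, and index_to_number(8) is 25 again.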
Here's another attempt at an explanation: let candidates be the set of all numbers that are at least 5 and are congruent to either 1 or 5 mod 6. If you multiply two elements in that set, you get another element in the set. Let succ(k) for some k in candidates be the next element (in numerical order) in candidates that is larger than k. In that case, the inner loop of the sieve is basically (using normal indexing for sieve):
for k in candidates:
for (l = k; ; l += 6) sieve[k * l] = False
for (l = succ(k); ; l += 6) sieve[k * l] = False
Because of the limitations on which elements are stored in sieve, that is the same as:
for k in candidates:
for l in candidates where l >= k:
sieve[k * l] = False
which will remove all multiples of k in candidates (other than k itself) from the sieve at some point (either when the current k was used as l earlier or when it is used as k now).
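For reference, the two-loop form can be written as real C++ with plain indexing, where the flag for x lives at position x. This is an illustrative sketch and not the optimized port: it trades the compact one-third-size sieve for readability, and the name primes_below and the step variable are my own.

```cpp
#include <cassert>
#include <vector>

// Primes below n, crossing off only products k * l where both k and l are
// candidates (numbers >= 5 congruent to 1 or 5 mod 6) and l >= k.
std::vector<int> primes_below(int n) {
    assert(n >= 0);
    std::vector<int> primes;
    if (n > 2) primes.push_back(2);
    if (n > 3) primes.push_back(3);
    std::vector<bool> composite(n > 0 ? n : 1, false);
    // step alternates 2, 4, 2, 4, ... so k visits 5, 7, 11, 13, 17, ...
    for (long long k = 5, step = 2; k < n; k += step, step = 6 - step) {
        if (composite[k]) continue;  // already crossed off by a smaller prime
        primes.push_back(static_cast<int>(k));
        // l = k, k + 6, k + 12, ...
        for (long long l = k; k * l < n; l += 6) composite[k * l] = true;
        // l = succ(k), succ(k) + 6, ... where succ(k) = k + step
        for (long long l = k + step; k * l < n; l += 6) composite[k * l] = true;
    }
    return primes;
}
```

A quick sanity check is that primes_below(100) contains 25 primes and primes_below(1000) contains 168.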
Piggy-Backing onto Howard Hinnant's response, Howard, you don't have to test numbers in the set of all natural numbers not divisible by 2, 3 or 5 for primality, per se. You need simply multiply each number in the array (except 1, which self-eliminates) times itself and every subsequent number in the array. These overlapping products will give you all the non-primes in the array up to whatever point you extend the deterministic-multiplicative process. Thus the first non-prime in the array will be 7 squared, or 49. The 2nd, 7 times 11, or 77, etc. A full explanation here: http://www.primesdemystified.com
As an aside, you can "approximate" prime numbers. Call the approximate prime P. Here are a few formulas:
P = 2*k+1 // not divisible by 2
P = 6*k + {1, 5} // not divisible by 2, 3
P = 30*k + {1, 7, 11, 13, 17, 19, 23, 29} // not divisible by 2, 3, 5
The property of the set of numbers found by these formulas is that P may not be prime; however, all primes other than the excluded factors themselves (2, 3, 5) are in the set P. I.e. if you only test numbers in the set P for primality, you won't miss any larger prime.
You can reformulate these formulas to:
P = X*k + {-i, -j, -k, k, j, i}
if that is more convenient for you.
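As a small illustration of the 30*k formula, the membership test can be written as a switch on the residue. The function name is hypothetical; note that passing the test only means "possibly prime", and that 2, 3 and 5 themselves are not in the set and need to be handled separately.

```cpp
#include <cassert>

// True when n is not divisible by 2, 3 or 5, i.e. n = 30*k + r with
// r in {1, 7, 11, 13, 17, 19, 23, 29}.
bool is_wheel30_candidate(long long n) {
    assert(n >= 0);
    switch (n % 30) {
        case 1: case 7: case 11: case 13:
        case 17: case 19: case 23: case 29:
            return true;
        default:
            return false;
    }
}
```

For instance 97 and 49 are both candidates (49 is composite but not divisible by 2, 3 or 5), while 45 is rejected without any primality test.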
Here is some code that uses this technique with a formula for P not divisible by 2, 3, 5, 7.
This link may represent the extent to which this technique can be practically leveraged.

Calculating Binomial Coefficient (nCk) for large n & k

I just saw this question and have no idea how to solve it. can you please provide me with algorithms , C++ codes or ideas?
This is a very simple problem. Given the value of N and K, you need to tell us the value of the binomial coefficient C(N,K). You may rest assured that K <= N and the maximum value of N is 1,000,000,000,000,000. Since the value may be very large, you need to compute the result modulo 1009.
Input
The first line of the input contains the number of test cases T, at most 1000. Each of the next T lines consists of two space separated integers N and K, where 0 <= K <= N and 1 <= N <= 1,000,000,000,000,000.
Output
For each test case, print on a new line, the value of the binomial coefficient C(N,K) modulo 1009.
Example
Input:
3
3 1
5 2
10 3
Output:
3
10
120
Notice that 1009 is a prime.
Now you can use Lucas' Theorem.
Which states:
Let p be a prime.
If n = a1a2...ar when written in base p and
if k = b1b2...br when written in base p
(pad with zeroes if required)
Then
(n choose k) modulo p = (a1 choose b1) * (a2 choose b2) * ... * (ar choose br) modulo p.
i.e. remainder of n choose k when divided by p is same as the remainder of
the product (a1 choose b1) * .... * (ar choose br) when divided by p.
Note: if bi > ai then ai choose bi is 0.
Thus your problem is reduced to finding the product modulo 1009 of at most log N/log 1009 numbers (number of digits of N in base 1009) of the form a choose b where a < 1009 and b < 1009.
This should make it easier even when N is close to 10^15.
Note:
For N=10^15, N choose N/2 is more than
2^(100000000000000) which is way
beyond an unsigned long long.
Also, the algorithm suggested by
Lucas' theorem is O(log N) which is
exponentially faster than trying to
compute the binomial coefficient
directly (even if you did a mod 1009
to take care of the overflow issue).
Here is some code for Binomial I had written long back, all you need to do is to modify it to do the operations modulo 1009 (there might be bugs and not necessarily recommended coding style):
class Binomial
{
public:
Binomial(int Max)
{
max = Max+1;
table = new unsigned int * [max]();
for (int i=0; i < max; i++)
{
table[i] = new unsigned int[max]();
for (int j = 0; j < max; j++)
{
table[i][j] = 0;
}
}
}
~Binomial()
{
for (int i =0; i < max; i++)
{
delete[] table[i]; // allocated with new[], so release with delete[]
}
delete[] table;
}
unsigned int Choose(unsigned int n, unsigned int k);
private:
bool Contains(unsigned int n, unsigned int k);
int max;
unsigned int **table;
};
unsigned int Binomial::Choose(unsigned int n, unsigned int k)
{
if (n < k) return 0;
if (k == 0 || n==1 ) return 1;
if (n==2 && k==1) return 2;
if (n==2 && k==2) return 1;
if (n==k) return 1;
if (Contains(n,k))
{
return table[n][k];
}
table[n][k] = Choose(n-1,k) + Choose(n-1,k-1);
return table[n][k];
}
bool Binomial::Contains(unsigned int n, unsigned int k)
{
if (table[n][k] == 0)
{
return false;
}
return true;
}
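Putting Lucas' theorem together with a small table of binomials modulo 1009 could look roughly like the sketch below. It is not the Binomial class above but a separate, self-contained illustration; the names P, build_small_table and binomial_mod are made up for it. The table is Pascal's triangle reduced mod P, which needs about four megabytes.

```cpp
#include <cassert>
#include <vector>

const int P = 1009;  // the prime modulus from the problem statement

// Pascal's triangle modulo P for 0 <= k <= n < P; entries with k > n stay 0.
std::vector<std::vector<int>> build_small_table() {
    std::vector<std::vector<int>> c(P, std::vector<int>(P, 0));
    for (int n = 0; n < P; ++n) {
        c[n][0] = 1;
        for (int k = 1; k <= n; ++k) {
            c[n][k] = (c[n - 1][k - 1] + c[n - 1][k]) % P;
        }
    }
    return c;
}

// C(n, k) mod P via Lucas' theorem: multiply C(n_i, k_i) over the base-P
// digits n_i and k_i of n and k.
int binomial_mod(unsigned long long n, unsigned long long k) {
    static const std::vector<std::vector<int>> table = build_small_table();
    assert(k <= n);
    int result = 1;
    while (n > 0 || k > 0) {
        int nd = static_cast<int>(n % P);
        int kd = static_cast<int>(k % P);
        result = result * table[nd][kd] % P;  // factor is 0 when kd > nd
        n /= P;
        k /= P;
    }
    return result;
}
```

On the sample input this gives 3, 10 and 120, and the loop runs once per base-1009 digit, so at most a handful of iterations even for N near 10^15.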
Binomial coefficient is one factorial divided by two others, although the k! term on the bottom cancels in an obvious way.
Observe that if 1009, (including multiples of it), appears more times in the numerator than the denominator, then the answer mod 1009 is 0. It can't appear more times in the denominator than the numerator (since binomial coefficients are integers), hence the only cases where you have to do anything are when it appears the same number of times in both. Don't forget to count multiples of (1009)^2 as two, and so on.
After that, I think you're just mopping up small cases (meaning small numbers of values to multiply/divide), although I'm not sure without a few tests. On the plus side 1009 is prime, so arithmetic modulo 1009 takes place in a field, which means that after casting out multiples of 1009 from both top and bottom, you can do the rest of the multiplication and division mod 1009 in any order.
Where there are non-small cases left, they will still involve multiplying together long runs of consecutive integers. This can be simplified by knowing 1008! (mod 1009). It's -1 (1008 if you prefer), since 1 ... 1008 are the p-1 non-zero elements of the prime field over p. Therefore they consist of 1, -1, and then (p-3)/2 pairs of multiplicative inverses.
So for example consider the case of C((1009^3), 200).
Imagine that the number of 1009s are equal (don't know if they are, because I haven't coded a formula to find out), so that this is a case requiring work.
On the top we have 201 ... 1008, which we'll have to calculate or look up in a precomputed table, then 1009, then 1010 ... 2017, 2018, 2019 ... 3026, 3027, etc. The ... ranges are all -1, so we just need to know how many such ranges there are.
That leaves 1009, 2018, 3027, which once we've cancelled them with 1009's from the bottom will just be 1, 2, 3, ... 1008, 1010, ..., plus some multiples of 1009^2, which again we'll cancel and leave ourselves with consecutive integers to multiply.
We can do something very similar with the bottom to compute the product mod 1009 of "1 ... 1009^3 - 200 with all the powers of 1009 divided out". That leaves us with a division in a prime field. IIRC that's tricky in principle, but 1009 is a small enough number that we can manage 1000 of them (the upper limit on the number of test cases).
Of course with k=200, there's an enormous overlap which could be cancelled more directly. That's what I meant by small cases and non-small cases: I've treated it like a non-small case, when in fact we could get away with just "brute-forcing" this one, by calculating ((1009^3-199) * ... * 1009^3) / 200!
I don't think you want to calculate C(n,k) and then reduce mod 1009. The biggest one, C(1e15,5e14) will require something like 1e16 bits ~ 1000 terabytes
Moreover, executing the loop in snakile's answer 1e15 times seems like it might take a while.
What you might use is, if
n = n0 + n1*p + n2*p^2 ... + nd*p^d
m = m0 + m1*p + m2*p^2 ... + md*p^d
(where 0<=mi,ni < p)
then
C(n,m) = C(n0,m0) * C(n1,m1) *... * C(nd, md) mod p
see, eg http://www.cecm.sfu.ca/organics/papers/granville/paper/binomial/html/binomial.html
One way would be to use Pascal's triangle to build a table of all C(n,m) for 0<=m<=n<1009.
Pseudocode for calculating nCk:
result = 1
for i=1 to min{K,N-K}:
result *= N-i+1
result /= i
return result
Time Complexity: O(min{K,N-K})
The loop goes from i=1 to min{K,N-K} instead of from i=1 to K, and that's ok because
C(n,k) = C(n, n-k)
And you can calculate the thing even more efficiently if you use the GammaLn function.
nCk = exp(GammaLn(n+1)-GammaLn(k+1)-GammaLn(n-k+1))
The GammaLn function is the natural logarithm of the Gamma function. I know there's an efficient algorithm to calculate the GammaLn function but that algorithm isn't trivial at all.
The following code shows how to obtain all the binomial coefficients for a given size 'n'. You could easily modify it to stop at a given k in order to determine nCk. It is computationally very efficient, it's simple to code, and works for very large n and k.
binomial_coefficient = 1
output(binomial_coefficient)
col = 0
n = 5
do while col < n
binomial_coefficient = binomial_coefficient * (n + 1 - (col + 1)) / (col + 1)
output(binomial_coefficient)
col = col + 1
loop
The output of binomial coefficients is therefore:
1
1 * (5 + 1 - (0 + 1)) / (0 + 1) = 5
5 * (5 + 1 - (1 + 1)) / (1 + 1) = 10
10 * (5 + 1 - (2 + 1)) / (2 + 1) = 10
10 * (5 + 1 - (3 + 1)) / (3 + 1) = 5
5 * (5 + 1 - (4 + 1)) / (4 + 1) = 1
I had found the formula once upon a time on Wikipedia but for some reason it's no longer there :(
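A direct C++ rendering of the loop above might look like this. It is a sketch with an invented function name, and it assumes every intermediate product fits in unsigned long long, so it suits moderate n rather than the 10^15 of the original problem.

```cpp
#include <cassert>
#include <vector>

// All binomial coefficients C(n, 0), C(n, 1), ..., C(n, n).
std::vector<unsigned long long> binomial_row(unsigned int n) {
    std::vector<unsigned long long> row;
    unsigned long long c = 1;
    row.push_back(c);
    for (unsigned int col = 0; col < n; ++col) {
        // Multiply first, then divide: C(n, col) * (n - col) is always
        // divisible by col + 1, so the integer division is exact.
        c = c * (n - col) / (col + 1);
        row.push_back(c);
    }
    assert(row.size() == n + 1u);
    return row;
}
```

For n = 5 this produces 1, 5, 10, 10, 5, 1.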