Range queries on a binary string?

Binary-Decimal
We are given a binary string S of length n, in which each character is either '1' or '0'.
And we are asked to perform several queries on the string.
In each query we are given integers l and r,
and we have to output the value of the substring S[l..r] in decimal representation.
Sample testcase:
Input:
1011 (string S)
5 (number of queries)
1 1 (l, r)
2 2
1 2
2 4
1 4
Output:
1 (1 * 2^0 == 1)
0
2
3 (0 * 2^2 + 1 * 2^1 + 1 * 2^0)
11 (1 * 2^3 + 0 * 2^2 + 1 * 2^1 + 1 * 2^0 = 11)
Constraints
1 < N < 10^5
1 < Q < 10^5
As the number can be very large, we are required to print it modulo 10^9 + 7.
Approach
So basically we need to convert the binary representation in substring S[l..r] to its decimal value.
I pre-computed the results of S[i...n-1] for all i:[0, n-1] in array B.
So now B[i] represents the decimal number representation of sub-string S[i..n-1].
vector<int> pow(1e5, 1);
for(int i = 1; i < 1e5; i++) {
    pow[i] = (pow[i - 1] * 2) % mod;
}
string s;
getline(cin, s);
vector<int> B(n, 0);
int prev = 0;
for(int i = 0; i < n; i++) {
    B[(n - 1) - i] = (prev + (s[(n - 1) - i] == '1' ? pow[i] : 0)) % mod;
    prev = B[(n - 1) - i];
}
while(q--) {
    int l, r;
    cin >> l >> r;
    cout << ((B[l] - (r + 1 < n ? B[r + 1] : 0) + mod) % mod) / pow[n - (r + 1)]<< "\n";
}
return 0;
With the above approach only the sample test case passes; all other cases give wrong answers (WA).
I even tried using a segment tree for this problem, but that is also not working.
What is the correct approach to solve this problem?

Define V[k] to be the value of the digits of S starting with the kth one.
Then the value of the substring S[l..r] = (V[l] - V[r+1]) / 2^(n - r - 1). (Something like that, I may have an off by one error. Play with small examples.)
Now the useful fact about 10^9 + 7 is that it is a prime. (The first 10 digit prime.) Which means that dividing by 2 is the same as multiplying by 2^(10^9 + 5). Which is a constant that you can figure out with repeated squaring. And raising that constant to a high power can be done very efficiently using repeated squaring.
With this you can create a lookup table for V, and then do your queries in time O(log(n)).
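A minimal C++ sketch of this idea, assuming 0-based inclusive indices (the sample input is 1-based, so subtract 1 from l and r before querying). The names power_mod and SuffixValues are made up for illustration. As a small variation, it precomputes the inverse powers of 2 once, so each query needs no exponentiation at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::int64_t MOD = 1000000007;

// Computes base^exp mod MOD by repeated squaring.
std::int64_t power_mod(std::int64_t base, std::int64_t exp) {
    std::int64_t result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// Hypothetical helper: value of s[l..r] mod MOD for 0-based l <= r.
class SuffixValues {
public:
    explicit SuffixValues(const std::string& s)
        : n_(s.size()), v_(s.size() + 1, 0), inv_pow2_(s.size() + 1, 1) {
        // v_[k] = value of s[k..n-1] mod MOD, with v_[n] = 0.
        std::int64_t pow2 = 1;
        for (std::size_t i = n_; i-- > 0;) {
            v_[i] = (v_[i + 1] + (s[i] == '1' ? pow2 : 0)) % MOD;
            pow2 = pow2 * 2 % MOD;
        }
        // inv_pow2_[k] = 2^(-k) mod MOD; 2^(MOD-2) is the inverse of 2 (Fermat).
        const std::int64_t inv2 = power_mod(2, MOD - 2);
        for (std::size_t k = 1; k <= n_; ++k) {
            inv_pow2_[k] = inv_pow2_[k - 1] * inv2 % MOD;
        }
    }

    std::int64_t query(std::size_t l, std::size_t r) const {
        // (V[l] - V[r+1]) / 2^(n - r - 1), with the division done as a
        // multiplication by the modular inverse.
        const std::int64_t diff = (v_[l] - v_[r + 1] + MOD) % MOD;
        return diff * inv_pow2_[n_ - 1 - r] % MOD;
    }

private:
    std::size_t n_;
    std::vector<std::int64_t> v_;
    std::vector<std::int64_t> inv_pow2_;
};
```

For the sample string 1011, SuffixValues("1011").query(1, 3) corresponds to the query "2 4" and yields 3.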

This seems the same as regular sum range-queries, except (1) we need to store the partial sums mod 10^9 + 7, (2) during retrieval, we need to "shift" the relevant sections of the full sum by the length of the sections on their right. To "shift" in this case would mean multiplying by 2^(length_of_suffix) mod 10^9 + 7. And of course sum the sections mod 10^9 + 7.
But btilly's answer seems much simpler :)
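One way to read the prefix-sum idea is sketched below in C++. This is an assumption about what the answer intends, not a quote of it: store the value of every prefix mod 10^9 + 7, and at query time subtract the left prefix after shifting it by the length of the queried section. No modular division is needed. The name PrefixValues and the 0-based inclusive indexing are choices made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::int64_t kMod = 1000000007;

// Hypothetical helper: value of s[l..r] mod kMod for 0-based l <= r.
class PrefixValues {
public:
    explicit PrefixValues(const std::string& s)
        : prefix_(s.size() + 1, 0), pow2_(s.size() + 1, 1) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            // prefix_[i + 1] = value of s[0..i] mod kMod.
            prefix_[i + 1] = (prefix_[i] * 2 + (s[i] == '1' ? 1 : 0)) % kMod;
            pow2_[i + 1] = pow2_[i] * 2 % kMod;
        }
    }

    std::int64_t query(std::size_t l, std::size_t r) const {
        // value(s[0..r]) = value(s[0..l-1]) * 2^(r-l+1) + value(s[l..r])
        const std::size_t len = r - l + 1;
        const std::int64_t shifted = prefix_[l] * pow2_[len] % kMod;
        return (prefix_[r + 1] - shifted + kMod) % kMod;
    }

private:
    std::vector<std::int64_t> prefix_;
    std::vector<std::int64_t> pow2_;
};
```

Each query is O(1) after O(n) preprocessing.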

Related

Find minimum number of digits required to make a given number

We have to find the minimum number of digits required to make a given number, for example: 14 => 95 (9 + 5 = 14) is two digits which is the minimum to form 14.
int moves(int n) {
    int m = 0; // Minimum count
    while (n-9 >= 0) { // To place maximum number of 9's
        n -= 9;
        m++;
    }
    if (n == 0) { // If only nines made up the number
        return m;
    }
    else {
        m++;
        return m;
    }
}
I am getting a TLE (runtime time limit exceeded) by an online judge. How can I improve it or is there a better approach?

Your code starts by looking at how many times 9 fits into that number. This can be done way more easily:
int m = n/9;
This suffices since we do an integer division, in which the remainder is thrown away. Note that if n were a float or another floating-point type, this would not work.
The question left is if it is divisible by 9 or not. If not, we have one additional digit. This can be done by the modulo operator (made it verbose for ease of understanding):
bool divisible_by_nine = (n % 9 == 0);
Assuming that you might not know the modulo operator, it returns the remainder of an integer division, 47 % 9 = 2 since 47 / 9 = 5 remainder 2.
Without it, you would go with
int remainder = n - 9*m;
bool divisible = (remainder == 0);
Combined:
int required_digits(int number)
{
    bool divisible = (number % 9 == 0);
    return number/9 + (divisible ? 0 : 1);
}
Or in a single line, depending on how verbose you want it to be:
int required_digits(int number)
{
    return number/9 + (number % 9 == 0 ? 0 : 1);
}
Since there isn't any loop, this is in Θ(1) and thus should work in your required time limit.
(Technically, the processor might as well handle the division somewhat like you did internally, but it is very efficient at that. To be absolutely correct, I'd have to add "assuming that division is a constant time operation".)

Your solution works fine. You can try the shorter:
return (n%9==0)? n/9 : n/9 +1 ;
Shorter, but less easy to read...
Or a compromise:
if (n%9==0) // n can be divided by 9
    return n/9;
else
    return n/9+1;

Explanation
We know that every number a can be represented as
(a_n * 10 ^ n) + ... + (a_2 * 10 ^ 2) + (a_1 * 10) + (a_0)
where a_k are digits
and 10^n = 11...11 * 9 + 1 (n digits 1).
Meaning that the number 10^n can be represented as a sum of 11...11 nines plus a single 1, i.e. as the sum of 11...11 + 1 digits.
Now we can write a as (a_n * 11..11 * 9 + a_n) + ...
After grouping by 9 (help, I don't know English term for this. Factoring?)
(a_n * 11..11 + a_n-1 * 11..11 + ... a_1) * 9 + (a_n + a_n-1 + ... + a_1 + a_0)
Which I'll write as b_9 * 9 + b_1.
This means that number a can be represented as the sum of b_9 digits 9 + how much is needed for b_1 (this is recursive by the way)
To recapitulate:
Let's call the function f.
If the number is a single digit (-10 < a < 10), the result is 1.
Otherwise, two counters are needed, c1 and c2.
Iterate over the digits:
For every ith digit, multiply it by the i-digit number 11..11 and add the result to c1.
Add the ith digit to c2.
The result is c1 + f(c2).
And for practice, implement this in a non-recursive way.

As you guess, you need to iterate on a lower number to a bigger one, like 111119 is fine, but we want the lowest one... Your answer is wrong. The lowest would be 59!
You can brute force and it will work, but for a bigger number you will struggle, so you need to guess first: How many minimum digits do I need to find my solution?
For instance, if you want to find 42, just add as many 9s as you need to reach or exceed the target!
9 + 9 + 9 + 9 + 9 = 45. Once the sum reaches or exceeds the target, you know that the answer is at most 99999.
Now how much do I need to decrease the value to get the correct answer? By 3, as expected (45 - 42).
So 99996, 99969, etc. will all be valid! But you want the lowest, so you have to decrease the most significant digit (the leftmost one, of course!).
The answer would be 69999 (6 + 9 + 9 + 9 + 9 = 42)!
int n = 14;
int r = 0;
for (int i = 1; i < 10 /*if you play with long or long long*/; i++) // start at 1 digit
    if (i * 9 >= n)
    {
        for (int j = 0; j < i; j++)
            r = r * 10 + 9;
        while (is_correct(r, n) == false)
        {
            // Code it yourself!!
        }
        return (r);
    }
Here is_correct returns true or false. You could instead make it return how much r still needs to be decreased! It's not the fastest way possible, and there is always a faster way, with a binary shift, but this algorithm would work just fine!
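The reasoning above (fill with 9s, then lower the leftmost digit) has a closed form: the smallest number with digit sum n is the digit n % 9 (when it is non-zero) followed by n / 9 nines. A small C++ sketch, where smallest_with_digit_sum is a hypothetical name and the result is returned as a string so that large n does not overflow:

```cpp
#include <cassert>
#include <string>

// Returns the smallest positive integer whose digits sum to n, as a string.
// Assumption for this sketch: n <= 0 is mapped to "0".
std::string smallest_with_digit_sum(int n) {
    if (n <= 0) return "0";
    std::string digits;
    if (n % 9 != 0) {
        digits.push_back(static_cast<char>('0' + n % 9)); // leftmost digit
    }
    digits.append(static_cast<std::string::size_type>(n / 9), '9');
    return digits;
}
```

The length of the returned string is the digit count the original question asks for, so it agrees with n/9 + (n % 9 == 0 ? 0 : 1).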

Divide array into smaller consecutive parts such that NEO value is maximal

At this year's Bubble Cup (finished) there was the problem NEO (which I couldn't solve), which asks:
Given an array with n integer elements, we divide it into several parts (possibly just 1), each part being a run of consecutive elements. The NEO value in that case is computed as the sum of the values of the parts. The value of a part is the sum of all elements in that part multiplied by its length.
Example: We have array: [ 2 3 -2 1 ]. If we divide it like: [2 3] [-2 1]. Then NEO = (2 + 3) * 2 + (-2 + 1) * 2 = 10 - 2 = 8.
The number of elements in the array is smaller than 10^5 and the numbers are integers between -10^6 and 10^6.
I've tried something like divide and conquer: keep splitting the array into two parts if that increases the maximal NEO value, otherwise return the NEO of the whole array. Unfortunately the algorithm has worst-case O(N^2) complexity (my implementation is below), so I'm wondering whether there is a better solution.
EDIT: My algorithm (greedy) doesn't work. Taking for example [1,2,-6,2,1], my algorithm returns the whole array, while the maximal NEO value is obtained by taking the parts [1,2],[-6],[2,1], which gives a NEO value of (1+2)*2+(-6)+(1+2)*2=6
#include <iostream>
int maxInterval(long long int suma[],int first,int N)
{
    long long int max = -1000000000000000000LL;
    long long int curr;
    if(first==N) return 0;
    int k;
    for(int i=first;i<N;i++)
    {
        if(first>0) curr = (suma[i]-suma[first-1])*(i-first+1)+(suma[N-1]-suma[i])*(N-1-i); // Split the array into elements from [first..i] and [i+1..N-1] store the corresponding NEO value
        else curr = suma[i]*(i-first+1)+(suma[N-1]-suma[i])*(N-1-i); // Same except that here first = 0 so suma[first-1] doesn't exist
        if(curr > max) max = curr,k=i; // find the maximal NEO value for splitting into two parts
    }
    if(k==N-1) return max; // If the max when we take the whole array then return the NEO value of the whole array
    else
    {
        return maxInterval(suma,first,k+1)+maxInterval(suma,k+1,N); // Split the 2 parts further if needed and return their sum
    }
}
int main() {
    int T;
    std::cin >> T;
    for(int j=0;j<T;j++) // Iterate over all the test cases
    {
        int N;
        long long int NEO[100010]; // Values, could be long int but just to be safe
        long long int suma[100010]; // sum[i] = sum of NEO values from NEO[0] to NEO[i]
        long long int sum=0;
        int k;
        std::cin >> N;
        for(int i=0;i<N;i++)
        {
            std::cin >> NEO[i];
            sum+=NEO[i];
            suma[i] = sum;
        }
        std::cout << maxInterval(suma,0,N) << std::endl;
    }
    return 0;
}

This is not a complete solution but should provide some helpful direction.
1. Combining two groups that each have a positive sum (or one of the sums is non-negative) would always yield a bigger NEO than leaving them separate:
m * a + n * b < (m + n) * (a + b) where a, b > 0 (or a > 0, b >= 0); m and n are subarray lengths
2. Combining a group with a negative sum with an entire group of non-negative numbers always yields a greater NEO than combining it with only part of the non-negative group. But excluding the group with the negative sum could yield an even greater NEO:
[1, 1, 1, 1] [-2] => m * a + 1 * (-b)
Now, imagine we gradually move the dividing line to the left, increasing the sum b is combined with. While the expression on the right is negative, the NEO for the left group keeps decreasing. But if the expression on the right gets positive, relying on our first assertion (see 1.), combining the two groups would always be greater than not.
3. Combining negative numbers alone in sequence will always yield a smaller NEO than leaving them separate:
-a - b - c ... = -1 * (a + b + c ...)
l * (-a - b - c ...) = -l * (a + b + c ...)
-l * (a + b + c ...) < -1 * (a + b + c ...) where l > 1; a, b, c ... > 0
O(n^2) time, O(n) space JavaScript code:
function f(A){
  A.unshift(0);
  let negatives = [];
  let prefixes = new Array(A.length).fill(0);
  let m = new Array(A.length).fill(0);
  for (let i=1; i<A.length; i++){
    if (A[i] < 0)
      negatives.push(i);
    prefixes[i] = A[i] + prefixes[i - 1];
    m[i] = i * (A[i] + prefixes[i - 1]);
    for (let j=negatives.length-1; j>=0; j--){
      let negative = prefixes[negatives[j]] - prefixes[negatives[j] - 1];
      let prefix = (i - negatives[j]) * (prefixes[i] - prefixes[negatives[j]]);
      m[i] = Math.max(m[i], prefix + negative + m[negatives[j] - 1]);
    }
  }
  return m[m.length - 1];
}
console.log(f([1, 2, -5, 2, 1, 3, -4, 1, 2]));
console.log(f([1, 2, -4, 1]));
console.log(f([2, 3, -2, 1]));
console.log(f([-2, -3, -2, -1]));
Update
This blog shows that we can transform the dp queries from
dp_i = sum_i*i + max(for j < i) of ((dp_j + sum_j*j) + (-j*sum_i) + (-i*sum_j))
to
dp_i = sum_i*i + max(for j < i) of (dp_j + sum_j*j, -j, -sum_j) ⋅ (1, sum_i, i)
which means we could then look at each iteration for an already seen vector that would generate the largest dot product with our current information. The math alluded to involves convex hull and farthest point query, which are beyond my reach to implement at this point, but I will make a study of them.
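For reference, the recurrence before the dot-product transformation can be written directly as a quadratic DP: dp_i = max over j < i of dp_j + (sum_i - sum_j) * (i - j). The C++ sketch below implements only that O(n^2) form, not the convex-hull speed-up, so it is useful mainly for checking small cases. The name max_neo_quadratic is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// dp[i] = best NEO value obtainable from the first i elements.
// O(n^2) time, O(n) space; a reference sketch, too slow for n near 10^5.
std::int64_t max_neo_quadratic(const std::vector<std::int64_t>& a) {
    const std::size_t n = a.size();
    std::vector<std::int64_t> prefix(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i) prefix[i + 1] = prefix[i] + a[i];

    std::vector<std::int64_t> dp(n + 1, 0);
    for (std::size_t i = 1; i <= n; ++i) {
        std::int64_t best = std::numeric_limits<std::int64_t>::min();
        for (std::size_t j = 0; j < i; ++j) {
            // The last part covers elements j..i-1 (0-based), length i - j.
            const std::int64_t part =
                (prefix[i] - prefix[j]) * static_cast<std::int64_t>(i - j);
            best = std::max(best, dp[j] + part);
        }
        dp[i] = best;
    }
    return dp[n];
}
```

On the question's counterexample [1, 2, -6, 2, 1] this returns 6, matching the split [1,2],[-6],[2,1].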

Find the smallest integer whose sum of squares of digits add to the given number

Example:
Input: | Output:
5 -> 12 (1^2 + 2^2 = 5)
500 -> 18888999 (1^2 + 8^2 + 8^2 + 8^2 + 8^2 + 9^2 + 9^2 + 9^2 = 500)
I have written a pretty simple brute-force solution, but it has big performance problems:
#include <iostream>
using namespace std;
int main() {
    int n;
    bool found = true;
    unsigned long int sum = 0;
    cin >> n;
    int i = 0;
    while (found) {
        ++i;
        if (n == 0) { //The code below doesn't work if n = 0, so we assign value to sum right away (in case n = 0)
            sum = 0;
            break;
        }
        int j = i;
        while (j != 0) { //After each iteration, j's last digit gets stripped away (j /= 10), so we want to stop right when j becomes 0
            sum += (j % 10) * (j % 10); //After each iteration, sum gets increased by *(last digit of j)^2*. (j % 10) gets the last digit of j
            j /= 10;
        }
        if (sum == n) { //If we meet our problem's requirements, so that sum of j's each digit squared is equal to the given number n, loop breaks and we get our result
            break;
        }
        sum = 0; //Otherwise, sum gets nullified and the loops starts over
    }
    cout << i;
    return 0;
}
I am looking for a fast solution to the problem.

Use dynamic programming. If we knew the first digit of the optimal solution, then the rest would be an optimal solution for the remainder of the sum. As a result, we can guess the first digit and use a cached computation for smaller targets to get the optimum.
def digitsum(n):
    best = [0]
    for i in range(1, n+1):
        best.append(min(int(str(d) + str(best[i - d**2]).strip('0'))
                        for d in range(1, 10)
                        if i >= d**2))
    return best[n]

Let's try and explain David's solution. I believe his assumption is that given an optimal solution, abcd..., the optimal solution for n - a^2 would be bcd..., therefore if we compute all the solutions from 1 to n, we can rely on previous solutions for numbers smaller than n as we try different subtractions.
So how can we interpret David's code?
(1) Place the solutions for the numbers 1 through n, in order, in the table best:
for i in range(1, n+1):
best.append(...
(2) the solution for the current query, i, is the minimum in an array of choices for different digits, d, between 1 and 9 if subtracting d^2 from i is feasible.
The minimum of the conversion to integers...
min(int(
...of the string, d, concatenated with the string of the solution for i - d^2 previously recorded in the table (removing the concatenation of the solution for zero):
str(d) + str(best[i - d**2]).strip('0')
Let's modify the last line of David's code, to see an example of how the table works:
def digitsum(n):
    best = [0]
    for i in range(1, n+1):
        best.append(min(int(str(d) + str(best[i - d**2]).strip('0'))
                        for d in range(1, 10)
                        if i >= d**2))
    return best # original line was 'return best[n]'
We call, digitsum(10):
=> [0, 1, 11, 111, 2, 12, 112, 1112, 22, 3, 13]
When we get to i = 5, our choices for d are 1 and 2 so the array of choices is:
min([ int(str(1) + str(best[5 - 1])), int(str(2) + str(best[5 - 4])) ])
=> min([ int( '1' + '2' ), int( '2' + '1' ) ])
And so on and so forth.
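Since the question is in C++, here is a sketch of the same table-building idea in that language. It is an adaptation, not David's code: the table stores digit strings instead of integers so that long answers cannot overflow, and candidates are compared by length first and then lexicographically, which matches numeric order for strings without leading zeros. The function name is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Smallest positive integer whose squared digits sum to n, as a string.
// Assumption for this sketch: n == 0 is answered with "0".
std::string smallest_with_square_digit_sum(int n) {
    if (n <= 0) return "0";
    // best[i] = smallest digit string for target i; best[0] is empty.
    std::vector<std::string> best(static_cast<std::size_t>(n) + 1);
    for (int i = 1; i <= n; ++i) {
        std::string choice;
        for (int d = 1; d <= 9 && d * d <= i; ++d) {
            // Guess d as the first digit and reuse the solution for i - d^2.
            std::string cand(1, static_cast<char>('0' + d));
            cand += best[i - d * d];
            const bool better =
                choice.empty() || cand.size() < choice.size() ||
                (cand.size() == choice.size() && cand < choice);
            if (better) choice = cand;
        }
        best[i] = choice; // never empty: d = 1 is always feasible
    }
    return best[n];
}
```

For the examples in the question, the targets 5 and 500 give "12" and "18888999".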

So this is in fact a well known problem in disguise. The minimum coin change problem in which you are given a sum and requested to pay with minimum number of coins. Here instead of ones, nickels, dimes and quarters we have 81, 64, 49, 36, ... , 1 cents.
Apparently this is a typical example to encourage dynamic programming. In dynamic programming, unlike the recursive approach in which you go from top to bottom, you go from the bottom up and "memoize" the results that will be required later. Thus... much faster..!
So ok here is my approach in JS. It's probably doing a very similar job to David's method.
function getMinNumber(n){
  var sls = Array(n).fill(),
      sct = [], max;
  sls.map((_,i,a) => { max = Math.min(9,~~Math.sqrt(i+1)),
                       sct = [];
                       while (max) sct.push(a[i-max*max] ? a[i-max*max].concat(max--)
                                                         : [max--]);
                       a[i] = sct.reduce((p,c) => p.length < c.length ? p : c);
                     });
  return sls[sls.length-1].reverse().join("");
}
console.log(getMinNumber(500));
What we are doing is generating, from the bottom up, a lookup array called sls. This is where the memoizing happens. Then, going from 1 to n, we map the best result among several choices. For example, if we are to look for 10's partitions, we start with the integer part of 10's square root, which is 3, and keep it in the max variable. So with 3 being one of the numbers, the other should be 10-3*3 = 1. Then we look up the previously solved 1, which is in fact [1] at sls[0], and concat 3 to sls[0]. And the result is [3,1]. Once we finish with 3, we repeat the same job with one smaller each time, down to 1. So after 3 we check for 2 (result is [2,2,1,1]) and then for 1 (result is [1,1,1,1,1,1,1,1,1,1]), and compare the lengths of the results of 3, 2 and 1 for the shortest, which is [3,1], and store it at sls[9] (a.k.a. a[i]), which is the place for 10 in our lookup array.

(Edit) This answer is not correct. The greedy approach does not work for this problem -- sorry.
I'll give my solution in a language agnostic fashion, i.e. the algorithm.
I haven't tested but I believe this should do the trick, and the complexity is proportional to the number of digits in the output:
digitSquared(n) {
    % compute the occurrences of each digit
    numberOfDigits = [0 0 0 0 0 0 0 0 0]
    for m from 9 to 1 {
        numberOfDigits[m] = n / (m*m);
        n = n % (m*m);
        if (n==0)
            exit loop;
    }
    % assemble the final output
    output = 0
    powerOfTen = 0
    for m from 9 to 1 {
        for i from 0 to numberOfDigits[m] {
            output = output + m*10^powerOfTen
            powerOfTen = powerOfTen + 1
        }
    }
}

Finding the smallest possible number which cannot be represented as sum of 1,2 or other numbers in the sequence

I am a newbie in C++ and need logical help in the following task.
Given a sequence of n positive integers (n < 10^6; each given integer is less than 10^6), write a program to find the smallest positive integer, which cannot be expressed as a sum of 1, 2, or more items of the given sequence (i.e. each item could be taken 0 or 1 times). Examples: input: 2 3 4, output: 1; input: 1 2 6, output: 4
I cannot seem to construct the logic out of it, why the last output is 4 and how to implement it in C++, any help is greatly appreciated.
Here is my code so far:
#include<iostream>
using namespace std;
const int SIZE = 3;
int main()
{
    //Lowest integer by default
    int IntLowest = 1;
    int x = 0;
    //Our sequence numbers
    int seq;
    int sum = 0;
    int buffer[SIZE];
    //Loop through array inputting sequence numbers
    for (int i = 0; i < SIZE; i++)
    {
        cout << "Input sequence number: ";
        cin >> seq;
        buffer[i] = seq;
        sum += buffer[i];
    }
    int UpperBound = sum + 1;
    int a = buffer[x] + buffer[x + 1];
    int b = buffer[x] + buffer[x + 2];
    int c = buffer[x + 1] + buffer[x + 2];
    int d = buffer[x] + buffer[x + 1] + buffer[x + 2];
    for (int y = IntLowest - 1; y < UpperBound; y++)
    {
        //How should I proceed from here?
    }
    return 0;
}

What Voreno's answer suggests is in fact solving the 0-1 knapsack problem (http://en.wikipedia.org/wiki/Knapsack_problem#0.2F1_Knapsack_Problem). If you follow the link you can read how it can be done without constructing all subsets of the initial set (there are too many of them, 2^n). And it would work if the constraints were a bit smaller, like 10^3.
But with n = 10^6 it still requires too much time and space. And there is no need to solve the knapsack problem: we just need to find the first number we can't get.
The better solution would be to sort the numbers and then iterate through them once, finding for each prefix of your array a number x, such that with that prefix you can get all numbers in the interval [1..x]. The minimal number that we cannot get at this point is x + 1. When you consider the next number a[i] you have two options:
If a[i] <= x + 1, then you can get all numbers up to x + a[i].
If a[i] > x + 1, then you cannot get x + 1 and you have your answer.
Example:
you are given numbers 1, 4, 12, 2, 3.
You sort them (and get 1, 2, 3, 4, 12), start with x = 0, consider each element and update x the following way:
1 <= x + 1, so x = 0 + 1 = 1.
2 <= x + 1, so x = 1 + 2 = 3.
3 <= x + 1, so x = 3 + 3 = 6.
4 <= x + 1, so x = 6 + 4 = 10.
12 > x + 1, so we have found the answer and it is x + 1 = 11.
(Edit: fixed off-by-one error, added example.)
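The sort-and-scan procedure above could be sketched in C++ as follows. The function name smallest_unreachable_sum is made up for illustration, and the input is assumed to contain only positive integers, as the problem states. A 64-bit accumulator is used because up to 10^6 values below 10^6 can sum to about 10^12.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the smallest positive integer that is not a sum of a subset of
// `values` (each item used at most once). Takes the vector by value so the
// caller's data is not reordered.
std::int64_t smallest_unreachable_sum(std::vector<std::int64_t> values) {
    std::sort(values.begin(), values.end());
    std::int64_t reachable = 0; // every sum in [1..reachable] can be formed
    for (std::int64_t v : values) {
        if (v > reachable + 1) break; // gap: reachable + 1 cannot be formed
        reachable += v;               // now every sum up to reachable + v works
    }
    return reachable + 1;
}
```

With the numbers from the example (1, 4, 12, 2, 3) this returns 11; the running time is dominated by the O(n log n) sort.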

I think this can be done in O(n) time and O(log2(MAXINT)) memory, assuming an O(1) implementation of BSR (index of the highest set bit, i.e. floor(log2(x))) is used.
Algorithm:
1. Create an array of log2(MAXINT) buckets (20 in the case of 10^6). Each bucket contains the sum and min of its values (init: min = 2^(i+1)-1, sum = 0). (Lazy init may be used for small n.)
2. Make one pass over the input, storing each value in buckets[bsr(x)]:
for (x : buffer) // iterate input
    buckets[bsr(x)].min = min(buckets[bsr(x)].min, x)
    buckets[bsr(x)].sum += x
3. Iterate over the buckets, maintaining unreachable:
int unreachable = 1 // 0 is always reachable
for(b : buckets)
    if (unreachable >= b.min)
        unreachable += b.sum
    else
        break
return unreachable
This works because, assuming we are at bucket i, there are two cases to consider:
unreachable >= b.min is true: because this bucket contains values in the range [2^i...2^(i+1)-1], this implies that 2^i <= b.min. In turn, b.min <= unreachable. Therefore unreachable + b.min >= 2^(i+1). This means that all values in the bucket may be added (after adding b.min, all the other values in the bucket are smaller than the new unreachable), i.e. unreachable += b.sum.
unreachable >= b.min is false: this means that b.min (the smallest number in the remaining sequence) is greater than unreachable. Thus we need to return unreachable.
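A C++ sketch of the bucket algorithm above. The names are hypothetical, the input is assumed to hold positive values below 2^32, and highest_set_bit is a simple portable loop standing in for a hardware BSR instruction, so this version is not strictly O(1) per element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Index of the highest set bit of x (x > 0), i.e. floor(log2(x)).
int highest_set_bit(std::uint64_t x) {
    int index = -1;
    while (x != 0) {
        x >>= 1;
        ++index;
    }
    return index;
}

std::uint64_t smallest_unreachable_bucketed(const std::vector<std::uint64_t>& values) {
    struct Bucket {
        std::uint64_t min;
        std::uint64_t sum;
    };
    constexpr int kBuckets = 32; // assumption: every value is below 2^32
    std::vector<Bucket> buckets(kBuckets);
    for (int i = 0; i < kBuckets; ++i) {
        // Empty bucket i starts with min = 2^(i+1) - 1 and sum = 0.
        buckets[i] = Bucket{(std::uint64_t{1} << (i + 1)) - 1, 0};
    }
    for (std::uint64_t x : values) {
        Bucket& b = buckets[highest_set_bit(x)];
        b.min = std::min(b.min, x);
        b.sum += x;
    }
    std::uint64_t unreachable = 1; // 0 is always reachable
    for (const Bucket& b : buckets) {
        if (unreachable >= b.min) {
            unreachable += b.sum;
        } else {
            break;
        }
    }
    return unreachable;
}
```

No sorting is needed because the buckets are already visited in increasing order of magnitude.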

The output of the second input is 4 because that is the smallest positive number that cannot be expressed as a sum of 1,2 or 6 if you can take each item only 0 or 1 times. I hope this can help you understand more:
You have 3 items in that list: 1,2,6
Starting from the smallest positive integer, you start checking if that integer can be the result of the sum of 1 or more numbers of the given sequence.
1 = 1+0+0
2 = 0+2+0
3 = 1+2+0
4 cannot be expressed as the sum of any selection of the items in the list (1,2,6). Thus 4 is the smallest positive integer which cannot be expressed as a sum of the items of that given sequence.

The last output is 4 because:
1 = 1
2 = 2
1 + 2 = 3
1 + 6 = 7
2 + 6 = 8
1 + 2 + 6 = 9
Therefore, the lowest integer that cannot be represented by any combination of your inputs (1, 2, 6) is 4.
What the question is asking:
Part 1. Find the largest possible integer that can be represented by your input numbers (ie. the sum of all the numbers you are given), that gives the upper bound
UpperBound = sum(all_your_inputs) + 1
Part 2. Find all the integers you can get, by combining the different integers you are given. Ie if you are given a, b and c as integers, find:
a + b, a + c, b + c, and a + b + c
Part 2) + the list of integers, gives you all the integers you can get using your numbers.
cycle for each integer from 1 to UpperBound
for i = 1 to UpperBound
    if i not = a number in the list from point 2)
        i = your smallest integer
        break
This is a clumsy way of doing it, but I'm sure that with some maths it's possible to find a better way?
EDIT: Improved solution
//sort your input numbers from smallest to largest
input_numbers = sort(input_numbers)
//create a list of integers that have been tried numbers
tried_ints = //empty list
for each input in input_numbers
    //build combinations of sums of this input and any of the previous inputs
    //add the combinations to tried_ints, if not tried before
    for 1 to input
        //check whether there is a gap in tried_ints
        if there_is_gap
            //stop the program, return the smallest integer
            //the first gap number is the smallest integer

Calculating Binomial Coefficient (nCk) for large n & k

I just saw this question and have no idea how to solve it. Can you please provide me with algorithms, C++ code, or ideas?
This is a very simple problem. Given the value of N and K, you need to tell us the value of the binomial coefficient C(N,K). You may rest assured that K <= N and the maximum value of N is 1,000,000,000,000,000. Since the value may be very large, you need to compute the result modulo 1009.
Input
The first line of the input contains the number of test cases T, at most 1000. Each of the next T lines consists of two space separated integers N and K, where 0 <= K <= N and 1 <= N <= 1,000,000,000,000,000.
Output
For each test case, print on a new line, the value of the binomial coefficient C(N,K) modulo 1009.
Example
Input:
3
3 1
5 2
10 3
Output:
3
10
120

Notice that 1009 is a prime.
Now you can use Lucas' Theorem.
Which states:
Let p be a prime.
If n = a1a2...ar when written in base p and
if k = b1b2...br when written in base p
(pad with zeroes if required)
Then
(n choose k) modulo p = (a1 choose b1) * (a2 choose b2) * ... * (ar choose br) modulo p.
i.e. remainder of n choose k when divided by p is same as the remainder of
the product (a1 choose b1) * .... * (ar choose br) when divided by p.
Note: if bi > ai then ai choose bi is 0.
Thus your problem is reduced to finding the product modulo 1009 of at most log N/log 1009 numbers (number of digits of N in base 1009) of the form a choose b where a < 1009 and b < 1009.
This should make it easier even when N is close to 10^15.
Note: For N=10^15, N choose N/2 is more than 2^(100000000000000), which is way beyond an unsigned long long.
Also, the algorithm suggested by Lucas' theorem is O(log N), which is exponentially faster than trying to compute the binomial coefficient directly (even if you did a mod 1009 to take care of the overflow issue).
Here is some code for Binomial I had written long back, all you need to do is to modify it to do the operations modulo 1009 (there might be bugs and not necessarily recommended coding style):
class Binomial
{
public:
    Binomial(int Max)
    {
        max = Max+1;
        table = new unsigned int * [max]();
        for (int i=0; i < max; i++)
        {
            table[i] = new unsigned int[max]();
            for (int j = 0; j < max; j++)
            {
                table[i][j] = 0;
            }
        }
    }
    ~Binomial()
    {
        for (int i =0; i < max; i++)
        {
            delete[] table[i]; // arrays allocated with new[] need delete[]
        }
        delete[] table;
    }
    unsigned int Choose(unsigned int n, unsigned int k);
private:
    bool Contains(unsigned int n, unsigned int k);
    int max;
    unsigned int **table;
};
unsigned int Binomial::Choose(unsigned int n, unsigned int k)
{
    if (n < k) return 0;
    if (k == 0 || n==1 ) return 1;
    if (n==2 && k==1) return 2;
    if (n==2 && k==2) return 1;
    if (n==k) return 1;
    if (Contains(n,k))
    {
        return table[n][k];
    }
    table[n][k] = Choose(n-1,k) + Choose(n-1,k-1);
    return table[n][k];
}
bool Binomial::Contains(unsigned int n, unsigned int k)
{
    if (table[n][k] == 0)
    {
        return false;
    }
    return true;
}

Binomial coefficient is one factorial divided by two others, although the k! term on the bottom cancels in an obvious way.
Observe that if 1009, (including multiples of it), appears more times in the numerator than the denominator, then the answer mod 1009 is 0. It can't appear more times in the denominator than the numerator (since binomial coefficients are integers), hence the only cases where you have to do anything are when it appears the same number of times in both. Don't forget to count multiples of (1009)^2 as two, and so on.
After that, I think you're just mopping up small cases (meaning small numbers of values to multiply/divide), although I'm not sure without a few tests. On the plus side 1009 is prime, so arithmetic modulo 1009 takes place in a field, which means that after casting out multiples of 1009 from both top and bottom, you can do the rest of the multiplication and division mod 1009 in any order.
Where there are non-small cases left, they will still involve multiplying together long runs of consecutive integers. This can be simplified by knowing 1008! (mod 1009). It's -1 (1008 if you prefer), since 1 ... 1008 are the p-1 non-zero elements of the prime field over p. Therefore they consist of 1, -1, and then (p-3)/2 pairs of multiplicative inverses.
So for example consider the case of C((1009^3), 200).
Imagine that the number of 1009s are equal (don't know if they are, because I haven't coded a formula to find out), so that this is a case requiring work.
On the top we have 201 ... 1008, which we'll have to calculate or look up in a precomputed table, then 1009, then 1010 ... 2017, 2018, 2019 ... 3026, 3027, etc. The ... ranges are all -1, so we just need to know how many such ranges there are.
That leaves 1009, 2018, 3027, which once we've cancelled them with 1009's from the bottom will just be 1, 2, 3, ... 1008, 1010, ..., plus some multiples of 1009^2, which again we'll cancel and leave ourselves with consecutive integers to multiply.
We can do something very similar with the bottom to compute the product mod 1009 of "1 ... 1009^3 - 200 with all the powers of 1009 divided out". That leaves us with a division in a prime field. IIRC that's tricky in principle, but 1009 is a small enough number that we can manage 1000 of them (the upper limit on the number of test cases).
Of course with k=200, there's an enormous overlap which could be cancelled more directly. That's what I meant by small cases and non-small cases: I've treated it like a non-small case, when in fact we could get away with just "brute-forcing" this one, by calculating ((1009^3-199) * ... * 1009^3) / 200!

I don't think you want to calculate C(n,k) and then reduce mod 1009. The biggest one, C(1e15,5e14), will require something like 1e15 bits, on the order of 100 terabytes.
Moreover, executing the loop in snakile's answer 1e15 times seems like it might take a while.
What you might use is, if
n = n0 + n1*p + n2*p^2 ... + nd*p^d
m = m0 + m1*p + m2*p^2 ... + md*p^d
(where 0<=mi,ni < p)
then
C(n,m) = C(n0,m0) * C(n1,m1) *... * C(nd, md) mod p
see, eg http://www.cecm.sfu.ca/organics/papers/granville/paper/binomial/html/binomial.html
One way would be to use Pascal's triangle to build a table of all C(n,m) for 0 <= m <= n < 1009.
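Putting the two pieces together (the base-p digit product and the Pascal's triangle table), a C++ sketch might look like this. The names small_binomials and binomial_mod_p are hypothetical, and the table is built lazily on first use, which is only one of several reasonable choices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int P = 1009; // the prime modulus from the problem statement

// Pascal's triangle mod P for 0 <= k <= n < P, built once on first use.
const std::vector<std::vector<int>>& small_binomials() {
    static const std::vector<std::vector<int>> table = [] {
        std::vector<std::vector<int>> t(P, std::vector<int>(P, 0));
        for (int n = 0; n < P; ++n) {
            t[n][0] = 1;
            for (int k = 1; k <= n; ++k) {
                t[n][k] = (t[n - 1][k - 1] + t[n - 1][k]) % P;
            }
        }
        return t;
    }();
    return table;
}

// C(n, k) mod P via Lucas' theorem: multiply C(n_i, k_i) over base-P digits.
int binomial_mod_p(std::uint64_t n, std::uint64_t k) {
    const std::vector<std::vector<int>>& c = small_binomials();
    int result = 1;
    while (n > 0 || k > 0) {
        const int nd = static_cast<int>(n % P);
        const int kd = static_cast<int>(k % P);
        if (kd > nd) return 0; // C(nd, kd) = 0 when kd > nd
        result = result * c[nd][kd] % P;
        n /= P;
        k /= P;
    }
    return result;
}
```

For N up to 10^15 the loop runs at most five times per query, since 1009^5 exceeds 10^15.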

Pseudocode for calculating nCk:
result = 1
for i=1 to min{K,N-K}:
    result *= N-i+1
    result /= i
return result
Time Complexity: O(min{K,N-K})
The loop goes from i=1 to min{K,N-K} instead of from i=1 to K, and that's ok because
C(n,k) = C(n, n-k)
And you can calculate the thing even more efficiently if you use the GammaLn function.
nCk = exp(GammaLn(n+1)-GammaLn(k+1)-GammaLn(n-k+1))
The GammaLn function is the natural logarithm of the Gamma function. I know there's an efficient algorithm to calculate the GammaLn function but that algorithm isn't trivial at all.
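The multiplicative loop above translates to C++ roughly as follows. This sketch computes the exact value in a 64-bit unsigned integer, so it is only valid while the intermediate product fits; it does not handle the mod 1009 requirement of the question, because the division step is not valid under a modulus. The name binomial_exact is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Exact C(n, k) for small inputs. Assumption: result * (n - i + 1) never
// exceeds the range of std::uint64_t; no overflow check is performed.
std::uint64_t binomial_exact(std::uint64_t n, std::uint64_t k) {
    if (k > n) return 0;
    k = std::min(k, n - k); // C(n, k) = C(n, n - k)
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= k; ++i) {
        // Before this step result == C(n, i - 1); the product is divisible
        // by i, so the division is exact and result becomes C(n, i).
        result = result * (n - i + 1) / i;
    }
    return result;
}
```

Multiplying before dividing keeps every intermediate value an integer, which is why the two pseudocode statements are merged into one expression here.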

The following code shows how to obtain all the binomial coefficients for a given size 'n'. You could easily modify it to stop at a given k in order to determine nCk. It is computationally very efficient, it's simple to code, and works for very large n and k.
binomial_coefficient = 1
output(binomial_coefficient)
col = 0
n = 5
do while col < n
    binomial_coefficient = binomial_coefficient * (n + 1 - (col + 1)) / (col + 1)
    output(binomial_coefficient)
    col = col + 1
loop
The output of binomial coefficients is therefore:
1
1 * (5 + 1 - (0 + 1)) / (0 + 1) = 5
5 * (5 + 1 - (1 + 1)) / (1 + 1) = 10
10 * (5 + 1 - (2 + 1)) / (2 + 1) = 10
10 * (5 + 1 - (3 + 1)) / (3 + 1) = 5
5 * (5 + 1 - (4 + 1)) / (4 + 1) = 1
I had found the formula once upon a time on Wikipedia but for some reason it's no longer there :(
