Trying to compare a recursive and an iterative algorithm - c++

I have two algorithms that solve this problem: Generate all sequences of bits within Hamming distance t. Now I want to compare them theoretically (I do have time measurements, if needed).
The iterative algorithm has a complexity of:
O((n choose t) * n)
where n is the length of the bit-string and t is the desired Hamming distance.
For the recursive algorithm, the best bound we have so far is:
O(2^n)
But how can I compare these two time complexities without introducing t into the second one? That is what I am trying to do; can you help?
The recursive algorithm:
// str is the bitstring, i the current length, and changesLeft the
// desired Hamming distance (see linked question for more)
void magic(char* str, int i, int changesLeft) {
    if (changesLeft == 0) {
        // assume that this is constant
        printf("%s\n", str);
        return;
    }
    if (i < 0) return;
    // flip current bit
    str[i] = str[i] == '0' ? '1' : '0';
    magic(str, i-1, changesLeft-1);
    // or don't flip it (flip it again to undo)
    str[i] = str[i] == '0' ? '1' : '0';
    magic(str, i-1, changesLeft);
}

The recursive algorithm is O((n choose t) * n) too, by an analysis that charges to each printed combination the cost of the entire call stack at the time that it is printed. We can do this because every invocation of magic (except the two O(1) leaf calls where i < 0, which we could easily do away with) prints something.
This bound is best possible if you assign printing its true cost. Otherwise, I'm pretty sure that both analyses can be tightened to O(n choose t) excluding printing for t > 0, with details in Knuth 4A.
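As a quick empirical check of that analysis (my own sketch; the call counter and driver are not from the original question), you can count the invocations of magic and compare them against the (n choose t) * n bound:

#include <cstdio>

static long calls = 0;

void magic(char* str, int i, int changesLeft) {
    ++calls;                               // count every invocation
    if (changesLeft == 0) {
        printf("%s\n", str);
        return;
    }
    if (i < 0) return;
    str[i] = str[i] == '0' ? '1' : '0';    // flip current bit
    magic(str, i - 1, changesLeft - 1);
    str[i] = str[i] == '0' ? '1' : '0';    // undo the flip
    magic(str, i - 1, changesLeft);
}

long long choose(long long n, long long k) {
    long long r = 1;
    for (long long i = 1; i <= k; ++i) r = r * (n - i + 1) / i;
    return r;
}

int main() {
    const int n = 8, t = 3;
    char str[n + 1] = "00000000";
    magic(str, n - 1, t);
    printf("calls = %ld, (n choose t) * n = %lld\n", calls, choose(n, t) * n);
}

The total call count stays within a small constant factor of (n choose t) * n, which is what the charging argument predicts.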

At the most general level of time complexity, we have a "worst case" of t = n/2. Now, fix t and gradually increment n. Let's take a starting point of n=8, t=4
C(8 4) = 8*7*6*5*4*3*2*1 / (4*3*2*1 * 4*3*2*1)
= 8*7*6*5 / 24
Stepping from n to n+1, n choose t is now
C(9 4) = ...
= 9*8*7*6 / 24
= 9/5 of the previous value.
Now, the progression is a little easier to watch.
C( 8 4) = 8*7*6*5 / 24
C( 9 4) = 9/5 * C( 8 4)
C(10 4) = 10/6 * C( 9 4)
C(11 4) = 11/7 * C(10 4)
...
C( n 4) = n/(n-4) * C(n-1 4)
Now, as lemmas for the student:
Find the base complexity for the worst case t = n/2, namely n! / ((n/2)!)^2
Find the combinatorial complexity of product (n / (n-c)) for constant c
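As a worked illustration of that product (my own snippet, not part of the original answer), you can build C(n, 4) from the ratio recurrence and watch the growth:

#include <cstdio>

int main() {
    // C(8, 4) = 8*7*6*5 / 24 = 70; thereafter C(n, 4) = n/(n-4) * C(n-1, 4)
    double c = 70.0;
    for (int n = 9; n <= 16; ++n) {
        c *= (double)n / (n - 4);
        printf("C(%2d, 4) = %.0f\n", n, c);
    }
}

Each step multiplies by n/(n-4), a factor that tends to 1, which is why C(n, t) grows only polynomially (degree t) for fixed t rather than exponentially.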


Why should mid-value be used instead of mid-value - 1 for a recursive implementation of a binary search?

Background
An interactive book presents this function as an example of a binary search.
void GuessNumber(int lowVal, int highVal) {
    int midVal;       // Midpoint of low and high value
    char userAnswer;  // User response
    midVal = (highVal + lowVal) / 2;
    // Prompt user for input
    cout << "Is it " << midVal << "? (l/h/y): ";
    cin >> userAnswer;
    if ((userAnswer != 'l') && (userAnswer != 'h')) { // Base case: found number
        cout << "Thank you!" << endl;
    }
    else { // Recursive case: split into lower OR upper half
        if (userAnswer == 'l') { // Guess in lower half
            GuessNumber(lowVal, midVal);      // Recursive call
        }
        else { // Guess in upper half
            GuessNumber(midVal + 1, highVal); // Recursive call
        }
    }
}
It presents the algorithm, and then they give an explanation about how to calculate the mid-value for the recursive call.
Because midVal has already been checked, it need not be part of the new window, so midVal + 1 rather than midVal is used for the window's new low side, or midVal - 1 for the window's new high side. But midVal - 1 can have the drawback of a non-intuitive base case (i.e., midVal < lowVal, because if the current window is say 4..5, midVal is 4, so the new window would be 4..4-1, or 4..3). rangeSize == 1 is likely more intuitive, and thus the algorithm uses midVal rather than midVal - 1.

However, the algorithm uses midVal + 1 when searching higher, due to integer rounding. In particular, for window 99..100, midVal is 99 ((99 + 100) / 2 = 99.5, rounded to 99 due to truncation of the fraction in integer division). So the next window would again be 99..100, and the algorithm would repeat with this window forever. midVal + 1 prevents the problem, and doesn't miss any numbers because midVal was checked and thus need not be part of the window.
Reasoning
I can see why, when the mid-value is used as a lower limit, the function is called as GuessNumber(midVal + 1, highVal). The explanation given with the limits 99 and 100 is pretty clear. However, I don't understand why, when the mid-value is used as the highest limit, the function is called as GuessNumber(lowVal, midVal) instead of GuessNumber(lowVal, midVal - 1).
The algorithm is missing the case when the value being searched is not within the range. However, it seems that they do make the assumption that it is (as a precondition). Therefore, the example they give with 4 and 5 does not make much sense.
Test case: the number being searched is 4
Let's assume that the value being searched is 4.
mid_value := (4+5) / 2 = 9 / 2 = 4.5 = 4 (due to truncation)
When the number is checked, it should return the position of 4, so there would be no error. The call GuessNumber(4, mid_value - 1) would have never been called. This means that the case for midVal < lowVal would have never occurred.
Test case: the number being searched is 5
Now, suppose the value is 5. The same calculation is done. When compared, the algorithm will execute the call GuessNumber(mid_value + 1, 5). This should return the position of 5. Again, GuessNumber(5, mid_value - 1) is not called.
Test case: changing the range
If I try to increase the range, let's say using 4 and 7 as limits, the function would never cause midVal < lowVal if called like GuessNumber(low_value, mid_value - 1). Consider the mid-value of the range between 4 and 7, namely 5 (due to truncation). If the number being searched is 5, then the position is immediately returned. However, if the number being searched is 4 and the recursive call is done as GuessNumber(low_value, mid_value - 1) (GuessNumber(4, 5 - 1)), then the new mid-value will be 4 and no midVal < lowVal would occur. The position of 4 is returned.
Some conclusions
I think it may be a logical error. The only way this could happen is if the number being searched is outside of the range (and specifically below the lower limit), but the algorithm is not testing a case when the number being searched is outside of the range. Again, it seems to be a precondition. Nevertheless, the explanation given caught my attention. They took the time to say that the error midVal < lowVal could happen, and they gave the example of the range 4 and 5.
Other Findings
I looked up the pseudocode in a discrete math book, but they use the case of recursive_binary_search(lowVal, midVal - 1) without worrying about the issue described above. I noticed they do some checking if the value is out of range, though.
procedure binary_search(i, j, x: integer, 1 ≤ i ≤ j ≤ n)
    m := ⌊(i + j)/2⌋
    if x = a_m then
        return m
    else if (x < a_m and i < m) then
        return binary_search(i, m - 1, x)
    else if (x > a_m and j > m) then
        return binary_search(m + 1, j, x)
    else return 0
{output is location of x in a_1, a_2, ..., a_n if it appears; otherwise it is 0}
I also saw this implementation in another data structures book. This does not make the precondition that the item being searched is inside the range, but they do check for that, and they still call the recursive function with the limits lower (first in this example) and mid - 1 (loc - 1 in this example).
void recBinarySearch(ArrayType a, int first, int last, ElementType item, bool &found, int &loc) {
    /*---------------------------
      Recursively search (sub)list a[first], ..., a[last] for item using binary search.
      Precondition: Elements of a are in ascending order; item has the same type as the array elements.
      Postcondition: found = true and loc = position of item if the search is successful; otherwise, found is false.
    -----------------------------*/
    if (first > last)
        found = false;
    else
    {
        loc = (first + last) / 2;
        if (item < a[loc])        // First half
            recBinarySearch(a, first, loc - 1, found, loc);
        else if (item > a[loc])   // Second half
            recBinarySearch(a, loc + 1, last, found, loc);
        else
            found = true;
    }
}
Question
I have searched on Google and other StackOverflow questions, but I cannot find something that points me in the right direction (most of the findings explain the overflow issue in the mid-value calculation, which is not the issue here). Is the explanation given in the book about using mid-value instead of mid-value - 1 for the upper limit correct? Is there an example that could demonstrate so, or am I missing something?
Thank you in advance for your time and help!
You're right to be confused by this example. With a range of 4..5, the guess (midVal) would be 4. The only way the line of code GuessNumber(lowVal, midVal-1); would be executed is if the user answered "low" which is:
a lie, or
their number is out of range.
The example code doesn't account for search values outside the initial input range, which a binary search should do.
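For completeness, here is a sketch (mine, not the book's) of the midVal - 1 variant with the "empty window" base case made explicit; as a bonus it also reports out-of-range numbers:

#include <iostream>
using namespace std;

// Variant of the book's GuessNumber: uses midVal - 1 on the low side and
// detects an empty window, which can only happen if the user's answers are
// inconsistent or the number lies outside the initial range.
void GuessNumber(int lowVal, int highVal) {
    if (lowVal > highVal) {
        cout << "No such number in range." << endl;
        return;
    }
    int midVal = (highVal + lowVal) / 2;
    char userAnswer;
    cout << "Is it " << midVal << "? (l/h/y): ";
    cin >> userAnswer;
    if ((userAnswer != 'l') && (userAnswer != 'h')) {
        cout << "Thank you!" << endl;
    }
    else if (userAnswer == 'l') {
        GuessNumber(lowVal, midVal - 1);   // midVal already checked
    }
    else {
        GuessNumber(midVal + 1, highVal);  // midVal already checked
    }
}

With the explicit lowVal > highVal guard, the "non-intuitive base case" the book worries about becomes a feature: it is exactly the signal that the number isn't in the range.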

Efficiently randomly shuffling the bits of a sequence of words

Consider the algorithm std::shuffle from the C++ standard library, which has the following signature:
template <class RandomIt, class URBG>
void shuffle(RandomIt first, RandomIt last, URBG&& g);
It reorders the elements in the given range [first, last) such that each possible permutation of those elements has equal probability of appearance.
I am trying to implement the same algorithm, but one that works at the bit level, randomly shuffling the bits of the words of the input sequence. Considering a sequence of 64-bit words, I am trying to implement:
template <class URBG>
void bit_shuffle(std::uint64_t* first, std::uint64_t* last, URBG&& g)
Question: How to do that as efficiently as possible (using compiler intrinsics if necessary)? I am not necessarily looking for an entire implementation, but more for suggestions/directions of research, because it's really not clear to me if it's even feasible to implement that efficiently.
It's obvious that asymptotically the speed is O(N), where N is the number of bits. Our goal is to improve the constants involved in it.
Disclaimer: the proposed algorithm's description is a rough sketch. There is a lot that needs to be added and, especially, many details that need to be taken care of in order to make it work correctly. The approximate execution time will not differ from what is claimed here, though.
Baseline Algorithm
The most obvious one is the textbook approach, which takes N operations, each of which involves calling the random generator (which takes R milliseconds) and accessing the values of two different bits and setting new values for them (in total 4 * A milliseconds, where A is the time to read/write one bit). Suppose the array lookup operation takes C milliseconds. So the total time of this algorithm is N * (R + 4 * A + 2 * C) milliseconds (approximately). It is also reasonable to assume that random number generation takes more time, i.e. R >> A == C.
Proposed Algorithm
Suppose the bits are stored in a byte storage, i.e. we will work with blocks of bytes.
const std::size_t field_size = N / 8;
unsigned char bit_field[field_size];
First, let's count the number of 1 bits in our bitset. For that, we can use a lookup-table and iterate through the bitset as byte array:
// Generate the lookup table; you may mark it `constexpr`
// to have it computed at compile time.
int bitcount_lookup[256];
for (int i = 0; i < 256; ++i) {
    bitcount_lookup[i] = 0;
    for (int b = 0; b < 8; ++b)
        bitcount_lookup[i] += (i >> b) & 1;
}
We can treat this as preprocessing overhead (as it may as well be calculated at compile time) and say that it takes 0 milliseconds. Now, counting the number of 1 bits is easy (the following will take (N / 8) * C milliseconds):
int bitcount = 0;
for (auto *it = bit_field; it != bit_field + field_size; ++it)
    bitcount += bitcount_lookup[*it];
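(An aside beyond the original answer: on a C++20 compiler the lookup table can be dropped entirely in favor of the hardware popcount. A minimal sketch:)

#include <bit>       // std::popcount (C++20)
#include <cstdint>
#include <cstring>

int countBits(const unsigned char* bit_field, std::size_t field_size) {
    int bitcount = 0;
    std::size_t i = 0;
    for (; i + 8 <= field_size; i += 8) {
        std::uint64_t word;
        std::memcpy(&word, bit_field + i, 8);   // safe unaligned load
        bitcount += std::popcount(word);
    }
    for (; i < field_size; ++i)                  // leftover tail bytes
        bitcount += std::popcount(unsigned(bit_field[i]));
    return bitcount;
}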
Now, we randomly generate N / 8 numbers (let's call the resulting array gencnt[N / 8]), each in the range [0..8], such that they sum up to bitcount. This is a bit tricky and kind of hard to do uniformly (the "correct" algorithm to generate a uniform distribution is quite slow compared to the baseline algo). A quite uniform-ish but quick solution is roughly:
Fill the gencnt[N / 8] array with values v = bitcount / (N / 8).
Randomly choose N / 16 "black" cells. The rest are "white". The algorithm is similar to a random permutation, but only of half of the array.
Generate N / 16 random numbers in the range [0..v]. Let's call them tmp[N / 16].
Increase "black" cells by tmp[i] values, and decrease "white" cells by tmp[i]. This will ensure that the overall sum is bitcount.
After that, we will have a uniform-ish random-ish array gencnt[N / 8], whose values are the number of 1 bits in a particular "cell". It was all generated in:
(N / 8) * C + (N / 16) * (4 * C) + (N / 16) * (R + 2 * C)
^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
filling step random coloring filling
milliseconds (this estimation is done with a concrete implementation in my mind). Lastly, we can have a lookup table of the bytes with a specified number of bits set to 1 (this can be precomputed overhead, or even done at compile time as constexpr, so let's assume it takes 0 milliseconds):
std::vector<std::vector<unsigned char>> random_lookup(9); // 0 through 8 bits set
for (int c = 0; c <= 8; c++)
    random_lookup[c] = { /* numbers with `c` bits set to `1` */ };
Then, we can fill our bit_field as follows (which takes roughly (N / 8) * (R + 3 * C) milliseconds):
for (int i = 0; i < field_size; i++) {
    bit_field[i] = random_lookup[gencnt[i]][rand() % random_lookup[gencnt[i]].size()];
}
Summing everything up, we have the total execution time:
T = (N / 8) * C +
(N / 8) * C + (N / 16) * (4 * C) + (N / 16) * (R + 2 * C) +
(N / 8) * (R + 3 * C)
= N * (C + (3/16) * R) < N * (R + 4 * A + 2 * C)
^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^
proposed algorithm naive baseline algo
It's not truly uniformly random, but it does spread the bits out quite evenly and randomly, it's quite fast, and hopefully it gets the job done in your use case.
Observe that actually shuffling the bits, which involves swapping via Fisher-Yates, is not required to produce the exact equivalent: a random distribution of the bits.
#include <iostream>
#include <vector>
#include <random>
// shuffle a vector of bools. This requires only counting the number of trues in the vector
// followed by clearing the vector and inserting bool trues to produce an equivalent to
// a bit shuffle. This is cache line friendly and doesn't require swapping.
std::vector<bool> DistributeBitsRandomly(std::vector<bool> bvector)
{
    std::random_device rd;
    static std::mt19937 gen(rd()); // mersenne_twister_engine seeded with rd()
    // count the number of set bits and clear bvector
    int set_bits_count = 0;
    for (size_t i = 0; i < bvector.size(); i++)
        if (bvector[i])
        {
            set_bits_count++;
            bvector[i] = false;
        }
    // set a bit if a random value in range bvector.size()-bit_loc-1 is
    // less than the number of bits remaining to be placed. This produces exactly the same
    // distribution as a random shuffle but only does an insertion of a 1 bit rather than
    // a swap. It requires counting the number of 1 bits. There are efficient ways
    // of doing this. See https://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer
    for (int bit_loc = 0; set_bits_count; bit_loc++)
    {
        std::uniform_int_distribution<int> dis(0, (int)bvector.size() - bit_loc - 1);
        auto x = dis(gen);
        if (x < set_bits_count)
        {
            bvector[bit_loc] = true;
            set_bits_count--;
        }
    }
    return bvector;
}
This performs the equivalent of shuffling the bools in a vector<bool>. It is cache-line friendly and involves no swapping. It's presented in executable but simple algorithmic form, as requested by the OP. Much can be done to optimize this, such as improving the speed of bit counting and clearing the array.
This sets 4 bits out of 10, calls the "shuffle" routine 100,000 times, and prints the number of time a 1 bit occurs in each of the 10 locations. It should be around 40,000 in each position.
int main()
{
    std::vector<bool> initial{ 1,1,1,1,0,0,0,0,0,0 };
    std::vector<int> totals(initial.size());
    for (int i = 0; i < 100000; i++)
    {
        auto a_distribution = DistributeBitsRandomly(initial);
        for (size_t ii = 0; ii < totals.size(); ii++)
            if (a_distribution[ii])
                totals[ii]++;
    }
    for (auto cnt : totals)
        std::cout << cnt << "\n";
}
Possible Output:
40116
39854
40045
39917
40105
40074
40214
39963
39946
39766
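To tie this back to the bit_shuffle signature in the question, here is a rough adaptation of the same count-then-redistribute idea to 64-bit words (my own sketch, not from either answer; it treats the whole range as one pool of bits):

#include <bit>
#include <cstdint>
#include <random>

template <class URBG>
void bit_shuffle(std::uint64_t* first, std::uint64_t* last, URBG&& g) {
    // Count the set bits across all words, clearing as we go.
    std::uint64_t totalBits = 64 * std::uint64_t(last - first);
    std::uint64_t setBits = 0;
    for (std::uint64_t* it = first; it != last; ++it) {
        setBits += std::popcount(*it);
        *it = 0;
    }
    // Place each remaining 1 with probability (ones left) / (positions left),
    // the same rule as the vector<bool> version above.
    for (std::uint64_t pos = 0; setBits != 0; ++pos) {
        std::uniform_int_distribution<std::uint64_t> dis(0, totalBits - pos - 1);
        if (dis(g) < setBits) {
            first[pos / 64] |= std::uint64_t(1) << (pos % 64);
            --setBits;
        }
    }
}

Usage would be e.g. std::mt19937_64 g(seed); bit_shuffle(words, words + n, g);. Like the vector<bool> version, this makes a single sequential pass and never swaps.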

3-sum alternative approach

I tried an alternative approach to the 3sum problem: given an array, find all triplets that sum to a given number.
Basically the approach is this: sort the array. Once a pair of elements (say A[i] and A[j]) is selected, a binary search is done for the third element [using the equal_range function]. The index one past the last of the matching elements is saved in a variable 'c'. Since A[j+1] > A[j], we need to search only up to and excluding index c (numbers at index c and beyond would definitely sum to more than the target). For the case j=i+1, we save the end index as 'd' instead and set c=d. For the next value of i, when j=i+1, we need to search only up to and excluding index d.
C++ implementation:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

int sum3(vector<int>& A, int sum)
{
    int count = 0, n = A.size();
    sort(A.begin(), A.end());
    int c = n, d = n; // initialize c and d to the array length
    pair<vector<int>::iterator, vector<int>::iterator> p;
    for (int i = 0; i < n - 2; i++)
    {
        for (int j = i + 1; j < n - 1; j++)
        {
            if (j == i + 1)
            {
                p = equal_range(A.begin() + j + 1, A.begin() + d, sum - A[i] - A[j]);
                d = p.second - A.begin();
                if (d == n + 1) d--;
                c = d;
            }
            else
            {
                p = equal_range(A.begin() + j + 1, A.begin() + c, sum - A[i] - A[j]);
                c = p.second - A.begin();
                if (c == n + 1) c--;
            }
            count += p.second - p.first;
            for (auto it = p.first; it != p.second; ++it)
                cout << A[i] << ' ' << A[j] << ' ' << *it << '\n';
        }
    }
    return count;
}

int main() // driver function for testing
{
    vector<int> A = {4,3,2,6,4,3,2,6,4,5,7,3,4,6,2,3,4,5};
    int sum = 17;
    cout << sum3(A, sum) << endl;
    return 0;
}
I am unable to work out the upper bound time needed for this algorithm. I understand that the worst case scenario will be when the target sum is unachievably large.
My calculations yield something like:
For i=0, no. of binary searches is lg(n-2) + lg(n-3) + ... +lg(1)
For i=1, lg(n-3) + lg(n-4) + ... + lg(1)
...
...
...
For i=n-3, lg(1)
So totally, lg((n-2)!) + lg((n-3)!) + ... + lg(1!)
= lg(1^n * 2^(n-1) * 3^(n-2) * ... * (n-1)^2 * n^1)
But how do I deduce a big-O bound in terms of n from this expression?
In addition to James' good answer, I would like to point out that this can actually go up to O(n^3) in the worst case, because you are running three nested for loops. Consider the case
{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}
and the demanded sum is 3.
When computing complexity, I'll start by referring to the Big-O Cheat sheet. I use this sheet to classify smaller sections of the code to get their runtime performance.
E.g. if I had a simple loop it would be O(n). BinSearch (according to the cheat sheet) is O(log(n)), etc..
Next, I use the Properties of Big-O notation to composite the smaller pieces together.
So for instance if I had two loops independent of each other it would be O(n) + O(n) or O(2n) => O(n). If one of my loops were inside the other, I would multiply them. So g( f(x) ) turns into O(n^2).
Now, I know you're saying: "hey, wait, I'm changing the upper and lower bounds of the inner loop", but I don't think that really matters... here's a university-level example.
So my back-of-the-napkin calculation of your runtime is O(n^2) * O(Log(n)) or O(n^2 Log(n)).
But this need not be the case. I could've done something horribly wrong. So my next step would be to start graphing the runtimes of your worst possible case. Set sum to an impossibly large value and generate larger and larger arrays. You can avoid integer overflow by using lots and lots of repeated smaller numbers.
Also, compare it to the Quadratic 3Sum Solution. That's a known O(n^2) solution. Be sure to compare worst cases, or at least the same array on both. Do both timed tests at the same time so you can start getting a feel for which is faster while you are empirically testing the runtime.
Release builds, optimized for speed.
1. For your analysis, note that
log(1) + log(2) + ... + log(k) = Theta(k log(k)).
Indeed, the upper half of this sum is log(k/2) + log(k/2+1) + ... + log(k),
so it is at least log(k/2)*k/2, which is asymptotically the same as log(k)*k already.
Similarly, we can conclude that
log(n-1) + log(n-2) + log(n-3) + ... + log(1) + // Theta((n-1) log(n-1))
log(n-2) + log(n-3) + ... + log(1) + // Theta((n-2) log(n-2))
log(n-3) + ... + log(1) + // Theta((n-3) log(n-3))
... +
log(1) = Theta(n^2 log(n))
Indeed, if we consider the logarithms which are at least log(n/2), it's the half-triangle (thus ~1/2) of the upper left quadrant (thus ~n^2/4) of the above sum, so there are Theta(n^2/8) such terms.
2. As noted by satvik in another answer, your output loop can take up to Theta(n^3) steps when the number of outputs itself is Theta(n^3), which is when they are all equal.
3. There are O(n^2) solutions to the 3-sum problem, which are therefore asymptotically faster than this one.
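For reference, a sketch of the standard O(n^2) two-pointer counting scheme (not the OP's code; it counts all index triples i < j < k with A[i] + A[j] + A[k] == sum, handling runs of duplicates in bulk so that even the all-equal worst case stays quadratic as long as you count rather than print):

#include <algorithm>
#include <vector>

long long sum3TwoPointer(std::vector<int> A, int sum) {
    std::sort(A.begin(), A.end());
    long long count = 0;
    int n = (int)A.size();
    for (int i = 0; i < n - 2; ++i) {
        int lo = i + 1, hi = n - 1;
        while (lo < hi) {
            int s = A[i] + A[lo] + A[hi];
            if (s < sum) ++lo;
            else if (s > sum) --hi;
            else if (A[lo] == A[hi]) {       // everything in [lo, hi] is equal
                long long m = hi - lo + 1;
                count += m * (m - 1) / 2;    // choose any 2 of the m copies
                break;
            } else {                          // count the runs of equal values
                int cl = 1, ch = 1;
                while (A[lo + cl] == A[lo]) ++cl;
                while (A[hi - ch] == A[hi]) ++ch;
                count += (long long)cl * ch;
                lo += cl;
                hi -= ch;
            }
        }
    }
    return count;
}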

Finding all paths down stairs?

I was given the following problem in an interview:
Given a staircase with N steps, you can go up with 1 or 2 steps each time. Output all possible ways you can go from bottom to top.
For example:
N = 3
Output :
1 1 1
1 2
2 1
When interviewing, I just said to use dynamic programming.
S(n) = S(n-1) +1 or S(n) = S(n-1) +2
However, during the interview, I didn't write very good code for this. How would you code up a solution to this problem?
Thanks indeed!
I won't write the code for you (since it's a great exercise), but this is a classic dynamic programming problem. You're on the right track with the recurrence; it's true that
S(0) = 1
Since if you're at the bottom of the stairs there's exactly one way to do this. We also have that
S(1) = 1
Because if you're one step high, your only option is to take a single step down, at which point you're at the bottom.
From there, the recurrence for the number of solutions is easy to find. If you think about it, any sequence of steps you take either ends with taking one small step as your last step or one large step as your last step. In the first case, each of the S(n - 1) solutions for n - 1 stairs can be extended into a solution by taking one more step, while in the second case each of the S(n - 2) solutions to the n - 2 stairs case can be extended into a solution by taking two steps. This gives the recurrence
S(n) = S(n - 2) + S(n - 1)
Notice that to evaluate S(n), you only need access to S(n - 2) and S(n - 1). This means that you could solve this with dynamic programming using the following logic:
Create an array S with n + 1 elements in it, indexed by 0, 1, 2, ..., n.
Set S[0] = S[1] = 1
For i from 2 to n, inclusive, set S[i] = S[i - 1] + S[i - 2].
Return S[n].
The runtime for this algorithm is a beautiful O(n) with O(n) memory usage.
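A direct translation of those four steps (a minimal sketch):

#include <cstdint>
#include <vector>

// Number of ways to climb n steps taking 1 or 2 at a time.
std::uint64_t countWays(int n) {
    std::vector<std::uint64_t> S(n + 1);
    S[0] = 1;
    if (n >= 1) S[1] = 1;
    for (int i = 2; i <= n; ++i)
        S[i] = S[i - 1] + S[i - 2];
    return S[n];
}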
However, it's possible to do much better than this. In particular, let's take a look at the first few terms of the sequence, which are
S(0) = 1
S(1) = 1
S(2) = 2
S(3) = 3
S(4) = 5
This looks a lot like the Fibonacci sequence, and in fact you might be able to see that
S(0) = F(1)
S(1) = F(2)
S(2) = F(3)
S(3) = F(4)
S(4) = F(5)
This suggests that, in general, S(n) = F(n + 1). We can actually prove this by induction on n as follows.
As our base cases, we have that
S(0) = 1 = F(1) = F(0 + 1)
and
S(1) = 1 = F(2) = F(1 + 1)
For the inductive step, we get that
S(n) = S(n - 2) + S(n - 1) = F(n - 1) + F(n) = F(n + 1)
And voila! We've gotten this series written in terms of Fibonacci numbers. This is great, because it's possible to compute the Fibonacci numbers in O(1) space and O(lg n) time. There are many ways to do this. One uses the fact that
F(n) = (1 / √5) (Φ^n − φ^n)
Here, Φ is the golden ratio, (1 + √5) / 2 (about 1.6), and φ is 1 - Φ, about -0.6. Because this second term drops to zero very quickly, you can get the nth Fibonacci number by computing
(1 / √5) Φ^n
And rounding to the nearest integer. Moreover, you can compute Φ^n in O(lg n) time by repeated squaring. The idea is that we can use this cool recurrence:
x^0      = 1
x^(2n)   = x^n * x^n
x^(2n+1) = x * x^n * x^n
You can show using a quick inductive argument that this terminates in O(lg n) time, which means that you can solve this problem using O(1) space and O(lg n) time, which is substantially better than the DP solution.
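For concreteness, here's a sketch of that approach (mine; bear in mind that double precision is only exact up to roughly F(70), past which you would switch to exact integer matrix exponentiation):

#include <cmath>
#include <cstdint>

// x^n by repeated squaring: O(lg n) multiplications.
double powBySquaring(double x, unsigned n) {
    double result = 1.0;
    while (n > 0) {
        if (n & 1) result *= x;
        x *= x;
        n >>= 1;
    }
    return result;
}

// S(n) = F(n + 1) = round(Phi^(n+1) / sqrt(5))
std::uint64_t countWays(unsigned n) {
    const double phi = (1.0 + std::sqrt(5.0)) / 2.0;
    return (std::uint64_t)std::llround(powBySquaring(phi, n + 1) / std::sqrt(5.0));
}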
Hope this helps!
You can generalize your recursive function to also take already made moves.
void steps(n, alreadyTakenSteps) {
    if (n == 0) {
        print already taken steps
    }
    if (n >= 1) {
        steps(n - 1, alreadyTakenSteps.append(1));
    }
    if (n >= 2) {
        steps(n - 2, alreadyTakenSteps.append(2));
    }
}
It's not really the code, more of a pseudocode, but it should give you an idea.
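A runnable C++ rendering of that pseudocode (my translation, using explicit backtracking on a shared vector instead of appending to a fresh list per call):

#include <iostream>
#include <vector>

void steps(int n, std::vector<int>& taken) {
    if (n == 0) {
        for (int s : taken) std::cout << s << ' ';
        std::cout << '\n';
        return;
    }
    for (int step = 1; step <= 2; ++step) {
        if (n >= step) {
            taken.push_back(step);
            steps(n - step, taken);
            taken.pop_back();   // backtrack
        }
    }
}

int main() {
    std::vector<int> taken;
    steps(3, taken);   // prints: 1 1 1 / 1 2 / 2 1
}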
Your solution sounds right.
S(n):
If n = 1 return {1}
If n = 2 return {2, (1,1)}
Return S(n-1)x{1} U S(n-2)x{2}
(U is Union, x is Cartesian Product)
Memoizing this is trivial, and would make it O(Fib(n)).
Great answer by @templatetypedef - I did this problem as an exercise and arrived at the Fibonacci numbers on a different route:
The problem can basically be reduced to an application of binomial coefficients, which are handy for combination problems: the number of combinations of n things taken k at a time (called n choose k) can be found by the equation C(n, k) = n! / (k! (n-k)!).
Given that and the problem at hand, you can calculate a solution by brute force (just doing the combination count). The number of "take 2 steps" must be at least zero and at most 50 (for the 100-step staircase considered here), so the number of combinations is the sum of C(n,k) for 0 <= k <= 50 (n = number of decisions to be made, k = number of 2's taken out of those n):
BigInteger combinationCount = 0;
for (int k = 0; k <= 50; k++)
{
    int n = 100 - k;   // n = number of decisions, k = number of 2-steps
    BigInteger result = Fact(n) / (Fact(k) * Fact(n - k));   // Fact = factorial helper
    combinationCount += result;
}
The sum of these binomial coefficients just happens to also have a different formula: summing C(n-k, k) over all valid k gives exactly the Fibonacci number F(n+1), matching the answer above.
Actually, you can prove that the number of ways to climb is just the Fibonacci sequence. Good explanation here: http://theory.cs.uvic.ca/amof/e_fiboI.htm
Solving the problem, and solving it using a dynamic programming solution are potentially two different things.
http://en.wikipedia.org/wiki/Dynamic_programming
In general, to solve a given problem, we need to solve different parts of the problem (subproblems), then combine the solutions of the subproblems to reach an overall solution. Often, many of these subproblems are really the same. The dynamic programming approach seeks to solve each subproblem only once, thus reducing the number of computations
This leads me to believe you want to look for a solution that is both Recursive, and uses the Memo Design Pattern. Recursion solves a problem by breaking it into sub-problems, and the Memo design pattern allows you to cache answers, thus avoiding re-calculation. (Note that there are probably cache implementations that aren't the Memo design pattern, and you could use one of those as well).
Solving:
The first step I would take would be to solve some set of problems by hand, with varying or increasing sizes of N. This will give you a pattern to help you figure out a solution. Start with N = 1 through N = 5. (As others have stated, it may be a form of the Fibonacci sequence, but I would determine this for myself before calling the problem solved and understood.)
From there, I would try to make a generalized solution that used recursion. Recursion solves a problem by breaking it into sub-problems.
From there, I would try to make a cache of previous problem inputs to the corresponding output, hence memoizing it, and making a solution that involved "Dynamic Programming".
I.e., maybe the inputs to one of your functions are 2, 5, and the correct result was 7. Make some function that looks this up from an existing list or dictionary (based on the input). It will look for a call that was made with the inputs 2, 5. If it doesn't find it, call the function to calculate it, then store it and return the answer (7). If it does find it, don't bother calculating it, and return the previously calculated answer.
Here is a simple solution to this question in very simple CSharp (I believe you can port this with almost no change to Java/C++).
I have added a little more complexity to it (adding the possibility that you can also walk 3 steps). You can even generalize this code to "from 1 to k steps" if desired, with a while loop in the addition of steps (the last if statement).
I have used a combination of both dynamic programming and recursion. The use of dynamic programming avoids the recalculation of each previous step, reducing the space and time complexity related to the call stack. It does, however, add some space complexity (O(maxSteps)), which I think is negligible compared to the gain.
/// <summary>
/// Given a staircase with N steps, you can go up with 1 or 2 or 3 steps each time.
/// Output all possible way you go from bottom to top
/// </summary>
public class NStepsHop
{
    const int maxSteps = 500; // this is arbitrary
    static long[] HistorySumSteps = new long[maxSteps];

    public static long CountWays(int n)
    {
        if (n >= 0 && HistorySumSteps[n] != 0)
        {
            return HistorySumSteps[n];
        }
        long currentSteps = 0;
        if (n < 0)
        {
            return 0;
        }
        else if (n == 0)
        {
            currentSteps = 1;
        }
        else
        {
            currentSteps = CountWays(n - 1) +
                           CountWays(n - 2) +
                           CountWays(n - 3);
        }
        HistorySumSteps[n] = currentSteps;
        return currentSteps;
    }
}
You can call it in the following manner
long result;
result = NStepsHop.CountWays(0); // result = 1
result = NStepsHop.CountWays(1); // result = 1
result = NStepsHop.CountWays(5); // result = 13
result = NStepsHop.CountWays(10); // result = 274
result = NStepsHop.CountWays(25); // result = 2555757
You can argue that in the initial case, when n = 0, it could be 0 instead of 1. I decided to go for 1; however, modifying this assumption is trivial.
The problem can be solved quite nicely using recursion:
#include <cstdio>

void generatePath(int n, char* out, int recLvl)
{
    if (n == 0)
    {
        out[recLvl] = '\0';
        printf("%s\n", out);
    }
    if (n >= 1)
    {
        out[recLvl] = '1';
        generatePath(n - 1, out, recLvl + 1);
    }
    if (n >= 2)
    {
        out[recLvl] = '2';
        generatePath(n - 2, out, recLvl + 1);
    }
}

void printSteps(int n)
{
    char* output = new char[n + 1];
    generatePath(n, output, 0);
    printf("\n");
    delete[] output;   // free the buffer
}
and in main:
int main()
{
    printSteps(0);
    printSteps(3);
    printSteps(4);
    return 0;
}
It's a weighted graph problem.
From 0 you can get to 1 only 1 way (0-1).
You can get to 2 two ways, from 0 and from 1 (0-2, 1-1).
You can get to 3 three ways, from 1 and from 2 (2 has two ways).
You can get to 4 five ways, from 2 and from 3 (2 has two ways and 3 has three ways).
You can get to 5 eight ways, ...
A recursive function should be able to handle this, working backwards from N.
Complete C-Sharp code for this
void PrintAllWays(int n, string str)
{
    string str1 = str;
    StringBuilder sb = new StringBuilder(str1);
    if (n == 0)
    {
        Console.WriteLine(str1);
        return;
    }
    if (n >= 1)
    {
        sb = new StringBuilder(str1);
        PrintAllWays(n - 1, sb.Append("1").ToString());
    }
    if (n >= 2)
    {
        sb = new StringBuilder(str1);
        PrintAllWays(n - 2, sb.Append("2").ToString());
    }
}
Late C-based answer
#include <stdio.h>

#define steps 60

static long long unsigned int MAP[steps + 1] = {1, 1, 2, 0,};

static long long unsigned int countPossibilities(unsigned int n) {
    if (!MAP[n]) {
        MAP[n] = countPossibilities(n - 1) + countPossibilities(n - 2);
    }
    return MAP[n];
}

int main() {
    printf("%llu", countPossibilities(steps));
    return 0;
}
Here is a C++ solution. This prints all possible paths for a given number of stairs.
#include <iostream>
#include <vector>
using namespace std;

// Utility function to print a vector of vectors
void printVecOfVec(vector< vector<unsigned int> > vecOfVec)
{
    for (unsigned int i = 0; i < vecOfVec.size(); i++)
    {
        for (unsigned int j = 0; j < vecOfVec[i].size(); j++)
        {
            cout << vecOfVec[i][j] << " ";
        }
        cout << endl;
    }
    cout << endl;
}

// Given a source vector and a number, it appends the number to each source vector
// and puts the final values in the destination vector
void appendElementToVector(vector< vector <unsigned int> > src,
                           unsigned int num,
                           vector< vector <unsigned int> > &dest)
{
    for (unsigned int i = 0; i < src.size(); i++)
    {
        src[i].push_back(num);
        dest.push_back(src[i]);
    }
}

// Ladder problem
void ladderDynamic(int number)
{
    vector< vector<unsigned int> > vecNminusTwo = {{}};
    vector< vector<unsigned int> > vecNminusOne = {{1}};
    vector< vector<unsigned int> > vecResult;

    for (int i = 2; i <= number; i++)
    {
        // Empty the result vector to hold a fresh set
        vecResult.clear();
        // Append '2' to all N-2 ladder positions
        appendElementToVector(vecNminusTwo, 2, vecResult);
        // Append '1' to all N-1 ladder positions
        appendElementToVector(vecNminusOne, 1, vecResult);
        vecNminusTwo = vecNminusOne;
        vecNminusOne = vecResult;
    }
    printVecOfVec(vecResult);
}

int main()
{
    ladderDynamic(6);
    return 0;
}
Maybe I am wrong, but it should be:
S(1) =0
S(2) =1
Here we are considering permutations, so in that way
S(3) =3
S(4) =7

Calculating Binomial Coefficient (nCk) for large n & k

I just saw this question and have no idea how to solve it. Can you please provide me with algorithms, C++ code, or ideas?
This is a very simple problem. Given the value of N and K, you need to tell us the value of the binomial coefficient C(N,K). You may rest assured that K <= N and the maximum value of N is 1,000,000,000,000,000. Since the value may be very large, you need to compute the result modulo 1009.
Input
The first line of the input contains the number of test cases T, at most 1000. Each of the next T lines consists of two space separated integers N and K, where 0 <= K <= N and 1 <= N <= 1,000,000,000,000,000.
Output
For each test case, print on a new line, the value of the binomial coefficient C(N,K) modulo 1009.
Example
Input:
3
3 1
5 2
10 3
Output:
3
10
120
Notice that 1009 is a prime.
Now you can use Lucas' Theorem.
Which states:
Let p be a prime.
If n = a_1 a_2 ... a_r when written in base p, and
if k = b_1 b_2 ... b_r when written in base p
(pad with zeroes if required),
then
(n choose k) modulo p = (a_1 choose b_1) * (a_2 choose b_2) * ... * (a_r choose b_r) modulo p,
i.e. the remainder of n choose k when divided by p is the same as the remainder of
the product (a_1 choose b_1) * ... * (a_r choose b_r) when divided by p.
Note: if b_i > a_i then (a_i choose b_i) is 0.
Thus your problem is reduced to finding the product modulo 1009 of at most log N / log 1009 numbers (the number of digits of N in base 1009) of the form (a choose b), where a < 1009 and b < 1009.
This should make it easier even when N is close to 10^15.
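A sketch of that reduction (my code, not the original answer's): fill Pascal's triangle mod 1009 once, then peel off base-1009 digits per query.

#include <cstdio>

const int P = 1009;          // the prime modulus
static int table[P][P];      // Pascal's triangle mod P (~4 MB, built once)

void buildTable() {
    for (int n = 0; n < P; ++n) {
        table[n][0] = 1;
        for (int k = 1; k <= n; ++k)
            table[n][k] = (table[n - 1][k - 1] +
                           (k < n ? table[n - 1][k] : 0)) % P;
    }
}

// Lucas: C(n, k) mod P is the product of C(n_i, k_i) over base-P digits.
long long lucasChoose(long long n, long long k) {
    long long result = 1;
    while (n > 0 || k > 0) {
        int ni = (int)(n % P), ki = (int)(k % P);
        if (ki > ni) return 0;   // a digit of k exceeds the digit of n
        result = result * table[ni][ki] % P;
        n /= P;
        k /= P;
    }
    return result;
}

int main() {
    buildTable();
    printf("%lld\n", lucasChoose(10, 3));   // prints 120
}

Each query touches at most about log N / log 1009 ≈ 5 digit pairs for N up to 10^15, so even 1000 test cases are essentially free once the table is built.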
Note:
For N = 10^15, N choose N/2 is more than 2^(100000000000000), which is way beyond an unsigned long long.
Also, the algorithm suggested by Lucas' theorem is O(log N), which is exponentially faster than trying to compute the binomial coefficient directly (even if you did a mod 1009 to take care of the overflow issue).
Here is some code for binomials I had written a while back; all you need to do is modify it to do the operations modulo 1009 (there might be bugs, and it's not necessarily recommended coding style):
class Binomial
{
public:
    Binomial(int Max)
    {
        max = Max + 1;
        table = new unsigned int*[max]();
        for (int i = 0; i < max; i++)
        {
            table[i] = new unsigned int[max]();
            for (int j = 0; j < max; j++)
            {
                table[i][j] = 0;
            }
        }
    }
    ~Binomial()
    {
        for (int i = 0; i < max; i++)
        {
            delete[] table[i];
        }
        delete[] table;
    }
    unsigned int Choose(unsigned int n, unsigned int k);
private:
    bool Contains(unsigned int n, unsigned int k);
    int max;
    unsigned int **table;
};

unsigned int Binomial::Choose(unsigned int n, unsigned int k)
{
    if (n < k) return 0;
    if (k == 0 || n == 1) return 1;
    if (n == 2 && k == 1) return 2;
    if (n == 2 && k == 2) return 1;
    if (n == k) return 1;
    if (Contains(n, k))
    {
        return table[n][k];
    }
    table[n][k] = Choose(n - 1, k) + Choose(n - 1, k - 1);
    return table[n][k];
}

bool Binomial::Contains(unsigned int n, unsigned int k)
{
    if (table[n][k] == 0)
    {
        return false;
    }
    return true;
}
A binomial coefficient is one factorial divided by two others, although the k! term on the bottom cancels in an obvious way.
Observe that if 1009 (including multiples of it) appears more times in the numerator than the denominator, then the answer mod 1009 is 0. It can't appear more times in the denominator than the numerator (since binomial coefficients are integers), hence the only cases where you have to do anything are when it appears the same number of times in both. Don't forget to count multiples of 1009^2 as two, and so on.
After that, I think you're just mopping up small cases (meaning small numbers of values to multiply/divide), although I'm not sure without a few tests. On the plus side 1009 is prime, so arithmetic modulo 1009 takes place in a field, which means that after casting out multiples of 1009 from both top and bottom, you can do the rest of the multiplication and division mod 1009 in any order.
Where there are non-small cases left, they will still involve multiplying together long runs of consecutive integers. This can be simplified by knowing 1008! (mod 1009). It's -1 (1008 if you prefer), since 1 ... 1008 are the p-1 non-zero elements of the prime field over p. Therefore they consist of 1, -1, and then (p-3)/2 pairs of multiplicative inverses.
So for example consider the case of C((1009^3), 200).
Imagine that the number of 1009s are equal (don't know if they are, because I haven't coded a formula to find out), so that this is a case requiring work.
On the top we have 201 ... 1008, which we'll have to calculate or look up in a precomputed table, then 1009, then 1010 ... 2017, 2018, 2019 ... 3026, 3027, etc. The ... ranges are all -1, so we just need to know how many such ranges there are.
That leaves 1009, 2018, 3027, which once we've cancelled them with 1009's from the bottom will just be 1, 2, 3, ... 1008, 1010, ..., plus some multiples of 1009^2, which again we'll cancel and leave ourselves with consecutive integers to multiply.
We can do something very similar with the bottom to compute the product mod 1009 of "1 ... 1009^3 - 200 with all the powers of 1009 divided out". That leaves us with a division in a prime field. IIRC that's tricky in principle, but 1009 is a small enough number that we can manage 1000 of them (the upper limit on the number of test cases).
Of course with k=200, there's an enormous overlap which could be cancelled more directly. That's what I meant by small cases and non-small cases: I've treated it like a non-small case, when in fact we could get away with just "brute-forcing" this one, by calculating ((1009^3-199) * ... * 1009^3) / 200!
I don't think you want to calculate C(n,k) and then reduce mod 1009. The biggest one, C(1e15,5e14) will require something like 1e16 bits ~ 1000 terabytes
Moreover, executing the loop in snakile's answer 1e15 times seems like it might take a while.
What you might use is, if
n = n_0 + n_1*p + n_2*p^2 + ... + n_d*p^d
m = m_0 + m_1*p + m_2*p^2 + ... + m_d*p^d
(where 0 <= m_i, n_i < p)
then
C(n,m) = C(n_0,m_0) * C(n_1,m_1) * ... * C(n_d,m_d) mod p
see, eg http://www.cecm.sfu.ca/organics/papers/granville/paper/binomial/html/binomial.html
One way would be to use Pascal's triangle to build a table of all C(m,n) for 0 <= m <= n <= 1009.
Pseudo-code for calculating nCk:
result = 1
for i = 1 to min{K, N-K}:
    result *= N - i + 1
    result /= i
return result
Time Complexity: O(min{K,N-K})
The loop goes from i=1 to min{K, N-K} instead of from i=1 to K, and that's OK because
C(n, k) = C(n, n-k)
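A runnable C++ version of the same idea (my translation; the interleaved multiply and divide keeps every intermediate value an exact integer, because C(N, i-1) * (N - i + 1) equals C(N, i) * i, but note that the result itself overflows 64 bits for large N and K):

#include <cstdint>

std::uint64_t choose(std::uint64_t N, std::uint64_t K) {
    if (K > N - K) K = N - K;   // C(N, K) == C(N, N - K)
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= K; ++i) {
        result *= N - i + 1;    // multiply first ...
        result /= i;            // ... then divide; stays exact
    }
    return result;
}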
And you can calculate the thing even more efficiently if you use the GammaLn function.
nCk = exp(GammaLn(n+1)-GammaLn(k+1)-GammaLn(n-k+1))
The GammaLn function is the natural logarithm of the Gamma function. I know there's an efficient algorithm to calculate the GammaLn function but that algorithm isn't trivial at all.
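For example, with the standard library's std::lgamma (everything here is floating point, so the result is an approximation that needs rounding and is only trustworthy while it fits comfortably in a double's 53-bit mantissa):

#include <cmath>

double chooseLn(double n, double k) {
    return std::exp(std::lgamma(n + 1) - std::lgamma(k + 1) - std::lgamma(n - k + 1));
}
// usage: long long c = std::llround(chooseLn(50, 25));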
The following code shows how to obtain all the binomial coefficients for a given size 'n'. You could easily modify it to stop at a given k in order to determine nCk. It is computationally very efficient, it's simple to code, and works for very large n and k.
binomial_coefficient = 1
output(binomial_coefficient)
col = 0
n = 5
do while col < n
    binomial_coefficient = binomial_coefficient * (n + 1 - (col + 1)) / (col + 1)
    output(binomial_coefficient)
    col = col + 1
loop
The output of binomial coefficients is therefore:
1
1 * (5 + 1 - (0 + 1)) / (0 + 1) = 5
5 * (5 + 1 - (1 + 1)) / (1 + 1) = 10
10 * (5 + 1 - (2 + 1)) / (2 + 1) = 10
10 * (5 + 1 - (3 + 1)) / (3 + 1) = 5
5 * (5 + 1 - (4 + 1)) / (4 + 1) = 1
I had found the formula once upon a time on Wikipedia but for some reason it's no longer there :(