Find rank of a number on basis of number of 1's - c++

Let f(k) = y, where k is the y-th number in the increasing sequence of non-negative integers that have the same number of ones in their binary representation as k, e.g. f(0) = 1, f(1) = 1, f(2) = 2, f(3) = 1, f(4) = 3, f(5) = 2, f(6) = 3 and so on. Given k >= 0, compute f(k).
Many of us have seen this question.
One solution is to categorise numbers by their number of 1s and then find the rank within the category. I did find some patterns going this way, but it would be a lengthy process. Can anyone suggest a better solution?

This is a counting problem. I think that if you approach it with this in mind, you can do much better than literally enumerating values and checking how many bits they have.
Consider the number 17. The binary representation is 10001. The number of 1s is 2. We can get smaller numbers with two 1s by (in this case) re-distributing the 1s to any of the four low-order bits. 4 choose 2 is 6, so 17 should be the 7th number with 2 ones in the binary representation. We can check this...
0 00000 -
1 00001 -
2 00010 -
3 00011 1
4 00100 -
5 00101 2
6 00110 3
7 00111 -
8 01000 -
9 01001 4
10 01010 5
11 01011 -
12 01100 6
13 01101 -
14 01110 -
15 01111 -
16 10000 -
17 10001 7
And we were right. Generalize that idea and you should get an efficient function for which you simply compute the rank of k.
EDIT: Hint for generalization
17 is special in that if you don't consider the high-order bit, the number has rank 1; that is, f(z) = 1 where z is everything except the high-order bit. For numbers where this is not the case, how can you account for the fact that you can get smaller numbers without moving the high-order bit?

f(k) is the number of integers less than or equal to k that have the same number of ones in their binary representation as k.
Suppose k needs m bits, that is k = 2^(m-1) + a, where a < 2^(m-1). The number of integers less than 2^(m-1) that have the same number of one bits as k is choose(m-1, bitcount(k)), since you can freely redistribute the ones among the m-1 least significant bits.
Integers from 2^(m-1) up to k have the same most significant bit as k (which is 1), so they are counted by their remaining bits: there are f(k - 2^(m-1)) of them. This implies f(k) = choose(m-1, bitcount(k)) + f(k - 2^(m-1)), with f(0) = 1 as the base case.
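The recurrence above translates almost line for line into code. Here is a minimal sketch in Python 3 (the question asks for C++, so treat it as something to port); the name `rank` is my own choice, and `math.comb` assumes Python 3.8 or later. The recursion is unrolled into a loop that strips one high-order bit per iteration.

```python
from math import comb

def rank(k):
    """1-based rank of k among non-negative integers with the same number of one bits."""
    result = 1                       # base case f(0) = 1
    ones = bin(k).count("1")         # one bits still to be placed
    while k:
        m = k.bit_length()           # k needs m bits
        # Numbers below 2^(m-1) that have `ones` one bits.
        result += comb(m - 1, ones)
        k -= 1 << (m - 1)            # drop the top bit, continue with the rest
        ones -= 1
    return result

print([rank(k) for k in range(7)])   # [1, 1, 2, 1, 3, 2, 3]
print(rank(17))                      # 7
```

The loop runs once per one bit in k, so the cost is dominated by the binomial coefficients.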

See "Efficiently Enumerating the Subsets of a Set". Look at Table 3, the "Bankers sequence". This is a method to generate exactly the sequence you need (if you reverse the bit order). Just run K iterations for the word with K bits. There is code to generate it included in the paper.

Related

Optimal way to compress 60 bit string

Given 15 random hexadecimal digits (60 bits) where there is always at least 1 duplicate in every 20-bit run (5 hexadecimal digits).
What is the optimal way to compress the bytes?
Here are some examples:
01230 45647 789AA
D8D9F 8AAAF 21052
20D22 8CC56 AA53A
AECAB 3BB95 E1E6D
9993F C9F29 B3130
Initially I tried to use Huffman encoding on just 20 bits, because Huffman coding can go from 20 bits down to ~10 bits, but storing the table takes more than 9 bits.
Here is the breakdown showing 20 bits -> 10 bits for 01230
Character  Frequency  Assignment  Space Savings
0          2          0           2×4 - 2×1 = 6 bits
2          1          10          1×4 - 1×2 = 2 bits
1          1          110         1×4 - 1×3 = 1 bit
3          1          111         1×4 - 1×3 = 1 bit
I then tried to do huffman encoding on all 300 bits (five 60bit runs) and here is the mapping given the above example:
Character  Frequency  Assignment  Space Savings
---------------------------------------------------------
a          10         101         10×4 - 10×3 = 10 bits
9          8          000         8×4 - 8×3 = 8 bits
2          7          1111        7×4 - 7×4 = 0 bits
3          6          1101        6×4 - 6×4 = 0 bits
0          5          1100        5×4 - 5×4 = 0 bits
5          5          1001        5×4 - 5×4 = 0 bits
1          4          0010        4×4 - 4×4 = 0 bits
8          4          0111        4×4 - 4×4 = 0 bits
d          4          0101        4×4 - 4×4 = 0 bits
f          4          0110        4×4 - 4×4 = 0 bits
c          4          1000        4×4 - 4×4 = 0 bits
b          4          0011        4×4 - 4×4 = 0 bits
6          3          11100       3×4 - 3×5 = -3 bits
e          3          11101       3×4 - 3×5 = -3 bits
4          2          01000       2×4 - 2×5 = -2 bits
7          2          01001       2×4 - 2×5 = -2 bits
This yields a savings of 8 bits overall, but 8 bits isn't enough to store the Huffman table. It seems that, because of the randomness of the data, the more bits you try to encode with Huffman, the less effective it is. Huffman encoding seemed to work best with 20 bits (50% reduction), but storing the table in 9 bits or fewer isn't possible AFAIK.
In the worst case a 60-bit string still has at least 3 duplicates (one per 20-bit run), and in the average case there are more than 3 (my assumption). With at least 3 duplicates, a run of 60 bits can contain at most 12 distinct symbols.
Because of the duplicates, plus the fewer than 16 symbols, I can't help but feel that there is some type of compression that can be used.
If I simply count the number of 20-bit values with at least two hexadecimal digits equal, there are 524,416 of them. A smidge more than 2^19. So the most you could possibly save is a little less than one bit out of the 20.
Hardly seems worth it.
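The count quoted above can be checked by subtracting the 5-digit hex strings with all digits distinct from the total. A quick sketch in Python 3 (the variable names are mine; `math.perm` assumes Python 3.8 or later):

```python
from math import perm

total = 16 ** 5                    # all 5-digit hexadecimal strings (20 bits)
all_distinct = perm(16, 5)         # 16*15*14*13*12 strings with no repeated digit
with_duplicate = total - all_distinct

print(with_duplicate)              # 524416
print(2 ** 19)                     # 524288
```

Since 524416 is slightly more than 2^19, an ideal code for one 20-bit run still needs a little over 19 bits.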
If I split your question into two parts:
How do I compress (perfectly) random data? You can't. Every bit is new entropy which can't be "guessed" by a compression algorithm.
How do I compress "one duplicate in five characters"? There are exactly 10 positions where the duplicated pair can be (see table below). This is basically the entropy. Just store which option it is (maybe grouped for the whole line).
These are the options:
AAbcd = 1 AbAcd = 2 AbcAd = 3 AbcdA = 4 (<-- cases where first character is duplicated somewhere)
aBBcd = 5 aBcBd = 6 aBcdB = 7 (<-- cases where second character is duplicated somewhere)
abCCd = 8 abCdC = 9 (<-- cases where third character is duplicated somewhere)
abcDD = 0 (<-- cases where last characters are duplicated)
So for your first example:
01230 45647 789AA
The first one (01230) is option 4, the second 3 and the third option 0.
You can compress this by multiplying each consecutive by 10: (4*10 + 3)*10 + 0 = 430
And uncompress it by using divide and modulo: 430%10=0, (430/10)%10=3, (430/10/10)%10=4. So you could store your number like that:
1AE 0123 4567 789A
^^^ this is 430 in hex and requires only 10 bit
The three options combined give 1000 possible values (0 to 999), so 10 bits are enough.
Compared to storing these 3 characters normally (12 bits), you save 2 bits. As someone else already commented, this is probably not worth it. For the whole line it's even less: 2 bits / 60 bits = 3.3% saved.
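The multiply-by-10 packing and the divide/modulo unpacking described above can be sketched as follows in Python 3. The function names `pack` and `unpack` are hypothetical, and the sketch only handles the option digits, not the removal or re-insertion of the duplicated character.

```python
def pack(options):
    """Combine per-group option digits (each 0-9) into one base-10 number."""
    value = 0
    for opt in options:
        value = value * 10 + opt
    return value

def unpack(value, count):
    """Recover `count` option digits, first group first."""
    digits = []
    for _ in range(count):
        digits.append(value % 10)   # last group comes out first
        value //= 10
    return digits[::-1]

packed = pack([4, 3, 0])
print(packed, format(packed, "X"))  # 430 1AE
print(unpack(packed, 3))            # [4, 3, 0]
```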
If you want to get rid of the duplicates first, do this, then look at the links at the bottom of the page. If you don't want to get rid of the duplicates, then still look at the links at the bottom of the page:
// Returns true if the array holds a value strictly equal to v.
Array.prototype.contains = function(v) {
    for (var i = 0; i < this.length; i++) {
        if (this[i] === v) return true;
    }
    return false;
};
// Returns a new array with duplicates removed, keeping first occurrences.
Array.prototype.unique = function() {
    var arr = [];
    for (var i = 0; i < this.length; i++) {
        if (!arr.contains(this[i])) {
            arr.push(this[i]);
        }
    }
    return arr;
};
var duplicates = [1, 3, 4, 2, 1, 2, 3, 8];
var uniques = duplicates.unique(); // result = [1,3,4,2,8]
console.log(uniques);
Then you would have shortened your code that you have to deal with. Then you might want to check out Smaz
Smaz is a simple compression library suitable for compressing strings.
If that doesn't work, then you could take a look at this:
http://ed-von-schleck.github.io/shoco/
Shoco is a C library to compress and decompress short strings. It is very fast and easy to use. The default compression model is optimized for english words, but you can generate your own compression model based on your specific input data.
Let me know if it works!

Generating variations in C++ - all r-digit numbers among n given digits?

I have a program in which I have to generate all r-digit numbers (if r is 2 - all 2-digit numbers) among n digits (these ought to be all numbers from 1 to n inclusive). My question is how can I do this recursively or iteratively, for example if n = 3 and r = 2, the result should be 12 13 21 23 31 32.
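For reference, the requested output is the set of r-permutations of the digits 1..n. A minimal sketch in Python 3 rather than C++, assuming n <= 9 so that each element is a single digit; the function name `variations` is my own.

```python
from itertools import permutations

def variations(n, r):
    """All r-digit numbers built from distinct digits 1..n, in increasing order."""
    digits = range(1, n + 1)
    return [int("".join(map(str, p))) for p in permutations(digits, r)]

print(*variations(3, 2))   # 12 13 21 23 31 32
```

`itertools.permutations` emits tuples in lexicographic order of the input, which is why the numbers come out sorted here.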

Counting ways of breaking up a string of digits into numbers under 26

Given a string of digits, I wish to find the number of ways of breaking up the string into individual numbers so that each number is under 26.
For example, "8888888" can only be broken up as "8 8 8 8 8 8 8". Whereas "1234567" can be broken up as "1 2 3 4 5 6 7", "12 3 4 5 6 7" and "1 23 4 5 6 7".
I'd like both a recurrence relation for the solution, and some code that uses dynamic programming.
This is what I've got so far. It only covers the base cases: an empty string should return 1, a string of one digit should return 1, and a string in which every digit is larger than 2 should return 1.
int countPerms(vector<int> number, int currentPermCount)
{
    vector< vector<int> > permsOfNumber;
    vector<int> working;
    int totalPerms = 0, size = number.size();
    bool areAllOverTwo = true, forLoop = true;
    if (number.size() <= 1)
    {
        // TODO: print out permutations
        return 1;
    }
    // Minus one here because we don't care what the last digit is: if all the digits before it are over 2, there is only one way to decode them.
    for (int i = 0; i < number.size() - 1; i++)
    {
        if (number.at(i) <= 2)
        {
            areAllOverTwo = false;
        }
    }
    if (areAllOverTwo) // If all the digits are over 2 there is only one possible combination, e.g. 3456676546 has only one.
    {
        permsOfNumber.push_back(number);
        // TODO: write function to print out the permutations
        return 1;
    }
    do
    {
        // TODO: find all the permutations here
    } while (forLoop);
    return totalPerms;
}
Assuming you either don't have zeros or you disallow numbers with leading zeros, the recurrence relations are:
N(1aS) = N(S) + N(aS)
N(2aS) = N(S) + N(aS) if a < 6.
N(a) = 1
N(aS) = N(S) otherwise
Here, a refers to a single digit, and S to a number. The first line of the recurrence relation says that if your string starts with a 1, then you can either have it on its own, or join it with the next digit. The second line says that if you start with a 2 you can either have it on its own, or join it with the next digit assuming that gives a number less than 26. The third line is the termination condition: when you're down to 1 digit, the result is 1. The final line says if you haven't been able to match one of the previous rules, then the first digit can't be joined to the second, so it must stand on its own.
The recurrence relations can be implemented fairly directly as an iterative dynamic programming solution. Here's code in Python 3, but it's easy to translate into other languages.
def N(S):
    # Scan right to left; a1 = N(S[i+1:]) and a2 = N(S[i+2:]).
    a1, a2 = 1, 1
    for i in range(len(S) - 2, -1, -1):
        if S[i] == '1' or (S[i] == '2' and S[i+1] < '6'):
            a1, a2 = a1 + a2, a1
        else:
            a1, a2 = a1, a1
    return a1
print(N('88888888'))
print(N('12345678'))
Output:
1
3
An interesting observation is that N('1' * n) is the (n+1)st Fibonacci number:
for i in range(1, 10):
    print(i, N('1' * i))
Output:
1 1
2 2
3 3
4 5
5 8
6 13
7 21
8 34
9 55
If I understand correctly, there are only 25 possibilities. My first crack at this would be to initialize an array of 25 ints all to zero and when I find a number less than 25, set that index to 1. Then I would count up all the 1's in the array when I was finished looking at the string.
What do you mean by recurrence? If you're looking for a recursive function, you would need to find a good way to break the string of numbers down recursively. I'm not sure that's the best approach here. I would just go through digit by digit and as you said if the digit is 2 or less, then store it and test appending the next digit... i.e. 10*digit + next. I hope that helped! Good luck.
Another way to think about it is that, after the initial single digit possibility, for every sequence of contiguous possible pairs of digits (e.g., 111 or 12223) of length n we multiply the result by:
1 + sum, i=1 to floor (n/2), of (n-i) choose i
For example, with a sequence of 11111, we can have
i=1, 1 1 1 11 => 5 - 1 = 4 choose 1 (possibilities with one pair)
i=2, 1 11 11 => 5 - 2 = 3 choose 2 (possibilities with two pairs)
This seems directly related to Wikipedia's description of Fibonacci numbers' "Use in Mathematics," for example, in counting "the number of compositions of 1s and 2s that sum to a given total n" (http://en.wikipedia.org/wiki/Fibonacci_number).
Using the combinatorial method (or other fast Fibonacci's) could be suitable for strings with very long sequences.
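The per-run factor 1 + sum of (n-i) choose i can be checked against the Fibonacci numbers directly. A small sketch in Python 3; `run_factor` and `fib` are names I made up for illustration, with fib(1) = fib(2) = 1, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

def run_factor(n):
    """1 + sum over i = 1..floor(n/2) of C(n-i, i), for a run of n pairable digits."""
    return 1 + sum(comb(n - i, i) for i in range(1, n // 2 + 1))

def fib(n):
    """n-th Fibonacci number, indexed so that fib(1) == fib(2) == 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(run_factor(5))   # 8, i.e. 1 + C(4,1) + C(3,2)
print(all(run_factor(n) == fib(n + 1) for n in range(1, 30)))   # True
```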

Max number ways to jump to the last element

I had a question from a contest and would like to know the solution.
The question is about finding the number of unique ways to jump to the last element. I am thinking about a solution with dynamic programming but couldn't figure it out.
You can jump at most 3 steps from any position. The number of steps is given as n, and the program should calculate the number of distinct ways to reach position n+1.
For example:
n=4, the number of ways to reach position n+1 should be 7
Jump1: 1 2 1
Jump2: 1 1 2
Jump3: 2 1 1
Jump4: 1 3
Jump5: 3 1
Jump6: 2 2
Jump7: 1 1 1 1
Thank you
The longest journey, says the proverb, starts with a single step.
In this case, there are three possible first steps in the journey to the end: a hop of 1, 2 or 3 spots. In each case, the journey will continue from a closer point, either 1, 2 or 3 steps closer to the end. So if we know the number of possible paths from the closer points, we can simply add them up:
paths(n) = paths(n-1) // First hop was one, n-1 elements left
+ paths(n-2) // First hop was two, n-2 elements left
+ paths(n-3) // First hop was three, n-3 elements left.
The similarity to the Fibonacci recursion is not coincidental. This sequence is often called the "Tribonacci sequence", and you can easily look that up in the usual places (mathworld, wikipedia, oeis, etc.) to find a variety of computation techniques, including the one below.
Clearly, you can compute the Tribonacci function in O(n) by starting at the end and working backwards (defining f(0) = 1, f(-1) = 0, f(-2) = 0 to provide a starting position.) But it's easy to do better than that, using the same technique that can be used to compute Fibonacci numbers in O(log n) operations.
Here's the Fibonacci algorithm. We start with the observation that the matrix product:
            | 1 1 |
[ a b ]  ×  |     |  =  [ a+b  a ]
            | 1 0 |
Let's use F(n) for the nth Fibonacci number (indexed so that F(0) = 1 and F(-1) = 0, like the starting position above), and call the matrix of 1s and 0s above MF. We can see that
[ F(n) F(n-1) ] = [ 1 0 ] × MF × MF × … × MF    (n factors of MF)
But since matrix multiplication is associative, we can rewrite that as:
[ F(n) F(n-1) ] = [ 1 0 ] × MF^n
Again, since matrix multiplication is associative, we can compute MF^n in O(log n) steps. For example, we could use the recursion:
M^n = M^(n/2) × M^(n/2)                 if n is even
M^n = M × M^((n-1)/2) × M^((n-1)/2)     if n is odd
Similarly, for the Tribonacci numbers T(n), we can define the matrix MT:
     | 1 1 0 |
MT = | 1 0 1 |
     | 1 0 0 |
and by the same logic as above:
[ T(n) T(n-1) T(n-2) ] = [ 1 0 0 ] × MT^n
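A sketch of the matrix approach in Python 3, using the T(0) = 1, T(-1) = 0, T(-2) = 0 starting position from above. The helper names `mat_mul`, `mat_pow` and `paths` are mine, and the power is computed by iterative square-and-multiply instead of the recursion shown earlier, which gives the same O(log n) number of matrix products.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, e):
    """m raised to the non-negative integer power e by repeated squaring."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    while e:
        if e & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        e >>= 1
    return result

MT = [[1, 1, 0],
      [1, 0, 1],
      [1, 0, 0]]

def paths(n):
    # [T(n) T(n-1) T(n-2)] = [1 0 0] x MT^n, so T(n) is the top-left entry of MT^n.
    return mat_pow(MT, n)[0][0]

print([paths(n) for n in range(6)])   # [1, 1, 2, 4, 7, 13]
```

Python integers are arbitrary precision, so this also works for large n without overflow.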
Do you know number of ways for n = 0, n = 1 and n = 2?
For any larger value N, number of ways = number of ways for N - 1 + number of ways for N - 2 + number of ways for N - 3
You should not calculate the number of ways for given n more than 1 time. (Remember it in a dp array)
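The dp-array idea can be sketched like this in Python 3. The base values (1 way for n = 0, 1 for n = 1, 2 for n = 2) are my own answer to the question posed above, and `count_ways` is a hypothetical name.

```python
def count_ways(n):
    """Number of ways to cover n steps with hops of 1, 2 or 3."""
    ways = [0] * (max(n, 2) + 1)
    ways[0], ways[1], ways[2] = 1, 1, 2      # assumed base cases
    for i in range(3, n + 1):
        # Each value is computed once and reused by the next three entries.
        ways[i] = ways[i - 1] + ways[i - 2] + ways[i - 3]
    return ways[n]

print(count_ways(4))   # 7
```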
The important function is going to be (number_of_elements)!/product((number_repeated_characters)!)
For instance, if you know 2211 is one of your paths, then 4!/(2!*2!) = 6, so there are 6 path combinations for 2 "2"s and 2 "1"s.
Since you're only going up to a maximum of 3 steps, it's really not too bad once you know that formula. Really you're just looking for the combinations of 2s and 3s that can replace the 1s in your input. I suggest starting with 1 3 and then going through each 2 that fills in the remainder. Then repeat for 2 3s and so on. If you precompute and save all the factorials, it should run very fast, although I'm sure there are additional optimizations.
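One way to read the factorial approach is: for every choice of how many 3s and 2s are used, fill the rest with 1s and count the arrangements with the formula above. A sketch in Python 3 under that reading; the loop order and the name `count_paths` are my own.

```python
from math import factorial

def count_paths(n):
    """Sum of arrangements over every mix of 1-, 2- and 3-step hops that totals n."""
    total = 0
    for threes in range(n // 3 + 1):
        for twos in range((n - 3 * threes) // 2 + 1):
            ones = n - 3 * threes - 2 * twos
            hops = ones + twos + threes
            # (number_of_elements)! / product((number_repeated)!)
            total += factorial(hops) // (
                factorial(ones) * factorial(twos) * factorial(threes))
    return total

print(count_paths(4))   # 7
```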

0 or 1 combinations such that we do not have two 1's immediately in sequence

I need code to find the number of combinations of the two digits 0 and 1 for an X-digit string, where X may vary from 1 to 1000, such that two 1s never appear immediately in sequence (consecutive 0s are allowed).
Say for an input of 4 digits we have
1010 1000 0000 0101 0001 0010 0100 1001
I am not sure which algorithm to use to generate such combinations of 0s and 1s.
The answer is given by the Fibonacci sequence.
f(n) = f(n-1) + f(n-2)
Here are the first few results:
length  number of combinations
1       2   (0, 1)
2       3   (00, 01, 10)
3       5   (000, 001, 010, 100, 101)
4       8   (0000, 0001, 0010, 0100, 0101, 1000, 1001, 1010)
You can see why there is a relationship to the Fibonacci sequence if you consider strings starting with "0" or "10" separately:
number of sequences of n digits
  = number of sequences starting with 0, followed by n-1 more digits
  + number of sequences starting with 10, followed by n-2 more digits
Sequences starting with "11" are disallowed.
The Fibonacci numbers can be calculated very quickly if an appropriate technique is used, but you should be aware that the answer grows very quickly as the length increases. If you want an exact answer you will need a library that can work with arbitrarily large integers.
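A minimal sketch of the count in Python 3, whose built-in integers are already arbitrary precision, so no extra library is needed there. The function name `count_strings` is mine, and the count for length 0 is taken to be 1 (the empty string).

```python
def count_strings(n):
    """Number of length-n strings over {0, 1} with no two adjacent 1s."""
    a, b = 1, 2            # counts for length 0 and length 1
    for _ in range(n):
        a, b = b, a + b    # f(n) = f(n-1) + f(n-2)
    return a

print([count_strings(n) for n in range(1, 5)])   # [2, 3, 5, 8]
```

For the upper limit of 1000 digits, `count_strings(1000)` returns the exact value as a single Python integer.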
One idea is to build the complete string by using the words 10 and 0 (and 1, but only at the very end).
build(sofar, maxlen):
    if len(sofar) > maxlen: return
    if len(sofar) == maxlen: found(sofar); return
    if len(sofar) == maxlen - 1: build(sofar + "1", maxlen)
    build(sofar + "10", maxlen)
    build(sofar + "0", maxlen)
The proof that this algorithm only generates valid sequences is left to you. Same with the proof that this algorithm generates all valid sequences.
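The pseudocode above can be made runnable almost verbatim. Here is a sketch in Python 3 that swaps the `found` callback for a generator (that substitution is my own choice):

```python
def build(sofar, maxlen):
    """Yield every length-maxlen string of 0s and 1s with no two adjacent 1s."""
    if len(sofar) > maxlen:
        return
    if len(sofar) == maxlen:
        yield sofar
        return
    if len(sofar) == maxlen - 1:
        yield from build(sofar + "1", maxlen)   # a lone 1 is only allowed at the very end
    yield from build(sofar + "10", maxlen)
    yield from build(sofar + "0", maxlen)

print(sorted(build("", 4)))
# ['0000', '0001', '0010', '0100', '0101', '1000', '1001', '1010']
```

Enumerating is only practical for small lengths, since the number of results grows like the Fibonacci numbers.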
How about having a function that generates these values into arrays, and another function that just checks if the current index to a value in the array is a '1' and checks if the next value is a '1' or not? If true, then discard; else, valid.