Bit count in the following case - c++

I got the following question in one of my interviews. Please give me some ideas on how to solve it, as I'm completely unsure how to proceed.
A non-empty array A of N elements contains the octal representation of a non-negative integer K, i.e. each element of A belongs to the interval [0; 7].
Write a function:
int bitcount_in_big_octal(const vector<int> &A);
that returns the number of bits set to 1 in the binary representation of K. The function should return -1 if the number of bits set to 1 exceeds 10,000,000.
Assume that the array can be very large.
Assume that N is an integer within the range [1..100,000].

is there any time restriction?
I have one idea: at first, make the following dictionary: {0->0, 1->1, 2->1, 3->2, 4->1, 5->2, 6->2, 7->3} (note that 5 is 101 in binary, so it maps to 2). Then loop over the array A, summing the 1s in every element using the dictionary.

Iterate over your representation. For each element, convert it to its number of set bits; @meteorgan's answer is a great way to do just that. If you need the representation for something other than bit counts, you'll probably want to convert it to some intermediate form useful for whatever else you'll be doing, e.g. to byte[]: each octet in the representation corresponds to a single byte, and since all you're doing is counting bits it doesn't matter that Java's byte is signed. Then, for each byte in the array, count the bits: use an existing bit-counting library, cast the byte to an int and use Integer.bitCount(...), or roll your own.
Add each result to a running total, and escape the iteration if you hit your threshold.
That's a Java answer in the details (like the library I linked), but the algorithm steps are fine for C++; find a replacement library, or use the dictionary answer.

Here's a solution in Python using the indexed (dictionary-based) approach; it also handles the -1 threshold from the problem statement.
INDEX = [0, 1, 1, 2, 1, 2, 2, 3]

def bitcount_in_big_octal(A):
    counter = 0
    for octal in A:
        counter += INDEX[octal]
        if counter > 10000000:  # threshold from the problem statement
            return -1
    return counter
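If you need it against the original C++ signature, here is a minimal sketch of the same table-lookup idea (the constant 10,000,000 comes from the problem statement):
#include <vector>
using std::vector;

int bitcount_in_big_octal(const vector<int> &A) {
    static const int BITS[8] = {0, 1, 1, 2, 1, 2, 2, 3};  // set bits in 0..7
    int count = 0;
    for (int octal : A) {
        count += BITS[octal];
        if (count > 10000000)
            return -1;  // too many set bits, per the problem statement
    }
    return count;
}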

Related

Minimum storage space for an ordered list

I am seeking advice on achieving the most efficient storage of an ordered list, that is, the minimum storage for such a list.
An ordered list of 256 unique items, where each item is a unique number from 0 to 255, would standardly require 2^16 bits of data for storage: 2^8 places, each place holding a 2^8 value.
However, this information ought to be storable in nearly 2^15 bits.
The second item, rather than being in the 2nd place of 256, can be viewed as being the next of the remaining 255, the next item the next of the remaining 254, etc.
This is a continuation of not needing to store the detail of the last item in a sorted list, because that item must be in the last place by default.
In this case you can see that you can have 2^8 - 1 places, each place holding a 2^8 value, which is less than 2^16.
So how does this get down to 2^15 + 1 bits of storage? Or is there a proof that says otherwise? If there is a proof, I would hope it doesn't say 2^16 bits of storage are needed, as I have just shown that that is wrong!
I am hopefully just unaware of the terminology to identify work on this subject.
Can anyone advise of work on the matter?
Thank you for your time.
Glenn
Upon clarification of the question as storage of some particular permutation of 256 items (being the 8-bit numbers from 0 to 255 in particular), I have updated my answer. The prior discussion is below for posterity.
Answer
1684 bits.
Explanation
In this case, the clearest analysis comes through encoding and informational entropy. Again, we use the pigeonhole principle: in order to uniquely determine a particular permutation, we must have at least as many encodings as we have possible messages to be encoded.
An example may be helpful: consider a list of 256 numbers, each of which is an 8-bit number. The first item has 256 possible values, as does the second, as does the third, and so on. Overall, we have 256^256 possible messages, so we need at least 256^256 possible encodings. To determine the number of bits needed, we can simply take the base-2 logarithm: log2(256^256) = 256 * log2(256) = 256 * log2(2^8) = 256 * 8 = 2^11, so to encode this list we only need 2^11, or 2048, bits. You may note this is the same as taking 8 bits per item and multiplying by the number of items. Your original question was incorrect about the storage needed: you supposed each item requires 2^8 bits, i.e. a 256-bit integer, which could store values from 0 to ~10^77.
With this understanding, we turn our attention to the problem at hand. There are 256 possibilities for the first item, then 255 possibilities for the second item, 254 possibilities for the third item, etc, until there is only 1 possibility for the last item. Overall, we have 256! possibilities, so we need at least 256! encodings. Again, we use the base 2 logarithm to determine how many bits we need, so we need log2(256!) bits. A nice property of logarithms is that they turn products into sums, so log2(256!) = log2(256) + log2(255) + log2(254) + ... + log2(2) + log2(1). This is analogous to using 8 bits for each of the 256 items, but here as each item has progressively less information, it requires fewer bits. Also note that log2(1) is 0, which corresponds to your observation that you don't need any information to encode the last item. Overall, when we perform this sum, we end up with 1683.996..., so we need 1684 bits to encode these ordered lists. Some variable-length encodings could get lower, but they can never go lower than log2(256!) bits on average.
Coming up with an encoding that uses 1684 bits is not simple, but here's one method that's more efficient than the original full storage. We can note that the first 128 items each have between 129 and 256 possibilities, and encode each of these items with 8 bits. The next 64 items each have between 65 and 128 possibilities, so we can encode each of these items with 7 bits. Continuing on, we end up using
(128 * 8) + (64 * 7) + (32 * 6) + (16 * 5) + (8 * 4) + (4 * 3) + (2 * 2) + (1 * 1) + (1 * 0) = 1793 bits to store the list.
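As a quick sanity check on the 1684 figure, here is a throwaway C++ snippet that sums log2(k) for k = 1..256:
#include <cmath>
#include <cstdio>

int main() {
    double bits = 0.0;
    for (int k = 1; k <= 256; ++k)
        bits += std::log2(k);  // log2(256!) computed as a sum of logs
    std::printf("%.3f -> %d bits\n", bits, (int)std::ceil(bits));
    // prints approximately: 1683.996 -> 1684 bits
}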
Pre-clarification discussion
If all you're ever interested in encoding is an ordered list of 256 unique items, where each item is an 8-bit integer, then you can do it in 1 bit: 0 means you have that list, 1 means you don't, because there's only one possible list satisfying those criteria.
If you're trying to store anything in memory, you need at least as many configurations of that memory as there are different options (otherwise by the pigeonhole principle there would be at least two options you couldn't differentiate). Assuming by "ordered" you mean that they are strictly increasing or decreasing, an n-element ordered list of 8-bit integers, without repetition, has 256 choose n possible options (as there's only one possible configuration, the ordered one). Summing 256 choose n over all possible values of n (i.e. 0 to 256), gives 2^256, or 2^(2^8). Therefore, a perfect encoding scheme could use as few as 2^8 bits to store this particular kind of list, but couldn't encode any other kind of list.
EDIT: If you want to read more about this sort of stuff, read up on information theory.
EDIT: An easier way to think about this encoding is like this: We know the list is ordered, so if we know what items are in it then we know what order they're in, so we only need to know which items are in the list. There's 256 possible items (0 through 255), and if we assume the items in the list are unique then each item is either in the list, or it isn't. For each item, we use 1 bit to store whether or not it's in the list (so bit 0 records if the list contains 0, bit 1 records if the list contains 1, etc, bit 255 records if the list contains 255). Tada, we've stored all of the information about this 256 element array of bytes in only 256 = 2^8 bits.
EDIT: Let's examine an analogous situation: an ordered, unique list of up to 4 items, each of which is 2 bits. We can write out all of the possible circumstances: [], [0], [1], [2], [3], [0,1], [0,2], [0,3], [1,2], [1,3], [2,3], [0,1,2], [0,1,3], [0,2,3], [1,2,3], [0,1,2,3]. These are the only possible lists where each element is two bits, the elements are unique, and the elements are in ascending order. Notice that I can't actually make it more efficient just by leaving the 3 off of [0,1,2,3], because I also need to distinguish it from [0,1,2]. The thing is, asking how much space you need to "store" something in isolation from context is almost unanswerable. If all you want is to store enough information to recover it (i.e. you want lossless compression), and if you presume you know the properties, you can get your compression ratio pretty much as low as you want. For example, if you gave me an ordered list containing every element from 0 to 1,000,000 exactly once, even though storing that list directly in memory requires ~2^25 bits, you can recover the list from the known properties and the two numbers 0 and 1,000,000, for a total of 40 bits.
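To make that membership encoding concrete, here is a small sketch (my own helper names; std::bitset<256>, one bit per possible byte value):
#include <bitset>
#include <iostream>
#include <vector>

// Encode an ordered, duplicate-free list of values 0..255 as a 256-bit membership bitmap.
std::bitset<256> encode(const std::vector<int> &orderedUnique) {
    std::bitset<256> bits;
    for (int v : orderedUnique)
        bits.set(v);  // bit v records whether the list contains v
    return bits;
}

// Decode by scanning the bits in ascending order; the ordering comes back for free.
std::vector<int> decode(const std::bitset<256> &bits) {
    std::vector<int> out;
    for (int v = 0; v < 256; ++v)
        if (bits.test(v))
            out.push_back(v);
    return out;
}

int main() {
    for (int v : decode(encode({3, 17, 200})))
        std::cout << v << " ";  // prints: 3 17 200
}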

Compressing a string of 1's and 0s containing the same number of 1's as 0's

I have a string of 1's and 0's in which the number of 1's and 0's is the same. I would like to compress this into a number that is smaller in terms of the number of bits needed to store it. Also, converting between the compressed form and non compressed form needs to not require a lot of work.
For example, ordering all possible strings and numbering them off and letting this number be the compressed data would be too much work.
An easy solution would be to allow the compressed data to be just the first n-1 characters of the string where the string is of length n. Converting between the compressed and decompressed data would be easy but this offers little compression, only one bit per string.
I would like an algorithm that would compress a string with this property (same number of ones and zeros) that can be generalized to a string with any even length. I would also like it to compress more than the method described above.
Thanks for the help.
This is a combination problem, N items taken k at a time.
In your comment you give an example of length 10, taken 5 at a time, which means there are only C(10,5) = 252 unique patterns. That fits into an 8-bit value instead of a 10-bit value. SEE: WIKI: Combinations
For expanding the index value from 0-251 back into a pattern, there are examples here:
SEE: Algorithm to return all combinations of k elements from n
While extracting, you can use each extracted value to set the corresponding bit position in the reconstructed value, which is O(1) per expansion. If the list is not in the millions, you could pre-compute a lookup table, which is much faster for translating the index value to the decoded value. IE: build a list of all possible patterns and look up the translation.
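Here's a sketch of that combination indexing in C++ via the combinatorial number system; rank() maps a bitstring with k ones to its index (0-251 for length 10 with five ones) and unrank() expands it back. The function names and layout are my own, not from any particular library:
#include <cstdint>
#include <cstdio>
#include <string>

// C(n, k) built incrementally; each intermediate product divides exactly.
uint64_t binom(unsigned n, unsigned k) {
    if (k > n) return 0;
    uint64_t r = 1;
    for (unsigned i = 1; i <= k; ++i)
        r = r * (n - k + i) / i;
    return r;
}

// Rank: combinadic index of the string among all strings with the same count of ones.
uint64_t rank(const std::string &s) {  // s[0] holds the highest bit position
    unsigned n = s.size(), j = 0;
    for (char c : s) j += (c == '1');  // j = number of ones
    uint64_t idx = 0;
    for (unsigned pos = n; pos-- > 0; )
        if (s[n - 1 - pos] == '1')
            idx += binom(pos, j--);
    return idx;
}

// Unrank: greedily peel off the largest binomial coefficient for each remaining one.
std::string unrank(uint64_t idx, unsigned n, unsigned k) {
    std::string s(n, '0');
    for (unsigned j = k; j >= 1; --j) {
        unsigned c = j - 1;
        while (binom(c + 1, j) <= idx) ++c;  // largest c with C(c, j) <= idx
        s[n - 1 - c] = '1';
        idx -= binom(c, j);
    }
    return s;
}

int main() {
    std::string s = "1100100110";  // 10 bits, five ones
    uint64_t idx = rank(s);        // an index in [0, 252)
    std::printf("%llu %s\n", (unsigned long long)idx, unrank(idx, 10, 5).c_str());
    // prints the index and the round-tripped string (identical to s)
}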

Method to find number of digits after converting from a different base number

The text in quotes gives a bit of background on my program in case it's needed to understand my issue; if you don't feel like reading it, the unquoted part at the end may be enough on its own.
I'm working on the common project of sorting in C++, and I am
currently doing radix sort. I have it as a function, taking in a
vector of strings, an integer holding the max number of digits, and an
integer with the radix/base of the numbers: (numbers, maxDigits, radix)
Since the program takes in numbers of different base and as a string,
I'm using stoi to convert them to a base 10 integer to make the
process easier to generalize. Here's a quick summary of the algorithm:
create 10 queues to hold values 0 to 9
iterate through each digit (maxDigit times)
iterate through each number in the vector (here it converts to a base 10)
put them into the queue based on the current digit it's looking at
pull the numbers out of the queues from beginning to end back into the vector
As for the problem I'm trying to wrap my head around: I want to convert the maxDigits value (given for whatever radix the user inputs) into the corresponding maxDigits value after the numbers are converted to base 10. In other words, say the user used the code
radixSort(myVector, 8, 2)
to sort a vector of numbers with the max number of digits 8 and a radix of 2. Since I convert the radix of the number to 10, I'm trying to find an algorithm to also change the maxDigits, if that makes sense.
I've tried thinking about this so much, trying to figure out a simple way through trial and error. If I could get some tips or help in the right direction that would be a great help.
If something is in radix 2 and max digits 8, then its largest value is all ones. And 11111111 = 255, which is (2^8 - 1).
The maximum digits in base 10 will be whatever is needed to represent that largest value. Here we see that to be 3. Which is the base 10 logarithm of 255 (2.40654018043), rounded up to 3.
So basically, just round log10(radix^maxDigits - 1) up to the nearest whole number.
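A small sketch of that in C++; it uses exact integer arithmetic rather than floating-point logs, which sidesteps rounding trouble when the largest value lands on a power of 10 (assumes radix^maxDigits fits in an unsigned long long):
#include <cstdio>

int maxDigitsBase10(unsigned radix, unsigned maxDigits) {
    unsigned long long largest = 1;
    for (unsigned i = 0; i < maxDigits; ++i)
        largest *= radix;      // radix^maxDigits
    --largest;                 // largest representable value, e.g. 2^8 - 1 = 255
    int digits = 1;            // even 0 needs one digit
    while (largest >= 10) {
        largest /= 10;
        ++digits;
    }
    return digits;
}

int main() {
    std::printf("%d\n", maxDigitsBase10(2, 8));  // prints 3: 255 needs 3 decimal digits
}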

All subsets in Subset_sum_problem

I'm stuck at solving Subset_sum_problem.
Given a set of integers (S), I need to compute all non-empty subsets whose sum is equal to a given target (T).
Example:
Given set, S{4, 8, 10, 16, 20, 22}
Target, T = 52.
Constraints:
The number of elements N of set S is limited to 8, so an exponential-time solution is acceptable, as N has a small upper bound.
Time and space complexities are not really a concern.
Output:
Possible subsets with sum exactly equal to T=52 are:
{10, 20, 22}
{4, 10, 16, 22}
The solution given in Wiki and in some other pages tries to check whether there exists such a subset or not (YES/NO).
It doesn't really help to compute all possible subsets as outlined in the above example.
The dynamic programming approach at this link gives a single such subset, but I need all such subsets.
One obvious approach is to compute all 2^N combinations using brute force but that would be my last resort.
I'm looking for a programmatic example (preferably C++) or an algorithm which computes such subsets, with illustrations/examples.
When you construct the dynamic-programming table for the subset sum problem, you initialize most of it like so (taken from the Wikipedia article referenced in the question):
Q(i, s) := Q(i - 1, s) or (x_i == s) or Q(i - 1, s - x_i)
This sets the table element to 0 or 1.
This simple formula doesn't let you distinguish between those several cases that can give you 1.
But you can instead set the table element to a value that'd let you distinguish those cases, something like this:
Q(i, s) := {Q(i - 1, s) != 0} * 1 + {x_i == s} * 2 + {Q(i - 1, s - x_i) != 0} * 4
Then you can traverse the table from the last element. At every element, the value will tell you whether you have zero, one, or two possible paths from it, and their directions. Following all the paths will give you all the combinations of numbers summing up to T, and there are at most 2^N of them.
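Here is a sketch of that flag-table idea in C++ for the example in the question (it assumes positive integers, and the names are mine):
#include <cstdio>
#include <vector>
using std::vector;

vector<int> xs = {4, 8, 10, 16, 20, 22};
const int T = 52;
vector<vector<int>> Q;  // Q[i][s]: bit 1 = Q(i-1,s), bit 2 = (x_i == s), bit 4 = Q(i-1,s-x_i)

// Walk the table backwards from (n-1, T), following every flagged path.
void emit(int i, int s, vector<int> &chosen) {
    int f = Q[i][s];
    if (f & 2) {  // {x_i} alone completes the sum: print chosen + x_i
        std::printf("{ %d", xs[i]);
        for (int v : chosen) std::printf(", %d", v);
        std::printf(" }\n");
    }
    if (f & 4) {  // x_i plus some earlier subset summing to s - x_i
        chosen.push_back(xs[i]);
        emit(i - 1, s - xs[i], chosen);
        chosen.pop_back();
    }
    if (f & 1)    // subsets that skip x_i entirely
        emit(i - 1, s, chosen);
}

int main() {
    int n = (int)xs.size();
    Q.assign(n, vector<int>(T + 1, 0));
    for (int s = 1; s <= T; ++s)
        Q[0][s] = (xs[0] == s) ? 2 : 0;
    for (int i = 1; i < n; ++i)
        for (int s = 1; s <= T; ++s)
            Q[i][s] = (Q[i - 1][s] ? 1 : 0)
                    | (xs[i] == s ? 2 : 0)
                    | (s > xs[i] && Q[i - 1][s - xs[i]] ? 4 : 0);
    vector<int> chosen;
    emit(n - 1, T, chosen);  // prints {10, 20, 22} and {4, 10, 16, 22} (element order may differ)
}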
If N <= 8, why not just go with the 2^N solution? It's only 256 possibilities, so it will be very fast.
Just brute force it. If N is limited to 8, your total number of subsets is 2^8, which is only 256. They give constraints for a reason.
You can express the set inclusion as a binary string where each element is either in the set or out of the set. Then you can just increment your binary string (which can simply be represented as an integer) and then determine which elements are in the set or not using the bitwise & operator. Once you've counted up to 2^N, you know you've gone through all possible subsets.
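A minimal brute-force sketch along those lines, for the example in the question:
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> S = {4, 8, 10, 16, 20, 22};
    const int T = 52;
    const int N = (int)S.size();
    for (int mask = 1; mask < (1 << N); ++mask) {  // mask = 1 skips the empty subset
        int sum = 0;
        for (int i = 0; i < N; ++i)
            if (mask & (1 << i)) sum += S[i];
        if (sum == T) {  // print this subset
            std::printf("{");
            for (int i = 0; i < N; ++i)
                if (mask & (1 << i)) std::printf(" %d", S[i]);
            std::printf(" }\n");  // prints { 10 20 22 } and { 4 10 16 22 }
        }
    }
}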
The best way to do it is using a dynamic programming approach. However, dynamic programming just answers whether a subset sum exists or not, as you mentioned in your question.
With dynamic programming, you can output all the solutions by backtracking. However, the overall time complexity to generate all the valid combinations is still 2^n.
So, an algorithm better than 2^n is close to impossible.
UPD:
From @Knoothe's comment:
You can modify Horowitz and Sahni's algorithm to enumerate all possible subsets. If there are M such sets whose sum equals S, then the overall time complexity is O(N * 2^(N/2) + M*N).

bit shifting - replacing a section of a bitset with a new number

I have a list of numbers encoded as a boost dynamic bitset. I dynamically choose the size of this bitset depending on the maximum value any number in this list can take. So let's say I have numbers from just 0 to 7, I only need three bits and my string 0,2,7 will be encoded as
000010111.
I now need to change say the 2nd number in this list (2) to another number, say 4.
I thought the most efficient way to do this would be to represent 4 as a dynamic bitset of the same length as the list but with all other values set to 1, so 111111011. I would then bitshift this by the required amount, with 1s used to fill in the vacated positions, to get 111011111, and then just bitwise AND this with the original bitset to get my desired result.
However, I cannot find a way to do these two things: with both initialisation of a bitset from an integer and with bit shifting, the default fill-in values are always 0, not 1. How can I get around this problem, or achieve my goal in a different and efficient way?
Thanks
If that is really the implementation, the most general and efficient method I can think of would be to first mask off all the bits for the part you are replacing:
value &= 0b111000111;   // C++14 binary literal
Then "or" in the actual bits for that position:
value |= 0b000100000;   // the new value, 4 (binary 100), already shifted into place
Hopefully someone here has a better trick for me to learn, but that's what I do.
XOR the old value and the new value:
int valuetoset = oldvalue ^ newvalue; // 4 XOR 2 in your example
Just shift the value you need to set:
int bitstoset = valuetoset << position; // (4 XOR 2) << 3 in your example
Then XOR bitstoset with your bitset again, and that's it!
int result = bitstoset ^ bitset;
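Tying those three lines together into one runnable check of the example (widths and positions as in the question):
#include <cstdio>

int main() {
    int bitset = 0b000010111;                // 0,2,7 packed as 3-bit fields
    int oldvalue = 2, newvalue = 4, position = 3;
    int valuetoset = oldvalue ^ newvalue;    // 4 XOR 2 = 6
    int bitstoset = valuetoset << position;  // 000110000
    int result = bitset ^ bitstoset;         // 000100111, i.e. 0,4,7
    std::printf("%o\n", result);             // prints 47 (octal): the fields are now 0,4,7
}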
Would you be able to use a vector of dynamic bitsets? Depending on your needs that might be sufficient and allow for easy updates.
Alternatively, fill your new bitset similarly to how you proposed, but exactly inverted. Then, right before you do the AND at the end, flip all the bits.
I think your understanding of a bitset is fundamentally wrong:
"set" means it is NOT ordered, and the idea of a bitset is that only one bit is necessary to show that an element is inside/outside the set.
So your original set 0,2,7 would take 8 bits, because 0..7 are 8 possible elements, and NOT 3 * 3 bits (3 values, each needing 3 bits to represent 0..7); the bitmap would look like 10000101.
What you describe is just a "packed" coding of the values. In your coding scheme 0,2,7 and 2,0,7 would be coded completely differently, but in a bitset they are the same.
In a (real) bitset (if that is what you want), you can then very easily "replace" elements by removing the old one and adding the new one. This works as T.E.D. describes it.
To get the right mask you can easily use shift operations. So, if you start counting at 0, you get the mask for value x by doing: 1 << x
So you remove element x from the set with
value &= ~(1 << x);
and add another element x (which might be the same) with
value |= 1 << x;
From your comment, you are misusing the bitset, so the masks must be built differently (and you already had an almost-right idea of how to build them).
The command with the bitmask for removal of the element at bit offset p:
value &= ~(0b111 << p);
The 111 here is for the above example, where you need 3 bits per position. If you don't want to hardcode it, you can take the next power of 2 and subtract 1 (i.e. (1 << bits_per_item) - 1) to get your all-ones mask.
And to add, you would just take your suggested bitlist that contains only the new element and OR it onto your bitlist:
value |= new_element_bitlist;
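For completeness, here's how the remove-then-add masking can look with boost::dynamic_bitset for the packed coding in the question (a sketch; replace_field and the right-to-left field layout are my own choices, and the 0b literals need C++14):
#include <boost/dynamic_bitset.hpp>
#include <iostream>

// Replace the i-th w-bit field (field 0 = least significant) with a new value.
void replace_field(boost::dynamic_bitset<> &bits, std::size_t i,
                   std::size_t w, unsigned long value) {
    const std::size_t n = bits.size();
    boost::dynamic_bitset<> ones(n, (1ul << w) - 1);  // e.g. 000000111 for w = 3
    boost::dynamic_bitset<> mask = ones << (i * w);   // e.g. 000111000 for i = 1
    boost::dynamic_bitset<> val(n, value);
    bits &= ~mask;           // clear the old field
    bits |= val << (i * w);  // OR in the new value
}

int main() {
    boost::dynamic_bitset<> bits(9, 0b000010111);  // 0,2,7 packed as 3-bit fields
    replace_field(bits, 1, 3, 4);                  // change the middle field from 2 to 4
    std::cout << bits << "\n";                     // prints 000100111
}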