All subsets in Subset_sum_problem - c++

I'm stuck at solving Subset_sum_problem.
Given a set of integers(S), need to compute non-empty subsets whose sum is equal to a given target(T).
Example:
Given set, S{4, 8, 10, 16, 20, 22}
Target, T = 52.
Constraints:
The number of elements N of set S is limited to 8. Hence an exponential-time solution is acceptable, as N has a small upper bound.
Time and space complexities are not really a concern.
Output:
Possible subsets with sum exactly equal to T=52 are:
{10, 20, 22}
{4, 10, 16, 22}
The solution given in Wiki and in some other pages tries to check whether there exists such a subset or not (YES/NO).
It doesn't really help to compute all possible subsets as outlined in the above example.
The dynamic programming approach at this link gives single such subset but I need all such subsets.
One obvious approach is to compute all 2^N combinations using brute force but that would be my last resort.
I'm looking for a programmatic example (preferably C++) or an algorithm that computes such subsets, with illustrations/examples.

When you construct the dynamic-programming table for the subset sum problem, you initialize most of it like so (taken from the Wikipedia article referenced in the question):
Q(i,s) := Q(i − 1,s) or (xi == s) or Q(i − 1,s − xi)
This sets the table element to 0 or 1.
This simple formula doesn't let you distinguish between those several cases that can give you 1.
But you can instead set the table element to a value that'd let you distinguish those cases, something like this:
Q(i,s) := {Q(i − 1,s) != 0} * 1 + {xi == s} * 2 + {Q(i − 1,s − xi) != 0} * 4
Then you can traverse the table from the last element. At every element, the value tells you which of the three cases apply, so you know how many paths lead on from it and in which directions. Following all paths gives you all combinations of numbers summing to T, and there are at most 2^N of them.
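The flag-encoded table and the traversal from the last element can be sketched as follows. This is a minimal sketch, assuming all inputs are positive integers; the function name `subsets_via_dp_table` and the helper `walk` are made up for illustration.

```python
def subsets_via_dp_table(xs, target):
    """Enumerate all non-empty subsets of xs (positive ints) summing to target."""
    n = len(xs)
    # q[i][s] is a bit mask describing how sum s can be formed from xs[:i]:
    #   1 -> s is reachable without xs[i-1]
    #   2 -> xs[i-1] alone equals s
    #   4 -> xs[i-1] plus a non-empty subset of xs[:i-1] equals s
    q = [[0] * (target + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        x = xs[i - 1]
        for s in range(1, target + 1):
            flags = 0
            if q[i - 1][s]:
                flags |= 1
            if x == s:
                flags |= 2
            if s - x >= 1 and q[i - 1][s - x]:
                flags |= 4
            q[i][s] = flags

    found = []

    def walk(i, s, chosen):
        # Follow every path that the flags at q[i][s] allow.
        flags = q[i][s]
        if flags & 1:
            walk(i - 1, s, chosen)
        if flags & 2:
            found.append([xs[i - 1]] + chosen)
        if flags & 4:
            walk(i - 1, s - xs[i - 1], [xs[i - 1]] + chosen)

    if target >= 1:
        walk(n, target, [])
    return found
```

For the example in the question, `subsets_via_dp_table([4, 8, 10, 16, 20, 22], 52)` yields the two subsets listed there, in some order.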

If N <= 8, why not just go with the 2^N solution? It's only 256 possibilities, which will be very fast.

Just brute force it. If N is limited to 8, your total number of subsets is 2^8, which is only 256. They give constraints for a reason.
You can express the set inclusion as a binary string where each element is either in the set or out of the set. Then you can just increment your binary string (which can simply be represented as an integer) and then determine which elements are in the set or not using the bitwise & operator. Once you've counted up to 2^N, you know you've gone through all possible subsets.
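A minimal sketch of the counting approach described above, written in Python for brevity (the same loop carries over to C++ directly). The function name `subsets_by_bitmask` is an assumption for illustration.

```python
def subsets_by_bitmask(xs, target):
    """Return every non-empty subset of xs whose sum equals target."""
    n = len(xs)
    found = []
    # Each integer from 1 to 2^n - 1 encodes one non-empty subset:
    # bit j set means xs[j] is included.
    for mask in range(1, 1 << n):
        subset = [xs[j] for j in range(n) if mask & (1 << j)]
        if sum(subset) == target:
            found.append(subset)
    return found
```

With N = 8 the loop body runs 255 times, so no further optimization is needed at this size.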

The best way to do it is to use a dynamic programming approach. However, dynamic programming only answers whether a subset sum exists or not, as you mentioned in your question.
With dynamic programming, you can output all the solutions by backtracking. However, the overall time complexity to generate all the valid combinations is still 2^n.
So, any better algorithm than 2^n is close to impossible.
UPD:
From @Knoothe's comment:
You can modify the Horowitz-Sahni algorithm to enumerate all possible subsets. If there are M such sets whose sum equals S, then the overall time complexity is O(N * 2^(N/2) + MN).
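The split-in-half idea behind that comment can be sketched as below. This is a simplified variant that matches the two halves through a dictionary rather than the sorted-list merge of the original algorithm, so it is an illustration of the idea and not a faithful implementation; `half_sums` and `subsets_meet_in_middle` are made-up names.

```python
from collections import defaultdict


def half_sums(items):
    """Map each achievable sum to the list of subsets of items producing it."""
    table = defaultdict(list)
    for mask in range(1 << len(items)):
        subset = [x for j, x in enumerate(items) if mask & (1 << j)]
        table[sum(subset)].append(subset)
    return table


def subsets_meet_in_middle(xs, target):
    """Enumerate non-empty subsets of xs summing to target by joining two halves."""
    mid = len(xs) // 2
    left, right = half_sums(xs[:mid]), half_sums(xs[mid:])
    found = []
    for s, left_subsets in left.items():
        # A left-half sum s needs a right-half sum of target - s.
        for right_subset in right.get(target - s, []):
            for left_subset in left_subsets:
                if left_subset or right_subset:  # skip the empty subset
                    found.append(left_subset + right_subset)
    return found
```

Each half contributes at most 2^(N/2) subsets, and the final loops do work proportional to the number of matches reported.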

Related

Bit Manipulation: Harder Flipping Coins

Recently, I saw this problem from CodeChef titled 'Flipping Coins' (Link: FLIPCOINS).
Summarily, there are N coins and we must write a program that supports two operations.
To flip coin in range [A,B]
To find the number of heads in range [A,B] respectively.
Of course, we can quickly use a segment tree (range query, range updates using lazy propagation) to solve this.
However, I faced another similar problem where after a series of flips (operation 1), we are required to output the resulting permutation of coins after the flips (e.g 100101, where 0 represents head while 1 represents tail).
More specifically, operation 2 changes from counting number of heads to producing the resulting permutation of all N coins. Also, the new operation 2 is only called after all the flips have been done (i.e operation 2 is the last to be called and is only called one time).
May I know how does one solve this? It requires some form of bit manipulation, according to the problem tags.
Edit
I attempted brute-forcing through all queries, and alas, it yielded Time Limit Exceeded.
Printing out the state of the coins can be done using a Binary-indexed tree:
Initially all values are 0.
When we need to flip coins [A, B], we increment position A by 1 and decrement position B + 1 by 1.
The state of coin i is then the prefix sum at i modulo 2.
This works because the prefix sum at i is always the number of flip operations done at i.
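Because the state is only printed once, after all flips, the same increment/decrement idea also works with a plain difference array and a single prefix-sum pass at the end, without a binary-indexed tree. The sketch below assumes 0-based inclusive ranges and prints the flip parity of each coin; the name `coin_states_after_flips` is hypothetical.

```python
def coin_states_after_flips(n, flips):
    """Return a string of 0/1 flip parities for n coins after all range flips."""
    diff = [0] * (n + 1)  # one extra slot so that b + 1 is always a valid index
    for a, b in flips:
        diff[a] += 1      # flips start affecting coins at position a
        diff[b + 1] -= 1  # and stop affecting them after position b
    states = []
    running = 0
    for i in range(n):
        running += diff[i]          # number of flips covering coin i
        states.append(running % 2)  # an odd count means the coin ended up flipped
    return "".join(map(str, states))
```

This takes O(N + Q) time for Q flips. A binary-indexed tree is only needed if the state of a coin has to be queried between flips.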

finding intersections in a given range?

Assume an array of N (N <= 100000) elements a1, a2, ..., an, and a range L, R in it, where 1 <= L <= R <= N. You are required to count the values in the given range that are divisible by at least one number from a given set S, which can be any subset of {1, 2, ..., 10}. A fast method is needed because there may be many queries (Q <= 100000), each with its own range and its own S, so looping over the values each time will be very slow.
I thought of storing the number of values divisible by each number of the big set {1, 2, ..., 10} in 10 arrays of N elements each, and using cumulative sums to get the number of values divisible by any specific number in any range in O(1) time. For example, if the query asks for the number of values divisible by at least one of 2, 3, 5, then I add the counts for each of them and remove the intersections. But I couldn't figure out how to calculate the intersections without 2^10 or 2^9 calculations each time, which would also be very slow (and possibly hugely memory consuming) because it may be done 100000 times. Any ideas?
Your idea is correct. You can use inclusion-exclusion principle and prefix sums to find the answer. There is just one more observation you need to make.
If there's a pair of numbers a and b in the set such that a divides b, we can remove b without changing the answer to the query (indeed, if b | x, then a | x). Thus, we always get a set such that no element divides any other one.
The number of such masks is smaller than 2^10. In fact, it's 102. Here's the code that computes it:
def good(mask):
    # Bit b - 1 of mask is set when the number b belongs to the set.
    for i in filter(lambda b: mask & (1 << (b - 1)), range(1, 11)):
        if any(i % j == 0 for j in filter(lambda b: mask & (1 << (b - 1)), range(1, i))):
            return False
    return True
print(len(list(filter(good, range(1, 2 ** 10)))))  # prints 102
Thus, the preprocessing requires approximately 100N operations and numbers to store (which looks reasonably small).
Moreover, there are at most 5 elements in any "good" mask (this can be checked using the code above). Thus, we can answer each query using around 2^5 operations.
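One way to put these pieces together is sketched below. It is not the answer's exact layout: as an assumption, the prefix sums here are keyed by the divisors of 2520 (the lcm of 1..10, which every subset's lcm divides) instead of by mask. The names `build_prefix_counts`, `reduce_set`, and `count_divisible` are made up for illustration.

```python
from functools import reduce
from itertools import combinations
from math import gcd


def lcm_of(values):
    """Least common multiple of a non-empty collection of positive ints."""
    return reduce(lambda a, b: a * b // gcd(a, b), values, 1)


def build_prefix_counts(a):
    """prefix[d][i] = how many of the first i array values are divisible by d."""
    divisors = [d for d in range(1, 2521) if 2520 % d == 0]
    prefix = {d: [0] * (len(a) + 1) for d in divisors}
    for i, value in enumerate(a, start=1):
        for d in divisors:
            prefix[d][i] = prefix[d][i - 1] + (value % d == 0)
    return prefix


def reduce_set(s):
    """Drop every element that is a multiple of another element of s."""
    items = sorted(set(s))
    return [b for b in items if not any(a != b and b % a == 0 for a in items)]


def count_divisible(prefix, s, left, right):
    """Count values at 1-based positions left..right divisible by some element of s."""
    reduced = reduce_set(s)
    total = 0
    for k in range(1, len(reduced) + 1):
        sign = 1 if k % 2 else -1  # inclusion-exclusion sign
        for combo in combinations(reduced, k):
            column = prefix[lcm_of(combo)]
            total += sign * (column[right] - column[left - 1])
    return total
```

After `reduce_set`, at most 5 numbers remain, so each query touches at most 31 prefix-sum columns.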

Good way to detect identical expressions in C++

I am writing a program that solves this puzzle game: n numbers and a goal number are given, and you make the goal number using the n numbers and the operators +, -, *, / and (). For example, given 2,3,5,7 and the goal number 10, the solutions are (2+3)*(7-5)=10, 3*5-(7-2)=10, and so on.
The catch is, if I implement it naively, I will get a bunch of identical solutions, like (2+3)*(7-5)=10 and (3+2)*(7-5)=10, and 3*5-(7-2)=10 and 5*3-(7-2)=10 and 3*5-7+2=10 and 3*5+2-7=10 and so on. So I'd like to detect those identical solutions and prune them.
I'm currently using randomly generated double numbers to detect identical solutions. What I'm doing is basically substituting those random numbers into the solutions and checking whether any pair of them calculates to the same number. I have to perform the detection at every node of my search, so it has to be fast, and I use a hash set for it now.
Now the problem is the error that comes with the calculation. Because even identical solutions do not calculate to the exactly same value, I currently round the calculated value to a precision when storing in the hashset. However this does not seem to work well enough, and gives different number of solutions every time to the same problem. Sometimes the random numbers are bad and prune some completely different solutions. Sometimes the calculated value lies on the edge of rounding function and it outputs two(or more) identical solutions. Is there a better way to do this?
EDIT:
By "identical" I mean two or more solutions(f(w,x,y,z,...) and g(w,x,y,z,...)) that calculate to the same number whatever the original number(w,x,y,z...) is. For more examples, 4/3*1/2 and 1*4/3/2 and (1/2)/(3/4) are identical, but 4/3/1/2 and 4/(3*1)/2 are not because if you change 1 to some other number they will not produce the same result.
It will be easier if you "canonicalize" the expressions before comparing them. One way would be to sort when an operation is commutative, so 3+2 becomes 2+3 whereas 2+3 remains as it was. Of course you will need to establish an ordering for parenthesized groups as well, like 3+(2*1)...does that become (1*2)+3 or 3+(1*2)? What the ordering is doesn't necessarily matter, so long as it is a total ordering.
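A minimal sketch of the sorting idea, assuming expressions are stored as nested tuples `(op, left, right)` with plain numbers as leaves (that representation and the name `canonical` are choices made here, not from the question). It only handles commutativity; associativity and mixed +/- chains would need further normalization.

```python
def canonical(expr):
    """Return expr with the operands of + and * put into a fixed order."""
    if not isinstance(expr, tuple):
        return expr  # a leaf: just a number
    op, left, right = expr
    left, right = canonical(left), canonical(right)
    # repr() gives a total ordering over leaves and subtrees alike; which
    # ordering is used does not matter as long as it is applied consistently.
    if op in ("+", "*") and repr(right) < repr(left):
        left, right = right, left
    return (op, left, right)
```

With this, `(3+2)*(7-5)` and `(7-5)*(2+3)` map to the same tuple, while `7-5` and `5-7` stay distinct because subtraction is not reordered.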
Generate all possibilities of your expressions. Then..
When you create expressions, put them in a collection of parsed trees (this would also eliminate your parenthesis). Then "push down" any division and subtraction into the leaf nodes so that all the non-leaf nodes have * and +. Apply a sorting of the branches (e.g. regular string sort) and then compare the trees to see if they are identical.
I like the idea of using doubles. The problem is in the rounding. Why not use a container SORTED by the value obtained with one random set of double inputs. When you find the place you would insert in that container, you can look at the immediately preceding and following items. Use a different set of random doubles to recompute each for the more robust comparison. Then you can have a reasonable cutoff for "close enough to be equal" without arbitrary rounding.
If a pair of expressions are close enough for equal in both the main set of random numbers and the second set, the expressions are safely "same" and the newer one discarded. If close enough for equal in the main set but not the new set, you have a rare problem, that probably requires rekeying the entire container with a different random number set. If not close enough in either, then they are different.
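The sorted-container scheme might look roughly like this. It is a sketch under several assumptions: expressions are Python callables, random inputs are drawn from [1, 2] to keep divisions well behaved, the tolerance is an arbitrary `1e-9`, and the rare "rekey the container" case is left out. The class name `ExprSet` is hypothetical.

```python
import bisect
import math
import random


class ExprSet:
    """Keeps expressions sorted by their value on a primary random input set."""

    def __init__(self, n_vars, seed=0):
        rng = random.Random(seed)
        self.primary = [rng.uniform(1, 2) for _ in range(n_vars)]
        self.secondary = [rng.uniform(1, 2) for _ in range(n_vars)]
        self.keys = []   # sorted values on the primary inputs
        self.exprs = []  # expressions, kept in the same order as keys

    def add(self, f, rel_tol=1e-9):
        """Insert f unless a neighbour agrees on both input sets; return True if inserted."""
        key = f(*self.primary)
        pos = bisect.bisect_left(self.keys, key)
        for j in (pos - 1, pos):  # immediately preceding and following items
            if 0 <= j < len(self.keys) and math.isclose(self.keys[j], key, rel_tol=rel_tol):
                other = self.exprs[j](*self.secondary)
                if math.isclose(other, f(*self.secondary), rel_tol=rel_tol):
                    return False  # close on both sets: treat as the same expression
                # Close on the primary set only: the answer suggests rekeying
                # with new random inputs; this sketch simply keeps both.
        self.keys.insert(pos, key)
        self.exprs.insert(pos, f)
        return True
```

For example, after adding `lambda w, x, y, z: (w + x) * (z - y)`, adding `lambda w, x, y, z: (x + w) * (z - y)` is rejected as a duplicate.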
For the larger n suggested by one of your recent comments, I think you would need the better performance that should be possible from a canonical by construction method (or maybe "almost" canonical by construction) rather than a primarily comparison based approach.
You don't want to construct an incredibly large number of expressions, then canonicalize and compare.
Define a doubly recursive function can(...) that takes as input:
A reference to a canonical expression tree.
A reference to one subexpression of that tree.
A count N of inputs to be injected.
A set of flags for prohibiting some injections.
A leaf function to call.
If N is zero, can just calls the leaf function. If N is nonzero, can patches the subtree in every possible way that produces a canonical tree with N injected variables, and calls the leaf function for each and restores the tree, undoing each part of the patch as it is done with it, so we never need massive copying.
X is the subtree and K is a leaf representing variable N-1. First can would replace the subtree temporarily one at a time with subtrees representing some of (X)+K, (X)-K, (X)*K, (X)/K and K/(X) but both flags and some other rules would cause some of those to be skipped. For each not skipped, recursively call itself with the whole tree as both top and sub, with N-1, and with 0 flags.
Next drill into the two children of X and call recursively itself with that as the subtree, with N, and with appropriate flags.
The outer just calls can with a single node tree representing variable N-1 of the original N, and passing N-1.
In discussion, it is easier to name the inputs forward, so A is input N-1 and B is input N-2 etc.
When we drill into X and see it is Y+Z or Y-Z we don't want to add or subtract K from Y or Z because those are redundant with X+K or X-K. So we pass a flag that suppresses direct add or subtract.
Similarly, when we drill into X and see it is Y*Z or Y/Z we don't want to multiply or divide either Y or Z by K because that is redundant with multiplying or dividing X by K.
Some cases for further clarification:
(A/C)/B and A/(B*C) are easily non canonical because we prefer (A/B)/C and so when distributing C into (A/B) we forbid direct multiplying or dividing.
I think it takes just a bit more effort to allow C/(A*B) while rejecting C/(A/B) which was covered by (B/A)*C.
It is easier if negation is inherently non canonical, so level 1 is just A and does not include -A then if the whole expression yields negative the target value, we negate the whole expression. Otherwise we never visit the negative of a canonical expression:
Given X, we might visit (X)+K, (X)-K, (X)*K, (X)/K and K/(X) and we might drill down into the parts of X passing flags which suppress some of the above cases for the parts:
If X is a + or - suppress '+' or '-' in its direct parts. If X is a * or / suppress * or divide in its direct parts.
But if X is a / we also suppress K/(X) before drilling into X.
Since you are dealing with integers, I'd focus on getting an exact result.
Claim: Suppose there is some f(a_1, ..., a_n) = x where a_i and x are your integer input numbers and f(a_1, ..., a_n) represents any functions of your desired form. Then clearly f(a_i) - x = 0. I claim, we can construct a different function g with g(x, a_1, ..., a_n) = 0 for the exact same x and g only uses ()s, +, - and * (no division).
I'll prove that below. Consequently, you could construct g and evaluate g(x, a_1, ..., a_n) = 0 on integers only.
Example:
Suppose we have a_i = i for i = 1, ..., 4 and f(a_i) = a_4 / (a_2 - (a_3 / a_1)) (which contains divisions so far). This is how I would like to simplify:
0 = a_4 / (a_2 - (a_3 / a_1) ) - x | * (a_2 - (a_3 / a_1) )
0 = a_4 - x * (a_2 - (a_3 / a_1) ) | * a_1
0 = a_4 * a_1 - x * (a_2 * a_1 - (a_3) )
In this form, you can verify your equality for some given integer x using integer operations only.
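As a quick check of the example, the division-free form can be compared against an exact rational evaluation of f. This is only a sketch for this one expression; the names `f_exact` and `g` mirror the text but the code is illustrative. It assumes the denominators that were multiplied away are non-zero, since otherwise the multiplied form can vanish where f is undefined.

```python
from fractions import Fraction


def f_exact(a1, a2, a3, a4):
    """f(a_i) = a_4 / (a_2 - (a_3 / a_1)), evaluated exactly with rationals."""
    return Fraction(a4) / (a2 - Fraction(a3, a1))


def g(x, a1, a2, a3, a4):
    """Division-free form: 0 = a_4 * a_1 - x * (a_2 * a_1 - a_3)."""
    return a4 * a1 - x * (a2 * a1 - a3)
```

With a_i = i, `f_exact(1, 2, 3, 4)` is -4, and `g(-4, 1, 2, 3, 4)` is 0 using integer arithmetic only.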
Proof:
There is some g(x, a_i) := f(a_i) - x which is equivalent to f. Consider any equivalent g with as few divisions as possible. Assume there is at least one (otherwise we are done). Assume that within g we divide by h(x, a_i) (any of your functions, which may contain divisions itself). Then (g*h)(x, a_i) := g(x, a_i) * h(x, a_i) has the same roots as g (multiplying at a root, i.e. at (x, a_i) where f(a_i) - x = 0, preserves all roots). But on the other hand, g*h is composed of one division fewer. This contradicts the choice of g with a minimum number of divisions, which is why g doesn't contain any division.
I've updated the example to visualize the strategy.
Update: This works well on rational input numbers (those represent a single division p/q). This should help you. Other input can't be provided by humans.
What are you doing to find / test f's? I'd guess some form of dynamic programming will be fast in practice.

Good Hash function with 2 integer for a special key

I'm trying to determine a key for map<double, double> type. But the problem is that the key I want will be generated by a pair of 2 numbers. Are there any good functions which could generate such key for pairs like (0, 1), (2, 3), (4, 2) (0, 2), etc.
Go for an N-ary numeral system, where N is one more than the maximum possible value of a number in the pair (so that every number is a valid digit, 0 <= a, b < N).
Like this:
hash(a, b) = a + b * N
then
a = hash(a, b) % N
b = hash(a, b) / N
This guarantees that every pair (a, b) has its own unique hash(a, b). The same thing happens with decimal numbers: imagine all numbers from 0 (we write them as 00, 01, 02, ...) to 99 inclusive are your pairs ab. Then hash(a, b) = a * 10 + b, and vice versa: to obtain the first digit you divide the number by 10, and to obtain the second you take it modulo 10.
Why can't we pick any N, maybe smaller than the maximum of a/b? The answer is: to avoid collisions.
If you pick a number that happens to be no larger than your maximum number, it is highly likely that the same hash value will be produced by different pairs of numbers. For example, if you pick N = 10 for the pairs (10, 10) and (0, 11), both their hashes will be equal to 110, which is not good for you in this situation.
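A small sketch of the encoding and its inverse, assuming non-negative integers with `a` strictly below `n`. The names `encode_pair` and `decode_pair` are made up here.

```python
def encode_pair(a, b, n):
    """Pack (a, b) into one integer as a + b * n; requires 0 <= a < n and b >= 0."""
    if not 0 <= a < n or b < 0:
        raise ValueError("need 0 <= a < n and b >= 0")
    return a + b * n


def decode_pair(key, n):
    """Recover (a, b) from a key produced by encode_pair with the same n."""
    b, a = divmod(key, n)
    return a, b
```

With `n = 12`, the colliding pairs from the example become distinct: `encode_pair(10, 10, 12)` is 130 while `encode_pair(0, 11, 12)` is 132.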
You should ideally have a KeyValuePair<int, int> as your key. I don't think writing more code than that can be helpful. If you can't have that for some reason, then hashing the pair to give a single key depends on what you're trying to achieve. If the hashes are meant for hash structures like Dictionary, then you have to balance collision rate and speed of hashing. A perfect hash with no collisions at all will be more time consuming to compute. Similarly, the fastest hashing algorithm will have relatively more collisions. Finding the right balance is the key here. Also, you should take into consideration how large your effective hash can be and whether the hashed output should be reversible to give you back the original inputs. Typically, priority should be given to speeding up pairing/hashing/mapping rather than minimizing collision probability (a good hash algorithm will have a lower chance of collisions). For perfect hashes, you can see this thread for a plethora of options.

Bit count in the following case

I got the following question in an interview. Please give me some ideas for solving it, as I am completely unsure how to proceed.
A non-empty array A of N elements contains octal representation of a non-negative integer K, i.e. each element of A belongs to the interval [0; 7]
Write a function:
int bitcount_in_big_octal(const vector<int> &A);
that returns the number of bits set to 1 in the binary representation of K. The function should return -1 if the number of bits set to 1 exceeds 10,000,000.
Assume that the array can be very large.
Assume that N is an integer within the range [1..100,000].
Is there any time restriction?
I have one idea: first, make the following dictionary: {0->0, 1->1, 2->1, 3->2, 4->1, 5->2, 6->2, 7->3}. Then loop over the array A, summing the 1s in every element using the dictionary.
Iterate over your representation.
For each element in that iteration, convert the element to its number of bits. @meteorgan's answer is a great way to do just that. If you need the representation for something other than bit counts, you'll probably want to convert it to some intermediate form useful for whatever else you'll be using, e.g. to byte[]: each octet in the representation should correspond to a single byte, and since all you're doing is counting bits it doesn't matter that Java's byte is signed. Then, for each byte in the array, use an existing bit counting lib (cast the byte to an int and use Integer.bitCount(...)), or roll your own, to count the bits.
Add the result to a running total, and escape the iteration if you hit your threshold.
That's a Java answer in the details (like the lib I linked), but the algorithm steps are fine for C++: find a replacement library (or use the dictionary answer).
Here's the solution using the indexed (dictionary) based approach.
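INDEX = [0, 1, 1, 2, 1, 2, 2, 3]  # number of set bits in each octal digit 0..7
LIMIT = 10000000
def bitcount_in_big_octal(A):
    counter = 0
    for octal in A:
        counter += INDEX[octal]
    # The problem statement asks for -1 when the count exceeds the limit.
    return counter if counter <= LIMIT else -1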