I am looking for the least time-complex algorithm that solves a variant of the perfect sum problem (originally: find all subsets of any size from an array [*] of n integers that sum to a specific number x), where the subsets must have a fixed size k. It should return the possible combinations without direct or indirect duplicates (an indirect duplicate being a combination that contains exactly the same elements as another one, in a different order).
I'm aware this problem is NP-hard, so I am not expecting a perfect general solution, but something that could at least run in a reasonable time in my case, with n close to 1000 and k around 10.
Things I have tried so far:
Finding a combination, then doing successive modifications on it and its modifications
Let's assume I have an array such as:
s = [1,2,3,3,4,5,6,9]
So I have n = 8, and I'd like x = 10 for k = 3
I found thanks to some obscure method (bruteforce?) a subset [3,3,4]
From this subset I'm finding other possible combinations by taking two elements out of it and replacing them with other elements that sum the same, i.e. (3, 3) can be replaced by (1, 5) since both got the same sum and the replacing numbers are not already in use. So I obtain another subset [1,5,4], then I repeat the process for all the obtained subsets... indefinitely?
The main issue as suggested here is that it's hard to determine when it's done and this method is rather chaotic. I imagined some variants of this method but they really are work in progress
Iterating through the set to list all k long combinations that sum to x
Pretty self-explanatory. This is a naive method that does not work well in my case, since I have a pretty large n and a k that is not small enough to avoid a catastrophically big number of combinations (the number of combinations is on the order of 10^27!)
I experimented with several mechanisms for restricting the search area instead of stupidly iterating through all possibilities, but it's rather complicated and still a work in progress
What would you suggest? (Snippets can be in any language, but I prefer C++)
[*] To clear the doubt about whether or not the base collection can contain duplicates, I used the term "array" instead of "set" to be more precise. The collection can contain duplicate integers in my case and quite much, with 70 different integers for 1000 elements (counts rounded), for example
With a reasonable sum limit, this problem can be solved by extending the dynamic programming approach for the subset sum problem, or for the coin change problem with a predetermined number of coins. Note that we can count all variants in pseudopolynomial time O(x*n*k) (the factor k comes from tracking the subset size), but the output size might grow exponentially, so generating all variants might be a problem.
Make a 3D array, list or vector A[][][] with outer dimension x+1. Every element A[p] of this list contains the list of possible subsets with sum p.
We walk through all elements (call the current element item) of the initial "set" (I noticed repeated elements in your example, so it is not a true set).
Now scan the A[] list from the last entry to the beginning. (This trick helps to avoid using the same item repeatedly.)
If A[i - item] contains subsets with size < k, we can add all these subsets to A[i], appending item.
After the full scan, A[x] will contain subsets of size k and less having sum x, and we can keep only those of size k.
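The counting remark above (how many variants exist, without listing them) might be sketched as follows. This is only an illustration in Python, not the answerer's code: the function name `count_fixed_size_subsets` is hypothetical, and it assumes all values are non-negative integers. Elements are counted by position, so two equal values at different positions count as different variants.

```python
def count_fixed_size_subsets(values, target, size):
    """Count index-subsets of `values` with exactly `size` elements summing to `target`.

    Assumes non-negative integers (hypothetical helper for illustration).
    """
    # ways[j][p] = number of ways to pick j elements (by position) that sum to p
    ways = [[0] * (target + 1) for _ in range(size + 1)]
    ways[0][0] = 1  # the empty subset
    for item in values:
        # Scan subset sizes and sums downward so each element is used at most once.
        for j in range(size, 0, -1):
            for p in range(target, item - 1, -1):
                ways[j][p] += ways[j - 1][p - item]
    return ways[size][target]
```

The table has (k+1) * (x+1) cells and every element touches each cell at most once, which is where the pseudopolynomial bound comes from.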
Example of output of my quick-made Delphi program for the next data:
Lst := [1,2,3,3,4,5,6,7];
k := 3;
sum := 10;
3 3 4
2 3 5 //distinct 3's
2 3 5
1 4 5
1 3 6
1 3 6 //distinct 3's
1 2 7
To exclude variants with distinct repeated elements (if needed), we can use a non-first occurrence only for subsets already containing the first occurrence of the item (so 3 3 4 will be valid, while the second 2 3 5 won't be generated).
I literally translated my Delphi code into C++ (weird, I think :)
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<int> Lst = { 1, 2, 3, 3, 4, 5, 6, 7 };
    int k = 3;
    int sum = 10;
    // A[p] holds every subset found so far whose elements sum to p
    vector<vector<vector<int>>> A(sum + 1);
    A[0].push_back({ 0 }); //fictive array to make non-empty variant
    for (int item : Lst) {
        // scan sums from high to low so the current item is used at most once
        for (int i = sum; i >= item; i--) {
            for (size_t j = 0; j < A[i - item].size(); j++)
                if ((int)A[i - item][j].size() < k + 1) {
                    vector<int> t = A[i - item][j];
                    t.push_back(item);
                    A[i].push_back(t); //add new variant including current item
                }
        }
    }
    //output needed variants
    for (size_t i = 0; i < A[sum].size(); i++)
        if ((int)A[sum][i].size() == k + 1) {
            for (size_t j = 1; j < A[sum][i].size(); j++) //excluding fictive 0
                cout << A[sum][i][j] << " ";
            cout << endl;
        }
}
Here is a complete solution in Python. Translation to C++ is left to the reader.
Like the usual subset sum, generation of the doubly linked summary of the solutions is pseudo-polynomial. It is O(count_values * distinct_sums * depths_of_sums). However actually iterating through them can be exponential. But using generators the way I did avoids using a lot of memory to generate that list, even if it can take a long time to run.
from collections import namedtuple

# This is a doubly linked list.
# (value, tail) will be one group of solutions. (next_answer) is another.
SumPath = namedtuple('SumPath', 'value tail next_answer')

def fixed_sum_paths (array, target, count):
    # First find counts of values to handle duplications.
    value_repeats = {}
    for value in array:
        if value in value_repeats:
            value_repeats[value] += 1
        else:
            value_repeats[value] = 1

    # paths[depth][x] will be all subsets of size depth that sum to x.
    paths = [{} for i in range(count+1)]

    # First we add the empty set.
    paths[0][0] = SumPath(value=None, tail=None, next_answer=None)

    # Now we start adding values to it.
    for value, repeats in value_repeats.items():
        # Reversed depth avoids seeing paths we will find using this value.
        for depth in reversed(range(len(paths))):
            for result, path in paths[depth].items():
                for i in range(1, repeats+1):
                    if count < i + depth:
                        # Do not fill in too deep.
                        break
                    result += value
                    if result in paths[depth+i]:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=paths[depth+i][result]
                            )
                    else:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=None
                            )
                    paths[depth+i][result] = path

                    # Subtle bug fix, a path for value, value
                    # should not lead to value, other_value because
                    # we already inserted that first.
                    path = SumPath(
                        value=value,
                        tail=path.tail,
                        next_answer=None
                        )
    return paths[count][target]

def path_iter(paths):
    if paths.value is None:
        # We are the tail
        yield []
    else:
        while paths is not None:
            value = paths.value
            for answer in path_iter(paths.tail):
                answer.append(value)
                yield answer
            paths = paths.next_answer

def fixed_sums (array, target, count):
    paths = fixed_sum_paths(array, target, count)
    return path_iter(paths)

for path in fixed_sums([1,2,3,3,4,5,6,9], 10, 3):
    print(path)
Incidentally for your example, here are the solutions:
[1, 3, 6]
[1, 4, 5]
[2, 3, 5]
[3, 3, 4]
You should first sort the so-called array. Second, you should determine whether the problem is actually solvable, to save time. Take the last k elements (the k largest) and check whether their sum is larger than or equal to x. If it is smaller, you are done: no combination can reach x. If it is exactly equal, you are also done: there are no other combinations. That check is O(n).
If the sum is larger than x, then you have a lot of work to do, and you need to store all the combinations you find in a separate array. Replace the smallest of the k numbers with the smallest element in the array. If the sum is still larger than x, do the same for the second smallest, the third, and so on, until you get something smaller than x. Once the sum is smaller than x, start increasing the value at the last position you stopped at until you hit x. Once you hit x, that is one combination. Then move on to the previous element: if you had 1, 1, 5, 6, take the 1 as well, add it to your smallest element, 5, to get 6, and check whether this 6 can be written as a combination of two values, stopping once you hit the value. Then repeat for the others.
Your problem can be solved in O(n!) time in the worst case. I would not suggest storing 10^27 combinations. That many combinations means more than 10^27 elements; do you even have that much space? At something like 3 bits for the header and 8 bits for each integer, you would need 9.8765*10^25 terabytes just to store that colossal array, more memory than a supercomputer has. You should worry about whether your computer can even store this monster rather than whether you can solve the problem: with that many combinations, even a quadratic solution would crash your computer, and quadratic is a long way off from O(n!).
A brute force method using recursion might look like this...
For example, given variables set, x, k, the following pseudo code might work:
setSumStructure find(int[] set, int x, int k, int setIdx)
{
    int sz = set.length - setIdx;   // number of elements still available
    if (sz < k) return null;        // not enough elements left to pick k of them
    if (sz == k) check sum of set[setIdx] -> set[set.length - 1] == x. if it does, return the set together with the sum, else return null;
    for (int i = setIdx; i < set.length - (k - 1); i++)
        filter(find(set, x - set[i], k - 1, i + 1));
    return filteredSets;
}
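A runnable take on the recursive idea above might look like the following. It is a sketch in Python rather than a translation of the pseudocode: the name `find_combinations` is hypothetical, the input is sorted first, and equal values are skipped at the same recursion level so that each combination of values is reported once. It is still brute force, so it is only practical for small inputs.

```python
def find_combinations(values, target, size):
    """Return every distinct combination of `size` values that sums to `target`."""
    values = sorted(values)
    results = []

    def search(start, remaining_sum, remaining_size, chosen):
        if remaining_size == 0:
            if remaining_sum == 0:
                results.append(list(chosen))
            return
        # Stop early enough that remaining_size elements are still available.
        for i in range(start, len(values) - remaining_size + 1):
            if i > start and values[i] == values[i - 1]:
                continue  # this value was already tried at this position
            chosen.append(values[i])
            search(i + 1, remaining_sum - values[i], remaining_size - 1, chosen)
            chosen.pop()

    search(0, target, size, [])
    return results
```

For the array from the question, `find_combinations([1, 2, 3, 3, 4, 5, 6, 9], 10, 3)` yields the four combinations listed in the Python answer above.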
Doing a major programming assignment for a Data Abstraction and ADT class. It's a sorted linked list* type of "nested classes" referred to by pointers environment. The binary search algorithm is provided for us in the way of the pseudocode below.
**ALGORITHM** *BinarySearch*(A[0...n-1], n, k)* //appropriate for get & remove
//pre: A is sorted in ascending order, n is the number of items in the array
// k is the item being searched for
//post: if the search is successful the index of the match is returned( 1-based)
//if the search is unsuccessful, -1 is returned
f <-- 0 // 0-based indices used in this method
I <-- n-1 /* personal note. I think its an I, the pdf could be an l though */
/* personal note - I think the above is an I, pdf font could be an l though.
--- secondly, the brackets in this pseudocode are bottom-half brackets.
---- i.e., they could be "floor" or "minimum"-based brackets of a kind */
m <-- f + [(i-f)/2] // floor-brackets
while f <= I do
{
if (K = a[m]) // not-floor brackets, just subscript
return (m + 1)
else if ( K < A[m] ) // not-floor brackets, just subscript
I <-- m-1
else
f <-- m+1
m <-- f + [(I-f)/2] /* floor-based brackets */
return -1
second half of the algorithm
**ALGORITHM** *BinarySearch*(A[0...n-1], n, k) // appropriate for Add function
//pre: A is sorted in ascending order, n is the number of items in the array
// K is the term being searched for
//post: if successful, index is returned (1-based)
// if the search is unsuccessful, the index of the current location is returned
//what if n = 0? (that is, the first item is being added to the list)
f <-- 0 //0-based indices used in this method
I <-- n-1 **
m <-- f + [(I-f)/2] /* floor-based square brackets */
while (f <= I) do
// int comp = (*compare)(K, A[m])[function ptr]
if(K = A[m] ) // not-floor brackets, just subscript
// comp == 0, duplicate found, add here, adjust for 1
return (m + 1)
else if( K < A[m] ) // comp < 0
I <-- m-1
else
f <-- m+1
m <-- f + [(I-f)/2] // these are floor-brackets, but what is M in all this?
return m+1 //Not found, but current mid position is the location to add
I understand most of it, but am having trouble translating some of his notation style. I'm not entirely sure what F, I, and M are all the time, or how they are totally applying.
Maybe 'F' is a function call being applied for the recursive nature of this algorithm?
This is a general template for variables being replaced by actual code objects and names, of course.
I tried looking up generalized templates of binary search algorithms, but I seem to be bad at finding proper example templates of certain method types. On cplusplus.com especially, looking up code gives me an advanced example that makes no attempt to introduce the general concept of what a specific method or function does, or what it always does or does not have.
So, could you help me parse what this pseudocode is saying? Maybe direct me to a good template of what a binary search algorithm template might be, to compare to this, to help me build the function body and definition?
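For comparison, here is one way the second pseudocode (the "Add" variant) might read in Python. This is a sketch under assumptions, not the instructor's intended solution: it assumes f, l (lowercase L) and m are plain integer variables for the first, last and middle index of the range still being searched, and the names `find_position`, `first`, `last` and `mid` are made up for readability.

```python
def find_position(items, key):
    """Return the 1-based index of `key` in the sorted list `items`,
    or the 1-based position where `key` would have to be inserted."""
    first = 0                 # f: lowest index still in play (0-based)
    last = len(items) - 1     # l: highest index still in play (0-based)
    while first <= last:
        mid = first + (last - first) // 2   # m: floor of the midpoint
        if key == items[mid]:
            return mid + 1    # found: convert to a 1-based index
        elif key < items[mid]:
            last = mid - 1    # continue in the lower half
        else:
            first = mid + 1   # continue in the upper half
    # Not found: first is the 0-based slot where key belongs.
    return first + 1
```

The sketch returns `first + 1` after the loop instead of recomputing the midpoint, because the value of `f + (l - f) / 2` when l = f - 1 depends on whether the language floors or truncates negative division. An empty list (n = 0) skips the loop entirely and gives position 1.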
I have items with ID 1, 3, 4, 5, 6, 7. Now I have data like following.
There is an offerId for each row. The array of Ids is a combination of item IDs. Discount is the value for that offerId.
offerId : Array of Ids : Discount
o1 : [1] : 45
o2 : [1 3 4] : 100
o3 : [3 5] : 55
o4 : [5] : 40
o5 : [6] : 30
o6 : [6 7] : 20
Now I have to select all the offerIds which give me best combination of Ids i.e. maximum total discount.
For example in above case : possible results can be:
[o2, o4, o5] maximum discount is 170(100 + 40 + 30).
Note: the resulting offerIds should be such that Ids don't repeat. For example, for o2, o4, o5 the ids are [1,3,4], [5], [6], which are all distinct.
Another combination can be:
o1, o3, o6, for which the ids are [1], [3,5], [6,7]. However, the total is 120 (45+55+20), which is less than 170 as in the previous case.
I need an algorithm/code which will help me identify the combination of offerIds that gives the maximum discount, considering that the offers should contain distinct Ids.
NOTE: I am writing my code in Go, but solutions/logic in any language will be helpful.
NOTE: I hope I have explained my requirement properly. Please comment if any extra information is required. Thanks.
Here is a dynamic programming solution which, for every possible subset of IDs, finds the combination of offers for which the discount is maximum possible.
This will be pseudocode.
Let our offers be structures with fields offerNumber, setOfItems and discount.
For the purposes of implementation, we first renumber the possible items by integers from zero to the number of different possible items (say k) minus one.
After that, we can represent setOfItems by a binary number of length k.
For example, if k = 6 and setOfItems = 101110 in binary, this set includes items 5, 3, 2 and 1 and excludes items 4 and 0, since bits 5, 3, 2 and 1 are ones and bits 4 and 0 are zeroes.
Now let f[s] be the best discount we can get using exactly set s of items.
Here, s can be any integer between 0 and 2^k - 1, representing one of the 2^k possible subsets.
Furthermore, let p[s] be the list of offers which together allow us to get discount f[s] for the set of items s.
The algorithm goes as follows.
initialize f[0] to zero, p[0] to empty list
initialize f[>0] to minus infinity
initialize bestF to 0, bestP to empty list
for each s from 0 to 2^k - 1:
    for each o in offers:
        if s & o.setOfItems == o.setOfItems: // o.setOfItems is a subset of s
            if f[s] < f[s - o.setOfItems] + o.discount: // minus is set subtraction
                f[s] = f[s - o.setOfItems] + o.discount
                p[s] = p[s - o.setOfItems] append o.offerNumber
    if bestF < f[s]:
        bestF = f[s]
        bestP = p[s]
After that, bestF is the best possible discount, and bestP is the list of offers which get us that discount.
The complexity is O(|offers| * 2^k) where k is the total number of items.
Here is another implementation which is asymptotically the same, but might be faster in practice when most subsets are unreachable.
It is "forward" instead of "backward" dynamic programming.
initialize f[0] to zero, p[0] to empty list
initialize f[>0] to -1
initialize bestF to 0, bestP to empty list
for each s from 0 to 2^k - 1:
    if f[s] >= 0: // only for reachable s
        if bestF < f[s]:
            bestF = f[s]
            bestP = p[s]
        for each o in offers:
            if s & o.setOfItems == 0: // s and o.setOfItems don't intersect
                if f[s + o.setOfItems] < f[s] + o.discount: // plus is set addition
                    f[s + o.setOfItems] = f[s] + o.discount
                    p[s + o.setOfItems] = p[s] append o.offerNumber
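The "forward" variant could be written out roughly as below. This is a sketch in Python, not the answerer's code: the function name `best_offers` and the tuple layout `(offer name, item ids, discount)` are assumptions made for the example, and discounts are assumed to be non-negative so that -1 can mark unreachable subsets.

```python
def best_offers(offers):
    """offers: list of (name, item_ids, discount). Returns (best discount, offer names)."""
    # Renumber the item IDs to bit positions 0 .. k-1.
    items = sorted({item for _, ids, _ in offers for item in ids})
    bit_of = {item: bit for bit, item in enumerate(items)}
    masks = []
    for _, ids, _ in offers:
        mask = 0
        for item in ids:
            mask |= 1 << bit_of[item]
        masks.append(mask)

    size = 1 << len(items)
    f = [-1] * size      # f[s]: best discount using exactly item set s, -1 if unreachable
    p = [None] * size    # p[s]: offers that achieve f[s]
    f[0], p[0] = 0, []
    best_f, best_p = 0, []
    for s in range(size):
        if f[s] < 0:
            continue  # only expand reachable subsets
        if best_f < f[s]:
            best_f, best_p = f[s], p[s]
        for (name, _, discount), mask in zip(offers, masks):
            # The offer is usable only if it shares no item with s.
            if (s & mask) == 0 and f[s | mask] < f[s] + discount:
                f[s | mask] = f[s] + discount
                p[s | mask] = p[s] + [name]
    return best_f, best_p
```

With the six offers from the question, this returns a total of 170 using o2, o4 and o5. The tables have 2^k entries, so the approach is only practical while the number of distinct items stays small (roughly 20 to 25).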
A sequence of integers is a one-sequence if the difference between any two consecutive numbers in the sequence is -1 or 1 and its first element is 0.
More precisely: a1, a2, ..., an is a one-sequence if:
For any k (1 ≤ k < n): |a[k] - a[k+1]|=1,
a[1]=0
Given n and s, the sum of all elements in a, we need to construct a one-sequence with the given parameters.
For example, if n=8 and s=4, then one such sequence is [0 1 2 1 0 -1 0 1].
Note that if no such sequence can be formed for the given n and s, we need to report that it is not possible. Otherwise we need to output any one such sequence. How can this be done? Please help.
Here's another take on aioobe's algorithm, with a formal proof of correctness.
Given a sequence a(k), define the difference sequence d(k) = a(k+1) - a(k) and observe that a(1) + a(2) + ... + a(n) = (n-1)d(1) + (n-2)d(2) + ... + 1d(n-1).
Theorem: for parameters n and s, there exists a length-n one-sequence summing to s if and only if (1) n(n-1)/2 mod 2 = s mod 2 and (2) |s| ≤ n(n-1)/2.
Proof: by induction on n. The base case, n = 1, is trivial. Inductively, since d(k) ∈ {±1}, we observe that both (1) and (2) are necessary conditions, as n-1 + n-2 + ... + 1 = n(n-1)/2 and -1 mod 2 = 1 mod 2. Conversely, assume both (1) and (2). If s ≥ 0, then construct a length-(n-1) sequence summing to s - (n-1). If s < 0, then construct a length-(n-1) sequence summing to s + (n-1). Both (1) and (2) are satisfied for these constructions (some tedious case analysis omitted), so it follows from the inductive hypothesis that they succeed. Increase/decrease the elements of this sequence by one depending on whether s ≥ 0/s < 0 and put 0 at the beginning.
Since the proof of the theorem is constructive, we can implement it in Python.
def oneseq(n, s):
    assert isinstance(n, int)
    assert isinstance(s, int)
    nchoose2 = n*(n-1)//2
    abss = abs(s)
    if n < 1 or abss%2 != nchoose2%2 or abss > nchoose2: return None
    a = [0]
    for k in range(n-1, 0, -1): # n-1, n-2, ..., 1
        d = 1 if s >= 0 else -1
        a.append(a[-1] + d) # a[-1] equivalent to a[len(a) - 1]
        s -= k*d
    return a
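As a sanity check on the theorem, the two conditions can be compared against a brute-force enumeration for small n. The sketch below is an addition, not part of the answer; the function names `reachable_sums` and `predicted_sums` are hypothetical.

```python
from itertools import product

def reachable_sums(n):
    """All sums attained by some length-n one-sequence, found by trying every +1/-1 pattern."""
    sums = set()
    for steps in product((1, -1), repeat=n - 1):
        value, total = 0, 0          # the sequence starts at 0
        for d in steps:
            value += d
            total += value
        sums.add(total)
    return sums

def predicted_sums(n):
    """Sums allowed by the theorem: same parity as n(n-1)/2 and |s| <= n(n-1)/2."""
    top = n * (n - 1) // 2
    return {s for s in range(-top, top + 1) if s % 2 == top % 2}
```

Comparing `reachable_sums(n)` with `predicted_sums(n)` for n up to about 15 is quick; beyond that the 2^(n-1) patterns make the enumeration slow.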
First, whether it's possible to solve or not can be decided up front. Since you go either +1 or -1 in each step, the elements go from even, to odd, to even, to odd... So the parity of the sum is fixed by n: s must have the same parity as 0+1+2+...+(n-1) = n(n-1)/2. The reachable range is simple as well: ±(1+2+3+...+(n-1)).
Second, if you draw the "decision tree" on whether to go up (+1) or down (-1) in each step, and draw the accumulated sum in each node, you'll see that you can do a kind of binary search to find the sum at one of the leaves in the tree.
You go +1 if you're about to undershoot, and go -1 if you're about to overshoot. The tricky part is to figure out if you're going to undershoot/overshoot. Your current "state" should be computed by
"what I have so far" + "what I'll get for free by staying at this level for the rest of the array".
What you have "for free by staying at this level" is stepsLeft * previousValue.
Here's some pseudo code.
solve(stepsLeft, prev, acc) {
    if stepsLeft == 0, return empty list // base case
    ifIStayHere = acc + prev*stepsLeft
    step = ifIStayHere > s ? prev-1 : prev+1
    return [step] concatenated by solve(stepsLeft-1, step, acc+step)
}
Note that this solution does not include the initial 0, so call it with stepsLeft = n-1.
As you can see, it's Θ(n) and it works for all cases I've tested. (Implemented it in Java.)
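The pseudocode above might be written iteratively in Python as follows. This is a sketch, not the Java implementation mentioned in the answer: the name `one_sequence` is hypothetical, the initial 0 is included in the result, and the final check that returns None for unreachable sums is an addition.

```python
def one_sequence(n, s):
    """Build a length-n one-sequence summing to s, or return None if the greedy walk misses s."""
    seq = [0]
    acc = 0    # sum of the elements chosen so far
    prev = 0   # last element chosen
    for steps_left in range(n - 1, 0, -1):
        # Sum we would end up with by staying at the current level from here on.
        if_i_stay_here = acc + prev * steps_left
        step = prev - 1 if if_i_stay_here > s else prev + 1
        seq.append(step)
        acc += step
        prev = step
    return seq if acc == s else None
```

For n = 8 and s = 4 this produces [0, 1, 0, 1, 0, 1, 0, 1], a different but equally valid answer than the one given in the question.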
Suppose that I have a sorted array, N, consisting of n elements. Now, given k, I need a highly efficient method to generate the k-combination that would be the middle combination (if all the k-combinations were lexicographically sorted).
Example:
N = {a,b,c,d,e} , k = 3
1: a,b,c
2: a,b,d
3: a,b,e
4: a,c,d
5: a,c,e
6: a,d,e
7: b,c,d
8: b,c,e
9: b,d,e
10: c,d,e
I need the algorithm to generate combination number 5.
The Wikipedia page on the combinatorial number system explains how this can be obtained (in a greedy way). However, since n is very large and I need to find the middle combination for all k's less than n, I need something much more efficient than that.
I'm hoping that since the combination of interest always lies in the middle, there is some sort of a straightforward method for finding it. For example, the first k-combination in the above list is always given by the first k elements in N, and similarly the last combination is always given by the last k elements. Is there such a way to find the middle combination as well?
http://en.wikipedia.org/wiki/Combinatorial_number_system
If you are looking for a way to obtain the K-indexes from the lexicographic index or rank of a unique combination, then your problem falls under the binomial coefficient. The binomial coefficient handles problems of choosing unique combinations in groups of K with a total of N items.
I have written a class in C# to handle common functions for working with the binomial coefficient. It performs the following tasks:
Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters.
Converts the K-indexes to the proper lexicographic index or rank of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle and is very efficient compared to iterating over the set.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. The technique used is also much faster than older iterative solutions.
Uses Mark Dominus's method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to use the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with several cases and there are no known bugs.
To read about this class and download the code, see Tablizing The Binomial Coeffieicent.
The following tested code will calculate the median lexicographic element for any N Choose K combination:
void TestMedianMethod()
{
    // This test driver tests out the GetMedianNChooseK method.
    GetMedianNChooseK(5, 3);  // 5 choose 3 case.
    GetMedianNChooseK(10, 3); // 10 choose 3 case.
    GetMedianNChooseK(10, 5); // 10 choose 5 case.
}

private void GetMedianNChooseK(int N, int K)
{
    // This method calculates the median lexicographic index and the k-indexes for that index.
    String S;
    // Create the bin coeff object required to get all
    // the combos for this N choose K combination.
    BinCoeff<int> BC = new BinCoeff<int>(N, K, false);
    int NumCombos = BinCoeff<int>.GetBinCoeff(N, K);
    // Calculate the median value, which in this case is the number of combos for this N
    // choose K case divided by 2.
    int MedianValue = NumCombos / 2;
    // The Kindexes array holds the indexes for the specified lexicographic element.
    int[] KIndexes = new int[K];
    // Get the k-indexes for this combination.
    BC.GetKIndexes(MedianValue, KIndexes);
    StringBuilder SB = new StringBuilder();
    for (int Loop = 0; Loop < K; Loop++)
    {
        SB.Append(KIndexes[Loop].ToString());
        if (Loop < K - 1)
            SB.Append(" ");
    }
    // Print out the information.
    S = N.ToString() + " choose " + K.ToString() + " case:\n";
    S += " Number of combos = " + NumCombos.ToString() + "\n";
    S += " Median Value = " + MedianValue.ToString() + "\n";
    S += " KIndexes = " + SB.ToString() + "\n\n";
    Console.WriteLine(S);
}
Output:
5 choose 3 case:
Number of combos = 10
Median Value = 5
KIndexes = 4 2 0
10 choose 3 case:
Number of combos = 120
Median Value = 60
KIndexes = 8 3 1
10 choose 5 case:
Number of combos = 252
Median Value = 126
KIndexes = 9 3 2 1 0
You should be able to port this class over fairly easily to the language of your choice. You probably will not have to port over the generic part of the class to accomplish your goals. Depending on the number of combinations you are working with, you might need to use a bigger word size than 4 byte ints.
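For readers who do not have the BinCoeff class at hand, the general idea of turning a lexicographic rank into a combination can be sketched in a few lines of Python. This is not a port of that class: the function name `unrank_combination` is hypothetical, ranks are 0-based, the returned positions are 0-based and in ascending order (a different convention from the KIndexes output above), and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def unrank_combination(n, k, rank):
    """Return the rank-th (0-based) k-combination of range(n) in lexicographic order."""
    combo = []
    candidate = 0
    for remaining in range(k, 0, -1):
        while True:
            # Number of combinations whose next element is `candidate`.
            block = comb(n - candidate - 1, remaining - 1)
            if rank < block:
                break
            rank -= block       # skip that whole block
            candidate += 1
        combo.append(candidate)
        candidate += 1
    return combo
```

For the example in the question, `unrank_combination(5, 3, (comb(5, 3) - 1) // 2)` returns [0, 2, 4], i.e. a, c, e, which is combination number 5. Python integers do not overflow, so the word-size concern above does not apply here.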