Generate unique combinations from multiple arrays/vectors - c++

I have 800 data files, and each file contains 8 lines of integers, e.g.
17,1,2,3,4,5,6,7,10,11,12,13,15,16,20,22,24,26,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
16,1,2,3,4,5,6,7,8,9,10,11,12,16,17,21,26,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
23,4,5,6,7,8,9,10,12,13,14,15,16,17,18,19,20,23,25,26,28,29,35,36,,,,,,,,,,,,,,,,,,,,,,,,,,
27,8,9,11,12,13,14,15,17,19,20,21,22,23,24,26,27,28,29,30,31,32,34,37,39,40,41,42,,,,,,,,,,,,,,,,,,,,,,
27,14,16,17,18,19,20,22,23,24,25,26,27,28,29,30,31,32,33,35,36,37,38,39,40,42,43,44,,,,,,,,,,,,,,,,,,,,,,
24,20,24,26,27,28,29,30,31,32,33,34,35,36,37,39,40,41,42,43,44,45,46,47,48,,,,,,,,,,,,,,,,,,,,,,,,,
16,33,34,35,36,37,38,39,41,42,43,44,45,46,47,48,49,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
14,35,37,38,39,40,41,42,43,44,45,46,47,48,49,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Each line has 50 elements. The 1st element of each line is the number count, i.e. the 17 on line 1 indicates there are 17 numbers in this line: 1,2,3,4,5,6,7,10,11,12,13,15,16,20,22,24,26. The numbers in each line are unique, in ascending order, and within the range 1~49.
My task is to generate a list of unique 8-number combinations from these 8 lines,
i.e. A,B,C,D,E,F,G,H
where A is from line 1, B from line 2 ... H from line 8.
24,517,914,624 (17*16*23*27*27*24*16*14) entries will be generated:
1,1,4,8,14,20,33,35
...
1,1,4,8,14,20,33,49
....
1,2,4,8,14,20,33,35
...
2,1,4,8,14,20,33,35
...
And then process the 24,517,914,624-entry list as follows:
i) remove entries with duplicate numbers, e.g. 1,1,4,8,14,20,33,35 and 1,1,4,8,14,20,33,49 will be removed
ii) sort the numbers in each entry in ascending order, e.g. 2,1,4,8,14,20,33,35 will become 1,2,4,8,14,20,33,35
iii) remove duplicated entries, e.g. 2,1,4,8,14,20,33,35 is the same as 1,2,4,8,14,20,33,35 after sorting, so only 1 entry of 1,2,4,8,14,20,33,35 will be kept
After the above processing, maybe around 10 million entries are left (which is the result I want).
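For concreteness, here is a small C++ helper sketching steps i) and ii) for a single entry (my own illustration, not code from the question): sort the 8 numbers, then reject the entry if any adjacent pair is equal.

#include <algorithm>
#include <array>

// Sort one entry in place and report whether it is free of duplicate numbers.
// After sorting, any duplicates sit next to each other, so adjacent_find suffices.
bool normalize(std::array<int, 8>& entry) {
    std::sort(entry.begin(), entry.end());
    return std::adjacent_find(entry.begin(), entry.end()) == entry.end();
}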
However, processing a 24,517,914,624-entry array is a nearly impossible task,
so I tried the following 2 approaches to tackle the problem (removing entries with duplicate numbers and sorting the numbers in each entry along the way):
1) Brute force approach: use 8 nested for loops to generate the combinations:
// LineArr[r][0] holds the count; the numbers of line r sit at indices 1..count
for (int i = 1; i <= LineArr[0][0]; i++) {
for (int j = 1; j <= LineArr[1][0]; j++) {
for (int k = 1; k <= LineArr[2][0]; k++) {
for (int l = 1; l <= LineArr[3][0]; l++) {
for (int m = 1; m <= LineArr[4][0]; m++) {
for (int n = 1; n <= LineArr[5][0]; n++) {
for (int o = 1; o <= LineArr[6][0]; o++) {
for (int p = 1; p <= LineArr[7][0]; p++) {
    MyRes[0] = LineArr[0][i];
    MyRes[1] = LineArr[1][j];
    MyRes[2] = LineArr[2][k];
    MyRes[3] = LineArr[3][l];
    MyRes[4] = LineArr[4][m];
    MyRes[5] = LineArr[5][n];
    MyRes[6] = LineArr[6][o];
    MyRes[7] = LineArr[7][p];
    // Sort the numbers of MyRes and discard the entry if it contains duplicates
    // store valid combinations in a temp array/vector
}}}}}}}}
// remove duplicate entries in the temp array/vector ('unique' the temp array)
2) Stepwise approach
Instead of generating the 8-number combinations at once, generate 2-number combinations from the first 2 lines, sort the numbers in each entry, remove entries with duplicate numbers, and unify the list;
the output will be something like this:
1,2
1,3
1,4
(1,1 and 2,2 will be removed, 4,1 will become 1,4, and duplicated entries are removed).
Then the above list is combined with line 3 to form 3-number combinations, again sorting each entry, removing entries with duplicated numbers, and unifying the list.
Apply the same to lines 4,5,6...8 to form the 4-, 5-, 6-...8-number combinations.
Since this is part of an automation project, AutoIt is used throughout the project (those 800 files
are from another 3rd-party software). I tried to implement the combination generation in AutoIt.
Technically, approach 1) generates the 24,517,914,624 entries, sorting the numbers in each entry right after generation and discarding entries with duplicate numbers in them.
This approach takes forever to run, since it involves billions of entries to test/sort, and its array size is far beyond AutoIt's array size limit (16 million). Therefore approach 1) can be discarded;
it is only suitable for (at most) 5-number combinations (e.g. 1,3,7,14,23).
For approach 2), I tried 2 variations:
i) store the result of each step in a temp array and use AutoIt's _ArrayUnique function to remove duplicate entries. This also takes forever to run!!
ii) instead of storing the result in a temp array, I make use of SQLite: the combinations generated in each step go into a single-column SQLite table created with PRIMARY KEY UNIQUE, then I SELECT the rows back into AutoIt for further processing.
Variation ii) eventually works; it takes 1 hr 20 min to handle 1 file (and I have 800 such files).
Now I plan to implement the combination generation in VC++ (VS 2017) and I have the following questions:
1) Apart from "brute force" and "stepwise", is there any other approach/algorithm to generate unique combinations from multiple arrays/vectors, from a performance point of view?
2) To sort the numbers in each entry and check for repeated numbers within an entry, I think std::sort and std::search/std::find will do the job; however, since there will be millions of entries to check, are there any other options, from a performance point of view?
3) To remove duplicate entries (unify the combination list, i.e. get unique combinations), should I use std::unique or still rely on SQLite? The array may be as large as 30~40 million entries and shrink to 10 million after std::sort and std::unique, or after SELECT from SQLite (I don't know which implementation performs better).
4) Is there any ready-made lib that can ease the task?
Thanks a lot.
Regds
LAM Chi-fung

Just found out about std::set, and its sort/unique behavior suits my need. I implemented the stepwise approach with it and the program runs like the wind. The only problem is that it easily runs out of memory after row 6, so I combined it with SQLite, i.e. after working on 6 rows, I discard the std::set and store the combined result in an SQLite table (a single-column table with PRIMARY KEY UNIQUE). This may not be a perfect solution, but it is workable.
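For what it's worth, here is a minimal C++ sketch of that stepwise std::set idea (my own reconstruction for illustration, not the actual project code): each step keeps a set of sorted, duplicate-free partial combinations and extends it with the next line.

#include <algorithm>
#include <set>
#include <vector>

// Extend every partial combination with every value of the next line,
// skipping values already used; inserting in sorted order makes the
// std::set deduplicate permutations of the same numbers automatically.
std::set<std::vector<int>> extend(const std::set<std::vector<int>>& partial,
                                  const std::vector<int>& line) {
    std::set<std::vector<int>> out;
    for (const auto& combo : partial) {
        for (int v : line) {
            if (std::find(combo.begin(), combo.end(), v) != combo.end())
                continue;                              // duplicate number: drop this entry
            std::vector<int> next = combo;
            next.insert(std::upper_bound(next.begin(), next.end(), v), v);
            out.insert(next);                          // the set keeps entries unique
        }
    }
    return out;
}

// Fold in the 8 lines one at a time, starting from line 0.
std::set<std::vector<int>> stepwise(const std::vector<std::vector<int>>& lines) {
    std::set<std::vector<int>> acc;
    for (int v : lines[0]) acc.insert({v});
    for (size_t i = 1; i < lines.size(); ++i)
        acc = extend(acc, lines[i]);
    return acc;
}

Each extend() call corresponds to one step; swapping the accumulated set out to SQLite after row 6, as described above, is what bounds the memory use.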

Related

Perfect sum problem with fixed subset size

I am looking for the least time-complex algorithm that would solve a variant of the perfect sum problem (initially: finding all variable-size subset combinations from an array [*] of integers of size n that sum to a specific number x) where the subset combination size is a fixed size k, returning the possible combinations without direct duplicates and without indirect duplicates (a combination containing exactly the same elements as another, in a different order).
I'm aware this problem is NP-hard, so I am not expecting a perfect general solution, but something that could at least run in a reasonable time in my case, with n close to 1000 and k around 10.
Things I have tried so far:
Finding a combination, then doing successive modifications on it and its modifications
Let's assume I have an array such as:
s = [1,2,3,3,4,5,6,9]
So I have n = 8, and I'd like x = 10 for k = 3
Thanks to some obscure method (brute force?) I found a subset [3,3,4].
From this subset I find other possible combinations by taking two elements out of it and replacing them with other elements of the same sum, i.e. (3, 3) can be replaced by (1, 5), since both have the same sum and the replacing numbers are not already in use. So I obtain another subset [1,5,4]; then I repeat the process for all the obtained subsets... indefinitely?
The main issue, as suggested here, is that it's hard to determine when it's done, and this method is rather chaotic. I imagined some variants of this method, but they really are works in progress.
Iterating through the set to list all k-long combinations that sum to x
Pretty self-explanatory. This is a naive method that does not work well in my case, since I have a pretty large n and a k that is not small enough to avoid a catastrophically big number of combinations (the magnitude of the number of combinations is 10^27!).
I experimented with several mechanisms for restricting the search area instead of stupidly iterating through all possibilities, but that is rather complicated and still a work in progress.
What would you suggest? (Snippets can be in any language, but I prefer C++)
[*] To clear up the doubt about whether or not the base collection can contain duplicates, I used the term "array" instead of "set" to be more precise. The collection can contain duplicate integers in my case, and quite a lot of them, with around 70 distinct integers for 1000 elements (counts rounded), for example.
With a reasonable sum limit this problem might be solved using an extension of the dynamic programming approach for the subset sum problem, or the coin change problem with a predetermined number of coins. Note that we can count all variants in pseudopolynomial time O(x*n), but the output size might grow exponentially, so generating all variants might be a problem.
Make a 3d array, list or vector with outer dimension x+1 (one slot per sum 0..x), for example A[][][]. Every element A[p] of this list contains a list of possible subsets with sum p.
We can walk through all elements (call the current element item) of the initial "set" (I noticed repeating elements in your example, so it is not a true set).
Now scan the A[] list from the last entry to the beginning (this trick helps to avoid repeated usage of the same item).
If A[i - item] contains subsets with size < k, we can add all these subsets to A[i], appending item.
After the full scan, A[x] will contain subsets of size k and less having sum x, and we can filter only those of size k.
Example output of my quickly-made Delphi program for the following data:
Lst := [1,2,3,3,4,5,6,7];
k := 3;
sum := 10;
3 3 4
2 3 5 //distinct 3's
2 3 5
1 4 5
1 3 6
1 3 6 //distinct 3's
1 2 7
To exclude variants with distinct repeated elements (if needed), we can use a non-first occurrence only for subsets already containing the first occurrence of item (so 3 3 4 will be valid while the second 2 3 5 won't be generated).
I literally translated my Delphi code into C++ (weird, I think :)
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    vector<vector<vector<int>>> A;
    vector<int> Lst = { 1, 2, 3, 3, 4, 5, 6, 7 };
    int k = 3;
    int sum = 10;
    A.push_back({ {0} }); // fictive element to make a non-empty variant for sum 0
    for (int i = 0; i < sum; i++)
        A.push_back({{}});
    for (int item : Lst) {
        for (int i = sum; i >= item; i--) {
            for (size_t j = 0; j < A[i - item].size(); j++)
                if (A[i - item][j].size() < size_t(k + 1) && // k+1 accounts for the fictive 0
                    A[i - item][j].size() > 0) {
                    vector<int> t = A[i - item][j];
                    t.push_back(item);
                    A[i].push_back(t); // add new variant including current item
                }
        }
    }
    // output needed variants
    for (size_t i = 0; i < A[sum].size(); i++)
        if (A[sum][i].size() == size_t(k + 1)) {
            for (size_t j = 1; j < A[sum][i].size(); j++) // excluding fictive 0
                cout << A[sum][i][j] << " ";
            cout << endl;
        }
}
Here is a complete solution in Python. Translation to C++ is left to the reader.
Like the usual subset sum, generating the doubly linked summary of the solutions is pseudo-polynomial: O(count_values * distinct_sums * depth_of_sums). However, actually iterating through the solutions can be exponential. Using generators the way I did avoids holding a lot of memory to produce that list, even if it can take a long time to run.
from collections import namedtuple

# This is a doubly linked list.
# (value, tail) will be one group of solutions.  (next_answer) is another.
SumPath = namedtuple('SumPath', 'value tail next_answer')

def fixed_sum_paths (array, target, count):
    # First find counts of values to handle duplications.
    value_repeats = {}
    for value in array:
        if value in value_repeats:
            value_repeats[value] += 1
        else:
            value_repeats[value] = 1

    # paths[depth][x] will be all subsets of size depth that sum to x.
    paths = [{} for i in range(count+1)]

    # First we add the empty set.
    paths[0][0] = SumPath(value=None, tail=None, next_answer=None)

    # Now we start adding values to it.
    for value, repeats in value_repeats.items():
        # Reversed depth avoids seeing paths we will find using this value.
        for depth in reversed(range(len(paths))):
            for result, path in paths[depth].items():
                for i in range(1, repeats+1):
                    if count < i + depth:
                        # Do not fill in too deep.
                        break
                    result += value
                    if result in paths[depth+i]:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=paths[depth+i][result]
                            )
                    else:
                        path = SumPath(
                            value=value,
                            tail=path,
                            next_answer=None
                            )
                    paths[depth+i][result] = path
                    # Subtle bug fix, a path for value, value
                    # should not lead to value, other_value because
                    # we already inserted that first.
                    path = SumPath(
                        value=value,
                        tail=path.tail,
                        next_answer=None
                        )
    return paths[count][target]

def path_iter(paths):
    if paths.value is None:
        # We are the tail
        yield []
    else:
        while paths is not None:
            value = paths.value
            for answer in path_iter(paths.tail):
                answer.append(value)
                yield answer
            paths = paths.next_answer

def fixed_sums (array, target, count):
    paths = fixed_sum_paths(array, target, count)
    return path_iter(paths)

for path in fixed_sums([1,2,3,3,4,5,6,9], 10, 3):
    print(path)
Incidentally for your example, here are the solutions:
[1, 3, 6]
[1, 4, 5]
[2, 3, 5]
[3, 3, 4]
You should first sort the so-called array. Second, you should determine whether the problem is actually solvable, to save time: take the k largest elements and check whether their sum is larger than or equal to the value x. If it is smaller, you are done, it is not possible to reach x at all. If it is exactly equal, you are also done, there are no other combinations. O(n) feels nice, doesn't it? If it is larger, you have a lot of work to do. You need to store all the combinations in a separate array. Go ahead and replace the smallest of the k numbers with the smallest element in the array; if the sum is still larger than x, do the same for the second and third, and so on, until you get something smaller than x. Once the sum drops below x, start increasing the value at the last position you stopped at until you hit x; once you hit x, that is your combination. Then you can take the previous element: if you had 1,1,5,6 in your set, you can grab the 1 as well, add it to your smallest element 5 to get 6, and then check whether you can write this 6 as a combination of two values, stopping once you hit the target. Then repeat for the others. The problem can be solved in O(n!) time in the worst case. As for those 10^27 combinations: at 3 bits of header and 8 bits for each integer, you would need about 9.8765*10^25 terabytes just to store that colossal array, more memory than a supercomputer. You should worry about whether your computer can even store this monster rather than whether you can solve the problem; that many combinations would crash your computer even if you found a quadratic solution, and quadratic is a long way off from O(n!).
A brute force method using recursion might look like this...
For example, given variables set, x, k, the following pseudo code might work:
setSumStructure find(int[] set, int x, int k, int setIdx)
{
    int sz = set.length - setIdx;
    if (sz < k) return null; // not enough elements left to pick k of them
    if (sz == k) check whether set[setIdx] + ... + set[set.length - 1] == x; if it does, return the set together with the sum, else return null;
    for (int i = setIdx; i < set.length - (k - 1); i++)
        filter(find(set, x - set[i], k - 1, i + 1));
    return filteredSets;
}
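For illustration, here is a hedged C++ rendering of that recursion (sorted input assumed; the names and the choice to collect results into a vector are mine, not the answer's):

#include <iostream>
#include <vector>

// Pick elements left to right so each combination is emitted exactly once;
// equal values are skipped at the same depth to avoid duplicate combinations.
void findSubsets(const std::vector<int>& set, int x, int k, int start,
                 std::vector<int>& current, std::vector<std::vector<int>>& out) {
    if (k == 0) {
        if (x == 0) out.push_back(current);              // found a size-k subset summing to x
        return;
    }
    for (int i = start; i <= (int)set.size() - k; ++i) {
        if (i > start && set[i] == set[i - 1]) continue; // skip duplicate values at this depth
        if (set[i] > x) break;                           // sorted input: nothing later can fit
        current.push_back(set[i]);
        findSubsets(set, x - set[i], k - 1, i + 1, current, out);
        current.pop_back();
    }
}

int main() {
    std::vector<int> s = {1, 2, 3, 3, 4, 5, 6, 9};       // already sorted
    std::vector<int> cur;
    std::vector<std::vector<int>> res;
    findSubsets(s, 10, 3, 0, cur, res);
    for (const auto& c : res) {                          // prints 1 3 6 / 1 4 5 / 2 3 5 / 3 3 4
        for (int v : c) std::cout << v << ' ';
        std::cout << '\n';
    }
}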

Faster way of searching array of sets

I have an array containing 100,000 sets. Each set contains natural numbers below 1,000,000. I have to find the number of ordered pairs {m, n}, where 0 < m < 1,000,000, 0 < n < 1,000,000 and m != n, which do not exist together in any of the 100,000 sets. A naive method of searching through all the sets leads to 10^5 * (10^6 choose 2) searches.
For example I have 2 sets set1 = {1,2,4} set2 = {1,3}. All possible ordered pairs of numbers below 5 are {1,2}, {1,3}, {1,4}, {2,3}, {2,4} and {3,4}. The ordered pairs of numbers below 5 which do not exist together in set 1 are {1,3},{2,3} and {3,4}. The ordered pairs below 5 missing in set 2 are {1,2},{1,4},{2,3},{2,4} and {3,4}. The ordered pairs which do not exist together in both the sets are {2,3} and {3,4}. So the count of number of ordered pairs missing is 2.
Can anybody point me to a clever way of organizing my data structure so that finding the number of missing pairs is faster? I apologize in advance if this question has been asked before.
Update:
Here is some information about the structure of my data set.
The number of elements in each set varies from 2 to 500,000. The median number of elements is around 10,000. The distribution peaks around 10,000 and tapers down in both directions. The union of the elements of the 100,000 sets is close to 1,000,000.
If you are looking for combinations across sets, there is a way to meaningfully condense your dataset, as shown in frenzykryger's answer. However, from your examples, what you're looking for is the number of combinations available within each set, meaning each set contains irreducible information. Additionally, you can't use combinatorics to simply obtain the number of combinations from each set either; you ultimately want to deduplicate combinations across all sets, so the actual combinations matter.
Knowing all this, it is difficult to think of any major breakthroughs you could make. Let's say you have i sets and a maximum of k items in each set. The naive approach would be:
If your sets are typically dense (i.e. contain most of the numbers between 1 and 1,000,000), replace them with the complement of the set instead
Create a set of 2 tuples (use a set structure that ensures insertion is idempotent)
For each set O(i):
Evaluate all combinations and insert into set of combinations: O(k choose 2)
The worst case complexity for this isn't great, but assuming you have scenarios where a set either contains most of the numbers between 0 and 1,000,000, or almost none of them, you should see a big improvement in performance.
Another approach would be to go ahead and use combinatorics to count the number of combinations from each set, then use some efficient approach to find the number of duplicate combinations among sets. I'm not aware of such an approach, but it is possible it exists.
First let's solve the simpler task of counting the numbers not present in any of your sets. This task can be reworded in a simpler form: instead of 100,000 sets you can think about 1 set which contains all your numbers. Then the number of elements not present in this set is x = 1000000 - len(set). Now you can use this number x to count the number of combinations. With repetitions: x * x; without repetitions: x * (x - 1). So the bottom line of my answer is to put all your numbers in one big set and use its length to find the number of combinations using combinatorics.
Update
So above we have a way to find the number of combinations where each element of the combination is not in any of the sets. But the question was to find the number of combinations where the combination as a whole is not present in any of the sets.
Let's try to solve a simpler problem first:
your sets have all numbers in them, none missing
each number is present in exactly one set, no duplicates across sets
How would you construct such combinations over such sets? You would simply pick two elements from different sets, and the resulting combination would not be in any of the sets. The number of such combinations can be counted using the following code (it accepts the sizes of the sets):
long long count_combinations(vector<int>& buckets) {
    long long result = 0;  // long long: the products easily overflow int at this scale
    for (size_t i = 0; i < buckets.size(); ++i) {
        for (size_t j = i + 1; j < buckets.size(); ++j) {
            // one element from bucket i, one from bucket j
            result += (long long)buckets[i] * buckets[j];
        }
    }
    return result;
}
Now let's imagine that some numbers are missing. Then we can just add an additional set with those missing numbers to our sets (as a separate set). But we also need to account for the fact that, given n missing numbers, there would be n * (n-1) combinations constructed using only those missing numbers. So the following code produces the total number of combinations, with the missing numbers appended to the bucket sizes before the call:
long long missing_numbers = upper_bound - all_numbers.size() - 1;
long long missing_combinations = missing_numbers * (missing_numbers - 1);
buckets.push_back(missing_numbers); // the missing numbers act as one more set
return missing_combinations + count_combinations(buckets);
Now let's imagine we have a duplicate across two sets: {a, b, c}, {a, d}.
What types of errors will it introduce? The following pairs: {a, a} - a repetition, {a, d} - a combination which is present in the second set.
So how to treat such duplicates? We need to eliminate them completely from all sets. Even a single instance of a duplicate will produce a combination present in some set, because we can just pick any element from the set where the duplicate was removed and produce such a combination (in my example - if we keep a in the first set, we can pick d from the second to produce {a, d}; if we keep a in the second set, we can pick b or c from the first to produce {a, b} and {a, c}). So duplicates shall be removed.
Update
However we can't simply remove all duplicates; consider this counterexample:
{a, b} {a, c} {d}. If we simply remove a we will get {b} {c} {d} and lose the information about the non-existing combination {a, d}. Consider another counterexample:
{a, b} {a, b, c} {b, d}. If we simply remove duplicates we will get {c} {d} and lose the information about {a, d}.
Also we can't simply apply such logic to pairs of sets; a simple counterexample for numbers < 3: {1, 2} {1} {2}. Here the number of missing combinations is 0, but we will incorrectly count {1, 2} in if we apply duplicate removal to pairs of sets. The bottom line is that I can't come up with a good technique to correctly handle duplicate elements across sets.
What you can do, depending on memory requirements, is take advantage of the ordering of Set, and iterate over the values smartly. Something like the code below (untested). You'll iterate over all of your sets, and then for each of your sets you'll iterate over their values. For each of these values, you'll check all of the values in the set after them. Our complexity is reduced to the number of sets times the square of their sizes. You can use a variety of methods to keep track of your found/unfound count, but using a set should be fine, since insertion is simply O(log(n)) where n is no more than 499999500000. In theory using a map of sets (mapping based on the first value) could be slightly faster, but in either case the cost is minimal.
#include <array>
#include <set>
#include <utility>

long long numMissing(const std::array<std::set<int>, 100000>& sets) {
    std::set<std::pair<int, int>> found;
    for (const auto& s : sets) {
        for (auto m = s.cbegin(); m != s.cend(); ++m) {
            auto n = m;
            for (++n; n != s.cend(); ++n) {
                found.emplace(*m, *n); // record every pair that co-occurs in some set
            }
        }
    }
    return 499999500000LL - found.size();
}
As an option you can build Bloom filter(s) over your sets.
Before checking against all the sets you can do a quick lookup in the Bloom filter; since it never produces false negatives, a negative lookup means the pair is definitely not present in any of your sets, and you can safely count it as missing.
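A minimal sketch of such a filter over pairs (entirely my own illustration; a real implementation would size the bit array and the number of probes from the expected pair count and the target false-positive rate):

#include <cstdint>
#include <vector>

struct PairBloom {
    std::vector<bool> bits = std::vector<bool>(1ull << 27); // 2^27 bits, ~16 MB
    static uint64_t mix(uint64_t x) {  // splitmix64-style bit mixer
        x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33; return x;
    }
    static uint64_t key(int m, int n) {
        return (uint64_t(uint32_t(m)) << 32) | uint32_t(n);
    }
    void add(int m, int n) {                 // two probes per pair
        uint64_t h = mix(key(m, n));
        bits[h & (bits.size() - 1)] = true;
        bits[(h >> 32) & (bits.size() - 1)] = true;
    }
    bool maybeContains(int m, int n) const { // false = definitely absent
        uint64_t h = mix(key(m, n));
        return bits[h & (bits.size() - 1)] && bits[(h >> 32) & (bits.size() - 1)];
    }
};

Populate it once with every pair that co-occurs in some set; afterwards a negative maybeContains() answers the membership question without touching the sets at all.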
Physically storing each possible pair would take too much memory. We have 100k sets and an average set has 10k numbers = 50M pairs = 400MB with int32 (and set<pair<int, int>> needs much more than 8 bytes per element).
My suggestion is based on two ideas:
don't store, only count the missing pairs
use interval set for compact storage and fast set operations (like boost interval set)
The algorithm is still quadratic on the number of elements in the sets but needs much less space.
Algorithm:
Create the union_set of the individual sets.
We also need a data structure, let's call it sets_for_number to answer this question: which sets contain a particular number? For the simplest case this could be unordered_map<int, vector<int>> (vector stores set indices 0..99999)
Also create the inverse sets for each set. Using interval sets this takes only 10k * 2 * sizeof(int) space per set on average.
// Sketch using boost::icl interval sets; the "..." initializations are left to the reader.
boost::dynamic_bitset<> union_set = ...;               // union of the individual sets (can be vector<bool>)
vector<interval_set<int>> inverse_sets = ...;          // numbers 1..999999 not contained in each set
unordered_map<int, vector<int>> sets_for_number = ...; // which sets contain a particular number

int64_t missing_count = 0;
for (int n = 1; n < 1000000; ++n) {
    // count the missing pairs whose first element is n
    if (!union_set.test(n)) {
        // n is in no set at all: every pair (n, m) with m > n is missing
        missing_count += (999999 - n);
    } else {
        // check which second elements are not present
        interval_set<int> missing_second_elements;
        missing_second_elements += interval<int>::right_open(n + 1, 1000000);
        // iterate over all sets containing n
        for (int set_idx : sets_for_number[n]) {
            // operator&= is in-place intersection
            missing_second_elements &= inverse_sets[set_idx];
        }
        // counting the number of pairs (n, m) where m is a number
        // that is not present in any of the sets containing n
        for (const auto& iv : missing_second_elements)
            missing_count += length(iv);
    }
}
If it is possible, have a set of all numbers and remove each number from it when you insert the number into your array of sets. This has O(n) space complexity.
Of course, if you don't want the high space complexity, maybe you can have a range vector: for each element in the vector, you have a pair of numbers which are the start/end of a range.

How do I print out vectors in different order every time

I'm trying to make two vectors, where vector1 (total1) contains some strings and vector2 (total2) contains some random unique numbers (between 0 and total1.size() - 1).
I want to make a program that prints out total1's strings, but in a different order every run. I don't want to use iterators or anything like that because I want to improve my problem-solving capacity.
Here is the specific function that crashes the program.
for (unsigned i = 0; i < total1.size();)
{
v1 = rand() % total1.size();
for (unsigned s = 0; s < total1.size(); ++s)
{
if (v1 == total2[s])
;
else
{
total2.push_back(v1);
++i;
}
}
}
I'm very grateful for any help that I can get!
Can I suggest a change of algorithm? Even if your current one is correctly implemented ("s", in your code, must go from 0 to total2.size(), not total1.size(), and if the element is found you must break and generate a new random number), it has the following drawback: assume vectors of 1,000,000 elements and that you are trying the last random number. You have one probability in 1,000,000 of finding a random number not previously used. That is a very small amount. The last but one number has a probability of 2 in 1,000,000, also small. In conclusion, your program will loop and spend lots of CPU resources.
Your best alternative is to follow #NathanOliver's suggestion and look at std::shuffle. The manual page shows the implementation algorithm, which is what you are looking for.
Another simple algorithm, with some pros and cons, is:
(1) init total2 with the sequence 0,1,2,...,n, where n is the size of total1 - 1.
(2) choose two random numbers, i1 and i2, in the range [0,n].
(3) swap elements i1 and i2 in total2.
(4) repeat from (2) a fixed number of times R.
This method lets you know a priori the number of steps needed and lets you control the level of "randomness" of the final vector (a bigger R is more random). However, it is far from good in its randomness quality.
Another method, better in its probabilistic distribution:
(1) fill a list L with the numbers 0,1,2,...,size of total1 - 1.
(2) choose a random number i between 0 and the size of list L - 1.
(3) store in total2 the i-th element of list L.
(4) remove this element from L.
(5) repeat from (2) until L is empty.
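A rough C++ sketch of this list-based method (illustrative names, assuming plain rand() as in the question):

#include <cstdlib>
#include <iterator>
#include <list>
#include <vector>

std::vector<int> random_permutation(int n) {
    std::list<int> L;
    for (int i = 0; i < n; ++i) L.push_back(i);   // step (1)
    std::vector<int> total2;
    while (!L.empty()) {
        int idx = rand() % L.size();              // step (2)
        auto it = L.begin();
        std::advance(it, idx);                    // O(n) walk to position idx
        total2.push_back(*it);                    // step (3)
        L.erase(it);                              // step (4)
    }
    return total2;
}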
If you just want to shuffle vector<string> total1, you can do this without using helping vector<int> total2. Here is an implementation based on Fisher–Yates shuffle.
int n = total1.size();
for (int i = n - 1; i >= 1; i--) {
    int j = rand() % (i + 1);          // j is uniform over [0, i]
    swap(total1[j], total1[i]);        // your prof might not allow use of swap :)
}
If you must use vector<int> total2 then shuffle it using the above algorithm. Next you can use it to create a new vector<string> result from total1 where result[i] = total1[total2[i]].
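That last step is just one loop (a sketch reusing the question's names):

std::vector<std::string> result(total1.size());
for (size_t i = 0; i < total1.size(); ++i)
    result[i] = total1[total2[i]];    // place each string at its shuffled position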

Iterating through all possible combinations

My objective is to iterate through all combinations of a given amount of 1's and 0's. Say, if I am given the number 5, what would be a sufficiently fast way to list
1110100100,
1011000101, etc.
(Each different combination of 5 1's and 5 0's)
I am attempting to avoid iterating through all possible permutations and checking whether 5 1's exist, as 2^n is much greater than (n choose n/2). Thanks.
UPDATE
The answer can be calculated efficiently (recurses 10 deep) with:
// call combo() to have calculate(b) called with every valid bitset combo exactly once
void combo(int index = 0, int numones = 0) {
    static bitset<10> b;
    if (index == 10) {
        calculate(b); // can't have too many zeroes or ones, so it must be 5 zeroes and 5 ones
    } else {
        if (index - numones < 5) { // ignore paths with too many zeroes
            b[index] = 0;
            combo(index + 1, numones);
        }
        if (numones < 5) {         // ignore paths with too many ones
            b[index] = 1;
            combo(index + 1, numones + 1);
        }
    }
}
(Above code is not tested)
You can transform the problem. If you fix the 1s (or vice versa) then it's simply a matter of where you put the 0s. For 5 1s, there are 5+1 bins, and you want to put 5 elements (0s) in the bins.
This can be solved with a recursion per bin and a loop for each bin (put 0...remaining elements in the bin, except for the last bin, where you have to put all the remaining elements); see the sketch below.
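A quick C++ sketch of this bins recursion (the function name and structure are my own illustration):

#include <iostream>
#include <string>

// Fix five 1s and decide how many 0s go before the first 1, between
// consecutive 1s, and after the last 1: 6 bins for 5 zeros.
void place(int bin, int zerosLeft, const std::string& acc) {
    const int BINS = 6;
    if (bin == BINS - 1) {                    // last bin takes all remaining zeros
        std::cout << acc + std::string(zerosLeft, '0') << '\n';
        return;
    }
    for (int z = 0; z <= zerosLeft; ++z)      // put z zeros in this bin, then a 1
        place(bin + 1, zerosLeft - z, acc + std::string(z, '0') + '1');
}

int main() {
    place(0, 5, "");  // prints all C(10,5) = 252 strings of five 1s and five 0s
    return 0;
}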
Another way to think about it is as a variant of the string permutation question - just build a vector of length 2n (e.g. 111000) and then use the same algorithm for string permutation to build the result.
Note that the naive algorithm will print duplicate results. However, it can easily be adapted to skip such duplicates by keeping, at each level of the recursion, a bool array of the values already placed at that position.

Fast way to pick randomly from a set, with each entry picked only once?

I'm working on a program to solve the n queens problem (the problem of putting n chess queens on an n x n chessboard such that none of them is able to capture any other using the standard chess queen's moves). I am using a heuristic algorithm, and it starts by placing one queen in each row and picking a column randomly out of the columns that are not already occupied. I feel that this step is an opportunity for optimization. Here is the code (in C++):
vector<int> colsleft;
//fills the vector sequentially with integer values
for (int c=0; c < size; c++)
colsleft.push_back(c);
for (int i=0; i < size; i++)
{
vector<int>::iterator randplace = colsleft.begin() + rand()%colsleft.size();
/* chboard is an integer array, with each entry representing a row
and holding the column position of the queen in that row */
chboard[i] = *randplace;
colsleft.erase(randplace);
}
If it is not clear from the code: I start by building a vector containing an integer for each column. Then, for each row, I pick a random entry in the vector, assign its value to that row's entry in chboard[]. I then remove that entry from the vector so it is not available for any other queens.
I'm curious about methods that could use arrays and pointers instead of a vector. Or <list>s? Is there a better way of filling the vector sequentially, other than the for loop? I would love to hear some suggestions!
The following should fulfill your needs:
#include <algorithm>
...
int randplace[size]; // variable-length arrays are non-standard C++; use std::vector if size is not a compile-time constant
for (int i = 0; i < size; i++)
    randplace[i] = i;
random_shuffle(randplace, randplace + size); // deprecated since C++14; prefer std::shuffle with a std::mt19937 in new code
You can do the same stuff with vectors, too, if you wish.
Source: http://gethelp.devx.com/techtips/cpp_pro/10min/10min1299.asp
Couple of random answers to some of your questions :):
As far as I know, there's no way to fill an array with consecutive values without iterating over it first. HOWEVER, if you really just need consecutive values, you do not need to fill the array - just use the cell indices as the values: a[0] is 0 and a[100] is 100 - when you get a random number, treat the number as the value.
You can implement the same with a list<> and remove cells you already hit, or...
For better performance, rather than removing cells, why not put an "already used" value in them (like -1) and check for that. Say you get a random number like 73, and a[73] contains -1, you just get a new random number.
Finally, describing item 3 reminded me of a re-hashing function. Perhaps you can implement your algorithm as a hash-table?
Your colsleft.erase(randplace); line is really inefficient, because erasing an element in the middle of the vector requires shifting all the ones after it. A more efficient approach that will satisfy your needs in this case is to simply swap the element with the one at index (size - i - 1) (the element whose index will be outside the range in the next iteration, so we "bring" that element into the middle, and swap the used one out).
And then we don't even need to bother deleting that element -- the end of the array will accumulate the "chosen" elements. And now we've basically implemented an in-place Knuth shuffle.
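A sketch of that swap trick applied to the question's loop (reusing the question's names; a sketch, not tested against the original program):

#include <cstdlib>
#include <utility>
#include <vector>

void fill_board(std::vector<int>& chboard, int size) {
    std::vector<int> colsleft(size);
    for (int c = 0; c < size; c++)
        colsleft[c] = c;                      // fill with 0, 1, ..., size-1
    for (int i = 0; i < size; i++) {
        int pick = rand() % (size - i);       // index into the not-yet-used prefix
        chboard[i] = colsleft[pick];
        // retire the pick: swap it past the end of the unused prefix
        std::swap(colsleft[pick], colsleft[size - i - 1]);
    }
}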