Say I have a range of numbers, say {2,3,4,5}, stored in this order in a std::vector v, and I want to list all possible subsets which end with 5 using the STL... that is:
2 3 4 5
2 3 5
2 4 5
3 4 5
2 5
3 5
4 5
5
(I hope I didn't forget any :) )
I tried using while(next_permutation(v.begin(),v.end())) but it didn't produce the wanted result :)
Does anyone have an idea?
PS: those who have done the archives of Google Code Jam 2010 may recognize this :)
Let's focus on the problem of printing all subsets. As you know, if you have a vector of n elements, you'll have 2^n possible subsets. It's no coincidence that an n-bit integer can store exactly 2^n distinct values. If you consider each integer as a vector of bits, then iterating over all possible values gives all possible subsets of bits. Well, we get subsets for free by iterating over integers!
Assuming the vector has no more than 32 elements (over 4 billion possible subsets!), this piece of code will print all subsets of vector v (excluding the empty one):
// Note: for exactly 32 elements, use a 64-bit mask; 1u << 32 overflows uint32_t.
for (uint32_t mask = 1; mask < (1u << v.size()); ++mask)
{
    std::vector<int>::const_iterator it = v.begin();
    for (uint32_t m = mask; m; m >>= 1, ++it)
    {
        if (m & 1) std::cout << *it << " ";
    }
    std::cout << std::endl;
}
I just create all possible bit masks for the size of the vector and iterate through every bit; if it's set, I print the appropriate element.
Now applying the rule of ending with some specific number is a piece of cake (check an additional condition while looping through masks). Preferably, if there is only one 5 in your vector, you could swap it to the end and print all subsets of the vector without its last element, appending 5 to each; a sketch of that variant follows.
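For illustration, a minimal self-contained sketch of that variant, assuming the single 5 has already been swapped to v.back():

#include <cstdint>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v = {2, 3, 4, 5}; // the 5 already swapped to the back
    // Enumerate subsets of the first v.size()-1 elements and print 5 after each,
    // so every line ends with 5. mask starts at 0 so that "5" alone is printed too.
    for (uint32_t mask = 0; mask < (1u << (v.size() - 1)); ++mask)
    {
        std::vector<int>::const_iterator it = v.begin();
        for (uint32_t m = mask; m; m >>= 1, ++it)
        {
            if (m & 1) std::cout << *it << " ";
        }
        std::cout << v.back() << std::endl;
    }
}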
I'm effectively using std::vector, const_iterator and std::cout, so you might think of it as being solved using the STL. If I come up with something more STLish, I'll let you know (well, but how? it's just iterating). You can use this function as a benchmark for your STL solutions though ;-)
EDIT: As pointed out by Jørgen Fogh, it doesn't solve your subset blues if you want to operate on large vectors. Actually, printing all subsets for 32 elements would generate terabytes of data. You could use a 64-bit integer if you feel limited by the constant 32, but you wouldn't even finish iterating through all the numbers. If your problem is just counting the desired subsets, you definitely need another approach, and the STL won't be much help either ;-)
As you can use any container, I would use std::set because it is closest to what we want to represent.
Now your task is to find all subsets ending with 5 so we take our initial set and remove 5 from it.
Now we want to have all subsets of this new set and append 5 to them at the end.
void subsets(std::set<std::set<int>> &sets, std::set<int> initial)
{
    if (initial.empty())
        return;
    sets.insert(initial); // save the current set in the set of sets
    std::set<int>::iterator i = initial.begin();
    for (; i != initial.end(); i++) // for each item in the set
    {
        std::set<int> new_set(initial);  // copy the set
        new_set.erase(new_set.find(*i)); // remove the current item
        subsets(sets, new_set);          // recursion ...
    }
}
sets is a set that contains all subsets you want.
initial is the set that you want to have the subsets of.
Finally call this with subsets(all_subsets, initial_list_without_5);
This should create the subsets, and finally you can append 5 to all of them. Btw, don't forget the empty set :)
Also note that creating and erasing all these sets is not very efficient. If you want it faster, the final set should hold pointers to sets, and new_set should be allocated dynamically...
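A minimal usage sketch, assuming the subsets function above is in scope (C++11; the variable names are illustrative):

#include <iostream>
#include <set>

int main()
{
    std::set<int> initial_list_without_5 = {2, 3, 4};
    std::set<std::set<int>> all_subsets;
    all_subsets.insert(std::set<int>()); // the empty set, so that "5" alone appears too
    subsets(all_subsets, initial_list_without_5);
    for (std::set<int> s : all_subsets)  // take a copy, append 5, print
    {
        s.insert(5);
        for (int x : s)
            std::cout << x << " ";
        std::cout << "\n";
    }
}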
tomasz describes a solution which is workable as long as n <= 32, although it would take a very long time to print 2^32 different subsets. Since the bounds for the large dataset are 2 <= n <= 500, generating all the subsets is definitely not the way to go. You need to come up with some clever way to avoid having to generate them. In fact, this is the whole point of the problem.
You can probably find solutions by googling the problem if you want. My hint is that you need to look at the structure of the sets and avoid generating them at all. You should only calculate how many there are.
Use std::next_permutation to create a vector of vectors, then use std::partition with a predicate to separate the vectors that end with 5 from those that don't. A sketch of the partition step is below.
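A minimal sketch of that partition step (the helper name and the subsets vector are illustrative, not from any library):

#include <algorithm>
#include <vector>

// Moves the vectors ending with 5 to the front; returns the partition point.
std::vector<std::vector<int>>::iterator
partition_ending_with_5(std::vector<std::vector<int>>& subsets)
{
    return std::partition(subsets.begin(), subsets.end(),
                          [](const std::vector<int>& s) {
                              return !s.empty() && s.back() == 5;
                          });
}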
The problem is:
You have to sort an array in ascending order (a permutation: numbers from 1 to N in a random order) using a series of swaps. Every swap has a price, and there are 5 types of prices. Write a program that sorts the given array for the smallest price.
There are two kinds of prices: priceByValue and priceByIndex. The prices are given in two N*N two-dimensional arrays, one per kind. An example of how to access prices:
You want to swap the 2nd and the 5th elements from the permutation with values of 4 and 7. The price for this swap will be priceByValue[4][7] + priceByIndex[2][5].
Indexes of all arrays are counted from 1 (not from 0) in order to have access to all of the prices (the permutation elements' values start from 1): priceByIndex[2][5] would actually be priceByIndex[1][4] in code. Moreover, the order of the indexes by which you access prices from the two-dimensional arrays doesn't matter: priceByIndex[i][j] = priceByIndex[j][i], and priceByIndex[i][i] is always equal to 0 (priceByValue is the same).
Types of prices:
Price[i][j] = 0;
Price[i][j] = random number between 1 and 4*N;
Price[i][j] = |i-j|*6;
Price[i][j] = sqrt(|i-j|) *sqrt(N)*15/4;
Price[i][j] = max(i,j)*3;
When you access prices by index i and j are the indexes of the elements you want to swap from the original array; when you access prices by value i and j are the values of the elements you want to swap from the original array. (And they are always counted from 1)
Things given:
N - an integer from 1 to 400, Mixed array, Type of priceByIndex, priceByIndex matrix, Type of priceByValue, priceByValue matrix. (all elements of a matrix are from the given type)
Things that should 'appear on the screen': the number of swaps, all swaps (only by index - "2 5" means that you have swapped the 2nd and 5th elements) and the price.
As I am still learning C++, I was wondering what is the most effective way to sort the array in order to try to find the sort with the smallest cost.
There might be a way to access the series of swaps that result in a sorted array and see which one has the smallest price, and I need to sort the array by swapping elements which are close by both value and index, but I don't know how to do this. I would be very grateful if someone could show me how to find the cheapest sort in code. Thank you in advance!
More: this problem might have no perfect solution; I am just trying to get a result close to the ideal.
Dynamic Programming!
Think of the problem as a graph. Each of the N-factorial permutations represents a graph vertex, and the allowed swaps are just arcs between vertices. The price-tag of a swap is just the weight on the arc.
When you look at the problem this way, it can be easily solved with Dijkstra's algorithm for finding the lowest-cost path through a graph from one vertex to another.
This is also called Single Pair Shortest Path.
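A minimal sketch of that idea. It is only practical for very small N, since the graph has N! vertices, and the cost function here is a stand-in that hard-codes the |i-j|*6 price type (a real version would combine priceByValue and priceByIndex):

#include <algorithm>
#include <cstdlib>
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

long long swap_cost(const std::vector<int>& p, int i, int j)
{
    (void)p; // a real version would also look up priceByValue[p[i]][p[j]]
    return std::abs(i - j) * 6; // stand-in: the |i-j|*6 price type
}

long long cheapest_sort(const std::vector<int>& start)
{
    std::vector<int> goal(start);
    std::sort(goal.begin(), goal.end());
    typedef std::pair<long long, std::vector<int>> State; // (cost, permutation)
    std::map<std::vector<int>, long long> dist;
    std::priority_queue<State, std::vector<State>, std::greater<State>> pq;
    dist[start] = 0;
    pq.push(State(0, start));
    while (!pq.empty())
    {
        State cur = pq.top();
        pq.pop();
        if (cur.second == goal)
            return cur.first; // the first time the goal is popped, its cost is optimal
        if (cur.first > dist[cur.second])
            continue; // stale queue entry
        for (int i = 0; i < (int)cur.second.size(); ++i)
            for (int j = i + 1; j < (int)cur.second.size(); ++j)
            {
                std::vector<int> next(cur.second); // neighbour: one allowed swap away
                std::swap(next[i], next[j]);
                long long nd = cur.first + swap_cost(cur.second, i, j);
                std::map<std::vector<int>, long long>::iterator it = dist.find(next);
                if (it == dist.end() || nd < it->second)
                {
                    dist[next] = nd;
                    pq.push(State(nd, next));
                }
            }
    }
    return -1; // not reached for a valid permutation
}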
You can use an algorithm for sorting an array in lexicographical order and modify it so that it fits your needs (you did not mention the sorting criteria, i.e. the desired result, e.g. least value first, ...). There are multiple algorithms available for this, e.g. quicksort, ...
A code example is at https://www.geeksforgeeks.org/lexicographic-permutations-of-string/
I have the following problem: I have a set of N elements (N being somewhere between several hundred and several thousand, let's say between 500 and 3000). Out of these elements, a small percentage will have some property "X", but the elements "gain" and "lose" this property in a semi-random fashion; so if I store them all in an array, and assign 1 to elements with property X and zero otherwise, this array of N elements will have n 1's and N-n zeros (n being small, in the 20-50 range).
The problem is the following: these elements change very frequently in a semi-random way (meaning that any element can flip from 0 to 1 and vice versa, but the process that controls this is somewhat stable, so the total number n fluctuates a bit but stays reasonably stable in the 20-50 range); and I frequently need all the "X" elements of the set (in other words, the indices of the array where the value is 1) in order to perform some task on them.
One simple and slow way to achieve this is to simply loop through the array and, if index k has value 1, perform the task; but this is kinda slow because well over 95% of all the elements have value 0. The solution would be to put all the 1s into a different structure (with n elements) and then loop through that structure instead of looping through all N elements. The question is: what's the best structure to use?
Elements will flip from 0 to 1 and vice versa randomly (from several different threads), so there's no order there of any sort (the time when an element flipped from 0 to 1 has nothing to do with the time it will flip back), and when I loop through them (from another thread), I do not need to loop in any particular order (in other words, I just need to get them all, but it's not relevant in which order).
Any suggestions for what would be the optimal structure for this? std::map comes to mind, but since the keys of a std::map are sorted (and I don't need that feature), the question is whether there is anything faster.
EDIT: To clarify, the array example is just one (slow) way to solve the problem. The essence of the problem is that out of one big set "S" with "N" elements there is a continuously changing subset "s" of "n" elements (with n much smaller than N), and I need to loop through that subset "s". Speed is of the essence, both for adding/removing elements to "s" and for looping through it. So while suggestions like having 2 arrays and moving elements between them would be fast from the iteration perspective, adding and removing elements from an array would be prohibitively slow. It sounds like some hash-based approach (like std::unordered_set) would work reasonably fast on both the iteration and addition/removal fronts; the question is whether there is something better than that. Reading the documentation on "unordered_map" and "unordered_set" doesn't really clarify how much faster addition/removal of elements is relative to std::map and std::set, nor how much slower the iteration through them would be. Another thing to keep in mind is that I don't need a generic solution that works best in all cases; I need one that works best when N is in the 500-3000 range and n is in the 20-50 range. Finally, speed is really of the essence; there are plenty of slow ways of doing it, so I'm looking for the fastest way.
Since order doesn't appear to be important, you can use a single array and keep the elements with property X at the front. You will also need an index or iterator to the point in the array that is the transition from X set to unset.
To set X, increment the index/iterator and swap that element with the one you want to change.
To unset X, do the opposite: decrement the index/iterator and swap that element with the one you want to change.
Naturally with multiple threads you will need some sort of mutex to protect the array and index.
Edit: to keep a half-open range, as iterators are normally used, you should reverse the order of the operations above: swap, then increment/decrement. If you keep an index instead of an iterator, then the index does double duty as the count of the number of X elements.
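A minimal sketch of that bookkeeping, assuming the caller knows each element's current position (a real version would track positions and guard everything with the mutex mentioned above):

#include <cstddef>
#include <utility>
#include <vector>

// elements[0..count) have property X, elements[count..N) do not.
struct FlagArray
{
    std::vector<int> elements; // all N elements
    std::size_t count;         // partition point == number of X elements

    FlagArray(std::size_t n) : elements(n), count(0) {}

    void set_x(std::size_t pos)   // pos >= count: move element into the X region
    {
        std::swap(elements[pos], elements[count]);
        ++count;
    }
    void unset_x(std::size_t pos) // pos < count: move element out of the X region
    {
        --count;
        std::swap(elements[pos], elements[count]);
    }
};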
N = 3000 isn't really much. If you use a single bit for each element, you have a structure smaller than 400 bytes. You can use std::bitset for that. If you use an unordered_set or a set, however, be mindful that you'll spend many more bytes for each of the n elements in your list: if you just allocate a pointer for each element on a 64-bit architecture, you'll use at least 8*50 = 400 bytes, much more than the bitset.
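A minimal sketch of the bitset variant (iteration still scans all N bits, but at this size that's a few hundred bytes of cache-friendly data):

#include <bitset>
#include <cstddef>
#include <iostream>

int main()
{
    std::bitset<3000> flags; // one bit per element, well under 400 bytes
    flags.set(7);
    flags.set(42);
    flags.reset(7);
    for (std::size_t i = 0; i < flags.size(); ++i)
        if (flags.test(i))
            std::cout << i << "\n"; // prints 42
}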
@geza: perhaps I misunderstood what you meant by two arrays; I assume you meant something like having one std::vector (or something similar) in which I store all elements with property X, and another where I store the rest? In reality, I don't care about the others, so I really need one array. Adding an element is obviously simple if I can just add it to the end of the array; now, correct me if I'm wrong here, but finding an element in that array is an O(n) operation (since the array is unsorted), and then removing it from the array requires shifting all subsequent elements by one place, so on average this requires n/2 operations. If I use a linked list instead of a vector, then deleting an element is faster, but finding it still takes O(n). That's what I meant when I said it would be prohibitively slow; if I misunderstood you, please do clarify.
It sounds like std::unordered_set or std::unordered_map would be fastest at adding/deleting elements, since it's O(1) to find an element, but it's unclear to me how fast one can loop through all the keys; the documentation clearly states that iteration through the keys of a std::unordered_map is slower than iteration through the keys of a std::map, but it's not quantified in any way just how slow "slower" is, and how fast "faster" is.
And finally, to repeat one more time, I'm not interested in a general solution; I'm interested in one for small "n". So if for example I have two solutions, one that's k_1*log(n) and a second that's k_2*n^2, the first one might be faster in principle (and for large n), but if k_1 >> k_2 (let's say k_1 = 1000, k_2 = 2 and n = 20), the second one can still be faster for relatively small "n" (1000*log(20) is still larger than 2*20^2). So even if addition/deletion in std::unordered_map can be done in constant time O(1), for small "n" it still matters whether that constant time is 1 nanosecond, 1 microsecond or 1 millisecond. So I'm really looking for suggestions that work best for small "n", not in the asymptotic limit of large "n".
An alternative approach (in my opinion worth it only if the number of elements increases at least tenfold) might be keeping a double index:
#include <algorithm>
#include <cstddef>
#include <vector>

class didx {
    // v == indexes[i] && v > 0  <==>  flagged[v-1] == i
    std::vector<ptrdiff_t> indexes;
    std::vector<ptrdiff_t> flagged;
public:
    didx(size_t size) : indexes(size) {}

    // loop through flagged items using iterators
    auto begin() { return flagged.begin(); }
    auto end() { return flagged.end(); }

    void flag(ptrdiff_t index) {
        if (!isflagged(index)) {
            flagged.push_back(index);
            indexes[index] = flagged.size();
        }
    }

    void unflag(ptrdiff_t index) {
        if (isflagged(index)) {
            // in "flagged" we swap the last element with the element to be
            // removed, and update "indexes" accordingly
            auto idx = indexes[index] - 1;
            auto last_element = flagged.back();
            std::swap(flagged.back(), flagged[idx]);
            std::swap(indexes[index], indexes[last_element]);
            // remove the element, which is now last in "flagged"
            flagged.pop_back();
            indexes[index] = 0;
        }
    }

    bool isflagged(ptrdiff_t index) {
        return indexes[index] > 0;
    }
};
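A short usage sketch, assuming the didx class above is in scope (it needs C++14 for the deduced begin()/end() return types):

#include <cstddef>
#include <iostream>

int main()
{
    didx d(3000); // all indexes start unflagged
    d.flag(42);
    d.flag(7);
    d.unflag(42);
    for (ptrdiff_t i : d)       // visits only the flagged indexes
        std::cout << i << "\n"; // prints 7
}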
I have a vector of 100 strings and another empty vector and I'm trying to fill the empty vector with every possible combination of n strings from the group of 100. (n = 1, 2, 3,...)
If n =1 then you get every unique vector composed of 1 string (or all 100 strings as vectors)
If n =2 then you get every unique vector composed of 2 strings(or 100^2 variations)
C++ is not my native language.
I have some attempts so far, and what I'd do in Zou_script (proprietary in-house) would be to assign each string a number and then permute through all possible combinations of those numbers, and then reference individual strings through Vector[] to create the vectors.
This seems slow and inelegant, and holding the string bank in memory could be bad if the string bank were much bigger.
I have used std::next_permutation but I'm having trouble extending it elegantly to sorting vectors composed of strings.
How can one populate a vector with all possible string combinations of length n in the C++ programming language? <- Question.
Might anyone be of assistance? If you're unsure or intimidated by the question, it's OK to go to the next one.
Update
I have managed to replicate the technique in C++, but next_permutation is significantly slower because it does not know that it only needs to compute the first n elements of each permutation.
Is there any way to manipulate next_permutation so it only calculates the first n elements of a vector permutation?
I asked this question, but I will do the best I can to give an answer. This is in fact a well-researched and often-asked question in C/C++.
"How to consider permutations of a group of N elements, r at a time?"
There are many ways to tackle the problem. One such way is to generate a vector of integers with which to map to a vector of your elements.
Using std::next_permutation you can generate a list of numbers (permutations of the integer vector) and truncate each to the number of items you are considering. This list can then be sorted using vector tools and duplicates removed. This gives you a list of all unique permutations of N integers taken r at a time, for mapping onto your element vector.
Then it can be as easy as reading the r numbers from your permutation integer list and using them as indexes into your element list to generate the permutations of your elements.
for (int k = 0; k < linecount_of_integer_permutation_list; k++)
{
    // insert code for reading line k of the integer permutation list
    // and assigning that permutation to the vector intvec
    for (int i = 0; i < r; i++)
    {
        file << element[intvec[i]]; // can put whatever delimiters you want/need
    }
    file << std::endl;
    intvec.clear();
    // remember to clear vectors, or other flags, depending on what you need
}
This is cumbersome and very slow.
https://howardhinnant.github.io/combinations.html
Has some very good ideas on how to handle this issue faster. The above will work for small sets; however, the jump from small to absolutely unmanageable is very quick with permutations.
Thank you for all your help. It is actually an interesting question, but apparently it isn't needed for many people's applications in programming.
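As an aside, a common idiom (not used in the answer above) avoids generating full permutations altogether: permute a selector vector with r ones and pick the selected items, one combination per arrangement:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Prints every combination of r items out of "items": C(N, r) lines in total.
void combinations(const std::vector<std::string>& items, std::size_t r)
{
    std::vector<int> select(items.size(), 0);
    std::fill(select.begin(), select.begin() + r, 1); // r ones, then zeros
    do {
        for (std::size_t i = 0; i < items.size(); ++i)
            if (select[i])
                std::cout << items[i] << " ";
        std::cout << "\n";
    } while (std::prev_permutation(select.begin(), select.end()));
}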
What would be the best algorithm for finding a number that occurs only once in a list in which all other numbers occur exactly twice?
So, in the list of integers (let's take it as an array), each integer appears exactly twice, except one. What is the best algorithm to find that one?
The fastest (O(n)) and most memory efficient (O(1)) way is with the XOR operation.
In C:
int arr[] = {3, 2, 5, 2, 1, 5, 3};
int num = 0, i;
for (i = 0; i < 7; i++)
    num ^= arr[i];
printf("%i\n", num);
This prints "1", which is the only one that occurs once.
This works because the first time you hit a number it marks the num variable with itself, and the second time it unmarks num with itself (more or less). The only one that remains unmarked is your non-duplicate.
By the way, you can expand on this idea to very quickly find two unique numbers among a list of duplicates.
Let's call the unique numbers a and b. First take the XOR of everything, as Kyle suggested. What we get is a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask -- in more detail: choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists -- one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b are in different buckets. We also know that each pair of duplicates is still in the same bucket. So we can now apply ye olde "XOR-em-all" trick to each bucket independently, and discover what a and b are completely.
Bam.
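A minimal sketch of that two-unique-numbers trick:

#include <cstdint>
#include <utility>
#include <vector>

// Returns the two values that appear exactly once; all others appear twice.
std::pair<uint32_t, uint32_t> two_unique(const std::vector<uint32_t>& v)
{
    uint32_t xor_all = 0;
    for (uint32_t val : v)
        xor_all ^= val;                    // xor_all == a ^ b, nonzero since a != b
    uint32_t x = xor_all & (~xor_all + 1); // lowest set bit: a and b differ here
    uint32_t a = 0, b = 0;
    for (uint32_t val : v) {               // split into two buckets...
        if (val & x) a ^= val;             // ...and XOR-em-all per bucket
        else         b ^= val;
    }
    return std::make_pair(a, b);
}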
O(N) time, O(N) memory
HT = hash table
HT.clear()
go over the list in order:
    for each item you see:
        if HT.contains(item) -> HT.remove(item)
        else -> HT.add(item)
at the end, the item in the HT is the item you are looking for.
Note (credit @Jared Updike): this scheme actually finds all items that occur an odd number of times.
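A minimal C++ rendering of that pseudocode, using std::unordered_set:

#include <iostream>
#include <unordered_set>
#include <vector>

int main()
{
    std::vector<int> arr = {3, 2, 5, 2, 1, 5, 3};
    std::unordered_set<int> ht;
    for (int x : arr) {
        if (!ht.erase(x)) // erase returns 0 if x wasn't present...
            ht.insert(x); // ...in which case we add it
    }
    std::cout << *ht.begin() << "\n"; // prints 1, the odd-count item
}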
Comment: I don't see how people can vote up solutions that give O(N log N) performance. In which universe is that "better"?
I am even more shocked that the accepted answer is an O(N log N) solution...
I do agree, however, that if memory is required to be constant, then O(N log N) would be (so far) the best solution.
Kyle's solution would obviously not catch situations where the data set does not follow the rules. If all numbers were in pairs, the algorithm would return zero, the exact same value as if zero were the only value with a single occurrence.
If there were multiple single-occurrence values or triples, the result would be erroneous as well.
Testing the data set might well end up with a more costly algorithm, either in memory or in time.
Csmba's solution does detect some erroneous data (none, or more than one, single-occurrence value), but not other kinds (quadruples). Regarding his solution, depending on the implementation of HT, either memory and/or time is more than O(n).
If we cannot be sure about the correctness of the input set, then sorting and counting, or using a hashtable that counts occurrences with the integer itself as the hash key, would both be feasible.
I would say that using a sorting algorithm and then going through the sorted list to find the number is a good way to do it.
And now the problem is finding "the best" sorting algorithm. There are a lot of sorting algorithms, each of them with its strong and weak points, so this is quite a complicated question. The Wikipedia entry seems like a nice source of info on that.
Implementation in Ruby:
a = [1,2,3,4,123,1,2,.........]
t = a.length - 1
for i in 0..t
  s = a.index(a[i]) + 1
  b = a[s..t]
  w = b.include? a[i]
  if w == false
    puts a[i]
  end
end
You need to specify what you mean by "best" - to some, speed is all that matters and would qualify an answer as "best" - for others, they might forgive a few hundred milliseconds if the solution was more readable.
"Best" is subjective unless you are more specific.
That said:
Iterate through the numbers; for each number, search the list for it, and when you reach a number whose search returns only one result, you are done.
Seems like the best you could do is to iterate through the list, and for every item either add it to a list of "seen" items or remove it from the "seen" list if it's already there; at the end, your list of "seen" items will include the singular element. This is O(n) with regard to time and O(n) with regard to space (in the worst case; it will be much better if the list is sorted).
The fact that they're integers doesn't really factor in, since there's nothing special you can do with adding them up... is there?
Question
I don't understand why the selected answer is "best" by any standard. O(N*lgN) > O(N), and it changes the list (or else creates a copy of it, which is still more expensive in space and time). Am I missing something?
Depends on how large/small/diverse the numbers are though. A radix sort might be applicable which would reduce the sorting time of the O(N log N) solution by a large degree.
The sorting method and the XOR method have the same time complexity. The XOR method is only O(n) if you assume that bitwise XOR of two strings is a constant time operation. This is equivalent to saying that the size of the integers in the array is bounded by a constant. In that case you can use Radix sort to sort the array in O(n).
If the numbers are not bounded, then bitwise XOR takes time O(k) where k is the length of the bit string, and the XOR method takes O(nk). Now again Radix sort will sort the array in time O(nk).
You could simply put the elements in the set into a hash until you find a collision. In ruby, this is a one-liner.
def find_dupe(array)
  h = {}
  array.detect { |e| h[e] || (h[e] = true; false) }
end
So, find_dupe([1,2,3,4,5,1]) would return 1.
This is actually a common "trick" interview question though. It is normally about a list of consecutive integers with one duplicate. In this case the interviewer is often looking for you to use the Gaussian sum of n-integers trick e.g. n*(n+1)/2 subtracted from the actual sum. The textbook answer is something like this.
def find_dupe_for_consecutive_integers(array)
  n = array.size - 1 # subtract one from array.size because of the dupe
  array.sum - n * (n + 1) / 2
end
I need some help writing an algorithm in C/C++ (although an example in any language would work). The purpose is a container/array which allows insertion at any index, but which minimises wasted buckets: if inserting an element at an index that is not close to any existing index would create a large run of empty buckets, the structure should avoid that.
Say you have a set of elements which need to be inserted at the following indexes:
14
54
56
57
12
8
6
5678
A contiguous array would produce a data structure something like this:
0
1
2
3
4
5
6 val
7
8 val
9
10
11
12 val
...
However, I'm looking for a solution that creates a new array when an index is not within x buckets of its nearest neighbour.
Something like this:
Array1
6 val
7
8 val
9
10
11
12 val
13
14 val
Array2
54 val
56 val
57 val
Array 3
5678 val
Then use some kind of index map to find the array a given index is in during a lookup. My question is: what kind of algorithm should I be looking at to group the indexes together during inserts (while still keeping a good space/time trade-off)?
Edit:
Thanks for the answers so far. The data I'm going to be looking at will contain one or two very large index ranges with no gaps, then one or two very large gaps, then possibly a couple of "straggling" single values. Also, the data needs to be sorted, so hash tables are out.
Why not just use a hashtable/dictionary? If you really need something this specific, the first thing that comes to mind for me is a B-tree. But there are probably much better solutions than that, too.
I believe you are looking for a hashmap, or more generally a map. You can use the STL-provided map class.
This sounds like exactly what you are looking for:
http://www.cplusplus.com/reference/stl/map/
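A minimal sketch of that suggestion, reusing the indexes from the question; std::map keeps its keys sorted, which matches the "data needs to be sorted" requirement in the edit above:

#include <iostream>
#include <map>
#include <string>

int main()
{
    std::map<int, std::string> sparse; // sorted sparse "array": index -> value
    for (int idx : {14, 54, 56, 57, 12, 8, 6, 5678})
        sparse[idx] = "val";
    // Iteration visits the indexes in sorted order and skips the gaps entirely.
    for (const std::pair<const int, std::string>& kv : sparse)
        std::cout << kv.first << " " << kv.second << "\n";
}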
Maybe what you want is a sparse vector? Try the Boost implementation.
You're looking either to use sparse arrays or some sort of hash, depending on circumstances. In general:
If you're going to eventually end up with long runs of filled buckets separated by large gaps, then you're better off with a sparse array, as they optimize memory use well in this situation.
If you're going to just end up with scattered entries in a huge sea of empty holes, you're better off with a hash.