C++ library method for intersection of two unordered_set - c++

I have two unordered_set and want the intersection of those. I can't find a library function to do that.
Essentially, what I want is this:
unordered_set<int> a = {1, 2, 3};
unordered_set<int> b = {2, 4, 1};
unordered_set<int> c = a.intersect(b); // Should be {1, 2}
I can do something like
unordered_set<int> c;
for (int element : a) {
    if (b.count(element) > 0) {
        c.insert(element);
    }
}
but I think there should be a more convenient way to do that? If there's not, can someone explain why? I know there is set_intersection, but that seems to operate on vectors only?
Thanks

In fact, a loop-based solution is the best thing you can use with std::unordered_set.
There is an algorithm called std::set_intersection, which finds the intersection of two sorted ranges:
Constructs a sorted range beginning at d_first consisting of elements
that are found in both sorted ranges [first1, last1) and [first2,
last2).
As you deal with std::unordered_set, you cannot apply this algorithm because there is no guaranteed order for the elements in std::unordered_set.
My advice is to stick with the loop: it says explicitly what you want to achieve and has linear complexity (O(N), where N is the number of elements in the unordered set you traverse with the for loop), which is the best complexity you can achieve.

There is a function in std called set_intersection. However, it requires sorted input ranges, so it cannot be applied to std::unordered_set directly. An alternative is to create two vectors from those sets, sort them, and use set_intersection with the vectors as input parameters. Because of the sorting this costs O(n log n), which is more than the hash-based loop.
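A minimal sketch of the vector-based route, assuming int elements. The helper name sorted_intersection is made up for illustration and is not a standard function.

```cpp
#include <algorithm>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hypothetical helper: intersect two unordered sets by way of sorted vectors.
std::vector<int> sorted_intersection(const std::unordered_set<int>& a,
                                     const std::unordered_set<int>& b) {
    // Copy each set into a vector and sort it, because
    // std::set_intersection requires both input ranges to be sorted.
    std::vector<int> va(a.begin(), a.end());
    std::vector<int> vb(b.begin(), b.end());
    std::sort(va.begin(), va.end());
    std::sort(vb.begin(), vb.end());

    // Collect the common elements; the output is sorted as well.
    std::vector<int> result;
    std::set_intersection(va.begin(), va.end(), vb.begin(), vb.end(),
                          std::back_inserter(result));
    return result;
}
```

The result comes back as a sorted vector rather than an unordered_set, which may or may not be what the caller wants.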

Related

unordered set intersection in C++

Here is my code. Any ideas on how to make it faster? My implementation is brute force: for each element in a, check whether it is also in b, and if so, put it in the result set c. Any smarter ideas are appreciated.
#include <cstdio>   // std::printf
#include <iostream>
#include <unordered_set>
int main() {
    std::unordered_set<int> a = {1,2,3,4,5};
    std::unordered_set<int> b = {3,4,5,6,7};
    std::unordered_set<int> c;
    // Keep every element of a that is also present in b
    for (auto i = a.begin(); i != a.end(); i++) {
        if (b.find(*i) != b.end()) c.insert(*i);
    }
    for (int v : c) {
        std::printf("%d \n", v);
    }
}
Asymptotically, your algorithm is as good as it can get.
In practice, I'd add a check to loop over the smaller of the two sets and do lookups in the larger one. Assuming reasonably evenly distributed hashes, a lookup in a std::unordered_set takes constant time. So this way, you'll be performing fewer such lookups.
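As a rough sketch of that suggestion, assuming int elements (the function name intersect is hypothetical, not something the standard library provides):

```cpp
#include <unordered_set>

// Hypothetical helper: iterate over the smaller set, probe the larger one.
std::unordered_set<int> intersect(const std::unordered_set<int>& a,
                                  const std::unordered_set<int>& b) {
    // Pick the roles by size so that fewer lookups are performed.
    const auto& smaller = (a.size() <= b.size()) ? a : b;
    const auto& larger  = (a.size() <= b.size()) ? b : a;

    std::unordered_set<int> result;
    for (int element : smaller) {
        if (larger.count(element) > 0) {
            result.insert(element);
        }
    }
    return result;
}
```

The number of lookups is then min(a.size(), b.size()) instead of always a.size().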
You can do it with std::copy_if() (capture b by reference so that the lambda does not copy the whole set):
std::copy_if(a.begin(), a.end(), std::inserter(c, c.begin()), [&b](const int element){return b.count(element) > 0;} );
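For reference, here is the same call wrapped in a self-contained function with the headers it needs. The wrapper name intersect_copy_if is only an illustration.

```cpp
#include <algorithm>      // std::copy_if
#include <iterator>       // std::inserter
#include <unordered_set>

// Hypothetical wrapper around the std::copy_if one-liner.
std::unordered_set<int> intersect_copy_if(const std::unordered_set<int>& a,
                                          const std::unordered_set<int>& b) {
    std::unordered_set<int> c;
    // Copy into c every element of a for which the lookup in b succeeds.
    std::copy_if(a.begin(), a.end(), std::inserter(c, c.begin()),
                 [&b](int element) { return b.count(element) > 0; });
    return c;
}
```

This does the same work as the explicit loop; which one reads better is largely a matter of taste.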
Your algorithm is as good as it gets for an unordered set. However, if you use a std::set (which uses a binary tree as storage) or, even better, a sorted std::vector, you can do better. The algorithm should be something like:
1. Get iterators to a.begin() and b.begin().
2. If the iterators point to equal elements, add the element to the intersection and increment both iterators.
3. Otherwise, increment the iterator pointing to the smaller value.
4. Go to 2, stopping once either iterator reaches the end of its container.
Both should be O(n) time but using a normal set should save you from calculating hashes or any performance degradation that arises from hash collisions.
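The steps above can be sketched as follows, assuming both inputs are vectors that are already sorted in ascending order. The name merge_intersection is hypothetical; std::set_intersection implements the same idea.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: two-pointer intersection of two sorted vectors.
std::vector<int> merge_intersection(const std::vector<int>& a,
                                    const std::vector<int>& b) {
    std::vector<int> result;
    std::size_t i = 0;
    std::size_t j = 0;
    // Stop as soon as either input is exhausted.
    while (i < a.size() && j < b.size()) {
        if (a[i] == b[j]) {
            // Common element: record it and advance both positions.
            result.push_back(a[i]);
            ++i;
            ++j;
        } else if (a[i] < b[j]) {
            ++i;  // a has the smaller value, so advance a
        } else {
            ++j;  // b has the smaller value, so advance b
        }
    }
    return result;
}
```

Each step advances at least one position, so the loop runs at most a.size() + b.size() times.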
Thanks Angew, why is your method faster? Could you elaborate a bit more?
Well, let me provide you some additional info...
It should be pretty clear that, whichever data structures you use, you will have to iterate over all elements in at least one of them, so you cannot get better than O(n), n being the number of elements in the data structure you iterate over. What matters then is how fast you can look up the elements in the other structure. With a hash set, which std::unordered_set actually is, this is O(1), at least if the number of collisions is small enough ("reasonably evenly distributed hashes"); the degenerate case would be all values having the same hash...
So far, you get O(n) * O(1) = O(n). But you still have a choice: O(n) or O(m), where m is the number of elements in the other set. In complexity terms this is the same, since the algorithm is linear either way. In practice, though, you can save some hash calculations and lookups if you iterate over the set with the lower number of elements...

How do I find the indices of a number along the length of vector C++?

Suppose I have a vector A = {1,1,1,0,0};
Is there any built-in function in the vector header to find all the indices at which a given value occurs in A?
suppose for 1, returning, { 0,1,2 }
for 0, {3,4}
If not, is there any time efficient way to do so?
Sort your vector and use std::equal_range to find the iterator range, then convert the iterators to indices. If you cannot sort the vector, create a vector of indices, sort it by the values the indices refer to, and copy the matching range from it to the result.
If the vector is or can be sorted, then you can use std::equal_range.
http://en.cppreference.com/w/cpp/algorithm/equal_range
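A sketch of the index-vector variant, assuming int values. The name indices_of is hypothetical, and std::lower_bound plus std::upper_bound are used in place of a single std::equal_range call so that each comparator only has to handle one argument order.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: positions of target in values, without reordering values.
std::vector<std::size_t> indices_of(const std::vector<int>& values, int target) {
    // Build the index vector 0, 1, 2, ... and sort it by the referenced values.
    // stable_sort keeps equal values in ascending index order.
    std::vector<std::size_t> order(values.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&values](std::size_t lhs, std::size_t rhs) {
                         return values[lhs] < values[rhs];
                     });

    // Binary-search the sorted indices for the block that refers to target.
    auto first = std::lower_bound(order.begin(), order.end(), target,
                                  [&values](std::size_t i, int v) {
                                      return values[i] < v;
                                  });
    auto last = std::upper_bound(order.begin(), order.end(), target,
                                 [&values](int v, std::size_t i) {
                                     return v < values[i];
                                 });
    return std::vector<std::size_t>(first, last);
}
```

For a single query a plain linear scan is cheaper than sorting. The sorted index vector pays off when it is built once and then queried many times, which this sketch does not do for the sake of brevity.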

What's the FASTEST way to compare vectors in C++?

What is the fastest way to see if two vectors are equal in c++?
I'm trying to find the fastest way to see if any row is equal to any column of a matrix, so element by element comparison and exiting the loop when not equal is not good enough.
Do not reinvent the wheel. You can use std::equal from <algorithm>.
It has the following complexity:
No applications of the corresponding predicate if InputIterator1 and InputIterator2 meet the requirements of random access iterators and last1 - first1 != last2 - first2. Otherwise, at most min(last1 - first1, last2 - first2) applications of the corresponding predicate.
That's what you were looking for.
See the documentation for further details.
As mentioned in the comments, there is a subtle difference between operator== and std::equal: the former doesn't work if types are different (as an example, std::vector<int> and std::vector<double>), the latter does work instead.
I tried to give the most generic solution.
If types are the same, of course operator== works like a charm, as mentioned by @Jarod42.
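A small sketch of the mixed-type case, assuming C++14 or later for the four-iterator overload of std::equal. The function name same_values is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: compare a vector<int> with a vector<double> element-wise.
bool same_values(const std::vector<int>& ints,
                 const std::vector<double>& doubles) {
    // The four-iterator overload also checks the lengths, so ranges of
    // different size compare unequal instead of reading out of bounds.
    return std::equal(ints.begin(), ints.end(),
                      doubles.begin(), doubles.end());
}
```

Writing ints == doubles would not compile here, because operator== for std::vector requires both operands to have the same element type.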
Simply use operator == of vector:
std::vector<int> v1{1, 2, 3, 4}, v2{1, 2, 3, 4};
bool are_equal = (v1 == v2);
std::vector overloads the equality operator (==), so you can compare two vectors directly, just as you would compare two integers.
To compare the rows and columns of a matrix as you described, use a loop and compare each row with each column using ==.
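One way this could look, assuming a square matrix stored as a vector of rows. The Matrix alias and the function name are hypothetical, and this is the straightforward approach rather than an optimized one.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;  // hypothetical alias: vector of rows

// Hypothetical helper: true if some row equals some column.
// Assumes m is square (every row has m.size() elements).
bool any_row_equals_any_column(const Matrix& m) {
    const std::size_t n = m.size();
    for (std::size_t col = 0; col < n; ++col) {
        // Materialize the column so it can be compared with operator==.
        std::vector<int> column(n);
        for (std::size_t row = 0; row < n; ++row) {
            column[row] = m[row][col];
        }
        for (std::size_t row = 0; row < n; ++row) {
            if (m[row] == column) {
                return true;  // stop at the first match
            }
        }
    }
    return false;
}
```

Each == still compares element by element and stops at the first mismatch, so this is at most n comparisons for each of the n² row and column pairs.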

Using for loop and find instead of set_intersection?

This question suggests using std::set_intersection to find the intersection of two arrays. Wouldn't using std::find work just as well?
int a[5] = {1, 2, 3, 4, 5};
int b[5] = {3, 4, 5, 6, 7};
for (int i=0; i<5; i++)
{
    if (std::find(b, b+5, a[i])!=b+5)
        std::cout << a[i] << std::endl;
}
Does std::set_intersection basically do the same thing? Or maybe it uses a more efficient algorithm? I think the complexity of above is O(n^2) if std::find takes O(n) time.
For all (or at least almost all) of your standard-library-complexity questions, a good reference can answer this for you.
In particular we get that std::find performs at most last - first applications of the predicate (operator== in this case), where first and last define the range to be searched.
We also get that std::set_intersection performs at most 2·(N1+N2-1) comparisons, where N1 = std::distance(first1, last1) and N2 = std::distance(first2, last2).
This means that your loop performs at most N1 * N2 applications of operator==, which in this case is 25. std::set_intersection would use at most 18.
So, the two methods would "work just as well" in the sense that they give the same answer, but std::set_intersection would take fewer comparisons to do it.
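For comparison, a sketch of the std::set_intersection version on the same two arrays. It relies on both arrays already being sorted, which is the case here. The wrapper function array_intersection is only there to make the snippet self-contained.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical wrapper: intersect the two sorted arrays from the question.
std::vector<int> array_intersection() {
    const int a[5] = {1, 2, 3, 4, 5};
    const int b[5] = {3, 4, 5, 6, 7};

    std::vector<int> common;
    // Single merge-like pass over both sorted ranges.
    std::set_intersection(std::begin(a), std::end(a),
                          std::begin(b), std::end(b),
                          std::back_inserter(common));
    return common;
}
```

If the inputs were not sorted, the std::find loop would still give the right answer, whereas std::set_intersection would not be usable without sorting first.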
std::set is an ordered collection. There are faster methods (linear) for such collections (think mergesort).
std::set::find takes O(lg n), it's a binary search. So using a for loop together with find takes O(n lg n). std::set_intersection takes linear time: align the two sets to find their intersection (similar to the merge operation of mergesort).

The best practice solution for finding the differences between two STL vectors

I have already two STL vectors. For instance:
vector<int> MyList;
MyList.push_back(10);
MyList.push_back(20);
MyList.push_back(30);
MyList.push_back(40);
MyList.push_back(50);
vector<int> MyListSub;
MyListSub.push_back(20);
MyListSub.push_back(30);
MyListSub.push_back(40);
And I want to get the number of elements which are in MyList and aren't in MyListSub.
For this instance, the result is "2".
You can use std::set_difference for this:
std::vector<int> diff;
std::set_difference(MyList.begin(), MyList.end(),
MyListSub.begin(), MyListSub.end(),
std::back_inserter(diff));
As @Jan points out, the vectors have to be sorted. If they are not, use std::sort to sort both of them first:
std::sort(MyList.begin(), MyList.end());
std::sort(MyListSub.begin(), MyListSub.end());
Alternatively you can consider storing your elements in an std::set in the first place, thus they will already be sorted.
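Putting the pieces together, a minimal sketch that returns the requested count. The function name count_missing is hypothetical, and the vectors are taken by value so the caller's data is not reordered.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: number of elements of all that are not in subset.
std::size_t count_missing(std::vector<int> all, std::vector<int> subset) {
    // std::set_difference requires both ranges to be sorted.
    std::sort(all.begin(), all.end());
    std::sort(subset.begin(), subset.end());

    std::vector<int> diff;
    std::set_difference(all.begin(), all.end(),
                        subset.begin(), subset.end(),
                        std::back_inserter(diff));
    return diff.size();
}
```

With the vectors from the question, the difference is {10, 50}, so the count is 2.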