What is the best way to verify if there are common members within multiple vectors?
The vectors aren't necessarily of equal size and they may contain custom data (such as structures containing two integers that represent a 2D coordinate).
For example:
vec1 = {(1,2); (3,1); (2,2)};
vec2 = {(3,4); (1,2)};
How to verify that both vectors have a common member?
Note that I am trying to avoid inefficient methods such as going through all elements and checking for equal data.
For non-trivial data sets, the most efficient method is probably to sort both vectors and then use the std::set_intersection function defined in <algorithm>, as follows:
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>
using namespace std;

typedef vector<pair<int, int>> tPointVector;

int main() {
    tPointVector vec1 {{1,2}, {3,1}, {2,2}};
    tPointVector vec2 {{3,4}, {1,2}};
    // set_intersection requires both ranges to be sorted
    std::sort(begin(vec1), end(vec1));
    std::sort(begin(vec2), end(vec2));
    tPointVector vec3;  // receives the common elements
    vec3.reserve(std::min(vec1.size(), vec2.size()));
    set_intersection(begin(vec1), end(vec1), begin(vec2), end(vec2), back_inserter(vec3));
    // vec3 now holds the single common element (1,2)
}
You may get better performance with a nonstandard algorithm if you do not need to know which elements are common, but only the number of common elements, because then you can avoid having to create new copies of the common elements.
In any case, it seems to me that starting by sorting both containers will give you the best performance for data sets with more than a few dozen elements.
Here's an attempt at writing an algorithm that just gives you the count of matching elements (untested):
auto it1 = begin(vec1);
auto it2 = begin(vec2);
const auto end1 = end(vec1);
const auto end2 = end(vec2);
sort(it1, end1);
sort(it2, end2);
size_t numCommonElements = 0;
while (it1 != end1 && it2 != end2) {
    bool oneIsSmaller = *it1 < *it2;
    if (oneIsSmaller) {
        // skip ahead in vec1 to the first element not less than *it2
        it1 = lower_bound(it1, end1, *it2);
    } else {
        bool twoIsSmaller = *it2 < *it1;
        if (twoIsSmaller) {
            // skip ahead in vec2 to the first element not less than *it1
            it2 = lower_bound(it2, end2, *it1);
        } else {
            // none of the elements is smaller than the other
            // so it's a match
            ++numCommonElements;
            ++it1;
            ++it2;
        }
    }
}
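Since the question only asks whether any common member exists, the same sorted-merge loop can return at the first match instead of counting. A minimal sketch, assuming the element type provides operator< (std::pair does); the function name has_common_member_sorted is hypothetical:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical helper: sorts copies of both vectors, then walks them in step
// and returns true as soon as a value present in both is found.
template <typename T>
bool has_common_member_sorted(std::vector<T> a, std::vector<T> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    auto it1 = a.begin();
    auto it2 = b.begin();
    while (it1 != a.end() && it2 != b.end()) {
        if (*it1 < *it2) {
            ++it1;            // advance the side holding the smaller value
        } else if (*it2 < *it1) {
            ++it2;
        } else {
            return true;      // neither is smaller, so the values are equal
        }
    }
    return false;
}
```

Taking the vectors by value costs one copy of each; if the caller may reorder its data, passing references and sorting in place avoids that.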
"Note that I am trying to avoid inefficient methods such as going through all elements and checking for equal data."
You need to go through all elements at least once; I assume you mean that you don't want to check every combination. Indeed, you don't want to do this:
for all elements in vec1, go through the entire vec2 to check if the element is there. That takes O(n*m) comparisons and won't be efficient if your vectors have a large number of elements.
If you prefer an (expected) linear-time solution and you don't mind using extra memory, here is what you can do:
You need a hash function to insert elements into an unordered_map or unordered_set.
See https://stackoverflow.com/a/13486174/2502814
#include <cstddef>        // std::size_t
#include <functional>     // std::hash
#include <iostream>       // std::cout
#include <unordered_set>  // std::unordered_set
#include <utility>        // std::pair
#include <vector>         // std::vector
using namespace std;

// Hash function for pair<int, int>. A named hasher is used because the standard
// only allows specializing std::hash for user-defined types, not for std::pair<int, int>.
struct PairHash {
    size_t operator()(const pair<int, int>& t) const {
        // unsigned arithmetic wraps around instead of overflowing like signed int would
        size_t combined = size_t(t.first) + size_t(6495227) * size_t(t.second);
        return std::hash<size_t>()(combined);
    }
};

int main () {
    vector<pair<int, int>> vec1 {{1,2}, {3,1}, {2,2}};
    vector<pair<int, int>> vec2 {{3,4}, {1,2}};
    // Copy all elements from vec2 into an unordered_set
    unordered_set<pair<int, int>, PairHash> in_vec2(vec2.begin(), vec2.end());
    // Traverse vec1 and check if elements are here
    for (auto& e : vec1)
        // Searching in an unordered_set is faster than going through all elements of vec2 when vec2 is big.
        if (in_vec2.find(e) != in_vec2.end())
            // Here are the elements in common:
            cout << "{" << e.first << "," << e.second << "} is in common!" << endl;
    return 0;
}
Output: {1,2} is in common!
You can either do that, or copy all elements of vec1 into an unordered_set, and then traverse vec2.
Depending on the sizes of vec1 and vec2, one solution might be faster than the other.
Keep in mind that picking the smaller vector to insert in the unordered_set also means you will use less extra memory.
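Putting those two remarks together, here is a sketch that hashes whichever vector is smaller and stops at the first common member. The Point struct, its PointHash hasher (the hash-combining formula is just one common choice) and the function name has_common_member are hypothetical stand-ins for the question's two-integer structure:

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical 2D point type standing in for "a structure containing two integers".
struct Point {
    int x;
    int y;
    bool operator==(const Point& other) const { return x == other.x && y == other.y; }
};

// Hypothetical hasher: combines the hashes of both coordinates.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t h1 = std::hash<int>()(p.x);
        std::size_t h2 = std::hash<int>()(p.y);
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// Inserts the smaller vector into a hash set and scans the larger one, returning at the
// first common member: expected O(n + m) time, O(min(n, m)) extra memory.
bool has_common_member(const std::vector<Point>& a, const std::vector<Point>& b) {
    const std::vector<Point>& small = a.size() <= b.size() ? a : b;
    const std::vector<Point>& large = a.size() <= b.size() ? b : a;
    std::unordered_set<Point, PointHash> seen(small.begin(), small.end());
    for (const Point& p : large) {
        if (seen.count(p) != 0) {
            return true;
        }
    }
    return false;
}
```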
I believe you would use a 2D tree (a k-d tree with k = 2) to search in 2 dimensions. An optimal algorithm for the problem you specified would fall under the class of geometric algorithms. Maybe this link is of use to you: http://www.cs.princeton.edu/courses/archive/fall05/cos226/lectures/geosearch.pdf .
The problem: I need to sort a vector of strings in an exact, specific order. Let's say we have a constant vector or an array with the exact order:
vector<string> correctOrder = {"Item3", "Item1", "Item5", "Item4", "Item2"};
Next, we have a dynamic incoming vector which will have the same items, but they may be mixed up and fewer in number.
vector<string> incommingVector = {"Item1", "Item5", "Item3"};
So I need to sort the incoming vector in the same order as the first vector, correctOrder, and the result must be:
vector<string> sortedVector = {"Item3", "Item1", "Item5"};
I think the correct order may be represented in a different way, but I can't figure out how.
Can someone help me please?
If the default (lexicographic) comparison is not enough, then the simplest thing you can do is to provide the sort function with a lambda that tells it which string comes first.
You can have an unordered_map<string, int> with the strings in your correctOrder vector as keys and their corresponding positions in that vector as values.
The cmp lambda then simply compares the values mapped to the strings in your incommingVector.
unordered_map<string, int> my_map;
// map each string to its position in correctOrder
for (size_t i = 0; i < correctOrder.size(); i++)
    my_map[correctOrder[i]] = i;
auto cmp = [&my_map](const string& s, const string& s1) {
    return my_map[s] < my_map[s1];
};
sort(incommingVector.begin(), incommingVector.end(), cmp);
You can create your own functor to sort your vector in the order given by a template vector, as in the code below:
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

struct MyComparator
{
    const std::vector<std::string> correctOrder{"Item1", "Item2", "Item3", "Item4", "Item5"};
    bool operator() (const std::string& first, const std::string& second) const
    {
        // compare the positions of the two strings in correctOrder
        auto firstitr = std::find(correctOrder.begin(), correctOrder.end(), first);
        auto seconditr = std::find(correctOrder.begin(), correctOrder.end(), second);
        return firstitr < seconditr;
    }
};

void printVector(const std::vector<std::string>& input)
{
    for (const auto& elem : input)
        std::cout << elem << " , ";
    std::cout << std::endl;
}

int main()
{
    std::vector<string> incomingVector = {"Item3", "Item5", "Item1"};
    std::cout << "vector before sort... " << std::endl;
    printVector(incomingVector);
    std::sort(incomingVector.begin(), incomingVector.end(), MyComparator());
    std::cout << "vector after sort...." << std::endl;
    printVector(incomingVector);
    return 0;
}
You can take advantage of std::unordered_map<std::string, int>, i.e., a hash table for mapping a string into an integer in constant time. You can use it for finding out the position that a given string occupies in your vector correctOrder in O(1), so that you can compare two strings that are in the vector incomming in constant time.
Consider the following function sort_incomming_vector():
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>
using Vector = std::vector<std::string>;
void sort_incomming_vector(const Vector& correctOrder /*N*/, Vector& incomming /*M*/)
{
    std::unordered_map<std::string, int> order;
    // populate the order hash table in O(N) time
    for (size_t i = 0; i < correctOrder.size(); ++i)
        order[correctOrder[i]] = i;
    // sort "incomming" in O(M*log M) time
    std::sort(incomming.begin(), incomming.end(),
              [&order](const auto& a, const auto& b) { // sorting criterion
                  return order[a] < order[b];
              });
}
The hash table order maps the strings into integers, and this resulting integer is used by the lambda (i.e., the sorting criterion) passed to the sorting algorithm, std::sort, to compare a pair of strings in the vector incomming, so that the sorting algorithm can permute them accordingly.
If correctOrder contains N elements, and incomming contains M elements, then the hash table can be initialised in O(N) time, and incomming can be sorted in O(M*log M) time. Therefore, the whole algorithm will run in O(N + M*log M) time.
If N is much larger than M, this solution is optimal, since the dominant term will be N, i.e., O(N + M*log M) ~ O(N).
You need to create a comparison function that returns the correct ordering and pass that to std::sort. To do that, you can write a reusable function that returns a lambda that compares the results of trying to std::find the two elements being compared. std::find returns iterators, and for random-access iterators (such as those of std::vector or a plain array) you can compare those with the < operator.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

std::vector<std::string> correctOrder = {"Item1", "Item2", "Item3", "Item4", "Item5"};
// Could be just std::string correctOrder[], or std::array<...> etc.

// Returns a sorter that orders elements based on the order given by the iterator pair
// (so it supports not just std::vector<string> but other random-access containers too).
template <typename ReferenceIter>
auto ordered_sorter(ReferenceIter ref_begin, ReferenceIter ref_end) {
    // Note: you can build an std::unordered_map<ReferenceIter::value_type, std::size_t> to
    // be more efficient and compare map.find(left)->second with
    // map.find(right)->second (after you make sure the find does not return a
    // one-past-the-end iterator).
    // Capture the iterators by value: [&] would leave the lambda holding dangling
    // references to ref_begin and ref_end once ordered_sorter returns.
    return [=](const auto& left, const auto& right) {
        return std::find(ref_begin, ref_end, left) < std::find(ref_begin, ref_end, right);
    };
}

int main() {
    using namespace std;
    vector<string> v{"Item3", "Item5", "Item1"};
    // Pass the ordered_sorter to std::sort
    std::sort(v.begin(), v.end(), ordered_sorter(std::begin(correctOrder), std::end(correctOrder)));
    for (const auto& s : v)
        std::cout << s << ", "; // "Item1, Item3, Item5, "
}
Note that this answer is less efficient with a large number of elements, but simpler than the solutions using an std::unordered_map<std::string, int> for lookup; a linear search is also probably faster for a small number of elements. Do your own benchmarking if performance matters.
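If you do want the unordered_map variant mentioned in the comment inside ordered_sorter, one way is to build the position table once and share it with the comparator through a std::shared_ptr, so that the copies std::sort makes of the comparator stay cheap. This is a sketch; the name make_order_comparator and the choice to sort unknown strings after all known ones are assumptions, not part of the answer above:

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: builds a string -> position table from the reference order once
// and returns a comparator that looks positions up in it (expected O(1) per lookup).
// Strings missing from the reference order compare equal to each other and sort last.
inline auto make_order_comparator(const std::vector<std::string>& reference) {
    auto position = std::make_shared<std::unordered_map<std::string, std::size_t>>();
    for (std::size_t i = 0; i < reference.size(); ++i) {
        position->emplace(reference[i], i);  // emplace keeps the first position of a duplicate
    }
    const std::size_t unknown = reference.size();
    // Copying the lambda only copies the shared_ptr, not the table.
    return [position, unknown](const std::string& left, const std::string& right) {
        auto l = position->find(left);
        auto r = position->find(right);
        std::size_t lpos = (l != position->end()) ? l->second : unknown;
        std::size_t rpos = (r != position->end()) ? r->second : unknown;
        return lpos < rpos;
    };
}
```

Usage mirrors the other answers: std::sort(incommingVector.begin(), incommingVector.end(), make_order_comparator(correctOrder));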
Edit: If you don't want the default comparison to be used, then you need to pass your custom compare method as the third parameter, as shown in the example in the linked reference.
Use std::sort and you are done:
#include <iostream>     // std::cout
#include <algorithm>    // std::sort
#include <vector>       // std::vector
#include <string>       // std::string
using namespace std;

int main () {
    vector<string> incommingVector = {"Item3", "Item5", "Item1"};
    // using default comparison (operator <):
    std::sort (incommingVector.begin(), incommingVector.end());
    // print out content:
    std::cout << "incommingVector contains:";
    for (std::vector<string>::iterator it=incommingVector.begin(); it!=incommingVector.end(); ++it)
        std::cout << ' ' << *it;
    std::cout << '\n';
    return 0;
}
incommingVector contains: Item1 Item3 Item5
Say I have a std::vector. Say the vectors contain numbers. Let's take this std::vector
std::sort with std::less<int>() will sort this into
How would I amend sort so that, while it is sorting, it also computes the count of numbers at each level? So, say, in addition to sorting, it would also compile the following dictionary [level is also int]:
std::map<level, int>
<1, 2>
<2, 3>
<3, 2>
<4, 2>
<5, 1>
<6, 1>
so there are 2 1's, 3 2's, 2 3's, and so on.
The reason I [think] I need this is that I don't want to sort the vector and THEN, once again, compute the number of duplicates at each level. It seems faster to do both in one pass?
Thank you all! bjskishore123 is the closest thing to what I was asking, but all the responses educated me. Thanks again.
As stated by @bjskishore123, you can use a map to guarantee the correct order of your set. As a bonus, you will have an optimized structure to search (the map, of course).
Inserting/searching in a map takes O(log n) time, and you do it once for each of the n elements of the vector. So the algorithm is O(n log n), which is the same complexity as any sorting algorithm that needs to compare elements: merge sort or quick sort, for example.
Here is a sample code for you:
#include <algorithm>
#include <iostream>
#include <map>
#include <vector>
int main() {
    int tmp[] = {5,5,5,5,5,5,2,2,2,2,7,7,7,7,1,1,1,1,6,6,6,2,2,2,8,8,8,5,5};
    std::vector<int> values(tmp, tmp + sizeof(tmp) / sizeof(tmp[0]));
    std::map<int, int> map_values;  // value -> count, keys kept sorted
    std::for_each(values.begin(), values.end(), [&](int value) { ++map_values[value]; });
    for (std::map<int, int>::iterator it = map_values.begin(); it != map_values.end(); it++)
        std::cout << it->first << ": " << it->second << "times" << std::endl;
}
1: 4times
2: 7times
5: 8times
6: 3times
7: 4times
8: 3times
I don't think you can do this in one pass. Let's say you provide your own custom comparator for sorting which somehow tries to count the duplicates.
However, the only thing you can capture in the sorter is the value (maybe a reference, but that doesn't matter) of the two elements currently being compared. You have no other information because std::sort doesn't pass anything else to the sorter.
Now, std::sort keeps swapping elements until they reach their proper location in the sorted vector. That means a single element can be passed to the sorter multiple times, making it impossible to count exactly. You can count how many times a certain element, and all others equal in value to it, have been passed to the sorter, but not exactly how many of them there are.
Instead of using a vector, while storing the numbers one by one, use a std::multiset container.
It stores them internally in sorted order.
While storing each number, use a map to keep track of the number of occurrences of each number:
map<int, int> m;
Each time a number is added, increment its count in m.
So there is no need for another pass to calculate the number of occurrences, although you need to iterate over the map to get each occurrence count.
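A minimal sketch of the two structures described above, kept side by side; the struct name CountingSortedStore and its add() member are hypothetical. Note that the std::map on its own already holds every distinct value in sorted order together with its count, so the multiset is only needed if you want the repeated values stored individually:

```cpp
#include <map>
#include <set>

// Hypothetical container pairing the multiset (all numbers, sorted)
// with the map (number -> number of occurrences).
struct CountingSortedStore {
    std::multiset<int> values;
    std::map<int, int> counts;

    void add(int number) {
        values.insert(number);  // O(log n); the multiset stays sorted
        ++counts[number];       // operator[] starts a new key at 0
    }
};
```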
The code below makes use of the comparison function to count the occurrences.
#include <iostream>
#include <map>
#include <vector>
#include <algorithm>
using namespace std;

struct Elem
{
    int index;  // position of the element, used to detect repeat visits
    int num;    // the value being sorted
};

std::map<int, int> countMap;      // Count map: value -> number of occurrences
std::map<int, bool> visitedMap;   // index -> already counted?

bool compare(Elem a, Elem b)
{
    // count each element the first time the comparator sees it
    if (visitedMap[a.index] == false)
    {
        visitedMap[a.index] = true;
        countMap[a.num]++;
    }
    if (visitedMap[b.index] == false)
    {
        visitedMap[b.index] = true;
        countMap[b.num]++;
    }
    return a.num < b.num;
}

int main()
{
    vector<Elem> v;
    Elem e[5] = {{0, 10}, {1, 20}, {2, 30}, {3, 10}, {4, 20} };
    for (size_t i = 0; i < 5; i++)
        v.push_back(e[i]);
    std::sort(v.begin(), v.end(), compare);
    for (map<int, int>::iterator it = countMap.begin(); it != countMap.end(); it++)
        cout << "Element : " << it->first << " occurred " << it->second << " times" << endl;
}
Element : 10 occurred 2 times
Element : 20 occurred 2 times
Element : 30 occurred 1 times
If you have lots of duplicates, the fastest way to accomplish this task is probably to first count duplicates using a hash map, which is expected O(n), and then to sort the unique (value, count) pairs, which is O(m log m) where m is the number of unique values.
Something like this (in C++11):
#include <algorithm>
#include <unordered_map>
#include <utility>
#include <vector>

std::vector<std::pair<int, int>> uniqsort(const std::vector<int>& v) {
    std::unordered_map<int, int> count;
    for (auto& val : v) ++count[val];
    std::vector<std::pair<int, int>> result(count.begin(), count.end());
    std::sort(result.begin(), result.end());
    return result;
}
There are lots of variations on the theme, depending on what you need, precisely. For example, perhaps you don't even need the result to be sorted; maybe it's enough to just have the count map. Or maybe you would prefer the result to be a sorted map from int to int, in which case you could just build a regular std::map instead. (That would be O(n log m).) Or maybe you know something about the values that makes them faster to sort (like the fact that they are small integers in a known range). And so on.
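For the "small integers in a known range" case, a counting sort produces both the sorted vector and the per-value counts from a single pass over the input plus one pass over the range, in O(n + k) time for k possible values. A sketch, assuming the caller knows min_value and max_value and that every element lies within them; the function name counting_sort_with_counts is hypothetical:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counting sort. ASSUMPTION: every value in v lies in [min_value, max_value].
// Returns the sorted values and the (value, count) pairs for values that occur.
std::pair<std::vector<int>, std::vector<std::pair<int, int>>>
counting_sort_with_counts(const std::vector<int>& v, int min_value, int max_value) {
    // one bucket per possible value
    std::vector<int> buckets(static_cast<std::size_t>(max_value - min_value) + 1, 0);
    for (int val : v) {
        ++buckets[static_cast<std::size_t>(val - min_value)];
    }
    std::vector<int> sorted;
    sorted.reserve(v.size());
    std::vector<std::pair<int, int>> counts;
    // walking the buckets in order yields the values in ascending order
    for (std::size_t i = 0; i < buckets.size(); ++i) {
        if (buckets[i] == 0) continue;
        int value = min_value + static_cast<int>(i);
        counts.emplace_back(value, buckets[i]);
        sorted.insert(sorted.end(), static_cast<std::size_t>(buckets[i]), value);
    }
    return {std::move(sorted), std::move(counts)};
}
```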
I have a std::map with both key and value as integers. Now I want to randomly shuffle the map, so keys point to a different value at random. I tried random_shuffle but it doesn't compile. Note that I am not trying to shuffle the keys, which makes no sense for a map. I'm trying to randomise the values.
I could push the values into a vector, shuffle that and then copy back. Is there a better way?
You can push all the keys in a vector, shuffle the vector and use it to swap the values in the map.
Here is an example:
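#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <algorithm>
#include <random>
using namespace std;

int main ()
{
    map<int,string> m;
    vector<int> v;
    // fill the map with some sample data (hypothetical values) and collect the keys
    for (int i=0; i<10; i++)
    {
        m[i] = to_string(i * 10);
        v.push_back(i);
    }
    for (auto i: m)
        cout << i.first << ":" << i.second << endl;
    // shuffle the keys (std::random_shuffle was removed in C++17; std::shuffle replaces it)
    mt19937 gen(random_device{}());
    shuffle(v.begin(), v.end(), gen);
    vector<int>::iterator it=v.begin();
    cout << endl;
    // swap each value with the value stored under the next shuffled key
    for (auto& i:m)
    {
        string ts=i.second;
        i.second=m[*it];
        m[*it]=ts;
        it++;
    }
    for (auto i: m)
        cout << i.first << ":" << i.second << endl;
    return 0;
}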
The complexity of your proposal is O(N), (both the copies and the shuffle have linear complexity) which seems optimal (looking at less elements would introduce non-randomness into your shuffle).
If you want to repeatedly shuffle your data, you could maintain a map of type <Key, size_t> (i.e. the proverbial level of indirection) that indexes into a std::vector<Value> and then just shuffle that vector repeatedly. That saves you all the copying in exchange for O(N) space overhead. If the Value type itself is expensive, you have an extra vector<size_t> of indices into the real data on which you do the shuffling.
For convenience's sake, you could encapsulate the map and vector inside one class that exposes a shuffle() member function. Such a wrapper would also need to expose the basic lookup / insertion / erase functionality of the underlying map.
EDIT: As pointed out by @tmyklebu in the comments, maintaining (raw or smart) pointers to secondary data can be subject to iterator invalidation (e.g. when inserting new elements at the end causes the vector to reallocate). Using indices instead of pointers solves the "insertion at the end" problem. But when writing the wrapper class, you need to make sure that insertions of new key-value pairs never cause "insertions in the middle" of your secondary data, because that would also invalidate the indices. A more robust library solution would be to use Boost.MultiIndex, which is specifically designed to allow multiple types of view over a data structure.
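A minimal sketch of that indirection wrapper, assuming values are only ever appended (as the EDIT recommends) and leaving erasure out; the class name shuffled_map and its members are hypothetical. Each key maps to a fixed slot, each slot holds an index into the value vector, and shuffle() permutes only those indices, so the values themselves are never copied:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Hypothetical wrapper: key -> slot (fixed), slot -> index into values_ (shuffled).
template <class Key, class Value>
class shuffled_map {
public:
    void insert(const Key& key, const Value& value) {
        if (index_.count(key) != 0) return;        // keep the first value for a key
        index_[key] = slots_.size();
        values_.push_back(value);                  // values_ only grows at the end
        slots_.push_back(values_.size() - 1);
    }
    const Value& at(const Key& key) const {
        return values_[slots_[index_.at(key)]];    // throws std::out_of_range if absent
    }
    template <class URBG>
    void shuffle(URBG& gen) {
        std::shuffle(slots_.begin(), slots_.end(), gen);  // O(N), no Value copies
    }
private:
    std::map<Key, std::size_t> index_;   // key -> slot number
    std::vector<std::size_t> slots_;     // slot -> index into values_
    std::vector<Value> values_;          // the real data
};
```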
Well, using only the map, I'd think of this:
Make a flag array with one entry per cell of the map, randomly generate two integers i, j such that 0 <= i, j < size of the map, swap those cells and mark them as swapped. Repeat for all cells.
EDIT: the array is allocated with the size of the map, and is a local array.
I doubt it...
But... why not write a quick class that holds 2 vectors: a sorted std::vector of keys and a randomly shuffled std::vector of values? Look up the key using std::lower_bound and use std::distance and std::advance to get the value. Easy!
Without thinking too deeply, this should have similar complexity to std::map and possibly better locality of reference.
Some untested and unfinished code to get you started.
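#include <algorithm>
#include <iterator>
#include <random>
#include <stdexcept>
#include <vector>
template <class Key, class T>
class random_map
{
public:
    T& at(Key const& key);
    void shuffle();
private:
    std::vector<Key> d_keys; // Hold the keys of the *map*; MUST be sorted.
    std::vector<T> d_values;
};
template <class Key, class T>
T& random_map<Key, T>::at(Key const& key)
{
    auto lb = std::lower_bound(d_keys.begin(), d_keys.end(), key);
    if (lb == d_keys.end() || key < *lb) {
        throw std::out_of_range("random_map::at: key not found");
    }
    auto delta = std::distance(d_keys.begin(), lb);
    auto it = d_values.begin();
    std::advance(it, delta);
    return *it;
}
template <class Key, class T>
void random_map<Key, T>::shuffle()
{
    // shuffle the values only: d_keys must stay sorted for lower_bound
    std::mt19937 gen(std::random_device{}());
    std::shuffle(d_values.begin(), d_values.end(), gen);
}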
If you want to shuffle the map in place, you can implement your own version of random_shuffle for your map. The solution still requires placing the keys into a vector, which is done below using transform:
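#include <algorithm>
#include <cstdlib>   // drand48 (POSIX, not standard C++)
#include <map>
#include <string>
#include <utility>
#include <vector>
int main() {
    typedef std::map<int, std::string> map_type;
    map_type m;
    m[10] = "hello";
    m[20] = "world";
    m[30] = "!";
    std::vector<map_type::key_type> v(m.size());
    std::transform(m.begin(), m.end(), v.begin(),
                   [](const map_type::value_type &x){
                       return x.first;
                   });
    auto n = m.size();
    // Fisher-Yates shuffle of the values, reaching the map entries through the key vector
    for (auto i = n-1; i > 0; --i) {
        map_type::size_type r = drand48() * (i+1);
        std::swap(m[v[i]], m[v[r]]);
    }
}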
I used drand48()/srand48() for a uniform pseudo random number generator, but you can use whatever is best for you.
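Alternatively, you can shuffle v and then rebuild the map's values, like this:
// std::random_shuffle was removed in C++17; std::shuffle needs <random>
std::shuffle(v.begin(), v.end(), std::mt19937(std::random_device{}()));
map_type m2 = m;
int i = 0;
for (auto &x : m) {
    x.second = m2[v[i++]];
}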
But, I wanted to illustrate that implementing shuffle on the map in place isn't overly burdensome.
Here is my solution using std::reference_wrapper from C++11.
First, let's make a version of std::random_shuffle that shuffles references. It is a small modification of version 1 from here, using the get method to reach the referenced values.
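#include <cstdlib>    // std::rand
#include <iterator>   // std::iterator_traits
#include <utility>    // std::swap
template< class RandomIt >
void shuffleRefs( RandomIt first, RandomIt last ) {
    typename std::iterator_traits<RandomIt>::difference_type i, n;
    n = last - first;
    for (i = n-1; i > 0; --i) {
        using std::swap;
        // swap the referenced values, not the reference_wrappers themselves
        swap(first[i].get(), first[std::rand() % (i+1)].get());
    }
}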
Now it's easy:
#include <functional>  // std::reference_wrapper, std::ref
#include <vector>
template <class MapType>
void shuffleMap(MapType &map) {
    std::vector<std::reference_wrapper<typename MapType::mapped_type>> v;
    for (auto &el : map) v.push_back(std::ref(el.second));
    shuffleRefs(v.begin(), v.end());
}
There are 2 unsorted vectors: a vector of int and a vector of pairs (int, float):
std::vector <int> v1;
std::vector <std::pair<int, float> > v2;
containing millions of items.
How can I remove from v1, as fast as possible, the items that appear in v2.first, so that only the items unique to v1 (i.e. not included in v2.first) remain?
v1: 5 3 2 4 7 8
v2: {2,8} {7,10} {5,0} {8,9}
Result, v1: 3 4
There are two tricks I would use to do this as quickly as possible:
Use some sort of associative container (probably std::unordered_set) to store all of the integers in the second vector to make it dramatically more efficient to look up whether some integer in the first vector should be removed.
Optimize the way in which you delete elements from the initial vector.
More concretely, I'd do the following. Begin by creating a std::unordered_set and adding all of the integers that are the first integer in the pair from the second vector. This gives (expected) O(1) lookup time to check whether or not a specific int exists in the set.
Now that you've done that, use the std::remove_if algorithm to delete everything from the original vector that exists in the hash table. You can use a lambda to do this:
std::unordered_set<int> toRemove = /* ... */
v1.erase(std::remove_if(v1.begin(), v1.end(), [&toRemove] (int x) -> bool {
return toRemove.find(x) != toRemove.end();
}, v1.end());
This first step of storing everything in the unordered_set takes expected O(n) time. The second step does a total of expected O(n) work by bunching all the deletes up at the end and making each lookup take expected constant time. This gives a total of expected O(n) time and O(n) space for the entire process.
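Assembled into one function, the two steps look like this. This is a sketch; the function name remove_values_present_in is hypothetical, and the set is filled from v2's first members as described above:

```cpp
#include <algorithm>
#include <unordered_set>
#include <utility>
#include <vector>

// Removes from v1 every value that appears as .first in v2 (expected O(n + m) time).
void remove_values_present_in(std::vector<int>& v1,
                              const std::vector<std::pair<int, float>>& v2) {
    // step 1: hash all keys of v2
    std::unordered_set<int> toRemove;
    toRemove.reserve(v2.size());
    for (const auto& p : v2) {
        toRemove.insert(p.first);
    }
    // step 2: one remove_if pass compacts the survivors, then a single erase
    v1.erase(std::remove_if(v1.begin(), v1.end(), [&toRemove](int x) {
                 return toRemove.count(x) != 0;
             }),
             v1.end());
}
```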
If you are allowed to sort the second vector (the pairs), then you could alternatively do this in O(n log n) worst-case time, O(log n) worst-case space by sorting the vector by the key, then using std::binary_search to check whether a particular int from the first vector should be eliminated or not. Each binary search takes O(log n) time, so the total time required is O(n log n) for the sorting, O(log n) time per element in the first vector (for a total of O(n log n)), and O(n) time for the deletion, giving a total of O(n log n).
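A sketch of that sort-then-binary-search variant: std::binary_search needs the same key-only comparator that was used for sorting, plus a probe pair whose float member is ignored. The function name is hypothetical, and v2 is sorted in place, which assumes you are allowed to reorder it:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Removes from v1 every value that appears as .first in v2.
// Sorting v2 costs O(m log m); each of the n binary searches costs O(log m).
void remove_values_present_in_sorted(std::vector<int>& v1,
                                     std::vector<std::pair<int, float>>& v2) {
    // order (and search) pairs by their int key only
    auto by_key = [](const std::pair<int, float>& a, const std::pair<int, float>& b) {
        return a.first < b.first;
    };
    std::sort(v2.begin(), v2.end(), by_key);  // reorders the caller's v2
    v1.erase(std::remove_if(v1.begin(), v1.end(), [&](int x) {
                 // probe pair carrying the key; its float member is ignored by by_key
                 return std::binary_search(v2.begin(), v2.end(),
                                           std::pair<int, float>(x, 0.0f), by_key);
             }),
             v1.end());
}
```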
Hope this helps!
Assuming that neither container is sorted and that sorting is actually too expensive or memory is scarce:
v1.erase(std::remove_if(v1.begin(), v1.end(),
    [&v2](int i) {
        return std::find_if(v2.begin(), v2.end(),
            [i](const std::pair<int, float>& p) {   // i must be captured here
                return p.first == i; })
            != v2.end(); }), v1.end());
Alternatively, sort v2 on first and use a binary search instead. If there is enough memory, use an unordered_set to store the first members of v2.
Complete C++03 version:
#include <iostream>
#include <vector>
#include <utility>
#include <algorithm>

struct find_func {
    find_func(int i) : i(i) {}
    int i;
    bool operator()(const std::pair<int, float>& p) {
        return p.first == i;
    }
};

struct remove_func {
    remove_func(std::vector< std::pair<int, float> >* v2)
        : v2(v2) {}
    std::vector< std::pair<int, float> >* v2;
    bool operator()(int i) {
        return std::find_if(v2->begin(), v2->end(), find_func(i)) != v2->end();
    }
};

int main()
{
    // c++11 here
    std::vector<int> v1 = {5, 3, 2, 4, 7, 8};
    std::vector< std::pair<int, float> > v2 = {{2,8}, {7,10}, {5,0}, {8,9}};
    v1.erase(std::remove_if(v1.begin(), v1.end(), remove_func(&v2)), v1.end());
    // and here
    for(auto x : v1) {
        std::cout << x << std::endl;
    }
    return 0;
}
I want to create a structure that holds distinct strings and assigns to each of them several (not just one unique) int values. After I have filled that structure, I want to check, for each string, how many different ints have been assigned to it and which ones exactly. I know that it is possible to tackle this with a multimap. However, I am not sure if (or how) it is possible to get all the distinct strings contained in the multimap, since the function “find” requires a parameter for matching, while when searching I do not know which distinct values could be in the multimap. How could this be done with a multimap?
As an alternative solution, I tried to use a simple map with a vector as the value. However, I still cannot make this work, because the iterator of the vector does not seem to be recognized and I get the message: iterator must have a pointer to class type.
map<string, vector<int>>::iterator multit;
int candID1, candID2, candID3;
for(multit=Freq.begin(); multit!=Freq.end(); multit++)
{
    vector<int> vectorWithIds = (*multit).second;
    for(vector<int>::iterator it = vectorWithIds.begin();
        it != vectorWithIds.end();it++)
    {
        candID1 = it-> // Problem: The iterator is not recognized
    }
}
Can anyone spot the problem? Is there a workable solution, using either the first or the second approach?
What is it->? It's a vector of ints; you probably want *it.
P.S. I have to admit I haven't read the whole prose.
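Applying that fix to the question's loop: dereference the vector iterator with *it, and collect the ids in a std::set to find out how many distinct ones each string has. A sketch, assuming Freq is the std::map<std::string, std::vector<int>> from the question; the function name count_distinct_ids is hypothetical:

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// For each string in Freq, returns the number of distinct ints assigned to it.
std::map<std::string, std::size_t>
count_distinct_ids(const std::map<std::string, std::vector<int>>& Freq) {
    std::map<std::string, std::size_t> result;
    for (auto multit = Freq.begin(); multit != Freq.end(); ++multit) {
        // bind by reference instead of copying the vector
        const std::vector<int>& vectorWithIds = multit->second;
        std::set<int> distinct;
        for (std::vector<int>::const_iterator it = vectorWithIds.begin();
             it != vectorWithIds.end(); ++it) {
            int candID = *it;   // *it, not it->: the elements are plain ints
            distinct.insert(candID);
        }
        result[multit->first] = distinct.size();
    }
    return result;
}
```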
I suggest a multimap<string, int>, assuming I understood your requirements correctly, with you having "unique" strings and several different values for each. You could use count(key) to see how many values there are for a key, and equal_range(key), which returns a pair<iterator, iterator> with the first iterator pointing to the start of the range of values for the key and the second iterator pointing one past the last value for the key.
See reference
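A sketch of how the multimap answers the original question of listing every distinct string: walk the multimap with equal_range and jump straight to the end of each range, which is where the next distinct key starts. The function name distinct_values_per_key is hypothetical:

```cpp
#include <map>
#include <set>
#include <string>

// Returns, for each distinct string in the multimap, the set of distinct ints assigned to it.
std::map<std::string, std::set<int>>
distinct_values_per_key(const std::multimap<std::string, int>& mm) {
    std::map<std::string, std::set<int>> result;
    for (auto it = mm.begin(); it != mm.end(); ) {
        auto range = mm.equal_range(it->first);   // all entries for the current string
        std::set<int>& ids = result[it->first];
        for (auto v = range.first; v != range.second; ++v) {
            ids.insert(v->second);   // repeated ints collapse here
        }
        it = range.second;           // jump to the next distinct string
    }
    return result;
}
```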
OK, this is not efficient at all, but you can use a std::set initialized with your std::vector to extract only the unique values of the std::vector, as in this example:
#include <iostream>
#include <vector>
#include <map>
#include <set>
#include <string>
int main() {
// some data
std::string keys[] = {"first", "second", "third"};
int values[] = {1, 2, 1, 3, 4, 2, 2, 4, 9};
// initial data structures
std::vector<std::string> words(keys, keys + sizeof(keys) / sizeof(std::string));
std::vector<int> numbers(values, values + sizeof(values) / sizeof(int));
// THE map
std::map< std::string, std::vector<int> > dict;
// inserting data into the map
std::vector<std::string>::iterator itr;
for(itr = words.begin(); itr != words.end(); itr++) {
dict.insert(std::pair< std::string, std::vector<int> > (*itr, numbers));
} // for
// count unique values for the key of std::map<std::string, std::vector<int> >
std::map<std::string, std::vector<int> >::iterator mtr;
for(mtr = dict.begin(); mtr != dict.end(); mtr++) {
std::set<int> unique((*mtr).second.begin(), (*mtr).second.end());
std::cout << unique.size() << std::endl;
} // for
return 0;
} // main