Sorting by struct data inside unordered_map - c++

I have a std::unordered_map<id, town_data> data, where town_data is a struct holding several pieces of information: name (string), taxes collected (int) and distance from the capital town (int). I'm supposed to build a std::vector<id> that is sorted by the aforementioned distance, lowest to highest. I'm struggling to figure out how this can be done efficiently. I suppose I could simply loop through the data, insert each entry into a std::map<distance, id>, sort it by distance (unless maps are sorted by default), and copy the ids one by one into a new std::vector<id>. But this seems a really wasteful approach. Am I missing some shortcut or more efficient solution here?

You could create a std::vector of iterators into the map and then sort the iterators according to your sorting criteria. After sorting, you could transform the result into a std::vector<id>.
Create a std::vector of iterators:
std::vector<decltype(data)::iterator> its;
its.reserve(data.size());
for(auto it = data.begin(); it != data.end(); ++it)
its.push_back(it);
Sort that std::vector:
#include <algorithm> // std::sort, std::transform
std::sort(its.begin(), its.end(),
[](const auto& lhs, const auto& rhs) {
return lhs->second.distance < rhs->second.distance;
});
And finally, transform it into a std::vector<id>:
#include <iterator> // std::back_inserter
std::vector<id> vec;
vec.reserve(its.size());
std::transform(its.begin(), its.end(), std::back_inserter(vec),
[](auto it) {
return it->first;
});

I think the vector of ids can be sorted directly, as below. The key of each map entry is the id, so it->first is what gets collected:
std::vector<id> vec;
vec.reserve(data.size());
for(auto it = data.begin(); it != data.end(); ++it)
vec.push_back(it->first);
// at() looks up without inserting, unlike operator[]
std::sort(vec.begin(), vec.end(), [&data](const id& lhs, const id& rhs) { return data.at(lhs).distance < data.at(rhs).distance; });
I wonder whether this is more efficient than sorting a vector of town_data?
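On the efficiency question: a comparator that looks ids up in the map performs two hash lookups per comparison. One possible alternative, sketched below, is to copy (distance, id) pairs out of the map once and sort those. The alias id = int, the layout of town_data and the name ids_by_distance are assumptions made for illustration; they are not from the question.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using id = int;  // assumption: the question does not say what id is

struct town_data {   // assumed layout, based on the question's description
    std::string name;
    int taxes;
    int distance;
};

// Returns all ids ordered by ascending distance (ties broken by id).
std::vector<id> ids_by_distance(const std::unordered_map<id, town_data>& data)
{
    std::vector<std::pair<int, id>> keyed;   // (distance, id)
    keyed.reserve(data.size());
    for (const auto& entry : data)
        keyed.emplace_back(entry.second.distance, entry.first);

    // std::pair compares lexicographically: distance first, then id
    std::sort(keyed.begin(), keyed.end());

    std::vector<id> result;
    result.reserve(keyed.size());
    for (const auto& k : keyed)
        result.push_back(k.second);
    return result;
}
```

Whether this is actually faster than the other approaches depends on the data, so it is worth measuring before committing to it.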

Related

Erasing many vector elements while going through it with 'auto'

Let's say that I have vector of pairs, where each pair corresponds to indexes (row and column) of certain matrix I am working on
using namespace std;
vector<pair<int, int>> vec;
I wanted to, using auto, go through the whole vector and delete at once all the pairs that fulfill certain conditions, for example something like
for (auto& x : vec) {
if (x.first == x.second) {
vec.erase(x);
}
}
but it doesn't work. I suppose vec.erase() needs an iterator as its argument, whereas x is a pair that is an element of vec, not an iterator. I tried to modify it in a few ways, but I am not sure exactly how going through container elements with auto works or how I can fix this.
Can I easily modify the code above to make it work and erase multiple elements of the vector while going through it with auto? Or should I change my approach?
For now it's just a vector of pairs, but it will be much worse later on, so I would like to use auto for simplicity.
vector::erase() invalidates any outstanding iterators, including the one your range based for loop is using. Use std::remove_if():
vec.erase(
std::remove_if(
vec.begin(),
vec.end(),
[](const pair<int,int> &xx) { return xx.first == xx.second; }
), vec.end()
);
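std::remove_if() moves the elements you want to keep to the front of the vector, preserving their relative order, and returns an iterator to the new logical end. The elements from there to vec.end() are left in a valid but unspecified state, so you can safely erase that tail.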
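If this pattern comes up often, it can be wrapped in a small helper. The sketch below uses a hypothetical name, erase_where; in C++20 the standard library's std::erase_if for std::vector does the same job in one call.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Removes every element of v for which pred returns true.
// The relative order of the remaining elements is preserved.
template <typename T, typename Pred>
void erase_where(std::vector<T>& v, Pred pred)
{
    v.erase(std::remove_if(v.begin(), v.end(), pred), v.end());
}
```

For the vector of pairs in the question, the call would look like erase_where(vec, [](const std::pair<int, int>& p) { return p.first == p.second; });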
I would prefer something like this:
auto iter = vec.begin();
while (iter != vec.end()) {
if (iter->first == iter->second) {
// erase() returns an iterator to the element after the erased one
iter = vec.erase(iter);
} else {
++iter;
}
}

Find all matching elements in std::list

I was wondering if there's any built-in or well-established way (i.e. via lambda) to go through the elements of a std::list and find all the ones that match a given value? I know I can iterate through all of them, but I thought I'd ask if there's a way to get an iterator that iterates through just the elements that match a given criterion? My sample below only gives me the iterator to the first matching element.
#include <list>
#include <algorithm>
#include <stdio.h>
int main()
{
std::list<int> List;
List.push_back(100);
List.push_back(200);
List.push_back(300);
List.push_back(100);
int findValue = 100;
auto it = std::find_if(List.begin(), List.end(), [findValue](const int value)
{
return (value == findValue);
});
if (it != List.end())
{
for (; it != List.end(); ++it)
{
printf("%d\n", * it);
}
}
return 0;
}
Thanks for any feedback.
Updated answer
With C++20 just around the corner, the standard library introduces ranges, which come with view adaptors: lazy views over collections and their transformations.
This means you can now have an "iterator" which can be used to obtain a filtered and transformed view of an underlying container/collection, without having to create several iterators or even allocate memory.
Having said that, this is a way to create a view over just the filtered elements of your list:
// List is your std::list
auto matching_100 = List | std::views::filter([](auto &v) {
return v == 100;
});
How sweet is that? And all you need in order to use it:
#include <ranges>
Previous answer
Using copy_if and iterators:
#include <list>
#include <algorithm>
#include <iterator>
#include <iostream>
int main()
{
std::list<int> List;
List.push_back(100);
List.push_back(200);
List.push_back(300);
List.push_back(100);
int findValue = 100;
std::copy_if(List.begin(), List.end(), std::ostream_iterator<int>(std::cout, "\n"), [&](int v) {
return v == findValue;
});
return 0;
}
If you don't want to directly output the results and want to fill another container with the matches:
std::vector<int> matches;
std::copy_if(List.begin(), List.end(), std::back_inserter(matches), [&](int v) {
return v == findValue;
});
boost::filter_iterator allows you to work with only the elements of an iterable that satisfy a predicate. Given a predicate Pred and a container Cont,
auto begin_iter = boost::make_filter_iterator(Pred, std::begin(Cont), std::end(Cont));
auto end_iter = boost::make_filter_iterator(Pred, std::end(Cont), std::end(Cont));
You can now use begin_iter and end_iter as if they were the begin and end iterators of a container containing only those elements of Cont that satisfy Pred. An added advantage is that you can wrap the iterators in a boost::iterator_range and use it in places that expect an iterable object, such as a range-based for loop:
auto range = boost::make_iterator_range(begin_iter, end_iter);
for(auto x : range) do_something(x);
In particular, setting Pred to a functor (it could be a lambda) that checks for equality with your fixed value will give you the iterators you need.
std::find_if is a generalisation of std::find for when you need a function to check for the elements you want, rather than a simple test for equality. If you just want to do a simple test for equality then there's no need for the generalised form, and the lambda just adds complexity and verbosity. Just use std::find(begin, end, findValue) instead:
std::vector<std::list<int>::const_iterator> matches;
auto i = list.begin(), end = list.end();
while (i != end)
{
i = std::find(i, end, findValue);
if (i != end)
matches.push_back(i++);
}
But rather than calling find in a loop I'd just write the loop manually:
std::vector<std::list<int>::const_iterator> matches;
for (auto i = list.begin(), toofar = list.end(); i != toofar; ++i)
if (*i == findValue)
matches.push_back(i);
std::partition lets you simply move all elements matching the predicate to the front of the container (first partition). The return value is an iterator pointing to the first element of the second partition (containing the non matching elements). That's pretty much all you need to "filter" a container.
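A minimal sketch of the partition approach for the list in the question follows. The helper name gather_matches is made up for illustration. Note that std::partition reorders the container and does not promise to keep the relative order inside each group; std::stable_partition does, if that matters.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <list>

// Moves every element equal to `value` to the front of `values`
// and returns how many such elements there are.
std::size_t gather_matches(std::list<int>& values, int value)
{
    // boundary points at the first element that does NOT match
    auto boundary = std::partition(values.begin(), values.end(),
                                   [value](int v) { return v == value; });
    return static_cast<std::size_t>(std::distance(values.begin(), boundary));
}
```

After the call, the matching elements occupy the range from values.begin() up to (but not including) the boundary, so iterating over the first n elements visits exactly the matches.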

remove elements from `map` that are not in `set`

std::map<std::string, Obj> myMap;
std::set<std::string> mySet;
I want to remove those pairs from myMap whose keys are not in mySet.
How do I do it? I found std::remove_if algorithm, but it seems to not be applicable here.
I'd start with this simple approach:
for (auto it = myMap.begin(); it != myMap.end(); )
{
if (mySet.find(it->first) == mySet.end()) { myMap.erase(it++); }
else { ++it; }
}
If you want something more efficient, you could iterate both containers in lockstep and do key-wise comparisons to take advantage of the compatible element order. On the other hand, the present algorithm works even on unordered containers, and given that your keys are strings, unordered containers may have a better performance anyway.
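The lockstep idea could look roughly like the sketch below. It assumes both containers use the default std::less<std::string> ordering, so that their keys are visited in the same order; retain_keys is a hypothetical helper name.

```cpp
#include <map>
#include <set>
#include <string>

// Erases every entry of m whose key is not present in keep.
// Both containers are walked once, in sorted key order.
template <typename V>
void retain_keys(std::map<std::string, V>& m, const std::set<std::string>& keep)
{
    auto mit = m.begin();
    auto sit = keep.begin();
    while (mit != m.end()) {
        // skip set entries that sort before the current map key
        while (sit != keep.end() && *sit < mit->first)
            ++sit;
        if (sit != keep.end() && *sit == mit->first)
            ++mit;                // key is in the set: keep the entry
        else
            mit = m.erase(mit);   // since C++11, erase returns the next iterator
    }
}
```

This visits each element of both containers at most once, instead of doing one logarithmic lookup in the set per map entry.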

Check for common members in vector c++

What is the best way to verify if there are common members within multiple vectors?
The vectors aren't necessarily of equal size and they may contain custom data (such as structures containing two integers that represent a 2D coordinate).
For example:
vec1 = {(1,2); (3,1); (2,2)};
vec2 = {(3,4); (1,2)};
How to verify that both vectors have a common member?
Note that I am trying to avoid inefficient methods such as going through all elements and checking for equal data.
For non-trivial data sets, the most efficient method is probably to sort both vectors and then use the std::set_intersection function defined in <algorithm>, as follows:
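#include <vector>
#include <algorithm> // std::sort, std::set_intersection, std::min
#include <iterator> // std::back_inserter
#include <utility> // std::pair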
using namespace std;
typedef vector<pair<int, int>> tPointVector;
tPointVector vec1 {{1,2}, {3,1}, {2,2}};
tPointVector vec2 {{3,4}, {1,2}};
std::sort(begin(vec1), end(vec1));
std::sort(begin(vec2), end(vec2));
tPointVector vec3;
vec3.reserve(std::min(vec1.size(), vec2.size()));
set_intersection(begin(vec1), end(vec1), begin(vec2), end(vec2), back_inserter(vec3));
You may get better performance with a nonstandard algorithm if you do not need to know which elements are common, but only how many there are, because then you can avoid having to create new copies of the common elements.
In any case, it seems to me that starting by sorting both containers will give you the best performance for data sets with more than a few dozen elements.
Here's an attempt at writing an algorithm that just gives you the count of matching elements (untested):
auto it1 = begin(vec1);
auto it2 = begin(vec2);
const auto end1 = end(vec1);
const auto end2 = end(vec2);
sort(it1, end1);
sort(it2, end2);
size_t numCommonElements = 0;
while (it1 != end1 && it2 != end2) {
bool oneIsSmaller = *it1 < *it2;
if (oneIsSmaller) {
it1 = lower_bound(it1, end1, *it2);
} else {
bool twoIsSmaller = *it2 < *it1;
if (twoIsSmaller) {
it2 = lower_bound(it2, end2, *it1);
} else {
// none of the elements is smaller than the other
// so it's a match
++it1;
++it2;
++numCommonElements;
}
}
}
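If the only question is whether any common element exists at all, the same sorted-scan idea can stop at the first match. A minimal sketch, with Point and have_common_element as illustrative names; the vectors are taken by value so the caller's data is not reordered:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;

// Returns true as soon as one element present in both vectors is found.
bool have_common_element(std::vector<Point> a, std::vector<Point> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    auto ia = a.begin();
    auto ib = b.begin();
    while (ia != a.end() && ib != b.end()) {
        if (*ia < *ib)
            ++ia;          // *ia cannot appear in the rest of b
        else if (*ib < *ia)
            ++ib;          // *ib cannot appear in the rest of a
        else
            return true;   // neither is smaller, so they are equal
    }
    return false;
}
```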
"Note that I am trying to avoid inefficient methods such as going through all elements and checking for equal data."
You need to go through all elements at least once; I assume you mean that you don't want to check every combination. Indeed, you don't want to do this:
for every element in vec1, go through the entire vec2 to check whether the element is there. This won't be efficient if your vectors have a large number of elements.
If you prefer a linear-time solution and don't mind using extra memory, here is what you can do:
You need a hash function in order to insert elements into an unordered_map or unordered_set.
See https://stackoverflow.com/a/13486174/2502814
// Common elements of two vectors, using std::unordered_set
#include <iostream> // std::cout
#include <unordered_set> // std::unordered_set
#include <vector> // std::vector
using namespace std;
namespace std {
template <>
struct hash<pair<int, int>>
{
typedef pair<int, int> argument_type;
typedef std::size_t result_type;
result_type operator()(const pair<int, int> & t) const
{
std::hash<int> int_hash;
return int_hash(t.first + 6495227 * t.second);
}
};
}
int main () {
vector<pair<int, int>> vec1 {{1,2}, {3,1}, {2,2}};
vector<pair<int, int>> vec2 {{3,4}, {1,2}};
// Copy all elements from vec2 into an unordered_set
unordered_set<pair<int, int>> in_vec2;
in_vec2.insert(vec2.begin(),vec2.end());
// Traverse vec1 and check if elements are here
for (auto& e : vec1)
{
if(in_vec2.find(e) != in_vec2.end()) // Searching in an unordered_set is faster than going through all elements of vec2 when vec2 is big.
{
//Here are the elements in common:
cout << "{" << e.first << "," << e.second << "} is in common!" << endl;
}
}
return 0;
}
Output : {1,2} is in common!
You can either do that, or copy all elements of vec1 into an unordered_set, and then traverse vec2.
Depending on the sizes of vec1 and vec2, one solution might be faster than the other.
Keep in mind that picking the smaller vector to insert in the unordered_set also means you will use less extra memory.
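As far as I know, the standard only permits specializing std::hash for types that involve a program-defined type, which std::pair<int, int> does not. A variant that sidesteps this by passing a hasher as the second template argument of unordered_set is sketched below. PairHash and share_element are hypothetical names, and the mixing formula is just one common choice, not a tuned hash.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Custom hasher for std::pair<int, int>, used instead of specializing std::hash.
struct PairHash {
    std::size_t operator()(const std::pair<int, int>& p) const
    {
        std::size_t h1 = std::hash<int>{}(p.first);
        std::size_t h2 = std::hash<int>{}(p.second);
        // combine the two hashes; unsigned arithmetic, so overflow is well defined
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};

// Returns true if a and b share at least one element.
bool share_element(const std::vector<std::pair<int, int>>& a,
                   const std::vector<std::pair<int, int>>& b)
{
    // hash the smaller vector to keep the extra memory low
    const auto& smaller = a.size() <= b.size() ? a : b;
    const auto& larger = a.size() <= b.size() ? b : a;
    std::unordered_set<std::pair<int, int>, PairHash> seen(smaller.begin(), smaller.end());
    for (const auto& e : larger)
        if (seen.count(e) != 0)
            return true;
    return false;
}
```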
I believe you can use a 2D tree to search in two dimensions. An optimal algorithm for the problem you specified would fall under the class of geometric algorithms. Maybe this link is of use to you: http://www.cs.princeton.edu/courses/archive/fall05/cos226/lectures/geosearch.pdf .

Removing duplicates from a non-sortable vector

I'm looking for a way to remove duplicates from a vector (let's call it theGreatVector :D).
I can't use std::sort followed by std::unique because there is no way to sort my objects.
theGreatVector contains some vector<Item*> (smallVectors).
I have an overload of == for vector<Item*>, so I can use it.
I'm able to create something in O(n²), but I need time efficiency
(theGreatVector.size() could be 10⁵ or 10⁶)
Right now what I have is something like this
(I fill my vector only if smallOne isn't already in it):
for (i = 0; i < size; i++)
{
vector<Item*> smallOne = FindFacets(i);
// linear search through theGreatVector: this line alone is already O(n) :/
if (std::find(theGreatVector.begin(), theGreatVector.end(), smallOne) == theGreatVector.end())
{
theGreatVector.push_back(smallOne);
}
}
If there is a way to do that even in n log(n) + n, or anything lower than n², that'd be great!
Thanks a lot
Azh
You can always std::tie every data member into a std::tuple and use lexicographic ordering on that to sort a vector of pointers to your big data structure. You can then do std::unique on that data structure before copying the output. With a small modification you could also remove the duplicates in place by sorting the big Item vector directly.
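#include <algorithm> // std::sort, std::unique, std::transform
#include <iterator> // std::back_inserter
#include <tuple>
#include <memory>
#include <vector>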
// tuples have builtin lexicographic ordering,
// I'm assuming all your Item's data members also have operator<
bool operator<(Item const& lhs, Item const& rhs)
{
return std::tie(lhs.first_data, /*...*/ lhs.last_data) < std::tie(rhs.first_data, /*...*/ rhs.last_data);
}
int main()
{
// In the Beginning, there was some data
std::vector<Item> vec;
// fill it
// init helper vector with addresses of vec, complexity O(N)
std::vector<Item*> pvec;
pvec.reserve(vec.size());
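// a lambda is used because std::addressof is overloaded and cannot be passed by name
std::transform(std::begin(vec), std::end(vec), std::back_inserter(pvec), [](Item& item){ return std::addressof(item); });
// sort to put duplicates in adjacent positions, complexity O(N log N)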
std::sort(std::begin(pvec), std::end(pvec), [](Item const* lhs, Item const* rhs){
return *lhs < *rhs; // delegates to operator< for Item
});
// remove duplicates, complexity O(N)
auto it = std::unique(std::begin(pvec), std::end(pvec), [](Item const* lhs, Item const* rhs){
return *lhs == *rhs; // assumes Item has operator==, if not use std::tuple::operator==
});
pvec.erase(it, std::end(pvec));
// copy result, complexity O(N)
std::vector<Item> result;
result.reserve(pvec.size());
std::transform(std::begin(pvec), std::end(pvec), std::back_inserter(result), [](Item const* pelem){
return *pelem;
});
// And it was good, and done in O(N log N) complexity
}
Take a look at unordered set:
http://www.cplusplus.com/reference/unordered_set/unordered_set/
It seems to do what you want. Insertion of a single element is O(1) on average and O(n) in the worst case. Note that besides the equality operator you also need to provide a hash function for the element type.
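A rough sketch of the unordered_set approach for vectors of Item pointers follows. Item here is a stand-in struct, and Facet, FacetHash and unique_facets are hypothetical names. The sketch relies on std::vector's built-in operator==, which compares the pointers element by element and is order-sensitive; a custom notion of equality would need a matching hash and equality functor.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

struct Item { int value; };   // stand-in for the real Item type

using Facet = std::vector<Item*>;

// Hashes a Facet by combining the hashes of its pointers in order.
struct FacetHash {
    std::size_t operator()(const Facet& f) const
    {
        std::size_t h = f.size();
        for (Item* p : f)
            h ^= std::hash<Item*>{}(p) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// Returns the facets of input without duplicates, keeping first occurrences
// in their original order. Expected O(total number of pointers) on average.
std::vector<Facet> unique_facets(const std::vector<Facet>& input)
{
    std::unordered_set<Facet, FacetHash> seen;
    std::vector<Facet> result;
    for (const Facet& f : input)
        if (seen.insert(f).second)   // second is true only for a new element
            result.push_back(f);
    return result;
}
```

The same idea works inside the original loop: insert each smallOne into the set and push it onto the big vector only when the insertion reports a new element.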