Better way to get common keys from 2 std::maps - c++

I have references to 2 maps of type std::map<std::string, int>. I want to create a master list containing all the keys that both maps have in common. My current solution is as follows, but I am curious if there is a more efficient way of approaching this problem?
const std::map<std::string, int>& map1;
const std::map<std::string, int>& map2;
std::vector<std::string> shared_keys;
// only add to master list if both contain the string as a key
for (auto& entry : map1) {
if (map2.find(entry.first) != map2.end()) {
shared_keys.push_back(entry.first);
}
}
It would be nice if I could forgo the for loop entirely / do this as a "one-liner", but not sure how to accomplish that...

std::map is sorted, so you can just use std::set_intersection.
You'll need a custom comparator, since you're only comparing keys ... and then you need an adapter to only use the key in the output iterator ...
A one-liner is pushing it, unless you use something like the Boost.Iterator adapters. Rough sketch (untested):
template <typename K, typename V>
vector<K> map_key_intersection(map<K,V> const &a, map<K,V> const &b)
{
vector<K> result;
using Elem = typename map<K,V>::value_type;
set_intersection(a.begin(), a.end(),
b.begin(), b.end(),
boost::make_function_output_iterator(
[&result](Elem const &e) { result.push_back(e.first); }),
[](Elem const& a, Elem const& b) { return a.first < b.first; });
return result;
}
NB, there are several things wrong with this in practice, even apart from the fact that ranges are a better approach if you have access:
The std::map has more than two template parameters. So, add the Compare and Allocator params to your list.
What if they had different Compare types? Now we might not meet the requirements of set_intersection.
What if they have the same Compare type, but were constructed with a stateful comparator that does a different thing for each instance? Weird, but possible ... and we still don't meet the ordering constraint, but it's more expensive to check.
So, to be exactly correct, you should use eg. a.value_comp() instead of the bare operator<, but you also need to be reasonably sure that both maps use the same ordering. At least, you should add a comment to the effect that it's your client's problem if they don't.
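For comparison, the same key intersection can be written without Boost as a plain merge-style walk over both sorted maps. This is only a sketch: shared_keys is a made-up helper name, and it assumes both maps were built with equivalent orderings, so a.key_comp() is applied to both.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer above): collects the keys
// that occur in both maps. Assumption: both maps order their keys the
// same way, so a.key_comp() is valid for comparing keys of either map.
template <typename K, typename V, typename Compare, typename Alloc>
std::vector<K> shared_keys(const std::map<K, V, Compare, Alloc>& a,
                           const std::map<K, V, Compare, Alloc>& b)
{
    std::vector<K> result;
    const auto less = a.key_comp();
    auto ia = a.begin();
    auto ib = b.begin();
    // Walk both sorted sequences in lockstep, like std::set_intersection does.
    while (ia != a.end() && ib != b.end()) {
        if (less(ia->first, ib->first)) {
            ++ia;                          // key only in a so far
        } else if (less(ib->first, ia->first)) {
            ++ib;                          // key only in b so far
        } else {
            result.push_back(ia->first);   // equivalent keys: shared
            ++ia;
            ++ib;
        }
    }
    return result;
}
```

Called as shared_keys(map1, map2), this visits each element of both maps at most once, whereas the loop in the question does one logarithmic find per element of map1.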

You can use std::set_intersection, although as a one-liner you will also get the values from one of the maps (the elements are copied from the first range).
std::vector<std::pair<const std::string, int>> shared;
std::set_intersection(map1.begin(), map1.end(), map2.begin(), map2.end(), std::back_inserter(shared), map1.value_comp());
With C++20's ranges library (or a similar ranges library for earlier standards), you can grab just the keys for the intersection.
std::ranges::set_intersection(std::views::keys(map1), std::views::keys(map2), std::back_inserter(shared_keys));

Related

Is there a better way to split a container of a non-movable type based on a predicate

I want to ask for an alternative solution to this problem. I am dealing with this C/C++ style interface that has this non-moveable type NonMovableType defined roughly as follows:
union union_type {
int index;
const char* name;
};
struct NonMovableType
{
std::initializer_list<union_type> data;
};
This is something I cannot change, despite the unfortunate use of unions and initializer lists.
We then have some container of this type, say
std::vector<NonMovableType> container
and we want to split container based on some predicate for each of its members. Now, if it were a movable type, I'd do
std::vector<NonMovableType> container;
std::vector<NonMovableType> result;
auto iter = std::partition(container.begin(), container.end(), [](const NonMovableType& element){
return element.data.size(); // the predicate
});
std::move(iter, container.end(), std::back_inserter(result));
container.erase(iter, container.end());
I could then trust container and result would contain the elements split by the predicate, that way I could then iterate over each one individually and do the necessary processing on them.
This won't work, however, because std::move and std::partition both require a movable type. Instead I have to resort to the rather slow:
std::vector<NonMovableType> container;
std::vector<NonMovableType> result_a;
std::vector<NonMovableType> result_b;
std::copy_if(container.begin(), container.end(), std::back_inserter(result_a), [](const NonMovableType& element){
return element.data.size();
});
std::copy_if(container.begin(), container.end(), std::back_inserter(result_b), [](const NonMovableType& element){
return !element.data.size();
});
container.clear();
And so, my question is: is there any better way to do this? I suppose calling it a 'non-movable type' may be wrong; it's only the union and the initializer list that are giving me problems. So really the question becomes: is there a way to move this type safely, without having to change the initial class? Could it also be possible to wrap NonMovableType in another class and then use pointers as opposed to the type directly?
Is it really a performance problem or are you trying to optimize in advance?
As for a general answer: it really depends. I would probably try to achieve everything in a single pass (especially if the original container has a lot of elements), e.g.
for (const auto& el : container) {
if (el.data.size()) out1.push_back(el);
else out2.push_back(el);
}
which can be easily generalized into:
template<typename ForwardIt, typename OutputIt1, typename OutputIt2, typename Pred>
void split_copy(ForwardIt b, ForwardIt e, OutputIt1 out1, OutputIt2 out2, Pred f)
{
for(; b != e; ++b) {
if (f(*b)) {
*out1 = *b;
++out1;
} else {
*out2 = *b;
++out2;
}
}
}
Is this going to be faster than partitioning first and copying later?
I can't tell, maybe. Both solutions are imho ok in terms of readability, as for their performance - please measure and get back with the numbers. :)
Demo:
https://godbolt.org/z/axMsKGq7d
EDIT: operating on heap-allocated objects and vectors of pointers to them, as well as operating on lists, is something to be verified in practice for your particular use case. It might help, of course, but again: measure first, optimize later.
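Worth noting: the standard library has offered this kind of single-pass split since C++11 as std::partition_copy, which takes two output iterators and a predicate. A minimal sketch, where Item is a hypothetical copyable stand-in for NonMovableType and split_by_payload is a made-up helper name:

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for the element type; only copyability is assumed.
struct Item {
    int payload;
};

// Copies every element exactly once: elements satisfying the predicate go
// into the first vector, all others into the second.
std::pair<std::vector<Item>, std::vector<Item>>
split_by_payload(const std::vector<Item>& container)
{
    std::vector<Item> matching;
    std::vector<Item> rest;
    std::partition_copy(container.begin(), container.end(),
                        std::back_inserter(matching),
                        std::back_inserter(rest),
                        [](const Item& item) { return item.payload != 0; });
    return {std::move(matching), std::move(rest)};
}
```

The relative order of the elements is preserved in both outputs, and the source container is left untouched, so it can be cleared afterwards if it is no longer needed.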

Unordered_map of unordered_map vs custom hash function for pair key C++?

I have some keys that are pair<string, string>. I was originally going to write my own hash function but thought that it might be easier to just implement an unordered_map<string, unordered_map<string, val>>. Are there any performance differences between these two I should be aware of?
I would use std::unordered_map<std::pair<std::string, std::string>, Value, pair_hash> for two reasons:
Performance
Of course, you can measure your two versions with your favorite profiler, but in my experience the number of allocations is what matters most here. Compare:
flat_map.insert(std::make_pair(key, value)); creates on average just one new bucket (or extends one), whilst
auto it = map2.insert(make_pair(key.first, map1{}));
it.first->second.insert(make_pair(key.second, value)); // insert returns a pair<iterator, bool>
has to create an empty map1, which might not be zero-cost. It then has to add or extend two buckets (the list associated with the given hash value).
Maintainability/Readability
The second reason is more important to me. A flat (single) map is easy to use. You can already see from the insert example that the nested version is more complicated, but consider erase: it is complicated enough that it is easy to make a mistake:
void remove(
std::unordered_map<std::string,
std::unordered_map<std::string, Value>>& map,
std::pair<std::string, std::string> const& key)
{
auto it1 = map.find(key.first);
if (it1 == map.end()) return;
it1->second.erase(key.second);
// easy to forget part
if (it1->second.empty())
{
map.erase(it1);
}
}
Defining a simple hash function in your case is trivial and performant. If the std::pair is semantically the key, then this approach makes your intent clear. It also allows duplicates of the first member of the std::pair in your map, as you only need the entire key to be unique. In terms of usage, you also avoid the additional layer of indirection, with nested maps.
Example implementation:
Godbolt
...
using pairSS = std::pair<std::string, std::string>;
namespace std
{
template<> struct hash<pairSS>
{
std::size_t operator()(pairSS const& pair) const noexcept
{
return std::hash<std::string>{}(pair.first) ^
(std::hash<std::string>{}(pair.second) << 1);
}
};
}
int main()
{
std::pair myPair = {"Hi", "bye"};
std::cout << std::hash<pairSS>{}(myPair) << std::endl;
struct val{};
std::unordered_map<pairSS, val> hashMap;
}
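An alternative worth considering is to leave namespace std alone and pass a hash functor as the third template argument. As far as I understand the rules for namespace std, specializing std::hash is only allowed when the specialization involves a program-defined type, which a pair of two std::string objects does not. The sketch below uses hypothetical names (PairHash, PairMap) and a hash_combine-style mix; the constant and shifts are a common choice, not something taken from the answers above.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical hash functor for pair<string, string> keys.
struct PairHash {
    std::size_t operator()(const std::pair<std::string, std::string>& p) const noexcept
    {
        const std::size_t h1 = std::hash<std::string>{}(p.first);
        const std::size_t h2 = std::hash<std::string>{}(p.second);
        // Mix the two hashes so that (a, b) and (b, a) usually differ.
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};

// Flat map keyed by the whole pair; int is just a placeholder value type.
using PairMap = std::unordered_map<std::pair<std::string, std::string>, int, PairHash>;
```

With this alias, insert, lookup and erase are each a single call on the flat map, and no empty inner map is ever left behind.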

How to remove duplicates from a vector of pair<int, Object>

This is what I am trying right now. I made a comparison function:
bool compare(const std::pair<int, Object>& left, const std::pair<int, Object>& right)
{
return (left.second.name == right.second.name) && (left.second.time == right.second.time) &&
(left.second.value == right.second.value);
}
After I add an element I call std::unique to filter duplicates:
data.push_back(std::make_pair(index, obj));
data.erase(std::unique(data.begin(), data.end(), compare), data.end());
But it seems that this doesn't work. And I don't know what the problem is.
From my understanding std::unique should use the compare predicate.
How should I update my code to make this work ?
I am using C++03.
edit:
I have tried sorting it too, but it still doesn't work.
bool compare2(const std::pair<int, Object>& left, const std::pair<int, Object>& right)
{
return (left.second.time< right.second.time);
}
std::sort(simulatedLatchData.begin(), simulatedLatchData.end(), compare2);
std::unique requires the range passed to it to have all the duplicate elements next to one another in order to work.
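You can call std::sort on the range before you call std::unique to achieve that, since sorting groups equal elements together. Note that the sort order has to be consistent with the equality predicate: sorting by time alone does not guarantee that elements equal in name, time and value end up next to each other.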
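A sketch of the sort-then-unique approach, written without lambdas or auto since the question targets C++03. The Object layout is an assumption based on the fields compared in the question, and Entry, entry_less, entry_equal and remove_duplicates are made-up names. The important part is that the ordering looks at exactly the same fields as the equality test.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Assumed layout: only the three fields compared in the question.
struct Object {
    std::string name;
    int time;
    int value;
};

typedef std::pair<int, Object> Entry;

// Strict weak ordering over the same fields the equality predicate checks.
bool entry_less(const Entry& left, const Entry& right)
{
    if (left.second.name != right.second.name)
        return left.second.name < right.second.name;
    if (left.second.time != right.second.time)
        return left.second.time < right.second.time;
    return left.second.value < right.second.value;
}

bool entry_equal(const Entry& left, const Entry& right)
{
    return left.second.name == right.second.name &&
           left.second.time == right.second.time &&
           left.second.value == right.second.value;
}

// Sort so that duplicates become adjacent, then drop the repeats.
void remove_duplicates(std::vector<Entry>& data)
{
    std::sort(data.begin(), data.end(), entry_less);
    data.erase(std::unique(data.begin(), data.end(), entry_equal), data.end());
}
```

Keep in mind that this reorders the vector, and since std::sort is not stable, which index survives among duplicates is unspecified.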
Sorting and filtering is nice, but since you never want any duplicate, why not use std::set?
And while we're at it, these pairs look suspiciously like key-values, so how about std::map?
If you want to keep only unique objects, then use an appropriate container type, such as a std::set (or std::map). For example
bool operator<(object const&, object const&);
std::set<object> data;
object obj = new_object(/*...*/);
data.insert(obj); // will only insert if unique
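To make the set idea concrete, here is a small sketch. The Object fields are assumed from the question, ObjectLess and insert_if_new are hypothetical names, and std::tie requires C++11, so under C++03 the comparison would have to be spelled out field by field.

```cpp
#include <set>
#include <string>
#include <tuple>

// Assumed layout, matching the fields compared in the question.
struct Object {
    std::string name;
    int time;
    int value;
};

// Orders objects lexicographically by (name, time, value).
struct ObjectLess {
    bool operator()(const Object& left, const Object& right) const
    {
        return std::tie(left.name, left.time, left.value) <
               std::tie(right.name, right.time, right.value);
    }
};

using ObjectSet = std::set<Object, ObjectLess>;

// Returns true if obj was inserted, false if an equivalent one already existed.
bool insert_if_new(ObjectSet& data, const Object& obj)
{
    return data.insert(obj).second;
}
```

The bool in the pair returned by insert tells you whether the element was new, so no separate find is needed.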

Working with a vector of pair vectors?

I've been searching around Google but I didn't find what I need. I'm trying to create a vector that allows me to add 3 (and later 4) variables, access them and sort them.
I'm implementing the vector as follows for 3 variables:
std::vector<std::pair<std::string, std::pair<int, double> > > chromosomes;
To add information (variables), I'm doing:
chromosomes.emplace_back(dirp->d_name, std::make_pair(WSA, fault_percent));
How can I access each parameter and sort them based on the WSA and fault coverage? With a vector of plain pairs I can do that using the members first and second.
And for 4 variables, it would be as follows?
std::vector<std::pair<std::pair<std::string, std::string>, std::pair<int, double> > > chromosomes;
chromosomes.emplace_back( std::make_pair(dirp->d_name, x), std::make_pair(WSA, fault_percent));
As suggested here I think you should be using a vector of tuple<string, int, double>s or tuple<string, string, int, double>s respectively.
There is an operator< defined for tuples, which compares element by element from left to right using each element type's less-than operator (a lexicographical comparison). If a simple comparison of each element is sufficient, then all you'll need to do is call sort:
sort(chromosomes.begin(), chromosomes.end());
If the tuple's operator< does not provide a sufficient comparison for your needs, sort provides an overload which takes a comparison function, for example a lambda. Your lambda would need to do the following:
Take in 2 const references to the tuples
Return true if the first tuple is strictly smaller than the second tuple
Return false if the first tuple is greater or equal to the second tuple
In the end your call would look something like this:
sort(chromosomes.begin(), chromosomes.end(), [](const auto& lhs, const auto& rhs) {
// Your comparison between the two goes here
});
If you're not familiar with working with tuples, you'll need to use the std::get function template to extract elements, either by index or, when the tuple contains no duplicate of that type, by type.
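As an illustration of the above, here is a sketch that sorts by WSA first and fault percentage second while ignoring the name. The alias Chromosome and the function name sort_by_wsa_then_fault are made up for this example, and the element order (name, WSA, fault_percent) is an assumption.

```cpp
#include <algorithm>
#include <string>
#include <tuple>
#include <vector>

// Assumed element order: name, WSA, fault_percent.
using Chromosome = std::tuple<std::string, int, double>;

void sort_by_wsa_then_fault(std::vector<Chromosome>& chromosomes)
{
    std::sort(chromosomes.begin(), chromosomes.end(),
              [](const Chromosome& lhs, const Chromosome& rhs) {
                  // std::tie builds tuples of references, so this is a
                  // lexicographical compare on (WSA, fault_percent) only.
                  return std::tie(std::get<1>(lhs), std::get<2>(lhs)) <
                         std::tie(std::get<1>(rhs), std::get<2>(rhs));
              });
}
```

Individual fields are read the same way, for example std::get<0>(chromosomes[0]) for the name of the first entry.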
First to access to the different elements:
for (auto& x :chromosomes)
cout <<x.first<<": "<<x.second.first<<" "<<x.second.second<<endl;
Next, to sort the elements on WSA:
sort(chromosomes.begin(), chromosomes.end(),
[](auto &x, auto &y) { return x.second.first<y.second.first;});
If you want to sort on several criteria, for example WSA and fault_percent, you just have to change the lambda function for comparison:
sort(chromosomes.begin(), chromosomes.end(),
[](auto &x, auto &y) { return x.second.first<y.second.first
|| (x.second.first==y.second.first
&& x.second.second<y.second.second );});
Here is an online demo
Remark
Now what puzzles me, is why you want to use pairs of pairs or even tuples, when you could use a clean struct which would be easier to store/retrieve, and access its members:
struct Chromosome {
string name;
int WSA;
double fault_percent;
};
vector <Chromosome> chromosomes;
It would be much more readable and maintainable this way:
sort(chromosomes.begin(), chromosomes.end(),
[](auto &x, auto &y) { return x.WSA<y.WSA
|| (x.WSA==y.WSA && x.fault_percent<y.fault_percent );});
It seems like you need a table-like data structure, that allows sorting by multiple columns. C++ isn't the easiest language to manipulate table/matrix data structures in, but here's a few links to help you get started.
An example Table class:
How to dynamically sort data by arbitrary column(s)
A vector/tuple solution, which is a slightly cleaner version of what you're currently working on:
sorting table in place using stl sort
A lengthy discussion of this problem, which might give you some additional ideas:
https://softwareengineering.stackexchange.com/questions/188130/what-is-the-best-way-to-store-a-table-in-c

Elegant and efficient algorithm for increasing values of a "vector<pair>"

I need to find an element in a vector<pair<int, float>> and increase the second value.
I tried an approach.
template <typename K, typename V>
struct match_first {
const K _k; match_first(const K& k) : _k(k) {}
bool operator()(const pair<K, V>& el) const {
return _k == el.first;
}
};
Example usage:
vector< pair<int, float> > vec;
vec.push_back(make_pair(2, 3.0));
vec.push_back(make_pair(3, 5.0));
vec.push_back(make_pair(1, 1.0));
vector< pair<int, float> >::iterator it = find_if(vec.begin(), vec.end(), match_first<int, float>(3));
if (it != vec.end()) {
it->second += 9;
}
Is there a more efficient way of accomplishing this task?
A map seems more natural:
#include <map>
int main()
{
std::map<int, float> m;
m.insert(std::make_pair(2, 3.0));
m.insert(std::make_pair(3, 5.0));
m.insert(std::make_pair(1, 1.0));
auto it = m.find(3);
if (it != m.end()) {
it->second += 9;
}
}
It will also be faster because lookup is O(log(n))
You can reach the same complexity with a vector of sorted pairs by using std::lower_bound (or std::equal_range if keys can be repeated)
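A sketch of the sorted-vector variant, assuming unique keys and a vector that is already sorted by the first member. The name add_to_sorted is hypothetical, and the assert is only there as a debug-build check of that precondition.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Pair = std::pair<int, float>;

// Adds amount to the value stored under key. Returns false if key is absent.
// Precondition (assumed): sorted_vec is sorted by .first in ascending order.
bool add_to_sorted(std::vector<Pair>& sorted_vec, int key, float amount)
{
    assert(std::is_sorted(sorted_vec.begin(), sorted_vec.end(),
                          [](const Pair& a, const Pair& b) { return a.first < b.first; }));

    // Binary search for the first element whose key is not less than key.
    auto it = std::lower_bound(sorted_vec.begin(), sorted_vec.end(), key,
                               [](const Pair& element, int k) { return element.first < k; });
    if (it == sorted_vec.end() || it->first != key) {
        return false;
    }
    it->second += amount;
    return true;
}
```

The lookup is logarithmic like the map, but keeping the vector sorted makes insertion linear, so this pays off mainly when lookups dominate.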
It depends on your constraints. If the key (the first element) is unique, you can use std::map<K,V> to hold your objects. Then increasing the value is easy. If value-initializing V yields zero, you can even skip adding new elements and just increment (this also works for built-in types such as int, because operator[] value-initializes the new element).
std::map<K,V> data;
data[key] = data[key] + 1;
The [] operator, when used with a non-existent key, will create the object for you using value-initialization (the default constructor for class types, zero for built-in types). To access data without inserting, use the at or find methods.
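A tiny sketch of that behaviour with the question's int and float types; add_amount is a made-up helper name.

```cpp
#include <map>

// Adds amount to the value stored under key. If key is not present yet,
// operator[] first inserts a value-initialized float (0.0f), so the
// result is simply amount.
void add_amount(std::map<int, float>& data, int key, float amount)
{
    data[key] += amount;
}
```

The flip side is that merely reading data[key] for an unknown key also inserts an element, which is why at or find are the safer choice for pure lookups.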
Extending sehe's answer: you can use std::multimap in the same way if you may have duplicate keys. This container also keeps the <K,V> pairs sorted by key, so lookups are logarithmic rather than linear.
There is no exact answer to your question: it depends.
My first answer is: use std::find_if (available in <algorithm>, part of the C++ Standard Library), then profile your code. If the search turns out to be a bottleneck worthy of concern, then try another approach.
Beware of using a std::map, as it will sort the pairs by their first component (that is, the insertion order will be lost). In addition, it will not allow you to store two pairs with the same first component.
As others have mentioned, you can work around these caveats (if they are indeed caveats for your problem), but, as I mentioned before, it would only be worth your while if you first demonstrate that the search is a bottleneck when using the standard algorithms.