The best practice for (unordered) map keys and values modification - c++

A map of the form map<long long, vector<long long>> is given. All keys and values have to be reduced modulo some integer N. Some keys can collide after reduction, and their value lists must then be merged. For example, the map {{1,{2,6,4}}, {5,{8,4,9}}, {10,{5,1,7}}} should become {{1,{2,1,4}}, {0,{0,1,2,3,4}}} after reduction modulo 5.
My approach builds a new map, but I think there should be a better way.
Code added:
vector<long long> tmp;
//integer N, for example N = 5
int N = 5;
unordered_map<long long, vector<long long>> map;
//temporary map
unordered_map<long long, vector<long long>> map_tmp;
for (auto & x : map)
{
tmp.clear();
for (auto & y : x.second) tmp.push_back(y % N);
long long ind = x.first % N;
map_tmp[ind].insert(map_tmp[ind].end(), tmp.begin(), tmp.end());
sort(map_tmp[ind].begin(), map_tmp[ind].end());
map_tmp[ind].erase(unique(map_tmp[ind].begin(), map_tmp[ind].end()), map_tmp[ind].end());
}
map = map_tmp;

Since the values in the map are apparently unique, and must still be unique after the modulo operation, you should use a different data structure. For example:
using Map = std::unordered_map<int, std::set<int>>;
std::set will handle uniqueness and order of items for given key.
Now the trick is to look at the APIs of std::unordered_map and std::set and at how items are inserted into them. See:
std::unordered_map::insert
std::set::insert
Note the return value: std::pair<iterator,bool>, which gives you an iterator to the inserted or already existing item in the map/set.
Knowing this, writing code that meets your requirements is quite simple:
using Map = std::unordered_map<int, std::set<int>>;
Map moduloMap(const Map& in, int mod)
{
Map out;
for (const auto& [k, s] : in) {
if (s.empty())
continue;
auto& destSet = out.insert({ k % mod, {} }).first->second;
for (auto x : s) {
destSet.insert(x % mod);
}
}
return out;
}

Sometimes a for loop can be the easiest, clearest way to do something.
map<long long, vector<long long>> result;
for (const auto& [key, vec] : input) {
process (result[key%5], vec);
}
and process takes the vector by (non-const) reference and appends the reduced values from the second (const) argument.
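For illustration, here is a minimal sketch of what such a process helper might look like. Only the name process and its first two parameters come from the snippet above; the body, the extra modulus parameter n (defaulting to the hard-coded 5) and the reduce_map wrapper are assumptions.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical helper: appends every element of src, reduced modulo n, to dest.
// The default n = 5 only mirrors the hard-coded 5 in the loop above.
void process(std::vector<long long>& dest,
             const std::vector<long long>& src,
             long long n = 5) {
    assert(n > 0);
    dest.reserve(dest.size() + src.size());
    for (long long v : src) {
        dest.push_back(v % n);  // note: % keeps the sign of v for negative values
    }
}

// Hypothetical wrapper: applies the loop from the answer to a whole map.
std::map<long long, std::vector<long long>>
reduce_map(const std::map<long long, std::vector<long long>>& input, long long n) {
    std::map<long long, std::vector<long long>> result;
    for (const auto& [key, vec] : input) {
        process(result[key % n], vec, n);
    }
    return result;
}
```

This version keeps duplicates and insertion order within each merged vector. If the values must be unique, the destination would need to be a std::set or be de-duplicated afterwards.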
update
After seeing the code you posted, I have several suggestions:
Use a set instead. You are spending multiple steps to append the new values, sort the whole thing together, then remove duplicates. Just use a set, which maintains a single copy of each value automatically.
Use structured bindings in your loop. Instead of x.second and x.first you can just name them key and vec as in my earlier post.
Assuming you still need tmp, declare it where you are calling .clear() now, instead of declaring it way up at the top of your code. You don't need to clear it each time through the loop; it will be empty each time through the loop naturally.

Related

Finding the key with most values in map<string, vector<string>>

set<string> myFunc (const map<string, vector<string>>& m)
I want to return all the keys in a set of strings, that map the most values (several keys if number of mapped values is the same). My attempt was:
set<string> ret;
auto max_e = *max_element(m.begin(), m.end(), [] (const pair<string, vector<string>>& m1, const pair<string, vector<string>>& m2) {
return m1.second.size() < m2.second.size();
});
ret.insert(max_e.first);
return ret;
Logically, this cannot work (I think) since this would only return one key with the highest value. Any ideas?
One way of doing it would be iterating twice:
1st one to get the maximum size out of all keys.
2nd one to get the keys that map to that size.
It should look along the lines of:
set <string> myFunc(const map<string, vector<string>>& m) {
set <string> ret;
size_t maximumSize = 0;
for (const auto& e : m) {
maximumSize = max(maximumSize, e.second.size());
}
for (const auto& e : m) {
if (e.second.size() == maximumSize) {
ret.insert(e.first);
}
}
return ret;
}
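If a second pass over the map is undesirable, the same result can be collected in one pass by resetting the result set whenever a larger size is seen. This is a different technique from the two-pass answer above, and the function name keys_with_most_values is made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Single-pass variant: keeps the keys seen so far that share the current maximum.
std::set<std::string> keys_with_most_values(
        const std::map<std::string, std::vector<std::string>>& m) {
    std::set<std::string> ret;
    std::size_t best = 0;
    for (const auto& [key, values] : m) {
        if (values.size() > best) {
            best = values.size();
            ret.clear();  // earlier keys mapped fewer values
        }
        if (values.size() == best) {
            ret.insert(key);
        }
    }
    return ret;
}
```

Like the two-pass version, this returns every key when all vectors are empty, and an empty set for an empty map.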
In addition to @a.Li's answer, if possible, you can also optimize quite a few things along the way.
Of course, iterating the map twice is probably the simplest and least expensive way of solving the issue:
using StringMapType = std::map<std::string, std::vector<std::string>>;
using StringMapVectorType = StringMapType::value_type::second_type;
std::set<StringMapType::key_type> findKeys(const StringMapType &stringMap) {
StringMapVectorType::size_type maximumSize {};
for (const auto &[key, values] : stringMap)
maximumSize = std::max(maximumSize, values.size());
std::set<StringMapType::key_type> results {};
for (const auto &[key, values] : stringMap)
if (values.size() == maximumSize)
results.emplace(key);
return results;
}
However, I would recommend the following, if possible:
if you aren't interested in ordering the keys in your map type, use std::unordered_map,
replace the return type std::set with std::vector, if you aren't interested in the order of the keys stored in the results.
Object lifetime specific optimizations:
use std::string_view for the keys you find; this will avoid additional copies of the strings, assuming they aren't optimized out with short string optimization,
return an array of iterators, instead of their keys
If applied, the code could look something like this:
std::vector<StringMapType::const_iterator> findKeys(const StringMapType &stringMap) {
StringMapVectorType::size_type maximumSize {};
for (const auto &[key, values] : stringMap)
maximumSize = std::max(maximumSize, values.size());
std::vector<StringMapType::const_iterator> results {};
for (auto iterator = stringMap.cbegin(); iterator !=
stringMap.cend(); ++iterator)
if (const auto &values = iterator->second;
values.size() == maximumSize)
results.emplace_back(iterator);
return results;
}
Of course, if you'd like to avoid the whole issue, you can instead keep your entries sorted at insertion time using a custom comparator, or find the entry with the largest number of elements in its array and insert the new entry before it (for that you'd have to use an unordered map or another container).
Useful things for the future:
In which scenario do I use a particular STL container?

Weird behaviour with unordered_map of vectors and erase-remove idiom in C++14

So I'm computing Biconnected Components (BCC) in an undirected graph, after computation my algo includes some Bridge edges in some BCCs as well, so as a post-processing step I run a loop on each BCC (represented as a vector<pair<int, int>>, each pair<int, int> representing an edge in that BCC.) Here's how I did it:
auto pred = [&Bridges](pair<int, int>& edge) -> bool
{
return Bridges.find(edge) != Bridges.end();
};
for (auto bcc = BCC.begin(); bcc != BCC.end(); bcc++)
{
vector<pair<int, int>>& BCCList = (bcc->second);
BCCList.erase(remove_if(
BCCList.begin(), BCCList.end(), pred), BCCList.end());
}
Bridges is a set of pair<int, int>s again, containing all Bridge edges found by my algo.
BCC is an unordered_map<int, vector<pair<int, int>>>.
The above code works as intended, removes any Bridge edges that may have been in a BCC vector before. BUT, if I make a slight change and do this:
auto pred = [&Bridges](pair<int, int>& edge) -> bool
{
return Bridges.find(edge) != Bridges.end();
};
for (auto bcc = BCC.begin(); bcc != BCC.end(); bcc++)
{
vector<pair<int, int>> BCCList = (bcc->second);
BCCList.erase(remove_if(
BCCList.begin(), BCCList.end(), pred), BCCList.end());
}
All I did was remove the & before BCCList in the first line inside the for-loop. This makes the code not work, and it produces a result as if this for-loop never executed; no Bridge edges in any BCC are removed, thus computing wrong BCCs in the end. Please tell me why's this happening?
I always thought that if I have a bcc like iterator on an unordered_map, then bcc->first is the key (here, bcc->first should be an int) and bcc->second is the value (here, bcc->second should be vector<pair<int, int>>). Is this not correct? Why must I explicitly specify an & (a reference variable) for the code to work?
Does this behaviour have something to do with remove_if perhaps?
vector<pair<int, int>>& BCCList = (bcc->second);
Here, BCCList is a reference (an alternative name) for the vector stored in bcc->second. Whatever change you do to BCCList is actually done to bcc->second.
vector<pair<int, int>> BCCList = (bcc->second);
Here, BCCList is a copy of the vector stored in bcc->second. It's a separate object. Changes to it do not affect bcc->second at all.
Here is a simpler example, where it should be more obvious what's happening:
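int data = 42;
int *bcc = &data;
int &ref = *bcc;  // ref is another name for data
ref = 314;        // data is now 314
int cop = *bcc;   // cop is a separate int, initialized to 314
cop = -42;        // only cop changes; data is still 314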
I don't think you'd expect the assignment cop = -42; to modify data. It's exactly the same situation in your code.
In your original code, BCCList is a reference to an element in BCC. Everything you do on BCCList, you do it in reality to the original element.
When you remove & the BCCList is an independent value that is initialised with a copy of the original. Everything you do to it is kept local to that value, and is lost at the next iteration (or when leaving the loop).
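The difference can be reproduced with a small stand-alone sketch. The aliases Edge and BccMap and both function names are invented for this example, and a single bridge edge stands in for the Bridges set from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;
using BccMap = std::unordered_map<int, std::vector<Edge>>;

// Erases through a reference: the vectors stored in the map are modified.
void erase_via_reference(BccMap& bcc, const Edge& bridge) {
    for (auto& entry : bcc) {
        std::vector<Edge>& list = entry.second;  // alias for the stored vector
        list.erase(std::remove(list.begin(), list.end(), bridge), list.end());
    }
}

// Erases from a copy: the map is untouched, which is why it can even be const.
void erase_via_copy(const BccMap& bcc, const Edge& bridge) {
    for (const auto& entry : bcc) {
        std::vector<Edge> list = entry.second;  // independent copy
        list.erase(std::remove(list.begin(), list.end(), bridge), list.end());
    }  // the modified copy is destroyed here
}
```

That the second function compiles with a const map is a good hint that it cannot be changing anything stored in it.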

C++ unordered_map where key is also unordered_map

I am trying to use an unordered_map with another unordered_map as a key (custom hash function). I've also added a custom equal function, even though it's probably not needed.
The code does not do what I expect, but I can't make heads or tails of what's going on. For some reason, the equal function is not called when doing find(), which is what I'd expect.
unsigned long hashing_func(const unordered_map<char,int>& m) {
string str;
for (auto& e : m)
str += e.first;
return hash<string>()(str);
}
bool equal_func(const unordered_map<char,int>& m1, const unordered_map<char,int>& m2) {
return m1 == m2;
}
int main() {
unordered_map<
unordered_map<char,int>,
string,
function<unsigned long(const unordered_map<char,int>&)>,
function<bool(const unordered_map<char,int>&, const unordered_map<char,int>&)>
> mapResults(10, hashing_func, equal_func);
unordered_map<char,int> t1 = getMap(str1);
unordered_map<char,int> t2 = getMap(str2);
cout<<(t1 == t2)<<endl; // returns TRUE
mapResults[t1] = "asd";
cout<<(mapResults.find(t2) != mapResults.end()); // returns FALSE
return 0;
}
First of all, the equality operator is certainly required, so you should keep it.
Let's look at your unordered map's hash function:
string str;
for (auto& e : m)
str += e.first;
return hash<string>()(str);
Since it's an unordered map, by definition, the iterator can iterate over the unordered map's keys in any order. However, since the hash function must produce the same hash value for the same key, this hash function will obviously fail in that regard.
Additionally, I would expect the hash function to include the values of the unordered map key, in addition to its keys. Perhaps you intended it this way, so that two unordered maps hash alike as long as their keys are the same, ignoring their values. It's not clear from the question what your expectation is, but you may want to think it over.
Comparing two std::unordered_map objects using == checks whether the maps contain the same key-value pairs. It does nothing to tell whether they contain them in the same order (it's an unordered map, after all). However, your hashing_func depends on the order of items in the map: hash<string>()("ab") is in general different from hash<string>()("ba").
A good place to start is with what hashing_func returns for each map, or more easily what the string construction in hashing_func generates.
A more obviously correct hash function for such a type could be:
unsigned long hashing_func(const unordered_map<char,int>& m) {
unsigned long res = 0;
for (auto& e : m)
res ^= hash<char>()(e.first) ^ hash<int>()(e.second);
return res;
}
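Put together as a hash functor, an order-independent hash could look roughly like the sketch below. The names Counts, CountsHash, ResultMap and count_chars are made up here; count_chars is only a stand-in for the getMap helper that the question does not show. Unlike the function above, each entry's key and value are mixed before XOR-ing, so an entry whose key hash equals its value hash does not cancel itself out.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

using Counts = std::unordered_map<char, int>;

// Order-independent hash: XOR is commutative, so the iteration order of the
// inner map does not affect the result.
struct CountsHash {
    std::size_t operator()(const Counts& m) const {
        std::size_t res = 0;
        for (const auto& e : m) {
            // Combine key and value into one per-entry hash first.
            std::size_t h = std::hash<char>()(e.first) * 31u
                          + std::hash<int>()(e.second);
            res ^= h;
        }
        return res;
    }
};

// Equality falls back to std::equal_to, i.e. operator== on the inner maps.
using ResultMap = std::unordered_map<Counts, std::string, CountsHash>;

// Hypothetical stand-in for getMap: counts how often each character occurs.
Counts count_chars(const std::string& s) {
    Counts c;
    for (char ch : s) {
        ++c[ch];
    }
    return c;
}
```

With this, a value stored under count_chars("abc") is found again via count_chars("cab"), because both maps compare equal and now also hash to the same value.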

Elegant and efficient algorithm for increasing values of a "vector<pair>"

I need to find an element in a vector<pair<int, float>> and increase the second value.
I tried an approach.
template <typename K, typename V>
struct match_first {
const K _k; match_first(const K& k) : _k(k) {}
bool operator()(const pair<K, V>& el) const {
return _k == el.first;
}
};
Example usage:
vector< pair<int, float> > vec;
vec.push_back(make_pair(2, 3.0));
vec.push_back(make_pair(3, 5.0));
vec.push_back(make_pair(1, 1.0));
vector< pair<int, float> >::iterator it = find_if(vec.begin(), vec.end(), match_first<int, float>(3));
if (it != vec.end()) {
it->second += 9;
}
Is there a more efficient way of accomplishing this task?
A map seems more natural:
#include <map>
int main()
{
std::map<int, float> m;
m.insert(std::make_pair(2, 3.0));
m.insert(std::make_pair(3, 5.0));
m.insert(std::make_pair(1, 1.0));
auto it = m.find(3);
if (it != m.end()) {
it->second += 9;
}
}
It will also be faster than the linear find_if search, because lookup is O(log(n)).
You can reach the same complexity with a vector of sorted pairs by using std::lower_bound (or std::equal_range if keys can be repeated)
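A rough sketch of the sorted-vector variant, assuming unique keys and a vector that is already sorted by its first component. The function name add_to_key is invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Adds delta to the value stored under key. Assumes vec is sorted by .first.
// Returns false if the key is not present.
bool add_to_key(std::vector<std::pair<int, float>>& vec, int key, float delta) {
    // Debug-only precondition check; it is O(n), so it disappears with NDEBUG.
    assert(std::is_sorted(vec.begin(), vec.end(),
        [](const auto& a, const auto& b) { return a.first < b.first; }));

    // Binary search for the first element whose key is not less than key.
    auto it = std::lower_bound(vec.begin(), vec.end(), key,
        [](const std::pair<int, float>& el, int k) { return el.first < k; });

    if (it == vec.end() || it->first != key) {
        return false;
    }
    it->second += delta;
    return true;
}
```

The vector from the question would have to be sorted once first (for example with std::sort), which only pays off if many lookups follow.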
It depends on your constraints. If the key (the first element) is unique, you can use std::map<K,V> to hold your objects. Then increasing a value is easy. Because operator[] value-initializes a missing element (to zero for arithmetic types such as int or float), you can even skip adding new elements and just increment.
std::map<K,V> data;
data[key] = data[key] + 1;
The [] operator used with a non-existent key creates the element for you by value-initializing it. To only access data without inserting, use the at or find methods.
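As a small illustration of that behaviour, the sketch below counts through operator[] without inserting anything first. The function name bump is hypothetical.

```cpp
#include <cassert>
#include <map>

// Increments the counter stored under key and returns the new value.
// A missing key is created by operator[] and starts at 0 for int.
int bump(std::map<int, int>& counts, int key) {
    return ++counts[key];
}
```

Calling bump twice with the same key on an empty map yields 1 and then 2, and the map ends up with a single element.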
Extending sehe's answer: you can use std::multimap in the same way if you may have duplicate keys. This container also keeps the <K,V> pairs sorted by key, so lookups use a logarithmic search instead of a linear scan.
There is no exact answer to your question: it depends.
My first answer is: use std::find_if (available in <algorithm>, part of the C++ Standard Library), then profile your code. If the search turns out to be a bottleneck worthy of concern, then try another approach.
Beware of using a std::map, as it will sort the pairs by their first component (that is, the insertion order will be lost). In addition, it will not allow you to store two pairs with the same first component.
As others have mentioned, you can work around these caveats (if they are indeed caveats for your problem), but, as I mentioned before, it would only be worth your while if you first demonstrate that the search is a bottleneck when using the standard algorithms.

Composability of STL algorithms

The STL algorithms are a pretty useful thing in C++. But one thing that kind of irks me is that they seem to lack composability.
For example, let's say I have a vector<pair<int, int>> and want to transform that to a vector<int> containing only the second member of the pair. That's simple enough:
std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;
std::transform(values.begin(), values.end(), std::back_inserter(result),
[] (std::pair<int, int> p) { return p.second; });
Or maybe I want to filter the vector for only those pairs whose first member is even. Also pretty simple:
std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> result;
std::copy_if(values.begin(), values.end(), std::back_inserter(result),
[] (std::pair<int, int> p) { return (p.first % 2) == 0; });
But what if I want to do both? There is no transform_if algorithm, and using both transform and copy_if seems to require allocating a temporary vector to hold the intermediate result:
std::vector<std::pair<int, int>> values = GetValues();
std::vector<std::pair<int, int>> temp;
std::vector<int> result;
std::copy_if(values.begin(), values.end(), std::back_inserter(temp),
[] (std::pair<int, int> p) { return (p.first % 2) == 0; });
std::transform(temp.begin(), temp.end(), std::back_inserter(result),
[] (std::pair<int, int> p) { return p.second; });
This seems rather wasteful to me. The only way I can think of to avoid the temporary vector is to abandon transform and copy_if and simply use for_each (or a regular for loop, whichever suits your fancy):
std::vector<std::pair<int, int>> values = GetValues();
std::vector<int> result;
std::for_each(values.begin(), values.end(),
[&result] (std::pair<int, int> p)
{ if( (p.first % 2) == 0 ) result.push_back(p.second); });
Am I missing something here? Is there a good way to compose two existing STL algorithms into a new one without needing temporary storage?
You're right. You can use Boost.Range adaptors to achieve composition.
I think the problem is unfortunately structural
C++ uses two iterators to represent a sequence
C++ functions are single-valued
so you cannot chain them because a function cannot return "a sequence".
An option would have been to use single-object sequences instead (like the range approach from boost). This way you could have combined the result of one processing as the input of another... (one object -> one object).
In the standard C++ library instead the processing is (two objects -> one object) and it's clear that this cannot be chained without naming the temporary object.
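Without a range library, one workaround is a small hand-written algorithm that filters and transforms in a single pass. The sketch below is not part of the standard library; the name transform_if, its signature and the wrapper seconds_of_even_firsts are assumptions made for this example.

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: writes op(x) to out for every x in [first, last)
// that satisfies pred, and returns the advanced output iterator.
template <typename InputIt, typename OutputIt, typename Pred, typename Op>
OutputIt transform_if(InputIt first, InputIt last, OutputIt out, Pred pred, Op op) {
    for (; first != last; ++first) {
        if (pred(*first)) {
            *out = op(*first);
            ++out;
        }
    }
    return out;
}

// The filter-then-project example from the question, without a temporary vector.
std::vector<int> seconds_of_even_firsts(const std::vector<std::pair<int, int>>& values) {
    std::vector<int> result;
    transform_if(values.begin(), values.end(), std::back_inserter(result),
                 [](const std::pair<int, int>& p) { return p.first % 2 == 0; },
                 [](const std::pair<int, int>& p) { return p.second; });
    return result;
}
```

This avoids the intermediate vector, but it is one more special-purpose algorithm rather than real composition, which is the structural limitation described above.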
Back in 2000, the problem was already noted. Gary Powell and Martin Weiser came up with a "view" concept, and coined the name "View Template Library". It didn't take off then but the idea makes sense. A "view" adaptor essentially applies an on-the-fly transform. For instance, it can adapt the value_type.
The concept probably should be readdressed now we have C++0x. We've made quite some progress in generic programming since 2000.
For example, let's use the vector<pair<int, int>> to vector<int> example. That could be quite simple:
std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, [](std::pair<int, int> p) { return p.first; });
std::vector<int> result(v.begin(), v.end());
Or, using the boost::bind techniques, even simpler:
std::vector<std::pair<int, int>> values = GetValues();
vtl2::view v (values, &std::pair<int, int>::first);
std::vector<int> result(v.begin(), v.end());
Since C++20 you can use std::ranges::copy together with the range adaptors std::views::filter and std::views::values from the Ranges library as follows:
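#include <algorithm>
#include <iostream>
#include <iterator>
#include <ranges>
#include <utility>
#include <vector>
int main() {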
std::vector<std::pair<int, int>> values = { {1,2}, {4,5}, {6,7}, {9,10} };
std::vector<int> result;
auto even = [](const auto& p) { return (p.first % 2) == 0; };
std::ranges::copy(values | std::views::filter(even) | std::views::values,
std::back_inserter(result));
for (int i : result)
std::cout << i << std::endl;
return 0;
}
Output:
5
7
In the solution above, no temporary vector is created for an intermediate result, because the view adaptors create ranges that don't contain elements. These ranges are just views over the input vector, but with a customized iteration behavior.
Not sure if this is still active, but...
There is a new lightweight header-only library that does what you describe. Its documentation talks about lazy evaluation and composable generators.
Doc snippet:
Read in up to 10 integers from a file "test.txt".
filter for the even numbers, square them and sum their values.
int total = lz::read<int>(ifstream("test.txt")) | lz::limit(10) |
lz::filter([](int i) { return i % 2 == 0; }) |
lz::map([](int i) { return i * i; }) | lz::sum();
You can split that line up into multiple expressions.
auto numbers = lz::read<int>(ifstream("test.txt")) | lz::limit(10);
auto evenFilter = numbers | lz::filter([](int i) { return i % 2 == 0; });
auto squares = evenFilter | lz::map([](int i) { return i * i; });
int total = squares | lz::sum();
Even though this expression is split over multiple variable assignments, it is not any less efficient. Each intermediate variable simply describes a unit of code to be executed, all held on the stack.
https://github.com/SaadAttieh/lazyCode