Proper value for unordered_map


I have a set of strings that I need to put into a hash table so that I can retrieve the anagrams of a given word. I chose unordered_map since it is the built-in hash table in C++. The strings look like this:
cat, act, art, tar, rat, etc.
I use the alphabetically sorted word as the key and a vector of the unsorted words as the value. This implementation takes a lot of time on insertion. Is this the best possible implementation for the requirement, or is there something less time-consuming that I can use?
std::tr1::unordered_map<std::string, std::vector<std::string>> map;
if(map.find(sortedWord) == map.end()){
std::vector<std::string> temp;
temp.push_back(word);
map.insert(make_pair(sortedWord, temp));
}else{
map.find(sortedWord)->second.push_back(word);
}

You're making that a lot more complicated than necessary, and in the process you're also slowing things down by:
Searching for the key twice, and
Copying a vector (with the contained word) when you have a new key. In fact, it is probably copied twice.
The following works fine with C++11, and I'm pretty sure it works the same way with tr1:
/* Name of map changed because I don't like to use "map"
* or "string" as variable names */
wordMap[sortedWord].push_back(word);
If sortedWord is not present in the map, then wordMap[sortedWord] will insert it with a default-constructed std::vector<std::string>. So whether or not sortedWord was present, the new word can just be appended onto the value returned by the subscript.
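A minimal sketch of the whole insertion loop built around that one-liner. The function name groupAnagrams and the use of std::unordered_map (rather than the tr1 version) are assumptions made for illustration:

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: groups words under their alphabetically sorted key.
std::unordered_map<std::string, std::vector<std::string>>
groupAnagrams(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, std::vector<std::string>> wordMap;
    for (const std::string& word : words) {
        std::string sortedWord = word;
        std::sort(sortedWord.begin(), sortedWord.end());
        // operator[] default-constructs an empty vector for a new key,
        // so a single lookup covers both the new-key and existing-key cases.
        wordMap[sortedWord].push_back(word);
    }
    return wordMap;
}
```

With the words from the question, "cat" and "act" end up under the key "act", while "art", "tar" and "rat" end up under "art".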

Just to offer another solution, you may use C++11 std::unordered_multiset with customized hash algorithm and equality comparison.
The custom hash algorithm may simply combine the hash values of each character with a commutative operation, say bitwise XOR, such that all anagrams have the same hash value.
The custom equality comparison can use std::is_permutation to equate all anagrams.
struct AnagramHash
{
typedef std::string argument_type;
typedef std::hash<char>::result_type result_type;
result_type operator()(const argument_type& s) const
{
std::hash<char> char_hash;
result_type result = 0;
for (const auto& x : s)
{
result ^= char_hash(x);
}
return result;
}
};

struct AnagramEqual
{
typedef bool result_type;
typedef std::string first_argument_type;
typedef std::string second_argument_type;
result_type operator()(const first_argument_type& lhs, const second_argument_type& rhs) const
{
if (lhs.size() == rhs.size())
return std::is_permutation(std::begin(lhs), std::end(lhs), std::begin(rhs));
else
return false;
}
};

int main()
{
std::unordered_multiset<std::string, AnagramHash, AnagramEqual> anagrams;
anagrams.insert("arc");
anagrams.insert("rac");
anagrams.insert("car");
anagrams.insert("H2O");
anagrams.insert("2OH");
auto range = anagrams.equal_range("car");
for (auto it = range.first ; it != range.second ; ++it)
{
std::cout << *it << std::endl;
}
std::cout << std::endl;
range = anagrams.equal_range("HO2");
for (auto it = range.first ; it != range.second ; ++it)
{
std::cout << *it << std::endl;
}
}


Best way to calculate a running hash for an unordered_map?

I've got a simple wrapper-class for std::unordered_map that updates a running hash-code for the unordered_map's contents, as key-value pairs are added or removed; that way I never have to iterate over the entire contents to get the current hash code for the set. It does this by adding to the _hash member-variable whenever a new key-value pair is added, and subtracting from the _hash member-variable whenever an existing key-value pair is removed. This all works fine (but see the toy implementation below if you want a code-example of what I mean).
My only concern is that I suspect that simply adding and subtracting values from _hash might not be the optimal thing to do from the perspective of minimizing the likelihood of hash-value collisions. Is there a mathematically better way to compute the running-hash-code for the table, that would still preserve my ability to efficiently add/remove items from the table (i.e. without forcing me to iterate over the table to rebuild a hash code from scratch every time?)
#include <functional>
#include <unordered_map>
#include <string>
#include <iostream>
template<typename KeyType, typename ValueType> class UnorderedMapWithHashCode
{
public:
UnorderedMapWithHashCode() : _hash(0) {/* empty */}
void Clear() {_map.clear(); _hash = 0;}
void Put(const KeyType & k, const ValueType & v)
{
Remove(k); // to deduct any existing value from _hash
_hash += GetHashValueForPair(k, v);
_map[k] = v;
}
void Remove(const KeyType & k)
{
if (_map.count(k) > 0)
{
_hash -= GetHashValueForPair(k, _map[k]);
_map.erase(k);
}
}
const std::unordered_map<KeyType, ValueType> & GetContents() const {return _map;}
std::size_t GetHashCode() const {return _hash;}
private:
std::size_t GetHashValueForPair(const KeyType & k, const ValueType & v) const
{
return std::hash<KeyType>()(k) + std::hash<ValueType>()(v);
}
std::unordered_map<KeyType, ValueType> _map;
std::size_t _hash;
};
int main(int, char **)
{
UnorderedMapWithHashCode<std::string, int> map;
std::cout << "A: Hash is " << map.GetHashCode() << std::endl;
map.Put("peanut butter", 5);
std::cout << "B: Hash is " << map.GetHashCode() << std::endl;
map.Put("jelly", 25);
std::cout << "C: Hash is " << map.GetHashCode() << std::endl;
map.Remove("peanut butter");
std::cout << "D: Hash is " << map.GetHashCode() << std::endl;
map.Remove("jelly");
std::cout << "E: Hash is " << map.GetHashCode() << std::endl;
return 0;
}

Your concept's perfectly fine, just the implementation could be improved:
you could take the hash functions to use as template arguments that default to the relevant std::hash instantiations. Note that for numbers it's common (GCC, Clang, Visual C++) for std::hash<> to be an identity hash, which is moderately collision-prone. GCC and Clang mitigate that somewhat by using a prime number of buckets (vs. Visual C++'s power-of-2 choice), but you need to avoid distinct key/value entries colliding in the size_t hash-value space, rather than post-mod-bucket-count, so you would be better off using a meaningful hash function. Similarly, Visual C++'s std::string hash only incorporates 10 characters spaced along the string (so it's constant time), but if your key and value were both similar, same-length long strings differing in only a few characters, that would be horribly collision-prone too. GCC uses a proper hash function for strings: MURMUR32.
return std::hash<KeyType>()(k) + std::hash<ValueType>()(v); is a mediocre idea in general and an awful idea when using an identity hash function (e.g. h({k,v}) == k + v, so h({4,2}) == h({2,4}) == h({1,5}) etc.)
consider using something based on boost::hash_combine instead (assuming you do adopt the above advice to have template parameters provide the hash functions):
auto key_hash = KeyHashPolicy()(key);
return (key_hash ^ ValueHashPolicy()(value)) +
0x9e3779b9 + (key_hash << 6) + (key_hash >> 2);
you could dramatically improve the efficiency of your operations by avoiding unnecessary hash table lookups (your Put does 2-4 table lookups, and Remove does 1-3):
void Put(const KeyType& k, const ValueType& v)
{
auto it = _map.find(k);
if (it == _map.end()) {
_map[k] = v;
} else {
if (it->second == v) return;
_hash -= GetHashValueForPair(k, it->second);
it->second = v;
}
_hash += GetHashValueForPair(k, v);
}
void Remove(const KeyType& k)
{
auto it = _map.find(k);
if (it == _map.end()) return;
_hash -= GetHashValueForPair(k, it->second);
_map.erase(it);
}
if you want to optimise further, you can create a version of GetHashValueForPair that returns the key's hash value (the KeyHashPolicy result) and lets you pass it back in, to avoid hashing the key twice in Put.

C++ map customize comparator

I defined a map to count the number of strings while sorting the strings by their length:
struct cmp {
bool operator()(const string& a, const string& b) {
return a.size() > b.size();
}
};
int main() {
map<string, int, cmp> mp;
mp["aaa"] = 1;
mp["bbb"] = 2;
cout << mp["aaa"];
}
I'm confused as the output is 2. How should I achieve my goal?

Because of the way your comparator is defined, strings "aaa" and "bbb" are considered equal. Your map has one item, not two. First you assigned 1 to that item, then you assigned 2.
To solve the problem, define your comparator as follows:
struct cmp {
bool operator()(const string& a, const string& b) const {
return a.size() == b.size() ? a > b : a.size() > b.size();
}
};
That way, the strings will be considered equal only if they actually are equal, not only when their sizes match, but the string length will still have priority for sorting.

std::map not only sorts items by key, it stores them by (unique) key - 1 item per key.
This behavior is defined by the comparator: if for keys a & b neither of a<b and b<a is true, these keys are considered equal.
In your case mp["bbb"] = 2 just overwrites mp["aaa"].
If you want to fit all the strings in the map, you can use std::multimap, which allows more than 1 value per key.
The other way is to redefine the comparator, so that it would take the different strings into account:
struct cmp {
bool operator()(const string& a, const string& b) const {
if(a.size() < b.size()) return true;
if(b.size() < a.size()) return false;
return a<b;
}
};
Thus your map will still prioritize sorting by string length, but it will also distinguish different strings of same size.
Depending on your use case, you can also check other containers like priority_queue or just plain vector with a proper insertion technique.
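As a sketch of the same length-first idea, the comparator can also be written with std::tie, which compares (length, text) lexicographically. The name LengthThenAlpha is an assumption, and this version sorts shorter strings first:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

// Hypothetical comparator: orders by length, then alphabetically.
struct LengthThenAlpha {
    bool operator()(const std::string& a, const std::string& b) const {
        const std::size_t la = a.size();
        const std::size_t lb = b.size();
        // Tuples compare element by element: lengths first, then the strings.
        return std::tie(la, a) < std::tie(lb, b);
    }
};

// Hypothetical alias for a map that keeps distinct same-length keys apart.
using LengthOrderedMap = std::map<std::string, int, LengthThenAlpha>;
```

With this comparator, "aaa" and "bbb" are distinct keys, so assigning to one no longer overwrites the other.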

If you want to allow strings of identical size in your map but do not care about their relative order, then std::multimap is an alternative solution:
#include <map>
#include <iostream>
#include <string>
struct cmp {
bool operator()(const std::string& a, const std::string& b) const {
return a.size() > b.size();
}
};
int main() {
std::multimap<std::string, int, cmp> mp;
mp.emplace("eee", 5);
mp.emplace("aaa", 1);
mp.emplace("bbb", 2);
mp.emplace("cccc", 3);
mp.emplace("dd", 4);
auto const elements_of_size_3 = mp.equal_range("aaa");
for (auto iter = elements_of_size_3.first; iter != elements_of_size_3.second; ++iter)
{
std::cout << iter->first << " -> " << iter->second << '\n';
}
}
Output:
eee -> 5
aaa -> 1
bbb -> 2
From std::multimap<std::string, int, cmp>'s point of view, "eee", "aaa" and "bbb" are all completely equal to each other, and std::multimap allows different keys to be equal. Their relative order is actually guaranteed to be the order of insertion since C++11.

How to avoid duplicate values in maps using c++

I am trying to write a program in C++ using maps...
My goal is to avoid the same values repeated in maps.
If the keys are same, we can use maps to avoid the duplicated keys. To allow duplicate keys, we use multimaps.
In case the value is the same, how can we avoid that?
The program which I have written allows duplicated values:
typedef std::map<int, std::string> MyMap;
int main()
{
MyMap map;
MyMap::iterator mpIter;
int key;
string value;
int count;
for(count = 0; count < 3;count++)
{
cin >> key;
cin >> value;
std::pair<MyMap::iterator, bool> res = map.insert(std::make_pair(key,value));
}
for (mpIter=map.begin(); mpIter != map.end(); ++mpIter)
cout << " " << (*mpIter).second << endl;
}

Make the value part of the key and/or use a set but that may not really solve the problem. It isn't possible to easily define a container that has both unique keys AND values if that's what you want. However, you might still construct one. Here's a very simple example to illustrate what is needed:
// Assuming keys are KEY and values are VAL
class MyMap {
public:
std::set<KEY> keyset;
std::set<VAL> valset;
std::map<KEY,VAL> theRealMap;
// assuming the existence of a function HAS(S, V)
// which returns true if V is in set S
bool MyInsert(KEY ky, VAL val) {
if (HAS(keyset, ky)) return false;
if (HAS(valset, val)) return false;
keyset.insert(ky);
valset.insert(val);
return theRealMap.insert(std::pair<KEY,VAL>(ky, val)).second;
}
:
:
Since this is only an example, it's not intended to be copied as-is. You will likely want to include the functionality provided by std::map. An easy way would be to use std::map as a base class, but you would need to hide (by making private) or reimplement each variant of insert; otherwise you might get an inadvertent insert that is not unique.
Note: this requires twice the space of a single map. You can save some space by using theRealMap itself instead of a separate set for the keys. Another way would be to search the map for the value, but that sacrifices time for space. It's your call.

One way to do this is to maintain a separate std::set of the values. When you insert a value into a set it returns a std::pair<iterator, bool>. The bool value is true if the value was not already in the set. This tells you if it is safe to also put the value in the map.
First, however, you need to make sure the key is unique because the same key may already have been inserted with a different value:
typedef std::map<int, std::string> MyMap;
int main()
{
MyMap map;
MyMap::iterator mpIter;
int key;
string value;
int count;
// keep track of values with a std::set
std::set<std::string> values;
for(count = 0; count < 3; count++)
{
cin >> key;
cin >> value;
auto found = map.find(key);
if(found != map.end()) // key already in map
continue; // don't add it again
// now try to add it to the set
// only add to the map if its value is not already in the set
if(values.insert(value).second)
map.insert(std::make_pair(key, value));
}
for(mpIter = map.begin(); mpIter != map.end(); ++mpIter)
cout << " " << (*mpIter).second << endl;
}

One (inefficient) way to do it is to create a reverse map (with <string,int>) and insert each input pair into it with the value as the key and the key as the value. If that insertion succeeds, then insert the pair into MyMap.
Here is the working code.
typedef std::map<int, std::string> MyMap;
typedef std::map<string, int> rev_Map;
int main()
{
MyMap map;
rev_Map rmap;
MyMap::iterator mpIter;
rev_Map::iterator rmap_iter;
int key;
string value;
int count;
for(count = 0; count < 3;count++)
{
cin >> key;
cin >> value;
std::pair<rev_Map::iterator, bool> ok = rmap.insert(std::make_pair(value,key)); //insert into the reverse map
if(ok.second) // if the rmap.insert above succeeded
std::pair<MyMap::iterator, bool> res = map.insert(std::make_pair(key,value));
}
for (mpIter=map.begin(); mpIter != map.end(); ++mpIter)
cout << " " << (*mpIter).second << endl;
}

Output over unique elements of `std::multiset` and their frequency using std:: algorithm in C++ (no loops)

I have the following multiset in C++:
template<class T>
class CompareWords {
public:
bool operator()(T s1, T s2)
{
if (s1.length() == s2.length())
{
return ( s1 < s2 );
}
else return ( s1.length() < s2.length() );
}
};
typedef multiset<string, CompareWords<string>> mySet;
typedef std::multiset<string,CompareWords<string>>::iterator mySetItr;
mySet mWords;
I want to print each unique element of type std::string in the set once, and next to each element I want to print how many times it appears in the list (its frequency). As you can see, the functor "CompareWords" keeps the set sorted.
A solution is proposed here, but it's not what I need, because I am looking for a solution without using explicit loops (while, for, do while).
I know that I can use this:
// gives a pair of iterators bounding the range of elements equal to "word"
auto p = mWords.equal_range(word);
// compute the distance between the iterators that bound the range AKA frequency
int count = static_cast<int>(std::distance(p.first, p.second));
but I can't quite come up with a solution without loops?

Unlike the other solutions, this iterates over the list exactly once. This is important, as iterating over a structure like std::multimap is reasonably high overhead (the nodes are distinct allocations).
There are no explicit loops, but the tail-end recursion will be optimized down to a loop, and I call an algorithm that will run a loop.
template<class Iterator, class Clumps, class Compare>
void produce_clumps( Iterator begin, Iterator end, Clumps&& clumps, Compare&& compare) {
if (begin==end) return; // do nothing for nothing
typedef decltype(*begin) value_type_ref;
// We know runs are at least 1 long, so don't bother comparing the first time.
// Generally, advancing will have a cost similar to comparing. If comparing is much
// more expensive than advancing, then this is sub optimal:
std::size_t count = 1;
Iterator run_end = std::find_if(
std::next(begin), end,
[&]( value_type_ref v ){
if (!compare(*begin, v)) {
++count;
return false;
}
return true;
}
);
// call our clumps callback:
clumps( begin, run_end, count );
// tail end recurse:
return produce_clumps( std::move(run_end), std::move(end), std::forward<Clumps>(clumps), std::forward<Compare>(compare) );
}
The above is a relatively generic algorithm. Here is its use:
int main() {
typedef std::multiset<std::string> mySet;
typedef std::multiset<std::string>::iterator mySetItr;
mySet mWords { "A", "A", "B" };
produce_clumps( mWords.begin(), mWords.end(),
[]( mySetItr run_start, mySetItr /* run_end -- unused */, std::size_t count )
{
std::cout << "Word [" << *run_start << "] occurs " << count << " times\n";
},
CompareWords<std::string>{}
);
}
The iterators must refer to a sorted sequence (with regards to the Comparator), then the clumps will be passed to the 3rd argument together with their length.
Every element in the multiset will be visited exactly once with the above algorithm (as a right-hand side argument to your comparison function). Every start of a clump will be visited (length of clump) additional times as a left-hand side argument (including clumps of length 1). There will be exactly N iterator increments performed, and no more than N+C+1 iterator comparisons (N=number of elements, C=number of clumps).

#include <iostream>
#include <algorithm>
#include <set>
#include <iterator>
#include <string>
int main()
{
typedef std::multiset<std::string> mySet;
typedef std::multiset<std::string>::iterator mySetItr;
mySet mWords;
mWords.insert("A");
mWords.insert("A");
mWords.insert("B");
mySetItr it = std::begin(mWords), itend = std::end(mWords);
std::for_each<mySetItr&>(it, itend, [&mWords, &it] (const std::string& word)
{
auto p = mWords.equal_range(word);
int count = static_cast<int>(std::distance(p.first, p.second));
std::cout << word << " " << count << std::endl;
std::advance(it, count - 1);
});
}
Outputs:
A 2
B 1

Following does the job without explicit loop using recursion:
void print_rec(const mySet& set, mySetItr it)
{
if (it == set.end()) {
return;
}
const auto& word = *it;
auto next = std::find_if(it, set.end(),
[&word](const std::string& s) {
return s != word;
});
std::cout << word << " appears " << std::distance(it, next) << std::endl;
print_rec(set, next);
}
void print(const mySet& set)
{
print_rec(set, set.begin());
}
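A variation on the recursion above, sketched under the assumption that jumping between runs with the container's own upper_bound member is acceptable. This swaps std::find_if for multiset::upper_bound, so it honours a custom comparator such as CompareWords. The name collect_counts and the output vector are hypothetical:

```cpp
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: appends one (element, frequency) pair per distinct
// element of the multiset, starting at `it`, without an explicit loop.
template <class Multiset>
void collect_counts(
    const Multiset& words,
    typename Multiset::const_iterator it,
    std::vector<std::pair<typename Multiset::value_type, std::size_t>>& out)
{
    if (it == words.end()) {
        return;
    }
    // upper_bound returns the first element ordered after *it,
    // which is the end of the current run of equivalent elements.
    const auto next = words.upper_bound(*it);
    out.emplace_back(*it, static_cast<std::size_t>(std::distance(it, next)));
    collect_counts(words, next, out);
}
```

The recursion depth equals the number of distinct elements, and the compiler is not required to turn it into a loop, so this is only suitable for sets with a modest number of distinct words.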

How to find if a given key exists in a C++ std::map

I'm trying to check if a given key is in a map and somehow can't do it:
typedef map<string,string>::iterator mi;
map<string, string> m;
m.insert(make_pair("f","++--"));
pair<mi,mi> p = m.equal_range("f");//I'm not sure if equal_range does what I want
cout << p.first;//I'm getting error here
so how can I print what is in p?

Use map::find and map::end:
if (m.find("f") == m.end()) {
// not found
} else {
// found
}

To check if a particular key in the map exists, use the count member function in one of the following ways:
m.count(key) > 0
m.count(key) == 1
m.count(key) != 0
The documentation for map::find says: "Another member function, map::count, can be used to just check whether a particular key exists."
The documentation for map::count says: "Because all elements in a map container are unique, the function can only return 1 (if the element is found) or zero (otherwise)."
To retrieve a value from the map via a key that you know to exist, use map::at:
value = m.at(key)
Unlike map::operator[], map::at will not create a new key in the map if the specified key does not exist.
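A small sketch contrasting the non-inserting lookups with operator[]. The helper names hasKey and valueOr and the StringMap alias are assumptions for illustration:

```cpp
#include <map>
#include <string>

using StringMap = std::map<std::string, std::string>;

// Hypothetical helper: true if the key is present; never modifies the map.
bool hasKey(const StringMap& m, const std::string& key)
{
    return m.count(key) > 0;
}

// Hypothetical helper: returns the mapped value, or `fallback` if the key is
// absent. Uses find so only one lookup is needed; m.at(key) would instead
// throw std::out_of_range for a missing key.
std::string valueOr(const StringMap& m, const std::string& key,
                    const std::string& fallback)
{
    const auto it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```

Both helpers take the map by const reference, which also rules out operator[] at compile time, since operator[] is not a const member function.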

C++20 gives us std::map::contains to do that.
#include <iostream>
#include <string>
#include <map>
int main()
{
std::map<int, std::string> example = {{1, "One"}, {2, "Two"},
{3, "Three"}, {42, "Don\'t Panic!!!"}};
if(example.contains(42)) {
std::cout << "Found\n";
} else {
std::cout << "Not found\n";
}
}

You can use .find():
map<string,string>::iterator i = m.find("f");
if (i == m.end()) { /* Not found */ }
else { /* Found, i->first is f, i->second is ++-- */ }

C++17 simplified this a bit more with an If statement with initializer.
This way you can have your cake and eat it too.
if ( auto it{ m.find( "key" ) }; it != std::end( m ) )
{
// Use `structured binding` to get the key
// and value.
const auto&[ key, value ] { *it };
// Grab either the key or value stored in the pair.
// The key is stored in the 'first' variable and
// the 'value' is stored in the second.
const auto& mkey{ it->first };
const auto& mvalue{ it->second };
// That or just grab the entire pair pointed
// to by the iterator.
const auto& pair{ *it };
}
else
{
// Key was not found..
}

m.find("f") == m.end() // not found
If you want to use another API, go for m.count("f") > 0:
if (m.count("f")>0)
cout << " is an element of m.\n";
else
cout << " is not an element of m.\n";

I think you want map::find. If m.find("f") is equal to m.end(), then the key was not found. Otherwise, find returns an iterator pointing at the element found.
The error is because p.first is an iterator, which doesn't work for stream insertion. Change your last line to cout << (p.first)->first;. p is a pair of iterators, p.first is an iterator, p.first->first is the key string.
A map can only ever have one element for a given key, so equal_range isn't very useful. It's defined for map, because it's defined for all associative containers, but it's a lot more interesting for multimap.

template <typename T, typename Key>
bool key_exists(const T& container, const Key& key)
{
return (container.find(key) != std::end(container));
}
Of course if you wanted to get fancier you could always template out a function that also took a found function and a not found function, something like this:
template <typename T, typename Key, typename FoundFunction, typename NotFoundFunction>
void find_and_execute(const T& container, const Key& key, FoundFunction found_function, NotFoundFunction not_found_function)
{
auto it = container.find(key);
if (it != std::end(container))
{
found_function(key, it->second);
}
else
{
not_found_function(key);
}
}
And use it like this:
std::map<int, int> some_map;
find_and_execute(some_map, 1,
[](int key, int value){ std::cout << "key " << key << " found, value: " << value << std::endl; },
[](int key){ std::cout << "key " << key << " not found" << std::endl; });
The downside to this is coming up with a good name, "find_and_execute" is awkward and I can't come up with anything better off the top of my head...

map<string, string> m;
Check whether the key exists, and return the number of occurrences (0 or 1 in a map):
int num = m.count("f");
if (num>0) {
//found
} else {
// not found
}
Check whether the key exists, and return an iterator:
map<string,string>::iterator mi = m.find("f");
if(mi != m.end()) {
//found
//do something to mi.
} else {
// not found
}
In your question, the error occurs because operator<< has no overload for p.first: p.first is a map<string, string>::iterator, so you cannot print it directly. Try this:
if(p.first != p.second) {
cout << p.first->first << " " << p.first->second << endl;
}

Be careful when comparing the find result with end(), as all the answers above do for map 'm':
map<string,string>::iterator i = m.find("f");
if (i == m.end())
{
}
else
{
}
You should not try to perform any operation, such as printing the key or value, with iterator i if it is equal to m.end(); dereferencing the end iterator is undefined behavior and will typically lead to a segmentation fault.

Comparing the code of std::map::find and std::map::count, I'd say the first may yield some performance advantage:
const_iterator find(const key_type& _Keyval) const
{ // find an element in nonmutable sequence that matches _Keyval
const_iterator _Where = lower_bound(_Keyval); // Here one looks only for lower bound
return (_Where == end()
|| _DEBUG_LT_PRED(this->_Getcomp(),
_Keyval, this->_Key(_Where._Mynode()))
? end() : _Where);
}
size_type count(const key_type& _Keyval) const
{ // count all elements that match _Keyval
_Paircc _Ans = equal_range(_Keyval); // Here both lower and upper bounds are to be found, which is presumably slower.
size_type _Num = 0;
_Distance(_Ans.first, _Ans.second, _Num);
return (_Num);
}

I know this question already has some good answers, but I think my solution is worth sharing.
It works for both std::map and std::vector<std::pair<T, U>> and is available from C++11.
template <typename ForwardIterator, typename Key>
bool contains_key(ForwardIterator first, ForwardIterator last, Key const key) {
using ValueType = typename std::iterator_traits<ForwardIterator>::value_type;
auto search_result = std::find_if(
first, last,
[&key](ValueType const& item) {
return item.first == key;
}
);
if (search_result == last) {
return false;
} else {
return true;
}
}

map <int , char>::iterator itr;
for(itr = MyMap.begin() ; itr!= MyMap.end() ; itr++)
{
if (itr->second == 'c')
{
cout<<itr->first<<endl;
}
}

If you want to compare a key/value pair against what is already stored in a map, you can use this method:
typedef map<double, double> TestMap;
TestMap testMap;
pair<map<double,double>::iterator,bool> controlMapValues;
controlMapValues= testMap.insert(std::pair<double,double>(x,y));
if (controlMapValues.second == false )
{
TestMap::iterator it;
it = testMap.find(x);
if (it->second == y)
{
cout<<"Given value already exists in Map"<<endl;
}
}
This is a useful technique.
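A sketch of the same check without the second lookup: the pair returned by insert already carries an iterator to the existing element when the insertion fails. The enum InsertOutcome and the function name insertAndCompare are assumptions for illustration:

```cpp
#include <map>
#include <utility>

// Hypothetical result type describing what happened to the (x, y) pair.
enum class InsertOutcome { Inserted, SameValuePresent, DifferentValuePresent };

// Hypothetical helper: tries to insert (x, y) and reports the outcome.
InsertOutcome insertAndCompare(std::map<double, double>& m, double x, double y)
{
    // insert returns {iterator to the element with key x, true if inserted}.
    const auto result = m.insert(std::make_pair(x, y));
    if (result.second) {
        return InsertOutcome::Inserted;
    }
    // The key was already present; result.first points at the existing
    // element, so no extra find is needed to compare its value.
    return result.first->second == y ? InsertOutcome::SameValuePresent
                                     : InsertOutcome::DifferentValuePresent;
}
```

A failed insert leaves the stored value untouched, so the comparison is always against the value that was in the map before the call.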
