Same key, multiple entries for std::unordered_map?


I have a map inserting multiple values with the same key of C string type.
I would expect to have a single entry with the specified key.
However, the map seems to take its address into consideration when uniquely identifying a key.
#include <cassert>
#include <iostream>
#include <string>
#include <unordered_map>
typedef char const* const MyKey;
/// @brief Hash function for StatementMap keys
///
/// Delegates to std::hash<std::string>.
struct MyMapHash {
public:
size_t operator()(MyKey& key) const {
return std::hash<std::string>{}(std::string(key));
}
};
typedef std::unordered_map<MyKey, int, MyMapHash> MyMap;
int main()
{
// Build std::strings to prevent optimizations on the addresses of
// underlying C strings.
std::string key1_s = "same";
std::string key2_s = "same";
MyKey key1 = key1_s.c_str();
MyKey key2 = key2_s.c_str();
// Make sure addresses are different.
assert(key1 != key2);
// Make sure hashes are identical.
assert(MyMapHash{}(key1) == MyMapHash{}(key2));
// Insert two values with the same key.
MyMap map;
map.insert({key1, 1});
map.insert({key2, 2});
// Make sure we find them in the map.
auto it1 = map.find(key1);
auto it2 = map.find(key2);
assert(it1 != map.end());
assert(it2 != map.end());
// Get values.
int value1 = it1->second;
int value2 = it2->second;
// The first one of any of these asserts fails. Why is there not only one
// entry in the map?
assert(value1 == value2);
assert(map.size() == 1u);
}
A print in the debugger shows that map contains two elements just after inserting them.
(gdb) p map
$4 = std::unordered_map with 2 elements = {
[0x7fffffffda20 "same"] = 2,
[0x7fffffffda00 "same"] = 1
}
Why does this happen if the hash function, which delegates to std::hash<std::string>, only takes its value into account (this is asserted in the code)?
Moreover, if this is the intended behaviour, how can I use a map with C string as key, but with a 1:1 key-value mapping?

The reason is that hash maps (like std::unordered_map) do not rely only on the hash function to determine whether two keys are equal. The hash is the first comparison layer; after that, candidate keys are always compared for equality as well. Even a good hash function can produce collisions, where two different keys yield the same hash value, and the map must still be able to store both entries. There are various strategies for handling this; search for "collision resolution" in hash maps for more information.
In your example both keys have the same hash value but compare unequal. Keys are compared with the default equality function (std::equal_to), which for char const* compares the pointers, and those are different. The equality check therefore fails and you get two entries in the map. To solve this, you also need to define a custom equality function, which you pass as the fourth template parameter, KeyEqual, of std::unordered_map.
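A minimal sketch of the hash and equality pair described above. The names CStrHash, CStrEqual, CStrUnorderedMap and insert_key are hypothetical and do not come from the question; the mapped type int is taken from the question's MyMap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>

// Hash the characters the pointer refers to, not the pointer itself.
struct CStrHash {
    std::size_t operator()(const char* s) const {
        return std::hash<std::string>()(std::string(s));
    }
};

// Two keys are equal when their characters are equal.
struct CStrEqual {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) == 0;
    }
};

// The map stores only the pointers, so the pointed-to strings must outlive it.
using CStrUnorderedMap =
    std::unordered_map<const char*, int, CStrHash, CStrEqual>;

// Hypothetical helper: returns true when the key was newly inserted.
inline bool insert_key(CStrUnorderedMap& m, const char* key, int value) {
    assert(key != nullptr);
    return m.insert({key, value}).second;
}
```

With both functors in place, inserting two distinct pointers that refer to equal text leaves a single entry, and the second insert reports that the key already exists.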

This fails because the unordered_map does not and cannot solely rely on the hash function for the key to differentiate keys, but it must also compare keys with the same hash for equality. And comparing two char pointers compares the address pointed to.
If you want to change the comparison, pass a KeyEqual parameter to the map in addition to the hash.
#include <cstring>
struct MyKeyEqual
{
bool operator()(MyKey const &lhs, MyKey const &rhs) const
{
return std::strcmp(lhs, rhs) == 0;
}
};
typedef std::unordered_map<MyKey, int, MyMapHash, MyKeyEqual> MyMap;

unordered_map needs to be able to perform two operations on the key: checking equality and obtaining a hash code. Naturally, two unequal keys are allowed to have the same hash code. When this happens, unordered_map applies its hash collision resolution strategy to treat these unequal keys as distinct.
That is precisely what happens when you supply a character pointer for the key, and provide an implementation of hash to it: the default equality comparison for pointers kicks in, so two different pointers produce two different keys, even though the content of the corresponding C strings is the same.
You can fix it by providing a custom implementation of KeyEqual template parameter to perform actual comparison of C strings, for example, by calling strcmp:
return !strcmp(lhsKey, rhsKey);

You didn't define a map of keys but a map of pointers to a key.
typedef char const* const MyKey;
The compiler can merge the two instances of "same" into a single object in the constant data segment, but it may or may not do so; whether identical string literals share storage is unspecified.
Your map should contain the key itself. Make the key a std::string or similar.

Related

Key already exists in unordered_map, but "find" returns as not found

I constructed an unordered_map using key type rot3d, which is defined below:
#ifndef EPS6
#define EPS6 1.0e-6
#endif
struct rot3d
{
double agl[3]; // alpha, beta, gamma in ascending order
bool operator==(const rot3d &other) const
{
// printf("== used\n");
return abs(agl[0]-other.agl[0]) <= EPS6 && abs(agl[1]-other.agl[1]) <= EPS6 && abs(agl[2]-other.agl[2]) <= EPS6;
}
};
Equality of rot3d is defined by the condition that each component is within a small range of the same component from the other rot3d object.
Then I defined a value type RotMat:
struct RotMat // rotation matrix described by a pointer to matrix and truncation number
{
cuDoubleComplex *mat = NULL;
int p = 0;
};
In the end, I defined a hash table from rot3d to RotMat using self-defined hash function:
struct rot3dHasher
{
std::size_t operator()(const rot3d& key) const
{
using std::hash;
return (hash<double>()(key.agl[0]) ^ (hash<double>()(key.agl[1]) << 1) >> 1) ^ (hash<double>()(key.agl[2]) << 1);
}
};
typedef std::unordered_map<rot3d,RotMat,rot3dHasher> HashRot2Mat;
The problem I met was, a key was printed to be in the hash table, but the function "find" didn't find it. For instance, I printed a key using an iterator of the hash table:
Key: (3.1415926535897931,2.8198420991931510,0.0000000000000000)
But then I also got this information indicating that the key was not found:
(3.1415926535897931,2.8198420991931505,0.0000000000000000) not found in the hash table.
Although the two keys are not 100% the same, the definition of "==" should ensure them to be equal. So why am I seeing this key in the hash table, but it was not found by "find"?

Hash-based equivalence comparisons are allowed to have false positives, which are resolved by calling operator==.
Hash-based equivalence comparisons are not allowed to have false negatives, but yours does. Your two "not 100% the same" keys have different hash values, so the element is not even found as a candidate for testing using operator==.
It is necessary that (a == b) implies (hash(a) == hash(b)) and your definitions break this precondition. A hashtable with a broken precondition can misbehave in many ways, including not finding the item you are looking for.
Use a different data structure, one that depends on nearest-neighbor matching rather than hashing. An octree would be a smart choice.
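The precondition "(a == b) implies (hash(a) == hash(b))" can be spot-checked for a pair of keys. The sketch below is only an illustration: respects_hash_rule, assert_hash_rule, NearEqual and ExactHash are hypothetical names, and the toy int key stands in for rot3d so the effect is easy to see.

```cpp
#include <cassert>
#include <cstddef>

// Returns true when the pair (a, b) respects the rule
// "eq(a, b) implies hash(a) == hash(b)".
template <class Key, class Hash, class Equal>
bool respects_hash_rule(const Key& a, const Key& b, Hash hash, Equal eq) {
    if (!eq(a, b)) return true;  // unequal keys may hash to anything
    return hash(a) == hash(b);
}

// Debug helper: aborts (in builds with assertions enabled) when the rule is broken.
template <class Key, class Hash, class Equal>
void assert_hash_rule(const Key& a, const Key& b, Hash hash, Equal eq) {
    assert(respects_hash_rule(a, b, hash, eq));
}

// Toy tolerance-based equality: values within 1 of each other count as equal.
struct NearEqual {
    bool operator()(int a, int b) const {
        int d = a - b;
        return d >= -1 && d <= 1;
    }
};

// Toy hash that gives every distinct value a distinct hash.
struct ExactHash {
    std::size_t operator()(int v) const { return static_cast<std::size_t>(v); }
};
```

Here respects_hash_rule(1, 2, ExactHash{}, NearEqual{}) is false: the keys compare equal under the tolerance but hash differently, which is the same situation as the two nearly identical rot3d keys.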

Equality of rot3d is defined by the condition that each component is within a small range of the same component from the other rot3d object.
This is not an equivalence. You must have that a==b and b==c implies a==c. Yours fails this requirement.
Using a relation that is not an equivalence in a std algorithm or container violates the library's preconditions, which means the behavior of your program is undefined.
Also your hash hashes equivalent values differently. Also illegal.
One way to fix this is to build buckets. Each bucket has a size of your epsilon.
To find if a value is in your buckets, check the bucket you'd put the probe value in, plus all adjacent buckets (3^3 or 27 of them).
For each element, double check distance.
struct bucket; // array of 3 doubles, each a multiple of EPS6. Has == and hash. Also construct-from-rot3d that rounds.
bucket get_bucket(rot3d);
Now, odds are that you are just caching. And within EPS-ish is good enough.
template<class T, class B>
struct adapt:T{
template<class...Args>
auto operator()(Args&&...args)const{
return T::operator()( static_cast<B>(std::forward<Args>(args))... );
}
using is_transparent=void;
};
std::unordered_map<bucket, RotMat, adapt<std::hash<bucket>, bucket>, adapt<std::equal_to<>, bucket>> map;
Here we convert rot3ds to buckets on the fly.
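One possible shape for the bucket type, as a sketch under stated assumptions: it stores integer grid indices instead of doubles so that == is exact, and the names cell, bucket_hash and the eps parameter are hypothetical. The rot3d here is reduced to its three angles.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_map>

struct rot3d {
    double agl[3];
};

// A grid cell: each angle divided by eps and rounded to the nearest integer.
struct bucket {
    std::array<long long, 3> cell;
    bool operator==(const bucket& other) const { return cell == other.cell; }
};

// Rounds a rot3d onto the grid (assumed spacing: eps).
inline bucket get_bucket(const rot3d& r, double eps = 1.0e-6) {
    assert(eps > 0.0);
    bucket b;
    for (int i = 0; i < 3; ++i)
        b.cell[i] = std::llround(r.agl[i] / eps);
    return b;
}

// Equal buckets have equal cells, so they always hash identically.
struct bucket_hash {
    std::size_t operator()(const bucket& b) const {
        std::size_t h = 0;
        for (long long c : b.cell)
            h = h * 1000003u ^ std::hash<long long>()(c);
        return h;
    }
};
```

The two keys from the question round to the same cell, so a std::unordered_map<bucket, RotMat, bucket_hash> finds one with the other. Values that straddle a cell boundary still land in different cells, which is why the answer suggests probing the adjacent buckets as well.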

C++ unordered_map where key is also unordered_map

I am trying to use an unordered_map with another unordered_map as a key (custom hash function). I've also added a custom equal function, even though it's probably not needed.
The code does not do what I expect, but I can't make heads or tails of what's going on. For some reason, the equal function is not called when doing find(), which is what I'd expect.
unsigned long hashing_func(const unordered_map<char,int>& m) {
string str;
for (auto& e : m)
str += e.first;
return hash<string>()(str);
}
bool equal_func(const unordered_map<char,int>& m1, const unordered_map<char,int>& m2) {
return m1 == m2;
}
int main() {
unordered_map<
unordered_map<char,int>,
string,
function<unsigned long(const unordered_map<char,int>&)>,
function<bool(const unordered_map<char,int>&, const unordered_map<char,int>&)>
> mapResults(10, hashing_func, equal_func);
unordered_map<char,int> t1 = getMap(str1);
unordered_map<char,int> t2 = getMap(str2);
cout<<(t1 == t2)<<endl; // returns TRUE
mapResults[t1] = "asd";
cout<<(mapResults.find(t2) != mapResults.end()); // returns FALSE
return 0;
}

First of all, the equality operator is certainly required, so you should keep it.
Let's look at your unordered map's hash function:
string str;
for (auto& e : m)
str += e.first;
return hash<string>()(str);
Since it's an unordered map, by definition, the iterator can iterate over the unordered map's keys in any order. However, since the hash function must produce the same hash value for the same key, this hash function will obviously fail in that regard.
Additionally, I would expect the hash function to include the values of the unordered map key, in addition to its keys. Perhaps you do want it this way, so that two unordered maps count as the same key as long as their keys match, ignoring their values. It's not clear from the question what your expectation is, but you may want to think it over.

Comparing two std::unordered_map objects using == checks whether the maps contain the same elements (keys and mapped values). It does nothing to tell whether they contain them in the same order (it's an unordered map, after all). However, your hashing_func depends on the order of items in the map: hash<string>()("ab") is in general different from hash<string>()("ba").

A good place to start is with what hashing_func returns for each map, or more easily what the string construction in hashing_func generates.
A more obviously correct hash function for such a type could be:
unsigned long hashing_func(const unordered_map<char,int>& m) {
unsigned long res = 0;
for (auto& e : m)
res ^= hash<char>()(e.first) ^ hash<int>()(e.second);
return res;
}
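A variant of the same idea, as a sketch rather than a recommended hash: each key/value pair is mixed first and the per-entry results are summed, so iteration order still does not matter. The names hash_entry, order_independent_hash and check_order_independence are hypothetical, and the mixing constant is just a commonly used one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Mixes one key/value pair into a single hash value.
inline std::size_t hash_entry(char key, int value) {
    std::size_t h = std::hash<char>()(key);
    h ^= std::hash<int>()(value) + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
}

// Addition is commutative, so the result does not depend on iteration order.
inline std::size_t order_independent_hash(const std::unordered_map<char, int>& m) {
    std::size_t res = 0;
    for (const auto& e : m)
        res += hash_entry(e.first, e.second);
    return res;
}

// Builds the same contents in two different ways and checks the hashes agree.
inline void check_order_independence() {
    std::unordered_map<char, int> a{{'x', 1}, {'y', 2}};
    std::unordered_map<char, int> b;
    b.reserve(64);  // a different bucket count may change iteration order
    b['y'] = 2;
    b['x'] = 1;
    assert(a == b);
    assert(order_independent_hash(a) == order_independent_hash(b));
}
```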

Replacing std::map with std::set and search by index

Say we have a map with larger objects and an index value. The index value is also part of the larger object.
What I would like to know is whether it is possible to replace the map with a set, extracting the index value.
It is fairly easy to create a set that sorts on a functor comparing two larger objects by extracting the index value.
Which leaves searching by index value, which is not supported by default in a set, I think.
I was thinking of using std::find_if, but I believe that searches linearly, ignoring the fact we have set.
Then I thought of using std::binary_search with a functor comparing the larger object and the value, but I believe that it doesn't work in this case as it wouldn't make use of the structure and would use traversal as it doesn't have a random access iterator. Is this correct? Or are there overloads which correctly handle this call on a set?
And then finally I was thinking of using a boost::container::flat_set, as this has an underlying vector and thus presumably should work well with std::binary_search?
But maybe there is an all together easier way to do this?
Before you answer "just use a map where a map ought to be used": I am actually using a vector that is manually kept sorted (well, via std::lower_bound) and was thinking of replacing it with boost::container::flat_set, but that doesn't seem to be easily possible, so I might just stick with the vector.

C++14 introduced the ability to look up by a key that does not require constructing the entire stored object. This can be used as follows:
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
struct StringRef {
StringRef(const std::string& s):x(&s[0]) { }
StringRef(const char *s):x(s) { std::cout << "works: " << s << std::endl; }
const char *x;
};
struct Object {
long long data;
std::size_t index;
};
struct ObjectIndexer {
ObjectIndexer(Object const& o) : index(o.index) {}
ObjectIndexer(std::size_t index) : index(index) {}
std::size_t index;
};
struct ObjComp {
bool operator()(ObjectIndexer a, ObjectIndexer b) const {
return a.index < b.index;
}
typedef void is_transparent; //Allows the comparison with non-Object types.
};
int main() {
std::set<Object, ObjComp> stuff;
stuff.insert(Object{135, 1});
std::cout << stuff.find(ObjectIndexer(1))->data << "\n";
}
More generally, these sorts of problems where there are multiple ways of indexing your data can be solved using Boost.MultiIndex.

Use boost::intrusive::set which can utilize the object's index value directly. It has a find(const KeyType & key, KeyValueCompare comp) function with logarithmic complexity. There are also other set types based on splay trees, AVL trees, scapegoat trees etc. which may perform better depending on your requirements.

If you add the following to your contained object type:
less than operator that only compares the object indices
equality operator that only compares the object indices
a constructor that takes your index type and initializes a dummy object with that value for the index
then you can pass your index type to find, lower_bound, equal_range, etc... and it will act the way you want. When you pass your index to the set's (or flat_set's) find methods it will construct a dummy object of the contained type to use for the comparisons.
Now if your object is really big, or expensive to construct, this might not be the way you want to go.
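The dummy-object approach above might look like the following sketch. The Object layout is borrowed from the earlier answer; data_for_index and example_usage are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct Object {
    // Non-explicit on purpose: lets an index convert to a dummy Object.
    Object(std::size_t index, long long data = 0) : data(data), index(index) {}
    long long data;
    std::size_t index;
};

// Ordering and equality look only at the index.
inline bool operator<(const Object& a, const Object& b) { return a.index < b.index; }
inline bool operator==(const Object& a, const Object& b) { return a.index == b.index; }

// Returns the stored data for `index`, or `fallback` when it is absent.
inline long long data_for_index(const std::set<Object>& objects,
                                std::size_t index, long long fallback) {
    auto it = objects.find(index);  // builds a temporary Object from the index
    return it == objects.end() ? fallback : it->data;
}

inline void example_usage() {
    std::set<Object> objects;
    objects.insert(Object(1, 135));
    assert(data_for_index(objects, 1, -1) == 135);
    assert(data_for_index(objects, 2, -1) == -1);
}
```

Every lookup constructs a temporary Object, which is the cost the answer warns about for large or expensive types.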

std::map override element under certain circumstances at insertion time

Let's say I have a map whose key is a pair and whose custom comparator guarantees uniqueness against the first element of that pair.
class comparator
{
public:
bool operator()(const std::pair<std::string, int>& left,
const std::pair<std::string, int>& right) const
{
return left.first < right.first;
}
};
std::map<std::pair<std::string, int>, foo, comparator> myMap;
Now I'd like this map to be more intelligent than that, if possible.
Instead of the insertion being rejected when a key with the same string as the first element of the pair already exists, I'd like to overwrite the already existing element if the pair's integer (.second) of the element about to be inserted is bigger.
Of course I can do this by looking in to the map for the key, getting the key details and overwriting it if necessary.
Alternatively I could adopt a post-insertion approach with a multimap on top of which I would iterate to clean up duplicates keeping just the key with the biggest pair integer.
The question is : can I do that natively by overriding part of the stl implementation ([] operator - insert method) or improving my custom comparator and then simply relying on map's insert method ?
I don't know if this is accepted, but we could imagine having a non-const comparator that would be capable of updating the already stored (key, value) pair under certain circumstances.

The answer to your question is that you cannot do it.
There are two problems with your proposed implementation:
The keys must remain const as they are the index for the map
Independent of what the comparator did to the elements it is comparing the std::map would still insert the item before or after left based on the return of the comparator
The solution to the problem is as suggested by @MvG. Your key should not be paired, it is your value that should be paired.
This has the added benefit that you don't need a custom comparator.
The problem is that you will need a custom inserter:
std::pair< int, foo >& tempValue = _myMap[ keyToInsert ];
if( valueToInsert.first >= tempValue.first )
{
tempValue = valueToInsert;
}
Note that this only works if every valueToInsert.first is non-negative, because operator[] value-initializes a missing entry and a value-initialized int is 0. With a negative valueToInsert.first, the value-initialized pair would be kept instead of your element.
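A sketch of a custom inserter that avoids the caveat about negative numbers by using emplace instead of operator[]. The names RankedMap and insert_if_bigger are hypothetical, foo is a stand-in for the question's type, and the comparison is strictly greater, following the question's wording ("bigger").

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

struct foo {
    int payload;  // hypothetical stand-in for the question's foo
};

using RankedMap = std::map<std::string, std::pair<int, foo>>;

// Inserts (rank, value) under key, or overwrites the stored pair when the
// new rank is strictly bigger. Returns true when the map was changed.
inline bool insert_if_bigger(RankedMap& m, const std::string& key,
                             int rank, const foo& value) {
    auto result = m.emplace(key, std::make_pair(rank, value));
    if (result.second) return true;  // key was not present yet
    std::pair<int, foo>& stored = result.first->second;
    if (rank > stored.first) {
        stored = std::make_pair(rank, value);
        return true;
    }
    return false;
}

inline void example_usage() {
    RankedMap m;
    assert(insert_if_bigger(m, "a", -3, foo{1}));   // new key, negative rank
    assert(!insert_if_bigger(m, "a", -7, foo{2}));  // smaller rank is ignored
    assert(insert_if_bigger(m, "a", 4, foo{3}));    // bigger rank overwrites
    assert(m.size() == 1u);
}
```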

std::map keys in C++

I have a requirement to create two different maps in C++. The key is of type CHAR* and the value is a pointer to a struct. I am filling 2 maps with these pairs, in separate iterations. After creating both maps I need to find all instances in which the strings referenced by the CHAR* keys are the same.
For this I am using the following code :
typedef struct _STRUCTTYPE
{
..
} STRUCTTYPE, *PSTRUCTTYPE;
typedef pair <CHAR *,PSTRUCTTYPE> kvpair;
..
CHAR *xyz;
PSTRUCTTYPE abc;
// after filling the information;
Map.insert (kvpair(xyz,abc));
// the above is repeated x times for the first map, and y times for the second map.
// after both are filled out;
std::map<CHAR *, PSTRUCTTYPE>::iterator Iter,findIter;
for (Iter=iteratedMap->begin();Iter!=iteratedMap->end();++Iter)
{
char *key = Iter->first;
printf("%s\n",key);
findIter=otherMap->find(key);
//printf("%u",findIter->second);
if (findIter!=otherMap->end())
{
printf("Match!\n");
}
}
The above code does not show any match, although the list of keys in both maps show obvious matches. My understanding is that the equals operator for CHAR * just equates the memory address of the pointers.
My question is, what should I do to alter the equals operator for this type of key, or could I use a different datatype for the string?

My understanding is that the equals operator for CHAR* just equates the memory address of the pointers.
Your understanding is correct.
The easiest thing to do would be to use std::string as the key. That way you get comparisons for the actual string value working without much effort:
std::map<std::string, PSTRUCTTYPE> m;
PSTRUCTTYPE s = bar();
m.insert(std::make_pair("foo", s));
if(m.find("foo") != m.end()) {
// works now
}
Note that you might leak memory for your structs if you don't always delete them manually. If you can't store by value, consider using smart pointers instead.
Depending on your use case, you don't necessarily have to store pointers to the structs:
std::map<std::string, STRUCTTYPE> m;
m.insert(std::make_pair("foo", STRUCTTYPE(whatever)));
A final note: typedefing structs the way you are doing it is a C-ism, in C++ the following is sufficient:
typedef struct STRUCTTYPE {
// ...
} *PSTRUCTTYPE;
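If the keys have to stay C strings, std::map can instead be given a comparator that orders by content. This is a sketch with hypothetical names (CStrLess, CStrOrderedMap, has_key); the mapped type is int for brevity where the question uses PSTRUCTTYPE.

```cpp
#include <cassert>
#include <cstring>
#include <map>

// Orders keys by the characters they point to, not by address.
struct CStrLess {
    bool operator()(const char* a, const char* b) const {
        return std::strcmp(a, b) < 0;
    }
};

// The map stores only the pointers, so the strings must outlive the map.
using CStrOrderedMap = std::map<const char*, int, CStrLess>;

// Returns true when some stored key has the same text as `key`.
inline bool has_key(const CStrOrderedMap& m, const char* key) {
    assert(key != nullptr);
    return m.find(key) != m.end();
}
```

std::map decides equivalence from the comparator alone (neither key is less than the other), so no separate equality function is needed here.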

If you use std::string instead of char *, there are more convenient comparison functions you can use. Also, instead of writing your own key matching code, you can use the standard std::set_intersection algorithm to find the shared elements in two sorted containers (std::map is of course sorted). Here is an example:
typedef std::map<std::string, STRUCTTYPE *> ExampleMap;
ExampleMap inputMap1, inputMap2, matchedMap;
// Insert elements to input maps
inputMap1.insert(...);
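// Put common elements of inputMap1 and inputMap2 into matchedMap (needs <algorithm> and <iterator>).
// value_comp() compares entries by key only; std::inserter adds to the initially empty map.
std::set_intersection(inputMap1.begin(), inputMap1.end(), inputMap2.begin(), inputMap2.end(), std::inserter(matchedMap, matchedMap.end()), inputMap1.value_comp());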
for(ExampleMap::iterator iter = matchedMap.begin(); iter != matchedMap.end(); ++iter)
{
// Do things with matched elements
std::cout << iter->first << std::endl;
}
