Strict Weak Ordering and std::set / std::map - c++

#include <iostream>
#include <set>
#include <tuple>
struct Key {
int field1;
int field2;
Key(int field1, int field2) : field1(field1), field2(field2) {}
bool operator<(const Key& other) const {
// Is this acceptable?! Seems to work
if (field2 == 0 || other.field2 == 0) {
return field1 < other.field1;
} else {
return std::tie(field1, field2) < std::tie(other.field1, other.field2);
}
}
};
int main() {
std::set<Key> values{Key(4,3), Key(5,9), Key(5,7), Key(5,8), Key(6,1)};
std::cout << values.find(Key(5,0))->field2 << std::endl; // Prints '7'
auto range = values.equal_range(Key(5,0));
for (auto i = range.first; i != range.second; i++) {
std::cout << i->field2; // Prints '789'
}
return 0;
}
Field2 is not always available in my data, so sometimes I use a wildcard value of 0, which can match any value for which field1 matches. Is this valid in C++ if I never insert elements that have a wildcard value, and only ever look them up in the set? I'm okay with the find function returning any of the values in this case which happens rarely in my code, though hopefully it would be the same value when called repeatedly.
According to the specification, it seems like strict weak ordering is not required for binary_search, which should be the only algorithm used on the data structure when performing a lookup, right? Or is there some undefined behavior I should worry about here?
25.4 Sorting and related operations
... For algorithms other than those described in 25.4.3 to work
correctly, comp has to induce a strict weak ordering on the values...
25.4.3 Binary search

You're mistaken. std::set::find does a lookup in a binary search tree (in a typical implementation). That might look like the binary search algorithm, but the algorithms in 25.4.3 are not typically used for the lookup. A tree supports only bidirectional (not random-access) iterators, and binary search over such iterators is much slower than a lookup that uses the knowledge that the data is in a BST.
The comparator of std::set must comply with the Compare concept, which does require a strict weak ordering.
Is this valid in C++ if I never insert elements that have a wildcard value, and only ever look them up in the set?
Technically no, since you're breaking the requirements: the equivalence your comparator induces is not transitive ({5,7} and {5,9} are each equivalent to {5,0}, but not to each other). At the very least you will get indeterminate results when looking up {x, 0} in a set that contains {x, a} and {x, b}: either could be found. If that doesn't matter, I doubt a typical implementation would cause trouble. What you're doing is not guaranteed to work by the standard, though, which is enough for most people to shy away from it.
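If a well-defined version is wanted, one option (a sketch, assuming C++14 heterogeneous lookup is available) is to keep a genuine strict weak ordering over the stored keys and express the wildcard as a separate query type instead of field2 == 0. Heterogeneous lookup only requires the elements to be partitioned with respect to the probe, which a field1-only comparison satisfies because the set is already sorted by field1 first. The names Field1Only and KeyLess are hypothetical.

```cpp
#include <set>
#include <tuple>

struct Key {
    int field1;
    int field2;
};

// Hypothetical query type meaning "any Key whose field1 matches".
struct Field1Only {
    int field1;
};

struct KeyLess {
    // Enables the heterogeneous find/equal_range/lower_bound overloads (C++14).
    using is_transparent = void;

    // A genuine strict weak ordering between stored keys: lexicographic.
    bool operator()(const Key& a, const Key& b) const {
        return std::tie(a.field1, a.field2) < std::tie(b.field1, b.field2);
    }
    // Stored key vs. wildcard probe, in both argument orders: field1 only.
    bool operator()(const Key& a, const Field1Only& b) const { return a.field1 < b.field1; }
    bool operator()(const Field1Only& a, const Key& b) const { return a.field1 < b.field1; }
};

using KeySet = std::set<Key, KeyLess>;
```

With this, values.equal_range(Field1Only{5}) yields {5,7}, {5,8} and {5,9} in order, and values.find(Field1Only{5}) is well defined, although it may return any of those three.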

Related

Key already exists in unordered_map, but "find" returns as not found

I constructed an unordered_map using key type rot3d, which is defined below:
#include <cmath>
#ifndef EPS6
#define EPS6 1.0e-6
#endif
struct rot3d
{
double agl[3]; // alpha, beta, gamma in ascending order
bool operator==(const rot3d &other) const
{
// printf("== used\n");
return std::abs(agl[0]-other.agl[0]) <= EPS6 && std::abs(agl[1]-other.agl[1]) <= EPS6 && std::abs(agl[2]-other.agl[2]) <= EPS6;
}
};
Equality of rot3d is defined by the condition that each component is within a small range of the same component from the other rot3d object.
Then I defined a value type RotMat:
struct RotMat // rotation matrix described by a pointer to the matrix and a truncation number
{
cuDoubleComplex *mat = NULL;
int p = 0;
};
Finally, I defined a hash table from rot3d to RotMat using a custom hash function:
struct rot3dHasher
{
std::size_t operator()(const rot3d& key) const
{
using std::hash;
return (hash<double>()(key.agl[0]) ^ (hash<double>()(key.agl[1]) << 1) >> 1) ^ (hash<double>()(key.agl[2]) << 1);
}
};
typedef std::unordered_map<rot3d,RotMat,rot3dHasher> HashRot2Mat;
The problem is that a key is printed as being in the hash table, but find doesn't find it. For instance, I printed a key using an iterator over the hash table:
Key: (3.1415926535897931,2.8198420991931510,0.0000000000000000)
But then I also got this information indicating that the key was not found:
(3.1415926535897931,2.8198420991931505,0.0000000000000000) not found in the hash table.
Although the two keys are not 100% identical, my definition of == should make them compare equal. So why does this key appear in the hash table, yet find does not find it?
Hash-based equivalence comparisons are allowed to have false positives, which are resolved by calling operator==.
Hash-based equivalence comparisons are not allowed to have false negatives, but yours does. Your two "not 100% the same" keys have different hash values, so the element is not even found as a candidate for testing using operator==.
It is necessary that (a == b) implies (hash(a) == hash(b)) and your definitions break this precondition. A hashtable with a broken precondition can misbehave in many ways, including not finding the item you are looking for.
Use a different data structure that does not depend on hashing but on nearest-neighbor matching. An octree would be a smart choice.
Equality of rot3d is defined by the condition that each component is within a small range of the same component from the other rot3d object.
This is not an equivalence relation. You need a==b and b==c to imply a==c; yours fails this requirement.
Using a relation that is not an equivalence in a std algorithm or container breaks the std preconditions, which means your program is ill-formed, no diagnostic required.
Your hash also hashes equivalent values differently, which is likewise not allowed.
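A worked example with hypothetical values makes the broken transitivity concrete: on a single component, 0 is "equal" to 0.9e-6 and 0.9e-6 is "equal" to 1.8e-6, yet 0 is not "equal" to 1.8e-6. The helper near_equal below is a hypothetical stand-in for one term of rot3d::operator==, for illustration only.

```cpp
#include <cmath>

// Hypothetical tolerance and helper mirroring one term of rot3d::operator==.
constexpr double kEps = 1.0e-6;

bool near_equal(double x, double y) {
    return std::abs(x - y) <= kEps;
}
```

Calling near_equal(0.0, 0.9e-6) and near_equal(0.9e-6, 1.8e-6) both return true, while near_equal(0.0, 1.8e-6) returns false, so no hash function can be both consistent with this "==" and useful.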
One way to fix this is to build buckets. Each bucket has a size of your epsilon.
To find whether a value is in your buckets, check the bucket you would put the probe value in, plus all adjacent buckets (3^3 = 27 buckets in total).
For each element found, double-check the actual distance.
struct bucket; // array of 3 doubles, each a multiple of EPS6. Has == and hash. Also a constructor from rot3d that rounds.
bucket get_bucket(rot3d);
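A minimal sketch of the bucket approach, assuming angles are plain doubles and one value is stored per inserted key. Here bucket holds integer grid coordinates (the floor of angle / EPS6) rather than rounded doubles, which keeps floating-point equality out of the hash; approx_map, bucket_hash, get_bucket and close_enough are hypothetical names. Two keys within EPS6 per component land in the same or adjacent cells, so scanning the 27 neighboring cells and then checking the real distance finds them.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr double EPS6 = 1.0e-6;

struct rot3d { double agl[3]; };

// A bucket is an integer grid cell of side EPS6: cell[i] = floor(agl[i] / EPS6).
struct bucket {
    std::array<long long, 3> cell;
    bool operator==(const bucket& o) const { return cell == o.cell; }
};

// Hash the integer cell coordinates (boost-style hash combining).
struct bucket_hash {
    std::size_t operator()(const bucket& b) const {
        std::size_t h = 0;
        for (long long c : b.cell)
            h ^= std::hash<long long>{}(c) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

bucket get_bucket(const rot3d& r) {
    bucket b{};
    for (int i = 0; i < 3; ++i)
        b.cell[i] = static_cast<long long>(std::floor(r.agl[i] / EPS6));
    return b;
}

// The real tolerance test, applied only to candidates from nearby buckets.
bool close_enough(const rot3d& a, const rot3d& b) {
    for (int i = 0; i < 3; ++i)
        if (std::abs(a.agl[i] - b.agl[i]) > EPS6) return false;
    return true;
}

// Hypothetical container: each bucket keeps every (key, value) stored in it.
template <class V>
struct approx_map {
    std::unordered_map<bucket, std::vector<std::pair<rot3d, V>>, bucket_hash> buckets;

    void insert(const rot3d& key, const V& value) {
        buckets[get_bucket(key)].push_back({key, value});
    }

    // Scan the probe's bucket and its 26 neighbors, then check the real distance.
    const V* find(const rot3d& probe) const {
        const bucket base = get_bucket(probe);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    bucket b = base;
                    b.cell[0] += dx;
                    b.cell[1] += dy;
                    b.cell[2] += dz;
                    auto it = buckets.find(b);
                    if (it == buckets.end()) continue;
                    for (const auto& entry : it->second)
                        if (close_enough(entry.first, probe)) return &entry.second;
                }
        return nullptr;
    }
};
```

Because the hash is computed on the coarse cell rather than on the raw doubles, a lookup cannot miss a stored key that the tolerance test would accept (up to rounding exactly at the EPS6 boundary).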
Now, odds are you are just caching, in which case being within roughly EPS is good enough.
template<class T, class B>
struct adapt:T{
template<class...Args>
auto operator()(Args&&...args)const{
return T::operator()( static_cast<B>(std::forward<Args>(args))... );
}
using is_transparent=void;
};
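std::unordered_map<bucket, RotMat, adapt<std::hash<bucket>, bucket>, adapt<std::equal_to<>, bucket>> map;
Here rot3d values are converted to buckets on the fly. (This needs a std::hash<bucket> specialization, and heterogeneous lookup in unordered containers requires C++20.)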

Purpose of having std:less (or similar function) while it just call < operator

Why are std::less (and the other equivalent function objects) needed when they just call the < operator, and we can overload operators anyway?
A possible answer is in this question:
Why is std::less better than "<"?
However, I am not totally convinced (especially about weak ordering). Can someone explain a bit more?
The purpose of std::less and friends is that they allow you to generalize your code. Let's say we are writing a sorting function. We start with
void sort(int * begin, int * end) { /* sort here using < */ }
So now we can sort any container we can get int*s to. Now let's make it a template so it will work with all types:
template<typename Iterator>
void sort(Iterator begin, Iterator end) { /* sort here using < */ }
Now we can sort any type, and we are using an "Iterator" as our way of saying we need something that points to the element. This is all well and good, but it means we require any type passed to provide an operator< for it to work. It also doesn't let us change the sort order.
Now we could use a function pointer, but that won't work for built-in types, as there is no function you can point to. If we instead add another template parameter, let's call it Cmp, then we can add another parameter of type Cmp to the function. This will be the comparison function. We would like to provide a default for it, and std::less makes that very easy and gives us a good "default" behavior.
So with something like
template<typename Iterator,
typename Cmp = std::less<typename std::iterator_traits<Iterator>::value_type>>
void sort(Iterator begin, Iterator end, Cmp c = Cmp())
{ /* sort here using c */ }
you can sort all built-in types and any type that has an operator<, and you can specify any other way you want to compare the elements. (The default template argument is needed because a default function argument alone does not let the compiler deduce Cmp.)
This is why we need std::less and friends. It lets us make the code generic and flexible without having to write a lot of boilerplate.
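To make the idea concrete, here is a small runnable sketch (insertion sort chosen only for brevity; insertion_sort is a hypothetical name, not a standard function) whose comparator defaults to std::less but can be replaced by any other function object, such as std::greater:

```cpp
#include <algorithm>
#include <functional>
#include <iterator>

// Hypothetical generic sort: Cmp defaults to std::less of the element type,
// so plain calls use operator<, but any comparator object can be passed in.
template <typename Iterator,
          typename Cmp = std::less<typename std::iterator_traits<Iterator>::value_type>>
void insertion_sort(Iterator begin, Iterator end, Cmp cmp = Cmp()) {
    if (begin == end) return;
    for (Iterator i = std::next(begin); i != end; ++i) {
        // Move the new element leftwards while it compares "less" than its predecessor.
        for (Iterator j = i; j != begin; --j) {
            Iterator prev = std::prev(j);
            if (!cmp(*j, *prev)) break;
            std::iter_swap(j, prev);
        }
    }
}
```

insertion_sort(v.begin(), v.end()) sorts ascending via std::less, while insertion_sort(v.begin(), v.end(), std::greater<int>()) sorts descending without touching the algorithm.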
Using a function object also gives us some performance benefits. It is easier for the compiler to inline a call to the function call operator than a call through a function pointer. It also allows the comparator to have state, like a counter for the number of times it was called. For a more in-depth look at this, see C++ Functors - and their uses.
std::less is just a default policy that takes the natural sorting order of an object (i.e., its comparison operators).
The good thing about using std::less as a default template parameter is that you can tune your sorting algorithm (or your ordered data structure), so that you can decide whether to use the natural sorting order (e.g. ascending order for natural numbers) or a different sorting order for your particular problem (e.g. first odd and then even numbers) without modifying the actual object operators or the algorithm itself.
struct natural {
unsigned value;
natural( unsigned v ) :
value(v)
{
}
bool operator< ( natural other ) const {
return value < other.value;
}
};
struct first_the_odds {
bool operator()( natural left, natural right ) const {
bool left_odd = left.value % 2 != 0;
bool right_odd = right.value % 2 != 0;
if( left_odd == right_odd ) {
return left < right;
} else {
return left_odd;
}
}
};
// Sort me some numbers
std::vector<natural> numbers = { 0, 1, 2, 3, 4 };
std::sort( numbers.begin(), numbers.end(), first_the_odds() );
for( natural n : numbers )
std::cout << n.value << ",";
Output:
1,3,0,2,4,
The first problem with < is that under the C++ standard, < on pointers gives an unspecified result unless both pointers point within the same "parent" object or array.
This is because of the existence of segmented memory models, like the 8086's. It is much faster to compare within segments by ignoring the segment number, and objects cannot span over segments; so < can just compare offsets within segments and ignore segment number.
There are going to be other cases like this on equally strange hardware; imagine hardware where const (ROM) and non-const (RAM) data exist in a separate memory space.
std::less<Pointer>, meanwhile, guarantees a strict total order (and hence a strict weak ordering) despite any quirks in the memory architecture. On such architectures it may pay a price on every comparison.
The second reason we need std::less is to be able to pass the concept of "less than" around easily. Look at std::sort's three-argument overload:
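A small sketch of what this guarantee buys, assuming two unrelated global int objects (first_object, second_object and make_address_set are hypothetical names): comparing their addresses with the built-in < gives an unspecified result, but std::set<const int*> uses std::less<const int*> by default, so a set of such addresses is always well behaved.

```cpp
#include <functional>
#include <set>

// Two separate objects: not part of the same array or class object.
int first_object = 1;
int second_object = 2;

// std::set<const int*> defaults to std::less<const int*>, which the standard
// requires to be consistent with a strict total order over pointers.
std::set<const int*> make_address_set() {
    return {&first_object, &second_object, &first_object};  // duplicate collapses
}
```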
void sort( Iterator, Iterator, Comparator )
here we can pass how we want to sort in the 3rd parameter. If we pass std::greater<Foo>{} we get one sort, and if we pass std::less<Foo>{} we get the opposite.
By default, the two-argument version sorts like std::less, and once you have std::greater, std::greater_equal and std::less_equal, adding std::less just makes sense.
And once you have defined less, using it to describe the default behavior of std::sort, std::map and the like is easier than repeating all the wording about how they use <, except on pointers, where they generate a strict weak ordering that agrees with < wherever < has fully specified behavior in the standard.

Why does std::set not have a "contains" member function?

I'm heavily using std::set<int> and often I simply need to check if such a set contains a number or not.
I'd find it natural to write:
if (myset.contains(number))
...
But because of the lack of a contains member, I need to write the cumbersome:
if (myset.find(number) != myset.end())
...
or the not as obvious:
if (myset.count(element) > 0)
...
Is there a reason for this design decision ?
I think it was probably because they were trying to make std::set and std::multiset as similar as possible. (And obviously count has a perfectly sensible meaning for std::multiset.)
Personally I think this was a mistake.
It doesn't look quite so bad if you pretend that count is just a misspelling of contains and write the test as:
if (myset.count(element))
...
It's still a shame though.
To be able to write if (s.contains()), contains() has to return a bool (or a type convertible to bool, which is another story), like binary_search does.
The fundamental reason behind the design decision not to do it this way is that contains() which returns a bool would lose valuable information about where the element is in the collection. find() preserves and returns that information in the form of an iterator, therefore is a better choice for a generic library like STL. This has always been the guiding principle for Alex Stepanov, as he has often explained (for example, here).
As to the count() approach in general, although it's often an okay workaround, the problem with it is that it does more work than a contains() would have to do.
That is not to say that a bool contains() isn't very nice to have, or even necessary. A while ago we had a long discussion about this very issue in the ISO C++ Standard - Future Proposals group.
It lacks it because nobody added it. Nobody added it because the containers from the STL that the std library incorporated were designed to have a minimal interface. (Note that std::string did not come from the STL in the same way.)
If you don't mind some strange syntax, you can fake it:
#include <utility>
template<class K>
struct contains_t {
K&& k;
template<class C>
friend bool operator->*( C&& c, contains_t&& self ) {
auto range = std::forward<C>(c).equal_range(std::forward<K>(self.k));
return range.first != range.second;
// faster than:
// return std::forward<C>(c).count( std::forward<K>(self.k) ) != 0;
// for multi-meows with lots of duplicates
}
};
template<class K>
contains_t<K> contains( K&& k ) {
return {std::forward<K>(k)};
}
use:
if (some_set->*contains(some_element)) {
}
Basically, you can write extension methods for most C++ std types using this technique.
It makes a lot more sense to just do this:
if (some_set.count(some_element)) {
}
but I am amused by the extension method method.
The really sad thing is that writing an efficient contains could be faster on a multimap or multiset, as they just have to find one element, while count has to find each of them and count them.
A multiset containing 1 billion copies of 7 (you know, in case you run out) can have a really slow .count(7), but could have a very fast contains(7).
With the above extension method, we could make it faster for this case by using lower_bound, comparing to end, and then comparing to the element. Doing that for an unordered meow as well as an ordered meow would require fancy SFINAE or container-specific overloads however.
You are looking at a particular case and not seeing the bigger picture. As stated in the documentation, std::set meets the requirements of the AssociativeContainer concept. For that concept it does not make sense to have a contains method, as it is pretty much useless for std::multiset and std::multimap, but count works fine for all of them. A contains method could be added as an alias for count for std::set, std::map and their hashed versions (like length for size() in std::string), but it looks like the library creators did not see a real need for it.
Although I don't know why std::set has no contains but only count, which only ever returns 0 or 1,
you can write a templated contains helper function like this:
template<class Container, class T>
auto contains(const Container& v, const T& x)
-> decltype(v.find(x) != v.end())
{
return v.find(x) != v.end();
}
And use it like this:
if (contains(myset, element)) ...
The true reason for set is a mystery to me, but one possible explanation for the same design in map could be to prevent people from writing inefficient code by accident:
if (myMap.contains("Meaning of universe"))
{
myMap["Meaning of universe"] = 42;
}
Which would result in two map lookups.
Instead, you are forced to get an iterator. This gives you a mental hint that you should reuse the iterator:
auto position = myMap.find("Meaning of universe");
if (position != myMap.cend())
{
position->second = 42;
}
which consumes only one map lookup.
When we realize that set and map are made of the same flesh, we can apply this principle to set as well. That is, if we want to act on an item in the set only if it is present, this design can prevent us from writing code like this:
struct Dog
{
std::string name;
void bark() const;
};
bool operator<(const Dog& left, const Dog& right)
{
return left.name < right.name;
}
std::set<Dog> dogs;
...
if (dogs.contains(Dog{"Husky"}))
{
dogs.find(Dog{"Husky"})->bark();
}
Of course all this is a mere speculation.
Since C++20,
bool contains( const Key& key ) const
is available.
I'd like to point out, as mentioned by Andy, that since C++20 the standard has added the contains member function for maps and sets:
bool contains( const Key& key ) const; (since C++20)
Now I'd like to focus my answer on performance versus readability.
In terms of performance, if you compare the two versions:
#include <unordered_map>
#include <string>
using hash_map = std::unordered_map<std::string,std::string>;
hash_map a;
std::string get_cpp20(hash_map& x,std::string str)
{
if(x.contains(str))
return x.at(str);
else
return "";
};
std::string get_cpp17(hash_map& x,std::string str)
{
if(const auto it = x.find(str); it !=x.end())
return it->second;
else
return "";
};
You will find that, with MSVC's standard library, the cpp20 version makes two calls to std::_Hash_find_last_result (one for contains() and one for at()), while the cpp17 version makes only one.
Now, I find myself with many data structures containing nested unordered_maps.
So you end up with something like this:
using my_nested_map = std::unordered_map<std::string,std::unordered_map<std::string,std::unordered_map<int,std::string>>>;
std::string get_cpp20_nested(my_nested_map& x,std::string level1,std::string level2,int level3)
{
if(x.contains(level1) &&
x.at(level1).contains(level2) &&
x.at(level1).at(level2).contains(level3))
return x.at(level1).at(level2).at(level3);
else
return "";
};
std::string get_cpp17_nested(my_nested_map& x,std::string level1,std::string level2,int level3)
{
if(const auto it_level1=x.find(level1); it_level1!=x.end())
if(const auto it_level2=it_level1->second.find(level2);it_level2!=it_level1->second.end())
if(const auto it_level3=it_level2->second.find(level3);it_level3!=it_level2->second.end())
return it_level3->second;
return "";
};
Now, if you have plenty of conditions in between these ifs, using the iterators really is painful, error prone and unclear: I often find myself looking back at the definition of the map to understand what kind of object was at level 1 or level 2, while with the cpp20 version you see at(level1).at(level2)... and understand immediately what you are dealing with.
So in terms of code maintenance and review, contains is a very nice addition.
What about binary_search ?
set <int> set1;
set1.insert(10);
set1.insert(40);
set1.insert(30);
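// Works, but std::set iterators are bidirectional, so std::binary_search needs
// O(n) iterator increments here; set1.count(30) or set1.find(30) is O(log n).
bool found = std::binary_search(set1.begin(), set1.end(), 30);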
contains() has to return a bool. With a C++20 compiler, I get the following output for this code:
#include<iostream>
#include<map>
using namespace std;
int main()
{
multimap<char,int>mulmap;
mulmap.insert(make_pair('a', 1)); //multiple similar key
mulmap.insert(make_pair('a', 2)); //multiple similar key
mulmap.insert(make_pair('a', 3)); //multiple similar key
mulmap.insert(make_pair('b', 3));
mulmap.insert({'a',4});
mulmap.insert(pair<char,int>('a', 4));
cout<<mulmap.contains('c')<<endl; //Output:0 as it doesn't exist
cout<<mulmap.contains('b')<<endl; //Output:1 as it exist
}
Another reason is that it would give a programmer the false impression that std::set is a set in the math set theory sense. If they implement that, then many other questions would follow: if an std::set has contains() for a value, why doesn't it have it for another set? Where are union(), intersection() and other set operations and predicates?
The answer is, of course, that some of the set operations are already implemented as functions in <algorithm> (std::set_union() etc.) and others are as trivial to implement as contains(). Functions and function objects work better with math abstractions than member functions do, and they are not limited to a particular container type.
If one needs to implement full math-set functionality, one has a choice not only of underlying container but also of implementation details: e.g., would one's theory_union() function work with immutable objects, better suited for functional programming, or would it modify its operands and save memory? Would it be implemented as a function object from the start, or would it be better to implement it as a C-style function and use std::function<> if needed?
As it is now, std::set is just a container, well suited for implementing a set in the math sense, but it is nearly as far from being a theoretical set as std::vector is from being a theoretical vector.
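As a sketch of that free-function style (intersect, unite and is_subset are hypothetical helper names), the <algorithm> set operations work directly on the sorted ranges a std::set provides:

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Set algebra lives in <algorithm> as free functions over sorted ranges,
// so the same calls work for std::set and for any other sorted container.
std::set<int> intersect(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}

std::set<int> unite(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.end()));
    return out;
}

// Subset predicate: is every element of sub also in super?
bool is_subset(const std::set<int>& sub, const std::set<int>& super) {
    return std::includes(super.begin(), super.end(), sub.begin(), sub.end());
}
```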

Replacing std::map with std::set and search by index

Say we have a map with larger objects and an index value. The index value is also part of the larger object.
What I would like to know is whether it is possible to replace the map with a set, extracting the index value.
It is fairly easy to create a set that sorts on a functor comparing two larger objects by extracting the index value.
Which leaves searching by index value, which is not supported by default in a set, I think.
I was thinking of using std::find_if, but I believe that searches linearly, ignoring the fact we have set.
Then I thought of using std::binary_search with a functor comparing the larger object and the value, but I believe that it doesn't work in this case as it wouldn't make use of the structure and would use traversal as it doesn't have a random access iterator. Is this correct? Or are there overloads which correctly handle this call on a set?
And then finally I was thinking of using a boost::container::flat_set, as this has an underlying vector and thus presumably should work well with std::binary_search?
But maybe there is an all together easier way to do this?
Before you answer "just use a map where a map ought to be used": I am actually using a vector that is manually sorted (well, with std::lower_bound) and was thinking of replacing it with boost::container::flat_set, but it doesn't seem to be easily possible to do so, so I might just stick with the vector.
C++14 introduced the ability to look up by a key without constructing the entire stored object (heterogeneous lookup, enabled by a comparator that declares is_transparent). This can be used as follows:
#include <set>
#include <iostream>
struct StringRef {
StringRef(const std::string& s):x(&s[0]) { }
StringRef(const char *s):x(s) { std::cout << "works: " << s << std::endl; }
const char *x;
};
struct Object {
long long data;
std::size_t index;
};
struct ObjectIndexer {
ObjectIndexer(Object const& o) : index(o.index) {}
ObjectIndexer(std::size_t index) : index(index) {}
std::size_t index;
};
struct ObjComp {
bool operator()(ObjectIndexer a, ObjectIndexer b) const {
return a.index < b.index;
}
typedef void is_transparent; //Allows the comparison with non-Object types.
};
int main() {
std::set<Object, ObjComp> stuff;
stuff.insert(Object{135, 1});
std::cout << stuff.find(ObjectIndexer(1))->data << "\n";
}
More generally, these sorts of problems where there are multiple ways of indexing your data can be solved using Boost.MultiIndex.
Use boost::intrusive::set which can utilize the object's index value directly. It has a find(const KeyType & key, KeyValueCompare comp) function with logarithmic complexity. There are also other set types based on splay trees, AVL trees, scapegoat trees etc. which may perform better depending on your requirements.
If you add the following to your contained object type:
less than operator that only compares the object indices
equality operator that only compares the object indices
a constructor that takes your index type and initializes a dummy object with that value for the index
then you can pass your index type to find, lower_bound, equal_range, etc... and it will act the way you want. When you pass your index to the set's (or flat_set's) find methods it will construct a dummy object of the contained type to use for the comparisons.
Now if your object is really big, or expensive to construct, this might not be the way you want to go.
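A minimal sketch of that approach, assuming a hypothetical Object whose ordering depends only on its index and whose std::string member stands in for the expensive part. The non-explicit constructor is what lets find accept a bare index:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical "large" object whose ordering depends only on its index.
struct Object {
    std::size_t index;
    std::string payload;  // stands in for the expensive part

    // Non-explicit on purpose: find(7) builds a dummy Object with only the index set.
    Object(std::size_t idx, std::string data = {}) : index(idx), payload(std::move(data)) {}

    bool operator<(const Object& other) const { return index < other.index; }
    bool operator==(const Object& other) const { return index == other.index; }
};
```

Every lookup such as s.find(7) constructs a temporary Object, which is exactly the cost the caveat above refers to; the C++14 is_transparent approach avoids it.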

What is the best way to use a HashMap in C++?

I know that STL has a HashMap API, but I cannot find any good and thorough documentation with good examples regarding this.
Any good examples will be appreciated.
The standard library includes the ordered and the unordered map (std::map and std::unordered_map) containers. In an ordered map (std::map) the elements are sorted by key, and insertion and access are O(log n). The standard library usually uses red-black trees internally for ordered maps, but this is just an implementation detail. In an unordered map (std::unordered_map), insertion and access are O(1) on average. It is just another name for a hash table.
An example with (ordered) std::map:
#include <map>
#include <iostream>
#include <cassert>
#include <string>
int main(int argc, char **argv)
{
std::map<std::string, int> m;
m["hello"] = 23;
// check if key is present
if (m.find("world") != m.end())
std::cout << "map contains key world!\n";
// retrieve
std::cout << m["hello"] << '\n';
std::map<std::string, int>::iterator i = m.find("hello");
assert(i != m.end());
std::cout << "Key: " << i->first << " Value: " << i->second << '\n';
return 0;
}
Output:
23
Key: hello Value: 23
If you need ordering in your container and are fine with the O(log n) runtime then just use std::map.
Otherwise, if you really need a hash table (O(1) average insert/access), check out std::unordered_map, which has an API similar to std::map's (e.g. in the above example you just have to search and replace map with unordered_map).
The unordered_map container was introduced with the C++11 standard revision. Thus, depending on your compiler, you have to enable C++11 features (e.g. when using GCC 4.8 you have to add -std=c++11 to the CXXFLAGS).
Even before the C++11 release GCC supported unordered_map - in the namespace std::tr1. Thus, for old GCC compilers you can try to use it like this:
#include <tr1/unordered_map>
std::tr1::unordered_map<std::string, int> m;
It is also part of Boost (boost::unordered_map), so you can use the corresponding Boost header for better portability.
A hash_map is an older, unstandardized version of what for standardization purposes is called an unordered_map (originally in TR1, and included in the standard since C++11). As the name implies, it differs from std::map primarily in being unordered: if, for example, you iterate through a map from begin() to end(), you get items in order by key[1], but if you iterate through an unordered_map from begin() to end(), you get items in a more or less arbitrary order.
An unordered_map is normally expected to have constant complexity. That is, an insertion, lookup, etc., typically takes essentially a fixed amount of time, regardless of how many items are in the table. An std::map has complexity that's logarithmic in the number of items being stored, which means the time to insert or retrieve an item grows, but quite slowly, as the map grows larger: each doubling of the size adds only about one more comparison. For example, if it takes 1 microsecond to look up one of 1 million items (about 20 comparisons, since 2^20 is roughly 1 million), then you can expect it to take around 21/20 of a microsecond to look up one of 2 million items, 22/20 for one of 4 million items, 23/20 for one of 8 million items, etc.
From a practical viewpoint, that's not really the whole story though. By nature, a simple hash table has a fixed size. Adapting it to the variable-size requirements for a general purpose container is somewhat non-trivial. As a result, operations that (potentially) grow the table (e.g., insertion) are potentially relatively slow (that is, most are fairly fast, but periodically one will be much slower). Lookups, which cannot change the size of the table, are generally much faster. As a result, most hash-based tables tend to be at their best when you do a lot of lookups compared to the number of insertions. For situations where you insert a lot of data, then iterate through the table once to retrieve results (e.g., counting the number of unique words in a file) chances are that an std::map will be just as fast, and quite possibly even faster (but, again, the computational complexity is different, so that can also depend on the number of unique words in the file).
[1] Where the order is defined by the third template parameter when you create the map, std::less<Key> by default.
Here's a more complete and flexible example that includes all the headers it needs to compile:
#include <iostream>
#include <unordered_map>
class Hashtable {
std::unordered_map<const void *, const void *> htmap;
public:
void put(const void *key, const void *value) {
htmap[key] = value;
}
const void *get(const void *key) {
return htmap[key];
}
};
int main() {
Hashtable ht;
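// Keys are compared as pointer values: this relies on identical string literals
// sharing an address, which is common but unspecified by the standard.
ht.put("Bob", "Dylan");
int one = 1;
ht.put("one", &one);
std::cout << static_cast<const char *>(ht.get("Bob")) << "; " << *static_cast<const int *>(ht.get("one"));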
}
This is still not particularly useful unless your keys really are pointers, because the map compares the pointer values themselves, so a different pointer to an equal string won't match. (However, since I normally use strings for keys, substituting std::string for const void * in the declaration of the key should resolve this problem.)
Evidence that std::unordered_map uses a hash table in GCC libstdc++ 6.4
This was mentioned at https://stackoverflow.com/a/3578247/895245, but in the following answer, What data structure is inside std::map in C++?, I have given further evidence for the GCC libstdc++ 6.4 implementation by:
GDB step debugging into the class
performance characteristic analysis
[Figure: performance characteristic graph from that answer]
How to use a custom class and hash function with unordered_map
This answer nails it: C++ unordered_map using a custom class type as the key
Excerpt. Equality:
struct Key
{
std::string first;
std::string second;
int third;
bool operator==(const Key &other) const
{ return (first == other.first
&& second == other.second
&& third == other.third);
}
};
Hash function:
namespace std {
template <>
struct hash<Key>
{
std::size_t operator()(const Key& k) const
{
using std::size_t;
using std::hash;
using std::string;
// Compute individual hash values for first,
// second and third and combine them using XOR
// and bit shifting:
return ((hash<string>()(k.first)
^ (hash<string>()(k.second) << 1)) >> 1)
^ (hash<int>()(k.third) << 1);
}
};
}
For those of us trying to figure out how to hash our own classes whilst still using the standard template, there is a simple solution:
In your class you need to define an equality operator overload ==. If you don't know how to do this, GeeksforGeeks has a great tutorial https://www.geeksforgeeks.org/operator-overloading-c/
Under the standard namespace, declare a specialization of the template struct hash with your class name as the type (see below). I found a great blog post that also shows an example of calculating hashes using XOR and bit shifting; that's outside the scope of this question, but the post also includes detailed instructions on using hash functions: https://prateekvjoshi.com/2014/06/05/using-hash-function-in-c-for-user-defined-classes/
namespace std {
template<>
struct hash<my_type> {
size_t operator()(const my_type& k) const {
// Do your hash function here
...
}
};
}
So then, to implement a hash table using your new hash function, you just create a std::unordered_map as you normally would and use my_type as the key; the standard library will automatically use the hash function you defined above (in step 2) to hash your keys. (A std::map does not use std::hash; it orders its keys with operator< instead.)
#include <unordered_map>
int main() {
std::unordered_map<my_type, other_type> my_map;
}