I need to implement a set of sets in my application.
Using QSet with a custom class requires providing a qHash() function and an operator==.
The code is as follows:
class Custom {
public:
    int x;
    int y;
    // some other members, irrelevant here
};
inline uint qHash(const Custom *c) {
    return qHash(c->x) ^ qHash(c->y);
}
bool operator==(const Custom &c1, const Custom &c2) {
    return (c1.x == c2.x) && (c1.y == c2.y);
}
//now I can use: QSet<Custom*>
How can I implement qHash(QSet<Custom*>), to be able to use QSet< QSet<SomeClass*> >?
Edit:
Additional question:
In my application the "set of sets" can contain up to 15000 sets. Each subset up to 25 Custom class pointers. How to guarantee that qHash(QSet<Custom*>) will be unique enough?
You cannot implement qHash with boost::hash_range/boost::hash_combine (which is what pmr's answer does, effectively), because QSet is the Qt equivalent of std::unordered_set, and, as the STL name suggests, these containers are unordered. The Boost documentation states that hash_combine is order-dependent, i.e. it will hash permutations to different hash values.
This is a problem because if you naively hash-combine the elements in stored order, you cannot guarantee that two sets that compare equal will also hash to the same value, which is one of the requirements of a hash function:
For all x, y: x == y => qHash(x) == qHash(y)
So, if your hash-combining function needs to produce the same output for any permutation of the input values, it needs to be commutative. Fortunately, both (unsigned) addition and the xor operation just fit the bill:
#include <numeric> // std::accumulate

template <typename T>
inline uint qHash(const QSet<T> &set, uint seed = 0) {
    return std::accumulate(set.begin(), set.end(), seed,
                           [](uint seed, const T &value) {
                               return seed + qHash(value); // or ^
                           });
}
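With both qHash overloads declared before the containers are used, the nested set compiles. A minimal usage sketch (hypothetical values, assuming the Custom class from the question):
QSet<Custom*> inner;
inner.insert(new Custom{1, 2}); // insert a pointer element
QSet<QSet<Custom*>> nested;     // possible now that qHash(QSet<T>, uint) exists
nested.insert(inner);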
A common way to hash containers is to combine the hashes of all elements. Boost provides hash_combine and hash_range for this purpose. This should give you an idea of how to implement it for the results of your qHash.
So, given your qHash for Custom:
uint qHash(const QSet<Custom*>& c) {
uint seed = 0;
for(auto x : c) {
seed ^= qHash(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
return seed;
}
Related
I tried to implement an unordered_map for a class called Pair that stores an integer and a bitset. Then I found out that there isn't a hash function for this class.
Now I wanted to create my own hash function. But instead of using XOR or comparable functions, I wanted a hash function like the following approach:
the bitsets in my class have a fixed size, so I wanted to do the following:
example: for an instance of Pair with the bitset<6> = 101101, and the integer 6:
create a string = "1011016"
and now use the default hash function on this string
because the bitsets have fixed size, each key would be unique
how could I implement this approach?
thank you in advance
To expand on a comment, as requested:
Converting to string and then hashing that string would be somewhat slow. At least slower than it needs to be. A faster approach would be to combine the bit patterns, e.g. like this:
#include <bitset>
#include <cstddef>
#include <functional>

struct Pair
{
    std::bitset<6> bits;
    int intval;
};
template<>
struct std::hash<Pair> // the qualified specialization form requires C++17
{
std::size_t operator()(const Pair& pair) const noexcept
{
std::size_t rtrn = static_cast<std::size_t>(pair.intval);
rtrn = (rtrn << pair.bits.size()) | pair.bits.to_ulong();
return rtrn;
}
};
This works on two assumptions:
The upper bits of the integer are generally not interesting
The size of the bitset is always small compared to size_t
I think it is a suitable hash function for use in unordered_map. One may argue that it has poor mixing and a very good hash should change many bits if only a few bits in its input change. But that is not required here. unordered_map is generally designed to work with cheap hash functions. For example GCC's hash for builtin types and pointers is just a static- or reinterpret-cast.
Possible improvements
We can preserve the upper bits by rotating instead of shifting.
#include <limits>

template<>
struct std::hash<Pair>
{
std::size_t operator()(const Pair& pair) const noexcept
{
std::size_t rtrn = static_cast<std::size_t>(pair.intval);
std::size_t intdigits = std::numeric_limits<decltype(pair.intval)>::digits;
std::size_t bitdigits = pair.bits.size();
// can be simplified to std::rotl(rtrn, bitdigits) in C++20
rtrn = (rtrn << bitdigits) | (rtrn >> (intdigits - bitdigits));
rtrn ^= pair.bits.to_ulong();
return rtrn;
}
};
Nothing will change for small integers (except some bitflips for small negative ints). But for large integers we still use the whole range of inputs, which might be of interest for pathological cases such as integer series 2^30, 2^30 + 2^29, 2^30 + 2^28, ...
If the size of the bitset may increase, stop doing fancy stuff and just combine the hashes. I wouldn't just xor them to avoid hash collisions on small integers.
template<>
struct std::hash<Pair>
{
std::size_t operator()(const Pair& pair) const noexcept
{
std::hash<decltype(pair.intval)> ihash;
std::hash<decltype(pair.bits)> bhash;
return ihash(pair.intval) * 31 + bhash(pair.bits);
}
};
I picked the simple polynomial hash approach common in Java. I believe GCC uses the same one internally for string hashing. Someone else may expand on the topic or suggest a better one. 31 is commonly chosen as it is a prime number one off a power of two, so multiplication by it can be computed quickly as (x << 5) - x.
I want to create an unordered_map where the key is a combination of two integers. As the order of the key values shall be ignored when comparing, I thought of using an unordered_set as the key, like this:
#include <unordered_set>
#include <unordered_map>
using namespace std;
int main ()
{
unordered_set<int> key_set1 = {21, 42};
unordered_map<unordered_set<int>, char> map;
map[key_set1] = 'a';
...
unordered_set<int> key_set2 = {42, 21};
if(map[key_set1] == map[key_set2])
success();
}
At compile time it looks like there is some problem with the hash function:
error: no match for call to ‘(const std::hash<std::unordered_set<int> >) (const std::unordered_set<int>&)’
noexcept(declval<const _Hash&>()(declval<const _Key&>()))>
How can I solve this? Or is there a better way/data structure?
There is no predefined hash function for an unordered_set, so you have to implement your own; there's documentation for that at http://en.cppreference.com/w/cpp/utility/hash.
Basically you'd need:
// custom specialization of std::hash can be injected in namespace std
namespace std
{
template<> struct hash<unordered_set<int>>
{
std::size_t operator()(unordered_set<int> const& s) const
{
std::size_t hash = 0;
for (auto && i : s) hash ^= std::hash<int>()(i);
return hash;
}
};
}
Now xor isn't the recommended way to combine hash functions, but it works in this case specifically because the container is both unordered and a set. Because it's unordered you need a function that's commutative. The recommended hash combiners don't have this property, as you usually want "abc" to hash differently than "bca". Secondly, the fact that it's a set ensures that you won't have any duplicate elements. This saves your hash function from failing because x ^ x == 0.
I should also mention that you want to define this in the cpp file so you don't expose this specific hash implementation on std types to everyone.
The problem is that unordered_set is not built for being used as a key in an unordered container.
If you always use exactly two ints, it would be more economical for you to use a pair of ints as a key, and add a function that makes a properly ordered pair from two integers:
pair<int,int> unordered_key(int a, int b) {
return a<b?make_pair(a, b):make_pair(b, a);
}
As pointed out earlier, to use std::pair directly as a key you would need to explicitly define a hash function for it. If you want to avoid that, you can just do a bit-wise combination of two unsigned integers into one:
uint64_t makeKey(uint32_t a, uint32_t b)
{
return a < b ? (static_cast<uint64_t>(a) << 32) + b : (static_cast<uint64_t>(b) << 32) + a;
}
int main ()
{
auto key_set1 = makeKey(21, 42);
unordered_map<uint64_t, char> map;
map[key_set1] = 'a';
//...
auto key_set2 = makeKey(42, 21);
if(map[key_set1] == map[key_set2])
std::cout << "success" << std::endl;
}
Since the order is not important here, you can use std::pair with a customized factory to force the order of the two integers:
std::pair<int, int> make_my_pair(int x, int y) {
return std::make_pair(std::min(x, y), std::max(x, y));
}
Of course this is only going to work if you use make_my_pair consistently.
Alternatively you can define your own key class that has a similar property.
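For example, a minimal sketch of such a key class (the name UnorderedPair and its layout are hypothetical): it normalizes the order on construction, so equality and hashing are order-insensitive automatically.
#include <algorithm>
#include <cstddef>
#include <functional>

struct UnorderedPair {
    int lo, hi;
    UnorderedPair(int a, int b) : lo(std::min(a, b)), hi(std::max(a, b)) {}
    bool operator==(const UnorderedPair& o) const {
        return lo == o.lo && hi == o.hi;
    }
};

namespace std {
    template<> struct hash<UnorderedPair> {
        std::size_t operator()(const UnorderedPair& p) const {
            // boost-style combine of the two normalized values
            std::size_t seed = std::hash<int>()(p.lo);
            seed ^= std::hash<int>()(p.hi) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
            return seed;
        }
    };
}
With this, unordered_map<UnorderedPair, char> works without any extra hasher argument.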
Discussion:
Let's say I have a struct/class with an arbitrary number of attributes that I want to use as a key to a std::unordered_map, e.g.:
struct Foo {
int i;
double d;
char c;
bool b;
};
I know that I have to define a hasher functor for it, e.g.:
struct FooHasher {
std::size_t operator()(Foo const &foo) const;
};
And then define my std::unordered_map as:
std::unordered_map<Foo, MyValueType, FooHasher> myMap;
What bothers me, though, is how to define the call operator for FooHasher. One way to do it, which I also tend to prefer, is with std::hash. However, there are numerous variations, e.g.:
std::size_t operator()(Foo const &foo) const {
return std::hash<int>()(foo.i) ^
std::hash<double>()(foo.d) ^
std::hash<char>()(foo.c) ^
std::hash<bool>()(foo.b);
}
I've also seen the following scheme:
std::size_t operator()(Foo const &foo) const {
return std::hash<int>()(foo.i) ^
(std::hash<double>()(foo.d) << 1) ^
(std::hash<char>()(foo.c) >> 1) ^
(std::hash<bool>()(foo.b) << 1);
}
I've also seen some people adding the golden ratio:
std::size_t operator()(Foo const &foo) const {
return (std::hash<int>()(foo.i) + 0x9e3779b9) ^
(std::hash<double>()(foo.d) + 0x9e3779b9) ^
(std::hash<char>()(foo.c) + 0x9e3779b9) ^
(std::hash<bool>()(foo.b) + 0x9e3779b9);
}
Questions:
What are they trying to achieve by adding the golden ratio or shifting bits in the result of std::hash?
Is there an "official scheme" to std::hash an object with arbitrary number of attributes of fundamental type?
A simple xor is symmetric and behaves badly when fed the "same" value multiple times (hash(a) ^ hash(a) is zero).
This is the question of combining hashes. boost has a hash_combine that is pretty decent. Write a hash combiner, and use it.
There is no "official scheme" to solve this problem.
Myself, I typically write a super-hasher that can take anything and hash it. It hash combines tuples and pairs and collections automatically, where it first hashes the count of elements in the collection, then the elements.
It finds hash(t) via ADL first, and if that fails checks if it has a manually written hash in a helper namespace (used for std containers and types), and if that fails does a std::hash<T>{}(t).
Then my hash for Foo support looks like:
struct Foo {
int i;
double d;
char c;
bool b;
friend auto mytie(Foo const& f) {
return std::tie(f.i, f.d, f.c, f.b);
}
friend std::size_t hash(Foo const& f) {
return hasher::hash(mytie(f));
}
};
where I use mytie to move Foo into a tuple, then use the std::tuple overload of hasher::hash to get the result.
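A minimal sketch of what that tuple overload might look like (the hasher namespace here is hypothetical; the real library described above also dispatches via ADL and hashes element counts of containers):
#include <cstddef>
#include <functional>
#include <tuple>

namespace hasher {
    // fallback: defer to std::hash for single values
    template <class T>
    std::size_t hash(const T& t) { return std::hash<T>{}(t); }

    // tuple overload: boost-style combine of the element hashes, in order
    template <class... Ts>
    std::size_t hash(const std::tuple<Ts...>& t) {
        std::size_t seed = 0;
        std::apply([&](const auto&... es) {
            ((seed ^= hash(es) + 0x9e3779b9 + (seed << 6) + (seed >> 2)), ...);
        }, t);
        return seed;
    }
}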
I like the idea of hashes of structurally similar types having the same hash. This lets me act as if my hash is transparent in some cases.
Note that hashing unordered meows in this manner is a bad idea, as an asymmetric hash of an unordered meow may generate spurious misses.
(Meow is the generic name for map and set. Do not ask me why: Ask the STL.)
The standard hash framework is lacking in respect of combining hashes. Combining hashes using xor is sub-optimal.
A better solution is proposed in N3980 "Types Don't Know #".
The main idea is using the same hash function and its state to hash more than one value/element/member.
With that framework your hash function would look like this:
template <class HashAlgorithm>
void hash_append(HashAlgorithm& h, Foo const& x) noexcept
{
using std::hash_append;
hash_append(h, x.i);
hash_append(h, x.d);
hash_append(h, x.c);
hash_append(h, x.b);
}
And the container:
std::unordered_map<Foo, MyValueType, std::uhash<>> myMap;
Let's say I would like to create an unordered set of unordered multisets of unsigned int. For this, I need to create a hash function to calculate a hash of the unordered multiset. In fact, it has to be good for CRC as well.
One obvious solution is to put the items in a vector, sort them and return a hash of the result. This seems to work, but it is expensive.
Another approach is to xor the values, but obviously if I have one item twice or none the result will be the same - which is not good.
Any ideas how I can implement this more cheaply? I have an application that will be doing this for thousands of sets, and relatively big ones.
Since it is a multiset, you would like for the hash value to be the same for identical multisets, whose representation might have the same elements presented, added, or deleted in a different order. You would then like for the hash value to be commutative, easy to update, and change for each change in elements. You would also like for two changes to not readily cancel their effect on the hash.
One operation that meets all but the last criterion is addition. Just sum the elements. To keep the sum bounded, do the sum modulo the size of your hash value. (E.g. modulo 2^64 for a 64-bit hash.) To make sure that inserting or deleting zero values changes the hash, add one to each value first.
A drawback of the sum is that two changes can readily cancel. E.g. replacing 1 3 with 2 2. To address that, you can use the same approach and sum a polynomial of the entries, still retaining commutativity. E.g. instead of summing x+1, you can sum x^2+x+1. Now it is more difficult to contrive sets of changes with the same sum.
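A sketch of that idea, using x^2+x+1 as the summed polynomial and relying on unsigned wraparound for the modulo (the helper name is hypothetical):
#include <cstdint>
#include <unordered_set>

std::uint64_t multiset_hash(const std::unordered_multiset<unsigned>& s) {
    std::uint64_t h = 0;
    for (std::uint64_t x : s)
        h += x * x + x + 1; // x^2 + x + 1, summed mod 2^64 via overflow
    return h;
}
With a plain sum, {1, 3} and {2, 2} would collide (both sum to 4); with x^2+x+1 they hash to 16 and 14 respectively.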
Here's a reasonable hash function for std::unordered_multiset<int>. It would be better if the computations were taken mod a large prime, but the idea stands.
#include <iostream>
#include <unordered_set>
namespace std {
template<>
struct hash<unordered_multiset<int>> {
typedef unordered_multiset<int> argument_type;
typedef std::size_t result_type;
const result_type BASE = static_cast<result_type>(0xA67);
result_type log_pow(result_type ex) const {
result_type res = 1;
result_type base = BASE;
while (ex > 0) {
if (ex % 2) {
res = res * base;
}
base *= base;
ex /= 2;
}
return res;
}
result_type operator()(argument_type const & val) const {
result_type h = 0;
for (const int& el : val) {
h += log_pow(el);
}
return h;
}
};
} // namespace std
int main() {
std::unordered_set<std::unordered_multiset<int>> mySet;
std::unordered_multiset<int> set1{1,2,3,4};
std::unordered_multiset<int> set2{1,1,2,2,3,3,4,4};
std::cout << "Hash 1: " << std::hash<std::unordered_multiset<int>>()(set1)
<< std::endl;
std::cout << "Hash 2: " << std::hash<std::unordered_multiset<int>>()(set2)
<< std::endl;
return 0;
}
Output:
Hash 1: 2290886192
Hash 2: 286805088
When BASE is a prime p, the number of collisions is proportional to 1/p. I'm not sure what the analysis is for powers of two. You can make updates to the hash efficient by adding/subtracting BASE^x when you insert/remove the integer x.
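For instance, a sketch of that incremental update (a hypothetical wrapper, assuming a free-standing log_pow equivalent to the member function above):
#include <cstddef>
#include <unordered_set>

struct HashedMultiset {
    std::unordered_multiset<int> data;
    std::size_t hash = 0;
    void insert(int x) { data.insert(x); hash += log_pow(x); }
    void erase_one(int x) {
        auto it = data.find(x); // erase only one occurrence
        if (it != data.end()) { data.erase(it); hash -= log_pow(x); }
    }
};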
Implement the inner multiset as a value->count hash map.
This will allow you to avoid the problem that an even number of elements cancels out via xor in the following way: Instead of xor-ing each element, you construct a new number from the count and the value (e.g. multiplying them), and then you can build the full hash using xor.
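A minimal sketch of that layout (names are hypothetical): the inner multiset becomes a value-to-count map, and each distinct value contributes one xor term built from its value and count.
#include <cstddef>
#include <functional>
#include <unordered_map>

std::size_t hash_counted(const std::unordered_map<int, std::size_t>& counts) {
    std::size_t h = 0;
    for (const auto& [value, count] : counts)
        // combine value and count into one number, then xor it in;
        // safe because each value appears exactly once in the map
        h ^= std::hash<int>()(value) * count;
    return h;
}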
I'm trying to implement an unordered_map for a vector< pair < int,int> >. Since there's no such default hash function, I tried to imagine a function of my own :
struct ObjectHasher
{
std::size_t operator()(const Object& k) const
{
std::string h_string("");
for (auto i = k.vec.begin(); i != k.vec.end(); ++i)
{
h_string.push_back(97+i->first);
h_string.push_back(45); // '-'
h_string.push_back(97+i->second);
h_string.push_back(43); // '+'
}
return std::hash<std::string>()(h_string);
}
};
The main idea is to change the list of integers, say ((97, 98), (105, 107)), into a formatted string like "a-b+i-k" and to compute its hash thanks to hash<string>(). I chose the numbers 97, 45 and 43 only to allow the hash string to be easily displayed in a terminal during my tests.
I know this kind of function might be a very naive idea, since a good hash function should be fast and strong against collisions. Well, if the integers given to push_back() are greater than 255, I don't know what might happen... So, what do you think of the following questions:
(1) is my function ok for big integers ?
(2) is my function ok for all environments/platforms ?
(3) is my function too slow to be a hash function ?
(4) ... do you have anything better ?
All you need is a function to "hash in" an integer. You can steal such a function from boost:
template <class T>
inline void hash_combine(std::size_t& seed, const T& v)
{
std::hash<T> hasher;
seed ^= hasher(v) + 0x9e3779b9 + (seed<<6) + (seed>>2);
}
Now your function is trivial:
struct ObjectHasher
{
std::size_t operator()(const Object& k) const
{
std::size_t hash = 0;
for (auto i = k.vec.begin(); i != k.vec.end(); ++i)
{
hash_combine(hash, i->first);
hash_combine(hash, i->second);
}
return hash;
}
};
This function is probably very slow compared to other hash functions, since it uses dynamic memory allocation. Also, std::hash<std::string> is not a very good hash function, since it is very general. It's probably better to XOR all ints and use std::hash<int>.
This is a perfectly valid solution. All a hash function needs is a sequence of bytes, and by concatenating your elements together as a string you are providing a unique byte representation of the object.
Of course this could become unruly if your vector contains a large number of items.