Using an unordered_map with arrays as keys - c++

I don't understand why I can't have an unordered_map with an array<int,3> as the key type:
#include <array>
#include <unordered_map>
using namespace std;
int main() {
array<int,3> key = {0,1,2};
unordered_map< array<int,3> , int > test;
test[key] = 2;
return 0;
}
I get a long error, the most pertinent part being
main.cpp:11:9: error: no match for ‘operator[]’ (operand types are ‘std::unordered_map<std::array<int, 3ul>, int>’ and ‘std::array<int, 3ul>’)
test[key] = 2;
^
Are arrays not eligible to be keys because they miss some requirements?

You have to implement a hash. Hash tables depend on hashing the key to find a bucket to put each element in. C++ doesn't magically know how to hash every type, and in this particular case it doesn't know how to hash an array of 3 integers by default. You can implement a simple hash struct like this:
struct ArrayHasher {
std::size_t operator()(const std::array<int, 3>& a) const {
std::size_t h = 0;
for (auto e : a) {
h ^= std::hash<int>{}(e) + 0x9e3779b9 + (h << 6) + (h >> 2);
}
return h;
}
};
And then use it:
unordered_map< array<int,3> , int, ArrayHasher > test;
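For reference, here is a minimal sketch of the same idea generalized to any element type and array length. The names GenericArrayHasher and lookup_demo are made up for this illustration, and the sketch assumes std::hash is available for the element type.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical generalization of ArrayHasher: works for any std::array<T, N>
// whose element type T has a std::hash specialization.
template <typename T, std::size_t N>
struct GenericArrayHasher {
    std::size_t operator()(const std::array<T, N>& a) const {
        std::size_t h = 0;
        for (const T& e : a) {
            // Same boost-style combining step as in the answer above.
            h ^= std::hash<T>{}(e) + 0x9e3779b9 + (h << 6) + (h >> 2);
        }
        return h;
    }
};

// Hypothetical demo: store a value under an array key and read it back
// through a second, equal array.
int lookup_demo() {
    std::unordered_map<std::array<int, 3>, int, GenericArrayHasher<int, 3>> test;
    std::array<int, 3> key = {0, 1, 2};
    test[key] = 2;
    // std::array already provides operator==, so no custom equality is needed.
    std::array<int, 3> same = {0, 1, 2};
    return test.at(same);
}
```

Passing the hasher as the third template argument keeps the hash local to this map type, so nothing has to be added to namespace std.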
Edit: I changed the function for combining hashes from a naive xor, to the function used by boost for this purpose: http://www.boost.org/doc/libs/1_35_0/doc/html/boost/hash_combine_id241013.html. This should be robust enough to actually use.

Why?
As mentioned in http://www.cplusplus.com/reference/unordered_map/unordered_map/
Internally, the elements in the unordered_map are not sorted in any
particular order with respect to either their key or mapped values,
but organized into buckets depending on their hash values to allow for
fast access to individual elements directly by their key values (with
a constant average time complexity on average).
Your key is an array, and the C++ standard library does not provide a hash for std::array.
How to get around it?
If you want to map an array to a value, you must supply your own hash function (see std::hash, http://en.cppreference.com/w/cpp/utility/hash); the question "C++ how to insert array into hash set?" may also help.
A workaround
If you are free to use boost, it can provide you with hashing of arrays and many other types. It is built on the hash_combine function, which you can look at in http://www.boost.org/doc/libs/1_49_0/boost/functional/hash/hash.hpp.

The relevant error is
error: no match for call to '(const std::hash<std::array<int, 3ul> >) (const std::array<int, 3ul>&)'
The unordered_map needs a hash of the key, and it looks for a specialization of std::hash to do that. You can extend the namespace std with a suitable specialization.

Compiled with msvc14 gives the following error:
"The C++ Standard doesn't provide a hash for this type."
I guess this is self-explanatory.

Related

C++ - Hash/Map a std::vector<uint64_t> in a single uint64_t

I need to map a std::vector<uint64_t> to a single uint64_t. Is that possible? I thought of using a hash function. Is that a solution?
For example, this vector:
std::vector<uint64_t> v {
16377,
2631694347470643681,
11730294873282192384ULL // ULL: too large for a signed 64-bit literal
};
should be converted into one uint64_t.
If a hash function is not a good solution (e.g. a high percentage of collisions), is there an alternative way to do this mapping?
I need to hash a std::vector<uint64_t> to a single uint64_t. Is that possible?
Yes, variable length hash functions exist, and it's possible to implement them in C++.
The C++ standard library comes with a few hash functions, but unfortunately not one for vector (other than the vector<bool> specialisation). We can reuse the hash function provided for string views, but this is a bit of a kludge:
const char* data = reinterpret_cast<const char*>(v.data());
std::size_t size = v.size() * sizeof(v[0]);
std::hash<std::string_view> hash;
std::cout << hash(std::string_view(data, size));
Note that this is reasonable only if std::has_unique_object_representations_v is true for the element type of the vector. I think it's reasonable to assume that to be the case for std::uint64_t.
A caveat when using standard library hash functions is that they don't have exact specification and as such you cannot rely on hashes being identical across separate systems. You should use another hash function if that is a concern.
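If hashes need to be identical across systems, one option is a fully specified algorithm such as 64-bit FNV-1a. The sketch below is not part of the answer above; the function name fnv1a_hash is made up, and feeding each element in least-significant-byte-first is a choice made here so that the result does not depend on the host's byte order.

```cpp
#include <cstdint>
#include <vector>

// Sketch of 64-bit FNV-1a over a vector of 64-bit integers.
// Each element is fed in byte by byte, least significant byte first,
// so the result is the same regardless of host endianness or
// standard library implementation.
std::uint64_t fnv1a_hash(const std::vector<std::uint64_t>& v) {
    std::uint64_t h = 14695981039346656037ULL;     // FNV-1a 64-bit offset basis
    const std::uint64_t prime = 1099511628211ULL;  // FNV 64-bit prime
    for (std::uint64_t x : v) {
        for (int byte = 0; byte < 8; ++byte) {
            h ^= (x >> (8 * byte)) & 0xFFu;  // mix in one byte
            h *= prime;                      // unsigned overflow wraps around
        }
    }
    return h;
}
```

As with any hash onto 64 bits, collisions are still possible, only unlikely; an empty vector simply yields the offset basis.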
You can create an std::map<std::vector<uint64_t>, uint64_t>, create a compare function for your vectors and just keep adding them to a map while incrementing a counter.
That counter will be your hash value.
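A minimal sketch of this counter approach follows; the class name VectorIds and the method id_of are made up for illustration. Note that std::vector already provides operator<, so std::map can order the keys without a custom compare function.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper: hands out consecutive IDs, one per distinct vector.
// Unlike a hash, two different vectors never share an ID, but the IDs are
// only meaningful within one VectorIds instance.
class VectorIds {
public:
    std::uint64_t id_of(const std::vector<std::uint64_t>& v) {
        // emplace inserts only if the key is new; otherwise it returns
        // an iterator to the existing entry.
        auto result = ids_.emplace(v, next_id_);
        if (result.second) {
            ++next_id_;  // the vector was new, so the counter advances
        }
        return result.first->second;
    }

private:
    std::map<std::vector<std::uint64_t>, std::uint64_t> ids_;
    std::uint64_t next_id_ = 0;
};
```

The trade-off is memory: every distinct vector seen so far is kept as a key in the map.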
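The comment above, in code:
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>
static const std::array<std::size_t, 5> primes = { 3, 5, 7, 11, 13 };
// Weighted sum of at most the first five elements; later elements are ignored.
static std::uint64_t hash(const std::vector<std::uint64_t>& v)
{
    if (v.empty()) return 0; // v[0] below would be out of bounds
    std::uint64_t hash = v[0];
    for (std::size_t n = 1; n < std::min(primes.size(), v.size()); ++n)
        hash += primes[n] * v[n];
    return hash;
}
int main()
{
    // ULL: the last value does not fit in a signed 64-bit literal
    std::vector<std::uint64_t> v{ 16377, 2631694347470643681, 11730294873282192384ULL };
    std::cout << hash(v);
    return 0;
}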

How do I insert into boost::unordered_set<boost::unordered_set<int> >?

The following code fails to compile, but if I remove the commented line, it compiles and runs correctly. I was only intending to use boost because C++ doesn't provide a hash function for std::unordered_set<int> by default.
#include <iostream>
#include <boost/unordered_set.hpp>
int main() {
boost::unordered_set<boost::unordered_set<int> > fam;
boost::unordered_set<int> s;
s.insert(5);
s.insert(6);
s.insert(7);
std::cout << s.size() << std::endl;
fam.insert(s); // this is the line causing the problem
return 0;
}
Edit 1:
I want to be more clear than I was in the OP. First I know that the idea of the boost::unordered_set<> is that it is implemented with a hash table, rather than a BST. I know that anything that is to be a template type to the boost::unordered_set<> needs to have a hash function and equality function provided. I also know that by default the std::unordered_set<> does not have a hash function defined which is why the following code does not compile:
#include <iostream>
#include <unordered_set>
int main() {
std::unordered_set<std::unordered_set<int> > fam;
return 0;
}
However, I thought that boost provides hash functions for all their containers which is why I believed the following code does compile:
#include <iostream>
#include <boost/unordered_set.hpp>
int main() {
boost::unordered_set<boost::unordered_set<int> > fam;
return 0;
}
So now, I'm not sure why boost code just above compiles, but the code in the OP does not. Was I wrong that boost provides a hash function for all their containers? I would really like to avoid having to define a new hash function, especially when my actual intended use is to have a much more complicated data structure: boost::unordered_map<std::pair<boost::unordered_map<int, int>, boost::unordered_map<int, int> >, int>. It seems like this should be a solved problem that I shouldn't have to define myself, since IIRC python can handle sets of sets no problem.
An unordered_set (or _map) uses hashing, and requires a hash operator to be defined for its elements. There is no hash operator defined for boost::unordered_set<int>, therefore it cannot put such a type of element into your set.
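You may write your own hash function for this. Because two equal sets can iterate over their elements in different orders, the combining step should be order-independent; the version below sums the element hashes, though you may want to customize it for your particular data. If you drop this code into your example, it should work:
namespace boost {
// Found by boost::hash through argument-dependent lookup.
inline std::size_t hash_value(boost::unordered_set<int> const& arg) {
    std::size_t hashCode = 0;
    // Addition is commutative, so equal sets always get equal hashes.
    for (int e : arg)
        hashCode += hash<int>{}(e);
    return hashCode;
}
}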

c++ - sorting a vector of custom structs based on frequency

I need to find the most frequent element in an array of custom structs. There is no custom ID to them just matching properties.
I was thinking of sorting my vector by frequency but I have no clue how to do that.
I'm assuming by frequency you mean the number of times an identical structure appears in the array.
You probably want to make a hash function (or overload std::hash<> for your type) for your custom struct. Then iterate over your array, incrementing the value on an unordered_map<mytype, int> for every struct in the array. This will give you the frequency in the value field. Something like the below would work:
std::vector<mytype> elements; // filled elsewhere
std::unordered_map<mytype, int> freq;
mytype most_frequent;
int max_frequency = 0;
for (const mytype &el : elements) {
    int count = ++freq[el];
    if (count > max_frequency) {
        max_frequency = count; // remember the highest count seen so far
        most_frequent = el;
    }
}
For this to work, the map needs to be able to hash the above type and compare two values for equality. By default, it tries to use std::hash<> and operator==. You are expressly allowed by the standard to specialize this template in the standard namespace for your own types. You could do this as follows:
struct mytype {
    std::string name;
    double value;
    // The map also needs equality to tell apart keys that share a bucket
    bool operator==(const mytype &other) const {
        return name == other.name && value == other.value;
    }
};
namespace std {
template <> struct hash<mytype> {
    size_t operator()(const mytype &t) const noexcept {
        // Use standard library hash implementations of member variable types
        return hash<string>()(t.name) ^ hash<double>()(t.value);
    }
};
}
The primary goal is that two values that compare equal always produce the same hash, and that two values that differ are unlikely to produce the same one. The above XORs the results of the standard library's hash function for each type together, which according to Mark Nelson is probably as good as the individual hashing algorithms XOR'd together. An alternative algorithm suggested by cppreference's hash reference is the Fowler-Noll-Vo hash function.
Look at std::sort and the example provided in the ref, where you actually pass your own comparator to do the trick you want (in your case, use the frequencies). Of course, a lambda function can be used too, if you wish.
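As a rough sketch of that suggestion, the following counts occurrences first and then sorts with a lambda that compares the counts. The Item struct, its fields, ItemHasher and sort_by_frequency are all placeholders invented for this example; the tie-break on the fields is an added assumption so that identical items end up next to each other.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical element type; the fields are placeholders for illustration.
struct Item {
    std::string name;
    int code;
    bool operator==(const Item& other) const {
        return name == other.name && code == other.code;
    }
};

// Hasher passed as a template argument to the map below.
struct ItemHasher {
    std::size_t operator()(const Item& it) const {
        return std::hash<std::string>{}(it.name) * 31 + std::hash<int>{}(it.code);
    }
};

// Sorts items so that the most frequent values come first.
void sort_by_frequency(std::vector<Item>& items) {
    // First pass: count how often each distinct item occurs.
    std::unordered_map<Item, int, ItemHasher> freq;
    for (const Item& it : items) {
        ++freq[it];
    }
    // Second pass: sort by descending count; ties are broken by the
    // fields themselves so that identical items stay adjacent.
    std::sort(items.begin(), items.end(),
              [&freq](const Item& a, const Item& b) {
                  int fa = freq.at(a);
                  int fb = freq.at(b);
                  if (fa != fb) return fa > fb;
                  return std::tie(a.name, a.code) < std::tie(b.name, b.code);
              });
}
```

After the call, the most frequent element is simply items.front(), provided the vector is not empty.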

Why can't I store my objects in an unordered_set?

I understand that a set is ordered, so without an overloaded < operator the container cannot tell which object is smaller and cannot keep itself sorted. However, I don't understand why storing my objects isn't possible with an unordered_set either.
If I try something like this:
#include <iostream>
#include <string>
#include <unordered_set>
struct someType{
std::string name;
int code;
};
int main(){
std::unordered_set <someType> myset;
myset.insert({"aaa",123});
myset.insert({"bbb",321});
myset.insert({"ccc",213});
return 0;
}
I get a couple of errors like:
c:\qt\qt5.1.0\tools\mingw48_32\lib\gcc\i686-w64-mingw32\4.8.0\include\c++\bits\hashtable_policy.h:1070: error: invalid use of incomplete type 'struct std::hash'
c:\qt\qt5.1.0\tools\mingw48_32\lib\gcc\i686-w64-mingw32\4.8.0\include\c++\bits\functional_hash.h:58: error: declaration of 'struct std::hash'
error: no matching function for call to 'std::unordered_set::unordered_set()'
c:\qt\qt5.1.0\tools\mingw48_32\lib\gcc\i686-w64-mingw32\4.8.0\include\c++\bits\hashtable_policy.h:1103: error: no match for call to '(const std::hash) (const someType&)'
c:\qt\qt5.1.0\tools\mingw48_32\lib\gcc\i686-w64-mingw32\4.8.0\include\c++\bits\stl_function.h:208: error: no match for 'operator==' (operand types are 'const someType' and 'const someType')
Why is that and how can I fix it?
To use a type in an unordered_set or unordered_map, you need a hashing function for that type. For common types, like int or std::string, a hashing function is provided by the standard library. For your own type, you can specialize std::hash, like this:
namespace std {
template <> struct hash<someType> {
size_t operator()(const someType & x) const {
std::hash<std::string> h;
return h(x.name);
// or simply return x.code
// or do something more interesting,
// like xor'ing hashes from both members of struct
}
};
}
Another way is to write your own type with an overloaded operator() and pass it as the hash template argument of unordered_set, like this:
struct someTypeHasher {
size_t operator()(const someType& x) const {
return x.code;
}
};
std::unordered_set<someType, someTypeHasher> myset;
Good reading for theory about hash based containers is here
Also, do not forget that you need to overload operator== for someType; without it, the code will still not compile.
As explained in the answer given by Starl1ght, you need to provide a hash function for someType. However, I would combine all members of your class in that hash function. Otherwise, you might get a lot of collisions, for example, if the same name occurs very often, but with different code values. For creating a hash function, you can make use of Boost, but you can also handcraft it.
Starl1ght also mentioned that you need to overload operator== for someType,
but you can also define a separate comparison function instead and provide it to the unordered_set. Moreover, you can use lambda expressions instead of defining the hash and comparison functions. If you put everything together, then your code could be written as follows:
auto hash = [](const someType& st){
return std::hash<std::string>()(st.name) * 31 + std::hash<int>()(st.code);
};
auto equal = [](const someType& st1, const someType& st2){
return st1.name == st2.name && st1.code == st2.code;
};
std::unordered_set<someType, decltype(hash), decltype(equal)> myset(8, hash, equal);

C++ using an unordered key combination for a map lookup

I want to create an unordered_map where the key is a combination of two integers. Since the order of the key values should be ignored when comparing, I thought of using an unordered_set as the key, like this:
#include <unordered_set>
#include <unordered_map>
using namespace std;
int main ()
{
unordered_set<int> key_set1 = {21, 42};
unordered_map<unordered_set<int>, char> map;
map[key_set1] = 'a';
...
unordered_set<int> key_set2 = {42, 21};
if(map[key_set1] == map[key_set2])
success();
}
At compile time, there seems to be a problem with the hash function:
error: no match for call to ‘(const std::hash<std::unordered_set<int> >) (const std::unordered_set<int>&)’
noexcept(declval<const _Hash&>()(declval<const _Key&>()))>
How can I solve this? Or is there a better way/data structure?
There is no predefined hash function for an unordered_set so you have to implement your own; there's documentation for that here http://en.cppreference.com/w/cpp/utility/hash.
Basically you'd need:
// custom specialization of std::hash can be injected in namespace std
namespace std
{
template<> struct hash<unordered_set<int>>
{
std::size_t operator()(unordered_set<int> const& s) const
{
std::size_t hash = 0;
for (auto && i : s) hash ^= std::hash<int>()(i);
return hash;
}
};
}
Now xor isn't the recommended way to combine hash functions, but it should work in this case specifically because the key is both unordered and a set. Because it's unordered, you need a combining function that's commutative. The recommended hash combiners don't have this property, as you usually want "abc" to hash differently than "bca". Secondly, the fact that it's a set ensures that you won't have any duplicate elements. This saves your hash function from failing because x ^ x == 0.
I should also mention that you want to define this in the cpp file so you don't expose this specific hash implementation on std types to everyone.
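Another way to avoid exposing the hash to everyone is to not touch namespace std at all and pass a hasher as the map's third template argument. The following is only a sketch of that variation; IntSetHasher, SetKeyedMap and lookup_reversed are names invented here.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>

// Hypothetical hasher: XOR is commutative, so the result does not depend
// on the order in which the set happens to iterate over its elements.
struct IntSetHasher {
    std::size_t operator()(const std::unordered_set<int>& s) const {
        std::size_t h = 0;
        for (int i : s) {
            h ^= std::hash<int>{}(i);
        }
        return h;
    }
};

// Only maps declared with this alias use the custom hasher.
using SetKeyedMap =
    std::unordered_map<std::unordered_set<int>, char, IntSetHasher>;

// Hypothetical demo: insert under {21, 42}, look up under {42, 21}.
char lookup_reversed() {
    SetKeyedMap m;
    std::unordered_set<int> key_set1 = {21, 42};
    m[key_set1] = 'a';
    std::unordered_set<int> key_set2 = {42, 21};
    // unordered_set's operator== ignores element order, so this finds 'a'.
    return m.at(key_set2);
}
```

Equality needs no extra work here because std::unordered_set already provides an operator== that compares contents regardless of order.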
The problem is that unordered_set is not built for being used as a key in an unordered container.
If you always use exactly two ints, it would be more economical for you to use a pair of ints as a key, and add a function that makes a properly ordered pair from two integers:
pair<int,int> unordered_key(int a, int b) {
return a<b?make_pair(a, b):make_pair(b, a);
}
As pointed out earlier, to use std::pair directly as a key you would need to explicitly define a hash function for it. If you want to avoid that, you can just do a bit-wise combination of 2 unsigned integers into 1:
uint64_t makeKey(uint32_t a, uint32_t b)
{
return a < b ? (static_cast<uint64_t>(a) << 32) + b : (static_cast<uint64_t>(b) << 32) + a;
}
int main ()
{
auto key_set1 = makeKey(21, 42);
unordered_map<uint64_t, char> map;
map[key_set1] = 'a';
//...
auto key_set2 = makeKey(42, 21);
if(map[key_set1] == map[key_set2])
std::cout << "success" << std::endl;
}
Since the order is not important here, you can use std::pair with a customized factory to force the order of the two integers:
std::pair<int, int> make_my_pair(int x, int y) {
return std::make_pair(std::min(x, y), std::max(x, y));
}
Of course this is only going to work if you use make_my_pair consistently.
Alternatively you can define your own key class that has a similar property.
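One possible shape for such a key class is sketched below. UnorderedPair and UnorderedPairHasher are hypothetical names, and the packing in the hasher assumes int is at most 32 bits wide. The constructor normalizes the order, so the invariant cannot be bypassed the way a forgotten make_my_pair call could.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical key class: always stores the smaller value first, so
// UnorderedPair(21, 42) and UnorderedPair(42, 21) are the same key.
class UnorderedPair {
public:
    UnorderedPair(int a, int b) : lo_(std::min(a, b)), hi_(std::max(a, b)) {}

    bool operator==(const UnorderedPair& other) const {
        return lo_ == other.lo_ && hi_ == other.hi_;
    }

    int lo() const { return lo_; }
    int hi() const { return hi_; }

private:
    int lo_;
    int hi_;
};

// Hasher for the key class above.
struct UnorderedPairHasher {
    std::size_t operator()(const UnorderedPair& p) const {
        // Pack both values into one 64-bit integer (assumes int is at most
        // 32 bits wide) and reuse the standard hash for that integer.
        std::uint64_t packed =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(p.lo())) << 32) |
            static_cast<std::uint32_t>(p.hi());
        return std::hash<std::uint64_t>{}(packed);
    }
};
```

It would be used as std::unordered_map<UnorderedPair, char, UnorderedPairHasher>, with keys constructed in either argument order.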