C++ using an unordered key combination for a map lookup

I want to create an unordered_map whose key is a combination of two integers. Since the order of the key values should be ignored when comparing, I thought of using an unordered_set as the key, like this:
#include <unordered_set>
#include <unordered_map>
using namespace std;
int main ()
{
unordered_set<int> key_set1 = {21, 42};
unordered_map<unordered_set<int>, char> map;
map[key_set1] = 'a';
...
unordered_set<int> key_set2 = {42, 21};
if(map[key_set1] == map[key_set2])
success();
}
At compile time there seems to be a problem with the hash function:
error: no match for call to ‘(const std::hash<std::unordered_set<int> >) (const std::unordered_set<int>&)’
noexcept(declval<const _Hash&>()(declval<const _Key&>()))>
How can I solve this? Or is there a better way/data structure?

There is no predefined hash function for an unordered_set, so you have to implement your own; this is documented at http://en.cppreference.com/w/cpp/utility/hash.
Basically you'd need:
// custom specialization of std::hash can be injected in namespace std
namespace std
{
template<> struct hash<unordered_set<int>>
{
std::size_t operator()(unordered_set<int> const& s) const
{
std::size_t hash = 0;
for (auto && i : s) hash ^= std::hash<int>()(i);
return hash;
}
};
}
Now, xor isn't the recommended way to combine hash values, but it should work in this specific case because the key is both unordered and a set. Because it's unordered, you need a combining function that is commutative. The recommended hash combiners don't have this property, as you usually want "abc" to hash differently than "bca". Secondly, the fact that it's a set ensures that you won't have any duplicate elements. This saves your hash function from failing because x ^ x == 0.
I should also mention that you want to define this in the cpp file so you don't expose this specific hash implementation on std types to everyone.
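Strictly speaking, the standard only permits adding specializations to namespace std when they depend on a program-defined type, which unordered_set<int> does not. A minimal sketch of an alternative that avoids touching namespace std is to pass the hasher as the third template argument; the names IntSetHash, SetKeyMap and lookup_ignores_order are hypothetical and not part of the question:

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>

// Hypothetical hasher: XOR is commutative, so the iteration order of the
// set does not influence the result.
struct IntSetHash {
    std::size_t operator()(const std::unordered_set<int>& s) const {
        std::size_t h = 0;
        for (int i : s) h ^= std::hash<int>{}(i);
        return h;
    }
};

// The hasher is supplied explicitly instead of specializing std::hash.
using SetKeyMap = std::unordered_map<std::unordered_set<int>, char, IntSetHash>;

// Returns true if {21, 42} and {42, 21} refer to the same map entry.
bool lookup_ignores_order() {
    std::unordered_set<int> key_set1 = {21, 42};
    std::unordered_set<int> key_set2 = {42, 21};
    SetKeyMap m;
    m[key_set1] = 'a';
    return m.count(key_set2) == 1 && m.at(key_set2) == 'a';
}
```

Equality of the keys comes from operator== on unordered_set, which already ignores element order, so only the hash has to be supplied.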

The problem is that unordered_set is not built for being used as a key in an unordered container.
If you always use exactly two ints, it would be more economical for you to use a pair of ints as a key, and add a function that makes a properly ordered pair from two integers:
pair<int,int> unordered_key(int a, int b) {
return a<b?make_pair(a, b):make_pair(b, a);
}

As pointed out earlier, to use std::pair directly as a key you would need to explicitly define a hash function for it. If you want to avoid that, you can just do a bit-wise combination of 2 unsigned integers into 1:
#include <cstdint>
#include <iostream>
#include <unordered_map>
// Puts the smaller value into the upper 32 bits and the larger into the lower 32 bits.
uint64_t makeKey(uint32_t a, uint32_t b)
{
    return a < b ? (static_cast<uint64_t>(a) << 32) + b : (static_cast<uint64_t>(b) << 32) + a;
}
int main ()
{
    auto key_set1 = makeKey(21, 42);
    std::unordered_map<uint64_t, char> map;
    map[key_set1] = 'a';
    //...
    auto key_set2 = makeKey(42, 21);
    if(map[key_set1] == map[key_set2])
        std::cout << "success" << std::endl;
}

Since the order is not important here, you can use std::pair with a customized factory to force the order of the two integers:
std::pair<int, int> make_my_pair(int x, int y) {
return std::make_pair(std::min(x, y), std::max(x, y));
}
Of course this is only going to work if you use make_my_pair consistently.
Alternatively you can define your own key class that has a similar property.
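A key class along those lines could look like the following sketch. The names UnorderedPair, UnorderedPairHash and PairMap are hypothetical, and the hasher assumes that int is 32 bits wide:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical key type: stores the two integers in sorted order, so
// UnorderedPair(21, 42) and UnorderedPair(42, 21) compare equal.
struct UnorderedPair {
    int lo;
    int hi;
    UnorderedPair(int a, int b) : lo(std::min(a, b)), hi(std::max(a, b)) {}
    bool operator==(const UnorderedPair& other) const {
        return lo == other.lo && hi == other.hi;
    }
};

// Hypothetical hasher: packs both values into one 64-bit integer
// (assumes 32-bit int) and hashes that.
struct UnorderedPairHash {
    std::size_t operator()(const UnorderedPair& p) const {
        std::uint64_t packed =
            (static_cast<std::uint64_t>(static_cast<std::uint32_t>(p.lo)) << 32) |
            static_cast<std::uint32_t>(p.hi);
        return std::hash<std::uint64_t>{}(packed);
    }
};

using PairMap = std::unordered_map<UnorderedPair, char, UnorderedPairHash>;
```

Because the constructor normalizes the order, callers cannot forget to use a factory function, which is the main advantage over a plain std::pair.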

Related

C++ - Hash/Map a std::vector<uint64_t> in a single uint64_t

I need to map a std::vector<uint64_t> to a single uint64_t. Is this possible? I thought of using a hash function. Is that a solution?
For example, this vector:
std::vector<uint64_t> v {
16377,
2631694347470643681,
11730294873282192384
}
should be converted into one uint64_t.
If a hash function is not a good solution (e.g. because of a high collision rate), is there an alternative way to do this mapping?
I need to hash a std::vector<uint64_t> to a single uint64_t. Is this possible?
Yes, variable length hash functions exist, and it's possible to implement them in C++.
The C++ standard library comes with a few hash functions, but unfortunately not for vector (other than for the bool specialisation). We can reuse the hash function provided for string views, but this is a bit of a kludge:
const char* data = reinterpret_cast<const char*>(v.data());
std::size_t size = v.size() * sizeof(v[0]);
std::hash<std::string_view> hash;
std::cout << hash(std::string_view(data, size));
Note that using this is reasonable only if std::has_unique_object_representations_v is true for the vector's element type. I think it's reasonable to assume that to be the case for std::uint64_t.
A caveat when using standard library hash functions is that they don't have exact specification and as such you cannot rely on hashes being identical across separate systems. You should use another hash function if that is a concern.
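If reproducible values across systems matter, one option is a hash whose definition is fixed, such as 64-bit FNV-1a. The following is only a sketch: the function name fnv1a_hash is hypothetical, and FNV-1a is swapped in here as an example, not something the answer above prescribes. Bytes are fed in little-endian order so the result does not depend on the host's byte order:

```cpp
#include <cstdint>
#include <vector>

// 64-bit FNV-1a over the bytes of every element. Each element is split
// into bytes with shifts, lowest byte first, so the result is the same on
// little-endian and big-endian machines.
std::uint64_t fnv1a_hash(const std::vector<std::uint64_t>& v) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::uint64_t x : v) {
        for (int byte = 0; byte < 8; ++byte) {
            h ^= (x >> (8 * byte)) & 0xFFu;     // mix in one byte
            h *= 1099511628211ULL;              // FNV 64-bit prime
        }
    }
    return h;
}
```

Like any hash from arbitrarily many 64-bit values to a single one, this cannot be collision-free; it only makes collisions unlikely for typical inputs.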
You can create an std::map<std::vector<uint64_t>, uint64_t>, create a compare function for your vectors and just keep adding them to a map while incrementing a counter.
That counter will be your hash value.
The comment above, in code:
#include <array>
#include <algorithm>
#include <cstdint>
#include <vector>
#include <iostream>
static std::array<size_t,5> primes = { 3,5,7,11,13 };
// Weighted sum of (at most) the first five elements; later elements are ignored.
static std::uint64_t hash(const std::vector<std::uint64_t>& v)
{
    if (v.empty()) return 0; // avoid reading v[0] from an empty vector
    std::uint64_t hash = v[0];
    for (size_t n = 1; n < std::min(primes.size(), v.size()); ++n) hash += (primes[n]*v[n]);
    return hash;
}
int main()
{
    std::vector<uint64_t> v{ 16377, 2631694347470643681, 11730294873282192384 };
    std::cout << hash(v);
    return 0;
}
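The counter idea mentioned earlier (a std::map that hands out an increasing number per distinct vector) can be sketched as follows. The class name VectorIdRegistry and its member id_of are hypothetical. No custom compare function is needed here, because std::vector already provides a lexicographic operator<:

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical helper: assigns consecutive ids, one per distinct vector.
// Unlike a hash, the ids never collide, but they depend on insertion order
// and the registry has to keep a copy of every vector it has seen.
class VectorIdRegistry {
public:
    std::uint64_t id_of(const std::vector<std::uint64_t>& v) {
        // emplace leaves the map unchanged if v is already present.
        auto result = ids_.emplace(v, next_id_);
        if (result.second) ++next_id_;  // v was new, so its id is now taken
        return result.first->second;
    }

private:
    std::map<std::vector<std::uint64_t>, std::uint64_t> ids_;
    std::uint64_t next_id_ = 0;
};
```

The trade-off against a real hash function is memory and the need for shared state: two separate registries will generally assign different ids to the same vector.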

Proper way to use large amount of known constant variables

The program receives a vector that represents a character.
It then compares the received vector with all the known vectors that represents characters.
I'm not sure how should I use the known vectors.
A few options I thought of:
1) Using global variables:
vector<int> charA{1,2,3,4,5};
vector<int> charB{5,3,7,1};
...
vector<int> charZ{3,2,5,6,8,9,0};
char getLetter(const vector<int> &input){
    if(compareVec(input,charA)) return 'A';
    if(compareVec(input,charB)) return 'B';
    ....
    if(compareVec(input,charZ)) return 'Z';
}
2) Declaring all variables in function:
char getLetter(const vector<int> &input){
    vector<int> charA{1,2,3,4,5};
    vector<int> charB{5,3,7,1};
    ...
    vector<int> charZ{3,2,5,6,8,9,0};
    if(compareVec(input,charA)) return 'A';
    if(compareVec(input,charB)) return 'B';
    ....
    if(compareVec(input,charZ)) return 'Z';
}
3) Passing the variables
char getLetter(const vector<int> &input, vector<int> charA,
               vector<int> charB... , vector<int> charZ){
    if(compareVec(input,charA)) return 'A';
    if(compareVec(input,charB)) return 'B';
    ....
    if(compareVec(input,charZ)) return 'Z';
}
This sounds like an application for a perfect hash generator (link to GNU gperf).
To quote the documentation
gperf is a perfect hash function generator written in C++. It
transforms an n element user-specified keyword set W into a perfect
hash function F. F uniquely maps keywords in W onto the range 0..k,
where k >= n-1. If k = n-1 then F is a minimal perfect hash function.
gperf generates a 0..k element static lookup table and a pair of C
functions. These functions determine whether a given character string
s occurs in W, using at most one probe into the lookup table.
If this is not a suitable solution then I'd recommend using function statics. You want to avoid function locals as this will badly affect performance, and globals will pollute your namespace.
So something like
char getLetter(const vector<int> &input){
static vector<int> charA{1,2,3,4,5};
static vector<int> charB{5,3,7,1};
Given your snippet, I'd go for:
char getLetter(const vector<int> &input)
{
    // table of known patterns and the letter each one stands for
    struct
    {
        char result;
        std::vector<int> data;
    } const data[]=
    {
        { 'A', {1,2,3,4,5}, },
        { 'B', {5,3,7,1}, },
        ...
    };
    for(auto const & probe : data)
    {
        if (compareVec(input, probe.data))
            return probe.result;
    }
    // input does not match any of the given values
    throw "That's not the input I'm looking for!";
}
For 40 such pairs, if this is not called in a tight inner loop, the linear search is good enough.
Alternatives:
use a std::map<std::vector<int>, char> to map valid values to results, turn compareVec into a functor suitable as the key comparison for the map, and initialize it the same way.
as above, but use a std::unordered_map.
use gperf, as suggested by @PaulFloyd above
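A sketch of the first alternative, under the assumption that compareVec means exact element-wise equality (the question does not say). In that case the default operator< of std::vector is enough and no custom comparison functor is needed. The patterns are the placeholder values from the question, and returning '\0' for unknown input is a choice made only for this example:

```cpp
#include <map>
#include <vector>

// Returns the letter for a known pattern, or '\0' if nothing matches.
// Assumes that two vectors match exactly when they are element-wise equal.
char getLetter(const std::vector<int>& input) {
    // Built once on first call; lookups are O(log n) vector comparisons.
    static const std::map<std::vector<int>, char> letters = {
        {{1, 2, 3, 4, 5}, 'A'},
        {{5, 3, 7, 1}, 'B'},
        {{3, 2, 5, 6, 8, 9, 0}, 'Z'},
    };
    auto it = letters.find(input);
    return it == letters.end() ? '\0' : it->second;
}
```

If compareVec is a fuzzy match rather than strict equality, a map lookup does not apply directly and the linear search over all known patterns remains the straightforward choice.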
I would start by suggesting that you hash or represent the numbers in their binary collection so that you are not comparing vectors each time as that would prove very costly. That said, your question is about how to make a dictionary, so whether you improve your keys as I suggested or not, I'd prefer the use of a map:
map<vector<int>, char, function<bool(const vector<int>&, const vector<int>&)>> dictionary([](const auto& lhs, const auto& rhs){
const auto its = mismatch(cbegin(lhs), cend(lhs), cbegin(rhs), cend(rhs));
return its.second != cend(rhs) && (its.first == cend(lhs) || *its.first < *its.second);
});
If possible, dictionary should be constructed const, from an initializer_list containing all mappings, together with the comparator. If mappings must be looked up before all letters are guaranteed to have been added, then you obviously can't construct it const. Either way, this map should be a private member of the class responsible for translating strings, and adding and looking up mappings should be public functions of that class.

map comparator for pair of objects in c++

I want to use a map to count pairs of objects based on member input vectors. If there is a better data structure for this purpose, please tell me.
My program returns a list of int vectors. Each int vector is the output of a comparison between two int vectors ( a pair of int vectors). It is, however, possible, that the output of the comparison differs, though the two int vectors are the same (maybe in different order). I want to store how many different outputs (int vectors) each pair of int vectors has produced.
Assuming that I can access the int vector of my object with .inp()
Two pairs (a1,b1) and (a2,b2) should be considered equal, when (a1.inp() == a2.inp() && b2.inp() == b1.inp()) or (a1.inp() == b2.inp() and b1.inp() == a2.inp()).
This answer says:
The keys in a map a and b are equivalent by definition when neither a
< b nor b < a is true.
class SomeClass
{
    vector <int> m_inputs;
public:
    //constructor, setter...
    vector<int> inp() const { return m_inputs; }
};
typedef pair < SomeClass, SomeClass > InputsPair;
typedef map < InputsPair, size_t, MyPairComparator > InputsPairCounter;
So the question is, how can I define equivalency of two pairs with a map comparator? I tried to concatenate the two vectors of a pair, but that leads to (010,1) == (01,01), which is not what I want.
struct MyPairComparator
{
    bool operator() (const InputsPair & pair1, const InputsPair & pair2) const
    {
        vector<int> itrc1 = pair1.first.inp();
        vector<int> itrc2 = pair1.second.inp();
        vector<int> itrc3 = pair2.first.inp();
        vector<int> itrc4 = pair2.second.inp();
        // ?
        return itrc1 < itrc3;
    }
};
I want to use a map to count pairs of input vectors. If there is a better data structure for this purpose, please tell me.
std::unordered_map is worth considering instead, for two reasons:
if the hash is implemented properly, it could be faster than std::map
you only need to implement a hash and operator== instead of operator<, and operator== is trivial in this case
Details on how to implement a hash for std::vector can be found here. In your case, a possible solution is to join both vectors into one, sort it, and then use that method to calculate the hash. This is a straightforward solution, but it can produce too many hash collisions and lead to worse performance. Suggesting a better alternative would require knowledge of the data used.
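The join-and-sort idea can be sketched as follows. To stay self-contained, the example works on plain std::vector<int> pairs instead of SomeClass, and the names VecPair, VecPairHash, VecPairEqual and VecPairCounter are hypothetical. The combining step is the Boost-style hash_combine formula, used here as one possible choice:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using VecPair = std::pair<std::vector<int>, std::vector<int>>;

// Joins both vectors, sorts the result and folds it into one hash value,
// so swapping the two vectors of a pair cannot change the hash.
struct VecPairHash {
    std::size_t operator()(const VecPair& p) const {
        std::vector<int> joined(p.first);
        joined.insert(joined.end(), p.second.begin(), p.second.end());
        std::sort(joined.begin(), joined.end());
        std::size_t h = joined.size();
        for (int x : joined)
            h ^= std::hash<int>{}(x) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// (a, b) equals (c, d) if a == c and b == d, or a == d and b == c.
struct VecPairEqual {
    bool operator()(const VecPair& l, const VecPair& r) const {
        return (l.first == r.first && l.second == r.second) ||
               (l.first == r.second && l.second == r.first);
    }
};

using VecPairCounter =
    std::unordered_map<VecPair, std::size_t, VecPairHash, VecPairEqual>;
```

Pairs such as (010, 1) and (01, 01) get the same hash with this scheme, which is the collision problem mentioned above, but the equality predicate still keeps them apart as distinct keys.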
As I understand, you want:
struct MyPairComparator
{
    bool operator() (const InputsPair& lhs, const InputsPair& rhs) const
    {
        return std::minmax(std::get<0>(lhs), std::get<1>(lhs))
             < std::minmax(std::get<0>(rhs), std::get<1>(rhs));
    }
};
We order each pair {a, b} so that a < b, then we use the regular comparison.
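Note that std::minmax needs operator< on the pair's element type, which SomeClass as posted does not define. A self-contained sketch of the same comparator on plain std::vector<int> pairs, with the hypothetical names InputVec, InputVecPair, UnorderedPairLess and InputVecPairCounter, could look like this:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using InputVec = std::vector<int>;
using InputVecPair = std::pair<InputVec, InputVec>;

// Views each pair as (smaller, larger) and compares those views
// lexicographically, so (a, b) and (b, a) are equivalent keys.
struct UnorderedPairLess {
    bool operator()(const InputVecPair& lhs, const InputVecPair& rhs) const {
        // std::minmax returns a pair of references; both temporaries live
        // until the end of this full expression.
        return std::minmax(lhs.first, lhs.second) <
               std::minmax(rhs.first, rhs.second);
    }
};

using InputVecPairCounter =
    std::map<InputVecPair, std::size_t, UnorderedPairLess>;
```

Since the vectors are compared element by element and never concatenated, (010, 1) and (01, 01) remain different keys.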

Using an unordered_map with arrays as keys

I don't understand why I can't have an unordered_map with an array<int,3> as the key type:
#include <array>
#include <unordered_map>
using namespace std;
int main() {
array<int,3> key = {0,1,2};
unordered_map< array<int,3> , int > test;
test[key] = 2;
return 0;
}
I get a long error, the most pertinent part being
main.cpp:11:9: error: no match for ‘operator[]’ (operand types are ‘std::unordered_map<std::array<int, 3ul>, int>’ and ‘std::array<int, 3ul>’)
test[key] = 2;
^
Are arrays not eligible to be keys because they miss some requirements?
You have to implement a hash. Hash tables depend on hashing the key to find a bucket to put it in. C++ doesn't magically know how to hash every type, and in this particular case it doesn't know how to hash an array of 3 integers by default. You can implement a simple hash struct like this:
struct ArrayHasher {
std::size_t operator()(const std::array<int, 3>& a) const {
std::size_t h = 0;
for (auto e : a) {
h ^= std::hash<int>{}(e) + 0x9e3779b9 + (h << 6) + (h >> 2);
}
return h;
}
};
And then use it:
unordered_map< array<int,3> , int, ArrayHasher > test;
Edit: I changed the function for combining hashes from a naive xor, to the function used by boost for this purpose: http://www.boost.org/doc/libs/1_35_0/doc/html/boost/hash_combine_id241013.html. This should be robust enough to actually use.
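The same combining step carries over to other element types and array lengths. A hedged sketch, where GenericArrayHasher and TripleMap are hypothetical names and the element type is assumed to have a std::hash specialization:

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical generalisation of ArrayHasher: works for any std::array<T, N>
// whose element type T can be hashed with std::hash<T>.
template <typename T, std::size_t N>
struct GenericArrayHasher {
    std::size_t operator()(const std::array<T, N>& a) const {
        std::size_t h = 0;
        for (const T& e : a)
            h ^= std::hash<T>{}(e) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    }
};

// Usage with the key type from the question.
using TripleMap =
    std::unordered_map<std::array<int, 3>, int, GenericArrayHasher<int, 3>>;
```

The hasher is order-sensitive by design, so {0,1,2} and {2,1,0} are different keys, which matches the element-wise operator== of std::array.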
Why?
As mentioned in http://www.cplusplus.com/reference/unordered_map/unordered_map/
Internally, the elements in the unordered_map are not sorted in any
particular order with respect to either their key or mapped values,
but organized into buckets depending on their hash values to allow for
fast access to individual elements directly by their key values (with
a constant average time complexity on average).
In your case, the key to hash is an array, and standard C++ does not provide a hash for it.
How to get around it?
If you want to map an array to a value, you must implement your own std::hash (http://en.cppreference.com/w/cpp/utility/hash); the question "C++ how to insert array into hash set?" may help with that.
A workaround
If you are free to use Boost, it can provide hashing of arrays and many other types. It is basically built on the hash_combine method, which you can look at in http://www.boost.org/doc/libs/1_49_0/boost/functional/hash/hash.hpp.
The relevant error is
error: no match for call to '(const std::hash<std::array<int, 3ul> >) (const std::array<int, 3ul>&)'
The unordered_map needs a hash of the key, and it looks for a specialization of std::hash to do that. You can extend the namespace std with a suitable hash function.
Compiling with msvc14 gives the following error:
"The C++ Standard doesn't provide a hash for this type."
I guess this is self-explanatory.

C++ unordered_map fail when used with a vector as key

Background: I am coming from the Java world and I am fairly new to C++ and Qt.
In order to play with unordered_map, I have written the following simple program:
#include <QtCore/QCoreApplication>
#include <QtCore>
#include <iostream>
#include <stdio.h>
#include <string>
#include <unordered_map>
using std::string;
using std::cout;
using std::endl;
typedef std::vector<float> floatVector;
int main(int argc, char *argv[]) {
QCoreApplication a(argc, argv);
floatVector c(10);
floatVector b(10);
for (int i = 0; i < 10; i++) {
c[i] = i + 1;
b[i] = i * 2;
}
std::unordered_map<floatVector, int> map;
map[b] = 135;
map[c] = 40;
map[c] = 32;
std::cout << "b -> " << map[b] << std::endl;
std::cout << "c -> " << map[c] << std::endl;
std::cout << "Contains? -> " << map.size() << std::endl;
return a.exec();
}
Unfortunately, I am running into the following error, which isn't very helpful. There is not even a line number.
:-1: error: collect2: ld returned 1 exit status
Any idea of the origin of the problem?
§23.2.5, paragraph 3, says:
Each unordered associative container is parameterized by Key, by a function object type Hash that meets the Hash requirements (17.6.3.4) and acts as a hash function for argument values of type Key, and by a binary predicate Pred that induces an equivalence relation on values of type Key.
Using vector<float> as Key and not providing explicit hash and equivalence predicate types means the default std::hash<vector<float>> and std::equal_to<vector<float>> will be used.
The std::equal_to for the equivalence relation is fine, because there is an operator == for vectors, and that's what std::equal_to uses.
There is however, no std::hash<vector<float>> specialization, and that's probably what the linker error you didn't show us says. You need to provide your own hasher for this to work.
An easy way of writing such a hasher is to use boost::hash_range:
#include <boost/functional/hash.hpp> // provides boost::hash_range
template <typename Container> // we can make this generic for any container [1]
struct container_hash {
std::size_t operator()(Container const& c) const {
return boost::hash_range(c.begin(), c.end());
}
};
Then you can use:
std::unordered_map<floatVector, int, container_hash<floatVector>> map;
Of course, if you need different equality semantics in the map you need to define the hash and equivalence relation appropriately.
1. However, avoid this for hashing unordered containers, as different orders will produce different hashes, and the order in unordered container is not guaranteed.
I found R. Martinho Fernandes's answer unsuitable for competitive programming, since most of the time you have to work in a provided environment and cannot use an external library such as boost. You can use the following method if you'd like to get by with the standard library alone.
As already stated above, you just need to write a hash function, and it should be specialized for the kind of data stored in your vector. The following hash function assumes int data:
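struct VectorHasher {
    // std::size_t is unsigned, so the shifts and additions wrap around
    // instead of causing signed overflow.
    std::size_t operator()(const vector<int> &V) const {
        std::size_t hash = V.size();
        for(auto &i : V) {
            hash ^= i + 0x9e3779b9 + (hash << 6) + (hash >> 2);
        }
        return hash;
    }
};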
Note that you can use any kind of operation to generate a hash; you just need to be creative so that collisions are minimized. For example, hash^=V[i], hash|=V[i], hash+=V[i]*V[i] or even hash+=(V[i]<<i)*(V[i]<<i)*(V[i]<<i) are all valid, as long as your hash doesn't overflow.
Finally to use this hash function with your unordered_map, initialize it as follows:
unordered_map<vector<int>,string,VectorHasher> hashMap;