How to implement a generic hash function in C++

I am trying to implement HashTable in C++ via templates.
Here is the signature:
template<class T1, class T2>
class HashTable {
public:
void add(T1 a, T2 b);
void hashFunction(T1 key, T2 value)
{
// how to implement this function using key as a generic
// we need to know the object type of key
}
};
So, I am unable to move ahead with implementation involving a generic key.
In Java, I could easily have converted the key to a string and been happy with implementing the hash for a string key. But in C++, as far as I know, there is the concept of RTTI, which can dynamically cast an object to the desired type.
How do I implement that dynamic cast, if this approach is correct at all?
If templates are not the right way to implement generics in this case, please suggest a better approach.

You would typically use std::hash for this, and let type implementors specialize that template as required.
size_t key_hash = std::hash<T1>()(key);
There is no way you can generically implement a hash function for any random type you are given. If two objects are equal, their hash codes must be the same. You could simply run the raw memory of the objects through a hash function, but the types might implement an operator== overload that ignores some piece of object data (say, a synchronization object). In that case you could potentially (and very easily) return different hash values for equal objects.
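As a minimal sketch of what "let type implementors specialize that template" could look like: the Person type below is hypothetical and exists only for illustration. Its operator== deliberately ignores one member, so the hash has to ignore that member too, which is exactly why hashing raw memory would be wrong here.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical key type, used only for illustration.
struct Person {
    std::string name;
    int age;
    mutable int cache_hits = 0;  // bookkeeping data, deliberately ignored by ==
};

// Equality looks at name and age only.
inline bool operator==(const Person& a, const Person& b) {
    return a.name == b.name && a.age == b.age;
}

namespace std {
// Specializing std::hash for a user-defined type is permitted.
template <>
struct hash<Person> {
    std::size_t operator()(const Person& p) const noexcept {
        // Hash exactly the members that operator== compares.
        std::size_t h1 = std::hash<std::string>{}(p.name);
        std::size_t h2 = std::hash<int>{}(p.age);
        // One common way to combine two hash values (an assumption, not the only choice).
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};
}  // namespace std
```

With this in place, std::hash<Person>()(key) works inside the table, and two Person objects that compare equal hash to the same value even when cache_hits differs.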

It's strange that you want to hash both the key and the value. How will you be able to look up a value by its key alone afterwards?
If you are using C++11, a good idea is to use std::hash<T1>, which is provided for some types (integers, strings, pointers) and can be specialized for other classes. It is also a good idea to let users replace it via a third template parameter. See how unordered_map does it:
template<typename K, typename V, typename H = std::hash<K>>
class HashTable {
    //...
    size_t hashFunction(const K& key) {
        size_t hash = H()(key);
        // process the hash somehow; typically you take the remainder after dividing by the number of buckets
        return hash % size;
    }
};
It seems impossible to write your own hasher that works well for most types, because the equality operator may be overloaded in some complicated way.
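To make the idea of a replaceable hasher concrete, here is a small sketch of a table with separate chaining. The member names add, find, and bucketIndex and the default bucket count are assumptions chosen for the example, and the table never resizes.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Sketch of a chained hash table; the hasher H defaults to std::hash<K>.
template <typename K, typename V, typename H = std::hash<K>>
class HashTable {
public:
    // A bucket count of 0 is bumped to 1 to avoid dividing by zero.
    explicit HashTable(std::size_t bucket_count = 16)
        : buckets_(bucket_count ? bucket_count : 1) {}

    // Inserts the pair, or overwrites the value if the key already exists.
    void add(const K& key, const V& value) {
        auto& bucket = buckets_[bucketIndex(key)];
        for (auto& entry : bucket) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        bucket.emplace_back(key, value);
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const V* find(const K& key) const {
        const auto& bucket = buckets_[bucketIndex(key)];
        for (const auto& entry : bucket) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

private:
    // Only the key is hashed; the value plays no part in locating the bucket.
    std::size_t bucketIndex(const K& key) const {
        return H{}(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<K, V>>> buckets_;
};
```

A HashTable<std::string, int> works out of the box because std::hash<std::string> exists; for a custom key type you would either specialize std::hash or pass your own functor as the third template argument.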

Related

Proper design for C++ class wrapping multiple possible types

I am trying to implement a C++ class which will wrap a value (among other things). This value may be one of a number of types (string, memory buffer, number, vector).
The easy way to implement this would be to do something like this
class A {
Type type;
// Only one of these will be valid data; which one will be indicated by `type` (an enum)
std::wstring wData{};
long dwData{};
MemoryBuffer lpData{};
std::vector<std::wstring> vData{};
};
This feels inelegant and like it wastes memory.
I also tried implementing this as a union, but it came with significant development overhead (defining custom destructors/move constructors/copy constructors), and even with all of those, there were still some errors I encountered.
I've also considered making A a base class and making a derived class for each possible value it can hold. This also feels like it isn't a great way to solve the problem.
My last approach would be to make each member an std::optional, but this still adds some overhead.
Which approach would be the best? Or is there another design that works better than any of these?

Use std::variant. It is typesafe, tested and exactly the right thing for a finite number of possible types.
It also gets rid of the type enum.
class A {
    std::variant<std::wstring, long, MemoryBuffer, std::vector<std::wstring>> m_data{}; // default initializes the wstring.
public:
    template<class T>
    void set_data(T&& data) {
        m_data = std::forward<T>(data);
    }
    std::size_t get_index() const { // returns the index of the active type.
        return m_data.index();
    }
    long& get_ldata() {
        return std::get<long>(m_data); // throws std::bad_variant_access if long is not the active type
    }
    // and the others, or
    template<class T>
    T& get_data() { // by type
        return std::get<T>(m_data);
    }
    template<std::size_t N>
    auto& get_data() { // by index
        return std::get<N>(m_data);
    }
};
// using:
A a;
a.get_index() == 0; // true
a.set_data(42);
a.get_index() == 1; // true
auto l = a.get_data<long>(); // l is now of type long, has value 42
a.get_data<long>() = 1;
l = a.get_data<1>();
PS: This example does not even include the coolest (in my opinion) feature of std::variant: std::visit. I am not sure what you want to do with your class, so I cannot create a meaningful example. If you let me know, I will think about it.
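For readers who want a rough idea of std::visit anyway, here is a sketch using a generic lambda with if constexpr (C++17). The Value alias and the describe function are made up for illustration, and MemoryBuffer is left out because its definition is not shown in the question.

```cpp
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical alias; MemoryBuffer is omitted since its definition is unknown.
using Value = std::variant<std::wstring, long, std::vector<std::wstring>>;

// Returns a short description of whichever alternative is currently active.
inline std::string describe(const Value& v) {
    return std::visit(
        [](const auto& held) -> std::string {
            using T = std::decay_t<decltype(held)>;
            if constexpr (std::is_same_v<T, std::wstring>) {
                return "string of length " + std::to_string(held.size());
            } else if constexpr (std::is_same_v<T, long>) {
                return "number " + std::to_string(held);
            } else {
                return "list of " + std::to_string(held.size()) + " strings";
            }
        },
        v);
}
```

The visitor is instantiated once per alternative, so no type enum or switch is needed at the call site.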

You basically want QVariant without the rest of Qt, then :)?
As others have mentioned, you could use std::variant and put using MyVariant = std::variant<t1, t2, ...> in some common header, and then use it everywhere it's called for. This isn't as inelegant as you may think - the specific types to be passed around are only provided in one place. It is the only way to do it without building a metatype machinery that can encapsulate operations on any type of an object.
That's where boost::type_erasure::any comes in: it does precisely that. It wraps concepts, and thus supports any object that implements these concepts. Which concepts you require is up to you, but in general you'd want to choose enough of them to make the type usable and useful, yet not so many that you exclude some types prematurely. It's probably the way to go. You'd have: using MyVariant = any<construct, _a>; (where construct is a list of contracts, an example of which is given in the documentation, and _a is a type placeholder from boost::type_erasure).
The fundamental difference between std::variant and boost::type_erasure::any is that variant is parametrized on concrete types, whereas any is parametrized on contracts that the types are bound to. Then, any will happily store an arbitrary type that fulfills all of those contracts. The "central location" where you define an alias for the variant type will constantly grow with variant, as you need to encapsulate more types. With any, the central location will be mostly static and will change rarely, since changing the contract requirements is likely to require fixes/adaptations to the carried types as well as to the points of use.

Type erasure: Retrieving value - type check at compile time

I have a limited set of very different types, from which I want to store instances in a single collection, specifically a map. To this end, I use the type erasure idiom, ie. I have a non-templated base class from which the templated, type specific class inherits:
struct concept
{
virtual std::unique_ptr<concept> copy() = 0; // example member function
};
template <typename T>
struct model : concept
{
T value;
std::unique_ptr<concept> copy() override { ... }
};
I then store unique_ptrs to concept in my map. To retrieve the value, I have a templated function which does a dynamic cast to the specified type.
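template <typename T>
void get(concept& c, T& out) {
    // dynamic_cast needs a pointer (or reference) target type; the local is named m so it does not shadow the model template
    auto m = dynamic_cast<model<T>*>(&c);
    if (m == nullptr) throw "error, wrong type";
    out = m->value;
}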
What I don't like about this solution is, that specifying a wrong T is only detected at runtime. I'd really really like this to be done at compile time.
My options are as I see the following, but I don't think they can help here:
Using ad hoc polymorphism by specifying free functions with each type as an overload, or a template function, but I do not know where to store the result.
Using CRTP won't work, because then the base class would need to be templated.
Conceptually I would need a virtual function which takes an instance of a class where the result will be stored. However since my types are fundamentally different, this class would need to be templated, which does not work with virtual.
Anyways, I'm not even sure if this is logically possible, but I would be very glad if there was a way to do this.

For a limited set of types, your best option is variant. The easiest way to operate on a variant is to specify the action to take for each alternative; std::visit then dispatches to the right one. Something along these lines:
std::unordered_map<std::string, std::variant<Foo, Bar>> m;
m["a_foo"] = Foo{};
m["a_bar"] = Bar{};
for (auto& e : m) {
    std::visit(overloaded{[] (Foo&) { std::cerr << "a foo\n"; },
                          [] (Bar&) { std::cerr << "a bar\n"; }},
               e.second);
}
std::variant requires C++17; with an older toolchain you can use the version from Boost instead. See here for the definition of overloaded: http://en.cppreference.com/w/cpp/utility/variant/visit (just a small utility the standard library unfortunately doesn't provide).
Of course, if you are expecting that a certain key maps to a particular type, and want to throw an error if it doesn't, well, there is no way to handle that at compile time still. But this does let you write visitors that do the thing you want for each type in the variant, similar to a virtual in a sense but without needing to actually have a common interface or base class.
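For completeness, here is a self-contained sketch that includes the overloaded helper in the form shown on the cppreference page. Foo and Bar are empty placeholder types and name_of is a made-up function for illustration.

```cpp
#include <string>
#include <unordered_map>
#include <variant>

// Placeholder types standing in for the "very different types" of the question.
struct Foo {};
struct Bar {};

// Helper that merges several lambdas into one visitor (C++17).
template <class... Ts>
struct overloaded : Ts... {
    using Ts::operator()...;
};
// Deduction guide; required in C++17, implicit from C++20 on.
template <class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

// Picks the matching lambda for the active alternative.
inline std::string name_of(const std::variant<Foo, Bar>& v) {
    return std::visit(overloaded{
                          [](const Foo&) { return std::string("a foo"); },
                          [](const Bar&) { return std::string("a bar"); }},
                      v);
}
```

If a new alternative is added to the variant and no lambda handles it, the std::visit call stops compiling, which is the compile-time check the question asks about.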

You cannot do compile-time type checking for an erased type. That goes against the whole point of type erasure in the first place.
However, you can get an equivalent level of safety by providing an invariant guarantee that the erased type will match the expected type.
Obviously, whether that's feasible or not depends on your design at a higher level.
Here's an example:
class concept {
public:
virtual ~concept() {}
};
template<typename T>
struct model : public concept {
T value;
};
class Holder {
public:
template<typename T>
void addModel() {
        types.emplace(std::type_index(typeid(T)), std::make_unique<model<T>>());
}
template<typename T>
T getValue() {
auto found = types.find(std::type_index(typeid(T)));
if(found == types.end()) {
throw std::runtime_error("type not found");
}
// no need to dynamic cast here. The invariant is covering us.
return static_cast<model<T>*>(found->second.get())->value;
}
private:
// invariant: map[type] is always a model<type>
std::map<std::type_index, std::unique_ptr<concept>> types;
};
The strong encapsulation here provides a level of safety almost equivalent to a compile-time check, since map insertions are aggressively protected to maintain the invariant.
Again, this might not work with your design, but it's a way of handling that situation.

Your runtime check occurs at the point where you exit type erasure.
If you want to compile time check the operation, move it within the type erased boundaries, or export enough information to type erase later.
So enumerate the types, like std::variant does. Or enumerate the algorithms, like you did with copy. You can even mix the two, like a variant of various type-erased sub-algorithms for the various kinds of types stored.
This does not support any algorithm on any type polymorphism; one of the two must be enumerated for things to resolve at compile time and not have a runtime check.
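As a sketch of "enumerating the algorithms", the wrapper below moves one operation inside the type-erased boundary, so no cast is needed when the value is used. The names AnyPrintable and to_string are assumptions made for this example.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>

// Type-erased wrapper that supports exactly one enumerated operation: to_string().
class AnyPrintable {
public:
    template <typename T>
    AnyPrintable(T value)
        : self_(std::make_unique<Model<T>>(std::move(value))) {}

    // No dynamic_cast and no runtime type check: the operation is virtual.
    std::string to_string() const { return self_->to_string(); }

private:
    struct Concept {
        virtual ~Concept() = default;
        virtual std::string to_string() const = 0;
    };

    template <typename T>
    struct Model final : Concept {
        explicit Model(T v) : value(std::move(v)) {}
        std::string to_string() const override {
            std::ostringstream out;
            out << value;  // fails to compile if T has no operator<<
            return out.str();
        }
        T value;
    };

    std::unique_ptr<Concept> self_;
};
```

Storing a type that cannot be streamed is rejected when the wrapper is constructed, at compile time, instead of when the value is retrieved.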

Is there a standard name / templated prototype for "congruent hash" vs "identity hash"?

I have a templated class Foo that can do identity comparisons (via ==), but has a function Foo::sameStructureAs(Foo const & other) for more of a "value" vs. "pointer" notion of equality.
I'd like to make an unordered_map which overrides the hash function and the equality predicate. They default to std::equal_to<Key> and std::hash<Key>...which I provide for my type, based on identity. But I need them to be comparing on the basis of my sameStructureAs.
Since Foo is a template, I do something like this:
template <class> struct same_structure_as;
template <class> struct hash_structure;
template <class T>
struct hash_structure<Foo<T>>
{
size_t operator() (Foo<T> const & value) const
{
// whatever...
}
};
template <class T>
struct same_structure_as<Foo<T>>
{
bool operator() (Foo<T> const & left, Foo<T> const & right) const
{
// whatever...
}
};
Which seems like I'm following roughly the strategy of the classes in std:: for this purpose, and creating something general. So does that look right?
Secondly: Is there any precedent for the naming of this or a prototype already existing in std::? I've thought about words like isomorphic or congruent. It seems like something that would come up often in designing classes when you have more than one idea of what it means to be "equal".

If you are looking at a type through this "different" notion of comparison or equality, ask if what you really need is another type. Perhaps there is some kind of cast or coercion you would apply to the underlying data so that its new notion of equality/assignment/comparison fits this test you are designing.
That way you can properly implement the std:: functions for that type... and use it in collections without having to pass in these extra predicates. So perhaps call the type with pointer-equality semantics FooRef and the one with value semantics Foo.
If for some reason you can't do this...then looking at the names one wants to parallel:
std::equal_to<Key>
std::hash<Key>
Keeping the equal_to and hash in there is probably the closest to "standard" one will accomplish. So rather than introducing new terms like congruent or isomorphic, call out exactly what is equal or getting hashed, and use the above as suffixes (in your own namespace, since new names may not be added to std):
content_equal_to<Key>
content_hash<Key>
If it's the "structure" of something being compared you can apply that with structure_equal_to and structure_hash. Only caveat being that "struct"/"structure" has meaning in C++, so it may lead a reader to think it's comparing type_info or something like that.
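Here is a sketch of how such functors plug into unordered_map. The Foo class is a hypothetical stand-in for the question's type (identity via ==, value comparison via sameStructureAs), and structure_map is an alias invented for the example.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the question's Foo.
template <class T>
class Foo {
public:
    explicit Foo(T v) : data_(std::make_shared<T>(std::move(v))) {}
    // Identity: two Foos are == only if they share the same object.
    bool operator==(const Foo& other) const { return data_ == other.data_; }
    // Structure: compares the pointed-to values.
    bool sameStructureAs(const Foo& other) const { return *data_ == *other.data_; }
    const T& value() const { return *data_; }

private:
    std::shared_ptr<T> data_;
};

template <class> struct structure_hash;
template <class> struct structure_equal_to;

template <class T>
struct structure_hash<Foo<T>> {
    // Must agree with structure_equal_to: equal structure implies equal hash.
    std::size_t operator()(const Foo<T>& f) const { return std::hash<T>{}(f.value()); }
};

template <class T>
struct structure_equal_to<Foo<T>> {
    bool operator()(const Foo<T>& a, const Foo<T>& b) const { return a.sameStructureAs(b); }
};

// Map keyed on structure rather than identity.
template <class Key, class Value>
using structure_map =
    std::unordered_map<Key, Value, structure_hash<Key>, structure_equal_to<Key>>;
```

A lookup with a distinct Foo that has the same contents then finds the entry, even though == would report the two keys as different.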

Why use std::type_index instead of std::type_info*

I need to key some data in a map by a type. Currently I have something like this:
struct TypeInfoComparer
{
bool operator()(std::type_info const* a, std::type_info const* b) const
{
return a->before(*b);
}
};
std::map<std::type_info const*, Foo, TypeInfoComparer> d_fooByTypeId;
Which I can then look up from (for example, in a template method having <typename T>):
auto pair = d_fooByTypeId.find(&typeid(T));
However today I was reading about std::type_index which seems to be intended for use in such a case as this.
I'm interested in improving my C++ knowledge. Can someone please explain whether I should modify my code to use std::type_index, and why? Is there a reason beyond being able to remove the TypeInfoComparer?

type_index is "a simple wrapper for type_info which can be used as an index type in associative containers (23.4) and in unordered associative containers (23.5)". If you use type_index instead of type_info*, you will free yourself from having to provide an explicit comparator in your maps. The only cost is that you need to #include <typeindex>.
Another benefit is that it will allow you to switch to (or also use) hashmaps (aka unordered_maps).
On the whole, since it simplifies your code, I'd say "go for it".

I don't think that taking a pointer to the result returned from typeid(x) is guaranteed to always yield the same pointer. In particular, it seems problematic to guarantee that the same object is returned when shared libraries are used. The intended way to sort std::type_info objects is the before() member. The class std::type_index wraps this into a simpler interface.
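A short sketch of both benefits mentioned above: std::type_index works as a key in std::map without a custom comparator and in std::unordered_map without a custom hasher. The TypeLabels class and its member names are invented for illustration.

```cpp
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

// Stores a label per type in both an ordered and a hashed container.
class TypeLabels {
public:
    template <typename T>
    void set(std::string label) {
        const std::type_index key(typeid(T));
        ordered_[key] = label;            // uses type_index::operator<
        hashed_[key] = std::move(label);  // uses std::hash<std::type_index>
    }

    // Both lookups return nullptr when no label was registered for T.
    template <typename T>
    const std::string* find_ordered() const {
        auto it = ordered_.find(std::type_index(typeid(T)));
        return it == ordered_.end() ? nullptr : &it->second;
    }

    template <typename T>
    const std::string* find_hashed() const {
        auto it = hashed_.find(std::type_index(typeid(T)));
        return it == hashed_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::type_index, std::string> ordered_;
    std::unordered_map<std::type_index, std::string> hashed_;
};
```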

Implementing a C++ hashtable class using template

I am trying to implement a hashtable in C++ that is sort of like the Java version.
I would like it to have the form of
template <class Key, class Value>
class hashtable {
...
};
Soon enough I noticed that I need to somehow convert Key into a number, so that I can use the simple hash function
int h(int hashkey) {
return hashkey%some_prime;
}
But the headache is that the Key type is only known at run time. Is it possible to check what type Key is at run time in C++? Or do I have to write this hashtable class for each type manually? That is easier to do but ugly. Does anyone know an elegant solution?

C++ templates are usually duck typed, meaning that you can explicitly cast to an integral type in the template, and all types that implement the appropriate conversion can be used as a key. That has the disadvantage of requiring that the classes implement the conversion operator in such a fashion that the hash function will be decent, which is asking for a lot.
You could instead provide a function template
template<typename T> int hash (T t);
Along with specializations for the built-in types, and any user that wants to use a custom class as a key will just have to provide their own specialization. I think this is a decent approach.
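A rough sketch of that approach follows. The function is called hash_key here (a name chosen for the example, placed in a made-up namespace demo), it returns std::size_t rather than int, and Point stands in for a user-supplied key type. The string version uses FNV-1a, which is just one possible choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace demo {

// Primary template is only declared; using an unsupported key type fails at link time.
template <typename T>
std::size_t hash_key(const T& key);

// Specialization for a built-in type.
template <>
inline std::size_t hash_key<int>(const int& key) {
    return static_cast<std::size_t>(key);
}

// Specialization for strings using 64-bit FNV-1a.
template <>
inline std::size_t hash_key<std::string>(const std::string& key) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 1099511628211ull;  // FNV prime
    }
    return static_cast<std::size_t>(h);
}

// A user-defined key type and the specialization its author would supply.
struct Point {
    int x;
    int y;
};

template <>
inline std::size_t hash_key<Point>(const Point& p) {
    return hash_key(p.x) * 31u + hash_key(p.y);
}

// What a hash table would do with the result: reduce it to a bucket index.
template <typename Key>
std::size_t bucket_for(const Key& key, std::size_t bucket_count) {
    return hash_key(key) % bucket_count;
}

}  // namespace demo
```

The table itself only ever calls hash_key(key), so the Key type is resolved at compile time and no run-time type inspection is involved.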

You seem to have a few misunderstandings. Key type is known at compile time - that's the whole point of using templates. Secondly, there is really no such thing as a completely generic hash function that will work on any type. You need to implement different hash functions for different types, using function overloading or template specialization. There are many common hash functions used for strings, for example.
Finally, C++11 includes a standard hash table (std::unordered_map) which you can use instead of implementing your own.

If you would like to try to implement a "generic" one, perhaps you can start with a skeleton much like this:
template <class T>
struct HashEntry { // you would need this to deal with collisions
    T curr;
    HashEntry* next; // next entry in the same bucket's chain
};
template <class V, size_t n>
class HashTable {
public:
    void insert(V v)
    {
        ...
        size_t idx = v->getHashCode(n);
        ...
    }
private:
    HashEntry<V> table_[n];
};
It is usually instantiated with some pointer type; to figure out where a pointer should go, it requires that the type implement a member function "getHashCode" ...
