unordered_multimap usage and operator overloading - c++

I need to use an unordered_multimap for my Note objects and the keys will be the measureNumber member of my objects. I'm trying to implement it as shown here but I'm stuck.
First off, I don't understand why I have to overload operator== before I can use it. I'm also confused about why I need a hash and how to implement it. In this example here, neither of those two things is done.
So based on the first example, this is what I have:
class Note {
private:
int measureNumber;
public:
inline bool operator== (const Note &noteOne, const Note &noteTwo);
}
inline bool Note::operator ==(const Note& noteOne, const Note& noteTwo){
return noteOne.measureNumber == noteTwo.measureNumber;
}
I don't know how to implement the hash part though. Any ideas?

std::multimap is based on a sorted binary tree, which uses a less-than operation to sort the nodes.
std::unordered_multimap is based on a hash table, which uses hash and equality operations to organize the nodes without sorting them.
The sorting or hashing is based on the key values. If the objects are the keys, then you need to define these operations. If the keys are of predefined type like int or string, then you don't need to worry about it.
The problem with your pseudocode is that measureNumber is private, so the user of Note cannot easily specify the key to the map. I would recommend making measureNumber public or rethinking the design. (Is measure number really a good key value? I'm guessing this is musical notation.)
std::multimap< int, Note > notes;
Note myNote( e_sharp, /* octave */ 3, /* measure */ 5 );
notes.insert( std::make_pair( myNote.measureNumber, myNote ) );
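The same idea carries over to the std::unordered_multimap the question asks about. Below is a minimal sketch, assuming a simplified Note (the pitch and octave fields and the helper names are made up for illustration) with the measure number kept only as the map key. Because the key is a plain int, no custom hash or operator== is needed.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified Note: the measure number is the map key,
// so it is not stored in the class at all.
struct Note {
    std::string pitch;
    int octave;
};

using NotesByMeasure = std::unordered_multimap<int, Note>;

// Collect every note stored under one measure number.
std::vector<Note> notesInMeasure(const NotesByMeasure& notes, int measure) {
    std::vector<Note> result;
    auto range = notes.equal_range(measure);  // all entries with this key
    for (auto it = range.first; it != range.second; ++it) {
        result.push_back(it->second);
    }
    return result;
}

// Small usage example; the asserts document the expected results.
void notesByMeasureExample() {
    NotesByMeasure notes;
    notes.insert(std::make_pair(5, Note{"E#", 3}));
    notes.insert(std::make_pair(5, Note{"G", 4}));
    notes.insert(std::make_pair(7, Note{"C", 4}));

    assert(notesInMeasure(notes, 5).size() == 2);
    assert(notesInMeasure(notes, 7).size() == 1);
    assert(notesInMeasure(notes, 9).empty());
}
```

equal_range is the usual way to get all values for one key from a multimap; the order of the notes within a measure is not specified for the unordered version.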
The objects can be keys and values at the same time, if you use std::multiset or std::unordered_multiset, in which case you would want to define the operator overload (and possibly hash). If operator== (or operator<) is a member function, then the left-hand side becomes this and the right-hand side becomes the sole argument. Usually these functions should be non-member friends. So then you would have
class Note {
private:
int measureNumber;
public:
friend bool operator< (const Note &noteOne, const Note &noteTwo);
};
inline bool operator <(const Note& noteOne, const Note& noteTwo){
return noteOne.measureNumber < noteTwo.measureNumber;
}
This class could be used with std::multiset. To perform a basic lookup, you can construct a dummy object with uninitialized values except for measureNumber — this only works for simple object types.
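A rough sketch of that dummy-object lookup, assuming a Note with a made-up pitch member and constructor (neither is in the question). Here the dummy's other field is simply set to zero rather than left uninitialized.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

class Note {
public:
    // Hypothetical constructor; 'pitch' stands in for the real note data.
    Note(int measure, int p) : measureNumber(measure), pitch(p) {}

    int getPitch() const { return pitch; }

    // Ordering looks at the measure number only.
    friend bool operator<(const Note& a, const Note& b) {
        return a.measureNumber < b.measureNumber;
    }

private:
    int measureNumber;
    int pitch;
};

// Count the notes in one measure by probing the multiset with a dummy Note.
std::size_t countInMeasure(const std::multiset<Note>& notes, int measure) {
    const Note probe(measure, 0);  // only measureNumber matters to operator<
    return notes.count(probe);
}

// Small usage example; the asserts document the expected results.
void multisetLookupExample() {
    std::multiset<Note> notes;
    notes.insert(Note(5, 60));
    notes.insert(Note(5, 64));
    notes.insert(Note(7, 67));

    assert(countInMeasure(notes, 5) == 2);
    assert(countInMeasure(notes, 6) == 0);
}
```

The same probe object can be passed to equal_range if you need the notes themselves rather than a count.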

I need to use an unordered_multimap for my Note objects and the keys
will be the measureNumber member of my objects.
OK - I'm not sure whether you're after a multiset, unordered_multiset, multimap, or unordered_multimap. I know your title refers to unordered_multimap, but the link you provided leads to unordered_multiset. There are a multitude of considerations which should be taken into account when choosing a container, but second-guessing which will be the best-performing without profiling is a risky business.
I don't understand why I have to overload operator== before I can use it.
I'm also confused about why I need a hash and how to implement it.
In this example here, neither of those two things is done.
You need the operator== and std::hash as they're used internally by unordered_multimap and unordered_multiset. In the example you linked to, the key is of type int, so operator== and std::hash<int> are already defined. If you choose to use Note as a key, you have to define these yourself.
I'd recommend starting with a multiset if you don't need to change the elements frequently. If you do want to be able to change Notes without erasing and inserting, I'd recommend removing measureNumber as a member of Note and using a multimap<int, Note>.
If you feel an unordered_ version of your container would better suit your needs, you still have the set vs map choice. If you choose unordered_multimap<int, Note> (having removed measureNumber from Note), then as in your linked example, the key is int. So you won't have to define anything special for this to work. If you choose to keep measureNumber as a member of Note and use unordered_multiset<Note>, then Note is the key and so you need to do further work, e.g.
#include <functional>
#include <unordered_set>
class Note; // Forward declaration to allow specialisation of std::hash<>
namespace std {
template<>
class hash<Note> {
public:
size_t operator()(const Note &) const; // declaration of operator() to
// allow befriending by Note
};
}
class Note {
private:
int measureNumber;
public:
// functions befriended to allow access to measureNumber
friend bool operator== (const Note &, const Note &);
friend std::size_t std::hash<Note>::operator()(const Note &) const;
};
inline bool operator== (const Note &noteOne, const Note &noteTwo) {
return noteOne.measureNumber == noteTwo.measureNumber;
}
std::size_t std::hash<Note>::operator()(const Note &note) const {
return std::hash<int>()(note.measureNumber);
}
This lets you create and use std::unordered_multiset<Note>. However, I'm not sure this is really what you need; you could even find that a sorted std::vector<Note> is best for you. Further research and thought as to how you'll use your container along with profiling should give the best answer.
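For comparison, here is a sketch of the sorted std::vector<Note> alternative, assuming a Note with public members (a simplification for the example) and a hypothetical comparator ByMeasure. Lookup by measure number uses std::equal_range, which needs the comparator to accept both (Note, int) and (int, Note).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified Note with public members, for illustration only.
struct Note {
    int measureNumber;
    int pitch;
};

// Hypothetical comparator: orders notes by measure and also compares
// a Note against a bare measure number for lookups.
struct ByMeasure {
    bool operator()(const Note& a, const Note& b) const {
        return a.measureNumber < b.measureNumber;
    }
    bool operator()(const Note& a, int measure) const {
        return a.measureNumber < measure;
    }
    bool operator()(int measure, const Note& b) const {
        return measure < b.measureNumber;
    }
};

void sortByMeasure(std::vector<Note>& notes) {
    std::stable_sort(notes.begin(), notes.end(), ByMeasure());
}

// Precondition: 'notes' is already sorted by measure number.
std::size_t countInMeasure(const std::vector<Note>& notes, int measure) {
    auto range = std::equal_range(notes.begin(), notes.end(), measure, ByMeasure());
    return static_cast<std::size_t>(range.second - range.first);
}

// Small usage example; the asserts document the expected results.
void sortedVectorExample() {
    std::vector<Note> notes = {{7, 67}, {5, 60}, {5, 64}};
    sortByMeasure(notes);

    assert(notes.front().measureNumber == 5);
    assert(countInMeasure(notes, 5) == 2);
    assert(countInMeasure(notes, 7) == 1);
    assert(countInMeasure(notes, 6) == 0);
}
```

Whether this beats the node-based or hashed containers depends on how often notes are inserted versus looked up, which is exactly what profiling should tell you.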

Related

Using hash directly in methods from unordered_map instead of the user defined object producing the hash

So i have created a class that i use as a hash in an unordered_map
class MyClass
{
public:
hash_type hash() const { return mHash; } // const, so it can be called on a const MyClass&
bool operator!= (const MyClass& rhs) const;
bool operator== (const MyClass& rhs) const;
private:
hash_type mHash = 0;
};
namespace std
{
template <>
struct hash<MyClass>
{
hash_type operator()(const MyClass& k) const noexcept
{
return k.hash();
}
};
}
It works as expected but i would like to add some functionality.
I would like to be able to use the hash itself when using unordered_map functions like find and erase.
Now i have to do this:
void _erase_key(const MyClass& key) { umap.erase(key); }
But i would like to be able to do this as well:
void _erase_key(const hash_type key) { umap.erase(key); }
Is it possible to somehow use the hash directly instead of the object producing the hash when using methods like find and erase?
If I understand you right, you want to have an std::unordered_map<MyClass, Value> such that you can also query with hash_type and you have hash_type h == MyClass m if std::hash<MyClass>{}(m) == h. Is this correct?
This is not possible in C++17. C++20 adds heterogeneous lookup through transparent hashes. You can read about that very briefly here. It covers lookup functions such as find; the corresponding erase overloads only arrived in C++23. To use it, your map has to fulfill certain properties:
Your equality type Eq must provide a member type Eq::is_transparent, i.e. you have to put a using is_transparent = some_type; in it. (The exact type is without consequence.)
An object of type Eq has to provide overloads to compare all combinations of types you want to use, i.e. overloads for (MyClass, MyClass) and (MyClass, hash_type).
Your hash type Hash has to provide a member type Hash::is_transparent as well, so again, put using is_transparent = some_type; in it. (Early drafts of the feature used Hash::transparent_key_equal instead; the published standard requires is_transparent on both types.)
An object of type Hash must be callable with every type you want to use, i.e. you have to have an operator() overload for both MyClass and hash_type.
For Eq you can use std::equal_to<> (note the empty angle brackets!) if you provide a publicly accessible operator== for the appropriate types. This is NOT the default for unordered_map.
To the best of my knowledge there is no analogue for std::hash, so you have to write your own type for that and provide it to the map.
Pre-C++20 the only thing you can do if you want to keep your key type is to write a conversion from hash_type to MyClass.
But I sense a fundamental problem in this, namely that you see two objects of MyClass as identical if they have the same hash. If this is unintentional, you should really fix this. If it is intentional, make sure that operator==(MyClass, MyClass) also only compares hashes.
In the latter case, there is an easy fix for your problem. Change your map to std::unordered_map<hash_type, Value> and hash each MyClass you use to query the map. If you need a reverse lookup to get MyClass from a hash, either write a conversion function from hash_type to MyClass if this is possible, or otherwise add another std::unordered_map<hash_type, MyClass> in which you store every object of type MyClass you ever use.
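A minimal sketch of that easy fix, which works before C++20. It assumes hash_type is std::size_t, that the mapped value is an int, and that a MyClass is fully identified by its hash; the Registry wrapper and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

using hash_type = std::size_t;  // assumption: the question does not show this type

class MyClass {
public:
    // Hypothetical constructor: derive the hash from a name.
    explicit MyClass(const std::string& name)
        : mHash(std::hash<std::string>()(name)) {}
    hash_type hash() const { return mHash; }

private:
    hash_type mHash;
};

// Hypothetical wrapper: the map is keyed by the hash itself, so every
// operation can take either a MyClass or a raw hash_type.
class Registry {
public:
    void insert(const MyClass& key, int value) { umap_[key.hash()] = value; }

    bool contains(hash_type h) const { return umap_.find(h) != umap_.end(); }
    bool contains(const MyClass& key) const { return contains(key.hash()); }

    std::size_t erase(hash_type h) { return umap_.erase(h); }
    std::size_t erase(const MyClass& key) { return erase(key.hash()); }

private:
    std::unordered_map<hash_type, int> umap_;
};

// Small usage example; the asserts document the expected results.
void registryExample() {
    Registry registry;
    const MyClass a("alpha");
    registry.insert(a, 1);

    assert(registry.contains(a));
    assert(registry.contains(a.hash()));
    assert(registry.erase(a.hash()) == 1);  // erase by raw hash
    assert(!registry.contains(a));
}
```

The object-taking overloads simply forward to the hash-taking ones, which only makes sense if two objects with the same hash really are meant to be the same key.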

How to use insert in the set in c++ for user defined data type?

I would like to use a set<vector<data>> where data is a user-defined class and both the set and the vector are STL,
class data
{
int info;
};
I am not able to understand whether we need to define comparator operator for both vector<data> and data class or only data class.
And how do we define the comparator operator for the same?
std::vector already has an ordering - lexicographical order - so you normally don't need to do anything with that.
You always need to define an ordering for your own classes if you use the default vector ordering (see example below for a case where you don't need to), and the most common way is to overload operator<.
Note that the ordering relation must be a strict weak ordering; otherwise using the set has undefined behavior.
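A minimal sketch of the common case, assuming the element class is renamed to Data and given a constructor (both are illustrative choices, not from the question). Only the element type gets an operator<; the vector's own lexicographical ordering does the rest.

```cpp
#include <cassert>
#include <set>
#include <vector>

class Data {
public:
    explicit Data(int i) : info(i) {}

    // Ordering for the element type; std::vector's operator< builds on it.
    friend bool operator<(const Data& a, const Data& b) {
        return a.info < b.info;
    }

private:
    int info;
};

// Small usage example; the asserts document the expected results.
void setOfVectorsExample() {
    std::set<std::vector<Data>> s;
    s.insert(std::vector<Data>{Data(1), Data(2)});
    s.insert(std::vector<Data>{Data(1), Data(2)});  // equivalent, not inserted again
    s.insert(std::vector<Data>{Data(1), Data(3)});
    s.insert(std::vector<Data>{Data(1)});

    assert(s.size() == 3);
    // {1} is a prefix of the other vectors, so it sorts first.
    assert(s.begin()->size() == 1);
}
```

Note that no operator== is needed: the set treats two vectors as equivalent when neither compares less than the other.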
If you want a special sense of "equality" for the set, you need to define your own.
For example, this code would make a set where vectors of equal length are considered equal (so only the first one encountered of each length is added to the set):
template<typename T>
struct shorter_vector
{
bool operator() (const std::vector<T>& left, const std::vector<T>& right) const
{
return left.size() < right.size();
}
};
// ...
struct A { int x; };
std::set<std::vector<A>, shorter_vector<A>> samelengths;
samelengths.insert({A{1}});
samelengths.insert({A{2}});
samelengths.insert({A{3},A{4}});
samelengths.insert({A{5},A{67}});
// set now contains {A{1}} and {A{3},A{4}}
Note that this set doesn't need an ordering for the vector's elements, since the equivalence relation is defined on structure alone.

Is it bad practice for operator== to mutate its operands?

Scenario
I have a class which I want to be able to compare for equality. The class is large (it contains a bitmap image) and I will be comparing it multiple times, so for efficiency I'm hashing the data and only doing a full equality check if the hashes match. Furthermore, I will only be comparing a small subset of my objects, so I'm only calculating the hash the first time an equality check is done, then using the stored value for subsequent calls.
Example
class Foo
{
public:
Foo(int data) : fooData(data), notHashed(true) {}
private:
void calculateHash()
{
hash = 0; // Replace with hashing algorithm
notHashed = false;
}
int getHash()
{
if (notHashed) calculateHash();
return hash;
}
inline friend bool operator==(Foo& lhs, Foo& rhs)
{
if (lhs.getHash() == rhs.getHash())
{
return (lhs.fooData == rhs.fooData);
}
else return false;
}
int fooData;
int hash;
bool notHashed;
};
Background
According to the guidance on this answer, the canonical form of the equality operator is:
inline bool operator==(const X& lhs, const X& rhs);
Furthermore, the following general advice for operator overloading is given:
Always stick to the operator’s well-known semantics.
Questions
My function must be able to mutate its operands in order to perform the hashing, so I have had to make them non-const. Are there any potential negative consequences of this (examples might be standard library functions or STL containers which will expect operator== to have const operands)?
Should a mutating operator== function be considered contrary to its well-known semantics, if the mutation doesn't have any observable effects (because there's no way for the user to see the contents of the hash)?
If the answer to either of the above is "yes", then what would be a more appropriate approach?
It seems like a perfectly valid use case for a mutable member. You can (and should) still make your operator== take the parameters by const reference and give the class a mutable member for the hash value.
Your class would then have a getter for the hash value that is itself marked as a const method and that lazy-evaluates the hash value when called for the first time. It's actually a good example of why mutable was added to the language as it does not change the object from a user's perspective, it's only an implementation detail for caching the value of a costly operation internally.
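As a rough sketch of that approach: the class below stores a std::string instead of a bitmap and hashes it with std::hash, both of which are stand-ins chosen for the example. The isHashCached accessor is hypothetical and exists only to make the lazy evaluation visible.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

class Foo {
public:
    explicit Foo(const std::string& data)
        : fooData(data), hashed(false), cachedHash(0) {}

    // Canonical signature: both operands are const references.
    friend bool operator==(const Foo& lhs, const Foo& rhs) {
        // Cheap hash check first; full comparison only if the hashes match.
        return lhs.getHash() == rhs.getHash() && lhs.fooData == rhs.fooData;
    }
    friend bool operator!=(const Foo& lhs, const Foo& rhs) {
        return !(lhs == rhs);
    }

    // Hypothetical accessor, only here to show that the hash is lazy.
    bool isHashCached() const { return hashed; }

private:
    // const member function that fills the cache on first use.
    // Not thread-safe as written; concurrent use would need locking.
    std::size_t getHash() const {
        if (!hashed) {
            cachedHash = std::hash<std::string>()(fooData);
            hashed = true;
        }
        return cachedHash;
    }

    std::string fooData;
    mutable bool hashed;             // cache state, not part of the value
    mutable std::size_t cachedHash;  // cached result of the costly operation
};

// Small usage example; the asserts document the expected results.
void mutableCacheExample() {
    const Foo a("bitmap");
    const Foo b("bitmap");
    const Foo c("other");

    assert(!a.isHashCached());  // nothing computed yet
    assert(a == b);             // works on const objects
    assert(a.isHashCached());   // the comparison filled the cache
    assert(a != c);
}
```

From the caller's point of view the objects never change, which is the logical const-ness that mutable is meant to support.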
Use mutable for the data that you want to cache but which does not affect the public interface.
You know, “mutate” → mutable.
Then think in terms of logical const-ness, what guarantees the object offers to the using code.
You should never modify the object on comparison. However, this function does not logically modify the object. Simple solution: make hash mutable, as computing the hash is a form of caching. See:
Does the 'mutable' keyword have any purpose other than allowing the variable to be modified by a const function?
Having side effects in the comparison function or operator is not recommended. It would be better to compute the hash as part of the initialization of the class. Another option is to have a manager class that is responsible for that. Note that even a seemingly innocent mutation will require locking in a multithreaded application.
I would also recommend avoiding the equality operator for classes whose data structure is not absolutely trivial. Very often, as a project progresses, the comparison needs a policy (extra arguments) and the interface of the equality operator becomes insufficient. In that case, a compare method or functor does not have to follow the standard operator== interface, including the immutability of its arguments.
If computing the hash at initialization or using a manager class seems like overkill for your case, you could use the C++ keyword mutable for the hash value member. This will allow you to modify it even from a const member function or through a const-declared variable.
Yes, introducing semantically unexpected side-effects is always a bad idea. Apart from the other reasons mentioned: always assume that any code you write will forever only be used by other people who haven't even heard of your name, and then consider your design choices from that angle.
When someone using your code library finds out his application is slow, and tries to optimize it, he will waste ages trying to find the performance leak if it is inside an == overload, since he doesn't expect it, from a semantic point of view, to do more than a simple object comparison. Hiding potentially costly operations within semantically cheap operations is a bad form of code obfuscation.
You can go the mutable route, but I'm not sure if that is needed. You can do a local cache when needed without having to use mutable. For example:
#include <iostream>
#include <functional> //for hash
using namespace std;
template<typename ReturnType>
class HashCompare{
public:
ReturnType getHash()const{
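// Note: function-local statics are shared by every object of this class,
// so this caches a single hash for all instances, not one per object.
static bool isHashed = false;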
static ReturnType cachedHashValue = ReturnType();
if(!isHashed){
isHashed = true;
cachedHashValue = calculate();
}
return cachedHashValue;
}
protected:
// derived classes should implement this, but callers should use this->getHash()
virtual ReturnType calculate()const = 0;
};
class ReadOnlyString: public HashCompare<size_t>{
private:
const std::string s; // stored by value: a reference would dangle when constructed from a const char*
public:
ReadOnlyString(const char * s):s(s){};
ReadOnlyString(const std::string& s): s(s){}
bool equals(const ReadOnlyString& str)const{
return getHash() == str.getHash();
}
protected:
size_t calculate()const{
std::cout << "in hash calculate " << endl;
std::hash<std::string> str_hash;
return str_hash(this->s);
}
};
bool operator==(const ReadOnlyString& lhs, const ReadOnlyString& rhs){ return lhs.equals(rhs); }
int main(){
ReadOnlyString str = "test";
ReadOnlyString str2 = "TEST";
cout << (str == str2) << endl;
cout << (str == str2) << endl;
}
Output:
in hash calculate
1
1
Can you give me a good reason why keeping isHashed as a member variable is necessary, instead of making it local to where it's needed? Note that we can get further away from 'static' usage if we really want; all we have to do is make a dedicated structure/class.

Extension of STL container through composition or free functions?

Say I need a new type in my application, that consists of a std::vector<int> extended by a single function. The straightforward way would be composition (due to limitations in inheritance of STL containers):
class A {
public:
A(std::vector<int> & vec) : vec_(vec) {}
int hash();
private:
std::vector<int> vec_;
};
This requires the user to first construct a vector<int> and a copy in the constructor, which is bad when we are going to handle a sizeable number of large vectors. One could, of course, write a pass-through to push_back(), but this introduces mutable state, which I would like to avoid.
So it seems to me, that we can either avoid copies or keep A immutable, is this correct?
If so, the simplest (and efficiency-wise equivalent) way would be to use a typedef and free functions at namespace scope:
namespace N {
typedef std::vector<int> A;
int a_hash(const A & a);
}
This just feels wrong somehow, since extensions in the future will "pollute" the namespace. Also, calling a_hash(...) on any vector<int> is possible, which might lead to unexpected results (assuming that we impose constraints on A the user has to follow or that would otherwise be enforced in the first example)
My two questions are:
how can one not sacrifice both immutability and efficiency when using the above class code?
when does it make sense to use free functions as opposed to encapsulation in classes/structs?
Thank you!
Hashing is an algorithm not a type, and probably shouldn't be restricted to data in any particular container type either. If you want to provide hashing, it probably makes the most sense to create a functor that computes a hash one element (int, as you've written things above) at a time, then use std::accumulate or std::for_each to apply that to a collection:
namespace whatever {
struct hasher {
int current_hash;
public:
hasher() : current_hash(0x1234) {}
// incredibly simplistic hash: just XOR the values together.
void operator()(int new_val) { current_hash ^= new_val; }
operator int() const { return current_hash; }
};
}
int hash = std::for_each(coll.begin(), coll.end(), whatever::hasher());
Note that this allows coll to be a vector, or a deque or you can use a pair of istream_iterators to hash data in a file...
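The std::accumulate variant mentioned above could look roughly like this. The function name hashRange is made up; the seed 0x1234 and the XOR combination are taken from the functor above and are just as simplistic.

```cpp
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Hash any iterator range of ints by XOR-ing the values onto a seed.
// Deliberately simplistic: XOR ignores element order.
template <typename InputIt>
int hashRange(InputIt first, InputIt last) {
    return std::accumulate(first, last, 0x1234,
                           [](int acc, int value) { return acc ^ value; });
}

// Small usage example; the asserts document the expected results.
void hashRangeExample() {
    const std::vector<int> v = {1, 2, 4};
    const std::list<int> l = {1, 2, 4};
    const std::vector<int> empty;

    assert(hashRange(v.begin(), v.end()) == 0x1233);  // 0x1234 ^ 1 ^ 2 ^ 4
    assert(hashRange(l.begin(), l.end()) == 0x1233);  // container type is irrelevant
    assert(hashRange(empty.begin(), empty.end()) == 0x1234);  // seed only
}
```

With accumulate the running state is passed through the lambda, so there is no need for a stateful functor or a conversion operator.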
Ad immutable: You could use the range constructor of vector and create an input iterator to provide the content for the vector. The range constructor is just:
template <typename I>
A::A(I const &begin, I const &end) : vec_(begin, end) {}
The generator is a bit more tricky. If you now have a loop that constructs a vector using push_back, it takes quite a bit of rewriting to convert it to an object that returns one item at a time from a method. Then you need to wrap a reference to it in a valid input iterator.
Ad free functions: Due to overloading, polluting the namespace is usually not a problem, because the symbol will only be considered for a call with the specific argument type.
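Also, free functions are found by argument-dependent lookup. That means the function should be placed in the namespace the class is in (for your own types, that is; adding functions to namespace std is not permitted, so treat the following purely as an illustration of the lookup). Like: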
#include <vector>
namespace std {
int hash(vector<int> const &vec) { /*...*/ }
}
//...
std::vector<int> v;
//...
hash(v);
Now you can still call hash unqualified, but don't see it for any other purpose unless you do using namespace std (I personally almost never do that and either just use the std:: prefix or do using std::vector to get just the symbol I want). Unfortunately I am not sure how argument-dependent lookup works with a typedef in another namespace.
In many template algorithms, free functions—and with fairly generic names—are often used instead of methods, because they can be added to existing classes, can be defined for primitive types or both.
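Here is a small sketch of argument-dependent lookup with a user-defined type, which avoids touching namespace std. The namespace music, the Chord type, and the function simpleHash are all hypothetical names for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace music {  // hypothetical namespace

struct Chord {
    std::vector<int> notes;
};

// Free function living next to the type it operates on.
int simpleHash(const Chord& chord) {
    int h = 0;
    for (std::size_t i = 0; i < chord.notes.size(); ++i) {
        h ^= chord.notes[i];
    }
    return h;
}

}  // namespace music

// Code outside the namespace, with no using-declaration or using-directive.
int hashViaAdl() {
    music::Chord chord;
    chord.notes.push_back(1);
    chord.notes.push_back(2);
    // Unqualified call: found because the argument's type is in music.
    return simpleHash(chord);
}

// Small usage example; the assert documents the expected result.
void adlExample() {
    assert(hashViaAdl() == 3);  // 1 ^ 2
}
```

The call compiles only because of the argument's namespace; calling simpleHash with a plain int from outside music would not find the function.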
One simple solution is to declare the private member variable as a reference and initialize it in the constructor. This approach has a limitation (the referenced vector must outlive the A object), but it's a good alternative in most cases.
class A {
public:
A(std::vector<int> & vec) : vec_(vec) {}
int hash();
private:
std::vector<int> &vec_; // 'vec_' now a reference, so will be same scoped as 'vec'
};

How can I define a "Do-Nothing" sort?

I'm working on a system where I need to be able to sort a vector by a given predicate, which my classes shouldn't have control over. Basically, I pass them a derived class and they blindly sort on it.
As one of the "delightful quirks", one of the sort patterns is order of entry.
Here's what I've got so far.
struct Strategy
{
virtual bool operator()(const Loan& lhs, const Loan& rhs) const = 0;
};
struct strategyA : public Strategy
{
bool operator()(const Loan& lhs, const Loan& rhs) const
{
return true;
}
};
struct strategyB : public Strategy
{
bool operator()(const Loan& lhs, const Loan& rhs) const
{
return lhs.getID() > rhs.getID();
}
};
struct strategyC : public Strategy
{
bool operator()(const Loan& lhs, const Loan& rhs) const
{
return lhs.getFee() > rhs.getFee();
}
};
Obviously, as strategyA is reflexive, it can't be used, and if I set it to false, it'll treat everything as equal and I can kiss my data goodbye.
So here's my question. Is there a way of defining a predicate function for sorting a vector which will NOT change anything?
I'm aware that possibly the simplest solution is to add an order of entry variable to the Loan class, or partner it with one in a pair. Alternatively I could feed a parameter in with the predicate that tells the sorter whether to use it or not.
Is there a way of defining a predicate function for sorting a vector which will NOT change anything?
It depends on the algorithm. If your sort is a stable sort, the order of "equal" elements won't be changed (for unstable sorts, their resulting order is unspecified).
Consider using std::stable_sort.
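A minimal sketch of how that plays out, assuming a simplified Loan struct with public id and fee fields (the question's Loan uses getters) and plain functions in place of the Strategy classes. With std::stable_sort, a predicate that always returns false treats all elements as equivalent and therefore leaves the entry order alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for the question's Loan class.
struct Loan {
    int id;
    int fee;
};

// "Do nothing" predicate: no element is ever less than another.
bool keepOrder(const Loan&, const Loan&) { return false; }

bool byFeeDescending(const Loan& a, const Loan& b) { return a.fee > b.fee; }

// Stable-sort a copy of the loans and return the resulting order of ids.
std::vector<int> idsAfterStableSort(std::vector<Loan> loans,
                                    bool (*pred)(const Loan&, const Loan&)) {
    std::stable_sort(loans.begin(), loans.end(), pred);
    std::vector<int> ids;
    for (std::size_t i = 0; i < loans.size(); ++i) {
        ids.push_back(loans[i].id);
    }
    return ids;
}

// Small usage example; the asserts document the expected results.
void stableSortExample() {
    const std::vector<Loan> loans = {{3, 10}, {1, 20}, {2, 10}};

    const std::vector<int> entryOrder = {3, 1, 2};
    assert(idsAfterStableSort(loans, keepOrder) == entryOrder);

    // Loans 3 and 2 share a fee, so they keep their relative entry order.
    const std::vector<int> feeOrder = {1, 3, 2};
    assert(idsAfterStableSort(loans, byFeeDescending) == feeOrder);
}
```

Returning false for every pair is a valid strict weak ordering; returning true for every pair, as strategyA does, is not.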
Personally, I think your strategy class should have a "sort" method. That way, it can either call std::sort or not, as it sees fit. Whether as well as how becomes part of the sorting strategy.
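One way that suggestion might look, again with a simplified Loan struct and hypothetical class names. Each strategy decides both whether and how to sort, so the order-of-entry strategy simply does nothing.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for the question's Loan class.
struct Loan {
    int id;
    int fee;
};

// The strategy owns the whole sorting decision, not just the comparison.
struct Strategy {
    virtual ~Strategy() {}
    virtual void sort(std::vector<Loan>& loans) const = 0;
};

// Order of entry: leave the vector untouched.
struct EntryOrder : Strategy {
    void sort(std::vector<Loan>&) const override {}
};

struct ByFeeDescending : Strategy {
    void sort(std::vector<Loan>& loans) const override {
        std::sort(loans.begin(), loans.end(),
                  [](const Loan& a, const Loan& b) { return a.fee > b.fee; });
    }
};

// The calling code stays blind to what the strategy actually does.
void sortLoans(std::vector<Loan>& loans, const Strategy& strategy) {
    strategy.sort(loans);
}

// Small usage example; the asserts document the expected results.
void strategySortExample() {
    std::vector<Loan> loans = {{1, 5}, {2, 9}, {3, 7}};

    sortLoans(loans, EntryOrder());
    assert(loans[0].id == 1 && loans[1].id == 2 && loans[2].id == 3);

    sortLoans(loans, ByFeeDescending());
    assert(loans[0].id == 2 && loans[1].id == 3 && loans[2].id == 1);
}
```

This also sidesteps the problem of passing a polymorphic predicate by value to std::sort, since the algorithm call lives inside each concrete strategy.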
Dario's stable_sort answer is very good, if you can use it.
It is possible to do sorting based on item position in a vector, but it doesn't mean items won't move (many sort algorithms will basically scramble-then-resort your data), so you have to have some reliable way of determining where the items were when you started.
It's possible for the comparison to keep a mapping of current-position to original-position, but a lot of work. Ideally the logic needs to be built into the sort algorithm - not just the comparison - and that's essentially how stable_sort works.
Another problem - depending on the container - the order of (say) item addresses isn't always the order of the items.
If it is simply a vector you are talking about, perhaps you can get away with providing an interface that determines whether you should sort or not. Vectors are not a sorted container, so you need to sort them explicitly. Just don't sort them at all.
There is no sort function which would keep the order of items based only on items' values. You need to provide more information to your Strategy, if it's possible.
A different approach might be to bring the semantics of your data to the container. Consider using boost::multi_index for different ways of access and ordering on the same data:
http://www.boost.org/doc/libs/1_42_0/libs/multi_index/doc/index.html