How can I define a "Do-Nothing" sort? - c++

I'm working on a system where I need to be able to sort a vector by a given predicate, which my classes shouldn't have control over. Basically, I pass them a derived class and they blindly sort on it.
As one of the "delightful quirks", one of the sort patterns is order of entry.
Here's what I've got so far.
struct Strategy
{
virtual bool operator()(const Loan& lhs, const Loan& rhs) const = 0;
};
struct strategyA : public Strategy
{
bool operator()(const Loan& lhs, const Loan& rhs) const
{
return true;
}
};
struct strategyB : public Strategy
{
bool operator()(const Loan& lhs, const Loan& rhs) const
{
return lhs.getID() > rhs.getID();
}
};
struct strategyC : public Strategy
{
bool operator()(const Loan& lhs, const Loan& rhs) const
{
return lhs.getFee() > rhs.getFee();
}
};
Obviously, as strategyA is reflexive, it can't be used, and if I set it to false, it'll treat everything as equal and I can kiss my data goodbye.
So here's my question. Is there a way of defining a predicate function for sorting a vector which will NOT change anything?
I'm aware that possibly the simplest solution is to add an order of entry variable to the Loan class, or partner it with one in a pair. Alternatively I could feed a parameter in with the predicate that tells the sorter whether to use it or not.

Is there a way of defining a predicate function for sorting a vector which will NOT change anything?
It depends on the algorithm. If your sort is a stable sort, the relative order of "equal" elements won't be changed (for unstable sorts that order is unspecified).
Consider using std::stable_sort.
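A minimal sketch of that idea, assuming a hypothetical `Item` struct standing in for `Loan`: with a predicate that always returns false, every element is equivalent to every other, so `std::stable_sort` has to leave them in their original relative order.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for Loan; only the id matters here.
struct Item {
    int id;
};

// "Do nothing" predicate: no element is ever ordered before another,
// so all elements compare as equivalent.
struct KeepEntryOrder {
    bool operator()(const Item&, const Item&) const { return false; }
};

// Runs a stable sort with the do-nothing predicate and returns the ids
// in the resulting order.
inline std::vector<int> idsAfterStableSort(std::vector<Item> items) {
    std::stable_sort(items.begin(), items.end(), KeepEntryOrder());
    std::vector<int> ids;
    for (const Item& item : items) ids.push_back(item.id);
    return ids;
}
```

Calling `idsAfterStableSort` on items entered as 3, 1, 2 should give back 3, 1, 2. The same predicate passed to `std::sort` carries no such guarantee.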

Personally, I think your strategy class should have a "sort" method. That way, it can either call std::sort or not, as it sees fit. Whether as well as how becomes part of the sorting strategy.
Dario's stable_sort answer is very good, if you can use it.
It is possible to do sorting based on item position in a vector, but it doesn't mean items won't move (many sort algorithms will basically scramble-then-resort your data), so you have to have some reliable way of determining where the items were when you started.
It's possible for the comparison to keep a mapping of current-position to original-position, but a lot of work. Ideally the logic needs to be built into the sort algorithm - not just the comparison - and that's essentially how stable_sort works.
Another problem - depending on the container - the order of (say) item addresses isn't always the order of the items.
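The "strategy owns the sort" idea from the start of this answer could look roughly like the sketch below. The `Loan` struct and the names `SortStrategy`, `EntryOrder` and `ByFeeDescending` are illustrative assumptions, not the asker's real classes.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the asker's Loan class.
struct Loan {
    int id;
    double fee;
};

// The strategy decides whether to sort at all, not just how to compare.
struct SortStrategy {
    virtual ~SortStrategy() {}
    virtual void sort(std::vector<Loan>& loans) const = 0;
};

// Order of entry: leave the vector untouched.
struct EntryOrder : SortStrategy {
    void sort(std::vector<Loan>&) const override {}
};

// Highest fee first.
struct ByFeeDescending : SortStrategy {
    void sort(std::vector<Loan>& loans) const override {
        std::sort(loans.begin(), loans.end(),
                  [](const Loan& lhs, const Loan& rhs) { return lhs.fee > rhs.fee; });
    }
};
```

The calling code only ever sees a `const SortStrategy&` and calls `sort` on it, so it stays blind to whether anything was actually reordered.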

If it is simply a vector you are talking about, perhaps you can get away with providing an interface that determines whether you should sort or not. Vectors are not sorted containers, so you need to sort them explicitly. Just don't sort them at all.

There is no sort function which would keep the order of items based only on items' values. You need to provide more information to your Strategy, if it's possible.
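One way to supply that extra information is to record each item's position when it is inserted and compare on that. The sketch below assumes a hypothetical `Entry` struct with a `seq` counter, neither of which is in the original code. Unlike an always-false predicate, this is a genuine ordering, so it also works with an unstable `std::sort`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record: `seq` is the order-of-entry counter assigned on insertion.
struct Entry {
    int id;
    std::size_t seq;
};

// Strict weak ordering on insertion position.
struct ByEntryOrder {
    bool operator()(const Entry& lhs, const Entry& rhs) const {
        return lhs.seq < rhs.seq;
    }
};

// Puts the entries back into the order in which they were added.
inline void restoreEntryOrder(std::vector<Entry>& entries) {
    std::sort(entries.begin(), entries.end(), ByEntryOrder());
}
```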

A different approach might be to bring the semantics of your data to the container. Consider using boost::multi_index for different ways of access and ordering on the same data:
http://www.boost.org/doc/libs/1_42_0/libs/multi_index/doc/index.html

Related

Class equality check without operator==

I have a data model that is quite large with many members, and many of them are themselves large data models, with nesting like this for several levels deep. The top class represents the overall model that is serialized and sent off to a server for backup. As a debugging step, we would like to deserialize a recent backup and compare it to the in-memory data model at the time of backup, which should be equal. The most obvious way to do this is apply operator== on the current model and its serialized-then-deserialized version.
The problem is that the degree of nesting and quantity of custom data structures will require a tremendous amount of code to write all those operator== implementations. Not to mention that many of those individual implementations will alone be many lines long to compare every member's equality. We're easily talking >1k lines of code just spent on operator==. Even if we do all that, there is large room for programmer error on something like this.
Is there any alternative for a quick and dirty (though reliable) equality check, perhaps using much lower level techniques, or anything that would not require a couple of days of doing nothing but writing operator== functions?
The tie solution is going to be your best bet.
struct equal_by_tie {
template<class T>
using enable = std::enable_if_t<std::is_base_of<equal_by_tie, T>::value, bool>;
template<class T>
friend enable<T>
operator==( T const& lhs, T const& rhs ) {
return mytie(lhs) == mytie(rhs);
}
template<class T>
friend enable<T>
operator!=( T const& lhs, T const& rhs ) {
return mytie(lhs) != mytie(rhs);
}
};
Now you have to write
struct some_thing : equal_by_tie {
public:
friend auto mytie( some_thing const& self ) {
return std::tie( self.x, self.y, self.mem3 );
}
};
and == and != are written for you.
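For readers who want to see the core of the technique without the base-class machinery, here is a reduced sketch. The `Settings` struct and its members are hypothetical, and the explicit tuple return type is only there to keep the code valid in C++11.

```cpp
#include <string>
#include <tuple>

// Hypothetical data type with a few members to compare.
struct Settings {
    int width;
    double scale;
    std::string name;
};

// List every member exactly once, in declaration order.
inline std::tuple<const int&, const double&, const std::string&>
mytie(const Settings& s) {
    return std::tie(s.width, s.scale, s.name);
}

// Tuples of references compare member by member.
inline bool operator==(const Settings& lhs, const Settings& rhs) {
    return mytie(lhs) == mytie(rhs);
}

inline bool operator!=(const Settings& lhs, const Settings& rhs) {
    return !(lhs == rhs);
}
```

The only per-class code that has to be kept in sync with the data members is the one `mytie` function.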
There is currently no way to audit if mytie is written correctly, except with some hackery in C++17 that is honestly not worth considering (structured bindings, it is a horrible hack, don't ask).
One way you can reduce the chance that mytie is wrong is to use it more.
Implement swap in terms of it (maybe using the same parent class trick as operator== above). Now implement operator= in terms of swap or mytie. Do the same for friend std::size_t hash(Foo const&) and hook that into your standard hasher.
Insist that mytie be in the same order as your data declarations, and have it tie parent instances as sub-ties. Write a function that takes your system structure/class alignment into account and calculates how big the structure should be in a constexpr. Static assert that the sizes of Foo and calc_struct_size(tag<decltype(mytie(std::declval<Foo&>()))>) match. (Add in fudge factors for vtables or the like as required). Now changing the layout of the struct without touching mytie results in bad things happening.
Compare each pair of fields in mytie for pointer inequality to ensure you don't repeat the same field twice; try to ensure that this optimizes out to true at runtime (tricky, as you'll want to do this check in debug, and debug often has optimizations turned off; maybe this is a unique situation of an assert you want to execute only in release builds!).
You'll also want to do some sanity checks. If your mytie contains raw pointers, == is wrong, and same for smart pointers; you want your == to be a deep equality.
To that end, maybe == is the wrong thing.
struct deep_equal_by_tie {
template<class T>
using enable = std::enable_if_t<std::is_base_of<deep_equal_by_tie, T>::value, bool>;
template<class T>
friend enable<T>
deep_equal( T const& lhs, T const& rhs ) {
// code to call deep_equal on each tie
// deep_equal on non-pointer basic types defined as ==
// deep_equal on pointers is to check for null (nulls are equal)
// then dereference and deep_equal
// ditto for smart pointers
// deep_equal on vectors and other std containers is to check size,
// and if matches deep_equal on elements
}
};
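The per-element rules listed in those comments might be sketched as a small overload set like the one below. This is an assumption about how one could fill in the body, not the answerer's code. It covers only raw pointers, `std::unique_ptr` and `std::vector`, and leaves out the driver that would apply `deep_equal` to each element of a tie.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Declare every overload first so they can call each other recursively.
template <class T> bool deep_equal(const T& lhs, const T& rhs);
template <class T> bool deep_equal(T* const& lhs, T* const& rhs);
template <class T> bool deep_equal(const std::unique_ptr<T>& lhs, const std::unique_ptr<T>& rhs);
template <class T> bool deep_equal(const std::vector<T>& lhs, const std::vector<T>& rhs);

// Non-pointer basic types: plain ==.
template <class T>
bool deep_equal(const T& lhs, const T& rhs) {
    return lhs == rhs;
}

// Raw pointers: two nulls are equal, otherwise compare the pointees.
template <class T>
bool deep_equal(T* const& lhs, T* const& rhs) {
    if (!lhs || !rhs) return !lhs && !rhs;
    return deep_equal(*lhs, *rhs);
}

// Smart pointers: same rule as raw pointers.
template <class T>
bool deep_equal(const std::unique_ptr<T>& lhs, const std::unique_ptr<T>& rhs) {
    return deep_equal(lhs.get(), rhs.get());
}

// Vectors: sizes must match, then every element must be deep-equal.
template <class T>
bool deep_equal(const std::vector<T>& lhs, const std::vector<T>& rhs) {
    if (lhs.size() != rhs.size()) return false;
    for (std::size_t i = 0; i < lhs.size(); ++i) {
        if (!deep_equal(lhs[i], rhs[i])) return false;
    }
    return true;
}
```

The more specialized pointer and container overloads win overload resolution over the generic one, so a `std::vector<std::unique_ptr<int>>` is compared by the values the pointers refer to rather than by address.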
This, however, increases your load. But the idea is to increase reliability; as you have noted, the hard part is that there is a lot of code and lots of spots to make mistakes.
There is no easy way to do this.
memcmp is a bad idea if your data is anything other than perfectly packed plain old data with no pointers or virtual functions or anything. And it is easy for padding to slip into code, breaking memcmp-based equality; such breaks will be hard to find, as the state of data in the padding is undefined.

Working with a secondary datastructure// Advice for data structure

I'm trying to build a Graph Datastructure based on an already existing Datastructure (which I cannot modify and which is not a graph itself).
I think I have somewhat a grasp on how to build most of the structure concerning the graph itself, but right now I have to reference back to the original data structure for one little "compare" function and having a really hard time how to model that properly...
My vertices represent two different classes A and B of the original data structure, that have different member variables and no common ancestors. For an algorithm I have to check whether two vertices are compatible.
The rule is: an A-Vertex and an B-Vertex are always incompatible, but if both vertices represent the same type I have to check some specifics for the respective type.
So the base idea is roughly like this:
bool isCompatible(const Vertex& other){
// if this->data is of other type than other->data
// return false;
// else return compareFunction(this->data, other->data)
// where maybe one could overload that compare-function
// or make a template out of it
}
But I don't really know how to store the reference to data without making it really ugly.
Idea 1) Use a void pointer for data, have some variable to store the type and then cast the void pointer into respective type
-> would probably work but seems really dangerous (type-safety?) and really ugly (basically no reusability for the Graph structure if you ever wanna use it on other data). Seems a bit like the brute force approach.
Idea 2) Make an abstract data class that offers some "isCompatible(data)" function, and have wrapper-classes for A and B respectively that inherit from the abstract class and override that function. Inside the overridden function one could use dynamic_cast then and compare the objects.
-> still doesn't seem like good design, but should also work?
Idea 3) Make templates work? It's my first time working with C++ so I'm having a few problems wrapping my head around that properly.
I think something like the following should work for comparing:
template<typename T1, typename T2>
bool compare(T1 object1, T2 object2){
return false;
}
And then having instances for (A,A) and (B,B) that override this. For me this seems like the way to go for the comparison itself. But I don't really know how to manage the reference from Vertex to the Object without losing the Type. Any suggestions?
I'm open to any other suggestions as well of course.
edit: I'm using C++11 if that's of relevance.
If your data is either an A or a B, where those two types have nothing in common, then it sounds like what you want is a variant data type. The C++11 standard library doesn't have one (std::variant only arrived in C++17), but you could use Boost's:
boost::variant<A, B> data;
A variant gives you type safety (which void* doesn't) and doesn't require you to have a common ancestor between the two types (which apparently are conceptually unrelated).
With a variant like the above, you can implement your comparison using binary visitation:
bool isCompatible(const Vertex& other) {
return boost::apply_visitor(is_compatible(), data, other.data);
}
with:
class is_compatible
: public boost::static_visitor<bool>
{
public:
template <typename T, typename U>
bool operator()( const T &, const U & ) const
{
return false; // cannot compare different types
}
bool operator()( const A& lhs, const A& rhs ) const
{
// whatever A-specific comparison
}
bool operator()( const B& lhs, const B& rhs ) const
{
// whatever B-specific comparison
}
};

C++ Define an order for a set inside a class

I have a set inside a class, and I'd like to define a new order for that set, but the order depends on an attribute of the class. How should I implement it?
I tried something like this
class myclass{
int c;
set<int,cmp> myset;
struct cmp{
bool operator()(const unsint a, const unsint b)
const {
return (depends on c) ;
}
};
}
but it didn't work. Any help is appreciated, thanks.
EDIT: The problem is that i don't know c a priori. It's a value i get in input, and then it will be always the same.
return (depends on c) ;
I think it's not a good idea to make the compare function depend on c, because your set object is an already-built tree, and std::set does not support rebuilding it.
Additionally, note that std::set requires a comparator that satisfies the strict weak ordering rules.
You could read more at 'Compare' documentation and wikipedia
As for your problem, you could create another set with another compare function, and then copy contents here.
std::set<int, cmp2> anotherSet;
std::copy(std::begin(firstSet), std::end(firstSet), std::inserter(anotherSet, anotherSet.end()));
However, it looks like you actually don't need std::set if you have to reorder it depending on some parameter. Consider using another data structure like vector or list. Additionally, if you need ~O(log N) access complexity, you can organize data to heap within your vector.
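If c is read once and never changes after elements are in the set, as the question's edit suggests, another option is to store c inside the comparator object and pass that object to the set's constructor. This is not taken from the answer above, so treat it as a sketch. The names `CloserTo` and `MyClass` and the ordering rule itself (distance from c, ties broken by value) are illustrative assumptions.

```cpp
#include <cstdlib>
#include <set>
#include <utility>
#include <vector>

// Comparator that carries the runtime value c.
// Hypothetical rule: order by distance from c, breaking ties by value
// so that the result is still a strict weak ordering.
struct CloserTo {
    int c;
    explicit CloserTo(int c_) : c(c_) {}
    bool operator()(int a, int b) const {
        return std::make_pair(std::abs(a - c), a) <
               std::make_pair(std::abs(b - c), b);
    }
};

class MyClass {
public:
    // c is obtained from input first, then fixed for the lifetime of the set.
    explicit MyClass(int c) : myset(CloserTo(c)) {}

    void add(int value) { myset.insert(value); }

    // Elements in the set's own iteration order.
    std::vector<int> ordered() const {
        return std::vector<int>(myset.begin(), myset.end());
    }

private:
    std::set<int, CloserTo> myset;
};
```

Changing c after elements have been inserted would still corrupt the ordering, which is the concern raised above. The sketch avoids that only because the comparator is built once in the constructor and never modified.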

Is it bad practice for operator== to mutate its operands?

Scenario
I have a class which I want to be able to compare for equality. The class is large (it contains a bitmap image) and I will be comparing it multiple times, so for efficiency I'm hashing the data and only doing a full equality check if the hashes match. Furthermore, I will only be comparing a small subset of my objects, so I'm only calculating the hash the first time an equality check is done, then using the stored value for subsequent calls.
Example
class Foo
{
public:
Foo(int data) : fooData(data), notHashed(true) {}
private:
void calculateHash()
{
hash = 0; // Replace with hashing algorithm
notHashed = false;
}
int getHash()
{
if (notHashed) calculateHash();
return hash;
}
inline friend bool operator==(Foo& lhs, Foo& rhs)
{
if (lhs.getHash() == rhs.getHash())
{
return (lhs.fooData == rhs.fooData);
}
else return false;
}
int fooData;
int hash;
bool notHashed;
};
Background
According to the guidance on this answer, the canonical form of the equality operator is:
inline bool operator==(const X& lhs, const X& rhs);
Furthermore, the following general advice for operator overloading is given:
Always stick to the operator’s well-known semantics.
Questions
My function must be able to mutate its operands in order to perform the hashing, so I have had to make them non-const. Are there any potential negative consequences of this (examples might be standard library functions or STL containers which will expect operator== to have const operands)?
Should a mutating operator== function be considered contrary to its well-known semantics, if the mutation doesn't have any observable effects (because there's no way for the user to see the contents of the hash)?
If the answer to either of the above is "yes", then what would be a more appropriate approach?
It seems like a perfectly valid usecase for a mutable member. You can (and should) still make your operator== take the parameters by const reference and give the class a mutable member for the hash value.
Your class would then have a getter for the hash value that is itself marked as a const method and that lazy-evaluates the hash value when called for the first time. It's actually a good example of why mutable was added to the language as it does not change the object from a user's perspective, it's only an implementation detail for caching the value of a costly operation internally.
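A minimal sketch of that arrangement, reusing the question's `Foo` but with `std::hash<int>` as a placeholder for the real hashing algorithm (an assumption, since the question leaves it open):

```cpp
#include <cstddef>
#include <functional>

class Foo {
public:
    explicit Foo(int data) : fooData(data), hash(0), hashed(false) {}

    // Logically const: the cached hash is an implementation detail.
    // Note: this lazy caching is not thread-safe without extra locking.
    std::size_t getHash() const {
        if (!hashed) {
            hash = std::hash<int>()(fooData);  // placeholder hashing algorithm
            hashed = true;
        }
        return hash;
    }

    // Canonical signature: both operands are const references.
    friend bool operator==(const Foo& lhs, const Foo& rhs) {
        // Cheap hash check first, full comparison only if the hashes match.
        return lhs.getHash() == rhs.getHash() && lhs.fooData == rhs.fooData;
    }

private:
    int fooData;
    mutable std::size_t hash;  // cache, may change inside const members
    mutable bool hashed;       // true once the cache has been filled
};
```

With this layout even `const Foo` objects can be compared, and the hash is still computed only on first use.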
Use mutable for the data that you want to cache but which does not affect the public interface.
You know, “mutate” → mutable.
Then think in terms of logical const-ness, what guarantees the object offers to the using code.
You should never modify the object on comparison. However, this function does not logically modify the object. Simple solution: make hash mutable, as computing the hash is a form of caching. See:
Does the 'mutable' keyword have any purpose other than allowing the variable to be modified by a const function?
Having side effects in a comparison function or operator is not recommended. It would be better to compute the hash as part of the initialization of the class. Another option is to have a manager class that is responsible for it. Note that even a seemingly innocent mutation will require locking in a multithreaded application.
I would also recommend avoiding the equality operator for classes whose data structure is not absolutely trivial. Very often, as a project progresses, a need arises for a comparison policy (extra arguments), and the interface of the equality operator becomes insufficient. In that case, a separate compare method or functor does not have to mirror the standard operator== interface and its immutable arguments.
If the first two options (computing the hash at initialization, or using a manager class) seem like overkill for your case, you could use the C++ keyword mutable for the hash value member. This will allow you to modify it even from a const member function or through a const-declared variable.
Yes, introducing semantically unexpected side-effects is always a bad idea. Apart from the other reasons mentioned: always assume that any code you write will forever only be used by other people who haven't even heard of your name, and then consider your design choices from that angle.
When someone using your code library finds out his application is slow, and tries to optimize it, he will waste ages trying to find the performance leak if it is inside an == overload, since he doesn't expect it, from a semantic point of view, to do more than a simple object comparison. Hiding potentially costly operations within semantically cheap operations is a bad form of code obfuscation.
You can go the mutable route, but I'm not sure if that is needed. You can do a local cache when needed without having to use mutable. For example:
#include <iostream>
#include <functional> //for hash
using namespace std;
template<typename ReturnType>
class HashCompare{
public:
ReturnType getHash()const{
static bool isHashed = false;
static ReturnType cachedHashValue = ReturnType();
if(!isHashed){
isHashed = true;
cachedHashValue = calculate();
}
return cachedHashValue;
}
protected:
//derived class should implement this but use this.getHash()
virtual ReturnType calculate()const = 0;
};
class ReadOnlyString: public HashCompare<size_t>{
private:
const std::string s; // stored by value: a reference member would dangle when bound to a temporary
public:
ReadOnlyString(const char * s):s(s){};
ReadOnlyString(const std::string& s): s(s){}
bool equals(const ReadOnlyString& str)const{
return getHash() == str.getHash();
}
protected:
size_t calculate()const{
std::cout << "in hash calculate " << endl;
std::hash<std::string> str_hash;
return str_hash(this->s);
}
};
bool operator==(const ReadOnlyString& lhs, const ReadOnlyString& rhs){ return lhs.equals(rhs); }
int main(){
ReadOnlyString str = "test";
ReadOnlyString str2 = "TEST";
cout << (str == str2) << endl;
cout << (str == str2) << endl;
}
Output:
in hash calculate
1
1
Can you give me a good reason why keeping isHashed as a member variable is necessary, instead of making it local to where it's needed? Note that we can get even further away from 'static' usage if we really want; all we have to do is make a dedicated structure/class.

unordered_multimap usage and operator overwriting

I need to use an unordered_multimap for my Note objects and the keys will be the measureNumber member of my objects. I'm trying to implement it as shown here but I'm stuck.
First off, I don't understand why I have to overwrite the operator== before I can use it. I'm also confused about why I need a hash and how to implement it. In this example here, none of those two things is done.
So based on the first example, this is what I have:
class Note {
private:
int measureNumber;
public:
inline bool operator== (const Note &noteOne, const Note &noteTwo);
}
inline bool Note::operator ==(const Note& noteOne, const Note& noteTwo){
return noteOne.measureNumber == noteTwo.measureNumber;
}
I don't know how to implement the hash part though. Any ideas?
std::multimap is based on a sorted binary tree, which uses a less-than operation to sort the nodes.
std::unordered_multimap is based on a hash table, which uses hash and equality operations to organize the nodes without sorting them.
The sorting or hashing is based on the key values. If the objects are the keys, then you need to define these operations. If the keys are of predefined type like int or string, then you don't need to worry about it.
The problem with your pseudocode is that measureNumber is private, so the user of Note cannot easily specify the key to the map. I would recommend making measureNumber public or rethinking the design. (Is measure number really a good key value? I'm guessing this is musical notation.)
std::multimap< int, Note > notes;
Note myNote( e_sharp, /* octave */ 3, /* measure */ 5 );
notes.insert( std::make_pair( myNote.measureNumber, myNote ) );
The objects can be keys and values at the same time, if you use std::multiset or std::unordered_multiset, in which case you would want to define the operator overload (and possibly hash). If operator== (or operator<) is a member function, then the left-hand side becomes this and the right-hand side becomes the sole argument. Usually these functions should be non-member friends. So then you would have
class Note {
private:
int measureNumber;
public:
friend bool operator< (const Note &noteOne, const Note &noteTwo);
};
inline bool operator <(const Note& noteOne, const Note& noteTwo){
return noteOne.measureNumber < noteTwo.measureNumber;
}
This class could be used with std::multiset. To perform a basic lookup, you can construct a dummy object with uninitialized values except for measureNumber — this only works for simple object types.
I need to use an unordered_multimap for my Note objects and the keys
will be the measureNumber member of my objects.
OK - I'm not sure whether you're after a multiset, unordered_multiset, multimap, or unordered_multimap. I know your title refers to unordered_multimap, but the link you provided leads to unordered_multiset. There are a multitude of considerations which should be taken into account when choosing a container, but second-guessing which will be the best-performing without profiling is a risky business.
I don't understand why I have to overwrite the operator== before I can use it.
I'm also confused about why I need a hash and how to implement it.
In this example here, none of those two things is done.
You need the operator== and std::hash as they're used internally by unordered_multimap and unordered_multiset. In the example you linked to, the key is of type int, so operator== and std::hash<int> are already defined. If you choose to use Note as a key, you have to define these yourself.
I'd recommend starting with a multiset if you don't need to change the elements frequently. If you do want to be able to change Notes without erasing and inserting, I'd recommend removing measureNumber as a member of Note and using a multimap<int, Note>.
If you feel an unordered_ version of your container would better suit your needs, you still have the set vs map choice. If you choose unordered_multimap<int, Note> (having removed measureNumber from Note), then as in your linked example, the key is int. So you won't have to define anything special for this to work. If you choose to keep measureNumber as a member of Note and use unordered_multiset<Note>, then Note is the key and so you need to do further work, e.g.
#include <functional>
#include <unordered_set>
class Note; // Forward declaration to allow specialisation of std::hash<>
namespace std {
template<>
class hash<Note> {
public:
size_t operator()(const Note &) const; // declaration of operator() to
// allow befriending by Note
};
}
class Note {
private:
int measureNumber;
public:
// functions befriended to allow access to measureNumber
friend bool operator== (const Note &, const Note &);
friend std::size_t std::hash<Note>::operator()(const Note &) const;
};
inline bool operator== (const Note &noteOne, const Note &noteTwo) {
return noteOne.measureNumber == noteTwo.measureNumber;
}
std::size_t std::hash<Note>::operator()(const Note &note) const {
return std::hash<int>()(note.measureNumber);
}
This lets you create and use std::unordered_multiset<Note>. However, I'm not sure this is really what you need; you could even find that a sorted std::vector<Note> is best for you. Further research and thought as to how you'll use your container along with profiling should give the best answer.
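For comparison, the simpler `unordered_multimap<int, Note>` option mentioned earlier needs no custom hash or `operator==`, because the key is a plain `int`. The sketch below assumes a hypothetical `Note` with `pitch` and `octave` members and two helper functions that are not part of the original code.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical Note without a measureNumber member; the measure is the map key.
struct Note {
    std::string pitch;
    int octave;
};

typedef std::unordered_multimap<int, Note> NotesByMeasure;

// Stores a note under its measure number; duplicate keys are allowed.
inline void addNote(NotesByMeasure& notes, int measure, const Note& note) {
    notes.insert(std::make_pair(measure, note));
}

// Number of notes stored under one measure, found via equal_range.
inline std::size_t countInMeasure(const NotesByMeasure& notes, int measure) {
    auto range = notes.equal_range(measure);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```

`equal_range` returns the pair of iterators delimiting all notes in one measure, which is the usual way to read back the values for a duplicated key.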