Class equality check without operator== - c++

I have a data model that is quite large with many members, and many of them are themselves large data models, with nesting like this for several levels deep. The top class represents the overall model that is serialized and sent off to a server for backup. As a debugging step, we would like to deserialize a recent backup and compare it to the in-memory data model at the time of backup, which should be equal. The most obvious way to do this is apply operator== on the current model and its serialized-then-deserialized version.
The problem is that the degree of nesting and quantity of custom data structures will require a tremendous amount of code to write all those operator== implementations. Not to mention that many of those individual implementations will alone be many lines long to compare every member's equality. We're easily talking >1k lines of code just spent on operator==. Even if we do all that, there is large room for programmer error on something like this.
Is there any alternative for a quick and dirty (though reliable) equality check, perhaps using much lower level techniques, or anything that would not require a couple of days of doing nothing but writing operator== functions?

The tie solution is going to be your best bet.
struct equal_by_tie {
  template<class T>
  using enable = std::enable_if_t<std::is_base_of<equal_by_tie, T>::value, bool>;
  template<class T>
  friend enable<T>
  operator==( T const& lhs, T const& rhs ) {
    return mytie(lhs) == mytie(rhs);
  }
  template<class T>
  friend enable<T>
  operator!=( T const& lhs, T const& rhs ) {
    return mytie(lhs) != mytie(rhs);
  }
};
Now all you have to write is
struct some_thing : equal_by_tie {
public:
  int x = 0;          // example members
  int y = 0;
  std::string mem3;
  friend auto mytie( some_thing const& self ) {
    return std::tie( self.x, self.y, self.mem3 );
  }
};
and == and != are written for you.
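For reference, here is a self-contained, compilable sketch of the tie approach (C++14 or later) with the trait's `::value` spelled out. The members `x`, `y`, `mem3` and the nested `outer` type are illustrative assumptions; nesting works because the tuple comparison finds the inner type's `==` through ADL on its `equal_by_tie` base.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>

// Supplies == and != for any T derived from it, provided an
// ADL-visible mytie(T const&) exists.
struct equal_by_tie {
    template <class T>
    using enable = std::enable_if_t<std::is_base_of<equal_by_tie, T>::value, bool>;

    template <class T>
    friend enable<T> operator==(T const& lhs, T const& rhs) {
        return mytie(lhs) == mytie(rhs);
    }
    template <class T>
    friend enable<T> operator!=(T const& lhs, T const& rhs) {
        return mytie(lhs) != mytie(rhs);
    }
};

struct some_thing : equal_by_tie {
    int x = 0;
    int y = 0;
    std::string mem3;
    // Must list every member, in declaration order.
    friend auto mytie(some_thing const& self) {
        return std::tie(self.x, self.y, self.mem3);
    }
};

// A nested model: comparing `inner` reuses some_thing's ==.
struct outer : equal_by_tie {
    some_thing inner;
    double weight = 0.0;
    friend auto mytie(outer const& self) {
        return std::tie(self.inner, self.weight);
    }
};
```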
There is currently no way to audit whether mytie is written correctly, except with some C++17 hackery that is honestly not worth considering (structured bindings; it is a horrible hack, don't ask).
One way you can reduce the chance that mytie is wrong is to use it more.
Implement swap in terms of it (maybe using the same parent class trick as operator== above). Now implement operator= in terms of swap or mytie. Do the same for friend std::size_t hash(Foo const&) and hook that into your standard hasher.
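Reusing mytie for swap and hashing could look roughly like the following C++17 sketch. The type `foo`, its members, and the hash-mixing constant (the one popularized by `boost::hash_combine`) are assumptions for illustration. The hash function is named `hash_value` rather than `hash` because, inside a `std::hash` specialization, the unqualified name `hash` refers to the class itself, so a friend called `hash` could not be reached from there.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

struct foo {
    int id = 0;
    std::string name;

    // One member list, exposed as const and non-const ties.
    friend auto mytie(foo const& self) { return std::tie(self.id, self.name); }
    friend auto mytie(foo& self) { return std::tie(self.id, self.name); }

    friend bool operator==(foo const& a, foo const& b) { return mytie(a) == mytie(b); }

    // Swapping two tuples of references swaps the referred-to members.
    friend void swap(foo& a, foo& b) {
        auto ta = mytie(a);
        auto tb = mytie(b);
        ta.swap(tb);
    }

    // Combine per-member std::hash values (boost::hash_combine-style mixing).
    friend std::size_t hash_value(foo const& self) {
        std::size_t seed = 0;
        std::apply([&seed](auto const&... m) {
            ((seed ^= std::hash<std::decay_t<decltype(m)>>{}(m) + 0x9e3779b9 +
                      (seed << 6) + (seed >> 2)), ...);
        }, mytie(self));
        return seed;
    }
};

// Hook into the standard hasher so foo works in unordered containers.
namespace std {
template <>
struct hash<foo> {
    std::size_t operator()(foo const& f) const { return hash_value(f); }
};
}  // namespace std
```

Because every operation walks the same tie, a member missing from mytie tends to show up as a visibly broken swap or copy rather than only as a silently wrong ==.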
Insist that mytie be in the same order as your data declarations, and have it tie parent instances as sub-ties. Write a constexpr function that takes your platform's struct/class alignment rules into account and calculates how big the structure should be. Static assert that the sizes of Foo and calc_struct_size(tag<decltype(mytie(std::declval<Foo&>()))>) match (adding fudge factors for vtables or the like as required). Now, changing the layout of the struct without updating mytie triggers a failed static assertion instead of silently wrong comparisons.
Compare each pair of fields in mytie for pointer inequality to ensure you don't repeat the same field twice; try to ensure that this optimizes out to true at runtime (tricky, as you'll want to do this check in debug, and debug often has optimizations turned off; maybe this is a unique situation of an assert you want to execute only in release builds!).
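A runtime version of the duplicate-field check might look like this (C++17; `tie_fields_distinct` is a hypothetical name). It collects the address of every element in a tie and rejects repeats, so something like `assert(tie_fields_distinct(mytie(obj)))` would catch an accidental `std::tie(self.x, self.x)`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <tuple>

// True if no object appears twice in a tie, i.e. all element addresses differ.
template <class Tuple>
bool tie_fields_distinct(Tuple const& t) {
    return std::apply([](auto const&... fields) {
        // Leading nullptr keeps the array non-empty for an empty tie.
        const void* addrs[] = {nullptr, static_cast<const void*>(std::addressof(fields))...};
        constexpr std::size_t n = sizeof...(fields) + 1;
        for (std::size_t i = 1; i < n; ++i)
            for (std::size_t j = i + 1; j < n; ++j)
                if (addrs[i] == addrs[j]) return false;
        return true;
    }, t);
}
```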
You'll also want to do some sanity checks. If your mytie contains raw pointers, == compares the addresses rather than the pointed-to data, which is wrong here; the same goes for smart pointers. You want your == to be a deep equality.
To that end, maybe == is the wrong thing.
struct deep_equal_by_tie {
  template<class T>
  using enable = std::enable_if_t<std::is_base_of<deep_equal_by_tie, T>::value, bool>;
  template<class T>
  friend enable<T>
  deep_equal( T const& lhs, T const& rhs ) {
    // code to call deep_equal on each element of the tie
    // deep_equal on non-pointer basic types is defined as ==
    // deep_equal on pointers is to check for null (nulls are equal),
    // then dereference and deep_equal
    // ditto for smart pointers
    // deep_equal on vectors and other std containers is to check size,
    // and if it matches, deep_equal on the elements
  }
};
This, however, increases your workload. But the idea is to increase reliability: as you noted, the hard part is that there is a lot of code and lots of spots to make mistakes.
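One possible C++17 fleshing-out of those deep_equal comments, written as a free function template rather than through the base class. It handles raw pointers, `unique_ptr`/`shared_ptr`, `std::vector`, tuples, and any type with an ADL-visible `mytie`; other std containers would need similar branches. The trait names and the `node` type are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

template <class T> struct is_vector : std::false_type {};
template <class T, class A> struct is_vector<std::vector<T, A>> : std::true_type {};

template <class T> struct is_smart_ptr : std::false_type {};
template <class T, class D> struct is_smart_ptr<std::unique_ptr<T, D>> : std::true_type {};
template <class T> struct is_smart_ptr<std::shared_ptr<T>> : std::true_type {};

template <class T> struct is_tuple : std::false_type {};
template <class... Ts> struct is_tuple<std::tuple<Ts...>> : std::true_type {};

// Detects an ADL-visible mytie(T const&).
template <class T, class = void> struct has_mytie : std::false_type {};
template <class T>
struct has_mytie<T, std::void_t<decltype(mytie(std::declval<T const&>()))>> : std::true_type {};

template <class T> bool deep_equal(T const& a, T const& b);

template <class Tuple, std::size_t... I>
bool deep_equal_tuple(Tuple const& a, Tuple const& b, std::index_sequence<I...>) {
    return (deep_equal(std::get<I>(a), std::get<I>(b)) && ...);
}

template <class T>
bool deep_equal(T const& a, T const& b) {
    if constexpr (std::is_pointer_v<T> || is_smart_ptr<T>::value) {
        if (!a || !b) return !a && !b;   // null equals only null
        return deep_equal(*a, *b);       // otherwise compare the pointees
    } else if constexpr (is_vector<T>::value) {
        if (a.size() != b.size()) return false;
        for (std::size_t i = 0; i < a.size(); ++i)
            if (!deep_equal(a[i], b[i])) return false;
        return true;
    } else if constexpr (is_tuple<T>::value) {
        return deep_equal_tuple(a, b, std::make_index_sequence<std::tuple_size_v<T>>{});
    } else if constexpr (has_mytie<T>::value) {
        return deep_equal(mytie(a), mytie(b));  // recurse member by member
    } else {
        return a == b;                   // basic types
    }
}

// Hypothetical model type with owning and shared pointers.
struct node {
    int value = 0;
    std::unique_ptr<node> next;
    std::vector<std::shared_ptr<int>> tags;
    friend auto mytie(node const& n) { return std::tie(n.value, n.next, n.tags); }
};
```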
There is no easy way to do this.
memcmp is a bad idea if your data is anything other than perfectly packed plain old data with no pointers, virtual functions, or anything similar. It is easy for padding to slip into a struct, breaking memcmp-based equality; such breaks will be hard to find, as the contents of the padding bytes are unspecified.
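If you do want a memcmp fast path for some types, C++17's `std::has_unique_object_representations` lets you refuse it at compile time for types with padding. This is a sketch: `bitwise_equal`, `tight`, and `loose` are hypothetical names, and the trait does not catch pointers, whose bits compare fine but only shallowly.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// memcmp-based equality, allowed only for types whose every byte is part of
// the value (no padding, no floating-point oddities such as +0.0 / -0.0).
// Pointers still pass this check but are compared shallowly, by address.
template <class T>
bool bitwise_equal(T const& a, T const& b) {
    static_assert(std::is_trivially_copyable_v<T>, "memcmp needs a trivially copyable type");
    static_assert(std::has_unique_object_representations_v<T>,
                  "T may contain padding bits; memcmp could report false mismatches");
    return std::memcmp(&a, &b, sizeof(T)) == 0;
}

struct tight { int a; int b; };   // no padding on common ABIs
struct loose { char c; int i; };  // usually 3 padding bytes after c
```

On common ABIs, `bitwise_equal(loose{}, loose{})` fails to compile because of the padding after `c`, which is exactly the kind of type where memcmp can disagree with member-wise equality.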


Working with a secondary data structure / Advice for data structure

I'm trying to build a graph data structure based on an existing data structure (which I cannot modify and which is not a graph itself).
I think I have some grasp of how to build most of the structure concerning the graph itself, but right now I have to refer back to the original data structure for one little "compare" function, and I'm having a really hard time figuring out how to model that properly.
My vertices represent two different classes A and B of the original data structure, which have different member variables and no common ancestor. For an algorithm I have to check whether two vertices are compatible.
The rule is: an A-vertex and a B-vertex are always incompatible, but if both vertices represent the same type, I have to check some specifics for the respective type.
So the base idea is roughly like this:
bool isCompatible(const Vertex& other){
  // if this->data is of a different type than other.data
  //   return false;
  // else return compareFunction(this->data, other.data)
  //   where maybe one could overload that compare function
  //   or make a template out of it
}
But I don't really know how to store the reference to data without making it really ugly.
Idea 1) Use a void pointer for data, have some variable to store the type and then cast the void pointer into respective type
-> would probably work, but seems really dangerous (type safety?) and really ugly (basically no reusability for the graph structure if you ever want to use it on other data). Seems a bit like the brute-force approach.
Idea 2) Make an abstract data class that offers some "isCompatible(data)" function, and have wrapper-classes for A and B respectively that inherit from the abstract class and override that function. Inside the overridden function one could use dynamic_cast then and compare the objects.
-> still doesn't seem like good design, but should also work?
Idea 3) Make templates work? It's my first time working with C++ so I'm having a few problems wrapping my head around that properly.
I think something like the following should work for comparing:
template<typename T1, typename T2>
bool compare(T1 object1, T2 object2){
  return false;
}
And then having overloads (or specializations) for (A,A) and (B,B) that take precedence over this. To me this seems like the way to go for the comparison itself. But I don't really know how to manage the reference from Vertex to the object without losing the type. Any suggestions?
I'm open to any other suggestions as well of course.
edit: I'm using C++11 if that's of relevance.
If your data is either an A or a B, where those two types have nothing in common, then it sounds like what you want is a variant data type. The C++11 standard library doesn't have one (std::variant only arrived in C++17), but you could use Boost's:
boost::variant<A, B> data;
A variant gives you type safety (which void* doesn't) and doesn't require you to have a common ancestor between the two types (which apparently are conceptually unrelated).
With a variant like the above, you can implement your comparison using binary visitation:
bool isCompatible(const Vertex& other) const {
  return boost::apply_visitor(is_compatible(), data, other.data);
}
with:
class is_compatible
  : public boost::static_visitor<bool>
{
public:
  template <typename T, typename U>
  bool operator()( const T &, const U & ) const
  {
    return false; // cannot compare different types
  }
  bool operator()( const A& lhs, const A& rhs ) const
  {
    // whatever A-specific comparison
  }
  bool operator()( const B& lhs, const B& rhs ) const
  {
    // whatever B-specific comparison
  }
};
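For readers who can use C++17, `std::variant` and `std::visit` map onto the Boost pieces one-for-one. The sketch below assumes minimal `A` and `B` types and placeholder comparison rules (`id` and `label` equality); substitute the real A- and B-specific checks.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Stand-ins for the two unrelated types from the question (hypothetical members).
struct A { int id; };
struct B { std::string label; };

// Binary visitor: the template overload handles every mixed pair,
// the two non-template overloads win for same-type pairs.
struct is_compatible {
    template <class T, class U>
    bool operator()(T const&, U const&) const { return false; }
    bool operator()(A const& lhs, A const& rhs) const { return lhs.id == rhs.id; }
    bool operator()(B const& lhs, B const& rhs) const { return lhs.label == rhs.label; }
};

struct Vertex {
    std::variant<A, B> data;
    bool isCompatible(Vertex const& other) const {
        return std::visit(is_compatible{}, data, other.data);
    }
};
```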

Datatype for lookup table/index into array

Assume I have a class 'Widget'. In my application, I create a lot of Widgets which (for cache locality and other reasons) I keep in a vector.
For efficient lookups I would like to implement an index data structure. For the sake of the question, let's assume it is a simple lookup table from int indices to Widget elements in the above-mentioned vector.
My question is: what should the contents of the lookup table be?
In other words, with which type should I replace the question mark in
using LookupTable = std::vector<?>
I see the following options:
References (Widget&, or rather as it has to be assignable: reference_wrapper<Widget>)
Pointers (Widget*)
Indices in the Widget vector (size_t)
Iterator objects pointing into the Widget vector (std::vector<Widget>::iterator)
Among these options, indices seem to be the only one that doesn't get invalidated by a vector resize. I might actually be able to avoid resizes; however, implementing the lookup table like that means making assumptions about the vector implementation, which seems unreasonable from a 'decoupled design' standpoint.
OTOH indices are not typesafe: If the thing I get out of the lookup table was a reference I could only use it to access the corresponding widget. Using size_t values I can do nonsensical operations like multiplying the result by 3. Also consider the following two signatures:
void doSomethingWithLookupResult(Widget& lookupResult);
void doSomethingWithLookupResult(size_t lookupResult);
The former is significantly more descriptive.
In summary: Which datatype can I use for my lookup table to achieve both a decoupling from the vector implementation and type safety?
Use std::vector<Widget>::size_type (not size_t). std::vector<Widget>::size_type may be size_t in most implementations, but for portability and future-proofing's sake, we'll do it right.
Go ahead and make a type alias:
using WidgetIndex = std::vector<Widget>::size_type;
so that this looks reasonable:
void doSomethingWithLookupResult(WidgetIndex lookupResult);
This avoids the vector resize issue which, while you downplay it in your question, will eventually come back to bite you.
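To illustrate why the index survives a resize where a `Widget*` or iterator would not, here is a small sketch; `Widget`'s `name` member and the `resolve` helper are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Widget { std::string name; };  // hypothetical stand-in

using WidgetIndex = std::vector<Widget>::size_type;
using LookupTable = std::vector<WidgetIndex>;

// Look up a key, then index into the widget vector at the time of use.
Widget& resolve(std::vector<Widget>& widgets, LookupTable const& table, std::size_t key) {
    return widgets[table[key]];
}
```

A `Widget*` taken before a reallocation would dangle afterwards; the stored indices stay valid as long as widgets are only appended, not erased or reordered.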
Don't play games with some user defined type such as tohava (very cleverly) proposes, unless you plan to use this idiom a great deal in your code base. Here is why not:
The problem that you are addressing (type-safety) is real and we'd like a solution to it if it is "free," but compared to other opportunities C++ programmers have to shoot themselves in the foot, this isn't that big an issue.
You'll be wasting time: your time to design the class, and then the time of every user of your code base (including yourself, after you've forgotten the implementation in a few months) who will stare at that code and have to puzzle it out.
At some point in the future you'll trip over that "interesting" corner case that none of us can see now by staring at this code.
All that said, if you are going to use this idiom often in your code base (you have many classes that are stored in very static vectors or arrays), then it may make sense to make this investment. In that case the maintenance burden is spread over more code and the possibility of using the wrong index type with the wrong container is greater.
You can create a class that represents an index which also carries type information (in compile-time).
#include <vector>
template <class T>
struct typed_index {
  typed_index(int i) : i(i) {}
  template <class CONTAINER>
  T &operator[](CONTAINER &c) { return c[i]; }
  template <class CONTAINER>
  const T &operator[](const CONTAINER &c) { return c[i]; }
  int i;
};
int main() {
  std::vector<int> v1 = {0};
  std::vector<const char *> v2 = {"asd"};
  typed_index<int> i = 0;     // must be a valid position in v1
  int z = i[v1];              // OK: v1 holds ints
  // const char *s = i[v2];   // does not compile: a const char *& cannot bind to int &
}
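A slightly stricter variant of this idea (my own C++17 sketch, not part of the answer above) checks the container's `value_type` with a `static_assert` for a clearer error message, uses `std::size_t`, and makes the constructor `explicit` so a plain integer can't silently become a typed index.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Index that remembers, at compile time, which element type it may address.
template <class T>
class typed_index {
public:
    explicit typed_index(std::size_t i) : i_(i) {}

    // Works for const and non-const containers; returns a reference into c.
    template <class Container>
    decltype(auto) operator[](Container& c) const {
        static_assert(std::is_same_v<typename std::remove_const_t<Container>::value_type, T>,
                      "typed_index used with a container of the wrong element type");
        return c[i_];
    }

    std::size_t value() const { return i_; }

private:
    std::size_t i_;
};
```

With the explicit constructor, `typed_index<int> i = 3;` no longer compiles; write `typed_index<int> i{3};` instead.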

Polymorphic operator on a list of boost::any?

Suppose I have a list of type list<boost::any> that contains elements of some unknown type. Now suppose I want to apply some polymorphic operation to the elements of the list. In this case, consider the + operator. Suppose that I know that the list will always contain a homogeneous set of objects that support operator+, and I want to get the result of applying operator+ (the "sum" in one sense) across all elements of the list into a new boost::any. Something like this:
boost::any sum(list<boost::any> lst) {
  // return lst[0]+lst[1]+lst[2] etc
}
Without enumerating all possible types that could support operator+, is there a way to do this? I'm extremely open to crazy ideas.
(I really do have an ok reason for doing this... I'm implementing an interpreter)
You could use boost::variant instead if you know the range of possible types in the list.
I don't see how you can do this without a mesh of operator+ functions to handle every possible combination of contained types, or regular runtime polymorphism.
What is the concrete type you wish to see in the final boost::any output, I wonder?
By the way, if you are implementing an interpreter, check out Boost.Spirit, which might illuminate your design problem here.
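If the set of interpreter types is closed, a C++17 `std::variant` with a single generic visitor avoids writing one operator+ per type, although the types are still named once in the variant. `value`, `add`, and `sum` are hypothetical names, and rejecting mixed-type operands at runtime is an assumption about the interpreter's semantics.

```cpp
#include <cassert>
#include <list>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// Closed set of interpreter value types (hypothetical; extend as needed).
using value = std::variant<int, double, std::string>;

// One generic lambda covers every same-type pair; no per-type operator+ to write.
value add(value const& a, value const& b) {
    return std::visit([](auto const& x, auto const& y) -> value {
        using X = std::decay_t<decltype(x)>;
        using Y = std::decay_t<decltype(y)>;
        if constexpr (std::is_same_v<X, Y>) {
            return x + y;
        } else {
            throw std::invalid_argument("mixed operand types");
        }
    }, a, b);
}

// Left fold over a non-empty, homogeneous list.
value sum(std::list<value> const& lst) {
    if (lst.empty()) throw std::invalid_argument("empty list");
    auto it = lst.begin();
    value acc = *it;
    for (++it; it != lst.end(); ++it) acc = add(acc, *it);
    return acc;
}
```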
C++ matches functions (and operators are merely fancy functions that have an additional infix syntax) by their types, not by their names, at compile-time. (Rather than checking at run-time whether the objects involved support the requested operation.)
The only exception to that I can think of is virtual functions. If the types were polymorphic, you could use any of the workarounds for missing multi-methods (double dispatch). But since they can be anything, I don't think you can do this.
If you have a limited set of types, template metaprogramming might help generate the functions implementing addition. But if the number of types involved were limited, you'd probably use boost::variant.
(IME saying this means that, in very short time, someone comes along and proves me wrong.)
No. Not with boost::any, nor with boost::variant (which doesn't satisfy your "Without enumerating all possible types that could support operator+" requirement).
What you need to do is make your own. The concept behind boost::any is quite simple. If you look at the documentation, they have a link to an article explaining the technique (it's basically the handle/body idiom with polymorphism). All you need to do is decide what interface your various objects must have, and write the 'any' interface and its impl accordingly. Something like this:
struct my_any
{
  template < typename T >
  my_any(T const& t) : pimpl(new impl<T>(t)) {}
  ...
  some_type get_some_type() const;
  ...
private:
  struct impl_base
  {
    virtual ~impl_base() {}  // needed: pimpl deletes through an impl_base*
    ....
    virtual some_type get_some_type() const = 0;
  };
  template < typename T >
  struct impl : impl_base
  {
    some_type get_some_type() const { return t.get_some_type(); }
    impl(T const& t_var) : t(t_var) {}
    ....
    T t;
  };
  boost::scoped_ptr<impl_base> pimpl;
};
some_type operator+ (my_any const& a, my_any const& b)
{
  return a.get_some_type() + b.get_some_type();
}
It's hard to imagine what operator+ would do on generic types so I made something up that makes a small amount of sense to me. You'll of course need to change to your needs.
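A variation on the same handle/body idea (an assumption, not the answer's code) captures `T + T` at the moment a value is wrapped, so `sum` never has to know the concrete types. It only requires each wrapped `T` to be copyable and to support `T + T`; mixing types inside one list throws `std::bad_cast`. `summable`, `concept_t`, and `model` are hypothetical names (C++14).

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <utility>

// boost::any-like wrapper whose interface is "can be added to its own kind".
class summable {
public:
    template <class T>
    summable(T value) : self_(std::make_shared<model<T>>(std::move(value))) {}

    friend summable operator+(summable const& a, summable const& b) {
        return summable(a.self_->plus(*b.self_));
    }

    // Typed access to the stored value; throws std::bad_cast on a type mismatch.
    template <class T>
    T const& get() const {
        auto* m = dynamic_cast<model<T> const*>(self_.get());
        if (!m) throw std::bad_cast();
        return m->value;
    }

private:
    struct concept_t {
        virtual ~concept_t() = default;
        virtual std::shared_ptr<concept_t> plus(concept_t const& other) const = 0;
    };

    template <class T>
    struct model : concept_t {
        explicit model(T v) : value(std::move(v)) {}
        std::shared_ptr<concept_t> plus(concept_t const& other) const override {
            // Homogeneous lists only: throws std::bad_cast if other holds another type.
            auto const& rhs = dynamic_cast<model const&>(other);
            return std::make_shared<model>(value + rhs.value);
        }
        T value;
    };

    explicit summable(std::shared_ptr<concept_t> p) : self_(std::move(p)) {}

    std::shared_ptr<concept_t> self_;  // immutable, so sharing is safe
};

// Left fold over a non-empty list.
summable sum(std::list<summable> const& lst) {
    if (lst.empty()) throw std::invalid_argument("sum of empty list");
    auto it = lst.begin();
    summable acc = *it;
    for (++it; it != lst.end(); ++it) acc = acc + *it;
    return acc;
}
```

Wrapping a type without a usable `T + T` (a `const char*`, for example) is in practice a compile error at the wrapping site, because `model<T>::plus` is instantiated along with the class's virtual functions.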

How can I define a "Do-Nothing" sort?

I'm working on a system where I need to be able to sort a vector by a given predicate, which my classes shouldn't have control over. Basically, I pass them a derived class and they blindly sort on it.
As one of the "delightful quirks", one of the sort patterns is order of entry.
Here's what I've got so far.
struct Strategy
{
  virtual bool operator()(const Loan& lhs, const Loan& rhs) const = 0;
};
struct strategyA : public Strategy
{
  bool operator()(const Loan& lhs, const Loan& rhs) const
  {
    return true;
  }
};
struct strategyB : public Strategy
{
  bool operator()(const Loan& lhs, const Loan& rhs) const
  {
    return lhs.getID() > rhs.getID();
  }
};
struct strategyC : public Strategy
{
  bool operator()(const Loan& lhs, const Loan& rhs) const
  {
    return lhs.getFee() > rhs.getFee();
  }
};
Obviously, as strategyA is reflexive, it can't be used, and if I set it to false, it'll treat everything as equal and I can kiss my data goodbye.
So here's my question. Is there a way of defining a predicate function for sorting a vector which will NOT change anything?
I'm aware that possibly the simplest solution is to add an order of entry variable to the Loan class, or partner it with one in a pair. Alternatively I could feed a parameter in with the predicate that tells the sorter whether to use it or not.
Is there a way of defining a predicate function for sorting a vector which will NOT change anything?
It depends on the algorithm. If your sort is a stable sort, the order of "equal" elements won't be changed (that order is unspecified for unstable sorts).
Consider using std::stable_sort.
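A minimal sketch of that suggestion: a comparator that reports every pair as equivalent is a valid strict weak ordering, and `std::stable_sort` then leaves the sequence untouched. The two-field `Loan` is a stand-in for the question's class.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Loan { int id; int fee; };  // hypothetical stand-in for the question's class

// "Order of entry": no loan is ever less than another. Returning false is
// irreflexive, hence a valid strict weak ordering in which all elements are
// equivalent; std::stable_sort keeps equivalent elements in their original order.
struct order_of_entry {
    bool operator()(Loan const&, Loan const&) const { return false; }
};

struct by_fee_desc {
    bool operator()(Loan const& lhs, Loan const& rhs) const { return lhs.fee > rhs.fee; }
};
```

If the comparators stay behind the abstract `Strategy` base from the question, pass `std::cref(strategy)` to `std::stable_sort`, since an abstract class cannot be handed to the algorithm by value.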
Personally, I think your strategy class should have a "sort" method. That way, it can either call std::sort or not, as it sees fit. Whether as well as how becomes part of the sorting strategy.
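Sketched out, with `Loan` reduced to a hypothetical two-field struct and the strategy names invented for the example:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Loan { int id; int fee; };  // hypothetical stand-in

// The strategy owns the decision *whether* to sort, not just *how*.
struct SortStrategy {
    virtual ~SortStrategy() = default;
    virtual void sort(std::vector<Loan>& loans) const = 0;
};

struct OrderOfEntry : SortStrategy {
    void sort(std::vector<Loan>&) const override {}  // leave the vector as entered
};

struct ByIdDesc : SortStrategy {
    void sort(std::vector<Loan>& loans) const override {
        std::sort(loans.begin(), loans.end(),
                  [](Loan const& a, Loan const& b) { return a.id > b.id; });
    }
};
```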
Dario's stable_sort answer is very good, if you can use it.
It is possible to do sorting based on item position in a vector, but it doesn't mean items won't move (many sort algorithms will basically scramble-then-resort your data), so you have to have some reliable way of determining where the items were when you started.
It's possible for the comparison to keep a mapping of current-position to original-position, but a lot of work. Ideally the logic needs to be built into the sort algorithm - not just the comparison - and that's essentially how stable_sort works.
Another problem - depending on the container - the order of (say) item addresses isn't always the order of the items.
If it is simply a vector you are talking about, perhaps you can get away with providing an interface that determines whether you should sort or not. A vector does not keep its elements sorted on its own (it keeps them in insertion order), so you need to sort it explicitly; for order of entry, just don't sort it at all.
There is no sort function which would keep the order of items based only on items' values. You need to provide more information to your Strategy, if it's possible.
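Providing that extra information can be as simple as the pairing the question already mentions: record each loan's sequence number on entry, and "order of entry" becomes an ordinary comparator that even the unstable `std::sort` handles correctly. `Entry`, `with_entry_order`, and the comparators are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Loan { int id; int fee; };  // hypothetical stand-in

// Each loan carries its order-of-entry sequence number.
using Entry = std::pair<std::size_t, Loan>;

std::vector<Entry> with_entry_order(std::vector<Loan> const& loans) {
    std::vector<Entry> out;
    out.reserve(loans.size());
    for (std::size_t i = 0; i < loans.size(); ++i) out.emplace_back(i, loans[i]);
    return out;
}

// Order of entry is now a real key, so any sort algorithm can restore it.
bool by_entry(Entry const& a, Entry const& b) { return a.first < b.first; }
bool by_fee(Entry const& a, Entry const& b) { return a.second.fee < b.second.fee; }
```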
A different approach might be to bring the semantics of your data to the container. Consider using boost::multi_index for different ways of access and ordering on the same data:
http://www.boost.org/doc/libs/1_42_0/libs/multi_index/doc/index.html

Template with static functions vs object with non-static functions in overloaded operator

Which approach is the better one and why?
template<typename T>
struct assistant {
  T sum(const T& x, const T& y) const { ... }
};
template<typename T>
T operator+ (const T& x, const T& y) {
  assistant<T> t;
  return t.sum(x, y);
}
Or
template<typename T>
struct assistant {
  static T sum(const T& x, const T& y) { ... }
};
template<typename T>
T operator+ (const T& x, const T& y) {
  return assistant<T>::sum(x, y);
}
To explain things a bit more: assistant has no state; it only provides several utility functions, and later I can define template specializations of it to achieve different behavior for certain types T.
I think at higher optimization levels these two approaches don't lead to different machine code, because the assistant will be optimized away anyway...
Thanks!
It is usually not a question of run-time performance, but one of readability. The former version communicates to a potential maintainer that some form of object initialization is performed. The latter makes the intent much clearer and should be (in my opinion) preferred.
By the way, what you've created is basically a traits class. Take a look at how traits are done in the standard library (they use static member functions).
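For comparison, a traits-style version of the question's `assistant` with one specialization might look like this. The string specialization's behaviour is made up purely to show the dispatch, and the entry point is called `add` rather than a namespace-scope `operator+` template, since an unconstrained `template<class T> T operator+(const T&, const T&)` would take part in overload resolution for every `+` involving a class or enumeration type.

```cpp
#include <cassert>
#include <string>

// Primary template: default behaviour, stateless, static members only.
template <class T>
struct assistant {
    static T sum(T const& x, T const& y) { return x + y; }
};

// Specialization changes behaviour for one type without touching callers.
// (Hypothetical example: join strings with a comma.)
template <>
struct assistant<std::string> {
    static std::string sum(std::string const& x, std::string const& y) { return x + "," + y; }
};

// Generic entry point: dispatches at compile time, no object is created.
template <class T>
T add(T const& x, T const& y) { return assistant<T>::sum(x, y); }
```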
Since assistant is essentially a collection of free functions, I would go with the static approach (maybe even making the constructor private). This makes it clear that assistant is not intended to be instantiated. Also, and this is only a wild guess, this may result in slightly less memory consumption, since no implicit this pointer (and no instance of the class) is needed.
I'd use the object approach - it seems a bit more standard and similar to the way you pass functors to STL algorithms - it's also easier to extend by allowing parameters passed to the constructor of the assistant to influence the results of the operations etc. There's no difference but the object approach will probably be more flexible long term and more in sync with similar solutions you'll find elsewhere.
Why is an object more flexible? One example is that you can easily implement more complex operations (like an average, in this example) that require you to store the temporary result "somewhere" and analyze results from a couple of invocations, while still keeping the same "paradigm" of usage. A second is that you might want to do some optimization: say you need a temporary array to do something in those functions. Why allocate it each time, or make it static in your class and leave it hanging around wasting memory, when you can allocate it when it's really needed, reuse it on a number of elements, and release it when all operations are done and the object's destructor is called?
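The stateful case this answer describes (an average across several calls) might look like the following sketch; `averager` is a hypothetical name, and the division assumes `T` is a floating-point or otherwise divisible type.

```cpp
#include <cassert>
#include <cstddef>

// A stateful helper: accumulates across calls, which a set of static
// functions could only do with global or static state.
template <class T>
class averager {
public:
    void add(T const& x) { total_ += x; ++count_; }
    // Returns T{} for an empty sequence instead of dividing by zero.
    T average() const { return count_ ? total_ / static_cast<T>(count_) : T{}; }

private:
    T total_{};
    std::size_t count_ = 0;
};
```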
There's no advantage to using static functions - and as seen above there are at least a few advantages to using objects so the choice is rather simple imho.
Also the calling semantics can be practically identical - Assistant().sum( A, B ) instead of Assistant::sum( A, B ) - there's really little reason to NOT use an object approach :)
In the first method an assistant has to be created while the second method consists of just the function call, thus the second method is faster.
The 2nd method is preferred: in this method, no "assistant" structure variable is created, and only the required member function is called. I think it is a little bit faster in execution than the 1st method.