memcmp vs multiple equality comparisons - c++

Precondition: Consider such a class or struct T, that for two objects a and b of type T
memcmp(&a, &b, sizeof(T)) == 0
yields the same result as
a.member1 == b.member1 && a.member2 == b.member2 && ...
(memberN is a non-static member variable of T).
Question: When should memcmp be used to compare a and b for equality, and when should the chained ==s be used?
Here's a simple example:
struct vector
{
int x, y;
};
To overload operator == for vector, there are two possibilities (if they're guaranteed to give the same result):
bool operator==(vector lhs, vector rhs)
{ return lhs.x == rhs.x && lhs.y == rhs.y; }
or
bool operator==(vector lhs, vector rhs)
{ return memcmp(&lhs, &rhs, sizeof(vector)) == 0; }
Now if a new member were to be added to vector, for example a z component:
If ==s were used to implement operator==, it would have to be modified.
If memcmp was used instead, operator== wouldn't have to be modified at all.
But I think using chained ==s conveys a clearer meaning, although for a large T with many members memcmp is more tempting. Additionally, is there a performance improvement from using memcmp over ==s? Anything else to consider?

Regarding the precondition of memcmp yielding the same result as member-wise comparisons with ==, while this precondition is often fulfilled in practice, it's somewhat brittle.
Changing compilers or compiler options can in theory break that precondition. Of more concern, code maintenance (and 80% of all programming work is maintenance, IIRC) can break it by adding or removing members, making the class polymorphic, adding custom == overloads, etc. And as mentioned in one of the comments, the precondition can hold for static variables while it doesn't hold for automatic variables, and then maintenance work that creates non-static objects can do Bad Things™.
And regarding the question of whether to use memcmp or member-wise == to implement an == operator for the class, first, this is a false dichotomy, for those are not the only options.
For example, it can be less work and more maintainable to use automatic generation of relational operator overloads, in terms of a compare function. The std::string::compare function is an example of such a function.
Secondly, the answer to what implementation to choose depends strongly on what you consider important, e.g.:
should one seek to maximize runtime efficiency, or
should one seek to create clearest code, or
should one seek the most terse, fastest to write code, or
should one seek to make the class most safe to use, or
something else, perhaps?
Generating relational operators.
You may have heard of CRTP, the Curiously Recurring Template Pattern. As I recall it was invented to deal with the requirement of generating relational operator overloads. I may possibly be conflating that with something else, though, but anyway:
template< class Derived >
struct Relops_from_compare
{
friend
auto operator!=( const Derived& a, const Derived& b )
-> bool
{ return compare( a, b ) != 0; }
friend
auto operator<( const Derived& a, const Derived& b )
-> bool
{ return compare( a, b ) < 0; }
friend
auto operator<=( const Derived& a, const Derived& b )
-> bool
{ return compare( a, b ) <= 0; }
friend
auto operator==( const Derived& a, const Derived& b )
-> bool
{ return compare( a, b ) == 0; }
friend
auto operator>=( const Derived& a, const Derived& b )
-> bool
{ return compare( a, b ) >= 0; }
friend
auto operator>( const Derived& a, const Derived& b )
-> bool
{ return compare( a, b ) > 0; }
};
Given the above support, we can investigate the options available for your question.
Implementation A: comparison by subtraction.
This is a class providing a full set of relational operators without using either memcmp or ==:
struct Vector
: Relops_from_compare< Vector >
{
int x, y, z;
// This implementation assumes no overflow occurs.
friend
auto compare( const Vector& a, const Vector& b )
-> int
{
if( const auto r = a.x - b.x ) { return r; }
if( const auto r = a.y - b.y ) { return r; }
return a.z - b.z;
}
Vector( const int _x, const int _y, const int _z )
: x( _x ), y( _y ), z( _z )
{}
};
Implementation B: comparison via memcmp.
This is the same class implemented using memcmp; I think you'll agree that this code scales better and is simpler:
struct Vector
: Relops_from_compare< Vector >
{
int x, y, z;
// This implementation requires that there is no padding.
// Also, for < or > it gives wrong results for negative numbers,
// and on little-endian machines for values that differ in more than the low byte.
friend
auto compare( const Vector& a, const Vector& b )
-> int
{
static_assert( sizeof( Vector ) == 3*sizeof( x ), "!" );
return memcmp( &a, &b, sizeof( Vector ) );
}
Vector( const int _x, const int _y, const int _z )
: x( _x ), y( _y ), z( _z )
{}
};
Implementation C: comparison member by member.
This is an implementation using member-wise comparisons. It doesn't impose any special requirements or assumptions. But it's more source code.
struct Vector
: Relops_from_compare< Vector >
{
int x, y, z;
friend
auto compare( const Vector& a, const Vector& b )
-> int
{
if( a.x < b.x ) { return -1; }
if( a.x > b.x ) { return +1; }
if( a.y < b.y ) { return -1; }
if( a.y > b.y ) { return +1; }
if( a.z < b.z ) { return -1; }
if( a.z > b.z ) { return +1; }
return 0;
}
Vector( const int _x, const int _y, const int _z )
: x( _x ), y( _y ), z( _z )
{}
};
Implementation D: compare in terms of relational operators.
This is an implementation sort of reversing the natural order of things, by implementing compare in terms of < and ==, which are provided directly and implemented in terms of std::tuple comparisons (using std::tie).
struct Vector
{
int x, y, z;
friend
auto operator<( const Vector& a, const Vector& b )
-> bool
{
using std::tie;
return tie( a.x, a.y, a.z ) < tie( b.x, b.y, b.z );
}
friend
auto operator==( const Vector& a, const Vector& b )
-> bool
{
using std::tie;
return tie( a.x, a.y, a.z ) == tie( b.x, b.y, b.z );
}
friend
auto compare( const Vector& a, const Vector& b )
-> int
{
return (a < b? -1 : a == b? 0 : +1);
}
Vector( const int _x, const int _y, const int _z )
: x( _x ), y( _y ), z( _z )
{}
};
As given, client code using e.g. > needs a using namespace std::rel_ops;.
Alternatives include adding all other operators to the above (much more code), or using a CRTP operator generation scheme that implements the other operators in terms of < and == (possibly inefficiently).
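A minimal sketch of that second alternative, assuming a hypothetical helper named Relops_from_less_and_equal (not part of the code above or of any library): it derives !=, >, <= and >= from the < and == that the class itself provides.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical CRTP helper: derives !=, >, <=, >= from Derived's own < and ==.
template <class Derived>
struct Relops_from_less_and_equal
{
    friend bool operator!=(const Derived& a, const Derived& b) { return !(a == b); }
    friend bool operator>(const Derived& a, const Derived& b) { return b < a; }
    friend bool operator<=(const Derived& a, const Derived& b) { return !(b < a); }
    friend bool operator>=(const Derived& a, const Derived& b) { return !(a < b); }
};

struct Vector : Relops_from_less_and_equal<Vector>
{
    int x, y, z;

    Vector(int x_, int y_, int z_) : x(x_), y(y_), z(z_) {}

    // Lexicographic ordering via tuple comparison.
    friend bool operator<(const Vector& a, const Vector& b)
    { return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z); }

    friend bool operator==(const Vector& a, const Vector& b)
    { return std::tie(a.x, a.y, a.z) == std::tie(b.x, b.y, b.z); }
};
```

With this, client code can write Vector(1, 2, 4) > Vector(1, 2, 3) without pulling in std::rel_ops. The possible inefficiency is that <= and >= each cost a full lexicographic comparison plus a negation, where a three-way compare could answer in one pass.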
Implementation E: comparison by manual use of < and ==.
This implementation is the result of not applying any abstraction, just banging away at the keyboard and writing directly what the machine should do:
struct Vector
{
int x, y, z;
friend
auto operator<( const Vector& a, const Vector& b )
-> bool
{
return (
a.x < b.x ||
a.x == b.x && (
a.y < b.y ||
a.y == b.y && (
a.z < b.z
)
)
);
}
friend
auto operator==( const Vector& a, const Vector& b )
-> bool
{
return
a.x == b.x &&
a.y == b.y &&
a.z == b.z;
}
friend
auto compare( const Vector& a, const Vector& b )
-> int
{
return (a < b? -1 : a == b? 0 : +1);
}
Vector( const int _x, const int _y, const int _z )
: x( _x ), y( _y ), z( _z )
{}
};
What to choose.
Considering the list of possible aspects to value most, like safety, clarity, efficiency, shortness, evaluate each approach above.
Then choose the one that to you is clearly best, or one of the approaches that seem about equally best.
Guidance: For safety you would not want to choose approach A, subtraction, since it relies on an assumption about the values. Note that also option B, memcmp, is unsafe as an implementation for the general case, but can do well for just == and !=. For efficiency you should better MEASURE, with relevant compiler options and environment, and remember Donald Knuth's adage: “premature optimization is the root of all evil” (i.e. spending time on that may be counter-productive).
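To make the two safety caveats concrete, here is a small sketch. The function names are made up for illustration. The first function is an overflow-free replacement for the subtraction in approach A; the second shows why byte-wise comparison is not a usable ordering for signed integers.

```cpp
#include <cassert>
#include <climits>
#include <cstring>

// Overflow-safe three-way comparison: yields -1, 0 or +1 without subtracting.
// (a - b would overflow, which is undefined behaviour, for e.g. INT_MIN - 1.)
inline int compare_ints(int a, int b)
{
    return (a > b) - (a < b);
}

// Byte-wise comparison of two ints, shown only to illustrate the pitfall:
// memcmp compares the bytes as unsigned char values, not the numbers.
inline int compare_int_bytes(int a, int b)
{
    return std::memcmp(&a, &b, sizeof(int));
}
```

On a two's complement machine compare_int_bytes(-1, 1) is positive, i.e. it claims that -1 is greater than 1, while compare_ints(-1, 1) is negative as expected. For == and != the byte-wise version still agrees with the value comparison for plain ints.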

If, as you say, you've chosen types such that the two solutions yield the same results (presumably, then, you have no indirect data and the alignment/padding is all the same), then clearly you can use whichever solution you like.
Things to consider:
Performance: I doubt you'll see much if any difference, but measure it to be sure, if you care;
Safety: Well you say the two solutions are the same for your T, but are they? Are they really? On all systems? Is your memcmp approach portable? Probably not;
Clarity: If your preconditions ever change and you did not adequately comment-describe your memcmp usage, then your program is liable to break — you've therefore made it fragile;
Consistency: Presumably you use == elsewhere; certainly you'll have to do it for every T that doesn't meet your preconditions; unless this is a deliberate optimising specialisation for T, you may consider sticking to a single approach throughout your program;
Ease of use: Of course, it's pretty easy to miss out a member from chained ==, especially if your list of members ever grows.

If two solutions are both correct, prefer the more readable one. I'd say that for a C++ programmer, == is more readable than memcmp. I would go so far as to use std::tie instead of chaining:
bool operator==(const vector &lhs, const vector &rhs)
{ return std::tie(lhs.x, lhs.y) == std::tie(rhs.x, rhs.y); }

If and only if the structure is POD and it is safely memcmp comparable (not even all numeric types are...), the result is the same and the question is about readability and performance.
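As an illustration of a numeric type that is not safely memcmp comparable, here is a sketch assuming IEEE 754 doubles (the struct and function names are invented for the example):

```cpp
#include <cassert>
#include <cmath>    // std::nan, for building a NaN sample
#include <cstring>

struct Sample { double value; };

// Compares the object representations byte by byte.
inline bool bytes_equal(const Sample& a, const Sample& b)
{ return std::memcmp(&a, &b, sizeof(Sample)) == 0; }

// Compares the values with the built-in ==.
inline bool values_equal(const Sample& a, const Sample& b)
{ return a.value == b.value; }
```

With IEEE 754 arithmetic, 0.0 and -0.0 compare equal with == but have different bit patterns, and a NaN has the same bytes as itself yet never compares equal with ==. So the two functions disagree in both directions.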
Readability? This is a rather opinion based question I think but I'd prefer operator==.
Performance? operator== is a short-circuit operator. You have more control over your program here because you can reorder the comparison sequence.
Although a == b && c == d and c == d && a == b are equivalent in terms of algorithmic logic (result is the same) they aren't equivalent in terms of produced assembly, "background logic" and possibly performance.
You can influence your program if you can foresee some of these points.
For example:
If both statements are roughly equally likely to yield false, you'll want to have the cheaper statement first to skip the more complex comparison if possible.
If both statements are roughly equally complex and you know in advance that c == d is more likely to be false than a == b, you should compare c and d first.
It is possible to adjust the comparison sequence in a problem-dependent fashion using operator==, whereas memcmp does not give you this kind of freedom.
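A small sketch of the "cheaper comparison first" case, using a hypothetical Record type with a cheap int member and a more expensive std::string member:

```cpp
#include <cassert>
#include <string>

// Hypothetical record: an id that is cheap to compare and a payload that is not.
struct Record
{
    int id;
    std::string payload;
};

inline bool operator==(const Record& a, const Record& b)
{
    // && short-circuits: the string comparison only runs when the ids match.
    return a.id == b.id && a.payload == b.payload;
}
```

Swapping the two operands of && would give the same result but would compare the strings even when the ids already differ. Note that a type like this could not be compared with memcmp in the first place, since std::string owns indirect data.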
PS: You would want to measure it, but for a small struct with 3 members, MS VS 2013 produces slightly more complex assembly for the memcmp case. I'd expect the operator== solution to have higher performance (if the impact is measurable at all) in this case.
Edit:
Note: Even POD struct members can have overloaded operator==.
Consider:
#include <iostream>
#include <cstring> // for memcmp
struct A { int * p; };
bool operator== (A const &a, A const &b) { return *(a.p) == *(b.p); }
struct B { A m; };
bool operator== (B const &a, B const &b) { return a.m == b.m; }
int main()
{
int a(1), b(1);
B x, y;
x.m.p = &a;
y.m.p = &b;
std::cout << std::boolalpha;
std::cout << (memcmp(&x, &y, sizeof(B)) == 0) << "\n";
std::cout << (x == y) << "\n";
return 0;
}
Prints
false
true
Even if -in turn- all the members are fundamental types I would prefer operator== and leave it to the compiler to consider optimizing the comparison into whatever assembly it considers to be preferable.

== is better, because memcmp compares pure memory data (comparing that way can be wrong in many situations, such as std::string, array-imitating classes or types that can be equal even if they aren't perfectly identical). Since there may be such types inside your classes, you should always use their own operators instead of comparing raw memory data.
== is also better because it's more readable than some weird-looking function.

You imposed a very strong condition that there is no padding (I assume neither between the members of the class, nor inside these members). I presume that you also intended to exclude any "hidden" housekeeping data from the class. In addition the question itself implies that we always compare objects of exactly the same type. Under such strong conditions there's probably no way to come up with a counterexample that would make the memcmp-based comparison to differ from == comparison.
Whether it is worth using memcmp for performance reasons... well, if you really have a good reason to aggressively optimize some critical piece of code and profiling shows that there's an improvement after switching from == to memcmp, then definitely go ahead. But I wouldn't use it as a routine technique for writing comparison operators, even if your class satisfies the requirements.
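If memcmp is used after such profiling, the conditions can at least be partly checked at compile time. A sketch, assuming a hypothetical Vec3i with three int members; the static_asserts cover only trivial copyability and padding, not value semantics such as those of floating-point members:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Vec3i { int x, y, z; };

inline bool operator==(const Vec3i& a, const Vec3i& b)
{
    // Fail the build, rather than silently misbehave, if maintenance breaks the precondition.
    static_assert(std::is_trivially_copyable<Vec3i>::value,
                  "Vec3i must stay trivially copyable for memcmp comparison");
    static_assert(sizeof(Vec3i) == 3 * sizeof(int),
                  "Vec3i must not contain padding or extra members");
    return std::memcmp(&a, &b, sizeof(Vec3i)) == 0;
}
```

Adding a member or a virtual function to Vec3i then breaks the build instead of silently changing what == means. From C++17 on, std::has_unique_object_representations is another trait that can be asserted for this purpose.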

Related

Multiple operator override behaviors by manipulating type

So let's say I'm overloading the inequality operators of a structure describing a point. I have them set to compare the magnitude by default.
I want to have an option to check each value instead, and I was thinking of ways to do it inline. I could make a function that recasts it as a dummy structure so it has a different type when comparing.
struct vec3 {
double x,y,z;
struct _all {double x,y,z;};
friend _all All (vec3 &A) { return *(_all*) &A;}
struct _any {double x,y,z;};
friend _any Any (vec3 &A) { return *(_any*) &A;}
friend bool operator < (vec3 &A, double B) { /* does the magnitude check */ }
friend bool operator < (const _all &A, double B) { return A.x<B && A.y < B && A.z<B; }
friend bool operator < (const _any &A, double B) { return A.x<B || A.y < B || A.z<B; }
};
This is all so I can write
if (All(PointA) < Limit) {}
I have a feeling this breaks a bunch of coding practices, but I'm having a hard time finding any similar problem. Maybe it's safer to use a constructor instead of type changes? Or would it be clearer if I wrote a function IsAnyLessThan(double A) for my structure instead?
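For comparison, the named-function alternative mentioned in the question could look roughly like this. The function names are made up for the sketch, and free functions are used instead of members:

```cpp
#include <cassert>
#include <cmath>

struct vec3 { double x, y, z; };

// Euclidean length of the vector.
inline double magnitude(const vec3& v)
{ return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// The default check from the question: compare the magnitude against a limit.
inline bool magnitude_less_than(const vec3& v, double limit)
{ return magnitude(v) < limit; }

// True when every component is below the limit.
inline bool all_less_than(const vec3& v, double limit)
{ return v.x < limit && v.y < limit && v.z < limit; }

// True when at least one component is below the limit.
inline bool any_less_than(const vec3& v, double limit)
{ return v.x < limit || v.y < limit || v.z < limit; }
```

A call such as all_less_than(PointA, Limit) says what it checks at the call site and needs no pointer casts between unrelated struct types.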

How to sort an array of structures according to values of one of its members, breaking ties on the basis of another member?

Suppose there is a structure:
struct x
{
int a,b,c;
};
The array of structures contains arr[0]={4,2,5}, arr[1]={6,3,1}, arr[2]={4,1,8}.
How can we sort this array in ascending order by the value of member 'a',
with ties broken by the value of member 'b'?
The array after sorting should be: arr[2], then arr[0], then arr[1].
I have used qsort(arr,n,sizeof(struct x),compare);
with compare function defined as
int compare(const void* a, const void * b)
{
return (*(int*)a-*(int*)b);
}
What modification do I have to make to break ties according to member b? (Currently it breaks ties on a first come, first served basis.)
int compare(const void* a, const void * b){
struct x x = *(struct x*)a;
struct x y = *(struct x*)b;
return x.a < y.a ? -1 : (x.a > y.a ? 1 : (x.b < y.b ? -1 : x.b > y.b));
}
Use std::sort with an appropriate comparator. This example uses std::tie to implement a lexicographical comparison using a first and then b, but you can write your own by hand. The only requirement is that it satisfy a strict weak ordering:
bool cmp(const x& lhs, const x& rhs)
{
return std::tie(lhs.a, lhs.b) < std::tie(rhs.a, rhs.b);
}
std::sort(std::begin(arr), std::end(arr), cmp);
or use a lambda:
std::sort(std::begin(arr),
std::end(arr),
[](const x& lhs, const x& rhs)
{
return std::tie(lhs.a, lhs.b) < std::tie(rhs.a, rhs.b);
});
If you are using C instead of C++, it can be done by this compare():
int compare(const void* a, const void* b) {
struct x s_a = *((struct x*)a);
struct x s_b = *((struct x*)b);
if(s_a.a != s_b.a)
return s_a.a < s_b.a ? -1 : 1;
// a is equal: break the tie on b, returning 0 when b is equal too so that qsort() sees a consistent ordering.
return (s_a.b > s_b.b) - (s_a.b < s_b.b);
}
If you want to break ties according to member c when member a and b are both equal, add more if-else statements in compare().
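A sketch of that extension to a third key, written so that it compiles as C++ while keeping the qsort interface. The helper three_way and the name compare_abc are invented for the example:

```cpp
#include <cassert>
#include <cstdlib>

struct x { int a, b, c; };

// -1, 0 or +1, without the overflow risk of subtraction.
inline int three_way(int l, int r) { return (l > r) - (l < r); }

// qsort-compatible comparator: orders by a, then b, then c.
inline int compare_abc(const void* pa, const void* pb)
{
    const x* l = static_cast<const x*>(pa);
    const x* r = static_cast<const x*>(pb);
    if (int k = three_way(l->a, r->a)) { return k; }
    if (int k = three_way(l->b, r->b)) { return k; }
    return three_way(l->c, r->c);
}
```

Each additional tie-breaking member is one more line of the same shape, and the comparator returns 0 only when all compared members are equal.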

Is there a concept name for a regular type for which comparison doesn't compare the full object state?

I have a set of types which looks like this:
struct MyFlag
{
SomeId source_id; // INVALID_ID by default
SomeData data; // regular type
friend bool operator==( const MyFlag& a, const MyFlag& b ) { return a.source_id == b.source_id; }
friend bool operator<( const MyFlag& a, const MyFlag& b ) { return a.source_id < b.source_id; }
friend bool operator!=( const MyFlag& a, const MyFlag& b ) { return !(a == b); }
friend bool operator==( const SomeId& a, const MyFlag& b ) { return a == b.source_id; }
friend bool operator<( const SomeId& a, const MyFlag& b ) { return a < b.source_id; }
};
MyFlag flag_a { id, data_A };
MyFlag flag_b { id, data_B };
assert( flag_a == flag_b );
assert( flag_a.data != flag_b.data );
assert( flag_a == id );
assert( flag_b == id );
MyFlag flag = flag_b;
assert( flag == flag_a );
assert( flag == id );
assert( flag.data != flag_a.data );
const MyFlag flag_x = { id_x, data_A };
flag = flag_x;
assert( flag != flag_a );
assert( flag.data == flag_a.data );
That is, only a specific part of the state of the object is considered in comparison: in this example, any MyFlag object would be compared to others using their ids, but not the rest of the data they contain.
I think it matches the definition Sean Parent gave of a "value type", but I also think this is a strange or unfamiliar (but useful in my case) pattern.
So my question is: is there a concept name for this ... concept?
How is that kind of type useful? I use this kind of type in a "black board" event system, which is basically a kind of set of any value that has a type that is at least regular.
However, this black board systematically overwrites the value pushed (inserted) into it even if it's already found (through comparison). That way, I overwrite the full state of a value in the black board, using the comparison operators as identifiers.
I have no idea whether it's a well-known pattern or idea, or whether it's problematic in the long run. So far it has been very useful. It also feels like something that might be "too smart", but I lack experience with this pattern to confirm that. It might be that I am abusing the comparison operators, but it feels like the semantics of these types are correct in my use.
I can provide a detailed example of my usage if necessary.
MyFlag is not EqualityComparable, since == returns true for objects with distinct values. The definition of EqualityComparable in §3.3 includes axiom { a == b <=> eq(a, b); }.
Informally, eq is meant to represent equality of what we would consider to be the value of an object regardless of the existence of an == for that object's type. This isn't strictly the same thing as representational equality, since (a) different representations can be considered equal (e.g., -0.0 == 0.0), and (b) there can be insignificant state in representations (colloquially "padding").
In the case of MyFlag, I find it almost certain that data would be considered significant in the value of a MyFlag in some context (several occurrences appear in the OP itself). Formally, I could define an operator cmp over MyFlag as:
bool cmp(const MyFlag& a, const MyFlag& b) {
return a == b && a.data == b.data;
}
which clearly provides a stronger interpretation of equality than the corresponding operator ==.
Consider an implementation of std::copy:
template <typename In, typename Out>
Out copy_(In first, In last, Out out, std::false_type) {
while(first != last) {
*out++ = *first++;
}
return out;
}
template <typename In, typename Out>
Out copy_(In first, In last, Out out, std::true_type) {
while(first != last) {
*out = *first;
(*out).data = SomeData();
++first;
++out;
}
return out;
}
template <typename In, typename Out>
Out copy(In first, In last, Out out) {
return copy_(first, last, out, std::is_same<
MyFlag,
typename std::iterator_traits<In>::value_type>());
}
Would you consider this to be a valid implementation of copy, or would you say it is corrupting data? It is equality-preserving according to MyFlag's operator ==.
Contrastingly, had MyFlag been defined as:
class MyFlag
{
SomeData trash_bits;
public:
SomeId source_id; // INVALID_ID by default
friend bool operator==( const MyFlag& a, const MyFlag& b ) { return a.source_id == b.source_id; }
friend bool operator<( const MyFlag& a, const MyFlag& b ) { return a.source_id < b.source_id; }
friend bool operator!=( const MyFlag& a, const MyFlag& b ) { return !(a == b); }
friend bool operator==( const SomeId& a, const MyFlag& b ) { return a == b.source_id; }
friend bool operator<( const SomeId& a, const MyFlag& b ) { return a < b.source_id; }
};
you could make a compelling argument that trash_bits are not part of the value of a MyFlag since they are never observed. Then I would agree that MyFlag is Regular.
I think you might find the answer in this paper from John Lakos, specifically in the background section. In short, Lakos distinguishes the salient attributes, which make up the value of an object, from the non-salient attributes (I remember them being called incidental attributes too, but might be wrong about that), which do not (e.g. the capacity of a vector).
The type has correct comparison operators defining a total ordering and is therefore TotallyOrdered (using the N3351 definition).
That does not distinguish whether the total ordering compares all of the object state or not, but there does not seem to be any concept for differentiating that. It would not be possible to define (== says the objects are equal based on the compared part of the state, so how can you tell whether there is also an uncompared part?), nor does any algorithm have reason to care.
What you seem to be describing is a non-essential part. It is very similar to capacity() on a std::vector. The concept of Regular is defined in terms of the semantics of copy, assignment, and equality. So long as those semantics are obeyed, your type is Regular. You need to decide what the essential parts of your type are by deciding what it is that the type represents. The essential parts, those that contribute to what the object represents, must be copied and included in equality comparison.
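The std::vector case can be checked directly. A small sketch (the helper one_two is just for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds the vector {1, 2} after reserving `capacity` slots.
inline std::vector<int> one_two(std::size_t capacity)
{
    std::vector<int> v;
    v.reserve(capacity);
    v.push_back(1);
    v.push_back(2);
    return v;
}
```

one_two(2) and one_two(100) compare equal with ==, because only the elements are part of the value, even though the capacity of the second is at least 100. The capacity is observable through capacity() but is not treated as essential.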
I think you should distinguish between the level at which you apply your relational operators and their semantics. Your operators seem to have the correct semantics, but are applied at a confusing level (ID member, rather than the whole object).
First, I would define operator== and operator< to compare the entire object state. This is the least surprising and most idiomatic way. To only compare ids, just make a named operator id_equal_to that does a projection onto the ID data member. If you like, you can even define mixed versions (taking one MyFlag and one SomeId parameter), but that is usually only necessary to avoid the overhead of implicit conversions. It doesn't seem required in this case.
Second, to make sure that these operators have the correct semantics (reflexive, symmetric and transitive for operator==, and irreflexive, asymmetric, transitive and total for operator<), just define them in terms of std::tie and the corresponding operator for std::tuple. You should also make sure that operator== and operator< on SomeId have the correct semantics as well. For builtins, this is guaranteed, but for user-defined ID types, you can apply the same std::tie trick again.
#include <cassert>
#include <tuple>
enum { invalid = -1 };
using SomeId = int; // or any regular type with op== and op<
using SomeData = int; // or any regular type with op== and op<
struct MyFlag
{
SomeId source_id; // INVALID_ID by default
SomeData data; // regular type
friend bool operator==(MyFlag const& a, MyFlag const& b)
{ return std::tie(a.source_id, a.data) == std::tie(b.source_id, b.data); }
friend bool operator!=(MyFlag const& a, MyFlag const& b)
{ return !(a == b); }
friend bool operator<(MyFlag const& a, MyFlag const& b)
{ return std::tie(a.source_id, a.data) < std::tie(b.source_id, b.data); }
// similarly define >=, >, and <= in terms of !(a < b), (b < a) and !(b < a)
friend bool id_equal_to(MyFlag const& a, MyFlag const& b)
{ return a.source_id == b.source_id; }
};
int main()
{
auto const id = 0;
auto const data_A = 1;
auto const data_B = 2;
MyFlag flag_a { id, data_A };
MyFlag flag_b { id, data_B };
assert( flag_a != flag_b );
assert( id_equal_to(flag_a, flag_b) );
assert( flag_a.data != flag_b.data );
MyFlag flag = flag_b;
assert( flag != flag_a );
assert( id_equal_to(flag, flag_a) );
assert( flag.data != flag_a.data );
auto const id_x = invalid;
const MyFlag flag_x = { id_x, data_A };
flag = flag_x;
assert( flag != flag_a );
assert( id_equal_to(flag, flag_x) );
assert( !id_equal_to(flag, flag_a) );
assert( flag.data == flag_a.data );
}
Live Example.
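The overwrite-on-push black board from the question can still be built on top of full-state operators by giving the container an id-only ordering. A rough sketch with invented names (Flag, ById, Blackboard) and int stand-ins for SomeId and SomeData:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct Flag { int source_id; int data; };

// Full-state equality, as recommended above.
inline bool operator==(const Flag& a, const Flag& b)
{ return a.source_id == b.source_id && a.data == b.data; }

// Projection onto the id, used only as the container's ordering.
struct ById
{
    bool operator()(const Flag& a, const Flag& b) const
    { return a.source_id < b.source_id; }
};

// Hypothetical black board: pushing a flag replaces any stored flag with the same id.
class Blackboard
{
    std::set<Flag, ById> flags_;
public:
    void push(const Flag& f)
    {
        flags_.erase(f);   // removes the entry with an equivalent id, if any
        flags_.insert(f);
    }
    const Flag* find(int id) const
    {
        const Flag key = {id, 0};   // data is ignored by ById
        auto it = flags_.find(key);
        return it == flags_.end() ? nullptr : &*it;
    }
    std::size_t size() const { return flags_.size(); }
};
```

The identity rule lives in the comparator passed to std::set rather than in operator== or operator<, so the type itself keeps ordinary value semantics.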

Check if structure is not in vector

I have a vector of structs. I need to check if the struct is or is not in the vector. The entire struct, not any specific member. It throws me this error upon compile time:
binary '==' : no operator found which takes a left-hand operand of type 'NavigationNode'
(or there is no acceptable conversion)
My struct:
struct NavigationNode{
int x, y; //their x and y position on the grid
float f, g, h;
int parentGCost;
int value;
};
NavigationNode currentNode;
The vector
vector<NavigationNode> openList;
My find:
if (find(closedList.begin(), closedList.end(), currentNode) == closedList.end() )
{
}
You need to overload operator==.
As global function:
bool operator==( const NavigationNode& lhs, const NavigationNode& rhs )
{
// compare lhs and rhs
}
Or as member function:
bool operator==( const NavigationNode& other ) const
{
// compare this & other
}
You will have to write an equality operator for your custom type. Assuming all variables have to be the same for two NavigationNode objects to be the same, it should look something like this:
bool floatEqual(float a, float b)
{
// adapt this comparison to fit your use-case - see the notes below
static const float EPSILON = 0.00001f; // arbitrarily chosen - needs to be adapted to the occurring values!
return std::abs(a - b) <= EPSILON; // requires <cmath>
}
bool operator==(NavigationNode const & a, NavigationNode const & b)
{
return a.x == b.x &&
a.y == b.y &&
floatEqual(a.f, b.f) &&
floatEqual(a.g, b.g) &&
floatEqual(a.h, b.h) &&
a.parentGCost == b.parentGCost &&
a.value == b.value;
}
Even if you could also do it as a member function of NavigationNode, the recommended way is to implement the operator== as a free function (that way, both parameters can take advantage of any possible implicit conversions).
Note on float comparison: Due to how floating point numbers are represented, it is not a trivial task to compare them. Just checking for equality might not give the desired results. See e.g. this question for details:
What is the most effective way for float and double comparison?
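Putting the pieces together, the check from the question could be sketched as follows. The helper name contains and the tolerance value are assumptions for the example, and the tolerance needs adapting to the values that actually occur:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct NavigationNode
{
    int x, y;          // position on the grid
    float f, g, h;
    int parentGCost;
    int value;
};

inline bool floatEqual(float a, float b)
{
    const float epsilon = 1e-5f;   // assumed tolerance
    return std::fabs(a - b) <= epsilon;
}

inline bool operator==(const NavigationNode& a, const NavigationNode& b)
{
    return a.x == b.x && a.y == b.y &&
           floatEqual(a.f, b.f) && floatEqual(a.g, b.g) && floatEqual(a.h, b.h) &&
           a.parentGCost == b.parentGCost && a.value == b.value;
}

// True when an element equal to `node` is present in `list`.
inline bool contains(const std::vector<NavigationNode>& list, const NavigationNode& node)
{
    return std::find(list.begin(), list.end(), node) != list.end();
}
```

The condition in the question then becomes if (!contains(closedList, currentNode)) { ... }. Once operator== exists, std::find compiles without further changes.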
You need to overload the comparison operator.
If your intention of "==" is "are each of the values contained in my struct equal to the corresponding members in this other struct" then you can write that.
bool operator==(const NavigationNode& lhs, const NavigationNode& rhs)
{
return /* compare each member in here */
}

Why do I have to overload operator== in POD types?

I have a struct that's defined like this:
struct Vec3 {
float x, y, z;
};
When I attempted to use std::unique on a std::vector<Vec3>, I was met with this error:
Description Resource Path Location Type
no match for ‘operator==’ in ‘_first._gnu_cxx::__normal_iterator<_Iterator, _Container>::operator* with _Iterator = Vec3*, _Container = std::vector > == _next._gnu_cxx::__normal_iterator<_Iterator, _Container>::operator* with _Iterator = Vec3*, _Container = std::vector >’ ModelConverter line 4351, external location: /usr/include/c++/4.4.6/bits/stl_algo.h C/C++ Problem
I understand the necessity of the compiler's naïveté regarding inequality operators and others (in this case, * would almost certainly not be what I mean), but is this a matter of policy, or is there a technical reason for it that I'm not aware of? There's a default assignment operator, so why no default equality operator?
There's no technical reason. Pedantically, you might say this is because C doesn't let you compare two structures with ==, and this is a good reason; that behavior switching when you go to C++ is non-obvious. (Presumably, the reason that C doesn't support that is that field-wise comparison might work for some structs, but definitely not all.)
And just from a C++ point of view, what if you have a private field? A default == technically exposes that field (indirectly, but still). So would the compiler only generate an operator== if there are no private or protected data members?
Also, there are classes that have no reasonable definition of equality (empty classes, classes that do not model state but cache it, etc.), or for whom the default equality check might be extremely confusing (classes that wrap pointers).
And then there's inheritance. Deciding what to do for operator== in a situation of inheritance is complicated, and it'd be easy for the compiler to make the wrong decision. (For example, if this was what C++ did, we would probably be getting questions about why == always succeeds when you test equality between two objects that are both descendants of an abstract base class and are being used through a reference to it.)
Basically, it's a thorny problem, and it's safer for the compiler to stay out of it, even considering that you could override whatever the compiler decided.
The question of why you have to provide operator== is not the same as the question of why you have to provide some comparison function.
Regarding the latter, the reason that you are required to provide the comparison logic is that element-wise equality is seldom appropriate. Consider, for example, a POD struct with an array of char in there. If it’s being used to hold a zero-terminated string, then two such structs can compare unequal at the binary level (due to arbitrary contents after the zero bytes in the strings) yet be logically equivalent.
In addition, there are all the C++ level complications mentioned by other answers here, e.g. the especially thorny one of polymorphic equality (you really don’t want the compiler to choose!).
So, essentially, there is simply no good default choice, so the choice is yours.
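The char-array case described above can be reproduced with a few lines. A sketch with invented names (Label, make_label), where the filler byte stands in for whatever happens to be in memory after the terminator:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

struct Label { char text[16]; };

// Compares the whole object representation.
inline bool bytes_equal(const Label& a, const Label& b)
{ return std::memcmp(&a, &b, sizeof(Label)) == 0; }

// Compares only up to the zero terminator.
inline bool text_equal(const Label& a, const Label& b)
{ return std::strcmp(a.text, b.text) == 0; }

// Stores s (truncated if needed) and leaves `filler` bytes after the terminator.
inline Label make_label(const char* s, char filler)
{
    Label out;
    std::memset(out.text, filler, sizeof out.text);
    const std::size_t n = std::min(std::strlen(s), sizeof out.text - 1);
    std::memcpy(out.text, s, n);
    out.text[n] = '\0';
    return out;
}
```

make_label("abc", 'x') and make_label("abc", 'y') hold the same string, so text_equal returns true, while bytes_equal returns false because the bytes after the terminator differ.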
Regarding the former question, which is what you literally asked, why do you have to provide operator==?
If you define operator< and operator==, then the operator definitions in namespace std::rel_ops can fill in the rest for you. Presumably the reason why operator== is needed is that it would be needlessly inefficient to implement it in terms of operator< (then requiring two comparisons). However, the choice of these two operators as basis is thoroughly baffling, because it makes user code verbose and complicated, and in some cases much less efficient than possible!
The IMHO best basis for comparison operators is instead the three-valued compare function, such as std::string::compare.
Given a member function variant comparedTo, you can then use a Curiously Recurring Template Pattern class like the one below, to provide the full set of operators:
template< class Derived >
class ComparisionOps
{
public:
friend int compare( Derived const& a, Derived const& b )
{
return a.comparedTo( b );
}
friend bool operator<( Derived const& a, Derived const& b )
{
return (compare( a, b ) < 0);
}
friend bool operator<=( Derived const& a, Derived const& b )
{
return (compare( a, b ) <= 0);
}
friend bool operator==( Derived const& a, Derived const& b )
{
return (compare( a, b ) == 0);
}
friend bool operator>=( Derived const& a, Derived const& b )
{
return (compare( a, b ) >= 0);
}
friend bool operator>( Derived const& a, Derived const& b )
{
return (compare( a, b ) > 0);
}
friend bool operator!=( Derived const& a, Derived const& b )
{
return (compare( a, b ) != 0);
}
};
where compare is an overloaded function, e.g. like this:
template< class Type >
inline bool lt( Type const& a, Type const& b )
{
return std::less<Type>()( a, b );
}
template< class Type >
inline bool eq( Type const& a, Type const& b )
{
return std::equal_to<Type>()( a, b );
}
template< class Type >
inline int compare( Type const& a, Type const& b )
{
return (lt( a, b )? -1 : eq( a, b )? 0 : +1);
}
template< class Char >
inline int compare( std::basic_string<Char> const& a, std::basic_string<Char> const& b )
{
return a.compare( b );
}
template< class Char >
inline int compareCStrings( Char const a[], Char const b[] )
{
typedef std::char_traits<Char> Traits;
typedef std::size_t Size; // assumed definition; Size is not defined in the code shown here
Size const aLen = Traits::length( a );
Size const bLen = Traits::length( b );
// Since there can be negative Char values, cannot rely on comparison stopping
// at zero termination (this can probably be much optimized at assembly level):
int const way = Traits::compare( a, b, std::min( aLen, bLen ) );
return (way == 0? compare( aLen, bLen ) : way);
}
inline int compare( char const a[], char const b[] )
{
return compareCStrings( a, b );
}
inline int compare( wchar_t const a[], wchar_t const b[] )
{
return compareCStrings( a, b );
}
Now, that’s the machinery. What does it look like to apply it to your class …
struct Vec3
{
float x, y, z;
};
?
Well it’s pretty simple:
struct Vec3
: public ComparisionOps<Vec3>
{
float x, y, z;
int comparedTo( Vec3 const& other ) const
{
if( int c = compare( x, other.x ) ) { return c; }
if( int c = compare( y, other.y ) ) { return c; }
if( int c = compare( z, other.z ) ) { return c; }
return 0; // Equal.
}
};
Disclaimer: not very tested code… :-)
C++20 adds this capability:
struct Vec3 {
float x, y, z;
auto operator<=>(const Vec3&) const = default;
bool operator==(const Vec3&) const = default;
};
At the time of writing, this was only implemented in GCC and Clang trunk. Note that defaulting operator<=> was originally specified as equivalent to also defaulting operator==. An accepted proposal removed that equivalence, so that defaulting operator<=> instead implies a defaulted operator==.
Microsoft has documentation on this feature at https://devblogs.microsoft.com/cppblog/simplify-your-code-with-rocket-science-c20s-spaceship-operator/.
What would you like the equality operation to be? All the fields the same? It's not gonna make that decision for you.