Is the three-way comparison operator always efficient? - c++

Herb Sutter, in his proposal for the "spaceship" operator (section 2.2.2, bottom of page 12), says:
Basing everything on <=> and its return type: This model has major advantages, some unique to this proposal compared to previous proposals for C++ and the capabilities of other languages:
[...]
(6) Efficiency, including finally achieving zero-overhead abstraction for comparisons: The vast majority of comparisons are always single-pass. The only exception is generated <= and >= in the case of types that support both partial ordering and equality. For <, single-pass is essential to achieve the zero-overhead principle to avoid repeating equality comparisons, such as for struct Employee { string name; /*more members*/ }; used in struct Outer { Employee e; /*more members*/ }; – today's comparisons violate zero-overhead abstraction because operator< on Outer performs redundant equality comparisons, because it performs if (e != that.e) return e < that.e; which traverses the equal prefix of e.name twice (and if the name is equal, traverses the equal prefixes of other members of Employee twice as well), and this cannot be optimized away in general. As Kamiński notes, zero-overhead abstraction is a pillar of C++, and achieving it for comparisons for the first time is a significant advantage of this design based on <=>.
But then he gives this example (section 1.4.5, page 6):
class PersonInFamilyTree { // ...
public:
    std::partial_ordering operator<=>(const PersonInFamilyTree& that) const {
        if (this->is_the_same_person_as(that))  return std::partial_ordering::equivalent;
        if (this->is_transitive_child_of(that)) return std::partial_ordering::less;
        if (that.is_transitive_child_of(*this)) return std::partial_ordering::greater;
        return std::partial_ordering::unordered;
    }
    // ... other functions, but no other comparisons ...
};
Would defining operator>(a,b) as a<=>b > 0 not lead to large overhead (though of a different form than the one he discusses)? That code would first test for equality, then for less, and only then for greater, rather than directly testing for greater.
Am I missing something here?

Would defining operator>(a,b) as a<=>b > 0 not lead to large overhead?
It would lead to some overhead. The magnitude of that overhead is relative, though - in situations where the cost of running comparisons is negligible relative to the rest of the program, reducing code duplication by implementing one operator instead of five may be an acceptable trade-off.
However, the proposal does not suggest removing other comparison operators in favor of <=>: if you want to overload other comparison operators, you are free to do it:
Be general: Don’t restrict what is inherent. Don’t arbitrarily restrict a complete set of uses. Avoid special cases and partial features. – For example, this paper supports all seven comparison operators and operations, including adding three-way comparison via <=>. It also supports all five major comparison categories, including partial orders.
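For example, here is a minimal sketch (reusing the member functions from the paper's PersonInFamilyTree example; their definitions are assumed to exist elsewhere) of keeping <=> for generality while also hand-writing operator>, so that a > b goes straight to the relevant test instead of being rewritten as a<=>b > 0:
#include <compare>

class PersonInFamilyTree {
public:
    std::partial_ordering operator<=>(const PersonInFamilyTree& that) const {
        if (is_the_same_person_as(that))        return std::partial_ordering::equivalent;
        if (is_transitive_child_of(that))       return std::partial_ordering::less;
        if (that.is_transitive_child_of(*this)) return std::partial_ordering::greater;
        return std::partial_ordering::unordered;
    }

    // A user-declared operator> is preferred over the rewritten (a <=> b) > 0
    // candidate, so a > b skips the equality and "less" checks entirely.
    bool operator>(const PersonInFamilyTree& that) const {
        return that.is_transitive_child_of(*this);
    }

private:
    bool is_the_same_person_as(const PersonInFamilyTree&) const;  // assumed to exist
    bool is_transitive_child_of(const PersonInFamilyTree&) const; // assumed to exist
};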

For some definition of large. There is overhead because in a partial ordering, a == b iff a <= b and b <= a. The complexity would be the same as a topological sort, O(V+E). Of course, the modern C++ approach is to write safe, clean and readable code first and optimize afterwards. You can choose to implement the spaceship operator first, then specialize individual operators once you identify performance bottlenecks.

Generally speaking, overloading <=> makes sense when you're dealing with a type where doing all comparisons at once is either only trivially more expensive or has the same cost as comparing them differently.
With strings, <=> seems more expensive than a straight == test, since you have to subtract each pair of characters. However, since you already had to load each pair of characters into memory, adding a subtraction on top of that is a trivial expense. Indeed, comparing two numbers for equality is sometimes implemented by compilers as a subtraction and a test against zero. And even for compilers that don't do that, the subtract-and-compare-against-zero is probably not significantly less efficient than a plain equality test.
So for basic types like that, you're more or less fine.
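For concreteness, a minimal sketch (not tied to any particular string type beyond std::string_view) of the single-pass, three-way character comparison being described; each pair of characters is loaded and compared exactly once:
#include <algorithm>
#include <compare>
#include <cstddef>
#include <string_view>

// One pass over both strings yields less/equal/greater in a single traversal.
std::strong_ordering compare_three_way(std::string_view a, std::string_view b) {
    std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (auto c = a[i] <=> b[i]; c != 0) return c;
    }
    return a.size() <=> b.size();
}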
When you're dealing with something like tree ordering, you really need to know up-front which operation you care about. If all you asked for was ==, you really don't want to have to search the rest of the tree just to know that they're unequal.
But personally... I would never implement something like tree ordering with comparison operators to begin with. Why? Because I think that comparisons like that ought to be logically fast operations. Whereas a tree search is such a slow operation that you really don't want to do it by accident or at any time other than when it is absolutely necessary.
Just look at this case. What does it really mean to say that a person in a family tree is "less than" another? It means that one is a child of the other. Wouldn't it be more readable in code to just ask that question directly with is_transitive_child_of?
The more complex your comparison logic is, the less likely it is that what you're doing is really a "comparison". There's probably some descriptive name for that "comparison" operation that would be more readable.
Oh sure, such a class could have a function that returns a partial_ordering representing the relationship between the two objects. But I wouldn't call that function operator<=>.
But in any case, is <=> a zero-overhead abstraction of comparison? No; you can construct cases where it costs significantly more to compute the ordering than it does to detect the specific relation you asked for. But personally if that's the case, there's a good chance that you shouldn't be comparing such types through operators at all.

Related

Why is there no ordering that implies substitutability but allows incomparable values?

The standard library defines weak ordering, partial ordering and strong ordering. These types define semantics for orderings that cover 3 of the 4 combinations of implying/not implying substitutability (a == b implies f(a) == f(b), where f() reads comparison-related state) and allowing/disallowing incomparable values (a < b, a == b and a > b may all be false).
I have a situation where my three-way (spaceship) operator has the semantics of equality implies substitutability, but some values may be incomparable. I could return a weak_ordering from my three-way comparison, but this lacks the semantic meaning of my type. I could also define my own ordering type, but I am reluctant to do that without understanding why it was omitted.
I believe the standard library's orderings are equivalent to the mathematical definitions of weak ordering, partial ordering and total ordering. However, those are defined without the notion of substitutability. Is there a mathematical ordering equivalent to the one I am describing? Is there a reason why it was omitted from the standard?
It's not entirely clear from the original paper why the possibility of a "strong partial order" wasn't included. It already had comparison categories that were themselves incomparable (e.g., weak_ordering and strong_equality), so it could presumably have included another such pair.
One possibility is that the mathematical notion of a partial order actually requires that no two (distinct) elements are equal in the sense that a ≤ b and b ≤ a (so that the graph induced by ≤ is acyclic). People therefore don't usually talk about what "equivalent" means in a partial order. Relaxing that restriction produces a preorder, or equivalently one can talk about a preorder as a partial order on a set of equivalence classes projected onto their elements. That's truly what std::partial_ordering means if we consider there to exist multiple values that compare equivalent.
Of course, the easy morphism between preorders and partial orders limits the significance of the restriction on the latter; "distinct" can always be taken to mean "not in the same equivalence class" so long as you equally apply the equivalence-class projection to any coordinate questions of (say) cardinality. In the same fashion, the standard's notion of substitutability is not very useful: it makes very little sense to say that a std::weak_ordering operator<=> produces equivalent for two values that differ "where f() reads comparison-related state" since if the state were comparison-related operator<=> would distinguish them.
The practical answer is therefore to largely ignore std::weak_ordering and consider your choices to be total/partial ordering—in each case defined on the result of some implicit projection onto equivalence classes. (The standard talks about a "strict weak order" in the same fashion, supposing that there is some "true identity" that equivalent objects might lack but which is irrelevant anyway. Another term for such a relation is a "total preorder", so you could also consider the two meaningful comparison categories to simply be preorders that might or might not be total. That makes for a pleasingly concise categorization—"<=> defines preorders; is yours total?"—but would probably lose many readers.)
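For concreteness, the one partial ordering built into the language is floating-point <=>, where NaN is incomparable with everything (including itself); a small sketch:
#include <compare>
#include <limits>
#include <type_traits>

int main() {
    double a = 1.0;
    double nan = std::numeric_limits<double>::quiet_NaN();

    auto c = a <=> nan;
    static_assert(std::is_same_v<decltype(c), std::partial_ordering>);

    // a < nan, a == nan and a > nan are all false: exactly the
    // "incomparable values allowed" half of the question.
    return (c == std::partial_ordering::unordered) ? 0 : 1;
}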

What are the best sorting algorithms when 'n' is very small?

In the critical path of my program, I need to sort an array (specifically, a C++ std::vector<int64_t>, using the GNU C++ standard library). I am using the standard-library-provided sorting algorithm (std::sort), which in this case is introsort.
I was curious about how well this algorithm performs, and when doing some research on the various sorting algorithms that different standard and third-party libraries use, almost all of them care about cases where 'n' is large enough to be the dominant factor.
In my specific case though, 'n' is going to be on the order of 2-20 elements. So the constant factors could actually be dominant. And things like cache effects might be very different when the entire array we are sorting fits into a couple of cache lines.
What are the best sorting algorithms for cases like this where the constant factors likely overwhelm the asymptotic factors? And do there exist any vetted C++ implementations of these algorithms?
Introsort takes your concern into account, and switches to an insertion sort implementation for short sequences.
Since your STL already provides it, you should probably use that.
Insertion sort or selection sort are both typically faster for small arrays (i.e., fewer than 10-20 elements).
Watch https://www.youtube.com/watch?v=FJJTYQYB1JQ
A simple linear insertion sort is really fast. Making a heap first can improve it a bit.
Sadly the talk doesn't compare that against the hardcoded solutions for <= 15 elements.
It's impossible to know the fastest way to do anything without knowing exactly what the "anything" is.
Here is one possible set of assumptions:
We don't have any knowledge of the element structure except that elements are comparable. We have no useful way to group them into bins (for radix sort), we must implement a comparison-based sort, and comparison takes place in an opaque manner.
We have no information about the initial state of the input; any input order is equally likely.
We don't have to care about whether the sort is stable.
The input sequence is a simple array. Accessing elements is constant-time, as is swapping them. Furthermore, we will benchmark the function purely according to the expected number of comparisons - not number of swaps, wall-clock time or anything else.
With that set of assumptions (and possibly some other sets), the best algorithms for small numbers of elements will be hand-crafted sorting networks, tailored to the exact length of the input array. (These always perform the same number of comparisons; it isn't feasible to "short-circuit" these algorithms conditionally because the "conditions" would depend on detecting data that is already partially sorted, which still requires comparisons.)
For a network sorting four elements (in the known-optimal five comparisons), this might look like (I did not test this):
#include <utility>

template<class RandomIt, class Compare>
void compare_and_swap(RandomIt first, Compare comp, int x, int y) {
    // Ensure first[x] does not compare greater than first[y].
    if (comp(first[y], first[x])) {
        std::swap(first[x], first[y]);
    }
}

// Assume there are exactly four elements available at the `first` iterator.
template<class RandomIt, class Compare>
void network_sort_4(RandomIt first, Compare comp) {
    compare_and_swap(first, comp, 0, 2);
    compare_and_swap(first, comp, 1, 3);
    compare_and_swap(first, comp, 0, 1);
    compare_and_swap(first, comp, 2, 3);
    compare_and_swap(first, comp, 1, 2);
}
In real-world environments, of course, we will have different assumptions. For small numbers of elements, with real data (but still assuming we must do comparison-based sorts) it will be difficult to beat naive implementations of insertion sort (or bubble sort, which is effectively the same thing) that have been compiled with good optimizations. It's really not feasible to reason about these things by hand, considering both the complexity of the hardware level (e.g. the steps it takes to pipeline instructions and then compensate for branch mis-predictions) and the software level (e.g. the relative cost of performing the swap vs. performing the comparison, and the effect that has on the constant-factor analysis of performance).
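For reference, the naive insertion sort being referred to might look like this (a plain sketch, not tuned for any particular compiler or data distribution):
#include <cstdint>
#include <vector>

// Straightforward insertion sort; for a handful of elements the tight,
// branch-predictable inner loop is typically very hard to beat.
void insertion_sort(std::vector<std::int64_t>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        std::int64_t key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = key;
    }
}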

Is there a constexpr ordering of types in C++?

Does C++ provide an ordering of the set of all types as a constant expression? It doesn't matter which particular order, any one will do. This could be in form of a constexpr comparison function:
template <typename T1, typename T2>
constexpr bool TypeLesser ();
My use for this is a compile-time self-balancing binary search tree of types, as a replacement for (cons/nil) type lists, to speed up compilation. For example, checking whether a type is contained in such a tree may be faster than checking whether it is contained in a type list.
I will also accept compiler-specific intrinsics if standard C++ does not provide such a feature.
Note that if the only way to get an ordering is to define it manually by adding boilerplate all over the code base (which includes a lot of templates and anonymous structs), I would rather stay with type lists.
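For reference, the kind of linear type-list membership check I'm trying to replace looks roughly like this (an illustrative sketch; the names are made up):
#include <type_traits>

struct Nil {};
template <typename Head, typename Tail> struct Cons {};

// Linear membership test: one instantiation per list element in the worst case.
template <typename T, typename List> struct Contains;
template <typename T> struct Contains<T, Nil> : std::false_type {};
template <typename T, typename Tail>
struct Contains<T, Cons<T, Tail>> : std::true_type {};
template <typename T, typename Head, typename Tail>
struct Contains<T, Cons<Head, Tail>> : Contains<T, Tail> {};

static_assert(Contains<int, Cons<char, Cons<int, Nil>>>::value, "int is in the list");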
The standard’s only ordering is via type_info (provided by typeid expression), which you can use more easily via type_index – the latter provides ordinary comparison functionality so that it can be used in collections.
I guess its ancestry is the class Andrei Alexandrescu had in “Modern C++ Design”.
It's not compile time.
To reduce compilation time you can define traits classes for the types in question, assigning each type some ordinal value. A 128-bit UUID would do nicely as a type id, to avoid the practical issue of guaranteeing unique IDs. This of course assumes that you or the client code controls the set of possible types.
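A minimal sketch of that approach (all names here are illustrative, and a plain integer stands in for the suggested UUID):
#include <cstdint>

template <typename T> struct TypeOrdinal;   // left undefined for unregistered types

template <> struct TypeOrdinal<int>    { static constexpr std::uint64_t value = 1; };
template <> struct TypeOrdinal<double> { static constexpr std::uint64_t value = 2; };
// ...one specialization per type you control...

template <typename T1, typename T2>
constexpr bool TypeLesser() {
    return TypeOrdinal<T1>::value < TypeOrdinal<T2>::value;
}

static_assert(TypeLesser<int, double>(), "int is ordered before double here");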
The idea of having to "register" relevant types has been used before, in early Boost machinery for determining function result types.
I must anyway recommend seriously measuring compilation performance. The balancing operations that are fast at run time, involving only adjustment of a few pointers, may be slow at compile time, involving creating a huge descriptor of a whole new type. So even though checking for type set membership may be faster, building the type set may be seriously much slower, e.g. O(n²).
Disclaimer: I haven't tried that.
But anyway, I remember again that Andrei Alexandrescu discussed something of the sort in the already mentioned “Modern C++ Design”, or if you don't have access to that book, look in the Loki library (which is a library of things from that book).
You have two main problems: (1) you have no specific comparison criterion (hence the question, right?), and (2) there is no standard way to sort at compile time.
For the first, use std::type_info as others suggested (it's currently used in maps via the std::type_index wrapper) or define your own metafunction to specify the ordering criterion for different types. For the second, you could try to write your own template-metaprogramming-based quicksort algorithm. That's what I did for my personal metaprogramming library, and it works perfectly.
About the assumption "a self-balancing search tree should perform better than classic typelists", I really encourage you to do some profiling (try templight) before claiming that. Compile-time performance has nothing to do with classic runtime performance; it depends heavily on the exact implementation of the compiler's template instantiation system.
For example, based on my own experience I'm pretty sure that my simple "O(n)" linear search could perform better than your self-balancing tree. Why? Memoization. Compile-time performance is not only about instantiation depth. In fact, memoization plays a crucial role here.
To give you a real example, consider the implementation of quicksort (pseudo meta-code):
List sort( List l )
{
    Int pivot = l[l.length/2];
    Tuple(List,List) lists = reorder( l, pivot, l.length/2 );
    return concat( sort( lists.left ), sort( lists.right ) );
}
I hope the example is self-explanatory. Note the functional way it works; there are no side effects. I would be glad if someday metaprogramming in C++ had that syntax...
That's the recursive case of quicksort. Since we are using typelists (variadic typelists in my case), the first meta-instruction, which computes the value of the pivot, has O(n) complexity; specifically, it requires a template instantiation depth of N/2. The second step (reordering) can be done in O(n), and concatenation is O(1) (remember these are C++11 variadic typelists).
Now consider an example of execution:
[1,2,3,4,5]
The first step calls the recursive case, so the trace is:
Int pivot = l[l.length/2]; traverses the list up to 3. That means the instantiations needed to perform the traversals [1], [1,2], [1,2,3] are memoized.
During the reordering, more sub-traversals (and combinations of sub-traversals generated by element "swapping") are generated.
Recursive "calls" and concat.
Since the linear traversals performed to reach the middle of the list are memoized, they are instantiated only once during the whole sort. When I first saw this with templight I was completely dumbfounded: looking at the instantiation graph, only the first, large traversals are instantiated; the smaller ones are just parts of the large ones, and since the large ones were memoized, the small ones are not instantiated again.
So the compiler is able to memoize at least half of those slow linear traversals, right? But what is the cost of such an enormous memoization effort?
What I'm trying to say with this answer is: when doing template metaprogramming, forget everything about runtime performance, optimizations, costs, etc., and don't make assumptions. Measure. You are entering a completely different league. I'm not completely sure which implementation (your self-balancing trees vs. a simple linear traversal) is faster, because that depends on the compiler. My example was only meant to show how a compiler can completely break your assumptions.
Side note: the first time I did this profiling I showed it to an algorithms teacher at my university, and he's still trying to figure out what's happening. In fact, he asked a question here about how to measure the complexity and performance of this monster: Best practices for measuring the run-time complexity of a piece of code

Which is faster, string.empty() or string.size() == 0?

Recently, during a discussion I was asked by a fellow programmer to do some code changes. I had something like:
if (mystring.size() == 0)
    // do something
else
    // do something else
The discussion was regarding the use of mystring.empty() to check whether the string is empty. Now, I agree it can be argued that string.empty() is more expressive and readable code, but does it offer any performance benefits?
I did some digging and found these 2 answers pertaining to my question:
Implementation from basic_string.h
SO Answer that points to ISO Standard - here
Both the answers buttress my claim that the string.empty() is just more readable and doesn't offer any performance benefits, compared to string.size() == 0.
I still want to be sure: are there any implementations of string that keep an internal boolean flag tracking whether the string is empty?
Or are there other techniques some implementations use that would invalidate my claim?
The standard defines empty() like this:
bool empty() const noexcept;
Returns: size() == 0.
You'd be hard-pressed to find something that doesn't do that, and any performance difference would be negligible due to both being constant time operations. I would expect both to compile to the exact same assembly on any reasonable implementation.
That said, empty() is clear and explicit. You should prefer it over size() == 0 (or !size()) for readability.
Now, this is a pretty trivial matter, but I'll try to cover it exhaustively so that whatever arguments your colleagues put forward aren't likely to take you by surprise...
As usual, if profiling proved you really really had to care, measure: there could be a difference (see below). But in a general code review situation for not-proved-problematically-slow code, the outstanding issues are:
in some other containers (e.g. C++03 lists, but not C++11 ones), size() was less efficient than empty(), leading to coding tips to prefer empty() over size() in general, so that if someone needed to switch containers later (or generalise the processing into a template where the container type may vary) no change would be needed to retain efficiency (see the sketch after these points).
does either reflect a more natural way to conceive of the test? - not just what you happened to think of first, or size() because you're not as used to using empty(), but when you're 100% focused on the surrounding code logic, does size() or empty() fit in better? For example, perhaps because it's one of several tests of size() and you like having consistency, or because you're implementing a famous algorithm or formula that's traditionally expressed in terms of size: being consistent might reduce the mental noise/effort in verifying the implementation against the formula.
Most of the time the factors above are insignificant, and raising the issue in a code review is really a waste of time.
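To make the first point concrete, a small sketch of container-generic code where empty() stays O(1) regardless of the container chosen (the helper name is made up):
#include <list>
#include <string>

// empty() is constant time for every standard container, whereas size() was
// allowed to be O(n) for std::list before C++11, so the generic form below
// stays efficient whichever container the caller picks.
template <class Container>
bool has_elements(const Container& c) {
    return !c.empty();
}

// e.g. has_elements(std::string("abc")) and has_elements(std::list<int>{1, 2})
// both work without caring about the container's size() complexity.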
Possible performance difference
While the Standard requires functional equivalence, some implementations might implement them differently, though I've struggled and so far failed to document a particularly plausible reason for doing so.
C++11 has more constraints than C++03 on the behaviour of other functions that impact implementation choices: data() must be NUL-terminated (that used to be required only of c_str()), and indexing with [size()] is now valid and must return a reference to a NUL character. For various subtle reasons, these restrictions make it even more likely that empty() will be no faster than size().
Anyway - measure if you have to care.

GLM + STL: operator == missing

I'm trying to use GLM vector classes in STL containers. No big deal as long as I don't try to use <algorithm>. Many algorithms rely on operator==, which is not implemented for the GLM classes.
Does anyone know an easy way to work around this, without (re-)implementing the STL algorithms? :(
GLM is a great math library implementing GLSL functions in C++.
Update
I just found out that GLM actually implements comparison operators in an extension (here). But how do I use them in the STL?
Update 2
This question has been superseded by this one: how to use glm's operator== in stl algorithms?
Many STL algorithms accept a functor for object comparison (of course, you need to exercise special care when comparing two vectors containing floating point values for equality).
Example:
To sort a std::vector<glm::vec3> (it's up to you whether sorting vectors that way makes any practical sense - and note that std::sort needs random-access iterators, so for a std::list you would use its sort member function instead), you could use
std::sort(myVec3Vector.begin(), myVec3Vector.end(), MyVec3ComparisonFunc)
with
// Lexicographic comparison. Note that std::sort requires a strict weak
// ordering, which a component-wise && of operator< would not provide.
bool MyVec3ComparisonFunc(const glm::vec3 &vecA, const glm::vec3 &vecB)
{
    if (vecA[0] != vecB[0]) return vecA[0] < vecB[0];
    if (vecA[1] != vecB[1]) return vecA[1] < vecB[1];
    return vecA[2] < vecB[2];
}
So, thankfully, there is no need to modify GLM or even reinvent the wheel.
You should be able to implement an operator== as a stand-alone function:
// (Actually more Greg S's code than mine.....)
#include <cmath>
#include <glm/glm.hpp>

bool operator==(const glm::vec3 &vecA, const glm::vec3 &vecB)
{
    const double epsilon = 0.0001; // choose something appropriate.
    return std::fabs(vecA[0] - vecB[0]) < epsilon
        && std::fabs(vecA[1] - vecB[1]) < epsilon
        && std::fabs(vecA[2] - vecB[2]) < epsilon;
}
James Curran and Greg S have already shown you the two major approaches to solving the problem.
define a functor to be used explicitly in the STL algorithms that need it, or
define the actual operators == and < which STL algorithms use if no functor is specified.
Both solutions are perfectly fine and idiomatic, but a thing to remember when defining operators is that they effectively extend the type. Once you've defined operator< for a glm::vec3, these vectors are extended to define a "less than" relationship, which means that any time someone wants to test if one vector is "less than" another, they'll use your operator. So operators should only be used if they're universally applicable. If this is always the one and only way to define a less than relationship between 3D vectors, go ahead and make it an operator.
The problem is, it probably isn't. We could order vectors in several different ways, and none of them is obviously the "right one". For example, you might order vectors by length. Or by magnitude of the x component specifically, ignoring the y and z ones. Or you could define some relationship using all three components (say, if a.x == b.x, check the y coordinates. If those are equal, check the z coordinates)
There is no obvious way to define whether one vector is "less than" another, so an operator is probably a bad way to go.
For equality, an operator might work better. We do have a single definition of equality for vectors: two vectors are equal if every component is equal.
The only problem here is that the vectors consist of floating point values, so you may want to do some kind of epsilon comparison, making them equal if all members are nearly equal. But then you may also want the epsilon to be variable, and that can't be done in operator==, as it only takes two parameters.
Of course, operator== could just use some kind of default epsilon value, and functors could be defined for comparisons with variable epsilons.
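For example, a sketch of such a functor with a configurable epsilon (the type name and default value are made up, not part of GLM):
#include <cmath>
#include <glm/glm.hpp>

// Binary predicate with a configurable tolerance; usable wherever an
// algorithm accepts a comparison functor.
struct Vec3NearlyEqual {
    float epsilon = 0.0001f;
    bool operator()(const glm::vec3& a, const glm::vec3& b) const {
        return std::fabs(a.x - b.x) < epsilon
            && std::fabs(a.y - b.y) < epsilon
            && std::fabs(a.z - b.z) < epsilon;
    }
};

// e.g. std::equal(v1.begin(), v1.end(), v2.begin(), Vec3NearlyEqual{0.001f});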
There's no clear cut answer on which to prefer. Both techniques are valid. Just pick the one that best fits your needs.