The standard library defines weak ordering, partial ordering and strong ordering. These types cover 3 of the 4 combinations of implying/not implying substitutability (a == b implies f(a) == f(b), where f() reads comparison-related state) and allowing/disallowing incomparable values (a < b, a == b and a > b may all be false).
I have a situation where my three-way (spaceship) operator has the semantics that equality implies substitutability, but some values may be incomparable. I could return a weak_ordering from my three-way comparison, but that fails to convey my type's semantics. I could also define my own ordering type, but I am reluctant to do that without understanding why such a category was omitted.
I believe the standard library's orderings correspond to the mathematical notions of weak ordering, partial ordering and total ordering. However, those are defined without the notion of substitutability. Is there a mathematical ordering equivalent to the one I am describing? Is there a reason why it was omitted from the standard?
It's not entirely clear from the original paper why the possibility of a "strong partial order" wasn't included. It already had comparison categories that were themselves incomparable (e.g., weak_ordering and strong_equality), so it could presumably have included another such pair.
One possibility is that the mathematical notion of a partial order requires antisymmetry: no two distinct elements a and b satisfy both a ≤ b and b ≤ a (so that the graph induced by ≤ has no non-trivial cycles). People therefore don't usually talk about what "equivalent" means in a partial order. Relaxing that restriction produces a preorder; equivalently, one can talk about a preorder as a partial order on a set of equivalence classes, projected onto their elements. That's truly what std::partial_ordering means if we consider there to exist multiple values that compare equivalent.
Of course, the easy morphism between preorders and partial orders limits the significance of the restriction on the latter; "distinct" can always be taken to mean "not in the same equivalence class" so long as you equally apply the equivalence-class projection to any coordinate questions of (say) cardinality. In the same fashion, the standard's notion of substitutability is not very useful: it makes very little sense to say that a std::weak_ordering operator<=> produces equivalent for two values that differ "where f() reads comparison-related state" since if the state were comparison-related operator<=> would distinguish them.
The practical answer is therefore to largely ignore std::weak_ordering and consider your choices to be total/partial ordering—in each case defined on the result of some implicit projection onto equivalence classes. (The standard talks about a "strict weak order" in the same fashion, supposing that there is some "true identity" that equivalent objects might lack but which is irrelevant anyway. Another term for such a relation is a "total preorder", so you could also consider the two meaningful comparison categories to simply be preorders that might or might not be total. That makes for a pleasingly concise categorization—"<=> defines preorders; is yours total?"—but would probably lose many readers.)
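For a concrete picture of the situation the question describes, here is a sketch of my own (not taken from the question): a bitmask ordered by subset inclusion. Equal values are genuinely substitutable, yet distinct masks can be incomparable, so std::partial_ordering is the closest standard category even though it undersells the equality guarantee.

#include <compare>

struct Flags {
    unsigned bits;

    friend bool operator==(Flags a, Flags b) { return a.bits == b.bits; }

    friend std::partial_ordering operator<=>(Flags a, Flags b) {
        if (a.bits == b.bits)            return std::partial_ordering::equivalent;
        if ((a.bits & b.bits) == a.bits) return std::partial_ordering::less;     // a is a proper subset of b
        if ((a.bits & b.bits) == b.bits) return std::partial_ordering::greater;  // b is a proper subset of a
        return std::partial_ordering::unordered;                                 // e.g. 0b01 vs 0b10
    }
};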
I've checked all major compilers, and sizeof(std::tuple<int, char, int, char>) is 16 for all of them. Presumably they just put elements in order into the tuple, so some space is wasted because of alignment.
If tuple stored elements internally like: int, int, char, char, then its sizeof could be 12.
Is it possible for an implementation to do this, or is it forbidden by some rule in the standard?
std::tuple sizeof, is it a missed optimization?
Yep.
Is it possible for an implementation to do this[?]
Yep.
[Is] it forbidden by some rule in the standard?
Nope!
Reading through [tuple], there is no constraint placed upon the implementation to store the members in template-argument order.
In fact, every passage I can find seems to go to lengths to avoid making any reference to member-declaration order at all: get<N>() is used in the description of operational semantics. Other wording is stated in terms of "elements" rather than "members", which seems like quite a deliberate abstraction.
Indeed, some implementations do apparently store the members in reverse order, probably simply due to the way they use inheritance recursively to unpack the template arguments (and because, as above, they're permitted to).
Speaking specifically about your hypothetical optimisation, though, I'm not aware of any implementation that doesn't store elements in [some trivial function of] the user-given order; I'm guessing that it would be "hard" to come up with such an order and to provide the machinery for std::get, at least as compared to the amount of gain you'd get from doing so. If you are really concerned about padding, you may of course choose your element order carefully to avoid it (on some given platform), much as you would with a class (without delving into the world of "packed" attributes). (A "packed" tuple could be an interesting proposal…)
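To make the padding concrete (typical figures for a platform where int is 4 bytes with 4-byte alignment; the exact numbers are implementation-dependent): interleaving ints and chars forces three bytes of padding after each char, while grouping the ints together leaves only a two-byte tail.

#include <cstdio>
#include <tuple>

int main() {
    // Commonly 16: int(4) + char(1) + pad(3) + int(4) + char(1) + pad(3)
    std::printf("%zu\n", sizeof(std::tuple<int, char, int, char>));
    // Commonly 12: int(4) + int(4) + char(1) + char(1) + pad(2)
    std::printf("%zu\n", sizeof(std::tuple<int, int, char, char>));
}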
Yes, it's possible and has been (mostly) done by R. Martinho Fernandes. He used to have a blog called Flaming Danger Zone, which is now down for some reason, but its sources are still available on github.
Here are all four parts of the Size Matters series on this exact topic: 1, 2, 3, 4.
You might wish to view them raw, since GitHub doesn't understand the C++ highlighting markup used and renders the code snippets as unreadable one-liners.
He essentially computes a permutation of the tuple indices via a C++11 template metaprogram that sorts the elements by alignment in non-ascending order, stores the elements according to that permutation, and then applies it to the index on every access.
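A rough sketch of the core idea in more modern C++ (my own illustration, requiring C++17; it is not Fernandes' C++11 metaprogram): compute at compile time the order in which elements should be stored by sorting indices by alignment, descending. A packed tuple would then store its members in that order and route every get-style access through the inverse permutation.

#include <array>
#include <cstddef>

// Indices of Ts... reordered so that the most strictly aligned elements come first.
template <typename... Ts>
constexpr std::array<std::size_t, sizeof...(Ts)> storage_order() {
    constexpr std::size_t n = sizeof...(Ts);
    std::array<std::size_t, n> align{alignof(Ts)...};
    std::array<std::size_t, n> order{};
    for (std::size_t i = 0; i < n; ++i) order[i] = i;
    for (std::size_t i = 0; i < n; ++i)            // simple selection sort, largest alignment first
        for (std::size_t j = i + 1; j < n; ++j)
            if (align[order[j]] > align[order[i]]) {
                std::size_t tmp = order[i];
                order[i] = order[j];
                order[j] = tmp;
            }
    return order;
}

// On a typical platform, storage_order<int, char, int, char>() yields {0, 2, 1, 3}:
// store both ints first, then the two chars, so no interior padding is needed.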
They could. One possible reason they don't: some architectures, including x86, have an addressing mode that can compute base + size × index in a single instruction, but only when size is a power of 2. Or it might be slightly faster to do a load or store aligned to a 16-byte boundary. This could make code that indexes into arrays of std::tuple slightly faster and more compact if four padding bytes round the size up to 16.
Herb Sutter, in his proposal for the "spaceship" operator (section 2.2.2, bottom of page 12), says:
Basing everything on <=> and its return type: This model has major advantages, some unique to this proposal compared to previous proposals for C++ and the capabilities of other languages:
[...]
(6) Efficiency, including finally achieving zero-overhead abstraction for comparisons: The vast majority of comparisons are always single-pass. The only exception is generated <= and >= in the case of types that support both partial ordering and equality. For <, single-pass is essential to achieve the zero-overhead principle to avoid repeating equality comparisons, such as for struct Employee { string name; /*more members*/ }; used in struct Outer { Employee e; /*more members*/ }; – today's comparisons violate zero-overhead abstraction because operator< on Outer performs redundant equality comparisons, because it performs if (e != that.e) return e < that.e; which traverses the equal prefix of e.name twice (and if the name is equal, traverses the equal prefixes of other members of Employee twice as well), and this cannot be optimized away in general. As Kamiński notes, zero-overhead abstraction is a pillar of C++, and achieving it for comparisons for the first time is a significant advantage of this design based on <=>.
But then he gives this example (section 1.4.5, page 6):
class PersonInFamilyTree { // ...
public:
    std::partial_ordering operator<=>(const PersonInFamilyTree& that) const {
        if (this->is_the_same_person_as(that))   return partial_ordering::equivalent;
        if (this->is_transitive_child_of(that))  return partial_ordering::less;
        if (that.is_transitive_child_of(*this))  return partial_ordering::greater;
        return partial_ordering::unordered;
    }
    // ... other functions, but no other comparisons ...
};
Would defining operator>(a,b) as a<=>b > 0 not lead to large overhead (though of a different form than the one he discusses)? That code would first test for equality, then for less, and only then for greater, rather than directly testing for greater.
Am I missing something here?
Would defining operator>(a,b) as a<=>b > 0 not lead to large overhead?
It would lead to some overhead. The magnitude of the overhead is relative, though: in situations where the cost of running comparisons is negligible relative to the rest of the program, reducing code duplication by implementing one operator instead of five may be an acceptable trade-off.
However, the proposal does not suggest removing other comparison operators in favor of <=>: if you want to overload other comparison operators, you are free to do it:
Be general: Don’t restrict what is inherent. Don’t arbitrarily restrict a complete set of uses. Avoid special cases and partial features. – For example, this paper supports all seven comparison operators and operations, including adding three-way comparison via <=>. It also supports all five major comparison categories, including partial orders.
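For example, for the PersonInFamilyTree class above, one could keep <=> for generality and additionally provide a direct operator> that only runs the one test it needs. A sketch, assuming the member functions from the quoted example are accessible to a non-member overload:

bool operator>(const PersonInFamilyTree& a, const PersonInFamilyTree& b) {
    // only the single tree search that > actually needs, instead of the
    // equality and child-of checks that a <=> b > 0 would run through first
    return b.is_transitive_child_of(a);
}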
For some definition of large. There is overhead because in a partial ordering, a == b iff a <= b and b <= a. The complexity would be the same as a topological sort, O(V+E). Of course, the modern C++ approach is to write safe, clean and readable code first and optimize afterwards. You can choose to implement the spaceship operator first, then specialize the individual operators once you have identified performance bottlenecks.
Generally speaking, overloading <=> makes sense when you're dealing with a type where doing all comparisons at once is either only trivially more expensive or has the same cost as comparing them differently.
With strings, <=> seems more expensive than a straight == test, since you have to subtract each pair of characters. However, since you already had to load each pair of characters anyway, adding a subtraction on top of that is a trivial expense. Indeed, comparing two numbers for equality is sometimes implemented by compilers as a subtraction and a test against zero. And even for compilers that don't do that, the subtract-and-compare-against-zero is probably not significantly less efficient.
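A rough illustration of the point (my own example): once both characters are in hand, a three-way comparison is essentially one subtraction, and that same subtraction also answers the equality question.

int cmp_char(unsigned char a, unsigned char b) {
    return int(a) - int(b);   // < 0, 0, or > 0: one subtraction answers <, == and > at once
}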
So for basic types like that, you're more or less fine.
When you're dealing with something like tree ordering, you really need to know up-front which operation you care about. If all you asked for was ==, you really don't want to have to search the rest of the tree just to know that they're unequal.
But personally... I would never implement something like tree ordering with comparison operators to begin with. Why? Because I think that comparisons like that ought to be logically fast operations. Whereas a tree search is such a slow operation that you really don't want to do it by accident or at any time other than when it is absolutely necessary.
Just look at this case. What does it really mean to say that a person in a family tree is "less than" another? It means that one is a child of the other. Wouldn't it be more readable in code to just ask that question directly with is_transitive_child_of?
The more complex your comparison logic is, the less likely it is that what you're doing is really a "comparison". That there's probably some textual description that this "comparison" operation could be called that would be more readable.
Oh sure, such a class could have a function that returns a partial_ordering representing the relationship between the two objects. But I wouldn't call that function operator<=>.
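Something like this sketch, reusing the hypothetical member functions from the quoted example but under a descriptive name:

#include <compare>

std::partial_ordering family_relationship(const PersonInFamilyTree& a,
                                           const PersonInFamilyTree& b) {
    // same logic as the operator<=> above, but the call site now says what is being computed
    if (a.is_the_same_person_as(b))  return std::partial_ordering::equivalent;
    if (a.is_transitive_child_of(b)) return std::partial_ordering::less;
    if (b.is_transitive_child_of(a)) return std::partial_ordering::greater;
    return std::partial_ordering::unordered;
}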
But in any case, is <=> a zero-overhead abstraction of comparison? No; you can construct cases where it costs significantly more to compute the ordering than it does to detect the specific relation you asked for. But personally if that's the case, there's a good chance that you shouldn't be comparing such types through operators at all.
It seems to me that for any composite object, it is inefficient to implement operator!= in terms of operator==.
For an object with N subobjects to compare, operator== must always perform every one of those comparisons: 2 objects are equal if and only if every sub-object is equal. (Ignoring irrelevant sub-objects for this discussion.)
Compare to operator!=, in which it is sufficient to check sub-objects only until a mismatch is found. To make operator!= delegate to operator== is therefore inefficient in the composite case, as it yields N comparisons in every case, instead of just the worst case.
This is posted as a new question after an overlong comment discussion started by this comment. The other commenter stated:
operator== does not always require N comparisons, it only requires N comparisons in the worst case, which is when at least all but the last subobjects are equal
I can't see how that can possibly be correct. Have I missed something?
This is of course far more general than purely C++, but since the discussion was specific to that I've kept the same tag.
Both == and != will stop evaluation at the first subobject not satisfying the condition. So, from a general perspective, they are equal.
== will evaluate to:
A==A0 && B==B0 && C==C0 && ....
and stop as soon as A==A0 or B==B0 or C==C0 or .... evaluates false.
!= will evaluate to:
A!=A0 || B!=B0 || C!=C0 || ...
and stop as soon as A!=A0 or B!=B0 or C!=C0 or .... evaluates true.
In specific cases, one may be better than the other but in general they are equal.
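A sketch of the composite case (the struct and its members are my own example): both operators bail out at the first mismatching member, so neither one inherently does more member comparisons than the other.

#include <string>

struct Composite {
    int a;
    std::string b;
    double c;
};

bool operator==(const Composite& x, const Composite& y) {
    // && short-circuits: evaluation stops at the first member that differs
    return x.a == y.a && x.b == y.b && x.c == y.c;
}

bool operator!=(const Composite& x, const Composite& y) {
    return !(x == y);   // same early exit as ==, just negated at the end
}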
Well, the same should hold for ==, shouldn't it?
As soon as the == operator finds two elements that are not equal, it can cease comparing the following elements, since the objects cannot be equal any more.
I was using a lambda function in the sort() function. In my lambda I return true if the two elements are equal. Then I got a segmentation fault.
After reviewing C++ Compare, it says
For all a, comp(a,a) == false
I don't understand why it must be false. Why can't I let comp(a,a)==true?
(Thanks in advance)
Think of Comp as some sort of "is smaller than" relationship, that is, it defines some kind of ordering on a set of data.
Now you probably want to do some stuff with this relationship, like sorting data in increasing order, binary search in sorted data, etc.
There are many algorithms that do stuff like this very fast, but they usually have the requirement that the ordering they deal with is "reasonable", which was formalized with the term Strict weak ordering. It is defined by the rules in the link you gave, and the first one basically means:
"No element shall be smaller than itself."
This is indeed reasonable to assume, and one of the things our algorithms require.
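A minimal sketch of what likely went wrong (the container and element type here are assumptions, since the question doesn't show the code): a comparator using <= reports comp(a, a) == true, which violates the strict weak ordering requirement and can send std::sort past the end of the range; the fix is to compare with <.

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v{3, 1, 2, 2};

    // Broken: comp(a, a) is true, so this is not a strict weak ordering,
    // and std::sort may read past the end of the range and crash.
    // std::sort(v.begin(), v.end(), [](int a, int b) { return a <= b; });

    // Correct: an irreflexive "strictly less than" comparison.
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
}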
In the locale object there is a collate facet.
The collate facet has a hash method that returns a long.
http://www.cplusplus.com/reference/std/locale/collate/hash/
Two questions:
Does anybody know what hashing method is used?
I need a 32bit value.
If my long is longer than 32 bits, does anybody know about techniques for folding the hash into a shorter version? I can see that, if done incorrectly, folding could generate lots of clashes (and though I can cope with clashes, as I need to take that into account anyway, I would prefer that they were minimized).
Note:
I can't use C++0x features
Boost may be OK.
No, nobody really knows -- it can vary from one implementation to another. The primary requirements are (N3092, §20.8.15):
For all object types Key for which there exists a specialization hash<Key>, the instantiation hash<Key> shall:
satisfy the Hash requirements (20.2.4), with Key as the function call argument type, the DefaultConstructible requirements (33), the CopyAssignable requirements (37),
be swappable (20.2.2) for lvalues,
provide two nested types result_type and argument_type which shall be synonyms for size_t and Key, respectively,
satisfy the requirement that if k1 == k2 is true, h(k1) == h(k2) is also true, where h is an object of type hash and k1 and k2 are objects of type Key.
and (N3092, §20.2.4):
A type H meets the Hash requirements if:
it is a function object type (20.8),
it satisfies the requirements of CopyConstructible and Destructible (20.2.1),
the expressions shown in the following table are valid and have the indicated semantics, and
it satisfies all other requirements in this subclause.
§20.8.15 covers the requirements on the result of hashing, §20.2.4 on the hash itself. As you can see, however, both are pretty general. The table that's mentioned basically covers three more requirements:
A hash function must be "pure" (i.e., the result depends only on the input, not any context, history, etc.)
The function must not modify the argument that's passed to it, and
It must not throw any exceptions.
Exact algorithms definitely are not specified though -- and despite the length, most of the requirements above are really just stating requirements that (at least to me) seem pretty obvious. In short, the implementation is free to implement hashing nearly any way it wants to.
If the implementation uses a reasonable hash function, there should be no bits in the hash value that have any special correlation with the input. So if the hash function gives you 64 "random" bits, but you only want 32 of them, you can just take the first/last/... 32 bits of the value as you please. Which ones you take doesn't matter since every bit is as random as the next one (that's what makes a good hash function).
So the simplest and yet completely reasonable way to get a 32-bit hash value would be:
int32_t value = hash(...);
(Of course this collapses groups of 4 billion values down to one, which looks like a lot, but that can't be avoided if there are four billion times as many source values as target values.)
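If simply truncating feels too lossy, one common alternative (my own sketch, not part of the answer above) is to XOR-fold the high half into the low half, so that every input bit still influences the result. This sketch avoids C++0x features and assumes unsigned int is 32 bits wide:

unsigned int fold32(unsigned long h) {
    if (sizeof(unsigned long) > 4)     // 64-bit long: mix the high half into the low half
        h ^= (h >> 16) >> 16;          // two shifts avoid shifting by the full width of a 32-bit long
    return static_cast<unsigned int>(h & 0xFFFFFFFFUL);
}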