less or less_equal using set - c++

We can pass a function as <(less) operator to STL data structures such as set, multiset, map, priority_queue, ...
Is there a problem if our function acts like <=(less_equal)?

Yes, there is a problem.
Formally, the comparison function must define a strict weak ordering, and <= does not do that.
More specifically, < is also used to determine equivalence (x and y are equivalent iff !(x < y) && !(y < x)). This does not hold for <=: using that operator would make your set believe that objects are never equivalent.

From Effective STL -> Item 21. Always have comparison functions return false for equal
values.
Create a set where less_equal is the comparison type, then insert 10 into the set:
set<int, less_equal<int> > s; // s is sorted by "<="
s.insert(10); //insert the value 10
Now try inserting 10 again:
s.insert(10);
For this call to insert, the set has to figure out whether 10 is already present. We know
that it is, but the set is dumb as toast, so it has to check. To make it easier to
understand what happens when the set does this, we'll call the 10 that was initially
inserted 10A and the 10 that we're trying to insert 10B. The set runs through its internal data structures looking for the place to insert 10B. It ultimately has to check 10B to see if it's the same as 10A. The definition of "the same"
for associative containers is equivalence, so the set tests to see whether
10B is equivalent to 10A. When performing this test, it naturally uses the set's
comparison function. In this example, that's operator<=, because we specified
less_equal as the set's comparison function, and less_equal means operator<=. The set
thus checks to see whether this expression is true:
!(10A <= 10B) && !(10B <= 10A) // test 10A and 10B for equivalence
Well, 10A and 10B are both 10, so it's clearly true that 10A <= 10B. Equally clearly, 10B
<= 10A. The above expression thus simplifies to
!(true) && !(true)
and that simplifies to
false && false
which is simply false. That is, the set concludes that 10A and 10B are not equivalent,
hence not the same, and it thus goes about inserting 10B into the container alongside
10A. Technically, this action yields undefined behavior, but the nearly universal
outcome is that the set ends up with two copies of the value 10, and that means it's not
a set any longer. By using less_equal as our comparison type, we've corrupted the
container! Furthermore, any comparison function where equal values return true will
do the same thing. Equal values are, by definition, not equivalent!

There is indeed a problem.
The comparison function should satisfy strict weak ordering which <= does not.

Related

Stroustrup on comparisons, errata?

In Stroustrup's C++ 4th Ed., page 891, where comparison properties are described, he explains that the function cmp can be represented by less than (<) for a strict weak ordering. I'm confused by his explanation of "Transitivity of equivalence", which reads as follows:
Transitivity of equivalence: Define equiv(x,y) to be
!(cmp(x,y)||cmp(y,x)). If equiv(x,y) and equiv(y,z), then equiv(x,z).
The last rule is the one that allows us to define equality (x==y) as !(cmp(x,y)||cmp(y,x)) if we need ==.
Should this instead be defined as follows?
cmp is <= and equiv(x,y) = (cmp(x,y) && cmp(y,x))
Appreciate your guidance.
This is not an erratum. Take cmp to be < and check that every x is equivalent to itself:
equiv(x,y) := !(cmp(x,y)||cmp(y,x))
Substituting y := x gives:
!((x < x) || (x < x))
!((false) || (false))
!(false)
true
Can this instead be defined as follows?
cmp is <= and equiv(x,y) = (cmp(x,y) && cmp(y,x))
Yes, that also gives you consistent definitions.
Should this instead be defined as follows?
It isn't better than the definitions we use, so I'd suggest no, mostly because there's loads of existing code written for the current definition.
Instead of the current definition of Compare
Compare is a set of requirements expected by some of the standard library facilities from the user-provided function object types.
The return value of the function call operation applied to an object of a type satisfying Compare, when contextually converted to bool, yields true if the first argument of the call appears before the second in the strict weak ordering relation induced by this type, and false otherwise.
For all a, cmp(a,a)==false
If cmp(a,b)==true then cmp(b,a)==false
If cmp(a,b)==true and cmp(b,c)==true then cmp(a,c)==true
It would instead be
The return value of the function call operation applied to an object of a type satisfying Compare, when contextually converted to bool, yields false if the first argument of the call appears after the second in the strict weak ordering relation induced by this type, and true otherwise.
For all a, cmp(a,a)==true
If cmp(a,b)==false then cmp(b,a)==true
If cmp(a,b)==true and cmp(b,c)==true then cmp(a,c)==true

Does greater operator ">" satisfy strict weak ordering?

Definition:
Let < be a binary relation where a < b means "a is less than b".
Let > be a binary relation where a > b means "a is greater than b".
So, we assume < and > have their usual everyday meanings. Although some programming languages (e.g. C++) let us overload them to give them different definitions, we will not consider that here.
Context:
As far as I can tell from the mathematical definition of strict weak ordering (e.g. on Wikipedia), both < and > satisfy it. However, all the examples I have seen on various websites refer only to <. There is even a website which says
what they roughly mean is that a Strict Weak Ordering has to behave the way that "less than" behaves: if a is less than b then b is not less than a, if a is less than b and b is less than c then a is less than c, and so on.
Also, in N4140 (C++14 International Standard), strict weak ordering is defined as
(§25.4-4) If we define equiv(a, b) as !comp(a, b) && !comp(b, a), then the requirements are that comp and equiv both be transitive relations
where comp is defined as
(§25.4-2) Compare is a function object type (20.9). The return value of the function call operation applied to an object of type Compare, when contextually converted to bool (Clause 4), yields true if the first argument of the call is less than the second, and false otherwise. Compare comp is used throughout for algorithms
assuming an ordering relation.
Question:
Does ">" satisfy strict weak ordering? I expect so, but have no confidence.
Does greater operator “>” satisfy strict weak ordering?
The mathematical strict greater than relation is a strict weak ordering.
As for the operator in the C++ language: for all integer types, yes. In general, no, but in most cases yes. The same applies to the strict less-than operator.
As for the confusing quote, "is less than" in that context is intended to convey that the end result of the sort operation is a non-decreasing sequence, i.e. objects are "less than" or equal to the objects after them. If std::greater is used as the comparison object, then greater values are "lesser" in that order.
This may be confusing, but is not intended to exclude strict greater than operator.
what is the case where > doesn't satisfy strict weak ordering?
Some examples:
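Overloaded operators that don't satisfy the properties.
> on pointers that do not point into the same array has an unspecified result.
> does not satisfy the transitivity-of-equivalence requirement for floating point types in IEEE-754 representation unless NaNs are excluded from the domain: a NaN is incomparable with every value, so it would be equivalent to both 1.0 and 2.0, which are not equivalent to each other.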
Even if the standard refers to "less than" for arbitrary Compare functions, that only implies "less than" in the context of the ordering.
If I define an ordering by comparison function [](int a, int b) { return a > b; }, then an element is "less than" another in this ordering if its integer value is greater. That's because the ordering I've created is an ordering of the integers in reverse order. You shouldn't read < as "less than" in orderings. You should read it as "comes before".
Whenever x < y is a strict weak ordering then x > y is also a strict weak ordering, just with the reverse order.

What does "compares less than 0" mean?

Context
While I was reading Consistent comparison, I have noticed a peculiar usage of the verb to compare:
There’s a new three-way comparison operator, <=>. The expression a <=> b
returns an object that compares <0 if a < b, compares >0 if a > b, and
compares ==0 if a and b are equal/equivalent.
Another example found on the internet (emphasis mine):
It returns a value that compares less than zero on failure. Otherwise,
the returned value can be used as the first argument on a later call
to get.
One last example, found on GitHub (emphasis mine):
// Perform a circular 16 bit compare.
// If the distance between the two numbers is larger than 32767,
// and the numbers are larger than 32768, subtract 65536
// Thus, 65535 compares less than 0, but greater than 65534
// This handles the 65535->0 wrap around case correctly
Of course, for experienced programmers the meaning is clear. But the way the verb to compare is used in these examples is not standard in any standardized forms of English.
Questions*
How does the programming jargon sentence "The object compares less than zero" translate into plain English?
Does it mean that if the object is compared with 0 the result will be "less than zero"?
Why would it be wrong to say "object is less than zero" instead of "object compares less than zero"?
* I asked for help on English Language Learners and English Language & Usage.
"compares <0" in plain English is "compares less than zero".
This is a common shorthand, I believe.
So to apply this onto the entire sentence gives:
The expression a <=> b returns an object that compares less than zero
if a is less than b, compares greater than zero if a is greater than
b, and compares equal to zero if a and b are equal/equivalent.
Which is quite a mouthful. I can see why the authors would choose to use symbols.
What I am interested in, more exactly, is an equivalent expression of "compares <0". Does "compares <0" mean "evaluates to a negative number"?
First, we need to understand the difference between what you quoted and actual wording for the standard. What you quoted was just an explanation for what would actually get put into the standard.
The standard wording in P0515 for the language feature operator<=> is that it returns one of 5 possible types. Those types are defined by the library wording in P0768.
Those types are not integers. Or even enumerations. They are class types. Which means they have exactly and only the operations that the library defines for them. And the library wording is very specific about them:
The comparison category types’ relational and equality friend functions are specified with an anonymous parameter of unspecified type. This type shall be selected by the implementation such that these parameters can accept literal 0
as a corresponding argument. [Example: nullptr_t satisfies this requirement. —
end example] In this context, the behaviour of a program that supplies an argument other than a literal 0 is undefined.
Therefore, Herb's text is translated directly into standard wording: it compares less than 0. No more, no less. Not "is a negative number"; it's a value type where the only thing you can do with it is comparing it to zero.
It's important to note how Herb's descriptive text "compares less than 0" translates to the actual standard text. The standard text in P0515 makes it clear that the result of 1 <=> 2 is strong_ordering::less. And the standard text in P0768 tells us that strong_ordering::less < 0 is true.
But it also tells us that all other comparisons are the functional equivalent of the descriptive phrase "compares less than 0".
For example, if -1 "compares less than 0", then that would also imply that it does not compare equal to zero. And that it does not compare greater than 0. It also implies that 0 does not compare less than -1. And so on.
P0768 tells us that the relationship between strong_ordering::less and the literal 0 fits all of the implications of the words "compares less than 0".
"a compares less than zero" means that a < 0 is true.
"a compares == 0" means that a == 0 is true.
The other expressions I'm sure make sense now right?
Yes, an "object compares less than 0" means that object < 0 will yield true. Likewise, compares equal to 0 means object == 0 will yield true, and compares greater than 0 means object > 0 will yield true.
As to why he doesn't use the phrase "is less than 0", I'd guess it's to emphasize that this is all that's guaranteed. For example, this could be essentially any arbitrary type, including one that doesn't really represent an actual value, but instead only supports comparison with 0.
Just, for example, let's consider a type something like this:
class comparison_result {
    enum { LT, GT, EQ } res;
    // comparison_result < 0
    template <class Integer>
    friend bool operator<(comparison_result c, Integer) { return c.res == LT; }
    // 0 < comparison_result
    template <class Integer>
    friend bool operator<(Integer, comparison_result c) { return c.res == GT; }
    // and similarly for `>` and `==`
};
(For the moment, let's assume the friend template declarations are all legit--I think you get the basic idea, anyway.)
This doesn't really represent a value at all. It just represents the result of "if compared to 0, should the result be less than, equal to, or greater than". As such, it's not that it is less than 0, only that it produces true or false when compared to 0 (but produces the same results when compared to another value).
As to whether <0 being true means that >0 and ==0 must be false (and vice versa): there is no such restriction on the return type for the operator itself. The language doesn't even include a way to specify or enforce such a requirement. There's nothing in the spec to prevent them from all returning true. Returning true for all the comparisons is possible and seems to be allowed, but it's probably pretty far-fetched.
Returning false for all of them is entirely reasonable though--just, for example, any and all comparisons with floating point NaNs should normally return false. NaN means "Not a Number", and something that's not a number isn't less than, equal to or greater than a number. The two are incomparable, so in every case, the answer is (quite rightly) false.
I think the other answers so far have answered mostly what the result of the operation is, and that should be clear by now. @VTT's answer explains it best, IMO.
However, so far none have answered the English language behind it.
"The object compares less than zero." is simply not standard English, at best it is jargon or slang. Which makes it all the more confusing for non-native speakers.
An equivalent would be:
A comparison of the object using <0 (less than zero) always returns true.
That's quite lengthy, so I can understand why a "shortcut" was created:
The object compares less than zero.
It means that the expression will return an object that can be compared to <0 or >0 or ==0.
If a and b are integers, then the expression evaluates to a negative value (probably -1) if a is less than b.
The expression evaluates to 0 if a==b
And the expression will evaluate to a positive value (probably 1) if a is greater than b.

how is cmp defined in c++? with < or with <=?

I was wondering how the cmp function in std::sort and std::is_sorted is defined.
Here are two references for is_sorted_until that say it should be operator<:
en.cppreference.com
cplusplus.com
But I think there should be a problem with equal elements.
The list {1,1,1} should not be sorted because 1<1==false.
But there is an example which says:
...
int *sorted_end = std::is_sorted_until(nums, nums + N);
...
1 1 4 9 5 3 : 4 initial sorted elements
But that should return 1 if < is used as documented.
It would work with <=, but that is not the way it is documented.
I'm really confused.
The comparison is required to define a strict weak ordering. A strict weak ordering defines a set of equivalence classes from the incomparability relation, i.e., if x < y is false, and y < x is false too (i.e. x and y cannot be compared with <), x and y are considered equivalent. These equivalence classes have a total order, and that's the total order resulting from the sort functions.
In the example given, {1,1,1} has only a single equivalence class, the one composed of {1,1,1}.
is_sorted_until finds the first element x[i] for which x[i] < x[i-1] is true.
To be exact, it's neither < nor <=; it defaults to std::less. That in turn calls < for most types, except where it is specialized. For example, < on pointers does not in general give a strict total order, while std::less does.
It does indeed use operator< unless you provide a custom comparison. But the definition of "sorted" is not a[n] < a[n+1] (which we might call "strictly sorted"), but !(a[n+1] < a[n]); so equal elements are considered sorted. This is equivalent to using <=, but (in common with all other standard algorithms) doesn't require that operator to be defined.
In general, all ordered comparisons must define a "strict weak ordering". "Strict" means that the comparison must be false for equivalent objects; so < is valid, while <= is not.
If you look at the example implementation, < is used for checking if the next element is less than the previous one:
if (*next < *first)
return next;
If it is, then the order is broken, and the function returns. I.e. the logic is reversed: the algorithm does not terminate if the next element is equal to the previous.

Are the integer comparison operators short circuited in C++?

Like the title states, are the integer (or any numerical datatypes like float etc.) comparison operators (==, !=, >, >=, <, <=) short circuited in C++?
They can't short circuit. To know if x == y, x != y, etc are true or false you need to evaluate both, x and y. Short circuiting refers to logical boolean operators && and ||. Logical AND is known to be false if the first argument is false and Logical OR is known to be true if the first argument is true. In these cases you don't need to evaluate the second argument, this is called short circuiting.
Edit: this follows the discussion of why x >= y doesn't short-circuit when the operands are unsigned ints and x is zero:
For logical operands short circuiting comes for free and is implementation neutral. The machine code for if(f() && g()) stmt; is likely to look similar to this:
call f
test return value of f
jump to next on zero
call g
test return value of g
jump to next on zero
execute stmt
next: ...
To prevent short circuiting you actually need to do the computation of the result of the operator and test it after that. This takes you a register and makes the code less efficient.
For non-logical operators the situation is the opposite. Mandating short circuiting implies:
The compiler can't choose an evaluation of the expression that uses a minimum number of registers.
The semantics may be implementation defined (or even undefined) for many cases, like when comparing with maximum value.
The compiler needs to add an additional test/jump. For if(f() > g()) stmt; the machine code will look like this:
call f
mov return value of f to R1
test return value of f
jump to next on zero
call g
compare R1 with return value of g
jump to next on less-than-or-equal
execute stmt
next: ...
Note how the first test and jump are just unnecessary otherwise.
No. The comparison operators require both operands to evaluate the correct answer. By contrast, the logical operators && and || in some cases don't need to evaluate the right operand to get the right answer, and therefore do "short-circuit".
No, how could they be? In order to check whether 1 == 2 you have to inspect both the 1 and the 2. (Of course, a compiler can do a lot of reordering, static checking, optimizations, etc., but that's not inherent to C++.)
How would that work? Short-circuiting means you can avoid evaluating the RHS based solely on the result of evaluating the LHS.
e.g.
true || false
doesn't need to evaluate the RHS because true || x is true no matter what x turns out to be.
But this won't work for any of the comparisons that you list. For example:
5 == x
How can you ever know the result of the expression without knowing x?