When are two elements of an STL set considered identical? - c++

From cplusplus.com:
template < class Key, class Compare = less<Key>,
class Allocator = allocator<Key> > class set;
"Compare: Comparison class: A class that takes two arguments of the same type as the container elements and returns a bool. The expression comp(a,b), where comp is an object of this comparison class and a and b are elements of the container, shall return true if a is to be placed at an earlier position than b in a strict weak ordering operation. This can either be a class implementing a function call operator or a pointer to a function (see constructor for an example). This defaults to less, which returns the same as applying the less-than operator (a<b).
The set object uses this expression to determine the position of the elements in the container. All elements in a set container are ordered following this rule at all times."
Given that the comparison class is used to decide which of the two objects is "smaller" or "less", how does the class check whether two elements are equal (e.g. to prevent insertion of the same element twice)?
I can imagine two approaches here: one would be calling (a == b) in the background, but not providing the option to override this comparison (as with the default less<Key>) doesn't seem too STL-ish to me. The other would be the assumption that (a == b) == (!(a < b) && !(b < a)); that is, two elements are considered equal if neither is "less" than the other, but somehow this doesn't feel right to me either, considering that the comparison can be an arbitrarily complex bool functor between objects of an arbitrarily complex class.
So how is it really done?

Not an exact duplicate, but the first answer here answers your question
Your second guess as to the behaviour is correct

Associative containers in the standard library are defined in terms of equivalence of keys, not equality per se.
As not all set and map instances use less, but may use a generic comparison function, it's necessary to define equivalence in terms of this one comparison function rather than attempting to introduce a separate equality concept.
In general, two keys (k1 and k2) in an associative container using a comparison function comp are equivalent if and only if:
comp( k1, k2 ) == false && comp( k2, k1 ) == false
In a container using std::less for types that don't have a specific std::less specialization, this means the same as:
!(k1 < k2) && !(k2 < k1)
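The rule above can be tried out with a small sketch. ByAbs, equivalent and abs_set_size are hypothetical names made up for this illustration; the comparator orders ints by absolute value only, so 3 and -3 are equivalent keys even though they are not equal.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>

// Hypothetical comparator: orders ints by absolute value only.
struct ByAbs {
    bool operator()(int a, int b) const { return std::abs(a) < std::abs(b); }
};

// True when neither key is ordered before the other under comp.
template <class Comp, class K>
bool equivalent(const Comp& comp, const K& k1, const K& k2) {
    return !comp(k1, k2) && !comp(k2, k1);
}

// Inserts 3 and then -3; returns how many elements the set ends up with.
inline std::size_t abs_set_size() {
    std::set<int, ByAbs> s;
    s.insert(3);
    s.insert(-3);  // rejected: -3 is equivalent to 3 under ByAbs
    assert(equivalent(ByAbs{}, 3, -3));
    return s.size();
}
```

The set keeps a single element, because the second insert finds a key that is equivalent under the comparator, and operator== on int is never consulted.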

Your mistake is the assumption that "the comparison can be an arbitrarily complex bool functor". It can't.
std::set requires a strict weak ordering, which among other things guarantees that a<b implies !(b<a). This excludes most binary boolean functors. Because of that, we can talk about the relative position of a and b in that ordering. If a<b, a precedes b. If b<a, b precedes a. If neither a<b nor b<a, then a and b occupy the same position in the ordering and thus are equivalent.

Your second option is the right one. Why doesn't it feel right? What would you do if the equality test wasn't consistent with the equation you give?

Related

Why does std::sort work when the comparison function uses greater-than (>), but not greater-than-or-equal (>=)?

On Win32 with Visual Studio 2022, when I define a vector<int> containing one hundred 0s and sort it with the code below, an "invalid comparator" exception is thrown.
vector<int> v(100, 0);
sort(v.begin(), v.end(), [](const int& a, const int& b)
{
return a >= b;
});
However, if I use return a > b, it will execute well. Why is that?
This is just how it is required to work. You need strict weak ordering.
For the rationale, I believe that the sufficient explanation is that this enables you to determine whether those elements are equal (useful for e.g. std::sets). <= or >= can't do that.
<= or >= can also do that, but it seems like it was just decided to use < instead of any other relation. With this decision in mind, standard library facilities are implemented and they heavily rely on it.
The problem is that the comparator (a.k.a. comparison function) that you've provided does not implement a strict weak ordering, and thus it violates a precondition of std::sort, leading to undefined behavior.
From std::sort:
comp - comparison function object (i.e. an object that satisfies the requirements of Compare) which returns true if the first argument is less than (i.e. is ordered before) the second.
And from Compare:
The return value of the function call operation applied to an object of a type satisfying Compare, when contextually converted to bool, yields true if the first argument of the call appears before the second in the strict weak ordering relation induced by this type, and false otherwise.
This basically means that the comparator function Compare that we provide should not evaluate to true for both of the expressions: Compare(x, y) and Compare(y, x) where x and y are some arguments. Otherwise, the comparator does not obey the strict-weak-ordering.
To solve this you should replace the >= with >.
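As a rough illustration of that last point, the sketch below evaluates a comparator in both directions without ever handing the invalid one to std::sort. The helper names orders_both_ways, greater_equal_is_broken and sorted_descending are assumptions made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Returns true if comp claims both "x before y" and "y before x",
// which a strict weak ordering never does.
template <class Comp>
bool orders_both_ways(Comp comp, int x, int y) {
    return comp(x, y) && comp(y, x);
}

// Checks the >= lambda from the question on two equal values.
inline bool greater_equal_is_broken() {
    auto ge = [](int a, int b) { return a >= b; };
    return orders_both_ways(ge, 0, 0);  // 0 >= 0 holds in both directions
}

// Sorting with > (here via std::greater) satisfies the precondition.
inline std::vector<int> sorted_descending(std::vector<int> v) {
    std::sort(v.begin(), v.end(), std::greater<int>());
    assert(std::is_sorted(v.begin(), v.end(), std::greater<int>()));
    return v;
}
```

With one hundred equal elements every pair trips the both-ways condition under >=, which is the kind of inconsistency a debug-mode comparator check can detect.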

Basic std set logic

This may be a dull question, but I want to be sure.
Let's say I have a struct:
struct A
{
int number;
bool flag;
bool operator<(const A& other) const
{
return number < other.number;
}
};
Somewhere in code:
A a1, a2, a3;
std::set<A> set;
a1.flag = true;
a1.number = 0;
a2.flag = false;
a2.number = 10;
a3 = a1;
set.insert(a1);
set.insert(a2);
if(set.find(a3) == set.end())
{
printf("NOT FOUND");
}
else
{
printf("FOUND");
}
The output I get is "FOUND". I understand that, since I am passing values, elements in the set are compared by value. But how can objects of type A be compared by their values when the equality operator is not overloaded? I don't understand how overloading operator< alone can be enough for the set's find function.
The ordered containers (set, multiset, map, multimap) use one single predicate to establish the element order and find values, the less-than predicate.
Two elements are considered equal if neither one is less-than the other.
This notion of "equality" may not be the same as some other notion of equality you may have. Sometimes the term "equivalent" is preferred to distinguish this notion that's induced by the less-than ordering from other, ad-hoc notions of equality that may exist simultaneously (e.g. an overloaded operator==).
For "sane" value types (also called regular types), ad-hoc equality and less-than-induced equivalence are required to be the same; many naturally occurring types are regular (e.g. arithmetic types (if NaNs are removed)). In other cases, especially if the less-than predicate is provided externally and not by the type itself, it's entirely possible that the less-than equivalence classes contain many non-"equal" values.
The flag member is entirely irrelevant here. The set has found an element that is equivalent to the searched-for value, with respect to <.
That is, if a is not less than b, and b is not less than a, then a and b must be equal. This is how it works with normal integers. That is how it is decided 2 values are equivalent in a std::set.
std::set doesn't use == at all. (unordered_set, which is a hash set, does use it, because it's the only way to distinguish hash collisions).
You can also provide a function to do the work of <, but it must behave as a strict weak ordering. Which is a bit heavy on the maths, but basically you could use > instead, via std::greater, or define your own named function rather than defining operator<.
So there is nothing technically to stop you defining an operator== that behaves differently from the notion of equivalence that comes from your operator<, but std::set won't use it, and it would probably confuse people.
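A minimal sketch of that situation, assuming a hypothetical Item type whose operator< looks only at number while its operator== also compares flag. The function name set_finds_equivalent is likewise made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Hypothetical type: ordering ignores flag, equality does not.
struct Item {
    int number;
    bool flag;
    bool operator<(const Item& o) const { return number < o.number; }
    bool operator==(const Item& o) const {
        return number == o.number && flag == o.flag;
    }
};

inline bool set_finds_equivalent() {
    std::set<Item> s{{0, true}, {10, false}};
    Item probe{0, false};
    // std::set::find relies only on operator<.
    bool by_set = s.find(probe) != s.end();
    // The std::find algorithm relies on operator== instead.
    bool by_eq = std::find(s.begin(), s.end(), probe) != s.end();
    assert(!by_eq);  // no element is == to the probe
    return by_set;   // but one element is equivalent to it
}
```

The two lookups disagree on the same container, which is exactly the kind of confusion the answer warns about.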
From the documentation of set
two objects a and b are considered equivalent (not unique) if neither
compares less than the other: !comp(a, b) && !comp(b, a)
http://en.cppreference.com/w/cpp/container/set
In the template you can see
template<
class Key,
class Compare = std::less<Key>,
class Allocator = std::allocator<Key>
> class set;
std::less
will call operator< and that is why it works.

What does same 'value' mean for a std::set?

In C++, the std::set::insert() only inserts a value if there is not already one with the same 'value'. By the same, does this mean operator== or does it mean one for which operator< is false for either ordering, or does it mean something else?
does it mean one for which operator< is false for either ordering?
Yes, if the set uses the default comparator and compares keys using <. More generally, in an ordered container with comparator Compare, two keys k1 and k2 are regarded as equivalent if !Compare(k1,k2) && !Compare(k2,k1).
Keys are not required to implement operator== or anything else; they are just required to be comparable using the container's comparator to give a strict weak ordering.
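A short sketch of how insert() reports this, using the pair it returns. The function name second_insert_succeeds is hypothetical; the return type of insert for a single value is std::pair<iterator, bool>.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Inserts the same key twice and reports whether the second call inserted.
inline bool second_insert_succeeds() {
    std::set<int> s;
    auto first = s.insert(42);
    auto second = s.insert(42);
    assert(first.second);                 // first insertion took place
    assert(second.first == first.first);  // iterator to the existing element
    return second.second;                 // false: an equivalent key exists
}
```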
std::set has a template argument called 'Compare', as in this signature:
template < class Key, class Compare = less<Key>,
class Allocator = allocator<Key> > class set;
Compare is used to determine the ordering between elements. Here, the default less<Key> uses the < operator to compare two keys.
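To see the effect of the Compare argument, here is a small sketch that builds the same set with std::less and with std::greater. The helper names iteration_order and demo_orders are assumptions for this example.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <vector>

// Builds a set from the same values and returns them in iteration order.
template <class Compare>
std::vector<int> iteration_order() {
    std::set<int, Compare> s{2, 3, 1, 2};  // the duplicate 2 is dropped
    return std::vector<int>(s.begin(), s.end());
}

inline void demo_orders() {
    assert((iteration_order<std::less<int>>() == std::vector<int>{1, 2, 3}));
    assert((iteration_order<std::greater<int>>() == std::vector<int>{3, 2, 1}));
}
```

Either comparator rejects the second 2, since equivalence is derived from whichever comparator the set was given.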
If it helps, you can think of a set as just a std::map with meaningless values, ie a std::set<int> can be thought of as a std::map<int, int> where the values are meaningless.
The only comparison that set is allowed to perform on T is via the functor type it was given to do comparisons as part of the template. Thus, that's how it defines equivalence.
For every value already in the set, the comparison must evaluate to true for one of the two orderings between that value and the new one. If it's false both ways for any existing value, then the new one won't be stored.

Which operator needs to be overridden in order to use std::set in the C++ code?

This is an interview question.
Referring to the sample code, which one of the operators needs to be overridden in order to use std::set<Value>
#include <iostream>
#include <string>
class Value
{
std::string s_val;
int i_val;
public:
Value(std::string s, int i): s_val(s) , i_val(i){}
};
// EOF
/*
a operator !=
b operator >
c operator <=
d operator >=
e operator <
*/
Actually, I do not understand why an operator needs to be overridden here. "set" does not allow duplicated elements, maybe operator != needs to be overridden ?
You don't have to override any operator, the std::set class template allows you to provide a comparison function as a template parameter. But if you were to provide an operator, the one needed is bool operator<(). This operator has to implement strict weak ordering. See this std::set documentation.
The reason strict weak ordering is used is because set is an ordered container, typically implemented as a self-balancing binary tree. So it is not enough to know whether two elements are the same or not. The set must be able to order them. And the less than operator or the comparator functor are also used to test for element equality.
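As a sketch of the comparator-as-template-parameter route, the example below uses a hypothetical Record struct (a stand-in for Value with public members, so the comparator can read them) and a hypothetical RecordLess functor. Comparing std::tie tuples is one common way to get a lexicographic strict weak ordering over several members.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for Value, with public members and no operators.
struct Record {
    std::string s_val;
    int i_val;
};

// Hypothetical comparator: lexicographic order on (s_val, i_val).
struct RecordLess {
    bool operator()(const Record& a, const Record& b) const {
        return std::tie(a.s_val, a.i_val) < std::tie(b.s_val, b.i_val);
    }
};

inline std::size_t distinct_records() {
    std::set<Record, RecordLess> s;
    s.insert(Record{"a", 1});
    s.insert(Record{"a", 2});
    s.insert(Record{"a", 1});  // equivalent to the first element, not stored
    assert(s.begin()->i_val == 1);
    return s.size();
}
```

No operator is defined on Record at all here, which matches the point that the question's premise is slightly off.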
You need to implement operator< for your type. The implementation must follow strict weak ordering to be usable with associative containers from the standard library such as std::set and std::map.
Read about:
Strict Weak Ordering
An example here:
std map composite key
A set keeps out the duplicates without needing operator== or operator!= by using the notion of equivalence. Two items are equivalent if neither is less than the other:
if (!(a < b || b < a))
// equivalent!
To speed up the enforcement of no duplicate elements, and checking whether an element is present in general, a set is usually implemented as some sort of tree and only needs operator<. (Only the use of less is mandated by the standard; the rest is just the typical implementation.)

std::map Requirements for Keys (Design Decision)

When I make a std::map<my_data_type, mapped_value>, what C++ expects from me is that my_data_type has its own operator<.
struct my_data_type
{
my_data_type(int i) : my_i(i) { }
bool operator<(const my_data_type& other) const { return my_i < other.my_i; }
int my_i;
};
The reason is that you can derive operator> and operator== from operator<. b < a implies a > b, so there's operator>. !(a < b) && !(b < a) means that a is neither less than b nor greater than it, so they must be equal.
The question is: Why didn't the C++ designers require operator== to be explicitly defined? Obviously, operator== is inevitable for std::map::find() and for removing duplicates from the std::map. Why implement 5 operations and call a method twice in order not to compel me to explicitly implement operator==?
operator== is inevitable for std::map::find()
This is where you go badly wrong. map does not use operator== at all, it is not "inevitable". Two keys x and y are considered equivalent for the purposes of the map if !(x < y) && !(y < x).
map doesn't know or care whether you've implemented operator==. Even if you have, it need not be the case that all equivalent keys in the order are equal according to operator==.
The reason for all this is that wherever C++ relies on orders (sorting, maps, sets, binary searches), it bases everything it does on the well-understood mathematical concept of a "strict weak order", which is also defined in the standard. There's no particular need for operator==, and if you look at the code for these standard functions you won't very often see anything like if (!(x < y) && !(y < x)) that does both tests close together.
Additionally, none of this is necessarily based on operator<. The default comparator for map is std::less<KeyType>, and that by default uses operator<. But if you've specialized std::less for KeyType then you needn't define operator<, and if you specify a different comparator for the map then it may or may not have anything to do with operator< or std::less<KeyType>. So where I've said x < y above, really it's cmp(x,y), where cmp is the strict weak order.
This flexibility is another reason why not to drag operator== into it. Suppose KeyType is std::string, and you specify your own comparator that implements some kind of locale-specific, case-insensitive collation rules. If map used operator== some of the time, then that would completely ignore the fact that strings differing only by case should count as the same key (or in some languages: with other differences that are considered not to matter for collation purposes). So the equality comparison would also have to be configurable, but there would only be one "correct" answer that the programmer could provide. This isn't a good situation, you never want your API to offer something that looks like a point of customization but really isn't.
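A much simplified sketch of that scenario follows. CaseInsensitiveLess and lookup_case_insensitive are hypothetical names, and the comparator only lowercases single bytes with std::tolower, so it is an ASCII-level approximation rather than real locale-aware collation.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical comparator: byte-wise, case-insensitive "less than".
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

inline int lookup_case_insensitive() {
    std::map<std::string, int, CaseInsensitiveLess> m;
    m["Hello"] = 1;
    m["HELLO"] = 2;  // equivalent key: overwrites the value, adds no entry
    assert(m.size() == 1);
    return m.at("hello");
}
```

Had the map consulted std::string's operator== anywhere, "Hello" and "HELLO" would have been treated as different keys in that code path.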
Besides, the concept is that once you've ruled out the section of the tree that's less than the key you're searching for, and the section of the tree for which the key is less than it, what's left either is empty (no match found) or else has a key in it (match found). So, you've already used current < key then key < current, leaving no other option but equivalence. The situation is exactly:
if (search_key < current_element)
go_left();
else if (current_element < search_key)
go_right();
else
declare_equivalent();
and what you're suggesting is:
if (search_key < current_element)
go_left();
else if (current_element < search_key)
go_right();
else if (current_element == search_key)
declare_equivalent();
which is obviously not needed. In fact, it's your suggestion that's less efficient!
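The same idea can be sketched on a sorted vector instead of a tree, assuming std::lower_bound as the search step. The names contains_equivalent and demo_lookup are made up for this example, and this is not how any particular standard library implements map::find.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Looks up key in a sorted range using only comp, never operator==.
template <class Comp>
bool contains_equivalent(const std::vector<int>& sorted, int key, Comp comp) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key, comp);
    // lower_bound already guarantees !comp(*it, key);
    // one more comparison in the other direction settles equivalence.
    return it != sorted.end() && !comp(key, *it);
}

inline bool demo_lookup() {
    std::vector<int> v{1, 3, 5, 7};
    assert(!contains_equivalent(v, 4, std::less<int>()));
    return contains_equivalent(v, 5, std::less<int>());
}
```

Only one extra call to comp is needed once the search position is known, so a separate equality test would add a requirement without saving work.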
Your assumptions aren't correct. Here's what's really happening:
std::map is a class template which takes four template parameters: key type K, mapped type T, comparator Comp and allocator Alloc (the names are immaterial, of course, and only local to this answer). What matters for this discussion is that an object Comp comp; can be called with two key references, comp(k1, k2), where k1 and k2 are K const &, and the result is a boolean which implements a strict weak ordering.
If you do not specify the third argument, then Comp is the default type std::less<K>, and this (stateless) class implements the binary operation as k1 < k2. It does not matter whether this <-operator is a member of K, or a free function, or a template, or whatever.
And that wraps up the story, too. The comparator type is the only datum required to implement an ordered map. Equality is defined as !comp(a, b) && !comp(b,a), and the map only stores one unique key according to this definition of equality.
There is no reason to make additional requirements on the key type, and also there is no logical reason that a user-defined operator== and operator< should at all be compatible. They could both exist, independently, and serve entirely different and unrelated purpose.
A good library imposes the minimal necessary requirements and offers the greatest possible amount of flexibility, and this is precisely what std::map does.
In order to find the element i within the map, the tree search that has traversed to element e will already have tested i < e, which would have returned false.
So either you call i == e or you call e < i, both of which imply the same thing given the prerequisite of finding e in the tree already. Since we already had to have an operator< we don't rely on operator==, since that would increase the demands of the key concept.
You have a faulty assumption:
!(a < b) && !(b < a) means that a is neither less than b nor greater than it, so they must be equal.
It means that they are equivalent, but not necessarily equal. You are free to implement operator< and operator== in such a way that two objects can be equivalent but not equal.
Why didn't the C++ designers require operator== to be explicitly defined?
To simplify the implementation of types that can be used as keys, and to allow you to use a single custom comparator for types without overloaded operators. The only requirement is that you supply a comparator (either operator< or a custom functor) that defines a strict weak ordering. Your suggestion would require both the extra work of implementing an equality comparison, and the extra restriction of requiring equivalent objects to compare equal.
The reason why a comparison operator is needed is the way map is implemented: as a binary search tree, which allows you to look up, insert and delete elements in O(log n). In order to build this tree, a strict weak order must be defined for the set of keys. That's why only one operator definition is needed.