Basic std set logic - c++

This may be dull question, but I want to be sure.
Lets say I have a struct:
struct A
{
int number;
bool flag;
bool operator<(const A& other) const
{
return number < other.number;
}
};
Somewhere in code:
A a1, a2, a3;
std::set<A> set;
a1.flag = true;
a1.number = 0;
a2.flag = false;
a2.number = 10;
a3 = a1;
set.insert(a1);
set.insert(a2);
if(set.find(a3) == set.end())
{
printf("NOT FOUND");
}
else
{
printf("FOUND");
}
The output I get is "FOUND". I understand that, since I am passing values, elements in set are compared by value. But how can objects A be compared by their values, since equality operator is not overrided? I dont understand how overriding operator '<' can be enough for sets finding function.

The ordered containers (set, multiset, map, multimap) use one single predicate to establish the element order and find values, the less-than predicate.
Two elements are considered equal if neither one is less-than the other.
This notion if "equality" may not be the same as some other notion of equality you may have. Sometimes the term "equivalent" is preferred to distinguish this notion that's induced by the less-than ordering from other, ad-hoc notions of equality that may exist simultaneously (e.g. an overloaded operator==).
For "sane" value types (also called regular types), ad-hoc equality and less-than-induced equivalence are required to be the same; many naturally occurring types are regular (e.g. arithmetic types (if NaNs are removed)). In other cases, especially if the less-than predicate is provided externally and not by the type itself, it's entirely possible that the less-than equivalence classes contain many non-"equal" values.

The flag member is entirely irrelevant here. The set has found an element that is equivalent to the searched-for value, with respect to <.
That is, if a is not less than b, and b is not less than a, then a and b must be equal. This is how it works with normal integers. That is how it is decided 2 values are equivalent in a std::set.
std::set doesn't use == at all. (unordered_set, which is a hash set, does use it, because it's the only way to distinguish hash collisions).
You can also provide a function to do the work of <, but it must behave as a strict weak ordering. Which is a bit heavy on the maths, but basically you could use > instead, via std::greater, or define your own named function rather than defining operator<.
So there is nothing technically to stop you defining an operator== that behaves differently from the notion of equivalence that comes from your operator<, but std::set won't use it, and it would probably confuse people.

From the documentation of set
two objects a and b are considered equivalent (not unique) if neither
compares less than the other: !comp(a, b) && !comp(b, a)
http://en.cppreference.com/w/cpp/container/set
In the template you can see
template<
class Key,
class Compare = std::less<Key>,
class Allocator = std::allocator<Key>
> class set;
std::less
will call operator< and that is why it works.

Related

What is the use of operator<?

In the code snippet below, please, if someone can clarify what is the function of bool operator<... and why it is used as a function?
bool operator<(const RankedFace& other) const
{
if (lastDelay == other.lastDelay)
return face.getId() < other.face.getId();
return lastDelay < other.lastDelay;
}
It's the (in class) definition of the operator< for a user defined type (RankedFace I guess).
Thanks to that code, you will be able to compare two objects of type RankedFace with the <, e.g. if( r1 < r2 ) // do something...
It gives the type RankedFace a less-than comparison (operator<). As declared; it looks like a member method. It could also have been a non-member method with the following signature;
bool operator<(const RankedFace& lys, const RankedFace& rhs)
It is typically required for use in the standard library associative containers (std::set etc.).
The associative containers require a comparator to order the objects in them. A custom comparator can be used, but the standard one is std::less which is simply a lhs < rhs.
It allows client code to use the less than comparison on objects of that type (face1 < face2). It is often (not always) implemented together with other comparators (==, !=, <= etc.). If operator< and operator== have been implemented, the remaining ones can be implemented using std::rel_ops.
This is RankedFace's less than operator. It compares two RankedFace objects. For example:
RankedFace foo;
RankedFace bar;
cout << foo < bar ? "foo is smaller than bar" : "bar is greater than or equal to foo";
In the code above foo < bar causes C++ to call foo.operator<(bar).
Dissection of RankedFace::operator< reveals that it:
Considers the object with the lower lastDelay member the smaller object
For objects with identical lastDelays it considers the object which returns a lower getId() the smaller object.
An actual in code comparison between RankedFaces may not exist. The motivation for implementing the less than operator may have been that the less than operator is required to use a RankedFace as in the key in any associative container or unordered associative container.

Why C++ STL containers use "less than" operator< and not "equal equal" operator== as comparator?

While implementing a comparator operator inside a custom class for std::map, I came across this question and couldn't see anywhere being asked.
Apart from the above question, also interested to know in brief, how operator< would work for std::map.
Origin of the question:
struct Address {
long m_IPv4Address;
bool isTCP;
bool operator< (const Address&) const; // trouble
};
std::map<K,D> needs to be able to sort. By default is uses std::less<K>, which for non-pointers uses <1.
Using the rule that you demand the least you can from your users, it synthesizes "equivalence" from < when it needs it (!(a<b) && !(b<a) means a and b are equivalent, ie, neither is less than the other).
This makes it easier to write classes to use as key components for a map, which seems like a good idea.
There are std containers that use == such as std::unordered_map, which uses std::hash and ==. Again, they are designed so that they require the least from their users -- you don't need full ordering for unordered_ containers, just equivalence and a good hash.
As it happens, it is really easy to write a < if you have access to <tuple>.
struct Address {
long m_IPv4Address;
bool isTCP;
bool operator< (const Address& o) const {
return
std::tie( m_IPv4Address, isTCP )
< std::tie( o.m_IPv4Address, o.isTCP );
}
};
which uses std::tie defined in <tuple> to generate a proper < for you. std::tie takes a bunch of data, and generates a tuple of references, which has a good < already defined.
1 For pointers, it uses some comparison that is compatible with < where < behaviour is specified, and behaves well when < does not. This only really matters on segmented memory model and other obscure architectures.
Because std::map is a sorted associative container, it's keys need ordering.
An == operator would not allow to order multiple keys
You might be looking for std::unordered_map , which work has a hashtable. You can specify your own hash and equality operator functions :
explicit unordered_map( size_type bucket_count,
const Hash& hash = Hash(),
const KeyEqual& equal = KeyEqual(),
const Allocator& alloc = Allocator() );
With < you can order elements. If a < b then a should be placed before b in the collection.
You can also determine if two items are equivalent: if !(a < b) && !(b < a) (if neither object is smaller than the other), then they're equivalent.
Those two capabilities are all std::map requires. So it just expects its element type to provide an operator <.
With == you could determine equality, but you wouldn't be able to order elements. So it wouldn't satisfy the requirements of std::map.

Which operator needs to be overridden in order to use std::set in the C++ code?

This is an interview question.
Referring to the sample code, which one of the operators needs to be overridden in order to use std::set<Value>
#include<iostream>
class Value
{
std::string s_val;
int i_val;
public:
Value(std::string s, int i): s_val(s) , i_val(i){}
};
// EOF
/*
a operator !=
b operator >
c operator <=
d operator >=
e operator <
*/
Actually, I do not understand why an operator needs to be overridden here. "set" does not allow duplicated elements, maybe operator != needs to be overridden ?
You don't have to override any operator, the std::set class template allows you to provide a comparison function as a template parameter. But if you were to provide an operator, the one needed is bool operator<(). This operator has to implement strict weak ordering. See this std::set documentation.
The reason strict weak ordering is used is because set is an ordered container, typically implemented as a self-balancing binary tree. So it is not enough to know whether two elements are the same or not. The set must be able to order them. And the less than operator or the comparator functor are also used to test for element equality.
You need to implement operator< for your type. The implementation must follow strick weak ordering to be able to use with associative containers from Standard library such as std::set and std::map.
Read about:
Strict Weak Ordering
An example here:
std map composite key
A set keeps out the duplicates without needing operator= or operator!= by using the notion of equivalence. Two items are equivalent if neither is less than the other:
if (!(a < b || b < a))
// equivalent!
To speed up the enforcement of no duplicate elements and generally checking if element is in its usually some sort of a tree and only needs operator <. (The only usage of less is enforced by the standard, the rest is just the avarage implementation)

std::map Requirements for Keys (Design Decision)

When I make a std::map<my_data_type, mapped_value>, what C++ expects from me is that my_data_type has its own operator<.
struct my_data_type
{
my_data_type(int i) : my_i(i) { }
bool operator<(const my_data_type& other) const { return my_i < other.my_i; }
int my_i;
};
The reason is that you can derive operator> and operator== from operator<. b < a implies a > b, so there's operator>. !(a < b) && !(b < a) means that a is neither less than b nor greater than it, so they must be equal.
The question is: Why hasn't the C++ designer require operator== to be explicitly defined? Obviously, operator== is inevitable for std::map::find() and for removing duplicates from the std::map. Why implement 5 operations and call a method twice in order not to compel me to explicitly implement operator==?
operator== is inevitable for std::map::find()
This is where you go badly wrong. map does not use operator== at all, it is not "inevitable". Two keys x and y are considered equivalent for the purposes of the map if !(x < y) && !(y < x).
map doesn't know or care whether you've implemented operator==. Even if you have, it need not be the case that all equivalent keys in the order are equal according to operator==.
The reason for all this is that wherever C++ relies on orders (sorting, maps, sets, binary searches), it bases everything it does on the well-understood mathematical concept of a "strict weak order", which is also defined in the standard. There's no particular need for operator==, and if you look at the code for these standard functions you won't very often see anything like if (!(x < y) && !(y < x)) that does both tests close together.
Additionally, none of this is necessarily based on operator<. The default comparator for map is std::less<KeyType>, and that by default uses operator<. But if you've specialized std::less for KeyType then you needn't define operator<, and if you specify a different comparator for the map then it may or may not have anything to do with operator< or std::less<KeyType>. So where I've said x < y above, really it's cmp(x,y), where cmp is the strict weak order.
This flexibility is another reason why not to drag operator== into it. Suppose KeyType is std::string, and you specify your own comparator that implements some kind of locale-specific, case-insensitive collation rules. If map used operator== some of the time, then that would completely ignore the fact that strings differing only by case should count as the same key (or in some languages: with other differences that are considered not to matter for collation purposes). So the equality comparison would also have to be configurable, but there would only be one "correct" answer that the programmer could provide. This isn't a good situation, you never want your API to offer something that looks like a point of customization but really isn't.
Besides, the concept is that once you've ruled out the section of the tree that's less than the key you're searching for, and the section of the tree for which the key is less than it, what's left either is empty (no match found) or else has a key in it (match found). So, you've already used current < key then key < current, leaving no other option but equivalence. The situation is exactly:
if (search_key < current_element)
go_left();
else if (current_element < search_key)
go_right();
else
declare_equivalent();
and what you're suggesting is:
if (search_key < current_element)
go_left();
else if (current_element < search_key)
go_right();
else if (current_element == search_key)
declare_equivalent();
which is obviously not needed. In fact, it's your suggestion that's less efficient!
Your assumptions aren't correct. Here's what's really happening:
std::map is a class template which takes four template parameters: key type K, mapped type T, comparator Comp and allocator Alloc (the names are immaterial, of course, and only local to this answer). What matters for this discussion is that an object Comp comp; can be called with two key refrences, comp(k1, k2), where k1 and k2 are K const &, and the result is a boolean which imlpements a strict weak ordering.
If you do not specify the third argument, then Comp is the default type std::less<K>, and this (stateless) class imlpements the binary operation as k1 < k2. It does not matter whether this <-operator is a member of K, or a free function, or a template, or whatever.
And that wraps up the story, too. The comparator type is the only datum required to implement an ordered map. Equality is defined as !comp(a, b) && !comp(b,a), and the map only stores one unique key according to this definition of equality.
There is no reason to make additional requirements on the key type, and also there is no logical reason that a user-defined operator== and operator< should at all be compatible. They could both exist, independently, and serve entirely different and unrelated purpose.
A good library imposes the minimal necessary requirements and offers the greatest possible amount of flexibility, and this is precisely what std::map does.
In order to find the element i within the map, we have traversed to element e the tree search will already have tested i < e, which would have returned false.
So either you call i == e or you call e < i, both of which imply the same thing given the prerequisite of finding e in the tree already. Since we already had to have an operator< we don't rely on operator==, since that would increase the demands of the key concept.
You have a faulty assumption:
!(a < b) && !(b < a) means that a is neither less than b nor greater than it, so they must be equal.
It means that they are equivalent, but not necessarily equal. You are free to implement operator< and operator== in such a way that two objects can be equivalent but not equal.
Why hasn't the C++ designer require operator== to be explicitly defined?
To simplify the implementation of types that can be used as keys, and to allow you to use a single custom comparator for types without overloaded operators. The only requirement is that you supply a comparator (either operator< or a custom functor) that defines a partial ordering. Your suggestion would require both the extra work of implementing an equality comparison, and the extra restriction of requiring equivalent objects to compare equal.
The reason why a comparison operator is needed is the way map is implemented: as a binary search tree, which allows you to look up, insert and delete elements in O(log n). In order to build this tree, a strict weak order must be defined for the set of keys. That's why only one operator definition is needed.

When are two elements of an STL set considered identical?

From cplusplus.com:
template < class Key, class Compare = less<Key>,
class Allocator = allocator<Key> > class set;
"Compare: Comparison class: A class that takes two arguments of the same type as the container elements and returns a bool. The expression comp(a,b), where comp is an object of this comparison class and a and b are elements of the container, shall return true if a is to be placed at an earlier position than b in a strict weak ordering operation. This can either be a class implementing a function call operator or a pointer to a function (see constructor for an example). This defaults to less, which returns the same as applying the less-than operator (a<b).
The set object uses this expression to determine the position of the elements in the container. All elements in a set container are ordered following this rule at all times."
Given that the comparison class is used to decide which of the two objects is "smaller" or "less", how does the class check whether two elements are equal (e.g. to prevent insertion of the same element twice)?
I can imagine two approaches here: one would be calling (a == b) in the background, but not providing the option to override this comparison (as with the default less<Key>)doesn't seem too STL-ish to me. The other would be the assumption that (a == b) == !(a < b) && !(b < a) ; that is, two elements are considered equal if neither is "less" than the other, but somehow this doesn't feel right to me either, considering that the comparison can be an arbitrarily complex bool functor between objects of an arbitrarily complex class.
So how is it really done?
Not an exact duplicate, but the first answer here answers your question
Your second guess as to the behaviour is correct
Associative containers in the standard library are defined in terms of equivalence of keys, not equality per se.
As not all set and map instances use less, but may use a generic comparison operator it's necessary to define equivalence in terms of this one comparison function rather then attempting to introduce a separate equality concept.
In general, two keys (k1 and k2) in an associative container using a comparison function comp are equivalent if and only if:
comp( k1, k2 ) == false && comp( k2, k1 ) == false
In a container using std::less for types that don't have a specific std::less specialization, this means the same as:
!(k1 < k2) && !(k2 < k1)
Your mistake is the assumption that "the comparison can be an arbitrarily complex bool functor". It can't.
std::set requires a partial ordering so that a<b implies !(b<a). This excludes most binary boolean functors. Because of that, we can talk about the relative position of a and b in that ordering. If a<b, a precedes b. If b<a , b precedes a. If neither a<b nor b<a, then a and b occupy the same position in the ordering and thus are equivalent.
Your second option is the right one. Why doesn't feel it right? What would you do if the equality test wasn't consistent with the equation you give?