Implementing Compare for std::set - c++

I have a struct that is just two ints that I want to store in std::set, while also taking advantage of its sorting properties. For example
struct Item
{
int order;
int value;
};
So I wrote a comparator
struct ItemCmp
{
bool operator()( const Item& lhs, const Item& rhs ) const
{
return lhs.order < rhs.order || lhs.value < rhs.value;
}
};
The intent is that Items should be sorted in the collection by order first, and then by value. If I put these Items in a vector and use std::sort, it seems to work as expected.
I also implemented unit tests for the cases in https://en.cppreference.com/w/cpp/named_req/Compare
However now this test case is failing:
std::set<Item, ItemCmp> stuff;
stuff.insert( {1, 1} );
stuff.insert( {1, 1} );
CHECK( stuff.size() == 1 );
The size of the set is 2, violating the contract of set. Where am I going wrong?

Based on @retired-ninja's comment, the answer is to ensure the comparator performs a proper lexicographical comparison. The original comparator is not a strict weak ordering: for {1, 2} and {2, 1} it reports each item as less than the other. One shortcut is to leverage std::tuple's comparison operators (see https://en.cppreference.com/w/cpp/utility/tuple/operator_cmp ), like this:
return std::tie(lhs.order, lhs.value) < std::tie(rhs.order, rhs.value);
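For reference, here is a minimal sketch of that comparator in context. Item and ItemCmp come from the question; the helper make_stuff and the extra inserted values are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>

struct Item
{
    int order;
    int value;
};

struct ItemCmp
{
    // Lexicographical comparison: order first, then value.
    bool operator()( const Item& lhs, const Item& rhs ) const
    {
        return std::tie(lhs.order, lhs.value) < std::tie(rhs.order, rhs.value);
    }
};

// Hypothetical helper: inserts a few items and returns the resulting set.
inline std::set<Item, ItemCmp> make_stuff()
{
    std::set<Item, ItemCmp> stuff;
    stuff.insert( {1, 1} );
    stuff.insert( {1, 1} ); // equivalent to the first item, so not inserted
    stuff.insert( {0, 5} );
    stuff.insert( {1, 0} );
    return stuff;
}
```

Iterating over the returned set should visit the items as {0, 5}, {1, 0}, {1, 1}, that is, by order first and by value second.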

Related

Spaceship operator on arrays

The following code is intended to implement comparison on an object that contains an array. Two objects should compare as <,==,> if all array elements compare like that. The following does not compile, for a variety of reasons:
#include <compare>
class witharray {
private:
array<int,4> the_array;
public:
witharray( array<int,4> v )
: the_array(v) {};
int size() { return the_array.size(); };
auto operator<=>( const witharray& other ) const {
array< std::strong_ordering,4 > cmps;
for (int id=0; id<4; id++)
cmps[id] = the_array[id]<=>other.the_array[id];
return accumulate
(cmps.begin(),cmps.end(),
std::equal,
[] (auto x,auto y) -> std::strong_ordering { return x and y; }
);
};
};
First of all, the array of comparisons:
call to implicitly-deleted default constructor of 'array<std::strong_ordering, 4>
Then the attempt to accumulate the comparisons:
no matching function for call to 'accumulate'
Compiler explorer: https://godbolt.org/z/E3ovh5qGa
Or am I completely on the wrong track?
Two objects should compare as <,==,> if all array elements compare like that.
This is a fairly interesting order. One thing to note here is that it's a partial order. That is, given {1, 2} vs {2, 1}, those elements aren't all < or == or >. So you're left with unordered.
C++20's comparisons do have a way to represent that: you have to return a std::partial_ordering.
The way that we can achieve this ordering is that we first compare the first elements, and then we ensure that all the other elements compare the same. If any pair of elements doesn't compare the same, then we know we're unordered:
auto operator<=>( const witharray& other ) const
-> std::partial_ordering
{
std::strong_ordering c = the_array[0] <=> other.the_array[0];
for (int i = 1; i < 4; ++i) {
if ((the_array[i] <=> other.the_array[i]) != c) {
return std::partial_ordering::unordered;
}
}
return c;
}
This has the benefit of not having to compare every pair of elements, since we might already know the answer by the time we get to the 2nd element (e.g. {1, 2, x, x} vs {1, 3, x, x} is already unordered, doesn't matter what the other elements are).
This seems like what you were trying to accomplish with your accumulate, except accumulate is the wrong algorithm here since we want to stop early. You'd want all_of in this case:
auto comparisons = views::iota(0, 4)
| views::transform([&](int i){
return the_array[i] <=> other.the_array[i];
});
bool all_match = ranges::all_of(comparisons | views::drop(1), [&](std::strong_ordering c){
return c == comparisons[0];
});
return all_match ? comparisons[0] : std::partial_ordering::unordered;
Which is admittedly awkward. In C++23, we can do the comparisons part more directly:
auto comparisons = views::zip_transform(
compare_three_way{}, the_array, other.the_array);
And then it would read better if you had a predicate like:
bool all_match = ranges::all_of(comparisons | views::drop(1), equals(comparisons[0]));
or wrote your own algorithm for this specific use-case (which is a pretty easy algorithm to write):
return all_same_value(comparisons)
? comparisons[0]
: std::partial_ordering::unordered;
Note that std::array already has a spaceship operator. It compares lexicographically, which may be all you need:
class witharray {
private:
array<int, 4> the_array;
public:
witharray(array<int, 4> v)
: the_array(v) {};
int size() { return the_array.size(); };
auto operator<=>(const witharray& other) const
{
return the_array <=> other.the_array;
};
};
https://godbolt.org/z/4drddWa8G
Now to cover problems with your code:
array< std::strong_ordering, 4 > cmps; cannot be default-initialized, because std::strong_ordering has no default constructor.
The use of std::accumulate here is strange; there is a better algorithm for that: std::lexicographical_compare_three_way, which was added to handle the spaceship operator.
You have fed std::equal to std::accumulate as the binary operation, when in fact it is an algorithm for comparing ranges (it accepts iterators). Most probably your plan was to use std::equal_to.

C++ STL Binary Search (lower_bound, upper_bound)

I have implemented a binary search like this:
typedef std::vector<Cell>::iterator CellVectorIterator;
typedef struct _Point {
char x,y;
} CoordinatePoint;
typedef struct _Cell {
...
CoordinatePoint coordinates;
} Cell;
struct CellEqualityByCoordinates
{
bool
operator()(const Cell& cell1, const Cell& cell2) const
{ return cell1.coordinates.x == cell2.coordinates.x && cell1.coordinates.y == cell2.coordinates.y; }
};
CellVectorIterator FindCellByCoordinates (CellVectorIterator first, CellVectorIterator last, const Cell &val)
{
return std::upper_bound(first, last, val, CellEqualityByCoordinates());
}
But it doesn't always find a value.
What's wrong with that?
Your comparison function will not work for a binary search. It is not supposed to determine equality; it is supposed to determine an order relation. Specifically, it should return true if the first argument would definitively come before the second in a sorted range. If the arguments should be considered equal, or the second would come before the first, it should return false. Your range also needs to be sorted by this same criterion in order for the binary search to work.
An example function that might work:
bool operator()(const Cell& cell1, const Cell& cell2) const
{
if (cell1.coordinates.x < cell2.coordinates.x) return true;
if (cell2.coordinates.x < cell1.coordinates.x) return false;
return cell1.coordinates.y < cell2.coordinates.y;
}
A similar example that doubles as a lesson in short-circuit boolean evaluation would be something like:
bool operator()(const Cell& cell1, const Cell& cell2) const
{
return (cell1.coordinates.x < cell2.coordinates.x) ||
(!(cell2.coordinates.x < cell1.coordinates.x) &&
cell1.coordinates.y < cell2.coordinates.y);
}
Both exhibit a property called strict weak ordering. It is frequently required for various sorting and/or searches in standard library collections and search algorithms.
Yet another example utilizes std::pair, which already has a proper operator< that does the above, and thus makes this considerably less complicated:
bool operator()(const Cell& cell1, const Cell& cell2) const
{
return std::make_pair(cell1.coordinates.x, cell1.coordinates.y) <
std::make_pair(cell2.coordinates.x, cell2.coordinates.y);
}
A similar algorithm is available for tuples via std::tie.
Of course, all of this assumes you have an actual ordered sequence in the first place, ordered by the same comparison logic. (which we can only assume is true, as no evidence of such was posted).
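As a rough sketch of how the pieces fit together: note that std::upper_bound, as used in the question, returns the first element that is greater than val, so it cannot point at a match. I am assuming the intent was to find an equivalent element, which calls for std::lower_bound followed by an equivalence check. The Cell struct is reduced to its coordinates here, and CellLessByCoordinates is a made-up name.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct CoordinatePoint { char x, y; };
struct Cell { CoordinatePoint coordinates; }; // other members omitted

// Strict weak ordering: by x first, then by y.
struct CellLessByCoordinates
{
    bool operator()(const Cell& a, const Cell& b) const
    {
        return std::make_pair(a.coordinates.x, a.coordinates.y) <
               std::make_pair(b.coordinates.x, b.coordinates.y);
    }
};

typedef std::vector<Cell>::iterator CellVectorIterator;

// Returns an iterator to a matching cell, or `last` if there is none.
// Assumes [first, last) is already sorted with CellLessByCoordinates.
inline CellVectorIterator FindCellByCoordinates(CellVectorIterator first,
                                                CellVectorIterator last,
                                                const Cell& val)
{
    CellLessByCoordinates less;
    CellVectorIterator it = std::lower_bound(first, last, val, less);
    // lower_bound yields the first element not less than val;
    // it is a match only if val is not less than that element either.
    if (it != last && !less(val, *it)) return it;
    return last;
}
```

The vector has to be sorted with the same comparator first, for example with std::sort(cells.begin(), cells.end(), CellLessByCoordinates()).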

C++ set with arbitrary comparator

I have the following C++ code
#include <set>
#include <string>
#include <iostream>
using namespace std;
class Pair {
public:
string lhs;
string rhs;
Pair();
Pair( string l, string r ) {
lhs=l;
rhs=r;
};
};
struct compare {
bool operator()(const Pair& a, const Pair& b) const{
if ( ( a.lhs == b.lhs && a.rhs == b.rhs ) || ( a.lhs == b.rhs && a.rhs == b.lhs ) ) {
cout << "MATCH" << endl;
}
return ( a.lhs == b.lhs && a.rhs == b.rhs ) || ( a.lhs == b.rhs && a.rhs == b.lhs );
}
};
int main () {
set<Pair, compare > s;
Pair p( string("Hello"), string("World") );
s.insert(p);
cout << s.size() << "\n";
Pair q( string("World"), string("Hello") );
s.insert(q);
cout << s.size() << "\n";
compare cmp;
cout << cmp( p, q );
return 0;
}
Invoking the compiled code gives:
1
MATCH
MATCH
2
MATCH
Somehow the set s ends up with both Pairs p, and q in spite of the fact that the comparator identifies them as identical.
Why?
Any help will be much appreciated!
UPDATE:
Many thanks for the great answers and your kind and professional help.
As you might have guessed already, I am quite a newbie to C++.
Anyway, I was wondering, if Antoine's answer could be done with a lambda expression?
Something like:
std::set< …, [](){ my_comparator_code_here } > s;
????
The comparison operator for a std::set (which is an ordered container) needs to define a strict weak ordering, not any arbitrary test you wish. Normally a properly implemented operator< does the job.
If your comparison operator does not provide a strict weak ordering (as yours does not), the behavior will be undefined. There is no way to work around this requirement of the C++ standard.
Note that in certain cases where an equality comparison is needed it will have to use the operator< twice to make the comparison.
Also have you considered using std::pair<std::string, std::string> instead of rolling your own?
I've reread your question about five times now and I'm starting to wonder if what you want is a set of pairs where which string is in first and which is in second doesn't matter as far as the comparison goes. In that case @Antoine has what appears to be the correct solution for you.
A comparator for a set, map, or any algorithm such as lower_bound or sort that requires an order needs to implement a strict weak ordering (basically, behave like <).
Such an ordering is required to have 3 properties:
irreflexive: not (a < a) is always true
asymmetric: a < b implies not (b < a)
transitive: a < b and b < c imply a < c
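All of which, you will note, < has. (Formally, a strict weak ordering also requires that the equivalence described next be transitive.)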
Such an ordering defines equivalence classes, which are groups of elements that compare equal according to the ordering (that is not (a < b) and not (b < a) is verified). In a set or map, only a single element per equivalence class can be inserted whereas a multiset or multimap may hold multiple elements per equivalence class.
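A small sketch of equivalence classes in action. AbsLess is a hypothetical comparator, invented for this example, that orders ints by absolute value, so 3 and -3 land in the same equivalence class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <set>

// Hypothetical comparator: orders ints by absolute value,
// so 3 and -3 are equivalent (neither is less than the other).
struct AbsLess
{
    bool operator()(int a, int b) const { return std::abs(a) < std::abs(b); }
};

inline std::size_t unique_count()
{
    std::set<int, AbsLess> s;
    s.insert(3);
    s.insert(-3); // equivalent to 3 under AbsLess: not inserted
    s.insert(5);
    return s.size();
}

inline std::size_t multi_count()
{
    std::multiset<int, AbsLess> m;
    m.insert(3);
    m.insert(-3); // kept: a multiset allows equivalent elements
    m.insert(5);
    return m.size();
}
```

Here the set ends up with two elements and the multiset with three, even though 3 == -3 is false: the containers only ever consult the comparator.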
Now, if you look at your comparator, you will realize that you have implemented == which does not define any order at all. You need to implement something akin to < instead.
A simple but extremely effective trick is to use tuples, which have < (and == and every other comparison operator) already implemented in lexicographical order. Thus, std::tuple<std::string, std::string> has exactly the order you wish; and even better, std::tuple<std::string const&, std::string const&> also has it, and can be constructed very easily using std::tie.
Therefore, the implementation of a straightforward comparator is as simple as:
struct comparator {
bool operator()(Pair const& left, Pair const& right) const {
// Pair's members are named lhs and rhs in the question.
return std::tie( left.lhs, left.rhs)
< std::tie(right.lhs, right.rhs);
}
};
Note: although not discussed much, it is absolutely essential that the ordering of the comparator be stable across calls. As such, it should generally depend only on the values of the elements, and on nothing external or runtime-related (such as their addresses in memory).
EDIT: as noted, your comparator is slightly more complicated.
In your case, though, you also need to take into account that lhs and rhs have a symmetric role. In general, I would suggest uniquifying the representation in the constructor of the object; if not possible, you can uniquify first and compare second:
struct comparator {
bool operator()(Pair const& left, Pair const& right) const {
// Normalise each pair so that the smaller string comes first.
auto uleft = left.lhs < left.rhs ? std::tie(left.lhs, left.rhs)
: std::tie(left.rhs, left.lhs);
auto uright = right.lhs < right.rhs ? std::tie(right.lhs, right.rhs)
: std::tie(right.rhs, right.lhs);
assert(std::get<0>(uleft) <= std::get<1>(uleft) and "Incorrect uleft");
assert(std::get<0>(uright) <= std::get<1>(uright) and "Incorrect uright");
return uleft < uright;
}
}; // struct comparator
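Regarding the lambda question in the update: a template argument has to be a type, so a lambda expression cannot be written directly inside std::set<…>. One common workaround, sketched below, is to name the lambda and use decltype. Pair is reduced to a plain struct here, and lambda_set_size is a made-up helper.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <tuple>

struct Pair {
    std::string lhs;
    std::string rhs;
};

inline std::size_t lambda_set_size()
{
    // Order-insensitive comparison: normalise each pair, then compare.
    auto cmp = [](const Pair& a, const Pair& b) {
        auto ua = a.lhs < a.rhs ? std::tie(a.lhs, a.rhs) : std::tie(a.rhs, a.lhs);
        auto ub = b.lhs < b.rhs ? std::tie(b.lhs, b.rhs) : std::tie(b.rhs, b.lhs);
        return ua < ub;
    };
    // Before C++20 a lambda's type is not default-constructible,
    // so the lambda object is also passed to the set's constructor.
    std::set<Pair, decltype(cmp)> s(cmp);
    s.insert(Pair{"Hello", "World"});
    s.insert(Pair{"World", "Hello"}); // equivalent to the first: not inserted
    s.insert(Pair{"Hello", "There"});
    return s.size();
}
```

The set holds two elements afterwards, since ("Hello", "World") and ("World", "Hello") normalise to the same tuple.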
As Mark B said, compare represents an ordering and not an equality; by default it is std::less. In your case, you don't want the comparison to depend on the order within your pair, but at the same time your operator< must satisfy a number of conditions.
All the answers here propose to change your specification and make the comparison order-dependent. But if you don't want that, here is the solution:
bool operator()(const Pair & a, const Pair & b) const {
const bool swapA = a.lhs < a.rhs;
const std::string & al = swapA ? a.lhs : a.rhs;
const std::string & ar = swapA ? a.rhs : a.lhs;
const bool swapB = b.lhs < b.rhs;
const std::string & bl = swapB ? b.lhs : b.rhs;
const std::string & br = swapB ? b.rhs : b.lhs;
return al < bl || (al == bl && ar < br);
}
At least, it works on your example, and the relation is irreflexive and transitive.
Here is how it works: it is the lexicographic order for pairs: al < bl || (al == bl && ar < br), applied to sorted pairs.
In fact your data structure is a (set of size N) of (set of size 2). Internally, std::set sorts its elements using your comparison operators. For your "set of size 2" Pair you also need to consider them as internally sorted.
If the comparison code looks too heavy, you could move the pair sorting into the Pair class, for example by implementing two methods min() and max(). You can also implement operator< so that a separate compare class is no longer needed:
struct Pair {
string lhs, rhs;
Pair();
Pair( string l, string r ) : lhs(l), rhs(r) {}
const std::string & min() const { return lhs < rhs ? lhs : rhs; }
const std::string & max() const { return lhs < rhs ? rhs : lhs; }
bool operator<(const Pair& b) const {
return min() < b.min() || (min() == b.min() && max() < b.max());
}
};
Quoting the documentation for the Compare parameter of std::set:
The set object uses this expression to determine both the order the elements follow in the container and whether two element keys are equivalent (by comparing them reflexively: they are equivalent if !comp(a,b) && !comp(b,a)). No two elements in a set container can be equivalent.
Sorry all, I jumped the gun because I disliked another answer. I will expand and correct momentarily. As pointed out, an order needs to be implemented; typically this would be a lexicographical order. Importantly, however, you still need to make sure that when you consider two pairs to be equal, the comparison returns false in both directions.
if (( a.lhs == b.lhs && a.rhs == b.rhs ) || ( a.lhs == b.rhs && a.rhs == b.lhs )) return false;
//ordinary lexicographical compare
if( a.lhs < b.lhs) return true;
else if( a.lhs == b.lhs && a.rhs < b.rhs) return true;
else return false;
Notice the "!"; simple. Your code is saying that pair one is less than pair two, which is less than pair one. You want it to say that neither is less than the other.
DISCLAIMER STILL WRONG ON A TECHNICALITY, ANTOINE'S IS THE CORRECT ONE

What requirements must std::map key classes meet to be valid keys?

I want to map objects of a given class to objects of another. The class I want to use as key, however, was not written by me and is a simple struct with a few values. std::map orders its contents, and I was wondering how it does that, and whether any arbitrary class can be used as a key or whether there's a set of requirements (operators and whatnot) that need to be defined.
If so, I could create a wrapper for the class implementing the operators map uses. I just need to know what I need to implement first, and none of the references for the class I found online specify them.
All that is required of the key is that it be copyable and assignable.
The ordering within the map is defined by the third argument to the
template (and the argument to the constructor, if used). This
defaults to std::less<KeyType>, which defaults to the < operator,
but there's no requirement to use the defaults. Just write a comparison
operator (preferably as a function object):
struct CmpMyType
{
bool operator()( MyType const& lhs, MyType const& rhs ) const
{
// ...
}
};
Note that it must define a strict ordering, i.e. if CmpMyType()( a, b
) returns true, then CmpMyType()( b, a ) must return false, and if
both return false, the elements are considered equal (members of the
same equivalence class).
You need to define operator<, for example like this:
struct A
{
int a;
std::string b;
};
// Simple but wrong as it does not provide the strict weak ordering.
// As A(5,"a") and A(5,"b") would be considered equal using this function.
bool operator<(const A& l, const A& r )
{
return ( l.a < r.a ) && ( l.b < r.b );
}
// Better brute force.
bool operator<(const A& l, const A& r )
{
if ( l.a < r.a ) return true;
if ( l.a > r.a ) return false;
// a are equal, compare b
return ( l.b < r.b );
}
// This can often be seen written as
bool operator<(const A& l, const A& r )
{
// This is fine for a small number of members.
// But I prefer the brute force approach when you start to get lots of members.
return ( l.a < r.a ) ||
(( l.a == r.a) && ( l.b < r.b ));
}
The answer is actually in the reference you link, under the description of the "Compare" template argument.
The only requirement is that Compare (which defaults to less<Key>, which defaults to using operator< to compare keys) must be a "strict weak ordering".
Same as for set: the class must have a strict ordering in the spirit of "less than". Either overload an appropriate operator<, or provide a custom predicate. Any two objects a and b for which !(a<b) && !(b<a) will be considered equal.
The map container will actually keep all the elements in the order provided by that ordering, which is how you can achieve O(log n) lookup and insertion time by key value.
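To make the wrapper-free approach concrete, here is a minimal sketch. MyType stands in for the third-party struct (its members id and name are invented for illustration), and make_example_map is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Stand-in for a third-party struct that has no operator<.
struct MyType
{
    int id;
    std::string name;
};

// Strict weak ordering: by id first, then by name.
struct CmpMyType
{
    bool operator()( MyType const& lhs, MyType const& rhs ) const
    {
        return std::tie(lhs.id, lhs.name) < std::tie(rhs.id, rhs.name);
    }
};

inline std::map<MyType, int, CmpMyType> make_example_map()
{
    std::map<MyType, int, CmpMyType> m;
    m[MyType{5, "b"}] = 2;
    m[MyType{5, "a"}] = 1;
    m[MyType{5, "a"}] = 10; // equivalent key: overwrites the value 1
    return m;
}
```

The resulting map has two entries, ordered {5, "a"} before {5, "b"}. No wrapper class and no change to MyType is needed, because the comparator is supplied as the third template argument.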

Out of four std::vector objects select the one with the most elements

I have four std::vector containers that all might (or might not) contain elements. I want to determine which of them has the most elements and use it subsequently.
I tried to create a std::map with their respective sizes as keys and references to those containers as values. Then I applied std::max on the size() of each vector to figure out the maximum and accessed it through the std::map.
Obviously, this gets me into trouble once there is the same number of elements in at least two vectors.
Can anyone think of an elegant solution?
You're severely overthinking this. You've only got four vectors. You can determine the largest vector using 3 comparisons. Just do that:
std::vector<blah>& max = vector1;
if (max.size() < vector2.size()) max = vector2;
if (max.size() < vector3.size()) max = vector3;
if (max.size() < vector4.size()) max = vector4;
EDIT:
Now with pointers!
EDIT (280Z28):
Now with references! :)
EDIT:
The version with references won't work. Pavel Minaev explains it nicely in the comments:
That's correct, the code uses references. The first line, which declares max, doesn't cause a copy. However, all following lines do cause a copy, because when you write max = vectorN, if max is a reference, it doesn't cause the reference to refer to a different vector (a reference cannot be changed to refer to a different object once initialized). Instead, it is the same as max.operator=(vectorN), which simply causes vector1 to be cleared and replaced by elements contained in vectorN, copying them.
The pointer version is likely your best bet: it's quick, low-cost, and simple.
std::vector<blah> * max = &vector1;
if (max->size() < vector2.size()) max = &vector2;
if (max->size() < vector3.size()) max = &vector3;
if (max->size() < vector4.size()) max = &vector4;
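The difference between the two versions can be observed with a small sketch like this one. The vectors small and large and both function names are made up; only two vectors are used to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the size of `small` after the reference-based selection has run.
inline std::size_t reference_version_clobbers()
{
    std::vector<int> small(1, 0);   // one element
    std::vector<int> large(3, 0);   // three elements
    std::vector<int>& max = small;  // bound to small for good
    if (max.size() < large.size()) max = large; // copies large INTO small
    return small.size();
}

// Returns true if the pointer ends up at `large` and `small` is untouched.
inline bool pointer_version_selects()
{
    std::vector<int> small(1, 0);
    std::vector<int> large(3, 0);
    std::vector<int>* max = &small;
    if (max->size() < large.size()) max = &large; // re-points, no copy
    return max == &large && small.size() == 1;
}
```

In the reference version small ends up with three elements, because the assignment overwrote it; in the pointer version small keeps its single element and max simply points at large.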
Here's one solution (aside from Pesto's far-too-straightforward approach) - I've avoided bind and C++0x lambdas for explanatory purposes, but you could use them to remove the need for a separate function. I'm also assuming that with two vectors with an equal number of elements, which one is picked is irrelevant.
template <typename T> bool size_less (const T* lhs, const T* rhs) {
return lhs->size() < rhs ->size();
}
void foo () {
vector<T>* vecs[] = {&vec1, &vec2, &vec3, &vec4};
vector<T>& vec = **std::max_element(vecs, vecs + 4, size_less<vector<T> >);
}
Here is my very simple method. Its only merit is that you need nothing beyond basic C++ to understand it.
vector<T>* v[] = {&v1, &v2, &v3, &v4}, *max=&v1;
for(int i=1; i < 4; ++i)
if (v[i]->size() > max->size()) max = v[i];
This is a modified version of coppro's answer using a std::vector to reference any number of vectors for comparison.
template <typename T> bool size_less (const T* lhs, const T* rhs) {
return lhs->size() < rhs ->size();
}
void foo () {
// Define vector holding pointers to the original vectors
typedef vector< vector<T>* > VectorPointers;
// Fill the list
VectorPointers vecs;
vecs.push_back(&vec1);
vecs.push_back(&vec2);
vecs.push_back(&vec3);
vecs.push_back(&vec4);
vector<T>& vec = **std::max_element(
vecs.begin(),
vecs.end(),
size_less<vector<T> >
);
}
I'm all for over-thinking stuff :)
For the general problem of finding the highest/lowest element in a group, I would use a priority_queue with a comparator:
(copying shamelessly from coppro, and modifying...)
template <typename T> bool size_less (const T* lhs, const T* rhs)
{
return lhs->size() < rhs ->size();
}
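template <typename T> vector<T>* highest()
{
priority_queue<vector<T>*, vector<vector<T>*>, bool (*)(const vector<T>*, const vector<T>*)> myQueue(size_less<vector<T> >);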
...
...
return myQueue.top();
}
You could use a std::multimap. That allows multiple entries with the same key.