C++ set: storing duplicates: confused about < operator

I'm quite new to C++ (but know my way around C) so I'm probably missing something obvious.
TLDR: I use a std::set which stores elements twice, which is definitely not what I want.
Long story:
I've defined a class Clique and I need to store elements of this class in a set, so I've defined the < operator for Clique:
class Clique{
public :
int b;
int e;
int l;
std::set<int> X;
bool operator <( const Clique &rhs ) const
{
if( b < rhs.b)
return true;
if( e < rhs.e)
return true;
if( X.size() < rhs.X.size() )
return true;
std::set<int>::iterator itX = X.begin();
std::set<int>::iterator itrhs = rhs.X.begin();
// both sets have same size, need only to check end for one of them
while( (*itX == *itrhs) && ( itX != X.end() ) ){
++itX;
++itrhs;
}
if( itX == X.end() ){
//both sets are equal
return false;
}
else
return ( *itX < *itrhs );
}
void print_clique(FILE *F) const ;
};
(I wasn't sure how set comparison is done, so I wrote a routine for comparing them first by size, then element by element).
Now I want to store Clique elements in a set and this is where the problem appears.
My std::set
(1) does not appear to store Clique elements in the order I've defined;
(2) stores several copies of the same Clique
I've written a function to print a set of Clique:
void print_cliqueset(std::set<Clique> mySet){
int setsize = 0;
std::set<Clique>::iterator it = mySet.begin();
Clique cur_c = *it;
Clique prev_c = *it;
while( it != mySet.end() ){
// for( std::set<Clique>::iterator it = mySet.begin(); it != mySet.end(); ++it ){
it->print_clique(stdout);
setsize ++;
++it;
if( it != mySet.end() ){
cur_c = *it;
assert ( prev_c < cur_c);
assert( prev_c.b <= cur_c.b );
prev_c = *it;
}
}
assert( setsize == mySet.size() );
}
My function is more complicated than needed but I wanted to make sure I understood what was going on.
Here is a typical output of printing such a set:
There's a line for each Clique, in which I print first b, then e, then the elements in the set X.
6829 9716 1 2 3 5 8 9 10
6792 9687 1 2 3 7 8 9 10
606 6531 1 2 3 5 6 7 8 9
6829 9687 1 2 3 5 7 8 9 10
410 9951 2 6
484 9805 1 2 4 6
494 9805 2 4 6 10
506 9805 1 2 5 6
484 9821 1 2 4
484 9871 2 3 4 6
506 9821 1 2 5
484 9802 1 2 3 4 6
486 9805 1 2 4 6 9
486 9802 1 2 3 4 6 9
507 9802 1 2 3 4 6 9 10
502 9802 1 2 3 4 6 10
506 9802 1 2 3 5 6
507 9806 1 2 4 9 10
507 9805 1 2 5 6 9
527 9806 1 2 5 9 10
As we can see, the cliques are not at all sorted in the order I defined (or wanted to define). They should be sorted first by member b (the first number on each line), and this is not the case at all.
Then I have some duplicate lines in the output (not appearing in the example above but present in the full output). I guess the fact that I have duplicates is not surprising given that it seems confused about the order...
I guess the answer is something fairly obvious but I fail to see it. Any help would be appreciated!

Your bool operator <( const Clique &rhs ) const is wrong, as it doesn't define a strict weak ordering.
It may simply be:
bool operator <(const Clique& rhs) const
{
return std::tie(b, e, X) < std::tie(rhs.b, rhs.e, rhs.X);
}

Your operator< is broken. Consider two Cliques:
c1 is {b = 0, e = 1, ...}
c2 is {b = 1, e = 0, ...}
Your code will return true for both c1 < c2 and c2 < c1.
With such a comparator the behavior of std::set is undefined, so it is no surprise that it behaves strangely.
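A minimal sketch of that failure, using a hypothetical two-field stand-in (BE) instead of the full Clique: the question's first two checks report true in both directions, while consulting e only when the b values tie does not.

```cpp
#include <cassert>

// Hypothetical stand-in for Clique with only the two fields that matter here.
struct BE {
    int b;
    int e;
};

// The question's first two checks: e is consulted even when l.b > r.b.
bool broken_less(const BE& l, const BE& r) {
    if (l.b < r.b) return true;
    if (l.e < r.e) return true;
    return false;
}

// Lexicographic: e only breaks ties on b.
bool fixed_less(const BE& l, const BE& r) {
    if (l.b != r.b) return l.b < r.b;
    return l.e < r.e;
}
```

With c1 = {0, 1} and c2 = {1, 0}, broken_less says c1 < c2 and c2 < c1; fixed_less says only c1 < c2.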
I would fix your operator< in the following way:
bool operator <( const Clique &rhs ) const
{
if( b != rhs.b)
return b < rhs.b;
if( e != rhs.e)
return e < rhs.e;
if( X.size() != rhs.X.size() )
return X.size() < rhs.X.size();
std::set<int>::const_iterator itX = X.begin();
std::set<int>::const_iterator itrhs = rhs.X.begin();
// both sets have same size, need only to check end for one of them
while((itX != X.end()) && (*itX == *itrhs)){
++itX;
++itrhs;
}
if( itX == X.end() ){
//both sets are equal
return false;
}
else
return ( *itX < *itrhs );
}
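If the size-first order is what you want, the element loop can also be handed to std::lexicographical_compare, which stops at the end of the ranges by itself. A sketch (the function name size_then_elements_less is made up for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Smaller sets first; sets of equal size are compared element by element.
// std::lexicographical_compare never dereferences an end iterator.
bool size_then_elements_less(const std::set<int>& a, const std::set<int>& b) {
    if (a.size() != b.size()) return a.size() < b.size();
    return std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end());
}
```

Inside Clique::operator<, the last part of the comparison would then be return size_then_elements_less(X, rhs.X); once b and e are known to be equal.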

The definition of operator< should be such that, for each pair of elements a and b, the single relationship a < b is enough to determine every other kind of relationship. The following equivalences are in force here:
a > b <==> b < a
a == b <==> !(a < b) && !(b < a)
a >= b <==> !(a < b)
And so on. If several fields take part in every comparison, you effectively have a multidimensional range. The only way to flatten it into a single order is this:
The most significant field is checked first; if the values in this field aren't equal, you return the result immediately.
Otherwise, if they are equal, you check the next field in order of significance, and so on.
Because the set only lets you state whether one element is less than another, this makes things harder for you: you have to handle the equality case yourself inside operator<. Your procedure goes on to check the fields "next in the significance chain" even when lhs.b > rhs.b.
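These rules can be checked mechanically on sample data. The following brute-force sketch (the helper name is hypothetical; it is O(n^3) and only tests the values you give it) reports whether a comparator is irreflexive, asymmetric and transitive, and whether its "neither is less" relation is transitive too:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: brute-force check that `less` behaves as a strict weak
// ordering on the sample `xs`. Passing only covers these values; failing
// proves the comparator is broken.
template <typename T, typename Less>
bool is_strict_weak_ordering(const std::vector<T>& xs, Less less) {
    auto equiv = [&](const T& a, const T& b) { return !less(a, b) && !less(b, a); };
    for (const T& a : xs) {
        if (less(a, a)) return false;                       // irreflexivity
        for (const T& b : xs) {
            if (less(a, b) && less(b, a)) return false;     // asymmetry
            for (const T& c : xs) {
                if (less(a, b) && less(b, c) && !less(a, c))
                    return false;                           // transitivity
                if (equiv(a, b) && equiv(b, c) && !equiv(a, c))
                    return false;                           // transitive equivalence
            }
        }
    }
    return true;
}
```

Running it over a few hand-picked values is a cheap way to catch a comparator like the ones in this thread before handing it to std::set or std::sort.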

Operator < must provide a strict weak ordering, i.e. if x < y then !(y < x) and !(y == x).
In the case of Clique, the requirements seem to be that the elements b, e, and X are compared lexicographically.
The idiomatic way to represent this is to do all comparisons in terms of operator<:
#include <set>
class Clique{
public :
int b;
int e;
int l;
std::set<int> X;
bool operator <( const Clique &r ) const
{
auto const& l = *this;
if (l.b < r.b) return true;
if (r.b < l.b) return false;
if (l.e < r.e) return true;
if (r.e < l.e) return false;
if (l.X < r.X) return true;
if (r.X < l.X) return false;
return false;
}
void print_clique(FILE *F) const ;
};
And yes, std::set really does provide operator< when the key type provides it.
Another way to write this, as Jarod was alluding to, is:
#include <set>
#include <tuple>
class Clique{
public :
int b;
int e;
int l;
std::set<int> X;
bool operator <( const Clique &r ) const
{
auto const& l = *this;
return std::tie(l.b, l.e, l.X) < std::tie(r.b, r.e, r.X);
}
void print_clique(FILE *F) const ;
};
Which I think you'll agree is concise, expressive, correct and idiomatic.
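A compilable sketch of the std::tie version, with print_clique left out so it stands alone. Note that std::set<int>'s operator< is lexicographic, so X is no longer compared size-first as in the question: {1, 2, 3} sorts before {1, 3}.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Simplified Clique: print_clique is omitted so the sketch compiles on its own.
struct Clique {
    int b;
    int e;
    int l;            // not part of the ordering, as in the answer above
    std::set<int> X;

    bool operator<(const Clique& r) const {
        // b first, then e, then X (std::set<int> compares lexicographically).
        return std::tie(b, e, X) < std::tie(r.b, r.e, r.X);
    }
};
```

Inserting two cliques with the same b, e and X now keeps only the first one, and iteration starts at the smallest b.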

Related

Unordered selection implementation based on std::set ends up having duplicates

I'm trying to generate the combinations of 4 objects taken 2 at a time with a std::set container, without taking the arrangement into account (so (0, 1) and (1, 0) must be considered duplicates, since order is not important):
struct Combination {
int m;
int n;
Combination(const int m, const int n):m(m),n(n){}
};
const auto operator<(const auto & a, const auto & b) {
//explicitly "telling" that order should not matter:
if ( a.m == b.n && a.n == b.m ) return false;
//the case "a.m == b.m && a.n == b.n" will result in false here too:
return a.m == b.m ? a.n < b.n : a.m < b.m;
}
#include <set>
#include <iostream>
int main() {
std::set< Combination > c;
for ( short m = 0; m < 4; ++ m ) {
for ( short n = 0; n < 4; ++ n ) {
if ( n == m ) continue;
c.emplace( m, n );
} }
std::cout << c.size() << std::endl; //12 (but must be 6)
}
The expected set of combinations is 0 1, 0 2, 0 3, 1 2, 1 3, 2 3, which is 6 of them, but the resulting c.size() is 12. Also, my operator<(Combination, Combination) does satisfy the requirement that !comp(a, b) && !comp(b, a) means the elements are equal.
What am I missing?
Your code can't work[1], because your operator< does not introduce a strict weak ordering. One requirement of such an ordering is that, for any three elements a, b and c, a < b and b < c imply a < c (transitivity, in the mathematical sense). Let's check that. If we take
Combination a(1, 3);
Combination b(1, 4);
Combination c(3, 1);
you see that
a < b => true
b < c => true
but
a < c => false
If you can't order the elements, you can't use std::set. A std::unordered_set seems to be better suited for the task. You just need an operator== to compare for equality, which is trivial, and a hash function that returns the same value for elements that are considered identical. It could be as simple as adding m and n.
[1] Well, maybe it could work, or not, or both; it's undefined behaviour.
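A sketch of the unordered_set approach described above. CombinationHash is a made-up name; hashing m + n is the simple symmetric choice the answer suggests, and it collides often (e.g. (0, 3) and (1, 2)), which is still correct because operator== tells colliding pairs apart.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

struct Combination {
    int m;
    int n;
    Combination(int m, int n) : m(m), n(n) {}
};

// Equality that ignores order: (0,1) == (1,0).
bool operator==(const Combination& a, const Combination& b) {
    return (a.m == b.m && a.n == b.n) || (a.m == b.n && a.n == b.m);
}

// Equal (possibly swapped) pairs must hash alike; m + n is symmetric.
struct CombinationHash {
    std::size_t operator()(const Combination& c) const {
        return std::hash<int>()(c.m + c.n);
    }
};

using CombinationSet = std::unordered_set<Combination, CombinationHash>;
```

Filling a CombinationSet with the question's double loop leaves 6 elements.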
Attached is the working code. The piece you were missing is a check before each insert: look up the reversed pair with find and skip the insert if it is already in the set. You were close! If you need a more thorough answer, I will answer questions in the comments. Hope this helps!
#include <set>
#include <iostream>
using namespace std;
struct Combination {
int m;
int n;
Combination(const int m, const int n):m(m),n(n){}
};
const auto operator<(const auto & a, const auto & b) {
//explicitly "telling" that order should not matter:
if ( a.m == b.n && a.n == b.m ) return false;
//the case "a.m == b.m && a.n == b.n" will result in false here too:
return a.m == b.m ? a.n < b.n : a.m < b.m;
}
int main() {
set< Combination > c;
for ( short m = 0; m < 4; ++ m )
{
for ( short n = 0; n < 4; ++ n )
{
//Values are the same we do not add to the set
if(m == n){
continue;
}
else{
Combination s(n,m);
const bool is_in = c.find(s) != c.end();
if(is_in == true){
continue;
}
else{
cout << " M: " << m << " N: " << n << endl;
c.emplace( m, n);
}
}
}
}
cout << c.size() << endl; //6
}
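Another option, not taken by either answer, is to keep std::set but order each Combination by its canonical form (smaller value, larger value). That is a genuine strict weak ordering in which (m, n) and (n, m) are equivalent. A sketch, assuming the question's Combination struct; UnorderedPairLess is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <set>

struct Combination {
    int m;
    int n;
    Combination(int m, int n) : m(m), n(n) {}
};

// Compare (min, max) pairs, so the order of m and n does not matter.
struct UnorderedPairLess {
    bool operator()(const Combination& a, const Combination& b) const {
        return std::minmax(a.m, a.n) < std::minmax(b.m, b.n);
    }
};
```

With std::set<Combination, UnorderedPairLess>, the question's original loop (no find check) ends with 6 elements.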

How to sort std::set according to the second element?

Given n points in a two-dimensional space, sort all the points in ascending order.
(x1,y1) > (x2,y2) if and only if (x1>x2) or (x1==x2 && y1<y2)
Input specification:
The first line consists of an integer t, the number of test cases. Then for each test case, the first line consists of an integer n, the number of points. Then the next n lines contain two integers xi, yi which represents the point.
Output Specification:
For each test case print the sorted order of the points.
Input constraints:
1 <= t <= 10
1 <= n <= 100000
-10^9 <= coordinates <= 10^9
NOTE: Strict time limit. Prefer scanf/printf/BufferedReader instead of cin/cout/Scanner.
Sample Input:
1
5
3 4
-1 2
5 -3
3 3
-1 -2
Sample Output:
-1 2
-1 -2
3 4
3 3
5 -3
I declared a set, and now I want the values (second elements) sorted in descending order when the keys (first elements) are equal. Here is my code:
int main()
{
int n, i, hold = 0;
set<pair<int, int>>s;
int x, y, t;
set<pair<int, int>>::iterator it;
SF(t)
while (t--)
{
SF(n) while (n--) {
SF(x) SF(y)
s.insert({ x,y });
}
for (it = s.begin(); it != s.end(); it++) {
PF(it->first) printf(" "); PF(it->second); printf("\n");
}
s.clear();
}
return 0;
}
my output
-1 -2
-1 2
3 3
3 4
5 -3
I want the values sorted in descending order when the keys are the same.
By default, std::set uses std::less as the comparator for the elements inserted into it.
In your case the element type is std::pair<int,int>, so std::set uses the standard operator< of std::pair, which orders the second elements ascending on a tie; that is why you are not getting the result you want.
In order to achieve your custom style comparison, you need to provide a custom comparator
template<
class Key,
class Compare = std::less<Key>,
// ^^^^^^^^^^^^^^^ --> instead of this
class Allocator = std::allocator<Key>
> class set;
which should meet the Compare requirements (i.e. define a strict weak ordering).
Since C++11 you could also use a lambda function for this:
Here is a sample:
#include <iostream>
#include <set>
using pairs = std::pair<int, int>;
int main()
{
// custom compare
const auto compare = [](const pairs &lhs, const pairs &rhs)
{
return lhs.first < rhs.first || (lhs.first == rhs.first && lhs.second > rhs.second);
};
std::set<pairs, decltype(compare)> mySet(compare);
mySet.emplace(3, 4);
mySet.emplace(-1, 2);
mySet.emplace(5, -3);
mySet.emplace(3, 3);
mySet.emplace(-1, -2);
for (const auto& it : mySet)
std::cout << it.first << " " << it.second << std::endl;
}
Output:
-1 2
-1 -2
3 4
3 3
5 -3
Set doesn't sort the way you want by default, so you have to supply your own comparison function.
struct MyComp
{
bool operator()(const pair<int,int>& x, const pair<int,int>& y) const
{
return x.first < y.first || (x.first == y.first && x.second > y.second);
}
};
set<pair<int,int>, MyComp> s;
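A self-contained sketch of the functor approach on the sample input. One assumption worth flagging: the problem statement does not say the points are distinct, and std::set silently keeps only one copy of a repeated point, so this sketch swaps in std::multiset. sorted_points is a hypothetical helper.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;

// x ascending; for equal x, y descending.
struct MyComp {
    bool operator()(const Point& x, const Point& y) const {
        return x.first < y.first || (x.first == y.first && x.second > y.second);
    }
};

// Returns the points in MyComp order, keeping duplicates.
std::vector<Point> sorted_points(const std::vector<Point>& pts) {
    std::multiset<Point, MyComp> s(pts.begin(), pts.end());
    return std::vector<Point>(s.begin(), s.end());
}
```

For the sample input this yields -1 2, -1 -2, 3 4, 3 3, 5 -3, matching the expected output.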
As Jejo and others have answered, you can create a custom comparator to specify how you want your points sorted:
// custom compare
const auto compare = [](const pairs &lhs, const pairs &rhs)
{
return lhs.first < rhs.first || (lhs.first == rhs.first && lhs.second > rhs.second);
};
set<pair<int, int>, decltype(compare)> mySet(compare);
However, if performance is your concern, you will probably find that using a std::vector and calling std::sort is much faster than the std::set/insert alternative:
#include <vector>
#include <algorithm>
using namespace std;
using pairs = pair<int, int>; // alias used by the comparator
int main()
{
int n, i, hold = 0;
vector<pair<int, int>> v;
int x, y, t;
SF(t)
while (t--)
{
SF(n)
v.reserve(n);
while (n--) {
SF(x) SF(y)
v.emplace_back( x,y );
}
// custom comparator
const auto comp = [](const pairs &lhs, const pairs &rhs)
{
return lhs.first < rhs.first || (lhs.first == rhs.first && lhs.second > rhs.second);
};
sort(v.begin(), v.end(), comp);
for (const auto &p : v) {
PF(p.first) printf(" "); PF(p.second); printf("\n");
}
v.clear();
}
return 0;
}
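The snippet above depends on the question's SF/PF macros, which are not shown. Here is a sketch of the same vector + std::sort idea without them (sort_points is a hypothetical name); in the judge program, each test case would be read into the vector with scanf, passed to sort_points, and printed with printf.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using pairs = std::pair<int, int>;

// Sort in place: x ascending, and for equal x, y descending.
void sort_points(std::vector<pairs>& v) {
    std::sort(v.begin(), v.end(), [](const pairs& lhs, const pairs& rhs) {
        return lhs.first < rhs.first ||
               (lhs.first == rhs.first && lhs.second > rhs.second);
    });
}
```

Unlike a std::set, the vector keeps repeated points, and it needs no per-element allocation.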
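A couple of reasons why inserting into a set is slower than inserting into a vector and then sorting:
std::set implementations are binary trees, usually red-black trees, with one separately allocated node per element.
Iterating over the elements of a std::set is much slower than iterating over a contiguous vector, because it follows pointers between scattered nodes.
Note that both methods need on the order of n log(n) operations for insertion + sorting; the difference is that the set makes n allocations, while the reserved vector makes at most one per test case.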

Sorting packed vertices with thrust

So I have a device array of PackedVertex structs:
struct PackedVertex {
glm::vec3 Vertex;
glm::vec2 UV;
glm::vec3 Normal;
};
I'm trying to sort them so that duplicates are clustered together in the array; I don't care about overall order at all.
I've tried sorting them by comparing the lengths of the vectors, which ran but didn't sort them correctly, so now I'm trying per variable, using 3 stable_sorts with the following binary operators:
__thrust_hd_warning_disable__
struct sort_packed_verts_by_vertex : public thrust::binary_function < PackedVertex, PackedVertex, bool >
{
__host__ __device__ bool operator()(const PackedVertex &lhs, const PackedVertex &rhs)
{
return lhs.Vertex.x < rhs.Vertex.x || lhs.Vertex.y < rhs.Vertex.y || lhs.Vertex.z < rhs.Vertex.z;
}
};
__thrust_hd_warning_disable__
struct sort_packed_verts_by_uv : public thrust::binary_function < PackedVertex, PackedVertex, bool >
{
__host__ __device__ bool operator()(const PackedVertex &lhs, const PackedVertex &rhs)
{
return lhs.UV.x < rhs.UV.x || lhs.UV.y < rhs.UV.y;
}
};
__thrust_hd_warning_disable__
struct sort_packed_verts_by_normal : public thrust::binary_function < PackedVertex, PackedVertex, bool >
{
__host__ __device__ bool operator()(const PackedVertex &lhs, const PackedVertex &rhs)
{
return lhs.Normal.x < rhs.Normal.x || lhs.Normal.y < rhs.Normal.y || lhs.Normal.z < rhs.Normal.z;
}
};
Trouble is I'm getting a thrust error now: "launch_closure_by_value" which hazarding a guess means that my sort isn't converging due to my operators.
That being said I'm also pretty sure this is not the best way for me to be doing this kind of sort so any feedback would be greatly appreciated.
I don't believe your sort functors are correct.
A sort functor must give a consistent ordering. Let's just consider this one:
return lhs.UV.x < rhs.UV.x || lhs.UV.y < rhs.UV.y;
Suppose I have two UV quantities like this:
UV1.x: 1
UV1.y: 0
UV2.x: 0
UV2.y: 1
This functor will return true no matter which order I present UV1 and UV2. Your other functors are similarly defective.
In thrust speak, these are not valid StrictWeakOrdering functors. If we wish to order UV1 and UV2, we must provide a functor which (consistently) returns true for one presentation order and false for the other presentation order. (The only exception is when the two presented quantities are truly equal; then the functor must return false for both presentation orders. However, the UV1 and UV2 presented here are not "equal" for the purposes of your desired ordering, i.e. grouping of identical structs.)
The following simple test seems to work for me:
$ cat t717.cu
#include <thrust/sort.h>
#include <thrust/device_ptr.h>
#include <iostream>
#include <stdlib.h>
#define DSIZE 64
#define RNG 10
struct PackedVertex {
float3 Vertex;
float2 UV;
float3 Normal;
};
struct my_PV_grouper {
template <typename T>
__host__ __device__
bool operator()(const T &lhs, const T &rhs) const {
if (lhs.Vertex.x > rhs.Vertex.x) return true;
else if (lhs.Vertex.x < rhs.Vertex.x) return false;
else if (lhs.Vertex.y > rhs.Vertex.y) return true;
else if (lhs.Vertex.y < rhs.Vertex.y) return false;
else if (lhs.Vertex.z > rhs.Vertex.z) return true;
else if (lhs.Vertex.z < rhs.Vertex.z) return false;
else if (lhs.UV.x > rhs.UV.x) return true;
else if (lhs.UV.x < rhs.UV.x) return false;
else if (lhs.UV.y > rhs.UV.y) return true;
else if (lhs.UV.y < rhs.UV.y) return false;
else if (lhs.Normal.x > rhs.Normal.x) return true;
else if (lhs.Normal.x < rhs.Normal.x) return false;
else if (lhs.Normal.y > rhs.Normal.y) return true;
else if (lhs.Normal.y < rhs.Normal.y) return false;
else if (lhs.Normal.z > rhs.Normal.z) return true;
else return false;
}
};
int main(){
PackedVertex h_data[DSIZE];
PackedVertex *d_data;
for (int i =0; i < DSIZE; i++)
h_data[i].Vertex.x = h_data[i].Vertex.y = h_data[i].Vertex.z = h_data[i].UV.x = h_data[i].UV.y = h_data[i].Normal.x = h_data[i].Normal.y = h_data[i].Normal.z = rand()%RNG;
cudaMalloc(&d_data, DSIZE*sizeof(PackedVertex));
cudaMemcpy(d_data, h_data, DSIZE*sizeof(PackedVertex), cudaMemcpyHostToDevice);
thrust::device_ptr<PackedVertex> d_ptr(d_data);
thrust::sort(d_ptr, d_ptr+DSIZE, my_PV_grouper());
cudaMemcpy(h_data, d_data, DSIZE*sizeof(PackedVertex), cudaMemcpyDeviceToHost);
for (int i =0; i < DSIZE; i++)
std::cout << h_data[i].Vertex.x << " ";
std::cout << std::endl;
}
$ nvcc -o t717 t717.cu
$ ./t717
9 9 9 9 9 9 9 8 8 8 7 7 7 7 7 7 7 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 4 4 4 3 3 3 3 3 3 3 3 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 0 0 0 0 0 0
$
In case it's not clear, there is nothing particularly specific to the usage of thrust and functors here; the fundamental logic used to order these items needs to be correct for a valid sort. Even if you wrote a simple serial bubble-sort, it would have to use similar logic. The logic presented in your functors cannot be used to provide a sensible ordering.
If there are other problems with your approach, I can't say, as you have not provided a proper MCVE, which is expected for questions like this.
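Since the goal is only to group duplicates, one way is a single lexicographic comparison over every field followed by a unique pass. The host-side sketch below uses std::sort and std::unique, with plain floats in place of the glm types (an assumption, to keep it self-contained); thrust::sort and thrust::unique accept the same kind of functors. Vert, vert_less, vert_equal and unique_verts are hypothetical names. Any NaN field would break the ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Host-side stand-in for PackedVertex (glm types replaced by floats).
struct Vert {
    float vx, vy, vz;
    float u, v;
    float nx, ny, nz;
};

// Lexicographic over all eight fields: a valid strict weak ordering for non-NaN data.
bool vert_less(const Vert& a, const Vert& b) {
    return std::tie(a.vx, a.vy, a.vz, a.u, a.v, a.nx, a.ny, a.nz) <
           std::tie(b.vx, b.vy, b.vz, b.u, b.v, b.nx, b.ny, b.nz);
}

// "Neither is less" means identical for this ordering.
bool vert_equal(const Vert& a, const Vert& b) {
    return !vert_less(a, b) && !vert_less(b, a);
}

// Sort so duplicates become adjacent, then drop the repeats.
std::vector<Vert> unique_verts(std::vector<Vert> vs) {
    std::sort(vs.begin(), vs.end(), vert_less);
    vs.erase(std::unique(vs.begin(), vs.end(), vert_equal), vs.end());
    return vs;
}
```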

Not sorting properly

I have a class Node
class Node
{
public:
int Number;
char Ch;
Node(int N, char A)
{
Number = N;
Ch = A;
}
};
that I want to sort as follows: first by Number, and if the numbers are equal, the one whose Ch is 'M' goes in front.
bool Srt(Node A, Node B)
{
if (A.Number < B.Number)
return true;
if (A.Number > B.Number)
return false;
if (A.Number == B.Number)
{
if (B.Ch == 'M')
{
return true;
}
return false;
}
return false;
}
However, this does not work properly for the following input:
1 S
2 S
3 S
4 S
5 S
6 S
7 S
8 S
9 S
10 S
11 S
12 S
13 S
14 S
15 S
16 S
999999985 M
999999986 M
999999987 M
999999988 M
999999989 M
999999990 M
999999991 M
999999992 M
999999993 M
999999994 M
999999995 M
999999996 M
999999997 M
999999998 M
999999999 M
1000000000 M
It should return the list unchanged, but instead it returns
1 S
2 S
3 S
4 S
5 S
6 S
7 S
8 S
9 S
999999993 M
999999994 M
999999995 M
999999996 M
999999997 M
999999998 M
999999999 M
1000000000 M
10 S
11 S
12 S
13 S
14 S
15 S
16 S
999999985 M
999999986 M
999999987 M
999999988 M
999999989 M
999999990 M
999999991 M
999999992 M
It looks like you have implemented a less-than comparison for use with std::sort.
Such a comparator needs to be a strict weak ordering, such that A < B implies !( B < A ). Your function violates this if the numbers are equal and both have character M. Try this instead:
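bool Srt(const Node& A, const Node& B)
{
if (A.Number < B.Number)
return true;
if (A.Number > B.Number)
return false;
// Now A.Number == B.Number so there is no need to check.
// A goes first only if A is 'M' and B is not, which puts 'M' in front.
return A.Ch == 'M' && B.Ch != 'M';
}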
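A runnable sketch of a complete comparator, reading "in front" as "earlier in the sorted output" (node_less is a hypothetical name, and Node is simplified to an aggregate):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Node {
    int Number;
    char Ch;
};

// Number ascending; on a tie, an 'M' node goes before a non-'M' node.
// Two 'M' nodes (or two non-'M' nodes) with equal Number are equivalent,
// so the comparator stays a strict weak ordering.
bool node_less(const Node& a, const Node& b) {
    if (a.Number != b.Number) return a.Number < b.Number;
    return a.Ch == 'M' && b.Ch != 'M';
}
```

Passed to std::sort, it leaves the question's input in its original order.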

Where is my missing Edge?

I'm trying to use a std::set where I will throw a bunch of edges in, and have only the unique ones remain.
An Edge is a line between two (integer indexed) nodes. Edge (1,2)==(2,1), because these edges are undirected.
I'm encountering a puzzling situation though, with this. At the section marked //?? in the code below, the behavior is not as I expect.
Running this code keeps only 2 edges, (1,2) and (4,8). (2,1) is discarded by the set, but it should not be unless I activate the commented-out //|| ( A==o.B && B==o.A ) section in operator==! What is happening here?
This set<Edge> implementation is leaving me feeling .. edgy.
#include <stdio.h>
#include <set>
using namespace std ;
struct Edge
{
int A,B ;
Edge( int iA, int iB ) : A(iA), B(iB) {}
bool operator==( const Edge & o ) const {
//??
return ( A==o.A && B==o.B ) ;//|| ( A==o.B && B==o.A ) ;
}
bool operator<( const Edge& o ) const {//MUST BE CONST
return A < o.A && B < o.B ;
}
void print() const { printf( "( %d, %d )", A,B ) ; }
void compare( const Edge& o ) const {
print() ;
if( *this==o ) printf( "==" ) ;
else printf( "!=" ) ;
o.print() ;
puts("");
}
} ;
int main()
{
Edge e1( 1, 2 ) ;
Edge e2( 1, 2 ) ;
Edge e3( 2, 1 ) ;
Edge e4( 4, 8 ) ;
e1.compare( e2 ) ;
e1.compare( e3 ) ;
e1.compare( e4 ) ;
set<Edge> edges ;
edges.insert( e1 ) ;
edges.insert( e2 ) ;
edges.insert( e3 ) ;
edges.insert( e4 ) ;
printf( "%zu edges\n", edges.size() ) ;
for( auto edge : edges )
{
edge.print();
}
}
std::set does not use your == operator at all; it relies only on your < operator, and it is your < operator that presents the problem. If you would like to make sure that (1,2) is equal to (2,1), you should change the implementation of your < to behave like this:
bool operator<( const Edge& o ) const {
int myMin = min(A, B);
int myMax = max(A, B);
int hisMin = min(o.A, o.B);
int hisMax = max(o.A, o.B);
return myMin < hisMin || ( myMin == hisMin && myMax < hisMax );
}
This implementation constructs a canonical representation of an edge, where the smaller of {A,B} becomes the "canonical A" and the larger one becomes the "canonical B". When edges are compared in their canonical form, the equality of (1,2) and (2,1) follows from the fact that both (1,2) < (2,1) and (2,1) < (1,2) evaluate to false.
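A sketch of the canonical comparator in use. It also fixes a case the question's A < o.A && B < o.B gets wrong for unrelated edges: (1,5) and (2,3) are neither less than the other under that comparator, so the set would drop one of them.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

struct Edge {
    int A, B;
    Edge(int iA, int iB) : A(iA), B(iB) {}

    // Compare canonical forms (smaller endpoint, larger endpoint), so that
    // (1,2) and (2,1) are equivalent while unrelated edges stay distinct.
    bool operator<(const Edge& o) const {
        int myMin = std::min(A, B), myMax = std::max(A, B);
        int hisMin = std::min(o.A, o.B), hisMax = std::max(o.A, o.B);
        return myMin < hisMin || (myMin == hisMin && myMax < hisMax);
    }
};
```

With the question's four edges, the set ends up with (1,2) and (4,8), now for the right reason.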
I believe your operator< is wrong: both e3<e2 and e2<e3 are false, so the set treats (2,1) as a duplicate of (1,2) regardless of operator==.
Maybe you wanted something like:
return A < o.A || ((A == o.A) && (B < o.B)) ;
I suggest that you change your Edge() constructor to ensure that A and B are always initialized such that A<=B (if edges can point back to their originating node) or A<B (if not), and forgo having the extra logic in the operator== implementation. That seems less "edgy" to me.
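A sketch of that suggestion, assuming self-loops are allowed (so A <= B): the constructor normalizes the endpoints, after which plain memberwise operator== and a lexicographic operator< via std::tie are enough.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <utility>

struct Edge {
    int A, B;
    // Store every undirected edge with A <= B, so (2,1) becomes (1,2).
    Edge(int iA, int iB) : A(iA), B(iB) {
        if (B < A) std::swap(A, B);
    }
    bool operator==(const Edge& o) const { return A == o.A && B == o.B; }
    // Plain lexicographic order on the normalized endpoints.
    bool operator<(const Edge& o) const { return std::tie(A, B) < std::tie(o.A, o.B); }
};
```

Normalizing once at construction keeps both operators trivial and consistent with each other.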
Your equality comparison should be
return (A == o.A && B == o.B) || (B == o.A && A == o.B);