Strange Behavior of std::set and < Operator Overloading? - c++

I understand that if the < operator is overloaded in C++ (for example, to insert custom structs into std::set), the implementation must be a strict weak order over the underlying type.
Consider the following struct and implementation. This implementation is not a strict weak order, but the code compiles and runs without throwing an error (I would expect it to throw an error, given the requirement of a strict weak order):
#include <iostream>
#include <set>
using namespace std;
struct Pixel {
int x;
int y;
};
bool operator < (Pixel lhs, Pixel rhs){
return lhs.x < rhs.x || lhs.y < rhs.y;
};
int main(){
set<Pixel> mySet;
Pixel *newPixelA = new Pixel;
newPixelA->x = 1;
newPixelA->y = 3;
Pixel *newPixelB = new Pixel;
newPixelB->x = 4;
newPixelB->y = 2;
mySet.insert(*newPixelA);
mySet.insert(*newPixelB);
}
Is this the expected behavior? EDIT: using Xcode.

The compiler has no way of determining whether your operator< is a strict weak ordering. Instead, what is meant by std::set requiring this is that it will only work correctly if you give it a strict weak ordering. It makes no guarantees about what will happen if you give it something else.
In general, what C++ means when it requires something is that it is your responsibility to make sure that something happens. If you do, then the compiler and library will guarantee that you get the right results.
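Since neither the compiler nor the library checks this for you, one option is to test the comparison yourself on a handful of sample values. The following is only a sketch: `looksLikeStrictWeakOrder` and `questionLess` are hypothetical names, not standard facilities, and passing the check on a sample proves nothing about values outside it.

```cpp
#include <vector>

struct Pixel {
    int x;
    int y;
};

// The comparison from the question (not a strict weak ordering).
bool questionLess(const Pixel& lhs, const Pixel& rhs) {
    return lhs.x < rhs.x || lhs.y < rhs.y;
}

// Hypothetical helper: brute-force check of the strict-weak-ordering rules
// on a finite sample. Returns false as soon as one rule is violated.
template <class T, class Less>
bool looksLikeStrictWeakOrder(const std::vector<T>& sample, Less less) {
    auto equiv = [&](const T& a, const T& b) { return !less(a, b) && !less(b, a); };
    for (const T& a : sample) {
        if (less(a, a)) return false;                    // irreflexivity
        for (const T& b : sample) {
            if (less(a, b) && less(b, a)) return false;  // asymmetry
            for (const T& c : sample) {
                // transitivity of the ordering
                if (less(a, b) && less(b, c) && !less(a, c)) return false;
                // transitivity of equivalence (incomparability)
                if (equiv(a, b) && equiv(b, c) && !equiv(a, c)) return false;
            }
        }
    }
    return true;
}
```

Calling it with the two pixels from the question, (1, 3) and (4, 2), reports a violation, because each one compares less than the other.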

The standard guarantees the expected behavior only if the comparator requirements are met. Otherwise, what happens depends on the implementation and on the data. Your comparison function may even work properly for some data sets (those where, for all points, a greater x implies a greater y). A set cannot contain equal elements (as a math concept), and for std::set equivalence stands in for equality, so it will simply refuse to insert a value a if it already holds a value b such that:
a < b == false
b < a == false
even though a may not be equal to b.
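To make the "equivalence means equality" point concrete, here is a small sketch with a valid but coarse comparator. `ByX` and `insertTwo` are made-up names for illustration; the comparator looks at x only, so two pixels with the same x are equivalent as far as the set is concerned.

```cpp
#include <cstddef>
#include <set>

struct Pixel {
    int x;
    int y;
};

// Hypothetical comparator: a valid strict weak ordering that ignores y.
struct ByX {
    bool operator()(const Pixel& a, const Pixel& b) const { return a.x < b.x; }
};

// Inserts both pixels and reports how many the set actually kept.
std::size_t insertTwo(Pixel a, Pixel b) {
    std::set<Pixel, ByX> s;
    s.insert(a);
    s.insert(b);  // silently rejected if b is equivalent to a
    return s.size();
}
```

Here `insertTwo({1, 2}, {1, 3})` keeps only one element: neither pixel is less than the other, so the second insert is treated as a duplicate even though the y values differ.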

When the comparison operator implements a strict weak ordering of the contained elements, the objects in the std::set are ordered in a predictable pattern. If not, there is no telling which object appears first in the std::set when you iterate over the objects.
Take the following sample program in which ordering of Pixel1 is not done right and ordering of Pixel2 is done right.
#include <iostream>
#include <set>
struct Pixel1 {
int x;
int y;
};
bool operator < (Pixel1 lhs, Pixel1 rhs){
return lhs.x < rhs.x || lhs.y < rhs.y;
};
struct Pixel2 {
int x;
int y;
};
bool operator < (Pixel2 lhs, Pixel2 rhs){
if ( lhs.x != rhs.x )
{
return (lhs.x < rhs.x);
}
return (lhs.y < rhs.y);
};
template <typename Pixel> void print(std::set<Pixel> const& mySet)
{
for ( Pixel p : mySet )
{
std::cout << "(" << p.x << ", " << p.y << ") ";
}
std::cout << std::endl;
}
template <typename Pixel> void test1()
{
std::set<Pixel> mySet;
Pixel pixelA = {2, 3};
Pixel pixelB = {4, 2};
Pixel pixelC = {4, 1};
mySet.insert(pixelA);
mySet.insert(pixelB);
mySet.insert(pixelC);
print(mySet);
}
template <typename Pixel> void test2()
{
std::set<Pixel> mySet;
Pixel pixelA = {2, 3};
Pixel pixelB = {4, 2};
Pixel pixelC = {4, 1};
mySet.insert(pixelB);
mySet.insert(pixelA);
mySet.insert(pixelC);
print(mySet);
}
int main()
{
std::cout << "Pixel1 ... \n";
test1<Pixel1>();
test2<Pixel1>();
std::cout << "Pixel2 ... \n";
test1<Pixel2>();
test2<Pixel2>();
}
Output
Pixel1 ...
(4, 1) (4, 2) (2, 3)
(4, 1) (2, 3) (4, 2)
Pixel2 ...
(2, 3) (4, 1) (4, 2)
(2, 3) (4, 1) (4, 2)
The order of objects in the std::set<Pixel1> depends on the order of insertion while the order of objects in the std::set<Pixel2> is independent of the order of insertion.
Only you can tell whether that is acceptable in your application.

Related

Select value randomly from groups of values of different types

I have arrays of types int, bool and float:
std::array<int, 3>myInts = {15, 3, 6};
std::array<bool, 2>myBools = {true, false};
std::array<float,5>myFloats = {0.1, 15.2, 100.6, 10.44, 5.5};
I would like to generate a random integer(I know how to do that) from 0 to the total number of elements (3 + 2 + 5) so the generated random Integer represents one of the values. Next based on that integer I would like to retrieve my value and do further calculations with it. The problem I am facing is that I don't want to use if else statements like these:
int randInt = RandIntGen(0, myInts.size() + myBools.size() + myFloats.size());//Generates a random Integer
if(randInt<myInts.size()){//if the random integer is less than the size of the integers array I can choose
// from the integers array
int myValue = myInts[randInt];
}
else if(randInt>=myInts.size() && randInt<myBools.size() + myInts.size()){//if the random integer
//is between the size of the integers array and the size of the bools array + the size of the integers array
//then I can choose from the bools array
bool myValue = myBools[randInt - myInts.size()];
}
.
.
.
Then if for example randInt=2 then myValue=6 or if randInt=4 then myValue=false
However I would like that the selection algorithm was more straightforward something like:
int randInt = RandIntGen(0, myInts.size() + myBools.size() + myFloats.size());
allValues = {myInts, myBools, myFloats}
if(type_id(allValues[randInt]).name=="int")
int myValue = allValues[randInt] //(this value will be used for further calculations)
if(type_id(allValues[randInt]).name=="bool")
bool myValue = allValues[randInt] //(this value will be used for further calculations)
I've tried with a mix of templates, inheritance and linked lists however I cannot implement what I want. I think the solution should be really simple but at this time I cannot think of something else.
I am a novice in C++; I have been learning it for one and a half months. Before that I worked in Python, where everything was much easier, but then I decided to try C++. I am not an experienced programmer: I know some basic things and I am trying to learn new ones. Thanks for the help.
Most probably, you need to think how to satisfy your requirements in a simpler way, but it is possible to get literally what you want with C++17. If your compiler doesn't support C++17, you can use corresponding boost libraries. Here is the code:
#include <array>
#include <iostream>
#include <stdexcept>
#include <tuple>
#include <variant>
using Result = std::variant<int, bool, float>;
template<class T>
bool take_impl(int& i, const T& vec, Result& result)
{
if (i < static_cast<int>(std::size(vec)))
result = vec[i];
i -= std::size(vec);
return i < 0;
}
template<class T>
Result take(int i, const T& arrays)
{
if (i < 0)
throw std::runtime_error("i is too small");
Result res;
std::apply([&i, &res](const auto&... array) { return (take_impl(i, array, res) || ...); }, arrays);
if (i >= 0)
throw std::runtime_error("i is too large");
return res;
}
std::ostream& operator<<(std::ostream& s, const Result& v)
{
// write to the stream that was passed in, not unconditionally to std::cout
if (std::holds_alternative<int>(v))
s << "int(" << std::get<int>(v);
else if (std::holds_alternative<bool>(v))
s << "bool(" << std::get<bool>(v);
else
s << "float(" << std::get<float>(v);
return s << ')';
}
auto arrays = std::make_tuple(
std::array<int, 3>{15, 3, 6},
std::array<bool, 2>{true, false},
std::array<float,5>{0.1, 15.2, 100.6, 10.44, 5.5}
);
int main()
{
for (int i = 0; i < 10; ++i)
std::cout << take(i, arrays) << '\n';
}
If you are not required to keep separate arrays of different types, you can make one uniform array of std::variant<int, bool, float>. This will be significantly more efficient than using std::shared_ptr-s.
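A rough sketch of that single-array idea, assuming C++17. The names `Value`, `allValues` and `describe` are invented for this example; the values are the ones from the question, laid out ints first, then bools, then floats, so a random index can be used directly.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>

using Value = std::variant<int, bool, float>;

// One flat array holding all ten values from the question.
const std::array<Value, 10> allValues{
    Value{15}, Value{3}, Value{6},
    Value{true}, Value{false},
    Value{0.1f}, Value{15.2f}, Value{100.6f}, Value{10.44f}, Value{5.5f}};

// Hypothetical helper: names the type stored at a given index.
// std::visit calls the lambda with whichever alternative is active.
std::string describe(std::size_t index) {
    return std::visit(
        [](auto v) -> std::string {
            using T = decltype(v);
            if constexpr (std::is_same_v<T, int>) return "int";
            else if constexpr (std::is_same_v<T, bool>) return "bool";
            else return "float";
        },
        allValues.at(index));  // at() throws std::out_of_range on a bad index
}
```

With this layout, index 2 holds the int 6 and index 4 holds the bool false, matching the examples in the question; the branches inside the lambda are where the per-type calculations would go.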

Using std::set container for range items

I'd like to store a bunch of range items in std::set container.
This data structure should quickly decide whether a given input range is contained in one of the ranges the set currently holds. The idea is to overload the comparison used by std::set so that set::find can check whether one of the items in the set contains the input range.
It should also support range item that represents a single point (start_range == end_range).
Here's my implementation :
#include <iostream>
#include <map>
#include <set>
using std::set;
using std::map;
class range : public std::pair<int,int>
{
public:
range(int lower, int upper)
{
if (upper < lower)
{
first = upper;
second = lower;
}
else
{
first = lower;
second = upper;
}
}
range(int val)
{
first = second = val;
}
bool operator<(range const & b) const
{
if (second < b.first)
{
return true;
}
return false;
}
};
And here's how I test my data structure:
int main(int argc, const char * argv[])
{
std::map<int, std::set<range>> n;
n[1].insert(range(-50,-40));
n[1].insert(range(40,50));
n[2].insert(range(-30,-20));
n[2].insert(range(20,30));
n[3].insert(range(-20,-10));
n[3].insert(range(10,20));
range v[] = {range(-50,-41), range(30,45), range(-45,-45), range(25,25)};
int j[] = {1,2,3};
for (int l : j)
{
for (range i : v)
{
if (n[l].find(i) != n[l].end())
{
std::cout << l << "," << i.first << "," << i.second << " : "
<< n[l].find(range(i))->first << " "
<< n[l].find(range(i))->second << std::endl;
}
}
}
}
and here are the results I get:
1,-50,-41 : -50 -40 --> good
1,30,45 : 40 50 --> bad
1,-45,-45 : -50 -40 --> good
2,30,45 : 20 30 --> bad
2,25,25 : 20 30 --> good
So as you can see, my code handles single-point ranges perfectly well (-45 is contained in range (-50,-40) and 25 is contained in range (20,30)).
However, for wider ranges, my current operator < is not capable of finding the containment relationship; it only finds ranges that are equal in set terminology (meaning that for ranges a and b, neither a<b nor b<a holds).
Is there any way to change this operator to make it work?
Sounds like a perfect match for using Boost Interval Container Library. In short, you can
#include <boost/icl/interval_set.hpp>
// Helper function template to reduce explicit typing:
template <class T>
auto closed(T&& lower, T&& upper)
{
return boost::icl::discrete_interval<T>::closed(std::forward<T>(lower),
std::forward<T>(upper));
}
boost::icl::interval_set<int> ranges;
ranges.insert(closed(1, 2));
ranges.insert(closed(42, 50));
std::cout << contains(ranges, closed(43, 46)) << "\n"; // true
std::cout << contains(ranges, closed(42, 54)) << "\n"; // false
This should easily be pluggable into your std::map and be usable without further adjustments.
Your operator < defines a partial order:
(30,45) < (40, 50) == false and simultaneously (40, 50) < (30, 45) == false so in terms of std::set and std::map they are equal. That is why you got these results.
There is an article about partial orders: https://en.wikipedia.org/wiki/Partially_ordered_set
You might want to use std::unordered_map or somehow define a total order for your ranges.
I suggest an operator < that compares the arithmetic mean of the range bounds, i.e.
(a, b) < (c, d) if and only if (a+b)/2 < (c+d)/2. This gives a consistent ordering, although ranges with the same mean still compare as equivalent. Note that you might want to use float for the arithmetic mean.
For testing I suggest the following code draft (written from scratch and untested). A return value of -1 means that no range contains this one.
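int range::firstContainsMe(const std::vector<range>& rangesVec) const
{
for (size_t i = 0; i < rangesVec.size(); i++) {
// range derives from std::pair<int,int>: first is the lower bound, second the upper bound
if (first >= rangesVec[i].first && second <= rangesVec[i].second) {
return static_cast<int>(i);
}
}
return -1;
}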
Your comparison operator is unsuitable.
If you wish to use any container or algorithm based on ordering in C++, the ordering relation needs to be a Strict Weak Ordering Relation. The definition can be found on Wikipedia, in short the following rules must be respected:
Irreflexivity: For all x in S, it is not the case that x < x.
Asymmetry: For all x, y in S, if x < y then it is not the case that y < x.
Transitivity: For all x, y, z in S, if x < y and y < z then x < z.
Transitivity of Incomparability: For all x, y, z in S, if x is incomparable with y (neither x < y nor y < x hold), and y is incomparable with z, then x is incomparable with z.
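Your comparison operator fails the last rule and is therefore unsuitable: (20,30) and (30,45) are incomparable, as are (30,45) and (40,50), yet (20,30) < (40,50). In general, a quick way of obtaining a good comparison operator is to do what tuples do: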
bool operator<(range const & b) const
{
return std::tie(first, second) < std::tie(b.first, b.second);
}
You want a map, not a set.
In order to solve your problem, you want a map, not a set.
For disjoint intervals, a map from lower-bound to upper-bound is sufficient:
std::map<int, int> intervals;
The .lower_bound and .upper_bound operations allow finding the closest key in O(log N) time, and from there containment is quickly asserted.
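As an illustration of the map-based lookup, here is a minimal sketch. It assumes the stored intervals are closed and pairwise disjoint, and that the query satisfies lo <= hi; `IntervalMap` and `containsRange` are names made up for this example.

```cpp
#include <map>

// Disjoint closed intervals stored as lower bound -> upper bound.
using IntervalMap = std::map<int, int>;

// True if [lo, hi] lies entirely inside one stored interval.
bool containsRange(const IntervalMap& intervals, int lo, int hi) {
    auto it = intervals.upper_bound(lo);        // first interval starting after lo
    if (it == intervals.begin()) return false;  // every interval starts after lo
    --it;                                       // last interval starting at or before lo
    return hi <= it->second;                    // does that interval reach far enough?
}
```

With the intervals (-50,-40) and (40,50) from the question, the query (-50,-41) is found while (30,45) is not, which is the behavior the question asked for. Because the intervals are disjoint, the single candidate found by upper_bound is the only one that could contain the query.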
For non-disjoint intervals, things get trickier I fear, and you'll want to start looking into specialized data-structures (Interval Trees for example).

Why operator< of c++ map do not work with <=

I have this following program for map with custom keys:
class MyClass
{
public:
MyClass(int i): val(i) {}
bool operator< (const MyClass& that) const { return val <= that.val; }
private:
int val;
};
int main()
{
MyClass c1(1);
MyClass c2(2);
MyClass c3(3);
map<MyClass, int> table;
table[c1] = 12;
table[c2] = 22;
table[c3] = 33;
cout << "Mapped values are: " << table.lower_bound(c1)->second << " " << table[c2] << " " << table[c3] << endl;
}
The output comes as:
Mapped values are: 22 0 0
But if I compare using < or > in the operator< instead of <= then everything works fine. And the output comes as:
Mapped values are: 12 22 33
Can someone explain why <= does not work at all, but < and even > works?
The comparison function used by std::map must implement a strict weak ordering. That means it must implement the following rules given objects x, y, and z:
op(x, x) must always be false
if op(x, y) is true then op(y, x) must be false
if op(x, y) && op(y, z) is true then op(x, z) must also be true
if !op(x, y) && !op(y, x) and !op(y, z) && !op(z, y) are both true then !op(x, z) && !op(z, x) must also be true
The <= operator does not satisfy these conditions because, given x = y = 1, x <= x is not false and both x <= y and y <= x are true.
std::map uses these rules to implement its comparisons. For example, it could implement an equality check as !(op(x, y) || op(y, x)). Given x = 4, y = 4, and op = operator<= this becomes !(4 <= 4 || 4 <= 4), so 4 does not compare equal to 4 because the first rule above was broken.
On cppreference we find this quote.
Everywhere the standard library uses the Compare concept, uniqueness is determined by using the equivalence relation. In imprecise terms, two objects a and b are considered equivalent (not unique) if neither compares less than the other: !comp(a, b) && !comp(b, a).
This means that with your current compare
bool operator< (const MyClass& that) const { return val <= that.val; }
if you have two MyClass with val 5 and 5, 5 <= 5 will return true, and they will not be considered equivalent.
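The equivalence test quoted above can be written out directly. This is just a sketch of the formula !comp(a, b) && !comp(b, a); the function name `equivalent` is chosen for illustration, and plain ints stand in for MyClass.

```cpp
#include <functional>

// Equivalence as the ordered containers derive it from a comparison:
// a and b are the "same key" when neither compares less than the other.
template <class T, class Compare>
bool equivalent(const T& a, const T& b, Compare comp) {
    return !comp(a, b) && !comp(b, a);
}
```

With `std::less<int>{}`, `equivalent(5, 5, ...)` is true, so a map can find the key 5 again. With `std::less_equal<int>{}` it is false, because 5 <= 5 holds in both directions, so a key never matches itself during lookup.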

Best way to average duplicate properties in C++ vector

I have a std::vector<PLY> that holds a number of structs:
struct PLY {
int x;
int y;
int greyscale;
};
Some of the PLY's could be duplicates in terms of their position x and y but not necessarily in terms of their greyscale value. What is the best way to find those (position-) duplicates and replace them with a single PLY instance which has a greyscale value that represents the average greyscale of all duplicates?
E.g: PLY a{1,1,188} is a duplicate of PLY b{1,1,255}. Same (x,y) position possibly different greyscale.
Based on your description of Ply you need these operators:
auto operator==(const Ply& a, const Ply& b)
{
return a.x == b.x && a.y == b.y;
}
auto operator<(const Ply& a, const Ply& b)
{
// whenever you can be lazy!
return std::make_pair(a.x, a.y) < std::make_pair(b.x, b.y);
}
Very important: if the definition "Two Ply are identical if their x and y are identical" is not generally valid, then defining comparison operators that ignore greyscale is a bad idea. In that case you should define separate function objects or non-operator functions and pass them to the algorithms.
There is a nice rule of thumb that a function should not have more than one loop. So instead of two nested for loops, we define this helper function which computes the average of consecutive duplicates and also returns the end of the consecutive duplicates range:
// prereq: [begin, end) has at least one element
// i.e. begin != end
template <class It>
auto compute_average_duplicates(It begin, It end) -> std::pair<int, It>
// (sadly not C++17) concepts:
//requires requires(It i) { {*i} -> Ply; }
{
auto it = begin + 1;
int sum = begin->greyscale;
for (; it != end && *begin == *it; ++it) {
sum += it->greyscale;
}
// you might need rounding instead of truncation:
return std::make_pair(sum / std::distance(begin, it), it);
}
With this we can have our algorithm:
auto foo()
{
std::vector<Ply> v = {{1, 5, 10}, {2, 4, 6}, {1, 5, 2}};
std::sort(std::begin(v), std::end(v));
for (auto i = std::begin(v); i != std::end(v); ++i) {
decltype(i) j;
int average;
std::tie(average, j) = compute_average_duplicates(i, std::end(v));
// C++17 (coming soon in a compiler near you):
// auto [average, j] = compute_average_duplicates(i, std::end(v));
if (i + 1 == j)
continue;
i->greyscale = average;
v.erase(i + 1, j);
// std::vector::erase Invalidates iterators and references
// at or after the point of the erase
// which means i remains valid, and `++i` (from the for) is correct
}
}
You can apply lexicographical sorting first. While summing you should take care that greyscale does not overflow. With the current approach you will have some roundoff error, but it will be small as I first sum and only then average.
In the second part you need to remove duplicates from the array. I used additional array of indices to copy every element not more than once. If you have some forbidden value for x, y or greyscale you can use it and thus get along without additional array.
struct PLY {
int x;
int y;
int greyscale;
};
int main()
{
struct comp
{
bool operator()(const PLY &a, const PLY &b) { return a.x != b.x ? a.x < b.x : a.y < b.y; }
};
vector<PLY> v{ {1,1,1}, {1,2,2}, {1,1,2}, {1,3,5}, {1,2,7} };
sort(begin(v), end(v), comp());
vector<bool> ind(v.size(), true);
int s = 0;
for (int i = 1; i < v.size(); ++i)
{
if (v[i].x == v[i - 1].x &&v[i].y == v[i - 1].y)
{
v[s].greyscale += v[i].greyscale;
ind[i] = false;
}
else
{
int d = i - s;
if (d != 1)
{
v[s].greyscale /= d;
}
s = i;
}
}
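// average the final group, which the loop above never closes
int d = static_cast<int>(v.size()) - s;
if (d > 1)
{
v[s].greyscale /= d;
}
s = 0;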
for (int i = 0; i < v.size(); ++i)
{
if (ind[i])
{
if (s != i)
{
v[s] = v[i];
}
++s;
}
}
v.resize(s);
}
So you need to check whether PLY a1 { 1,1,1 }; duplicates PLY a2 {2,2,1};
A simple method is to overload operator == to check a1.x == a2.x and a1.y == a2.y. After that you can write your own function removeDuplicates(std::vector<PLY>& mPLY); which uses the iterators of this vector to compare and remove. But it is better to use std::list if you remove from the middle of the container very frequently.
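As an alternative to erasing from the vector in place, the duplicates can be grouped through a std::map keyed on the position. This is a different technique from the answers above, shown only as a sketch: `averageDuplicates` is a hypothetical function name, the result comes back sorted by (x, y), and the mean is truncated toward zero.

```cpp
#include <map>
#include <utility>
#include <vector>

struct PLY {
    int x;
    int y;
    int greyscale;
};

// Returns one PLY per distinct (x, y) with the truncated mean greyscale.
std::vector<PLY> averageDuplicates(const std::vector<PLY>& input) {
    // (x, y) -> (sum of greyscale, count); long long guards the sum against overflow
    std::map<std::pair<int, int>, std::pair<long long, int>> groups;
    for (const PLY& p : input) {
        auto& acc = groups[{p.x, p.y}];  // value-initialized to (0, 0) on first use
        acc.first += p.greyscale;
        acc.second += 1;
    }

    std::vector<PLY> result;
    result.reserve(groups.size());
    for (const auto& entry : groups) {
        const int mean = static_cast<int>(entry.second.first / entry.second.second);
        result.push_back({entry.first.first, entry.first.second, mean});
    }
    return result;
}
```

For the example from the question, {1,1,188} and {1,1,255} collapse into a single {1,1,221}. The cost is one map node per distinct position, which may or may not be acceptable compared with the sort-based approaches.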

how to compare structs

I am having difficulties to set up the comparison correctly.
Here is an example of my problem, where my code wrongly assumes {1,2}={2,1}: http://ideone.com/i7huL
#include <iostream>
#include <map>
using namespace std;
struct myStruct {
int a;
int b;
bool operator<(const myStruct& rhs) const {
return rhs.a < this->a && rhs.b < this->b;
}
};
int main() {
std::map <myStruct, int> mymap ;
myStruct m1={1,2};
myStruct m2={2,1};
mymap.insert(make_pair(m1,3));
std::map<myStruct, int>::iterator it1 = mymap.find(m1);
std::map<myStruct, int>::iterator it2 = mymap.find(m2);
cout << it1->second << it2->second;
// here it1->second=it2->second=3, although I would have expected it2 to be equal to map.end().
}
I could use || instead of &&, but I'm not sure this is the correct way either. I just want to have operator< implemented in such a way that I am able to find objects in my map, without making any errors, as is the case in the code I linked to.
Thanks.
Yes, this operator implementation doesn't make much sense. I'd recommend:
bool operator<(const myStruct& rhs) const {
return rhs.a < this->a || (rhs.a == this->a && rhs.b < this->b);
}
bool operator<(const myStruct& rhs) const {
if (a < rhs.a) return true;
if (a == rhs.a) return b < rhs.b;
return false;
}
If you are looking for a generalization to many data members, there is a great example using C++11 std::tie:
struct S {
int n;
std::string s;
float d;
bool operator<(const S& rhs) const {
return std::tie(n, s, d) < std::tie(rhs.n, rhs.s, rhs.d);
}
};
The problem is that your operator does not define a strict weak ordering. Think through how your example of {1,2} and {2,1} would go down in your operator. Assume X = {1,2}, and Y = {2,1}.
Is X < Y? Is 1 < 2 AND 2 < 1? No, therefore X is not less than Y.
Is Y < X? Is 2 < 1 AND 1 < 2? No, therefore Y is not less than X.
So, if X is not less than Y, and Y is not less than X, what's left? They're equal.
You need to pick one of the members of your struct, either a or b to be the primary comparison. If the primary comparison results in equality, only then do you check the secondary comparison. Just like when you alphabetize something. First you check the first letter, and only if they are equal do you go on to the next. Hans Passant has provided an example of this.
Here's a more serious problem example for your operator. The one I gave above is not necessarily bad, because maybe you want {1,2} to be considered equal to {2,1}. The fundamental problem crops up with a set of values like this: consider X = {1,1}, Y = {1,2}, Z = {2,2}
With your operator, X is definitively less than Z, because 1 is less than 2. But X comes out equal to Y, and Y comes out equal to Z. In order to adhere to strict weak ordering, if X = Y, and Y = Z, then X should equal Z. But here that is not the case.
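The X, Y, Z example can be checked mechanically. The sketch below copies the operator from the question unchanged; `equivalent` is a helper name introduced here to express "neither is less than the other", which is how std::map decides that two keys are the same.

```cpp
struct myStruct {
    int a;
    int b;
    // The operator from the question, unchanged.
    bool operator<(const myStruct& rhs) const {
        return rhs.a < this->a && rhs.b < this->b;
    }
};

// "Equal" in the map's sense: neither value is less than the other.
bool equivalent(const myStruct& p, const myStruct& q) {
    return !(p < q) && !(q < p);
}
```

For X = {1,1}, Y = {1,2}, Z = {2,2}, `equivalent(X, Y)` and `equivalent(Y, Z)` both hold, but `equivalent(X, Z)` does not, so equivalence is not transitive. The same helper also shows why find(m2) returned the entry for m1 in the question: {1,2} and {2,1} are equivalent under this operator.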
You asked about generalising to four int members, here's how I would structure such code for maximum clarity.
bool operator<(const myStruct& rhs) const
{
if (a < rhs.a)
return true;
if (a > rhs.a)
return false;
if (b < rhs.b)
return true;
if (b > rhs.b)
return false;
if (c < rhs.c)
return true;
if (c > rhs.c)
return false;
if (d < rhs.d)
return true;
if (d > rhs.d)
return false;
return false;
}
You can easily extend such code for as many data members as you wish.
The simplest solution uses std::tie to compare the tuples.
return std::tie(rhs.a, rhs.b) < std::tie(a, b);
This generalizes very quickly and simply to more data members.
I prefer to write this by comparing elements for equality until two are found that are different:
bool operator<(const myStruct& rhs) const {
if (a != rhs.a)
return a < rhs.a;
if (b != rhs.b)
return b < rhs.b;
return false; // this and rhs are equal.
}
I find this clearer and more extensible than writing a single expression with a mix of || and && (as per @HansPassant), and more compact than @jahhaj's approach of having each passing test lead to a return true; or return false;. Performance is about the same, unless you know something about the distribution of values. There is an argument for avoiding operator==() and just using operator<(), but that only applies if you are trying to write maximally generic template code.
The problem is that you need to know what your structure represents. Otherwise defining a < operator just becomes arbitrary, and others won't be able to give you a fitting answer. Take an example where your structure represents the Cartesian coordinates of a point in 2D. In this case you could define a meaningful ordering, such as by distance from the origin (the squared distance is enough for comparing):
int d1 = this->a*this->a + this->b*this->b;
int d2 = rhs.a*rhs.a + rhs.b*rhs.b;
return d1 < d2;