Sort a vector of structs - c++

I have a vector of structs and I need help with how to sort them according to one of the values, and if those values are the same, then sort according to another parameter.
This is similar to other questions, but it has more to it.
What I am trying to implement is the scan line based polygon fill algorithm.
I build the active edge list, but then I need to sort it based on the x value in each struct object. If the x values are the same, then they need to be sorted based on the inverse of the slopes for each struct object.
Here is the definition of the struct with the override operator < for normal sorting:
struct Bucket
{
    // Fields of a bucket list
    int ymax, x, dx, dy, sum;

    // Override the < operator, used for sorting based on the x value
    bool operator < (const Bucket& var) const
    {
        // Check if the x values are the same, if so
        // sort based on the inverse of the slope (dx/dy)
        /*if(x == var.x)
            return (dx/dy) < (var.dx/var.dy);
        else*/
        return (x < var.x);
    }
};
I commented out the if/else statement because, although it compiles, it causes a floating point error and the program crashes.
The exact error is: "Floating point exception (core dumped)"
I also tried casting each division to (int) but that did not work either.
My question: is there a way to do the sort similar to the way I have it, or should I write my own sort method?
If I should make my own sort method, please provide a link or something to a simple method which can help.
Thanks

You should use floating-point division: with integers, an expression like 5/6 evaluates to 0, and division by 0 is not possible, as we know. That's why the program crashes.
So change the members of the structure to double. You will then have to take care of some precision issues, but at least the program won't crash, assuming you never allow dy to be 0.
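A minimal sketch of that suggestion, keeping the member names from the question (the dy != 0.0 guard is an assumption about how horizontal edges are handled):
struct Bucket
{
    double ymax, x, dx, dy, sum;

    bool operator < (const Bucket& var) const
    {
        // Compare slopes only when the x values match and neither dy is 0;
        // otherwise fall back to comparing x.
        if (x == var.x && dy != 0.0 && var.dy != 0.0)
            return (dx / dy) < (var.dx / var.dy);
        return x < var.x;
    }
};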

You can use std::tuple, which provides relational operators that do lexicographic comparison (http://en.cppreference.com/w/cpp/utility/tuple/operator_cmp):
typedef std::tuple<int, int, int, int, int> Bucket;
But it's a bit annoying to change your struct to a tuple. You can use std::tie, which will build the tuple for you.
bool operator < (const Bucket& var) const
{
    return std::tie(x, dx/dy) < std::tie(var.x, var.dx/var.dy);
}
However, this won't compile, because std::tie takes lvalue references and dx/dy is a temporary.
bool operator < (const Bucket& var) const
{
    int slope = dx/dy;
    int var_slope = var.dx/var.dy;
    return std::tie(x, slope) < std::tie(var.x, var_slope);
}
It's not the most efficient solution, but readability is quite good.
Of course, you can still get a division by 0 in this example when dy is 0.
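If you would rather avoid the division entirely, the slopes can be compared by cross-multiplication instead. This is not from the answers above; the sketch assumes dy is always positive (typical for active-edge-table entries), and edge_less and sort_active_edges are illustrative names:
#include <algorithm>
#include <vector>

// dx1/dy1 < dx2/dy2 is equivalent to dx1*dy2 < dx2*dy1 when both dy
// values are positive, so no division (and no crash) is involved.
// (Assumes the products do not overflow int.)
inline bool edge_less(const Bucket& a, const Bucket& b)
{
    if (a.x != b.x)
        return a.x < b.x;
    return a.dx * b.dy < b.dx * a.dy;
}

void sort_active_edges(std::vector<Bucket>& activeEdges)
{
    std::sort(activeEdges.begin(), activeEdges.end(), edge_less);
}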

Related

C++ Hash Table - How is collision for unordered_map with custom data type as keys resolved?

I have defined a class called Point which is to be used as a key inside an unordered_map. So, I have provided an operator== function inside the class and I have also provided a template specialization for std::hash. Based on my research, these are the two things I found necessary. The relevant code is as shown:
class Point
{
    int x_cord = {0};
    int y_cord = {0};
public:
    Point()
    {
    }
    Point(int x, int y) : x_cord{x}, y_cord{y}
    {
    }
    int x() const
    {
        return x_cord;
    }
    int y() const
    {
        return y_cord;
    }
    bool operator==(const Point& pt) const
    {
        return (x_cord == pt.x() && y_cord == pt.y());
    }
};
namespace std
{
template<>
class hash<Point>
{
public:
    size_t operator()(const Point& pt) const
    {
        return (std::hash<int>{}(pt.x()) ^ std::hash<int>{}(pt.y()));
    }
};
}
// Inside some function
std::unordered_map<Point, bool> visited;
The program compiled and gave the correct results in the cases that I tested. However, I am not convinced if this is enough when using a user-defined class as key. How does the unordered_map know how to resolve collision in this case? Do I need to add anything to resolve collision?
That's a terrible hash function. But it is legal, so your implementation will work.
The rule (and really the only rule) for Hash and Equals is:
if a == b, then std::hash<value_type>{}(a) == std::hash<value_type>{}(b).
(It's also important that both Hash and Equals always produce the same value for the same arguments. I used to think that went without saying, but I've seen several SO questions where unordered_map produced unexpected results precisely because one or both of these functions depended on some external value.)
That would be satisfied by a hash function which always returned 42, in which case the map would get pretty slow as it filled up. But other than the speed issue, the code would work.
std::unordered_map uses a chained hash, not an open-addressed hash. All entries with the same hash value are placed in the same bucket, which is a linked list. So low-quality hashes do not distribute entries very well among the buckets.
It's clear that your hash gives {x, y} and {y, x} the same hash value. More seriously, any collection of points in a small rectangle will share the same small number of different hash values, because the high-order bits of the hash values will all be the same.
Knowing that Point is intended to store coordinates within an image, the best hash function here is:
pt.x() + pt.y() * width
where width is the width of the image.
Considering that x is a value in the range [0, width-1], the above hash function produces a unique number for any valid value of pt. No collisions are possible.
Note that this hash value corresponds to the linear index for the point pt if you store the image as a single memory block. That is, given y is also in a limited range ([0, height-1]), all hash values generated are within the range [0, width*height - 1], and all integers in that range can be generated. Thus, consider replacing your hash table with a simple array (i.e. an image). An image is the best data structure to map a pixel location to a value.
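If you keep the unordered_map, the formula might be dropped into the specialization from the question along these lines; kImageWidth is a hypothetical stand-in for however the image width is made available:
// Hypothetical constant; in real code take the width from the image.
constexpr int kImageWidth = 1920;

namespace std
{
template<>
class hash<Point>
{
public:
    size_t operator()(const Point& pt) const
    {
        // Row-major linear index of the pixel: unique for every in-range point.
        return static_cast<size_t>(pt.y()) * kImageWidth + pt.x();
    }
};
}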

C++: Create integer vector of infinities

I'm working on an algorithm and I need to initialize the vector of ints:
std::vector<int> subs(10)
of fixed length with values:
{-inf, +inf, +inf …. }
This is where I read that it is possible to use INT_MAX, but it's not quite correct, because the elements of my vector are supposed to be greater than any possible int value.
I liked the comparison-operator-overloading method from this answer, but how do you initialize the vector with infinitytype class objects if the elements are supposed to be ints?
Or maybe you know any better solution?
Thank you.
The solution depends on the assumptions your algorithm (or the implementation of your algorithm) has:
You could increase the element size beyond int (e.g. if your sizeof(int) is 4, use int64_t), and initialize to (int64_t) 1 + std::numeric_limits<int>::max() (and similarly for the negative values). But perhaps your algorithm assumes that you can't "exceed infinity" by adding or multiplying by positive numbers?
You could use an std::variant like other answers suggest, selecting between an int and infinity; but perhaps your algorithm assumes your elements behave like numbers?
You could use a ratio-based "number" class, ensuring it will not get non-integral values except infinity.
You could have your algorithm special-case the maximum and minimum integers
You could use floats or doubles, which support -/+ infinity, and restrict them to integrality (see the sketch after this list).
etc.
So, again, it really just depends and there's no one-size-fits-all solution.
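For instance, the floats/doubles option from the list above might look like this (a minimal sketch; enforcing integrality on the finite values is left to the algorithm):
#include <limits>
#include <vector>

int main()
{
    const double inf = std::numeric_limits<double>::infinity();

    // {-inf, +inf, +inf, ...} as in the question, but stored as doubles
    std::vector<double> subs(10, inf);
    subs[0] = -inf;
}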
As already said in the comments, you can't have an infinity value stored in an int: all values of this type are well-defined and finite.
If you are OK with a vector of something that works as an infinity for ints, then consider using a type like this:
struct infinite
{ };

bool operator < (int, infinite)
{
    return true;
}
You can use a variant (for example, boost::variant), which supports double dispatch: store either an int or an infinity type (which should record the sign of the infinity, for example in a bool), then implement the comparison operators through a visitor.
But I think it would be simpler if you simply used a double instead of an int, and whenever you take out a value that is not infinity, convert it to int. If performance is not that great of an issue, then it will work fine (probably still faster than a variant). If you need great performance, then just use INT_MAX and be done with it.
You are already aware of the idea of an "infinite" type, but that implementation could only contain infinite values. There's another related idea:
#include <cassert>

struct extended_int {
    enum Kind {NEGINF, FINITE, POSINF};
    Kind type;
    int finiteValue; // Only meaningful when type==FINITE

    bool operator<(extended_int rhs) const {
        if (type == POSINF) return false;      // +inf is not less than anything
        if (rhs.type == NEGINF) return false;  // nothing is less than -inf
        if (type == NEGINF) return true;       // -inf is less than everything else
        if (rhs.type == POSINF) return true;   // everything else is less than +inf
        assert(type == FINITE && rhs.type == FINITE);
        return finiteValue < rhs.finiteValue;
    }
    // Implicitly converting ctor
    constexpr extended_int(int value) : type(FINITE), finiteValue(value) { }
    // And the two infinities (defined at namespace scope below)
    static const extended_int posinf;
    static const extended_int neginf;
private:
    constexpr extended_int(Kind k) : type(k), finiteValue(0) { }
};

const extended_int extended_int::posinf(extended_int::POSINF);
const extended_int extended_int::neginf(extended_int::NEGINF);
You now have extended_int(5) < extended_int(6) but also extended_int(5) < extended_int::posinf
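Tying this back to the question, the vector could then be initialized like so (a sketch using the type above):
#include <vector>

int main()
{
    // {-inf, +inf, +inf, ...}
    std::vector<extended_int> subs(10, extended_int::posinf);
    subs[0] = extended_int::neginf;
}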

Largest Number < x?

In C++, let's say I have a number x of type T which can be an integer or floating point type. I want to find the largest number y of type T for which y < x holds. The solution needs to be templated to work transparently with both integers and floating point numbers. You may ignore the edge case where x is already the smallest number that can be represented in a T.
POSSIBLE USE CASE: This question was marked as too localized, hence I would like to provide a use case which I think is more general. Note that I'm not the original author of the OP.
Consider this structure:
struct lower_bound {
lower_bound(double value, bool open) : value(open? value+0.1 : value) {}
double value;
bool operator()(double x) { return x >= value; }
};
This class simulates a lower bound which can either be open or closed. Of course, in real (pun intended) life we cannot do this. The following is impossible (or at least quite tricky) to calculate for S being the set of all real numbers.
However, when S is the set of floating point numbers, this is a very valid principle, since we are dealing with essentially a countable set; and then there is no such thing as an open or closed bound. That is, >= can be defined in terms of > like done in the lower_bound class.
For code simplicity I used +0.1 to simulate an open lower bound. Of course, 0.1 is a crude value, as there may be values z such that value < z <= value+0.1, or value+0.1 == value in a floating point representation. Hence @brett-hale's answer is very useful :)
You may think about another simpler solution:
struct lower_bound {
lower_bound(double value, bool open) : open(open), value(value) {}
bool open;
double value;
bool operator()(double x) { return (open ? x > value : x>=value); }
};
However, this is less efficient, as sizeof(lower_bound) is larger and operator() needs to execute a more complicated statement. The first implementation is really efficient, and can also be implemented simply as a double instead of a structure. Technically, the only reason to use the second implementation is because you assume a double is continuous, whereas it is not, and I guess it will not be anywhere in the foreseeable future.
I hope I have created and explained a valid use case, and that I have not offended the original author.
If you have C++11, you could use std::nextafter in <cmath> :
if (std::is_integral<T>::value)
    return (x - 1);
else
    return std::nextafter(x, -std::numeric_limits<T>::infinity());
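Wrapped up as a single template it might look like this (a sketch; if constexpr requires C++17, whereas the snippet above only needs C++11):
#include <cmath>
#include <limits>
#include <type_traits>

template <typename T>
T largest_less_than(T x)
{
    if constexpr (std::is_integral<T>::value)
        return x - 1;                      // the previous integer
    else
        return std::nextafter(x, -std::numeric_limits<T>::infinity());
}
For example, largest_less_than(5) yields 4, and largest_less_than(1.0) yields the double immediately below 1.0.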

Overloading operator[] to start at 1 and performance overhead

I am doing some C++ computational mechanics (don't worry, no physics knowledge required here) and there is something that really bothers me.
Suppose I want to represent a 3D math Vector (nothing to do with std::vector):
class Vector {
public:
    Vector(double x=0., double y=0., double z=0.) {
        coordinates[0] = x;
        coordinates[1] = y;
        coordinates[2] = z;
    }
private:
    double coordinates[3];
};
So far so good. Now I can overload operator[] to extract coordinates:
double& Vector::operator[](int i) {
    return coordinates[i];
}
So I can type:
Vector V;
… //complex computation with V
double x1 = V[0];
V[1] = coord2;
The problem is, indexing from 0 is NOT natural here. I mean, when sorting arrays, I don't mind, but the fact is that the conventional notation in every paper, book or whatever always subscripts coordinates beginning with 1.
It may seem a quibble, but the fact is that in formulas it always takes a double-take to understand what we are talking about. Of course, this is much worse with matrices.
One obvious solution is just a slightly different overload:
double& Vector::operator[](int i) {
    return coordinates[i-1];
}
so I can type
double x1 = V[1];
V[2] = coord2;
It seems perfect except for one thing: this i-1 subtraction, which seems a good candidate for a small overhead. Very small, you would say, but I am doing computational mechanics, so this is typically something we couldn't afford.
So now (finally) my question: do you think a compiler can optimize this, or is there a way to make it optimize? (templates, macros, pointer or reference kludge...)
Logically, in
double xi = V[i];
the integer between the brackets being a literal most of the time (except in 3-iteration for loops), inlining operator[] should make it possible, right?
(sorry for this looong question)
EDIT:
Thanks for all your comments and answers
I kind of disagree with people telling me that we are used to 0-indexed vectors.
From an object-oriented perspective, I see no reason for a math Vector to be 0-indexed just because it is implemented with a 0-indexed array. We're not supposed to care about the underlying implementation. Now, suppose I don't care about performance and use a map to implement the Vector class. Then I would find it natural to map '1' with the '1st' coordinate.
That said, I tried it out with 1-indexed vectors and matrices, and after some code writing, I found that it does not interact nicely whenever an array is involved. I thought Vector and containers (std::array, std::vector...) would not interact often (meaning, transferring data between one another), but it seems I was wrong.
Now I have a solution that I think is less controversial (please give me your opinion):
Every time I use a Vector in some physical context, I think of using an enum :
enum Coord {
    x = 0,
    y = 1,
    z = 2
};
Vector V;
V[x] = 1;
The only disadvantage I see is that these x, y and z can be redefined without even a warning...
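One way to close that hole (a sketch, not part of the original post) is a scoped enum used as the index type, so the names cannot collide with or be shadowed by ordinary identifiers:
enum class Coord { x, y, z };

class Vector {
public:
    double& operator[](Coord c) { return coordinates[static_cast<int>(c)]; }
private:
    double coordinates[3] = {0.0, 0.0, 0.0};
};

int main()
{
    Vector V;
    V[Coord::x] = 1.0;   // only Coord values are accepted; V[0] does not compile
}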
This one should be measured or verified by looking at the disassembly, but my guess is: The getter function is tiny and its arguments are constant. There is a high chance the compiler will inline the function and constant-fold the subtraction. In that case the runtime cost would be zero.
Why not try this:
class Vector {
public:
    Vector(double x=0., double y=0., double z=0.) {
        coordinates[1] = x;
        coordinates[2] = y;
        coordinates[3] = z;
    }
private:
    double coordinates[4];
};
If you are not instantiating your object in quantities of millions, then the memory waste might be affordable.
Have you actually profiled it or examined the generated code? That's how this question is answered.
If the operator[] implementation is visible then this is likely to be optimized to have zero overhead.
I recommend you define this in the header (.h) for your class. If you define it in the .cpp then the compiler can't optimize as much. Also, your index should not be an "int" which can have negative values... make it a size_t:
class Vector {
    // ...
public:
    double& operator[](const size_t i) {
        return coordinates[i-1];
    }
};
You cannot say anything objective about performance without benchmarking. On x86, this subtraction can be compiled using relative addressing, which is very cheap. If operator[] is inlined, then the overhead is zero—you can encourage this with inline or with compiler-specific instructions such as GCC’s __attribute__((always_inline)).
If you must guarantee it, and the offset is a compile-time constant, then using a template is the way to go:
template<size_t I>
double& Vector::get() {
    return coordinates[I - 1];
}
double x = v.get<1>();
For all practical purposes, this is guaranteed to have zero overhead thanks to constant-folding. You could also use named accessors:
double Vector::x() const { return coordinates[0]; }
double Vector::y() const { return coordinates[1]; }
double Vector::z() const { return coordinates[2]; }
double& Vector::x() { return coordinates[0]; }
double& Vector::y() { return coordinates[1]; }
double& Vector::z() { return coordinates[2]; }
And for loops, iterators:
const double* Vector::begin() const { return coordinates; }
const double* Vector::end() const { return coordinates + 3; }
double* Vector::begin() { return coordinates; }
double* Vector::end() { return coordinates + 3; }
// (x, y, z) -> (x + 1, y + 1, z + 1)
for (auto& i : v) ++i;
Like many of the others here, however, I disagree with the premise of your question. You really should simply use 0-based indexing, as it is more natural in the realm of C++. The language is already very complex, and you need not complicate things further for those who will maintain your code in the future.
Seriously, benchmark this all three ways (ie, compare the subtraction and the double[4] methods to just using zero-based indices in the caller).
It's entirely possible you'll get a huge win from forcing 16-byte alignment on some cache architectures, and equally possible the subtraction is effectively free on some compiler/instruction set/code path combinations.
The only way to tell is to benchmark realistic code.
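For reference, the alignment experiment mentioned above is easy to set up with standard alignas (a sketch; Vector3 is just an illustrative name, and only a benchmark can tell whether it pays off):
// Combines the double[4] padding idea with forced 16-byte alignment.
struct alignas(16) Vector3 {
    double coordinates[4];
};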

Floating point keys in std::map

The following code is supposed to find the key 3.0 in a std::map, where it does exist. But due to floating point precision it won't be found.
map<double, double> mymap;
mymap[3.0] = 1.0;

double t = 0.0;
for(int i = 0; i < 31; i++)
{
    t += 0.1;
    bool contains = (mymap.count(t) > 0);
}
In the above example, contains will always be false.
My current workaround is to compute t as 0.1 * i instead of accumulating 0.1, like this:
for(int i = 0; i < 31; i++)
{
    t = 0.1 * i;
    bool contains = (mymap.count(t) > 0);
}
Now the question:
Is there a way to introduce a fuzzyCompare to the std::map if I use double keys?
The common solution for floating point comparison is usually something like fabs(a-b) < epsilon, but I don't see a straightforward way to do this with std::map.
Do I really have to encapsulate the double type in a class and overload operator<(...) to implement this functionality?
So there are a few issues with using doubles as keys in a std::map.
First, NaN is a problem: it compares as neither less than nor greater than anything, which breaks the strict weak ordering std::map relies on. If there is any chance of NaN being inserted, use this:
#include <cmath>

struct safe_double_less {
    bool operator()(double left, double right) const {
        bool leftNaN = std::isnan(left);
        bool rightNaN = std::isnan(right);
        if (leftNaN != rightNaN)
            return leftNaN < rightNaN;
        return left < right;
    }
};
but that may be overly paranoid. Do not, I repeat do not, include an epsilon threshold in your comparison operator you pass to a std::set or the like: this will violate the ordering requirements of the container, and result in unpredictable undefined behavior.
(I placed NaN as greater than all doubles, including +inf, in my ordering, for no good reason. Less than all doubles would also work).
So either use the default operator<, or the above safe_double_less, or something similar.
Next, I would advise using a std::multimap or std::multiset, because you should expect multiple values for each lookup. You might as well make managing multiple hits an everyday thing, instead of a corner case, to increase the test coverage of your code. (I would rarely recommend these containers otherwise.) Plus this blocks operator[], which is not advisable to use when you are using floating point keys.
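Putting those two recommendations together might look like this (a sketch):
#include <map>

// NaN-safe ordering from above, applied to a multimap:
std::multimap<double, double, safe_double_less> values;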
The point where you want to use an epsilon is when you query the container. Instead of using the direct interface, create a helper function like this:
// works on both `const` and non-`const` associative containers:
template<class Container>
auto my_equal_range( Container&& container, double target, double epsilon = 0.00001 )
-> decltype( container.equal_range(target) )
{
auto lower = container.lower_bound( target-epsilon );
auto upper = container.upper_bound( target+epsilon );
return std::make_pair(lower, upper);
}
which works on both std::map and std::set (and multi versions).
(In a more modern code base, I'd expect a range<?> object that is a better thing to return from an equal_range function. But for now, I'll make it compatible with equal_range).
This finds a range of things whose keys are "sufficiently close" to the one you are asking for, while the container maintains its ordering guarantees internally and doesn't execute undefined behavior.
To test for existence of a key, do this:
template<typename Container>
bool key_exists( Container const& container, double target, double epsilon = 0.00001 ) {
    auto range = my_equal_range(container, target, epsilon);
    return range.first != range.second;
}
and if you want to delete/replace entries, you should deal with the possibility that there might be more than one entry hit.
The shorter answer is "don't use floating point values as keys for std::set and std::map", because it is a bit of a hassle.
If you do use floating point keys for std::set or std::map, almost certainly never do a .find or a [] on them, as that is highly likely to be a source of bugs. You can use them for an automatically sorted collection of stuff, so long as exact order doesn't matter (i.e., whether one particular 1.0 is ahead of, behind, or exactly on the same spot as another 1.0). Even then, I'd go with a multimap/multiset, as collisions, or the lack thereof, are not something I'd want to rely upon.
Reasoning about the exact value of IEEE floating point values is difficult, and fragility of code relying on it is common.
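For illustration, the loop from the question might then be written like this (a sketch that reuses the helpers above):
#include <map>

int main()
{
    std::map<double, double> mymap;
    mymap.insert({3.0, 1.0});

    double t = 0.0;
    for (int i = 0; i < 31; i++)
    {
        t += 0.1;
        bool contains = key_exists(mymap, t);   // becomes true once t reaches ~3.0
        (void)contains;
    }
}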
Here's a simplified example of how using soft-compare (aka epsilon or almost equal) can lead to problems.
Let epsilon = 2 for simplicity. Put 1 and 4 into your map. It now might look like this:
1
 \
  4
So 1 is the tree root.
Now put in the numbers 2, 3, 4 in that order. Each will replace the root, because it compares equal to it. So then you have
4
 \
  4
which is already broken. (Assume no attempt to rebalance the tree is made.) We can keep going with 5, 6, 7:
7
 \
  4
and this is even more broken, because now if we ask whether 4 is in there, it will say "no", and if we ask for an iterator for values less than 7, it won't include 4.
Though I must say that I've used maps based on this flawed fuzzy compare operator numerous times in the past, and whenever I dug up a bug, it was never due to this. This is because datasets in my application areas never actually amount to stress-testing this problem.
As Naszta says, you can implement your own comparison function. What he leaves out is the key to making it work - you must make sure that the function always returns false for any values that are within your tolerance for equivalence.
return (std::fabs(left - right) > epsilon) && (left < right);
Edit: as pointed out in many comments to this answer and others, there is a possibility for this to turn out badly if the values you feed it are arbitrarily distributed, because you can't guarantee that !(a<b) and !(b<c) results in !(a<c). This would not be a problem in the question as asked, because the numbers in question are clustered around 0.1 increments; as long as your epsilon is large enough to account for all possible rounding errors but is less than 0.05, it will be reliable. It is vitally important that the keys to the map are never closer than 2*epsilon apart.
You could implement your own compare function.
#include <cmath>
#include <map>

class own_double_less
{
public:
    own_double_less( double arg_ = 1e-7 ) : epsilon(arg_) {}
    bool operator()( const double &left, const double &right ) const
    {
        // you can choose another way to make the decision
        // (the original version was: return left < right;)
        return (std::fabs(left - right) > epsilon) && (left < right);
    }
    double epsilon;
};
// your map:
std::map<double,double,own_double_less> mymap;
Updated: see Item 40 in Effective STL!
Updated based on suggestions.
Using doubles as keys is not useful. As soon as you do any arithmetic on the keys, you are not sure what exact values they have, and hence cannot use them for indexing the map. The only sensible usage would be with keys that are constants.