Floating point keys in std::map - C++

The following code is supposed to find the key 3.0 in a std::map, where it exists, but due to floating point precision it won't be found.
map<double, double> mymap;
mymap[3.0] = 1.0;

double t = 0.0;
for (int i = 0; i < 31; i++)
{
    t += 0.1;
    bool contains = (mymap.count(t) > 0);
}
In the above example, contains will always be false.
My current workaround is to compute t as 0.1 * i instead of repeatedly adding 0.1, like this:
for (int i = 0; i < 31; i++)
{
    t = 0.1 * i;
    bool contains = (mymap.count(t) > 0);
}
Now the question:
Is there a way to introduce a fuzzyCompare to the std::map if I use double keys?
The common solution for floating point comparison is usually something like fabs(a - b) < epsilon. But I don't see a straightforward way to do this with std::map.
Do I really have to encapsulate the double type in a class and overload operator<(...) to implement this functionality?

So there are a few issues with using doubles as keys in a std::map.
First, NaN is a problem: every ordering comparison involving NaN is false, so NaN compares as equivalent to every key, which silently breaks the strict weak ordering. If there is any chance of NaN being inserted, use this:
#include <cmath>

struct safe_double_less {
    bool operator()(double left, double right) const {
        bool leftNaN = std::isnan(left);
        bool rightNaN = std::isnan(right);
        if (leftNaN != rightNaN)
            return leftNaN < rightNaN;
        return left < right;
    }
};
but that may be overly paranoid. Do not, I repeat do not, include an epsilon threshold in the comparison operator you pass to a std::set or the like: this will violate the ordering requirements of the container, and result in undefined behavior.
(I placed NaN as greater than all doubles, including +inf, in my ordering, for no good reason. Less than all doubles would also work).
So either use the default operator<, or the above safe_double_less, or something similar.
Next, I would advise using a std::multimap or std::multiset, because you should expect multiple values for each lookup. You might as well make handling multiple hits an everyday thing instead of a corner case, to increase the test coverage of your code. (I would rarely recommend these containers otherwise.) Plus this blocks operator[], which is ill-advised to use with floating point keys.
The point where you want to use an epsilon is when you query the container. Instead of using the direct interface, create a helper function like this:
// works on both `const` and non-`const` associative containers:
template<class Container>
auto my_equal_range( Container&& container, double target, double epsilon = 0.00001 )
    -> decltype( container.equal_range(target) )
{
    auto lower = container.lower_bound( target - epsilon );
    auto upper = container.upper_bound( target + epsilon );
    return std::make_pair(lower, upper);
}
which works on both std::map and std::set (and multi versions).
(In a more modern code base, I'd expect a range<?> object that is a better thing to return from an equal_range function. But for now, I'll make it compatible with equal_range).
This finds a range of things whose keys are "sufficiently close" to the one you are asking for, while the container maintains its ordering guarantees internally and doesn't execute undefined behavior.
To test for existence of a key, do this:
template<typename Container>
bool key_exists( Container const& container, double target, double epsilon = 0.00001 ) {
    auto range = my_equal_range(container, target, epsilon);
    return range.first != range.second;
}
and if you want to delete/replace entries, you should deal with the possibility that there might be more than one entry hit.
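For instance, a deletion helper along the following lines would handle that; this is a hedged sketch building on my_equal_range above (erase_near is my own name for it):
#include <cstddef>
#include <iterator>

template<class Multimap>
std::size_t erase_near( Multimap& container, double target, double epsilon = 0.00001 ) {
    auto range = my_equal_range(container, target, epsilon);
    std::size_t hits = std::distance(range.first, range.second); // may well be more than one
    container.erase(range.first, range.second);
    return hits;
}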
The shorter answer is "don't use floating point values as keys for std::set and std::map", because it is a bit of a hassle.
If you do use floating point keys for std::set or std::map, almost certainly never do a .find or a [] on them, as that is highly likely to be a source of bugs. You can use it for an automatically sorted collection of stuff, so long as the exact order doesn't matter (i.e., whether one particular 1.0 lands ahead of, behind, or on exactly the same spot as another 1.0). Even then, I'd go with a multimap/multiset, as relying on key collisions, or their absence, is fragile.
Reasoning about the exact value of IEEE floating point values is difficult, and fragility of code relying on it is common.

Here's a simplified example of how using soft-compare (aka epsilon or almost equal) can lead to problems.
Let epsilon = 2 for simplicity. Put 1 and 4 into your map. It now might look like this:
1
\
4
So 1 is the tree root.
Now put in the numbers 2, 3, 4 in that order. Each will replace the root, because it compares equal to it (assume the insert overwrites the key it matches). So then you have
4
\
4
which is already broken. (Assume no attempt to rebalance the tree is made.) We can keep going with 5, 6, 7:
7
\
4
and this is even more broken, because now if we ask whether 4 is in there, it will say "no", and if we ask for an iterator for values less than 7, it won't include 4.
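To see concretely why the epsilon comparison cannot induce a valid ordering, here is a minimal sketch (fuzzy_eq and the sample values are mine) of the non-transitive "equality" at work:
#include <cmath>
#include <iostream>

bool fuzzy_eq(double a, double b, double eps = 2.0) {
    return std::abs(a - b) <= eps; // "equal" when within epsilon
}

int main() {
    std::cout << fuzzy_eq(1, 3) << fuzzy_eq(3, 5) << fuzzy_eq(1, 5) << '\n';
    // prints 110: 1 "equals" 3 and 3 "equals" 5, yet 1 does not "equal" 5.
    // Equivalence under the comparator is not transitive, so the strict
    // weak ordering required by std::map is violated.
}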
Though I must say that I've used maps based on this flawed fuzzy compare operator numerous times in the past, and whenever I dug up a bug, it was never due to this. This is because datasets in my application areas never actually stress-test this problem.

As Naszta says, you can implement your own comparison function. What he leaves out is the key to making it work - you must make sure that the function always returns false for any values that are within your tolerance for equivalence.
return (std::fabs(left - right) > epsilon) && (left < right);
Edit: as pointed out in many comments to this answer and others, there is a possibility for this to turn out badly if the values you feed it are arbitrarily distributed, because you can't guarantee that !(a<b) and !(b<c) imply !(a<c). This would not be a problem in the question as asked, because the numbers in question are clustered around 0.1 increments; as long as your epsilon is large enough to account for all possible rounding errors but is less than 0.05, it will be reliable. It is vitally important that the keys to the map are never closer than 2*epsilon apart.
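As a hedged illustration tying this back to the original question (fuzzy_less and the epsilon value are my own choices): with keys on a 0.1 grid and an epsilon far below half that spacing, the comparator behaves consistently:
#include <cmath>
#include <iostream>
#include <map>

struct fuzzy_less {
    double epsilon = 1e-6; // much smaller than half the 0.1 key spacing
    bool operator()(double left, double right) const {
        return (std::fabs(left - right) > epsilon) && (left < right);
    }
};

int main() {
    std::map<double, double, fuzzy_less> mymap;
    mymap[3.0] = 1.0;
    double t = 0.0;
    for (int i = 0; i < 31; i++) {
        t += 0.1;
        if (mymap.count(t) > 0)
            std::cout << "found at i = " << i << '\n'; // fires once, at i == 29
    }
}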

You could implement your own comparison function.
#include <cmath>
#include <functional>
#include <map>

// note: std::binary_function is deprecated in C++11 and removed in C++17
class own_double_less : public std::binary_function<double, double, bool>
{
public:
    own_double_less( double arg_ = 1e-7 ) : epsilon(arg_) {}
    bool operator()( const double &left, const double &right ) const
    {
        // you can choose another way to make the decision
        // (the original version is: return left < right;)
        return (std::fabs(left - right) > epsilon) && (left < right);
    }
    double epsilon;
};

// your map:
std::map<double, double, own_double_less> mymap;
Updated: see Item 40 in Effective STL!
Updated based on suggestions.

Using doubles as keys is not useful. As soon as you do any arithmetic on the keys, you are not sure what exact values they have and hence cannot use them for indexing the map. The only sensible usage would be keys that are constants.

Related

std map constructing strict weak order and finding lower bound

I have some kind of real-life walls which are characterized by two heights (a leftmost and a rightmost height).
e.g.
I I I I
I I I I I
I I I I I I
I_____I I_____I I____I
the first one has leftmost height hl=4, rightmost height hr=3, the second hl=4 and hr=4 and so on.
Given hl and hr, my task is now to find the wall losing minimal volume in order to reach hl and hr. So (a) only lowering the heights on either side of the wall is allowed, not increasing them, and (b) the lost volume should be minimal.
In a first approach I reduced the problem to one height, using the minimal height hMin = std::min(hl, hr). That way I can fill a map with the "walls", use hMin as the key, and get the solution with lower_bound, searching with max(hl, hr).
Now, considering both heights for an optimal solution, I'm getting into all sorts of trouble constructing a strict weak order. What I have tried until now is to extend the key to a 2nd height, use a custom less and, equivalently, use lower_bound.
My custom less looks somewhat like:
struct KeyLess
{
    bool operator()(Key const& x, Key const& y) const
    {
        if ((x.hl + roundOff < y.hl) && (x.hr + roundOff < y.hr))
            return true;
        if ((y.hl + roundOff < x.hl) && (y.hr + roundOff < x.hr))
            return false;
        return false;
    }
};
but it obviously has problems for x.hl > y.hl and x.hr < y.hr, or visually for walls like these:
I I
I I I I
I I I I I I
I_____I I_____I I____I
and does not give a strict weak ordering afaik.
I would appreciate any help constructing a less operator for my problem or showing me another way of finding a solution to this problem.
Example
I I I I I I
I I I I I I I I
I I I I I I I I
I I I I I I I I I
I_____I I_____I I____I I_____I I_____I
Given hl=3 and hr=5 it should return the 3rd (hl=4 and hr=5).
The order in which the walls are saved in the map is not per se relevant, as long as I can get to the solution (but I think finding a meaningful ordering is precisely my problem here).
I think you want
struct KeyLess
{
    bool operator()(Key const& x, Key const& y) const
    {
        return std::pair(std::abs(x.hl - x.hr), std::min(x.hl, x.hr))
             < std::pair(std::abs(y.hl - y.hr), std::min(y.hl, y.hr));
    }
};
I.e. ordering first by the difference in heights, then by the smaller height. If you still need to distinguish
I I
I I
I I I I
I_____I I_____I
Then you can extend that by arbitrarily choosing the first as less than the second
struct KeyLess
{
    bool operator()(Key const& x, Key const& y) const
    {
        return std::tuple(std::abs(x.hl - x.hr), std::min(x.hl, x.hr), x.hl < x.hr)
             < std::tuple(std::abs(y.hl - y.hr), std::min(y.hl, y.hr), y.hl < y.hr);
    }
};
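For what it's worth, a small self-contained sketch of how this comparator might be used (the Key aggregate and the sample heights are my assumptions):
#include <algorithm>
#include <cmath>
#include <map>
#include <tuple>

struct Key { double hl, hr; };

struct KeyLess
{
    bool operator()(Key const& x, Key const& y) const
    {
        return std::make_tuple(std::abs(x.hl - x.hr), std::min(x.hl, x.hr), x.hl < x.hr)
             < std::make_tuple(std::abs(y.hl - y.hr), std::min(y.hl, y.hr), y.hl < y.hr);
    }
};

int main() {
    std::map<Key, int, KeyLess> walls; // wall index keyed by its two heights
    walls[{4, 3}] = 0;
    walls[{4, 4}] = 1;
    walls[{4, 5}] = 2;
    // iteration order: (4,4) first (height difference 0), then (4,3),
    // then (4,5) (equal difference, so the smaller minimum height comes first)
}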
You have a map (or set) which requires strict weak ordering, but you don't care about the order.
You can simply use unordered_map (or unordered_set) and not have to worry about it.
Or you can create a strict weak ordering. Fortunately this is already available in the standard library using std::tie from #include <tuple>
bool Key::operator<(const Key &rhs) const {
    return std::tie(hl, hr) < std::tie(rhs.hl, rhs.hr);
}
But of course, if the order actually DOES matter, but has its own meaning and can be arbitrarily changed, then you should use std::vector. Let the algorithm put things where they need to be.

C++: Create integer vector of infinities

I'm working on an algorithm and I need to initialize a vector of ints:
std::vector<int> subs(10);
of fixed length with values:
{-inf, +inf, +inf, ...}
This is where I read that it is possible to use INT_MAX, but it's not quite correct, because the elements of my vector are supposed to be greater than any possible int value.
I liked the comparison-operator-overloading method from this answer, but how do you initialize the vector with infinity-type class objects if the elements are supposed to be ints?
Or maybe you know any better solution?
Thank you.
The solution depends on the assumptions your algorithm (or the implementation of your algorithm) has:
You could increase the element size beyond int (e.g. if your sizeof(int) is 4, use int64_t), and initialize to (int64_t) 1 + std::numeric_limits<int>::max() (and similarly for the negative values). But perhaps your algorithm assumes that you can't "exceed infinity" by adding or multiplying by positive numbers?
You could use an std::variant like other answers suggest, selecting between an int and infinity; but perhaps your algorithm assumes your elements behave like numbers?
You could use a ratio-based "number" class, ensuring it will not get non-integral values except infinity.
You could have your algorithm special-case the maximum and minimum integers.
You could use floats or doubles which support -/+ infinity, and restrict them to integrality.
etc.
So, again, it really just depends and there's no one-size-fits-all solution.
As already said in the comments, you can't have an infinity value stored in an int: all values of this type are well-defined and finite.
If you are OK with a vector of something that works as an infinity for ints, then consider using a type like this:
struct infinite
{ };

bool operator < (int, infinite)
{
    return true;
}
You can use a variant (for example, boost::variant), which supports double dispatch: store either an int or an infinity type (which should record the sign of the infinity, for example in a bool), then implement the comparison operators through a visitor.
But I think it would be simpler if you simply used a double instead of int, and whenever you take out a value that is not infinity, convert it to int. If performance is not that great of an issue, then it will work fine (probably still faster than a variant). If you need great performance, then just use INT_MAX and be done with it.
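A minimal sketch of that double-based suggestion (the variable names are mine):
#include <iostream>
#include <limits>
#include <vector>

int main() {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> subs(10, inf);  // { +inf, +inf, ... }
    subs[0] = -inf;                     // { -inf, +inf, +inf, ... }
    subs[3] = 42;                       // doubles hold ints exactly up to 2^53
    std::cout << (subs[3] < subs[1]) << '\n';       // 1: 42 < +inf
    std::cout << static_cast<int>(subs[3]) << '\n'; // 42, converted back out
}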
You are already aware of the idea of an "infinite" type, but that implementation could only contain infinite values. There's another related idea:
#include <cassert>

struct extended_int {
    enum Type { NEGINF, FINITE, POSINF } type;
    int finiteValue; // Only meaningful when type==FINITE

    bool operator<(extended_int rhs) const {
        if (this->type==POSINF || rhs.type==NEGINF) return false; // nothing is above +inf or below -inf
        if (this->type==NEGINF || rhs.type==POSINF) return true;  // -inf is below, and +inf above, everything else
        assert(this->type==FINITE && rhs.type==FINITE);
        return this->finiteValue < rhs.finiteValue;
    }

    // Implicitly converting ctor
    constexpr extended_int(int value) : type(FINITE), finiteValue(value) { }

    // And the two infinities (the type is incomplete at this point,
    // so they are declared here and defined out of class below)
    static const extended_int posinf;
    static const extended_int neginf;

private:
    constexpr extended_int(Type t) : type(t), finiteValue(0) { }
};

const extended_int extended_int::posinf(extended_int::POSINF);
const extended_int extended_int::neginf(extended_int::NEGINF);
You now have extended_int(5) < extended_int(6), but also extended_int(5) < extended_int::posinf.
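And, tying it back to the question, a hypothetical usage sketch:
#include <vector>

int main() {
    std::vector<extended_int> subs(10, extended_int::posinf);
    subs[0] = extended_int::neginf;  // { -inf, +inf, +inf, ... }
    subs[3] = 42;                    // implicit conversion from int
    bool b = subs[3] < subs[1];      // true: 42 < +inf
    (void)b;
}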

Tolerant key lookup in std::map

Requirements:
container which sorts itself based on numerically comparing the keys (e.g. std::map)
check existence of a key based on float tolerance (e.g. map.find() with a custom comparator)
and the tricky one: the float tolerance used by the comparator may be changed by the user at runtime!
The first 2 can be accomplished using a map with a custom comparator:
struct floatCompare : public std::binary_function<float, float, bool>
{
    bool operator()( const float &left, const float &right ) const
    {
        return (fabs(left - right) > 1e-3) && (left < right);
    }
};

typedef std::map< float, float, floatCompare > floatMap;
Using this implementation, floatMap.find( 15.0001 ) will find 15.0 in the map.
However, let's say the user doesn't want a float tolerance of 1e-3.
What is the easiest way to make this comparator function use a variable tolerance at runtime? I don't mind re-creating and re-sorting the map based on the new comparator each time epsilon is updated.
Other posts on modification after initialization here and using floats as keys here didn't provide a complete solution.
You can't change the ordering of the map after it's created (and you should just use plain old operator< even for the floating point type here), and you can't even use a "tolerant" comparison operator, as that may violate the required strict weak ordering for map to maintain its state.
However you can do the tolerant search with lower_bound and upper_bound. The gist is that you would create a wrapper function much like equal_range that does a lower_bound for "value - tolerance" and then an upper_bound for "value + tolerance" and see if it creates a non-empty range of values that match the criteria.
You cannot change the definition of how elements are ordered in a map once it's been instantiated. If you were to find some technical hack to do so (such as implementing a custom comparator that takes a tolerance that can change at runtime), it would evoke Undefined Behavior.
Your main alternative to changing the ordering is to create another map with a different ordering scheme. This other map could be an indexing map, where the keys are ordered in a different way, and the values aren't the elements themselves, but an index into the main map.
Alternatively maybe what you're really trying to do isn't change the ordering, but maintain the ordering and change the search parameters.
That you can do, and there are a few ways to do it.
One is to simply use map::lower_bound -- once with the lower bound of your tolerance, and once with the upper bound of your tolerance, just past the end of tolerance. For example, if you want to find 15.0 with a tolerance of 1e-5, you could lower_bound with 14.99995 and then again with 15.00005 (my math might be off here) to find the elements in that range.
Another is to use std::find_if with a custom functor, lambda, or std::function. You could declare the functor in such a way as to take the tolerance and the value at construction, and perform the check in operator().
Since this is a homework question, I'll leave the fiddly details of actually implementing all this up to you. :)
Rather than using a comparator with tolerance, which is going to fail in subtle ways, just use a consistent key that is derived from the floating point value. Make your floating point values consistent using rounding.
inline double key(double d)
{
    return floor(d * 1000.0 + 0.5);
}
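A hypothetical usage sketch, reusing key() from above: as long as both inserts and lookups go through key(), values within half the rounding quantum of each other land on the same map key.
#include <cmath>
#include <iostream>
#include <map>

int main() {
    std::map<double, float> m;
    m[key(15.0)] = 1.0f;                         // key(15.0) == 15000
    std::cout << m.count(key(15.0001)) << '\n';  // 1: key(15.0001) == 15000 too
}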
You can't achieve that with a simple custom comparator, even if it were possible to change it after the definition, or when re-sorting using a new comparator. The fact is: a "tolerant comparator" is not really a comparator. For three values, it's possible that a < c (difference is large enough) but neither a < b nor b < c (both differences too small). Example: a = 5.0, b = 5.5, c = 6.0, tolerance = 0.6
What you should do instead is to use default sorting using operator< for floats, i.e. simply don't provide any custom comparator. Then, for the lookup don't use find but rather lower_bound and upper_bound with modified values according to the tolerance. These two function calls will give you two iterators which define the sequence which will be accepted using this tolerance. If this sequence is empty, the key was not found, obviously.
You then might want to get the key which is closest to the value to be searched for. If this is true, you should then find the min_element of this subsequence, using a comparator which will consider the difference between the key and the value to be searched.
template<typename Map, typename K>
auto tolerant_find(const Map & map, const K & lookup, const K & tolerance) -> decltype(map.begin()) {
    // First, find the sub-sequence of keys "near" the lookup value
    auto first = map.lower_bound(lookup - tolerance);
    auto last = map.upper_bound(lookup + tolerance);
    // If they are equal, the sequence is empty, and thus no entry was found.
    // Return the end iterator to be consistent with std::find.
    if (first == last) {
        return map.end();
    }
    // Then, find the one with the minimum distance to the actual lookup value
    typedef typename Map::mapped_type T;
    return std::min_element(first, last, [lookup](std::pair<K,T> a, std::pair<K,T> b) {
        return std::abs(a.first - lookup) < std::abs(b.first - lookup);
    });
}
Demo: http://ideone.com/qT3JIa
It may be better to leave the std::map class alone (well, partly at least), and just write your own class which implements the three methods you mentioned.
template<typename T>
class myMap {
private:
    float tolerance;
    std::map<float, T> storage;
public:
    void setTolerance(float t) { tolerance = t; }
    typename std::map<float, T>::iterator find(float val); // e.g. same as you provided, just change 1e-3 to tolerance
    /* other methods go here */
};
That being said, I don't think you need to recreate the container and sort it depending on the tolerance.
check existence of key based on float tolerance
merely means you have to check if an element exists. The position of the elements inside the map shouldn't change. You could start the search from val - tolerance and, when you find an element (the find function returns an iterator), get the next elements until you reach the end of the map or until their keys exceed val + tolerance.
That basically means that the behavior of the insert/add/[]/whatever functions isn't based on the tolerance, so there's no real problem of storing the values.
If you're afraid the elements will be too close to each other, you may want to start the search from val, and then gradually increase the tolerance until it reaches the user-desired one.
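A sketch of that search strategy (find_within is my own name, and the shape follows the myMap outline above; treat it as illustrative only):
#include <map>

template<typename T>
typename std::map<float, T>::const_iterator
find_within(const std::map<float, T>& storage, float val, float tolerance)
{
    // first element with key >= val - tolerance
    auto it = storage.lower_bound(val - tolerance);
    // accept it only if it also lies within the upper edge of the window
    if (it != storage.end() && it->first <= val + tolerance)
        return it;
    return storage.end();
}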

Filling unordered_set is too slow

We have a given 3D mesh and we are trying to eliminate identical vertices. For this we are using a self-defined struct containing the coordinates of a vertex and the corresponding normal.
struct vertice
{
    float p1, p2, p3, n1, n2, n3;

    bool operator == (const vertice& vert) const
    {
        return (p1 == vert.p1 && p2 == vert.p2 && p3 == vert.p3);
    }
};
After filling the vertex with data, it is added to an unordered_set to remove the duplicates.
struct hashVertice
{
    size_t operator () (const vertice& vert) const
    {
        return (7*vert.p1 + 13*vert.p2 + 11*vert.p3);
    }
};

std::unordered_set<vertice, hashVertice> verticesSet;
vertice vert;
while (i < (scene->mMeshes[0]->mNumVertices)) {
    vert.p1 = (float)scene->mMeshes[0]->mVertices[i].x;
    vert.p2 = (float)scene->mMeshes[0]->mVertices[i].y;
    vert.p3 = (float)scene->mMeshes[0]->mVertices[i].z;
    vert.n1 = (float)scene->mMeshes[0]->mNormals[i].x;
    vert.n2 = (float)scene->mMeshes[0]->mNormals[i].y;
    vert.n3 = (float)scene->mMeshes[0]->mNormals[i].z;
    verticesSet.insert(vert);
    i = i + 1;
}
We discovered that it is too slow for data volumes of around 3,000,000 vertices. Even after 15 minutes of running, the program wasn't finished. Is there a bottleneck we don't see, or is another data structure better suited for such a task?
What happens if you just remove verticesSet.insert(vert); from the loop?
If it speeds up dramatically (as I expect it would), your bottleneck is in the guts of the std::unordered_set, which is a hash table, and the main potential performance problem with hash tables is when there are excessive hash collisions.
In your current implementation, if p1, p2 and p3 are small, the number of distinct hash codes will be small (since you "collapse" float to integer) and there will be lots of collisions.
If the above assumptions turn out to be true, I'd try to implement the hash function differently (e.g. multiply with much larger coefficients).
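For illustration, one possible rework along those lines; the coefficients are arbitrary primes of my choosing, not anything prescribed:
#include <cmath>
#include <cstddef>

struct hashVertice
{
    std::size_t operator()(const vertice& vert) const
    {
        // scale by larger, unrelated coefficients so nearby fractional
        // coordinates still land on distinct integral values
        return static_cast<std::size_t>(
            std::llround(4099.0 * vert.p1 + 9803.0 * vert.p2 + 12613.0 * vert.p3));
    }
};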
Other than that, profile your code, as others have already suggested.
Hashing floating point can be tricky. In particular, your hash routine calculates the hash as a floating point value, then converts it to an unsigned integral type. This has serious problems if the vertices can be small: if all of the vertices are in the range [0...1.0), for example, your hash function will never return anything greater than 13 as an unsigned integer, which means there will be at most 13 different hash codes.
The usual way to hash floating point is to hash the binary image, checking for the special cases first. (0.0 and -0.0 have different binary images, but must hash the same. And it's an open question what you do with NaNs.) For float this is particularly simple, since it usually has the same size as int, and you can reinterpret_cast:
size_t hash( float f )
{
    assert( /* not a NaN */ );
    return f == 0.0 ? 0 : reinterpret_cast<unsigned&>( f );
}
I know, formally, this is undefined behavior. But if float and int have the same size, and unsigned has no trapping representations (the case on most general purpose machines today), then a compiler which gets this wrong is being intentionally obtuse.
You then use any combining algorithm to merge the three results; the one you use is as good as any other in this case (though it's not a good generic algorithm).
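A sketch of such a combiner, reusing the hash(float) above (the multiplier is my own choice; as said, most reasonable combiners will do):
struct hashVertice
{
    size_t operator()(const vertice& vert) const
    {
        size_t h = hash(vert.p1);    // hash each coordinate's binary image
        h = h * 31 + hash(vert.p2);  // then fold the results together
        h = h * 31 + hash(vert.p3);
        return h;
    }
};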
I might add that while some of the comments insist on profiling (and this is generally good advice), if you're taking 15 minutes for 3 million values, the problem can really only be a poor hash function, which results in lots of collisions. Nothing else will cause performance that bad. And unless you're familiar with the internal implementation of std::unordered_set, the usual profiler output will probably not give you much information. On the other hand, std::unordered_set does have functions like bucket_count and bucket_size, which allow analysing the quality of the hash function. In your case, if you cannot create an unordered_set with 3 million entries, your first step should be to create a much smaller one, and use these functions to evaluate the quality of your hash code.
If there is a bottleneck, you are definitely not seeing it, because you don't include any kind of timing measurements.
Measure the timing of your algorithm, either with a profiler or just manually. This will let you find the bottleneck - if there is one.
This is the correct way to proceed. Expecting yourself (or, alternatively, StackOverflow users) to spot bottlenecks by eye inspection instead of actually measuring time in your program is, from my experience, the most common cause of failed attempts at optimization.

Largest Number < x?

In C++, let's say I have a number x of type T which can be an integer or floating point type. I want to find the largest number y of type T for which y < x holds. The solution needs to be templated to work transparently with both integers and floating point numbers. You may ignore the edge case where x is already the smallest number that can be represented in a T.
POSSIBLE USE CASE: This question was marked as too localized, hence I would like to provide a use case which I think is more general. Note that I'm not the original author of the OP.
Consider this structure:
struct lower_bound {
    lower_bound(double value, bool open) : value(open ? value + 0.1 : value) {}
    double value;
    bool operator()(double x) { return x >= value; }
};
This class simulates a lower bound which can either be open or closed. Of course, in real (pun intended) life we cannot do this. The following is impossible (or at least quite tricky) to calculate for S being the set of all real numbers.
However, when S is the set of floating point numbers, this is a very valid principle, since we are dealing with essentially a countable set; and then there is no such thing as an open or closed bound. That is, >= can be defined in terms of > like done in the lower_bound class.
For code simplicity I used +0.1 to simulate an open lower bound. Of course, 0.1 is a crude value, as there may be values z such that value < z <= value+0.1 or value+0.1 == value in a floating point representation. Hence Brett Hale's answer is very useful :)
You may think about another simpler solution:
struct lower_bound {
    lower_bound(double value, bool open) : open(open), value(value) {}
    bool open;
    double value;
    bool operator()(double x) { return (open ? x > value : x >= value); }
};
However, this is less efficient, as sizeof(lower_bound) is larger and operator() needs to execute a more complicated statement. The first implementation is really efficient, and can also be implemented simply as a double instead of a structure. Technically, the only reason to use the second implementation is that you assume a double is continuous, whereas it is not, and I guess it will not be anywhere in the foreseeable future.
I hope I have created and explained a valid use case, and that I have not offended the original author.
If you have C++11, you could use std::nextafter in <cmath>:
if (std::is_integral<T>::value)
    return (x - 1);
else
    return std::nextafter(x, -std::numeric_limits<T>::infinity());
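For completeness, a self-contained sketch of this answer's idea (largest_below is my name for it). With C++17's if constexpr, the floating point branch isn't instantiated for integral T, avoiding the implicit double-to-T conversion the plain if would rely on:
#include <cmath>
#include <limits>
#include <type_traits>

template<typename T>
T largest_below(T x)
{
    if constexpr (std::is_integral<T>::value)
        return x - 1;
    else
        return std::nextafter(x, -std::numeric_limits<T>::infinity());
}

// e.g. largest_below(3) == 2; largest_below(1.0) is the double just below 1.0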