How does STL map know, that map contains a given element? - c++

In question Using char as key in stdmap there is advised to use custom compare function/functor:
struct cmp_str
{
bool operator()(char const *a, char const *b)
{
return std::strcmp(a, b) < 0;
}
};
map<char *, int, cmp_str> BlahBlah;
This allows map to detect if key A is less than key B. But for example map<>::find() returns end if element is not found, and iterator to it if it is found. So map knows about equivalence, not only less-than. How?

The equality condition for two keys a and b are that a<b and b<a are both false. The map itself is commonly implemented as a balanced binary tree*, so the less-than comparison is used to traverse the map from the root node until the matching element is found. When searching for a key k, less-than comparison is used until the first element for which the comparison is false is found. If the inverse comparison is also false, k has been found. Otherwise, k is not in the map. The map only uses the less-than comparison to this purpose.
Note also that std::set uses exactly the same mechanism, the only difference being that each element is it's own key.
* strictly speaking, the C++ standard does not specify that std::map be a balanced binary tree, but the complexity constraints it places on operations such as insertion and look-up mean that implementations chose structures such as red-black tree.

Equivalence/operator== can be expressed as a function of operator<:
bool operator==(T left, T right) {
return !(left < right) && !(right < left);
}

This is because the comparator of the map must implement a strict weak ordering, such as <.
One of the mathematical properties of such a relation is Antisymmetry, which states that for any x and y, then not (x < y) and not (y < x) implies x == y.
Therefore, after finding the first element not to compare smaller than the key you are searching for, the implementation simply checks if that element compares greater, and it's neither smaller nor greater, then it must be equal.

Related

Comparator for matching point in a range

I need to create a std::set of ranges for finding matching points in these ranges. Each range is defined as follows:
struct Range {
uint32_t start;
uint32_t end;
uint32_t pr;
};
In this structure start/end pair identify each range. pr identifies the priority of that range. It means if a single point falls into 2 different ranges, I like to return range with smaller pr. I like to create a std::set with a transparent comparator to match points like this:
struct RangeComparator {
bool operator()(const Range& l, const Range& r) const {
if (l.end < r.start)
return true;
if (l.end < r.end && l.pr >= r.pr)
return true;
return false;
}
bool operator()(const Range& l, uint32_t p) const {
if (p < l.start)
return true;
return false;
}
bool operator()(uint32_t p, const Range& r) const {
if (p < r.start)
return true;
return false;
}
using is_transparent = int;
};
std::set<Range, RangeComparator> ranges;
ranges.emplace(100,250,1);
ranges.emplace(200,350,2);
auto v1 = ranges.find(110); // <-- return range 1
auto v2 = ranges.find(210); // <-- return range 1 because pr range 1 is less
auto v3 = ranges.find(260); // <-- return range 2
I know my comparators are wrong. I wonder how I can write these 3 comparators to answer these queries correctly? Is it possible at all?
find returns an element that compares equivalent to the argument. Equivalent means that it compares neither larger nor smaller in the strict weak ordering provided to the std::set.
Therefore, to make your use case work, you want all points in a range to compare equivalent to the range.
If two ranges overlap, then the points shared by the two ranges need to compare equivalent to both ranges. The priority doesn't matter for this, since the equivalence should presumably hold if only one of the ranges is present.
However, one of the defining properties of a strict weak ordering is that the property of comparing equivalent is transitive. Therefore in this ordering the two ranges must then also compare equal in order to satisfy the requirements of std::set.
Therefore, as long as the possible ranges are not completely separated, the only valid strict weak ordering is the one that compares all ranges and points equivalent.
This is however not an order that would give you what you want.
This analysis holds for all standard library associative containers, since they have the same requirements on the ordering.

Key already exists in unordered_map, but "find" returns as not found

I constructed an unordered_map using key type rot3d, which is defined below:
#ifndef EPS6
#define EPS6 1.0e-6
#endif
struct rot3d
{
double agl[3]; // alpha, beta, gamma in ascending order
bool operator==(const rot3d &other) const
{
// printf("== used\n");
return abs(agl[0]-other.agl[0]) <= EPS6 && abs(agl[1]-other.agl[1]) <= EPS6 && abs(agl[2]-other.agl[2]) <= EPS6;
}
};
Equality of rot3d is defined by the condition that each component is within a small range of the same component from the other rot3d object.
Then I defined a value type RotMat:
struct RotMat // rotation matrix described by a pointer to matrix and trunction number
{
cuDoubleComplex *mat = NULL;
int p = 0;
};
In the end, I defined a hash table from rot3d to RotMat using self-defined hash function:
struct rot3dHasher
{
std::size_t operator()(const rot3d& key) const
{
using std::hash;
return (hash<double>()(key.agl[0]) ^ (hash<double>()(key.agl[1]) << 1) >> 1) ^ (hash<double>()(key.agl[2]) << 1);
}
};
typedef std::unordered_map<rot3d,RotMat,rot3dHasher> HashRot2Mat;
The problem I met was, a key was printed to be in the hash table, but the function "find" didn't find it. For instance, I printed a key using an iterator of the hash table:
Key: (3.1415926535897931,2.8198420991931510,0.0000000000000000)
But then I also got this information indicating that the key was not found:
(3.1415926535897931,2.8198420991931505,0.0000000000000000) not found in the hash table.
Although the two keys are not 100% the same, the definition of "==" should ensure them to be equal. So why am I seeing this key in the hash table, but it was not found by "find"?
Hash-based equivalence comparisons are allowed to have false positives, which are resolved by calling operator==.
Hash-based equivalence comparisons are not allowed to have false negatives, but yours does. Your two "not 100% the same" keys have different hash values, so the element is not even found as a candidate for testing using operator==.
It is necessary that (a == b) implies (hash(a) == hash(b)) and your definitions break this precondition. A hashtable with a broken precondition can misbehave in many ways, including not finding the item you are looking for.
Use a different data structure that is not dependent on hashing, but nearest-neighbor matching. An octtree would be a smart choice.
Equality of rot3d is defined by the condition that each component is within a small range of the same component from the other rot3d object.
This is not an equivalence. You must have that a==b and b==c implies a==c. Yours fails this requirement.
Using a non-equality in a std algorithm or container breaks the std preconditions, which means your program is ill-formed, no diagnostic required.
Also your hash hashes equivalent values differently. Also illegal.
One way to fix this is to build buckets. Each bucket has a size of your epsilon.
To find if a value is in your buckets, check the bucket you'd put the probe value in, plus all adjacent buckets (3^3 or 27 of them).
For each element, double check distance.
struct bucket; // array of 3 doubles, each a multiple of EPS6. Has == and hash. Also construct-from-rod3d that rounds.
bucket get_bucket(rot3d);
Now, odds are that you are just caching. And within EPS-ish is good enough.
template<class T, class B>
struct adapt:T{
template<class...Args>
auto operator()(Args&&...args)const{
return T::operator()( static_cast<B>(std::forward<Args>(args))... );
}
using is_transparent=void;
};
std::unordered_map<bucket, RotMat, adapt<std::hash<rot3d>, bucket>, adapt<std::equal_to<>, bucket>> map;
here we convert rod3ds to buckets on the fly.

How does std::set comparator function work?

Currently working on an algorithm problems using set.
set<string> mySet;
mySet.insert("(())()");
mySet.insert("()()()");
//print mySet:
(())()
()()()
Ok great, as expected.
However if I put a comp function that sorts the set by its length, I only get 1 result back.
struct size_comp
{
bool operator()(const string& a, const string& b) const{
return a.size()>b.size();
}
};
set<string, size_comp> mySet;
mySet.insert("(())()");
mySet.insert("()()()");
//print myset
(())()
Can someone explain to me why?
I tried using a multi set, but its appending duplicates.
multiset<string,size_comp> mSet;
mSet.insert("(())()");
mSet.insert("()()()");
mSet.insert("()()()");
//print mset
"(())()","()()()","()()()"
std::set stores unique values only. Two values a,b are considered equivalent if and only if
!comp(a,b) && !comp(b,a)
or in everyday language, if a is not smaller than b and b is not smaller than a. In particular, only this criterion is used to check for equality, the normal operator== is not considered at all.
So with your comparator, the set can only contain one string of length n for every n.
If you want to allow multiple values that are equivalent under your comparison, use std::multiset. This will of course also allow exact duplicates, again, under your comparator, "asdf" is just as equivalent to "aaaa" as it is to "asdf".
If that does not make sense for your problem, you need to come up with either a different comparator that induces a proper notion of equality or use another data structure.
A quick fix to get the behavior you probably want (correct me if I'm wrong) would be introducing a secondary comparison criterion like the normal operator>. That way, we sort by length first, but are still able to distinguish between different strings of the same length.
struct size_comp
{
bool operator()(const string& a, const string& b) const{
if (a.size() != b.size())
return a.size() > b.size();
return a > b;
}
};
The comparator template argument, which defaults to std::less<T>, must represent a strict weak ordering relation between values in its domain.
This kind of relation has some requirements:
it's not reflexive (x < x yields false)
it's asymmetric (x < y implies that y < x is false)
it's transitive (x < y && y < z implies x < z)
Taking this further we can define equivalence between values in term of this relation, because if !(x < y) && !(y < x) then it must hold that x == y.
In your situation you have that ∀ x, y such that x.size() == y.size(), then both comp(x,y) == false && comp(y,x) == false, so since no x or y is lesser than the other, then they must be equal.
This equivalence is used to determine if two items correspond to the same, thus ignoring second insertion in your example.
To fix this you must make sure that your comparator never returns false for both comp(x,y) and comp(y,x) if you don't want to consider x equal to y, for example by doing
auto cmp = [](const string& a, const string& b) {
if (a.size() != b.size())
return a.size() > b.size();
else
return std::less()(a, b);
}
So that for input of same length you fallback to normal lexicographic order.
This is because equality of elements is defined by the comparator. An element is considered equal to another if and only if !comp(a, b) && !comp(b, a).
Since the length of "(())()" is not greater, nor lesser than the length of "()()()", they are considered equal by your comparator. There can be only unique elements in a std::set, and an equivalent object will overwrite the existing one.
The default comparator uses operator<, which in the case of strings, performs lexicographical ordering.
I tried using a multi set, but its appending duplicates.
Multiset indeed does allow duplicates. Therefore both strings will be contained despite having the same length.
size_comp considers only the length of the strings. The default comparison operator uses lexicographic comparison, which distinguishes based on the content of the string as well as the length.

Binary search on vector objects of an element greater than an attribute

I have a vector which contains lot of elements of my class X .
I need to find the first occurrence of an element in this vector say S such that S.attrribute1 > someVariable. someVariable will not be fixed . How can I do binary_search for this ? (NOT c++11/c++14) . I can write std::binary_search with search function of greater (which ideally means check of equality) but that would be wrong ? Whats the right strategy for fast searching ?
A binary search can only be done if the vector is in sorted order according to the binary search's predicate, by definition.
So, unless all elements in your vector for which "S.attribute1 > someVariable" are located after all elements that are not, this is going to be a non-starter, right out of the gate.
If all elements in your vector are sorted in some other way, that "some other way" is the only binary search that can be implemented.
Assuming that they are, you must be using a comparator, of some sort, that specifies strict weak ordering on the attribute, in order to come up with your sorted vector in the first place:
class comparator {
public:
bool operator()(const your_class &a, const your_class &b) const
{
return a.attribute1 < b.attribute1;
}
};
The trick is that if you want to search using the attribute value alone, you need to use a comparator that can be used with std::binary_search which is defined as follows:
template< class ForwardIt, class T, class Compare >
bool binary_search( ForwardIt first, ForwardIt last,
const T& value, Compare comp );
For std::binary_search to succeed, the range [first, last) must be
at least partially ordered, i.e. it must satisfy all of the following
requirements:
for all elements, if element < value or comp(element, value) is true
then !(value < element) or !comp(value, element) is also true
So, the only requirement is that comp(value, element) and comp(element, value) needs to work. You can pass the attribute value for T, rather than the entire element in the vector to search for, as long as your comparator can deal with it:
class search_comparator {
public:
bool operator()(const your_class &a, const attribute_type &b) const
{
return a.attribute1 < b;
}
bool operator()(const attribute_type &a, const your_class &b) const
{
return a < b.attribute1;
}
};
Now, you should be able to use search_comparator instead of comparator, and do a binary search by the attribute value.
And, all bets are off, as I said, if the vector is not sorted by the given attribute. In that case, you'll need to use std::sort it explicitly, first, or come up with some custom container that keeps track of the vector elements, in the right order, separately and in addition to the main vector that holds them. Using pointers, perhaps, in which case you should be able to execute a binary search on the pointers themselves, using a similar search comparator, that looks at the pointers, instead.
For std::binary_search to succeed, the range need to be sorted.std::binary_search, std::lower_bound works on sorted containers. So every time you add a new element into your vector you need to keep it sorted.
For this purpose you can use std::lower_bound in your insertion:
class X;
class XCompare
{
public:
bool operator()(const X& first, const X& second) const
{
// your sorting logic
}
};
X value(...);
auto where = std::lower_bound(std::begin(vector), std::end(vector), value, XCompare());
vector.insert(where, value);
And again you can use std::lower_bound to search in your vector:
auto where = std::lower_bound(std::begin(vector), std::end(vector), searching_value, XCompare());
Don't forget to check if std::lower_bound was successful:
bool successed = where != std::end(vector) && !(XCompare()(value, *where));
Or directly use std::binary_search if you only want to know that element is in vector.

Why does std::binary_search return bool?

According to draft N4431, the function std::binary_search in the algorithms library returns a bool, [binary.search]:
template<class ForwardIterator, class T>
bool binary_search(ForwardIterator first, ForwardIterator last,
const T& value);
template<class ForwardIterator, class T, class Compare>
bool binary_search(ForwardIterator first, ForwardIterator last,
const T& value, Compare comp);
Requires: The elements e of [first,last) are partitioned with respect to the expressions e < value and !(value < e) or comp(e, value) and !comp(value, e). Also, for all elements e of [first,last), e < value implies !(value < e) or comp(e, value) implies !comp(value, e).
Returns: true if there is an iterator i in the range [first,last) that satisfies the corresponding conditions:
!(*i < value) && !(value < *i) or comp(*i, value) == false && comp(value, *i) ==
false.
Complexity: At most log2(last - first) + O(1) comparisons.
Does anyone know why this is the case?
Most other generic algorithms either return an iterator to the element or an iterator that is equivalent to the iterator denoting the end of the sequence of elements (i.e., one after the last element to be considered in the sequence), which is what I would have expected.
The name of this function in 1994 version of STL was isMember. I think you'd agree that a function with that name should return bool
http://www.stepanovpapers.com/Stepanov-The_Standard_Template_Library-1994.pdf
It's split into multiple different functions in C++, as for the reasoning it's nearly impossible to tell why someone made something one way or another. binary_search will tell you if such an element exists. If you need to know the location of them use lower_bound and upper_bound which will give the begin/end iterator respectively. There's also equal_range that gives you both the begin and end at once.
Since others seem to think that it's obvious why it was created that way I'll argue my points why it's hard/impossible to answer if you aren't Alexander Stepanov or someone who worked with him.
Sadly the SGI STL FAQ doesn't mention binary_search at all. It explains reasoning for list<>::size being linear time or pop returning void. It doesn't seem like they deemed binary_search special enough to document it.
Let's look at the possible performance improvement mentioned by #user2899162:
You can find the original implementation of the SGI STL algorithm binary_search here. Looking at it one can pretty much simplify it (we all know how awful the internal names in the standard library are) to:
template <class ForwardIter, class V>
bool binary_search(ForwardIter first, ForwardIter last, const V& value) {
ForwardIter it = lower_bound(first, last, value);
return it != last && !(value < *it);
}
As you can see it was implemented in terms of lower_bound and got the same exact performance. If they really wanted it to take advantage of possible performance improvements they wouldn't have implemented it in terms of the slower one, so it doesn't seem like that was the reason they did it that way.
Now let's look at it simply being a convenience function
It being simply a convenience function seems more likely, but looking through the STL you'll find numerous other algorithms where this could have been possible. Looking at the above implementation you'll see that it's only trivially more to do than a std::find(begin, end, value) != end; yet we have to write that all the time and don't have a convenience function that returns a bool. Why exactly here and not all the other algorithms too? It's not really obvious and can't simply be explained.
In conclusion I find it far from obvious and don't really know if I could confidently and honestly answer it.
The binary search algorithm relies on strict weak ordering. Meaning that the elements are supposed to be partitioned according to the operator < or according to a custom comparator that has the same guarantees. This means that there isn't necessarily only one element that could be found for a given query. Thus you need the lower_bound, upper_bound and equal_range functions to retrieve iterators.
The standard library contains variants of binary search algorithm that return iterators. They are called std::lower_bound and std::upper_bound. I think the rationale behind std::binary_search returning bool is that it wouldn't be clear what iterator to return in case of equivalent elements, while in case of std::lower_bound and std::upper_bound it is clear.
There might have been performance considerations as well, because in theory std::binary_search could be implemented to perform better in case of multiple equivalent elements and certain types. However, at least one popular implementation of the standard library (libstdc++) implements std::binary_search using std::lower_bound and, moreover, they have the same theoretical complexity.
If you want to get an iterator on a value, you can use std::equal_range which will return 2 iterators, one on the lower bound and one on the higher bound of the range of values that are equal to the one you're looking for.
Since the only requirement is that values are sorted and not unique, there's is no simple "find" that would return an iterator on the one element you're looking for. If there's only one element equal to the value you're looking for, there will only be a difference of 1 between the two iterators.
Here's a C++20 binary-seach alternative that returns an iterator:
template<typename RandomIt, typename T, typename Pred>
inline
RandomIt xbinary_search( RandomIt begin, RandomIt end, T const &key, Pred pred )
requires std::random_access_iterator<RandomIt>
&&
requires( Pred pred, typename std::iterator_traits<RandomIt>::value_type &elem, T const &key )
{
{ pred( elem, key ) } -> std::convertible_to<std::strong_ordering>;
}
{
using namespace std;
size_t lower = 0, upper = end - begin, mid;
strong_ordering so;
while( lower != upper )
{
mid = (lower + upper) / 2;
so = pred( begin[mid], key );
if( so == 0 )
{
assert(mid == 0 || pred( begin[mid - 1], key ) < 0);
assert(begin + mid + 1 == end || pred( begin[mid + 1], key ) > 0);
return begin + mid;
}
if( so > 0 )
upper = mid;
else
lower = mid + 1;
}
return end;
}
This code only works correctly if there's only one value between begin and end that matches the key. But if you debug and NDEBUG is not defined, the code stops in your debugger.