I understand that the underlying data structure for map in C++ is a self-balancing binary search tree. Since finding a lower bound and an upper bound for a key is very useful in these data structures, you'd think that map's lower_bound and upper_bound functions would give you that capability. It's a bummer that these functions don't deliver that.
Does anyone know why lower_bound behaves the way it does? (It gives you the first element that is NOT BEFORE the given key.)
I've been using C++ since before SGI even introduced the STL, and for some reason still manage to mess up using these methods, including even embarrassing myself when presenting them to a class. I think my problems are that:
The names already have an intuitive but different meaning in mathematics. Given the mathematical meanings, it seems weird that in a big set or map, upper_bound and lower_bound are actually the same or adjacent elements.
The names "upper_bound" and "lower_bound" sound like there is a some kind of symmetry between the two, when there is absolutely not. I'd have a much easier time if the names were something like least_ge (least greater than or equal to) for lower_bound and least_gt (least greater than) for upper_bound.
If someone has a mnemonic or logic to make these easy to internalize, please share it. Otherwise, it just feels like they wrote two useful functions but used two random mathematical terms to name those functions, with no way to derive the semantics from the names. At that point, why not use up made up names like egptr and pbase? I mean at least I don't have any pre-existing intuitions to overcome about the names of streambuf methods...
At any rate here are what I believe are the basic rules you have to remember:
lower_bound(X) returns the lowest element v such that v >= X
upper_bound(X) returns the lowest element v such that v > X
To traverse the half-open interval [L,H), start with lower_bound(L) and stop at (don't process) lower_bound(H). This is usually what you want, because it's most common to traverse half-open intervals in C++, e.g., [buf, buf+nbytes), [0, array_size), or [begin(), end()).
To traverse the closed interval [L,H], start at lower_bound(L) and stop at upper_bound(H).
To traverse the open interval (L,H), start at upper_bound(L) and stop at lower_bound(H).
In a non-empty container, the mirror image of lower_bound(X) is std::prev(upper_bound(X)) and the mirror image of upper_bound(X) is std::prev(lower_bound(X)). Of course, if the result is equal to begin(), then you can't step it backwards with std::prev, so you need extra logic to deal with the fact that this point cannot be represented with an iterator value.
In a multiset/multimap, the first v is lower_bound(v) if that element is indeed v. The last v is std::prev(upper_bound(v)) if the container is not empty and that element is v, but remember to check that the container is not empty before attempting prev on end().
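To make these rules concrete, here is a small sketch on a std::multiset (the values are just illustrative):

#include <iostream>
#include <iterator>
#include <set>

int main() {
    std::multiset<int> s = {1, 3, 3, 3, 7, 9};

    std::cout << *s.lower_bound(3) << '\n';   // 3  (lowest element >= 3)
    std::cout << *s.upper_bound(3) << '\n';   // 7  (lowest element >  3)

    // closed interval [3, 7]: start at lower_bound(3), stop at upper_bound(7)
    for (auto it = s.lower_bound(3); it != s.upper_bound(7); ++it)
        std::cout << *it << ' ';              // 3 3 3 7
    std::cout << '\n';

    // last occurrence of 3: std::prev(upper_bound(3)), container known non-empty
    std::cout << *std::prev(s.upper_bound(3)) << '\n';   // 3
}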
From the point of view of the usual math convention, upper_bound is the "least strict upper bound" (never equal) and lower_bound is the "least upper bound" (could be equal). The fact that lower_bound is actually an "upper bound" in the usual math sense may cause confusion among users.
A way to rationalize the names lower_bound/upper_bound is to consider them in the context of another member function, equal_range. lower_bound is really the lower bound of the equal_range, and upper_bound is its upper bound.
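For instance, on a std::multiset<int> s, the equivalence looks like this (a tiny sketch):

std::multiset<int> s = {1, 3, 3, 3, 7, 9};
auto range = s.equal_range(3);
// range.first  == s.lower_bound(3)  -- the lower bound of the equal range
// range.second == s.upper_bound(3)  -- the upper bound of the equal range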
This is not only in map; it is throughout the STL.
For your x, lower_bound finds the first y such that x <= y, and upper_bound finds the first y such that x < y.
I know how to use std::unordered_map::emplace, but how do I use emplace_hint? Neither cplusplus nor cppreference provide a set of examples that illustrate how we might know where to put the element.
Can anyone provide some information on this or give some examples / illustrations of when we might know where the emplaced element should go?
What could an unordered_map potentially do with the hint? Well, if the iterator addresses an element with the same key as the element that emplace_hint has been asked to insert, then it can fail quickly - just a key comparison without any hashing or groping through any list of hash-colliding elements at that bucket. But if the key doesn't match, then the hint is otherwise useless because any other key - no matter how "close" in value - should (probabilistically) be at a completely unrelated bucket (given what's normally considered a "good" hash function), so time would have been wasted on a key comparison only to have to start over as if it were a normal emplace.
This might be useful when you're inserting pre-sorted-by-key elements, aiming to remove lots of duplicates in the process, but the key is so huge it's easier to keep an iterator to the just-inserted element than a copy of the key, or perhaps the hash function is particularly slow.
Another benefit of unordered_map::emplace_hint is better API compatibility with map::emplace_hint, so code can switch the container type and have the emplace_hint calls still compile, though they might end up slower than if the code were switched to emplace(), since the close-but-different-key hints that help a map may be useless with an unordered_map.
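A minimal sketch of that pre-sorted-duplicates use case (the keys and types here are just placeholders):

#include <string>
#include <unordered_map>

int main() {
    // input pre-sorted by key, with runs of duplicates
    const char* keys[] = {"alpha", "alpha", "beta", "beta", "beta", "gamma"};

    std::unordered_map<std::string, int> m;
    auto hint = m.end();
    int id = 0;
    for (const char* k : keys)
        // a consecutive duplicate can be rejected against the hinted element
        // instead of searching the bucket's collision chain
        hint = m.emplace_hint(hint, k, id++);
}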
Just taking GCC 10.2 g++ -E output to see if it does what's described above. emplace_hint calls down into _M_insert_multi_node(...) wherein there's this line:
__node_base* __prev = __builtin_expect(__hint != nullptr, false)
&& this->_M_equals(__k, __code, __hint)
? __hint
: _M_find_before_node(__bkt, __k, __code);
Above, __k is the key that may be inserted, __code is the hash code, __hint is the hint iterator/pointer; _M_equals(...) returns:
return _Equal_hash_code<__node_type>::_S_equals(__c, *__n) &&
_M_eq()(__k, this->_M_extract()(__n->_M_v()));
So, it's checking that the hash codes are equal (a quick and dirty check if you've already calculated the hashes) and the keys are equal (a potentially slower operation for e.g. a quality hash of long strings) before using the hint iterator. That's the only case in which it uses the hint. Imagine the bucket logically has some colliding elements chained off it with keys K1, K2, K3, K4 and your hint iterator is to K4 but you're trying to insert a duplicate with K2: as the iterators are forward only, you have to use _M_find_before_node(...) to reach colliding elements earlier in the chain than your hint points to. After the _M_find_before_node(...) you can scan from K1 forwards to see if the key to insert - K2 - is already present in the elements that collided at the bucket.
(The implementation could be improved by skipping the hash comparison when the key comparison is known to be cheap, but getting that condition right with type traits would be a bit of a pain - how do you know which key equality functions are cheap? You could assume so for small, standard-layout, trivially copyable types or similar, at least when the unordered container is instantiated with the default std::equal_to<> comparison.)
I'm trying to write an algorithm to remove duplicates from a vector<struct xxxx*>.
struct xxxx {
    int value; // This is just to make you understand
    xxxx* one;
    xxxx* two;
};
As you can see from my struct, it's like a tree, but the pointers are not in order. The pointers can point to any (actually not any, but most) of the others. And the vector doesn't contain the structs but pointers to them, so I couldn't use the std algorithms to help me either.
I'm trying to delete duplicates that have exactly the same value and the same two pointers, but at the same time, if I have two identical structs (let's say A and B) and C.one or C.two points to B, then I need to change it to point to A, and vice versa.
In other words: if A == B, then remove B and change C.one to point to A.
I think I can write the brute-force, so if there's no better algorithm I'll write it by myself.
Yesterday, I tried to explain the reasonable approach to a very similar problem to a coworker who had used an N squared solution to an N log N problem.
First create a helper struct, that is basically a wrapper around an xxxx* with a comparison operator checking the contents (not the pointer value) and probably with some other utility functions. This wrapper struct isn't strictly needed vs. just using xxxx*, but from experience, I think it makes the task cleaner.
Create a std::set of those helper structs, into which you will only insert unique elements, and likely another set into which you will insert recursively unresolved elements.
Loop through the original vector and at each position recurse through its children. If you hit a child already in the unique set, that is a final value for that child pointer. If you hit a child that matches a unique element without being the one it matches, then fix the pointer that got you there. If there is also the possibility of null pointers, those should bottom the recursion, and if loops are possible you need to detect them (with that recursively-unresolved set) and decide what to do with a loop. At some point you hit resolved unique elements and add them to the unique set.
The performance, and maybe even the soundness, of the idea depends on the depth and complexity of the loops and what you want to do with loops. There are some messy cases where a loop would map onto another loop, but detecting that could be very tricky. If your phrase "like a tree" meant "no loops", then the recursion bottoms cleanly and efficiently without the extra complexity of explicitly managing the recursively unresolved elements.
Obviously I left out some of the grunt work detail around detecting unique / non-unique as you back out of the recursion and around detecting "already did it during an earlier recursion" as you hit an item in the main loop above the recursion. But all those details should be pretty obvious as you write the relevant parts of the code.
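A rough sketch of the tree-like ("no loops") case, to make the recursion concrete. For brevity it keys a std::map by the (value, one, two) triple instead of wrapping xxxx* in a helper struct, and it memoizes already-resolved nodes; all names are illustrative:

#include <map>
#include <tuple>
#include <vector>

struct xxxx { int value; xxxx* one; xxxx* two; };

using Key = std::tuple<int, xxxx*, xxxx*>;

// Return the canonical node for n, fixing its child pointers along the way.
xxxx* canonicalize(xxxx* n,
                   std::map<Key, xxxx*>& unique,      // canonical node per (value, one, two)
                   std::map<xxxx*, xxxx*>& resolved)  // memo: node -> its canonical node
{
    if (n == nullptr) return nullptr;                 // null pointers bottom the recursion
    auto done = resolved.find(n);
    if (done != resolved.end()) return done->second;  // already handled via another parent

    n->one = canonicalize(n->one, unique, resolved);  // fix children first
    n->two = canonicalize(n->two, unique, resolved);

    // the first node seen with this (value, one, two) becomes the canonical one
    auto pos = unique.emplace(std::make_tuple(n->value, n->one, n->two), n).first;
    resolved[n] = pos->second;                        // duplicates could be collected/deleted here
    return pos->second;
}

void dedup(std::vector<xxxx*>& v)
{
    std::map<Key, xxxx*> unique;
    std::map<xxxx*, xxxx*> resolved;
    for (auto& p : v) p = canonicalize(p, unique, resolved);
}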
Edit: To understand how few node visits there are despite nesting a recursion inside a sequential loop, think from the point of view of the pointers. We follow each pointer at most once (some duplicates are pre-detected without following their pointers). For N nodes, there are N top-level pointers (if I understood your description correctly) and significantly fewer than 2N internal pointers (the more tree-like it is, the closer it will be to N-1 internal pointers rather than 2N). So each node is visited on average fewer than 3 times, only a minority of those visits require both the pre-recursion lookup and the post-recursion lookup, and each lookup is log U where U is the number of unique items found up to that point. So we can trivially see a bound of 6 N log N.
I have a rather large set of objects that represent numbers and I want to select such numbers according to a custom ordering. This ordering includes several criteria, such as the type of their representation (some numbers are represented by an interval), their integrality, and ultimately their value. These numbers are shared throughout the program (shared pointers) and there is nothing I can do about this.
However, the elements' properties can change at any time, so the order changes while I can't notify the container about it. For example, some operations require a refinement of a number that is represented by an interval, and during this refinement the exact value can be found. Thereby the number changes from the interval representation to a rational number, possibly even an integer. This change, due to the shared instance, immediately propagates to the number in the container and breaks the ordering (and I don't even know which number changed). This totally breaks std::set.
So what I'd like to have is a container that tries to be sorted but does not rely on it. Whenever an operation detects an incorrect ordering, this ordering should be corrected locally. For example, insert would insert the element (using binary search) and always check whether the ordering of the current element (w.r.t. its neighbors) is correct.
I'd be willing to accept that "give me the smallest element" would then be only "give me a small element" and that find or remove would have linear complexity: I only need front, insert and remove_front to be particularly efficient.
Is there any implementation that does something like this?
How would you implement this?
If you are looking for an algorithm in the standard library, you should take a look at:
std::make_heap
std::pop_heap
std::push_heap
In <algorithm>. They might fit your need, and even if they don't I'm quite sure you will find what you are looking for in some kind of heap structure. Which one will probably depend on how your code is structured, and how often you expect a value to change etc.
In short:
A heap is a data structure in which it is fast to find and extract the smallest (or largest) element. For most heaps it is also possible to create or restructure the heap in linear time or better. You could start out from this page on Wikipedia if you want to learn more about heaps.
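A minimal sketch of how these algorithms might be wired up here (Num and ByValue are illustrative stand-ins for the asker's shared-pointer type and ordering; the greater-than comparator puts the smallest element at the front):

#include <algorithm>
#include <iostream>
#include <vector>

struct Num { double value; };

struct ByValue {                       // "greater" comparator, so front() is the smallest
    bool operator()(const Num& a, const Num& b) const { return a.value > b.value; }
};

int main() {
    std::vector<Num> heap = {{3.5}, {1.2}, {2.8}};
    std::make_heap(heap.begin(), heap.end(), ByValue{});   // O(n)

    heap.push_back({0.7});                                  // insert
    std::push_heap(heap.begin(), heap.end(), ByValue{});    // O(log n)

    std::cout << heap.front().value << '\n';                // 0.7, "give me a small element"

    std::pop_heap(heap.begin(), heap.end(), ByValue{});     // remove_front, O(log n)
    heap.pop_back();
}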
Suppose I have a float-integer map m:
m[1.23] = 3
m[1.25] = 34
m[2.65] = 54
m[3.12] = 51
Imagine that I know that there's a mapping between 2.65 and 54, but I don't know about any other mappings.
Is there any way to visit the adjacent mappings without iterating from the beginning or searching using the find function?
In other words: can I directly access the adjacent values by just knowing about a single mapping...such as m[2.65]=54?
UPDATE Perhaps a more important "point" than my answer, brought up by @MattMcNabb:
Floating point keys in std::map
Can I directly access the adjacent values by just knowing about a single mapping (m[2.65]=54)
Yes. std::map is an ordered collection, which is to say that if an operator< (more generally, std::less) exists for the key type, you can expect it to have sorted access. In fact, you won't be able to make a map for a key type that doesn't have this comparison available (unless you pass in a comparison predicate as a template argument).
Note there is also a std::unordered_map, which is often preferable for cases where you don't need this property of being able to navigate quickly between "adjacent" map entries. However, you will need std::hash defined for the key type in that case. You can still iterate it, but adjacency of items in the iteration won't have anything to do with the sort order of the keys.
UPDATE also due to @MattMcNabb
Is there any way to visit the adjacent mappings without iterating from the beginning or searching using the find function?
You allude to array notation, and the general answer here would be "not really". Which is to say there is no way of saying:
if (not m[2.65][-2]) {
    std::cout << "no element 2 steps prior to m[2.65]";
} else {
    std::cout << "the element 2 before m[2.65] is " << *m[2.65][-2];
}
While no such notational means exist, the beauty (and perhaps the horror) of C++ is that you could write an augmentation of map that did that. Though people would come after you with torches and pitchforks. Or maybe they'd give you cult status and put your book on the best seller list. It's a fine line--but before you even try, count the letters and sequential consonants in your last name and make sure it's a large number.
What you need to access the ordering is an iterator, and find will get you one, along with all the flexibility that it affords.
If you only use the array notation to read or write from a std::map, it's essentially a less-capable convenience layer built above iterators. So unless you build your own class derived from map, you're going to be stuck with the limits of that layer. The notation provides no way to get information about adjacent values...nor does it let you test for whether a key is in the map or not. (With find you can do this by comparing the result of a lookup to end(m) if m is your map.)
Technically speaking, find gives you the same effect as you could get by walking through the iterators front-to-back or back-to-front and comparing, as they are sorted. But that would be slower if you're seeking arbitrary elements. All the containers have a kind of algorithmic complexity guarantee that you can read up on.
When dereferencing an iterator, you will receive a pair whose first element is the key and second element is the value. The value will be mutable, but the key is constant. So you cannot find an element, then navigate to an adjacent element, and alter its key directly...just its value.
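A short sketch of the iterator route, using the map from the question (with the usual caveat from the linked question about exact floating-point keys):

#include <iostream>
#include <iterator>
#include <map>

int main() {
    std::map<double, int> m = {{1.23, 3}, {1.25, 34}, {2.65, 54}, {3.12, 51}};

    auto it = m.find(2.65);                // iterator to the known mapping
    if (it == m.end()) return 0;           // key not present

    if (it != m.begin()) {
        auto before = std::prev(it);       // entry just before 2.65
        std::cout << before->first << " -> " << before->second << '\n';   // 1.25 -> 34
    }
    auto after = std::next(it);
    if (after != m.end())                  // entry just after 2.65
        std::cout << after->first << " -> " << after->second << '\n';     // 3.12 -> 51
}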
I'm trying to work out the best method to search a vector of type "Tracklet" (a class I have built myself) to find the first and last occurrence of a given value for one of its variables. For example, I have the following classes (simplified for this example):
class Tracklet {
    TimePoint *start;
    TimePoint *end;
    int angle;
public:
    Tracklet(CvPoint*, CvPoint*, int, int);
};
class TimePoint {
    int x, y, t;
public:
    TimePoint(int, int, int);
    TimePoint(CvPoint*, int);
    // Relevant getters and setters exist here
};
I have a vector "vector<Tracklet> tracklets" and I need to search for any tracklets with a given value of "t" for the end timepoint. The vector is ordered in terms of end time (i.e. tracklet.end->t).
I'm happy to code up a search algorithm, but am unsure of which route to take with it. I'm not sure binary search would be suitable, as I seem to remember it won't necessarily find the first occurrence. I was thinking of a method where I use binary search to find an index of an element with the correct time, then iterate back to find the first and forward to find the last. I'm sure there's a better way than that, since it wastes the binary search's O(log n) by iterating linearly.
Hopefully that makes sense: I struggled to explain it a bit!
Cheers!
If the vector is sorted and contains the value, std::lower_bound will give you an iterator to the first element with a given value and std::upper_bound will give you an iterator to one element past the last one containing the value. Compare the value with the returned element to see if it existed in the vector. Both these functions use binary search, so time is O(logN).
To compare on tracklet.end->t, use:
bool compareTracklets(const Tracklet &tr1, const Tracklet &tr2) {
    return (tr1.end->t < tr2.end->t);
}
and pass compareTracklets as the fourth argument to lower_bound or upper_bound.
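Alternatively, rather than constructing a probe Tracklet to compare against, you can pass the raw time value with two asymmetric comparators; a sketch, where endTime() stands in for a public accessor to end->t:

#include <algorithm>
#include <utility>
#include <vector>

// Tracklet as in the question, with an assumed public accessor for end->t.
struct Tracklet { int endTime() const; /* ... */ };

// lower_bound needs comp(element, value); upper_bound needs comp(value, element)
bool endsBefore(const Tracklet& tr, int t) { return tr.endTime() < t; }
bool beforeEnds(int t, const Tracklet& tr) { return t < tr.endTime(); }

// half-open range [first, last) of tracklets whose end time equals t
std::pair<std::vector<Tracklet>::const_iterator, std::vector<Tracklet>::const_iterator>
trackletsEndingAt(const std::vector<Tracklet>& tracklets, int t)
{
    auto first = std::lower_bound(tracklets.begin(), tracklets.end(), t, endsBefore);
    auto last  = std::upper_bound(first, tracklets.end(), t, beforeEnds);
    return {first, last};
}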
I'd just use find and find_end, and then do something more complicated only if testing showed it to be too slow.
If you're really concerned about lookup performance, you might consider a different data structure, like a map with timestamp as the key and a vector or list of elements as the value.
A binary search seems like your best option here, as long as your vector remains sorted. It's essentially identical, performance-wise, to performing a lookup in a binary tree-structure.
dirkgently referred to a sweet optimization comparison. But I would in fact not use a std::vector for this.
Usually, when deciding to use an STL container, I don't really consider the performance aspect, but I do consider its interface with regard to the operations I wish to use.
std::set<T>::find
std::set<T>::lower_bound
std::set<T>::upper_bound
std::set<T>::equal_range
Really, if you want an ordered sequence, outside of a key/value setup, std::set is just easier to use than any other.
You don't have to worry about inserting at a 'bad' position
You don't have problems with iterator invalidation when adding / removing an element
You have built-in methods for searching
Of course, you also want your Comparison Predicate to really shine (hopefully the compiler inlines the operator() implementation), in every case.
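For instance, a sketch of what that predicate might look like with a transparent comparator (C++14), so you can search by the raw end time without building a probe object; std::multiset because several tracklets may share an end time, and endTime() is an assumed accessor:

#include <set>

struct Tracklet { int endTime() const; /* ... as in the question ... */ };

struct ByEndTime {
    using is_transparent = void;   // enables lookup by the raw time value
    bool operator()(const Tracklet& a, const Tracklet& b) const { return a.endTime() < b.endTime(); }
    bool operator()(const Tracklet& a, int t) const { return a.endTime() < t; }
    bool operator()(int t, const Tracklet& b) const { return t < b.endTime(); }
};

using Tracklets = std::multiset<Tracklet, ByEndTime>;

// all tracklets ending at time t, in O(log n):
//   auto range = tracklets.equal_range(t);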
But really, if you are not convinced, try a build with a std::vector and manual insertion / searching (using the <algorithm> header) and try another build using std::set.
Compare the size of the implementations (number of lines of code), compare the number of bugs, compare the speed, and then decide.
Most often, the 'optimization' you aim for is actually a pessimization, and in those rare times when it's not, it's just so complicated that it's not worth it.
Optimization:
Don't
Expert only: Don't, we mean it
The vector is ordered in terms of time
The start time or the end time?
What is wrong with a naive O(n) search? Remember you are only searching and not sorting. You could use a sorted container as well (if that doesn't go against the basic design).