Searching in a map: Member vs. non-member lower_bound - c++

I need to look up two elements in a std::map. Since my map is sparse, I might not have an entry for every key, so I use lower_bound for lookup. For the sake of simplicity, let's assume that I can always find two such elements and that they are always distinct.
The quickest solution would of course be:
auto it1 = my_map.lower_bound(k1);
auto it2 = my_map.lower_bound(k2);
However, I know that the element at key k2 is located between begin and the element at key k1. Therefore, I was thinking of using std::lower_bound for the second lookup to avoid having to search the full range again:
auto it1 = my_map.lower_bound(k1);
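auto it2 = std::lower_bound(begin(my_map), it1, k2, [](const auto& entry, const auto& key) { return entry.first < key; }); // comparator needed: elements are key/value pairs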
Any opinions on the second solution? Complexity-wise it should be better, but it's a lot less pleasant to look at than the original code and I'm wondering whether it is worth bothering at all. Also, should I expect any drawbacks due to the fact that I'm using the non-member lower_bound for the second call?

The primary drawback is that the non-member std::lower_bound has to rely on the bidirectional iterators that map provides. So while it performs only O(log(n)) comparisons, it still has to perform O(n) iterator increments.
On the other hand, the member lower_bound() is aware of the internal structure of the map, which is typically some sort of binary tree. This means it is capable of traversing the map in ways the standard algorithm cannot.
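A minimal sketch of the two lookups from the question, assuming a std::map<int, std::string> and k2 <= k1. The helper name two_lookups is hypothetical. Note that std::lower_bound needs a comparator here, because it compares the map's value_type (a pair) against a bare key.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Map = std::map<int, std::string>;

// Hypothetical helper: lower bounds for k1 and k2 (assumes k2 <= k1),
// reusing it1 to narrow the second search as the question proposes.
std::pair<Map::const_iterator, Map::const_iterator>
two_lookups(const Map& m, int k1, int k2) {
    auto it1 = m.lower_bound(k1);  // member version: descends the tree
    // Non-member version: few comparisons, but it steps the bidirectional
    // iterators one by one, so the iterator movement is linear.
    auto it2 = std::lower_bound(m.begin(), it1, k2,
        [](const Map::value_type& entry, int key) { return entry.first < key; });
    return {it1, it2};
}
```

Both lookups return the same iterators that two calls to the member lower_bound would; the difference is only in how the second search moves through the map.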

Related

Why there isn't lower_bound and upper_bound for unordered_multimap?

Switching from a multimap to unordered_multimap, I realized there isn't the equivalent:
lower_bound
upper_bound
It seems obvious that equal_range could serve as an easy equivalent, but I wonder if I am missing something: is there a reason for this choice?
Coming from any other library, I would have considered the difference a simple error, but STL is usually quite orthogonal in this respect.
It's in the name: unordered_multimap. There is no order, and as such, no lower/upper relationship. Keys stored in unordered_* containers are not even required to support < or std::less; only hash and equality operations are needed.
If you look at the std::lower_bound() and std::upper_bound() documentation, you can see that they place a special requirement on the range they can be applied to:
The range [first, last) must be partitioned with respect to the expression !(value < element) or !comp(value, element), i.e., all elements for which the expression is true must precede all elements for which the expression is false. A fully-sorted range meets this criterion.
As std::map satisfies that criterion, one could use those generic functions on it, but such usage is not efficient, as those generic functions are unaware of the internal representation of the map. So std::map provides its own, more efficient (though less generic) variants. std::unordered_map, on the other hand, does not satisfy the criterion, so you cannot apply those generic functions to it, and it does not make any sense to implement them for std::unordered_map itself.
I thought lower_bound returns the first iterator with a given key; that could be possible on an unordered container.
This is what std::find() does. std::lower_bound() or std::map::lower_bound() gives you the first position from which the elements of the range are not less than the key. The fact that you can use it to find a particular element is a useful side effect of that behavior, but not the main purpose of those functions.
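For the original question, equal_range is indeed the tool an unordered_multimap offers. A minimal sketch, assuming string keys and int values; the helper name sum_for_key is hypothetical. It only relies on equal keys being grouped together, not on any ordering.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Sums every value stored under `key`. equal_range yields the (possibly empty)
// group of elements with that key; unordered containers keep equivalent keys
// adjacent but impose no "less than" order, so there is no lower/upper bound.
int sum_for_key(const std::unordered_multimap<std::string, int>& mm,
                const std::string& key) {
    int sum = 0;
    auto range = mm.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        sum += it->second;
    }
    return sum;
}
```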
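The difference is easiest to see with a key that is not present. A minimal sketch on a std::map<int, char>; the helper names and the -1 sentinel are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>

// find() only succeeds on an exact match.
bool has_exact_key(const std::map<int, char>& m, int k) {
    return m.find(k) != m.end();
}

// lower_bound() always reports a position: the first key not less than k.
// Returns -1 (a sentinel chosen for this sketch) if every key is less than k.
int lower_bound_key(const std::map<int, char>& m, int k) {
    auto it = m.lower_bound(k);
    return it == m.end() ? -1 : it->first;
}
```

With keys 10, 20 and 30, asking for 15 makes find() fail, while lower_bound() lands on 20.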

Searching a std::vector for approximations of a value

I have a vector,
std::vector<float> v;
and a float value, x. The vector v contains x+epsilon, where epsilon is very small (yet greater than the machine epsilon), but it doesn't contain x. Is there a way, using the STL, to find the index of x+epsilon in the vector?
Something like:
int i = alternative_find(v.begin(), v.end(), x, gamma) - v.begin();
which would return the indices of all the values in v that lie in [x-gamma, x+gamma]? I could implement a binary search function (I'd like to avoid linear time complexity), but I'd really like to know if it could be done in an easier way.
Find the std::lower_bound, then the std::upper_bound, and you'll have your range.
From an iterator, you can obtain an index using std::distance (though stick with the iterator if you can!).
This assumes your data is sorted, but since you talk about binary searches that seems like a sensible assumption.
If it's not then you're going to have to examine every element anyway, in which case any approach is basically as good as another.
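A minimal sketch of the lower_bound/upper_bound approach, assuming v is sorted ascending. The helper name indices_within is hypothetical; it returns a half-open index range, and an empty range means nothing matched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Assumes v is sorted ascending. Returns the half-open index range [first, last)
// of elements in [x - gamma, x + gamma]; first == last means no match.
std::pair<std::size_t, std::size_t>
indices_within(const std::vector<float>& v, float x, float gamma) {
    auto lo = std::lower_bound(v.begin(), v.end(), x - gamma);  // first >= x - gamma
    auto hi = std::upper_bound(lo, v.end(), x + gamma);         // first >  x + gamma
    return {static_cast<std::size_t>(std::distance(v.begin(), lo)),
            static_cast<std::size_t>(std::distance(v.begin(), hi))};
}
```

Starting the upper_bound search at lo rather than v.begin() is a small optimization; the result is the same either way.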
If you're talking about binary search, then obviously the vector is pre-sorted, which means you want to find the first element at or above x-gamma; then, if you actually want to use the values, it's fastest to keep incrementing while they're in range. Check out lower_bound: http://en.cppreference.com/w/cpp/algorithm/lower_bound
If you just want to find the first and last, an alternative is to use upper_bound to binary search to its position, but that's likely slower than incrementing if there are a lot of elements and only a few match.
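A minimal sketch of the "binary search, then increment" idea, assuming v is sorted ascending. The helper name values_within is hypothetical; it copies the matching values out for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumes v is sorted ascending. Binary-search to the first element >= x - gamma,
// then walk forward linearly while elements stay <= x + gamma, collecting them.
std::vector<float> values_within(const std::vector<float>& v, float x, float gamma) {
    std::vector<float> out;
    for (auto it = std::lower_bound(v.begin(), v.end(), x - gamma);
         it != v.end() && *it <= x + gamma; ++it) {
        out.push_back(*it);
    }
    return out;
}
```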
In C++11:
auto it = std::find_if(v.begin(), v.end(),
    [x, gamma](float f) { return f >= x - gamma && f <= x + gamma; });
Historically, you'd have to write your own predicate, and it would probably be simpler to use a regular for loop.
(Although, regarding your last sentence, if the vector is or can be sorted, then you can do a binary search with lower_bound and upper_bound as described in other answers).

Binary_search in STL set over set's member function find?

Why do we have 2 ways like above to search for an element in the set?
Also, the find algorithm can be used to find an element in a list or a vector, but what would be the harm in these containers providing a member function as well, since member functions are expected to be faster than a generic algorithm?
Why do we need the remove algorithm and all the drama about erase-remove, where remove just shifts the elements and erase then deletes the actual elements? Just as the STL list provides a member function remove, why can't the other containers just offer a remove function and be done with it?
Binary_search in STL set over set's member function find?
Why do we have 2 ways like above to search for an element in the set?
std::binary_search() returns a bool, whereas set::find() returns an iterator. In order to compare apples to apples, the algorithm to compare set::find() with is std::lower_bound(), which also returns an iterator.
You can apply std::lower_bound() on an arbitrary sorted range specified by a pair of (forward / bidirectional / random access) iterators and not only on a std::set. So having std::lower_bound() is justified. As std::set happens to be a sorted range, you can call
std::lower_bound(mySet.begin(), mySet.end(), value);
but the
mySet.find(value);
call is not only more concise, it is also more efficient. If you look into the implementation of std::lower_bound(), you will find something like std::advance(__middle, __half);, which has different complexity depending on the iterator category (forward / bidirectional / random access). In the case of std::set, the iterators are bidirectional and advancing them has linear complexity, ouch! In contrast, std::set::find() is guaranteed to perform the search in logarithmic time. The underlying implementation (a red-black tree in the case of libstdc++) makes this possible. Offering set::find() is therefore also justified, as it is more efficient than calling std::lower_bound() on a std::set.
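A minimal sketch of the three ways to ask "is value in the set?", assuming a std::set<int>; the helper names are hypothetical. All three give the same answer; they differ in return type and in how they move through the set.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

bool via_member_find(const std::set<int>& s, int value) {
    return s.find(value) != s.end();  // logarithmic tree search
}

bool via_generic_lower_bound(const std::set<int>& s, int value) {
    // Logarithmic number of comparisons, but std::advance on bidirectional
    // iterators makes the iterator stepping linear in the size of the set.
    auto it = std::lower_bound(s.begin(), s.end(), value);
    return it != s.end() && !(value < *it);  // may land on a larger element
}

bool via_binary_search(const std::set<int>& s, int value) {
    return std::binary_search(s.begin(), s.end(), value);  // bool only, no position
}
```

The extra !(value < *it) check is why lower_bound alone is not a drop-in replacement for find(): it returns the first element not less than the value, which may be a different element.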
Also, the find algorithm can be used to find an element in a list or a vector, but what would be the harm in these containers providing a member function as well, since member functions are expected to be faster than a generic algorithm?
I don't see how you could provide a faster member function for list or vector, unless the container is sorted (or possesses some special property).
Why do we need the remove algorithm and all the drama about erase-remove, where remove just shifts the elements and erase then deletes the actual elements? Just as the STL list provides a member function remove, why can't the other containers just offer a remove function and be done with it?
I can think of two reasons.
Yes, the STL is seriously lacking many convenience functions. I often feel like I live in begin-end hell when using algorithms on an entire container; I often provide my own wrappers that accept a container, something like:
template <typename T>
bool contains(const std::vector<T>& v, const T& elem) {
return std::find(v.begin(), v.end(), elem) != v.end();
}
so that I can write
if (contains(myVector, 42)) {
instead of
if (std::find(myVector.begin(), myVector.end(), 42) != myVector.end()) {
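The same wrapper can be written once for any container (or built-in array) by using std::begin and std::end. A minimal sketch; container_contains is a hypothetical name, and since it does a linear std::find, a std::set or std::map should still use its member find().

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Container-agnostic variant of the contains() wrapper (hypothetical name).
// Linear search: fine for vector/list/arrays, wasteful for sets and maps.
template <typename Container, typename T>
bool container_contains(const Container& c, const T& elem) {
    return std::find(std::begin(c), std::end(c), elem) != std::end(c);
}
```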
Unfortunately, you quite often have to roll your own or use Boost. Why? Because standardization is painful and slow, so the standardization committee focuses on more important things. The people on the committee often donate their free time and are not paid for their work.
Now deleting elements from a vector can be tricky: Do you care about the order of your elements? Are your elements PODs? What are your exception safety requirements?
Let's assume you don't care about the order of your elements and you want to delete the i-th element:
std::swap(myVector[i], myVector.back());
myVector.pop_back();
or even simpler:
myVector[i] = myVector.back(); // but if operator= throws during copying you might be in trouble
myVector.pop_back();
In C++11 with move semantics:
myVector[i] = std::move(myVector.back());
myVector.pop_back();
Note that these are O(1) operations instead of O(N). These are examples of the efficiency and exception safety considerations that the standard committee leaves up to you. Providing a single "one size fits all" member function is not the C++ way.
Having said all this, I repeat: I wish we had more convenience functions; I understand your problem.
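The move-and-pop variant above can be packaged as a small helper. A minimal sketch; unordered_erase is a hypothetical name, and it assumes element order does not matter. The self-check avoids move-assigning the last element onto itself when i is already the last index.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// O(1) unordered removal of element i (hypothetical helper name).
// Element order is NOT preserved: the last element takes slot i.
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t i) {
    assert(i < v.size());  // precondition assumed by this sketch
    if (i != v.size() - 1) {
        v[i] = std::move(v.back());  // skip self-move when i is the last index
    }
    v.pop_back();
}
```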
I'll answer part of your question. The Erase-Remove idiom is from the book “Effective STL” written by Scott Meyers. As to why remove() doesn't actually delete elements from the container, there is a good answer here; I'll just quote part of it:
The key is to realize that remove() is designed to work on not just a container but on any arbitrary forward iterator pair: that means it can't actually delete the elements, because an arbitrary iterator pair doesn't necessarily have the ability to delete elements.
Why does the STL list provide a member function remove, and why can't the other containers just offer a remove function and be done with it? I think it's because the idiom is more efficient than other methods of removing specific values from contiguous-memory containers.
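A minimal sketch contrasting the two forms, assuming int elements; the overloaded helper name erase_value is hypothetical. For the vector, std::remove only shifts the kept elements forward and returns the new logical end, and erase() then destroys the leftover tail. std::list can relink its nodes, so its member remove() deletes directly.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Erase-remove idiom for a contiguous container.
void erase_value(std::vector<int>& v, int value) {
    v.erase(std::remove(v.begin(), v.end(), value), v.end());
}

// std::list's member remove() unlinks matching nodes itself.
void erase_value(std::list<int>& l, int value) {
    l.remove(value);
}
```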

How can I define operator< for bidirectional iterator?

How can I define operator< for bidirectional iterator? ( list::iterator )
(I would like to use list and not vector.)
You can't do it directly, but you can compute std::distance(x.begin(), it1) and std::distance(x.begin(), it2) and compare those. Given that lists don't have random access, you expect to have to pay the price for such a query by having to traverse the entire list.
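A minimal sketch of the distance-based comparison, assuming a std::list<int>; position_less is a hypothetical name. Both iterators must belong to x, and each std::distance call walks from begin(), so the comparison is linear.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Compares two iterators into the same list by their positions.
// Each std::distance call walks from begin(), so this costs O(n).
bool position_less(const std::list<int>& x,
                   std::list<int>::const_iterator it1,
                   std::list<int>::const_iterator it2) {
    return std::distance(x.begin(), it1) < std::distance(x.begin(), it2);
}
```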
Edit: This will perform poorly if both iterators are near the end of the list. If you want to get more fancy, you could write some exploring algorithm that moves outwards from both iterators:
[ .... <-- it1 --> .... <-- it2 --> .... ]
You would basically keep two copies for each, fwd1/rev1 and fwd2/rev2, and you decrement the rev* iterators until you hit x.begin() and advance the fwd* iterators until you hit x.end(). If your iterator pairs are uniformly distributed, this probably has better expected runtime.
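A minimal sketch of a simplified, forward-only variant of this exploring idea (not the four-iterator scheme described above): walk forward from both iterators in lockstep and stop as soon as one walker meets the other iterator or reaches end(). The name precedes is hypothetical; as with every approach here, the caller must supply the list's end().

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Returns true if it1 comes before it2 in the list whose end() is `last`.
// Both iterators must belong to that list. Forward-only lockstep walk.
template <typename It>
bool precedes(It it1, It it2, It last) {
    if (it1 == it2 || it1 == last) return false;
    if (it2 == last) return true;
    It fwd1 = it1, fwd2 = it2;
    for (;;) {
        ++fwd1;                          // walk forward from it1
        if (fwd1 == it2) return true;    // met it2: it1 was first
        if (fwd1 == last) return false;  // hit the end first: it2 lies before it1
        ++fwd2;                          // walk forward from it2
        if (fwd2 == it1) return false;   // met it1: it2 was first
        if (fwd2 == last) return true;   // hit the end first: it1 lies before it2
    }
}
```

It never has to start from begin(), but in the worst case it still walks a linear number of nodes.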
Impossible. You might need to walk until end, and for that you need to know the list of origin, which is not encoded in a list::iterator.
(You can make a function object for this purpose, though, which takes the list of origin as a constructor argument. Mind you, finding out whether one iterator is less than another would take O(n) time.)
You can't do this because you'd have to know the start and/or end of the list in order to make such a comparison. Only random access iterators define operator<.
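Supposing ++(list.end()) is not undefined behaviour and equals list.end(), there is a way. But I am not sure of this hypothesis. (In fact, the standard only allows incrementing a dereferenceable iterator, so incrementing end() is undefined behaviour.)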
If it is valid, you can define a simple algorithm to obtain the result you want.

Complexity of STL max_element

So according to the link here: http://www.cplusplus.com/reference/algorithm/max_element/ , the max_element function is O(n), apparently for all STL containers. Is this correct? Shouldn't it be O(log n) for a set (implemented as a binary tree)?
On a somewhat related note, I've always used cplusplus.com for questions which are easier to answer, but I would be curious what others think of the site.
It's linear because it touches every element.
It's pointless to even use it on a set or other ordered container using the same comparator because you can just use .rbegin() in constant time.
If you're not using the same comparison function there's no guarantee that the orders will coincide so, again, it has to touch every element and has to be at least linear.
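A minimal sketch of the rbegin() shortcut, assuming a non-empty std::set<int> with the default std::less ordering; set_max is a hypothetical name. max_element reaches the same answer, but only by visiting every element.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// With the default ordering, the largest element of a non-empty std::set
// is the last one, reachable directly via rbegin().
int set_max(const std::set<int>& s) {
    assert(!s.empty());  // precondition assumed by this sketch
    return *s.rbegin();
}
```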
Although algorithms may be specialized for different iterator categories, there is no way to specialize them based on whether an iterator range is ordered.
Most algorithms work on unordered ranges (max_element included); a few require the ranges to be ordered (e.g. set_union, set_intersection); and some require other properties of the range (e.g. push_heap, pop_heap).
The max_element function is O(n) for all STL containers.
This is incorrect, because max_element applies to iterators, not containers. Should you give it iterators from a set, it has no way of knowing they come from a set and will therefore traverse all of them in order looking for the maximum. So the correct sentence is:
The max_element function is O(n) for all forward iterators
Besides, if you know that you're manipulating a set, you already have access to methods that give you the max element faster than O(n), so why use max_element?
It is an STL algorithm, so it does not know anything about the container. A linear search is the best it can do on a pair of forward iterators.
STL algorithms do not know what container you took the iterators from, whether or not it is ordered and what order constraints were used. It is a linear algorithm that checks all elements in the range while keeping track of the maximum value seen so far.
Note that even if you could use metaprogramming techniques to detect which type of container the iterators were obtained from, that would not guarantee that you can just skip to the last element to obtain the maximum:
int values[] = { 1, 2, 3, 4, 5 };
std::set<int, std::greater<int> > the_set( values, values+5 );
std::max_element( the_set.begin(), the_set.end() ); //??
Even if the iterators come from a set, it is the first element, not the last, that holds the maximum. With more complex data types, the set can be ordered by some other key that may be unrelated to the min/max values.
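A minimal sketch of that descending-set case, using the same five values; DescendingSet and max_by_scan are hypothetical names. max_element still finds the maximum, by scanning, and it sits at begin() rather than at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <set>

// With std::greater the set is ordered descending, so the maximum sits at
// begin(). max_element finds it anyway, by scanning every element.
using DescendingSet = std::set<int, std::greater<int>>;

DescendingSet::const_iterator max_by_scan(const DescendingSet& s) {
    return std::max_element(s.begin(), s.end());
}
```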