Complexity of STL max_element - c++

So according to the link here: http://www.cplusplus.com/reference/algorithm/max_element/ , the max_element function is O(n), apparently for all STL containers. Is this correct? Shouldn't it be O(log n) for a set (implemented as a binary tree)?
On a somewhat related note, I've always used cplusplus.com for questions which are easier to answer, but I would be curious what others think of the site.

It's linear because it touches every element.
It's pointless to even use it on a set or other ordered container that uses the same comparator, because you can just use .rbegin() in constant time.
If you're not using the same comparison function, there's no guarantee that the orders will coincide, so, again, it has to touch every element and has to be at least linear.
Although algorithms may be specialized for different iterator categories, there is no way to specialize them based on whether an iterator range is ordered.
Most algorithms work on unordered ranges (max_element included); a few require the ranges to be ordered (e.g. set_union, set_intersection), and some require other properties of the range (e.g. push_heap, pop_heap).
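A minimal sketch of the difference (the set contents are just illustrative): std::max_element walks the whole range even though the set is already ordered, while .rbegin() goes straight to the largest element.

#include <algorithm>
#include <iostream>
#include <set>

int main() {
    std::set<int> s{3, 1, 4, 1, 5, 9, 2, 6};

    // O(n): max_element visits every element, even on an ordered container.
    auto it = std::max_element(s.begin(), s.end());

    // O(1): under the set's own comparator the largest element is at the end,
    // so rbegin() points straight at it.
    int largest = *s.rbegin();

    std::cout << *it << ' ' << largest << '\n'; // prints "9 9"
}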

The max_element function is O(n) for all STL containers.
This is incorrect, because max_element applies to iterators, not containers. Should you give it iterators from a set, it has no way of knowing they come from a set and will therefore traverse all of them in order looking for the maximum. So the correct sentence is:
The max_element function is O(n) for all forward iterators
Besides, if you know that you're manipulating a set, you already have access to methods that give you the max element faster than O(n), so why use max_element ?

It is an STL algorithm, so it does not know anything about the container. This linear search is therefore the best it can do on a pair of forward iterators.

STL algorithms do not know what container you took the iterators from, whether or not it is ordered and what order constraints were used. It is a linear algorithm that checks all elements in the range while keeping track of the maximum value seen so far.
Note that even if you could use metaprogramming techniques to detect what type of container the iterators were obtained from, that would not guarantee that you can just skip to the last element to obtain the maximum:
int values[] = { 1, 2, 3, 4, 5 };
std::set<int, std::greater<int> > the_set( values, values+5 );
std::max_element( the_set.begin(), the_set.end() ); //??
Even if the iterators come from a set, here it is the first element, not the last, that holds the maximum. With more complex data types the set may be ordered by some other key that is unrelated to the min/max values you are after.
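To make that concrete, here is a self-contained version of the snippet above (the values are only illustrative) showing where the maximum actually sits when the set is ordered with std::greater:

#include <functional>
#include <iostream>
#include <set>

int main() {
    int values[] = { 1, 2, 3, 4, 5 };
    std::set<int, std::greater<int> > the_set(values, values + 5);

    // The comparator is greater<int>, so iteration runs 5, 4, 3, 2, 1:
    // the maximum is the *first* element, not the last.
    std::cout << *the_set.begin() << '\n';  // 5
    std::cout << *the_set.rbegin() << '\n'; // 1
}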

Related

find() vs binary_search() in STL

Which function is more efficient for searching for an element in a vector: find() or binary_search()?
The simple answer is: std::find for unsorted data and std::binary_search for sorted data. But I think there's much more to this:
Both methods take a range [start, end) with n elements and a value x that is to be found as input. But note the important difference that std::binary_search only returns a bool that tells you whether the range contained the element or not, while std::find() returns an iterator. So both have different, but overlapping, use cases.
std::find() is pretty straightforward: O(n) iterator increments and O(n) comparisons. It also doesn't matter whether the input data is sorted or not.
For std::binary_search() you need to consider multiple factors:
It only works on sorted data. You need to take the cost of sorting into account.
The number of comparisons is always O(log n).
If the iterators do not satisfy LegacyRandomAccessIterator, the number of iterator increments is O(n); it is logarithmic when they do satisfy this requirement.
Conclusion (a bit opinionated):
when you operate on unsorted data, or need the location of the item you searched for, use std::find()
when your data is already sorted (or needs to be sorted anyway) and you simply want to check whether an element is present, use std::binary_search()
If you want to search containers like std::set, std::map or their unordered counterparts, also consider their built-in methods like std::set::find
When you are not sure whether the data is sorted, you have to use find(); if the data is (or will be) sorted, you should use binary_search().
For more information, you can refer to find() and binary_search().
If your input is sorted then you can use binary_search, as it will take O(log n) time. If your input is unsorted you can use find, which will take O(n) time.
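A minimal sketch of the difference in both requirements and return types (the vectors here are just example data):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> unsorted{7, 2, 9, 4};
    std::vector<int> sorted{2, 4, 7, 9};

    // std::find: linear scan, works on any range, returns an iterator.
    auto it = std::find(unsorted.begin(), unsorted.end(), 9);
    if (it != unsorted.end())
        std::cout << "found 9 at index " << (it - unsorted.begin()) << '\n';

    // std::binary_search: requires sorted input, O(log n) comparisons,
    // but only tells you whether the value is present.
    bool present = std::binary_search(sorted.begin(), sorted.end(), 4);
    std::cout << std::boolalpha << present << '\n'; // true
}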

Time complexity of std::lower_bound on a sorted vector

I was studying std::upper_bound from http://www.cplusplus.com/reference/algorithm/upper_bound/
and I came across the fact that this might run in linear time on non-random access iterators.
I need to use this for a sorted vector. Now I don't know what non-random access iterators are, or whether this will run in logarithmic time on the sorted vector.
Can anyone clear this up for me?
§ 23.3.6.1 [vector.overview]/p1:
A vector is a sequence container that supports random access iterators.
A random access iterator is one that can compute the offset of an arbitrary element in constant time, without needing to iterate from one place to another (which would result in linear complexity).
std::lower_bound itself provides a generic implementation of the binary search algorithm that doesn't care much about what iterator is used to indicate ranges (it only requires the iterator to be of at least forward category). It uses helper functions like std::advance to iteratively narrow the range in its binary search. With std::vector<T>::iterator, which is of random access category, std::lower_bound runs with logarithmic time complexity with regard to the number of steps required to iterate over elements, as it can halve the range in constant time at each step.
§ 25.4.3 [alg.binary.search]/p1:
All of the algorithms in this section are versions of binary search and assume that the sequence being
searched is partitioned with respect to an expression formed by binding the search key to an argument of
the implied or explicit comparison function. They work on non-random access iterators minimizing the
number of comparisons, which will be logarithmic for all types of iterators. They are especially appropriate
for random access iterators, because these algorithms do a logarithmic number of steps through the data
structure. For non-random access iterators they execute a linear number of steps.
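For a sorted std::vector the usual idiom therefore runs in logarithmic time; a small sketch (the data is just an example):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 3, 3, 5, 8, 13}; // must already be sorted

    // vector iterators are random access, so this does O(log n) work:
    // roughly log2(n) comparisons and log2(n) jumps through the data.
    auto it = std::lower_bound(v.begin(), v.end(), 5);

    std::cout << "first element not less than 5 is at index "
              << (it - v.begin()) << '\n'; // index 3
}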

Optimal way to search a std::set

How should one search a std::set, when speed is the critical criterion for his/her project?
std::set::find?
Complexity:
Logarithmic in size.
std::binary_search?
Complexity:
On average, logarithmic in the distance between first and last: Performs approximately log2(N)+2 element comparisons (where N is this distance).
On non-random-access iterators, the iterator advances produce themselves an additional linear complexity in N on average.
Just a binary search implemented by him/her (like this one)? Or is the STL's one good enough?
Is there a way to answer this theoretically? Or do we have to test it ourselves? If someone has, it would be nice if (s)he would share this information with us (if not, we are not lazy :) ).
The iterator type provided by std::set is a bidirectional iterator, a category which does not require random access to elements, but only element-wise movement in both directions. All random access iterators are bidirectional iterators, but not vice versa.
Using std::binary_search on a std::set can therefore yield O(n) runtime as per the remarks you quoted, while std::set::find has guaranteed O(log n).
So to search a set, use set::find.
It's unlikely that std::set has a random access iterator. Even if it did, std::binary_search would access at least as many nodes as .find, since .find accesses only the ancestors of the target node.
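In code, the recommendation boils down to preferring the member function (the values are only illustrative):

#include <iostream>
#include <set>

int main() {
    std::set<int> s{10, 20, 30, 40};

    // Member find(): guaranteed O(log n), walking down the tree from the root.
    auto it = s.find(30);
    if (it != s.end())
        std::cout << "found " << *it << '\n';

    // std::binary_search(s.begin(), s.end(), 30) would also return true here,
    // but the iterator steps are linear for a bidirectional iterator.
}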

What are the use cases in which order in a key-value associative container is useful?

I am wondering in which use cases we might be interested in having an ordered associative container.
In other words, why use std::map and not std::unordered_map?
You use an ordered map when you need to be able to iterate over the keys in an ordered fashion.
As Mat pointed out, it depends on whether you need to access your elements in sorted order. Also, map has guaranteed, worst-case logarithmic access time (in the number of elements), whereas unordered map has average constant access time.
The main reason for using map instead of unordered_map is that map is in the standard and you are sure to have it. Some companies have a policy of not using definitions from TR1.
But you ask when order is important - it is needed for operations like lower_bound, upper_bound, set_union, set_intersection etc.
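A small sketch of the kind of range query that only makes sense on an ordered container (the keys and values are made up for illustration):

#include <iostream>
#include <map>
#include <string>

int main() {
    // Keys are kept in sorted order, which is what enables range queries.
    std::map<std::string, int> prices{{"apple", 3}, {"banana", 1},
                                      {"cherry", 5}, {"damson", 2}};

    // All keys in the half-open range ["banana", "damson"), in order.
    for (auto it = prices.lower_bound("banana");
         it != prices.lower_bound("damson"); ++it)
        std::cout << it->first << " = " << it->second << '\n';

    // An unordered_map cannot express this query: its iteration order is
    // unspecified and it has no lower_bound/upper_bound.
}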
If comparison between unordered containers is needed, the complexity might be an issue. It seems to be difficult to implement operator==/operator!= between unordered containers efficiently.
N3290 §23.2.5/11 says:
the complexity of operator== ... is proportional to ... N^2 in the worst case, where N is a.size().
while other containers have linear complexity.
unordered_map uses more memory than map, so if memory is constrained it is recommended to use map; under memory pressure, operations can also end up slower with unordered_map.
EDIT
Quoting from the wiki article
It is similar to the map class in the C++ standard library but has different
constraints. As its name implies, unlike the map class, the elements of an
unordered_map are not ordered. This is due to the use of hashing to store objects.
unordered_map can still be iterated through like a regular map.

Dynamically sorted STL containers

I'm fairly new to the STL, so I was wondering whether there are any dynamically sortable containers? At the moment my current thinking is to use a vector in conjunction with the various sort algorithms, but I'm not sure whether there's a more appropriate selection given the (presumably) linear complexity of inserting entries into a sorted vector.
To clarify "dynamically", I am looking for a container that I can modify the sorting order at runtime - e.g. sort it in an ascending order, then later re-sort in a descending order.
You'll want to look at std::map
std::map<keyType, valueType>
The map is sorted based on the < operator provided for keyType.
Or
std::set<valueType>
Also sorted on the < operator of the template argument, but does not allow duplicate elements.
There's
std::multiset<valueType>
which does the same thing as std::set but allows identical elements.
I highly recommend "The C++ Standard Library" by Josuttis for more information. It is the most comprehensive overview of the standard library, very readable, and chock full of obscure and not-so-obscure information.
Also, as mentioned by 17 of 26, Effective STL by Meyers is worth a read.
If you know you're going to be sorting on a single value ascending and descending, then set is your friend. Use a reverse iterator when you want to "sort" in the opposite direction.
If your objects are complex and you're going to be sorting in many different ways based on the member fields within the objects, then you're probably better off with using a vector and sort. Try to do your inserts all at once, and then call sort once. If that isn't feasible, then deque may be a better option than the vector for large collections of objects.
I think that if you're interested in that level of optimization, you had better be profiling your code using actual data. (Which is probably the best advice anyone here can give: it may not matter that you call sort after each insert if you're only doing it once in a blue moon.)
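The vector-plus-sort approach mentioned above looks roughly like this (the data and comparators are only illustrative):

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{4, 1, 3, 5, 2};

    // Ascending order now...
    std::sort(v.begin(), v.end());

    // ...and descending later, simply by re-sorting with another comparator.
    std::sort(v.begin(), v.end(), std::greater<int>());

    for (int x : v) std::cout << x << ' '; // 5 4 3 2 1
    std::cout << '\n';
}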
It sounds like you want a multi-index container. This allows you to create a container and tell that container the various ways you may want to traverse the items in it. The container then keeps multiple lists of the items, and those lists are updated on each insert/delete.
If you really want to re-sort the container, you can call the std::sort function on any std::deque, std::vector, or even a simple C-style array. That function takes an optional third argument to determine how to sort the contents.
The STL provides no such container. You can define your own, backed by either a set/multiset or a vector, but you are going to have to re-sort every time the sorting function changes, either by calling sort (for a vector) or by creating a new collection (for set/multiset).
If you just want to change from increasing sort order to decreasing sort order, you can use the reverse iterator on your container by calling rbegin() and rend() instead of begin() and end(). Both vector and set/multiset are reversible containers, so this would work for either.
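For the common case of just flipping between ascending and descending, the reverse-iterator trick looks like this (example data only):

#include <iostream>
#include <set>

int main() {
    std::set<int> s{3, 1, 4, 2};

    for (int x : s) std::cout << x << ' ';  // 1 2 3 4 (ascending)
    std::cout << '\n';

    // "Sort" in the opposite direction without touching the container:
    for (auto it = s.rbegin(); it != s.rend(); ++it)
        std::cout << *it << ' ';            // 4 3 2 1 (descending)
    std::cout << '\n';
}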
std::set is basically a sorted container.
You should definitely use a set/map. Like hazzen says, you get O(log n) insert/find. You won't get this with a sorted vector; you can get O(log n) find using binary search, but insertion is O(n) because inserting (or deleting) an item may cause all existing items in the vector to be shifted.
It's not that simple. In my experience insert/delete is used less often than find. The advantage of a sorted vector is that it takes less memory and is more cache-friendly. If you happen to have a version that is compatible with STL maps (like the one I linked before), it's easy to switch back and forth and use the optimal container for every situation.
In theory an associative container (set, multiset, map, multimap) should be your best solution.
In practice it depends on the average number of elements you are putting in.
For fewer than about 100 elements a vector is probably the best solution, due to:
- avoiding continuous allocation/deallocation
- cache friendliness thanks to data locality
These advantages will probably outweigh the cost of continuously re-sorting.
Obviously it also depends on how many insertions/deletions you have to do. Are you going to insert and delete every frame?
More generally: are you talking about a performance-critical application?
Remember not to optimize prematurely...
The answer is, as always, it depends.
set and multiset are appropriate for keeping items sorted, but are generally optimised for a balanced mix of add, remove and fetch. If you have mainly lookup operations, then a sorted vector may be more appropriate; use lower_bound to look up the element.
Also, your second requirement of re-sorting in a different order at runtime means that set and multiset are not appropriate, because the predicate cannot be modified at run time.
I would therefore recommend a sorted vector. But remember to pass the same predicate to lower_bound that you passed to the previous sort, as the results will be undefined and most likely wrong if you pass the wrong predicate.
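As a concrete sketch of that warning (example data only), the predicate used for std::sort and the one passed to std::lower_bound must match:

#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{4, 1, 3, 5, 2};

    // Sort descending with greater<int>...
    std::sort(v.begin(), v.end(), std::greater<int>());

    // ...then pass the SAME predicate to lower_bound; with the default
    // operator< the range would not look sorted and the result is meaningless.
    auto it = std::lower_bound(v.begin(), v.end(), 3, std::greater<int>());

    if (it != v.end() && *it == 3)
        std::cout << "found 3 at index " << (it - v.begin()) << '\n'; // index 2
}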
Set and multiset use an underlying binary tree; you can supply your own comparison (a strict weak ordering, such as operator<) for your types. These containers keep themselves sorted, so they may not be the best choice if you are switching sort parameters. Vectors and lists are probably best if you are going to be re-sorting quite a bit; in general list has its own sort (usually a mergesort) and you can use the STL binary search algorithms on vectors. If inserts will dominate, list outperforms vector.
STL maps and sets are both sorted containers.
I second Doug T's book recommendation - the Josuttis STL book is the best I've ever seen as both a learning and reference book.
Effective STL is also an excellent book for learning the inner details of STL and what you should and shouldn't do.
For "STL compatible" sorted vector see A. Alexandrescu's AssocVector from Loki.