Container with optimized find like std::map but non associative - c++

I have a big std::vector<int> where I have to get an iterator so that I can call other functions of it, like erase. Looping through the vector to find the element I'm searching for takes a lot of time.
std::map::find() is much faster, but I don't want to allocate memory for the second value which I'm never going to use.
Is there any single-value container with find() or anything that gives me an iterator with similar speed as std::map::find ? I couldn't find any.

You're looking for std::set or std::multiset.

You could use std::unordered_map or use the same std::vector that keeps an order of elements that you could apply standard algorithms for sorted containers as for example std::equal_range.

Related

Binary_search in STL set over set's member function find?

Why do we have 2 ways like above to search for an element in the set?
Also find algorithm can be used to find an element in a list or a vector but what would be the harm in these providing a member function as well as member functions are expected to be faster than a generic algorithm?
Why do we need remove algorithm and create all the drama about erase remove where remove will just shift the elements and then use erase to delete the actual element..Just like STL list provides a member function remove why cant the other containers just offer a remove function and be done with it?
Binary_search in STL set over set's member function find?
Why do we have 2 ways like above to search for an element in the set?
Binary search returns a bool and set::find() and iterator. In order to compare apples to apples, the algorithm to compare set::find() with is std::lower_bound() which also returns an iterator.
You can apply std::lower_bound() on an arbitrary sorted range specified by a pair of (forward / bidirectional / random access) iterators and not only on a std::set. So having std::lower_bound() is justified. As std::set happens to be a sorted range, you can call
std::lower_bound(mySet.begin(), mySet.end(), value);
but the
mySet.find(value);
call is not only more concise, it is also more efficient. If you look into the implementation of std::lower_bound() you will find something like std::advance(__middle, __half); which has different complexity depending on the iterator (whether forward / bidirectional / random access iterator). In case of std::set, the iterators are bidirectional and advancing them has linear complexity, ouch! In contrast, std::set::find() is guaranteed to perform the search in logarithmic time complexity. The underlying implementation (which is a red and black tree in case of libstdc++) makes it possible. Offering a set::find() is also justified as it is more efficient than calling std::lower_bound() on std::set.
Also find algorithm can be used to find an element in a list or a
vector but what would be the harm in these providing a member function
as well as member functions are expected to be faster than a generic
algorithm?
I don't see how you could provide a faster member function for list or vector, unless the container is sorted (or possesses some special property).
Why do we need remove algorithm and create all the drama about erase
remove where remove will just shift the elements and then use erase to
delete the actual element..Just like STL list provides a member
function remove why cant the other containers just offer a remove
function and be done with it?
I can think of two reasons.
Yes, the STL is seriously lacking many convenience functions. I often feel like I live in a begin-end hell when using algorithms on an entire container; I often proved my own wrappers that accept a container, something like:
template <typename T>
bool contains(const std::vector<T>& v, const T& elem) {
return std::find(v.begin(), v.end(), elem) != v.end();
}
so that I can write
if (contains(myVector, 42)) {
instead of
if (std::find(myVector.begin(), myVector.end(), 42) != myVector.end()) {
Unfortunately, you quite often have to roll your own or use boost. Why? Because standardization is painful and slow so the standardization committee focuses on more important things. The people on the committee often donate their free time and are not paid for their work.
Now deleting elements from a vector can be tricky: Do you care about the order of your elements? Are your elements PODs? What are your exception safety requirements?
Let's assume you don't care about the order of your elements and you want to delete the i-th element:
std::swap(myVector[i], myVector.back());
myVector.pop_back();
or even simpler:
myVector[i] = myVector.back(); // but if operator= throws during copying you might be in trouble
myVector.pop_back();
In C++11 with move semantics:
myVector[i] = std::move(myVector.back());
myVector.pop_back();
Note that these are O(1) operations instead of O(N). These are examples of the efficiency and exception safety considerations that the standard committee leaves up to you. Providing a member function and "one size fits all" is not the C++ way.
Having said all these, I repeat I wish we had more convenience functions; I understand your problem.
I'll answer part of your question. The Erase-Remove idiom is from the book “Effective STL” written by Scott Meye. As to why remove() doesn't actually delete elements from the container, there is a good answer here, I just copy part of the answer:
The key is to realize that remove() is designed to work on not just a
container but on any arbitrary forward iterator pair: that means it
can't actually delete the elements, because an arbitrary iterator pair
doesn't necessarily have the ability to delete elements.
Why STL list provides a member function remove and why can't the other containers just offer a remove function and be done with it? I think it's because the idiom is more efficient than other methods to remove specific values from the contiguous-memory containers.

Merits of std::find

Is there any advantage to using C++11's std::find over a container's find method?
In the case of std::vector (which does not have a find method) does std::find use some smart algorithm or the naive way of simply iterating over every element?
In the case of std::map it seems you need to pass along an std::pair, which is the value_type of an std::map. This does not seem very useful as usually you'd want to find for either a key or a mapped element.
What about other containers like std::list or std::set or std::unordered_set ?
In the case of std::vector (which does not have a find method) does std::find use some smart algorithm or the naive way of simply iterating over every element?
It cannot, because vectors are not sorted. There is no other way to find an element in an unsorted vector than a linear search with O(n) complexity.
On the other hand, sequence containers do not offer a find() member functions, so you could not possibly use that.
In the case of std::map it seems you need to pass along an std::pair, which is the value_type of an std::map. This does not seem very useful as usually you'd want to find for either a key or a mapped element.
Indeed, here you should use the find() member function, which guarantees a better complexity (O(log N)).
In general, when a container exposes a member function with the same name as a generic algorithm, this is because the member function does the same thing, but offers a better complexity guarantee.
What about other containers like std::list or std::set or std::unordered_set ?
Just like std::vector, std::list is not a sorted container - so the same conclusion applies.
For std::set and std::unordered_set, instead, you should use the find() member function, which guarantees a better complexity (O(log n) and average O(1), respectively).

Simple and efficient container in C++ with characteristics of map and list containers

I'm looking for a C++ container that will enjoy both map container and list container benefits.
map container advantages I would like to maintain:
O(log(n)) access
operator[] ease of use
sparse nature
list container advantages I would like to maintain:
having an order between the items
being able to traverse the list easily UPDATE: by a sorting order based on the key or value
A simple example application would be to hold a list of certain valid dates (business dates, holidays, some other set of important dates...), once given a specific date, you could find it immediately "map style" and then find the next valid date "list style".
std::map is already a sorted container where you can iterate over the contained items in order. It only provides O(log(n)) access, though.
std::tr1::unordered_map (or std::unordered_map in C++0x) has O(1) access but is unsorted.
Do you really need O(1) access? You have to use large datasets and do many lookups for O(log(n)) not being fast enough.
If O(log(n)) is enough, std::map provides everything you are asking for.
If you don't consider the sparse nature, you can take a look at the Boost Multi-Index library. For the sparse nature, you can take a look at the Boost Flyweight library, but I guess you'll have to join both approaches by yourself. Note that your requirements are often contradictory and hard to achieve. For instance, O(1) and order between the items is difficult to maintain efficiently.
Maps are generally implemented as trees and thus have logarithmic look up time, not O(1), but it sounds like you want a sorted associative container. Hash maps have O(1) best case, O(N) worst case, so perhaps that is what you mean, but they are not sorted, and I don't think they are part of the standard library yet.
In the C++ standard library, map, set, multimap, and multiset are sorted associative containers, but you have to give up the O(1) look up requirement.
According to Stroustrup, the [] operator for maps is O(log(n)). That is much better than the O(n) you'd get if you were to try such a thing with a list, but it is definitely not O(1). The only container that gives you that for the [] operator is vector.
That aside, you can already do all your listy stuff with maps. Iterators work fine on them. So if I were you, I'd stick with map.
having an order between the items
being able to traverse the list easily
Maps already do both. They are sorted, so you start at begin() and traverse until you hit end(). You can, of course, start at any map iterator; you may find map's find, lower_bound, and related methods helpful.
You can store data in a list and have a map to iterators of your list enabling you to find the actual list element itself. This kind of thing is something I often use for LRU containers, where I want a list because I need to move the accessed element to the end to make it the most recently accessed. You can use the splice function to do this, and since the 2003 standard it does not invalidate the iterator as long as you keep it in the same list.
How about this one: all dates are stored in std::list<Date>, but you look it up with helper structure stdext::hash_map<Date, std::list<Date>::iterator>. Once you have iterator for the list access to the next element is simple. In your STL implementation it could be std::tr1::unordered_map instead of stdext::hash_map, and there is boost::unordered_map as well.
You will never find a container that satisfies both O(log n) access and an ordered nature. The reason is that if a container is ordered then inherently it must support an arbitrary order. That's what an ordered nature means: you get to decide exactly where any element is positioned. So to find any element you have to guess where it is. It can be anywhere, because you can place it anywhere!
Note that an ordered sequence is not the same as a sorted sequence. A sorted nature means there is one particular ordering relation between any two elements. An ordered nature means there may be more than one ordering relation among the elements.

why there is no find for vector in C++

what's the alternative?
Should I write by myself?
There is the std::find() algorithm, which performs a linear search over an iterator range, e.g.,
std::vector<int> v;
// Finds the first element in the vector that has the value 42:
// If there is no such value, it == v.end()
std::vector<int>::const_iterator it = std::find(v.begin(), v.end(), 42);
If your vector is sorted, you can use std::binary_search() to test whether a value is present in the vector, and std::equal_range() to get begin and end iterators to the range of elements in the vector that have that value.
The reason there is no vector::find is because there is no algorithmic advantage over std::find (std::find is O(N) and in general, you can't do better for vectors).
But the reason you have map::find is because it can be more efficient (map::find is O(log N) so you would always want to use that over std::find for maps).
Who told you that? There's is "find" algorithm for vector in C++. Ironically Coincidentally, it is called std::find. Or maybe std::binary_search. Or something else, depending on the properties of the data stored in your vector.
Containers get their own specific versions of generic algorithms (implemented as container methods) only when the effective implementation of the algorithm is somehow tied to the internal details of the container. std::list<>::sort would be one example.
In all other cases, the algorithms are implemented by standalone functions.
Having a 'find' functionality in the container class violates 'SRP' (Single Responsibility Principle). A container's core functionality is to provide interfaces for storage, retrieval of elements in the container. 'Finding', 'Sorting', 'Iterating' etc are not core functionality of any container and hence not part of it's direct interface.
However as 'Herb' states in Namespace Principle, 'find' is a part of the interface by being defined in the same namespace as 'vector' namely 'std'.
Use std::find(vec.begin(), vec.end(), value).
And don't forget to include <algorithm>
what's the alternative?
The standard offers std::find, for sequential search over arbitrary sequences of like-elements (or something like that).
This can be applied to all containers supporting iterators, but for internally sorted containers (like std::map) the search can be optimized. In that case, the container offers it's own find member function.
why there is no find for vector in C++?
There was no point in creating a std::vector<???>::find as the implementation would be identical to std::find(vector.begin(), vector.end(), value_to_find);.
Should I write by myself?
No. Unless you have specific limitations or requirements, you should use the STL implementation whenever possible.

Dynamically sorted STL containers

I'm fairly new to the STL, so I was wondering whether there are any dynamically sortable containers? At the moment my current thinking is to use a vector in conjunction with the various sort algorithms, but I'm not sure whether there's a more appropriate selection given the (presumably) linear complexity of inserting entries into a sorted vector.
To clarify "dynamically", I am looking for a container that I can modify the sorting order at runtime - e.g. sort it in an ascending order, then later re-sort in a descending order.
You'll want to look at std::map
std::map<keyType, valueType>
The map is sorted based on the < operator provided for keyType.
Or
std::set<valueType>
Also sorted on the < operator of the template argument, but does not allow duplicate elements.
There's
std::multiset<valueType>
which does the same thing as std::set but allows identical elements.
I highly reccomend "The C++ Standard Library" by Josuttis for more information. It is the most comprehensive overview of the std library, very readable, and chock full of obscure and not-so-obscure information.
Also, as mentioned by 17 of 26, Effective Stl by Meyers is worth a read.
If you know you're going to be sorting on a single value ascending and descending, then set is your friend. Use a reverse iterator when you want to "sort" in the opposite direction.
If your objects are complex and you're going to be sorting in many different ways based on the member fields within the objects, then you're probably better off with using a vector and sort. Try to do your inserts all at once, and then call sort once. If that isn't feasible, then deque may be a better option than the vector for large collections of objects.
I think that if you're interested in that level of optimization, you had better be profiling your code using actual data. (Which is probably the best advice anyone here can give: it may not matter that you call sort after each insert if you're only doing it once in a blue moon.)
It sounds like you want a multi-index container. This allows you to create a container and tell that container the various ways you may want to traverse the items in it. The container then keeps multiple lists of the items, and those lists are updated on each insert/delete.
If you really want to re-sort the container, you can call the std::sort function on any std::deque, std::vector, or even a simple C-style array. That function takes an optional third argument to determine how to sort the contents.
The stl provides no such container. You can define your own, backed by either a set/multiset or a vector, but you are going to have to re-sort every time the sorting function changes by either calling sort (for a vector) or by creating a new collection (for set/multiset).
If you just want to change from increasing sort order to decreasing sort order, you can use the reverse iterator on your container by calling rbegin() and rend() instead of begin() and end(). Both vector and set/multiset are reversible containers, so this would work for either.
std::set is basically a sorted container.
You should definitely use a set/map. Like hazzen says, you get O(log n) insert/find. You won't get this with a sorted vector; you can get O(log n) find using binary search, but insertion is O(n) because inserting (or deleting) an item may cause all existing items in the vector to be shifted.
It's not that simple. In my experience insert/delete is used less often than find. Advantage of sorted vector is that it takes less memory and is more cache-friendly. If happen to have version that is compatible with STL maps (like the one I linked before) it's easy to switch back and forth and use optimal container for every situation.
in theory an associative container (set, multiset, map, multimap) should be your best solution.
In practice it depends by the average number of the elements you are putting in.
for less than 100 elements a vector is probably the best solution due to:
- avoiding continuous allocation-deallocation
- cache friendly due to data locality
these advantages probably will outperform nevertheless continuous sorting.
Obviously it also depends on how many insertion-deletation you have to do. Are you going to do per-frame insertion/deletation?
More generally: are you talking about a performance-critical application?
remember to not prematurely optimize...
The answer is as always it depends.
set and multiset are appropriate for keeping items sorted but are generally optimised for a balanced set of add, remove and fetch. If you have manly lookup operations then a sorted vector may be more appropriate and then use lower_bound to lookup the element.
Also your second requirement of resorting in a different order at runtime will actually mean that set and multiset are not appropriate because the predicate cannot be modified a run time.
I would therefore recommend a sorted vector. But remember to pass the same predicate to lower_bound that you passed to the previous sort as the results will be undefined and most likely wrong if you pass the wrong predicate.
Set and multiset use an underlying binary tree; you can define the <= operator for your own use. These containers keep themselves sorted, so may not be the best choice if you are switching sort parameters. Vectors and lists are probably best if you are going to be resorting quite a bit; in general list has it's own sort (usually a mergesort) and you can use the stl binary search algorithm on vectors. If inserts will dominate, list outperforms vector.
STL maps and sets are both sorted containers.
I second Doug T's book recommendation - the Josuttis STL book is the best I've ever seen as both a learning and reference book.
Effective STL is also an excellent book for learning the inner details of STL and what you should and shouldn't do.
For "STL compatible" sorted vector see A. Alexandrescu's AssocVector from Loki.