How can I efficiently select a Standard Library container in C++11?

How can I efficiently select a Standard Library container in C++11? - c++

There's a well known image (cheat sheet) called "C++ Container choice". It's a flow chart to choose the best container for the wanted usage.
Does anybody know if there's already a C++11 version of it?
This is the previous one:

Not that I know of, however it can be done textually I guess. Also, the chart is slightly off, because list is not such a good container in general, and neither is forward_list. Both lists are very specialized containers for niche applications.
To build such a chart, you just need two simple guidelines:
Choose for semantics first
When several choices are available, go for the simplest
Worrying about performance is usually useless at first. The big O considerations only really kick in when you start handling a few thousands (or more) of items.
There are two big categories of containers:
Associative containers: they have a find operation
Simple Sequence containers
and then you can build several adapters on top of them: stack, queue, priority_queue. I will leave the adapters out here, they are sufficiently specialized to be recognizable.
Question 1: Associative ?
If you need to easily search by one key, then you need an associative container
If you need to have the elements sorted, then you need an ordered associative container
Otherwise, jump to the question 2.
Question 1.1: Ordered ?
If you do not need a specific order, use an unordered_ container, otherwise use its traditional ordered counterpart.
Question 1.2: Separate Key ?
If the key is separate from the value, use a map, otherwise use a set
Question 1.3: Duplicates ?
If you want to keep duplicates, use a multi, otherwise do not.
Example:
Suppose that I have several persons with a unique ID associated to them, and I would like to retrieve a person data from its ID as simply as possible.
I want a find function, thus an associative container
1.1. I couldn't care less about order, thus an unordered_ container
1.2. My key (ID) is separate from the value it is associated with, thus a map
1.3. The ID is unique, thus no duplicate should creep in.
The final answer is: std::unordered_map<ID, PersonData>.
Question 2: Memory stable ?
If the elements should be stable in memory (ie, they should not move around when the container itself is modified), then use some list
Otherwise, jump to question 3.
Question 2.1: Which ?
Settle for a list; a forward_list is only useful for lesser memory footprint.
Question 3: Dynamically sized ?
If the container has a known size (at compilation time), and this size will not be altered during the course of the program, and the elements are default constructible or you can provide a full initialization list (using the { ... } syntax), then use an array. It replaces the traditional C-array, but with convenient functions.
Otherwise, jump to question 4.
Question 4: Double-ended ?
If you wish to be able to remove items from both the front and back, then use a deque, otherwise use a vector.
You will note that, by default, unless you need an associative container, your choice will be a vector. It turns out it is also Sutter and Stroustrup's recommendation.

I like Matthieu's answer, but I'm going to restate the flowchart as this:
When to NOT use std::vector
By default, if you need a container of stuff, use std::vector. Thus, every other container is only justified by providing some functionality alternative to std::vector.
Constructors
std::vector requires that its contents are move-constructible, since it needs to be able to shuffle the items around. This is not a terrible burden to place on the contents (note that default constructors are not required, thanks to emplace and so forth). However, most of the other containers don't require any particular constructor (again, thanks to emplace). So if you have an object where you absolutely cannot implement a move constructor, then you will have to pick something else.
A std::deque would be the general replacement, having many of the properties of std::vector, but you can only insert at either ends of the deque. Inserts in the middle require moving. A std::list places no requirement on its contents.
Needs Bools
std::vector<bool> is... not. Well, it is standard. But it's not a vector in the usual sense, as operations that std::vector normally allows are forbidden. And it most certainly does not contain bools.
Therefore, if you need real vector behavior from a container of bools, you're not going to get it from std::vector<bool>. So you'll have to make due with a std::deque<bool>.
Searching
If you need to find elements in a container, and the search tag can't just be an index, then you may need to abandon std::vector in favor of set and map. Note the key word "may"; a sorted std::vector is sometimes a reasonable alternative. Or Boost.Container's flat_set/map, which implements a sorted std::vector.
There are now four variations of these, each with their own needs.
Use a map when the search tag is not the same thing as the item you're looking for itself. Otherwise use a set.
Use unordered when you have a lot of items in the container and search performance absolutely needs to be O(1), rather than O(logn).
Use multi if you need multiple items to have the same search tag.
Ordering
If you need a container of items to always be sorted based on a particular comparison operation, you can use a set. Or a multi_set if you need multiple items to have the same value.
Or you can use a sorted std::vector, but you'll have to keep it sorted.
Stability
When iterators and references are invalidated is sometimes a concern. If you need a list of items, such that you have iterators/pointers to those items in various other places, then std::vector's approach to invalidation may not be appropriate. Any insertion operation may cause invalidation, depending on the current size and capacity.
std::list offers a firm guarantee: an iterator and its associated references/pointers are only invalidated when the item itself is removed from the container. std::forward_list is there if memory is a serious concern.
If that's too strong a guarantee, std::deque offers a weaker but useful guarantee. Invalidation results from insertions in the middle, but insertions at the head or tail causes only invalidation of iterators, not pointers/references to items in the container.
Insertion Performance
std::vector only provides cheap insertion at the end (and even then, it becomes expensive if you blow capacity).
std::list is expensive in terms of performance (each newly inserted item costs a memory allocation), but it is consistent. It also offers the occasionally indispensable ability to shuffle items around for virtually no performance cost, as well as to trade items with other std::list containers of the same type at no loss of performance. If you need to shuffle things around a lot, use std::list.
std::deque provides constant-time insertion/removal at the head and tail, but insertion in the middle can be fairly expensive. So if you need to add/remove things from the front as well as the back, std::deque might be what you need.
It should be noted that, thanks to move semantics, std::vector insertion performance may not be as bad as it used to be. Some implementations implemented a form of move semantic-based item copying (the so-called "swaptimization"), but now that moving is part of the language, it's mandated by the standard.
No Dynamic Allocations
std::array is a fine container if you want the fewest possible dynamic allocations. It's just a wrapper around a C-array; this means that its size must be known at compile-time. If you can live with that, then use std::array.
That being said, using std::vector and reserveing a size would work just as well for a bounded std::vector. This way, the actual size can vary, and you only get one memory allocation (unless you blow the capacity).

Here is the C++11 version of the above flowchart. [originally posted without attribution to its original author, Mikael Persson]

Here's a quick spin, although it probably needs work
Should the container let you manage the order of the elements?
Yes:
Will the container contain always exactly the same number of elements?
Yes:
Does the container need a fast move operator?
Yes: std::vector
No: std::array
No:
Do you absolutely need stable iterators? (be certain!)
Yes: boost::stable_vector (as a last case fallback, std::list)
No:
Do inserts happen only at the ends?
Yes: std::deque
No: std::vector
No:
Are keys associated with Values?
Yes:
Do the keys need to be sorted?
Yes:
Are there more than one value per key?
Yes: boost::flat_map (as a last case fallback, std::map)
No: boost::flat_multimap (as a last case fallback, std::map)
No:
Are there more than one value per key?
Yes: std::unordered_multimap
No: std::unordered_map
No:
Are elements read then removed in a certain order?
Yes:
Order is:
Ordered by element: std::priority_queue
First in First out: std::queue
First in Last out: std::stack
Other: Custom based on std::vector?????
No:
Should the elements be sorted by value?
Yes: boost::flat_set
No: std::vector
You may notice that this differs wildly from the C++03 version, primarily due to the fact that I really do not like linked nodes. The linked node containers can usually be beat in performance by a non-linked container, except in a few rare situations. If you don't know what those situations are, and have access to boost, don't use linked node containers. (std::list, std::slist, std::map, std::multimap, std::set, std::multiset). This list focuses mostly on small and middle sided containers, because (A) that's 99.99% of what we deal with in code, and (B) Large numbers of elements need custom algorithms, not different containers.

Related

C++: What are the reasons for choosing a linked list / deque over a vector? [duplicate]

There's a well known image (cheat sheet) called "C++ Container choice". It's a flow chart to choose the best container for the wanted usage.
Does anybody know if there's already a C++11 version of it?
This is the previous one:

Here is the C++11 version of the above flowchart. [originally posted without attribution to its original author, Mikael Persson]

Here's a quick spin, although it probably needs work
Should the container let you manage the order of the elements?
Yes:
Will the container contain always exactly the same number of elements?
Yes:
Does the container need a fast move operator?
Yes: std::vector
No: std::array
No:
Do you absolutely need stable iterators? (be certain!)
Yes: boost::stable_vector (as a last case fallback, std::list)
No:
Do inserts happen only at the ends?
Yes: std::deque
No: std::vector
No:
Are keys associated with Values?
Yes:
Do the keys need to be sorted?
Yes:
Are there more than one value per key?
Yes: boost::flat_map (as a last case fallback, std::map)
No: boost::flat_multimap (as a last case fallback, std::map)
No:
Are there more than one value per key?
Yes: std::unordered_multimap
No: std::unordered_map
No:
Are elements read then removed in a certain order?
Yes:
Order is:
Ordered by element: std::priority_queue
First in First out: std::queue
First in Last out: std::stack
Other: Custom based on std::vector?????
No:
Should the elements be sorted by value?
Yes: boost::flat_set
No: std::vector
You may notice that this differs wildly from the C++03 version, primarily due to the fact that I really do not like linked nodes. The linked node containers can usually be beat in performance by a non-linked container, except in a few rare situations. If you don't know what those situations are, and have access to boost, don't use linked node containers. (std::list, std::slist, std::map, std::multimap, std::set, std::multiset). This list focuses mostly on small and middle sided containers, because (A) that's 99.99% of what we deal with in code, and (B) Large numbers of elements need custom algorithms, not different containers.

Prefer unordered_set over vector

Is it safe to say that if I don't want duplicates in my container, and I don't care about element position as I only want to iterate through the container, then I should use an unordered_set instead of vector?

Is it safe to say that if I don't want duplicates in my container, and I don't care about element position as I only want to iterate through the container, then I should use an unordered_set instead of vector?
No, it is not. It depends on many factors. For example if you seldom add new elements but iterate over container quite often it would be preferable to use std::vector and maintain uniqueness manually. There also could be other factors affecting your decision. But normally yes you may prefer std::unordered_set as it simplifies your program.

Not entirely. unordered_sets are not required to be contiguous containers; in the case where you'd frequently want to read all numerous values contained in the set, you may prefer std::vector on time-critic application.
std::unordered_set:
Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once a hash is computed, it refers to the exact bucket the element is placed into.
But in the general case, I'd say Yes.

I generally prefer vector or map. (or in your case, std::set).
Hash tables can be faster than maps/sets (red-black trees), but red-black trees have guaranteed performance 100% of the time. And logarithmic performance is REALLY fast! A hash table kan kill performance when it starts rehashing.
std::vector is the workhorse of the STL and should be your default choice. Vector is very straightforward, and is very cache-friendly

This article by Matt Austern is related to this topic and it is worth reading:
Why you shouldn't use set (and what you should use instead) by Matt Austern
This thread is trying to identify conditions under which unordered_set is preferable over vectors. Similarly, in the above article, the author clearly identifies four conditions, which all need to be satisfied in order to prefer set over a custom but simpler data structure called sorted_vector (last section: What is set good for?). It will be interesting to clearly state a set of conditions for preferring unordered_set over vector.
also, the last paragraph of the article summarizes a useful rule to keep in mind:
Every component in the standard C++ library is there because it's useful for some purpose, but sometimes that purpose is narrowly defined and rare. As a general rule you should always use the simplest data structure that meets your needs. The more complicated a data structure, the more likely that it's not as widely useful as it might seem.

Of course yes. If you do not want duplicates, you have to use a key-aware container, and since unordered_* totally win over their tree-based counterparts, this is pretty much your only choice.

iterate ordered versus unordered containers

I want to know which data-structures are more efficient for iterating through their elements between std::set, std::map and std::unordered_set, std::unordered_map.
I searched through SO and I found this question. The answers either propose to copy the elements in a std::vector or to use Boost.Container, which IMHO don't answer my question.
My purpose is to keep in a container a big number of unique elements, that most of the time I want to iterate through them. Insertions and extractions are more rare. I want to avoid std::vector in combination with std::unique.

Lets consider set vs unordered_set.
The main difference here is the 'nature' of the iteration, that is the traversal of the set will give you the elements in order while traversing a range in an unordered set will give you a bunch of values in no particular order.
Suppose you want to traverse a range [it1, it2]. If we exclude the lookup time that's needed to find elements it1 and it2 there can be no direct mapping from one case to another since the elements in between are not guarrandeed to be the same even if you've used the same elements to construct the container.
There are cases however where something like this has meaning when e.g. you want to traverse a fixed number of elements (regardless of what they are) or when you need to traverse the whole container. In such cases you need to consider implementation mechanics :
Sets are usually implemented like Red–black trees (a form of binary search trees). Like all binary search trees allow efficient in-order traversal (LRR: left root right) of their elements. That is to traverse you pay the cost of pointer chasing (just like traversing a list).
Unordered sets on the other hand are hash tables and to my knowledge the STL implementation uses hashing with chaining. That means (in a very very high level) that what's used for the structure is a (contiguous) buffer where each element is the head of a chain (list) that contains the elements. The way the elements are layed out across those chains (buckets) and across the buffer will affect the traversal time, however you'll be chasing pointers once again jumping through differents lists as well this time. I don't think it'll vary significantly from the tree case but won't be any better for sure.
In any case micro tuning and benchmarking will give you the answer for your particular application.

The difference does not lie between the ordering or lack of one but in the backing container. If it's a contiguous memory it should be fast to iterate over, due to simple implementation of iterator and cache friendliness.
Unordered containers are usually stored as a vector of vectors (or a similar thing), while ordered containers are implemented using trees, but it is left for implementation after all. This would suggest that iterating over unordered version should be waster. However this is left for implementation after all, and I saw implementations (which bent rules a little to be fair) with different behaviour.
Generally speaking, container performance is quite a complex topic and usually has to be tested in actual application to get reliable answer. There is plenty on implemention-defined stuff that might affect the performance. I'd go with hash_set if I had to go in blind. Copying into a vector might also turn out a good option.
EDIT: As #TonyD said in it's comment, there is a rule, that disallows invalidating iterators during addition of element when the max_load_factor() is not exceeded, this practically rules out backing containers which are contiguous in memory.
Thus, copying everything into a vector seems like even more reasonable option. If you need to remove duplicates, a feasible option might be to use http://en.cppreference.com/w/cpp/algorithm/sort and have dupes easily ignored. I have heard that using vector and sort to have a sorted array (or vector) is quite often a used option in case of need for a container that needs to be sorter and is being iterated over more often than modified.

iterate from fastest to slowest should be : set > map > unordered_set > unordered_map;
set is a little lighter than map, and they are ordered with binary tree rule, so should be faster than unordered_ containers.

vector vs. list from stl - remove method

std::list has a remove method, while the std::vector doesn't. What is the reason for that?

std::list<>::remove is a physical removal method, which can be implemented by physically destroying list elements that satisfy certain criteria (by physical destruction I mean the end of element's storage duration). Physical removal is only applicable to lists. It cannot be applied to arrays, like std::vector<>. It simply is not possible to physically end storage duration of an individual element of an array. Arrays can only be created and destroyed as a whole. This is why std::vector<> does not have anything similar to std::list<>::remove.
The universal removal method applicable to all modifiable sequences is what one might call logical removal: the target elements are "removed" from the sequence by overwriting their values with values of elements located further down in the sequence. I.e. the sequence is shifted and compacted by copying the persistent data "to the left". Such logical removal is implemented by freestanding functions, like std::remove. Such functions are applicable in equal degree to both std::vector<> and std::list<>.
In cases where the approach based on immediate physical removal of specific elements applies, it will work more efficiently than the generic approach I referred above as logical removal. That is why it was worth providing it specifically for std::list<>.

std::list::remove removes all items in a list that match the provided value.
std::list<int> myList;
// fill it with numbers
myList.remove(10); // physically removes all instances of 10 from the list
It has a similar function, std::list::remove_if, which allows you to specify some other predicate.
std::list::remove (which physically removes the elements) is required to be a member function as it needs to know about the memory structure (that is, it must update the previous and next pointers for each item that needs to be updated, and remove the items), and the entire function is done in linear time (a single iteration of the list can remove all of the requested elements without invalidating any of the iterators pointing to items that remain).
You cannot physically remove a single element from a std::vector. You either reallocate the entire vector, or you move every element after the removed items and adjust the size member. The "cleanest" implementation of that set of operations would be to do
// within some instance of vector
void vector::remove(const T& t)
{
erase(std::remove(t), end());
}
Which would require std::vector to depend on <algorithm> (something that is currently not required).
As the "sorting" is needed to remove the items without multiple allocations and copies being required. (You do not need to sort a list to physically remove elements).
Contrary to what others are saying, it has nothing to do with speed. It has to do with the algorithm needing to know how the data is stored in memory.
As a side note: This is also a similar reason why std::remove (and friends) do not actually remove the items from the container they operate on; they just move all the ones that are not going to be removed to the "front" of the container. Without the knowledge of how to actually remove an object from a container, the generic algorithm cannot actually do the removing.

Consider the implementation details of both containers. A vector has to provide a continuous memory block for storage. In order to remove an element at index n != N (with N being the vector's length), all elements from n+1 to N-1 need to be moved. The various functions in the <algorithm> header implement that behavior, like std::remove or std::remove_if. The advantage of these being free-standing functions is that they can work for any type that offers the needed iterators.
A list on the other hand, is implemented as a linked list structure, so:
It's fast to remove an element from anywhere
It's impossible to do it as efficiently using iterators (since the internal structure has to be known and manipulated).

In general in STL the logic is "if it can be done efficiently - then it's a class member. If it's inefficient - then it's an outside function"
This way they make the distinction between "correct" (i.e. "efficient") use of classes vs. "incorrect" (inefficient) use.
For example, random access iterators have a += operator, while other iterators use the std::advance function.
And in this case - removing elements from an std::list is very efficient as you don't need to move the remaining values like you do in std::vector

It's all about efficiency AND reference/pointer/iterator validity. list items can be removed without disturbing any other pointers and iterators. This is not true for a vector and other containers in all but the most trivial cases. Nothing prevents use the external strategy, but you have a superior options.. That said this fellow said it better than I could on a duplicate question
From another poster on a duplicate question:
The question is not why std::vector does not offer the operation, but
rather why does std::list offer it. The design of the STL is focused
on the separation of the containers and the algorithms by means of
iterators, and in all cases where an algorithm can be implemented
efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can
be implemented much more efficiently with knowledge of the container.
That is the case of removing elements from a container. The cost of
using the remove-erase idiom is linear in the size of the container
(which cannot be reduced much), but that hides the fact that in the
worst case all but one of those operations are copies of the objects
(the only element that matches is the first), and those copies can
represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity
of the operation will still be linear, but the associated cost for
each one of the elements removed is very low, a couple of pointer
copies and releasing of a node in memory. At the same time, the
implementation as part of the list can offer stronger guarantees:
pointers, references and iterators to elements that are not erased do
not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific
container is std::list::sort, that uses mergesort that is less
efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with
iterators unless there is a strong reason to provide a particular
implementation in a concrete container.

std::list is designed to work like a linked list. That is, it is designed (you might say optimized) for constant time insertion and removal ... but access is relatively slow (as it typically requires traversing the list).
std::vector is designed for constant-time access, like an array. So it is optimized for random access ... but insertion and removal are really only supposed to be done at the "tail" or "end", elsewhere they're typically going to be much slower.
Different data structures with different purposes ... therefore different operations.

To remove an element from a container you have to find it first. There's a big difference between sorted and unsorted vectors, so in general, it's not possible to implement an efficient remove method for the vector.

Is there ever a reason to use std::list? [duplicate]

This question already has answers here:
Under what circumstances are linked lists useful?
(17 answers)
Closed 9 years ago.
After having read this question and looking at some results here, it seems like one should altogether completely avoid lists in C++. I always expected that linked lists would be the containers of choice for cases where I only need to iterate over all the contents because insertion is a matter of pointer manipulation and there is never a need to reallocate.
Apparently, because of "cache locality," lists are iterated over very slowly, so any benefit from having to use less reserve memory or faster addition (which is not that much faster, it seems, from the second link) doesn't seem worth it.
Having said that, when should I, from a performance standpoint, use std::list over std::deque or, if possible, std::vector?
On a side note, will std::forward_list also have lots of cache misses?

std::list is useful in a few corner cases.
However, the general rule of C++ sequential containers is "if your algorithm are compatible, use std::vector. If your algorithms are not compatible, modify your algorithms so you can use std::vector."
Exceptions exist, and here is an attempt to exhaustively list reasons that std::list is a better choice:
When you need to (A) insert into the middle of the container and (B) you need each objects location in memory to be stable. Requirement (B) can usually be removed by having a non-stable container consisting of pointers to elements, so this isn't a strong reason to use std::list.
When you need to (A) insert or delete in the middle of the container (B) orders of magnitude more often than you need to iterate over the container. This is also an extreme corner case: in order to find the element to delete from a list, you usually need to iterate!
Which leads to
you need (A) insert or delete in the middle of the container and (B) have iterators to all other elements remain valid. This ends up being a hidden requirement of case 1 and case 2: it is hard to delete or insert more often than you iterate when you don't have persistant iterators, and the stability of iterators and of objects is highly related.
the final case, is the case of splicing was once a reason to use std::list.
Back in C++03, all versions of std::list::splice could (in theory) be done in O(1) time. However, the extremely efficient forms of splice required that size be an O(n) operation. C++11 has required that size on a list be O(1), so splice's extreme efficiency is limited to the "splicing an entire other list" and "self-splicing a sublist" case. In the case of single element splice, this is just an insert and delete. In the case of sub range splice, the code now has to visit every node in the splice just to count them in order to maintain size as O(1) (except in self-splicing).
So, if you are doing only whole-list splices, or self-list subrange splices, list can do these operations much faster than other non-list containers.

When should I, from a performance standpoint, use std::list
From a performance standpoint, rarely. The only situation that comes to mind is if you have many lists that you need to split and join to form other lists; a linked list can do this without allocating memory or moving objects.
The real benefit of list is stability: elements don't need to be movable, and iterators and references are never invalidated unless they refer to an element that's been erased.
On a side note, will std::forward_list also have lots of cache misses?
Yes; it's still a linked list.

The biggest improvement that std::list can provide is when you're moving one or more elements from the middle of one list into another list. This splice operation is extremely efficient on list while it may involve allocation and movement of items in random access containers such as vector. That said, this comes of very rarely and much of the time vector is your best container in terms of performance an simplicity.
Always profile your code if you suspect performance problems with a container choice.

You chose std::list over std::deque and std::vector (and other contiguous memory containers) when you frequently add/remove items in the middle of your container, see also.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js