In C++ Primer 5th, it says that the default implementation of stack and queue is deque.
I'm wondering why they don't use list? Stack and Queue doesn't support random access, always operate on both ends, therefore list should be the most intuitive way to implement them, and deque, which support random access (with constant time) is somehow not necessary.
Could anyone explain the reason behind this implementation?
With std::list as underlying container each std::stack::push does a memory allocation. Whereas std::deque allocates memory in chunks and can reuse its spare capacity to avoid the memory allocation.
With small elements the storage overhead of list nodes can also become significant. E.g. std::list<int> node size is 24 bytes (on a 64-bit system), with only 4 bytes occupied by the element - at least 83% storage overhead.
I think the question should be asked the other way around: Why use a list if you can use an array?
The list is more complicated: More allocations, more resources (for storing pointers) and more work to do (even if it is all in constant time). On the other hand, the main property for preferring lists is also not relevant to stacks and queues: constant-time random insertion and deletion.
the main reason is because deque is faster than the list in average for front and back insertions and deletions
deque is faster because memory is allocated by chunks were as list need allocation on each elements and allocation are costly operations.
benchmark
Let's compare the sequence containers:
std::array is right out, it doesn't change size.
std::list optimises for iterator non-invalidation, allows insertion at known positions, and lacks random access. It has O(N) space overhead, with a large constant, and it has bad cache locality.
std::forward_list is an even more crippled list with smaller space overhead
std::deque optimises for appending or prepending, at the expense of not being contiguous. It has O(N) space overhead, with a smaller constant, and it has mediocre cache locality.
std::vector optimises for access speed, at the expense of insertion / removal anywhere but the end. It has O(1) space overhead, and the great cache locality.
So what does this mean for stack and queue?
std::stack only requires operations at one end. std::vector, std::deque and std::list all provide the necessary operations
std::queue requires operations at both ends. std::deque and std::list are the only candidates.
The choice of std::deque as the default is then one of consistency, as std::vector is generally better for std::stack, but inapplicable for std::queue.
Note that std::priority_queue, whilst named similar to std::queue, is actually more akin to std::stack in requiring only modification at one end. It also benefits more from the raw access speed of std::vector, maintaining the heap invariant.
Related
I've been told that std::vector has a C-style array on the inside implementation, but would that not negate the entire purpose of having a dynamic container?
So is inserting a value in a vector an O(n) operation? Or is it O(1) like in a linked-list?
From the C++11 standard, in the "sequence containers" library section (emphasis mine):
[23.3.6.1 Class template vector overview][vector.overview]
A vector is a sequence container that supports (amortized) constant time insert and erase operations at the
end; insert and erase in the middle take linear time. Storage management is handled automatically, though
hints can be given to improve efficiency.
This does not defeat the purpose of dynamic size -- part of the point of vector is that not only is it very fast to access a single element, but scanning over the vector has very good memory locality because everything is tightly packed together. In practice, having good memory locality is very important because it greatly reduces cache misses, which has a large impact on runtime. This is a major advantage of vector over list in many situations, particularly those where you need to iterate over the entire container more often than you need to add or remove elements.
The memory in a std::vector is required to be contiguous, so it's typically represented as an array.
Your question about the complexity of the operations on a std::vector is a good one - I remember wondering this myself when I first started programming. If you append an element to a std::vector, then it may have to perform a resize operation and copy over all the existing elements to a new array. This will take time O(n) in the worst case. However, the amortized cost of appending an element is O(1). By this, we mean that the total cost of any sequence of n appends to a std::vector is always O(n). The intuition behind this is that the std::vector usually overallocates space in its array, leaving a lot of free slots for elements to be inserted into without a reallocation. As a result, most of the appends will take time O(1) even though every now and then you'll have one that takes time O(n).
That said, the cost of performing an insertion elsewhere in a std::vector will be O(n), because you may have to shift everything down.
You also asked why this is, if it defeats the purpose of having a dynamic array. Even if the std::vector just acted like a managed array, it's still a win over raw arrays. The std::vector knows its size, can do bounds-checking (with at), is an actual object (unlike an array), and doesn't decay to a pointer. These extra features - coupled with the extra logic to make appends work quickly - are almost always worth it.
I read here from the accepted answer that a std::deque has the following characteristic
1- Random access - constant O(1)
2- Insertion or removal of elements at the end or beginning - amortized constant O(1)
3- Insertion or removal of elements - linear O(n)
My question is about point 2. How can a deque have an amortized constant insertion at the end or beginning?
I understand that a std::vector has an amortized constant time complexity for insertions at the end. This is because a vector is continguous and is a dynamic array. So when it runs out of memory for a push_back at the end, it would allocate a whole new block of memory, copy existing items from the old location to the new location and then delete items from the old location. This operation I understand is amortized constant. How does this apply to a deque ? How can insertions at the top and bottom of a deque be amortized constant. I was under the impression that it was supposed to be constant O(1). I know that a deque is composed of memory chunks.
The usual implementation of a deque is basically a vector of pointers to fixed-sized nodes.
Allocating the fixed-size node clearly has constant complexity, so that's pretty easy to handle--you just amortize the cost of allocating a single node across the number of items in that node to get a constant complexity for each.
The vector of pointers part is what's (marginally) more interesting. When we allocate enough of the fixed-size nodes that the vector of pointers is full, we need to increase the size of the vector. Like std::vector, we need to copy its contents to the newly allocated vector, so its growth must follow a geometric (rather than arithmetic) progression. This means that we have more pointers to copy we do the copying less and less frequently, so the total time devoted to copying pointers remains constant.
As a side note: the "vector" part is normally treated as a circular buffer, so if you're using your deque as a queue, constantly adding to one end and removing from the other does not result in re-allocating the vector--it just means moving the head and tail pointers that keep track of which of the pointers are "active" at a given time.
The (profane) answer lies in containers.requirements.general, 23.2.1/2:
All of the complexity requirements in this Clause are stated solely in
terms of the number of operations on the contained objects.
Reallocating the array of pointers is hence not covered by the complexity guarantee of the standard and may take arbitrarily long. As mentioned before, it likely adds an amortized constant overhead to each push_front()/push_back() call (or an equivalent modifier) in any "sane" implementation. I would not recommend using deque in RT-critical code though. Typically, in an RT scenario, you don't want to have unbounded queues or stacks (which in C++ by default use deque as the underlying container) anyway, neither memory allocations that could fail, so you will be most likely using a preallocated ring buffer (e.g. Boost's circular_buffer) instead.
I know that deque is more efficient than vector when insertions are at front or end and vector is better if we have to do pointer arithmetic. But which one to use when we have to perform insertions in middle.? and Why.?
You might think that a deque would have the advantage, because it stores the data broken up into blocks. However to implement operator[] in constant time requires all those blocks to be the same size. Inserting or deleting an element in the middle still requires shifting all the values on one side or the other, same as a vector. Since the vector is simpler and has better locality for caching, it should come out ahead.
Selection criteria with Standard library containers is, You select a container depending upon:
Type of data you want to store &
The type of operations you want to perform on the data.
If you want to perform large number of insertions in the middle you are much better off using a std::list.
If the choice is just between a std::deque and std::vector then there are a number of factors to consider:
Typically, there is one more indirection in case of deque to access the elements, so element
access and iterator movement of deques are usually a bit slower.
In systems that have size limitations for blocks of memory, a deque might contain more elements because it uses more than one block of memory. Thus, max_size() might be larger for deques.
Deques provide no support to control the capacity and the moment of reallocation. In
particular, any insertion or deletion of elements other than at the beginning or end
invalidates all pointers, references, and iterators that refer to elements of the deque.
However, reallocation may perform better than for vectors, because according to their
typical internal structure, deques don't have to copy all elements on reallocation.
Blocks of memory might get freed when they are no longer used, so the memory size of a
deque might shrink (this is not a condition imposed by standard but most implementations do)
std::deque could perform better for large containers because it is typically implemented as a linked sequence of contiguous data blocks, as opposed to the single block used in an std::vector. So an insertion in the middle would result in less data being copied from one place to another, and potentially less reallocations.
Of course, whether that matters or not depends on the size of the containers and the cost of copying the elements stored. With C++11 move semantics, the cost of the latter is less important. But in the end, the only way of knowing is profiling with a realistic application.
Deque would still be more efficient, as it doesn't have to move half of the array every time you insert an element.
Of course, this will only really matter if you consider large numbers of elements, and even than it is advisable to run a benchmark and see which one works better in your particular case. Remember that premature optimization is the root of all evil.
What I need is just a dynamically growing array. I don't need random access, and I always insert to the end and read it from the beginning to the end.
slist seems to be the first choice, because it provides just enough of what I need. However, I can't tell what benefit I get by using slist instead of vector. Besides, several materials I read about STL say, "vectors are generally the most efficient in time for accessing elements and to add or remove elements from the end of the sequence". Therefore, my question is: for my needs, is slist really a better choice than vector? Thanks in advance.
For starters, slist is non-standard.
For your choice, a linked list will be slower than a vector, count on it. There are two reasons contributing to this:
First and foremost, cache locality; vectors store their elements linearly in the RAM which facilitates caching and pre-fetching.
Secondly, appending to a linked list involves dynamic allocations which add a large overhead. By contrast, vectors most of the time do not need to allocate memory.
However, a std::deque will probably be even faster. In-depth performance analysis has shown that, despite bias to the contrary, std::deque is almost always superior to std::vector in performance (if no random access is needed), due to its improved (chunked) memory allocation strategy.
Yes, if you are always reading beginning to end, slist (a linked list) sounds like the way to go. The possible exception is if you will be inserting a large quantity of elements at the end at the same time. Then, the vector could be better if you use reserve appropriately.
Of course, profile to be sure it's better for your application.
Matt Austern (author of "Generic Programming and the STL" and general C++ guru) is a strong advocate of singly-linked lists for inclusion in the forthcoming C++ standard; see his presentation at http://www.accu-usa.org/Slides/SinglyLinkedLists.ppt and his long article at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2543.htm for more details, including a discussion of the trade-offs involved that may guide you in possibly choosing this data structure. (Note that the currently proposed name is forward_list, though slist is how it was traditionally named in SGI's STL & other popular libraries).
I'll second (or maybe third...) the opinion that std::vector or std::deque will do the job. The only thing that I will add is a few additional factors that should guide the decision between std::vector<T> and std::list<T>. These have a lot to do with the characteristics of T and what algorithms you plan on using.
The first is memory overhead. Std::list is a node-based container so if T is a primitive type or relatively small user-defined type, then the memory overhead of the node-based linkage might be non-negligible - consider that std::list<int> is likely to use at least 3 * sizeof(int) storage for each element whereas std::vector will only use sizeof(int) storage with a small header overhead. Std::deque is similar to std::vector but has a small overhead that is linear to N.
The next issue is the cost of copy construction. If T(T const&) is at all expensive, then steer clear of std::vector<T> since it cause a bunch of copies to occur as the size of the vector grows. This is where std::deque<T> is a clear winner and std::list<T> is also a contender.
The final issue that usually guides the decision on container type is whether your algorithms can work with the iterator invalidation constraints of std::vector and std::deque. If you will be manipulating the container elements a lot (e.g., sorting, inserting in the middle, or shuffling), then you might want to lean towards std::list since manipulating the order requires little more than resetting a few linkage pointers.
I'm guessing you mean std::list by "slist". Vectors are good when you need fast, random-access to a sequence of elements, with guaranteed contiguous memory, and fast sequential reading (IOW, from the beginning to the end). Lists are good when you need fast (constant-time) insertion or deletion of items at the beginning or end of the sequence, but don't care about the performance of random-access or sequential reading.
The reason for the difference is the way the 2 are implemented. Vectors are implemented internally as an array of items, which needs to be reallocated when its size/capacity is reached on adding an item. Lists are implemented as a doubly-linked list, which can cause cache-misses for sequential reading. Random-access for lists also requires scanning from the first (or last) item in the list, until it locates the item you're requesting.
Sounds like a good job for std::deque to me. It has the memory benefits of a vector like contiguous memory allocation for each "slab" (good for CPU caches), no overhead for each element like with std::list and it does not need to reallocate the whole set as std::vector does. Read more about std::deque here
Since the only operations required for a container to be used in a stack are:
back()
push_back()
pop_back()
Why is the default container for it a deque instead of a vector?
Don't deque reallocations give a buffer of elements before front() so that push_front() is an efficient operation? Aren't these elements wasted since they will never ever be used in the context of a stack?
If there is no overhead for using a deque this way instead of a vector, why is the default for priority_queue a vector not a deque also? (priority_queue requires front(), push_back(), and pop_back() - essentially the same as for stack)
Updated based on the Answers below:
It appears that the way deque is usually implemented is a variable size array of fixed size arrays. This makes growing faster than a vector (which requires reallocation and copying), so for something like a stack which is all about adding and removing elements, deque is likely a better choice.
priority_queue requires indexing heavily, as every removal and insertion requires you to run pop_heap() or push_heap(). This probably makes vector a better choice there since adding an element is still amortized constant anyways.
As the container grows, a reallocation for a vector requires copying all the elements into the new block of memory. Growing a deque allocates a new block and links it to the list of blocks - no copies are required.
Of course you can specify that a different backing container be used if you like. So if you have a stack that you know is not going to grow much, tell it to use a vector instead of a deque if that's your preference.
See Herb Sutter's Guru of the Week 54 for the relative merits of vector and deque where either would do.
I imagine the inconsistency between priority_queue and queue is simply that different people implemented them.