What is the difference between the two? I mean the methods are all the same. So, for a user, they work identically.
Is that correct??
Let me list down the differences:
Deque manages its elements with a
dynamic array, provides random
access, and has almost the same
interface as a vector.
List manages its elements as a
doubly linked list and does not
provide random access.
Deque provides Fast insertions and deletions at
both the end and the beginning. Inserting and deleting elements in
the middle is relatively slow because
all elements up to either of both
ends may be moved to make room or to
fill a gap.
In List, inserting and removing elements is fast at each position,
including both ends.
Deque: Any insertion or deletion of elements
other than at the beginning or end
invalidates all pointers, references,
and iterators that refer to elements
of the deque.
List: Inserting and deleting elements does
not invalidate pointers, references,
and iterators to other elements.
Complexity of insert/erase:
         At the beginning     In the middle   At the end
Deque:   Amortized constant   Linear          Amortized constant
List:    Constant             Constant        Constant
From the (dated but still very useful) SGI STL summary of deque:
A deque is very much like a vector: like vector, it is a sequence that supports random access to elements, constant time insertion and removal of elements at the end of the sequence, and linear time insertion and removal of elements in the middle.
The main way in which deque differs from vector is that deque also supports constant time insertion and removal of elements at the beginning of the sequence. Additionally, deque does not have any member functions analogous to vector's capacity() and reserve(), and does not provide any of the guarantees on iterator validity that are associated with those member functions.
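To make the summary above concrete, here is a minimal sketch (the function name is made up for illustration) that grows a deque from both ends and then reads it by index, which is the combination a vector or a list cannot offer cheaply:

```cpp
#include <cassert>
#include <deque>

// Builds {0, 1, 2, 3, 4} by growing the deque from both ends.
std::deque<int> build_from_both_ends() {
    std::deque<int> d;
    d.push_back(2);   // {2}
    d.push_front(1);  // {1, 2}
    d.push_back(3);   // {1, 2, 3}
    d.push_front(0);  // {0, 1, 2, 3}
    d.push_back(4);   // {0, 1, 2, 3, 4}
    assert(d[2] == 2);  // random access by index, as with a vector
    return d;
}
```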
Here's the summary on list from the same site:
A list is a doubly linked list. That is, it is a Sequence that supports both forward and backward traversal, and (amortized) constant time insertion and removal of elements at the beginning or the end, or in the middle. Lists have the important property that insertion and splicing do not invalidate iterators to list elements, and that even removal invalidates only the iterators that point to the elements that are removed. The ordering of iterators may be changed (that is, list::iterator might have a different predecessor or successor after a list operation than it did before), but the iterators themselves will not be invalidated or made to point to different elements unless that invalidation or mutation is explicit.
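The iterator guarantee quoted above can be checked with a small sketch. The function name is hypothetical; the idea is to keep an iterator to one element, mutate the list around it, and verify it still refers to the same object:

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Returns true if an iterator taken before two insertions and an erasure
// still refers to the same element (same value, same address) afterwards.
bool iterator_survives_mutation() {
    std::list<int> l = {10, 20, 30, 40};
    std::list<int>::iterator it = std::next(l.begin(), 2);  // refers to 30
    const int* address = &*it;

    l.push_front(5);                // insertion at the front
    l.insert(it, 25);               // insertion right before 30
    l.erase(std::next(l.begin()));  // erases 10, a different element

    // The list is now {5, 20, 25, 30, 40}; `it` was never invalidated.
    assert(l.size() == 5);
    return *it == 30 && &*it == address;
}
```

Doing the same with a deque and an insertion in the middle would not be safe, since that invalidates iterators and references.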
In summary the containers may have shared routines but the time guarantees for those routines differ from container to container. This is very important when considering which of these containers to use for a task: taking into account how the container will be most frequently used (e.g., more for searching than for insertion/deletion) goes a long way in directing you to the right container.
std::list is basically a doubly linked list.
std::deque, on the other hand, is implemented more like std::vector. It has constant access time by index, as well as insertion and removal at the beginning and end, which provides dramatically different performance characteristics than a list.
I've made illustrations for the students in my C++ class.
This is based (loosely) on (my understanding of) the implementation in the GCC STL implementation (
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_deque.h and https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_list.h)
Double-ended queue
Elements in the collection are stored in memory blocks. The number of elements per block depends on the size of the element: the bigger the elements, the fewer per block. The underlying hope is that if the blocks are of a similar size no matter the type of the elements, that should help the allocator most of the time.
You have an array (called a map in the GCC implementation) listing the memory blocks. All memory blocks are full except the first one, which may have room at the beginning, and the last, which may have room at the end. The map itself is filled from the center outwards. This is how, unlike with a std::vector, insertion at both ends can be done in constant time. As with a std::vector, random access is possible in constant time, but it requires two indirections instead of one. As with std::vector and unlike with std::list, removing or inserting elements in the middle is costly because you have to reorganize a large part of the data structure.
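The "two indirections" can be sketched as index arithmetic. This is a toy model under my own assumptions (fixed block size, a known number of unused slots at the start of the first block), not the actual libstdc++ code, and `locate` is a made-up name:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Toy model: every block holds `block_size` slots, and the first block has
// `front_offset` unused slots at its start.
// Returns {block number, slot within that block} for logical index `i`.
std::pair<std::size_t, std::size_t> locate(std::size_t i,
                                           std::size_t front_offset,
                                           std::size_t block_size) {
    assert(block_size > 0);
    const std::size_t absolute = front_offset + i;
    return std::make_pair(absolute / block_size, absolute % block_size);
}
```

The first indirection goes through the map to the block, the second into the block itself. Both are computed with one division and one remainder, so the cost does not depend on the number of elements.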
Doubly-linked list
Doubly-linked lists are perhaps more familiar. Every element is stored in its own memory block, allocated independently from the other elements. In each block, you have the element's value and two pointers: one to the previous element, one to the next element. This makes it very easy to insert an element at any position in the list, or even to move a subchain of elements from one list to another (an operation called splicing): you just have to update the pointers at the beginning and end of the insertion point. The downside is that to find one element by its index, you have to walk the chain of pointers, so random access has a linear cost in the number of elements in the list.
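Here is a hand-rolled sketch of both halves of that trade-off. `Node`, `insert_after` and `nth` are hypothetical names for illustration; the real std::list nodes look different:

```cpp
#include <cassert>

// Hypothetical node type: a value plus links to both neighbours.
struct Node {
    int value;
    Node* prev;
    Node* next;
};

// Links `n` into the chain right after `pos` by updating at most four
// pointers. No other node is touched, whatever the length of the chain.
void insert_after(Node* pos, Node* n) {
    assert(pos != nullptr && n != nullptr);
    n->prev = pos;
    n->next = pos->next;
    if (pos->next != nullptr) {
        pos->next->prev = n;
    }
    pos->next = n;
}

// Walks the chain from `head` to reach the element at `index`: linear time.
// Assumes the chain has more than `index` nodes.
Node* nth(Node* head, int index) {
    Node* cur = head;
    for (int k = 0; k < index; ++k) {
        cur = cur->next;
    }
    return cur;
}
```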
Another important guarantee is the way each different container stores its data in memory:
A vector is a single contiguous memory block.
A deque is a set of linked memory blocks, where more than one element is stored in each memory block.
A list is a set of elements dispersed in memory, i.e.: only one element is stored per memory "block".
Note that the deque was designed to try to balance the advantages of both vector and list without their respective drawbacks. It is an especially interesting container on memory-limited platforms, for example, microcontrollers.
The memory storage strategy is often overlooked, however, it is frequently one of the most important reasons to select the most suitable container for a certain application.
No. A deque only supports O(1) insertion and deletion at the front and back. It may, for example, be implemented in a vector with wrap-around. Since it also guarantees O(1) random access, you can be sure it's not using (just) a doubly linked list.
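The "vector with wrap-around" idea boils down to one line of index arithmetic. This is a sketch of that idea only (the name `ring_slot` is made up), not how std::deque is implemented in practice:

```cpp
#include <cassert>
#include <cstddef>

// Maps logical index `i` to a slot in a circular buffer of `capacity` slots
// whose first element lives at slot `head`.
std::size_t ring_slot(std::size_t head, std::size_t i, std::size_t capacity) {
    assert(capacity > 0);
    return (head + i) % capacity;
}
```

Pushing at the front just moves `head` back by one (wrapping around), so both ends are cheap and indexing stays O(1).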
Some of the main differences between deque and list:
For deque:
Items stored side by side;
Optimized for adding data at both ends (front, back);
Elements indexed by numbers (integers).
Can be browsed by iterators and also by element index.
Access to data is faster.
For list
Items stored "randomly" in the memory;
Can be browsed only by iterators;
Optimized for insertion and removal in the middle.
Access to data is slower and iteration is slow, due to its very poor spatial locality.
Handles large elements very well.
You can also check the following link, which compares the performance of the two STL containers (along with std::vector).
I hope this information is useful.
The performance differences have been explained well by others. I just wanted to add that similar or even identical interfaces are common in object-oriented programming -- part of the general methodology of writing object-oriented software. You should IN NO WAY assume that two classes work the same way simply because they implement the same interface, any more than you should assume that a horse works like a dog because they both implement attack() and make_noise().
Here's proof-of-concept code that uses list and unordered_map to give O(1) lookup and O(1) exact LRU maintenance. It needs the (non-erased) iterators to survive erase operations. I plan to use it in an O(1), arbitrarily large, software-managed cache for CPU pointers on GPU memory. Nods to the Linux O(1) scheduler (LRU <-> run queue per processor). The unordered_map has constant time access via a hash table.
#include <cstdint> // uint64_t
#include <iostream>
#include <list>
#include <unordered_map>
using namespace std;
struct MapEntry {
list<uint64_t>::iterator LRU_entry;
uint64_t CpuPtr;
};
typedef unordered_map<uint64_t,MapEntry> Table;
typedef list<uint64_t> FIFO;
FIFO LRU; // LRU list at a given priority
Table DeviceBuffer; // Table of device buffers
void Print(void){
for (FIFO::iterator l = LRU.begin(); l != LRU.end(); l++) {
std::cout<< "LRU entry "<< *l << " : " ;
std::cout<< "Buffer entry "<< DeviceBuffer[*l].CpuPtr <<endl;
}
}
int main()
{
LRU.push_back(0);
LRU.push_back(1);
LRU.push_back(2);
LRU.push_back(3);
LRU.push_back(4);
for (FIFO::iterator i = LRU.begin(); i != LRU.end(); i++) {
MapEntry ME = { i, *i};
DeviceBuffer[*i] = ME;
}
std::cout<< "************ Initial set of CpuPtrs" <<endl;
Print();
{
// Suppose we evict an entry: find it via its key (a uint64_t memory address) and remove it
// from the cache "tag" table AND the LRU list with O(1) operations
uint64_t key=2;
LRU.erase(DeviceBuffer[key].LRU_entry);
DeviceBuffer.erase(key);
}
std::cout<< "************ Remove item 2 " <<endl;
Print();
{
// Insert a new allocation in both the tag table and the LRU ordering with O(1) operations
uint64_t key=9;
LRU.push_front(key);
MapEntry ME = { LRU.begin(), key };
DeviceBuffer[key]=ME;
}
std::cout<< "************ Add item 9 " <<endl;
Print();
std::cout << "Victim "<<LRU.back()<<endl;
}
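The proof of concept covers eviction and insertion. The third operation an LRU usually needs is marking an entry as recently used, and the same iterator stability makes that O(1) too via list::splice. Below is a minimal sketch; `LruIndex` and its members are hypothetical names, not part of the code above, and it keeps the convention that new entries go to the front and the victim is at the back:

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical minimal cache index: most recently used key at the front.
struct LruIndex {
    std::list<std::uint64_t> order;
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> where;

    // Inserts a new key as the most recently used entry.
    void add(std::uint64_t key) {
        assert(where.find(key) == where.end());  // key must be new
        order.push_front(key);
        where[key] = order.begin();
    }

    // Marks an existing key as most recently used. splice relinks the node
    // in place, so the iterator stored in `where` stays valid.
    void touch(std::uint64_t key) {
        order.splice(order.begin(), order, where.at(key));
    }

    // Key that would be evicted next (least recently used).
    // Assumes the index is not empty.
    std::uint64_t victim() const { return order.back(); }
};
```

The hash lookup is O(1) on average and the splice of a single node is O(1), so `touch` never walks the list.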
I have a function that stores a lot of small objects (~16 bytes) in a vector, but it doesn't know in advance how many objects will be stored (imagine a recursive descent parser storing tokens for example).
std::vector<SmallObject> getObjects();
This is quite slow because of all the reallocation and copying (and apparently C++ even has to invoke the copy constructors if you don't use an optimised version; see "Object Relocation").
There must be a better way to do things like this where all I am doing to construct the vector is appending things. For example I could have a singly linked list of blocks that are filled, and convert everything to a single vector at the end, so everything only has to be copied once.
Is there anything in Boost or the standard C++ library that would help with this? Or any particularly clever algorithms?
Edit: To be more concrete:
struct SmallObject {
unsigned id;
boost::icl::discrete_interval<unsigned> ival;
};
The question which container is most efficient is always best answered by "it depends" and "measure it!".
Without any more information about your specific situation, there are two 'obvious' possibilities:
Use a linked list
The STL has two linked lists by default: a singly linked list std::forward_list and a doubly linked list std::list. Moreover there is std::deque, which is not a linked list but is typically implemented as a sequence of fixed-size arrays. Some quotes from the documentation:
std::forward_list [...] is implemented as a singly-linked list and essentially does not have any overhead compared to its implementation in C. Compared to std::list this container provides more space efficient storage when bidirectional iteration is not needed.
std::list [...] is usually implemented as a doubly-linked list. Compared to std::forward_list this container provides bidirectional iteration capability while being less space efficient.
std::deque (double-ended queue) [..] insertion and deletion at either end of a deque never invalidates pointers or references to the rest of the elements.
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays
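One way to use this for the append-only case in the question is to collect into a deque and convert to a vector once at the end, so every element is copied exactly once. This is only a sketch: `Token` is a hypothetical stand-in for the small object, and whether it beats a plain vector should be measured.

```cpp
#include <deque>
#include <vector>

// Hypothetical stand-in for the small object from the question.
struct Token {
    unsigned id;
};

// Appends `count` tokens to a deque (existing elements are never relocated
// by push_back), then copies them once into a contiguous vector.
std::vector<Token> collect_tokens(unsigned count) {
    std::deque<Token> staging;
    for (unsigned i = 0; i < count; ++i) {
        staging.push_back(Token{i});
    }
    return std::vector<Token>(staging.begin(), staging.end());
}
```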
Reserve space in a vector
If there is any way you can estimate an upper bound on the number of objects you will want to store, you can use that to reserve some space in advance.
For example, if you're reading these objects from a file, the number of objects may be at most the file size divided by 16, or the number of lines times two, or some other quick and easy calculation that you can do before constructing these objects.
In that case, if you reserve the capacity, you may allocate more memory than you need, but you prevent moves. Even if the upper bound is a bit too low, that's OK: you may still need to double the capacity once or twice but at least you prevent all the small increases (2 -> 4 -> 8 -> 16) at the start of the loop.
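A quick way to convince yourself that reserve() does what is described is to check that the buffer address never changes while appending. A minimal sketch with a made-up function name, using int elements for simplicity:

```cpp
#include <cstddef>
#include <vector>

// Fills a vector with `count` values after reserving `upper_bound` slots.
// Returns true if the buffer address is the same before and after appending.
// Intended to be called with 0 < count <= upper_bound.
bool fill_without_reallocation(std::size_t count, std::size_t upper_bound) {
    std::vector<int> v;
    v.reserve(upper_bound);
    const int* buffer = v.data();
    for (std::size_t i = 0; i < count; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return v.data() == buffer;
}
```

As long as the size stays within the reserved capacity, push_back is guaranteed not to reallocate, so no element is copied or moved.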
Is there any constant time way to split a vector other than the following?
std::vector<int> v_SplitVector(start , end);
This would have a complexity of O(N), in this case O(end - start). Is there a constant time operation to do this?
OR am I using the wrong container for the task?..
The act of "splitting" a container like vector, where the elements sit in contiguous memory, necessarily requires copying or moving everything that needs to go to the other side.
Containers like list, which keep each element in its own memory block, can be easily rearranged (see std::list::splice).
But having elements in non-contiguous memory may result in lower memory access performance due to more frequent cache misses.
In other words, the complexity of the algorithm may not be the only factor influencing performance: an infrequent linear copy may hurt you less than a frequent linear walk over dispersed elements.
The trade-off mostly depends on how the hardware manages caches and how the std implementation you are using takes care of that (and how well the compiler can optimize).
This is a copy rather than a split, hence the complexity. You can probably write a split for list which might perform better.
std::vector doesn't support the following, but if an efficient "split" operation is very important to you then you could perhaps write your own container. This would be quite a lot of work.
You could define "split" as follows:
removes an initial segment of the container, and returns a new container containing those elements. References to those elements continue to refer to the same elements in the new container. The old container contains the remaining elements. The capacity of the new container is equal to its size, and the capacity of the old container is reduced by the number of elements removed.
Then the old container and the new container would share a block of underlying storage (presumably with ref-counting). The new container would have to reallocate if you append to it (since the memory immediately at the end of its elements is in use), but so long as that happens rarely or never it could be a win.
Your example code takes a copy, though, it doesn't modify the original container. If a logical copy is a requirement then to do it without actually copying the elements you need either COW or immutable objects.
std::list has a splice() function that can move a range of elements from one list to another. This avoids copying the elements, but as of C++11 the range overload, when moving between two different lists, is in effect guaranteed not to be O(1), because it needs to count how many elements it has moved. In C++03 implementations could choose whether they wanted this op to be O(1) or list::size() to be O(1), but in C++11 size() is required to be constant time for all containers.
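For reference, a list "split" built on splice() could look like the sketch below (`split_off` is a made-up name). It moves nodes rather than copying elements, although, as noted, the range splice between two lists still has to count the moved elements:

```cpp
#include <iterator>
#include <list>

// Moves the elements from `pos` to the end of `source` into a new list and
// returns it. No element is copied; the nodes are relinked.
std::list<int> split_off(std::list<int>& source, std::list<int>::iterator pos) {
    std::list<int> tail;
    tail.splice(tail.begin(), source, pos, source.end());
    return tail;
}
```

Iterators and references to the moved elements stay valid and now refer into the returned list.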
Comparing the performance of std::vector with std::list is usually about more than just one operation, though. You have to consider that list doesn't have random-access iterators, and so on.
Creating a new std::vector necessarily requires copying, since
vectors aren't allowed to share parts of their implementation.
A modification in the container from which you obtained start
and end shouldn't affect the values in splitVector.
What you can do, fairly easily, is create a View container,
which simply holds the two iterators, and maps all accesses
through them. Something along the lines of:
template <typename Iterator>
class View
{
Iterator myBegin;
Iterator myEnd;
public:
typedef typename std::iterator_traits<Iterator>::value_type value_type;
// And the other usual typedefs, including...
typedef Iterator iterator;
View( Iterator begin, Iterator end )
: myBegin( begin )
, myEnd( end )
{
}
iterator begin() { return myBegin; }
iterator end() { return myEnd; }
value_type operator[]( ptrdiff_t index ) { return *(myBegin + index ); }
// ...
};
This requires a fair amount of boilerplate, because the
interface to something like vector is rather complete, but it's
all very straightforward and simple. The one thing you cannot
do with this, however, is modify the topology of either the
underlying container or of any View: anything which might
invalidate any iterators will, of course, wreak havoc.
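To show how such a view would be used, here is a stripped-down, self-contained variant specialised for vector<int>. `IntView` and `sum_of_tail` are hypothetical names for illustration; a real version would be the template above with the remaining typedefs filled in:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Stripped-down variant of the view idea: two iterators, no owned storage.
class IntView {
    std::vector<int>::iterator myBegin;
    std::vector<int>::iterator myEnd;
public:
    IntView(std::vector<int>::iterator begin, std::vector<int>::iterator end)
        : myBegin(begin), myEnd(end) {}
    std::vector<int>::iterator begin() const { return myBegin; }
    std::vector<int>::iterator end() const { return myEnd; }
    std::size_t size() const { return static_cast<std::size_t>(myEnd - myBegin); }
    int& operator[](std::ptrdiff_t index) const { return *(myBegin + index); }
};

// "Splits" v at `mid` in constant time by creating two views; no element is
// copied. Writes through the first view, then sums the second one.
// Assumes 1 <= mid <= v.size().
int sum_of_tail(std::vector<int>& v, std::ptrdiff_t mid) {
    IntView head(v.begin(), v.begin() + mid);
    IntView tail(v.begin() + mid, v.end());
    head[0] = 100;  // modifies v[0]: the view shares storage with v
    return std::accumulate(tail.begin(), tail.end(), 0);
}
```

Note that this operator[] returns a reference, so writes go through to the underlying vector, whereas the operator[] in the template above returns by value.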
When adding or removing elements at a place other than the end, the vector must have complexity of at least O(n) due to the internal shifts required. The same follows when you want to not only remove, but move the elements out: for a vector, they must be copied, hence at least 1 op per element moved. That means that moving elements out of a vector is at least O(N), where N is the number of elements moved.
If you need near-constant time add/remove operations (be it adding/inserting one, or many elements) you should look at list/linkedlist containers, where all elements and sublists are easily 'detachable', especially if you know the pointer/iterator. Or trees, or any other dynamic structure.
Completely by the way, I sense what v_SplitVector does, but where did it come from? I do not remember such a function/method in stdlib or boost.
Should I use deque instead of vector if I'd like to push elements at the beginning of the container as well? When should I use list and what's the point of it?
Use deque if you need efficient insertion/removal at the beginning and end of the sequence and random access; use list if you need efficient insertion anywhere, at the sacrifice of random access. Iterators and references to list elements are very stable under almost any mutation of the container, while deque has very peculiar iterator and reference invalidation rules (so check them out carefully).
Also, list is a node-based container, while a deque uses chunks of contiguous memory, so memory locality may have performance effects that cannot be captured by asymptotic complexity estimates.
deque can serve as a replacement for vector almost everywhere and should probably have been considered the "default" container in C++ (on account of its more flexible memory requirements); the only reason to prefer vector is when you must have a guaranteed contiguous memory layout of your sequence.
deque and vector provide random access, list provides only linear accesses. So if you need to be able to do container[i], that rules out list. On the other hand, you can insert and remove items anywhere in a list efficiently, and operations in the middle of vector and deque are slow.
deque and vector are very similar, and are basically interchangeable for most purposes. There are only two differences worth mentioning. First, vector can only efficiently add new items at the end, while deque can add items at either end efficiently. So why would you ever use a vector then? Second, unlike deque, vector guarantees that all items will be stored in contiguous memory locations, which makes iterating through them faster in some situations.
I learned the complexity of deque::insert() from the C++ standard 2003 (chapter 23.2.1.3) as follows:
In the worst case, inserting a single element into a deque takes time linear in the minimum of the distance from the insertion point to the beginning of the deque and the distance from the insertion point to the end of the deque.
I always understand the implementation of stl deque as a collection of memory chunks. Hence an insertion will only affect the elements in the same memory chunk as the insertion position. My question is, what does the standard mean by "linear in the minimum of the distance from the insertion point to the beginning of the deque and the distance from the insertion point to the end of the deque"?
My understanding is because C++ standard does not enforce a certain implementation of deque. The complexity is just in general for the worst case. However, in the actual implementation in compilers, it is linear to the number of elements in a memory chunk, which may vary for different element sizes.
Another guess might be that, since insert() will invalidate all iterators, deque needs to update all iterators. Therefore it's linear.
std::deque is normally (always?) implemented as a collection of chunks of memory, but it won't normally insert a whole new chunk just for you to insert one new element in the middle of the collection. As such, it'll find whether the insertion point is closer to the beginning or end, and shuffle the existing elements to make room for the new element in an existing chunk. It'll only add a new chunk of memory at the beginning or end of the collection.
I think you'd be better served by a diagram... let's play with ASCII art!
A deque is usually an array of memory chunks, but all chunks apart from the front and back ones are full. This is necessary because a deque is a RandomAccessContainer, and to get O(1) access to any element, you cannot have an unbounded number of chunks whose sizes must be read:
// Pseudocode: buckets[0] is `first`, buckets.back() is `last`, all inner buckets are full.
size_t size() const { return first.size + (buckets.size() - 2) * SIZE + last.size; }
T& operator[](size_t i) {
// The first bucket is filled from its end, so its elements occupy its last first.size slots.
if (i < first.size) { return first[SIZE - first.size + i]; }
size_t const correctedIndex = i - first.size;
return buckets[1 + correctedIndex / SIZE][correctedIndex % SIZE];
}
Those operations are O(1) because of the multiplication/division!
In my example, I'll suppose that a memory chunk is full when it contains 8 elements. In practice, nobody said the size should be fixed, just that all inner buckets shall have the same size.
// Deque
0: ++
1: ++++++++
2: ++++++++
3: ++++++++
4: +++++
Now say that we want to insert at index 13. It falls somewhere in the bucket labelled 2. There are several strategies we can think about:
extend bucket 2 (only)
introduce a new bucket either before or after 2 and shuffle only a few elements
But those two strategies would violate the invariant that all "inner" buckets have the same number of elements.
Therefore we are left with shuffling the elements around, either toward the beginning or the end (whichever is cheaper), in our case:
// Deque
0: +++
1: ++++++++
2: +O++++++
3: ++++++++
4: +++++
Note how bucket 0 grew.
This shuffle implies that, in the worst case, you'll move half the elements: O(N/2).
deque has O(1) insert at either the beginning or the end though, because there it's just a matter of adding the element in the right place or (if the bucket is full) creating a new bucket.
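The cost described above, and in the standard's wording, comes down to shifting the shorter side of the insertion point. As a small sketch (the function name is made up):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Upper bound on the number of existing elements a deque moves to insert
// one element at index `pos` of a deque holding `n` elements: it shifts
// whichever side of the insertion point is shorter.
std::size_t elements_to_shift(std::size_t n, std::size_t pos) {
    assert(pos <= n);
    return std::min(pos, n - pos);
}
```

For the 31-element deque in the diagram, inserting at index 13 gives min(13, 18) = 13, so the elements before the insertion point are shifted toward the front. The worst case is the exact middle, which is where the N/2 comes from, and both ends cost zero shifts.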
There are other containers that have better insert/erase behavior at random indices, based on B+ Trees. In an indexed B+ Tree you can, instead of a "key" (or in parallel), maintain internally a count of the elements prior to a certain position. There are various techniques to do this efficiently. In the end you get a container with:
O(1): empty, size
O(log N): at, insert, erase
You can check the blist module in Python which implements a Python list-like element backed by such a structure.
Your conjecture is... 99.9% true.
It all depends on what the actual implementation is. What the standard specifies are the minimum requirements for both the implementors (who cannot claim to be standard-conforming if they don't fit the spec) and users (who must not expect "better performance" when writing implementation-independent code).
The idea behind the spec is a chunk (a == one) of uninitialized memory where elements are allocated around the center... as long as there is space for them.
Inserting in the middle means shifting. Inserting at the front or end means just constructing in place (when no space exists, a reallocation is done).
Indexes and iterators cannot be trusted after a modification, since we cannot assume what has been shifted and in which direction.
More efficient implementations don't use a single chunk but multiple chunks, to redistribute the "shifting" problem and to allocate memory in constant-size pieces from the underlying system (thus limiting reallocation and fragmentation).
If you're targeting one of them you can expect better performance; otherwise you had better not assume any structural optimization.
Linear in the number of elements inserted (copy construction). Plus, depending on the particular library implementation, additional linear time in up to the number of elements between position and one of the ends of the deque.
Reference...
When talking about the STL, I have several schoolmates telling me that "vectors are linked lists".
I have another one arguing that if you call the erase() method with an iterator, it breaks the vector, since it's a linked list.
They also tend not to understand why I'm always arguing that vectors are contiguous, just like any other array, and don't seem to understand what random access means. Are vectors strictly contiguous just like regular arrays, or just at most contiguous? (For example, it would allocate several contiguous segments if the whole array doesn't fit.)
I'm sorry to say that your schoolmates are completely wrong. If your schoolmates can honestly say that "vectors are linked lists" then you need to respectfully tell them that they need to pick up a good C++ book (or any decent computer science book) and read it. Or perhaps even the Wikipedia articles for vectors and lists. (Also see the articles for dynamic arrays and linked lists.)
Vectors (as in std::vector) are not linked lists. (Note that std::vector does not derive from std::list.) While they both can store a collection of data, how a vector does it is completely different from how a linked list does it. Therefore, they have different performance characteristics in different situations.
For example, insertions are a constant-time operation on linked lists, while it is a linear-time operation on vectors if it is inserted in somewhere other than the end. (However, it is amortized constant-time if you insert at the end of a vector.)
The std::vector class in C++ is required to be contiguous by the C++ standard:
23.2.4/1 Class template vector
A vector is a kind of sequence that supports random access iterators. In addition, it supports (amortized) constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage management is handled automatically, though hints can be given to improve efficiency. The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
Compare that to std::list:
23.2.2/1 Class template list
A list is a kind of sequence that supports bidirectional iterators and allows constant time insert and erase operations anywhere within the sequence, with storage management handled automatically. Unlike vectors (23.2.4) and deques (23.2.1), fast random access to list elements is not supported, but many algorithms only need sequential access anyway.
Clearly, the C++ standard stipulates that a vector and a list are two different containers that do things differently.
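If your schoolmates want evidence rather than a quote, the identity from the standard can be checked directly. A minimal sketch with a made-up function name:

```cpp
#include <cstddef>
#include <vector>

// Checks the identity &v[n] == &v[0] + n for every valid n.
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t n = 0; n < v.size(); ++n) {
        if (&v[n] != &v[0] + n) {
            return false;
        }
    }
    return true;
}
```

This holds for any std::vector<int>, however large, and also after the vector has grown and reallocated. There is no such guarantee for std::deque or std::list.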
You can't "break" a vector (at least not intentionally) by simply calling erase() with a valid iterator. That would make std::vectors rather useless since the point of its existence is to manage memory for you!
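What erase() does do is invalidate the iterator you passed in (and those after it), which may be where the "it breaks the vector" idea comes from. The usual idiom is to continue from the iterator that erase() returns. A short sketch, with a hypothetical function name:

```cpp
#include <vector>

// Removes every even value. erase() returns an iterator to the element
// after the erased one, which replaces the iterator that was invalidated.
void remove_even(std::vector<int>& v) {
    for (std::vector<int>::iterator it = v.begin(); it != v.end();) {
        if (*it % 2 == 0) {
            it = v.erase(it);
        } else {
            ++it;
        }
    }
}
```

Each erase shifts the following elements down, so this loop is quadratic in the worst case. It is shown for the iterator handling, not as the fastest way to filter a vector.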
vector will hold all of its storage in a single place. A vector is not even remotely like a linked list. In fact, if I had to pick two data structures that were most unlike each other, it would be vector and list. "At most contiguous" is how a deque operates.
Vector:
Guaranteed contiguous storage for all elements - will copy or move elements.
O(1) access time.
O(n) for insert or remove.
Iterators invalidated upon insertion or removal of any element.
List:
No contiguous storage at all - never copies or moves elements.
O(n) access time- plus all the nasty cache misses you're gonna get.
O(1) insert or remove.
Iterators valid as long as that specific element is not removed.
As you can see, they behave differently in every data structure use case.
By definition, vectors are contiguous blocks of memory like C arrays. See: http://en.wikipedia.org/wiki/Vector_(C%2B%2B)
Vectors allow random access; that is,
an element of a vector may be
referenced in the same manner as
elements of arrays (by array indices).
Linked-lists and sets, on the other
hand, do not support random access or
pointer arithmetic.
Vectors are not linked lists; they provide random access and are contiguous just like arrays. In order to achieve this they re-allocate memory under the hood.
List is designed to allow quick insertions and deletions, while not invalidating any references or iterators except
the ones to the deleted element.