Related
I was looking at STL containers and trying to figure what they really are (i.e. the data structure used), and the deque stopped me: I thought at first that it was a double linked list, which would allow insertion and deletion from both ends in constant time, but I am troubled by the promise made by the operator [] to be done in constant time. In a linked list, arbitrary access should be O(n), right?
And if it's a dynamic array, how can it add elements in constant time? It should be mentioned that reallocation may happen, and that O(1) is an amortized cost, like for a vector.
So I wonder what is this structure that allows arbitrary access in constant time, and at the same time never needs to be moved to a new bigger place.
A deque is somewhat recursively defined: internally it maintains a double-ended queue of chunks of fixed size. Each chunk is a vector, and the queue (“map” in the graphic below) of chunks itself is also a vector.
There’s a great analysis of the performance characteristics and how it compares to the vector over at CodeProject.
The GCC standard library implementation internally uses a T** to represent the map. Each data block is a T* which is allocated with some fixed size __deque_buf_size (which depends on sizeof(T)).
From overview, you can think deque as a double-ended queue
The datas in deque are stored by chuncks of fixed size vector, which are
pointered by a map(which is also a chunk of vector, but its size may change)
The main part code of the deque iterator is as below:
/*
buff_size is the length of the chunk
*/
template <class T, size_t buff_size>
struct __deque_iterator{
typedef __deque_iterator<T, buff_size> iterator;
typedef T** map_pointer;
// pointer to the chunk
T* cur;
T* first; // the begin of the chunk
T* last; // the end of the chunk
//because the pointer may skip to other chunk
//so this pointer to the map
map_pointer node; // pointer to the map
}
The main part code of the deque is as below:
/*
buff_size is the length of the chunk
*/
template<typename T, size_t buff_size = 0>
class deque{
public:
typedef T value_type;
typedef T& reference;
typedef T* pointer;
typedef __deque_iterator<T, buff_size> iterator;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
protected:
typedef pointer* map_pointer;
// allocate memory for the chunk
typedef allocator<value_type> dataAllocator;
// allocate memory for map
typedef allocator<pointer> mapAllocator;
private:
//data members
iterator start;
iterator finish;
map_pointer map;
size_type map_size;
}
Below i will give you the core code of deque, mainly about three parts:
iterator
How to construct a deque
1. iterator(__deque_iterator)
The main problem of iterator is, when ++, -- iterator, it may skip to other chunk(if it pointer to edge of chunk). For example, there are three data chunks: chunk 1,chunk 2,chunk 3.
The pointer1 pointers to the begin of chunk 2, when operator --pointer it will pointer to the end of chunk 1, so as to the pointer2.
Below I will give the main function of __deque_iterator:
Firstly, skip to any chunk:
void set_node(map_pointer new_node){
node = new_node;
first = *new_node;
last = first + chunk_size();
}
Note that, the chunk_size() function which compute the chunk size, you can think of it returns 8 for simplify here.
operator* get the data in the chunk
reference operator*()const{
return *cur;
}
operator++, --
// prefix forms of increment
self& operator++(){
++cur;
if (cur == last){ //if it reach the end of the chunk
set_node(node + 1);//skip to the next chunk
cur = first;
}
return *this;
}
// postfix forms of increment
self operator++(int){
self tmp = *this;
++*this;//invoke prefix ++
return tmp;
}
self& operator--(){
if(cur == first){ // if it pointer to the begin of the chunk
set_node(node - 1);//skip to the prev chunk
cur = last;
}
--cur;
return *this;
}
self operator--(int){
self tmp = *this;
--*this;
return tmp;
}
iterator skip n steps / random access
self& operator+=(difference_type n){ // n can be postive or negative
difference_type offset = n + (cur - first);
if(offset >=0 && offset < difference_type(buffer_size())){
// in the same chunk
cur += n;
}else{//not in the same chunk
difference_type node_offset;
if (offset > 0){
node_offset = offset / difference_type(chunk_size());
}else{
node_offset = -((-offset - 1) / difference_type(chunk_size())) - 1 ;
}
// skip to the new chunk
set_node(node + node_offset);
// set new cur
cur = first + (offset - node_offset * chunk_size());
}
return *this;
}
// skip n steps
self operator+(difference_type n)const{
self tmp = *this;
return tmp+= n; //reuse operator +=
}
self& operator-=(difference_type n){
return *this += -n; //reuse operator +=
}
self operator-(difference_type n)const{
self tmp = *this;
return tmp -= n; //reuse operator +=
}
// random access (iterator can skip n steps)
// invoke operator + ,operator *
reference operator[](difference_type n)const{
return *(*this + n);
}
2. How to construct a deque
common function of deque
iterator begin(){return start;}
iterator end(){return finish;}
reference front(){
//invoke __deque_iterator operator*
// return start's member *cur
return *start;
}
reference back(){
// cna't use *finish
iterator tmp = finish;
--tmp;
return *tmp; //return finish's *cur
}
reference operator[](size_type n){
//random access, use __deque_iterator operator[]
return start[n];
}
template<typename T, size_t buff_size>
deque<T, buff_size>::deque(size_t n, const value_type& value){
fill_initialize(n, value);
}
template<typename T, size_t buff_size>
void deque<T, buff_size>::fill_initialize(size_t n, const value_type& value){
// allocate memory for map and chunk
// initialize pointer
create_map_and_nodes(n);
// initialize value for the chunks
for (map_pointer cur = start.node; cur < finish.node; ++cur) {
initialized_fill_n(*cur, chunk_size(), value);
}
// the end chunk may have space node, which don't need have initialize value
initialized_fill_n(finish.first, finish.cur - finish.first, value);
}
template<typename T, size_t buff_size>
void deque<T, buff_size>::create_map_and_nodes(size_t num_elements){
// the needed map node = (elements nums / chunk length) + 1
size_type num_nodes = num_elements / chunk_size() + 1;
// map node num。min num is 8 ,max num is "needed size + 2"
map_size = std::max(8, num_nodes + 2);
// allocate map array
map = mapAllocator::allocate(map_size);
// tmp_start,tmp_finish poniters to the center range of map
map_pointer tmp_start = map + (map_size - num_nodes) / 2;
map_pointer tmp_finish = tmp_start + num_nodes - 1;
// allocate memory for the chunk pointered by map node
for (map_pointer cur = tmp_start; cur <= tmp_finish; ++cur) {
*cur = dataAllocator::allocate(chunk_size());
}
// set start and end iterator
start.set_node(tmp_start);
start.cur = start.first;
finish.set_node(tmp_finish);
finish.cur = finish.first + num_elements % chunk_size();
}
Let's assume i_deque has 20 int elements 0~19 whose chunk size is 8, and now push_back 3 elements (0, 1, 2) to i_deque:
i_deque.push_back(0);
i_deque.push_back(1);
i_deque.push_back(2);
It's internal structure like below:
Then push_back again, it will invoke allocate new chunk:
push_back(3)
If we push_front, it will allocate new chunk before the prev start
Note when push_back element into deque, if all the maps and chunks are filled, it will cause allocate new map, and adjust chunks.But the above code may be enough for you to understand deque.
Imagine it as a vector of vectors. Only they aren't standard std::vectors.
The outer vector contains pointers to the inner vectors. When its capacity is changed via reallocation, rather than allocating all of the empty space to the end as std::vector does, it splits the empty space to equal parts at the beginning and the end of the vector. This allows push_front and push_back on this vector to both occur in amortized O(1) time.
The inner vector behavior needs to change depending on whether it's at the front or the back of the deque. At the back it can behave as a standard std::vector where it grows at the end, and push_back occurs in O(1) time. At the front it needs to do the opposite, growing at the beginning with each push_front. In practice this is easily achieved by adding a pointer to the front element and the direction of growth along with the size. With this simple modification push_front can also be O(1) time.
Access to any element requires offsetting and dividing to the proper outer vector index which occurs in O(1), and indexing into the inner vector which is also O(1). This assumes that the inner vectors are all fixed size, except for the ones at the beginning or the end of the deque.
(This is an answer I've given in another thread. Essentially I'm arguing that even fairly naive implementations, using a single vector, conform to the requirements of "constant non-amortized push_{front,back}". You might be surprised, and think this is impossible, but I have found other relevant quotes in the standard that define the context in a surprising way. Please bear with me; if I have made a mistake in this answer, it would be very helpful to identify which things I have said correctly and where my logic has broken down. )
In this answer, I am not trying to identify a good implementation, I'm merely trying to help us to interpret the complexity requirements in the C++ standard. I'm quoting from N3242, which is, according to Wikipedia, the latest freely available C++11 standardization document. (It appears to be organized differently from the final standard, and hence I won't quote the exact page numbers. Of course, these rules might have changed in the final standard, but I don't think that has happened.)
A deque<T> could be implemented correctly by using a vector<T*>. All the elements are copied onto the heap and the pointers stored in a vector. (More on the vector later).
Why T* instead of T? Because the standard requires that
"An insertion at either end of the deque invalidates all the iterators
to the deque, but has no effect on the validity of references to
elements of the deque."
(my emphasis). The T* helps to satisfy that. It also helps us to satisfy this:
"Inserting a single element either at the beginning or end of a deque always ..... causes a single call to a constructor of T."
Now for the (controversial) bit. Why use a vector to store the T*? It gives us random access, which is a good start. Let's forget about the complexity of vector for a moment and build up to this carefully:
The standard talks about "the number of operations on the contained objects.". For deque::push_front this is clearly 1 because exactly one T object is constructed and zero of the existing T objects are read or scanned in any way. This number, 1, is clearly a constant and is independent of the number of objects currently in the deque. This allows us to say that:
'For our deque::push_front, the number of operations on the contained objects (the Ts) is fixed and is independent of the number of objects already in the deque.'
Of course, the number of operations on the T* will not be so well-behaved. When the vector<T*> grows too big, it'll be realloced and many T*s will be copied around. So yes, the number of operations on the T* will vary wildly, but the number of operations on T will not be affected.
Why do we care about this distinction between counting operations on T and counting operations on T*? It's because the standard says:
All of the complexity requirements in this clause are stated solely in terms of the number of operations on the contained objects.
For the deque, the contained objects are the T, not the T*, meaning we can ignore any operation which copies (or reallocs) a T*.
I haven't said much about how a vector would behave in a deque. Perhaps we would interpret it as a circular buffer (with the vector always taking up its maximum capacity(), and then realloc everything into a bigger buffer when the vector is full. The details don't matter.
In the last few paragraphs, we have analyzed deque::push_front and the relationship between the number of objects in the deque already and the number of operations performed by push_front on contained T-objects. And we found they were independent of each other. As the standard mandates that complexity is in terms of operations-on-T, then we can say this has constant complexity.
Yes, the Operations-On-T*-Complexity is amortized (due to the vector), but we're only interested in the Operations-On-T-Complexity and this is constant (non-amortized).
The complexity of vector::push_back or vector::push_front is irrelevant in this implementation; those considerations involve operations on T* and hence are irrelevant. If the standard was referring to the 'conventional' theoretical notion of complexity, then they wouldn't have explicitly restricted themselves to the "number of operations on the contained objects". Am I overinterpreting that sentence?
deque = double ended queue
A container which can grow in either direction.
Deque is typically implemented as a vector of vectors (a list of vectors can't give constant time random access). While the size of the secondary vectors is implementation dependent, a common algorithm is to use a constant size in bytes.
I was reading "Data structures and algorithms in C++" by Adam Drozdek, and found this useful.
HTH.
A very interesting aspect of STL deque is its implementation. An STL deque is not implemented as a linked list but as an array of pointers to blocks or arrays of data. The number of blocks changes dynamically depending on storage needs, and the size of the array of pointers changes accordingly.
You can notice in the middle is the array of pointers to the data (chunks on the right), and also you can notice that the array in the middle is dynamically changing.
An image is worth a thousand words.
While the standard doesn't mandate any particular implementation (only constant-time random access), a deque is usually implemented as a collection of contiguous memory "pages". New pages are allocated as needed, but you still have random access. Unlike std::vector, you're not promised that data is stored contiguously, but like vector, insertions in the middle require lots of relocating.
deque could be implemented as a circular buffer of fixed size array:
Use circular buffer so we could grow/shrink at both end by adding/removing a fixed sized array with O(1) complexity
Use fixed sized array so it is easy to calculate the index, hence access via index with two pointer dereferences - also O(1)
I would like to loop through elements of a vector in C++.
I am very new at this so I don't understand the details very well.
For example:
for (elements in vector) {
if () {
check something
else {
//else add another element to the vector
vectorname.push_back(n)
}
}
Its the for (vector elements) that I am having trouble with.
You'd normally use what's called a range-based for loop for this:
for (auto element : your_vector)
if (condition(element))
// whatever
else
your_vector.push_back(something);
But note: modifying a vector in the middle of iteration is generally a poor idea. And if your basic notion is to add the element if it's not already present, you may want to look up std::set, std::map, std::unordered_set or std::unordered_map instead.
In order to do this properly (and safely), you need to understand how std::vector works.
vector capatity
You may know that a vector works much like an array with "infinite" size. Meaning, it can hold as many elements as you want, as long as you have enough memory to hold them. But how does it do that?
A vector has an internal buffer (think of it like an array allocated with new) that may be the same size as the elements you're storing, but generally it's larger. It uses the extra space in the buffer to insert any new elements that you want to insert when you use push_back().
The amount of elements the vector has is known as its size, and the amount of elements it can hold is known as its capacity. You can query those via the size() and capacity() member functions.
However, this extra space must end at some point. That's when the magic happens: When the vector notices it doesn't have enough memory to hold more elements, it allocates a new buffer, larger1 than the previous one, and copies all elements to it. The important thing to notice here is that the new buffer will have a different address. As we continue with this explanation, keep this in mind.
iterators
Now, we need to talk about iterators. I don't know how much of C++ you have studied yet, but think of an old plain array:
int my_array[5] = {1,2,3,4,5};
you can take the address of the first element by doing:
int* begin = my_array;
and you can take the address of the end of the array (more specifically, one past the last element) by doing:
int* end = begin + sizeof(my_array)/sizeof(int);
if you have these addresses, one way to iterate the array and print all elements would be:
for (int* it = begin; it < end; ++it) {
std::cout << *it;
}
An iterator works much like a pointer. If you increment it (like we do with the pointer using ++it above), it will point to the next element. If you dereference it (again, like we do with the pointer using *it above), it will return the element it is pointing to.
std::vector provides us with two member functions, begin() and end(), that return iterators analogous to our begin and end pointers above. This is what you need to keep in mind from this section: Internally, these iterators have pointers that point to the elements in the vector's internal buffer.
a simpler way to iterate
Theoretically, you can use std::vector::begin() and std::vector::end to iterate a vector like this:
std::vector<int> v{1,2,3,4,5};
for (std::vector<int>::iterator it = v.begin; it != v.end(); ++it) {
std::cout << *it;
}
Note that, apart from the ugly type of it, this is exactly the same as our pointer example. C++ introduced the keyword auto, that lets us get rid of these ugly types, when we don't really need to know them:
std::vector<int> v{1,2,3,4,5};
for (auto it = v.begin; it != v.end(); ++it) {
std::cout << *it;
}
This works exactly the same (in fact, it has the exact same type), but now we don't need to type (or read) that uglyness.
But, there's an even better way. C++ has also introduced range-based for:
std::vector<int> v{1,2,3,4,5};
for (auto it : v) {
std::cout << it;
}
the range-based for construct does several things for you:
It calls v.begin() and v.end()2 to get the upper and lower bounds of the range we're going to iterate;
Keeps an internal iterator (let's call it i), and calls ++i on every step of the loop;
Dereferences the iterator (by calling *i) and stores it in the it variable for us. This means we do not need to dereference it ourselves (note how the std::cout << it line looks different from the other examples)
putting it all together
Let's do a small exercise. We're going to iterate a vector of numbers, and, for each odd number, we are going to insert a new elements equal to 2*n.
This is the naive way that we could probably think at first:
std::vector<int> v{1,2,3,4,5};
for (int i : v) {
if (i%2==1) {
v.push_back(i*2);
}
}
Of course, this is wrong! Vector v will start with a capacity of 5. This means that, when we try using push_back for the first time, it will allocate a new buffer.
If the buffer was reallocated, its address has changed. Then, what happens to the internal pointer that the range-based for is using to iterate the vector? It no longer points to the buffer!
This it what we call a reference invalidation. Look at the reference for std::vector::push_back. At the very beginning, it says:
If the new size() is greater than capacity() then all iterators and references (including the past-the-end iterator) are invalidated. Otherwise only the past-the-end iterator is invalidated.
Once the range-based for tries to increment and dereference the now invalid pointer, bad things will happen.
There are several ways to avoid this. For instance, in this particular algorithm, I know that we can never insert more than n new elements. This means that the size of the vector can never go past 2n after the loop has ended. With this knowledge in hand, I can increase the vector's capacity beforehand:
std::vector<int> v{1,2,3,4,5};
v.reserve(v.size()*2); // Increases the capacity of the vector to at least size*2.
// The code bellow now works properly!
for (int i : v) {
if (i%2==1) {
v.push_back(i*2);
}
}
If for some reason I don't know this information for a particular algorithm, I can use a separate vector to store the new elements, and then add them to our vector at the end:
std::vector<int> v{1,2,3,4,5};
std::vector<int> doubles;
for (int i : v) {
if (i%2==1) {
doubles.push_back(i*2);
}
}
// Reserving space is not necessary because the vector will allocate
// memory if it needs to anyway, but this does makes things faster
v.reserve(v.size() + doubles.size());
// There's a standard algorithm (std::copy), that, when used in conjunction with
// std::back_inserter, does this for us, but I find that the code bellow is more
// readable.
for (int i : doubles) {
v.push_back(i);
}
Finally, there's the old plain for, using an int to iterate. The iterator cannot be invalidated because it holds an index, instead of a pointer to the internal buffer:
std::vector<int> v{1,2,3,4,5};
for (int i = 0; i < v.size(); ++i) {
if (v[i]%2==1) {
doubles.push_back(v[i]*2);
}
}
Hopefully by now, you understand the advantages and drawbacks of each method. Happy studies!
1 How much larger depends on the implementation. Generally, implementations choose to allocate a new buffer of twice the size of the current buffer.
2 This is a small lie. The whole story is a bit more complicated: It actually tries to call begin(v) and end(v). Because vector is in the std namespace, it ends up calling std::begin and std::end, which, in turn, call v.begin() and v.end(). All of this machinery is there to ensure that the range-based for works not only with standard containers, but also with anything with a proper implementation for begin and end. That includes, for instance, regular plain arrays.
Here is the quick code snippet using iterators to iterate the vector-
#include<iostream>
#include<iterator> // for iterators to include
#include<vector> // for vectors to include
using namespace std;
int main()
{
vector<int> ar = { 1, 2, 3, 4, 5 };
// Declaring iterator to a vector
vector<int>::iterator ptr;
// Displaying vector elements using begin() and end()
cout << "The vector elements are : ";
for (ptr = ar.begin(); ptr < ar.end(); ptr++)
cout << *ptr << " ";
return 0;
}
Article to read more - Iterate through a C++ Vector using a 'for' loop
.
Hope it will help.
Try this,
#include<iostream>
#include<vector>
int main()
{
std::vector<int> vec(5);
for(int i=0;i<10;i++)
{
if(i<vec.size())
vec[i]=i;
else
vec.push_back(i);
}
for(int i=0;i<vec.size();i++)
std::cout<<vec[i];
return 0;
}
Output:
0123456789
Process returned 0 (0x0) execution time : 0.328 s
Press any key to continue.
I have to copy the first size element from a set of Solution (a class) named population to an array of solution named parents. I have some problems with iterators because i should do an hybrid solution between a normal for loop
and a for with iterators. The idea is this: when I'm at the ith iteration of the for I declare a new iterator that's pointing the beginning
of population, then I advance this iterator to the ith position, I take this solution element and I copy into parents[i]
Solution* parents; //it is filled somewhere else
std::set<Solution> population; //it is filled somewhere else
for (int i = 0; i < size; i++) {
auto it = population.begin();
advance(it, i);
parents[i] = *it;
}
Two error messages popup with this sentence: 'Expression: cannot dereference end map/set iterator'
and 'Expression: cannot advance end map/set iterator'
Any idea on how to this trick? I know it's kinda bad mixing array and set, i should use vector instead of array?
You use std::copy_n.
#include <algorithm>
extern Solution* parents; //it is filled somewhere else
extern std::set<Solution> population; //it is filled somewhere else
std::copy_n(population.begin(), size, parents);
It seems like size may be incorrectly set. To ensure that your code behaves as expected, you should just use the collection's size directly:
auto it = population.begin();
for (int i = 0; i < population.size(); i++) {
parents[i] = *it;
++it;
}
This can also be solved with a much simpler expression:
std::copy(population.begin(), population.end(), parents);
I have to copy the first size element from a set [..] to an array
You can use std::copy_n.
for (int i = 0; i < size; i++) {
auto it = population.begin();
advance(it, i);
The problem with this is that you're iterating over the linked list in every iteration. This turns the copy operation from normally linear complexity to quadratic.
Expression: cannot dereference end map/set iterator'
The problem here appears to be that your set doesn't contain at least size number of elements. You cannot copy size number of elements if there aren't that many. I suggest that you would copy less elements when the set is smaller.
i should use vector instead of array?
Probably. Is the array very large? Is the size of the vector not known at compile time? If so, use a vector.
I'm creating a custom vector class as part of a homework assignment. What I am currently trying to do is implement a function called erase, which will take an integer as an argument, decrease my array length by 1, remove the element at the position specified by the argument, and finally shift all the elements down to fill in the gap left by "erased" element.
What I am not completely understanding, due to my lack of experience with this language, is how you can delete a single element from an array of pointers.
Currently, I have the following implemented:
void myvector::erase(int i)
{
if(i != max_size)
{
for(int x = i; x < max_size; x++)
{
vec_array[x] = vec_array[x+1];
}
vec_size --;
//delete element from vector;
}
else
//delete element from vector
}
The class declaration and constructors look like this:
template <typename T>
class myvector
{
private:
T *vec_array;
int vec_size;
int max_size;
bool is_empty;
public:
myvector::myvector(int max_size_input)
{
max_size = max_size_input;
vec_array = new T[max_size];
vec_size = 0;
}
I have tried the following:
Using delete to try and delete an element
delete vec_size[max_size];
vec_size[max_size] = NULL;
Setting the value of the element to NULL or 0
vec_size[max_size] = NULL
or
vec_size[max_size] = 0
None of which are working for me due to either operator "=" being ambiguous or specified type not being able to be cast to void *.
I'm probably missing something simple, but I just can't seem to get passed this. Any help would be much appreciated. Again, sorry for the lack of experience if this is something silly.
If your custom vector class is supposed to work like std::vector, then don't concern yourself with object destruction. If you need to erase an element, you simply copy all elements following it by one position to the left:
void myvector::erase(int i)
{
for (int x = i + 1; x < vec_size; x++) {
vec_array[x - 1] = vec_array[x];
}
vec_size--;
}
That's all the basic work your erase() function has to do.
If the elements happen to be pointers, you shouldn't care; the user of your vector class is responsible for deleting those pointers if that's needed. You cannot determine if they can actually be deleted (the pointers might point to automatic stack variables, which are not deletable.)
So, do not ever call delete on an element of your vector.
If your vector class has a clear() function, and you want to make sure the elements are destructed, simply:
delete[] vec_array;
vec_array = new T[max_size];
vec_size = 0;
And this is how std::vector works, actually. (Well, the basic logic of it; of course you can optimize a hell of a lot of stuff in a vector implementation.)
Since this is homework i wont give you a definitive solution, but here is one method of erasing a value:
loop through and find value specified in erase function
mark values position in the array
starting from that position, move all elements values to the previous element(overlapping 'erased' value)
for i starting at position, i less than size minus one, i plus plus
element equals next element
reduce size of vector by 1
see if this is a big enough hint.
I was looking at STL containers and trying to figure what they really are (i.e. the data structure used), and the deque stopped me: I thought at first that it was a double linked list, which would allow insertion and deletion from both ends in constant time, but I am troubled by the promise made by the operator [] to be done in constant time. In a linked list, arbitrary access should be O(n), right?
And if it's a dynamic array, how can it add elements in constant time? It should be mentioned that reallocation may happen, and that O(1) is an amortized cost, like for a vector.
So I wonder what is this structure that allows arbitrary access in constant time, and at the same time never needs to be moved to a new bigger place.
A deque is somewhat recursively defined: internally it maintains a double-ended queue of chunks of fixed size. Each chunk is a vector, and the queue (“map” in the graphic below) of chunks itself is also a vector.
There’s a great analysis of the performance characteristics and how it compares to the vector over at CodeProject.
The GCC standard library implementation internally uses a T** to represent the map. Each data block is a T* which is allocated with some fixed size __deque_buf_size (which depends on sizeof(T)).
From overview, you can think deque as a double-ended queue
The datas in deque are stored by chuncks of fixed size vector, which are
pointered by a map(which is also a chunk of vector, but its size may change)
The main part code of the deque iterator is as below:
/*
buff_size is the length of the chunk
*/
template <class T, size_t buff_size>
struct __deque_iterator{
typedef __deque_iterator<T, buff_size> iterator;
typedef T** map_pointer;
// pointer to the chunk
T* cur;
T* first; // the begin of the chunk
T* last; // the end of the chunk
//because the pointer may skip to other chunk
//so this pointer to the map
map_pointer node; // pointer to the map
}
The main part code of the deque is as below:
/*
buff_size is the length of the chunk
*/
template<typename T, size_t buff_size = 0>
class deque{
public:
typedef T value_type;
typedef T& reference;
typedef T* pointer;
typedef __deque_iterator<T, buff_size> iterator;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
protected:
typedef pointer* map_pointer;
// allocate memory for the chunk
typedef allocator<value_type> dataAllocator;
// allocate memory for map
typedef allocator<pointer> mapAllocator;
private:
//data members
iterator start;
iterator finish;
map_pointer map;
size_type map_size;
}
Below i will give you the core code of deque, mainly about three parts:
iterator
How to construct a deque
1. iterator(__deque_iterator)
The main problem of iterator is, when ++, -- iterator, it may skip to other chunk(if it pointer to edge of chunk). For example, there are three data chunks: chunk 1,chunk 2,chunk 3.
The pointer1 pointers to the begin of chunk 2, when operator --pointer it will pointer to the end of chunk 1, so as to the pointer2.
Below I will give the main function of __deque_iterator:
Firstly, skip to any chunk:
void set_node(map_pointer new_node){
node = new_node;
first = *new_node;
last = first + chunk_size();
}
Note that, the chunk_size() function which compute the chunk size, you can think of it returns 8 for simplify here.
operator* get the data in the chunk
reference operator*()const{
return *cur;
}
operator++, --
// prefix forms of increment
self& operator++(){
++cur;
if (cur == last){ //if it reach the end of the chunk
set_node(node + 1);//skip to the next chunk
cur = first;
}
return *this;
}
// postfix forms of increment
self operator++(int){
self tmp = *this;
++*this;//invoke prefix ++
return tmp;
}
self& operator--(){
if(cur == first){ // if it pointer to the begin of the chunk
set_node(node - 1);//skip to the prev chunk
cur = last;
}
--cur;
return *this;
}
self operator--(int){
self tmp = *this;
--*this;
return tmp;
}
iterator skip n steps / random access
self& operator+=(difference_type n){ // n can be postive or negative
difference_type offset = n + (cur - first);
if(offset >=0 && offset < difference_type(buffer_size())){
// in the same chunk
cur += n;
}else{//not in the same chunk
difference_type node_offset;
if (offset > 0){
node_offset = offset / difference_type(chunk_size());
}else{
node_offset = -((-offset - 1) / difference_type(chunk_size())) - 1 ;
}
// skip to the new chunk
set_node(node + node_offset);
// set new cur
cur = first + (offset - node_offset * chunk_size());
}
return *this;
}
// skip n steps
self operator+(difference_type n)const{
self tmp = *this;
return tmp+= n; //reuse operator +=
}
self& operator-=(difference_type n){
return *this += -n; //reuse operator +=
}
self operator-(difference_type n)const{
self tmp = *this;
return tmp -= n; //reuse operator +=
}
// random access (iterator can skip n steps)
// invoke operator + ,operator *
reference operator[](difference_type n)const{
return *(*this + n);
}
2. How to construct a deque
common function of deque
iterator begin(){return start;}
iterator end(){return finish;}
reference front(){
//invoke __deque_iterator operator*
// return start's member *cur
return *start;
}
reference back(){
// cna't use *finish
iterator tmp = finish;
--tmp;
return *tmp; //return finish's *cur
}
reference operator[](size_type n){
//random access, use __deque_iterator operator[]
return start[n];
}
template<typename T, size_t buff_size>
deque<T, buff_size>::deque(size_t n, const value_type& value){
fill_initialize(n, value);
}
template<typename T, size_t buff_size>
void deque<T, buff_size>::fill_initialize(size_t n, const value_type& value){
// allocate memory for map and chunk
// initialize pointer
create_map_and_nodes(n);
// initialize value for the chunks
for (map_pointer cur = start.node; cur < finish.node; ++cur) {
initialized_fill_n(*cur, chunk_size(), value);
}
// the end chunk may have space node, which don't need have initialize value
initialized_fill_n(finish.first, finish.cur - finish.first, value);
}
template<typename T, size_t buff_size>
void deque<T, buff_size>::create_map_and_nodes(size_t num_elements){
// the needed map node = (elements nums / chunk length) + 1
size_type num_nodes = num_elements / chunk_size() + 1;
// map node num。min num is 8 ,max num is "needed size + 2"
map_size = std::max(8, num_nodes + 2);
// allocate map array
map = mapAllocator::allocate(map_size);
// tmp_start,tmp_finish poniters to the center range of map
map_pointer tmp_start = map + (map_size - num_nodes) / 2;
map_pointer tmp_finish = tmp_start + num_nodes - 1;
// allocate memory for the chunk pointered by map node
for (map_pointer cur = tmp_start; cur <= tmp_finish; ++cur) {
*cur = dataAllocator::allocate(chunk_size());
}
// set start and end iterator
start.set_node(tmp_start);
start.cur = start.first;
finish.set_node(tmp_finish);
finish.cur = finish.first + num_elements % chunk_size();
}
Let's assume i_deque has 20 int elements 0~19 whose chunk size is 8, and now push_back 3 elements (0, 1, 2) to i_deque:
i_deque.push_back(0);
i_deque.push_back(1);
i_deque.push_back(2);
It's internal structure like below:
Then push_back again, it will invoke allocate new chunk:
push_back(3)
If we push_front, it will allocate new chunk before the prev start
Note when push_back element into deque, if all the maps and chunks are filled, it will cause allocate new map, and adjust chunks.But the above code may be enough for you to understand deque.
Imagine it as a vector of vectors. Only they aren't standard std::vectors.
The outer vector contains pointers to the inner vectors. When its capacity is changed via reallocation, rather than allocating all of the empty space to the end as std::vector does, it splits the empty space to equal parts at the beginning and the end of the vector. This allows push_front and push_back on this vector to both occur in amortized O(1) time.
The inner vector behavior needs to change depending on whether it's at the front or the back of the deque. At the back it can behave as a standard std::vector where it grows at the end, and push_back occurs in O(1) time. At the front it needs to do the opposite, growing at the beginning with each push_front. In practice this is easily achieved by adding a pointer to the front element and the direction of growth along with the size. With this simple modification push_front can also be O(1) time.
Access to any element requires offsetting and dividing to the proper outer vector index which occurs in O(1), and indexing into the inner vector which is also O(1). This assumes that the inner vectors are all fixed size, except for the ones at the beginning or the end of the deque.
(This is an answer I've given in another thread. Essentially I'm arguing that even fairly naive implementations, using a single vector, conform to the requirements of "constant non-amortized push_{front,back}". You might be surprised, and think this is impossible, but I have found other relevant quotes in the standard that define the context in a surprising way. Please bear with me; if I have made a mistake in this answer, it would be very helpful to identify which things I have said correctly and where my logic has broken down. )
In this answer, I am not trying to identify a good implementation, I'm merely trying to help us to interpret the complexity requirements in the C++ standard. I'm quoting from N3242, which is, according to Wikipedia, the latest freely available C++11 standardization document. (It appears to be organized differently from the final standard, and hence I won't quote the exact page numbers. Of course, these rules might have changed in the final standard, but I don't think that has happened.)
A deque<T> could be implemented correctly by using a vector<T*>. All the elements are copied onto the heap and the pointers stored in a vector. (More on the vector later).
Why T* instead of T? Because the standard requires that
"An insertion at either end of the deque invalidates all the iterators
to the deque, but has no effect on the validity of references to
elements of the deque."
(my emphasis). The T* helps to satisfy that. It also helps us to satisfy this:
"Inserting a single element either at the beginning or end of a deque always ..... causes a single call to a constructor of T."
Now for the (controversial) bit. Why use a vector to store the T*? It gives us random access, which is a good start. Let's forget about the complexity of vector for a moment and build up to this carefully:
The standard talks about "the number of operations on the contained objects.". For deque::push_front this is clearly 1 because exactly one T object is constructed and zero of the existing T objects are read or scanned in any way. This number, 1, is clearly a constant and is independent of the number of objects currently in the deque. This allows us to say that:
'For our deque::push_front, the number of operations on the contained objects (the Ts) is fixed and is independent of the number of objects already in the deque.'
Of course, the number of operations on the T* will not be so well-behaved. When the vector<T*> grows too big, it'll be realloced and many T*s will be copied around. So yes, the number of operations on the T* will vary wildly, but the number of operations on T will not be affected.
Why do we care about this distinction between counting operations on T and counting operations on T*? It's because the standard says:
All of the complexity requirements in this clause are stated solely in terms of the number of operations on the contained objects.
For the deque, the contained objects are the T, not the T*, meaning we can ignore any operation which copies (or reallocs) a T*.
I haven't said much about how a vector would behave in a deque. Perhaps we would interpret it as a circular buffer (with the vector always taking up its maximum capacity(), and then realloc everything into a bigger buffer when the vector is full. The details don't matter.
In the last few paragraphs, we have analyzed deque::push_front and the relationship between the number of objects in the deque already and the number of operations performed by push_front on contained T-objects. And we found they were independent of each other. As the standard mandates that complexity is in terms of operations-on-T, then we can say this has constant complexity.
Yes, the Operations-On-T*-Complexity is amortized (due to the vector), but we're only interested in the Operations-On-T-Complexity and this is constant (non-amortized).
The complexity of vector::push_back or vector::push_front is irrelevant in this implementation; those considerations involve operations on T* and hence are irrelevant. If the standard was referring to the 'conventional' theoretical notion of complexity, then they wouldn't have explicitly restricted themselves to the "number of operations on the contained objects". Am I overinterpreting that sentence?
deque = double ended queue
A container which can grow in either direction.
Deque is typically implemented as a vector of vectors (a list of vectors can't give constant time random access). While the size of the secondary vectors is implementation dependent, a common algorithm is to use a constant size in bytes.
I was reading "Data structures and algorithms in C++" by Adam Drozdek, and found this useful.
HTH.
A very interesting aspect of STL deque is its implementation. An STL deque is not implemented as a linked list but as an array of pointers to blocks or arrays of data. The number of blocks changes dynamically depending on storage needs, and the size of the array of pointers changes accordingly.
You can notice in the middle is the array of pointers to the data (chunks on the right), and also you can notice that the array in the middle is dynamically changing.
An image is worth a thousand words.
While the standard doesn't mandate any particular implementation (only constant-time random access), a deque is usually implemented as a collection of contiguous memory "pages". New pages are allocated as needed, but you still have random access. Unlike std::vector, you're not promised that data is stored contiguously, but like vector, insertions in the middle require lots of relocating.
deque could be implemented as a circular buffer of fixed size array:
Use circular buffer so we could grow/shrink at both end by adding/removing a fixed sized array with O(1) complexity
Use fixed sized array so it is easy to calculate the index, hence access via index with two pointer dereferences - also O(1)