Is it possible to access a STL deque element by its address? - c++

Good afternoon , We are trying to make a simple emulation of the Windows memory mapped caching subsystem. We are using several STL data structures.
A STL deque , std::deque<Range> accesses, which holds and records information about the last recently used to the most recently used memory mapped regions.
A STL set, std::set<char *> ranges_pointer which might hold pointers to the elements in the STL deque..
Basically, when we retrieve a pointer to a valid memory mapped region from the STL set , we would like to use that pointer to direcly access the corresponding element deque in O(constant time). Then, we would like to move that deque element to the front of the STL deque so that that subsequent reguests for clusters of memory mapped addreses can be found at the front of the STL deque accesses.
We know from Stack Overflow that the only STL container that guarantees contingous addresses is STL vector. However, every time one moves a element from a STL vector, it takes O(linear) time to shift or memcpy the remaining items into the right location,This could be expensive. In contrast, when you move a element from STL deque, all one has to rearrange the next and prev pointers on both sides of the element that is moved.
We were wondering if one could write a method to access a STL deque element by its address. Although the std::allocator does not guarantee contiguous STL deque adresses, perhaps we could use a custom memory pool allocator to get a chunk of contiguous addresses.
Also, do BOOST or other C++ frameworks implement contiguous doubly linked lists which offer random access just like STL deque. The class Range holds all the essential information about every cached memory mapped region, The class Range is stored in the the std::deque accessess member variable. Thank you. The class Range is shown below:
class Range {
public:
explicit Range(int item){
mLow = item;
mHigh = item;
mPtr = 0;
mMapPtr = 0;
}
Range(int low, int high, char* ptr = 0,char* mapptr = 0,
int currMappedLength = 0){
mLow = low;
mHigh = high;
mPtr = ptr;
mMapPtr = mapptr;
mMappedLength = currMappedLength;
}
Range(void){
mLow = 0;
mHigh = 0;
mPtr = 0;
mMapPtr = 0;
}
~Range(){
}
bool operator<(const Range& rhs) const{
return mHigh < rhs.mHigh;
}
int low() const { return mLow; }
int high() const { return mHigh; }
char* getMapPtr() const { return mMapPtr; }
int getMappedLength() const { return mMappedLength; }
private:
int mLow; // beginning of memory mapped region
int mHigh; // end of memory mapped region
char* mMapPtr; // return value from MapViewOfFile
int mMappedLength; // length of memory mapped region
}; // class Range

Rather than using a set to hold the addresses, how about a map that contains the address and an iterator into the deque?
Note that moving an element from the middle of a deque to the beginning or end is going to be no faster than doing it for a vector. You might want to think about using a list.

Related

std::deque: How is push_front( ) and push_back( ) possible in O(1)? [duplicate]

I was looking at STL containers and trying to figure what they really are (i.e. the data structure used), and the deque stopped me: I thought at first that it was a double linked list, which would allow insertion and deletion from both ends in constant time, but I am troubled by the promise made by the operator [] to be done in constant time. In a linked list, arbitrary access should be O(n), right?
And if it's a dynamic array, how can it add elements in constant time? It should be mentioned that reallocation may happen, and that O(1) is an amortized cost, like for a vector.
So I wonder what is this structure that allows arbitrary access in constant time, and at the same time never needs to be moved to a new bigger place.
A deque is somewhat recursively defined: internally it maintains a double-ended queue of chunks of fixed size. Each chunk is a vector, and the queue (“map” in the graphic below) of chunks itself is also a vector.
There’s a great analysis of the performance characteristics and how it compares to the vector over at CodeProject.
The GCC standard library implementation internally uses a T** to represent the map. Each data block is a T* which is allocated with some fixed size __deque_buf_size (which depends on sizeof(T)).
From overview, you can think deque as a double-ended queue
The datas in deque are stored by chuncks of fixed size vector, which are
pointered by a map(which is also a chunk of vector, but its size may change)
The main part code of the deque iterator is as below:
/*
buff_size is the length of the chunk
*/
template <class T, size_t buff_size>
struct __deque_iterator{
typedef __deque_iterator<T, buff_size> iterator;
typedef T** map_pointer;
// pointer to the chunk
T* cur;
T* first; // the begin of the chunk
T* last; // the end of the chunk
//because the pointer may skip to other chunk
//so this pointer to the map
map_pointer node; // pointer to the map
}
The main part code of the deque is as below:
/*
buff_size is the length of the chunk
*/
template<typename T, size_t buff_size = 0>
class deque{
public:
typedef T value_type;
typedef T& reference;
typedef T* pointer;
typedef __deque_iterator<T, buff_size> iterator;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
protected:
typedef pointer* map_pointer;
// allocate memory for the chunk
typedef allocator<value_type> dataAllocator;
// allocate memory for map
typedef allocator<pointer> mapAllocator;
private:
//data members
iterator start;
iterator finish;
map_pointer map;
size_type map_size;
}
Below i will give you the core code of deque, mainly about three parts:
iterator
How to construct a deque
1. iterator(__deque_iterator)
The main problem of iterator is, when ++, -- iterator, it may skip to other chunk(if it pointer to edge of chunk). For example, there are three data chunks: chunk 1,chunk 2,chunk 3.
The pointer1 pointers to the begin of chunk 2, when operator --pointer it will pointer to the end of chunk 1, so as to the pointer2.
Below I will give the main function of __deque_iterator:
Firstly, skip to any chunk:
void set_node(map_pointer new_node){
node = new_node;
first = *new_node;
last = first + chunk_size();
}
Note that, the chunk_size() function which compute the chunk size, you can think of it returns 8 for simplify here.
operator* get the data in the chunk
reference operator*()const{
return *cur;
}
operator++, --
// prefix forms of increment
self& operator++(){
++cur;
if (cur == last){ //if it reach the end of the chunk
set_node(node + 1);//skip to the next chunk
cur = first;
}
return *this;
}
// postfix forms of increment
self operator++(int){
self tmp = *this;
++*this;//invoke prefix ++
return tmp;
}
self& operator--(){
if(cur == first){ // if it pointer to the begin of the chunk
set_node(node - 1);//skip to the prev chunk
cur = last;
}
--cur;
return *this;
}
self operator--(int){
self tmp = *this;
--*this;
return tmp;
}
iterator skip n steps / random access
self& operator+=(difference_type n){ // n can be postive or negative
difference_type offset = n + (cur - first);
if(offset >=0 && offset < difference_type(buffer_size())){
// in the same chunk
cur += n;
}else{//not in the same chunk
difference_type node_offset;
if (offset > 0){
node_offset = offset / difference_type(chunk_size());
}else{
node_offset = -((-offset - 1) / difference_type(chunk_size())) - 1 ;
}
// skip to the new chunk
set_node(node + node_offset);
// set new cur
cur = first + (offset - node_offset * chunk_size());
}
return *this;
}
// skip n steps
self operator+(difference_type n)const{
self tmp = *this;
return tmp+= n; //reuse operator +=
}
self& operator-=(difference_type n){
return *this += -n; //reuse operator +=
}
self operator-(difference_type n)const{
self tmp = *this;
return tmp -= n; //reuse operator +=
}
// random access (iterator can skip n steps)
// invoke operator + ,operator *
reference operator[](difference_type n)const{
return *(*this + n);
}
2. How to construct a deque
common function of deque
iterator begin(){return start;}
iterator end(){return finish;}
reference front(){
//invoke __deque_iterator operator*
// return start's member *cur
return *start;
}
reference back(){
// cna't use *finish
iterator tmp = finish;
--tmp;
return *tmp; //return finish's *cur
}
reference operator[](size_type n){
//random access, use __deque_iterator operator[]
return start[n];
}
template<typename T, size_t buff_size>
deque<T, buff_size>::deque(size_t n, const value_type& value){
fill_initialize(n, value);
}
template<typename T, size_t buff_size>
void deque<T, buff_size>::fill_initialize(size_t n, const value_type& value){
// allocate memory for map and chunk
// initialize pointer
create_map_and_nodes(n);
// initialize value for the chunks
for (map_pointer cur = start.node; cur < finish.node; ++cur) {
initialized_fill_n(*cur, chunk_size(), value);
}
// the end chunk may have space node, which don't need have initialize value
initialized_fill_n(finish.first, finish.cur - finish.first, value);
}
template<typename T, size_t buff_size>
void deque<T, buff_size>::create_map_and_nodes(size_t num_elements){
// the needed map node = (elements nums / chunk length) + 1
size_type num_nodes = num_elements / chunk_size() + 1;
// map node num。min num is 8 ,max num is "needed size + 2"
map_size = std::max(8, num_nodes + 2);
// allocate map array
map = mapAllocator::allocate(map_size);
// tmp_start,tmp_finish poniters to the center range of map
map_pointer tmp_start = map + (map_size - num_nodes) / 2;
map_pointer tmp_finish = tmp_start + num_nodes - 1;
// allocate memory for the chunk pointered by map node
for (map_pointer cur = tmp_start; cur <= tmp_finish; ++cur) {
*cur = dataAllocator::allocate(chunk_size());
}
// set start and end iterator
start.set_node(tmp_start);
start.cur = start.first;
finish.set_node(tmp_finish);
finish.cur = finish.first + num_elements % chunk_size();
}
Let's assume i_deque has 20 int elements 0~19 whose chunk size is 8, and now push_back 3 elements (0, 1, 2) to i_deque:
i_deque.push_back(0);
i_deque.push_back(1);
i_deque.push_back(2);
It's internal structure like below:
Then push_back again, it will invoke allocate new chunk:
push_back(3)
If we push_front, it will allocate new chunk before the prev start
Note when push_back element into deque, if all the maps and chunks are filled, it will cause allocate new map, and adjust chunks.But the above code may be enough for you to understand deque.
Imagine it as a vector of vectors. Only they aren't standard std::vectors.
The outer vector contains pointers to the inner vectors. When its capacity is changed via reallocation, rather than allocating all of the empty space to the end as std::vector does, it splits the empty space to equal parts at the beginning and the end of the vector. This allows push_front and push_back on this vector to both occur in amortized O(1) time.
The inner vector behavior needs to change depending on whether it's at the front or the back of the deque. At the back it can behave as a standard std::vector where it grows at the end, and push_back occurs in O(1) time. At the front it needs to do the opposite, growing at the beginning with each push_front. In practice this is easily achieved by adding a pointer to the front element and the direction of growth along with the size. With this simple modification push_front can also be O(1) time.
Access to any element requires offsetting and dividing to the proper outer vector index which occurs in O(1), and indexing into the inner vector which is also O(1). This assumes that the inner vectors are all fixed size, except for the ones at the beginning or the end of the deque.
(This is an answer I've given in another thread. Essentially I'm arguing that even fairly naive implementations, using a single vector, conform to the requirements of "constant non-amortized push_{front,back}". You might be surprised, and think this is impossible, but I have found other relevant quotes in the standard that define the context in a surprising way. Please bear with me; if I have made a mistake in this answer, it would be very helpful to identify which things I have said correctly and where my logic has broken down. )
In this answer, I am not trying to identify a good implementation, I'm merely trying to help us to interpret the complexity requirements in the C++ standard. I'm quoting from N3242, which is, according to Wikipedia, the latest freely available C++11 standardization document. (It appears to be organized differently from the final standard, and hence I won't quote the exact page numbers. Of course, these rules might have changed in the final standard, but I don't think that has happened.)
A deque<T> could be implemented correctly by using a vector<T*>. All the elements are copied onto the heap and the pointers stored in a vector. (More on the vector later).
Why T* instead of T? Because the standard requires that
"An insertion at either end of the deque invalidates all the iterators
to the deque, but has no effect on the validity of references to
elements of the deque."
(my emphasis). The T* helps to satisfy that. It also helps us to satisfy this:
"Inserting a single element either at the beginning or end of a deque always ..... causes a single call to a constructor of T."
Now for the (controversial) bit. Why use a vector to store the T*? It gives us random access, which is a good start. Let's forget about the complexity of vector for a moment and build up to this carefully:
The standard talks about "the number of operations on the contained objects.". For deque::push_front this is clearly 1 because exactly one T object is constructed and zero of the existing T objects are read or scanned in any way. This number, 1, is clearly a constant and is independent of the number of objects currently in the deque. This allows us to say that:
'For our deque::push_front, the number of operations on the contained objects (the Ts) is fixed and is independent of the number of objects already in the deque.'
Of course, the number of operations on the T* will not be so well-behaved. When the vector<T*> grows too big, it'll be realloced and many T*s will be copied around. So yes, the number of operations on the T* will vary wildly, but the number of operations on T will not be affected.
Why do we care about this distinction between counting operations on T and counting operations on T*? It's because the standard says:
All of the complexity requirements in this clause are stated solely in terms of the number of operations on the contained objects.
For the deque, the contained objects are the T, not the T*, meaning we can ignore any operation which copies (or reallocs) a T*.
I haven't said much about how a vector would behave in a deque. Perhaps we would interpret it as a circular buffer (with the vector always taking up its maximum capacity(), and then realloc everything into a bigger buffer when the vector is full. The details don't matter.
In the last few paragraphs, we have analyzed deque::push_front and the relationship between the number of objects in the deque already and the number of operations performed by push_front on contained T-objects. And we found they were independent of each other. As the standard mandates that complexity is in terms of operations-on-T, then we can say this has constant complexity.
Yes, the Operations-On-T*-Complexity is amortized (due to the vector), but we're only interested in the Operations-On-T-Complexity and this is constant (non-amortized).
The complexity of vector::push_back or vector::push_front is irrelevant in this implementation; those considerations involve operations on T* and hence are irrelevant. If the standard was referring to the 'conventional' theoretical notion of complexity, then they wouldn't have explicitly restricted themselves to the "number of operations on the contained objects". Am I overinterpreting that sentence?
deque = double ended queue
A container which can grow in either direction.
Deque is typically implemented as a vector of vectors (a list of vectors can't give constant time random access). While the size of the secondary vectors is implementation dependent, a common algorithm is to use a constant size in bytes.
I was reading "Data structures and algorithms in C++" by Adam Drozdek, and found this useful.
HTH.
A very interesting aspect of STL deque is its implementation. An STL deque is not implemented as a linked list but as an array of pointers to blocks or arrays of data. The number of blocks changes dynamically depending on storage needs, and the size of the array of pointers changes accordingly.
You can notice in the middle is the array of pointers to the data (chunks on the right), and also you can notice that the array in the middle is dynamically changing.
An image is worth a thousand words.
While the standard doesn't mandate any particular implementation (only constant-time random access), a deque is usually implemented as a collection of contiguous memory "pages". New pages are allocated as needed, but you still have random access. Unlike std::vector, you're not promised that data is stored contiguously, but like vector, insertions in the middle require lots of relocating.
deque could be implemented as a circular buffer of fixed size array:
Use circular buffer so we could grow/shrink at both end by adding/removing a fixed sized array with O(1) complexity
Use fixed sized array so it is easy to calculate the index, hence access via index with two pointer dereferences - also O(1)

How can I correctly push back a series of objects in a vector in C++?

The scope of the program is to create a Container object which stores in a vector Class objects. Then I want to print, starting from a precise Class object of the vector all its predecessors.
class Class{
public:
Class(){
for (int i = 0; i < 10; ++i) {
Class c;
c.setName(i);
if (i > 0) {
c.setNext(_vec,i-1);
}
_vec.push_back(c);
}
}
};
~Class();
void setName(const int& n);
void setNext( vector<Class>& vec, const int& pos);
Class* getNext();
string getName();
void printAllNext(){ //print all next Class objects including himself
cout << _name <<endl;
if (_next != nullptr) {
(*_next).printAllNext();
}
}
private:
Class* _next;
string _name;
};
class Container{
public:
Container(){
for (int i = 0; i < 10; ++i) {
Class c;
c.setName(i);
if (i > 0) {
c.setNext(_vec,i-1);
}
_vec.push_back(c);
};
~Container();
void printFromVec(const int& n){//print all objects of _vec starting from n;
_vec[n].printAllNext();
};
private:
vector<Class> _vec;
};
int main() {
Container c;
c.printFromVec(5);
}
The problem is that all _next pointers of Class objects are undefined or random.
I think the problem is with this part of code:
class Container{
public:
Container(){
for (int i = 0; i < 10; ++i) {
Class c;
c.setName(i);
if (i > 0) {
c.setNext(_vec,i-1);
}
_vec.push_back(c);
};
Debugging I noticed that pointers of already created objects change their values.
What is the problem? How can I make it work?
Although there is really error in the code (likely wrong copypaste), the problem is really following: std::vector maintains inside dynamically allocated array of objects. It starts with certain initial size. When you push to vector, it fills entries of array. When all entries are filled but you attempt pushing more elements, vector allocates bigger chunk of memory and moves or copies (whichever you element data type supports) objects to a new memory location. That's why address of object changes.
Now some words on what to do.
Solution 1. Use std::list instead of std::vector. std::list is double linked list, and element, once added to list, will be part of list item and will not change its address, there is no reallocation.
Solution 2. Use vector of shared pointers. In this case you will need to allocate each object dynamically and put address into shared pointer object, you can do both at once by using function std::make_shared(). Then you push shared pointer to vector, and store std::weak_ptr as pointer to previous/next one.
Solution 3. If you know maximum number of elements in vector you may ever have, you can leave all as is, but do one extra thing before pushing very first time - call reserve() on vector with max number of elements as parameters. Vector will allocate array of that size and keep it until it is filled and more space needed. But since you allocated maximum possible size you expect to ever have, reallocation should never happen, and so addresses of objects will remain same.
Choose whichever solution you think fits most for your needs.
#ivan.ukr Offered a number of solutions for keeping the pointers stable. However, I believe that is the wrong problem to solve.
Why do we need stable pointers? So that Class objects can point to the previous object in a container.
Why do we need the pointers to previous? So we can iterate backwards.
That’s the real problem: iterating backwards from a point in the container. The _next pointer is an incomplete solution to the real problem which is iteration.
If you want to iterate a vector, use iterators. You can read about them on the cppreference page for std::vector. I don’t want to write the code for you but I’ll give you some hints.
To get an iterator referring to the ith element, use auto iter = _vec.begin() + i;.
To print the object that this iterator refers to, use iter->print() (you’ll have to rename printAllNext to print and have it just print this object).
To move an iterator backwards, use --iter.
To check if an iterator refers to the first element, use iter == _vec.begin().
You could improve this further by using reverse iterators but I’ll leave that up to you.

How to create an auxiliary data structure to keep track of heap indices in a minheap for the decrease_key operation in c++

I think this is probably a trivial problem to solve but I have been struggling with this for past few days.
I have the following vector: v = [7,3,16,4,2,1]. I was able to implement with some help from google simple minheap algorithm to get the smallest element in each iteration. After extraction of the minimum element, I need to decrease the values of some of the elements and then bubble them up.
The issue I am having is that I want find the elements whose value has to be reduced in the heap in constant time, then reduce that value and then bubble it up.
After the heapify operation, the heap_vector v_h looks like this: v_h = [1,2,7,4,3,16]. When I remove the min element 1, then the heap vector becomes, [2,3,7,4,16]. But before we do the swap and bubble up, say I want to change the values of 7 to 4, 16 to 4 and 4 to 3.5 . But I am not sure where they will be in the heap. The indices of values of the elements that have to be decreased will be given with respect to the original vector v. I figured out that I need to have an auxiliary data structure that can keep track of the heap indices in relation to the original order of the elements (the heap index vector should look like h_iv = [2,4,5,3,1,0] after all the elements have been inserted into the minheap. And whenever an element is deleted from the minheap, the heap_index should be -1. I created a vector to try to update the heap indices whenever there is a change but I am unable to do it.
I am pasting my work here and also at https://onlinegdb.com/SJR4LqQO4
Some of the work I had tried is commented out. I am unable to map the heap indices when there is a swap in the bubble up or bubble down operations. I will be very grateful to anyone who can lead me in a direction to solve my problem. Please also let me know if I have to rethink some of my logic.
The .hpp file
#ifndef minheap_hpp
#define minheap_hpp
#include <stdio.h>
// #include "helper.h"
#include <vector>
class minheap
{
public:
std::vector<int> vect;
std::vector<int> heap_index;
void bubble_down(int index);
void bubble_up(int index);
void Heapify();
public:
minheap(const std::vector<int>& input_vector);
minheap();
void insert(int value);
int get_min();
void delete_min();
void print_heap_vector();
};
#endif /* minheap_hpp */
The .cpp file
#include "minheap.hpp"
minheap::minheap(const std::vector<int>& input_vector) : vect(input_vector)
{
Heapify();
}
void minheap::Heapify()
{
int length = static_cast<int>(vect.size());
// auto start = 0;
// for (auto i = 0; i < vect.size(); i++){
// heap_index.push_back(start);
// start++;
// }
for(int i=length/2-1; i>=0; --i)
{
bubble_down(i);
}
}
void minheap::bubble_down(int index)
{
int length = static_cast<int>(vect.size());
int leftChildIndex = 2*index + 1;
int rightChildIndex = 2*index + 2;
if(leftChildIndex >= length){
return;
}
int minIndex = index;
if(vect[index] > vect[leftChildIndex])
{
minIndex = leftChildIndex;
}
if((rightChildIndex < length) && (vect[minIndex] > vect[rightChildIndex]))
{
minIndex = rightChildIndex;
}
if(minIndex != index)
{
std::swap(vect[index], vect[minIndex]);
// std::cout << "swap " << index << " - " << minIndex << "\n";
// auto a = heap_index[heap_index[index]];
// auto b = heap_index[heap_index[minIndex]];
// heap_index[a] = b;
// heap_index[b] = a;
// print_vector(heap_index);
bubble_down(minIndex);
}
}
void minheap::bubble_up(int index)
{
if(index == 0)
return;
int par_index = (index-1)/2;
if(vect[par_index] > vect[index])
{
std::swap(vect[index], vect[par_index]);
bubble_up(par_index);
}
}
void minheap::insert(int value)
{
int length = static_cast<int>(vect.size());
vect.push_back(value);
bubble_up(length);
}
int minheap::get_min()
{
return vect[0];
}
void minheap::delete_min()
{
int length = static_cast<int>(vect.size());
if(length == 0)
{
return;
}
vect[0] = vect[length-1];
vect.pop_back();
bubble_down(0);
}
void minheap::print_heap_vector(){
// print_vector(vect);
}
and the main file
#include <iostream>
#include <iostream>
#include "minheap.hpp"
int main(int argc, const char * argv[]) {
std::vector<int> vec {7, 3, 16, 4, 2, 1};
minheap mh(vec);
// mh.print_heap_vector();
for(int i=0; i<3; ++i)
{
auto a = mh.get_min();
mh.delete_min();
// mh.print_heap_vector();
std::cout << a << "\n";
}
// std::cout << "\n";
return 0;
}
"I want to change the values of 7 to 4, 16 to 4 and 4 to 3.5 . But I am not sure where they will be in the heap. The indices of values of the elements that have to be decreased will be given with respect to the original vector v. ... Please also let me know if I have to rethink some of my logic."
Rather than manipulate the values inside the heap, I would suggest keeping the values that need changing inside a vector (possibly v itself). The heap could be based on elements that are a struct (or class) that holds an index into the corresponding position in the vector with the values, rather than hold the (changing) value itself.
The struct (or class) would implement an operator< function that compares the values retrieved from the two vector locations for the respective index values. So, instead of storing the comparison value in the heap elements and comparing a < b, you would store index positions i and j and so on and compare v[i] < v[j] for the purpose of heap ordering.
In this way, the positions of the numerical values you need to update will never change from their original positions. The position information will never go stale (as I understand it from your description).
Of course, when you make changes to those stored values in the vector, that could easily invalidate any ordering that might have existed in the heap itself. As I understand your description, that much was necessarily true in any case. Therefore, depending on how you change the values, you might need to do a fresh make_heap to restore proper heap ordering. (That isn't clear, since it depends on whether your intended changes violate heap assumptions, but it would be a safe thing to assume unless there are strong assurances otherwise.)
I think the rest is pretty straight forward. You can still operate the heap as you intended before. For ease you might even give the struct (or class) a lookup function to return the current value at it's corresponding position in the vector, if you need that (rather than the index) as you pop out minimum values.
p.s. Here is a variation on the same idea.
In the original version above, one would likely need to also store a pointer to the location of the vector that held the vector of values, possibly as a shared static pointer of that struct (or class) so that all the members could dereference the pointer to that vector in combination with the index values to look up the particular member associated with that element.
If you prefer, instead of storing that shared vector pointer and an index in each member, each struct (or class) instance could more simply store a pointer (or iterator) directly to the corresponding value's location. If the values are integers, the heap element struct's member value could be int pointer. While each pointer might be larger than an index value, this does have the advantage that it eliminates any assumption about the data structure that holds the compared values and it is even simpler/faster to dereference vs. lookup with an index into the vector. (Both are constant time.)
One caution: In this alternate approach, the pointer values would be invalidated if you were to cause the vector's storage positions to change, e.g. by pushing in new values and expanding it in a way that forces it to reallocate it's space. I'm assuming you only need to change values, not expand the number of values after you've begun to use the heap. But if you did need to do that, that would be one reason to prefer index values, since they remain valid after expanding the vector (unlike pointers).
p.p.s. This technique is also valuable when the objects that you want to compare in the heap are large. Rather than have the heap perform many copy operations on large objects as it reorders the positions of the heap elements, by storing only pointers (or index values) the copying is much more efficient. In fact, this makes it possible to use heaps on objects that you might not want to copy at all.
Here is a quick idea of one version of the comparison function (with some class context now added).
class YourHeapElementClassName
{
public:
// constructor
explicit YourHeapElementClassName(theTypeOfYourComparableValueOrObject & val)
: m_valPointer(&val)
{
}
bool operator<(const YourHeapElementClassName & other) const
{
return *m_valPointer < *(other.m_valPointer);
}
...
private:
theTypeOfYourComparableValueOrObject * m_valPointer;
}; // YourHeapElementClassName
// and later instead of making a heap of int or double,
// you make a heap of YourHeapElementClassName objects
// that you initialize so each points to a value in v
// by using the constructor above with each v member.
// If you (probably) don't need to change the v values
// through these heap objects, the member value could be
// a pointer to a const value and the constructor could
// have a const reference argument for the original value.
If you had need to do this with different types of values or objects, the pointer approach could be implemented with a template that generalizes on the type of value or object and holds a pointer to that general type.

What really is a deque in STL?

I was looking at STL containers and trying to figure what they really are (i.e. the data structure used), and the deque stopped me: I thought at first that it was a double linked list, which would allow insertion and deletion from both ends in constant time, but I am troubled by the promise made by the operator [] to be done in constant time. In a linked list, arbitrary access should be O(n), right?
And if it's a dynamic array, how can it add elements in constant time? It should be mentioned that reallocation may happen, and that O(1) is an amortized cost, like for a vector.
So I wonder what is this structure that allows arbitrary access in constant time, and at the same time never needs to be moved to a new bigger place.
A deque is somewhat recursively defined: internally it maintains a double-ended queue of chunks of fixed size. Each chunk is a vector, and the queue (“map” in the graphic below) of chunks itself is also a vector.
There’s a great analysis of the performance characteristics and how it compares to the vector over at CodeProject.
The GCC standard library implementation internally uses a T** to represent the map. Each data block is a T* which is allocated with some fixed size __deque_buf_size (which depends on sizeof(T)).
From overview, you can think deque as a double-ended queue
The datas in deque are stored by chuncks of fixed size vector, which are
pointered by a map(which is also a chunk of vector, but its size may change)
The main part code of the deque iterator is as below:
/*
buff_size is the length of the chunk
*/
template <class T, size_t buff_size>
struct __deque_iterator{
typedef __deque_iterator<T, buff_size> iterator;
typedef T** map_pointer;
// pointer to the chunk
T* cur;
T* first; // the begin of the chunk
T* last; // the end of the chunk
//because the pointer may skip to other chunk
//so this pointer to the map
map_pointer node; // pointer to the map
}
The main part code of the deque is as below:
/*
buff_size is the length of the chunk
*/
template<typename T, size_t buff_size = 0>
class deque{
public:
typedef T value_type;
typedef T& reference;
typedef T* pointer;
typedef __deque_iterator<T, buff_size> iterator;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
protected:
typedef pointer* map_pointer;
// allocate memory for the chunk
typedef allocator<value_type> dataAllocator;
// allocate memory for map
typedef allocator<pointer> mapAllocator;
private:
//data members
iterator start;
iterator finish;
map_pointer map;
size_type map_size;
}
Below i will give you the core code of deque, mainly about three parts:
iterator
How to construct a deque
1. iterator(__deque_iterator)
The main problem of iterator is, when ++, -- iterator, it may skip to other chunk(if it pointer to edge of chunk). For example, there are three data chunks: chunk 1,chunk 2,chunk 3.
The pointer1 pointers to the begin of chunk 2, when operator --pointer it will pointer to the end of chunk 1, so as to the pointer2.
Below I will give the main function of __deque_iterator:
Firstly, skip to any chunk:
void set_node(map_pointer new_node){
node = new_node;
first = *new_node;
last = first + chunk_size();
}
Note that, the chunk_size() function which compute the chunk size, you can think of it returns 8 for simplify here.
operator* get the data in the chunk
reference operator*()const{
return *cur;
}
operator++, --
// prefix forms of increment
self& operator++(){
++cur;
if (cur == last){ //if it reach the end of the chunk
set_node(node + 1);//skip to the next chunk
cur = first;
}
return *this;
}
// postfix forms of increment
self operator++(int){
self tmp = *this;
++*this;//invoke prefix ++
return tmp;
}
self& operator--(){
if(cur == first){ // if it pointer to the begin of the chunk
set_node(node - 1);//skip to the prev chunk
cur = last;
}
--cur;
return *this;
}
self operator--(int){
self tmp = *this;
--*this;
return tmp;
}
iterator skip n steps / random access
self& operator+=(difference_type n){ // n can be postive or negative
difference_type offset = n + (cur - first);
if(offset >=0 && offset < difference_type(buffer_size())){
// in the same chunk
cur += n;
}else{//not in the same chunk
difference_type node_offset;
if (offset > 0){
node_offset = offset / difference_type(chunk_size());
}else{
node_offset = -((-offset - 1) / difference_type(chunk_size())) - 1 ;
}
// skip to the new chunk
set_node(node + node_offset);
// set new cur
cur = first + (offset - node_offset * chunk_size());
}
return *this;
}
// skip n steps
self operator+(difference_type n)const{
self tmp = *this;
return tmp+= n; //reuse operator +=
}
self& operator-=(difference_type n){
return *this += -n; //reuse operator +=
}
self operator-(difference_type n)const{
self tmp = *this;
return tmp -= n; //reuse operator +=
}
// random access (iterator can skip n steps)
// invoke operator + ,operator *
reference operator[](difference_type n)const{
return *(*this + n);
}
2. How to construct a deque
common function of deque
iterator begin(){return start;}
iterator end(){return finish;}
reference front(){
//invoke __deque_iterator operator*
// return start's member *cur
return *start;
}
reference back(){
// cna't use *finish
iterator tmp = finish;
--tmp;
return *tmp; //return finish's *cur
}
reference operator[](size_type n){
//random access, use __deque_iterator operator[]
return start[n];
}
template<typename T, size_t buff_size>
deque<T, buff_size>::deque(size_t n, const value_type& value){
fill_initialize(n, value);
}
template<typename T, size_t buff_size>
void deque<T, buff_size>::fill_initialize(size_t n, const value_type& value){
// allocate memory for map and chunk
// initialize pointer
create_map_and_nodes(n);
// initialize value for the chunks
for (map_pointer cur = start.node; cur < finish.node; ++cur) {
initialized_fill_n(*cur, chunk_size(), value);
}
// the end chunk may have space node, which don't need have initialize value
initialized_fill_n(finish.first, finish.cur - finish.first, value);
}
template<typename T, size_t buff_size>
void deque<T, buff_size>::create_map_and_nodes(size_t num_elements){
// the needed map node = (elements nums / chunk length) + 1
size_type num_nodes = num_elements / chunk_size() + 1;
// map node num。min num is 8 ,max num is "needed size + 2"
map_size = std::max(8, num_nodes + 2);
// allocate map array
map = mapAllocator::allocate(map_size);
// tmp_start,tmp_finish poniters to the center range of map
map_pointer tmp_start = map + (map_size - num_nodes) / 2;
map_pointer tmp_finish = tmp_start + num_nodes - 1;
// allocate memory for the chunk pointered by map node
for (map_pointer cur = tmp_start; cur <= tmp_finish; ++cur) {
*cur = dataAllocator::allocate(chunk_size());
}
// set start and end iterator
start.set_node(tmp_start);
start.cur = start.first;
finish.set_node(tmp_finish);
finish.cur = finish.first + num_elements % chunk_size();
}
Let's assume i_deque has 20 int elements 0~19 whose chunk size is 8, and now push_back 3 elements (0, 1, 2) to i_deque:
i_deque.push_back(0);
i_deque.push_back(1);
i_deque.push_back(2);
It's internal structure like below:
Then push_back again, it will invoke allocate new chunk:
push_back(3)
If we push_front, it will allocate new chunk before the prev start
Note when push_back element into deque, if all the maps and chunks are filled, it will cause allocate new map, and adjust chunks.But the above code may be enough for you to understand deque.
Imagine it as a vector of vectors. Only they aren't standard std::vectors.
The outer vector contains pointers to the inner vectors. When its capacity is changed via reallocation, rather than allocating all of the empty space to the end as std::vector does, it splits the empty space to equal parts at the beginning and the end of the vector. This allows push_front and push_back on this vector to both occur in amortized O(1) time.
The inner vector behavior needs to change depending on whether it's at the front or the back of the deque. At the back it can behave as a standard std::vector where it grows at the end, and push_back occurs in O(1) time. At the front it needs to do the opposite, growing at the beginning with each push_front. In practice this is easily achieved by adding a pointer to the front element and the direction of growth along with the size. With this simple modification push_front can also be O(1) time.
Access to any element requires offsetting and dividing to the proper outer vector index which occurs in O(1), and indexing into the inner vector which is also O(1). This assumes that the inner vectors are all fixed size, except for the ones at the beginning or the end of the deque.
(This is an answer I've given in another thread. Essentially I'm arguing that even fairly naive implementations, using a single vector, conform to the requirements of "constant non-amortized push_{front,back}". You might be surprised, and think this is impossible, but I have found other relevant quotes in the standard that define the context in a surprising way. Please bear with me; if I have made a mistake in this answer, it would be very helpful to identify which things I have said correctly and where my logic has broken down. )
In this answer, I am not trying to identify a good implementation, I'm merely trying to help us to interpret the complexity requirements in the C++ standard. I'm quoting from N3242, which is, according to Wikipedia, the latest freely available C++11 standardization document. (It appears to be organized differently from the final standard, and hence I won't quote the exact page numbers. Of course, these rules might have changed in the final standard, but I don't think that has happened.)
A deque<T> could be implemented correctly by using a vector<T*>. All the elements are copied onto the heap and the pointers stored in a vector. (More on the vector later).
Why T* instead of T? Because the standard requires that
"An insertion at either end of the deque invalidates all the iterators
to the deque, but has no effect on the validity of references to
elements of the deque."
(my emphasis). The T* helps to satisfy that. It also helps us to satisfy this:
"Inserting a single element either at the beginning or end of a deque always ..... causes a single call to a constructor of T."
Now for the (controversial) bit. Why use a vector to store the T*? It gives us random access, which is a good start. Let's forget about the complexity of vector for a moment and build up to this carefully:
The standard talks about "the number of operations on the contained objects.". For deque::push_front this is clearly 1 because exactly one T object is constructed and zero of the existing T objects are read or scanned in any way. This number, 1, is clearly a constant and is independent of the number of objects currently in the deque. This allows us to say that:
'For our deque::push_front, the number of operations on the contained objects (the Ts) is fixed and is independent of the number of objects already in the deque.'
Of course, the number of operations on the T* will not be so well-behaved. When the vector<T*> grows too big, it'll be realloced and many T*s will be copied around. So yes, the number of operations on the T* will vary wildly, but the number of operations on T will not be affected.
Why do we care about this distinction between counting operations on T and counting operations on T*? It's because the standard says:
All of the complexity requirements in this clause are stated solely in terms of the number of operations on the contained objects.
For the deque, the contained objects are the T, not the T*, meaning we can ignore any operation which copies (or reallocs) a T*.
I haven't said much about how a vector would behave in a deque. Perhaps we would interpret it as a circular buffer (with the vector always taking up its maximum capacity(), and then realloc everything into a bigger buffer when the vector is full. The details don't matter.
In the last few paragraphs, we have analyzed deque::push_front and the relationship between the number of objects in the deque already and the number of operations performed by push_front on contained T-objects. And we found they were independent of each other. As the standard mandates that complexity is in terms of operations-on-T, then we can say this has constant complexity.
Yes, the Operations-On-T*-Complexity is amortized (due to the vector), but we're only interested in the Operations-On-T-Complexity and this is constant (non-amortized).
The complexity of vector::push_back or vector::push_front is irrelevant in this implementation; those considerations involve operations on T* and hence are irrelevant. If the standard was referring to the 'conventional' theoretical notion of complexity, then they wouldn't have explicitly restricted themselves to the "number of operations on the contained objects". Am I overinterpreting that sentence?
deque = double ended queue
A container which can grow in either direction.
Deque is typically implemented as a vector of vectors (a list of vectors can't give constant time random access). While the size of the secondary vectors is implementation dependent, a common algorithm is to use a constant size in bytes.
I was reading "Data structures and algorithms in C++" by Adam Drozdek, and found this useful.
HTH.
A very interesting aspect of STL deque is its implementation. An STL deque is not implemented as a linked list but as an array of pointers to blocks or arrays of data. The number of blocks changes dynamically depending on storage needs, and the size of the array of pointers changes accordingly.
You can notice in the middle is the array of pointers to the data (chunks on the right), and also you can notice that the array in the middle is dynamically changing.
An image is worth a thousand words.
While the standard doesn't mandate any particular implementation (only constant-time random access), a deque is usually implemented as a collection of contiguous memory "pages". New pages are allocated as needed, but you still have random access. Unlike std::vector, you're not promised that data is stored contiguously, but like vector, insertions in the middle require lots of relocating.
deque could be implemented as a circular buffer of fixed size array:
Use circular buffer so we could grow/shrink at both end by adding/removing a fixed sized array with O(1) complexity
Use fixed sized array so it is easy to calculate the index, hence access via index with two pointer dereferences - also O(1)

How is vector implemented in C++

I am thinking of how I can implement std::vector from the ground up.
How does it resize the vector?
realloc only seems to work for plain old stucts, or am I wrong?
it is a simple templated class which wraps a native array. It does not use malloc/realloc. Instead, it uses the passed allocator (which by default is std::allocator).
Resizing is done by allocating a new array and copy constructing each element in the new array from the old one (this way it is safe for non-POD objects). To avoid frequent allocations, often they follow a non-linear growth pattern.
UPDATE: in C++11, the elements will be moved instead of copy constructed if it is possible for the stored type.
In addition to this, it will need to store the current "size" and "capacity". Size is how many elements are actually in the vector. Capacity is how many could be in the vector.
So as a starting point a vector will need to look somewhat like this:
template <class T, class A = std::allocator<T> >
class vector {
public:
// public member functions
private:
T* data_;
typename A::size_type capacity_;
typename A::size_type size_;
A allocator_;
};
The other common implementation is to store pointers to the different parts of the array. This cheapens the cost of end() (which no longer needs an addition) ever so slightly at the expense of a marginally more expensive size() call (which now needs a subtraction). In which case it could look like this:
template <class T, class A = std::allocator<T> >
class vector {
public:
// public member functions
private:
T* data_; // points to first element
T* end_capacity_; // points to one past internal storage
T* end_; // points to one past last element
A allocator_;
};
I believe gcc's libstdc++ uses the latter approach, but both approaches are equally valid and conforming.
NOTE: This is ignoring a common optimization where the empty base class optimization is used for the allocator. I think that is a quality of implementation detail, and not a matter of correctness.
Resizing the vector requires allocating a new chunk of space, and copying the existing data to the new space (thus, the requirement that items placed into a vector can be copied).
Note that it does not use new [] either -- it uses the allocator that's passed, but that's required to allocate raw memory, not an array of objects like new [] does. You then need to use placement new to construct objects in place. [Edit: well, you could technically use new char[size], and use that as raw memory, but I can't quite imagine anybody writing an allocator like that.]
When the current allocation is exhausted and a new block of memory needs to be allocated, the size must be increased by a constant factor compared to the old size to meet the requirement for amortized constant complexity for push_back. Though many web sites (and such) call this doubling the size, a factor around 1.5 to 1.6 usually works better. In particular, this generally improves chances of re-using freed blocks for future allocations.
From Wikipedia, as good an answer as any.
A typical vector implementation consists, internally, of a pointer to
a dynamically allocated array,[2] and possibly data members holding
the capacity and size of the vector. The size of the vector refers to
the actual number of elements, while the capacity refers to the size
of the internal array. When new elements are inserted, if the new size
of the vector becomes larger than its capacity, reallocation
occurs.[2][4] This typically causes the vector to allocate a new
region of storage, move the previously held elements to the new region
of storage, and free the old region. Because the addresses of the
elements change during this process, any references or iterators to
elements in the vector become invalidated.[5] Using an invalidated
reference causes undefined behaviour
Like this:
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_vector.h
(official gcc mirror on github)
///Implement Vector class
class MyVector {
int *int_arr;
int capacity;
int current;
public:
MyVector() {
int_arr = new int[1];
capacity = 1;
current = 0;
}
void Push(int nData);
void PushData(int nData, int index);
void PopData();
int GetData(int index);
int GetSize();
void Print();
};
void MyVector::Push(int data)
{
if (current == capacity){
int *temp = new int[2 * capacity];
for (int i = 0; i < capacity; i++)
{
temp[i] = int_arr[i];
}
delete[] int_arr;
capacity *= 2;
int_arr = temp;
}
int_arr[current] = data;
current++;
}
void MyVector::PushData(int data, int index)
{
if (index == capacity){
Push(index);
}
else
int_arr[index] = data;
}
void MyVector::PopData(){
current--;
}
int MyVector::GetData(int index)
{
if (index < current){
return int_arr[index];
}
}
int MyVector::GetSize()
{
return current;
}
void MyVector::Print()
{
for (int i = 0; i < current; i++) {
cout << int_arr[i] << " ";
}
cout << endl;
}
int main()
{
MyVector vect;
vect.Push(10);
vect.Push(20);
vect.Push(30);
vect.Push(40);
vect.Print();
std::cout << "\nTop item is "
<< vect.GetData(3) << std::endl;
vect.PopData();
vect.Print();
cout << "\nTop item is "
<< vect.GetData(1) << endl;
return 0;
}
It allocates a new array and copies everything over. So, expanding it is quite inefficient if you have to do it often. Use reserve() if you have to use push_back().
You'd need to define what you mean by "plain old structs."
realloc by itself only creates a block of uninitialized memory. It does no object allocation. For C structs, this suffices, but for C++ it does not.
That's not to say you couldn't use realloc. But if you were to use it (note you wouldn't be reimplementing std::vector exactly in this case!), you'd need to:
Make sure you're consistently using malloc/realloc/free throughout your class.
Use "placement new" to initialize objects in your memory chunk.
Explicitly call destructors to clean up objects before freeing your memory chunk.
This is actually pretty close to what vector does in my implementation (GCC/glib), except it uses the C++ low-level routines ::operator new and ::operator delete to do the raw memory management instead of malloc and free, rewrites the realloc routine using these primitives, and delegates all of this behavior to an allocator object that can be replaced with a custom implementation.
Since vector is a template, you actually should have its source to look at if you want a reference – if you can get past the preponderance of underscores, it shouldn't be too hard to read. If you're on a Unix box using GCC, try looking for /usr/include/c++/version/vector or thereabouts.
You can implement them with resizing array implementation.
When the array becomes full, create an array with twice as much the size and copy all the content to the new array. Do not forget to delete the old array.
As for deleting the elements from vector, do resizing when your array becomes a quarter full. This strategy makes prevents any performance glitches when one might try repeated insertion and deletion at half the array size.
It can be mathematically proved that the amortized time (Average time) for insertions is still linear for n insertions which is asymptotically the same as you will get with a normal static array.
realloc only works on heap memory. In C++ you usually want to use the free store.