Let's say I have a vector of long long elements:
std::vector<long long>V {1, 2, 3};
I have seen that in order to iterate over the vector you can do this:
for (auto i = v.begin() ; i != v.end() ; i++)
cout<<*i;
i++ means i grows by 1, but shouldn't the address go up 8 bytes to print the next element? So "growing" part of the for loop(for any type) should look like this:
i += sizeof(v[0]);
I'm assuming an address can hold 1 byte , so if the starting address of an integer would be 1000 , then its total allocation will be represented by adresses:1000 , 1001 , 1002 , 1003.I would like to understand memory better so I'd be thankful if you could help me.
When you increment a pointer it goes up by the "size of" the pointer's type. Remember, for a given pointer i, i[0] is the first element and is equivalent to *(i + 1).
Iterators tend to work in a very similar fashion so their operation is familiar, they feel like pointers due to how operator* and operator-> are implemented.
In other words, the meaning of i++ depends entirely on what i is and what operator++ will do on that type. Seeing ++ does not automatically mean +1. For iterators it has a very specific meaning, and that depends entirely on the type.
For std::vector it moves a pointer up to the next entry. For other structures it might navigate a linked list. You could write your own class where it makes a database call, reads a file from disk, or basically whatever you want.
Now if you do i += sizeof(v[0]) instead then you're moving up an arbitrary number of places in your array. It depends on what the size of that entry is, which depends a lot on your ISA.
std::vector is really simple, it's just a straight up block of memory treated like an array. You can even get a pointer to this via the data() function.
In other words think of i++ as "move up one index" not "move up one byte".
One of the nice things about the way you've written this code is that you don't need to worry about bytes.
Your auto type for i is masking the fact that it's a vector iterator, in other words, a special type that is designed for accessing members of a vector of long long. This means that *i evaluates to a long long pulled from whichever member of the vector i is currently indexing, so i++ advances i to index the next member of the vector.
Now, in terms of low level implementation, the values 1, 2, 3 are probably sitting in adjacent 8-byte blocks of memory, and i is probably implemented as a pointer, and incrementing i is probably implemented by adding 8 to its value, but each of those assumptions is going to depend on your architecture and the implementation of your compiler. And as I said, it's something you probably don't need to worry about.
Related
My question is simple: are std::vector elements guaranteed to be contiguous? In other words, can I use the pointer to the first element of a std::vector as a C-array?
If my memory serves me well, the C++ standard did not make such guarantee. However, the std::vector requirements were such that it was virtually impossible to meet them if the elements were not contiguous.
Can somebody clarify this?
Example:
std::vector<int> values;
// ... fill up values
if( !values.empty() )
{
int *array = &values[0];
for( int i = 0; i < values.size(); ++i )
{
int v = array[i];
// do something with 'v'
}
}
This was missed from C++98 standard proper but later added as part of a TR. The forthcoming C++0x standard will of course contain this as a requirement.
From n2798 (draft of C++0x):
23.2.6 Class template vector [vector]
1 A vector is a sequence container that supports random access iterators. In addition, it supports (amortized)
constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage
management is handled automatically, though hints can be given to improve efficiency. The elements of a
vector are stored contiguously, meaning that if v is a vector where T is some type other
than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
As other answers have pointed out, the contents of a vector is guaranteed to be continuous (excepting bool's weirdness).
The comment that I wanted to add, is that if you do an insertion or a deletion on the vector, which could cause the vector to reallocate it's memory, then you will cause all of your saved pointers and iterators to be invalidated.
The standard does in fact guarantee that a vector is continuous in memory and that &a[0] can be passed to a C function that expects an array.
The exception to this rule is vector<bool> which only uses one bit per bool thus although it does have continuous memory it can't be used as a bool* (this is widely considered to be a false optimization and a mistake).
BTW, why don't you use iterators? That's what they're for.
As other's have already said, vector internally uses a contiguous array of objects. Pointers into that array should be treated as invalid whenever any non-const member function is called IIRC.
However, there is an exception!!
vector<bool> has a specialised implementation designed to save space, so that each bool only uses one bit. The underlying array is not a contiguous array of bool and array arithmetic on vector<bool> doesn't work like vector<T> would.
(I suppose it's also possible that this may be true of any specialisation of vector, since we can always implement a new one. However, std::vector<bool> is the only, err, standard specialisation upon which simple pointer arithmetic won't work.)
I found this thread because I have a use case where vectors using contiguous memory is an advantage.
I am learning how to use vertex buffer objects in OpenGL. I created a wrapper class to contain the buffer logic, so all I need to do is pass an array of floats and a few config values to create the buffer.
I want to be able to generate a buffer from a function based on user input, so the length is not known at compile time. Doing something like this would be the easiest solution:
void generate(std::vector<float> v)
{
float f = generate_next_float();
v.push_back(f);
}
Now I can pass the vector's floats as an array to OpenGL's buffer-related functions. This also removes the need for sizeof to determine the length of the array.
This is far better than allocating a huge array to store the floats and hoping I made it big enough, or making my own dynamic array with contiguous storage.
cplusplus.com:
Vector containers are implemented as dynamic arrays; Just as regular arrays, vector containers have their elements stored in contiguous storage locations, which means that their elements can be accessed not only using iterators but also using offsets on regular pointers to elements.
Yes, the elements of a std::vector are guaranteed to be contiguous.
Why doesn't C++ support range based for loop over dynamic arrays? That is, something like this:
int* array = new int[len];
for[] (int i : array) {};
I just invented the for[] statement to rhyme with new[] and delete[]. As far as I understand, the runtime has the size of the array available (otherwise delete[] could not work) so in theory, range based for loop could also be made to work. What is the reason that it's not made to work?
What is the reason that it's not made to work?
A range based loop like
for(auto a : y) {
// ...
}
is just syntactic sugar for the following expression
auto endit = std::end(y);
for(auto it = std::begin(y); it != endit; ++it) {
auto a = *it;
// ...
}
Since std::begin() and std::end() cannot be used with a plain pointer, this can't be applied with a pointer allocated with new[].
As far as I understand, the runtime has the size of the array available (otherwise delete[] could not work)
How delete[] keeps track of the memory block that was allocated with new[] (which isn't necessarily the same size as was specified by the user), is a completely different thing and the compiler most probably doesn't even know how exactly this is implemented.
When you have this:
int* array = new int[len];
The problem here is that your variable called array is not an array at all. It is a pointer. That means it only contains the address of one object (in this case the first element of the array created using new).
For range based for to work the compiler needs two addresses, the beginning and the end of the array.
So the problem is the compiler does not have enough information to do this:
// array is only a pointer and does not have enough information
for(int i : array)
{
}
int* array = new int[len];
for[] (int i : array) {}
There are several points which must be addressed; I'll tackle them one at a time.
Does the run-time knows the size of the array?
In certain conditions, it must. As you pointed out, a call to delete[] will call the destructor of each element (in reserve order) and therefore must know how many there are.
However, by not specifying that the number of elements must be known, and accessible, the C++ standard allows an implementation to omit it whenever the call to the destructor is not required (std::is_trivially_destructible<T>::value evaluates to true).
Can the run-time distinguish between pointer and array?
In general, no.
When you have a pointer, it could point to anything:
a single item, or an item in an array,
the first item in an array, or any other,
an array on the stack, or an array on the heap,
just an array, or an array part of a larger object.
This is the reason what delete[] exists, and using delete here would be incorrect. With delete[], you the user state: this pointer points to the first item of a heap-allocated array.
The implementation can then assume that, for example, in the 8 bytes preceding this first item it can find the size of the array. Without you guaranteeing this, those 8 bytes could be anything.
Then, why not go all the way and create for[] (int i : array)?
There are two reasons:
As mentioned, today an implementation can elide the size on a number of elements; with this new for[] syntax, it would no longer be possible on a per-type basis.
It's not worth it.
Let us be honest, new[] and delete[] are relics of an older time. They are incredibly awkward:
the number of elements has to be known in advance, and cannot be changed,
the elements must be default constructible, or otherwise C-ish,
and unsafe to use:
the number of elements is inaccessible to the user.
There is generally no reason to use new[] and delete[] in modern C++. Most of the times a std::vector should be preferred; in the few instances where the capacity is superfluous, a std::dynarray is still better (because it keeps track of the size).
Therefore, without a valid reason to keep using these statements, there is no motivation to include new semantic constructs specifically dedicated to handling them.
And should anyone be motivated enough to make such a proposal:
the inhibition of the current optimization, a violation of C++ philosophy of "You don't pay for what you don't use", would likely be held against them,
the inclusion of new syntax, when modern C++ proposals have gone to great lengths to avoid it as much as possible (to the point of having a library defined std::variant), would also likely be held against them.
I recommend that you simply use std::vector.
This is not related to dynamic arrays, it is more general. Of course for dynamic arrays there exists somewhere the size to be able to call destructors (but remember that standard doesn't says anything about that, just that calling delete [] works as intended).
The problem is with pointers in general as given a pointer you can't tell if it correspond to any kind of...what?
Arrays decay to pointers but given a pointer what can you say?
array is not an array, but a pointer and there's no information about the size of the "array". So, compiler can not deduce begin and end of this array.
See the syntax of range based for loop:
{
auto && __range = range_expression ;
for (auto __begin = begin_expr, __end = end_expr;
__begin != __end; ++__begin) {
range_declaration = *__begin;
loop_statement
}
}
range_expression - any expression that represents a suitable sequence
(either an array or an object for which begin and end member functions
or free functions are defined, see below) or a braced-init-list.
auto works at compile time.So, begin_expr and end_expr doesn't at deduct runtime.
The reason is that, given only the value of the pointer array, the compiler (and your code) has no information about what it points at. The only thing known is that array has a value which is the address of a single int.
It could point at the first element of a statically allocated array. It could point at an element in the middle of a dynamically allocated array. It could point at a member of a data structure. It could point at an element of an array that is within a data structure. The list goes on.
Your code will make ASSUMPTIONS about what the pointer points at. It may assume it is an array of 50 elements. Your code may access the value of len, and assume array points at the (first element of) an array of len elements. If your code gets it right, all works as intended. If your code gets it wrong (e.g. accessing the 50th element of an array with 5 elements) then the behaviour is simply undefined. It is undefined because the possibilities are endless - the book-keeping to keep track of what an arbitrary pointer ACTUALLY points at (beyond the information that there is an int at that address) would be enormous.
You're starting with the ASSUMPTION that array points at the result from new int[len]. But that information is not stored in the value of array itself, so the compiler has no way to work back to a value of len. That would be needed for your "range based" approach to work.
While, yes, given array = new int[len], the machinery invoked by delete [] array will work out that array has len elements, and release them. But delete [] array also has undefined behaviour if array results from something other than a new [] expression. Even
int *array = new int;
delete [] array;
gives undefined behaviour. The "runtime" is not required to work out, in this case, that array is actually the address of a single dynamically allocated int (and not an actual array). So it is not required to cope with that.
I was working with std::reverse_iterator today and was thinking about how it works with values created by calling begin on a container. According to cppreference, if I have reverse_iterator r constructed from iterator i, the following has to hold &*r == &*(i-1).
However, this would mean that if I write this
std::vector<int> vec = {1, 2, 3, 4, 5};
auto iter = std::make_reverse_iterator(begin(vec));
iter now points to piece of memory that is placed before begin(vec), which is out of bounds. By strict interpretation of C++ standard, this invokes UB.
(There is specific provision for pointer/iterator to element 1-past-the-end of the array, but as far as I know, none for pointer/iterator to element 1-ahead-of-the-start of an array.)
So, am I reading the link wrong, or is there a specific provision in the standard for this case, or is it that when using reverse_iterator, the whole array is taken as reversed and as such, pointer to ahead of the array is actually pointer past the end?
Yes, you are reading it wrong.
There is no need for reverse-iterators to store pointers pointing before the start of an element.
To illustrate, take an array of 2 elements:
int a[2];
These are the forward-iterators:
a+0 a+1 a+2 // The last one is not dereferenceable
The reverse-iterators would be represented with these exact same values, in reverse order:
a+2 a+1 a+0 // The last one cannot be dereferenced
So, while dereferencing a normal iterator is really straightforward, a reverse-iterator-dereference is slightly more complicated: pointer[-1] (That's for random-access iterators, the others are worse: It copy = pointer; --copy; return *copy;).
Be aware that using forward-iterators is far more common than reverse-iterators, thus the former are more likely to have optimized code for them than the latter. Generic code which does not hit that corner is about as likely to run better with either type though, due to all the transformations a decent optimizing compiler does.
std::make_reverse_iterator(begin(vec)) is not dereferenceable, in the same way that end(vec) is not dereferenceable. It doesn't "point" to any valid object, and that's OK.
Is there a specific data structure that a deque in the C++ STL is supposed to implement, or is a deque just this vague notion of an array growable from both the front and the back, to be implemented however the implementation chooses?
I used to always assume a deque was a circular buffer, but I was recently reading a C++ reference here, and it sounds like a deque is some kind of array of arrays. It doesn't seem like it's a plain old circular buffer. Is it a gap buffer, then, or some other variant of growable array, or is it just implementation-dependent?
UPDATE AND SUMMARY OF ANSWERS:
It seems the general consensus is that a deque is a data structure such that:
the time to insert or remove an element should be constant at beginning or end of the list and at most linear elsewhere. If we interpret this to mean true constant time and not amortized constant time, as someone comments, this seems challenging. Some have argued that we should not interpret this to mean non-amortized constant time.
"A deque requires that any insertion shall keep any reference to a member element valid. It's OK for iterators to be invalidated, but the members themselves must stay in the same place in memory." As someone comments: This is easy enough by just copying the members to somewhere on the heap and storing T* in the data structure under the hood.
"Inserting a single element either at the beginning or end of a deque always takes constant time and causes a single call to a constructor of T." The single constructor of T will also be achieved if the data structure stores T* under the hood.
The data structure must have random access.
It seems no one knows how to get a combination of the 1st and 4th conditions if we take the first condition to be "non-amortized constant time". A linked list achieves 1) but not 4), whereas a typical circular buffer achieves 4) but not 1). I think I have an implementation that fulfills both below. Comments?
We start with an implementation someone else suggested: we allocate an array and start placing elements from the middle, leaving space in both the front and back. In this implementation, we keep track of how many elements there are from the center in both the front and back directions, call those values F and B. Then, let's augment this data structure with an auxiliary array that is twice the size of the original array (so now we're wasting a ton of space, but no change in asymptotic complexity). We will also fill this auxiliary array from its middle and give it similar values F' and B'. The strategy is this: every time we add one element to the primary array in a given direction, if F > F' or B > B' (depending on the direction), up to two values are copied from the primary array to the auxiliary array until F' catches up with F (or B' with B). So an insert operation involves putting 1 element into the primary array and copying up to 2 from the primary to the auxiliary, but it's still O(1). When the primary array becomes full, we free the primary array, make the auxiliary array the primary array, and make another auxiliary array that's yet 2 times bigger. This new auxiliary array starts out with F' = B' = 0 and having nothing copied to it (so the resize op is O(1) if a heap allocation is O(1) complexity). Since the auxiliary copies 2 elements for every element added to the primary and the primary starts out at most half-full, it is impossible for the auxiliary to not have caught up with the primary by the time the primary runs out of space again. Deletions likewise just need to remove 1 element from the primary and either 0 or 1 from the auxiliary. So, assuming heap allocations are O(1), this implementation fulfills condition 1). We make the array be of T* and use new whenever inserting to fulfill conditions 2) and 3). Finally, 4) is fulfilled because we are using an array structure and can easily implement O(1) access.
It's implementation specific. All a deque requires is constant time insertion/deletion at the start/end, and at most linear elsewhere. Elements are not required to be contiguous.
Most implementations use what can be described as an unrolled list. Fixed-sized arrays get allocated on the heap and pointers to these arrays are stored in a dynamically sized array belonging to the deque.
A deque is typically implemented as a dynamic array of arrays of T.
(a) (b) (c) (d)
+-+ +-+ +-+ +-+
| | | | | | | |
+-+ +-+ +-+ +-+
^ ^ ^ ^
| | | |
+---+---+---+---+
| 1 | 8 | 8 | 3 | (reference)
+---+---+---+---+
The arrays (a), (b), (c) and (d) are generally of fixed capacity, and the inner arrays (b) and (c) are necessarily full. (a) and (d) are not full, which gives O(1) insertion at both ends.
Imagining that we do a lot of push_front, (a) will fill up, when it's full and an insertion is performed we first need to allocate a new array, then grow the (reference) vector and push the pointer to the new array at the front.
This implementation trivially provides:
Random Access
Reference Preservation on push at both ends
Insertion in the middle that is proportional to min(distance(begin, it), distance(it, end)) (the Standard is slightly more stringent that what you required)
However it fails the requirement of amortized O(1) growth. Because the arrays have fixed capacity whenever the (reference) vector needs to grow, we have O(N/capacity) pointer copies. Because pointers are trivially copied, a single memcpy call is possible, so in practice this is mostly constant... but this is insufficient to pass with flying colors.
Still, push_front and push_back are more efficient than for a vector (unless you are using MSVC implementation which is notoriously slow because of very small capacity for the arrays...)
Honestly, I know of no data structure, or data structure combination, that could satisfy both:
Random Access
and
O(1) insertion at both ends
I do know a few "near" matches:
Amortized O(1) insertion can be done with a dynamic array in which you write in the middle, this is incompatible with the "reference preservation" semantics of the deque
A B+ Tree can be adapted to provide an access by index instead of by key, the times are close to constants, but the complexity is O(log N) for access and insertion (with a small constant), it requires using Fenwick Trees in the intermediate level nodes.
Finger Trees can be adapted similarly, once again it's really O(log N) though.
A deque<T> could be implemented correctly by using a vector<T*>. All the elements are copied onto the heap and the pointers stored in a vector. (More on the vector later).
Why T* instead of T? Because the standard requires that
"An insertion at either end of the deque invalidates all the iterators
to the deque, but has no effect on the validity of references to
elements of the deque."
(my emphasis). The T* helps to satisfy that. It also helps us to satisfy this:
"Inserting a single element either at the beginning or end of a deque always ..... causes a single call to a constructor of T."
Now for the (controversial) bit. Why use a vector to store the T*? It gives us random access, which is a good start. Let's forget about the complexity of vector for a moment and build up to this carefully:
The standard talks about "the number of operations on the contained objects.". For deque::push_front this is clearly 1 because exactly one T object is constructed and zero of the existing T objects are read or scanned in any way. This number, 1, is clearly a constant and is independent of the number of objects currently in the deque. This allows us to say that:
'For our deque::push_front, the number of operations on the contained objects (the Ts) is fixed and is independent of the number of objects already in the deque.'
Of course, the number of operations on the T* will not be so well-behaved. When the vector<T*> grows too big, it'll be realloced and many T*s will be copied around. So yes, the number of operations on the T* will vary wildly, but the number of operations on T will not be affected.
Why do we care about this distinction between counting operations on T and counting operations on T*? It's because the standard says:
All of the complexity requirements in this clause are stated solely in terms of the number of operations on the contained objects.
For the deque, the contained objects are the T, not the T*, meaning we can ignore any operation which copies (or reallocs) a T*.
I haven't said much about how a vector would behave in a deque. Perhaps we would interpret it as a circular buffer (with the vector always taking up its maximum capacity(), and then realloc everything into a bigger buffer when the vector is full. The details don't matter.
In the last few paragraphs, we have analyzed deque::push_front and the relationship between the number of objects in the deque already and the number of operations performed by push_front on contained T-objects. And we found they were independent of each other. As the standard mandates that complexity is in terms of operations-on-T, then we can say this has constant complexity.
Yes, the Operations-On-T*-Complexity is amortized (due to the vector), but we're only interested in the Operations-On-T-Complexity and this is constant (non-amortized).
Epilogue: the complexity of vector::push_back or vector::push_front is irrelevant in this implementation; those considerations involve operations on T* and hence is irrelevant.
(Making this answer a community-wiki. Please get stuck in.)
First things first: A deque requires that any insertion to the front or back shall keep any reference to a member element valid. It's OK for iterators to be invalidated, but the members themselves must stay in the same place in memory. This is easy enough by just copying the members to somewhere on the heap and storing T* in the data structure under the hood. See this other StackOverflow question " About deque<T>'s extra indirection "
(vector doesn't guarantee to preserve either iterators or references, whereas list preserves both).
So let's just take this 'indirection' for granted and look at the rest of the problem. The interesting bit is the time to insert or remove from the beginning or end of the list. At first, it looks like a deque could trivially be implemented with a vector, perhaps by interpreting it as a circular buffer.
BUT A deque must satisfy "Inserting a single element either at the beginning or end of a
deque always takes constant time and causes a single call to a constructor of T."
Thanks to the indirection we've already mentioned, it's easy to ensure there is just one constructor call, but the challenge is to guarantee constant time. It would be easy if we could just use constant amortized time, which would allow the simple vector implementation, but it must be constant (non-amortized) time.
My understanding of deque
It allocates 'n' empty contiguous objects from the heap as the first sub-array.
The objects in it are added exactly once by the head pointer on insertion.
When the head pointer comes to the end of an array, it
allocates/links a new non-contiguous sub-array and adds objects there.
They are removed exactly once by the tail pointer on extraction.
When the tail pointer finishes a sub-array of objects, it moves
on to the next linked sub-array, and deallocates the old.
The intermediate objects between the head and tail are never moved in memory by deque.
A random access first determines which sub-array has the
object, then access it from it's relative offset with in the subarray.
This is an answer to user gravity's challenge to comment on the 2-array-solution.
Some details are discussed here
A suggestion for improvement is given
Discussion of details:
The user "gravity" has already given a very neat summary. "gravity" also challenged us to comment on the suggestion of balancing the number of elements between two arrays in order to achieve O(1) worst case (instead of average case) runtime. Well, the solution works efficiently if both arrays are ringbuffers, and it appears to me that it is sufficient to split the deque into two segments, balanced as suggested.
I also think that for practical purposes the standard STL implementation is at least good enough, but under realtime requirements and with a properly tuned memory management one might consider using this balancing technique. There is also a different implementation given by Eric Demaine in an older Dr.Dobbs article, with similar worst case runtime.
Balancing the load of both buffers requires to move between 0 or 3 elements, depending on the situation. For instance, a pushFront(x) must, if we keep the front segment in the primary array, move the last 3 elements from the primary ring to the auxiliary ring in order to keep the required balance. A pushBack(x) at the rear must get hold of the load difference and then decide when it is time to move one element from the primary to the auxiliary array.
Suggestion for improvement:
There is less work and bookkeeping to do if front and rear are both stored in the auxiliary ring. This can be achieved by cutting the deque into three segments q1,q2,q3, arranged in the following manner: The front part q1 is in the auxiliary ring (the doubled-sized one) and may start at any offset from which the elements are arranged clockwise in subsequent order. The number of elements in q1 are exactly half of all elements stored in the auxiliary ring. The rear part q3 is also in the auxilary ring, located exactly opposite to part q1 in the auxilary ring, also clockwise in subsequent order. This invariant has to be kept between all deque operations. Only the middle part q2 is located (clockwise in subsequent order) in the primary ring.
Now, each operation will either move exactly one element, or allocate a new empty ringbuffer when either one gets empty. For instance, a pushFront(x) stores x before q1 in the auxilary ring. In order to keep the invariant, we move the last element from q2 to the front of the rear q3. So both, q1 and q3 get an additional element at their fronts and thus stay opposite to each other. PopFront() works the other way round, and the rear operations work the same way. The primary ring (same as the middle part q2) goes empty exactly when q1 and q3 touch each other and form a full circle of subsequent Elements within the auxiliary ring. Also, when the deque shrinks, q1,q3 will go empty exactly when q2 forms a proper circle in the primary ring.
The datas in deque are stored by chuncks of fixed size vector, which are
pointered by a map(which is also a chunk of vector, but its size may change)
The main part code of the deque iterator is as below:
/*
buff_size is the length of the chunk
*/
template <class T, size_t buff_size>
struct __deque_iterator{
typedef __deque_iterator<T, buff_size> iterator;
typedef T** map_pointer;
// pointer to the chunk
T* cur;
T* first; // the begin of the chunk
T* last; // the end of the chunk
//because the pointer may skip to other chunk
//so this pointer to the map
map_pointer node; // pointer to the map
}
The main part code of the deque is as below:
/*
buff_size is the length of the chunk
*/
template<typename T, size_t buff_size = 0>
class deque{
public:
typedef T value_type;
typedef T& reference;
typedef T* pointer;
typedef __deque_iterator<T, buff_size> iterator;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
protected:
typedef pointer* map_pointer;
// allocate memory for the chunk
typedef allocator<value_type> dataAllocator;
// allocate memory for map
typedef allocator<pointer> mapAllocator;
private:
//data members
iterator start;
iterator finish;
map_pointer map;
size_type map_size;
}
Below i will give you the core code of deque, mainly about two parts:
iterator
Simple function about deque
1. iterator(__deque_iterator)
The main problem of iterator is, when ++, -- iterator, it may skip to other chunk(if it pointer to edge of chunk). For example, there are three data chunks: chunk 1,chunk 2,chunk 3.
The pointer1 pointers to the begin of chunk 2, when operator --pointer it will pointer to the end of chunk 1, so as to the pointer2.
Below I will give the main function of __deque_iterator:
Firstly, skip to any chunk:
void set_node(map_pointer new_node){
node = new_node;
first = *new_node;
last = first + chunk_size();
}
Note that, the chunk_size() function which compute the chunk size, you can think of it returns 8 for simplify here.
operator* get the data in the chunk
reference operator*()const{
return *cur;
}
operator++, --
// prefix forms of increment
self& operator++(){
++cur;
if (cur == last){ //if it reach the end of the chunk
set_node(node + 1);//skip to the next chunk
cur = first;
}
return *this;
}
// postfix forms of increment
self operator++(int){
self tmp = *this;
++*this;//invoke prefix ++
return tmp;
}
self& operator--(){
if(cur == first){ // if it pointer to the begin of the chunk
set_node(node - 1);//skip to the prev chunk
cur = last;
}
--cur;
return *this;
}
self operator--(int){
self tmp = *this;
--*this;
return tmp;
}
2. Simple function about deque
common function of deque
iterator begin(){return start;}
iterator end(){return finish;}
reference front(){
//invoke __deque_iterator operator*
// return start's member *cur
return *start;
}
reference back(){
// cna't use *finish
iterator tmp = finish;
--tmp;
return *tmp; //return finish's *cur
}
reference operator[](size_type n){
//random access, use __deque_iterator operator[]
return start[n];
}
If you want to understand deque more deeply you can also see this question https://stackoverflow.com/a/50959796/6329006
My question is simple: are std::vector elements guaranteed to be contiguous? In other words, can I use the pointer to the first element of a std::vector as a C-array?
If my memory serves me well, the C++ standard did not make such guarantee. However, the std::vector requirements were such that it was virtually impossible to meet them if the elements were not contiguous.
Can somebody clarify this?
Example:
std::vector<int> values;
// ... fill up values
if( !values.empty() )
{
int *array = &values[0];
for( int i = 0; i < values.size(); ++i )
{
int v = array[i];
// do something with 'v'
}
}
This was missed from C++98 standard proper but later added as part of a TR. The forthcoming C++0x standard will of course contain this as a requirement.
From n2798 (draft of C++0x):
23.2.6 Class template vector [vector]
1 A vector is a sequence container that supports random access iterators. In addition, it supports (amortized)
constant time insert and erase operations at the end; insert and erase in the middle take linear time. Storage
management is handled automatically, though hints can be given to improve efficiency. The elements of a
vector are stored contiguously, meaning that if v is a vector where T is some type other
than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().
As other answers have pointed out, the contents of a vector is guaranteed to be continuous (excepting bool's weirdness).
The comment that I wanted to add, is that if you do an insertion or a deletion on the vector, which could cause the vector to reallocate it's memory, then you will cause all of your saved pointers and iterators to be invalidated.
The standard does in fact guarantee that a vector is continuous in memory and that &a[0] can be passed to a C function that expects an array.
The exception to this rule is vector<bool> which only uses one bit per bool thus although it does have continuous memory it can't be used as a bool* (this is widely considered to be a false optimization and a mistake).
BTW, why don't you use iterators? That's what they're for.
As other's have already said, vector internally uses a contiguous array of objects. Pointers into that array should be treated as invalid whenever any non-const member function is called IIRC.
However, there is an exception!!
vector<bool> has a specialised implementation designed to save space, so that each bool only uses one bit. The underlying array is not a contiguous array of bool and array arithmetic on vector<bool> doesn't work like vector<T> would.
(I suppose it's also possible that this may be true of any specialisation of vector, since we can always implement a new one. However, std::vector<bool> is the only, err, standard specialisation upon which simple pointer arithmetic won't work.)
I found this thread because I have a use case where vectors using contiguous memory is an advantage.
I am learning how to use vertex buffer objects in OpenGL. I created a wrapper class to contain the buffer logic, so all I need to do is pass an array of floats and a few config values to create the buffer.
I want to be able to generate a buffer from a function based on user input, so the length is not known at compile time. Doing something like this would be the easiest solution:
void generate(std::vector<float> v)
{
float f = generate_next_float();
v.push_back(f);
}
Now I can pass the vector's floats as an array to OpenGL's buffer-related functions. This also removes the need for sizeof to determine the length of the array.
This is far better than allocating a huge array to store the floats and hoping I made it big enough, or making my own dynamic array with contiguous storage.
cplusplus.com:
Vector containers are implemented as dynamic arrays; Just as regular arrays, vector containers have their elements stored in contiguous storage locations, which means that their elements can be accessed not only using iterators but also using offsets on regular pointers to elements.
Yes, the elements of a std::vector are guaranteed to be contiguous.