I have been trying to figure out an efficient way of managing dynamic arrays which I may change occasionally but would like to randomly access and iterate over often.
I would like to be able to:
store the array in a continuous data block (reduce cache misses)
access each element individually and independently of the array handle (pointers > indices)
resize the array (dynamic)
To achieve this, I have been experimenting with std::vector<T>::iterator, and it worked very well until recently, when I resized the vector whose iterators I was storing (e.g. by calling push_back()). All the iterators became invalid because they were pointing to stale memory.
Is there any efficient (possibly STL) way of keeping the iterators up to date? Or do I have to update each iterator manually?
Is this whole approach even worthwhile? Should I stick with indices?
I have used indices before and it was OK, but I changed my approach because it still wasn't good enough: I always had to drag the entire array into scope, and an index could easily be used with the wrong array. Also, there is no perfect way of defining a "NULL" index (none that I know of).
What about the option of updating all pointers along with a resize operation? All you would have to do is store the original vector::begin, resize the vector, and afterwards update every pointer to vector.begin() + (ptr - prevBegin) (strictly speaking, compute those offsets before the resize, since the old pointers are invalid afterwards). Resize operations are something you should try to avoid anyway.
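A minimal sketch of that idea, assuming all stored pointers point into a single std::vector<int>. It turns each pointer into an offset before push_back can reallocate and rebuilds it from the new data() afterwards, because arithmetic on the old, invalidated begin pointer is formally undefined behavior. The helper name push_back_and_rebase is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: append a value and keep every stored pointer valid.
// Offsets are computed while the old buffer is still alive, then the
// pointers are rebuilt against whatever buffer the vector uses afterwards.
void push_back_and_rebase(std::vector<int>& v, std::vector<int*>& ptrs, int value) {
    std::vector<std::ptrdiff_t> offsets;
    offsets.reserve(ptrs.size());
    for (int* p : ptrs) {
        offsets.push_back(p - v.data());   // valid: no reallocation has happened yet
    }
    v.push_back(value);                    // may reallocate and invalidate ptrs
    for (std::size_t i = 0; i < ptrs.size(); ++i) {
        ptrs[i] = v.data() + offsets[i];   // rebase onto the (possibly new) buffer
    }
}
```

Note that every pointer is touched on every append, so this costs O(number of pointers) per push_back; that bookkeeping is exactly what indices avoid.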

Fully achieving all 3 of your goals is impossible. If you are fully contiguous, then you have one memory block with a finite size, and the only way to get more memory is to ask for more memory, which will not be contiguous with the memory you already have. So you have to sacrifice at least one requirement, to at least some degree:
If you are willing to partly sacrifice contiguity, you can use a std::deque. This is an array-of-arrays kind of structure. Insertions at either end (push_back, push_front and their emplace variants) don't invalidate references to existing elements, although they do invalidate iterators. It depends on the details of your data type, but generally its performance is much closer to a contiguous array than to a linked list. Well done but old (5-year-old) benchmarks: https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html. Another option is to write a chunking allocator, to use either with deque or with another structure. This is quite a bit more work, though.
If you can use indices, then you can just use a vector.
If you don't need to resize, you can still just use a vector and never resize it.
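To illustrate the std::deque option: growing a deque at either end keeps pointers and references to existing elements valid, even though iterators are invalidated. A small sketch; the function name is made up for the example.

```cpp
#include <cassert>
#include <deque>

// Takes a pointer to an element, grows the deque heavily at both ends, and
// checks that the pointer still refers to the same element in its new position.
bool deque_pointer_survives_growth() {
    std::deque<int> d{10, 20, 30};
    int* middle = &d[1];
    for (int i = 0; i < 100000; ++i) {
        d.push_back(i);
        d.push_front(-i);   // shifts indices, but the element itself does not move
    }
    // 100000 push_fronts moved the original index 1 to index 100001.
    return *middle == 20 && middle == &d[100001];
}
```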
Unless you have a good reason, I would stick with indices. If your main performance bottlenecks come from iterating over a large number of elements (as your contiguity requirement implies), then this whole indexing thing should really be a non-issue. If you do have a very good reason for avoiding indices (which you haven't stated), then I would profile the deque against the vector on the main loop operation to see how much worse the deque really does. It might be barely worse, and if neither deque nor vector works well enough for you, the next alternatives are quite a bit more work (probably involving allocators or a custom data structure).
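Since the question mentions the lack of a "NULL" index, here is one common workaround: reserve a sentinel value. This is a minimal sketch. The names Handle, kNullHandle and lookup are hypothetical, and std::optional<std::size_t> (C++17) would be an alternative to a sentinel.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical index-based handle: it survives reallocation because it
// stores a position, not an address.
using Handle = std::size_t;
constexpr Handle kNullHandle = std::numeric_limits<Handle>::max();

// Resolves a handle to a pointer, or nullptr for the null handle or a stale
// index. The returned pointer is only valid until the next reallocation, so
// keep the handle and re-resolve it rather than caching the pointer.
template <typename T>
T* lookup(std::vector<T>& v, Handle h) {
    return (h < v.size()) ? &v[h] : nullptr;
}
```

This does not solve the other complaint in the question, namely that a bare index does not know which vector it belongs to. Wrapping it in a small per-container struct is one way to get type safety there.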

Depending on your needs, if you can use the following data structure:
std::vector<std::unique_ptr<Foo>> foos;
then no matter how the vector is resized, if you access your data via a Foo*, that pointer to Foo will not be invalidated.
As the number of Foos you need to store in your vector changes, the vector may need to resize its internal contiguous block of memory, which means any iterators you have pointing into the vector will be invalidated when the vector resizes.
(See the C++0x/C++11 iterator invalidation rules for details.)
However, since the object stored in the vector is a pointer to an object elsewhere in the heap, the pointed-to-object (Foo in this example), will not be invalidated.
Note that the vector owns Foo (as alluded to by std::unique_ptr<Foo>), whilst you can store a non-owning pointer to Foo by keeping a Foo* as the means of accessing your Foo data.
So long as the vector outlives your need to access Foo via your Foo*, then you will not have any lifetime issues.
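A minimal sketch of this layout. Foo and its value member are placeholders, and std::make_unique needs C++14 (with C++11, write std::unique_ptr<Foo>(new Foo{...}) instead).

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Foo {        // placeholder element type
    int value;
};

// The vector owns the Foo objects; callers keep non-owning Foo* handles.
// Reallocation moves only the unique_ptrs, never the Foo objects themselves.
bool foo_pointer_survives_resize() {
    std::vector<std::unique_ptr<Foo>> foos;
    foos.push_back(std::make_unique<Foo>(Foo{42}));
    Foo* handle = foos.front().get();                    // non-owning pointer
    for (int i = 0; i < 10000; ++i) {
        foos.push_back(std::make_unique<Foo>(Foo{i}));   // may reallocate the vector
    }
    return handle == foos.front().get() && handle->value == 42;
}
```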
So in terms of your requirements:
store the array in a continuous data block (reduce cache misses)
yes, std::vector achieves this
access each element individually and independently of the array handle (pointers > indices)
yes, store a Foo* as your means of accessing each element individually, and that remains independent of the array handle (vector::iterator)
resize the array (dynamic)
yes, std::vector achieves this, resizing automatically whenever you need it to.
Using a smart pointer (in this example std::unique_ptr) in the vector means memory management is also handled automatically for you. (Just make sure you don't try to access a Foo* after the vector is destroyed.)
It has been pointed out in the comments that storing std::unique_ptr<Foo> in the vector violates your requirement for contiguous storage, if that is what you mean by "store the array in a continuous data block": the vector's underlying array of pointers is contiguous, but the Foo objects themselves are not, and reaching each one costs an extra indirection.
However, if you use a suitable allocator (e.g. an arena allocator) for both the vector and the Foo objects, you are more likely to see fewer cache misses, because the Foo objects will live close to each other and to the vector's buffer, which makes them more likely to be in cache when you iterate over the vector.


Should I use a deque or vector to store huge deques in C++?

Imagine I have 40 huge deques each storing data of a user-defined type. 40 isn't that many, but the deques themselves are huge (hence why I've elected to use deques over vectors). My question is if I want a container for these 40 deques, should that container be a vector or a deque?
If I opt for a vector to contain my huge deques would that make the vector huge in memory, or do the elements of the vector simply point to the deques? If the containing vector becomes huge as a result of storing the 40 huge deques then will I need to use deques instead to avoid the contiguous memory-related problems I ran into when I initially opted to use deques for the user-defined type containers?
#include <deque>

class myClass {
    // lots of data members resulting in large class object
};

int main() {
    std::deque<myClass> foo;
    for (int i = 0; i < 10000000; i++) {
        myClass classObject;
        foo.push_back(classObject);
    }
}
We now have a deque with 10000000 elements containing our class objects. Imagine I create 40 of these deques.
Now, if I want a container for these 40 deques, should I do this:
std::vector<std::deque<myClass>> bar;
Or should I do this:
std::deque<std::deque<myClass>> bar;
do the elements of the vector simply point to the deques
To the question you asked: No.
To the question you meant: Yes.
In vector<deque<T>> the vector elements are the actual deque objects themselves, not pointers to them. But std::deque objects are pretty thin, since they in turn have pointers to the double-ended data structure where the content of the deque resides.
Your 40 datasets will not be stored contiguously with each other in memory by using vector<deque<T>> (or even vector<vector<T>>). Only when the inner container is allocation-free, like std::array, would the data of all the containers be stored together.
If on the other hand you really want your vector elements to be pointers to deque objects, then you can use vector<unique_ptr<deque<T>>>.
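A small check of the point above, as a sketch: each element of vector<deque<T>> is a full deque object, but it has a fixed size (a handful of pointers and counters, implementation-specific) no matter how many elements it holds, because its contents live in separately allocated blocks. Inner stands in for std::deque<myClass>; the function names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

using Inner = std::deque<int>;   // stand-in for std::deque<myClass>

// Filling one inner deque allocates blocks owned by that deque; the outer
// vector's buffer (40 deque objects of sizeof(Inner) bytes each) is untouched.
bool outer_vector_unaffected_by_inner_growth() {
    std::vector<Inner> bar(40);
    const Inner* second = &bar[1];
    const std::size_t cap = bar.capacity();
    for (int i = 0; i < 100000; ++i) bar[0].push_back(i);
    return &bar[1] == second && bar.capacity() == cap;
}

// If the elements really should be pointers to deques:
std::vector<std::unique_ptr<Inner>> make_deque_pointers(std::size_t n) {
    std::vector<std::unique_ptr<Inner>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) v.push_back(std::make_unique<Inner>());
    return v;
}
```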
Containers store their actual data in external storage from the Free Store (heap). So there is no benefit to using a std::deque for storing just 40-ish deques because it will only be storing the deques' internal housekeeping data which is only a few bytes. So I would use a std::vector for this.
If the number is exactly 40, then I would consider a std::array.
Generally, as his holiness Stroustrup put it himself,
I don't know your data structure, but I bet std::vector can beat it
meaning that usually you want linear structure and not something that might be a linked list or anything else underneath, because generally "rich" computation environments (read: PCs and such) are extremely good at optimizing linear access.
However, if each of these data-carrying objects is large, and I mean roughly as large as a CPU cache line or more, then the choice won't make a huge difference. Use a deque if that's the right data structure (it's linear most of the time, anyway). Note that std::deque has no reserve(): if you know the element count beforehand, passing it to the constructor creates that many default-constructed elements up front rather than merely reserving memory.
Anyway, the container you choose won't significantly change how much memory you need: you'll need room for 40 * 10000000 elements, and that's it. If that's more than the memory you have, you'll need to get more memory or write a better algorithm.
Let's consider the options. As I see them, they are:
std::vector<std::deque<T>>
std::deque<std::deque<T>>
std::vector<std::deque<T> *>
std::deque<std::deque<T> *>
The difference between container<std::deque<T>> and container<std::deque<T> *> is that with the former, the deque objects themselves are stored contiguously inside the container (if the container is itself a deque, mostly but not entirely contiguously).
This means that with container<std::deque<T>>, reaching a deque object will generally not cause a cache miss, but the deque holds its actual data by pointer, so accessing that data will. With container<std::deque<T> *>, you can take one cache miss when following the pointer to the deque and then another when accessing its data.
Whether the outer container should be a deque or a vector depends on whether you are likely to grow it and whether you care about a consistent iteration rate. A vector is contiguous once set, so it will not cache miss while you iterate linearly over it; this isn't guaranteed for a deque, and each time iteration hops to the next sub-block it will likely break the prefetch.
EDIT: I forgot to mention why you might prefer a cache miss over contiguity: fragmentation. Huge data sets put a lot of strain on the heap, which increases the chance that an allocation fails even though there is enough free memory in total, because that memory is scattered across the heap. Custom allocators are a potential best-of-both-worlds option for fragmentation and cache misses.

push_back objects into vector memory issue C++

Compare the two ways of initializing a vector of objects here:
vector<Obj> someVector;
Obj new_obj;
someVector.push_back(new_obj);

vector<Obj*> ptrVector;
Obj* objptr = new Obj();
ptrVector.push_back(objptr);
The first one pushes back the actual object instead of a pointer to it. Does vector::push_back copy the value being pushed? My problem is that I have huge objects and very long vectors, so I need to find the best way to save memory.
Is the second way better?
Are there other ways to have a vector of objects/pointers that I can find each object later and use the least memory at the same time?
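Compared with the two options above, this third one (not included in the question) is the most efficient:
std::vector<Obj> someVector;
someVector.reserve(preCalculatedSize);
for (int i = 0; i < preCalculatedSize; ++i)
    someVector.emplace_back(/* Obj constructor arguments */);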
emplace_back directly constructs the object into the memory that the vector arranges for it. If you reserve prior to use, you can avoid reallocation and moving.
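Here is a quick way to see what reserve() buys you. After reserving, data() stays the same across all the emplace_back calls, so no reallocation (and no moving of Obj) took place. Obj is a placeholder with a two-argument constructor, and the function name is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Obj {                        // placeholder for a large object
    Obj(int id, std::string name) : id(id), name(std::move(name)) {}
    int id;
    std::string name;
};

// Returns true if building n objects in place never reallocated the buffer.
bool build_without_reallocation(std::size_t n) {
    std::vector<Obj> someVector;
    someVector.reserve(n);                        // single allocation up front
    const Obj* before = someVector.data();
    for (std::size_t i = 0; i < n; ++i) {
        someVector.emplace_back(static_cast<int>(i), "obj");   // constructed in place
    }
    return someVector.data() == before && someVector.size() == n;
}
```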
However, if the objects truly are large, then the advantages of cache coherency are smaller. So a vector of smart pointers makes sense. Thus the fourth option:
std::vector< std::unique_ptr<Obj> > someVector;
std::unique_ptr<Obj> element( new Obj );
someVector.push_back( std::move(element) );
is probably best. Here, we represent the lifetime of the data and how it is accessed in the same structure with nearly zero overhead, preventing it from getting out of sync.
You have to explicitly std::move the std::unique_ptr when you want to transfer it. If you need a raw pointer for whatever reason, .get() is how to access it. ->, * and explicit operator bool are all overloaded, so you only really need to call .get() when you have an interface that expects an Obj*.
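A short sketch of those operations. Obj is a placeholder, and legacy_api is a hypothetical function standing in for an interface that expects an Obj*.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct Obj { int payload = 0; };   // placeholder

// Hypothetical legacy interface that wants a raw, non-owning pointer.
int legacy_api(const Obj* o) { return o ? o->payload : -1; }

int unique_ptr_demo() {
    std::vector<std::unique_ptr<Obj>> someVector;
    std::unique_ptr<Obj> element(new Obj);
    element->payload = 7;                        // operator-> works as for a raw pointer
    someVector.push_back(std::move(element));    // ownership moves into the vector
    assert(!element);                            // moved-from unique_ptr is empty (operator bool)
    return legacy_api(someVector.back().get());  // .get() only at the raw-pointer boundary
}
```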
Both of these solutions require C++11. If you lack C++11, and the objects truly are large, then the "vector of pointers to data" is acceptable.
In any case, what you really should do is determine which matches your model best, check performance, and only if there is an actual performance problem do optimizations.
If your Obj class doesn't require polymorphic behavior, then it is better to simply store the Obj types directly in the vector<Obj>.
If you store objects in vector<Obj*>, then you are assuming the responsibility of manually deallocating those objects when they are no longer needed. Better, in this case, to use vector<std::unique_ptr<Obj>> if possible, but again, only if polymorphic behavior is required.
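For the polymorphic case, here is a sketch using a hypothetical Shape hierarchy. The vector owns the objects, and the virtual destructor lets each unique_ptr<Shape> delete the derived object correctly, so no manual cleanup is needed.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Shape {                                  // hypothetical base class
    virtual ~Shape() = default;                 // required for deletion through Shape*
    virtual double area() const = 0;
};
struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};
struct Rect : Shape {
    Rect(double w, double h) : w(w), h(h) {}
    double area() const override { return w * h; }
    double w, h;
};

double total_area() {
    std::vector<std::unique_ptr<Shape>> shapes;   // owns the objects
    shapes.push_back(std::make_unique<Square>(2.0));
    shapes.push_back(std::make_unique<Rect>(3.0, 4.0));
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();   // virtual dispatch
    return sum;                                      // all shapes freed here
}
```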
The vector will store the Obj objects on the heap (by default, unless you override the allocator in the vector template). These objects will be stored in contiguous memory, which can also give you better cache locality, depending upon your use case.
The drawback to using vector<Obj> is that frequent insertion/removal from the vector may cause reallocation and copying of your Obj objects. However, that usually will not be the bottleneck in your application, and you should profile it if you feel like it is.
With C++11 move semantics, the implications of copying can be much reduced.
Using a vector<Obj> will take less memory to store if you can reserve the size ahead of time. vector<Obj *> will necessarily use more memory than vector<Obj> if the vector doesn't have to be reallocated, since you have the overhead of the pointers and the overhead of dynamic memory allocation. This overhead may be relatively small though if you only have a few large objects.
However, if you are very close to running out of memory, using vector<Obj> may cause a problem if you can't reserve the correct size ahead of time because you'll temporarily need extra storage when reallocating the vector.
Having a large vector of large objects may also cause an issue with memory fragmentation. If you can create the vector early in the execution of your program and reserve the size, this may not be an issue, but if the vector is created later, you might run into a problem due to memory holes on the heap.
Under the circumstances, I'd consider a third possibility: use std::deque instead of std::vector.
This is kind of a halfway point between the two you've given. A vector<obj> allocates one huge block to hold all the instances of the objects in the vector. A vector<obj *> allocates one block of pointers, but puts each instance of the object in a block of its own. Therefore, you get N objects plus N pointers.
A deque will create a block of pointers and a number of blocks of objects, but (at least normally) it'll put a number of objects (call it M) together into a single block, so you get a block of N/M pointers and N/M blocks of objects.
This avoids many of the shortcomings of either a vector of objects or a vector of pointers. Once you allocate a block of objects, you never have to reallocate or copy them. You do (or may) eventually have to reallocate the block of pointers, but it'll be smaller (by a factor of M) than the vector of pointers if you try to do it by hand.
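The claim that a deque never relocates its elements once they are placed can be checked by counting copy and move constructions. This is a minimal sketch. Tracked is a made-up type, and the exact count for vector depends on the library's growth policy, so only "zero" versus "non-zero" is meaningful.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Counts how often any element is copied or moved after construction.
struct Tracked {
    static int relocations;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& o) : value(o.value) { ++relocations; }
    Tracked(Tracked&& o) noexcept : value(o.value) { ++relocations; }
    Tracked& operator=(const Tracked&) = default;
    Tracked& operator=(Tracked&&) = default;
    int value;
};
int Tracked::relocations = 0;

// Builds n elements in place and reports how many relocations occurred.
template <typename Container>
int relocations_while_growing(int n) {
    Tracked::relocations = 0;
    Container c;
    for (int i = 0; i < n; ++i) c.emplace_back(i);   // constructed in place
    return Tracked::relocations;
}
```

With a vector, the moves come from reallocation; with a deque, only the small map of block pointers is ever reallocated.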
One caveat: if you're using Microsoft's compiler/standard library, this may not work very well -- they have some strange logic (still present up through VS 2013 RC) that means if your object size is larger than 16, you'll get only one object per block -- i.e., equivalent to your vector<obj *> idea.

Deque - how come "reserve" doesn't exist?

The standard STL vector container has a "reserve" function to reserve uninitialized memory that can be used later to prevent reallocations.
Why doesn't the deque container have one?
Increasing the size of a std::vector can be costly. When a vector outgrows its reserved space, the entire contents of the vector must be copied (or moved) to a larger reserve.
It is specifically because std::vector resizing can be costly that vector::reserve() exists. reserve() can prepare a std::vector to anticipate reaching a certain size without exceeding its capacity.
Conversely, a deque can always add more memory without needing to relocate the existing elements. If a std::deque could reserve() memory, there would be little to no noticeable benefit.
For vector and string, reserved space prevents later insertions at the end (up to the capacity) from invalidating iterators and references to earlier elements, by ensuring that elements don't need to be copied/moved. This relocation may also be costly.
With deque and list, earlier references are never invalidated by insertions at the end, and elements aren't moved, so the need to reserve capacity does not arise.
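A sketch contrasting the two guarantees described above. A reference into a vector survives push_back only while size() stays within the reserved capacity, whereas a reference into a deque survives insertions at the end without any reservation. The function names are made up.

```cpp
#include <cassert>
#include <deque>
#include <vector>

bool vector_reference_survives_within_capacity() {
    std::vector<int> v;
    v.reserve(100);
    v.push_back(1);
    int& first = v[0];
    for (int i = 0; i < 99; ++i) v.push_back(i);   // size reaches 100, within capacity
    return &first == &v[0];                         // no reallocation happened
}

bool deque_reference_survives_without_reserve() {
    std::deque<int> d{1};
    int& first = d[0];
    for (int i = 0; i < 100000; ++i) d.push_back(i);   // no reserve() needed (or available)
    return &first == &d[0];
}
```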
You might think that with vector and string, reserving space also guarantees that later insertions will not throw an exception (unless a constructor throws), since there's no need to allocate memory. You might think that the same guarantee would be useful for other sequences, and hence deque::reserve would have a possible use. There is in fact no such guarantee for vector and string, although in most (all?) implementations it's true. So this is not the intended purpose of reserve.
Quoting from C++ Reference:
As opposed to std::vector, the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.
The storage of a deque is automatically expanded and contracted as needed. Expansion of a deque is cheaper than the expansion of a std::vector because it does not involve copying of the existing elements to a new memory location.
Deque can allocate new memory anywhere it wants and just point to it, unlike vectors which require a continuous block of memory to hold all their elements.
Only vector has one. There is no need for a reserve function in deque, since its elements are not stored contiguously and there is no need to reallocate and move elements when adding or removing elements at the ends.
reserve implies allocation of large blocks of contiguous data (like a vector). There is nothing in the deque that implies contiguous storage; it's generally implemented more like a list (which, you will notice, also doesn't have a 'reserve' function).
Thus, a 'reserve' function would make no sense.
There are two main kinds of memory layout: containers that allocate a single chunk, like arrays and vectors, and distributed containers whose members can occupy any free location. Deques and linked lists belong to the second kind, and they have some practical advantages; for example, deleting a particular element does not cause a mass movement of memory as it does with arrays and vectors. Therefore they do not need to reserve any space beforehand: when they need more, they just allocate it and link it on at the end.
If you are aiming for containers built from contiguous (aligned) memory chunks, you could think about implementing something like this:
std::deque<std::vector<T>> dv; // deque of dynamically sized contiguous vectors
using Mem = std::array<std::size_t, N>;
std::deque<Mem> dvf; // deque of fixed-size contiguous blocks. Here you can store raw bytes, adding a header to loop through and cast using the header information and typeid...
// templates and polymorphism can help with storing raw bytes (e.g. checking the type a pointer points to) and with creating an interface to access the partially contiguous memory.
Alternatively, you can use a map instead of a deque to access the vectors...

Why would I prefer using vector to deque

they are both contiguous memory containers;
Feature-wise, deque has almost everything vector has, and more, since it is more efficient at inserting at the front.
Why would anyone prefer std::vector to std::deque?
Elements in a deque are not contiguous in memory; vector elements are guaranteed to be. So if you need to interact with a plain C library that needs contiguous arrays, or if you care (a lot) about spatial locality, then you might prefer vector. In addition, since there is some extra bookkeeping, other ops are probably (slightly) more expensive than their equivalent vector operations. On the other hand, using many/large instances of vector may lead to unnecessary heap fragmentation (slowing down calls to new).
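To illustrate the C-library point: a vector can be handed over as a pointer plus a length via data(), while a deque has to be copied into contiguous storage first. c_style_sum is a hypothetical stand-in for any C function taking a const int* and a count.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical C-style API: requires one contiguous buffer.
extern "C" long c_style_sum(const int* data, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

long sum_vector(const std::vector<int>& v) {
    return c_style_sum(v.data(), v.size());      // no copy needed
}

long sum_deque(const std::deque<int>& d) {
    // &d[0] is NOT a valid base pointer for d.size() elements: blocks are separate.
    std::vector<int> tmp(d.begin(), d.end());    // copy into contiguous storage
    return c_style_sum(tmp.data(), tmp.size());
}
```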
Also, as pointed out elsewhere on StackOverflow, there is more good discussion here: http://www.gotw.ca/gotw/054.htm .
To understand the difference, one should know how deque is generally implemented. Memory is allocated in blocks of equal size, and the blocks are tied together by an index of block pointers (an array, or possibly a vector).
So to find the nth element, you find the appropriate block then access the element within it. This is constant time, because it is always exactly 2 lookups, but that is still more than the vector.
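Below is a deliberately simplified model showing the two-step lookup described above: block = i / B, slot = i % B. BlockedArray is hypothetical. Real std::deque implementations differ (they also grow at the front and manage the block map differently), and this version requires T to be default-constructible, so treat it only as an illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Simplified deque-style storage: fixed-size blocks plus a vector of block
// pointers. Only push_back and operator[] are provided.
template <typename T, std::size_t B = 64>
class BlockedArray {
public:
    void push_back(const T& value) {
        if (size_ % B == 0) {                              // no block yet, or last block full
            blocks_.push_back(std::make_unique<T[]>(B));
        }
        blocks_[size_ / B][size_ % B] = value;
        ++size_;
    }
    T& operator[](std::size_t i) {
        return blocks_[i / B][i % B];                      // lookup 1: block, lookup 2: slot
    }
    std::size_t size() const { return size_; }

private:
    std::vector<std::unique_ptr<T[]>> blocks_;   // growing this moves only pointers
    std::size_t size_ = 0;
};
```

Because only the vector of block pointers is ever reallocated, a pointer to an element stays valid as the structure grows.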
vector also works well with APIs that want a contiguous buffer because they are either C APIs or are more versatile in being able to take a pointer and a length. (Thus you can have a vector underneath or a regular array and call the API from your memory block).
Where deque has its biggest advantages are:
When growing or shrinking the collection from either end
When you are dealing with very large collection sizes.
When dealing with bools and you really want bools rather than a bitset.
The second of these is lesser known, but for very large collection sizes:
The cost of reallocation is large
The overhead of having to find a contiguous memory block is restrictive, so you can run out of memory faster.
When I was dealing with large collections in the past and moved from a contiguous model to a block model, we were able to store about 5 times as large a collection before we ran out of memory in a 32-bit system. This is partly because, when re-allocating, it actually needed to store the old block as well as the new one before it copied the elements over.
Having said all this, you can get into trouble with std::deque on systems that use "optimistic" memory allocation. A vector's request for a large buffer during reallocation will probably be rejected at some point with a bad_alloc, but the optimistic allocator is likely to keep granting the smaller blocks a deque requests, and that is likely to cause the operating system to kill a process to try to reclaim some memory. Whichever one it picks might not be too pleasant.
The workarounds in such a case are either setting system-level flags to override optimistic allocation (not always feasible) or managing the memory somewhat more manually, e.g. using your own allocator that checks for memory usage or similar. Obviously not ideal. (Which may answer your question as to prefer vector...)
I've implemented both vector and deque multiple times. deque is hugely more complicated from an implementation point of view. This complication translates to more code and more complex code. So you'll typically see a code size hit when you choose deque over vector. You may also experience a small speed hit if your code uses only the things the vector excels at (i.e. push_back).
If you need a double ended queue, deque is the clear winner. But if you're doing most of your inserts and erases at the back, vector is going to be the clear winner. When you're unsure, declare your container with a typedef (so it is easy to switch back and forth), and measure.
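Here is a sketch of the "typedef and measure" advice. The algorithm is written once against the container type and timed with std::chrono, so switching containers is a one-line change. The workload and sizes are placeholders; substitute your real main loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <deque>
#include <numeric>
#include <vector>

// Placeholder workload: fill at the back, then iterate over everything.
template <typename Container>
long long workload(int n) {
    Container c;
    for (int i = 0; i < n; ++i) c.push_back(i);
    return std::accumulate(c.begin(), c.end(), 0LL);
}

// Runs the workload once and prints the elapsed wall-clock time.
template <typename Container>
long long timed(const char* label, int n) {
    auto start = std::chrono::steady_clock::now();
    long long result = workload<Container>(n);
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                  std::chrono::steady_clock::now() - start).count();
    std::printf("%s: %lld us\n", label, static_cast<long long>(us));
    return result;
}

// The one line to change when comparing containers:
using MyContainer = std::vector<int>;   // or std::deque<int>
```

A single run like this is noisy; for real decisions, repeat the measurement and use optimized builds.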
std::deque doesn't have guaranteed contiguous memory, and it's often somewhat slower for indexed access. A deque is typically implemented as a "list of vectors".
According to http://www.cplusplus.com/reference/stl/deque/, "unlike vectors, deques are not guaranteed to have all its elements in contiguous storage locations, eliminating thus the possibility of safe access through pointer arithmetics."
Deques are a bit more complicated, in part because they don't necessarily have a contiguous memory layout. If you need that feature, you should not use a deque.
(Previously, my answer brought up a lack of standardization (from the same source as above, "deques may be implemented by specific libraries in different ways"), but that actually applies to just about any standard library data type.)
A deque is a sequence container which allows random access to its elements, but it is not guaranteed to have contiguous storage.
I think it's a good idea to run a performance test for each case and make the decision based on those tests.
I'd prefer std::deque than std::vector in most cases.
You wouldn't prefer vector to deque according to these test results (with source).
Of course, you should test in your app/environment, but in summary:
push_back is basically the same for all
insert, erase in deque are much faster than list and marginally faster than vector
Some more musings, and a note to consider circular_buffer.
On the one hand, vector is quite frequently just plain faster than deque. If you don't actually need all of the features of deque, use a vector.
On the other hand, sometimes you do need features which vector does not give you, in which case you must use a deque. For example, I challenge anyone to attempt to rewrite this code, without using a deque, and without enormously altering the algorithm.
Note that vector memory is re-allocated as the array grows. If you have pointers to vector elements, they will become invalid.
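Also, if you erase an element, iterators and references to it and to everything after it become invalid. This includes the iterators a range-based for (auto ...) loop uses internally, so don't erase from the vector inside such a loop.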
Edit: changed 'deque' to 'vector'
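Because erase() invalidates iterators at and after the erased position, the usual safe patterns are either to continue from the iterator that erase() returns or to use the erase-remove idiom. A short sketch with made-up function names (C++20 also offers std::erase_if for this):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Pattern 1: keep using the iterator that erase() hands back.
void remove_evens_loop(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0) {
            it = v.erase(it);   // returns an iterator to the element after the erased one
        } else {
            ++it;
        }
    }
}

// Pattern 2: erase-remove idiom, one pass and a single erase call.
void remove_evens_idiom(std::vector<int>& v) {
    v.erase(std::remove_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; }),
            v.end());
}
```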

std::vector versus std::array in C++

What are the difference between a std::vector and an std::array in C++? When should one be preferred over another? What are the pros and cons of each? All my textbook does is list how they are the same.
std::vector is a template class that encapsulates a dynamic array (see the note below), stored on the heap, that grows and shrinks automatically as elements are added or removed. It provides all the hooks (begin(), end(), iterators, etc.) that make it work fine with the rest of the STL. It also has several useful methods that let you perform operations that would be cumbersome on a normal array, e.g. inserting elements in the middle of a vector (it handles all the work of moving the following elements behind the scenes).
Since it stores its elements in memory allocated on the heap, it has some overhead with respect to static arrays.
std::array is a template class that encapsulates a statically sized array, stored inside the object itself, which means that if you instantiate the class on the stack, the array itself will be on the stack. Its size has to be known at compile time (it's passed as a template parameter), and it cannot grow or shrink.
It's more limited than std::vector, but it's often more efficient, especially for small sizes, because in practice it's mostly a lightweight wrapper around a C-style array. At the same time, it's safer than a C-style array, since the implicit conversion to pointer is disabled, and it provides much of the STL-related functionality of std::vector and of the other containers, so you can use it easily with STL algorithms & co. Still, because its size is fixed, it's much less flexible than std::vector.
For an introduction to std::array, have a look at this article; for a quick introduction to std::vector and to the operations that are possible on it, you may want to look at its documentation.
Note: Actually, I think that the standard describes them in terms of the maximum complexity of the different operations (e.g. random access in constant time, iteration over all the elements in linear time, addition and removal of elements at the end in amortized constant time, etc.), but AFAIK there's no way of fulfilling those requirements other than using a dynamic array. As stated by @Lucretiel, the standard actually requires the elements to be stored contiguously, so it is a dynamic array, stored wherever the associated allocator puts it.
To emphasize a point made by @MatteoItalia, the efficiency difference is where the data is stored. Heap memory (required with vector) requires a call to the allocator, and sometimes to the operating system, and this can be expensive if you are counting cycles. Stack memory (possible for array) is virtually "zero-overhead" in terms of time, because the memory is allocated just by adjusting the stack pointer, and that is done once on entry to a function. The stack also avoids memory fragmentation. To be sure, std::array won't always be on the stack; it depends on where you allocate it, but it will still involve one less heap allocation than vector. If you have a
small "array" (under 100 elements say) - (a typical stack is about 8MB, so don't allocate more than a few KB on the stack or less if your code is recursive)
the size will be fixed
the lifetime is in the function scope (or is a member value with the same lifetime as the parent class)
you are counting cycles,
definitely use a std::array over a vector. If any of those requirements is not true, then use a std::vector.
If you are considering using multidimensional arrays, then there is one additional difference between std::array and std::vector. A multidimensional std::array will have the elements packed in memory in all dimensions, just as a c style array is. A multidimensional std::vector will not be packed in all dimensions.
Given the following declarations:
int cConc[3][5];
std::array<std::array<int, 5>, 3> aConc;
int **ptrConc; // initialized to [3][5] via new and destructed via delete
std::vector<std::vector<int>> vConc; // initialized to [3][5]
A pointer to the first element of the C-style array (cConc) or of the std::array (aConc) can be incremented one element at a time through the entire array. They are tightly packed.
A pointer to the first element of the vector array (vConc) or of the pointer array (ptrConc) can only be iterated through the first 5 (in this case) elements; the next row lives in a separate heap allocation, which on my system started after 12 bytes of overhead.
This means that a std::vector<std::vector<int>> initialized as a [3][1000] array will be much smaller in memory than one initialized as a [1000][3] array, and both will be larger in memory than a std::array allocated either way.
This also means that you can't simply pass a multidimensional vector (or pointer) array to, say, OpenGL without accounting for the memory overhead, but you can naively pass a multidimensional std::array to OpenGL and have it work out.
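If the dimensions are only known at run time, a common way to get the packed layout with std::vector is to store all rows in one flat vector and compute the index yourself. A minimal sketch; Grid is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2D grid backed by one contiguous buffer, so data() can be passed
// to APIs (such as OpenGL buffer uploads) that expect a packed array.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols) {}
    int& operator()(std::size_t r, std::size_t c) { return cells_[r * cols_ + c]; }
    const int* data() const { return cells_.data(); }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<int> cells_;   // rows * cols ints, no per-row allocation
};
```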
Summarizing the above discussion in a table for quick reference:

| | C-style array | std::array | std::vector |
|---|---|---|---|
| Memory efficiency | More efficient | More efficient | Less efficient (may double its capacity on reallocation) |
| Copying | Iterate over elements or use std::copy() | Direct copy: a2 = a1; | Direct copy: v2 = v1; |
| Passing to function | Passed by pointer (size not available in function) | Passed by value | Passed by value (size available in that function) |
| Size | sizeof(a1) / sizeof(a1[0]) | a1.size() | v1.size() |
| Use case | For quick access, when insertions/deletions are not frequently needed | Same as classic array, but safer and easier to pass and copy | When frequent additions or deletions might be needed |
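A sketch of the "copying", "passing" and "size" rows. A C-style array cannot be assigned and decays to a pointer when passed, whereas std::array and std::vector copy by assignment and carry their size. The function names are illustrative.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Copies are independent in all three cases; only the syntax differs.
bool copies_are_independent() {
    int c1[3] = {1, 2, 3};
    int c2[3];
    std::copy(std::begin(c1), std::end(c1), std::begin(c2));  // "c2 = c1;" does not compile
    std::array<int, 3> a1{1, 2, 3};
    std::array<int, 3> a2 = a1;          // element-wise copy, no heap allocation
    std::vector<int> v1{1, 2, 3};
    std::vector<int> v2 = v1;            // deep copy into a new heap buffer
    c2[0] = a2[0] = v2[0] = 99;          // modify the copies only
    const std::size_t c_len = sizeof(c1) / sizeof(c1[0]);   // works only where c1 is declared
    return c1[0] == 1 && a1[0] == 1 && v1[0] == 1
        && c_len == a1.size() && a1.size() == v1.size();
}

// A C-style array argument decays to a pointer, so the length travels separately.
int sum_c(const int* arr, std::size_t n) {
    int s = 0;
    for (std::size_t i = 0; i < n; ++i) s += arr[i];
    return s;
}

// std::array keeps its length in its type, even when passed by value.
int sum_std(std::array<int, 3> a) {
    int s = 0;
    for (int x : a) s += x;
    return s;
}
```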
Using the std::vector<T> class:
...is just as fast as using built-in arrays, assuming you are doing only the things built-in arrays allow you to do (read and write to existing elements).
...automatically resizes when new elements are inserted.
...allows you to insert new elements at the beginning or in the middle of the vector, automatically shifting the rest of the elements up. It allows you to remove elements anywhere in the std::vector, too, automatically shifting the rest of the elements down.
...allows you to perform a range-checked read with the at() method (you can always use the indexers [] if you don't want this check to be performed).
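A quick illustration of the checked and unchecked accessors: at() throws std::out_of_range for a bad index, while operator[] performs no check. An out-of-range [] is undefined behavior, so the sketch does not attempt one.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Returns true if at() rejected the out-of-range index with an exception.
bool at_is_range_checked() {
    std::vector<int> v{10, 20, 30};
    if (v[1] != 20) return false;   // unchecked access; index known to be valid
    try {
        (void)v.at(3);              // one past the end
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```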
There are three main caveats to using std::vector<T>:
Getting at the underlying pointer takes some care, which matters if you are dealing with third-party functions that demand the address of an array: use data() (C++11) or &v[0] on a non-empty vector, and remember that any reallocation invalidates that pointer.
The std::vector<bool> class is silly. It's implemented as a condensed bitfield, not as an array. Avoid it if you want an array of bools!
During usage, std::vector<T>s are going to be a bit larger than a C++ array with the same number of elements. This is because they need to keep track of a small amount of other information, such as their current size, and because whenever std::vector<T>s resize, they reserve more space than they need. This is to prevent them from having to resize every time a new element is inserted. You can influence this with reserve() and, since C++11, with the non-binding shrink_to_fit(), but I never felt the need to do that!
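To make the std::vector<bool> caveat concrete: its operator[] returns a proxy object rather than a bool&, so you cannot take a bool* to an element. If you need real bools, common workarounds are std::vector<char> or std::deque<bool>, which has no such specialization. A sketch using type traits:

```cpp
#include <cassert>
#include <deque>
#include <type_traits>
#include <utility>
#include <vector>

// vector<bool>::operator[] yields std::vector<bool>::reference, a proxy class.
constexpr bool vector_bool_is_proxy =
    !std::is_same<decltype(std::declval<std::vector<bool>&>()[0]), bool&>::value;

// deque<bool> is not specialized: its elements are real bools.
constexpr bool deque_bool_is_real =
    std::is_same<decltype(std::declval<std::deque<bool>&>()[0]), bool&>::value;

bool* address_of_first(std::deque<bool>& d) { return &d[0]; }   // fine for deque<bool>
// bool* p = &vb[0];   // would not compile for a std::vector<bool> vb
```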
Edit: After reading Zud's reply to the question, I felt I should add this:
The std::array<T, N> class is not the same as a C++ array. std::array<T, N> is a very thin wrapper around C++ arrays, with the primary purpose of hiding the pointer from the user of the class (in C++, arrays implicitly decay to pointers, often to dismaying effect). The std::array<T, N> class also knows its size (length), which can be very useful.
A vector is a container class, while an array is a block of allocated memory.