vector reserve c++

I have a very large multidimensional vector that changes in size all the time.
Is there any point in using vector.reserve() when I only know a good approximation of the sizes?
So basically I have a vector
A[256*256][x][y]
where x goes from 0 to 50 on every iteration of the program and then back to 0 again. The y values can differ every time, which means that for each of the [256*256][x] elements, the innermost vector of y values can be of a different size, but still smaller than 256.
So to clarify my problem, this is what I have:
vector<vector<vector<int>>> A;
for (int i = 0; i < 256*256; i++) {
    A.push_back(vector<vector<int>>());
    A[i].push_back(vector<int>());
    A[i][0].push_back(SOME_VALUE);
}
Add elements to the vector...
A.clear();
And after this I do the same thing again from the top.
When and how should I reserve space for the vectors?
Have I understood this correctly, that I would save a lot of time by using reserve, since I change the sizes all the time?
What would be the negative/positive sides of reserving the maximum size my vector can have, which would be [256*256][50][256] in some cases?
BTW. I am aware of different Matrix Templates and Boost, but have decided to go with vectors on this one...
EDIT:
I was also wondering how to use the reserve function in multidimensional arrays.
If I only reserve the vector in two dimensions, will it then copy the whole thing if I exceed its capacity in the third dimension?

To help with discussion you can consider the following typedefs:
typedef std::vector<int> int_t; // internal vector
typedef std::vector<int_t> mid_t; // intermediate
typedef std::vector<mid_t> ext_t; // external
The cost of growing int_t (a vector capacity increase) only affects the contents of that particular vector and no other element. The cost of growing mid_t requires copying all the elements stored in that vector, that is, all of its int_t vectors, which is considerably more costly. The cost of growing ext_t is huge: it requires copying every element already stored in the container.
Now, to increase performance, the most important thing is to get the ext_t size correct (it seems fixed at 256*256 in your question). Then get the intermediate mid_t size correct so that expensive reallocations are rare.
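Using the typedefs above, a minimal sketch of reserving at each level with the question's approximate sizes (avg_y is a hypothetical per-row estimate, below 256):
ext_t A;
A.reserve(256 * 256);            // the outer size is effectively fixed
for (int i = 0; i < 256 * 256; ++i) {
    A.push_back(mid_t());
    A.back().reserve(50);        // x stays at or below 50 per the question
    // each int_t can reserve as it is created, e.g.:
    // A.back().push_back(int_t());
    // A.back().back().reserve(avg_y);
}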
The amount of memory you are talking about is huge, so you might want to consider less standard ways to solve your problem. The first thing that comes to mind is adding an extra level of indirection: if instead of holding the actual vectors you hold smart pointers to the vectors, you can reduce the cost of growing the mid_t and ext_t vectors (if the ext_t size is fixed, just use a vector of mid_t). This will imply that code using your data structure becomes more complex (or better, add a wrapper that takes care of the indirection). Each int_t vector will be allocated once in memory and will never move in either mid_t or ext_t reallocations. The cost of reallocating mid_t becomes proportional to the number of allocated int_t vectors, not the actual number of inserted integers.
using std::tr1::shared_ptr; // or boost::shared_ptr
typedef std::vector<int> int_t;
typedef std::vector< shared_ptr<int_t> > mid_t;
typedef std::vector< shared_ptr<mid_t> > ext_t;
Another thing you should take into account is that std::vector::clear() does not free the space allocated internally by the vector; it only destroys the contained objects and sets the size to 0. That is, calling clear() will never release memory. The pattern for actually releasing the allocated memory in a vector is:
typedef std::vector<...> myvector_type;
myvector_type myvector;
...
myvector_type().swap( myvector ); // swap with a default-constructed temporary; its destructor frees the old storage
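In C++11 and later there is also shrink_to_fit(), though note that it is only a non-binding request:
myvector.clear();
myvector.shrink_to_fit(); // the implementation may, but need not, release the memory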

Whenever you push a vector into another vector, set the size in the pushed vector's constructor:
A.push_back(vector<vector<int> >( somesize ));

You have a working implementation but are concerned about the performance. If your profiling shows it to be a bottleneck, you can consider using a naked C-style array of integers rather than the vector of vectors of vectors.
See how-do-i-work-with-dynamic-multi-dimensional-arrays-in-c for an example
You can re-use the same allocation each time, reallocating as necessary and eventually keeping it at the high-water mark of usage.
If the vectors are indeed the bottleneck, then beyond avoiding the per-iteration sizing operations, performance will likely be dominated by your access pattern into the array. Try to access the highest-order dimension sequentially.
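A minimal sketch of the flat-buffer idea with the question's upper bounds. Beware that the full worst case is 256*256*50*256 ints (about 3.1 GiB at 4 bytes each), so in practice you would size the buffer to the observed high-water mark instead:
#include <cstddef>
#include <vector>

const std::size_t DIM_I = 256 * 256; // outer dimension, fixed per the question
const std::size_t DIM_X = 50;        // upper bound on x
const std::size_t DIM_Y = 256;       // upper bound on y

// One contiguous buffer replaces the vector of vectors of vectors;
// allocate once, then reuse it across iterations.
std::vector<int> flat(DIM_I * DIM_X * DIM_Y); // full worst case; see caveat above

// Row-major indexing: y varies fastest, so iterate y in the innermost loop.
inline int& at(std::vector<int>& a, std::size_t i, std::size_t x, std::size_t y) {
    return a[(i * DIM_X + x) * DIM_Y + y];
}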

If you know the size of a vector at construction time, pass the size to the constructor and assign using operator[] instead of push_back. If you're not totally sure about the final size, make a guess (maybe add a little extra) and use reserve so the vector reserves enough memory up front.
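A minimal sketch of both options (the sizes 100 and 120 are placeholders):
#include <vector>

int main() {
    std::vector<int> a(100);   // size known exactly: construct with it, then assign
    for (int i = 0; i < 100; ++i)
        a[i] = i;

    std::vector<int> b;
    b.reserve(120);            // size estimated: reserve with some slack, then append
    for (int i = 0; i < 100; ++i)
        b.push_back(i);
    return 0;
}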
What would be the negative/positive sides of reserving the maximum size my vector can have, which would be [256*256][50][256] in some cases?
Negative side: potential waste of memory. Positive side: less CPU time, less heap fragmentation. It's a memory/CPU tradeoff; the optimum choice depends on your application. If you're not memory-bound (on most consumer machines there's more than enough RAM), consider reserving upfront.
To decide how much memory to reserve, look at the average memory consumption, not the peak (reserving 256*256*50*256 elements is not a good idea unless such dimensions are needed regularly).
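For scale, assuming a 4-byte int: 256*256*50*256 = 838,860,800 elements, i.e. roughly 3.1 GiB for the worst case, before any per-vector bookkeeping overhead.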

Related

How can I use the spare capacity in a std::vector?

What is the safe, portable, idiomatic way to use the spare capacity in a std::vector?
std::vector<Foo> foos;
foos.emplace_back(1);
foos.emplace_back(2);
foos.reserve(10);
At this point, foos owns at least 8 * sizeof(Foo) uninitialized spare memory starting at foos.data() + foos.size(). It seems terribly inefficient to let that memory go to waste. How can I use this spare capacity as scratch space or for other purposes before I fill it with Foo objects by appending to foos? What is the right way to do this without invoking any undefined behavior?
How can I use the spare capacity in a std::vector?
By inserting more elements.
A more convoluted answer, which may be what you're looking for but is probably more complex than it's worth:
You can allocate memory yourself (std::allocator or whatever; don't forget std::unique_ptr) and implement a custom allocator that uses that piece of memory. You can then use the memory for whatever you like and later create a vector using the custom allocator. Note that this doesn't let you use the memory that has already been reserved for the vector; you can only use the memory before it has been acquired by the vector.
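For illustration, a minimal sketch of that idea against C++11's minimal-allocator requirements. ScratchAllocator is a hypothetical name, and the caller's buffer is assumed to be suitably aligned for T:
#include <cstddef>
#include <new>
#include <vector>

// Hands out memory from a caller-owned buffer while it lasts, then falls
// back to the heap. Buffer blocks are reclaimed in bulk by the caller.
template <class T>
struct ScratchAllocator {
    typedef T value_type;
    char* buf;          // caller-provided scratch memory
    std::size_t cap;    // buffer size in bytes
    std::size_t* used;  // bump offset, shared with the caller

    ScratchAllocator(char* b, std::size_t n, std::size_t* u)
        : buf(b), cap(n), used(u) {}
    template <class U>
    ScratchAllocator(const ScratchAllocator<U>& o)
        : buf(o.buf), cap(o.cap), used(o.used) {}

    T* allocate(std::size_t n) {
        std::size_t bytes = n * sizeof(T);
        if (*used + bytes <= cap) {                     // serve from the buffer
            T* p = reinterpret_cast<T*>(buf + *used);
            *used += bytes;
            return p;
        }
        return static_cast<T*>(::operator new(bytes));  // heap fallback
    }
    void deallocate(T* p, std::size_t) {
        char* c = reinterpret_cast<char*>(p);
        if (c < buf || c >= buf + cap)                  // only free heap blocks
            ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const ScratchAllocator<T>& a, const ScratchAllocator<U>& b)
{ return a.buf == b.buf; }
template <class T, class U>
bool operator!=(const ScratchAllocator<T>& a, const ScratchAllocator<U>& b)
{ return !(a == b); }

// Usage: write scratch data into the buffer first, note the offset, then
// let a vector allocate from the remainder:
//   char buffer[4096]; std::size_t used = 0;
//   ScratchAllocator<int> alloc(buffer, sizeof buffer, &used);
//   std::vector<int, ScratchAllocator<int> > v(alloc);
//   v.reserve(64);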
Yes, memory for unused elements is wasted, but it is unavoidable. It's by design.
The storage of the vector is handled automatically, being expanded and contracted as needed. Vectors usually occupy more space than static arrays, because more memory is allocated to handle future growth. This way a vector does not need to reallocate each time an element is inserted, but only when the additional memory is exhausted.
It's also good to know some non-standard vector variants; they can be used as replacements for std::vector and have specific application scenarios.
If you need a dynamic array and the maximum capacity is known at compile time, then boost::static_vector would be a choice. Using it may avoid the memory waste of std::vector, since typical std::vector implementations grow the capacity geometrically (often to a power of 2), so if your desired capacity is 10, space for roughly 6 extra elements is likely to be wasted:
static_vector is a hybrid between vector and array: like vector, it's a sequence container with contiguous storage that can change in size, along with the static allocation, low overhead, and fixed capacity of array. static_vector is based on Adam Wulkiewicz and Andrew Hundt's high-performance varray class.
The number of elements in a static_vector may vary dynamically up to a fixed capacity because elements are stored within the object itself, similarly to an array. However, objects are initialized as they are inserted into static_vector, unlike C arrays or std::array, which must construct all elements on instantiation. The behavior of static_vector enables the use of statically allocated elements in cases with complex object lifetime requirements that would otherwise not be trivially possible. Some other properties: random access to elements; constant time insertion and removal of elements at the end; linear time insertion and removal of elements at the beginning or in the middle.
static_vector is well suited for use as a buffer, in the internal implementation of other classes, or in use cases where there is a fixed limit to the number of elements that must be stored. Embedded and realtime applications where allocation may not be available or acceptable are a particular case where static_vector can be beneficial.
And if the dynamic array usually has a very small size (only some instances would be large, and only with small probability), then we may consider using small_vector. A small vector improves performance in such a scenario, and such containers are widely used in large projects.
small_vector is a vector-like container optimized for the case when it contains few elements. It contains some preallocated elements in-place, which allows it to avoid the use of dynamic storage allocation when the actual number of elements is below that preallocated threshold. small_vector is inspired by LLVM's SmallVector container. Unlike static_vector, small_vector's capacity can grow beyond the initial preallocated capacity.
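A minimal usage sketch of both (Boost.Container headers; the inline capacities 10 and 8 are arbitrary):
#include <boost/container/static_vector.hpp>
#include <boost/container/small_vector.hpp>

int main() {
    // Capacity is fixed at compile time; all elements live inside the object.
    boost::container::static_vector<int, 10> sv;
    for (int i = 0; i < 10; ++i)
        sv.push_back(i);    // exceeding the fixed capacity of 10 is an error

    // The first 8 elements are stored in-place; pushing more spills to the heap.
    boost::container::small_vector<int, 8> smv;
    for (int i = 0; i < 20; ++i)
        smv.push_back(i);
    return 0;
}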

What is the memory/runtime efficiency of std::vector, and what is its memory allocation strategy?

I am reading "C++ Primer", and in the chapter about containers, the book suggests always using std::vector whenever possible, e.g. when there is no need to insert or delete in the middle or front.
I tested a bit with std::vector and noticed that every time it needs to reallocate, it always reallocates a piece of memory that is three times larger than its previous size. I wonder if this is always the case, and why would it execute in such a way.
In addition, how is the memory/time efficiency of std::vector compared to built-in static and dynamic arrays? Why would the book suggest always using std::vector even in a simple scenario which a simple static array could handle?
why would it execute in such a way
Because even though this wastes some memory, it makes insertions faster, by reducing the number of reallocations.
It keeps push_back complexity at amortized O(1), while increasing the size by 1 and reallocating each time would make it O(n).
reallocates a piece of memory that is three times larger than its previous size. I wonder if this is always the case
The standard just says that push_back has to have amortized O(1) complexity, and compilers (more precisely, standard library implementations) are free to achieve that by any means.
This disallows increasing the capacity by 1 each time, as that would make the complexity O(n). Apparently, achieving amortized O(1) requires growing the capacity by a factor of N, but N can have any value greater than 1. As shown in the comments, N can vary between insertions, and is normally between 1.5 and 2.
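A quick way to observe your own implementation's growth factor is to print the capacity after each reallocation (a minimal test program):
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {     // a reallocation just happened
            last = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last << '\n';
        }
    }
    return 0;
}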
memory/time efficiency of std::vector compared to built-in static and dynamic arrays?
Access speed should be nearly the same. Insertion and removal speed can't be compared because arrays have a fixed size. Vectors might use more memory because, as you noticed, they can allocate memory for more elements than they actually have.
Why would the book suggest always using std::vector even in a simple scenario which a simple static array could handle?
There's no need to replace fixed-size arrays with vectors, as long as static arrays are small enough to fit into the stack.
std::vector should be used instead of manually allocating arrays with new to reduce the risk of memory leaks and other bugs.
I tested a bit with std::vector and noticed that every time it needs to reallocate, it always reallocates a piece of memory that is three times larger than its previous size. I wonder if this is always the case, and why would it execute in such a way.
For context: a common reallocation strategy for resizable containers is to double the capacity whenever it is exhausted, so as not to lose the amortized O(1) insertion time complexity, since each reallocation is O(N).
Because the number of reallocations then grows only logarithmically in the number of insertions, insertion keeps its amortized O(1) time complexity. The multiplier 3 follows the same logic: it has the advantage that reallocation is less frequent, and thus potentially less time costly, but the downside of likely leaving more unused space.
Following this rule, any multiplier is arguably valid, depending on what is more important, time or space. The multiplier could also change depending on the size of the vector, which makes sense: smaller vectors can have larger multipliers, but as they grow it's more rational to allocate space more frequently so as not to waste too much memory.
The strategies can vary dramatically, from fixed multipliers to factors that depend on the container's current size; for instance, see this answer, and the tests kindly shared by @JHBonarius.
In addition, how is the memory/time efficiency of std::vector compared to built-in static and dynamic arrays? Why would the book suggest always using std::vector even in a simple scenario which a simple static array could handle?
It's arguable that you should always use std::vector. If you have an array that you know will always have the same size, it's perfectly fine to use std::array. However, std::vector can behave similarly to std::array if you reserve space for it and never store more elements than the reserved size, so I can see the logic in generally preferring std::vector, though I disagree on the "always" part; it's too definitive.
Using vectors is one of the easiest and most reliable ways of storing data. When you use an array, you have to define its size at compile time. Also, an array's size is fixed; you can't change it when you need to. Using dynamic arrays (raw pointers) is always risky: memory allocation, deallocation, etc. If you are using a pointer somewhere, your eye should always be on it.

How do STL containers keep track of the current size of the container versus the total size?

Given
vector<int> a;
If a.push_back() is done, how does the vector know whether to increase its size by reallocating memory or whether space is still available? (Because a vector allocates some extra space when it is full, to reduce overhead.)
P.S. Does the same technique apply to other types of containers, like stack, queue, etc.?
I think that it does the same thing as "struct" in C.
The method capacity() returns the number of items that can be stored in the vector without a reallocation.
The method size() returns the number of items which are currently stored in the vector.
Prior to inserting another item, it stands to reason that if size() == capacity(), then more capacity will need to be made available; this involves a reallocation.
Does the same technique apply to other types of containers, like stack, queue, etc.?
stack and queue are built on top of other std containers. These underlying containers (normally vector or deque) employ a similar technique.
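As a minimal illustration of that size/capacity bookkeeping:
#include <cassert>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(8);                  // request capacity; size stays 0
    assert(v.size() == 0 && v.capacity() >= 8);

    v.push_back(1);                // consumes spare capacity, no reallocation
    assert(v.size() == 1 && v.capacity() >= 8);
    return 0;
}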
I think that it does the same thing as "struct" in C.
No.
In general, a vector (and, incidentally, a List in C#) will allocate a block of memory. As you add elements, it marks more and more of that memory as consumed. Then, when the block is full, it allocates a new, larger block, copies the contents into it, and deletes the old one. The new, larger block again has free space and can again be filled up. The idea is that a vector always occupies contiguous space, so it can be used in applications where one would consider an array. Because the space is contiguous, the machine instructions for accessing a single element are trivial, and so random access is very fast. List in C# has similar semantics. The implementation-dependent part is mostly how much bigger the new block is: sometimes it's a percentage bigger, sometimes simply double the size.

How to reserve memory for vector of vector

Assume that
vector<vector<shared_ptr<Base>>> vec;
vec.reserve(100);
vec[0].reserve(20); // Error: vector subscript out of range
I am trying to reserve memory for both the outer vector and the inner vectors.
I know that vec is empty, so I cannot reserve memory for the inner vectors. I could only resize() or shrink_to_fit() afterward; however, resize() and shrink_to_fit() are useless here, because they are not what I want to do.
The intention of reserving memory for the inner vectors is to lay the memory out well, for faster access to the inner elements afterward. I am wondering whether, if I do not reserve the memory, the resulting piecemeal allocation will be expensive and chaotic.
I would like to ask:
Are there any ways to reserve memory for the inner vectors?
Is my concern correct that bad memory allocation will occur without reserving memory for the vector?
Sorry for my poor English. I am using VC++ 2010.
You can't reserve memory for both inner and outer vectors... the inner vectors don't get constructed if you've only reserved space in the outer vector. You can resize the outer vector then do a reserve for each element thereof, or you can do the reserving on the inner vectors as they're added.
If you're sure you need to do this at all, I would probably resize the outer vector, then reserve space in each inner vector.
If 100 elements is even close to accurate, the space for your outer vector is almost irrelevant anyway (typically going to be something like 1200 bytes on a 32-bit system or 2400 bytes on a 64-bit system).
That may be a little less convenient (may force you to track how many items are created vs. really in use) but if you want to reserve space in your inner vectors, you don't really have a lot of choices.
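A minimal sketch of the resize-then-reserve approach, using the question's counts of 100 and 20:
vector<vector<shared_ptr<Base>>> vec;
vec.resize(100);                  // construct 100 empty inner vectors
for (size_t i = 0; i < vec.size(); ++i)
    vec[i].reserve(20);           // each inner vector now exists and can reserve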
I'd start with how you're going to interface with the final container and what you know about its content in advance. Once you have settled on a convenient interface, you can implement the code behind it. For example, you could make sure that every new inner vector gets created with a capacity of 100 elements. Or you could use a map from an x/y pair to a shared pointer, which can make sense in a sparsely populated container. Or how about allocating the 100x100 elements statically and just not reallocating at all? The important point is that all these alternatives can be implemented without changing the interface to the final container, which gives you the freedom to experiment with different approaches.
BTW: check out make_shared, which avoids the separate allocation overhead of shared_ptr, I believe. Alternatively, Boost also has an intrusive_ptr, which uses an internal reference counter; intrusive_ptr instances are also only half the size of a shared_ptr. However, you need benchmarks to actually prove which way is fastest. Anything else is just more or less vague speculation and guesswork.

Changing the reserve memory of C++ vector

I have a vector with 1000 "nodes"
if (count + 1 > m_listItems.capacity())
    m_listItems.reserve(count + 100);
The problem is I also clear it out when I'm about to refill it.
m_listItems.clear();
The capacity doesn't change.
I've used resize(1); but that doesn't seem to alter the capacity.
So how does one change the reserve?
vector<Item>(m_listItems).swap(m_listItems);
will shrink m_listItems again: http://www.gotw.ca/gotw/054.htm (Herb Sutter)
If you want to clear it anyway, swap with an empty vector:
vector<Item>().swap(m_listItems);
which of course is way more efficient. (Note that swapping vectors basically means just swapping two pointers; nothing really time-consuming is going on.)
You can swap the vector as others have suggested, as described in http://www.gotw.ca/gotw/054.htm, but be aware that it is not free: you're performing a copy of every element, because the vector has to allocate a new, smaller chunk of memory and copy all the old contents over. (The swap operation itself is essentially free, but you're swapping with a temporary initialized with a copy of the original vector's data, which is not free.)
If you know in advance how big the vector is, you should allocate the right size to begin with, so no resizing is necessary:
std::vector<foo> v(1000); // Create a vector with 1000 default-constructed elements
And if you don't know the capacity in advance, why does it matter whether it wastes a bit of space? Is it worth the time spent copying every element to a new and smaller vector (which is what std::vector(v).swap(v) will do), just to save a few kilobytes of memory?
Similarly, when you clear the vector, if you intend to refill it anyway, setting its capacity to zero seems to be an impressive waste of time.
Edit:
baash05: what if you had 1000000 items and 10 megs of RAM, would you say reducing the amount of overhead is important?
No. Resizing the vector requires more memory temporarily, so if you're memory-limited, that might break your app. (You have to have both the original vector and the temporary in memory before you can swap them, so you end up using up to twice as much RAM at that point.) Afterwards, you might save a small amount of memory (up to a couple of MB), but this doesn't matter much, because the excess capacity in the vector would never be accessed, so it would get pushed to the pagefile and not count towards your RAM limit in the first place.
If you have 1000000 items, then you should initialize the vector to the correct size in the first place.
And if you can't do that, then you'll typically be better off leaving the capacity alone. Especially since you stated that you're going to refill the vector, you should definitely reuse the capacity that has already been allocated, rather than allocating, reallocating, copying and freeing everything constantly.
You have two possible cases. Either you know how many elements you need to store, or you don't. If you know, then you can create the vector with the correct size in the first place, and so you never need to resize it, or you don't know, and then you might as well keep the excess capacity, so at least it won't have to resize upwards when you refill your vector.
You could try this technique from here
std::vector< int > v;
// ... fill v with stuff...
std::vector< int >().swap( v );
You can swap it with a new vector that has the desired capacity.
vector< int > tmp;
old.swap( tmp );
As far as I can tell, you can't use reserve() to shrink a vector's capacity below what it has already reached; reserve() only ever grows it. There are good reasons for this; among them is that the reallocation process is computationally expensive. If you really need a smaller vector, free the old one and create a new one that's smaller. That's actually computationally much simpler than having the vector resize itself downward in place.