Control over std::vector reallocation - c++

By reading the std::vector reference I understood that
calling insert when the current capacity is reached will cause a reallocation of the std::vector (and therefore iterator invalidation), because new memory with a larger capacity is allocated for it. The goal is to keep the guarantee of contiguous data.
As long as I stay below the capacity, insert will not cause that (and iterators will remain valid).
My question is the following:
When insert triggers this reallocation automatically, is there any way to control how much new memory is reserved?
Suppose that I have a vector with an initial capacity of 100 and, when that capacity is hit, I want to allocate room for an extra 20 elements.
Is it possible to do that?

You can always track it yourself and call reserve before it would allocate, e.g.
static const int N = 20;  // amount to grow by when capacity runs out
if (vec.capacity() == vec.size()) {
    vec.reserve(vec.size() + N);
}
vec.insert(...);
You can wrap this in a function of your own and call that function instead of calling insert() directly.
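For instance, a minimal sketch of such a wrapper for push_back-style appends (the function name and the default increment are illustrative, not from the question):

#include <cstddef>
#include <vector>

// Hypothetical helper: append a value, growing the capacity by a fixed
// increment instead of relying on the implementation's geometric growth.
template <typename T>
void push_back_fixed_growth(std::vector<T>& vec, const T& value,
                            std::size_t increment = 20)
{
    if (vec.size() == vec.capacity()) {
        // Reserve exactly the old capacity plus the chosen increment,
        // so the push_back below cannot trigger its own reallocation.
        vec.reserve(vec.capacity() + increment);
    }
    vec.push_back(value);
}

Keep in mind that growing in small fixed steps makes a long sequence of appends quadratic in the total number of element copies, which is exactly why implementations grow geometrically instead.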

Related

Why Vector's size() and capacity() are different after push_back()

I'm just starting to learn about vectors and I'm a little confused about size() and capacity().
I know a little about both of them, but why are they different in this program? After all, array(10) makes room for 10 elements and initializes them with 0.
Before adding array.push_back(5):
array.size() is 10, that is OK.
array.capacity() is 10, that is OK.
After adding array.push_back(5):
array.size() is 11, that is OK (10 zeros were already added, and then push_back adds one more element, 5).
array.capacity() is 15. Why? (Is it reserving 5 blocks for one int?)
#include <iostream>
#include <vector>

int main() {
    std::vector<int> array(10); // make room for 10 elements and initialize with 0
    array.reserve(10);          // make room for 10 elements
    array.push_back(5);
    std::cout << array.size() << std::endl;
    std::cout << array.capacity() << std::endl;
    return 0;
}
The Standard mandates that std::vector<T>::push_back() has amortized O(1) complexity. This means that the expansion has to be geometric, say doubling the amount of storage each time it has been filled.
Simple example: sequentially push_back 32 ints into a std::vector<int>. You will store all of them once, and also do 31 copies if you double the capacity each time it runs out. Why 31? Before storing the 2nd element, you copy the 1st; before storing the 3rd, you copy elements 1-2, before storing the 5th, you copy 1-4, etc. So you copy 1 + 2 + 4 + 8 + 16 = 31 times, with 32 stores.
Doing the formal analysis shows that you get O(N) stores and copies for N elements. This means amortized O(1) complexity per push_back (often only a store without a copy, sometimes a store and a sequence of copies).
Because of this expansion strategy, you will have size() < capacity() most of the time. Look up shrink_to_fit and reserve to learn how to control a vector's capacity in a more fine-grained manner.
Note: with geometric growth, any factor larger than 1 will do, and there have been some studies claiming that 1.5 gives better performance because it wastes less memory (at some point the previously freed blocks become large enough to be reused for a later reallocation).
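If you want to see the growth factor your own standard library uses, a small experiment like the following prints the capacity every time it changes (the exact numbers are implementation-defined):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < 32; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {   // a reallocation just happened
            last_capacity = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << last_capacity << '\n';
        }
    }
}

On libstdc++ this typically shows the capacity doubling (1, 2, 4, 8, 16, 32), while MSVC's implementation grows by a factor of roughly 1.5.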
It is for efficiency, so that the vector does not have to expand the underlying data structure each time you add an element, i.e. so that it does not have to call delete/new on every insertion.
std::vector::capacity() is not the actual size of the vector (which is returned by size()), but the size of the internal allocated storage.
In other words, it is the size the vector can reach before another reallocation is needed.
The capacity doesn't increase by 1 each time you do a push_back, so that a new allocation (which is an expensive operation) isn't triggered for every inserted element. The vector reserves more because it doesn't know whether you will do some other push_back right after; in that case, it won't have to change the allocated memory size for the next 4 elements.
Here, 4 extra elements is a compromise between 1, which would minimize memory use but risk another reallocation soon, and a huge number, which would let you do many push_backs quickly but might reserve a lot of memory for nothing.
Note: if you want to specify a capacity yourself (if you know your vector's maximum size, for instance), you can do it with the reserve member function.
Using
std::vector<int> array(10); // make room for 10 elements and initialize with 0
You actually filled all ten spaces with zeros. Adding an additional element will cause the capacity to be expanded, for efficiency.
In your case it is useless to call reserve, because you have already instantiated the same number of elements.
I think the following question can give you more detail about the capacity of a vector:
About Vectors growth
I will reference the answer to the above question.
The growth strategy of the capacity is required to meet the amortized constant time requirement for the push_back operation. The strategy is therefore generally designed to grow exponentially when space runs out. In short, the size of the vector indicates the number of elements right now, while the capacity shows how many it can hold before the next reallocation.
size() returns how many values you have in the vector.
capacity() returns the size of the allocated storage, that is, how many values it can currently hold without reallocating.

Can std::vector capacity/size/reserve be used to manually manage vector memory allocation?

I'm running very time-sensitive code and need a scheme to reserve more space for my vectors at a specific place in the code, where I can know (approximately) how many elements will be added, instead of having the standard library do it for me when the vector is full.
I haven't found a way to test this to make sure there are no corner cases of the standard library that I don't know about, so I'm wondering how the capacity of a vector affects the reallocation of memory. More specifically, would the code below make sure that automatic reallocation never occurs?
std::vector<unsigned int> data;
while (condition) {
    // Reallocate here.
    // get_elements_required() gives an estimate that is guaranteed to be >= the actual number of elements.
    unsigned int estimated_elements_required = get_elements_required(...);
    if ((data.capacity() - data.size()) <= estimated_elements_required) {
        data.reserve(std::min(data.capacity() * 2, data.max_size() - 1));
    }
    ...
    // NEVER reallocate here, I would rather see the program crash actually...
    for (unsigned int i = 0; i < get_elements_to_add(data); ++i) {
        data.push_back(elements[i]);
    }
}
estimated_elements_required in the code above is an estimate that is guaranteed to be equal to, or greater than, the actual number of elements that will be added. The code actually adding elements performs operations based on the capacity of the vector itself, changing the capacity halfway through will generate incorrect results.
Yes, this will work.
From the definition of reserve:
It is guaranteed that no reallocation takes place during insertions that happen after a call to reserve() until the time when an insertion would make the size of the vector greater than the value of capacity().
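If you want a cheap run-time check in debug builds that the inner loop really never reallocates, you can compare the buffer address before and after the batch. A sketch, assuming C++11 for data(); the function name is made up:

#include <cassert>
#include <vector>

// Debug aid: append a batch of elements and assert that the buffer never moved,
// i.e. that enough capacity had been reserved beforehand.
void append_without_realloc(std::vector<unsigned int>& data,
                            const std::vector<unsigned int>& elements)
{
    const unsigned int* buffer_before = data.data();
    for (unsigned int e : elements) {
        data.push_back(e);
    }
    assert(data.data() == buffer_before && "vector reallocated unexpectedly");
}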

Assigning vector size vs reserving vector size

bigvalue_t result;
result.assign(left.size() + right.size(), 0);
int carry = 0;
for (size_t i = 0; i < left.size(); i++) {
    carry = 0;
    for (size_t j = 0; j < right.size(); j++) {
        int sum = result[i+j] + (left[i] * right[j]) + carry;
        result[i+j] = sum % 10;
        carry = sum / 10;
    }
    result[i + right.size()] = carry;
}
return result;
Here I used assign to allocate the size of result, and result is passed back normally.
When I use result.reserve(left.size() + right.size()); instead, the function runs normally through both for loops. Yet when I print out result.size(), it is always 0. Does reserve not allocate any space?
It is specified as
void reserve(size_type n);
Effects: A directive that informs a vector of a planned change in size, so that it can manage the storage allocation accordingly. After reserve(), capacity() is greater or equal to the argument of reserve if reallocation happens; and equal to the previous value of capacity() otherwise. Reallocation happens at this point if and only if the current capacity is less than the argument of reserve(). If an exception is thrown other than by the move constructor of a non-CopyInsertable type, there are no effects.
Complexity: It does not change the size of the sequence and takes at most linear time in the size of the sequence.
So, yes, it allocates memory, but it doesn't create any objects within the container. To actually create as many elements in the vector as you want to have later, and to be able to access them via operator[], you need to call resize().
reserve() is for when you want to prevent the vector from reallocating every now and then while you are doing lots of push_back()s.
reserve allocates space, but doesn't actually create anything. It is used in order to avoid reallocations.
For example, if you intend to store 10000 elements by push_back into a vector, you will probably cause the vector to reallocate several times. If you use reserve before actually storing your elements, the vector is prepared to accept about 10000 elements, so filling it will happen faster than if you hadn't used reserve (a timing sketch follows below).
resize actually creates space. Note also that resize will initialize your elements to their default values (so for an int, it will set every element to 0).
PS: In fact, when you say reserve(1000), the vector may actually allocate space for more than 1000 elements. If this happens and you store exactly 1000 elements, then the unused space remains unused (it is not de-allocated).
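A rough sketch of how you might measure that speed-up yourself (C++14 for the digit separators; timings of course vary with machine, library, and optimization level):

#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    using clock = std::chrono::steady_clock;
    constexpr std::size_t n = 10'000'000;

    auto t0 = clock::now();
    std::vector<int> without;
    for (std::size_t i = 0; i < n; ++i)
        without.push_back(static_cast<int>(i));       // reallocates several times

    auto t1 = clock::now();
    std::vector<int> with_reserve;
    with_reserve.reserve(n);                          // one allocation up front
    for (std::size_t i = 0; i < n; ++i)
        with_reserve.push_back(static_cast<int>(i));  // never reallocates

    auto t2 = clock::now();
    std::cout << "without reserve: "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "with reserve:    "
              << std::chrono::duration<double>(t2 - t1).count() << " s\n";
}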
It is the difference between semantically increasing the size of the vector (resize/assign/push_back/etc), and physically creating more underlying memory for it to expand into (reserve).
Your code appears to work even with reserve only because you aren't triggering any OS-level memory errors (the memory does belong to your vector); but the absence of error messages or crashes doesn't mean your code is safe or correct: as far as the vector is concerned, you are writing into memory that belongs to it and not to you.
If you'd used .at() instead of [], you'd have got an exception; as it is, you are simply invoking undefined behaviour.
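A minimal demonstration of the difference (here the at() call throws, while indexing the reserved-but-empty vector would be undefined behaviour):

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> a;
    a.resize(5);       // size() == 5, elements value-initialized to 0
    a[3] = 42;         // fine: element 3 exists

    std::vector<int> b;
    b.reserve(5);      // capacity() >= 5, but size() is still 0
    // b[3] = 42;      // undefined behaviour: element 3 does not exist
    try {
        b.at(3) = 42;  // bounds-checked access throws instead
    } catch (const std::out_of_range& e) {
        std::cout << "out_of_range: " << e.what() << '\n';
    }
    std::cout << a.size() << ' ' << b.size() << '\n';  // prints "5 0"
}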

STL's vector resizing

I can't find this piece of information. I'm dealing with an odd situation where I'm inside a loop and can receive a random piece of information at any given time. This information has to be stored in a vector. Each frame I have to resize this vector to ensure that I won't exceed its space (I'm writing values into random positions in the vector using indexing).
Now, assuming there's no way to change this piece of code, I want to know: does the vector "ignore" the resize() call if I pass an argument that's exactly the current size of the vector? Where can I find this information?
From the MSDN reference:
If the container's size is less than the requested size, _Newsize, elements are added to the vector until it reaches the requested size. If the container's size is larger than the requested size, the elements closest to the end of the container are deleted until the container reaches the size _Newsize. If the present size of the container is the same as the requested size, no action is taken.
The ISO C++ standard (page 485) specifies this behaviour for vector::resize:
void resize(size_type sz, T c = T());

if (sz > size())
    insert(end(), sz - size(), c);
else if (sz < size())
    erase(begin() + sz, end());
else
    ;  // does nothing
So yes, the vector ignores it and you don't need to perform a check on your own.
Kinda-sorta.
Simply resizing a vector with resize() can only result in more memory being used by the vector's own buffer, never less (though it will of course change how much is used by its elements). If there's not enough room in the reserved space, it will reallocate (and implementations sometimes like to pad themselves a bit, so even if there is room you might still grow). If there is already plenty of room for the requested size and whatever padding it wants, it will not regrow.
When the specification says that the elements past the end of the new size will be deleted, it means in place. Basically it will call _M_buff[i].~T() for each element it is deleting. Thus any memory your objects allocate themselves will be released, assuming a working destructor, but the space that each object itself occupies (its size) will not be. The vector will grow, and grow, and grow to the maximum size you ever tell it to, and will not shrink again on its own while it exists.
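A quick way to convince yourself of both points: a same-size resize() does nothing (the buffer address and capacity stay put), and shrinking the size does not give the reserved storage back. A sketch assuming C++11 for data():

#include <cassert>
#include <cstddef>
#include <vector>

int main() {
    std::vector<int> v(100);
    const int* buffer = v.data();
    const std::size_t cap = v.capacity();

    v.resize(100);                // same size as before: specified to do nothing
    assert(v.data() == buffer);   // no reallocation, iterators stay valid
    assert(v.capacity() == cap);

    v.resize(10);                 // the size shrinks to 10...
    assert(v.capacity() == cap);  // ...but the reserved storage is not released
}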

basic question on std::vector in C++

C++ textbooks, and threads like these, say that vector elements are physically contiguous in memory.
But when we do operations like v.push_back(3.14), I would assume the STL uses the new operator to get more memory to store the new element 3.14 just introduced into the vector.
Now say a vector of size 4 is stored in the computer memory cells labelled 0x7, 0x8, 0x9, 0xA. If cell 0xB contains some other unrelated data, how will 3.14 go into this cell? Does that mean cell 0xB will be copied somewhere else and erased to make room for 3.14?
The short answer is that the entire array holding the vector's data is moved around to a location where it has space to grow. The vector class reserves a larger array than is technically required to hold the number of elements in the vector. For example:
vector< int > vec;
for( int i = 0; i < 100; i++ )
    vec.push_back( i );

cout << vec.size();     // prints "100"
cout << vec.capacity(); // prints some value greater than or equal to 100
The capacity() method returns the size of the array that the vector has reserved, while the size() method returns the number of elements in the array which are actually in use. capacity() will always return a number larger than or equal to size(). You can grow the backing array ahead of time by using the reserve() method:
vec.reserve( 400 );
cout << vec.capacity(); // prints a value greater than or equal to 400 (typically exactly 400)
Note that size(), capacity(), reserve(), and all related methods are measured in elements of the type that the vector is holding, not in bytes. For example, if vec's type parameter T is a struct that takes 10 bytes of memory, then vec.capacity() returning 400 means that the vector actually has at least 4000 bytes of memory reserved (400 x 10 = 4000).
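For the byte figure you have to multiply by sizeof(T) yourself; a tiny illustration with a hypothetical 10-byte element type:

#include <iostream>
#include <vector>

struct Record { char payload[10]; };  // made-up element type, sizeof(Record) == 10

int main() {
    std::vector<Record> vec;
    vec.reserve(400);
    std::cout << vec.capacity() * sizeof(Record) << " bytes reserved\n";  // at least 4000
}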
So what happens if more elements are added to the vector than it has capacity for? In that case, the vector allocates a new backing array (generally twice the size of the old array), copies the old array to the new array, and then frees the old array. In pseudo-code:
if (capacity() < size() + items_added)
{
    // start from the current capacity (at least 1, so that doubling makes progress)
    size_t sz = capacity() ? capacity() : 1;
    while (sz < size() + items_added)
        sz *= 2;
    T* new_data = new T[sz];
    for (size_t i = 0; i < size(); i++)
        new_data[i] = old_data[i];
    delete[] old_data;
    old_data = new_data;
}
So the entire data store is moved to a new location in memory that has enough space to store the current data plus a number of new elements. The vector never shrinks its backing array on its own, even if it has far more space allocated than is actually required; you have to request that explicitly (for example with shrink_to_fit).
std::vector first allocates a bigger buffer, then copies the existing elements from the "old" buffer to the "new" buffer, then deletes the "old" buffer, and finally adds the new element into the "new" buffer.
Generally, std::vector implementations grow their internal buffer by roughly doubling the capacity each time a bigger buffer is needed.
As Chris mentioned, every time the buffer grows, all existing iterators are invalidated.
When std::vector allocates memory for the values, it allocates more than it needs; you can find out how much by calling capacity. When that capacity is used up, it allocates a bigger chunk, again larger than it needs, and copies everything from the old memory to the new; then it releases the old memory.
If there is not enough space to add the new element, more space will be allocated (as you correctly indicated), and the old data will be copied to the new location. So cell 0xB will still contain the old value (as it might have pointers to it in other places, it is impossible to move it without causing havoc), but the whole vector in question will be moved to the new location.
A vector is backed by a contiguous array of memory. A typical implementation grabs more memory than is currently required. If that footprint needs to expand beyond the current allocation, the whole lot is copied and the old storage is freed. Note that the vector's element storage lives on the heap, not on the stack; only the vector object itself may live on the stack. It is also a good idea to reserve the maximum size up front if you know it.
In C++, which comes from C, memory is not 'managed' the way you describe - Cell 0x0B's contents will not be moved around. If you did that, any existing pointers would be made invalid! (The only way this could be possible is if the language had no pointers and used only references for similar functionality.)
If there is no room left, std::vector allocates a new, larger buffer and stores the value 3.14 at the "end" of the buffer.
Usually, though, to keep push_back() fast, a std::vector allocates about twice its current size() when it has to grow, trading a reasonable amount of memory for performance. So it is not guaranteed that 3.14 will cause a reallocation; it may simply be placed into the existing buffer at index size() (with size() then incremented), as long as size() < capacity().
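You can watch the buffer move by printing the address returned by data() whenever it changes across push_backs (assuming C++11; the addresses and growth points are implementation-specific):

#include <iostream>
#include <vector>

int main() {
    std::vector<double> v;
    const double* old_buffer = v.data();
    for (int i = 0; i < 20; ++i) {
        v.push_back(3.14);
        if (v.data() != old_buffer) {   // the whole array was moved
            std::cout << "size " << v.size() << ": buffer moved from "
                      << static_cast<const void*>(old_buffer) << " to "
                      << static_cast<const void*>(v.data()) << '\n';
            old_buffer = v.data();
        }
    }
}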