Why are a vector's size() and capacity() different after push_back()? - c++

I am just starting to learn about vectors and am a little confused about size() and capacity().
I know a little about both of them, but why are they different in this program? array(10) already makes room for 10 elements and initializes them with 0.
Before array.push_back(5):
array.size() is 10, which is fine.
array.capacity() is 10, which is fine.
After array.push_back(5):
array.size() is 11, which is fine (ten zeros were already there, and push_back added one more element, 5).
array.capacity() is 15. Why? (Is it reserving 5 blocks for one int?)
#include <iostream>
#include <vector>
int main(){
std::vector<int> array(10); // make room for 10 elements and initialize with 0
array.reserve(10); // make room for 10 elements
array.push_back(5);
std::cout << array.size() << std::endl;
std::cout << array.capacity() << std::endl;
return 0;
}

The Standard mandates that std::vector<T>::push_back() has amortized O(1) complexity. This means that the expansion has to be geometric, for example doubling the amount of storage each time it has been filled.
Simple example: sequentially push_back 32 ints into a std::vector<int> that starts with a capacity of 1. You will store all of them once, and also do 31 copies if you double the capacity each time it runs out. Why 31? Before storing the 2nd element, you copy the 1st; before storing the 3rd, you copy elements 1-2; before storing the 5th, you copy 1-4; and so on. So you copy 1 + 2 + 4 + 8 + 16 = 31 times, with 32 stores.
Doing the formal analysis shows that you get O(N) stores and copies for N elements. This means amortized O(1) complexity per push_back (often only a store without a copy, sometimes a store and a sequence of copies).
Because of this expansion strategy, you will have size() < capacity() most of the time. Look up shrink_to_fit and reserve to learn how to control a vector's capacity in a more fine-grained manner.
Note: with a geometric growth rate, any factor larger than 1 will do, and there have been some studies claiming that 1.5 gives better performance because less memory is wasted (at some point the newly allocated block can reuse the memory freed by earlier, smaller blocks).

This is for efficiency: the vector does not have to expand the underlying storage (that is, call delete/new) each time you add an element.

The capacity of a std::vector is not its actual size (which is returned by size()), but the size of the internally allocated storage.
In other words, it is the size the vector can reach before another reallocation is needed.
It doesn't increase by 1 on each push_back, so that a reallocation (which is an expensive call) is not needed for every inserted element. The vector reserves more because it cannot know whether another push_back will follow right away; if one does, it won't have to change the allocated memory size for the next 4 elements.
Here, 4 extra elements is a compromise between growing by exactly 1, which would minimize memory use but would force another reallocation on the very next push_back, and a huge number, which would allow many quick push_back calls but might reserve a lot of memory for nothing.
Note: if you want to specify a capacity yourself (for instance, if you know your vector's maximum size), you can do it with the reserve member function.

Using
std::vector<int> array(10); // make room for 10 elements and initialize with 0
You actually filled all ten slots with zeros. Adding an additional element causes the capacity to be expanded by more than one slot, for efficiency.
In your case calling reserve(10) has no effect, because the vector already holds 10 elements and therefore already has capacity for at least 10.

I think the following question can give you more detail about the capacity of a vector:
About Vectors growth
I will quote from the answer to that question.
The growth strategy for capacity is required to meet the amortized constant time requirement of the push_back operation. The strategy is therefore generally designed to grow exponentially whenever space runs out. In short, the size of a vector indicates the number of elements it holds now, while the capacity shows how many elements it can hold before it has to reallocate.

size() returns how many values you have in the vector.
capacity() returns the size of the allocated storage, that is, how many values the vector can hold right now without reallocating.

Related

Avoid the reallocation of a vector when its dimension has to be incremented

I have a
vector< pair<vector<double> , int>> samples;
This vector will contain a number of elements. For efficiency reasons I initialize it this way:
vector< pair<vector<double> , int>> samples(1000000);
I know the size in advance (not at compile time); I get it from another container. The problem is that sometimes I have to shrink the vector by one element. That case is not a problem, because resize with a size smaller than the current one does not reallocate. I can do
samples.resize(999999);
The problem is that in other cases, rather than shrinking the vector by one element, I have to grow it by one element. If I do
samples.resize(1000001);
there is a risk of reallocation, which I want to avoid for efficiency reasons.
Is the following a possible solution to my problem:
vector< pair<vector<double> , int>> samples;
samples.reserve(1000001);
samples.resize(1000000);
.
. Elaboration that fill samples
.
samples.resize(1000001); //here I don't want reallocation
or if there are better solutions?
Thanks in advance!
(I'm using C++11 compiler)
I just wrote a sample program to demonstrate that resize is not going to reallocate if the capacity of the vector is sufficient:
#include <iostream>
#include <vector>
#include <utility>
#include <cassert>
using namespace std;
int main()
{
vector<pair<vector<double>, int>> samples;
samples.reserve(10001);
auto data = samples.data();
assert(10001==samples.capacity());
samples.resize(10000);
assert(10001 == samples.capacity());
assert(data == samples.data());
samples.resize(10001); //here I don't want reallocation
assert(10001==samples.capacity());
assert(data == samples.data());
}
This demo is based on the assumption that std::vector guarantees contiguous memory, so if the data pointer does not change, then no reallocation took place. This is also evident from capacity() remaining 10001 after every call to resize().
cppreference on vectors:
The storage of the vector is handled automatically, being expanded and contracted as needed. Vectors usually occupy more space than static arrays, because more memory is allocated to handle future growth. This way a vector does not need to reallocate each time an element is inserted, but only when the additional memory is exhausted. The total amount of allocated memory can be queried using capacity() function.
cppreference on reserve:
Correctly using reserve() can prevent unnecessary reallocations, but inappropriate uses of reserve() (for instance, calling it before every push_back() call) may actually increase the number of reallocations (by causing the capacity to grow linearly rather than exponentially) and result in increased computational complexity and decreased performance.
cppreference also states about resize:
Complexity
Linear in the difference between the current size and count. Additional complexity possible due to reallocation if capacity is less than count
Is the following a possible solution to my problem:
samples.reserve(1000001);
samples.resize(1000000);
Yes, this is the solution.
or if there are better solutions?
Not that I know of.
As I recall, when you resize to a size no greater than the capacity, there will be no reallocation.
So your code will work without reallocation.
cppreference.com
Vector capacity is never reduced when resizing to smaller size because that would invalidate all iterators, rather than only the ones that would be invalidated by the equivalent sequence of pop_back() calls.

Why does a vector have a capacity different from its size? [duplicate]

The program below uses vectors and gives different results for capacity in C++11 mode.
#include<iostream>
#include<vector>
using namespace std;
int main(){
vector<int>a ={1,2,3};
cout<<"vector a size :"<<a.size()<<endl;
cout<<"vector a capacity :"<<a.capacity()<<endl<<endl;
vector<int>b ;
b.push_back(1);
b.push_back(2);
b.push_back(3);
cout<<"vector b size :"<<b.size()<<endl;
cout<<"vector b capacity :"<<b.capacity()<<endl;
return 0;
}
OUTPUT
vector a size :3
vector a capacity :3

vector b size :3
vector b capacity :4
Why does this program give different values for the capacity of a and b when both hold the same number of values, and how is size different from capacity?
The reason lies in the growth algorithm of the vector.
When a vector is initialized with a given set of elements, typically no extra capacity is added, so the capacity equals the size.
Each time an extension is needed, the vector copies its contents to a new block of storage whose capacity is a multiple of the current one (this implementation doubles it).
This makes the whole idea of a growable array very efficient, since in amortized time (meaning the average time over N operations) we get O(1) insertion complexity.
You can see that after we add one more integer to the first vector, we get a capacity of 6. http://coliru.stacked-crooked.com/a/f084820652f025b8
By allocating more elements than needed, the vector does not need to reallocate memory when new elements are added to the vector. Also, when reducing the size, reallocation is not needed at all.
Reallocation of memory is a relatively expensive operation (creating new block, copying elements across, removing old block).
The trade-off is that the vector may have allocated more memory than it will need (e.g. if it allocates memory for elements that never get added/used). Practically, unless available memory is scarce, the cost of allocating a larger block (and reallocating less often) is less than the cost of reallocating every time.

Can std::vector capacity/size/reserve be used to manually manage vector memory allocation?

I'm running very time sensitive code and would need a scheme to reserve more space for my vectors at a specific place in the code, where I can know (approximately) how many elements will be added, instead of having std do it for me when the vector is full.
I haven't found a way to test this to make sure there are no corner cases of std that I do not know of, therefore I'm wondering how the capacity of a vector affects the reallocation of memory. More specifically, would the code below make sure that automatic reallocation never occurs?
std::vector<unsigned int> data;
while (condition) {
// Reallocate here
// get_elements_required() gives an estimate that is guaranteed to be >= the actual number of elements.
unsigned int estimated_elements_required = get_elements_required(...);
if ((data.capacity() - data.size()) <= estimated_elements_required) {
data.reserve(std::min(data.capacity() * 2, data.max_size() - 1));
}
...
// NEVER reallocate here, I would rather see the program crash actually...
for (unsigned int i = 0; i < get_elements_to_add(data); ++i) {
data.push_back(elements[i]);
}
}
estimated_elements_required in the code above is an estimate that is guaranteed to be equal to, or greater than, the actual number of elements that will be added. The code actually adding elements performs operations based on the capacity of the vector itself, changing the capacity halfway through will generate incorrect results.
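Yes, this will work, as long as the capacity you reserve is at least size() plus the number of elements you are about to add. Note that doubling does nothing while capacity() is still 0 and may be too little for a large batch, so it is safer to reserve data.size() + estimated_elements_required directly.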
From the definition of reserve:
It is guaranteed that no reallocation takes place during insertions that happen after a call to reserve() until the time when an insertion would make the size of the vector greater than the value of capacity().

Control over std::vector reallocation

By reading the std::vector reference I understood that
calling insert when the maximum capacity is reached will cause the reallocation of the std::vector (causing iterator invalidation) because new memory is allocated for it with a bigger capacity. The goal is to keep the guarantee about contiguous data.
As long as I stick below the maximum capacity insert will not cause that (and iterators will be intact).
My question is the following:
When reserve is called automatically by insert, is there any way to control how much new memory must be reserved?
Suppose that I have a vector with an initial capacity of 100 and, when the maximum capacity is hit, I want to allocate an extra 20 bytes.
Is it possible to do that?
You can always track it yourself and call reserve before it would allocate, e.g.
static const int N = 20; // Amount to grow by (in elements)
if (vec.capacity() == vec.size()) {
vec.reserve(vec.size() + N);
}
vec.insert(...);
You can wrap this in a function of your own and call that function instead of calling insert() directly.

size vs capacity of a vector?

I am a bit confused about these two; they look the same to me.
I have read that capacity and size may differ between compilers. How can they differ?
It is also said that the capacity changes when we run out of memory.
All of this is a bit unclear to me.
Can somebody give an explanation (if possible with an example, or a test I can run on a program to understand it)?
Size is not allowed to differ between multiple compilers. The size of a vector is the number of elements that it contains, which is directly controlled by how many elements you put into the vector.
Capacity is the amount of total space that the vector has. Under the hood, a vector just uses an array. The capacity of the vector is the size of that array. This is always equal to or larger than the size. The difference between them is the number of elements that you can add to the vector before the array under the hood needs to be reallocated.
You should almost never care about the capacity. It exists to let people with very specific performance and memory constraints do exactly what they want.
Size: the number of items currently in the vector
Capacity: how many items can be fit in the vector before it is "full". Once full, adding new items will result in a new, larger block of memory being allocated and the existing items being copied to it
Let's say you have a bucket. At most, this bucket can hold 5 gallons of water, so its capacity is 5 gallons. It may have any amount of water between 0 and 5, inclusive. The amount of water currently in the bucket is, in vector terms, its size. So if this bucket is half filled, it has a size of 2.5 gallons.
If you try to add more water to a bucket and it would overflow, you need to find a bigger bucket. So you get a bucket with a larger capacity and dump the old bucket's contents into the new one, then add the new water.
Capacity: Maximum amount of stuff the Vector/bucket can hold.
Size: Amount of stuff currently in the Vector/bucket.
Size is number of elements present in a vector
Capacity is the amount of space the vector has currently allocated, counted in elements.
Let's understand it with a very simple example:
#include <iostream>
#include <vector>
using namespace std;
int main(){
vector<int > vec;
vec.push_back(1);
vec.push_back(1);
vec.push_back(1);
cout<<"size of vector"<<vec.size()<<endl;
cout<<"capacity of vector"<<vec.capacity()<<endl;
return 0;
}
currently size is 3 and
capacity is 4.
Now if we push back one more element,
#include <iostream>
#include <vector>
using namespace std;
int main(){
vector<int> vec;
vec.push_back(1);
vec.push_back(1);
vec.push_back(1);
vec.push_back(1);
cout<<"size of vector"<<vec.size()<<endl;
cout<<"capacity of vector"<<vec.capacity()<<endl;
return 0;
}
now
size is: 4
capacity is 4
Now, if we insert one more element, size becomes 5 and capacity becomes 8 (with this implementation).
This does not depend on the element type: capacity is counted in elements, not bytes, so the fact that an int is 4 bytes is a coincidence here. The capacity was 4 because this implementation doubles it whenever the vector is full (1, 2, 4), and adding the 5th element doubles it again.
The same pattern continues. For example, if we insert a 9th element, size will be 9 and capacity will be 16. Other implementations may use a different growth factor, such as 1.5.
size() tells you how many elements you currently have. capacity() tells you how large the size can get before the vector needs to reallocate memory for itself.
Capacity is always greater than or equal to size. You cannot index beyond element # size()-1.
The size is the number of elements in the vector. The capacity is the maximum number of elements the vector can currently hold.
The vector size is the total number of elements of a vector and it is always the same for all compilers. Vectors are re-sizeable.
The capacity is the maximum number of elements the vector can currently hold. It may differ for different compilers.
Capacity changes if it needs to, or you can set an initial capacity and it will not resize until that capacity is reached. It is automatically expanded.
Capacity >= Size
One is more of an important interface and the other is more of an important implementation detail. You will mostly deal with size and not capacity. In other words:
Size is the number of items in the vector. If you want to iterate through the vector, you need to know its size.
Capacity is how many items can be fit in the vector before more memory must be allocated to it. Once the capacity limit is reached, more memory is allocated to the vector.
An analogy to size is the number of balls in a box whereas the capacity is the box size. When programming, you normally want to know how many balls are in the box. The vector implementation should handle the capacity for you (making a bigger box once it is full).