size vs capacity of a vector? - c++

I am a bit confused about these two; they look the same to me.
I have read that capacity and size may differ between compilers. How can they differ?
It is also said that the capacity changes when the vector runs out of memory.
All of this is a bit unclear to me.
Can somebody explain, if possible with an example or a test program I can run to understand it?

Size is not allowed to differ between multiple compilers. The size of a vector is the number of elements that it contains, which is directly controlled by how many elements you put into the vector.
Capacity is the amount of total space that the vector has. Under the hood, a vector just uses an array. The capacity of the vector is the size of that array. This is always equal to or larger than the size. The difference between them is the number of elements that you can add to the vector before the array under the hood needs to be reallocated.
You should almost never care about the capacity. It exists to let people with very specific performance and memory constraints do exactly what they want.

Size: the number of items currently in the vector
Capacity: how many items can fit in the vector before it is "full". Once it is full, adding new items results in a new, larger block of memory being allocated and the existing items being copied (or moved) to it.

Let's say you have a bucket. At most, this bucket can hold 5 gallons of water, so its capacity is 5 gallons. It may have any amount of water between 0 and 5, inclusive. The amount of water currently in the bucket is, in vector terms, its size. So if this bucket is half filled, it has a size of 2.5 gallons.
If you try to add more water to a bucket and it would overflow, you need to find a bigger bucket. So you get a bucket with a larger capacity and dump the old bucket's contents into the new one, then add the new water.
Capacity: Maximum amount of stuff the Vector/bucket can hold.
Size: Amount of stuff currently in the Vector/bucket.

Size is the number of elements present in a vector.
Capacity is the number of elements that the vector's currently allocated storage can hold.
Let's understand it with a very simple example:
#include <iostream>
#include <vector>
using namespace std;
int main(){
    vector<int> vec;
    vec.push_back(1);
    vec.push_back(1);
    vec.push_back(1);
    cout<<"size of vector: "<<vec.size()<<endl;
    cout<<"capacity of vector: "<<vec.capacity()<<endl;
    return 0;
}
With the implementation used here, size is 3 and capacity is 4 (the exact capacity is implementation-dependent).
Now if we push back one more element,
#include <iostream>
#include <vector>
using namespace std;
int main(){
    vector<int> vec;
    vec.push_back(1);
    vec.push_back(1);
    vec.push_back(1);
    vec.push_back(1);
    cout<<"size of vector: "<<vec.size()<<endl;
    cout<<"capacity of vector: "<<vec.capacity()<<endl;
    return 0;
}
Now size is 4 and capacity is 4.
If we insert one more element, size becomes 5 and, on this implementation, capacity becomes 8.
This growth does not depend on the element type or on sizeof(int). When a push_back finds the storage full, the vector allocates a larger block, and this implementation doubles the capacity each time. The growth factor is not fixed by the standard: libstdc++ and libc++ double the capacity, while Microsoft's implementation grows it by roughly 1.5x.
The same pattern continues: inserting a 9th element here gives size 9 and capacity 16.

size() tells you how many elements you currently have. capacity() tells you how large the size can get before the vector needs to reallocate memory for itself.
Capacity is always greater than or equal to size. You cannot index beyond element # size()-1.

The size is the number of elements in the vector. The capacity is the maximum number of elements the vector can currently hold.

The vector size is the total number of elements in the vector, and it is the same for all compilers. Vectors are resizable.
The capacity is the maximum number of elements the vector can currently hold without reallocating. It may differ between standard library implementations.
Capacity grows automatically when needed, or you can request an initial capacity with reserve() and the vector will not reallocate until that capacity is exceeded.
Capacity >= Size

One is more of an important interface and the other is more of an important implementation detail. You will mostly deal with size and not capacity. In other words:
Size is the number of items in the vector. If you want to iterate through the vector, you need to know its size.
Capacity is how many items can be fit in the vector before more memory must be allocated to it. Once the capacity limit is reached, more memory is allocated to the vector.
An analogy to size is the number of balls in a box whereas the capacity is the box size. When programming, you normally want to know how many balls are in the box. The vector implementation should handle the capacity for you (making a bigger box once it is full).

Related

Why does a vector have a capacity different from its size? [duplicate]

The program below gives different capacities for two vectors in C++11 mode.
#include<iostream>
#include<vector>
using namespace std;
int main(){
    vector<int> a = {1,2,3};
    cout<<"vector a size :"<<a.size()<<endl;
    cout<<"vector a capacity :"<<a.capacity()<<endl<<endl;
    vector<int> b;
    b.push_back(1);
    b.push_back(2);
    b.push_back(3);
    cout<<"vector b size :"<<b.size()<<endl;
    cout<<"vector b capacity :"<<b.capacity()<<endl;
    return 0;
}
OUTPUT
vector a size :3
vector a capacity :3
vector b size :3
vector b capacity :4
Why does this program give different capacities for a and b when both hold the same number of values, and how is size different from capacity?
The reason lies in how a vector grows.
When a vector is constructed from an initializer list, implementations typically allocate exactly as much storage as needed, with no extra capacity.
Each time the vector runs out of room, it allocates a new block, typically twice the current capacity, and copies (or moves) its contents into it.
This makes a resizable array efficient: in amortized time (the average over N operations), insertion at the end is O(1).
You can see this by adding one more integer to the first vector: its capacity becomes 6. http://coliru.stacked-crooked.com/a/f084820652f025b8
By allocating more elements than needed, the vector does not need to reallocate memory when new elements are added to the vector. Also, when reducing the size, reallocation is not needed at all.
Reallocation of memory is a relatively expensive operation (creating new block, copying elements across, removing old block).
The trade-off is that the vector may have allocated more memory than it will need (e.g. if it allocates memory for elements that never get added/used). Practically, unless available memory is scarce, the cost of allocating a larger block (and reallocating less often) is less than the cost of reallocating every time.

Why are a vector's size() and capacity() different after push_back()?

I have just started learning vectors and am a little confused about size() and capacity().
I know a little about both of them, but why are they different in this program? array(10) makes room for 10 elements and initializes them with 0.
Before array.push_back(5):
array.size() is 10, which is fine.
array.capacity() is 10, which is fine.
After array.push_back(5):
array.size() is 11, which is fine (ten zeros were already there, and push_back adds one more element, 5).
array.capacity() is 15. Why? (Is it reserving 5 blocks for one int?)
#include <iostream>
#include <vector>
int main(){
std::vector<int> array(10); // make room for 10 elements and initialize with 0
array.reserve(10); // make room for 10 elements
array.push_back(5);
std::cout << array.size() << std::endl;
std::cout << array.capacity() << std::endl;
return 0;
}
The Standard mandates that std::vector<T>::push_back() has amortized O(1) complexity. This means the expansion has to be geometric, for example doubling the amount of storage each time it fills up.
Simple example: sequentially push_back 32 ints into a std::vector<int>. You will store all of them once, and also do 31 copies if you double the capacity each time it runs out. Why 31? Before storing the 2nd element, you copy the 1st; before storing the 3rd, you copy elements 1-2, before storing the 5th, you copy 1-4, etc. So you copy 1 + 2 + 4 + 8 + 16 = 31 times, with 32 stores.
Doing the formal analysis shows that you get O(N) stores and copies for N elements. This means amortized O(1) complexity per push_back (often only a store without a copy, sometimes a store and a sequence of copies).
Because of this expansion strategy, you will have size() < capacity() most of the time. Look up shrink_to_fit and reserve to learn how to control a vector's capacity in a more fine-grained manner.
Note: with geometric growth, any factor larger than 1 will do. There have been some studies claiming that 1.5 gives better performance because less memory is wasted (with a smaller factor, previously freed blocks can eventually be reused for a later, larger allocation).
It is for efficiency so that it does not have to expand the underlying data structure each time you add an element. i.e. not having to call delete/new each time.
The value returned by std::vector::capacity() is not the vector's actual size (which is returned by size()), but the size of the internally allocated storage.
In other words, it is the size the vector can reach before another reallocation is needed.
It doesn't increase by 1 on each push_back, so as not to trigger a reallocation (an expensive call) for every inserted element. It reserves more because it can't know whether you will do another push_back right after, and in that case it won't have to change the allocated memory for the next 4 elements.
Here, 4 extra elements is a compromise between 1, which would minimize memory use but risk another reallocation soon, and a huge number, which would allow many quick push_back calls but might reserve a lot of memory for nothing.
Note: if you want to specify a capacity yourself (for instance if you know the vector's maximum size), you can do so with the reserve member function.
Using
std::vector<int> array(10); // make room for 10 elements and initialize with 0
You actually filled all ten slots with zeros. Adding an additional element causes the capacity to be expanded, by more than one element for efficiency.
In your case the call to reserve is useless, because the vector already holds that many elements.
I think the following question can give you more detail about the capacity of a vector.
About Vectors growth
To summarize the answer given there:
The growth strategy for capacity has to satisfy the amortized constant time requirement for push_back. The strategy is therefore generally designed to grow exponentially whenever space runs out. In short, the size of a vector is the number of elements it holds now, while the capacity shows how many it can hold before a future push_back needs more storage.
size() returns how many values you have in the vector.
capacity() returns the size of the allocated storage, that is, how many values the vector can hold right now.

Should I define number of elements in a vector before or after knowing how many elements I'll use?

I'm trying to understand how I should set up a vector to reduce run time and memory usage in a program, or whether it doesn't matter (depending solely on the computations my program does with those elements).
Let's say I define a vector without knowing how many elements I'll use, but I know the maximum number of elements I'll be working with:
#define MAX 10000
vector<int> eg(MAX);
In the other case I first read how many elements there are and then size the vector accordingly:
vector<int> eg;
int n;
cin >> n;
eg.resize(n);
If you know the maximum number of elements that the vector will store, then it is better to use the member function reserve. For example:
const std::vector<int>::size_type MAX = 10000;
std::vector<int> eg;
eg.reserve( MAX );
Both. If you reserve MAX up front, the later resize to the final number of elements always stays within the existing capacity, and that takes fewer CPU cycles than resizing beyond the capacity (as could happen without MAX), because no elements have to be copied to a new location when the current contiguous block has room.

Heap_size in heap_sort

I'm reading Cormen's "Introduction to Algorithms", and I'm trying to implement a heap-sort, and there's one thing I continually fail to understand: how do we calculate the heap_size for a given array?
My textbook says
An array A that represents a heap is an object with two attributes:
A.length, which (as usual) gives the number of elements in the array,
and A.heap-size, which represents how many elements in the heap are
stored within array A. That is, although A[1 .. A.length] may contain
numbers, only the elements in A[1..A.heap-size],where 0 <= A.heap-size <=
A.length, are valid elements of the heap.
If I implement the array as std::vector<T> Arr, then its size would be Arr.size(), but what its heap_size would be is currently beyond me.
The heap size should be a separately stored variable, which you manage yourself.
Whenever you remove from or add to the heap, you should decrement or increment the value appropriately.
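In C++, using a vector, you may be able to use the vector's size for this, since the underlying array is at least as big as the vector's size and is not reallocated when you call resize with a smaller size. (So the capacity plays the role of the array length and the vector size the role of the heap size.) Note, however, that elements beyond size() are destroyed and must not be accessed, so an in-place heapsort, which keeps the already sorted elements in A[heap-size+1 .. A.length], still needs a separately stored heap size.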

How to reserve a multi-dimensional Vector?

Let's say I have a vector of vectors:
vector< vector<int> > bigTable;
bigTable.reserve(5);
I want to clarify my understanding of resize and reserve.
As I understand it, when you use push_back on a vector, memory has to be allocated each time.
So my goal is to set aside memory up front so that push_back is as cheap as possible.
Will reserve help with that goal?
You don't have to allocate memory each time you call push_back. The vector starts off with a given capacity, and it only allocates extra capacity when the original capacity runs out, typically by doubling the previous capacity. You should only reserve if you are sure you are going to need the extra capacity. And you can check how much capacity you start off with using the capacity member. So, yes, a call to reserve can help, but only if you know from the outset that you really need the extra capacity. But you can also trust the vector to increase its capacity if and when it needs it.
On my particular platform, when I initialize a vector with 5 Foo elements, it has capacity 5. As I add a new element, the capacity jumps to 10. This is not mandated by the standard; the original capacity could have been much more than 5.
#include <iostream>
#include <vector>
struct Foo {
    long long n;
};
int main() {
    std::vector<Foo> f(5);
    std::cout << f.capacity() << "\n";
    f.push_back(Foo());
    std::cout << f.capacity() << "\n";
}