How to reserve a multi-dimensional vector? - C++

Let's say I have a vector of vectors:
vector< vector<int> > bigTable;
bigTable.reserve(5);
To clarify my understanding of resize and reserve:
when you use push_back with vectors, you have to allocate memory each time you use it.
So my goal is to set aside a block of memory in advance so that insertions are as inexpensive as possible.
As such, will reserve help with the above goal?

You don't have to allocate memory each time you call push_back. The vector starts off with a given capacity, and it only allocates extra capacity when the original capacity runs out, typically by doubling the previous capacity. You should only reserve if you are sure you are going to need the extra capacity. And you can check how much capacity you start off with using the capacity member. So, yes, a call to reserve can help, but only if you know from the outset that you really need the extra capacity. But you can also trust the vector to increase its capacity if and when it needs it.
On my particular platform, when I initialize a vector with 5 Foo elements, it has capacity 5. As I add a new element, the capacity jumps to 10. This is not mandated by the standard; the original capacity could have been much more than 5.
#include <iostream>
#include <vector>

struct Foo {
    long long n;
};

int main() {
    std::vector<Foo> f(5);
    std::cout << f.capacity() << "\n";
    f.push_back(Foo());
    std::cout << f.capacity() << "\n";
}
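To tie this back to the original question about a vector of vectors: here is a minimal sketch of how both levels could be reserved. This is my own illustration, not from the question; the row count and per-row size are just assumed numbers.

#include <vector>

int main() {
    std::vector<std::vector<int>> bigTable;
    bigTable.reserve(5);                  // capacity for 5 rows, no rows constructed yet
    for (int row = 0; row < 5; ++row) {
        bigTable.emplace_back();          // add an empty row (no outer reallocation)
        bigTable.back().reserve(100);     // pre-allocate space for 100 ints in this row
    }
}

Note that reserving the outer vector does nothing for the inner vectors; each row manages its own allocation.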

Related

Why does a vector's capacity differ from its size? [duplicate]

Below is a program using vectors that gives different results for capacity in C++11 mode.
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> a = {1, 2, 3};
    cout << "vector a size :" << a.size() << endl;
    cout << "vector a capacity :" << a.capacity() << endl << endl;

    vector<int> b;
    b.push_back(1);
    b.push_back(2);
    b.push_back(3);
    cout << "vector b size :" << b.size() << endl;
    cout << "vector b capacity :" << b.capacity() << endl;
    return 0;
}
OUTPUT
vector a size :3
vector a capacity :3
vector b size :3
vector b capacity :4
Why does this program give different values for the capacity of a and b when both hold the same number of values, and how does size differ from capacity?
The reason lies in the vector's growth strategy.
When a vector is initialized from a fixed set of values, no extra capacity is allocated.
Each time an extension is needed, the vector copies its contents into a new allocation whose capacity is (typically) double its current size.
This makes the whole idea of a resizable array very efficient, since in amortized terms (the average over N operations) we get O(1) insertion complexity.
You can see that after we add one more integer to the first vector, we get a capacity of 6. http://coliru.stacked-crooked.com/a/f084820652f025b8
By allocating more elements than needed, the vector does not need to reallocate memory when new elements are added to the vector. Also, when reducing the size, reallocation is not needed at all.
Reallocation of memory is a relatively expensive operation (creating new block, copying elements across, removing old block).
The trade-off is that the vector may have allocated more memory than it will need (e.g. if it allocates memory for elements that never get added/used). Practically, unless available memory is scarce, the cost of allocating a larger block (and reallocating less often) is less than the cost of reallocating every time.
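If you want to watch the growth happen, a small sketch like the following (my own addition, not from the answer) prints the capacity every time a reallocation occurs; the exact sequence of capacities is implementation-defined.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    auto last = v.capacity();
    for (int i = 0; i < 100; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {   // a reallocation just happened
            last = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last << "\n";
        }
    }
}

On a typical implementation you will see the capacity roughly doubling each time.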

Why are a vector's size() and capacity() different after push_back()?

I am just starting to learn about vectors and am a little confused about size() and capacity().
I know only a little about both of them, so why are they different in this program? After all, array(10) makes room for 10 elements and initializes them with 0.
Before calling array.push_back(5):
array.size() is 10, which is fine.
array.capacity() is 10, which is fine.
After calling array.push_back(5):
array.size() is 11, which is fine (10 zeros were already there and push_back adds one more element, 5).
array.capacity() is 15. Why? (Is it reserving 5 extra slots for one int?)
#include <iostream>
#include <vector>

int main() {
    std::vector<int> array(10); // make room for 10 elements and initialize with 0
    array.reserve(10);          // make room for 10 elements
    array.push_back(5);
    std::cout << array.size() << std::endl;
    std::cout << array.capacity() << std::endl;
    return 0;
}
The Standard mandates that std::vector<T>::push_back() has amortized O(1) complexity. This means that the expansion has to be geometric, for example doubling the amount of storage each time it is filled.
Simple example: sequentially push_back 32 ints into a std::vector<int>. You will store all of them once, and also do 31 copies if you double the capacity each time it runs out. Why 31? Before storing the 2nd element, you copy the 1st; before storing the 3rd, you copy elements 1-2, before storing the 5th, you copy 1-4, etc. So you copy 1 + 2 + 4 + 8 + 16 = 31 times, with 32 stores.
Doing the formal analysis shows that you get O(N) stores and copies for N elements. This means amortized O(1) complexity per push_back (often only a store without a copy, sometimes a store and a sequence of copies).
Because of this expansion strategy, you will have size() < capacity() most of the time. Look up shrink_to_fit and reserve to learn how to control a vector's capacity in a more fine-grained manner.
Note: with a geometric growth rate, any factor larger than 1 will do, and there have been some studies claiming that 1.5 gives better performance because it wastes less memory (with a smaller factor, previously freed blocks can eventually be reused for a later allocation).
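The copy-counting argument above can be checked with a rough sketch of my own: a type that counts its copy constructions, appended with emplace_back so that only the copies caused by reallocation are counted. The exact total depends on the library's growth factor and starting capacity.

#include <iostream>
#include <vector>

struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted&) { ++copies; }   // no move ctor, so reallocation copies
};
int Counted::copies = 0;

int main() {
    std::vector<Counted> v;
    for (int i = 0; i < 32; ++i)
        v.emplace_back();                   // construct in place; only growth does copying
    std::cout << "copies caused by reallocation: " << Counted::copies << "\n";
    // With a doubling growth strategy starting at capacity 1 this prints 31.
}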
It is for efficiency, so that the vector does not have to expand the underlying storage each time you add an element, i.e. it does not have to call delete/new every time.
std::vector::capacity() is not the vector's actual size (which is returned by size()), but the size of the internal allocation.
In other terms, it is the size the vector can reach before another reallocation is needed.
The capacity doesn't increase by 1 on each push_back, so that a new reallocation (which is an expensive call) isn't triggered for every inserted element. The vector reserves more because it doesn't know whether more push_backs will follow, and in that case it won't have to change the allocated size for the next 4 elements.
Here, 4 extra elements is a compromise between 1, which would minimize memory usage but risk another reallocation soon, and a huge number, which would let you do many push_backs quickly but might reserve a lot of memory for nothing.
Note: if you want to specify a capacity yourself (if you know your vector's maximum size, for instance), you can do it with the reserve member function.
Using
std::vector<int> array(10); // make room for 10 elements and initialize with 0
you actually fill all ten slots with zeros. Adding an additional element then causes the capacity to be expanded, for efficiency.
In your case, the call to reserve is useless because the vector was already constructed with that many elements, so its capacity is at least 10.
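A quick sketch (my own) of why reserve(10) is a no-op here: reserve never shrinks, so requesting a capacity you already have does nothing, while requesting more does grow the allocation.

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v(10);   // size 10, capacity >= 10
    std::cout << v.capacity() << "\n";
    v.reserve(10);            // no effect: requested capacity already available
    std::cout << v.capacity() << "\n";
    v.reserve(20);            // now the capacity actually grows to at least 20
    std::cout << v.capacity() << "\n";
}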
I think the following question can give you more detail about the capacity of a vector:
About Vectors growth
To reference the answer in the above question: the growth strategy for capacity is required to meet the amortized constant-time requirement for push_back, so the capacity is generally grown exponentially when space runs out. In short, the size of a vector is the number of elements it holds now, while the capacity is how many it can hold before the next reallocation.
size() returns how many values you currently have in the vector.
capacity() returns the size of the allocated storage, i.e. how many values the vector can hold before it has to reallocate.
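To make the distinction concrete, here is a short sketch of my own contrasting reserve (which changes only the capacity) with resize (which changes the size, and the capacity if needed).

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(8);   // room for 8 elements, but still empty
    std::cout << v.size() << " " << v.capacity() << "\n";   // 0 and at least 8
    v.resize(3);    // three value-initialized elements, capacity unchanged
    std::cout << v.size() << " " << v.capacity() << "\n";   // 3 and the same capacity
}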

Can std::vector capacity/size/reserve be used to manually manage vector memory allocation?

I'm running very time-sensitive code and need a scheme to reserve more space for my vectors at a specific place in the code, where I can know (approximately) how many elements will be added, instead of having the library do it for me when the vector is full.
I haven't found a way to test this to make sure there are no corner cases of the standard library that I don't know about, so I'm wondering how the capacity of a vector affects the reallocation of memory. More specifically, would the code below make sure that automatic reallocation never occurs?
std::vector<unsigned int> data;
while (condition) {
    // Reallocate here.
    // get_elements_required() gives an estimate that is guaranteed to be >= the actual number of elements.
    unsigned int estimated_elements_required = get_elements_required(...);
    if ((data.capacity() - data.size()) <= estimated_elements_required) {
        data.reserve(std::min(data.capacity() * 2, data.max_size() - 1));
    }
    ...
    // NEVER reallocate here, I would rather see the program crash actually...
    for (unsigned int i = 0; i < get_elements_to_add(data); ++i) {
        data.push_back(elements[i]);
    }
}
estimated_elements_required in the code above is an estimate that is guaranteed to be equal to, or greater than, the actual number of elements that will be added. The code that actually adds elements performs operations based on the capacity of the vector itself; changing the capacity halfway through would generate incorrect results.
Yes, this will work.
From the definition of reserve:
It is guaranteed that no reallocation takes place during insertions that happen after a call to reserve() until the time when an insertion would make the size of the vector greater than the value of capacity().
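A small self-contained sketch of my own, loosely modelled on the loop in the question (estimate stands in for the question's get_elements_required()), showing that while insertions stay within the reserved capacity, neither data() nor capacity() changes, i.e. no reallocation occurs.

#include <cassert>
#include <cstddef>
#include <vector>

int main() {
    std::vector<unsigned int> data;
    const std::size_t estimate = 100;   // stands in for get_elements_required()
    if (data.capacity() - data.size() < estimate)
        data.reserve(data.size() + estimate);

    const unsigned int* before = data.data();
    const std::size_t cap = data.capacity();
    for (unsigned int i = 0; i < estimate; ++i)
        data.push_back(i);              // stays within the reserved capacity

    assert(data.data() == before);      // no reallocation occurred
    assert(data.capacity() == cap);
}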

Control over std::vector reallocation

By reading the std::vector reference I understood that:
Calling insert when the current capacity is reached will cause a reallocation of the std::vector (invalidating iterators), because new memory is allocated for it with a bigger capacity. The goal is to keep the guarantee about contiguous data.
As long as I stay below the current capacity, insert will not cause that (and iterators will remain valid).
My question is the following:
When reserve is called automatically by insert, is there any way to control how much new memory must be reserved?
Suppose that I have a vector with an initial capacity of 100 and, when the maximum capacity is hit, I want to allocate an extra 20 bytes.
Is it possible to do that?
You can always track it yourself and call reserve before it would allocate, e.g.
static const int N = 20;  // Amount to grow by
if (vec.capacity() == vec.size()) {
    vec.reserve(vec.size() + N);
}
vec.insert(...);
You can wrap this in a function of your own and call that function instead of calling insert() directly.
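Here is a sketch of wrapping that growth policy in a helper, as suggested; grow_and_push and the step parameter are names I made up for illustration.

#include <cstddef>
#include <vector>

// Grow by a fixed step instead of relying on the implementation's growth factor,
// then forward to push_back.
template <typename T>
void grow_and_push(std::vector<T>& vec, const T& value, std::size_t step = 20) {
    if (vec.capacity() == vec.size())
        vec.reserve(vec.size() + step);
    vec.push_back(value);
}

int main() {
    std::vector<int> v;
    for (int i = 0; i < 100; ++i)
        grow_and_push(v, i);   // capacity grows in steps of (at least) 20
}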

Benefits of using reserve() in a vector - C++

What is the benefit of using reserve when dealing with vectors? When should I use it? I couldn't find a clear-cut answer on this, but I assume it is faster when you reserve in advance before using them.
What say you people smarter than I?
It's useful if you have an idea how many elements the vector will ultimately hold - it can help the vector avoid repeatedly allocating memory (and having to move the data to the new memory).
In general it's probably a potential optimization that you shouldn't need to worry about, but it's not harmful either (at worst you end up wasting memory if you overestimate).
One area where it can be more than an optimization is when you want to ensure that existing iterators do not get invalidated by adding new elements.
For example, a push_back() call may invalidate existing iterators to the vector (if a reallocation occurs). However, if you've reserved enough capacity, you can ensure that a reallocation will not occur. This is a technique that doesn't need to be used very often, though.
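A minimal sketch (my own) of that iterator-preserving use of reserve: the iterator obtained before the later push_back calls stays valid because no reallocation can occur within the reserved capacity.

#include <cassert>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(4);                        // enough room for everything we will add
    v.push_back(1);
    std::vector<int>::iterator first = v.begin();
    v.push_back(2);
    v.push_back(3);                      // still within capacity: no reallocation
    assert(*first == 1);                 // the old iterator is still valid
}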
It can be ... especially if you are going to be adding a lot of elements to your vector over time, and you want to avoid the automatic memory expansion that the container will perform when it runs out of available slots.
For instance, back-insertions (i.e., std::vector::push_back) are considered an amortized O(1), or constant-time, process, but that is because when an insertion is made at the back of a vector that is out of space, the vector must reallocate memory for a new array of elements, copy the old elements into the new array, and only then copy the element you were trying to insert. That process is O(N), or linear-time complexity, and for a large vector could take quite a bit of time.

Using the reserve() method allows you to pre-allocate memory for the vector if you know it is going to be at least some certain size, and to avoid reallocating memory every time space runs out, especially if you are going to be doing back-insertions in performance-critical code where you want the insertion to remain a true O(1) operation without a hidden reallocation of the array. Granted, your copy constructor would also have to be O(1) to get true O(1) complexity for the entire back-insertion, but as far as the container's own back-insertion algorithm is concerned, you can keep its complexity predictable if the memory for the slot is already pre-allocated.
This excellent article deeply explains differences between deque and vector containers. Section "Experiment 2" shows the benefits of vector::reserve().
If you know the eventual size of the vector then reserve is worth using.
Otherwise, whenever the vector runs out of internal room, it will resize its buffer. This usually means allocating a new buffer of double (or 1.5 times) the current size, which can be expensive if it happens a lot.
The real expensive bit is invoking the copy constructor on each element to copy it from the old buffer to the new buffer, followed by calling the destructor on each element in the old buffer.
If the copy constructor is expensive then it can be a problem.
Faster and saves memory
If you push_back another element onto a full vector, it will typically allocate double the memory it is currently using, since allocate + copy is expensive.
Don't know about people smarter than you, but I would say that you should call reserve in advance if you are going to perform lots of insertion operations and you already know or can estimate the total number of elements, at least to the order of magnitude. It can save you a lot of reallocations in good circumstances.
Although it's an old question, here is my benchmark showing the difference.
#include <iostream>
#include <chrono>
#include <vector>
using namespace std;

int main() {
    vector<int> v1;
    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) {
        v1.push_back(1);
    }
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
    chrono::duration<double> time_first = chrono::duration_cast<chrono::duration<double>>(t2 - t1);
    cout << "Time for 1000000 insertions without reserve: " << time_first.count() * 1000 << " milliseconds." << endl;

    vector<int> v2;
    v2.reserve(1000000);
    chrono::steady_clock::time_point t3 = chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) {
        v2.push_back(1);
    }
    chrono::steady_clock::time_point t4 = chrono::steady_clock::now();
    chrono::duration<double> time_second = chrono::duration_cast<chrono::duration<double>>(t4 - t3);
    cout << "Time for 1000000 insertions with reserve: " << time_second.count() * 1000 << " milliseconds." << endl;
    return 0;
}
When you compile and run this program, it outputs:
Time for 1000000 insertions without reserve: 24.5573 milliseconds.
Time for 1000000 insertions with reserve: 17.1771 milliseconds.
There seems to be some improvement with reserve, but not that much. I think the improvement will be larger for complex objects, but I am not sure. Any suggestions, changes and comments are welcome.
It is always good to know the total space needed before requesting any space from the system, so that you only request it once. Otherwise the system may have to move your data to a larger free zone (this is optimized, but it is not always a free operation because the whole data has to be copied). Even the library will try to help you, but the best is to tell it what you know (to reserve the total space required by your process). That's what I think. Greetings.
There is one more advantage of reserve that is not much related to performance but instead to code style and code cleanliness.
Imagine I want to create a vector by iterating over another vector of objects. Something like the following:
std::vector<int> result;
for (const auto& object : objects) {
    result.push_back(object.foo());
}
Now, apparently the size of result is going to be the same as objects.size() and I decide to pre-define the size of result.
The simplest way to do it is in the constructor.
std::vector<int> result(objects.size());
But now the rest of my code is invalidated because the size of result is not 0 anymore; it is objects.size(). The subsequent push_back calls are going to increase the size of the vector. So, to correct this mistake, I now have to change how I construct my for-loop. I have to use indices and overwrite the corresponding memory locations.
std::vector<int> result(objects.size());
for (int i = 0; i < objects.size(); ++i) {
    result[i] = objects[i].foo();
}
And I don't like it. Indices are everywhere in the code. This is also more vulnerable to making accidental copies because of the [] operator. This example uses integers and directly assigns values to result[i], but in a more complex for-loop with complex data structures, it could be relevant.
Coming back to the main topic, it is very easy to adjust the first code by using reserve. reserve does not change the size of the vector but only the capacity. Hence, I can leave my nice for loop as it is.
std::vector<int> result;
result.reserve(objects.size());
for (const auto& object : objects) {
    result.push_back(object.foo());
}