I am trying to optimize a C++ routine. The main bottleneck in this routine is the push_back() of a vector of objects. I tried using a deque instead and even tried a list. But strangely (and contrary to theory) deque and list implementations run much slower than the vector counterpart.
In fact even clear() runs much slower for the deque and list implementations than the vector counterpart. Here too, the vector implementation seems to be the fastest while the list implementation is the slowest.
Any pointers?
Note: vector::reserve() could have sped up the implementation, but it cannot be used because the final size is unknown in advance.
Thanks.
vector being faster to build or clear than deque or list is to be expected; it's a simpler data structure.
With regard to vector::push_back, it has to do two things:
1. Check that the vector is big enough to hold the new item.
2. Insert the new item.
You can generally speed things up by eliminating step 1: resize the vector once up front and use operator[] to set the items.
UPDATE:
Original poster asked for an example.
The code below times 128 million insertions and outputs
push_back : 2.04s
reserve & push_back : 1.73s
resize & place : 0.48s
when compiled and run with g++ -O3 on Debian/Lenny on an old P4 machine.
#include <iostream>
#include <time.h>
#include <vector>

int main(int, char**)
{
    const size_t n = (128 << 20); // 128M insertions

    const clock_t t0 = clock();
    {
        // 1) Plain push_back: the vector grows on its own as needed.
        std::vector<unsigned char> a;
        for (size_t i = 0; i < n; i++) a.push_back(i);
    }
    const clock_t t1 = clock();
    {
        // 2) reserve first: no reallocations, but push_back still checks capacity.
        std::vector<unsigned char> a;
        a.reserve(n);
        for (size_t i = 0; i < n; i++) a.push_back(i);
    }
    const clock_t t2 = clock();
    {
        // 3) resize first, then write elements directly: no checks at all.
        std::vector<unsigned char> a;
        a.resize(n);
        for (size_t i = 0; i < n; i++) a[i] = i;
    }
    const clock_t t3 = clock();

    std::cout << "push_back           : " << (t1 - t0) / static_cast<float>(CLOCKS_PER_SEC) << "s" << std::endl;
    std::cout << "reserve & push_back : " << (t2 - t1) / static_cast<float>(CLOCKS_PER_SEC) << "s" << std::endl;
    std::cout << "resize & place      : " << (t3 - t2) / static_cast<float>(CLOCKS_PER_SEC) << "s" << std::endl;
    return 0;
}
If you don't know how many objects you'll be adding, it's very difficult to come up with an optimal solution. All you can do is try to minimize the cost that you know is happening: in this case, your vector is being constantly resized.
You could do this in two ways:
1) Split your operation into building and finalizing: build the data into a vector that is guaranteed to be big enough, and when done copy it to another vector.
E.g.
std::vector<Foo> hugeVec;
hugeVec.reserve(1000); // enough for 1000 foo's
// add stuff
std::vector<Foo> finalVec;
finalVec = hugeVec;
2) Alternatively, when your vector is full, call reserve with enough space for another batch of objects:
if (vec.capacity() == vec.size())
    vec.reserve(vec.size() + 16); // alloc space for 16 more objects
You could choose a different container that did not result in all elements being copied upon a resize, but your bottleneck may then become the individual memory allocations for the new elements.
Are you pushing back the objects themselves, or a pointer to them? Pointers will usually be much faster as it's only 4-8 bytes to copy, compared to whatever the size of the objects are.
"push_back()" can be slow if the copy of an object is slow. If the default constructor is fast and you have a way tu use swap to avoid the copy, you could have a much faster program.
#include <cstddef>
#include <vector>
using std::vector;

void test_vector1()
{
    vector<vector<int> > vvi;
    for (size_t i = 0; i < 100; i++)
    {
        vector<int> vi(100000, 5);
        vvi.push_back(vi); // copy of a large object
    }
}

void test_vector2()
{
    vector<int> vi0;
    vector<vector<int> > vvi;
    for (size_t i = 0; i < 100; i++)
    {
        vector<int> vi(100000, 5);
        vvi.push_back(vi0);  // copy of a small (empty) object
        vvi.back().swap(vi); // swap is fast: it only exchanges internal pointers
    }
}
Results:
VS2005-debug
* test_vector1 -> 297
* test_vector2 -> 172
VS2005-release
* test_vector1 -> 203
* test_vector2 -> 94
gcc
* test_vector1 -> 343
* test_vector2 -> 188
gcc -O2
* test_vector1 -> 250
* test_vector2 -> 156
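For completeness: in C++11 and later you can get the same effect more directly by moving the temporary into the vector. A sketch in the style of the tests above (test_vector3 is an addition for illustration, not one of the measured tests):

#include <cstddef>
#include <utility>
#include <vector>

void test_vector3()
{
    std::vector<std::vector<int> > vvi;
    for (std::size_t i = 0; i < 100; i++)
    {
        std::vector<int> vi(100000, 5);
        vvi.push_back(std::move(vi)); // steals vi's buffer; no element copy
    }
}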
If you want the vector to be fast, you must reserve() enough space. It makes a huge difference, because each growth is terribly expensive. If you don't know the size, make a good guess.
You'll need to give more information on the behavior of the routine.
In one place you're concerned about the speed of push_back() in another you're concerned about clear(). Are you building up the container, doing something then dumping it?
The results you see for clear() are because vector<> only has to release a single block of memory, deque<> has to release several, and list<> has to release one for each element.
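A rough sketch to reproduce this yourself (the element count is illustrative and numbers will vary by implementation; note that for int elements there are no destructors to run, so this mostly times the deallocation pattern):

#include <chrono>
#include <deque>
#include <iostream>
#include <list>
#include <vector>

// Time clear() for each container holding the same number of ints. For list
// this frees one node per element; for deque a handful of blocks; for vector
// it is nearly free (the buffer is even kept around for reuse).
template <class Container>
double clear_ms(int n) {
    Container c;
    for (int i = 0; i < n; ++i) c.push_back(i);
    auto t0 = std::chrono::steady_clock::now();
    c.clear();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int n = 1000000;
    std::cout << "vector: " << clear_ms<std::vector<int> >(n) << " ms\n"
              << "deque:  " << clear_ms<std::deque<int> >(n)  << " ms\n"
              << "list:   " << clear_ms<std::list<int> >(n)   << " ms\n";
    return 0;
}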
Deque has a more complex structure than vector and the speed differences between the two will be heavily dependent on both the specific implementation and the actual number of elements pushed back, but for large amounts of data it should be faster. clear() may be slower because it may choose to get rid of the more complex underlying structures. Much the same goes for list.
Regarding push_back() being slow and reserve being no help, the implementation of STL used in MSVC works something like this: When you first create a vector it reserves space for I think 10 elements. From then on, whenever it gets full, it reserves space for 1.5 times the number of elements in the vector. So, something like 10, 15, 22, 33, 49, 73, 105, 157... The re-allocations are expensive.
Even if you don't know the exact size, reserve() can be useful. reserve() doesn't prevent the vector from growing if it needs to. If you reserve() and the vector grows beyond that size, you have still improved things because of the reserve. If the vector turns out to be much smaller, well, maybe that's ok because the performance in general works better with smaller sizes.
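For illustration, a minimal sketch where only a lower bound is known (the 512 is a made-up guess):

#include <iostream>
#include <vector>

int main() {
    // Suppose we only know there will be "at least a few hundred" elements.
    // Reserving a guessed lower bound still skips the early reallocations;
    // the vector is free to grow past it.
    std::vector<int> values;
    values.reserve(512);
    for (int i = 0; i < 2000; ++i)
        values.push_back(i);   // reallocates only after 512 is exceeded
    std::cout << "final capacity: " << values.capacity() << '\n';
    return 0;
}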
You need to profile in RELEASE mode to know for sure what strategy works best.
You have to choose your container according to what you're going to do with it.
Relevant actions are: extending (with push), insertion (may not be needed at all), extraction, deletion.
At cplusplus.com, there is a very nice overview of the operations per container type.
If the operation is push-bound, it makes sense that the vector beats all others. The good thing about deque is that it allocates fixed chunks, so it will make more efficient use of fragmented memory.
I read that std::vector should be contiguous. My understanding is that its elements should be stored together, not spread out across memory. I have simply accepted this fact and used this knowledge when, for example, using its data() method to get the underlying contiguous piece of memory.
However, I came across a situation, where the vector's memory behaves in a strange way:
std::vector<int> numbers;
std::vector<int*> ptr_numbers;

for (int i = 0; i < 8; i++) {
    numbers.push_back(i);
    ptr_numbers.push_back(&numbers.back());
}
I expected this to give me a vector of some numbers and a vector of pointers to these numbers. However, when listing the values pointed to by ptr_numbers, I get different and seemingly random numbers, as though I were accessing the wrong parts of memory.
I have tried to check the contents every step:
for (int i = 0; i < 8; i++) {
    numbers.push_back(i);
    ptr_numbers.push_back(&numbers.back());
    for (auto ptr_number : ptr_numbers)
        std::cout << *ptr_number << std::endl;
    std::cout << std::endl;
}
The result looks roughly like this:
1
some random number
2
some random number
some random number
3
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?
Edit: Is std::vector contiguous only since C++17? (Just to keep the comments on my previous claim relevant to future readers.)
It roughly looks like this:
The std::vector instance you have on the stack is a small object containing a pointer to a heap-allocated buffer, plus some extra variables to keep track of the size and capacity of the vector.
So it seems as though when I push_back() to the numbers vector, its older elements change their location.
The heap-allocated buffer has a fixed capacity. When you reach the end of the buffer, a new buffer will be allocated somewhere else on the heap and all the previous elements will be moved into the new one. Their addresses will therefore change.
Does it maybe store them together, but moves them all together, when more space is needed?
Roughly, yes. Iterator and address stability of elements is guaranteed with std::vector only if no reallocation takes place.
I am aware, that std::vector is a contiguous container only since C++17
The memory layout of std::vector hasn't changed since its first appearance in the Standard. ContiguousContainer is just a "concept" that was added to differentiate contiguous containers from others at compile-time.
The Answer
It's a single contiguous storage (a 1d array).
Each time it runs out of capacity it gets reallocated and stored objects are moved to the new larger place — this is why you observe addresses of the stored objects changing.
It has always been this way, not just since C++17.
TL;DR
The storage grows geometrically to meet the amortized O(1) requirement of push_back(). The growth factor is 2 (Cap(n+1) = 2 * Cap(n)) in most implementations of the C++ Standard Library (GCC, Clang, STLPort) and 1.5 (Cap(n+1) = Cap(n) + Cap(n) / 2) in the MSVC variant.
If you pre-allocate it with vector::reserve(N) and sufficiently large N, then addresses of the stored objects won't be changing when you add new ones.
In most practical applications it is usually worth pre-allocating at least 32 elements to skip the first few reallocations, which follow one another closely (0→1→2→4→8→16).
It is also sometimes practical to slow the growth down, switch to an arithmetic growth policy (Cap(n+1) = Cap(n) + Const), or stop growing entirely after some reasonably large size, to ensure the application does not waste memory or run out of it.
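A small self-contained sketch to observe the growth policy of the standard library you build with (the exact capacity sequence is implementation-defined):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    // Print the capacity each time a push_back triggers a reallocation.
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) { // a reallocation just happened
            last_capacity = v.capacity();
            std::cout << "size " << v.size() << " -> capacity " << last_capacity << '\n';
        }
    }
    return 0;
}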
Lastly, in some practical applications, like column-based object storages, it may be worth giving up the idea of contiguous storage completely in favor of a segmented one (the same idea as std::deque, but with much larger chunks). This way the data may be stored reasonably well localized for both per-column and per-row queries (though this may need some help from the memory allocator as well); a sketch of the idea follows.
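A bare-bones illustration of the segmented idea (the class and all of its names are invented for this sketch; it is not std::deque's real layout):

#include <cstddef>
#include <memory>
#include <vector>

// Elements live in fixed-size chunks, so existing elements never move
// when the container grows; only the small table of chunk pointers does.
template <class T, std::size_t ChunkSize = 1024>
class segmented_storage {
    std::vector<std::unique_ptr<T[]>> chunks_;
    std::size_t size_ = 0;
public:
    void push_back(const T& value) {
        if (size_ % ChunkSize == 0) // current chunk is full; add a new one
            chunks_.push_back(std::unique_ptr<T[]>(new T[ChunkSize]()));
        chunks_[size_ / ChunkSize][size_ % ChunkSize] = value;
        ++size_;
    }
    T& operator[](std::size_t i) { return chunks_[i / ChunkSize][i % ChunkSize]; }
    std::size_t size() const { return size_; }
};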
std::vector being a contiguous container means exactly what you think it means.
However, many operations on a vector can re-locate that entire piece of memory.
One common case: when you add an element and the vector must grow, it can re-allocate and copy all elements to another contiguous piece of memory.
So what does it exactly mean, that std::vector is a contiguous container and why do its elements move? Does it maybe store them together, but moves them all together, when more space is needed?
That's exactly how it works, and why appending elements does indeed invalidate all iterators as well as memory locations when a reallocation takes place¹. This is not only valid since C++17; it has been the case from the beginning.
There are a couple of benefits from this approach:
It is very cache-friendly and hence efficient.
The data() method can be used to pass the underlying raw memory to APIs that work with raw pointers.
The cost of allocating new memory upon push_back, reserve or resize boils down to amortized constant time, as the geometric growth amortizes over time (each time the capacity is exhausted it is doubled in libc++ and libstdc++, and grown by a factor of approx. 1.5 in MSVC).
It allows for the most powerful iterator category, i.e., random access iterators, because classical pointer arithmetic works out well when the data is stored contiguously.
Move construction of a vector instance from another one is very cheap.
These implications can be considered the downside of such a memory layout:
All iterators and pointers to elements are invalidated upon modifications of the vector that imply a reallocation. This can lead to subtle bugs, e.g. when erasing elements while iterating over the vector (see the sketch below).
Operations like push_front (as std::list or std::deque provide) aren't provided (insert(vec.begin(), element) works, but is possibly expensive¹), nor is efficient merging/splicing of multiple vector instances.
¹ Thanks to #FrançoisAndrieux for pointing that out.
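A minimal sketch of the erase-while-iterating pitfall from the first bullet above, together with the standard fix:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6};

    // Buggy version (don't do this): erase() invalidates `it`, and the
    // subsequent ++it skips an element or runs past end().
    //   for (auto it = v.begin(); it != v.end(); ++it)
    //       if (*it % 2 == 0) v.erase(it);

    // Correct version: erase() returns an iterator to the next element.
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0)
            it = v.erase(it);
        else
            ++it;
    }

    for (int x : v) std::cout << x << ' '; // prints: 1 3 5
    return 0;
}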
In terms of the actual structure, an std::vector looks something like this in memory:
struct vector { // Simple C struct as example (T is the type supplied by the template)
T *begin; // vector::begin() probably returns this value
T *end; // vector::end() probably returns this value
T *end_capacity; // First non-valid address
// Allocator state might be stored here (most allocators are stateless)
};
Relevant code snippet from the libc++ implementation as used by LLVM
Printing the raw memory contents of an std::vector:
(Don't do this if you don't know what you're doing!)
#include <iostream>
#include <vector>

struct vector {
    int *begin;
    int *end;
    int *end_capacity;
};

int main() {
    union vecunion {
        std::vector<int> stdvec;
        vector           myvec;
        ~vecunion() { /* do nothing */ }
    } vec = { std::vector<int>() };

    union veciterator {
        std::vector<int>::iterator stditer;
        int                       *myiter;
        ~veciterator() { /* do nothing */ }
    };

    vec.stdvec.push_back(1); // add something so we don't have an empty vector

    std::cout
        << "vec.begin          = " << vec.myvec.begin << "\n"
        << "vec.end            = " << vec.myvec.end << "\n"
        << "vec.end_capacity   = " << vec.myvec.end_capacity << "\n"
        << "vec's size         = " << vec.myvec.end - vec.myvec.begin << "\n"
        << "vec's capacity     = " << vec.myvec.end_capacity - vec.myvec.begin << "\n"
        << "vector::begin()    = " << (veciterator { vec.stdvec.begin() }).myiter << "\n"
        << "vector::end()      = " << (veciterator { vec.stdvec.end() }).myiter << "\n"
        << "vector::size()     = " << vec.stdvec.size() << "\n"
        << "vector::capacity() = " << vec.stdvec.capacity() << "\n"
        ;
}
I am reusing the same std::vector<int> all the time in order to avoid repeatedly allocating and deallocating. In a few lines, my code is as follows:
std::vector<int> myVector;
myVector.reserve(4);

for (int i = 0; i < 100; ++i) {
    fillVector(myVector);
    // use of myVector
    // ....
    myVector.resize(0);
}
In each iteration of the for loop, myVector will be filled with up to 4 elements. To keep the code efficient, I want to always reuse myVector. However, myVector.resize() destroys the elements in myVector. I understand that myVector.clear() will have the same effect.
I think that if I could just overwrite the existing elements in myVector, I could save some time. However, I think std::vector is not capable of doing this.
Is there any way of doing this? Does it make sense to create a home-grown implementation that overwrites elements?
Your code is already valid (though myVector.clear() is better style than myVector.resize(0)).
An int has no destructor to run, so resize(0) just sets the size to 0; the capacity is untouched.
Simply don't keep resizing myVector. Instead, initialise it with 4 elements (with std::vector<int> myVector(4)) and just assign to the elements instead (e.g. myVector[0] = 5).
However, if it's always going to be fixed size, then you might prefer to use a std::array<int, 4>.
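For illustration, a minimal sketch of the std::array alternative; it assumes fillVector is adapted to report how many of the 4 slots it filled (the single write below is a stand-in for that):

#include <array>
#include <cstddef>
#include <iostream>

int main() {
    // Fixed-size alternative: no heap allocation at all. `used` tracks how
    // many of the 4 slots the current iteration filled; this bookkeeping
    // replaces vector's size().
    std::array<int, 4> buf{};
    for (int i = 0; i < 100; ++i) {
        std::size_t used = 0;
        buf[used++] = i;  // stand-in for fillVector()'s writes
        // use buf[0] .. buf[used - 1]
    }
    std::cout << buf[0] << '\n';
    return 0;
}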
Resizing a vector to 0 will not reduce its capacity and, since your element type is int, there are no destructors to run:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    std::cout << v.capacity() << ' ';
    v.resize(0);
    std::cout << v.capacity() << '\n';
}
// Output: 3 3
Therefore, your code already performs close to optimally; the only further optimisation you could make is to avoid the resize entirely, thereby saving the internal "set size to 0" inside std::vector, which likely comes down to an if statement and a data-member change.
std::vector is not a solution in this case. You don't want to resize/clear/(de)allocate all over again? Don't.
fillVector() fills the vector with a number of elements known only in each iteration.
A vector is internally represented as a contiguous block of memory of type T*.
You don't want to (de)allocate memory each time.
Ok. Use simple struct:
struct upTo4ElemVectorOfInts
{
    int data[4];
    size_t elems_num;
};
And modify fillVector() to save additional info:
void fillVector(upTo4ElemVectorOfInts& vec)
{
    // fill vec.data with values
    vec.elems_num = filled_num; // save how many values were filled in this iteration
}
Use it in the very same way:
upTo4ElemVectorOfInts myVector;
for (int i = 0; i < 100; ++i)
{
    fillVector(myVector);
    // use of myVector:
    // - myVector.data contains the data (equivalent of std::vector<>::data())
    // - myVector.elems_num tells you how many numbers you should care about
    // nothing needs to be resized/cleared
}
Additional Note:
If you want more general solution (to operate on any type or size), you can, of course, use templates:
template <class T, size_t Size>
struct upToSizeElemVectorOfTs
{
    T data[Size];
    size_t elems_num;
};
and adjust fillVector() to accept template instead of known type.
This solution is probably the fastest one. You may think: "Hey, what if I want to fill up to 100 elements? Or 1000, or 10000? A 10000-element array will consume a lot of storage!"
It would be consumed anyway. A vector resizes itself automatically, and these reallocations are out of your control and thus can be very inefficient. If your array is reasonably small and you can predict the maximum required size, always use fixed-size storage created on the local stack. It's faster, more efficient, and simpler. Of course this won't work for arrays of 1,000,000 elements (you would get a stack overflow in that case).
In fact what you have at present is
for (int i = 0; i < 100; ++i) {
    myVector.reserve(4);
    // use of myVector
    // ....
    myVector.resize(0);
}
I do not see any sense in that code.
Of course it would be better to use myVector.clear() instead of myVector.resize(0);
If you always overwrite exactly 4 elements of the vector inside the loop then you could use
std::vector<int> myVector( 4 );
instead of
std::vector<int> myVector;
myVector.reserve(4);
provided that fillVector(myVector) uses the subscript operator to access these 4 elements of the vector instead of the member function push_back.
Otherwise use clear, as was suggested earlier.
Wouldn't you expect the addresses printed by the two loops to be the same? I did, and I cannot understand why (sometimes) they are different.
#include <iostream>
#include <vector>
using namespace std;

struct S {
    void print_address() {
        cout << this << endl;
    }
};

int main(int argc, char *argv[]) {
    vector<S> v;

    for (size_t i = 0; i < 10; i++) {
        v.push_back( S() );
        v.back().print_address();
    }

    cout << endl;

    for (size_t i = 0; i < v.size(); i++) {
        v[i].print_address();
    }

    return 0;
}
I tested this code with many local and online compilers and the output I get looks like this (the last three figures are always the same):
0xaec010
0xaec031
0xaec012
0xaec013
0xaec034
0xaec035
0xaec036
0xaec037
0xaec018
0xaec019
0xaec010
0xaec011
0xaec012
0xaec013
0xaec014
0xaec015
0xaec016
0xaec017
0xaec018
0xaec019
I spotted this because, after doing some initialization in the first loop, I obtained uninitialized objects in the subsequent part of the program. Am I missing something?
Because when the vector's capacity changes, it reallocates its elements. If you std::vector::reserve enough capacity, no reallocation is needed, and it will print the same addresses.
vector<S> v;
v.reserve(10);
Note: proper use of std::vector::reserve will increase application performance, because it avoids unnecessary reallocations and object copies.
The vector is performing re-allocations in order to grow as needed. Each time it does this, it allocates a larger buffer for the data and copies the elements across. You can see this clearly in the first loop, where each address jump is followed by a larger sequence of consecutive addresses. In the second loop, you just look at the addresses after the final reallocation.
0xaec010
0xaec031 <--
0xaec012 <--
0xaec013
0xaec034 <--
0xaec035
0xaec036
0xaec037
0xaec018 <--
0xaec019
The simplest way to instantiate a vector with 10 S objects would be
std::vector<S> v(10);
This would involve no re-allocations. See also std::vector::reserve.
Vector elements are stored contiguously; that is, they're all in a row in memory. Your vector object has to allocate space for this contiguous block of elements.
Your vector can't just keep having things added to it indefinitely. It has to grow the space it has allocated. The memory model typically doesn't allow us to expand a memory block — we have to create a new one instead. When the vector does this, it has to move all its elements to the new space. This is occurring several times within your first loop.
If you'd done:
vector<S> v;
v.reserve(10);
(which you can, since you know you'll end up with 10 elements), then no re-allocation would have been necessary, and the addresses would not have changed.
I'm not really surprised that they can change. As the vector initially has no size, it's likely to reallocate the vector once or twice during the initial loop. That'll change the base address of the vector. It's not impossible that after a resize, you'll end up using an address you used before (though I find that somewhat surprising. Are you sure about the first part of the addresses?)
If you want to ensure they don't change, you need to add a v.reserve() before you start pushing stuff on it.
I am creating an array and a vector of size 100, generating random values, and trying to keep both the array and the vector sorted.
Here is my code for the same
vector<int> myVector;
int arr[SIZE];
clock_t start, finish;
int random;

for (int i = 0; i < SIZE; i++)
{
    myVector.push_back(0);
    arr[i] = 0;
}

// testing the array
start = clock();
for (int i = 0; i < MAX; ++i)
{
    random = getRandom(); // returns rand() % 100
    for (int j = 0; j < SIZE; ++j) {
        if (random > arr[j])
        {
            for (int k = SIZE - 1; k > j; --k)
            {
                arr[k] = arr[k - 1];
            }
            arr[j] = random;
            break;
        }
    }
}
finish = clock();
cout << "Array Time " << finish - start << endl;

// testing the vector
start = clock();
for (int i = 0; i < MAX; ++i)
{
    random = getRandom(); // returns rand() % 100
    for (int j = 0; j < SIZE; ++j) {
        if (random > myVector[j])
        {
            for (int k = SIZE - 1; k > j; --k)
            {
                myVector[k] = myVector[k - 1];
            }
            myVector[j] = random;
            break;
        }
    }
}
finish = clock();
cout << "Vector Time " << finish - start << endl;
The output is as follows:
Array Time : 5
Vector Time: 83
I am not able to understand why the vector is so slow compared to the array in this case. Doesn't this contradict the rule of thumb of preferring vector over array?
Please help!
First of all: many rules of thumb in programming are not about gaining some milliseconds of performance, but about managing complexity and therefore avoiding bugs. In this case, it's about the range checks which most vector implementations perform in debug mode, and which arrays don't. It's also about memory management for dynamic arrays: a vector manages its memory itself, while you have to do it manually for arrays, at the risk of introducing memory leaks (ever forgotten a delete[], or used delete instead? I bet you have!). And it's about ease of use, e.g. resizing the vector or inserting an element in the middle, which is tedious work with manually managed arrays.
In other words, performance measurements can never ever contradict a rule of thumb, because a rule of thumb never targets performance. Performance measurements can only be one of the few possible reasons to not obey a coding guideline.
At first sight I'd guess you have not enabled optimizations. The main source of performance loss for the vector would then be the index checks that many vector implementations enable for debug builds. Those won't kick in in optimized builds, so that should be your first concern. Rule of thumb: performance measurements without optimizations enabled are meaningless.
If enabling optimizations still does show a better performance for the array, there's another difference:
The array is stored on the stack, so the compiler can use the addresses directly and calculate address offsets at compile time, while the vector's elements are stored on the heap and the compiler has to dereference the pointer stored in the vector. I'd expect the optimizer to dereference the pointer once and calculate the address offsets from that point on. Still, there might be a small performance penalty compared to compile-time-calculated address offsets, especially if the optimizer can unroll the loop a bit. This still does not contradict the rule of thumb, because you are comparing apples with pears here. The rule of thumb says:
Prefer std::vector over dynamic arrays, and prefer std::array over fixed arrays.
So either use a dynamically allocated array (including some kind of delete[], please) or compare the fixed-size array to a std::array. (std::dynarray and runtime-sized arrays were proposed for C++14, but neither made it into the final standard, so built-in arrays, std::array, and std::vector remain the candidates.)
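In code, the pairings the rule of thumb is actually about look like this (a sketch):

#include <array>
#include <iostream>
#include <vector>

int main() {
    std::array<int, 100> fixed{};   // counterpart of a fixed array: int arr[100];
    std::vector<int> dynamic(100);  // counterpart of new int[100] / delete[]
    std::cout << fixed.size() << ' ' << dynamic.size() << '\n';
    return 0;
}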
Update:
As was pointed out in the comments, optimizers are good at identifying code that has no side effects, like operations on an array that is never read from. std::vector implementations are complicated enough that optimizers typically won't see through those several layers of indirection and optimize away all the inserts, so you would get zero time for the array compared to some time for the vector. Reading the array contents after the loop prevents such drastic optimizations.
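A simple way to keep the optimizer from discarding the work is to read and print the results after the timed loop; a sketch:

#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> v;
    for (int i = 0; i < 1000000; ++i)
        v.push_back(i);

    // Reading the data afterwards makes the writes observable, so the
    // optimizer cannot discard the loop above as dead code.
    long long sum = std::accumulate(v.begin(), v.end(), 0LL);
    std::cout << sum << '\n';
    return 0;
}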
The vector class has to grow its memory dynamically, which may involve copying the whole thing from time to time.
It also has to call internal functions for many operations, like reallocating.
It may also have safety functionality like bounds checks.
Meanwhile, your array is preallocated, and your operations probably do not call any internal functions.
That is the overhead price for more functionality.
And who said that vectors should be faster than arrays in all cases?
Your array does not need to grow; that's a special case where arrays are indeed faster!
Because arrays are a built-in language feature, the compiler can address them directly; they are managed within the compiled executable itself.
A vector, on the other hand, is a class template, and its operations go through member functions defined in library headers.
A built-in array can be manipulated without including any headers, whereas every vector operation passes through the library's code; that indirection is where the extra time goes.
Every layer of code you add between your app and the memory it operates on can cost some performance.
What is the benefit of using reserve when dealing with vectors? When should I use it? I couldn't find a clear-cut answer on this, but I assume it is faster when you reserve in advance before using the vector.
What say you people smarter than I?
It's useful if you have an idea how many elements the vector will ultimately hold - it can help the vector avoid repeatedly allocating memory (and having to move the data to the new memory).
In general it's probably a potential optimization that you shouldn't need to worry about, but it's not harmful either (at worst you end up wasting memory if you overestimate).
One area where it can be more than an optimization is when you want to ensure that existing iterators do not get invalidated by adding new elements.
For example, a push_back() call may invalidate existing iterators to the vector (if a reallocation occurs). However if you've reserved enough elements you can ensure that the reallocation will not occur. This is a technique that doesn't need to be used very often though.
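A minimal sketch of that guarantee, using a raw pointer (which a reallocation would invalidate just like an iterator):

#include <cassert>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(100);         // no reallocation while size() <= 100

    v.push_back(1);
    int* first = &v[0];     // pointer taken before further growth

    for (int i = 2; i <= 100; ++i)
        v.push_back(i);     // stays within the reserved capacity

    assert(first == &v[0]); // still valid: no reallocation happened
    return 0;
}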
It can be ... especially if you are going to be adding a lot of elements to your vector over time, and you want to avoid the automatic memory expansion the container makes when it runs out of available slots.
For instance, back-insertions (i.e., std::vector::push_back) are considered an amortized O(1), or constant-time, process, but that is only on average. If an insertion is made at the back of a vector that is out of space, the vector must reallocate memory for a new array of elements, copy the old elements into the new array, and only then copy in the element you were trying to insert. That process is O(N), linear-time complexity, and for a large vector could take quite a bit of time. Using the reserve() method allows you to pre-allocate memory for the vector if you know it's going to be at least some certain size, and to avoid reallocating memory every time space runs out, which matters especially for back-insertions inside performance-critical code where you want the insertion to remain an actual O(1) process with no hidden reallocation of the array. Granted, your copy constructor would have to be O(1) as well to get true O(1) complexity for the entire back-insertion, but as far as the container's own back-insertion algorithm is concerned, you can keep it a known complexity if the memory for the slot is already pre-allocated.
This excellent article deeply explains differences between deque and vector containers. Section "Experiment 2" shows the benefits of vector::reserve().
If you know the eventual size of the vector then reserve is worth using.
Otherwise, whenever the vector runs out of internal room, it will resize the buffer. This usually involves doubling (or growing by 1.5x) the size of the internal buffer, which can be expensive if you do it a lot.
The really expensive bit is invoking the copy constructor on each element to copy it from the old buffer to the new one, followed by calling the destructor on each element in the old buffer.
If the copy constructor is expensive then it can be a problem.
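As a side note, assuming C++11 or later: reallocation will move rather than copy elements if the element type's move constructor is noexcept, which takes much of the sting out of an expensive copy constructor. A sketch:

#include <string>
#include <vector>

// For a type like this, growing the vector shuffles internal pointers
// instead of duplicating the payloads.
struct Record {
    std::string payload;                 // potentially expensive to copy
    Record() = default;
    Record(const Record&) = default;
    Record(Record&&) noexcept = default; // enables move-on-reallocation
};

int main() {
    std::vector<Record> v;
    for (int i = 0; i < 1000; ++i)
        v.push_back(Record{});           // reallocations move, not copy
    return 0;
}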
It's faster and saves memory.
If you push_back another element into a full vector, it will typically allocate double the memory currently in use, since allocate + copy is expensive.
Don't know about people smarter than you, but I would say that you should call reserve in advance if you are going to perform lots of insertion operations and you already know or can estimate the total number of elements, at least the order of magnitude. It can save you a lot of reallocations in good circumstances.
Although it's an old question, here is my demonstration of the difference.
#include <iostream>
#include <chrono>
#include <vector>
using namespace std;

int main(){
    vector<int> v1;

    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    for(int i = 0; i < 1000000; ++i){
        v1.push_back(1);
    }
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
    chrono::duration<double> time_first = chrono::duration_cast<chrono::duration<double>>(t2 - t1);
    cout << "Time for 1000000 insertions without reserve: " << time_first.count() * 1000 << " milliseconds." << endl;

    vector<int> v2;
    v2.reserve(1000000);
    chrono::steady_clock::time_point t3 = chrono::steady_clock::now();
    for(int i = 0; i < 1000000; ++i){
        v2.push_back(1);
    }
    chrono::steady_clock::time_point t4 = chrono::steady_clock::now();
    chrono::duration<double> time_second = chrono::duration_cast<chrono::duration<double>>(t4 - t3);
    cout << "Time for 1000000 insertions with reserve: " << time_second.count() * 1000 << " milliseconds." << endl;

    return 0;
}
When you compile and run this program, it outputs:
Time for 1000000 insertions without reserve: 24.5573 milliseconds.
Time for 1000000 insertions with reserve: 17.1771 milliseconds.
There seems to be some improvement with reserve, but not all that much. I think the improvement would be greater for complex objects, but I am not sure. Any suggestions, changes, and comments are welcome.
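To probe the "complex objects" hypothesis, here is a variant of the benchmark above with std::string elements; the payload size and counts are made up for illustration, and no numbers are attached since they vary by machine:

#include <chrono>
#include <iostream>
#include <string>
#include <vector>

int main() {
    using clock = std::chrono::steady_clock;
    const int n = 1000000;
    const std::string payload(64, 'x'); // heavier element than an int

    clock::time_point t1 = clock::now();
    std::vector<std::string> a;
    for (int i = 0; i < n; ++i) a.push_back(payload);
    clock::time_point t2 = clock::now();

    std::vector<std::string> b;
    b.reserve(n);
    for (int i = 0; i < n; ++i) b.push_back(payload);
    clock::time_point t3 = clock::now();

    std::cout << "without reserve: "
              << std::chrono::duration<double, std::milli>(t2 - t1).count() << " ms\n";
    std::cout << "with reserve:    "
              << std::chrono::duration<double, std::milli>(t3 - t2).count() << " ms\n";
    return 0;
}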
It's always useful to know the total space needed before requesting any space from the system, so you request it only once. Otherwise the system may have to move your data to a larger free zone (this is optimized, but it is not always a free operation, because a whole copy of the data is required). Even the compiler will try to help you, but it's best to tell it what you know (to reserve the total space required by your process). That's what I think. Greetings.
There is one more advantage of reserve that is not much related to performance but instead to code style and code cleanliness.
Imagine I want to create a vector by iterating over another vector of objects. Something like the following:
std::vector<int> result;
for (const auto& object : objects) {
    result.push_back(object.foo());
}
Now, apparently the size of result is going to be the same as objects.size() and I decide to pre-define the size of result.
The simplest way to do it is in the constructor.
std::vector<int> result(objects.size());
But now the rest of my code is invalidated because the size of result is not 0 anymore; it is objects.size(). The subsequent push_back calls are going to increase the size of the vector. So, to correct this mistake, I now have to change how I construct my for-loop. I have to use indices and overwrite the corresponding memory locations.
std::vector<int> result(objects.size());
for (int i = 0; i < objects.size(); ++i) {
    result[i] = objects[i].foo();
}
And I don't like it. Indices are everywhere in the code. This is also more vulnerable to making accidental copies because of the [] operator. This example uses integers and directly assigns values to result[i], but in a more complex for-loop with complex data structures, it could be relevant.
Coming back to the main topic, it is very easy to adjust the first version by using reserve. reserve does not change the size of the vector, only its capacity. Hence, I can leave my nice for-loop as it is.
std::vector<int> result;
result.reserve(objects.size());
for (const auto& object : objects) {
    result.push_back(object.foo());
}