std::vector::reserve performance penalty - c++

inline void add(const DataStruct& rhs) {
using namespace boost::assign;
vec.reserve(vec.size() + 3);
vec += rhs.a, rhs.b, rhs.c;
}
The above function was executed about 17,000 times and, as far as I can tell (there was some transformation involved), it performed about two orders of magnitude worse with the call to vector::reserve.
I was always under the impression that reserve can speed up push_back even for small values, but this doesn't seem to be true and I can't find any obvious reason why. Does reserve prevent inlining of the function? Is the call to size() too expensive? Does this depend on the platform? I'll try to code up a small benchmark to confirm this in a clean environment.
Compiler: gcc (GCC) 4.4.2 with -g -O2

GCC's implementation of reserve() allocates exactly the number of elements requested, while push_back() grows the internal buffer exponentially by doubling it, so you are defeating the exponential growth and forcing a reallocation/copy on every call. Run your test under ltrace or valgrind and look at the number of malloc() calls.
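To see this concretely, here is a minimal sketch (the function names are mine, not from the question) that counts how often the buffer moves by watching capacity(); on libstdc++ the reserve variant should reallocate on essentially every iteration, while the default growth reallocates only a handful of times:
#include <cstdio>
#include <vector>

// Counts reallocations by watching for capacity() changes.
static std::size_t count_reallocations(std::size_t iterations, bool use_reserve) {
    std::vector<int> vec;
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const std::size_t old_capacity = vec.capacity();
        if (use_reserve)
            vec.reserve(vec.size() + 3);  // exact request defeats the exponential growth
        vec.push_back(1);
        vec.push_back(2);
        vec.push_back(3);
        if (vec.capacity() != old_capacity)
            ++reallocations;
    }
    return reallocations;
}

int main() {
    std::printf("with reserve:    %zu reallocations\n", count_reallocations(17000, true));
    std::printf("without reserve: %zu reallocations\n", count_reallocations(17000, false));
}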

Only use reserve() if you know the number of elements in advance. In that case, reserve() space for all of them at once.
Otherwise just use push_back() and rely on the default strategy: it will reallocate exponentially and greatly reduce the number of reallocations, at the cost of slightly suboptimal memory consumption.

Only use reserve if you know in advance how much space the vector will need.
reserve may need to copy your whole vector to a new buffer...
If you do a push_back and the vector is full, it will grow its capacity on its own (typically to about twice the current size).
If you don't know beforehand how big your vector is going to be and if you need random access, consider using std::deque.

When std::vector needs to reallocate, it grows its allocation to roughly 2*N, where N is its current capacity. This results in a logarithmic number of reallocations as the vector grows.
If instead std::vector grew its allocated space by a constant amount, the number of reallocations would grow linearly as the vector grows.
What you've done is essentially force the vector to grow by a constant amount of 3, meaning linear growth. Linear is obviously worse than logarithmic, especially with big numbers: growing to the roughly 51,000 elements in your test takes about 16 doublings, versus about 17,000 reallocations of 3 elements at a time.
Generally, the only thing better than a logarithmic number of reallocations is a constant number. That is why the standards committee created the reserve method. If you can avoid all reallocations (constant), you will perform better than the default logarithmic behavior.
That said, you may want to consider Herb Sutter's comments about preferring std::deque over vector: www.gotw.ca/gotw/054.htm

Move the reserve outside of the add.
Each time you invoke add, you are reserving at least 3 extra elements. Depending on the implementation of vector, this could be growing the backing array almost every time you call add. That would definitely cause the performance difference you describe.
The correct way to use reserve is something like:
vec.reserve(max * 3);
for (int i = 0; i < max; i++)
    add(i);

If you profiled the code, I bet you would see that the += itself IS very fast; the problem is that the reserve is killing you. You should really only use reserve when you have some knowledge of how big the vector will grow. If you can guess ahead of time, then do ONE reserve; otherwise just go with the default push_back behavior.

What is the memory/runtime efficiency of std::vector, and what is its memory allocation strategy?

I am reading "C++ Primer", and in the chapter about containers the book suggests always using std::vector whenever possible, e.g. when there is no need to insert or delete in the middle or at the front.
I tested a bit with std::vector and noticed that every time it needs to reallocate, it always reallocates a piece of memory that is three times larger than its previous size. I wonder whether this is always the case, and why it would behave this way.
In addition, how does the memory/time efficiency of std::vector compare to built-in static and dynamic arrays? Why would the book suggest always using std::vector even in a simple scenario that a simple static array could handle?
why would it execute in such a way
Because even though this wastes some memory, it makes insertions faster, by reducing the number of reallocations.
It keeps push_back complexity at amortized O(1), while increasing the size by 1 and reallocating each time would make it O(n).
reallocates a piece of memory that is three times larger than its previous size. I wonder if this is always the case
The standard just says that push_back has to have amortized O(1) complexity, and compilers (more precisely, standard library implementations) are free to achieve that by any means.
This disallows increasing the capacity by 1 each time, as that would make the complexity O(n). Achieving amortized O(1) requires the capacity to be multiplied by some factor N, but N can have any value greater than 1. As shown in the comments, N can vary between insertions, and is normally between 1.5 and 2.
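A quick way to see what your implementation does is to watch capacity() as elements are appended; a minimal sketch:
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {
            // Report the jump whenever the buffer is regrown.
            std::printf("size %zu: capacity %zu -> %zu\n", v.size(), last_capacity, v.capacity());
            last_capacity = v.capacity();
        }
    }
}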
memory/time efficiency of std::vector compared to built-in static and dynamic arrays?
Access speed should be nearly the same. Insertion and removal speed can't be compared because arrays have a fixed size. Vectors might use more memory because, as you noticed, they can allocate memory for more elements than they actually have.
Why would the book suggest always using std::vector even in a simple scenario which a simple static array could handle?
There's no need to replace fixed-size arrays with vectors, as long as static arrays are small enough to fit into the stack.
std::vector should be used instead of manually allocating arrays with new to reduce the risk of memory leaks and other bugs.
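For example, with a manually new-ed array any exception thrown before the delete[] leaks the memory, while a vector cleans up automatically. A small sketch (process_or_throw is just a placeholder for work that might throw):
#include <cstddef>
#include <vector>

void process_or_throw(int* data);      // placeholder: may throw

void leaky(std::size_t n) {
    int* data = new int[n];
    process_or_throw(data);            // if this throws, the allocation is leaked
    delete[] data;
}

void safe(std::size_t n) {
    std::vector<int> data(n);
    process_or_throw(data.data());     // the vector's destructor frees the memory either way
}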
I tested a bit with std::vector and noticed that every time it needs to reallocate, it always reallocates a piece of memory that is three times larger than its previous size. I wonder if this is always the case, and why would it execute in such a way.
For context, a common reallocation strategy for resizable containers is to double the size when they fill up, so as not to lose the amortized O(1) insertion time complexity, because each reallocation itself costs O(N).
The logarithmic number of reallocations is what keeps insertion at amortized O(1). Using a multiplier of 3 follows the same logic: reallocation is less frequent and therefore potentially less time is spent on it, but the downside is that more space is likely to go unused.
Following this rule, any multiplier greater than 1 is arguably valid, depending on whether time or space matters more. The multiplier could even change with the size of the vector, which makes sense: smaller vectors can afford larger multipliers, but as they grow it is more reasonable to reallocate more often so as not to waste too much memory.
Strategies can vary dramatically, from fixed multipliers to factors that change with the container's current size; for instance, see this answer, and the tests kindly shared by @JHBonarius.
In addition, how is the memory/time efficiency of std::vector compared to built-in static and dynamic arrays? Why would the book suggest always using std::vector even in a simple scenario which a simple static array could handle?
It's arguable whether you should always use std::vector. If you have an array that you know will always have the same size, it's perfectly fine to use std::array. However, std::vector can behave much like std::array if you reserve space for it and never store more elements than you reserved, so I can see the logic in tending toward std::vector, though I disagree with the 'always' part; it's too definitive.
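A small sketch of the two choices (the sizes here are made up for illustration):
#include <array>
#include <cstddef>
#include <vector>

int main() {
    // Size known at compile time: std::array needs no heap allocation at all.
    std::array<float, 16> fixed{};

    // Size known only at run time: reserve once, then push_back never reallocates
    // as long as you stay within the reserved capacity.
    const std::size_t n = 16;
    std::vector<float> dynamic;
    dynamic.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        dynamic.push_back(static_cast<float>(i));

    (void)fixed;
}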
Using vectors is one of the easiest and most reliable ways of storing data. When you use a built-in array, you have to define its size at compile time, and that size is fixed; you can't change it when you need to. Using dynamically allocated arrays (raw pointers) is always risky: memory allocation, deallocation, and so on. If you are using a raw pointer somewhere, your eye always has to be on it.

Faster alternative to push_back (size is known)

I have a float vector. As I process certain data, I push it back. I always know what the size will be when declaring the vector.
For the largest case, it is 172,490,752 floats. This takes about eleven seconds just to push_back everything.
Is there a faster alternative, like a different data structure or something?
If you know the final size, then reserve() that size after you declare the vector. That way it only has to allocate memory once.
Also, you may experiment with using emplace_back() although I doubt it will make any difference for a vector of float. But try it and benchmark it (with an optimized build of course - you are using an optimized build - right?).
The usual way of speeding up a vector when you know the size beforehand is to call reserve on it before using push_back. This eliminates the overhead of reallocating memory and copying the data every time the previous capacity is filled.
Sometimes for very demanding applications this won't be enough. Even though push_back won't reallocate, it still needs to check the capacity every time. There's no way to know how bad this is without benchmarking, since modern processors are amazingly efficient when a branch is always/never taken.
You could try resize instead of reserve and use array indexing, but resize initializes every element (zeroing them, for float); this is a waste if you know you're going to overwrite every element anyway.
An alternative would be to use std::unique_ptr<float[]> and allocate the storage yourself.
Consider ::boost::container::stable_vector. Notice that allocating a contiguous block of roughly 172 million * 4 bytes (about 690 MB) might easily fail and requires quite a lot of page juggling. A stable_vector is essentially a list of smaller vectors or arrays of reasonable size. You may also want to populate it in parallel.
You could use a custom allocator which avoids default initialisation of all elements, as discussed in this answer, in conjunction with ordinary element access:
const size_t N = 172490752;
std::vector<float, uninitialised_allocator<float> > vec(N);
for (size_t i = 0; i != N; ++i)
    vec[i] = the_value_for(i);
This avoids (i) default initializing all elements, (ii) checking for capacity at every push, and (iii) reallocation, but at the same time preserves all the convenience of using std::vector (rather than std::unique_ptr<float[]>). However, the allocator template parameter is unusual, so you will need to use generic code rather than std::vector-specific code.
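The allocator from the linked answer isn't reproduced above; a minimal sketch of what such an allocator might look like (C++11 or later) is:
#include <memory>
#include <new>
#include <type_traits>
#include <utility>

// Allocator adaptor: default construction of trivially constructible types
// (such as float) is left uninitialised; everything else is forwarded.
template <typename T, typename BaseAlloc = std::allocator<T>>
struct uninitialised_allocator : BaseAlloc {
    using BaseAlloc::BaseAlloc;

    template <typename U>
    struct rebind {
        using other = uninitialised_allocator<
            U, typename std::allocator_traits<BaseAlloc>::template rebind_alloc<U>>;
    };

    // Placement-new without an initializer performs default-initialization,
    // which for float and other trivial types means "do nothing".
    template <typename U>
    void construct(U* p) noexcept(std::is_nothrow_default_constructible<U>::value) {
        ::new (static_cast<void*>(p)) U;
    }

    // Constructions with arguments go through the base allocator as usual.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        std::allocator_traits<BaseAlloc>::construct(
            static_cast<BaseAlloc&>(*this), p, std::forward<Args>(args)...);
    }
};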
I have two answers for you:
As previous answers have pointed out, using reserve to allocate the storage beforehand can be quite helpful, but:
push_back (or emplace_back) itself has a performance penalty because on every call it has to check whether the vector needs to be reallocated. If you already know the number of elements you will insert, you can avoid this penalty by setting the elements directly with the access operator []
So the most efficient way I would recommend is:
Initialize the vector with the 'fill'-constructor:
std::vector<float> values(172490752, 0.0f);
Set the entries directly using the access operator:
for (std::size_t i = 0; i < values.size(); ++i)
    values[i] = some_float;
The reason push_back is slow is that it needs to copy all the data several times as the vector grows, and even when it doesn’t need to copy data it still needs to check whether it has to. Vectors grow quickly enough that this doesn’t happen often, but it still does happen. A rough rule of thumb is that every element will need to be copied on average once or twice; the earlier elements will need to be copied a lot more, but almost half the elements won’t need to be copied at all.
You can avoid the copying, but not the checks, by calling reserve on the vector when you create it, ensuring it has enough space. You can avoid both the copying and the checks by creating it with the right size from the beginning, by giving the number of elements to the vector constructor, and then inserting using indexing as Tobias suggested; unfortunately, this also goes through the vector an extra time initializing everything.
If you know the number of floats at compile time and not just runtime, you could use an std::array, which avoids all these problems. If you only know the number at runtime, I would second Mark’s suggestion to go with std::unique_ptr<float[]>. You would create it with
size_t size = /* Number of floats */;
auto floats = std::unique_ptr<float[]>{new float[size]};  // storage left uninitialised; freed automatically
You don’t need to do anything special to delete this; when it goes out of scope it will free the memory. In most respects you can use it like a vector, but it won’t automatically resize.

Is std::deque faster than std::vector for inserting at the end?

I started doing comparisons between:
inserting at the front of list
inserting at the back of a vector
inserting at the front of a deque
But then I noticed that even on push_back() the deque seemed to be faster. I must be doing something wrong; I can't believe a more general container would outperform a more specialized one.
My code using google benchmark:
#include "benchmark/benchmark.h"
#include <deque>
#include <vector>
#define NUM_INS 1000
static void BM_InsertVector(benchmark::State& state) {
std::vector<int> v;
v.reserve(NUM_INS);
while (state.KeepRunning()) {
state.PauseTiming();
v.clear();
state.ResumeTiming();
for (size_t i = 0; i < NUM_INS; i++)
v.push_back(i);
}
}
BENCHMARK(BM_InsertVector);
static void BM_InsertDeque(benchmark::State& state) {
std::deque<int> v;
while (state.KeepRunning()) {
state.PauseTiming();
v.clear();
state.ResumeTiming();
for (size_t i = 0; i < NUM_INS; i++)
v.push_back(i);
}
}
BENCHMARK(BM_InsertDeque);
BENCHMARK_MAIN();
Results:
Run on (1 X 2592 MHz CPU )
2016-02-18 14:03:47
Benchmark          Time(ns)   CPU(ns)   Iterations
---------------------------------------------------
BM_InsertVector        2820      2470       312500
BM_InsertDeque         1872      1563       406977
I notice some differences when playing with the number of elements, but deque always outperforms vector.
EDIT:
compiler: gcc version 5.2.1
compiling with: g++ -O3 -std=c++11 push_front.cpp -lbenchmark -lpthread
I think the -O3 is actually instrumental; when I turn it off I get a slightly worse deque performance.
There are basically 3 sources of cost involved in continuously appending elements to a dynamic container:
Memory management.
The internal bookkeeping of the container.
Any operations that need to be performed on the elements themselves. Notably, any container that invalidates references on insertion is potentially moving/copying elements around.
Let's start with 1. vector keeps asking for double the memory, and deque allocates fixed-size chunks (deque is typically implemented as an array of arrays, with the lower-tier arrays being of fixed size). Asking for more memory may take longer than asking for less, but typically, unless your heap is very fragmented, asking for one big chunk all at once is the fastest way to get some memory. It's probably faster to allocate one meg once than to ask for a kilobyte 1000 times. So it seems clear that vector will eventually have the advantage here (until the container is so large it's affected by fragmentation). However, "eventually" hasn't arrived yet: you asked for only 1000 elements. I wrote the following code http://coliru.stacked-crooked.com/a/418b18ff8a81c1c0. It's not very interesting, but it basically uses a trivial allocator that increments a global to see how many allocations are performed.
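That code isn't reproduced here, but a counting allocator along those lines might look like this (a sketch; the names are mine):
#include <cstddef>
#include <deque>
#include <iostream>
#include <vector>

static std::size_t g_allocations = 0;  // bumped on every allocate()

template <typename T>
struct counting_allocator {
    using value_type = T;
    counting_allocator() = default;
    template <typename U>
    counting_allocator(const counting_allocator<U>&) {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) { return false; }

int main() {
    {
        std::vector<int, counting_allocator<int>> v;
        for (int i = 0; i < 1000; ++i) v.push_back(i);
        std::cout << "vector allocations: " << g_allocations << '\n';
    }
    g_allocations = 0;
    {
        std::deque<int, counting_allocator<int>> d;
        for (int i = 0; i < 1000; ++i) d.push_back(i);
        std::cout << "deque allocations:  " << g_allocations << '\n';
    }
}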
In the course of your benchmark, vector asks for memory 11 times, and deque only 10. deque keeps asking for the same amount, while vector asks for doubling amounts. As well, vector must call free 10 times, and deque 0 times. This seems like a small win for deque.
For internal bookkeeping, vector has a simpler implementation than deque. After all, vector is just a dynamic array, and deque is an array of arrays and is strictly more complex. So this is clearly a win for vector.
Finally, there are the operations on the elements themselves. In deque, nothing is ever moved. With vector, every new heap allocation also involves moving all the elements. It's probably optimized to use memcpy for trivial types, but even so, that's 10 calls to memcpy to copy 1, 2, 4, 8, ..., 512 integers. This is clearly a win for deque.
I can speculate that cranking up to O3 allowed very aggressive inlining of a lot of the more complex code paths in deque, reducing the weight of point 2. But obviously, unless you do a much more detailed (very careful!) benchmark, you'll never know for sure.
Mostly, this post is to show that it's more complex than simply a specialized container vs a more general one. I will make a prediction though (put my neck out to be cut off, as it were): if you increase the number of elements by even say a factor of 2 or 4, you will not see deque win anymore. That's because deque will make 2x or 4x as many heap allocations, but vector will only make 1-2 more.
I may as well note here that deque is actually kind of an odd data structure; it's theoretically an array of arrays, but in many implementations each lower-tier array is either a certain byte size or just one element, whichever is larger. Also, some of its big-O guarantees are nonsense. push_back is only constant time because in C++ only operations on the elements themselves count towards the big O. Otherwise it should be clear that, since it's an array of arrays, the top-level array will be proportional in size to the number of elements already stored. Eventually that top-level array runs out of room, and you have to reallocate it, moving O(N) pointers. So it's not really O(1) push_back.
I think the vector is slower because you're calling clear() which, depending on your STL implementation, may be freeing the underlying array storage.
If that's the case, then your reserve() call isn't helping, and your vector is continuously resizing, which requires every element to be moved to the new, larger storage.
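Whether clear() actually releases the storage is easy to check (on mainstream implementations it keeps the capacity, so the reserve should survive); a quick sketch:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(1000);
    for (int i = 0; i < 1000; ++i) v.push_back(i);
    std::cout << "capacity before clear: " << v.capacity() << '\n';
    v.clear();
    std::cout << "capacity after clear:  " << v.capacity() << '\n';
}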

Variable sized char array with minimizing calls to new?

I need a char array that will dynamically change in size. I do not know how big it can get, so preallocating is not an option. It might never get bigger than 20 bytes one time; the next time it may get up to 5 KB...
I want the allocation to work like a std::vector's.
I thought of using a std::vector<char>, but all those push_backs seem like they waste time:
strVec.clear();
for (size_t i = 0; i < varLen; ++i)
{
    strVec.push_back(0);
}
Is this the best I can do or is there a way to add a bunch of items to a vector at once? Or maybe a better way to do this.
Thanks
std::vector doesn't allocate memory every time you call push_back, but only when the size becomes bigger than the capacity
First, don't optimize until you've profiled your code and determined that there is a bottleneck. Consider the costs to readability, accessibility, and maintainability by doing something clever. Make sure any plan you take won't preclude you from working with Unicode in future. Still here? Alright.
As others have mentioned, vectors reserve more memory than they use initially, and push_back usually is very cheap.
There are cases when using push_back reallocates memory more often than is necessary, however. For example, one million calls to myvector.push_back() might trigger 10 or 20 reallocations of myvector. On the other hand, a single ranged insert at the end of the vector will cause at most 1 reallocation of myvector. I generally prefer the insertion idiom to the reserve / push_back idiom for both speed and readability reasons.
myvector.insert(myvector.end(), inputBegin, inputEnd);
If you do not know the size of your string in advance and cannot tolerate the hiccups caused by reallocations, perhaps because of hard real-time constraints, then maybe you should use a linked list. A linked list will have consistent performance at the price of much worse average performance.
If all of this isn't enough for your purposes, consider other data structures such as a rope or post back with more specifics about your case.
From Scott Meyers' Effective STL, IIRC
You can use the resize member function to add a bunch. However, I would not expect that push_back would be slow, especially if the vector's internal capacity is already non-trivial.
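For instance, either of these fills the buffer in one call (using strVec and varLen from the question):
strVec.resize(varLen);        // grow (or shrink) to varLen; the new chars are zeroed
strVec.assign(varLen, 0);     // or: replace the contents with varLen zero bytes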
Is this the best I can do or is there a way to add a bunch of items to a vector at once? Or maybe a better way to do this.
push_back isn't very slow, it just compares the size to the current capacity and reallocates if necessary. The comparison may work out to essentially zero time because of branch prediction and superscalar execution on the CPU. The reallocation is performed O(log N) times, so the vector uses up to twice as much memory as needed but time spent on reallocation seldom adds up to anything.
To insert several items at once, use insert. There are a few overloads, the only trick is that you need to explicitly pass end.
my_vec.insert( my_vec.end(), num_to_add, initial_value );
my_vec.insert( my_vec.end(), first, last ); // iterators or pointers
For the second form, you could put the values in an array first and then copy the array to the end of the vector. But this might add as much complexity as it removes. That's how it goes with micro-optimization. Only attempt to optimize if you know there's a measurable gain to be had.

are STL Containers .push_back() naughty

This might seem daft, for which I'm sorry: I've been writing some code for the PlayStation 2 for uni. I am writing a sort of API for the Graphics Synthesizer. I am using a syntax similar to that of OpenGL, which is a state machine.
So the input would be something like
gsBegin(GS_TRIANGLE);
gsColor(...);
gsVertex3f(...);
gsVertex3f(...);
gsVertex3f(...);
gsEnd();
This is great so far for lines/triangles/quads with a fixed number of vertices; however, things like a LINE_STRIP or TRIANGLE_FAN take an undetermined number of points. I have been warned off several times from using STL containers in this situation because of the push_back() method and the time-sensitive nature of the code (is this justified?).
If it's not justified, what would be a better way of dealing with the undetermined-size situation? Currently I have an array that can hold 30 vertices at a time; is this the best way of dealing with this kind of situation?
Vector's push_back has amortized constant time complexity because it exponentially increases the capacity of the vector. (I'm assuming you're using vector, because it's ideal for this situation.) However, in practice, rendering code is very performance sensitive, so if the push_back causes a vector reallocation, performance may suffer.
You can prevent reallocations by reserving the capacity before you add to it. If you call myvec.reserve(10);, you are guaranteed to be able to add 10 elements before the vector reallocates.
However, this still requires knowing ahead of time how many elements you need. Also, if you create and destroy lots of different vectors, you're still doing a lot of memory allocation. Instead, just use one vector for all vertices, and re-use it. Calling clear() returns it to empty while keeping its allocated capacity. This way you don't actually need to reserve anything - the first few times you use it it'll reallocate and grow, but once it reaches its peak size, it won't need to reallocate any more. The nice thing about this is the vector finds the approximate size it needs to be, and once it's "warmed up" there's no further allocation so it is high performance.
In short:
Use a single persistently stored std::vector
push_back as much as you like
When you're done, clear().
In practice this will perform as well as a C array, but without a hard limit on size.
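A sketch of that pattern for the API in the question (the Vertex struct and the submit call are placeholders, not part of the original code):
#include <vector>

struct Vertex { float x, y, z; };          // placeholder vertex type

// One persistent buffer, reused across primitives.
static std::vector<Vertex> g_vertices;

void gsBegin(int /*mode*/) {
    g_vertices.clear();                    // size -> 0, capacity is kept
}

void gsVertex3f(float x, float y, float z) {
    g_vertices.push_back({x, y, z});       // amortized O(1); no allocation once warmed up
}

void gsEnd() {
    // submit_to_gs(g_vertices.data(), g_vertices.size());  // hypothetical submit call
}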
University, eh? Just tell them push_back has amortized constant time complexity and they'll be happy.
First, avoid using glBegin / glEnd if you can, and instead use something like glDrawArrays or glDrawElements.
push_back() on a std::vector is a quick operation unless the array needs to grow when the operation occurs. Set the vector capacity as high as you think you will need it to be and you should see minimal overhead. 'Raw' arrays will usually be slightly faster, but then you have to deal with using 'raw' arrays.
There is always the alternative of using a deque.
A deque is very much like a vector, apart from contiguity. Basically, it's often implemented as a vector of arrays.
This means a lower allocation cost, but member access might be slightly slower (though constant) because of the double dereference, so I am unsure if it's profitable in your case.
There is also the LLVM alternative: SmallVector<T,N>, which preallocates space for N elements right inside the vector object, and simply falls back to a traditional heap-backed vector-like implementation once the size grows beyond that.
The drawback to using std::vector in this kind of situation is making sure you manage your memory allocation properly. On systems like the PS2 (PS3 seems to be a bit better at this), memory allocation is insanely slow and if you don't reserve the right amount of space in the vector to begin with (and it has to resize several times when adding items), you will slow your game to a creeping crawl. If you know what your max size is going to be and reserve it when you create the vector, you won't have a problem.
That said, if this vector is going to be a temporary/local variable, you will still be reallocating memory every time your function is called. So if this function is called every frame, you will still have the performance problem. You can get around this by using a custom allocator and/or making the vector global (or a member variable to a class that will exist during your game loop).
You can always equip the container you want to use with a proper allocator that takes into account the limitations of the platform and the expected grow/shrink scenarios, etc.
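As an illustration on a modern compiler (C++17's polymorphic allocators; the PS2 toolchain predates this, so treat it purely as a sketch of the idea), here is a vector that draws from a fixed, preallocated buffer instead of the general-purpose heap:
#include <cstddef>
#include <memory_resource>
#include <vector>

int main() {
    // A fixed block reserved up front; no general-purpose heap traffic afterwards
    // as long as the data fits in the buffer.
    static std::byte buffer[64 * 1024];
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof(buffer));

    std::pmr::vector<float> vertices(&pool);
    vertices.reserve(3 * 1024);              // room for 1024 xyz vertices
    for (int i = 0; i < 3 * 1024; ++i)
        vertices.push_back(static_cast<float>(i));
}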