I have identified a bottleneck in my C++ code, and my goal is to speed it up. I am moving items from one vector to another vector if a condition is true.
In Python, the Pythonic way of doing this would be to use a list comprehension:
my_vector = [x for x in data_vector if x > 1]
I have hacked a way to do this in C++, and it is working fine. However, I am calling this millions of times in a while-loop and it is slow. I do not understand much about memory allocation, but I assume that my problem has to do with allocating memory over and over again using push_back. Is there a way to allocate my memory differently to speed up this code? (I do not know how large my_vector should be until the for-loop has completed.)
std::vector<float> data_vector;
// Put a bunch of floats into data_vector

std::vector<float> my_vector;
while (some_condition_is_true) {
    my_vector.clear();
    for (std::size_t i = 0; i < data_vector.size(); i++) {
        if (data_vector[i] > 1) {
            my_vector.push_back(data_vector[i]);
        }
    }
    // Use my_vector to render graphics on the GPU, but do not change the elements of my_vector
    // Change the elements of data_vector, but not the size of data_vector
}
Use std::copy_if, and reserve data_vector.size() for my_vector initially (as this is the maximum possible number of elements for which your predicate could evaluate to true):
std::vector<float> my_vec;
my_vec.reserve(data_vec.size());

std::copy_if(data_vec.begin(), data_vec.end(), std::back_inserter(my_vec),
             [](const auto& el) { return el > 1; });
Note that you could avoid the reserve call here if you expect that the number of times that your predicate evaluates to true will be much less than the size of the data_vector.
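Applied to your render loop, a minimal sketch (using the float data and some_condition_is_true from your question; needs <algorithm> and <iterator>) could look like this. Since clear() keeps the allocated capacity, the buffer is typically reused with no further allocations after the first pass:

    std::vector<float> my_vector;
    my_vector.reserve(data_vector.size()); // one allocation up front

    while (some_condition_is_true) {
        my_vector.clear(); // size becomes 0, capacity is kept
        std::copy_if(data_vector.begin(), data_vector.end(),
                     std::back_inserter(my_vector),
                     [](float el) { return el > 1.0f; });
        // render with my_vector, then update data_vector ...
    }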
Though others have posted various great solutions for your query, it seems there is still not much explanation of the memory allocation itself, which you say you do not understand well, so I would like to share my knowledge of this topic with you.
Firstly, in C++, there are several kinds of memory: the stack, the heap, and the data segment.
The stack is for local variables. It has some important characteristics: variables on it are deallocated automatically, operations on it are very fast, and its size is OS-dependent and small (typically a few MB), so storing too much data on the stack causes a stack overflow, et cetera.
The heap can be accessed globally. Its important features are: its size can be extended dynamically as needed and is much larger than the stack's, operations on it are slower than on the stack, and its memory must be deallocated manually (on modern operating systems, any remaining memory is freed automatically when the program ends), et cetera.
The data segment is for global and static variables. This piece of memory can be divided into even smaller parts, e.g. the BSS segment.
In your case, a vector is used. The elements of a vector are stored in an internal dynamic array, that is, an internal array whose size can change at run time. A dynamic array has to be created on the heap (variable-length arrays on the stack are a compiler extension, not standard C++), so the elements of a vector are stored in an internal dynamic array on the heap. To grow a dynamic array, a process called memory reallocation is needed: a larger block is allocated and the existing elements are copied or moved into it. If a vector user keeps enlarging his or her vector one element at a time, the accumulated reallocation cost would be high. To deal with this, a vector allocates a piece of memory that is larger than its current need, that is, it reserves capacity for potential future use. Therefore, in your code, it is not the case that a memory reallocation is performed every time push_back() is called. However, if many elements are added, the capacity reserved for future use will eventually run out, and then a reallocation will occur. To tackle this, vector.reserve() may be used.
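To see this behaviour, here is a small sketch that prints the capacity each time it changes; the exact numbers are implementation-dependent, but you will see the capacity jump in steps rather than grow by one per push_back:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v;
        std::size_t last_capacity = 0;
        for (int i = 0; i < 100; ++i) {
            v.push_back(i);
            if (v.capacity() != last_capacity) { // a reallocation just happened
                last_capacity = v.capacity();
                std::cout << "size " << v.size()
                          << " -> capacity " << v.capacity() << '\n';
            }
        }
        return 0;
    }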
I am a newbie, so hopefully I have not made any mistakes in my sharing.
Hope this helps.
Run the loop twice: the first time, only count how many new elements you will need; then use reserve to allocate all the memory you need up front.
while (some_condition_is_true) {
    my_vector.clear();

    int newLength = 0;
    for (std::size_t i = 0; i < data_vector.size(); i++) {
        if (data_vector[i] > 1) {
            newLength++;
        }
    }

    my_vector.reserve(newLength);
    for (std::size_t i = 0; i < data_vector.size(); i++) {
        if (data_vector[i] > 1) {
            my_vector.push_back(data_vector[i]);
        }
    }

    // Do stuff with my_vector and change data_vector
}
I doubt that allocating my_vector is the problem, especially if the while loop is executed many times, as the capacity of my_vector should quickly become sufficient.
But to be sure, you can just reserve capacity in my_vector corresponding to the size of data_vector:
my_vector.reserve(data_vector.size());
while (some_condition_is_true) {
    my_vector.clear();
    for (auto value : data_vector) {
        if (value > 1)
            my_vector.push_back(value);
    }
}
If you are on Linux, you can reserve memory for my_vector up front to prevent the std::vector reallocations that are the bottleneck in your case. Note that reserve will not waste physical memory, due to overcommit, so any rough upper estimate for the reserve value will fit your needs. In your case, the size of data_vector will be enough. This line of code before the while loop should fix the bottleneck:
my_vector.reserve(data_vector.size());
Suppose I have a forever loop that creates a hashmap:
#include <climits>
#include <map>

void createMap() {
    std::map<int, int> mymap;
    for (int i = 0; i < INT_MAX; i++) {
        mymap[i] = i;
    }
    mymap.clear(); // <-- this line doesn't seem to make a difference in memory growth
}

int main(void) {
    while (1) {
        createMap();
    }
    return 0;
}
I watched the code run on macOS with Activity Monitor open, and the application's memory usage keeps growing, with or without the mymap.clear() at the end of the createMap() function.
Shouldn't memory usage be constant for the case where mymap.clear() is used?
What's the general recommendation for using STL data containers? Do I need to .clear() them before the end of a function?
I asked in another forum, and the folks there helped me understand the answer. It turns out I didn't wait long enough for createMap to finish, nor do I have enough memory to sustain this program.
It takes INT_MAX = 2147483647 elements to be created; each element is a pair<int, int> of 8 bytes, plus about 24 bytes for the map object itself.
Total minimum memory = 2147483647 * 8 + 24 = 17179869200 bytes ≈ 17.2 GB.
I reduced the number of elements and tested both with and without .clear(); the program grew and shrank in size accordingly.
The container you create is bound to the scope of your function. When the function returns, its lifetime ends. And since std::map owns its data, the memory it allocates is freed upon destruction.
Your code hence constantly allocates and frees the same amount of memory, so memory consumption is constant, although the exact memory locations will probably differ. This also means that you should not manually call clear at the end of this function. Use clear when you want to empty a container that you intend to keep using afterwards.
As a side note, std::map is not a hash map (std::unordered_map is one).
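To see that destruction happen at the end of the function, here is a tiny sketch (the Noisy type is made up purely for illustration):

    #include <iostream>
    #include <map>

    struct Noisy {
        ~Noisy() { std::cout << "Noisy destroyed\n"; }
    };

    void createMap() {
        std::map<int, Noisy> mymap;
        mymap[0]; // default-constructs one Noisy inside the map
    } // mymap's lifetime ends here; its nodes are freed and "Noisy destroyed" is printed

    int main() {
        createMap();
        return 0;
    }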
I have data which is N by 4, and I push it back as follows.
vector<vector<int>> a;
for (some loop) {
    ...
    a.push_back(vector<int>{val1, val2, val3, val4});
}
N would be less than 13000. In order to prevent unnecessary reallocation, I would like to reserve 13000 by 4 spaces in advance.
After reading multiple related posts on this topic (eg How to reserve a multi-dimensional Vector?), I know the following will do the work. But I would like to do it with reserve() or any similar function if there are any, to be able to use push_back().
vector<vector<int>> a(13000, vector<int>(4));
or
vector<vector<int>> a;
a.resize(13000,vector<int>(4));
How can I just reserve memory without increasing the vector size?
If your data is guaranteed to be N x 4, you do not want to use a std::vector<std::vector<int>>, but rather something like std::vector<std::array<int, 4>>.
Why?
It's the more semantically-accurate type - std::array is designed for fixed-width contiguous sequences of data. (It also opens up the potential for more performance optimizations by the compiler, although that depends on exactly what it is that you're writing.)
Your data will be laid out contiguously in memory, rather than every one of the different vectors allocating potentially disparate heap locations.
Having said that - @pasbi's answer is correct: You can use std::vector::reserve() to allocate space for your outer vector before inserting any actual elements (both for vectors-of-vectors and for vectors-of-arrays). Also, later on, you can use the std::vector::shrink_to_fit() method if you ended up inserting a lot less than you had planned.
Finally, one other option is to use a gsl::multi_span and pre-allocate memory for it (GSL is the C++ Core Guidelines Support Library).
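For illustration, a minimal sketch of the vector-of-arrays approach (val1 through val4 stand in for your real values):

    #include <array>
    #include <vector>

    std::vector<std::array<int, 4>> a;
    a.reserve(13000);                      // one contiguous allocation for up to 13000 rows
    // inside your loop:
    a.push_back({val1, val2, val3, val4}); // each row stored inline, no per-row heap allocation

Since std::array is stored by value, push_back copies four ints into the vector's own buffer; there is no separate allocation per row as with vector<vector<int>>.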
You've already answered your own question.
There is a function vector::reserve which does exactly what you want.
vector<vector<int>> a;
a.reserve(N);
for (some loop) {
    ...
    a.push_back(vector<int>{val1, val2, val3, val4});
}
This will reserve memory to fit N instances of vector<int>. Note that the actual size of the inner vector<int> is irrelevant at this point, since the data of a vector is allocated somewhere else; only a pointer and some bookkeeping are stored in the std::vector object itself.
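You can observe that only this fixed-size bookkeeping lives in the vector object itself; for example, the following sketch prints the same object size for an empty and a million-element vector:

    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> small;
        std::vector<int> big(1000000);
        // both vector objects are the same size; the elements live on the heap
        std::cout << sizeof(small) << " == " << sizeof(big) << '\n';
        return 0;
    }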
Note: this answer is only here for completeness in case you ever come to have a similar problem with an unknown size; keeping a std::vector<std::array<int, 4>> in your case will do perfectly fine.
To pick up on einpoklum's answer, and in case you didn't find this earlier: it is almost always a bad idea to nest std::vectors, because of the memory layout he spoke of. Each inner vector will allocate its own chunk of data, which won't (necessarily) be contiguous with the others, which will produce cache misses.
Preferably, either:
As already said, use a std::array if you have a fixed and known number of elements per vector;
Or flatten your data structure by having a single std::vector<T> of size N x M.
// Assuming N = 13000, M = 4
std::vector<int> vec;
vec.reserve(13000 * 4);
Then you can access it like so:
// Before:
int& element = vec[nIndex][mIndex];
// After (column-major layout, still assuming N = 13000):
int& element = vec[mIndex * 13000 + nIndex];
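If the index arithmetic bothers you, a thin wrapper (a hypothetical sketch, not a library type) keeps it in one place:

    #include <cstddef>
    #include <vector>

    struct FlatMatrix {
        std::vector<int> data;
        std::size_t n; // number of rows (N)

        FlatMatrix(std::size_t rows, std::size_t cols) : data(rows * cols), n(rows) {}

        int& at(std::size_t nIndex, std::size_t mIndex) {
            return data[mIndex * n + nIndex]; // same column-major layout as above
        }
    };

    // usage: FlatMatrix mat(13000, 4); mat.at(nIndex, mIndex) = 42;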
I have a piece of code that creates thousands of objects and appends them to a vector.
The following code is just an example of what is being done. The real constructor has some parameters, and the real for loop does not have that exact condition, but it serves the purpose of showing that it runs thousands of times.
vector<VolumeInformation*> volumes;
for (int i = 0; i < 5000; ++i) {
    VolumeInformation* info = new VolumeInformation();
    volumes.push_back(info);
}
The code takes a lot of time to run, and I was trying to find a faster way of creating all the objects. I read about block allocators, but I am unsure whether that is really meant for what I am trying to do, and whether it would really help get this done faster. I want to allocate memory for a thousand objects (for example), keep using that memory while it is still available, and then allocate more when needed, avoiding having to allocate memory for a single object every time. Can this be done? Can you point me to an example of how to tell 'new' to use previously allocated memory? If not for the objects themselves, can an allocator be used for the memory of the vector (even though the objects are what really need speeding up)?
Thank you.
** UPDATE **
After all the answers and comments, I decided to change the code so that the vector stores the objects themselves instead of pointers, letting me use reserve to pre-allocate memory for several object instances at once. However, after some performance benchmarking, I found that the change performs much worse, unless I know the exact final size of the vector ahead of time. Here are my findings; I was wondering if someone could shed light on this, letting me know why this happens, whether I am missing something, or whether the approach I was using before is really the best one.
Here is the code I used for benchmarking:
vector<int> v = vector<int>();
v.push_back(1);
v.push_back(3);
v.push_back(4);
v.push_back(5);
v.push_back(7);
v.push_back(9);

int testAmount = 200000;
int reserve = 500000;

Stopwatch w = Stopwatch();

w = Stopwatch();
vector<VolumeInformation> infos = vector<VolumeInformation>();
infos.reserve(reserve);
for (int i = 0; i < testAmount; ++i) {
    infos.emplace_back(&v, 1, 0, 0);
}
int elapsed = w.Elapsed();

w = Stopwatch();
vector<VolumeInformation*> infoPointers = vector<VolumeInformation*>();
infoPointers.reserve(reserve);
for (int i = 0; i < testAmount; ++i) {
    infoPointers.emplace_back(new VolumeInformation(&v, 1, 0, 0));
}
int elapsed2 = w.Elapsed();
If I comment out both reserve() lines, the version without pointers takes 32.701 seconds, while the pointer version takes 6.159, more than five times faster than using a vector of objects.
If I use reserve but set the number of items to reserve to a value lower than the number of iterations, the vector-of-objects version still takes more time than the pointer version.
If I use reserve with a value higher than or equal to the number of iterations, the vector-of-objects version becomes a lot faster, taking only 270 ms, against 8.901 seconds for the pointer version. The main issue here is that I do not know in advance the size that the vector will reach, as the iterations are not based on a hardcoded number; that was only for the benchmarking.
Can someone explain why this happens, whether there is a way around this, or whether I am doing anything wrong here?
vector is perfectly capable of pre-allocating a large block and using it for all the elements, if you just use it correctly:
// create 5000 default-constructed X objects
std::vector<X> v(5000);
Or if you need to pass constructor arguments:
std::vector<X> v;
v.reserve(5000); // allocate block of memory for 5000 objects
for (int i = 0; i < 5000; ++i)
    v.emplace_back(arg1, arg2, i % 2 ? arg3 : arg4);
The last line constructs an X in the pre-allocated memory, with no copying, passing the function arguments to the X constructor.
I would want to allocate memory for a thousand objects (for example), and keep on using that memory while it is still available, and then allocate some more when needed, avoiding having to allocate memory for a single object every time.
std::vector does that automatically, you should probably stop using new and just have a vector<VolumeInformation> and put objects into it directly, instead of allocating individual objects and storing pointers to them.
Memory allocation is slow (see Why should C++ programmers minimize use of 'new'?), so stop allocating individual objects. Both the examples above will do 1 allocation, and 5000 constructor calls. Your original code does at least 5001 allocations and 5000 constructor calls (in typical C++ implementations it would do 5013 allocations and 5000 constructor calls).
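If you want to verify those allocation counts yourself, one rough way is to count calls to the global operator new; the following is a sketch (the exact counts vary by standard library, and printf is used to avoid iostream's own allocations):

    #include <cstdio>
    #include <cstdlib>
    #include <new>
    #include <vector>

    static long allocation_count = 0;

    void* operator new(std::size_t size) {
        ++allocation_count; // count every heap allocation
        if (void* p = std::malloc(size))
            return p;
        throw std::bad_alloc();
    }

    void operator delete(void* p) noexcept { std::free(p); }
    void operator delete(void* p, std::size_t) noexcept { std::free(p); }

    int main() {
        allocation_count = 0; // ignore any allocations made during startup
        std::vector<int> v;
        v.reserve(5000);
        for (int i = 0; i < 5000; ++i)
            v.push_back(i);
        std::printf("heap allocations: %ld\n", allocation_count); // expect 1
        return 0;
    }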
** UPDATE **
If I comment out both reserve() lines, the version without pointers takes 32.701 seconds, while the pointer version takes 6.159! It takes 5+ times less than using a vector of objects.
Since you haven't actually shown a complete working program, you're asking people to guess (always show the actual code!), but it suggests your class has a very slow copy constructor, which is used when the vector grows and the existing elements need to be copied over to the new memory (the old elements are then destroyed).
If you can add a noexcept move constructor that is more efficient than the copy constructor then std::vector will use that when the vector needs to grow and will run much faster.
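As a minimal sketch of what that can look like (the data_ member is invented for illustration; your real class will differ):

    #include <vector>

    class VolumeInformation {
    public:
        VolumeInformation() = default;

        // Defaulting these lets std::vector move elements during reallocation
        // instead of copying them (it only does so when the move constructor
        // is noexcept).
        VolumeInformation(VolumeInformation&&) noexcept = default;
        VolumeInformation& operator=(VolumeInformation&&) noexcept = default;

        // Declaring move operations suppresses the implicit copy operations;
        // default them too if the class still needs to be copyable.
        VolumeInformation(const VolumeInformation&) = default;
        VolumeInformation& operator=(const VolumeInformation&) = default;

    private:
        std::vector<int> data_; // hypothetical member; anything cheaply movable
    };

std::vector uses std::move_if_noexcept during reallocation, so if the move constructor is not noexcept, it falls back to copying in order to keep its strong exception guarantee.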
The main issue here is that I do not know in advance the size that the vector will reach, as the iterations are not based in a hardcoded number, this was only to do the benchmarking.
You could just reserve more elements than you are ever likely to need, trading higher memory usage for better performance.
You probably want to reserve space for your 5000 elements ahead of the loop:
volumes.reserve(5000);
for (int i = 0; i < 5000; ++i) {
    VolumeInformation* info = new VolumeInformation();
    volumes.push_back(info);
}
This could save time by eliminating several resizes as the vector grows, especially if VolumeInformation costs a lot (in time) to copy.
I have a long array of data (n entities). Every object in this array has some values (let's say, m values per object). And I have a cycle like:
myType* A;
// reading the array of objects
std::vector<anotherType> targetArray;
int i, j, k = 0;
for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
    {
        if (check(A[i].fields[j]))
        {
            // creating and adding the object to targetArray
            // (assumes targetArray already has enough room; that is the question)
            targetArray[k] = someGenerator(A[i].fields[j]);
            k++;
        }
    }
In some cases I have n * m valid objects, in others (n * m) / 10 or fewer.
The question is how do I allocate memory for targetArray? I see these options:

1. Reserve the maximum and shrink afterwards:

    targetArray.reserve(n * m);
    // Do work
    targetArray.shrink_to_fit();

2. Count the elements without generating objects, then allocate exactly as much memory as I need and go through the cycle one more time.

3. Resize the array on every iteration where new objects are being created.

I see a huge tactical mistake in each of my methods. Is there another way to do it?
What you are doing here is called premature optimization. By default, std::vector increases its memory footprint exponentially as it runs out of room for new objects; for example, a typical implementation allocates space for a small handful of elements on the first push_back and then roughly doubles the capacity whenever it fills up. Just stick with push_back and get your code working.
You should start thinking about memory allocation optimization only when the above approach proves to be a bottleneck in your design. If that ever happens, I think the best bet would be to come up with a good approximation for the number of valid objects and just call reserve() on the vector, something like your first approach. Just make sure your shrink-to-fit step actually works, because vectors don't like to shrink: shrink_to_fit is only a non-binding request, and the classic alternative is the swap trick, as sketched below.
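For reference, a sketch of both shrinking approaches, using the targetArray from the question:

    // C++98 swap trick: copy into a right-sized temporary, then swap buffers
    std::vector<anotherType>(targetArray).swap(targetArray);

    // C++11 and later: a non-binding request to drop the excess capacity
    targetArray.shrink_to_fit();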
Resizing the array on every step is no good, and std::vector won't really do that unless you try hard.
Doing an extra pass through the list of objects can help, but it may also hurt, as you could easily waste CPU cycles, pollute the CPU cache, etc. If in doubt, profile it.
The typical way would be to use targetArray.push_back(). This reallocates the memory when needed and avoids two passes through your data. It has a system for reallocating the memory that makes it pretty efficient, doing fewer reallocations as the vector gets larger.
However, if your check() function is very fast, you might get better performance by going through the data twice, determining how much memory you need and making your vector the right size to begin with. I would only do this if profiling has determined it is really necessary though.
What is the benefit of using reserve when dealing with vectors? When should I use it? I couldn't find a clear-cut answer on this, but I assume it is faster if you reserve in advance before using the vector.
What say you, people smarter than I?
It's useful if you have an idea how many elements the vector will ultimately hold - it can help the vector avoid repeatedly allocating memory (and having to move the data to the new memory).
In general it's probably a potential optimization that you shouldn't need to worry about, but it's not harmful either (at worst you end up wasting memory if you overestimate).
One area where it can be more than an optimization is when you want to ensure that existing iterators do not get invalidated by adding new elements.
For example, a push_back() call may invalidate existing iterators to the vector (if a reallocation occurs). However if you've reserved enough elements you can ensure that the reallocation will not occur. This is a technique that doesn't need to be used very often though.
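A small sketch of that technique (the sizes are arbitrary):

    #include <vector>

    int main() {
        std::vector<int> v;
        v.reserve(100); // guarantee: no reallocation for the first 100 push_backs
        v.push_back(1);
        int* first = &v[0]; // pointer (or iterator) into the vector
        for (int i = 2; i <= 100; ++i)
            v.push_back(i); // stays within capacity, so no reallocation occurs
        return *first == 1 ? 0 : 1; // 'first' is still valid here
    }

Note that reserve only prevents the invalidation caused by reallocation; inserting or erasing in the middle still invalidates iterators at and after the modified position.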
It can be ... especially if you are going to be adding a lot of elements to your vector over time and want to avoid the automatic memory expansion the container makes when it runs out of available slots.
For instance, back-insertions (i.e., std::vector::push_back) are considered an amortized O(1), or constant-time, process. But when an insertion is made at the back of a vector that is out of space, the vector must reallocate memory for a new array of elements, copy the old elements into the new array, and only then copy in the element you were trying to insert. That process is O(N), linear-time complexity, and for a large vector it could take quite a bit of time.

Using the reserve() method allows you to pre-allocate memory for the vector if you know it is going to be at least some certain size, and to avoid reallocating memory every time space runs out. This matters especially if you are doing back-insertions inside performance-critical code, where you want the insertion to remain an actual O(1) process with no hidden memory reallocation for the array. Granted, your copy constructor would also have to be O(1) to get true O(1) complexity for the entire back-insertion, but as far as the container's own algorithm for back-insertion is concerned, you can keep it at a known complexity if the memory for the slot is already pre-allocated.
This excellent article explains in depth the differences between the deque and vector containers. The section "Experiment 2" shows the benefits of vector::reserve().
If you know the eventual size of the vector then reserve is worth using.
Otherwise, whenever the vector runs out of internal room, it will resize the buffer. This usually involves doubling (or multiplying by 1.5) the size of the internal buffer, which can be expensive if it happens a lot.
The really expensive part is invoking the copy constructor on each element to copy it from the old buffer to the new buffer, followed by calling the destructor on each element in the old buffer.
If the copy constructor is expensive then it can be a problem.
It is faster and saves memory.
If you push_back another element into a full vector, it will typically allocate double the memory it is currently using, since allocate-plus-copy is expensive.
I don't know about people smarter than you, but I would say that you should call reserve in advance if you are going to perform lots of insertion operations and you already know or can estimate the total number of elements, at least to the order of magnitude. It can save you a lot of reallocations in good circumstances.
Although it's an old question, here is my benchmark showing the difference.
#include <iostream>
#include <chrono>
#include <vector>
using namespace std;

int main() {
    vector<int> v1;

    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) {
        v1.push_back(1);
    }
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
    chrono::duration<double> time_first = chrono::duration_cast<chrono::duration<double>>(t2 - t1);
    cout << "Time for 1000000 insertions without reserve: " << time_first.count() * 1000 << " milliseconds." << endl;

    vector<int> v2;
    v2.reserve(1000000);

    chrono::steady_clock::time_point t3 = chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) {
        v2.push_back(1);
    }
    chrono::steady_clock::time_point t4 = chrono::steady_clock::now();
    chrono::duration<double> time_second = chrono::duration_cast<chrono::duration<double>>(t4 - t3);
    cout << "Time for 1000000 insertions with reserve: " << time_second.count() * 1000 << " milliseconds." << endl;

    return 0;
}
When you compile and run this program, it outputs:
Time for 1000000 insertions without reserve: 24.5573 milliseconds.
Time for 1000000 insertions with reserve: 17.1771 milliseconds.
There seems to be some improvement with reserve, but not all that much. I think the improvement would be greater for complex objects, but I am not sure. Any suggestions, changes, and comments are welcome.
It is always useful to know the final total space needed before requesting any space from the system, so that you only request space once. Otherwise the system may have to move you to a larger free zone (this is optimized, but not always a free operation, because a whole data copy is required). Even the compiler will try to help you, but the best thing is to tell it what you know (to reserve the total space required by your process). That's what I think. Greetings.
There is one more advantage of reserve that is not so much related to performance as to code style and code cleanliness.
Imagine I want to create a vector by iterating over another vector of objects. Something like the following:
std::vector<int> result;
for (const auto& object : objects) {
    result.push_back(object.foo());
}
Now, apparently the size of result is going to be the same as objects.size() and I decide to pre-define the size of result.
The simplest way to do it is in the constructor.
std::vector<int> result(objects.size());
But now the rest of my code is invalidated because the size of result is not 0 anymore; it is objects.size(). The subsequent push_back calls are going to increase the size of the vector. So, to correct this mistake, I now have to change how I construct my for-loop. I have to use indices and overwrite the corresponding memory locations.
std::vector<int> result(objects.size());
for (std::size_t i = 0; i < objects.size(); ++i) {
    result[i] = objects[i].foo();
}
And I don't like it. Indices are everywhere in the code. This is also more vulnerable to accidental copies because of the [] operator. This example uses integers and assigns values directly to result[i], but in a more complex for-loop with complex data structures, it could be relevant.
Coming back to the main topic, it is very easy to adjust the first code by using reserve. reserve does not change the size of the vector but only the capacity. Hence, I can leave my nice for loop as it is.
std::vector<int> result;
result.reserve(objects.size());
for (const auto& object : objects) {
    result.push_back(object.foo());
}