C++ - STL vector question

Is there any way to make std::vector faster when reserving and resizing?
I would like to achieve the performance which would be somewhat equivalent to plain C arrays.
See the following code snippets:
TEST(test, vector1) {
    for (int i = 0; i < 50; ++i) {
        std::vector<int> a;
        a.reserve(10000000);
        a.resize(10000000);
    }
}

TEST(test, vector2) {
    for (int i = 0; i < 50; ++i) {
        std::vector<int> a(10000000);
    }
}

TEST(test, carray) {
    for (int i = 0; i < 50; ++i) {
        int* new_a = new int[10000000];
        delete[] new_a;
    }
}
The first two tests are about two times slower (4095 ms vs 2101 ms), and, obviously, that happens because std::vector zeroes the elements in it. Any ideas on how this could be avoided?
Or perhaps there is some standard (Boost?) container that implements a fixed-size, heap-based array?
Thank you

Well, naturally the first two tests are slower: they explicitly go through the entire vector and call int() on each element. Edit: this has the effect of setting all the elements to 0.
Just try reserving.
There is some info very relevant to your question in this question I asked a while back:
std::vector reserve() and push_back() is faster than resize() and array index, why?
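To make the distinction concrete, here is a minimal sketch of reserve-then-push_back (capacity without construction) versus resize (which value-initializes, i.e. zeroes, every element):

#include <vector>

int main() {
    std::vector<int> a;
    a.reserve(10000000);   // allocates capacity only: size() == 0, nothing constructed
    // a.resize(10000000); // would value-initialize (zero) all 10,000,000 ints
    for (int i = 0; i < 1000; ++i)
        a.push_back(i);    // elements are constructed on demand, with no
                           // reallocation until capacity is exceeded
    return 0;
}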

There's boost::array.
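A minimal sketch of that; note that boost::array has a compile-time size and is typically stack-allocated, so an element count as large as the question's would need static storage or dynamic allocation:

#include <boost/array.hpp>

int main() {
    boost::array<int, 1000> a; // fixed size; trivially-constructible elements are
                               // left uninitialized, just like a plain int[1000]
    a[0] = 1;
    return a[0];
}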

Were your tests performed in debug or release mode? I know the Microsoft compiler adds a lot of debug checks that can really slow down performance.

Maybe you could use a boost::scoped_array, but if this really is that performance critical, maybe you should try putting the initialization/allocation outside the innermost loop somehow?
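A minimal sketch of the boost::scoped_array suggestion (it frees the array automatically but, unlike vector, does not initialize the elements):

#include <boost/scoped_array.hpp>

int main() {
    boost::scoped_array<int> a(new int[10000000]); // ints left uninitialized,
                                                   // matching the plain new[] test
    a[0] = 1; // usable like a plain array
    return a[0];
} // delete[] happens automatically here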

I'm going to give you the benefit of the doubt and assume you've already done some profiling and determined that this use of vector is a hotspot. If not, it's a bit premature to consider the differences, unless you're working on a very tight, small-scale application where every clock cycle counts; in that case it's even easier to use a profiler, and there's just as much reason to do so.
boost::scoped_array is one solution. There's no way to get vector not to initialize the elements it stores. Another option is std::deque, if you don't need a contiguous memory block. deque can be significantly faster than vector or a dynamically-allocated array with the same number of elements, as it creates smaller memory blocks, which operating systems tend to deal with better, while still being cache-friendly.


C++ Block Allocator for creating new objects faster

I have a piece of code that creates thousands of objects and appends them to a vector.
The following code is just an example of what is being done; the real constructor has some parameters and the for loop does not actually have that exact condition, but it serves to show that the loop runs thousands of times.
vector<VolumeInformation*> vector = vector<VolumeInformation*>();
for (int i = 0; i < 5000; ++i) {
    VolumeInformation* info = new VolumeInformation();
    vector.push_back(info);
}
The code takes a lot of time to run, and I was trying to find a faster way of creating all the objects. I read about block allocators, but I am unsure whether this is really meant for what I am trying to do, and whether it really helps get this done faster. I would want to allocate memory for a thousand objects (for example), keep using that memory while it is still available, and then allocate some more when needed, avoiding having to allocate memory for a single object every time. Can this be done? Can you point me somewhere I can find an example of how to tell 'new' to use previously allocated memory? If not for the objects themselves, can an allocator be used for the memory of the vector (even though the objects are what really needs speeding up)?
Thank you.
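(For reference, "telling new to use previously allocated memory" is what placement new does: it constructs an object inside a buffer you already own. A minimal sketch with a purely hypothetical Widget type:)

#include <new> // placement new

struct Widget { int x = 42; }; // hypothetical stand-in type

int main() {
    // One big allocation up front...
    void* block = ::operator new(1000 * sizeof(Widget));
    Widget* slots = static_cast<Widget*>(block);
    // ...then construct objects into it with no further allocations.
    for (int i = 0; i < 1000; ++i)
        new (slots + i) Widget(); // placement new: construct at slots[i]
    for (int i = 0; i < 1000; ++i)
        slots[i].~Widget();       // placement-new'd objects need explicit destruction
    ::operator delete(block);
    return 0;
}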
** UPDATE **
After all the answers and comments, I decided to change the code so the vector would store the objects instead of pointers, letting me use reserve to pre-allocate memory for several object instances at once. However, after doing some performance benchmarks, I found that my change performs much worse, unless I know the exact final size of the vector ahead of time. Here are my findings; I was wondering if someone could shed some light on why this happens, whether I am missing something, or whether the approach I was using before is really the best one.
Here is the code I used for benchmarking:
vector<int> v = vector<int>();
v.push_back(1);
v.push_back(3);
v.push_back(4);
v.push_back(5);
v.push_back(7);
v.push_back(9);

int testAmount = 200000;
int reserve = 500000;

Stopwatch w = Stopwatch();
vector<VolumeInformation> infos = vector<VolumeInformation>();
infos.reserve(reserve);
for (int i = 0; i < testAmount; ++i) {
    infos.emplace_back(&v, 1, 0, 0);
}
int elapsed = w.Elapsed();

w = Stopwatch();
vector<VolumeInformation*> infoPointers = vector<VolumeInformation*>();
infoPointers.reserve(reserve);
for (int i = 0; i < testAmount; ++i) {
    infoPointers.emplace_back(new VolumeInformation(&v, 1, 0, 0));
}
int elapsed2 = w.Elapsed();
If I comment out both reserve() lines, the version without pointers takes 32.701 seconds, while the pointer version takes 6.159 seconds! That is over five times faster than using a vector of objects.
If I use reserve but set the number of items to reserve to a value lower than the number of iterations, the vector-of-objects version still takes more time than the pointer version.
If I use reserve with a value higher than or equal to the number of iterations, the vector-of-objects version becomes a lot faster, taking only 270 ms against 8.901 seconds for the pointer version. The main issue here is that I do not know in advance the size that the vector will reach, as the iterations are not based on a hardcoded number; that was only for the benchmark.
Can someone explain why this happens, whether there is a way around it, or whether I am doing anything wrong here?
vector is perfectly capable of pre-allocating a large block and using it for all the elements, if you just use it correctly:
// create 5000 default-constructed X objects
std::vector<X> v(5000);
Or if you need to pass constructor arguments:
std::vector<X> v;
v.reserve(5000); // allocate block of memory for 5000 objects
for (int i = 0; i < 5000; ++i)
    v.emplace_back(arg1, arg2, i % 2 ? arg3 : arg4);
The last line constructs an X in the pre-allocated memory, with no copying, passing the function arguments to the X constructor.
I would want to allocate memory for a thousand objects (for example), and keep on using that memory while it is still available, and then allocate some more when needed, avoiding having to allocate memory for a single object every time.
std::vector does that automatically, you should probably stop using new and just have a vector<VolumeInformation> and put objects into it directly, instead of allocating individual objects and storing pointers to them.
Memory allocation is slow (see Why should C++ programmers minimize use of 'new'?), so stop allocating individual objects. Both the examples above will do 1 allocation, and 5000 constructor calls. Your original code does at least 5001 allocations and 5000 constructor calls (in typical C++ implementations it would do 5013 allocations and 5000 constructor calls).
** UPDATE **
If I comment out both reserve() lines, the version without pointers takes 32.701 seconds, while the pointer version takes 6.159 seconds! That is over five times faster than using a vector of objects.
Since you haven't actually shown a complete working program, you're asking people to guess (always show the actual code!), but it suggests your class has a very slow copy constructor, which is used when the vector grows and the existing elements need to be copied over to the new memory (the old elements are then destroyed).
If you can add a noexcept move constructor that is more efficient than the copy constructor then std::vector will use that when the vector needs to grow and will run much faster.
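A hedged sketch of that fix; the real members of VolumeInformation are not shown in the question, so a single vector member stands in for them here:

#include <type_traits>
#include <utility>
#include <vector>

class VolumeInformation {
public:
    VolumeInformation() = default;
    VolumeInformation(const VolumeInformation&) = default; // potentially slow copy
    VolumeInformation(VolumeInformation&& other) noexcept  // cheap move; noexcept lets
        : samples_(std::move(other.samples_)) {}           // vector move while growing
private:
    std::vector<int> samples_; // assumed member, for illustration only
};

static_assert(std::is_nothrow_move_constructible<VolumeInformation>::value,
              "vector will move, not copy, when it reallocates");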
The main issue here is that I do not know in advance the size that the vector will reach, as the iterations are not based on a hardcoded number; that was only for the benchmark.
You could just reserve more elements than you are ever likely to need, trading higher memory usage for better performance.
You probably want to reserve space for your 5000 elements ahead of the loop:
vector.reserve(5000);
for (int i = 0; i < 5000; ++i) {
    VolumeInformation* info = new VolumeInformation();
    vector.push_back(info);
}
This could save time by eliminating several reallocations as the vector grows, especially if VolumeInformation costs a lot (in time) to copy.

Why is maintaining a sorted array faster than maintaining a sorted vector in C++?

I am creating an array and a vector of size 100, generating random values, and trying to keep both the array and the vector sorted.
Here is my code for the same
vector<int> myVector;
int arr[SIZE];
clock_t start, finish;
int random;

for (int i = 0; i < SIZE; i++)
{
    myVector.push_back(0);
    arr[i] = 0;
}

// testing for array
start = clock();
for (int i = 0; i < MAX; ++i)
{
    random = getRandom(); // returns rand() % 100
    for (int j = 0; j < SIZE; ++j) {
        if (random > arr[j])
        {
            for (int k = SIZE - 1; k > j; --k)
            {
                arr[k] = arr[k-1];
            }
            arr[j] = random;
            break;
        }
    }
}
finish = clock();
cout << "Array Time " << finish - start << endl;

// vector processing
start = clock();
for (int i = 0; i < MAX; ++i)
{
    random = getRandom(); // returns rand() % 100
    for (int j = 0; j < SIZE; ++j) {
        if (random > myVector[j])
        {
            for (int k = SIZE - 1; k > j; --k)
            {
                myVector[k] = myVector[k-1];
            }
            myVector[j] = random;
            break;
        }
    }
}
finish = clock();
cout << "Vector Time " << finish - start << endl;
The output is as follows:
Array Time : 5
Vector Time: 83
I am not able to understand why the vector is so slow compared to the array in this case.
Doesn't this contradict the rule of thumb of preferring vector over array?
Please help!
First of all: many rules of thumb in programming are not about gaining some milliseconds of performance, but about managing complexity and therefore avoiding bugs. In this case, it's about performing range checks, which most vector implementations do in debug mode and which arrays don't. It's also about memory management for dynamic arrays: vector manages its memory itself, while you have to do it manually for arrays, at the risk of introducing memory leaks (ever forgotten a delete[] or used delete instead? I bet you have!). And it's about ease of use, e.g. resizing the vector or inserting an element in the middle, which is tedious work with manually managed arrays.
In other words, performance measurements can never ever contradict a rule of thumb, because a rule of thumb never targets performance. Performance measurements can only be one of the few possible reasons to not obey a coding guideline.
At first sight I'd guess you have not enabled optimizations. The main source of performance loss for the vector would then be the index checks that many vector implementations enable for debug builds. Those won't kick in in optimized builds, so that should be your first concern. Rule of thumb: performance measurements without optimizations enabled are meaningless.
If enabling optimizations still does show a better performance for the array, there's another difference:
The array is stored on the stack, so the compiler can use its address directly and calculate address offsets at compile time, while the vector's elements are stored on the heap and the compiler has to dereference the pointer stored in the vector. I'd expect the optimizer to dereference the pointer once and calculate the address offsets from that point on. Still, there might be a small performance penalty compared to compile-time-calculated address offsets, especially if the optimizer can unroll the loop a bit. This still does not contradict the rule of thumb, because you are comparing apples to oranges here. The rule of thumb says:
Prefer std::vector over dynamic arrays, and prefer std::array over fixed arrays.
So either use a dynamically allocated array (including some kind of delete[], please) or compare the fixed-size array to a std::array. There were also candidates proposed for C++14, namely std::dynarray and runtime-sized arrays comparable to C's VLAs, though neither ultimately made it into the standard.
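For completeness, a minimal sketch of the std::array side of that comparison, assuming the question's SIZE of 100:

#include <array>

int main() {
    const int SIZE = 100;
    std::array<int, SIZE> arr{}; // fixed-size, stack-based wrapper around int[SIZE];
                                 // {} value-initializes all elements to zero
    arr[0] = 1;
    return arr[0];
}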
Update:
As was pointed out in the comments, optimizers are good at identifying code that has no side effects, like operations on an array that you never read from. std::vector implementations are complicated enough that optimizers typically won't see through the several layers of indirection and optimize away all the inserts, so you'll get zero time for the array compared to some time for the vector. Reading the array contents after the loop will prevent such optimizations.
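A sketch of that fix, reusing the question's SIZE and arr names (assumed here); the point is only that the results get consumed after the timed loop:

#include <iostream>

int main() {
    const int SIZE = 100;
    int arr[SIZE] = {0};
    // ... the timed insertion loop from the question would run here ...
    long long checksum = 0;        // read the results afterwards so the
    for (int j = 0; j < SIZE; ++j) // optimizer cannot discard the work
        checksum += arr[j];
    std::cout << "checksum: " << checksum << '\n';
    return 0;
}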
The vector class has to grow its memory dynamically, which may involve copying the whole thing from time to time.
It also has to call internal functions for many operations, such as reallocating.
It may also have safety functionality like bounds checks.
Meanwhile your array is preallocated, and your operations probably do not call any internal functions.
That is the overhead price for more functionality.
And who said that vectors should be faster than arrays in all cases?
Your array does not need to grow, and that's a special case where arrays are indeed faster!
Because arrays are built-in types, the compiler can manipulate them directly in memory; they are managed internally by the compiled executable.
A vector, on the other hand, is more like a class (a template), and it brings in extra management code through additional headers and libraries.
Essentially, a built-in type can be manipulated without including any headers, which makes it easier for the program to work with directly, without going through external code. The overhead in the vector's time comes from the program having to go through that extra code and use the methods of the vector type.
Every time you add more code to your app and operate through it, your app's performance will drop.

Does defining a large array in a for-loop compromise performance?

Code looks like this:
for (int i = 0; i <= LARGE_NUMBER; ++i) {
    int x[LARGE_NUMBER] = {0};
    // do something with x
}
I think array x will be created on each pass of the for-loop through 0~LARGE_NUMBER, so will this compromise performance? Will -O2 help?
Your array will be zeroed in each iteration, so definitely it will.
This code is linear time, every element of the array will be zero-initialized:
int x[LARGE_NUMBER] = {0};
And this is constant time, just increment of the stack pointer:
int x[LARGE_NUMBER];
Performance will depend on whether LARGE_NUMBER is really large or not. If LARGE_NUMBER is on the order of one or two cache lines, then you won't notice a difference between the first and second versions. But if LARGE_NUMBER is really large, you will. However, if your array is so large that the performance difference is noticeable, then you should definitely move it to the heap: stack space is precious, and allocating megabytes of data there is wrong.
If your array is really large, you can allocate it on the heap once and call memset between iterations.
How large is LARGE_NUMBER expected to be?
Consider that an on-stack allocation of a wide object can end up larger than the stack space the system gives to a thread, and you may face an "out of memory" problem even before performance becomes a concern. (The stack needs to be fast, and hence is no more than a few megabytes; ideally it should fit in the processor cache.)
If that's the case, a std::vector (which lives on the stack but manages its allocation on the heap) plays better.
But defining it inside the loop makes it be created and destroyed on every iteration. Now: do those creations and destructions make sense (I mean: do they take some action that makes sense to repeat every time), or is your problem just to initialize to zero on every iteration? If that's the case, I would probably do something like:
{ // just begin a scope block
    std::vector<int> v(LARGE_NUMBER); // allocates LARGE_NUMBER ints on the heap
    for (int i = 0; i <= LARGE_NUMBER; ++i)
    {
        std::fill(v.begin(), v.end(), 0); // reset at every iteration
        // other stuff with v
    }
} // here the vector and associated memory is finally released
Note that the performance of std::fill is linear, just like the initialization of an array, but this way you avoid allocating and deallocating on every cycle.
In any case, your problem has O(N²) complexity, by definition.
It depends on your application... I assume that you have a fixed array size? Then it will use memset: http://www.cplusplus.com/reference/cstring/memset/
#include <string.h> // memset

int* x = new int[LARGE_NUMBER];
for (int i = 0; i <= LARGE_NUMBER; ++i) {
    memset(x, 0, LARGE_NUMBER * sizeof(int)); // memset counts bytes, not elements
    // do something with x
}
// some more stuff that needs x
delete[] x;
BTW: I have no C/C++ at hand to test the code.

The fastest way to populate std::vector of unknown size

I have a long array of data (n entities). Every object in this array has some values (let's say, m values per object). And I have a loop like:
myType* A;
// reading the array of objects
std::vector<anotherType> targetArray;
int i, j, k = 0;
for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
    {
        if (check(A[i].fields[j]))
        {
            // creating and adding the object to targetArray
            targetArray[k] = someGenerator(A[i].fields[j]);
            k++;
        }
    }
In some cases I have n * m valid objects, in others (n * m) / 10 or fewer.
The question is: how do I allocate memory for targetArray? I see three options:
1. Reserve the maximum up front, then trim:
targetArray.reserve(n * m);
// do work
targetArray.shrink_to_fit();
2. Count the elements without generating objects, then allocate exactly as much memory as I need and run the loop a second time.
3. Resize the array on every iteration where new objects are created.
I see a huge tactical mistake in each of my methods. Is there another way to do it?
What you are doing here is called premature optimization. By default, std::vector will exponentially increase its memory footprint as it runs out of memory to store new objects. For example, a first push_back will allocate 2 elements. The third push_back will double the size etc. Just stick with push_back and get your code working.
You should start thinking about memory allocation optimization only when the above approach proves to be a bottleneck in your design. If that ever happens, I think the best bet would be to come up with a good approximation of the number of valid objects and just call reserve() on the vector, something like your first approach. Just make sure your shrink-to-fit implementation is correct, because vectors don't like to shrink; you have to use swap, as sketched below.
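For reference, a sketch of that swap idiom (this is the pre-C++11 technique; C++11's shrink_to_fit is a non-binding request):

#include <vector>

// Shrink capacity to match size by swapping with a right-sized temporary.
template <typename T>
void shrink_to_fit_swap(std::vector<T>& v) {
    std::vector<T>(v).swap(v); // the temporary copies only v.size() elements
}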
Resizing the array on every step is no good, and std::vector won't really do that unless you try hard.
Doing an extra pass through the list of objects can help, but it may also hurt, as you could easily waste CPU cycles, thrash the CPU cache, etc. If in doubt, profile it.
The typical way would be to use targetArray.push_back(). This reallocates the memory when needed and avoids two passes through your data. It has a system for reallocating the memory that makes it pretty efficient, doing fewer reallocations as the vector gets larger.
However, if your check() function is very fast, you might get better performance by going through the data twice, determining how much memory you need and making your vector the right size to begin with. I would only do this if profiling has determined it is really necessary though.
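A sketch of that two-pass approach, reusing the (hypothetical) names from the question's code:

// First pass: count the matches without generating objects.
int count = 0;
for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
        if (check(A[i].fields[j]))
            count++;

// Second pass: allocate exactly once, then fill.
targetArray.reserve(count);
for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
        if (check(A[i].fields[j]))
            targetArray.push_back(someGenerator(A[i].fields[j]));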

Benefits of using reserve() in a vector - C++

What is the benefit of using reserve when dealing with vectors? When should I use it? I couldn't find a clear-cut answer on this, but I assume it is faster when you reserve in advance before using the vector.
What say you, people smarter than I?
It's useful if you have an idea how many elements the vector will ultimately hold - it can help the vector avoid repeatedly allocating memory (and having to move the data to the new memory).
In general it's probably a potential optimization that you shouldn't need to worry about, but it's not harmful either (at worst you end up wasting memory if you over estimate).
One area where it can be more than an optimization is when you want to ensure that existing iterators do not get invalidated by adding new elements.
For example, a push_back() call may invalidate existing iterators to the vector (if a reallocation occurs). However if you've reserved enough elements you can ensure that the reallocation will not occur. This is a technique that doesn't need to be used very often though.
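A minimal sketch of that guarantee:

#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(100);                      // capacity locked in for 100 elements
    v.push_back(0);
    std::vector<int>::iterator it = v.begin();
    for (int i = 1; i < 100; ++i)
        v.push_back(i);                  // stays within capacity: no reallocation,
                                         // so 'it' is never invalidated
    return *it;
}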
It can be, especially if you are going to be adding a lot of elements to your vector over time and want to avoid the automatic memory expansion the container makes when it runs out of available slots.
For instance, back-insertions (i.e., std::vector::push_back) are considered an amortized O(1), or constant-time, process, but that is because if an insertion is made at the back of a vector and the vector is out of space, it must reallocate memory for a new array of elements, copy the old elements into the new array, and only then copy in the element you were trying to insert. That process is O(N), or linear-time complexity, and for a large vector it can take quite a bit of time.
Using the reserve() method lets you pre-allocate memory for the vector if you know it is going to be at least some certain size, and avoid reallocating every time space runs out. This matters especially if you are doing back-insertions in performance-critical code where you want the insertion to remain an actual O(1) operation that doesn't incur a hidden reallocation of the array. Granted, your copy constructor would also have to be O(1) to get true O(1) complexity for the entire back-insertion, but as far as the container's own back-insertion algorithm goes, you can keep it at a known complexity if the memory for the slot is already pre-allocated.
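To see the reallocation pattern in question, one can watch capacity() change during a series of push_backs (the growth factor is implementation-defined, commonly 1.5x or 2x):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::vector<int>::size_type last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {      // a reallocation just happened
            std::cout << "size " << v.size()
                      << " -> capacity " << v.capacity() << '\n';
            last = v.capacity();
        }
    }
    return 0;
}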
This excellent article deeply explains differences between deque and vector containers. Section "Experiment 2" shows the benefits of vector::reserve().
If you know the eventual size of the vector then reserve is worth using.
Otherwise, whenever the vector runs out of internal room it will resize the buffer. This usually involves doubling the size of the internal buffer (or growing it by 1.5x), which can be expensive if it happens a lot.
The really expensive bit is invoking the copy constructor on each element to copy it from the old buffer to the new buffer, followed by calling the destructor on each element in the old buffer.
If the copy constructor is expensive then it can be a problem.
Faster, and saves memory.
If you push_back another element into a full vector, it will typically allocate double the memory it is currently using, since allocate + copy is expensive.
Don't know about people smarter than you, but I would say that you should call reserve in advance if you are going to perform lots of insertion operations and you already know or can estimate the total number of elements, at least the order of magnitude. It can save you a lot of reallocations in good circumstances.
Although it's an old question, here is my benchmark of the difference.
#include <iostream>
#include <chrono>
#include <vector>
using namespace std;

int main() {
    vector<int> v1;
    chrono::steady_clock::time_point t1 = chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) {
        v1.push_back(1);
    }
    chrono::steady_clock::time_point t2 = chrono::steady_clock::now();
    chrono::duration<double> time_first = chrono::duration_cast<chrono::duration<double>>(t2 - t1);
    cout << "Time for 1000000 insertions without reserve: " << time_first.count() * 1000 << " milliseconds." << endl;

    vector<int> v2;
    v2.reserve(1000000);
    chrono::steady_clock::time_point t3 = chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i) {
        v2.push_back(1);
    }
    chrono::steady_clock::time_point t4 = chrono::steady_clock::now();
    chrono::duration<double> time_second = chrono::duration_cast<chrono::duration<double>>(t4 - t3);
    cout << "Time for 1000000 insertions with reserve: " << time_second.count() * 1000 << " milliseconds." << endl;
    return 0;
}
When you compile and run this program, it outputs:
Time for 1000000 insertions without reserve: 24.5573 milliseconds.
Time for 1000000 insertions with reserve: 17.1771 milliseconds.
There seems to be some improvement with reserve, but not all that much. I think the improvement would be greater for complex objects, but I am not sure. Any suggestions, changes, and comments are welcome.
It's always best to know the total space needed before requesting any space from the system, so you request it only once. Otherwise the system may have to move your data to a larger free zone (this is optimized, but not always a free operation, because it requires copying all the data). Even the compiler will try to help you, but the best approach is to tell it what you know (to reserve the total space required by your process). That's what I think. Greetings.
There is one more advantage of reserve that is not much related to performance but instead to code style and code cleanliness.
Imagine I want to create a vector by iterating over another vector of objects. Something like the following:
std::vector<int> result;
for (const auto& object : objects) {
    result.push_back(object.foo());
}
Now, apparently the size of result is going to be the same as objects.size(), so I decide to pre-define the size of result.
The simplest way to do it is in the constructor.
std::vector<int> result(objects.size());
But now the rest of my code is invalidated, because the size of result is no longer 0; it is objects.size(). The subsequent push_back calls would grow the vector further. So, to correct this mistake, I have to change how I construct my for-loop: I have to use indices and overwrite the corresponding memory locations.
std::vector<int> result(objects.size());
for (int i = 0; i < objects.size(); ++i) {
    result[i] = objects[i].foo();
}
And I don't like it. Indices are everywhere in the code. This is also more vulnerable to making accidental copies because of the [] operator. This example uses integers and directly assigns values to result[i], but in a more complex for-loop with complex data structures, it could be relevant.
Coming back to the main topic, it is very easy to adjust the first code by using reserve. reserve does not change the size of the vector, only its capacity. Hence, I can leave my nice for-loop as it is:
std::vector<int> result;
result.reserve(objects.size());
for (const auto& object : objects) {
    result.push_back(object.foo());
}