How best to fill a vector of vectors (avoiding wasting memory and unnecessary allocations & de-allocations)? - c++

I want to fill a vector of vectors when individual vectors can have different size(), e.g.
std::vector<std::vector<big_data_type> > table;
std::vector<big_data_type> tmp;
for(auto i=0; i!=4242; ++i) {
tmp = make_vector(i); // copy elision; calls new[] only for i=0
table.push_back(tmp); // copy calls new[] each time
}
My main issue is to avoid wasting memory on unused capacity. So my first question is:
Q1 Will the copy (made inside push_back) have capacity() == size() (what I want), or preserve whatever tmp had, or is this implementation dependent / undefined?
I was considering to move the individual vectors into the table
table.push_back(std::move(tmp)); // move
but that would surely preserve the capacity and hence waste memory. Moreover, this doesn't avoid the allocation of each individual vector, it only moves it into another place (inside make_vector instead of push_back).
Q2 I was wondering what difference it makes to omit the variable tmp, resulting in the more elegant-looking code (2 instead of 5 lines):
for(auto i=0; i!=4242; ++i)
table.push_back(make_vector(i)); // move!
My initial thought is that this will construct and destruct another temporary at each iteration and hence generate many calls to new[] and delete[] (which will essentially re-use the same memory). However, in addition this will call the moving version of push_back and hence waste memory (see above). Correct?
Q3 Is it possible that the compiler "optimizes" my former code into this latter form and thus uses moving instead of copying (resulting in wasting memory)?
Q4 If I'm correct, it seems to me that all this implies that moving data automatically for temporary objects is a mixed blessing (as it prevents compacting). Is there any way to explicitly suppress moving in the last code snippet, i.e. something like
for(auto i=0; i!=4242; ++i)
table.push_back(std::copy(make_vector(i))); // don't move!

Q1 Will the copy (made inside push_back) have capacity() == size() (what I want), or preserve whatever tmp had, or is this implementation dependent / undefined?
The standard never sets maximums for capacity, only minimums. That said, most implementations will have capacity() == size() for a fresh vector copy or capacity slightly rounded up to the blocksize of the allocator implementation.
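For illustration, a quick check of that behaviour (a minimal sketch; the exact values are implementation-dependent):
#include <iostream>
#include <vector>
int main() {
    std::vector<int> tmp;
    tmp.reserve(1000);            // tmp now has excess capacity
    tmp.resize(10);
    std::vector<int> fresh = tmp; // the copy does not inherit tmp's capacity
    std::cout << fresh.size() << ' ' << fresh.capacity() << '\n';
    // typically prints "10 10"; the standard only guarantees
    // fresh.capacity() >= fresh.size()
}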
Q2 I was wondering what difference it makes to omit the variable tmp, resulting in the more elegant looking code.
The result is to move into table instead of copying.
Q3 Is it possible that the compiler "optimizes" my former code into this latter form and thus uses moving instead of copying (resulting in wasting memory)?
It's possible but very unlikely. The compiler would have to prove that moving isn't observably different from copying, which is challenging enough that to my knowledge no current compiler tries.
Q4 If I'm correct, it seems to me that all this implies that moving data automatically for temporary objects is a mixed blessing (as it prevents compacting).
Moving is a speed optimization, not necessarily a space optimization. Copying may reduce the space, but it definitely will increase the processing time.
If you want to optimize space, your best bet is to use shrink_to_fit:
std::vector<std::vector<big_data_type> > table;
for(auto i=0; i!=4242; ++i) {
std::vector<big_data_type> tmp = make_vector(i); // copy elision
tmp.shrink_to_fit(); // shrink
table.push_back(std::move(tmp)); // move
}
EDIT: In-depth analysis.
Assumptions:
table will have its space reserved in advance since its size is known, we
thus focus on allocations and deallocations of the vector<big_data_type>s
that are returned from make_vector, stored temporarily in tmp,
and finally in table.
The return value of make_vector(i) may or may not have capacity == size.
This analysis treats make_vector as opaque and ignores any allocations
necessary to build the returned vector.
A default-constructed vector has 0 capacity.
reserve(n) sets capacity to exactly n if and only if n > capacity().
shrink_to_fit() sets capacity == size. It may or may not be implemented
to require a copy of the entire vector contents.
The vector copy constructor sets capacity == size.
std::vector may or may not provide the strong exception guarantee for
copy assignment.
I'll parameterize the analysis on two positive integers: N, the number of
vectors that will be in table at the end of the algorithm (4242 in the OP),
and K: the total number of big_data_type objects contained in all vectors
produced by make_vector during the course of the algorithm.
Your Technique
std::vector<std::vector<big_data_type> > table;
table.reserve(N);
std::vector<big_data_type> tmp;
for(auto i=0; i!=N; ++i) {
tmp = make_vector(i); // #1
table.push_back(tmp); // #2
}
// #3
For C++11
At #1, since tmp is already constructed, RVO/copy elision cannot occur. On
every pass through the loop the return value is assigned to tmp. The
assignment is a move: old data in tmp will be destroyed (except on the
first iteration when tmp is empty) and the contents of the return value from
make_vector moved into tmp with no copying taking place. tmp has capacity == size
if and only if make_vector's return value has that property.
At #2, tmp is copied into table. The newly constructed copy in table has
capacity == size as desired. At #3 tmp presumably leaves scope and its
storage is deallocated.
Total allocations/deallocations: N each. All N allocations at #2; N - 1 deallocations at #1, and one at #3.
Total copies (of big_data_type objects): K.
For Pre-C++11
At #1, since tmp is already constructed, RVO/copy elision cannot occur. On
every pass through the loop the return value is assigned to tmp. This
assignment requires an allocation and a deallocation if either (a) the
implementation provides the strong guarantee, or (b) tmp is too small to
contain all the elements in the returned vector. In any case the elements must
be copied individually. At the end of the full expression, the temporary object
that holds the return value from make_vector is destroyed, resulting in a
deallocation.
At #2, tmp is copied into table. The newly constructed copy in table has
capacity == size as desired. At #3 tmp presumably leaves scope and its
storage is deallocated.
Total allocations/deallocations: N + 1 to 2 * N. 1 to N allocations at #1, N at #2;
N to 2 * N - 1 deallocations at #1, and one at #3.
Total copies: 2 * K. K at #1 and K at #2.
My Technique (C++11-only)
std::vector<std::vector<big_data_type> > table;
table.reserve(N);
for(auto i=0; i!=N; ++i) {
auto tmp = make_vector(i); // #1
tmp.shrink_to_fit(); // #2
table.emplace_back(std::move(tmp)); // #3
}
At #1 tmp is freshly constructed from the return value of make_vector, so
RVO/copy elision is possible. Even if the implementation of make_vector
impedes RVO, tmp will be move-constructed resulting in no allocations,
deallocations, or copies.
At #2 shrink_to_fit may or may not require a single allocation and
deallocation, depending on whether the return value from make_vector already
has the capacity == size property. If allocation/deallocation occurs, the
elements may or may not be copied depending on quality of implementation.
At #3 the contents of tmp are moved into a freshly constructed vector in
table. No allocations/deallocations/copies are performed.
Total allocations/deallocations: 0 or N, all at #2 if and only if make_vector does not return vectors with capacity == size.
Total copies: 0 or K, all at #2 if and only if shrink_to_fit is implemented as a copy.
If the implementor of make_vector produces vectors with the capacity == size
property and the standard library implements shrink_to_fit optimally, there
are no calls to new/delete and no copies.
Conclusions
Worst case performance of My Technique is the same as expected case performance
of Your Technique. My technique is conditionally optimal.

Here are some run time tests with a helper type that counts creation, moving and copying:
#include <vector>
#include <iostream>
#include <iterator> // for std::make_move_iterator (used in case e)
struct big_data_type {
double state;
big_data_type( double d ):state(d) { ++counter; ++create_counter; }
big_data_type():state(0.) { ++counter; }
big_data_type( big_data_type const& o ): state(o.state) { ++counter; }
big_data_type( big_data_type && o ): state(o.state) { ++move_counter; }
big_data_type& operator=( big_data_type const& o ) {
state = o.state;
++counter;
return *this;
}
big_data_type& operator=( big_data_type && o ) {
state = o.state;
++move_counter;
return *this;
}
static int counter;
static int create_counter;
static int move_counter;
};
int big_data_type::move_counter = 0;
int big_data_type::create_counter = 0;
int big_data_type::counter = 0;
std::vector<big_data_type>& make_vector( int i, std::vector<big_data_type>& tmp ) {
tmp.resize(0);
tmp.reserve(1000);
for( int j = 0; j < 10+i/100; ++j ) {
tmp.emplace_back( 100. - j/10. );
}
return tmp;
}
std::vector<big_data_type> make_vector2( int i ) {
std::vector<big_data_type> tmp;
tmp.resize(0);
tmp.reserve(1000);
for( int j = 0; j < 10+i/100; ++j ) {
tmp.emplace_back( 100. - j/10. );
}
return tmp;
}
enum option { a, b, c, d, e };
void test(option op) {
std::vector<std::vector<big_data_type> > table;
std::vector<big_data_type> tmp;
for(int i=0; i!=10; ++i) {
switch(op) {
case a:
table.emplace_back(make_vector(i, tmp));
break;
case b:
tmp = make_vector2(i);
table.emplace_back(tmp);
break;
case c:
tmp = make_vector2(i);
table.emplace_back(std::move(tmp));
break;
case d:
table.emplace_back(make_vector2(i));
break;
case e:
std::vector<big_data_type> result;
make_vector(i, tmp);
result.reserve( tmp.size() );
result.insert( result.end(), std::make_move_iterator( tmp.begin() ),std::make_move_iterator( tmp.end() ) );
table.emplace_back(std::move(result));
break;
}
}
std::cout << "Big data copied or created:" << big_data_type::counter << "\n";
big_data_type::counter = 0;
std::cout << "Big data created:" << big_data_type::create_counter << "\n";
big_data_type::create_counter = 0;
std::cout << "Big data moved:" << big_data_type::move_counter << "\n";
big_data_type::move_counter = 0;
std::size_t cap = 0;
for (auto&& v:table)
cap += v.capacity();
std::cout << "Total capacity at end:" << cap << "\n";
}
int main() {
std::cout << "A\n";
test(a);
std::cout << "B\n";
test(b);
std::cout << "C\n";
test(c);
std::cout << "D\n";
test(d);
std::cout << "E\n";
test(e);
}
Live example
Output:
+ g++ -O4 -Wall -pedantic -pthread -std=c++11 main.cpp
+ ./a.out
A
Big data copied or created:200
Big data created:100
Big data moved:0
Total capacity at end:100
B
Big data copied or created:200
Big data created:100
Big data moved:0
Total capacity at end:100
C
Big data copied or created:100
Big data created:100
Big data moved:0
Total capacity at end:10000
D
Big data copied or created:100
Big data created:100
Big data moved:0
Total capacity at end:10000
E
Big data copied or created:100
Big data created:100
Big data moved:100
Total capacity at end:100
E is an example for when your big data itself can be moved, which often isn't possible.
created refers only to explicitly created data (i.e., from the double) -- data "created on purpose". Copied or created refers to any time that any big data is duplicated in a way that the source big data cannot be "discarded". And moved refers to any situation where big data is moved in a way that the source big data can be "discarded".
Cases a and b, which are identical in result, are probably what you are looking for. Note the explicit use of the tmp vector as an argument to make_vector: elision won't let you reuse the buffer, you have to be explicit about it.

In addition to Casey's post, I have the following remarks.
As jrok said in a comment here, shrink_to_fit is not guaranteed to do anything. However, if shrink_to_fit allocates memory for exactly size() elements, copies/moves the elements, and deallocates the original buffer, then this is exactly what the OP asked for.
My exact answer to Q4, that is,
Is there any way to explicitly suppress moving in the last code snippet [...]?
is: Yes, you can do
for(auto i=0; i!=4242; ++i)
table.push_back(static_cast<const std::vector<big_data_type>&>(make_vector(i)));
The copy function suggested by the OP could be written as follows:
template <typename T>
const T& copy(const T& x) {
return x;
}
and the code becomes
for(auto i=0; i!=4242; ++i)
table.push_back(copy(make_vector(i)));
But, honestly, I don't think this is a sensible thing to do.
The best place to make each element v of table such that v.size() == v.capacity() is in make_vector(), if possible. (As Casey said, the standard doesn't set any upper bound on capacity.) Then moving the result of make_vector() to table would be optimal in both senses (memory and speed). The OP's snippet should probably take care of table.size() instead.
In summary, the standard doesn't provide any way to force capacity to match size. There was a (sensible, IMHO) suggestion by Jon Kalb to make std::vector::shrink_to_fit at least as efficient (with respect to memory usage) as the shrink_to_fit idiom (which also doesn't guarantee anything). However, some members of the committee were not very keen on it and suggested that people should rather complain to their vendors or implement their own containers and allocation functions.
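For reference, the shrink_to_fit idiom mentioned above is the pre-C++11 swap trick; a minimal sketch (like shrink_to_fit itself, it gives no hard guarantee on the resulting capacity):
template <typename T>
void shrink_to_fit_idiom(std::vector<T>& v) {
    // Build a temporary from v's elements (a fresh copy is typically tight)
    // and swap buffers; the old, oversized buffer dies with the temporary.
    std::vector<T>(v.begin(), v.end()).swap(v);
}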

The vector of vectors construct brings a lot of unnecessary overhead in cases where data is only ever added at the end of the top level vector (as appears to be the case here).
The main issue is the separate buffer allocations and management for each individual entry in the top level vector.
It's much better to concatenate all the sub-entries together into a single contiguous buffer, if possible, with a separate buffer to index into this for each top level entry.
See this article (on my blog) for more discussion of this, and for an example implementation of a 'collapsed vector vector' class that wraps this kind of indexed buffer setup in a generic container object.
As I said before, this only applies if data only ever gets added at the end of your data structure, i.e. you don't come back later and push entries into arbitrary top-level sub-vectors, but in cases where this technique applies it can be quite a significant optimisation.
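To make the indexed-buffer layout concrete, here is a deliberately minimal sketch (hypothetical names, not the class from the article): one flat buffer holds every element, a second holds the start offset of each row, and rows can only be appended at the end.
#include <cstddef>
#include <vector>
template <typename T>
class collapsed_vector_vector {
    std::vector<T> data_;                 // all sub-entries, concatenated
    std::vector<std::size_t> offsets_{0}; // offsets_[i] = start of row i in data_
public:
    void push_back_row(const std::vector<T>& row) {
        data_.insert(data_.end(), row.begin(), row.end());
        offsets_.push_back(data_.size());
    }
    std::size_t rows() const { return offsets_.size() - 1; }
    const T* row_begin(std::size_t i) const { return data_.data() + offsets_[i]; }
    const T* row_end(std::size_t i) const { return data_.data() + offsets_[i + 1]; }
};
This needs two growable buffers in total, rather than one heap allocation per row, and it keeps the payload contiguous, which is also friendlier to the cache.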

In general, if you want capacity to equal size you can use vector::shrink_to_fit()
http://www.cplusplus.com/reference/vector/vector/shrink_to_fit/
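For example (the request is non-binding, so an implementation may ignore it):
std::vector<int> v;
v.reserve(1000);   // capacity() >= 1000
v.push_back(42);
v.shrink_to_fit(); // asks for capacity() == size(), i.e. 1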

Okay, I think I learned a bit, but couldn't really find a complete answer. So let's first clarify the task:
We have a function filling in a vector. In order to avoid arguments about whether copy elision is possible or not, let's just assume that its definition is
void fill_vector(std::vector<big_data_type>& v, int i)
{
v.clear();
v.reserve(large_number); // allocates unless v.capacity() >= large_number
for(int done=0,k=0; k<large_number && !done; ++k)
v.push_back(get_more_big_data(i,done));
// v.capacity() == v.size() is highly unlikely at this point.
}
Further we want to fill the table
std::vector<std::vector<big_data_type>> table;
with N entries, each generated by fill_vector() in such a way that (1) minimises memory usage in the table but (2) avoids unnecessary allocations/de-allocations. In plain C, there would be N+2 allocations and 1 de-allocation, and only the total number K of big_data_type objects actually provided by fill_vector() would be allocated. We should not need more with C++. Here is a possible C++ answer
table.reserve(N); // allocates enough space for N vectors
size_t K=0; // count big_data_types in table
std::vector<big_data_type> tmp;
for(int n=0; n!=N; ++n) {
fill_vector(tmp,n); // allocates at first iteration only
K += tmp.size();
table.emplace_back(tmp.begin(),tmp.end()); // allocates tmp.size() big_data_type
}
// de-allocates tmp
Thus we have N+2 allocations and 1 de-allocation as required and no memory wasted (not more than K big_data_type allocated in table). The emplace_back calls the iterator-pair constructor of std::vector (which gets no information about the capacity of tmp) and implies a copy of each big_data_type. (If big_data_type can be moved, we could use make_move_iterator(tmp.begin()) etc.)
Note that no matter how we code this, we must do at least N+1 allocations (one for table and one for each of its elements). This implies that shrink_to_fit cannot help: at best it does one allocation and one de-allocation (unless capacity==size, which we don't expect to happen with any probability), which cancel each other out (so the allocation cannot count towards the required N+1). This is why some other answers were unacceptable.

Related

Specifying the size of a vector in declaration vs using reserve [duplicate]

I know the size of a vector, which is the best way to initialize it?
Option 1:
vector<int> vec(3); //in .h
vec.at(0)=var1; //in .cpp
vec.at(1)=var2; //in .cpp
vec.at(2)=var3; //in .cpp
Option 2:
vector<int> vec; //in .h
vec.reserve(3); //in .cpp
vec.push_back(var1); //in .cpp
vec.push_back(var2); //in .cpp
vec.push_back(var3); //in .cpp
I guess, Option2 is better than Option1. Is it? Any other options?
Somehow, a non-answer answer that is completely wrong has remained accepted and most upvoted for ~7 years. This is not an apples and oranges question. This is not a question to be answered with vague cliches.
For a simple rule to follow:
Option #1 is faster...
...but this probably shouldn't be your biggest concern.
Firstly, the difference is pretty minor. Secondly, as we crank up the compiler optimization, the difference becomes even smaller. For example, on my gcc-5.4.0, the difference is arguably trivial when running level 3 compiler optimization (-O3).
So in general, I would recommend using option #1 whenever you encounter this situation. However, if you can't remember which one is optimal, it's probably not worth the effort to find out. Just pick either one and move on, because this is unlikely to ever cause a noticeable slowdown in your program as a whole.
These tests were run by sampling random vector sizes from a normal distribution, and then timing the initialization of vectors of these sizes using the two methods. We keep a dummy sum variable to ensure the vector initialization is not optimized out, and we randomize vector sizes and values to make an effort to avoid any errors due to branch prediction, caching, and other such tricks.
main.cpp:
/*
* Test constructing and filling a vector in two ways: construction with size
* then assignment versus construction of empty vector followed by push_back
* We collect dummy sums to prevent the compiler from optimizing out computation
*/
#include <iostream>
#include <vector>
#include "rng.hpp"
#include "timer.hpp"
const size_t kMinSize = 1000;
const size_t kMaxSize = 100000;
const double kSizeIncrementFactor = 1.2;
const int kNumVecs = 10000;
int main() {
for (size_t mean_size = kMinSize; mean_size <= kMaxSize;
mean_size = static_cast<size_t>(mean_size * kSizeIncrementFactor)) {
// Generate sizes from normal distribution
std::vector<size_t> sizes_vec;
NormalIntRng<size_t> sizes_rng(mean_size, mean_size / 10.0);
for (int i = 0; i < kNumVecs; ++i) {
sizes_vec.push_back(sizes_rng.GenerateValue());
}
Timer timer;
UniformIntRng<int> values_rng(0, 5);
// Method 1: construct with size, then assign
timer.Reset();
int method_1_sum = 0;
for (size_t num_els : sizes_vec) {
std::vector<int> vec(num_els);
for (size_t i = 0; i < num_els; ++i) {
vec[i] = values_rng.GenerateValue();
}
// Compute sum - this part identical for two methods
for (size_t i = 0; i < num_els; ++i) {
method_1_sum += vec[i];
}
}
double method_1_seconds = timer.GetSeconds();
// Method 2: reserve then push_back
timer.Reset();
int method_2_sum = 0;
for (size_t num_els : sizes_vec) {
std::vector<int> vec;
vec.reserve(num_els);
for (size_t i = 0; i < num_els; ++i) {
vec.push_back(values_rng.GenerateValue());
}
// Compute sum - this part identical for two methods
for (size_t i = 0; i < num_els; ++i) {
method_2_sum += vec[i];
}
}
double method_2_seconds = timer.GetSeconds();
// Report results as mean_size, method_1_seconds, method_2_seconds
std::cout << mean_size << ", " << method_1_seconds << ", " << method_2_seconds;
// Do something with the dummy sums that cannot be optimized out
std::cout << ((method_1_sum > method_2_sum) ? "" : " ") << std::endl;
}
return 0;
}
The header files I used are located here:
rng.hpp
timer.hpp
Both variants have different semantics, i.e. you are comparing apples and oranges.
The first gives you a vector of n value-initialized elements (zeros for int); the second variant only reserves memory and creates no elements at all.
Choose what better fits your needs, i.e. what is "better" in a certain situation.
The "best" way would be:
vector<int> vec = {var1, var2, var3};
available with a C++11 capable compiler.
I'm not sure exactly what you mean by doing things in header or implementation files. A mutable global is a no-no for me. If it is a class member, then it can be initialized in the constructor initialization list.
Otherwise, option 1 would be generally used if you know how many items you are going to use and the default values (0 for int) would be useful.
Using at here means that you can't guarantee the index is valid. A situation like that is alarming in itself. Even though you will be able to reliably detect problems, it's definitely simpler to use push_back and stop worrying about getting the indexes right.
In case of option 2, generally it makes zero performance difference whether you reserve memory or not, so it's simpler not to reserve*. Unless perhaps if the vector contains types that are very expensive to copy (and don't provide fast moving in C++11), or the size of the vector is going to be enormous.
* From Stroustrup's C++ Style and Technique FAQ:
People sometimes worry about the cost of std::vector growing
incrementally. I used to worry about that and used reserve() to
optimize the growth. After measuring my code and repeatedly having
trouble finding the performance benefits of reserve() in real
programs, I stopped using it except where it is needed to avoid
iterator invalidation (a rare case in my code). Again: measure before
you optimize.
While your examples are essentially the same, it may be that when the type used is not an int, the choice is taken from you. If your type doesn't have a default constructor, or if you'll have to re-construct each element later anyway, I would use reserve. Just don't fall into the trap I did of using reserve and then operator[] for initialisation!
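To spell that trap out (a small sketch): reserve changes only the capacity, so indexing afterwards touches elements that do not exist:
std::vector<int> v;
v.reserve(3);   // capacity() >= 3, but size() is still 0
// v[0] = 1;    // WRONG: undefined behaviour, there is no element at index 0
v.push_back(1); // correct: actually creates the element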
Constructor
std::vector<MyType> myVec(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
In this first case, using the constructor, size and numberOfElementsToStart will be equal and capacity will be greater than or equal to them.
Think of myVec as a vector containing a number of items of MyType which can be accessed and modified; push_back(anotherInstanceOfMyType) will append it to the end of the vector.
Reserve
std::vector<MyType> myVec;
myVec.reserve(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
When using the reserve function, size will be 0 until you add an element to the array and capacity will be equal to or greater than numberOfElementsToStart.
Think of myVec as an empty vector which can have new items appended to it using push_back with no memory allocation for at least the first numberOfElementsToStart elements.
Note that push_back() still requires an internal check to ensure that size < capacity and to increment size, so you may want to weigh this against the cost of default construction.
List initialisation
std::vector<MyType> myVec{ var1, var2, var3 };
This is an additional option for initialising your vector, and while it is only feasible for very small vectors, it is a clear way to initialise a small vector with known values. size will be equal to the number of elements you initialised it with, and capacity will be equal to or greater than size. Modern compilers may optimise away the creation of temporary objects and prevent unnecessary copying.
Option 2 is better, as reserve only needs to reserve memory (3 * sizeof(T)), while the first option calls the default constructor of the element type for each cell inside the container.
For C-like types it will probably be the same.
How it Works
This is implementation specific; however, in general the vector data structure internally holds a pointer to the memory block where the elements actually reside. Both GCC and VC++ allocate for 0 elements by default. So you can think of the vector's internal memory pointer as being nullptr by default.
When you call vector<int> vec(N); as in your Option 1, the N objects are created using the default constructor. This is called the fill constructor.
When you do vec.reserve(N); after the default constructor, as in Option 2, you get a data block to hold N elements, but no objects are created, unlike in option 1.
Why to Select Option 1
If you know the number of elements the vector will hold and you might leave most of the elements at their default values, then you might want to use this option.
Why to Select Option 2
This option is generally the better of the two, as it only allocates the data block for future use and does not actually fill it with objects created by the default constructor.
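A small demo of the difference (a sketch; exact capacities may vary by implementation):
#include <iostream>
#include <vector>
int main() {
    std::vector<int> v1(5); // fill constructor: 5 value-initialized ints
    std::vector<int> v2;
    v2.reserve(5);          // raw storage only, no elements created
    std::cout << v1.size() << ' ' << v1.capacity() << '\n'; // 5 5 (typically)
    std::cout << v2.size() << ' ' << v2.capacity() << '\n'; // 0 5 (at least 5)
}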
Since it seems 5 years have passed and a wrong answer is still the accepted one, and the most-upvoted answer is completely useless (missed the forest for the trees), I will add a real response.
Method #1: we pass an initial size parameter into the vector, let's call it n. That means the vector is filled with n elements, which will be initialized to their default value. For example, if the vector holds ints, it will be filled with n zeros.
Method #2: we first create an empty vector. Then we reserve space for n elements. In this case, we never create the n elements and thus we never perform any initialization of the elements in the vector. Since we plan to overwrite the values of every element immediately, the lack of initialization will do us no harm. On the other hand, since we have done less overall, this would be the better* option.
* better - real definition: never worse. It's always possible a smart compiler will figure out what you're trying to do and optimize it for you.
Conclusion: use method #2.
In the long run, it depends on the usage and the number of elements.
Run the program below to understand how the vector grows its reserved space:
#include <iostream>
#include <vector>
int main() {
    std::vector<int> vec;
    for(int i=0; i<50; i++) {
        std::cout << "size=" << vec.size() << " capacity=" << vec.capacity() << std::endl;
        vec.push_back(i);
    }
}
size is the number of actual elements and capacity is the actual size of the array used to implement the vector.
On my computer, up to size 10 both are the same. But when size is 43, the capacity is 63. Depending on the number of elements, either option may be better; for example, repeatedly increasing the capacity may be expensive.
Another option is to Trust Your Compiler(tm) and do the push_backs without calling reserve first. It has to allocate some space when you start adding elements. Perhaps it does that just as well as you would?
It is "better" to have simpler code that does the same job.
I think the answer may depend on the situation. For instance:
Let's try to copy a simple vector to another vector. The vector holds an example class which contains only an integer. In the first example, let's use reserve.
#include <iostream>
#include <iterator> // for std::back_inserter
#include <vector>
#include <algorithm>
class example
{
public:
// Copy constructor
example(const example& p1)
{
std::cout<<"copy"<<std::endl;
this->a = p1.a;
}
example(example&& o) noexcept
{
std::cout<<"move"<<std::endl;
std::swap(o.a, this->a);
}
example(int a_)
{
std::cout<<"const"<<std::endl;
a = a_;
}
example()
{
std::cout<<"Def const"<<std::endl;
}
int a;
};
int main()
{
auto vec = std::vector<example>{1,2,3};
auto vec2 = std::vector<example>{};
vec2.reserve(vec.size());
auto dst_vec2 = std::back_inserter(vec2);
std::cout<<"transform"<<std::endl;
std::transform(vec.begin(), vec.end(),
dst_vec2, [](const example& ex){ return ex; });
}
For this case, transform will call copy and move constructors.
The output of the transform part:
copy
move
copy
move
copy
move
Now let's remove the reserve and use the constructor.
#include <iostream>
#include <iterator> // for std::back_inserter
#include <vector>
#include <algorithm>
class example
{
public:
// Copy constructor
example(const example& p1)
{
std::cout<<"copy"<<std::endl;
this->a = p1.a;
}
example(example&& o) noexcept
{
std::cout<<"move"<<std::endl;
std::swap(o.a, this->a);
}
example(int a_)
{
std::cout<<"const"<<std::endl;
a = a_;
}
example()
{
std::cout<<"Def const"<<std::endl;
}
int a;
};
int main()
{
auto vec = std::vector<example>{1,2,3};
std::vector<example> vec2(vec.size());
auto dst_vec2 = std::back_inserter(vec2);
std::cout<<"transform"<<std::endl;
std::transform(vec.begin(), vec.end(),
dst_vec2, [](const example& ex){ return ex; });
}
And in this case the transform part produces:
copy
move
move
move
move
copy
move
copy
move
As can be seen, for this specific case reserve prevents the extra move operations: with reserve there are no pre-constructed elements and enough capacity, so appending never triggers a reallocation that would move the existing elements.

Efficiently fill a std::vector: push_back vs operator[] [duplicate]

I have a function that takes a pointer to char array and segment size as input arguments and calls another function that requires a std::array<std::string>. The idea is that the input char array is "sectioned" into equal parts, and string array formed.
The input char array format is several smaller arrays (or strings) of determined size, concatenated together. These are not assumed zero-terminated, although they might be. Examples for segment size 5 and number of elements 10:
char k[] = "1234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\000";
char m[] = "1234\00067890987654321\000234567809876\0005432\000\000\0003456789098";
char n[] = "12345678909876543211234567890987654321123456789098";
Length of all char arrays is 51 (segment * elements + 1). My goal is to make the function use resources efficiently, most importantly execution time.
Since there are many ways to skin a cat, I have two (or three) ways to tackle this, and the question is, which is "better"? By that I mean faster and less resource-wasteful. I am not a professional, so be patient with me.
Here, values is preallocated and then each string assigned a value.
void myfnc_1(void *a_src, uint32_t a_segment) {
// a_segment = 5 for example
size_t nSize = GetSize(); // another method, gets 10
std::vector<std::string> values(nSize);
char* v = a_src; // take k, n or m for example
for (size_t i = 0; i < nSize; ++i) {
values.at(i).assign(v, a_segment);
v += a_segment;
}
}
Here, the vector is not allocated, but each iteration a new string is added.
void myfnc_1(void *a_src, uint32_t a_segment) {
size_t nSize = GetSize();
std::vector<std::string> values();
char* v = a_src;
for (size_t i = 0; i < nSize; ++i) {
values.push_back("");
values.back().assign(v, a_segment);
v += a_segment;
}
}
There might be a third way, that's better. I'm not so experienced with vectors, so I don't know exactly. Do the segment length and number of elements make a difference if they are usually small (5, 10) or large (100, 10000)?
First post, big fan :)
Adding elements to a vector
There are several ways to add data to a vector:
create an empty vector, push_back() elements into it.
create an empty vector, allocate some capacity with reserve(), then push_back() elements into it.
create a vector of n elements, use indexing and copy-assignment.
create an empty vector, emplace_back() elements into it.
create an empty vector, allocate some capacity with reserve(), then emplace_back() elements into it.
There are other ways, e.g. creating the container with a pair of iterators, or filling it via standard library algorithms. I won't consider these here.
General considerations
The following two considerations are important for the following analysis:
Avoid (re)allocations: Memory allocation is slow. Reallocation often involves copying everything that's already in the container to the new location.
Avoid unnecessary work: It's better to construct the element you want than to default-construct, then assign. It's better to construct an element where you want it than to construct it elsewhere, then copy.
Other factors will also affect the efficiency of the chosen solution, but these are significant factors that we have direct control over. Other factors may become obvious through profiling your code.
push_back()
Each push_back() copy-constructs an element in the vector from the argument to the push_back() call. If the vector size() == capacity(), a reallocation will be performed. This will usually (but may not always) double the capacity, and may result in copying all of the existing elements into new storage.
push_back() with pre-allocation
Using reserve() allocates enough memory for the elements before we start. It's always worth doing this if you know (or have a reasonable guess for) the number of elements. If you're guessing, over-estimates are better than under-estimates.
The push_back() call will still use the copy constructor of the element type, but there shouldn't be any allocations, because the space is already provided. You just pay the cost of a single allocation during the reserve() call. If you do run over the existing capacity, push_back() will reallocate, often doubling the capacity. This is why an over-estimate for the size is better; you're less likely to get a reallocation, whereas with an under-estimate, not only will you likely reallocate, but you'll waste memory allocating almost twice as much as you needed!
Note that the "doubling capacity" behaviour is not specified by the standard, but it is a common implementation, designed to reduce the reallocation frequency when using push_back() for data sets of unknown size.
indexing and element assignment
Here, we create a vector of the correct number of default-constructed elements, and then use the copy-assignment operator to replace them with the elements we want. This has only one allocation, but can be slow if copy-assignment does anything complicated. This doesn't really work for data sets of unknown (or only guessed) size; the element indexing is safe only if you know the index will never exceed size(), and you have to resort to push_back() or resizing if you need more. This isn't a good general solution, but it can work sometimes.
emplace_back()
emplace_back() constructs the element in-place with the arguments to the emplace_back() call. This can often be faster than the equivalent push_back() (but not always). It still allocates in the same pattern as push_back(), reserving some capacity, filling it, then reallocating when more is needed. Much of the same argument applies, but you can make some gains from the construction method.
emplace_back() with pre-allocation
This should be your go-to strategy for C++11 or later codebases. You gain the emplace_back() efficiency (where possible) and avoid repeated allocations. Of the mechanisms listed, this would be expected to be the fastest in most cases.
A note on efficiency
Efficiency can be measured in several ways. Execution time is a common measure, but not always the most relevant. General advice about which strategy to use is based on experience and essentially "averages" the various effects to provide some reasonable statements about what to do first. As always, if any kind of efficiency is critical for your application, the only way to be sure you're optimising the right place is to profile your code, make changes, and then profile it again to demonstrate that the changes had the desired effect. Different data types, hardware, I/O requirements, etc. can all affect this kind of timing, and you will never know how those effects combine in your particular application until you profile it.
Example
Live example: http://coliru.stacked-crooked.com/a/83d23c2d0dcee2ff
In this example, I fill a vector with 10,000 strings using each of the approaches listed above. I time each one and print the results.
This is similar to your question, but not identical; your results will differ!
Note that the emplace_back() with reserve() is the fastest, but the indexing & assignment is also quick here. This is likely because std::string has an efficient swap(), and its default constructor doesn't do much. The other approaches are an order of magnitude slower.
#include <chrono>
#include <iostream>
#include <string>
#include <vector>
using Clock = std::chrono::high_resolution_clock;
using time_point = std::chrono::time_point<Clock>;
std::vector<std::string> strings = {"one", "two", "three", "four", "five"};
std::chrono::duration<double> vector_push_back(const size_t n) {
time_point start, end;
start = Clock::now();
std::vector<std::string> v;
for (size_t i = 0; i < n; ++i) {
v.push_back(strings[i % strings.size()]);
}
end = Clock::now();
return end - start;
}
std::chrono::duration<double> vector_push_back_with_reserve(const size_t n) {
time_point start, end;
start = Clock::now();
std::vector<std::string> v;
v.reserve(n);
for (size_t i = 0; i < n; ++i) {
v.push_back(strings[i % strings.size()]);
}
end = Clock::now();
return end - start;
}
std::chrono::duration<double> vector_element_assignment(const size_t n) {
time_point start, end;
start = Clock::now();
std::vector<std::string> v(n);
for (size_t i = 0; i < n; ++i) {
v[i] = strings[i % strings.size()];
}
end = Clock::now();
return end - start;
}
std::chrono::duration<double> vector_emplace_back(const size_t n) {
time_point start, end;
start = Clock::now();
std::vector<std::string> v;
for (size_t i = 0; i < n; ++i) {
v.emplace_back(strings[i % strings.size()]);
}
end = Clock::now();
return end - start;
}
std::chrono::duration<double> vector_emplace_back_with_reserve(const size_t n) {
time_point start, end;
start = Clock::now();
std::vector<std::string> v;
v.reserve(n);
for (size_t i = 0; i < n; ++i) {
v.emplace_back(strings[i % strings.size()]);
}
end = Clock::now();
return end - start;
}
int main() {
const size_t n = 10000;
std::cout << "vector push_back: " << vector_push_back(n).count() << "\n";
std::cout << "vector push_back with reserve: " << vector_push_back(n).count() << "\n";
std::cout << "vector element assignment: " << vector_element_assignment(n).count() << "\n";
std::cout << "vector emplace_back: " << vector_emplace_back(n).count() << "\n";
std::cout << "vector emplace_back with reserve: " << vector_emplace_back_with_reserve(n).count() << "\n";
}
Results:
vector push_back: 0.00205563
vector push_back with reserve: 0.00152464
vector element assignment: 0.000610934
vector emplace_back: 0.00125141
vector emplace_back with reserve: 0.000545451
Conclusions
For most new code, using reserve() and emplace_back() (or push_back() for older code) should give you a good first-approximation for efficiency. If it isn't enough, profile it and find out where the bottleneck is. It probably won't be where you think it is.
Better performance will be reached by avoiding dynamic reallocation, so try to have the vector's memory be big enough to receive all elements.
Your first solution will be more efficient, because if nSize is bigger than the default vector capacity, the second one will need reallocations to be able to store all elements.
As commented by Melkon, reserve is even better:
void myfnc_1(void *a_src, uint32_t a_segment) {
size_t nSize = GetSize();
std::vector<std::string> values;
values.reserve( nSize );
char* v = static_cast<char*>( a_src ); // a_src is void*, an explicit cast is required
for (size_t i = 0; i < nSize; ++i) {
values.push_back( std::string( v, a_segment ) );
v += a_segment;
}
}
Do not use parentheses to invoke the default constructor.
push_back requires extra reallocations each time the capacity is exceeded, so option 2 can be improved by reserving enough space to avoid reallocations. It is also more efficient to directly push the string than to push an empty one and reassign it later. And there is a constructor for std::string which is very convenient for your needs: from sequence (5)
string (const char* s, size_t n);
Regarding option 1: preallocating the whole vector requires each element to be constructed once for initialization and then assigned again later. Better to reserve without constructing elements and directly push_back the ones you really want.
This is the code using those improvements:
void myfnc_1(void *a_src, uint32_t a_segment)
{
std::vector<std::string> values;
size_t nSize = GetSize( );
values.reserve(nSize);
char* v = static_cast<char*> ( a_src );
for (size_t i = 0; i < nSize; ++i)
{
values.push_back( std::string( v, a_segment) );
v += a_segment;
}
}
Just do whatever is easier to read and to maintain. Quite often, it's the fastest solution anyway.
And even if it isn't the fastest, who cares? Maybe your app will be 1% slower.

Overloaded vector operators causing a massive performance reduction?

I am summing and multiplying vectors by a constant many, many times, so I overloaded the operators * and +. However, working with vectors greatly slowed down my program; working with a standard C-array improved the time by a factor of 40. What would cause such a slowdown?
An example program showing my overloaded operators and exhibiting the slow-down is below. This program does k = k + (0.0001)*q, log(N) times (here N = 1000000). At the end the program prints the times to do the operations using vectors and c-arrays, and also the ratio of the times.
#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include <time.h>
#include <vector>
using namespace std;
// -------- OVERLOADING VECTOR OPERATORS ---------------------------
vector<double> operator*(const double a,const vector<double> & vec)
{
vector<double> result;
for(int i = 0; i < vec.size(); i++)
result.push_back(a*vec[i]);
return result;
}
vector<double> operator+(const vector<double> & lhs,
const vector<double> & rhs)
{
vector<double> result;
for(int i = 0; i < lhs.size();i++)
result.push_back(lhs[i]+rhs[i]);
return result;
}
//------------------------------------------------------------------
//--------------- Basic C-Array operations -------------------------
// s[k] = y[k];
void populate_array(int DIM, double *y, double *s){
for(int k=0;k<DIM;k++)
s[k] = y[k];
}
//sums the arrays y and s as y+c s and sends them to s;
void sum_array(int DIM, double *y, double *s, double c){
for(int k=0;k<DIM;k++)
s[k] = y[k] + c*s[k];
}
// sums the array y and s as a*y+c*s and sends them to s;
void sum_array2(int DIM, double *y, double *s,double a,double c){
for(int k=0;k<DIM;k++)
s[k] = a*y[k] + c*s[k];
}
//------------------------------------------------------------------
int main(){
vector<double> k = {1e-8,2e-8,3e-8,4e-8};
vector<double> q = {1e-8,2e-8,3e-8,4e-8};
double ka[4] = {1e-8,2e-8,3e-8,4e-8};
double qa[4] = {1e-8,2e-8,3e-8,4e-8};
int N = 3;
clock_t begin,end;
double elapsed_sec,elapsed_sec2;
begin = clock();
do
{
k = k + 0.0001*q;
N = 2*N;
}while(N<1000000);
end = clock();
elapsed_sec = double(end-begin) / CLOCKS_PER_SEC;
printf("vector time: %g \n",elapsed_sec);
N = 3;
begin = clock();
do
{
sum_array2(4, qa, ka,0.0001,1.0);
N = 2*N;
}while(N<1000000);
end = clock();
elapsed_sec2 = double(end-begin) / CLOCKS_PER_SEC;
printf("array time: %g \n",elapsed_sec2);
printf("time ratio : %g \n", elapsed_sec/elapsed_sec2);
}
I get the ratio of vector time to c-array time to be typically ~40 on my linux system. What is it about my overloaded operators that causes the slowdown compared to the C-array operations?
Let's take a look at this line:
k = k + 0.0001*q;
To evaluate this, first the computer needs to call your operator*. This function creates a vector and needs to allocate dynamic storage for its elements. Actually, since you use push_back rather than setting the size ahead of time via constructor, resize, or reserve, it might allocate too few elements the first time and need to allocate again to grow the vector.
This created vector (or one move-constructed from it) is then used as a temporary object representing the subexpression 0.0001*q within the whole statement.
Next the computer needs to call your operator+, passing k and that temporary vector. This function also creates and returns a vector, doing at least one dynamic allocation and possibly more. There's a second temporary vector for the subexpression k + 0.0001*q.
Finally, the computer calls an operator= belonging to std::vector. Luckily, there's a move assignment overload, which (probably) just moves the allocated memory from the second temporary to k and deallocates the memory that was in k.
Now that the entire statement has been evaluated, the temporary objects are destroyed. First the temporary for k + 0.0001*q is destroyed, but it no longer has any memory to clean up. Then the temporary for 0.0001*q is destroyed, and it does need to deallocate its memory.
Doing lots of allocating and deallocating of memory, even in small amounts, tends to be somewhat expensive. (The vectors will use std::allocator, which is allowed to be smarter and avoid some allocations and deallocations, but I couldn't say without investigation how likely it would be to actually help here.)
On the other hand, your "C-style" implementation does no allocating or deallocating at all. It does an "in-place" calculation, just modifying arrays passed in to store the values passed out. If you had another C-style implementation with individual functions like double* scalar_times_vec(double s, const double* v, unsigned int len); that used malloc to get memory for the result and required the results to eventually be freed, you would probably get similar results.
So how might the C++ implementation be improved?
As mentioned, you could either reserve the vectors before adding data to them, or give them an initial size and do assignments like v[i] = out; rather than push_back(out);.
The next easiest step would be to use more operators that allow in-place calculations. If you overloaded:
std::vector<double>& operator+=(const std::vector<double>&);
std::vector<double>& operator*=(double);
then you could do:
k += 0.0001*q;
n *= 2;
// or:
n += n;
to do the final computations on k and n in-place. This doesn't easily help with the expression 0.0001*q, though.
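A minimal sketch of those in-place overloads (assuming, as in the question, that both vectors have the same size):
std::vector<double>& operator+=(std::vector<double>& lhs,
                                const std::vector<double>& rhs)
{
    // element-wise lhs += rhs, modifying lhs in place:
    // no temporary vector, no allocation
    for (std::size_t i = 0; i < lhs.size(); ++i)
        lhs[i] += rhs[i];
    return lhs;
}
std::vector<double>& operator*=(std::vector<double>& vec, double a)
{
    // element-wise scaling in place
    for (double& x : vec)
        x *= a;
    return vec;
}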
Another option that sometimes helps is to overload operators to accept rvalues in order to reuse storage that belonged to temporaries. If we had an overload:
std::vector<double> operator+(const std::vector<double>& a, std::vector<double>&& b);
it would get called for the + in the expression k + 0.0001*q, and the implementation could create the return value from std::move(b), reusing its storage. This gets tricky to get both flexible and correct, though. And it still doesn't eliminate the temporary representing 0.0001*q or its allocation and deallocation.
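A sketch of how that overload could reuse the temporary's buffer (same equal-size assumption):
std::vector<double> operator+(const std::vector<double>& a, std::vector<double>&& b)
{
    // b is a temporary (e.g. the result of 0.0001*q), so we may scribble on it
    for (std::size_t i = 0; i < b.size(); ++i)
        b[i] += a[i];
    return std::move(b); // hand b's buffer to the result: no new allocation
}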
Another solution that allows in-place calculations in the most general cases is called expression templates. That's rather a lot of work to implement, but if you really need a combination of convenient syntax and efficiency, there are some existing libraries that might be worth looking into.
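To give a flavour of expression templates, here is a deliberately minimal sketch (hypothetical types; it handles only the exact shape scalar*vec plus vec, where real libraries generalize with templates). The whole right-hand side is evaluated in a single pass, with no temporary vectors at all:
// Lazy node representing a * v; nothing is computed until indexed.
struct ScaledVec {
    double a;
    const std::vector<double>& v;
    double operator[](std::size_t i) const { return a * v[i]; }
    std::size_t size() const { return v.size(); }
};
ScaledVec operator*(double a, const std::vector<double>& v) { return {a, v}; }
// Lazy node representing u + (a * v).
struct SumExpr {
    const std::vector<double>& u;
    ScaledVec s;
    double operator[](std::size_t i) const { return u[i] + s[i]; }
    std::size_t size() const { return u.size(); }
};
SumExpr operator+(const std::vector<double>& u, ScaledVec s) { return {u, s}; }
// One-pass evaluation straight into dst (a free function, since operator=
// cannot be overloaded for std::vector).
void assign(std::vector<double>& dst, const SumExpr& e)
{
    dst.resize(e.size());
    for (std::size_t i = 0; i < e.size(); ++i)
        dst[i] = e[i]; // safe even when dst aliases e.u: index i reads only index i
}
// usage: assign(k, k + 0.0001*q); -- no temporaries, no allocations once k is sized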
Edit:
I should have taken a closer look at how you perform the c-array operations... See aschepler's answer on why growing the vectors is the least of your problems.
–––
If you have any idea how many elements you are going to add to a vector, you should always call reserve on the vector before adding them. Otherwise you are going to trigger a potentially large number of reallocations, which are costly.
A vector occupies a contiguous block of memory. To grow, it has to allocate a larger block of memory and copy its entire content to the new location. To avoid this happening every time an element is added, the vector usually allocates more memory than is presently needed to store all its elements. The number of elements it can store without reallocation is its capacity. How large this capacity should be is of course a trade-off between avoiding potential future reallocation and wasting memory.
However, if you know (or have a good idea) how many elements will eventually be stored in the vector, you can call reserve(n) to set its capacity to (at least) n and avoid unnecessary reallocation.
Edit:
See also here. push_back performs a capacity check (and increments the size) and is thus a bit slower than just writing to the vector through operator[]. In your case it might be fastest to directly construct a vector of size (not just capacity) n, as doubles are POD and cheap to construct, and then insert the correct values through operator[].
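A sketch of that suggestion applied to the question's operator* (construct with a size up front, then plain writes):
std::vector<double> operator*(const double a, const std::vector<double>& vec)
{
    std::vector<double> result(vec.size()); // one allocation; zero-initializing doubles is cheap
    for (std::size_t i = 0; i < vec.size(); ++i)
        result[i] = a * vec[i];             // no per-element capacity check
    return result;
}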

What is better: reserve vector capacity, preallocate to size or push back in loop?

I have a function that takes a pointer to char array and segment size as input arguments and calls another function that requires a std::array<std::string>. The idea is that the input char array is "sectioned" into equal parts, and string array formed.
The input char array format is several smaller arrays (or strings) of determined size, concatenated togeather. These are not assumed zero-terminated, although they might be. Examples for segment size 5 and number of elements 10:
char k[] = "1234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\0001234\000";
char m[] = "1234\00067890987654321\000234567809876\0005432\000\000\0003456789098";
char n[] = "12345678909876543211234567890987654321123456789098";
Length of all char arrays is 51 (segment * elements + 1). My goal is to make the function use resources efficiently, most importantly execution time.
Since there are many ways to skin a cat, I have two (or three) ways to tackle this, and the question is, which is "better"? By that I mean faster and less resource-wasteful. I am not a professional, so be patient with me.
Here, values is preallocated and then each string assigned a value.
void myfnc_1(void *a_src, uint32_t a_segment) {
// a_segment = 5 for example
size_t nSize = GetSize(); // another method, gets 10
std::vector<std::string> values(nSize);
char* v = a_src; // take k, n or m for example
for (size_t i = 0; i < nSize; ++i) {
values.at(i).assign(v, a_segment);
v += a_segment;
}
}
Here, the vector is not allocated, but each iteration a new string is added.
void myfnc_1(void *a_src, uint32_t a_segment) {
size_t nSize = GetSize();
std::vector<std::string> values();
char* v = a_src;
for (size_t i = 0; i < nSize; ++i) {
values.push_back("");
values.back().assign(v, a_segment);
v += a_segment;
}
}
There might be a third way, that's better. I'm not so experienced with vectors so I don't know exactly. Do the segment length and number of elements make a difference if they are usually large (5, 10) or small (100, 10000)?
First post, big fan :)
Adding elements to a vector
There are several ways to add data to a vector:
create an empty vector, push_back() elements into it.
create an empty vector, allocate some capacity with reserve(), then push_back() elements into it.
create a vector of n elements, use indexing and copy-assignment.
create an empty vector, emplace_back() elements into it.
create an empty vector, allocate some capacity with reserve(), then emplace_back() elements into it.
There are other ways, e.g. creating the container with a pair of iterators, or filling it via standard library algorithms. I won't consider these here.
General considerations
The following two considerations are important for the following analysis:
Avoid (re)allocations: Memory allocation is slow. reallocation often involves copying everything that's already in the container to the new location.
Avoid unnecessary work: It's better to construct the element you want than to default-construct, then assign. It's better to construct an element where you want it than to construct it elsewhere, then copy.
Other factors will also affect the efficiency of the chosen solution, but these are significant factors that we have direct control over. Other factors may become obvious through profiling your code.
push_back()
Each push_back() copy-constructs an element in the vector from the argument to the push_back() call. If the vector size() == capacity(), a reallocation will be performed. This will usually (but may not always) double the capacity, and may result in copying all of the existing elements into new storage.
push_back() with pre-allocation
Using reserve() allocates enough memory for the elements before we start. It's always worth doing this if you know (or have a reasonable guess for) the number of elements. If you're guessing, over-estimates are better than under-estimates.
The push_back() call will still use the copy constructor of the element type, but there shouldn't be any allocations, because the space is already provided. You just pay the cost of a single allocation during the reserve() call. If you do run over the existing capacity, push_back() will reallocate, often doubling the capacity. This is why an over-estimate for the size is better; you're less likely to get a reallocation, whereas with an under-estimate, not only will you likely reallocate, but you'll waste memory allocating almost twice as much as you needed!
Note that the "doubling capacity" behaviour is not specified by the standard, but it is a common implementation, designed to reduce the reallocation frequency when using push_back() for data sets of unknown size.
indexing and element assignment
Here, we create a vector of the correct number of default-constructed elements, and then use the copy-assignment operator to replace them with the elements we want. This has only one allocation, but can be slow if copy-assignment does anything complicated. This doesn't really work for data sets of unknown (or only guessed) size; the element indexing is safe only if you know the index will never exceed size(), and you have to resort to push_back() or resizing if you need more. This isn't a good general solution, but it can work sometimes.
emplace_back()
emplace_back() constructs the element in-place with the arguments to the emplace_back() call. This can often be faster than the equivalent push_back() (but not always). It still allocates in the same pattern as push_back(), reserving some capacity, filling it, then reallocating when more is needed. Much of the same argument applies, but you can make some gains from the construction method.
emplace_back() with pre-allocation
This should be your go-to strategy for C++11 or later codebases. You gain the emplace_back() efficiency (where possible) and avoid repeated allocations. Of the mechanisms listed, this would be expected to be the fastest in most cases.
A note on efficiency
Efficiency can be measured in several ways. Execution time is a common measure, but not always the most relevant. General advice about which strategy to use is based on experience and essentially "averages" the various effects to provide some reasonable statements about what to do first. As always, if any kind of efficiency is critical for your application, the only way to be sure you're optimising the right place is to profile your code, make changes, and then profile it again to demonstrate that the changes had the desired effect. Different data types, hardware, I/O requirements, etc. can all affect this kind of timing, and you will never know how those effects combine in your particular application until you profile it.
Example
Live example: http://coliru.stacked-crooked.com/a/83d23c2d0dcee2ff
In this example, I fill a vector with 10,000 strings using each of the approaches listed above. I time each one and print the results.
This is similar to your question, but not identical; your results will differ!
Note that the emplace_back() with reserve() is the fastest, but the indexing & assignment is also quick here. This is likely because std::string has an efficient swap(), and its default constructor doesn't do much. The other approaches are an order of magnitude slower.
#include <chrono>
#include <iostream>
#include <string>
#include <vector>

using Clock = std::chrono::high_resolution_clock;
using time_point = std::chrono::time_point<Clock>;

std::vector<std::string> strings = {"one", "two", "three", "four", "five"};

std::chrono::duration<double> vector_push_back(const size_t n) {
    time_point start, end;
    start = Clock::now();
    std::vector<std::string> v;
    for (size_t i = 0; i < n; ++i) {
        v.push_back(strings[i % strings.size()]);
    }
    end = Clock::now();
    return end - start;
}

std::chrono::duration<double> vector_push_back_with_reserve(const size_t n) {
    time_point start, end;
    start = Clock::now();
    std::vector<std::string> v;
    v.reserve(n);
    for (size_t i = 0; i < n; ++i) {
        v.push_back(strings[i % strings.size()]);
    }
    end = Clock::now();
    return end - start;
}

std::chrono::duration<double> vector_element_assignment(const size_t n) {
    time_point start, end;
    start = Clock::now();
    std::vector<std::string> v(n);
    for (size_t i = 0; i < n; ++i) {
        v[i] = strings[i % strings.size()];
    }
    end = Clock::now();
    return end - start;
}

std::chrono::duration<double> vector_emplace_back(const size_t n) {
    time_point start, end;
    start = Clock::now();
    std::vector<std::string> v;
    for (size_t i = 0; i < n; ++i) {
        v.emplace_back(strings[i % strings.size()]);
    }
    end = Clock::now();
    return end - start;
}

std::chrono::duration<double> vector_emplace_back_with_reserve(const size_t n) {
    time_point start, end;
    start = Clock::now();
    std::vector<std::string> v;
    v.reserve(n);
    for (size_t i = 0; i < n; ++i) {
        v.emplace_back(strings[i % strings.size()]);
    }
    end = Clock::now();
    return end - start;
}

int main() {
    const size_t n = 10000;
    std::cout << "vector push_back: " << vector_push_back(n).count() << "\n";
    std::cout << "vector push_back with reserve: " << vector_push_back_with_reserve(n).count() << "\n";
    std::cout << "vector element assignment: " << vector_element_assignment(n).count() << "\n";
    std::cout << "vector emplace_back: " << vector_emplace_back(n).count() << "\n";
    std::cout << "vector emplace_back with reserve: " << vector_emplace_back_with_reserve(n).count() << "\n";
}
Results:
vector push_back: 0.00205563
vector push_back with reserve: 0.00152464
vector element assignment: 0.000610934
vector emplace_back: 0.00125141
vector emplace_back with reserve: 0.000545451
Conclusions
For most new code, using reserve() and emplace_back() (or push_back() for older code) should give you a good first approximation for efficiency. If it isn't enough, profile it and find out where the bottleneck is. It probably won't be where you think it is.
You will get better performance by avoiding dynamic reallocation, so try to make the vector's memory big enough to receive all elements up front.
Your first solution will be more efficient, because if nSize is bigger than the default vector capacity, the second one will need a reallocation (or several) to be able to store all elements.
As commented by Melkon, reserve is even better:
void myfnc_1(void *a_src, uint32_t a_segment) {
    size_t nSize = GetSize();
    std::vector<std::string> values;
    values.reserve(nSize);
    char* v = static_cast<char*>(a_src);   // void* does not convert to char* implicitly in C++
    for (size_t i = 0; i < nSize; ++i) {
        values.push_back(std::string(v, a_segment));
        v += a_segment;
    }
}
Also note: do not use empty parentheses when you want to invoke the default constructor, as shown below.
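This is the classic "most vexing parse" (assuming that is what the advice refers to):
std::vector<std::string> values();   // declares a function returning a vector, not a vector object
std::vector<std::string> values;     // default-constructs an empty vector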
push_back requires extra reallocations each time the capacity is exceeded, so option 2 can be improved by reserving enough space to avoid reallocations. It is also more efficient to directly push the final string than to push an empty one and reassign it later. And there is a constructor for std::string which is very convenient for your needs, taking a buffer pointer and a length:
string(const char* s, size_t n);
Regarding option 1: Preallocating the whole vector requires each element to be constructed once for initialization and yet another time for assignment. Better to reserve without constructing elements and directly push_back the ones you really want.
This is the code using those improvements :
void myfnc_1(void *a_src, uint32_t a_segment)
{
    std::vector<std::string> values;
    size_t nSize = GetSize();
    values.reserve(nSize);
    char* v = static_cast<char*>(a_src);
    for (size_t i = 0; i < nSize; ++i)
    {
        values.push_back(std::string(v, a_segment));
        v += a_segment;
    }
}
Just do whatever is easier to read and to maintain. Quite often, it's the fastest solution anyway.
And even if it isn't the fastest, who cares? Maybe your app will be 1% slower.

Vector: initialization or reserve?

I know the size of a vector; what is the best way to initialize it?
Option 1:
vector<int> vec(3); //in .h
vec.at(0)=var1; //in .cpp
vec.at(1)=var2; //in .cpp
vec.at(2)=var3; //in .cpp
Option 2:
vector<int> vec; //in .h
vec.reserve(3); //in .cpp
vec.push_back(var1); //in .cpp
vec.push_back(var2); //in .cpp
vec.push_back(var3); //in .cpp
I guess Option 2 is better than Option 1. Is it? Are there any other options?
Somehow, a non-answer answer that is completely wrong has remained accepted and most upvoted for ~7 years. This is not an apples and oranges question. This is not a question to be answered with vague cliches.
For a simple rule to follow:
Option #1 is faster...
...but this probably shouldn't be your biggest concern.
Firstly, the difference is pretty minor. Secondly, as we crank up the compiler optimization, the difference becomes even smaller; for example, on my gcc-5.4.0, it is arguably trivial at optimization level 3 (-O3).
So in general, I would recommend using method #1 whenever you encounter this situation. However, if you can't remember which one is optimal, it's probably not worth the effort to find out. Just pick either one and move on, because this is unlikely to ever cause a noticeable slowdown in your program as a whole.
These tests were run by sampling random vector sizes from a normal distribution, and then timing the initialization of vectors of these sizes using the two methods. We keep a dummy sum variable to ensure the vector initialization is not optimized out, and we randomize vector sizes and values to make an effort to avoid any errors due to branch prediction, caching, and other such tricks.
main.cpp:
/*
 * Test constructing and filling a vector in two ways: construction with size
 * then assignment versus construction of empty vector followed by push_back.
 * We collect dummy sums to prevent the compiler from optimizing out computation.
 */
#include <iostream>
#include <vector>
#include "rng.hpp"
#include "timer.hpp"

const size_t kMinSize = 1000;
const size_t kMaxSize = 100000;
const double kSizeIncrementFactor = 1.2;
const int kNumVecs = 10000;

int main() {
    for (size_t mean_size = kMinSize; mean_size <= kMaxSize;
         mean_size = static_cast<size_t>(mean_size * kSizeIncrementFactor)) {
        // Generate sizes from normal distribution
        std::vector<size_t> sizes_vec;
        NormalIntRng<size_t> sizes_rng(mean_size, mean_size / 10.0);
        for (int i = 0; i < kNumVecs; ++i) {
            sizes_vec.push_back(sizes_rng.GenerateValue());
        }
        Timer timer;
        UniformIntRng<int> values_rng(0, 5);
        // Method 1: construct with size, then assign
        timer.Reset();
        int method_1_sum = 0;
        for (size_t num_els : sizes_vec) {
            std::vector<int> vec(num_els);
            for (size_t i = 0; i < num_els; ++i) {
                vec[i] = values_rng.GenerateValue();
            }
            // Compute sum - this part identical for two methods
            for (size_t i = 0; i < num_els; ++i) {
                method_1_sum += vec[i];
            }
        }
        double method_1_seconds = timer.GetSeconds();
        // Method 2: reserve then push_back
        timer.Reset();
        int method_2_sum = 0;
        for (size_t num_els : sizes_vec) {
            std::vector<int> vec;
            vec.reserve(num_els);
            for (size_t i = 0; i < num_els; ++i) {
                vec.push_back(values_rng.GenerateValue());
            }
            // Compute sum - this part identical for two methods
            for (size_t i = 0; i < num_els; ++i) {
                method_2_sum += vec[i];
            }
        }
        double method_2_seconds = timer.GetSeconds();
        // Report results as mean_size, method_1_seconds, method_2_seconds
        std::cout << mean_size << ", " << method_1_seconds << ", " << method_2_seconds;
        // Do something with the dummy sums that cannot be optimized out
        std::cout << ((method_1_sum > method_2_sum) ? "" : " ") << std::endl;
    }
    return 0;
}
The header files I used are located here:
rng.hpp
timer.hpp
The two variants have different semantics, i.e. you are comparing apples and oranges.
The first gives you a vector of n value-initialized elements (zeros, in the case of int); the second variant merely reserves the memory, but does not create any elements at all.
Choose what better fits your needs, i.e. what is "better" in a certain situation.
The "best" way would be:
vector<int> vec = {var1, var2, var3};
available with a C++11 capable compiler.
Not sure exactly what you mean by doing things in a header or implementation files. A mutable global is a no-no for me. If it is a class member, then it can be initialized in the constructor initialization list.
Otherwise, option 1 would generally be used if you know how many items you are going to use and the default values (0 for int) would be useful.
Using at() here implies that you can't guarantee the index is valid, and a situation like that is alarming in itself. Even though you will be able to reliably detect problems, it's definitely simpler to use push_back and stop worrying about getting the indices right.
In the case of option 2, it generally makes zero performance difference whether you reserve memory or not, so it's simpler not to reserve*. Unless perhaps the vector contains types that are very expensive to copy (and don't provide fast moving in C++11), or the size of the vector is going to be enormous.
* From Stroustrup's C++ Style and Technique FAQ:
People sometimes worry about the cost of std::vector growing
incrementally. I used to worry about that and used reserve() to
optimize the growth. After measuring my code and repeatedly having
trouble finding the performance benefits of reserve() in real
programs, I stopped using it except where it is needed to avoid
iterator invalidation (a rare case in my code). Again: measure before
you optimize.
While your examples are essentially the same, it may be that when the type used is not an int the choice is taken from you. If your type doesn't have a default constructor, or if you'll have to re-construct each element later anyway, I would use reserve. Just don't fall into the trap I did and use reserve and then the operator[] for initialisation!
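For the record, that trap looks like this; a minimal sketch (indexing past size() is undefined behaviour even when the capacity is large enough):
std::vector<int> v;
v.reserve(3);     // capacity() >= 3, but size() is still 0
v[0] = 42;        // undefined behaviour: element 0 has not been constructed
v.push_back(42);  // correct: constructs the element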
Constructor
std::vector<MyType> myVec(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
In this first case, using the constructor, size and numberOfElementsToStart will be equal and capacity will be greater than or equal to them.
Think of myVec as a vector containing a number of items of MyType which can be accessed and modified; push_back(anotherInstanceOfMyType) will append it to the end of the vector.
Reserve
std::vector<MyType> myVec;
myVec.reserve(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
When using the reserve function, size will be 0 until you add an element to the array and capacity will be equal to or greater than numberOfElementsToStart.
Think of myVec as an empty vector which can have new items appended to it using push_back with no memory allocation for at least the first numberOfElementsToStart elements.
Note that push_back() still requires an internal check to ensure that size < capacity and to increment size, so you may want to weigh this against the cost of default construction.
List initialisation
std::vector<MyType> myVec{ var1, var2, var3 };
This is an additional option for initialising your vector, and while it is only feasible for very small vectors, it is a clear way to initialise a small vector with known values. size will be equal to the number of elements you initialised it with, and capacity will be equal to or greater than size. Modern compilers may optimise away the creation of temporary objects and prevent unnecessary copying.
Option 2 is better, as reserve only needs to reserve memory (3 * sizeof(T)), while the first option calls the element type's constructor for each cell inside the container.
For C-like types it will probably be the same.
How it Works
This is implementation specific; however, in general the vector data structure internally holds a pointer to the memory block where the elements actually reside. Both GCC and VC++ allocate for 0 elements by default, so you can think of the vector's internal memory pointer as being nullptr by default.
When you call vector<int> vec(N); as in your Option 1, the N objects are created using the default constructor. This is called the fill constructor.
When you do vec.reserve(N); after the default constructor, as in Option 2, you get a data block able to hold N elements, but no objects are created, unlike in Option 1.
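A small sketch of the difference (exact capacity values are implementation dependent):
std::vector<int> a(5);   // size() == 5, capacity() >= 5, five value-initialized ints
std::vector<int> b;
b.reserve(5);            // size() == 0, capacity() >= 5, no ints constructed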
Why to Select Option 1
If you know the number of elements the vector will hold, and you might leave most of the elements at their default values, then you might want to use this option.
Why to Select Option 2
This option is generally the better of the two, as it only allocates a data block for future use, without actually filling it up with objects created from the default constructor.
Since it seems 5 years have passed and a wrong answer is still the accepted one, and the most-upvoted answer is completely useless (missed the forest for the trees), I will add a real response.
Method #1: we pass an initial size parameter into the vector, let's call it n. That means the vector is filled with n elements, which will be initialized to their default value. For example, if the vector holds ints, it will be filled with n zeros.
Method #2: we first create an empty vector. Then we reserve space for n elements. In this case, we never create the n elements and thus we never perform any initialization of the elements in the vector. Since we plan to overwrite the values of every element immediately, the lack of initialization will do us no harm. On the other hand, since we have done less overall, this would be the better* option.
* better - real definition: never worse. It's always possible a smart compiler will figure out what you're trying to do and optimize it for you.
Conclusion: use method #2.
In the long run, it depends on the usage and the number of elements.
Run the program below to see how the vector implementation grows its capacity:
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> vec;
    for (int i = 0; i < 50; i++) {
        cout << "size=" << vec.size() << " capacity=" << vec.capacity() << endl;
        vec.push_back(i);
    }
}
size is the number of actual elements, and capacity is the actual size of the underlying array used to implement the vector.
On my computer, both are the same up to 10, but when size is 43 the capacity is 63. Depending on the number of elements, either approach may be better; for example, growing the capacity repeatedly may be expensive.
Another option is to Trust Your Compiler(tm) and do the push_backs without calling reserve first. It has to allocate some space when you start adding elements. Perhaps it does that just as well as you would?
It is "better" to have simpler code that does the same job.
I think the answer may depend on the situation. For instance, let's try to copy a simple vector to another vector, where the vector holds an example class that contains only an integer. In the first example, let's use reserve.
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>   // for std::back_inserter

class example
{
public:
    // Copy constructor
    example(const example& p1)
    {
        std::cout << "copy" << std::endl;
        this->a = p1.a;
    }
    example(example&& o) noexcept
    {
        std::cout << "move" << std::endl;
        std::swap(o.a, this->a);
    }
    example(int a_)
    {
        std::cout << "const" << std::endl;
        a = a_;
    }
    example()
    {
        std::cout << "Def const" << std::endl;
    }

    int a;
};

int main()
{
    auto vec = std::vector<example>{1, 2, 3};
    auto vec2 = std::vector<example>{};
    vec2.reserve(vec.size());
    auto dst_vec2 = std::back_inserter(vec2);
    std::cout << "transform" << std::endl;
    std::transform(vec.begin(), vec.end(),
                   dst_vec2, [](const example& ex){ return ex; });
}
For this case, transform will call copy and move constructors.
The output of the transform part:
copy
move
copy
move
copy
move
Now let's remove the reserve and use the sizing constructor instead. The includes and the example class are identical to the previous snippet; only main changes:

int main()
{
    auto vec = std::vector<example>{1, 2, 3};
    std::vector<example> vec2(vec.size());   // vec.size() default-constructed elements
    auto dst_vec2 = std::back_inserter(vec2);
    std::cout << "transform" << std::endl;
    std::transform(vec.begin(), vec.end(),
                   dst_vec2, [](const example& ex){ return ex; });
}
And in this case the transform part produces:
copy
move
move
move
move
copy
move
copy
move
As can be seen, for this specific case, reserve prevents the extra move operations, because there are no already-constructed elements that have to be moved when the buffer grows.