Vector: initialization or reserve? - c++

I know the size of a vector; what is the best way to initialize it?
Option 1:
vector<int> vec(3); //in .h
vec.at(0)=var1; //in .cpp
vec.at(1)=var2; //in .cpp
vec.at(2)=var3; //in .cpp
Option 2:
vector<int> vec; //in .h
vec.reserve(3); //in .cpp
vec.push_back(var1); //in .cpp
vec.push_back(var2); //in .cpp
vec.push_back(var3); //in .cpp
I guess Option 2 is better than Option 1. Is it? Are there any other options?

Somehow, a non-answer that is completely wrong has remained the accepted and most-upvoted answer for ~7 years. This is not an apples and oranges question. This is not a question to be answered with vague cliches.
For a simple rule to follow:
Option #1 is faster...
...but this probably shouldn't be your biggest concern.
Firstly, the difference is pretty minor. Secondly, as we crank up the compiler optimization, the difference becomes even smaller. For example, on my gcc-5.4.0 the difference is arguably trivial at optimization level 3 (-O3).
So in general, I would recommend using method #1 whenever you encounter this situation. However, if you can't remember which one is optimal, it's probably not worth the effort to find out. Just pick either one and move on, because this is unlikely to ever cause a noticeable slowdown in your program as a whole.
These tests were run by sampling random vector sizes from a normal distribution, and then timing the initialization of vectors of these sizes using the two methods. We keep a dummy sum variable to ensure the vector initialization is not optimized out, and we randomize vector sizes and values to make an effort to avoid any errors due to branch prediction, caching, and other such tricks.
main.cpp:
/*
 * Test constructing and filling a vector in two ways: construction with size
 * then assignment versus construction of empty vector followed by push_back.
 * We collect dummy sums to prevent the compiler from optimizing out computation.
 */
#include <iostream>
#include <vector>
#include "rng.hpp"
#include "timer.hpp"

const size_t kMinSize = 1000;
const size_t kMaxSize = 100000;
const double kSizeIncrementFactor = 1.2;
const int kNumVecs = 10000;

int main() {
  for (size_t mean_size = kMinSize; mean_size <= kMaxSize;
       mean_size = static_cast<size_t>(mean_size * kSizeIncrementFactor)) {
    // Generate sizes from normal distribution
    std::vector<size_t> sizes_vec;
    NormalIntRng<size_t> sizes_rng(mean_size, mean_size / 10.0);
    for (int i = 0; i < kNumVecs; ++i) {
      sizes_vec.push_back(sizes_rng.GenerateValue());
    }
    Timer timer;
    UniformIntRng<int> values_rng(0, 5);
    // Method 1: construct with size, then assign
    timer.Reset();
    int method_1_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec[i] = values_rng.GenerateValue();
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_1_sum += vec[i];
      }
    }
    double method_1_seconds = timer.GetSeconds();
    // Method 2: reserve then push_back
    timer.Reset();
    int method_2_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec;
      vec.reserve(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec.push_back(values_rng.GenerateValue());
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_2_sum += vec[i];
      }
    }
    double method_2_seconds = timer.GetSeconds();
    // Report results as mean_size, method_1_seconds, method_2_seconds
    std::cout << mean_size << ", " << method_1_seconds << ", " << method_2_seconds;
    // Do something with the dummy sums that cannot be optimized out
    std::cout << ((method_1_sum > method_2_sum) ? "" : " ") << std::endl;
  }
  return 0;
}
The header files I used are located here:
rng.hpp
timer.hpp

Both variants have different semantics, i.e. you are comparing apples and oranges.
The first gives you a vector of n value-initialized elements (zeros, for int); the second variant reserves the memory but does not create or initialize any elements.
Choose what better fits your needs, i.e. what is "better" in a certain situation.

The "best" way would be:
vector<int> vec = {var1, var2, var3};
available with a C++11 capable compiler.
Not sure exactly what you mean by doing things in a header or implementation files. A mutable global is a no-no for me. If it is a class member, then it can be initialized in the constructor initialization list.
Otherwise, option 1 would be generally used if you know how many items you are going to use and the default values (0 for int) would be useful.
Using at here means that you can't guarantee the index is valid; a situation like that is alarming in itself. Even though you will be able to reliably detect problems, it's definitely simpler to use push_back and stop worrying about getting the indexes right.
In the case of option 2, it generally makes zero performance difference whether you reserve memory or not, so it's simpler not to reserve*, except perhaps when the vector contains types that are very expensive to copy (and don't provide fast moves in C++11), or when the size of the vector is going to be enormous.
* From Stroustrup's C++ Style and Technique FAQ:
People sometimes worry about the cost of std::vector growing incrementally. I used to worry about that and used reserve() to optimize the growth. After measuring my code and repeatedly having trouble finding the performance benefits of reserve() in real programs, I stopped using it except where it is needed to avoid iterator invalidation (a rare case in my code). Again: measure before you optimize.

While your examples are essentially the same, it may be that when the type used is not an int the choice is taken from you. If your type doesn't have a default constructor, or if you'll have to re-construct each element later anyway, I would use reserve. Just don't fall into the trap I did and use reserve and then the operator[] for initialisation!
Constructor
std::vector<MyType> myVec(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
In this first case, using the constructor, size and numberOfElementsToStart will be equal and capacity will be greater than or equal to them.
Think of myVec as a vector containing a number of items of MyType which can be accessed and modified; push_back(anotherInstanceOfMyType) will append it to the end of the vector.
Reserve
std::vector<MyType> myVec;
myVec.reserve(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
When using the reserve function, size will be 0 until you add an element to the vector, and capacity will be equal to or greater than numberOfElementsToStart.
Think of myVec as an empty vector which can have new items appended to it using push_back with no memory allocation for at least the first numberOfElementsToStart elements.
Note that push_back() still requires an internal check to ensure that size < capacity and to increment size, so you may want to weigh this against the cost of default construction.
List initialisation
std::vector<MyType> myVec{ var1, var2, var3 };
This is an additional option for initialising your vector, and while it is only feasible for very small vectors, it is a clear way to initialise a small vector with known values. size will be equal to the number of elements you initialised it with, and capacity will be equal to or greater than size. Modern compilers may optimise away the creation of temporary objects and prevent unnecessary copying.
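For illustration, here is a quick check of these claims (a minimal sketch; the exact capacity values beyond the guarantees described above are implementation-defined, so the comments say "typically"):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> a(4);            // constructor: 4 value-initialized elements
    std::vector<int> b;
    b.reserve(4);                     // reserve: size 0, capacity >= 4
    std::vector<int> c{ 1, 2, 3, 4 }; // list initialisation: 4 elements

    std::cout << a.size() << '/' << a.capacity() << '\n'; // typically 4/4
    std::cout << b.size() << '/' << b.capacity() << '\n'; // typically 0/4
    std::cout << c.size() << '/' << c.capacity() << '\n'; // typically 4/4
}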

Option 2 is better, as reserve only needs to reserve memory (3 * sizeof(T)), while the first option calls the default constructor of the element type for each cell inside the container.
For C-like types it will probably be the same.

How it Works
This is implementation specific; however, in general, the vector data structure internally holds a pointer to the memory block where the elements actually reside. Both GCC and VC++ allocate space for 0 elements by default, so you can think of the vector's internal memory pointer as being nullptr by default.
When you call vector<int> vec(N); as in Option 1, the N objects are created using the default constructor. This is called the fill constructor.
When you call vec.reserve(N); after the default constructor, as in Option 2, you get a data block big enough to hold N elements, but no objects are created, unlike in Option 1.
Why to Select Option 1
If you know the number of elements the vector will hold and you may leave most of the elements at their default values, then you might want to use this option.
Why to Select Option 2
This option is generally the better of the two, as it only allocates the data block for future use without filling it with objects created by the default constructor.
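To make that difference visible, here is a small sketch (Noisy is a made-up type that logs its default constructor):

#include <iostream>
#include <vector>

struct Noisy {
    Noisy() { std::cout << "default-constructed\n"; }
};

int main() {
    std::vector<Noisy> v1(3); // Option 1: prints "default-constructed" 3 times

    std::vector<Noisy> v2;
    v2.reserve(3);            // Option 2: prints nothing, memory only
}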

Since it seems 5 years have passed and a wrong answer is still the accepted one, and the most-upvoted answer is completely useless (it missed the forest for the trees), I will add a real response.
Method #1: we pass an initial size parameter into the vector, let's call it n. That means the vector is filled with n elements, which will be initialized to their default value. For example, if the vector holds ints, it will be filled with n zeros.
Method #2: we first create an empty vector. Then we reserve space for n elements. In this case, we never create the n elements and thus we never perform any initialization of the elements in the vector. Since we plan to overwrite the values of every element immediately, the lack of initialization will do us no harm. On the other hand, since we have done less overall, this would be the better* option.
* better - real definition: never worse. It's always possible a smart compiler will figure out what you're trying to do and optimize it for you.
Conclusion: use method #2.
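In code, method #2 is the following pattern (a sketch; compute_value stands in for whatever produces your n values):

std::vector<int> vec;
vec.reserve(n);                      // one allocation, no elements created
for (size_t i = 0; i < n; ++i)
    vec.push_back(compute_value(i)); // each element constructed exactly once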

In the long run, it depends on the usage and the number of elements.
Run the program below to see how the library implementation grows the capacity:
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> vec;
    for (int i = 0; i < 50; i++) {
        cout << "size=" << vec.size() << " capacity=" << vec.capacity() << endl;
        vec.push_back(i);
    }
}
size is the number of actual elements and capacity is the actual size of the underlying array that implements the vector.
On my computer, up to 10, both are the same. But when size is 43 the capacity is 63. Depending on the number of elements, either may be better. For example, growing the capacity may be expensive.

Another option is to Trust Your Compiler(tm) and do the push_backs without calling reserve first. It has to allocate some space when you start adding elements. Perhaps it does that just as well as you would?
It is "better" to have simpler code that does the same job.

I think the answer may depend on the situation. For instance:
Let's try to copy a simple vector to another vector. The vector holds an example class which has only an integer. In the first example, let's use reserve.
#include <iostream>
#include <vector>
#include <algorithm>

class example
{
public:
    // Copy constructor
    example(const example& p1)
    {
        std::cout << "copy" << std::endl;
        this->a = p1.a;
    }
    example(example&& o) noexcept
    {
        std::cout << "move" << std::endl;
        std::swap(o.a, this->a);
    }
    example(int a_)
    {
        std::cout << "const" << std::endl;
        a = a_;
    }
    example()
    {
        std::cout << "Def const" << std::endl;
    }
    int a;
};

int main()
{
    auto vec = std::vector<example>{1, 2, 3};
    auto vec2 = std::vector<example>{};
    vec2.reserve(vec.size());
    auto dst_vec2 = std::back_inserter(vec2);
    std::cout << "transform" << std::endl;
    std::transform(vec.begin(), vec.end(),
                   dst_vec2, [](const example& ex){ return ex; });
}
For this case, transform will call copy and move constructors.
The output of the transform part:
copy
move
copy
move
copy
move
Now let's remove the reserve and use the constructor instead.
#include <iostream>
#include <vector>
#include <algorithm>

class example
{
public:
    // Copy constructor
    example(const example& p1)
    {
        std::cout << "copy" << std::endl;
        this->a = p1.a;
    }
    example(example&& o) noexcept
    {
        std::cout << "move" << std::endl;
        std::swap(o.a, this->a);
    }
    example(int a_)
    {
        std::cout << "const" << std::endl;
        a = a_;
    }
    example()
    {
        std::cout << "Def const" << std::endl;
    }
    int a;
};

int main()
{
    auto vec = std::vector<example>{1, 2, 3};
    std::vector<example> vec2(vec.size());
    auto dst_vec2 = std::back_inserter(vec2);
    std::cout << "transform" << std::endl;
    std::transform(vec.begin(), vec.end(),
                   dst_vec2, [](const example& ex){ return ex; });
}
And in this case, the transform part produces:
copy
move
move
move
move
copy
move
copy
move
As can be seen, for this specific case reserve prevents the extra move operations: with reserve, vec2 starts empty and appending never reallocates, whereas the size constructor creates three default-constructed elements up front, so the push_backs through the back_inserter grow past capacity and the existing elements must be moved during reallocation. (It also leaves vec2 with six elements rather than three.)
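If the source vector may be consumed, a move-iterator variant of the reserve version would avoid the copies as well (a sketch; add #include <iterator> for std::make_move_iterator):

std::vector<example> vec2;
vec2.reserve(vec.size());
vec2.insert(vec2.end(),
            std::make_move_iterator(vec.begin()),
            std::make_move_iterator(vec.end())); // prints "move" once per element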

Related

How can I make my dynamic array or vector operate at a similar speed to a standard array? C++

I'm still quite inexperienced in C++ and I'm trying to write summation code to add numbers precisely. This is a dll plugin for some finite difference software and the code is called several million times during a run. I want to write a function where any number of arguments can be passed in and the sum will be returned. My code looks like:
#include <cstdarg>

double SumFunction(int numArgs, ...) { // this allows me to pass any number
                                       // of arguments to my function
    va_list args;
    va_start(args, numArgs); // necessary prerequisites for using cstdarg

    double myarray[10];
    for (int i = 0; i < numArgs; i++) {
        myarray[i] = va_arg(args, double);
    } // I imagine this is sloppy code; however I cannot create
      // myarray[numArgs] because numArgs is not a const int

    sum(myarray); // The actual method of addition is not relevant here, but
                  // for more complicated methods, I need to put the summation
                  // terms in a list.

    vector<double> vec(numArgs); // instead, place all values in a vector
    for (int i = 0; i < numArgs; i++) {
        vec.at(i) = va_arg(args, double);
    }
    sum(vec); // This would be passed by reference, of course. The function sum
              // doesn't actually exist; it would all be contained within the
              // current function. This method is twice as slow as placing
              // all the values in the static array.

    double *dvec = new double[numArgs]; // renamed from vec to avoid redeclaration
    for (int i = 0; i < numArgs; i++) {
        dvec[i] = va_arg(args, double);
    }
    sum(dvec); // Again half of the speed of using a standard array, and
               // increasing in magnitude for every extra dynamic array!
    delete[] dvec;

    va_end(args);
}
So the problem I have is that using an oversized static array is sloppy programming, but using either a vector or a dynamic array slows the program down considerably. So I really don't know what to do. Can anyone help, please?
One way to speed the code up (at the cost of making it more complicated) is to reuse a dynamic array or vector between calls, then you will avoid incurring the overhead of memory allocation and deallocation each time you call the function.
For example declare these variables outside your function either as global variables or as member variables inside some class. I'll just make them globals for ease of explanation:
double* sumArray = NULL;
int sumArraySize = 0;
In your SumFunction, check if the array exists and if not allocate it, and resize if necessary:
double SumFunction(int numArgs, ...) { // this allows me to pass any number
                                       // of arguments to my function
    va_list args;
    va_start(args, numArgs); // necessary prerequisites for using cstdarg

    // if the array has already been allocated, check if it is large enough and delete if not:
    if ((sumArray != NULL) && (numArgs > sumArraySize))
    {
        delete[] sumArray;
        sumArray = NULL;
    }
    // allocate the array, but only if necessary:
    if (sumArray == NULL)
    {
        sumArray = new double[numArgs];
        sumArraySize = numArgs;
    }

    double *vec = sumArray; // set to your array, reusable between calls
    for (int i = 0; i < numArgs; i++) {
        vec[i] = va_arg(args, double);
    }
    sum(vec, numArgs); // you will need to pass the array size
    va_end(args);
    // note: no array deallocation
}
The catch is that you need to remember to deallocate the array at some point by calling a function similar to this (like I said, you pay for speed with extra complexity):
void freeSumArray()
{
    if (sumArray != NULL)
    {
        delete[] sumArray;
        sumArray = NULL;
        sumArraySize = 0;
    }
}
You can take a similar (and simpler/cleaner) approach with a vector, allocate it the first time if it doesn't already exist, or call resize() on it with numArgs if it does.
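For illustration, a sketch of that vector-based variant (sum is the same placeholder as above; assumes #include <vector> and <cstdarg>):

std::vector<double> sumVec; // persists between calls, like sumArray above

double SumFunction(int numArgs, ...) {
    va_list args;
    va_start(args, numArgs);
    if (sumVec.size() < static_cast<size_t>(numArgs))
        sumVec.resize(numArgs);      // allocates only when growing
    for (int i = 0; i < numArgs; ++i)
        sumVec[i] = va_arg(args, double);
    sum(sumVec.data(), numArgs);     // as above, the actual summation/return is elided
    va_end(args);
}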
When using a std::vector the optimizer must consider that relocation is possible and this introduces an extra indirection.
In other words the code for
v[index] += value;
where v is for example a std::vector<int> is expanded to
int *p = v._begin + index;
*p += value;
i.e. starting from the vector you first need to load the field _begin (which records where the content starts in memory), then apply the index, and finally dereference to get the value and mutate it.
If the code performing the computation on the elements of the vector in a loop calls any unknown non-inlined code, the optimizer is forced to assume that unknown code may mutate the _begin field of the vector and this will require doing the two-steps indirection for each element.
(NOTE: the fact that the vector is passed by const std::vector<T>& reference is totally irrelevant: a const reference doesn't mean that the vector is const, it simply puts a limitation on what operations are permitted through that reference; external code could hold a non-const reference to the vector, and constness can also be legally cast away... constness of references is basically ignored by the optimizer).
One way to remove this extra lookup (if you know that the vector is not being resized during the computation) is to cache this address in a local and use that instead of the vector operator [] to access the element:
int *p = &v[0];
for (int i = 0, n = v.size(); i < n; i++) {
    // use p[i] instead of v[i]
}
This will generate code that is almost as efficient as a static array because, given that the address of p is not published, nothing in the body of the loop can change it and the value p can be assumed constant (something that cannot be done for v._begin as the optimizer cannot know if someone else knows the address of _begin).
I'm saying "almost" because a static array only requires indexing, while using a dynamically allocated area requires "base + indexing" access; most CPUs however provide this kind of memory access at no extra cost. Moreover if you're processing elements in sequence the indexing addressing becomes just a sequential memory access but only if you can assume the start address constant (i.e. not in the case of std::vector<T>::operator[]).
Assuming that the "max storage ever needed" is in the order of 10-50, I'd say using a local array is perfectly fine.
Using vector<T> will use 3 * sizeof(T*) (at least) to track the contents of the vector. So if we compare that to an array of double arr[10];, the array costs the equivalent of 7 more elements of stack space (or 8.5 in a 32-bit build). But you also need a call to new, which takes a size argument. So that takes up AT LEAST one, more likely 2-3 elements of stack space, and the implementation of new is quite possibly not straightforward, so further calls are needed, which take up further stack space.
If you "don't know" the number of elements and need to cope with quite large numbers of elements, then use a hybrid solution: a small stack-based local array, and if numArgs > small_size use a vector, then pass vec.data() to the function sum, as sketched below.
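A sketch of that hybrid approach (small_size chosen arbitrarily here; sum is the OP's placeholder; assumes #include <vector> and <cstdarg>):

const int small_size = 50;

double SumFunction(int numArgs, ...) {
    va_list args;
    va_start(args, numArgs);
    double local[small_size];    // stack storage for the common case
    std::vector<double> big;     // heap fallback for large argument counts
    double* p = local;
    if (numArgs > small_size) {
        big.resize(numArgs);
        p = big.data();
    }
    for (int i = 0; i < numArgs; ++i)
        p[i] = va_arg(args, double);
    sum(p, numArgs);             // summation itself elided, as above
    va_end(args);
}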

Resizing std::vector without destroying elements

I am using the same std::vector<int> all the time, in order to try to avoid allocating and deallocating repeatedly. In a few lines, my code is as follows:
std::vector<int> myVector;
myVector.reserve(4);

for (int i = 0; i < 100; ++i) {
    fillVector(myVector);
    // use of myVector
    // ....
    myVector.resize(0);
}
In each for iteration, myVector will be filled with up to 4 elements. In order to make the code efficient, I want to always reuse myVector. However, in myVector.resize() the elements in myVector are being destroyed. I understand that myVector.clear() will have the same effect.
I think if I could just overwrite the existing elements in myVector I could save some time. However I think the std::vector is not capable of doing this.
Is there any way of doing this? Does it make sense to create a home-grown implementation which overwrites elements ?
Your code is already valid (though myVector.clear() has better style than myVector.resize(0)).
An int "destructor" does nothing, so resize(0) just sets the size to 0; the capacity is untouched.
Simply don't keep resizing myVector. Instead, initialise it with 4 elements (with std::vector<int> myVector(4)) and just assign to the elements instead (e.g. myVector[0] = 5).
However, if it's always going to be fixed size, then you might prefer to use a std::array<int, 4>.
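For example (a minimal sketch of the std::array alternative):

#include <array>

std::array<int, 4> myArr{}; // fixed size, lives on the stack, no heap allocation
myArr[0] = 5;               // overwrite elements in place; size() is always 4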
Resizing a vector to 0 will not reduce its capacity and, since your element type is int, there are no destructors to run:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    std::cout << v.capacity() << ' ';
    v.resize(0);
    std::cout << v.capacity() << '\n';
}
// Output: 3 3
Therefore, your code already performs mostly optimally; the only further optimisation you could make would be to avoid the resize entirely, thereby eliminating the internal "set size to 0" inside std::vector, which likely comes down to an if statement and a data-member change.
std::vector is not a solution in this case. You don't want to resize/clear/(de)allocate all over again? Don't.
fillVector() fills the vector with a number of elements known only at each iteration.
A vector is internally represented as a contiguous block of memory of type T*.
You don't want to (de)allocate memory each time.
OK. Use a simple struct:
struct upTo4ElemVectorOfInts
{
    int data[4];
    size_t elems_num;
};
And modify fillVector() to save additional info:
void fillVector(upTo4ElemVectorOfInts& vec)
{
    // fill vec.data with values
    vec.elems_num = filled_num; // save how many values were filled in this iteration
}
Use it in the very same way:
upTo4ElemVectorOfInts myVector;
for (int i = 0; i < 100; ++i)
{
    fillVector(myVector);
    // use of myVector:
    // - myVector.data contains the data (the equivalent of std::vector<>::data())
    // - myVector.elems_num tells you how many numbers you should care about
    // nothing needs to be resized/cleared
}
Additional Note:
If you want a more general solution (operating on any type or size), you can, of course, use templates:
template <class T, size_t Size>
struct upToSizeElemVectorOfTs
{
    T data[Size];
    size_t elems_num;
};
and adjust fillVector() to accept the template instead of the known type, as sketched below.
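For instance (filled_num again stands for however many values this iteration produced):

template <class T, size_t Size>
void fillVector(upToSizeElemVectorOfTs<T, Size>& vec)
{
    // ... write up to Size values into vec.data ...
    vec.elems_num = filled_num; // record how many were filled
}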
This solution is probably the fastest one. You may think: "Hey, and what if I want to fill up to 100 elements? 1000? 10000? What then? A 10000-element array will consume a lot of storage!"
That storage would be consumed anyway: the vector resizes itself automatically, and these reallocations are out of your control and thus can be very inefficient. If your array is reasonably small and you can predict the maximum required size, always use fixed-size storage created on the local stack. It's faster, more efficient, and simpler. Of course this won't work for arrays of 1,000,000 elements (you would get a stack overflow in that case).
In fact what you have at present is
for (int i = 0; i < 100; ++i) {
    myVector.reserve(4);
    // use of myVector
    // ....
    myVector.resize(0);
}
I do not see any sense in that code.
Of course it would be better to use myVector.clear() instead of myVector.resize(0);
If you always overwrite exactly 4 elements of the vector inside the loop then you could use
std::vector<int> myVector( 4 );
instead of
std::vector<int> myVector;
myVector.reserve(4);
provided that the function fillVector(myVector) uses the subscript operator to access these 4 elements of the vector instead of the member function push_back.
Otherwise use clear as it was early suggested.

How best to fill a vector of vectors (avoiding wasting memory and unnecessary allocations & de-allocations)?

I want to fill a vector of vectors when individual vectors can have different size(), e.g.
std::vector<std::vector<big_data_type> > table;
std::vector<big_data_type> tmp;
for (auto i = 0; i != 4242; ++i) {
    tmp = make_vector(i);  // copy elision; calls new[] only for i=0
    table.push_back(tmp);  // copy calls new[] each time
}
My main issue is to avoid wasting memory on unused capacity. So my first question is:
Q1 Will the copy (made inside push_back) have capacity() == size() (what I want), or preserve whatever tmp had, or is this implementation dependent / undefined?
I was considering to move the individual vectors into the table
table.push_back(std::move(tmp)); // move
but that would surely preserve the capacity and hence waste memory. Moreover, this doesn't avoid the allocation of each individual vector, it only moves it into another place (inside make_vector instead of push_back).
Q2 I was wondering what difference it makes to omit the variable tmp, resulting in the more elegant looking code (2 instead of 5 lines):
for (auto i = 0; i != 4242; ++i)
    table.push_back(make_vector(i)); // move!
My initial thought is that this will construct and destruct another temporary at each iteration and hence generate many calls to new[] and delete[] (which will essentially re-use the same memory). However, in addition this will call the moving version of push_back and hence waste memory (see above). Correct?
Q3 Is it possible that the compiler "optimizes" my former code into this latter form and thus uses moving instead of copying (resulting in wasting memory)?
Q4 If I'm correct, it seems to me that all this implies that moving data automatically for temporary objects is a mixed blessing (as it prevents compacting). Is there are any way to explicitly suppress moving in the last code snipped, i.e. something like
for (auto i = 0; i != 4242; ++i)
    table.push_back(std::copy(make_vector(i))); // don't move!
Q1 Will the copy (made inside push_back) have capacity() == size() (what I want), or preserve whatever tmp had, or is this implementation dependent / undefined?
The standard never sets maximums for capacity, only minimums. That said, most implementations will have capacity() == size() for a fresh vector copy or capacity slightly rounded up to the blocksize of the allocator implementation.
Q2 I was wondering what difference it makes to omit the variable tmp, resulting in the more elegant looking code.
The result is to move into table instead of copying.
Q3 Is it possible that the compiler "optimizes" my former code into this latter form and thus uses moving instead of copying (resulting in wasting memory)?
It's possible but very unlikely. The compiler would have to prove that moving isn't observably different from copying, which is challenging enough that to my knowledge no current compiler tries.
Q4 If I'm correct, it seems to me that all this implies that moving data automatically for temporary objects is a mixed blessing (as it prevents compacting).
Moving is a speed optimization, not necessarily a space optimization. Copying may reduce the space, but it definitely will increase the processing time.
If you want to optimize space, your best bet is to use shrink_to_fit:
std::vector<std::vector<big_data_type> > table;
for(auto i=0; i!=4242; ++i) {
std::vector<big_data_type> tmp = make_vector(i); // copy elison
tmp.shrink_to_fit(); // shrink
table.push_back(std::move(tmp)); // move
}
EDIT: In-depth analysis.
Assumptions:
- table will have its space reserved in advance, since its size is known; we thus focus on allocations and deallocations of the vector<big_data_type>s that are returned from make_vector, stored temporarily in tmp, and finally in table.
- The return value of make_vector(i) may or may not have capacity == size. This analysis treats make_vector as opaque and ignores any allocations necessary to build the returned vector.
- A default-constructed vector has 0 capacity.
- reserve(n) sets capacity to exactly n if and only if n > capacity().
- shrink_to_fit() sets capacity == size. It may or may not be implemented to require a copy of the entire vector contents.
- The vector copy constructor sets capacity == size.
- std::vector may or may not provide the strong exception guarantee for copy assignment.
I'll parameterize the analysis on two positive integers: N, the number of vectors that will be in table at the end of the algorithm (4242 in the OP), and K, the total number of big_data_type objects contained in all vectors produced by make_vector during the course of the algorithm.
Your Technique
std::vector<std::vector<big_data_type> > table;
table.reserve(N);
std::vector<big_data_type> tmp;
for (auto i = 0; i != N; ++i) {
    tmp = make_vector(i);  // #1
    table.push_back(tmp);  // #2
}
// #3
For C++11
At #1, since tmp is already constructed, RVO/copy elision cannot occur. On every pass through the loop the return value is assigned to tmp. The assignment is a move: the old data in tmp is destroyed (except on the first iteration, when tmp is empty) and the contents of the return value from make_vector are moved into tmp with no copying taking place. tmp has capacity == size if and only if make_vector's return value has that property.
At #2, tmp is copied into table. The newly constructed copy in table has capacity == size, as desired. At #3, tmp leaves scope and its storage is deallocated.
Total allocations/deallocations: N. All N allocations occur at #2; N - 1 deallocations occur at #1, and one at #3.
Total copies (of big_data_type objects): K.
For Pre-C++11
At #1, since tmp is already constructed, RVO/copy elision cannot occur. On every pass through the loop the return value is assigned to tmp. This assignment requires an allocation and a deallocation if either (a) the implementation provides the strong guarantee, or (b) tmp is too small to contain all the elements in the returned vector. In any case the elements must be copied individually. At the end of the full expression, the temporary object that holds the return value from make_vector is destroyed, resulting in a deallocation.
At #2, tmp is copied into table. The newly constructed copy in table has capacity == size, as desired. At #3, tmp leaves scope and its storage is deallocated.
Total allocations/deallocations: N + 1 to 2 * N. 1 to N allocations at #1, N at #2; N to 2 * N - 1 deallocations at #1, and one at #3.
Total copies: 2 * K. K at #1 and K at #2.
My Technique (C++11-only)
std::vector<std::vector<big_data_type> > table;
table.reserve(N);
for (auto i = 0; i != N; ++i) {
    auto tmp = make_vector(i);          // #1
    tmp.shrink_to_fit();                // #2
    table.emplace_back(std::move(tmp)); // #3
}
At #1, tmp is freshly constructed from the return value of make_vector, so RVO/copy elision is possible. Even if the implementation of make_vector impedes RVO, tmp will be move-constructed, resulting in no allocations, deallocations, or copies.
At #2, shrink_to_fit may or may not require a single allocation and deallocation, depending on whether the return value from make_vector already has the capacity == size property. If allocation/deallocation occurs, the elements may or may not be copied depending on quality of implementation.
At #3, the contents of tmp are moved into a freshly constructed vector in table. No allocations/deallocations/copies are performed.
Total allocations/deallocations: 0 or N, all at #2, if and only if make_vector does not return vectors with capacity == size.
Total copies: 0 or K, all at #2, if and only if shrink_to_fit is implemented as a copy.
If the implementor of make_vector produces vectors with the capacity == size property and the standard library implements shrink_to_fit optimally, there are no news/deletes and no copies.
Conclusions
Worst case performance of My Technique is the same as expected case performance
of Your Technique. My technique is conditionally optimal.
Here are some run time tests with a helper type that counts creation, moving and copying:
#include <vector>
#include <iostream>
#include <iterator> // for std::make_move_iterator

struct big_data_type {
    double state;
    big_data_type( double d ):state(d) { ++counter; ++create_counter; }
    big_data_type():state(0.) { ++counter; }
    big_data_type( big_data_type const& o ): state(o.state) { ++counter; }
    big_data_type( big_data_type && o ): state(o.state) { ++move_counter; }
    big_data_type& operator=( big_data_type const& o ) {
        state = o.state;
        ++counter;
        return *this;
    }
    big_data_type& operator=( big_data_type && o ) {
        state = o.state;
        ++move_counter;
        return *this;
    }
    static int counter;
    static int create_counter;
    static int move_counter;
};
int big_data_type::move_counter = 0;
int big_data_type::create_counter = 0;
int big_data_type::counter = 0;

std::vector<big_data_type>& make_vector( int i, std::vector<big_data_type>& tmp ) {
    tmp.resize(0);
    tmp.reserve(1000);
    for( int j = 0; j < 10+i/100; ++j ) {
        tmp.emplace_back( 100. - j/10. );
    }
    return tmp;
}
std::vector<big_data_type> make_vector2( int i ) {
    std::vector<big_data_type> tmp;
    tmp.resize(0);
    tmp.reserve(1000);
    for( int j = 0; j < 10+i/100; ++j ) {
        tmp.emplace_back( 100. - j/10. );
    }
    return tmp;
}

enum option { a, b, c, d, e };

void test(option op) {
    std::vector<std::vector<big_data_type> > table;
    std::vector<big_data_type> tmp;
    for(int i=0; i!=10; ++i) {
        switch(op) {
        case a:
            table.emplace_back(make_vector(i, tmp));
            break;
        case b:
            tmp = make_vector2(i);
            table.emplace_back(tmp);
            break;
        case c:
            tmp = make_vector2(i);
            table.emplace_back(std::move(tmp));
            break;
        case d:
            table.emplace_back(make_vector2(i));
            break;
        case e: {
            std::vector<big_data_type> result;
            make_vector(i, tmp);
            result.reserve( tmp.size() );
            result.insert( result.end(),
                           std::make_move_iterator( tmp.begin() ),
                           std::make_move_iterator( tmp.end() ) );
            table.emplace_back(std::move(result));
            break;
        }
        }
    }
    std::cout << "Big data copied or created:" << big_data_type::counter << "\n";
    big_data_type::counter = 0;
    std::cout << "Big data created:" << big_data_type::create_counter << "\n";
    big_data_type::create_counter = 0;
    std::cout << "Big data moved:" << big_data_type::move_counter << "\n";
    big_data_type::move_counter = 0;
    std::size_t cap = 0;
    for (auto&& v:table)
        cap += v.capacity();
    std::cout << "Total capacity at end:" << cap << "\n";
}

int main() {
    std::cout << "A\n"; test(a);
    std::cout << "B\n"; test(b);
    std::cout << "C\n"; test(c);
    std::cout << "D\n"; test(d);
    std::cout << "E\n"; test(e);
}
Live example
Output:
+ g++ -O4 -Wall -pedantic -pthread -std=c++11 main.cpp
+ ./a.out
A
Big data copied or created:200
Big data created:100
Big data moved:0
Total capacity at end:100
B
Big data copied or created:200
Big data created:100
Big data moved:0
Total capacity at end:100
C
Big data copied or created:100
Big data created:100
Big data moved:0
Total capacity at end:10000
D
Big data copied or created:100
Big data created:100
Big data moved:0
Total capacity at end:10000
E
Big data copied or created:100
Big data created:100
Big data moved:100
Total capacity at end:100
E is an example of what you can do when your big data can be moved, which often isn't possible.
created refers only to explicitly created data (i.e., from the double): data "created on purpose". Copied or created refers to any time any big data is duplicated in a way that the source big data cannot be "discarded". And moved refers to any situation where big data is moved in a way that the source big data can be "discarded".
Cases a and b, which are identical in result, are probably what you are looking for. Note the explicit use of the tmp vector as an argument to make_vector: elision won't let you reuse the buffer, you have to be explicit about it.
In addition to Casey's post, I have the following remarks.
As jrok said in a comment here, shrink_to_fit is not guaranteed to do anything. However, if shrink_to_fit allocates memory for exactly size() elements, copies/moves the elements, and deallocates the original buffer, then this is exactly what the OP asked for.
My exact answer to Q4, that is,
Is there are any way to explicitly suppress moving in the last code snipped [...]?
is: Yes, you can do
for (auto i = 0; i != 4242; ++i)
    table.push_back(static_cast<const std::vector<big_data_type>&>(make_vector(i)));
The copy function suggested by the OP could be written as follow.
template <typename T>
const T& copy(const T& x) {
    return x;
}
and the code becomes
for (auto i = 0; i != 4242; ++i)
    table.push_back(copy(make_vector(i)));
But, honestly, I don't think this is a sensible thing to do.
The best place to make each element v of table satisfy v.size() == v.capacity() is in make_vector(), if possible. (As Casey said, the standard doesn't set any upper bound on capacity.) Then moving the result of make_vector() into table would be optimal in both senses (memory and speed). The OP's snippet should probably take care of table.size() instead.
In summary, the standard doesn't provide any way to force capacity to match size. There was a (sensible, IMHO) suggestion by Jon Kalb to make std::vector::shrink_to_fit at least as efficient (with respect to memory usage) as the shrink_to_fit idiom (which also doesn't guarantee anything). However, some members of the committee were not very keen on it and suggested that people should rather complain with their vendors or implement their own containers and allocation functions.
The vector of vectors construct brings a lot of unnecessary overhead in cases where data is only ever added at the end of the top level vector (as appears to be the case here).
The main issue is the separate buffer allocations and management for each individual entry in the top level vector.
It's much better to concatenate all the sub-entries together into a single contiguous buffer, if possible, with a separate buffer to index into this for each top level entry.
See this article (on my blog) for more discussion about this, and for an example implementation of a 'collapsed vector vector' class that wraps this kind of indexed buffer setup in a generic container object.
As I said before, this only applies if data only ever gets added at the end of your data structure, i.e. you don't come back later and push entries into arbitrary top-level sub-vectors, but in cases where this technique applies it can be quite a significant optimisation.
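A minimal sketch of the idea (my own illustration of such an indexed buffer, not the article's actual class):

#include <cstddef>
#include <vector>

template <class T>
struct collapsed_vector_vector {
    std::vector<T> data;                 // all sub-entries, concatenated
    std::vector<std::size_t> offsets{0}; // entry i spans [offsets[i], offsets[i+1])

    void append_entry(const std::vector<T>& entry) {
        data.insert(data.end(), entry.begin(), entry.end());
        offsets.push_back(data.size());
    }
    std::size_t entries() const { return offsets.size() - 1; }
    const T* begin_of(std::size_t i) const { return data.data() + offsets[i]; }
    const T* end_of(std::size_t i) const { return data.data() + offsets[i + 1]; }
};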
In general, if you want capacity to equal size you can use vector::shrink_to_fit()
http://www.cplusplus.com/reference/vector/vector/shrink_to_fit/
Okay, I think I learned a bit, but couldn't really find a complete answer. So let's first clarify the task:
We have a function filling in a vector. In order to avoid arguments about whether copy elision is possible or not, let's just assume that its definition is
void fill_vector(std::vector<big_data_type>& v, int i)
{
    v.clear();
    v.reserve(large_number); // allocates unless v.capacity() >= large_number
    for (int done = 0, k = 0; k < large_number && !done; ++k)
        v.push_back(get_more_big_data(i, done));
    // v.capacity() == v.size() is highly unlikely at this point.
}
Further we want to fill the table
std::vector<std::vector<big_data_type>> table;
with N entries, each generated by fill_vector(), in a way that (1) minimises memory usage in the table and (2) avoids unnecessary allocations/de-allocations. In plain C, there would be N+2 allocations and 1 de-allocation, and only the total number K of big_data_type actually provided by fill_vector() would be allocated. We should not need more with C++. Here is a possible C++ answer:
table.reserve(N); // allocates enough space for N vectors
size_t K = 0;     // count big_data_types in table
std::vector<big_data_type> tmp;
for (int n = 0; n != N; ++n) {
    fill_vector(tmp, n); // allocates at first iteration only
    K += tmp.size();
    table.emplace_back(tmp.begin(), tmp.end()); // allocates tmp.size() big_data_type
}
// de-allocates tmp
Thus we have N+2 allocations and 1 de-allocation as required, and no memory is wasted (no more than K big_data_type are allocated in table). The emplace_back calls the iterator-range constructor of std::vector (without passing information about the capacity of tmp) and implies a copy of each big_data_type. (If big_data_type can be moved, we could use std::make_move_iterator(tmp.begin()) etc.)
Note that no matter how we code this, we must do at least N+1 allocations (for table and each of its elements). This implies that usage of shrink_to_fit cannot be helpful, because at best it does one allocation and one de-allocation (unless capacity==size which we don't expect to happen with any probability), cancelling each other (so that the allocation cannot contribute towards the required sum of N+1). This is why some other answers were unacceptable.

c++ efficient data structure for appending data with overwriting?

I would like some container that I can very efficiently append a variable amount of elements to, but be able to trigger something so I can start overwriting from the beginning. With a std::list it would look something like this:
while (whatever)
{
    for (int i = 0; i < randNumber; ++i)
        list.push_back( foo() );
    // now want to reset
    list.clear();
}
The problem is that list.clear() is linear time, whereas I would really just like to go back to the beginning and start overwriting from there... I tried a vector, using vector[index++] = foo() and replacing the clear with index = 0, but you can't predict randNumber, so this does not work... What can I use instead to achieve this?
BTW vector clear does not seem to be constant time even if I have a trivial destructor:
struct rat
{
    rat(int* a, int* b) : a_(a), b_(b) {}
    int *a_;
    int *b_;
};

int main(int argc, char **argv)
{
    uint64_t start, end;
    int k = 0;
    vector<rat> v;
    for (int i = 0; i < 9000; ++i)
        v.push_back(rat(&k, &k));
    start = timer();
    v.clear();
    end = timer();
    cout << end - start << endl;
}
Just replace std::list with std::vector in your code. push_back will increment the size as needed, and clear will remove all elements from the container. Note: std::vector<>::clear() takes linear time in the size of the container, but the operations are destructions of the stored elements (if they have a non-trivial destructor), which you need to do anyway. For types with trivial destructors, std::vector<>::clear() behaves as a constant-time operation.
Do you have an upper bound on randNumber? If so, you could use std::vector::reserve() to speed up things. This way, you would have append in O(1) and remove in O(1).
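Applied to the original loop, that would look like this sketch (maxRandNumber is the assumed upper bound on randNumber; Foo stands in for whatever foo() returns):

std::vector<Foo> v;
v.reserve(maxRandNumber); // one allocation up front
while (whatever)
{
    for (int i = 0; i < randNumber; ++i)
        v.push_back( foo() );
    // ... use v ...
    v.clear(); // size -> 0; capacity and the allocation are kept
}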
NOTE! If the vector contains data-types with non-trivial destructor, clear takes O(n). However, if the destructor is trivial, clear takes O(1).
Comment from stl_construct.h:
/**
* Destroy a range of objects. If the value_type of the object has
* a trivial destructor, the compiler should optimize all of this
* away, otherwise the objects' destructors must be invoked.
*/