C++: efficient data structure for appending data with overwriting?

I would like some container that I can very efficiently append a variable amount of elements to, but be able to trigger something so I can start overwriting from the beginning. With a std::list it would look something like this:
while (whatever)
{
    for (int i = 0; i < randNumber; ++i)
        list.push_back(foo());
    // now want to reset
    list.clear();
}
The problem is that list.clear() takes linear time, whereas I would really just like to go back to the beginning and start overwriting from there. I tried a vector with vector[index++] = foo(), replacing the clear with index = 0, but you can't predict randNumber, so the vector cannot be sized up front and indexing past its size is invalid. What can I use instead to achieve this?
BTW, vector clear() does not seem to be constant time even though my type has a trivial destructor:
#include <cstdint>
#include <iostream>
#include <vector>
using namespace std;

struct rat
{
    rat(int* a, int* b) : a_(a), b_(b) {}
    int *a_;
    int *b_;
};

int main(int argc, char **argv)
{
    uint64_t start, end;
    int k = 0;
    vector<rat> v;
    for (int i = 0; i < 9000; ++i)
        v.push_back(rat(&k, &k));
    start = timer(); // timer() is some external high-resolution clock, not shown
    v.clear();
    end = timer();
    cout << end - start << endl;
}

Just replace std::list with std::vector in your code. push_back will grow the size as needed, and clear will remove all elements from the container. Note: std::vector<>::clear() takes time linear in the size of the container, but the work it does is destroying the stored elements (which you need to do anyway), and only if they have a non-trivial destructor. For types with trivial destructors, std::vector<>::clear() behaves as a constant-time operation in practice.
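For concreteness, a minimal sketch of the question's loop rewritten with std::vector (whatever, randNumber and foo() are placeholders from the question); after the first few iterations the capacity stabilises and no further allocations happen:
#include <vector>

std::vector<int> v;           // element type is whatever foo() returns
while (whatever)              // loop condition as in the question
{
    for (int i = 0; i < randNumber; ++i)
        v.push_back(foo());   // foo() and randNumber as in the question
    // ... use v ...
    v.clear(); // size becomes 0, but the capacity (the allocation) is kept,
               // so the next round of push_backs reuses the same memory
}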

Do you have an upper bound on randNumber? If so, you could use std::vector::reserve() to speed things up. This way, you would have append in O(1) and reset in O(1).
NOTE! If the vector contains a type with a non-trivial destructor, clear takes O(n). However, if the destructor is trivial, clear behaves as O(1) in practice.
Comment from libstdc++'s stl_construct.h:
/**
* Destroy a range of objects. If the value_type of the object has
* a trivial destructor, the compiler should optimize all of this
* away, otherwise the objects' destructors must be invoked.
*/
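If in doubt whether a type qualifies, C++11 lets you check at compile time; a small sketch using the question's rat struct:
#include <type_traits>

struct rat
{
    rat(int* a, int* b) : a_(a), b_(b) {}
    int *a_;
    int *b_;
};

// A user-provided constructor does not affect destruction:
// rat still has a trivial destructor.
static_assert(std::is_trivially_destructible<rat>::value,
              "clear() on a vector<rat> need not run per-element destructors");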

Related

Specifying the size of a vector in declaration vs using reserve [duplicate]

I know the size of a vector; what is the best way to initialize it?
Option 1:
vector<int> vec(3); //in .h
vec.at(0)=var1; //in .cpp
vec.at(1)=var2; //in .cpp
vec.at(2)=var3; //in .cpp
Option 2:
vector<int> vec; //in .h
vec.reserve(3); //in .cpp
vec.push_back(var1); //in .cpp
vec.push_back(var2); //in .cpp
vec.push_back(var3); //in .cpp
I guess Option 2 is better than Option 1. Is it? Any other options?
Somehow, a non-answer answer that is completely wrong has remained accepted and most upvoted for ~7 years. This is not an apples and oranges question. This is not a question to be answered with vague cliches.
For a simple rule to follow:
Option #1 is faster...
...but this probably shouldn't be your biggest concern.
Firstly, the difference is pretty minor. Secondly, as we crank up the compiler optimization, the difference becomes even smaller. For example, on my gcc-5.4.0, the difference is arguably trivial when running with level 3 compiler optimization (-O3).
So in general, I would recommend using method #1 whenever you encounter this situation. However, if you can't remember which one is optimal, it's probably not worth the effort to find out. Just pick either one and move on, because this is unlikely to ever cause a noticeable slowdown in your program as a whole.
These tests were run by sampling random vector sizes from a normal distribution, and then timing the initialization of vectors of these sizes using the two methods. We keep a dummy sum variable to ensure the vector initialization is not optimized out, and we randomize vector sizes and values to make an effort to avoid any errors due to branch prediction, caching, and other such tricks.
main.cpp:
/*
 * Test constructing and filling a vector in two ways: construction with size
 * then assignment, versus construction of an empty vector followed by push_back.
 * We collect dummy sums to prevent the compiler from optimizing out computation.
 */
#include <iostream>
#include <vector>
#include "rng.hpp"
#include "timer.hpp"

const size_t kMinSize = 1000;
const size_t kMaxSize = 100000;
const double kSizeIncrementFactor = 1.2;
const int kNumVecs = 10000;

int main() {
  for (size_t mean_size = kMinSize; mean_size <= kMaxSize;
       mean_size = static_cast<size_t>(mean_size * kSizeIncrementFactor)) {
    // Generate sizes from normal distribution
    std::vector<size_t> sizes_vec;
    NormalIntRng<size_t> sizes_rng(mean_size, mean_size / 10.0);
    for (int i = 0; i < kNumVecs; ++i) {
      sizes_vec.push_back(sizes_rng.GenerateValue());
    }
    Timer timer;
    UniformIntRng<int> values_rng(0, 5);
    // Method 1: construct with size, then assign
    timer.Reset();
    int method_1_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec[i] = values_rng.GenerateValue();
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_1_sum += vec[i];
      }
    }
    double method_1_seconds = timer.GetSeconds();
    // Method 2: reserve then push_back
    timer.Reset();
    int method_2_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec;
      vec.reserve(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec.push_back(values_rng.GenerateValue());
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_2_sum += vec[i];
      }
    }
    double method_2_seconds = timer.GetSeconds();
    // Report results as mean_size, method_1_seconds, method_2_seconds
    std::cout << mean_size << ", " << method_1_seconds << ", " << method_2_seconds;
    // Do something with the dummy sums that cannot be optimized out
    std::cout << ((method_1_sum > method_2_sum) ? "" : " ") << std::endl;
  }
  return 0;
}
The header files I used are located here:
rng.hpp
timer.hpp
Both variants have different semantics, i.e. you are comparing apples and oranges.
The first gives you a vector of n value-initialized elements (zero for int); the second variant reserves the memory but does not create any elements.
Choose what better fits your needs, i.e. what is "better" in a certain situation.
The "best" way would be:
vector<int> vec = {var1, var2, var3};
available with a C++11 capable compiler.
Not sure exactly what you mean by doing things in a header or implementation files. A mutable global is a no-no for me. If it is a class member, then it can be initialized in the constructor initialization list.
Otherwise, option 1 would be generally used if you know how many items you are going to use and the default values (0 for int) would be useful.
Using at here means that you can't guarantee the index is valid. A situation like that is alarming in itself. Even though you will be able to reliably detect problems, it's definitely simpler to use push_back and stop worrying about getting the indexes right.
In case of option 2, it generally makes zero performance difference whether you reserve memory or not, so it's simpler not to reserve*, unless perhaps the vector contains types that are very expensive to copy (and don't provide fast moves in C++11), or the size of the vector is going to be enormous.
* From Stroustrup's C++ Style and Technique FAQ:
People sometimes worry about the cost of std::vector growing
incrementally. I used to worry about that and used reserve() to
optimize the growth. After measuring my code and repeatedly having
trouble finding the performance benefits of reserve() in real
programs, I stopped using it except where it is needed to avoid
iterator invalidation (a rare case in my code). Again: measure before
you optimize.
While your examples are essentially the same, it may be that when the type used is not an int the choice is taken from you. If your type doesn't have a default constructor, or if you'll have to re-construct each element later anyway, I would use reserve. Just don't fall into the trap I did and use reserve and then the operator[] for initialisation!
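To make that trap concrete, a minimal sketch; the commented-out line compiles, but it is undefined behaviour:
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(3);    // capacity() >= 3, but size() is still 0
    // v[0] = 42;    // WRONG: no element exists at index 0 yet, so this
    //               // would be undefined behaviour despite the reserve
    v.push_back(42); // correct: constructs the element and makes size() 1
}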
Constructor
std::vector<MyType> myVec(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
In this first case, using the constructor, size and numberOfElementsToStart will be equal and capacity will be greater than or equal to them.
Think of myVec as a vector containing a number of items of MyType which can be accessed and modified; push_back(anotherInstanceOfMyType) will append it to the end of the vector.
Reserve
std::vector<MyType> myVec;
myVec.reserve(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
When using the reserve function, size will be 0 until you add an element to the array and capacity will be equal to or greater than numberOfElementsToStart.
Think of myVec as an empty vector which can have new items appended to it using push_back with no memory allocation for at least the first numberOfElementsToStart elements.
Note that push_back() still requires an internal check to ensure that size < capacity and to increment size, so you may want to weigh this against the cost of default construction.
List initialisation
std::vector<MyType> myVec{ var1, var2, var3 };
This is an additional option for initialising your vector, and while it is only feasible for very small vectors, it is a clear way to initialise a small vector with known values. size will be equal to the number of elements you initialised it with, and capacity will be equal to or greater than size. Modern compilers may optimise away the creation of temporary objects and prevent unnecessary copying.
Option 2 is better, as reserve only needs to allocate memory (here 3 * sizeof(T)), while the first option calls the default constructor of the element type for each cell inside the container.
For C-like types it will probably be the same.
How it Works
This is implementation specific; however, in general the vector data structure internally holds a pointer to the memory block where the elements actually reside. Both GCC and VC++ allocate for 0 elements by default, so you can think of the vector's internal memory pointer as nullptr by default.
When you call vector<int> vec(N); as in your Option 1, N objects are created using the default constructor. This is called the fill constructor.
When you do vec.reserve(N); after the default constructor, as in Option 2, you get a data block to hold N elements, but no objects are created, unlike in Option 1.
Why Select Option 1
If you know the number of elements the vector will hold and you might leave most of the elements at their default values, then you might want to use this option.
Why Select Option 2
This option is generally the better of the two, as it only allocates the data block for future use without filling it with objects created by the default constructor.
Since it seems 5 years have passed and a wrong answer is still the accepted one, and the most-upvoted answer is completely useless (missed the forest for the trees), I will add a real response.
Method #1: we pass an initial size parameter into the vector, let's call it n. That means the vector is filled with n elements, which will be initialized to their default value. For example, if the vector holds ints, it will be filled with n zeros.
Method #2: we first create an empty vector. Then we reserve space for n elements. In this case, we never create the n elements and thus we never perform any initialization of the elements in the vector. Since we plan to overwrite the values of every element immediately, the lack of initialization will do us no harm. On the other hand, since we have done less overall, this would be the better* option.
* better - real definition: never worse. It's always possible a smart compiler will figure out what you're trying to do and optimize it for you.
Conclusion: use method #2.
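As a quick sanity check of the semantics described above, a minimal demo; the printed values are guaranteed by the standard:
#include <iostream>
#include <vector>

int main() {
    std::vector<int> a(3); // method #1: three zero-initialized elements
    std::vector<int> b;
    b.reserve(3);          // method #2: no elements, only raw capacity
    std::cout << a.size() << ' ' << a[0] << '\n';         // prints "3 0"
    std::cout << b.size() << ' ' << b.capacity() << '\n'; // prints "0" and >= 3
    return 0;
}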
In the long run, it depends on the usage and the number of elements.
Run the program below to see how the implementation grows the capacity:
vector<int> vec;
for (int i = 0; i < 50; i++)
{
    cout << "size=" << vec.size() << " capacity=" << vec.capacity() << endl;
    vec.push_back(i);
}
size is the number of actual elements and capacity is the actual size of the array used to implement the vector.
On my computer the two stay equal up to 10 elements, but when the size is 43 the capacity is 63. Depending on the number of elements, either method may be better; for example, repeatedly growing the capacity can be expensive.
Another option is to Trust Your Compiler(tm) and do the push_backs without calling reserve first. It has to allocate some space when you start adding elements. Perhaps it does that just as well as you would?
It is "better" to have simpler code that does the same job.
I think the answer may depend on the situation. For instance:
Let's try to copy a simple vector to another vector. The vector holds an example class which has only an integer. In the first example, let's use reserve.
#include <iostream>
#include <vector>
#include <algorithm>

class example
{
public:
    // Copy constructor
    example(const example& p1)
    {
        std::cout << "copy" << std::endl;
        this->a = p1.a;
    }
    example(example&& o) noexcept
    {
        std::cout << "move" << std::endl;
        std::swap(o.a, this->a);
    }
    example(int a_)
    {
        std::cout << "const" << std::endl;
        a = a_;
    }
    example()
    {
        std::cout << "Def const" << std::endl;
    }
    int a;
};

int main()
{
    auto vec = std::vector<example>{1, 2, 3};
    auto vec2 = std::vector<example>{};
    vec2.reserve(vec.size());
    auto dst_vec2 = std::back_inserter(vec2);
    std::cout << "transform" << std::endl;
    std::transform(vec.begin(), vec.end(),
                   dst_vec2, [](const example& ex){ return ex; });
}
For this case, transform will call copy and move constructors.
The output of the transform part:
copy
move
copy
move
copy
move
Now let's remove the reserve and use the sizing constructor instead.
#include <iostream>
#include <vector>
#include <algorithm>

class example
{
public:
    // Copy constructor
    example(const example& p1)
    {
        std::cout << "copy" << std::endl;
        this->a = p1.a;
    }
    example(example&& o) noexcept
    {
        std::cout << "move" << std::endl;
        std::swap(o.a, this->a);
    }
    example(int a_)
    {
        std::cout << "const" << std::endl;
        a = a_;
    }
    example()
    {
        std::cout << "Def const" << std::endl;
    }
    int a;
};

int main()
{
    auto vec = std::vector<example>{1, 2, 3};
    std::vector<example> vec2(vec.size());
    auto dst_vec2 = std::back_inserter(vec2);
    std::cout << "transform" << std::endl;
    std::transform(vec.begin(), vec.end(),
                   dst_vec2, [](const example& ex){ return ex; });
}
And in this case the transform part produces:
copy
move
move
move
move
copy
move
copy
move
As can be seen, in this specific case reserve prevents the extra copy and move operations: with reserve there are no pre-initialized elements, so the vector never grows past its initial capacity, and no reallocation moves occur.

Moving from old C-style pointer to C++ smart pointers with little changes in the code?

I have a function in which the nodes of a binary 'tree' are populated with values recursively computed based on the input vector, which represents the values on the leaves. An old C++ implementation of the function is as follows
using namespace std;
double f(const size_t h, vector<double>& input) {
    double** x = new double*[h+1];
    x[0] = input.data();
    for (size_t l = 1; l <= h; l++)
        x[l] = new double[1 << (h-l)];
    // do the computations on the tree where x[l][n] is the value
    // on the node n at level l.
    double result = x[h][0]; // the root is the single node at level h
    for (size_t l = 1; l <= h; l++)
        delete[] x[l];
    delete[] x;
    return result;
}
Now I'm trying to write a 'modern' implementation of the code using smart pointers in C++11/C++14. I attempted to define x using the std::unique_ptr specialization for arrays so that I would not have to change the 'computation' procedure. The obvious problem with such an approach is that the contents of input will be deleted at the end of the function, because the unique pointer that takes ownership of the data is destroyed there.
One simple (and perhaps safe) solution would be to allocate the memory for the whole tree (including the leaves) in x and copy the values of the leaves from input to x[0] at the beginning of the function (in this case I could even use nested std::vectors instead of std::unique_ptrs specialized for arrays as the type of x). But I prefer to avoid the cost of copying.
Alternatively, one can change the computational procedures to read the values of the leaves directly from input rather than from x, but that requires changing too many small pieces of the code.
Is there any other way to do this?
C++11/14 didn't really introduce anything for this that wasn't already achievable before with std::vector, which remains the modern way to manage the memory of dynamic arrays.
The obvious problem with [std::unique_ptr] is that the contents of input will be deleted at the end of the function
Indeed. You may not "steal" the buffer of the input vector (except into another vector, by swapping or moving); doing so would lead to undefined behaviour.
Alternatively, one can change the computational procedures to read the values of the leaves directly from input rather than from x, but that requires changing too many small pieces of the code.
This alternative makes a lot of sense. It is unclear why the input vector must be pointed by x[0]. The loops start from 1, so it appears to not be used by them. If it is only ever referenced directly, then it would make much more sense to use the input argument itself. With the shown code, I expect that this would simplify your function greatly.
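For illustration, a sketch of what reading the leaves directly from input could look like; the actual computation is elided in the question, so the combine() in the comments is hypothetical:
#include <vector>

double f(const std::size_t h, const std::vector<double>& input) {
    // x[0] is intentionally left empty: level 1 reads from input directly
    std::vector<std::vector<double>> x(h + 1);
    for (std::size_t l = 1; l <= h; ++l)
        x[l].resize(std::size_t(1) << (h - l));
    // level 1 would combine pairs of leaves straight from the input:
    //   x[1][n] = combine(input[2*n], input[2*n + 1]);
    // levels 2..h would combine pairs from x[l-1] as before
    return x[h][0]; // the root is the single node at level h
}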
Also the fact that the input is not taken as const std::vector& bothers me.
This is another reason not to point to the input vector from the modifiable x[0]. The limitation can, however, be worked around using const_cast. This is the kind of situation that const_cast is for.
Let us assume henceforth that it makes sense for the input to be part of the local array of arrays.
One simple (and perhaps safe) solution would be to allocate the memory for the whole tree (including the leaves) in x ... I could even use nested std::vectors ... But I prefer to avoid the cost of copying.
You don't necessarily need to copy if you use a vector of vectors. You can swap or move the input vector into x[0]. Once the processing is complete, you can restore the input if so desired by swapping or moving back. None of this is necessary if you keep the input separate as suggested.
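A minimal sketch of that swap-in/swap-out idea (names follow the question; note that an exception thrown between the two swaps would leave input empty):
#include <vector>

double f(const std::size_t h, std::vector<double>& input) {
    std::vector<std::vector<double>> x(h + 1);
    x[0].swap(input); // take the leaves without copying
    for (std::size_t l = 1; l <= h; ++l)
        x[l].resize(std::size_t(1) << (h - l));
    // ... computations on the tree, x[l][n] as before ...
    double result = x[h][0]; // the root sits alone at level h
    x[0].swap(input); // hand the leaves back to the caller
    return result;
}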
I suggest another approach. The following suggestion is primarily a performance optimization, since it reduces the number of allocations to 2. As a bonus, it just so happens to also easily fit with your desire to point to input vector from the local array of arrays. The idea is to allocate all of the tree in one flat vector, and allocate another vector for bare pointers into the content vector.
Here is an example that uses the input vector as x[0], but it is easy to change if you choose to use input directly.
double f(const size_t h, const std::vector<double>& input) {
    std::vector<double*> x(h + 1);
    x[0] = const_cast<double*>(input.data()); // take extra care not to modify x[0]
    // (1 << 0) + (1 << 1) + ... + (1 << (h-1)) == (1 << h) - 1
    std::vector<double> tree((1 << h) - 1);
    for (std::size_t index = 0, l = 1; l <= h; l++) {
        x[l] = &tree[index];
        index += (1 << (h - l));
    }
    // do the computations on the tree where x[l][n] is the value
    // on the node n at level l.
    return x[h][0]; // the root lives at level h
}
This certainly looks like a job for a std::vector<std::vector<double>>, not std::unique_ptr, but with the additional complexity that you conceptually want the vector to own only a part of its contents, while the first element is a non-owned reference to the input vector (and not a copy).
That's not directly possible, but you can add an additional layer of indirection to achieve the desired effect. If I understand your problem correctly, you want x to behave such that it supports an operator[] where an argument of 0 refers to input, whereas arguments > 0 refer to data owned by x itself.
I'd write a simple container implemented in terms of std::vector for that. Here is a toy example; I've called the container SpecialVector:
#include <vector>
double f(const std::size_t h, std::vector<double>& input) {
    struct SpecialVector {
        SpecialVector(std::vector<double>& input) :
            owned(),
            input(input)
        {}

        std::vector<std::vector<double>> owned;
        std::vector<double>& input;

        std::vector<double>& operator[](int index) {
            if (index == 0) {
                return input;
            } else {
                return owned[index - 1];
            }
        }

        void add(int size) {
            owned.emplace_back(size);
        }
    };

    SpecialVector x(input);
    for (std::size_t l = 1; l <= h; l++)
        x.add(1 << (h-l));
    // do the computations on the tree where x[l][n] is the value
    // on the node n at level l.
    auto result = x[h][0]; // the root of the tree
    return result;
}

int main() {
    std::vector<double> input { 1.0, 2.0, 3.0 };
    f(10, input);
}
This approach allows the rest of the legacy code to continue to use [] exactly as it did before.
Write a class Row which contains an ownership flag controlling destruction behaviour, implement operator[], and then create a vector of Rows.
As noted above, you have issues if input is constant: you cannot enforce constness at the compiler level, and you have to be careful not to write where you must not, but this is no worse than what you have now.
I have not tried to compile it, but your new Row class could look a bit like this:
class Row
{
    double *p;
    bool f; // true if this Row owns (and must delete) its storage
public:
    Row() : p(0), f(false) {}
    void init(size_t n) { p = new double[n]; f = true; }
    void init(double *d) { p = d; f = false; }
    double& operator[](size_t i) { return p[i]; }
    ~Row() { if (f) delete[] p; }
};
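Equally untested, usage would look roughly like this, mirroring the loops from the question (the sketch relies on the vector never reallocating, since Row does not define copy operations):
std::vector<Row> x(h + 1);
x[0].init(input.data());                  // non-owning view of the leaves
for (std::size_t l = 1; l <= h; ++l)
    x[l].init(std::size_t(1) << (h - l)); // owning rows, freed by ~Row
// ... computations using x[l][n] as before ...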

How can I make my dynamic array or vector operate at a similar speed to a standard array? (C++)

I'm still quite inexperienced in C++ and I'm trying to write summation code to add numbers precisely. This is a DLL plugin for some finite difference software, and the code is called several million times during a run. I want to write a function where any number of arguments can be passed in and the sum will be returned. My code looks like:
#include <cstdarg>
#include <vector>
using std::vector;

// The three variants below are alternatives; only one of them
// would be kept in the real function.
double SumFunction(int numArgs, ...){ // this allows me to pass any number
                                      // of arguments to my function.
    va_list args;
    va_start(args, numArgs); // necessary prerequisites for using cstdarg

    // Variant 1: oversized local array
    double myarray[10];
    for (int i = 0; i < numArgs; i++) {
        myarray[i] = va_arg(args, double);
    } // I imagine this is sloppy code; however I cannot create
      // myarray[numArgs] because numArgs is not a const int.
    sum(myarray); // The actual method of addition is not relevant here, but
                  // for more complicated methods, I need to put the summation
                  // terms in a list.

    // Variant 2: vector
    vector<double> vec(numArgs); // instead, place all values in a vector
    for (int i = 0; i < numArgs; i++) {
        vec.at(i) = va_arg(args, double);
    }
    sum(vec); // This would be passed by reference, of course. The function sum
              // doesn't actually exist; it would all be contained within the
              // current function. This method is twice as slow as placing
              // all the values in the static array.

    // Variant 3: dynamic array
    double *dvec = new double[numArgs];
    for (int i = 0; i < numArgs; i++) {
        dvec[i] = va_arg(args, double);
    }
    sum(dvec); // Again half of the speed of using a standard array and
               // increasing in magnitude for every extra dynamic array!
    delete[] dvec;

    va_end(args);
}
So the problem I have is that using an oversized static array is sloppy programming, but using either a vector or a dynamic array slows the program down considerably. So I really don't know what to do. Can anyone help, please?
One way to speed the code up (at the cost of making it more complicated) is to reuse a dynamic array or vector between calls; that way you avoid incurring the overhead of memory allocation and deallocation each time you call the function.
For example declare these variables outside your function either as global variables or as member variables inside some class. I'll just make them globals for ease of explanation:
double* sumArray = NULL;
int sumArraySize = 0;
In your SumFunction, check if the array exists and if not allocate it, and resize if necessary:
double SumFunction(int numArgs, ...){ // this allows me to pass any number
                                      // of arguments to my function.
    va_list args;
    va_start(args, numArgs); // necessary prerequisites for using cstdarg

    // if the array has already been allocated, check if it is large enough and delete if not:
    if ((sumArray != NULL) && (numArgs > sumArraySize))
    {
        delete[] sumArray;
        sumArray = NULL;
    }
    // allocate the array, but only if necessary:
    if (sumArray == NULL)
    {
        sumArray = new double[numArgs];
        sumArraySize = numArgs;
    }

    double *vec = sumArray; // set to your array, reusable between calls
    for (int i = 0; i < numArgs; i++) {
        vec[i] = va_arg(args, double);
    }
    va_end(args);
    // note: no array deallocation here; the buffer is reused on the next call
    return sum(vec, numArgs); // you will need to pass the array size
}
The catch is that you need to remember to deallocate the array at some point by calling a function similar to this (like I said, you pay for speed with extra complexity):
void freeSumArray()
{
    if (sumArray != NULL)
    {
        delete[] sumArray;
        sumArray = NULL;
        sumArraySize = 0;
    }
}
You can take a similar (and simpler/cleaner) approach with a vector: allocate it the first time if it doesn't already exist, or call resize() on it with numArgs if it does. A sketch follows.
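A sketch of the vector variant; here a function-local static keeps the buffer alive between calls (like the globals above this is not thread-safe, and sum() is still the question's hypothetical summation routine):
#include <cstdarg>
#include <vector>

double SumFunction(int numArgs, ...)
{
    static std::vector<double> vals; // reused between calls, grows as needed
    vals.resize(numArgs); // reallocates only until the high-water mark is reached

    va_list args;
    va_start(args, numArgs);
    for (int i = 0; i < numArgs; i++)
        vals[i] = va_arg(args, double);
    va_end(args);

    return sum(vals); // pass by reference, as in the question
}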
When using a std::vector the optimizer must consider that relocation is possible and this introduces an extra indirection.
In other words the code for
v[index] += value;
where v is for example a std::vector<int> is expanded to
int *p = v._begin + index;
*p += value;
i.e. from the vector you first need to load the field _begin (which holds the address where the content starts in memory), then apply the index, and then dereference to get the value and mutate it.
If the code performing the computation on the elements of the vector in a loop calls any unknown non-inlined code, the optimizer is forced to assume that the unknown code may mutate the _begin field of the vector, and this will require doing the two-step indirection for each element.
(NOTE: the fact that the vector is passed by const std::vector<T>& reference is totally irrelevant: a const reference doesn't mean that the vector is const, but simply puts a limitation on what operations are permitted through that reference; external code could hold a non-const reference to the vector, and constness can also be legally cast away... constness of references is basically ignored by the optimizer.)
One way to remove this extra lookup (if you know that the vector is not being resized during the computation) is to cache this address in a local and use that instead of the vector operator [] to access the element:
int *p = &v[0];
for (int i = 0, n = v.size(); i < n; i++) {
    // use p[i] instead of v[i]
}
This will generate code that is almost as efficient as a static array because, given that the address of p is not published, nothing in the body of the loop can change it and the value p can be assumed constant (something that cannot be done for v._begin as the optimizer cannot know if someone else knows the address of _begin).
I'm saying "almost" because a static array only requires indexing, while using a dynamically allocated area requires "base + indexing" access; most CPUs however provide this kind of memory access at no extra cost. Moreover if you're processing elements in sequence the indexing addressing becomes just a sequential memory access but only if you can assume the start address constant (i.e. not in the case of std::vector<T>::operator[]).
Assuming that the "max storage ever needed" is in the order of 10-50, I'd say using a local array is perfectly fine.
Using vector<T> will take at least 3 * sizeof(T*) (typically pointers to the start, end, and end of capacity) to track the contents of the vector. So if we compare that to an array of double arr[10];, that's several pointer-sized words more on the stack. But you also need a call to new, which takes a size argument. So that takes up at least one, more likely 2-3, elements of stack space, and the implementation of new is quite possibly not straightforward, so further calls are needed, which take up further stack space.
If you "don't know" the number of elements and need to cope with quite large counts, then use a hybrid solution: a small stack-based local array for the common case, and a vector when numArgs > small_size; then pass vec.data() (or the local array) to the function sum. A sketch follows.
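Here is a sketch of that hybrid; kSmallSize is an arbitrary cutoff, and sum() is again the question's hypothetical routine, assumed to take a pointer and a count as in the answer above:
#include <cstdarg>
#include <vector>

double SumFunction(int numArgs, ...)
{
    const int kSmallSize = 32; // arbitrary cutoff for the no-allocation path
    double local[kSmallSize];  // stack buffer for the common case
    std::vector<double> big;
    double* vals = local;
    if (numArgs > kSmallSize) {
        big.resize(numArgs); // heap allocation only for the rare large calls
        vals = big.data();
    }

    va_list args;
    va_start(args, numArgs);
    for (int i = 0; i < numArgs; i++)
        vals[i] = va_arg(args, double);
    va_end(args);

    return sum(vals, numArgs);
}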


delete [] performance issues

I wrote a program which solves the flow shop scheduling problem.
I need help with optimizing the slowest parts of my program.
Firstly, there is the 2D array allocation:
this->_perm = new Chromosome*[f];
// ...
for (...)
    this->_perm[i] = new Chromosome[fM1];
It works just fine, but a problem occurs when I try to delete the array:
delete [] _perm[i];
It takes extremely long to execute the line above. Each row is an array of about 300k Chromosome objects; allocating it takes less than a second, but deleting it takes far more than a minute.
I would appreciate any suggestions of improving delete part.
On a general note, you should never manually manage memory in C++. This will lead to leaks, double-deletions and all kinds of nasty inconveniences. Use proper resource-handling classes for this. For example, std::vector is what you should use for managing a dynamically allocated array.
To get back to your problem at hand, you first need to know what delete [] _perm[i] does: It calls the destructor for every Chromosome object in that array and then frees the memory. Now you do this in a loop, which means this will call all Chromosome destructors and perform f deallocations. As was already mentioned in a comment to your question, it is very likely that the Chromosome destructor is the actual culprit. Try to investigate that.
You can, however, change your memory handling to improve the speed of allocation and deallocation. As Nawaz has shown, you could allocate one big chunk of memory and use that. I'd use a std::vector for a buffer:
void f(std::size_t row, std::size_t col)
{
    std::size_t sizeMemory = sizeof(Chromosome) * row * col;
    std::vector<unsigned char> buffer(sizeMemory); // allocation of memory at once!
    std::vector<Chromosome*> chromosomes(row);
    // use algorithm as shown by Nawaz to construct the objects in buffer
    std::size_t j = 0;
    for (std::size_t i = 0; i < row; i++)
    {
        //...
    }
    make_baby(chromosomes); // use chromosomes
    in_place_destruct(chromosomes.begin(), chromosomes.end());
    // automatic freeing of memory holding pointers in chromosomes
    // automatic freeing of buffer memory
}

template< typename InpIt >
void in_place_destruct(InpIt begin, InpIt end)
{
    typedef typename std::iterator_traits<InpIt>::value_type value_type; // to call dtor
    while (begin != end)
        (begin++)->~value_type(); // call dtor
}
However, despite handling all memory through std::vector this still is not fully exception-safe, as it needs to call the Chromosome destructors explicitly. (If make_baby() throws an exception, the function f() will be aborted early. While the destructors of the vectors will delete their content, one only contains pointers, and the other treats its content as raw memory. No guard is watching over the actual objects created in that raw memory.)
The best solution I can see is to use a one-dimensional array wrapped in a class that allows two-dimensional access to the elements in that array. (Memory is one-dimensional on current hardware, after all, so the system is already doing this.) Here's a sketch of that:
class chromosome_matrix {
public:
    chromosome_matrix(std::size_t row, std::size_t col)
        : row_(row), col_(col), data_(row*col)
    {
        // data_ contains row*col constructed Chromosome objects
    }

    // not needed, the compiler-generated dtor will do the right thing
    //~chromosome_matrix()

    // these rely on pointer arithmetic to access a row
    Chromosome* operator[](std::size_t row) { return &data_[row*col_]; }
    const Chromosome* operator[](std::size_t row) const { return &data_[row*col_]; }

private:
    std::size_t row_;
    std::size_t col_;
    std::vector<Chromosome> data_;
};
void f(std::size_t row, std::size_t col)
{
    chromosome_matrix cm(row, col);
    Chromosome* first_row = cm[0];          // get a whole row
    Chromosome& chromosome1 = first_row[0]; // get one object
    Chromosome& chromosome2 = cm[1][2];     // access an object directly
    // make baby
}
Check your destructors.
If you were allocating a built-in type (eg an int) then allocating 300,000 of them would be more expensive than the corresponding delete. But that's a relative term, 300k allocated in a single block is pretty fast.
As you're allocating 300k Chromosomes, the allocator has to allocate 300k * sizeof(Chromosome), and as you say it's fast. I can't see it doing much besides just that (i.e. the constructor calls are optimised into nothingness).
However, when you come to delete, it not only frees up all that memory, it also calls the destructor for each object, and if it's slow, I would guess that the destructor of each object takes a small but noticeable time when you have 300k of them.
I would suggest you use placement new. The allocation and deallocation can each be done in just one statement!
int sizeMemory = sizeof(Chromosome) * row * col;
char* buffer = new char[sizeMemory]; // allocation of memory at once!
                                     // (new char[] returns memory suitably
                                     // aligned for any fundamental type)
vector<Chromosome*> chromosomes;
chromosomes.reserve(row);

int j = 0;
for (int i = 0; i < row; i++)
{
    // only construction of objects. No allocation!
    // (caveat: array placement new may add an implementation-defined cookie;
    // in portable code, constructing the elements one by one is safer)
    Chromosome *pChromosome = new (&buffer[j]) Chromosome[col];
    chromosomes.push_back(pChromosome);
    j = j + sizeof(Chromosome) * col;
}

for (int i = 0; i < row; i++)
{
    for (int j = 0; j < col; j++)
    {
        // only destruction of objects. No deallocation!
        chromosomes[i][j].~Chromosome();
    }
}

delete[] buffer; // actual deallocation of memory at once!
std::vector can help.
Special memory allocators too.