Resource management - vector and pointers - C++

I need to store a sequence of elements of type ThirdPartyElm, and I'm using a std::vector (or a std::array if I need a fixed size sequence).
I'm wondering how I should initialise the sequence. The first version creates a new element and (if I'm right) creates a copy of the element when it is inserted in the sequence:
for (int i = 0; i < N; i++)
{
auto elm = ThirdPartyElm();
// init elm..
my_vector.push_back(elm); // my_array[i] = elm;
}
The second version stores a sequence of pointers (or better smart pointers with c++11):
for (int i = 0; i < N; i++)
{
std::unique_ptr<ThirdPartyElm> elm(new ThirdPartyElm());
// init elm..
my_vector.push_back(std::move(elm)); // my_array[i] = std::move(elm);
}
Which is the most lightweight version?
Please highlight any errors.

You can just declare it with the size, and it will call the default constructor on those elements.
std::vector<ThirdPartyElm> my_vector(N);
As far as your statement
The first version creates a new element and (if I'm right) creates a copy of the element when it is inserted in the sequence
Don't worry about that. Since elm is a local variable that is about to fall out of scope, your compiler will likely use copy elision such that a move will be invoked instead of a copy.
I was mistaken about the above; please disregard it. push_back(elm) copies a named local variable unless you pass std::move(elm).
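As a minimal sketch of the sized constructor mentioned in this answer: the Elm type below is a hypothetical stand-in for ThirdPartyElm (it only counts default constructions), and make_sized is an illustrative name, not part of any library. Under C++11 or later, the vector default-constructs its N elements, which can then be initialised in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ThirdPartyElm; it counts default constructions.
struct Elm {
    static int default_constructed;
    int value;
    Elm() : value(0) { ++default_constructed; }
};
int Elm::default_constructed = 0;

// Creates n default-constructed elements, then initialises each one in place.
std::vector<Elm> make_sized(std::size_t n) {
    std::vector<Elm> v(n);  // C++11: n default constructions, no copies
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i].value = static_cast<int>(i);  // the "init elm.." step
    }
    assert(v.size() == n);
    return v;
}
```

Calling make_sized(5) should leave Elm::default_constructed at 5, since returning the vector by value moves the buffer rather than the elements.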

Avoid dynamic allocation whenever you can. Thus, generally prefer saving the elements themselves instead of smart-pointers to them in the vector.
That said, either is fine, and if ThirdPartyElm is polymorphic, you wouldn't have a choice.
Other considerations are the cost and possibility of moving and copying the type, though generally don't worry.
There are two refinements to option one which might be worthwhile though:
1. std::move the new element to its place, as that is probably less expensive than copying (which might not even be possible).
If the type is only copyable and not movable (legacy, ask for update), that falls back to copying.
2. Try to construct it in-place, to eliminate copy or move, and needless destruction.
for (int i = 0; i < N; i++)
{
my_vector.emplace_back();
try {
auto&& elm = my_vector.back();
// init elm..
} catch(...) {
my_vector.pop_back();
throw;
}
}
If the initialization cannot throw, the compiler can usually optimise the exception handling away (or you can just omit it).
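To make the difference between the variants measurable, here is a small sketch that counts copy and move constructions for one insertion. Counted, Insertion and count_constructions are hypothetical names introduced only for this illustration; the vector reserves room first so that reallocation does not distort the counts.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical element type that counts its copy and move constructions.
struct Counted {
    static int copies;
    static int moves;
    Counted() {}
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

enum class Insertion { PushBackCopy, PushBackMove, EmplaceBack };

// Inserts one element with the chosen technique and reports {copies, moves}.
std::pair<int, int> count_constructions(Insertion how) {
    Counted::copies = 0;
    Counted::moves = 0;
    std::vector<Counted> v;
    v.reserve(1);  // rule out reallocation so only the insertion is counted
    if (how == Insertion::EmplaceBack) {
        v.emplace_back();  // constructed directly inside the vector
    } else {
        Counted elm;
        if (how == Insertion::PushBackCopy) {
            v.push_back(elm);  // copy construction
        } else {
            v.push_back(std::move(elm));  // move construction
        }
    }
    assert(v.size() == 1);
    return std::make_pair(Counted::copies, Counted::moves);
}
```

With this setup, push_back(elm) costs one copy, push_back(std::move(elm)) costs one move, and emplace_back() costs neither.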

Focusing on a slightly different aspect from other answers, you are using push_back().
Instead of that if you know the size before entering the loop, please consider doing
my_vector.resize(N);
This way, you will be able to do the array style element insertion.
my_vector[i] = elem;
You may ask, what are the advantages:
push_back() checks the remaining capacity every time it inserts a new element.
If you didn't do a reserve(), a push_back() may occasionally incur the reallocation penalty.
For a large enough vector, the reallocations may involve copying or moving a lot of elements.
Even if you did a reserve(N), push_back() must still do the capacity check! (Constructing the vector as vector(N) is a different matter: it already holds N elements, so push_back() would append after them.)
Of course, this approach is better if you are dealing with (smart or otherwise) pointers, as opposed to fat objects. The construction costs have to be weighed before taking this approach.
In my measurements, I have seen at least 1.2x performance improvement by going with resize() approach.
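The reallocation cost described in this answer can be observed directly. The following is only a sketch: count_reallocations is a hypothetical helper that watches capacity() while appending, and the exact growth pattern without reserve() depends on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often the capacity changes while n ints are appended.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> v;
    if (reserve_first) {
        v.reserve(n);  // one allocation up front
    }
    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {  // the storage had to grow
            ++reallocations;
            capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

With reserve_first set to true the function returns 0; without it, the count is implementation-dependent but greater than zero for any non-trivial n.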

Storing pointers means you then have to clean those up after, or rely on smart pointers to do it for you, which adds unnecessary indirection and overhead.
As Cyber mentions, copy elision may prevent a copy, but you already explicitly avoid that by using std::move.
Since you mention C++11, I would suggest using emplace_back. push_back with std::move should have the same result (see answers to this question), but it's better practice to use emplace_back on principle. The other optimisation you can undertake, and the one most likely to have a major effect, is reserving the correct size in the vector at the start to ensure there are no unnecessary reallocations:
my_vector.reserve(N);
for (int i = 0; i < N; i++)
{
auto elm = ThirdPartyElm();
// init elm..
my_vector.emplace_back(std::move(elm));
}
Edit: As per @Chris Drew's comment, this is not an effective optimisation if the type is not moveable. A more robust optimisation in that case, if construction is costly and copy-construction is to be avoided if possible, would be to emplace_back and then modify the newly emplaced element:
my_vector.reserve(N);
for (int i = 0; i < N; i++)
{
my_vector.emplace_back(); // default-construct in place; passing ThirdPartyElm() would create a temporary and move or copy it
my_vector.back().initialise(); // or whatever
}
There is slight additional overhead in accessing my_vector.back(), but this will be less costly than copy construction for non-trivial types.

Related

Creating std::vector of nonmovable type

I have a std::vector named args (I don’t know the size of the vector at compile time) and a non-movable type NonMoveable.
I want to create a vector of the same size as args, so that it is equal to
{NonMoveable(args[0], additional_arg), NonMoveable(args[1], additional_arg), …, NonMoveable(args.back(), additional_arg)}
I don’t need to change the size of the vector later. How do I do that?
I can’t reserve() then emplace_back() because emplace_back() requires moving (to allow reallocation which is not possible in my case)
I do not want to use std::list because it is not contiguous.
If you want the elements to be contiguous, you could use the good old two-step dynamic array construction:
// allocate a dynamic array
NonMoveable *mv = std::allocator<NonMoveable>().allocate(args.size());
// use placement new to construct the NonMoveable elements
for (unsigned int i = 0; i < args.size(); i++) {
new(mv + i) NonMoveable(args[i], additional_arg);
}
... // use the dynamic array
// Explicitly destroy the elements
for (unsigned int i = 0; i < args.size(); i++) {
mv[i].~NonMoveable();
}
// and de-allocate
std::allocator<NonMoveable>().deallocate(mv, args.size());
It is rather C-ish but meets the contiguous requirement. Of course this should be encapsulated in a custom container to allow automatic destruction and de-allocation at container destruction.
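The encapsulation suggested here could look roughly like the following. This is a minimal sketch, not a complete container: FixedArray and the member names of NonMoveable are hypothetical, only construction, indexing and destruction are provided, and the wrapper itself is made non-copyable for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical non-movable element type (deleted copies suppress moves too).
struct NonMoveable {
    int first;
    int extra;
    NonMoveable(int f, int e) : first(f), extra(e) {}
    NonMoveable(const NonMoveable&) = delete;
    NonMoveable& operator=(const NonMoveable&) = delete;
};

// Fixed-size contiguous array that constructs each element in place.
template <class T>
class FixedArray {
public:
    template <class Arg, class Extra>
    FixedArray(const std::vector<Arg>& args, const Extra& extra)
        : capacity_(args.size()), size_(0),
          data_(std::allocator<T>().allocate(args.size())) {
        try {
            for (; size_ < capacity_; ++size_) {
                ::new (static_cast<void*>(data_ + size_)) T(args[size_], extra);
            }
        } catch (...) {
            release();  // destroy what was built so far, then free the storage
            throw;
        }
    }
    ~FixedArray() { release(); }
    FixedArray(const FixedArray&) = delete;
    FixedArray& operator=(const FixedArray&) = delete;

    T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
    std::size_t size() const { return size_; }

private:
    void release() {
        while (size_ > 0) data_[--size_].~T();  // reverse order of construction
        std::allocator<T>().deallocate(data_, capacity_);
    }
    std::size_t capacity_;
    std::size_t size_;
    T* data_;
};
```

A FixedArray<NonMoveable> built from args and additional_arg then cleans up automatically when it goes out of scope, and the elements stay contiguous.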
You can:
Have a vector<unique_ptr<T>> or vector<optional<T>> or vector<some_other_defer_storage_mechanism<T>> instead of just vector<T>. These are all wrapper types that add some functionality to T without affecting T (unique_ptr<T> makes it movable, optional<T> ensures default construction so you can construct with the right size then emplace() within the optional, etc.)
Use deque<T>, which does not require movability for emplace_back (although you lose contiguity)
Write your own dynamic array that is roughly equivalent to a pair<unique_ptr<T[]>, size_t> that just allocates space for n Ts and then placement-news onto each of them, ensuring that destruction does the right thing. This isn't so bad to implement - since you won't be changing the size, you need to support a very minimal amount of overall operations.
Whichever one of these is the best answer really depends.
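Two of the options listed in this answer can be sketched as follows, assuming C++11. The NonMoveable members and the function names fill_deque and make_pointer_vector are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

// Hypothetical non-movable type: deleting the copy operations also
// suppresses the implicit move operations.
struct NonMoveable {
    int first;
    int extra;
    NonMoveable(int f, int e) : first(f), extra(e) {}
    NonMoveable(const NonMoveable&) = delete;
    NonMoveable& operator=(const NonMoveable&) = delete;
};

// Option: deque<T>. emplace_back never relocates existing elements.
void fill_deque(std::deque<NonMoveable>& out,
                const std::vector<int>& args, int additional_arg) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        out.emplace_back(args[i], additional_arg);
    }
}

// Option: vector<unique_ptr<T>>. Only the pointers move on reallocation.
std::vector<std::unique_ptr<NonMoveable> >
make_pointer_vector(const std::vector<int>& args, int additional_arg) {
    std::vector<std::unique_ptr<NonMoveable> > out;
    out.reserve(args.size());
    for (std::size_t i = 0; i < args.size(); ++i) {
        out.push_back(std::unique_ptr<NonMoveable>(
            new NonMoveable(args[i], additional_arg)));
    }
    assert(out.size() == args.size());
    return out;
}
```

The deque version keeps the objects themselves in the container at the price of contiguity; the unique_ptr version keeps a contiguous array of pointers at the price of one allocation per element.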

Are C++ structs fully copied or just referenced when assigned with '='?

If structs are fully copied, then the first loop is more expensive than the second one, because it is performing an additional copy for each element of v.
vector<MyStruct> v;
for (int i = 0; i < v.size(); ++i) {
MyStruct s = v[i];
doSomething(s);
}
for (int i = 0; i < v.size(); ++i) {
doSomething(v[i]);
}
Suppose I want to write efficient code (as in loop 2) but at the same time I want to name the MyStruct elements that I draw from v (as in loop 1). Can I do that?
Structs (and all variables for that matter) are indeed fully copied when you use =. Overloading the = operator and the copy constructor can give you more control over what happens, but there is no way you can use these to change the behavior from copying to referencing. You can work around this by creating a reference like this:
for (int i = 0; i < v.size(); ++i) {
MyStruct& s = v[i]; //& creates reference; no copying performed
doSomething(s);
}
Note that the struct will still be fully copied when you pass it to the function, unless the argument is declared as a reference. This is a common pattern when taking structs as arguments. For instance,
void doSomething(structType x);
will generally perform worse than
void doSomething(const structType& x);
if sizeof(structType) is greater than sizeof(structType*). The const is used to prevent the function from modifying the argument, imitating pass-by-value behavior.
In your first example, the object will be copied and you will have to pay the overhead of the copy.
If you don't want that overhead, but still want to have a local name for the object, then you could use a reference.
for (int i = 0; i < v.size(); ++i) {
MyStruct& s = v[i];
doSomething(s);
}
You can use references or pointers to avoid copying and having a name to relate to.
vector<MyStruct> v;
for (int i = 0; i < v.size(); ++i) {
MyStruct& s = v[i];
doSomething(s);
}
However, since you use a vector as your container, using iterators might be a good idea. doSomething should take its argument by const reference, though; otherwise you'll still copy to pass the argument to it.
vector<MyStruct> v;
for (vector<MyStruct>::iterator it = v.begin(); it != v.end(); ++it) {
doSomething(*it);
}
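The effect of copying versus referencing can be checked with a copy counter. In this sketch the MyStruct definition, byValue, byConstRef and demo are hypothetical illustrations; the range-based for loop assumes C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical MyStruct that counts how often it is copied.
struct MyStruct {
    static int copies;
    int x;
    MyStruct() : x(1) {}
    MyStruct(const MyStruct& other) : x(other.x) { ++copies; }
};
int MyStruct::copies = 0;

int byValue(MyStruct s) { return s.x; }            // copies the argument
int byConstRef(const MyStruct& s) { return s.x; }  // no copy

// Loop 1 from the question, also passing by value: two copies per element.
int sumWithCopies(const std::vector<MyStruct>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        MyStruct s = v[i];
        total += byValue(s);
    }
    return total;
}

// Named element without any copy: const reference plus range-based for.
int sumWithReferences(const std::vector<MyStruct>& v) {
    int total = 0;
    for (const MyStruct& s : v) {
        total += byConstRef(s);
    }
    return total;
}

// Runs both loops over three elements and checks the copy counts.
void demo() {
    std::vector<MyStruct> v(3);
    MyStruct::copies = 0;
    sumWithCopies(v);
    assert(MyStruct::copies == 6);
    MyStruct::copies = 0;
    sumWithReferences(v);
    assert(MyStruct::copies == 0);
}
```

Both loops give each element a name, but only the second avoids the copies.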
In your examples, you are creating copies. However not all uses of operator '=' will result in a copy. C++11 allows for 'move construction' or 'move assignment' in which case you aren't actually copying the data; instead, you're just (hopefully) making a high-speed move from one structure to another. (Naturally, what it ACTUALLY does is entirely dependent upon how the move constructor or move assignment operator is implemented, but that's the intent.)
For example:
std::vector<int> foo(); // returns a long vector
std::vector<int> myVector = std::move(foo());
Will cause a move construction, which hopefully just performs a very efficient re-pointing of the memory in the new myVector object, meaning that you don't have to copy the huge amount of data. (The explicit std::move is only for illustration: foo() already returns a temporary, and wrapping it in std::move prevents copy elision.)
Don't forget, however, about the return-value optimization, as well. This was just a trivial example. RVO is actually superior to move semantics when it can be used. RVO allows the compiler to simply avoid any copying or moving at all when an object is returned, instead just using it directly on the stack where it was returned (see http://en.wikipedia.org/wiki/Return_value_optimization). No copy or move constructor is called at all.
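The "re-pointing" performed by a move construction can be observed by comparing buffer addresses. This is a sketch under the assumption of a mainstream standard library with the default allocator; foo here takes a size parameter purely for the illustration, and moveReusesBuffer is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the foo() in the text: returns a long vector by value.
std::vector<int> foo(std::size_t n) {
    return std::vector<int>(n, 7);
}

// Returns true if move construction handed over the existing buffer.
bool moveReusesBuffer() {
    std::vector<int> source = foo(100000);
    assert(!source.empty());
    const int* buffer = source.data();
    std::vector<int> target = std::move(source);  // move construction
    return target.data() == buffer && target.size() == 100000;
}
```

The 100000 ints are never copied: the new vector simply takes over the pointer to the old storage.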
Copied*. Unless you overload the assignment operator. Also, structs and classes in C++ are the same in this respect; their copy behaviour does not differ as it does in C#.
If you want to dive deep into C++ you can also look up move semantics (the move constructor and move assignment operator), but it is generally best to ignore that for beginners.
C++ does not have garbage collection, and gives more control over memory management. If you want behaviour similar to C# references, you can use pointers. If you use pointers, you should prefer smart pointers (What is a smart pointer and when should I use one?).
* Keep in mind, if the struct stores a pointer, the pointer in a copied struct will point to the same location. If the object in that location is changed, both structs' pointers will see the changed object.
P.S.: I assume you come from a C# background based on the vocabulary in your question.
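The footnote about pointer members can be illustrated with a short sketch. Holder and copyIsShallow are hypothetical names used only here.

```cpp
#include <cassert>

// Hypothetical struct with one value member and one pointer member.
struct Holder {
    int value;
    int* pointer;
};

// Copies a Holder, changes the copy, and reports what the original sees.
bool copyIsShallow() {
    int shared = 1;
    Holder a = {10, &shared};
    Holder b = a;      // memberwise copy: the value and the pointer itself
    assert(a.pointer == b.pointer);
    b.value = 20;      // independent of a.value
    *b.pointer = 42;   // modifies the object both pointers refer to
    return a.value == 10 && *a.pointer == 42;
}
```

The value member is duplicated, while the pointed-to int is shared between the original and the copy.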

Controlling memory with std::vector<T*>

Suppose that T contains an array whose size may vary depending on initialization. I'm passing a pointer to the vector to avoid copying all the data, and initialize as follows:
for(int i=10; i < 100; i++)
std::vector.push_back(new T(i));
On exiting, one deletes the elements of the vector. Is there a risk of a memory leak if the data contained in T is also a pointer, even if there are good destructors? E.g.
template<class M> class T{
M * Array;
public:
T(int i) : Array(new M[i]){ }
~T(){ delete Array;}
};
There are two major problems with your class T:
You use delete rather than delete [] to delete the array, giving undefined behaviour.
You don't implement (or delete) the copy constructor and copy-assignment operator (per the Rule of Three), so there's a danger of two objects both trying to delete the same array.
Both of these can be solved easily by using std::vector rather than writing your own version of it.
Finally, unless you have a good reason (such as polymorphism) to store pointers, use std::vector<T> so that you don't need to manually delete the elements. It's easy to forget to do this when removing an element or leaving the vector's scope, especially when an exception is thrown. (If you do need pointers, consider unique_ptr to delete the objects automatically).
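A minimal sketch of that advice, assuming C++11: the class keeps the question's name T but stores a std::vector<M> instead of a raw array, and the outer vector stores the objects themselves. The member names and buildAll are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the question's T with std::vector replacing the raw array.
template <class M>
class T {
    std::vector<M> array_;
public:
    explicit T(int n) : array_(static_cast<std::size_t>(n)) {}
    std::size_t size() const { return array_.size(); }
    M& operator[](std::size_t i) { assert(i < array_.size()); return array_[i]; }
    // No destructor, copy constructor or assignment operator is written:
    // the compiler-generated ones copy and free the vector correctly.
};

// Mirrors the question's loop, but stores the objects themselves.
std::vector<T<int> > buildAll() {
    std::vector<T<int> > v;
    for (int i = 10; i < 100; ++i) {
        v.emplace_back(i);  // constructs T<int>(i) in place
    }
    return v;
}
```

Nothing has to be deleted by hand, and copying a T now duplicates its array instead of sharing one pointer between two owners.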
The answer is: don't.
Either use
std::vector<std::vector<M>> v;
v.emplace_back(std::vector<M>(42)); // vector of 42 elements
or (yuck)
std::vector<std::unique_ptr<M[]>> v;
// C++11
std::unique_ptr<M[]> temp(new M[42]); // array of 42 elements; the constructor is explicit
v.emplace_back(std::move(temp)); // unique_ptr is move-only
// C++14 or with handrolled make_unique
v.emplace_back(std::make_unique<M[]>(42));
which both do everything for you with minimal overhead (especially the last one).
Note that calling emplace_back with a new-expression as its argument is not quite as exception-safe as you would want, even when the resulting element will be a smart pointer. To make it so, you need std::make_unique, which was omitted from C++11 and added in C++14. Various implementations for C++11 exist, and it needs nothing special.
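For C++11 code bases, a hand-rolled helper might look like the sketch below. The names makeUniqueObject, makeUniqueArray and buildArrays are hypothetical (deliberately different from std::make_unique), and this is only one of several possible ways to write such helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical C++11 helpers; C++14 provides std::make_unique instead.
template <class T, class... Args>
std::unique_ptr<T> makeUniqueObject(Args&&... args) {
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

template <class T>
std::unique_ptr<T[]> makeUniqueArray(std::size_t n) {
    return std::unique_ptr<T[]>(new T[n]());  // elements value-initialised
}

// Builds a vector of owned arrays without a raw `new` at the call site.
std::vector<std::unique_ptr<int[]> > buildArrays(std::size_t count,
                                                 std::size_t length) {
    std::vector<std::unique_ptr<int[]> > v;
    for (std::size_t i = 0; i < count; ++i) {
        // The array is already owned by a unique_ptr before emplace_back
        // runs, so a failed reallocation cannot leak it.
        v.emplace_back(makeUniqueArray<int>(length));
    }
    assert(v.size() == count);
    return v;
}
```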

How to insert objects in a vector efficiently and correctly

Suppose I want to declare a vector of objects. I can do it this way -
vector<mynode> nodes;
But if the size of the mynode is large, this would be bad. So I think of doing it this way -
vector<mynode*> nodes;
But the above declaration has an obvious problem: I'm storing addresses, and it's not safe at all. For instance, if I add objects in a for loop -
vector<mynode*> nodes;
for (int i=0; i<10; i++)
{
mynode mn;
nodes.push_back(&mn);
}
This will lead to errors as I can never guarantee if the contents of the pointer are actually ok.
So, I decide to use this declaration -
vector<mynode&> nodes;
for (int i=0; i<10; i++)
{
mynode mn;
nodes.push_back(mn);
}
Is this OK? Is it safe? It gives a compilation error on the first line itself. Please suggest some efficient way of storing the objects in a vector. Thanks a lot.
I can do it this way -
vector<mynode> nodes;
But if the size of the mynode is large, this would be bad.
No, it would not. You need to store the objects anyway. If you are worried about copying large objects, you have some solutions:
Use std::vector<std::unique_ptr<mynode>> (or another smart pointer), which automatically releases objects on destruction. This is the best solution if mynode is polymorphic.
Use std::vector<mynode> and use the emplace_back function to construct objects in place (beware if you're using Visual Studio 2010: this function does not do what it is supposed to do).
Still use std::vector<mynode> and use push_back with an rvalue reference, as in
v.push_back(std::move(some_node));
to move already constructed objects.
Anyway, a good rule of thumb is to have the copy constructor/assignment deleted (or private) for most non-lightweight objects. Containers are still functional (provided, again, that you use C++11) and your concerns are moot.
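As a sketch of that rule of thumb, assuming C++11: LargeNode below is a hypothetical heavy type with copying deleted and moving defaulted, and buildNodes is an illustrative helper. The vector still works because it only needs to move elements.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical large node: copying is disabled, moving stays cheap.
struct LargeNode {
    std::vector<int> payload;
    explicit LargeNode(std::size_t n) : payload(n) {}
    LargeNode(const LargeNode&) = delete;
    LargeNode& operator=(const LargeNode&) = delete;
    LargeNode(LargeNode&&) = default;
    LargeNode& operator=(LargeNode&&) = default;
};
static_assert(!std::is_copy_constructible<LargeNode>::value,
              "accidental copies are compile errors");

// Fills a vector by in-place construction and by moving a finished object.
std::vector<LargeNode> buildNodes(std::size_t count, std::size_t payloadSize) {
    std::vector<LargeNode> nodes;
    nodes.reserve(count + 1);
    for (std::size_t i = 0; i < count; ++i) {
        nodes.emplace_back(payloadSize);  // constructed in place
    }
    LargeNode extra(payloadSize);
    nodes.push_back(std::move(extra));    // moved, never copied
    assert(nodes.size() == count + 1);
    return nodes;
}
```

Any line that would silently copy a LargeNode, such as nodes.push_back(extra) without std::move, fails to compile instead of costing time at run time.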
Using references is essentially the same as using pointers (it's just that you don't need to dereference them in code).
If you want to automatically ensure that the objects inserted into the vector don't get deleted, without copying them, you should use smart pointers from Boost or C++11.
vector< std::shared_ptr<mynode> > nodes; // std::shared_ptr (C++11) or boost::shared_ptr
for (int i=0; i<10; i++)
{
std::shared_ptr<mynode> mn = std::make_shared<mynode>();
nodes.push_back(mn);
}
I don't see the pointer being so bad here. It's not void or something. Inserting a reference as in your example saves a reference to a temporary object located on the stack, and this will go out of scope...

Is there a C++ container with reasonable random access that never calls the element type's copy constructor?

I need a container that implements the following API (and need not implement anything else):
class C<T> {
C();
T& operator[](int); // must have reasonably sane time constant
// expand the container by default constructing elements in place.
void resize(int); // only way anything is added.
void clear();
C<T>::iterator begin();
C<T>::iterator end();
}
and can be used on:
class I {
public:
I();
private: // copy and assignment explicitly disallowed
I(I&);
I& operator=(I&);
}
Does such a beast exist?
vector<T> doesn't do it (resize moves) and I'm not sure how fast deque<T> is.
I don't care about allocation
Several people have assumed that the reason I can't do copies is memory allocation issues. The reason for the constraints is that the element type explicitly disallows copying and I can't change that.
Looks like I've got my answer: STL doesn't have one. But now I'm wondering: why not?
I'm pretty sure that the answer here is a rather emphatic "No". By your definition, resize() should allocate new storage and initialize with the default constructor if I am reading this correctly. Then you would manipulate the objects by indexing into the collection and manipulating the reference instead of "inserting" into the collection. Otherwise, you need the copy constructor and assignment operator. All of the containers in the Standard Library have this requirement.
You might want to look into using something like boost::ptr_vector<T>. Since you are inserting pointers, you don't have to worry about copying. This would require that you dynamically allocate all of your objects though.
You could use a container of pointers, like std::vector<T*>, if the elements cannot be copied and their memory is managed manually elsewhere.
If the vector should own the elements, something like std::vector< std::shared_ptr<T> > could be more appropriate.
And there is also the Boost Pointer Container library, which provides containers for exception safe handling of pointers.
Use deque: performance is fine.
The standard says, "deque is the data structure of choice when most insertions and deletions take place at the beginning or at the end of the sequence" (23.1.1). In your case, all insertions and deletions take place at the end, satisfying the criterion for using deque.
http://www.gotw.ca/gotw/054.htm has some hints on how you might measure performance, although presumably you have a particular use-case in mind, so that's what you should be measuring.
Edit: OK, if your objection to deque is in fact not, "I'm not sure how fast deque is", but "the element type cannot be an element in a standard container", then we can rule out any standard container. No, such a beast does not exist. deque "never copies elements", but it does copy-construct them from other objects.
Next best thing is probably to create arrays of elements, default-constructed, and maintain a container of pointers to those elements. Something along these lines, although this can probably be tweaked considerably.
template <typename T>
struct C {
vector<boost::shared_array<T> > blocks;
vector<T*> elements; // lazy, to avoid needing deque-style iterators through the blocks.
T &operator[](size_t idx) { return *elements[idx]; }
void resize(size_t n) {
if (n <= elements.size()) { /* exercise for the reader */ }
else {
// one new block holding exactly the additional elements
boost::shared_array<T> newblock(new T[n - elements.size()]);
blocks.push_back(newblock);
size_t old = elements.size();
// currently we "leak" newblock on an exception: see below
elements.resize(n);
for (size_t i = old; i < n; ++i) {
elements[i] = &newblock[i - old];
}
}
}
void clear() {
blocks.clear();
elements.clear();
}
};
As you add more functions and operators, it will approach deque, but avoiding anything that requires copying of the type T.
Edit: come to think of it, my "exercise for the reader" can't be done quite correctly in cases where someone does resize(10); resize(20); resize(15);. You can't half-delete an array. So if you want to correctly reproduce container resize() semantics, destructing the excess elements immediately, then you will have to allocate the elements individually (or get acquainted with placement new):
template <typename T>
struct C {
deque<shared_ptr<T> > elements; // or boost::ptr_deque, or a vector.
T &operator[](size_t idx) { return *elements[idx]; }
void resize(size_t n) {
size_t oldsize = elements.size();
elements.resize(n);
if (n > oldsize) {
try {
for (size_t i = oldsize; i < n; ++i) {
elements[i] = shared_ptr<T>(new T());
}
} catch(...) {
// closest we can get to strong exception guarantee, since
// by definition we can't do anything copy-and-swap-like
elements.resize(oldsize);
throw;
}
}
}
void clear() {
elements.clear();
}
};
Nicer code, not so keen on the memory access patterns (but then, I'm not clear whether performance is a concern or not since you were worried about the speed of deque.)
As you've discovered, all of the standard containers are incompatible with your requirements. If we can make a couple of additional assumptions, it wouldn't be too hard to write your own container.
The container will always grow - resize will always be called with a greater number than previously, never lesser.
It is OK for resize to make the container larger than what was asked for; constructing some number of unused objects at the end of the container is acceptable.
Here's a start. I leave many of the details to you.
template <class T>
class C {
public:
C() : current_size(0) {}
~C() { clear(); }
T& operator[](int i) // must have reasonably sane time constant
{
return blocks[i / block_size][i % block_size];
}
// expand the container by default constructing elements in place.
void resize(int n) // only way anything is added.
{
// allocate whole blocks until n elements fit; existing blocks never move
while (static_cast<int>(blocks.size()) * block_size < n)
{
blocks.push_back(new T[block_size]);
}
current_size = n;
}
void clear()
{
for (typename vector<T*>::iterator i = blocks.begin(); i != blocks.end(); ++i)
delete[] *i;
blocks.clear(); // so that the destructor does not delete the blocks again
current_size = 0;
}
// iterator type, begin() and end() still to be written
private:
vector<T*> blocks;
int current_size;
static const int block_size = 1024; // choose a size appropriate to T
};
P.S. If anybody asks you why you want to do this, tell them you need an array of std::auto_ptr. That should be good for a laugh.
All the standard containers require copyable elements. At the very least because push_back and insert copy the element passed to them. I don't think you can get away with std::deque because even its resize method takes parameter to be copied for filling the elements.
To use a completely non-copyable class in the standard containers, you would have to store pointers to those objects. That can sometimes be a burden but usage of shared_ptr or the various boost pointer containers can make it easier.
If you don't like any of those solutions then take a browse through the rest of boost. Maybe there's something else suitable in there. Perhaps intrusive containers?
Otherwise, if you don't think any of that suits your needs then you could always try to roll your own container that does what you want. (Or else do more searching to see if anyone else has ever made such a thing.)
You shouldn't pick a container based on how it handles memory. deque for example is a double-ended queue, so you should only use it when you need a double-ended queue.
Pretty much every container will allocate memory if you resize it! Of course, you could change the capacity up front by calling vector::reserve. The capacity is the number of elements for which memory has been allocated; the size is how many you are actively using.
Obviously, there will still be an allocation if you grow past your capacity.
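The distinction between size and capacity can be seen in a few lines. This is only a sketch, and reserveKeepsStorage is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if n push_backs after reserve(n) never moved the storage.
bool reserveKeepsStorage(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    assert(v.size() == 0);      // no elements are in use yet
    assert(v.capacity() >= n);  // but room for n has been allocated
    const int* storage = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return v.data() == storage && v.size() == n;
}
```

Only the push_back that exceeds the reserved capacity would trigger a new allocation and relocate the elements.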
Look at ::boost::array. It doesn't allow the container to be resized after creating it, but it doesn't copy anything ever.
Getting both resize and no copying is going to be a trick. I wouldn't trust a ::std::deque because I think maybe it can copy in some cases. If you really need resizing, I would code your own deque-like container. Because the only way you're going to get resizing and no copying is to have a page system like ::std::deque uses.
Also, having a page system necessarily means that it isn't going to be quite as fast as ::std::vector and ::boost::array with their contiguous memory layout, even though it can still be fairly fast.