How to avoid std::vector to copy on (re-)allocation?

How to avoid std::vector to copy on (re-)allocation? - c++

I just stumbled into an issue with std::vector when adding new elements to it.
It seems when you try to add more elements to it, and it needs to allocate more space, it does so by copying last element all elements it currently holds. This seems to assume that any element in the vector is fully valid, and as such a copy will always succeed.
In our case, this is not necessarily true. Currently we might have some left over elements in the vector, because we chose not to remove them yet, which are valid objects, but their data doesn't guarantee valid behavior. The objects have guards, but I never considered adding guards to a copy constructor as I assumed we would never be copying an invalid object (which vector forces) :
CopiedClass::CopiedClass(const CopiedClass& other)
: member(other.member)
{
member->DoSomething();
}
It just so happened that "member" is nulled when we are done with the original object and leave it lying about in the vector, so when std::vector tries to copy it, it crashes.
Is it possible to prevent std::vector from copying that element? Or do we have to guard against possible invalid objects being copied? Ideally we'd like to keep assuming that only valid objects are created, but that seems to imply we immediately clean them from the vector, rather than waiting and doing it at some later stage.

It's actually a design flaw in your CopiedClass that needs to be fixed.
std::vector behaves as expected and documented.
From the little code you show
member->DoSomething();
this indicates you are going to take a shallow copy of a pointer to what ever.
in our case, this is not necessarily true. Currently we might have some left over elements in the vector, because we chose not to remove them yet, which are valid objects, but their data doesn't guarantee valid behavior.
It just so happened that "member" is nulled when we are done with the original object and leave it lying about in the vector, so when std::vector tries to copy it, it crashes.
As your copy constructor doesn't handle that situation correctly regarding following the Rule of Three your design of CopiedClass is wrong.
You should either care to create a deep copy of your member, or use a smart pointer (or a plain instance member), rather than a raw pointer.
The smart pointers should take care of the management for these members correctly.
Also for the code snippet from above, you should test against nullpointer before blindly dereferencing and calling DoSomething().
Is it possible to prevent std::vector from copying that element?
As you're asking for c++11 it's possible, but requires you never change the size of the vector, and provide a move constructor and assignment operator for CopiedClass.
Otherwise, std::vector explicitly requires copyable types (at least for certain operations):
T must meet the requirements of CopyAssignable and CopyConstructible.
It's your responsibility to meet these requirements correctly.
... it does so by copying the last element it currently holds.
Also note: In a situation the vector needs to be resized, all of the existing elements will be copied, not only the last one as you premise.

If you know the vector's maximum size in advance, then a workaround for this issue would be calling reserve with that maximum size. When the vector's capacity is big enough, no reallocations occur and no copies of existing elements need to be made.
But that's really just a workaround. You should redesign your class such that copying cannot suddenly become an invalid operation, either by making copying safe or by disallowing it and perhaps replace it with moving, like std::unique_ptr. If you make the validity of copying dependent on some run-time status, then you don't even need a std::vector to get into trouble. A simple CopiedClass get(); will be a potential error.

It seems when you try to add more elements to it, and it needs to
allocate more space, it does so by copying the last element it
currently holds. This seems to assume that any element in the vector
is fully valid, and as such a copy will always succeed.
Both sentences are not really correct. When vector runs out of space it, yes, performs reallocation, but of all the elements and these are copied to the new location or possibly moved. When you push_back or emplace_back, and reallocation is needed, the element will generally by copied: if an exception is thrown during that, it will be treated as if you had never added the element. If vector detects a noexcept user-defined move constructor ( MoveInsertable concept), elements are moved; if this constructor though is not noexcept, and an exception is thrown, the result is unspecified.
The objects have guards, but I never considered adding guards to a
copy constructor as I assumed we would never be copying an invalid
object (which vector forces) :
vector copies its value_type: it does not care whether what it contains is valid or invalid. You should take care of that in your copy control methods, whose scope is exactly to define how the object is passed around.
CopiedClass::CopiedClass(const CopiedClass& other)
: member(other.member)
{
member->DoSomething();
}
The obvious solution would be to check if member is valid and act thereby.
if (member.internalData.isValid())
member->DoSomething()
// acknowledge this.member's data is invalid
We don't know how member is represented, but Boost.Optional is what you may be looking for.
Is it possible to prevent std::vector from copying that element?
Reallocation is something vector is expected to commit so, no, you can't. reserv-ing space could avoid that, but maintaining code to handle that would not really be painless. Rather, prefer a container like std::forward_list/std::list.
Other solutions:
Hold a unique_ptr<decltype(member)>, but pointers are not often a real solution
Define move semantics for your object, as described in other answers.

Related

Safe to std:move a member?

Have found comparable questions but not exactly with such a case.
Take the following code for example:
#include <iostream>
#include <string>
#include <vector>
struct Inner
{
int a, b;
};
struct Outer
{
Inner inner;
};
std::vector<Inner> vec;
int main()
{
Outer * an_outer = new Outer;
vec.push_back(std::move(an_outer->inner));
delete an_outer;
}
Is this safe? Even if those were polymorphic classes or ones with custom destructors?
My concern regards the instance of "Outer" which has a member variable "inner" moved away. From what I learned, moved things should not be touched anymore. However does that include the delete call that is applied to outer and would technically call delete on inner as well (and thus "touch" it)?

Neither std::move, nor move semantics more generally, have any effect on the object model. They don't stop objects from existing, nor prevent you from using those objects in the future.
What they do is ask to borrow encapsulated resources from the thing you're "moving from". For example, a vector, which directly only stores a pointer some dynamically-allocated data: the concept of ownership of that data can be "stolen" by simply copying that pointer then telling the vector to null the pointer and never have anything to do with that data again. It's yielded. The data belongs to you now. You have the last pointer to it that exists in the universe.
All of this is achieved simply by a bunch of hacks. The first is std::move, which just casts your vector expression to vector&&, so when you pass the result of it to a construction or assignment operation, the version that takes vector&& (the move constructor, or move-assignment operator) is triggered instead of the one that takes const vector&, and that version performs the steps necessary to do what I described in the previous paragraph.
(For other types that we make, we traditionally keep following that pattern, because that's how we can have nice things and persuade people to use our libraries.)
But then you can still use the vector! You can "touch" it. What exactly you can do with it is discoverable from the documentation for vector, and this extends to any other moveable type: the constraints emplaced on your usage of a moved-from object depend entirely on its type, and on the decisions made by the person who designed that type.
None of this has any impact on the lifetime of the vector. It still exists, it still takes memory, and it will still be destructed when the time comes. (In this particular example you can actually .clear() it and start again adding data to a new buffer.)
So, even if ints had any sort of concept of this (they don't; they encapsulate no indirectly-stored data, and own no resources; they have no constructors, so they also have no constructors taking int&&), the delete "touch"ing them would be entirely safe. And, more generally, none of this depends on whether the thing you've moved from is a member or not.
More generally, if you had a type T, and an object of that type, and you moved from it, and one of the constraints for T was that you couldn't delete it after moving from it, that would be a bug in T. That would be a serious mistake by the author of T. Your objects all need to be destructible. The mistake could manifest as a compilation failure or, more likely, undefined behaviour, depending on what exactly the bug was.
tl;dr: Yes, this is safe, for several reasons.

std::move is a cast to an rvalue-reference, which primarily changes which constructor/assignment operator overload is chosen. In your example the move-constructor is the default generated move-constructor, which just copies the ints over so nothing happens.
Whether or not this generally safe depends on the way your classes implement move construction/assignment. Assume for example that your class instead held a pointer. You would have to set that to nullptr in the moved-from class to avoid destroying the pointed-to data, if the moved-from class is destroyed.
Because just defining move-semantics is a custom way almost always leads to problems, the rule of five says that if you customize any of:
the copy constructor
the copy assignment operator
the move constructor
the move assignment operator
the destructor
you should usually customize all to ensure that they behave consistently with the expectations a caller would usually have for your class.

Is it necessary to reset std::list after been moved?

I've got following code:
std::list some_data;
...
std::list new_data = std::move(some_data);
some_data.clear();
...
The question is whether some_data.clear() is necessary? (for the record, some_data will be reused in the future)

Yes, it's necessary.
Only the std smart pointers are guaranteed to be in a default constructed state after being moved from.
Containers are in an valid, but unspecified state. This means you can only call member functions without preconditions, e.g. clear, that put the object in a fully known state.

The working draft of standard (N4713) states about the state of objects after they are moved from:
20.5.5.15 Moved-from state of library types [lib.types.movedfrom]
1 Objects of types defined in the C++ standard library may be moved from. Move operations may be explicitly specified or implicitly generated. Unless otherwise specified, such moved-from objects shall be placed in a valid but unspecified state.
20.3.25 [defns.valid]
valid but unspecified state value of an object that is not specified except that the object’s invariants are met and operations on the object behave as specified for its type
The only operations you can safely perform on a moved-from container are those that do not require a precondition. clear has no preconditions. And it will return the object to a known state from which it can be used again.

No it is not necessary to clear the list. It is just reasonable and convenient to do so.
The list will have N elements, each taking an unspecified value, for some N≥0. So if you want to re-poppulate the list, you may in theory assign to the elements that are kept rather than clear and re-insert everything from scratch.
Of course chances that N≠0 are vanishingly small, so in practice clearing is the right choice.

Short answer:
Yes, you should clear it, because the standard doesn't explicitly define in what state the source list is after the move.
Furthermore, always calling clear() when a container gets reused after being moved from is a simple rule that should not cause any significant overhead even when it is redundant.
Longer answer:
As I mentioned in some comments, I can't imagine any legal and sensible implementation for which the source std::list will be anything but an empty std::list after being used as the source for a move constructor (thanks #Lightness Races in Orbit for digging through the standard). Move constructing must happen in linear time, which doesn't allow any per-object operations and AFAIK it is also not allowed to have a small, fixed-size inplace buffer that could be a source of left-over "zombie" elements in the source list.
However, even if you can track down all of that in the standard, you'd still have to document that everytime you omit the clear() in the code and be wary of someone refactoring the code by replacing the std::list with a homegrown container, that doesn't quite full fill the requirements in the standard (e.g. to make use of the small buffer optimization). Also, the previous statement is only valid for move construction. On moveassignment it is absolutely possible that the moved from container will not be empty.
In summary: Even if it was technically correct to not use clear in this particular case, it is imho not worth the mental effort to do it (and if you are in an environment where it is worth it, you are probably tuning your code to a particular version of the standard library and not the standard document anyway).

It rather depends on what you mean by "reused". There are many reuses that totally define the state of the object, with no regard to it's previous state.
The obvious one is assigning to a moved-from object, which occurs in the naive swap
template <typename T>
void swap(T& lhs, T& rhs)
{
T temp = std::move(lhs);
lhs = std::move(rhs); // lhs is moved-from, but we don't care.
rhs = std::move(temp); // rhs is moved-from, but we don't care.
}
If you instead wish to "reuse" by calling some_data.push_back, then yes, you should clear it first

Could it be possible to have types with move operations that throw in containers?

While explaining move operations on objects with a colleague, I basically said that move operations should not throw exceptions in a container because if the move operation fails, then there is no way to bring back the original object reliably. Thinking about this more, I'm wondering if that is not correct and that if a move operation that does throw, it could revert the original object back to it's original state.
The reason for this, is that if an object can throw, then it would throw not due to copying or moving the contained objects from the old to the new address, but throw if a resource failed to be acquired. So all of the original information should still be there. If this is the case, then should the compiler not be able to reverse the operations that it did to reconstitute the original object?
It could be possible for an operation to be one way, like moving an integer, but in that case it could just terminate the application, and perhaps if the developer wanted to avoid the one way operation could use a swap method instead.
This would only be possible on default move operators, as if there are any additional logic, it may be difficult for the compiler to do a reverse partial transform.
Am I oversimplifying things? Is there something that I missed which keeps containers from moving objects without a non-throwing move constructor/operator?

You can use types with throwing moves in containers like vector which can move their elements. However, such containers will not use throwing move operations.
Let's say you have a vector of 10 throwing move elements. And the vector needs to resize itself. So it moves 5 objects to the new memory, but the 6th throws. Well, that's OK; construction failed, so the assumption is that the value of the 6th object is fine. That is, whatever that type's exception guarantee is will be how things work.
But then, because the movement of one object failed, vector needs to move the last 5 objects back to the first array, since vector is trying to provide a strong exception guarantee. That's a problem, since the move back can itself fail.
C++ in general does not have valid answers when the process of repairing a failure itself fails. You can see that in exceptions; you can't emit an exception from a destructor that is called during the process of unwinding due to an exception failure. std::terminate happens in this case.
The same goes for vector. If the move back were to fail, vector has no sane answer. As such, if vector cannot guarantee that restoring its previous array state is noexcept, then it will use copying, since that can provide that guarantee.

First of all, I can hardly imagine object that acquires resources in the move operation. Think about it - unique_ptr just passes the pointer without acquiring anything, same for shared_ptr. string, vector, all the containters etc. just steal the pointers to the resources acquired earlier in default or copy constructor. I feel like throwing from move constructor is like throwing from destructor. Sure, go ahead, shoot yourself in your knee. But ok, I can accept that exceptions from this exist.
So let's move to the second point - when moving there can be a moment when actually both objects (moved from and moved to) are invalid. And to roll back from such situation would require additional 'magic' function to be called to fix one of them. So it seems the data cannot be repaired as standard does not define such a function.

Is it safe to use std::auto_ptr in std::map?

I am aware that you have to be careful with auto pointers (and why), especially with the STL. But I don't see a problem with this:
std::map<T1, std::auto_ptr<T2> >;
Is this safe?
I see how it would break in an std::vector, because it has to copy its items from time to time, but is this also true for the value type of an std::map?
Edit: Apparently, regardless whether it's safe, I cannot (technically) populate the map, but I'll leave the question open for theoretical considerations. Otherwise, consider it closed.

It's not safe. Technically, std::auto_ptr doesn't meet the requirement of CopyConstructible or Assignable because copies made using auto_ptr's copy constructor or copy assignment operator aren't equivalent to the source of the copy after the copy operation. These requirements must be met for any type used with any standard container.
You may find that you appear to get the behaviour you expect on one implementation for one particular use case but if you violate the requirements of the container you can't expect your code to work in all situations.

Still not safe. If nothing else, the first time you retrieved the pointer from the map you would transfer ownership and make it invalid.
There is an extremely narrow valid use case for auto_ptr. The only one I can remember coming across is when an object wants to make it explicitly clear that it takes ownership and responsibility for a pointer that you hand it.

unique_ptr - major improvement?

In the actual C++ standard, creating collections satisfying following rules is hard if not impossible:
exception safety,
cheap internal operations (in actual STL containers: the operations are copies),
automatic memory management.
To satisfy (1), a collection can't store raw pointers. To satisfy (2), a collection must store raw pointers. To satisfy (3), a collection must store objects by value.
Conclusion: the three items conflict with each other.
Item (2) will not be satisfied when shared_ptrs are used because when a collection will need to move an element, it will need to make two calls: to a constructor and to a destructor. No massive, memcpy()-like copy/move operations are possible.
Am I correct that the described problem will be solved by unique_ptr and std::move()? Collections utilizing the tools will be able to satisfy all 3 conditions:
When a collection will be deleted as a side effect of an exception, it will call unique_ptr's destructors. No memory leak.
unique_ptr does not need any extra space for reference counter; therefore its body should be exact the same size, as wrapped pointer,
I am not sure, but it looks like this allows to move groups of unique_ptrs by using memmove() like operations (?),
even if it's not possible, the std::move() operator will allow to move each unique_ptr object without making the constructor/destructor pair calls.
unique_ptr will have exclusive ownership of given memory. No accidental memory leaks will be possible.
Is this true? What are other advantages of using unique_ptr?

I agree entirely. There's at last a natural way of handling heap allocated objects.
In answer to:
I am not sure, but it looks like this allows to move groups of unique_ptrs by using memmove() like operations,
there was a proposal to allow this, but it hasn't made it into the C++11 Standard.

Yes, you are right. I would only add this is possible thanks to r-value references.

When a collection will be deleted as a side effect of an exception, it will call unique_ptr's destructors. No memory leak.
Yes, a container of unique_ptr will satisfy this.
unique_ptr does not need any extra space for reference counter; therefore its body should be exact the same size, as wrapped pointer
unique_ptr's size is implementation-defined. While all reasonable implementations of unique_ptr using it's default destructor will likely only be a pointer in size, there is no guarantee of this in the standard.
I am not sure, but it looks like this allows to move groups of unique_ptrs by using memmove() like operations (?),
Absolutely not. unique_ptr is not a trivial class; therefore, it cannot be memmoved around. Even if it were, you can't just memmove them, because the destructors for the originals need to be called. It would have to be a memmove followed by a memset.
even if it's not possible, the std::move() operator will allow to move each unique_ptr object without making the constructor/destructor pair calls.
Also incorrect. Movement does not make constructors and destructors not be called. The unique_ptr's that are being destroyed need to be destroyed; that requires a call to their destructors. Similarly, the new unique_ptrs need to have their constructors called; that's how an object's lifetime begins.
There's no avoiding that; it's how C++ works.
However, that's not what you should be worried about. Honestly, if you're concerned about a simple constructor/destructor call, you're either in code that you should be hand-optimizing (and thus writing your own code for), or you're prematurely optimizing your code. What matters is not whether constructors/destructors are called; what matters is how fast the resulting code is.
unique_ptr will have exclusive ownership of given memory. No accidental memory leaks will be possible.
Yes, it will.
Personally, I'd say you're doing one of the following:
Being excessively paranoid about copying objects. This is evidence by the fact that you consider putting a shared_ptr in a container is too costly of a copy. This is an all-too-common malady among C++ programmers. That's not to say that copying is always good or something, but obsessing over copying a shared_ptr in a container is ridiculous outside of exceptional circumstances.
Not aware of how to properly use move semantics. If your objects are expensive to copy but cheap to move... then move them into the container. There's no reason to have a pointer indirection when your objects already contain pointer indirections. Just use movement with the objects themselves, not unique_ptrs to objects.
Disregarding the alternatives. Namely, Boost's pointer containers. They seem to have everything you want. They own pointers to their objects, but externally they have value semantics rather than pointer semantics. They're exception safe, and any copying happens with pointers. No unique_ptr constructor/destructor "overhead".

It looks like the three conditions I've enumerated in my post are possible to obtain by using Boost Pointer Container Library.

This question illlustrates why I so love the Boehm garbage collector (libgc). There's never a need to copy anything for reasons of memory management, and indeed, ownership of memory no longer needs to be mentioned as part of APIs. You have to buy a little more RAM to get the same CPU performance, but you save hundreds of hours of programmers' time. You decide.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js