Using memcpy in the STL

Using memcpy in the STL - c++

Why does C++'s vector class call copy constructors? Why doesn't it just memcpy the underlying data? Wouldn't that be a lot faster, and remove half of the need for move semantics?
I can't imagine a use case where this would be worse, but then again, maybe it's just because I'm being quite unimaginative.

Because the object needs to be notified that it is being moved. For example, there could be pointers to that given object that need to be fixed because the object is being copied. Or the reference count on a reference counted smart pointer might need to be updated. Or....
If you just memcpy'd the underlying memory, then you'd end up calling the destructor twice on the same object, which is also bad. What if the destructor controls something like an OS file handle?
EDIT: To sum up the above:
The copy constructor and destructor can have side effects. Those side effects need to be preserved.

You can safely memcpy POD and built-in types. Those are defined to have no semantics beyond being a bag of bits with respect to memory.
Types that have constructors and destructors, however, must be constructed and destructed in order to maintain invariants. While you may be able to create types that can be memcpy'd and have constructors, it is very difficult to express that in the type signature. So instead of possibly violating invariants associated with all types that have constructors and/or destructors, STL errs on the side of instead maintaining invariants by using the copy constructor when necessary.
This is why C++0x added move semantics via rvalue references. It means that STL containers can take advantage of types that provide a move constructor and give the compiler enough information to optimize out otherwise expensive construction.

Related

Safe to std:move a member?

Have found comparable questions but not exactly with such a case.
Take the following code for example:
#include <iostream>
#include <string>
#include <vector>
struct Inner
{
int a, b;
};
struct Outer
{
Inner inner;
};
std::vector<Inner> vec;
int main()
{
Outer * an_outer = new Outer;
vec.push_back(std::move(an_outer->inner));
delete an_outer;
}
Is this safe? Even if those were polymorphic classes or ones with custom destructors?
My concern regards the instance of "Outer" which has a member variable "inner" moved away. From what I learned, moved things should not be touched anymore. However does that include the delete call that is applied to outer and would technically call delete on inner as well (and thus "touch" it)?

Neither std::move, nor move semantics more generally, have any effect on the object model. They don't stop objects from existing, nor prevent you from using those objects in the future.
What they do is ask to borrow encapsulated resources from the thing you're "moving from". For example, a vector, which directly only stores a pointer some dynamically-allocated data: the concept of ownership of that data can be "stolen" by simply copying that pointer then telling the vector to null the pointer and never have anything to do with that data again. It's yielded. The data belongs to you now. You have the last pointer to it that exists in the universe.
All of this is achieved simply by a bunch of hacks. The first is std::move, which just casts your vector expression to vector&&, so when you pass the result of it to a construction or assignment operation, the version that takes vector&& (the move constructor, or move-assignment operator) is triggered instead of the one that takes const vector&, and that version performs the steps necessary to do what I described in the previous paragraph.
(For other types that we make, we traditionally keep following that pattern, because that's how we can have nice things and persuade people to use our libraries.)
But then you can still use the vector! You can "touch" it. What exactly you can do with it is discoverable from the documentation for vector, and this extends to any other moveable type: the constraints emplaced on your usage of a moved-from object depend entirely on its type, and on the decisions made by the person who designed that type.
None of this has any impact on the lifetime of the vector. It still exists, it still takes memory, and it will still be destructed when the time comes. (In this particular example you can actually .clear() it and start again adding data to a new buffer.)
So, even if ints had any sort of concept of this (they don't; they encapsulate no indirectly-stored data, and own no resources; they have no constructors, so they also have no constructors taking int&&), the delete "touch"ing them would be entirely safe. And, more generally, none of this depends on whether the thing you've moved from is a member or not.
More generally, if you had a type T, and an object of that type, and you moved from it, and one of the constraints for T was that you couldn't delete it after moving from it, that would be a bug in T. That would be a serious mistake by the author of T. Your objects all need to be destructible. The mistake could manifest as a compilation failure or, more likely, undefined behaviour, depending on what exactly the bug was.
tl;dr: Yes, this is safe, for several reasons.

std::move is a cast to an rvalue-reference, which primarily changes which constructor/assignment operator overload is chosen. In your example the move-constructor is the default generated move-constructor, which just copies the ints over so nothing happens.
Whether or not this generally safe depends on the way your classes implement move construction/assignment. Assume for example that your class instead held a pointer. You would have to set that to nullptr in the moved-from class to avoid destroying the pointed-to data, if the moved-from class is destroyed.
Because just defining move-semantics is a custom way almost always leads to problems, the rule of five says that if you customize any of:
the copy constructor
the copy assignment operator
the move constructor
the move assignment operator
the destructor
you should usually customize all to ensure that they behave consistently with the expectations a caller would usually have for your class.

C++ Move Semantics vs Copy Constructor and Assignment Operator in relation to Smart Pointers

I'm trying to figure out when to use move semantics and when to use a copy constructor and assignment operator as a rule of thumb. The type of pointer you use (if any) in your class seems to be affected by this answer, so I have included this.
No pointers - Based on this answer, if you have a POD class with primitive types like int and string, you don't need to write custom move or copy constructors and operators.
unique-ptr - Based on this answer, when using move semantics, then unique_ptr is a better fit over shared_ptr as there can only be one unique_ptr to the resource.
shared_ptr - Equally, if using copy semantics, shared_ptr seems the way to go. There can be multiple copies of the object, so having a shared pointer to the resource makes sense to me. However, unique_ptr is generally preferred to shared_ptr so avoid this option if you can.
But:
When should I use move semantics?
When should I use copy semantics?
Should I ever use both?
Should I ever use none and rely on the default copy constructor and assignment operator?

As the name indicates, use unique_ptr when there must exist exactly one owner to a resource. The copy constructor of unique_ptr is disabled, which means it is impossible for two instances of it to exist. However, it is movable... Which is fine, since that allows transfer of ownership.
Also as the name indicates, shared_ptr represents shared ownership of a resource. However, there is also another difference between the two smart pointers: The Deleter of a unique_ptr is part of its type signature, but it is not part of the type signature of shared_ptr. That is because shared_ptr uses "type erasure" to "erase the type" of the deleter. Also note that shared_ptr can also be moved to transfer ownership (like unique_ptr.)
When should I use move semantics?
Although shared_ptr can be copied, you may want to move them anyways when you are making a transfer of ownership (as opposed to creating a new reference). You're obligated to use move semantics for unique_ptr, since ownership must be unique.
When should I use copy semantics?
In the case of smart pointers, you should use copying to increase the reference count of shared_ptrs. (If you're unfamiliar with the concept of a reference count, research reference counted garbage collection.)
Should I ever use both?
Yes. As mentioned above, shared_ptr can be both copied and moved. Copying denotes incrementing the reference count, while moving only indicates a transfer of ownership (the reference count stays the same.)
Should I ever use none and rely on the default copy constructor and assignment operator?
When you want to make a member-by-member copy of an object.

When should I use move semantics?
I presume by this you mean, "When should I give my class a move constructor?" The answer is whenever moving objects of this type is useful and the default move constructor doesn't do the job correctly. Moving is useful when there is some benefit to transferring resources from one object to another. For example, moving is useful to std::string because it allows objects to be copied from temporaries without having to reallocate and copy their internal resources and instead by simply moving the resource from one to the other. Many types would benefit from this. On the other hand, moving is useful to std::unique_ptr because it is the only way to pass a std::unique_ptr around by value without violating its "unique ownership".
When should I use copy semantics?
Again, I presume by this you mean, "When should I give my class a copy constructor?" Whenever you need to be able to make copies of an object, where internal resources are copied between them, and the default copy constructor doesn't do the job correctly. Copying is useful for almost any type, except those like std::unique_ptr that must enforce unique ownership over an internal resource.
Should I ever use both?
Your classes should provide both copy and move semantics most of the time. The most common classes should be both copyable and moveable. Being copyable provides the standard semantics of passing around an object by value. Being moveable allows the optimisation that can be gained when passing around a temporary object by value. Whether this means having to provide a copy or move constructor depends on whether the default constructors do the appropriate things.
Should I ever use none and rely on the default copy constructor and assignment operator?
The default copy and move constructors just do a copy or move of each member of the class respectively. If this behaviour is appropriate for copying and moving your class, that's great. Most of the time, this should be good enough. For example, if I have a class that contains a std::string, the default copy constructor will copy the string over, and the default move constructor will move the string's resources to the new object - both do the appropriate job. If your class contains a std::unique_ptr, copying will simply not work, and your class will only be moveable. That may be what you want, or you may want to implement a copy constructor that performs a deep copy of the resource. The most important case in which you should implement copy/move constructors is when your class performs resource management itself (using new and delete, for example). If that's the case, the default constructors will almost never be doing a good job of managing those resources.
Everything here applies similarly to the assignment operator.

Can we say bye to copy constructors?

Copy constructors were traditionally ubiquitous in C++ programs. However, I'm doubting whether there's a good reason to that since C++11.
Even when the program logic didn't need copying objects, copy constructors (usu. default) were often included for the sole purpose of object reallocation. Without a copy constructor, you couldn't store objects in a std::vector or even return an object from a function.
However, since C++11, move constructors have been responsible for object reallocation.
Another use case for copy constructors was, simply, making clones of objects. However, I'm quite convinced that a .copy() or .clone() method is better suited for that role than a copy constructor because...
Copying objects isn't really commonplace. Certainly it's sometimes necessary for an object's interface to contain a "make a duplicate of yourself" method, but only sometimes. And when it is the case, explicit is better than implicit.
Sometimes an object could expose several different .copy()-like methods, because in different contexts the copy might need to be created differently (e.g. shallower or deeper).
In some contexts, we'd want the .copy() methods to do non-trivial things related to program logic (increment some counter, or perhaps generate a new unique name for the copy). I wouldn't accept any code that has non-obvious logic in a copy constructor.
Last but not least, a .copy() method can be virtual if needed, allowing to solve the problem of slicing.
The only cases where I'd actually want to use a copy constructor are:
RAII handles of copiable resources (quite obviously)
Structures that are intended to be used like built-in types, like math vectors or matrices -
simply because they are copied often and vec3 b = a.copy() is too verbose.
Side note: I've considered the fact that copy constructor is needed for CAS, but CAS is needed for operator=(const T&) which I consider redundant basing on the exact same reasoning;
.copy() + operator=(T&&) = default would be preferred if you really need this.)
For me, that's quite enough incentive to use T(const T&) = delete everywhere by default and provide a .copy() method when needed. (Perhaps also a private T(const T&) = default just to be able to write copy() or virtual copy() without boilerplate.)
Q: Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Specifically, am I correct in that move constructors took over the responsibility of object reallocation in C++11 completely? I'm using "reallocation" informally for all the situations when an object needs to be moved someplace else in the memory without altering its state.

The problem is what is the word "object" referring to.
If objects are the resources that variables refers to (like in java or in C++ through pointers, using classical OOP paradigms) every "copy between variables" is a "sharing", and if single ownership is imposed, "sharing" becomes "moving".
If objects are the variables themselves, since each variables has to have its own history, you cannot "move" if you cannot / don't want to impose the destruction of a value in favor of another.
Cosider for example std::strings:
std::string a="Aa";
std::string b=a;
...
b = "Bb";
Do you expect the value of a to change, or that code to don't compile? If not, then copy is needed.
Now consider this:
std::string a="Aa";
std::string b=std::move(a);
...
b = "Bb";
Now a is left empty, since its value (better, the dynamic memory that contains it) had been "moved" to b. The value of b is then chaged, and the old "Aa" discarded.
In essence, move works only if explicitly called or if the right argument is "temporary", like in
a = b+c;
where the resource hold by the return of operator+ is clearly not needed after the assignment, hence moving it to a, rather than copy it in another a's held place and delete it is more effective.
Move and copy are two different things. Move is not "THE replacement for copy". It an more efficient way to avoid copy only in all the cases when an object is not required to generate a clone of itself.

Short anwer
Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Automatically generated copy constructors are a great benefit in separating resource management from program logic; classes implementing logic do not need to worry about allocating, freeing or copying resources at all.
In my opinion, any replacement would need to do the same, and doing that for named functions feels a bit weird.
Long answer
When considering copy semantics, it's useful to divide types into four categories:
Primitive types, with semantics defined by the language;
Resource management (or RAII) types, with special requirements;
Aggregate types, which simply copy each member;
Polymorphic types.
Primitive types are what they are, so they are beyond the scope of the question; I'm assuming that a radical change to the language, breaking decades of legacy code, won't happen. Polymorphic types can't be copied (while maintaining the dynamic type) without user-defined virtual functions or RTTI shenanigans, so they are also beyond the scope of the question.
So the proposal is: mandate that RAII and aggregate types implement a named function, rather than a copy constructor, if they should be copied.
This makes little difference to RAII types; they just need to declare a differently-named copy function, and users just need to be slightly more verbose.
However, in the current world, aggregate types do not need to declare an explicit copy constructor at all; one will be generated automatically to copy all the members, or deleted if any are uncopyable. This ensures that, as long as all the member types are correctly copyable, so is the aggregate.
In your world, there are two possibilities:
Either the language knows about your copy-function, and can automatically generate one (perhaps only if explicitly requested, i.e. T copy() = default;, since you want explicitness). In my opinion, automatically generating named functions based on the same named function in other types feels more like magic than the current scheme of generating "language elements" (constructors and operator overloads), but perhaps that's just my prejudice speaking.
Or it's left to the user to correctly implement copying semantics for aggregates. This is error-prone (since you could add a member and forget to update the function), and breaks the current clean separation between resource management and program logic.
And to address the points you make in favour:
Copying (non-polymorphic) objects is commonplace, although as you say it's less common now that they can be moved when possible. It's just your opinion that "explicit is better" or that T a(b); is less explicit than T a(b.copy());
Agreed, if an object doesn't have clearly defined copy semantics, then it should have named functions to cover whatever options it offers. I don't see how that affects how normal objects should be copied.
I've no idea why you think that a copy constructor shouldn't be allowed to do things that a named function could, as long as they are part of the defined copy semantics. You argue that copy constructors shouldn't be used because of artificial restrictions that you place on them yourself.
Copying polymorphic objects is an entirely different kettle of fish. Forcing all types to use named functions just because polymorphic ones must won't give the consistency you seem to be arguing for, since the return types would have to be different. Polymorphic copies will need to be dynamically allocated and returned by pointer; non-polymorphic copies should be returned by value. In my opinion, there is little value in making these different operations look similar without being interchangable.

One case where copy constructors come in useful is when implementing the strong exception guarantees.
To illustrate the point, let's consider the resize function of std::vector. The function might be implemented roughly as follows:
void std::vector::resize(std::size_t n)
{
if (n > capacity())
{
T *newData = new T [n];
for (std::size_t i = 0; i < capacity(); i++)
newData[i] = std::move(m_data[i]);
delete[] m_data;
m_data = newData;
}
else
{ /* ... */ }
}
If the resize function were to have a strong exception guarantee we need to ensure that, if an exception is thrown, the state of the std::vector before the resize() call is preserved.
If T has no move constructor, then we will default to the copy constructor. In this case, if the copy constructor throws an exception, we can still provide strong exception guarantee: we simply delete the newData array and no harm to the std::vector has been done.
However, if we were using the move constructor of T and it threw an exception, then we have a bunch of Ts that were moved into the newData array. Rolling this operation back isn't straight-forward: if we try to move them back into the m_data array the move constructor of T may throw an exception again!
To resolve this issue we have the std::move_if_noexcept function. This function will use the move constructor of T if it is marked as noexcept, otherwise the copy constructor will be used. This allows us to implement std::vector::resize in such a way as to provide a strong exception guarantee.
For completeness, I should mention that C++11 std::vector::resize does not provide a strong exception guarantee in all cases. According to www.cplusplus.com we have the the follow guarantees:
If n is less than or equal to the size of the container, the function never throws exceptions (no-throw guarantee).
If n is greater and a reallocation happens, there are no changes in the container in case of exception (strong guarantee) if the type of the elements is either copyable or no-throw moveable.
Otherwise, if an exception is thrown, the container is left with a valid state (basic guarantee).

Here's the thing. Moving is the new default- the new minimum requirement. But copying is still often a useful and convenient operation.
Nobody should bend over backwards to offer a copy constructor anymore. But it is still useful for your users to have copyability if you can offer it simply.
I would not ditch copy constructors any time soon, but I admit that for my own types, I only add them when it becomes clear I need them- not immediately. So far this is very, very few types.

Will memcpy or memmove cause problems copying classes?

Suppose I have any kind of class or structure. No virtual functions or anything, just some custom constructors, as well as a few pointers that would require cleanup in the destructor.
Would there be any adverse affects to using memcpy or memmove on this structure? Will deleting a moved structure cause problems? The question assumes that the memory alignment is also correct, and we are copying to safe memory.

In the general case, yes, there will be problems. Both memcpy and memmove are bitwise operations with no further semantics. That might not be sufficient to move the object*, and it is clearly not enough to copy.
In the case of the copy it will break as multiple objects will be referring to the same dynamically allocated memory, and more than one destructor will try to release it. Note that solutions like shared_ptr will not help here, as sharing ownership is part of the further semantics that memcpy/memmove don't offer.
For moving, and depending on the type you might get away with it in some cases. But it won't work if the objects hold pointers/references to the elements being moved (including self-references) as the pointers will be bitwise copied (again, no further semantics of copying/moving) and will refer to the old locations.
The general answer is still the same: don't.
* Don't take move here in the exact C++11 sense. I have seen an implementation of the standard library containers that used special tags to enable moving objects while growing buffers through the use of memcpy, but it required explicit annotations in the stored types that marked the objects as safely movable through memcpy, after the objects were placed in the new buffer the old buffer was discarded without calling any destructors (C++11 move requires leaving the object in a destructible state, which cannot be achieved through this hack)

Generally using memcpy on a class based object is not a good idea. The most likely problem would be copying a pointer and then deleting it. You should use a copy constructor or assignment operator instead.

No, don't do this.
If you memcpy a structure whose destructor deletes a pointer within itself, you'l wind up doing a double delete when the second instance of the structure is destroyed in whatever manner.
The C++ idiom is copy constructor for classes and std::copy or any of its friends for copying ranges/sequences/containers.

If you are using C++11, you can use std::is_trivially_copyable to determine if an object can be copied or moved using memcpy or memmove. From the documentation:
Objects of trivially-copyable types are the only C++ objects that may
be safely copied with std::memcpy or serialized to/from binary files
with std::ofstream::write()/std::ifstream::read(). In general, a
trivially copyable type is any type for which the underlying bytes can
be copied to an array of char or unsigned char and into a new object
of the same type, and the resulting object would have the same value
as the original.
Many classes don't fit this description, and you must beware that classes can change. I would suggest that if you are going to use memcpy/memmove on C++ objects, that you somehow protect unwanted usage. For example, if you're implementing a container class, it's easy for the type that the container holds to be modified, such that it is no longer trivially copyable (eg. somebody adds a virtual function). You could do this with a static_assert:
template<typename T>
class MemcopyableArray
{
static_assert(std::is_trivially_copyable<T>::value, "MemcopyableArray used with object type that is not trivially copyable.");
// ...
};

Aside from safety, which is the most important issue as the other answers have already pointed out, there may also be an issue of performance, especially for small objects.
Even for simple POD types, you may discover that doing proper initialization in the initializer list of your copy constructor (or assignments in the assignment operator depending on your usage) is actually faster than even an intrinsic version of memcpy. This may very well be due to memcpy's entry code which may check for word alignment, overlaps, buffer/memory access rights, etc.... In Visual C++ 10.0 and higher, to give you a specific example, you would be surprised by how much preamble code that tests various things executes before memcpy even begins its logical function.

unique_ptr - major improvement?

In the actual C++ standard, creating collections satisfying following rules is hard if not impossible:
exception safety,
cheap internal operations (in actual STL containers: the operations are copies),
automatic memory management.
To satisfy (1), a collection can't store raw pointers. To satisfy (2), a collection must store raw pointers. To satisfy (3), a collection must store objects by value.
Conclusion: the three items conflict with each other.
Item (2) will not be satisfied when shared_ptrs are used because when a collection will need to move an element, it will need to make two calls: to a constructor and to a destructor. No massive, memcpy()-like copy/move operations are possible.
Am I correct that the described problem will be solved by unique_ptr and std::move()? Collections utilizing the tools will be able to satisfy all 3 conditions:
When a collection will be deleted as a side effect of an exception, it will call unique_ptr's destructors. No memory leak.
unique_ptr does not need any extra space for reference counter; therefore its body should be exact the same size, as wrapped pointer,
I am not sure, but it looks like this allows to move groups of unique_ptrs by using memmove() like operations (?),
even if it's not possible, the std::move() operator will allow to move each unique_ptr object without making the constructor/destructor pair calls.
unique_ptr will have exclusive ownership of given memory. No accidental memory leaks will be possible.
Is this true? What are other advantages of using unique_ptr?

I agree entirely. There's at last a natural way of handling heap allocated objects.
In answer to:
I am not sure, but it looks like this allows to move groups of unique_ptrs by using memmove() like operations,
there was a proposal to allow this, but it hasn't made it into the C++11 Standard.

Yes, you are right. I would only add this is possible thanks to r-value references.

When a collection will be deleted as a side effect of an exception, it will call unique_ptr's destructors. No memory leak.
Yes, a container of unique_ptr will satisfy this.
unique_ptr does not need any extra space for reference counter; therefore its body should be exact the same size, as wrapped pointer
unique_ptr's size is implementation-defined. While all reasonable implementations of unique_ptr using it's default destructor will likely only be a pointer in size, there is no guarantee of this in the standard.
I am not sure, but it looks like this allows to move groups of unique_ptrs by using memmove() like operations (?),
Absolutely not. unique_ptr is not a trivial class; therefore, it cannot be memmoved around. Even if it were, you can't just memmove them, because the destructors for the originals need to be called. It would have to be a memmove followed by a memset.
even if it's not possible, the std::move() operator will allow to move each unique_ptr object without making the constructor/destructor pair calls.
Also incorrect. Movement does not make constructors and destructors not be called. The unique_ptr's that are being destroyed need to be destroyed; that requires a call to their destructors. Similarly, the new unique_ptrs need to have their constructors called; that's how an object's lifetime begins.
There's no avoiding that; it's how C++ works.
However, that's not what you should be worried about. Honestly, if you're concerned about a simple constructor/destructor call, you're either in code that you should be hand-optimizing (and thus writing your own code for), or you're prematurely optimizing your code. What matters is not whether constructors/destructors are called; what matters is how fast the resulting code is.
unique_ptr will have exclusive ownership of given memory. No accidental memory leaks will be possible.
Yes, it will.
Personally, I'd say you're doing one of the following:
Being excessively paranoid about copying objects. This is evidence by the fact that you consider putting a shared_ptr in a container is too costly of a copy. This is an all-too-common malady among C++ programmers. That's not to say that copying is always good or something, but obsessing over copying a shared_ptr in a container is ridiculous outside of exceptional circumstances.
Not aware of how to properly use move semantics. If your objects are expensive to copy but cheap to move... then move them into the container. There's no reason to have a pointer indirection when your objects already contain pointer indirections. Just use movement with the objects themselves, not unique_ptrs to objects.
Disregarding the alternatives. Namely, Boost's pointer containers. They seem to have everything you want. They own pointers to their objects, but externally they have value semantics rather than pointer semantics. They're exception safe, and any copying happens with pointers. No unique_ptr constructor/destructor "overhead".

It looks like the three conditions I've enumerated in my post are possible to obtain by using Boost Pointer Container Library.

This question illlustrates why I so love the Boehm garbage collector (libgc). There's never a need to copy anything for reasons of memory management, and indeed, ownership of memory no longer needs to be mentioned as part of APIs. You have to buy a little more RAM to get the same CPU performance, but you save hundreds of hours of programmers' time. You decide.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js