Why doesn't CPP create a default deep-copy constructor? - c++

I don't understand why C++ doesn't provide a copy constructor that makes a real duplicate of the original.
As we know, the main problem of the default copy constructor is that it performs a shallow copy. If there is a pointer, it only copies its address, but why doesn't it dereference the pointer and copy the content?
The main problem arises with dynamic memory allocation: one can mistakenly delete the memory while another pointer still points to it. That's why we make our own copy constructors and do not use the default one.
But I don't get it: why doesn't C++ do it? Why doesn't it copy the content?

As we know, the main problem of the default copy constructor is that it performs a shallow copy.
We don't know that.
that's why we make our own copy constructors and do not use the default one.
In C++ you should almost never write your own copy constructor (the rule of zero).
The main problem arises with dynamic memory allocation: one can mistakenly delete the memory while another pointer still points to it
It's a non-problem. Why? Because in C++ we use RAII, and the standard library has tools that solve all of the problems you see. In C++ you should never have to write an explicit new, and you should never have a raw pointer that is an owner. Use standard containers (e.g. std::vector) and smart pointers (e.g. std::unique_ptr).
I don't understand why C++ doesn't provide a copy constructor that makes a real duplicate of the original
Because the compiler doesn't know what the copy semantics of the object should be. Only the writer of the class knows that. You can't know what the semantics of the pointer are. Is it a pointer that uniquely owns the memory resource? If so, was it acquired with malloc, new, new[], or with something else? Does it share its ownership of the memory? Or is it simply pointing to an object it doesn't own? Since you cannot know any of this from the declaration/definition of a class, a compiler simply cannot implement "deep copy" automatically with raw pointers.
Except it does. It does implement deep copy by default, or shallow copy by default, or a combination of them. And it does so correctly. Remember when I told you to not use raw pointers for ownership? Use the appropriate abstractions (containers, smart pointers) and the default copy ctor will do exactly what it needs to do.
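As a minimal sketch of that last point (Document and copy_is_independent are hypothetical names made up for illustration), a class whose members are standard abstractions gets the right mix of deep and shared copying from the implicitly generated copy constructor:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical class: no user-declared copy constructor, destructor, or assignment.
struct Document {
    std::string title;                         // copied by value (deep)
    std::vector<int> pages;                    // copied element by element (deep)
    std::shared_ptr<const std::string> theme;  // copy shares ownership (shallow)
};

// Copies a Document, modifies the copy, and reports whether the original
// kept its own data while the shared member stayed shared.
inline bool copy_is_independent() {
    Document a{"draft", {1, 2, 3}, std::make_shared<const std::string>("dark")};
    Document b = a;          // implicitly generated copy constructor
    b.pages.push_back(4);    // only b's vector changes
    b.title = "final";       // only b's string changes
    return a.pages.size() == 3 && b.pages.size() == 4 &&
           a.title == "draft" &&
           a.theme == b.theme && a.theme.use_count() == 2;
}
```

Each member type decides for itself what copying means, which is why the class needs no special member functions of its own.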

As we know, the main problem of the default copy constructor is that it performs a shallow copy.
Shallow copying is not a problem in general. It's only a problem if you want to make a deep copy, have a referential member, and expect the implicit copy constructor to do what you want.
Shallow copying is often useful and typically intentional.
but why doesn't it dereference the pointer and copy the content?
Because where would the copy constructor store the copied object? Where should the pointer member point to?
The compiler cannot read your mind and cannot know whether you want to allocate memory, or how to allocate memory. If the compiler did allocate memory in an implicit function, then who would be responsible for its deletion? If the compiler deleted any pointers implicitly, that would be highly surprising when you intended to have non-owning pointers to something that must not be deleted.
why can't the programmer be responsible for it?
Memory management is difficult enough as it is. At the moment it is at least manageable by following the simple rule: You delete everything that you new. If we introduce implicit allocations and impose the responsibility on the programmer to know of the existence of these implicit allocations, our job would become much harder.
Furthermore, a pointer can have an invalid value. In such cases indirecting through it would have undefined behaviour. And it is impossible to inspect the pointer to find out whether it is invalid. This would make the suggested implicit "deep copying" highly error prone.
Shallow implicit copying is the only sensible choice in a language that has manual memory management. Even in garbage collected languages this is generally the better choice.
that's why we make our own copy constructors and do not use the default one.
We rarely write user-declared copy constructors (outside of beginner courses). Those are mostly needed for classes whose sole purpose is to manage memory (or other resources), such as smart pointers. And we rarely need to write those, since the standard library offers the most general smart pointers and data structures out of the box.
we should create our own copy constructor once we have a dynamic memory inside the class
Indeed, if your class manages dynamic memory, then it will need a custom copy constructor. But the typical solution is to not manage dynamic memory inside your class. See the paragraph above. Keep any and all memory allocations and other dynamic resources in a smart pointer or a container.

The compiler has no way of knowing the meaning of the pointers it is supposed to "deep copy".
For example, does a float pointer point to a single float or to a C-style float array? If it is an array, what is the length of the array that it should copy? Please note that I am not talking about C++ style arrays (i.e. std::array).
If you want "deep copying" to be handled automatically, you can use container classes for data members that should be copied.

Related

What does the destructor of primitive types actually do? [duplicate]

This question comes from me trying to understand the motivation for smart pointers where you make a wrapper class around the pointer so that you could add a custom destructor. Do pointers (and ints, bools, doubles, etc.) not have a destructor?
Technically speaking, non-class types (the C++ term for what is often called a 'primitive type' in layman's words) do not have destructors.
The C++ standard only speaks of real destructors in the context of classes; see [class.dtor] in the C++ standard. Aside from that, C++ also allows calling a destructor on a non-class object using the same notation, i.e. the following code is valid:
void foo(int z) {
    using T = int;
    z.~T();  // pseudo-destructor call on a non-class type
}
This is called a 'pseudo-destructor' call and exists exclusively to allow generic templated code to deal with class and non-class types in the same manner. This call does nothing at all. This syntax is defined in [expr.prim.id] in the C++ standard.
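To illustrate why generic code needs this, here is a small sketch. The names destroy_in_place, Counted and demo_destroy are hypothetical and chosen only for this example; the same template body compiles for a class type and for int:

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: ends the lifetime of *p without freeing its storage.
// For class types this runs the destructor; for non-class types such as int
// the same syntax is a pseudo-destructor call that generates no code.
template <typename T>
void destroy_in_place(T* p) {
    p->~T();
}

// Hypothetical class that counts how many times its destructor ran.
struct Counted {
    static int destroyed;
    ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;

// Returns the number of Counted destructor calls after one manual destroy.
inline int demo_destroy() {
    alignas(Counted) unsigned char buf[sizeof(Counted)];
    Counted* c = new (buf) Counted;  // placement new: construct inside buf
    destroy_in_place(c);             // real destructor call
    int n = 0;
    destroy_in_place(&n);            // pseudo-destructor: compiles, emits nothing
    return Counted::destroyed;
}
```

Note that n is deliberately not read after the pseudo-destructor call.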
Primitive types (and compounds thereof) have trivial destructors. These don't do anything, and have special wording that allows them to be skipped altogether in some cases.
This, however, is orthogonal to why C++ has smart pointers. A raw pointer is non-owning: it points at another object, but does not affect its lifetime. Smart pointers, on the other hand, own (or share ownership of) their pointee, tying its lifetime to their own. This is what is implemented inside, among other special functions, their destructor.
In addition to the answers given here so far, since C++20 the pseudo-destructor call on a non-class object will always end its lifetime. Consequently accessing the object's value after the call will have undefined behavior. This does not mean however that the compiler has to emit any code for such a call. It still effectively does nothing.
No, pointers don't have destructors. An object referenced through a plain old pointer has to be deleted to avoid memory leaks, and the object's destructor is called then, but the compiler won't call delete automatically, even when a pointer goes out of scope - what if another part of your program also had a pointer to the same object?
Smart pointers aren't about calling a custom destructor, they're about ensuring that things get cleaned up automatically when they go out of scope. This 'cleaning up' might be deleting owned objects, freeing any malloced memory, closing files, releasing locks, etc.
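A hedged sketch of that idea, using std::unique_ptr with a custom deleter for malloc'ed memory. FreeDeleter, MallocInt, free_count and use_buffer are hypothetical names, and the counter exists only to make the cleanup observable:

```cpp
#include <cassert>
#include <cstdlib>
#include <memory>
#include <new>

// Counts how many times the deleter ran (for illustration only).
inline int& free_count() {
    static int n = 0;
    return n;
}

// Hypothetical deleter that releases memory obtained from std::malloc.
struct FreeDeleter {
    void operator()(void* p) const noexcept {
        std::free(p);
        ++free_count();
    }
};

using MallocInt = std::unique_ptr<int, FreeDeleter>;

// Allocates an int with malloc, uses it, and lets the smart pointer free it.
inline int use_buffer() {
    void* raw = std::malloc(sizeof(int));
    if (!raw) return -1;             // allocation failed, nothing to clean up
    MallocInt p(new (raw) int(42));  // construct the int in the malloc'ed block
    return *p;                       // p goes out of scope here; FreeDeleter runs
}
```

The caller never writes free; the cleanup is tied to the scope of p, including on early returns and exceptions.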
Destructors are used to free the resources that an object may have used.
For pointers, you don't need delete if you are not allocating new memory from the heap.
C and C++ store most variables in one of two places: the stack and the heap.
The stack holds automatic (local) variables, and the compiler takes care of it. The heap holds dynamically allocated memory, and you have to take care of it yourself if you use it.
When you declare a local variable of a primitive type, stack memory is allocated for it.
When you use new to create an object, it is stored on the heap, and you need to delete it when you have finished using it; otherwise you have a memory leak.
Basically, you only need delete if you new something.

Should one always define a copy constructor for deep copying pointers in a class with raw pointer members?

To my knowledge, in theory, if a class has a raw pointer member, then the default copy constructor will do a shallow copy of that pointer, such that when the original object is destroyed, the pointer member in the copy will have had the value to which it pointed deleted. This seems to imply that outside of the case where we want to restrict copying for some reason, any class with a raw pointer member should define a copy constructor to do a deep copy on that pointer.
I am working with a respectable third party API, and I've come across a class with a raw pointer member, but no defined copy constructor, which casts doubt on my understanding above. Am I missing something?
UPDATE: The third party informed me that this class is not meant to be copied, as the object represents a visual element. They noted that they should have made a private copy constructor.
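Since C++11, the usual way to express "not meant to be copied" is to delete the copy operations rather than make them private. A minimal sketch, where Widget is a hypothetical stand-in for the third-party class:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the third-party "visual element" class.
class Widget {
public:
    Widget() = default;
    Widget(const Widget&) = delete;             // no copy construction
    Widget& operator=(const Widget&) = delete;  // no copy assignment

    void* handle() const { return native_handle_; }

private:
    void* native_handle_ = nullptr;  // raw pointer that must not be duplicated
};

// Any attempt to copy a Widget is now a compile-time error.
static_assert(!std::is_copy_constructible<Widget>::value,
              "Widget must not be copy-constructible");
static_assert(!std::is_copy_assignable<Widget>::value,
              "Widget must not be copy-assignable");
```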
I think std::reference_wrapper is part of a respectable API (the C++ standard library). It has a copy constructor but that's not necessarily explicitly defined in the implementation code, because it just copies the raw pointer. So there you are: having a pointer member doesn't always imply ownership.
As a counter-example, where you have a non-owning pointer member but still need to take charge of copying: an object can contain a pointer to a part of itself. It would be ungood if copying such an object resulted in its internal pointer pointing to a part of some other object.
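A sketch of such a self-referential object, assuming a hypothetical Cursor class that keeps a pointer into its own array. The user-defined copy operations re-point the copy's pointer at the copy's own storage:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class with a non-owning pointer into its own storage.
class Cursor {
public:
    Cursor() : current_(items_) {}

    // Copy the data, then point current_ at the same offset in *this* object.
    Cursor(const Cursor& other) : current_(items_ + other.index()) {
        for (std::size_t i = 0; i < kSize; ++i) items_[i] = other.items_[i];
    }

    Cursor& operator=(const Cursor& other) {
        for (std::size_t i = 0; i < kSize; ++i) items_[i] = other.items_[i];
        current_ = items_ + other.index();
        return *this;
    }

    void advance() { if (index() + 1 < kSize) ++current_; }
    std::size_t index() const { return static_cast<std::size_t>(current_ - items_); }
    const int* begin() const { return items_; }
    const int* current() const { return current_; }

private:
    static constexpr std::size_t kSize = 4;
    int items_[kSize] = {0, 1, 2, 3};
    int* current_;  // always points into items_ of the same object
};
```

With the implicit copy constructor, the copy's current_ would instead keep pointing into the source object's items_.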
In summary, it depends. Most often you need a defined copy constructor. But absolutely not always.
For C++03 the law of three (or "rule of three") was a rule of thumb that if you needed either a destructor, a copy assignment operator or a copy constructor, then you probably needed all three.
So check if there is a copy assignment operator or destructor. In that case a copy constructor is probably required, and missing.
In C++11 and later some people extend it to the law of five, by including move assignment operator and move constructor, and some people reduce to the law of zero, by requiring that all ownership should be expressed via smart pointers or collection objects.
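For reference, a minimal sketch of a class written in the rule-of-three style. IntBuffer is a hypothetical name; under the law of zero the raw pointer would simply be replaced with a std::vector<int> and all three functions would disappear:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning class written in the C++03 "rule of three" style.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // 1. Destructor releases the owned array.
    ~IntBuffer() { delete[] data_; }

    // 2. Copy constructor allocates a separate array and copies the elements.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // 3. Copy assignment: build the copy first, then swap (copy-and-swap).
    IntBuffer& operator=(const IntBuffer& other) {
        IntBuffer tmp(other);
        std::swap(size_, tmp.size_);
        std::swap(data_, tmp.data_);
        return *this;  // tmp's destructor frees the old array
    }

    int& operator[](std::size_t i) { return data_[i]; }
    const int& operator[](std::size_t i) const { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;  // owning raw pointer: this is why all three functions are needed
};
```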
In an ideal world, any raw pointer member could be assumed to be a non-owning pointer. Since, if you had wanted an owning pointer you would have used a smart pointer!
A non-owning pointer member will point at an object whose lifetime is managed outside of the class and should be (by design) longer than the lifetime of the pointer.
The default behaviour when you copy the class is for the copy to also point at the same object. Often that is the behaviour you want. If not, yes, you will have to change that behavior.
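A small sketch of the non-owning case, with hypothetical Logger and Task types. The implicit copy is exactly what is wanted here, because both tasks are supposed to report to the same logger:

```cpp
#include <cassert>

// Hypothetical object whose lifetime is managed outside of Task.
struct Logger {
    int lines_written = 0;
};

// Holds a non-owning pointer: the Logger must outlive every Task.
struct Task {
    Logger* log;  // not owned; Task never deletes it
    int id;
    void run() const { ++log->lines_written; }
};
```

Copying a Task with `Task t2 = t1;` leaves t1.log and t2.log pointing at the same Logger, and no destructor ever frees it twice because neither Task owns it.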
Of course, we don't live in an ideal world and there are many places where owning raw pointer members are used. In that case, you are right, the default copy constructor is not appropriate (see Alf's answer regarding the rule of 3/5).
No, you're not missing anything. In general.
I'm hesitant to make any specific guarantees without knowing what library you're talking about and what the semantics of that class are (documented or otherwise) but, on the surface of it, this sounds like a library bug.
It's possible that the class doesn't own the pointee. This will likely be the case if its constructor doesn't allocate it, and its destructor doesn't de-allocate it.
Barring that, though, you're right.

Will memcpy or memmove cause problems copying classes?

Suppose I have any kind of class or structure. No virtual functions or anything, just some custom constructors, as well as a few pointers that would require cleanup in the destructor.
Would there be any adverse effects to using memcpy or memmove on this structure? Will deleting a moved structure cause problems? The question assumes that the memory alignment is also correct, and we are copying to safe memory.
In the general case, yes, there will be problems. Both memcpy and memmove are bitwise operations with no further semantics. That might not be sufficient to move the object*, and it is clearly not enough to copy.
In the case of the copy it will break as multiple objects will be referring to the same dynamically allocated memory, and more than one destructor will try to release it. Note that solutions like shared_ptr will not help here, as sharing ownership is part of the further semantics that memcpy/memmove don't offer.
For moving, and depending on the type you might get away with it in some cases. But it won't work if the objects hold pointers/references to the elements being moved (including self-references) as the pointers will be bitwise copied (again, no further semantics of copying/moving) and will refer to the old locations.
The general answer is still the same: don't.
* Don't take move here in the exact C++11 sense. I have seen an implementation of the standard library containers that used special tags to enable moving objects with memcpy while growing buffers, but it required explicit annotations in the stored types that marked the objects as safely movable through memcpy. After the objects were placed in the new buffer, the old buffer was discarded without calling any destructors. (A C++11 move requires leaving the object in a destructible state, which cannot be achieved through this hack.)
Generally using memcpy on a class based object is not a good idea. The most likely problem would be copying a pointer and then deleting it. You should use a copy constructor or assignment operator instead.
No, don't do this.
If you memcpy a structure whose destructor deletes a pointer within itself, you'll wind up doing a double delete when the second instance of the structure is destroyed in whatever manner.
The C++ idiom is copy constructor for classes and std::copy or any of its friends for copying ranges/sequences/containers.
If you are using C++11, you can use std::is_trivially_copyable to determine if an object can be copied or moved using memcpy or memmove. From the documentation:
Objects of trivially-copyable types are the only C++ objects that may
be safely copied with std::memcpy or serialized to/from binary files
with std::ofstream::write()/std::ifstream::read(). In general, a
trivially copyable type is any type for which the underlying bytes can
be copied to an array of char or unsigned char and into a new object
of the same type, and the resulting object would have the same value
as the original.
Many classes don't fit this description, and you must beware that classes can change. I would suggest that if you are going to use memcpy/memmove on C++ objects, you somehow protect against unwanted usage. For example, if you're implementing a container class, it's easy for the type that the container holds to be modified such that it is no longer trivially copyable (e.g. somebody adds a virtual function). You could guard against this with a static_assert:
#include <type_traits>

template<typename T>
class MemcopyableArray
{
    // Fails to compile as soon as T stops being trivially copyable.
    static_assert(std::is_trivially_copyable<T>::value, "MemcopyableArray used with object type that is not trivially copyable.");
    // ...
};
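The same guard can be put on a free function. The following is only a sketch: copy_bytes, Point and Owner are hypothetical names, and the helper simply wraps std::memcpy behind the trait check:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: byte-wise copy that refuses non-trivially-copyable types.
template <typename T>
void copy_bytes(T* dst, const T* src, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "copy_bytes requires a trivially copyable type");
    std::memcpy(dst, src, count * sizeof(T));
}

// Trivially copyable: plain data, no user-provided special member functions.
struct Point {
    float x;
    float y;
};

// Not trivially copyable: the user-provided destructor owns the pointee.
struct Owner {
    int* p;
    ~Owner() { delete p; }
};

static_assert(std::is_trivially_copyable<Point>::value, "Point is safe to memcpy");
static_assert(!std::is_trivially_copyable<Owner>::value, "Owner is not safe to memcpy");
```

Calling copy_bytes on an array of Owner would fail to compile instead of producing a double delete at run time.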
Aside from safety, which is the most important issue as the other answers have already pointed out, there may also be an issue of performance, especially for small objects.
Even for simple POD types, you may discover that doing proper initialization in the initializer list of your copy constructor (or assignments in the assignment operator, depending on your usage) is actually faster than even an intrinsic version of memcpy. This may very well be due to memcpy's entry code, which may check for word alignment, overlaps, buffer/memory access rights, etc. In Visual C++ 10.0 and higher, to give you a specific example, you would be surprised by how much preamble code that tests various things executes before memcpy even begins its logical function.

Using memcpy in the STL

Why does C++'s vector class call copy constructors? Why doesn't it just memcpy the underlying data? Wouldn't that be a lot faster, and remove half of the need for move semantics?
I can't imagine a use case where this would be worse, but then again, maybe it's just because I'm being quite unimaginative.
Because the object needs to be notified that it is being moved. For example, there could be pointers to that given object that need to be fixed because the object is being copied. Or the reference count on a reference counted smart pointer might need to be updated. Or....
If you just memcpy'd the underlying memory, then you'd end up calling the destructor twice on the same object, which is also bad. What if the destructor controls something like an OS file handle?
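The reference-count case is easy to observe. In this sketch (the two function names are hypothetical), the copy constructor and destructor of std::shared_ptr do bookkeeping that a raw byte copy would skip:

```cpp
#include <cassert>
#include <memory>

// Number of owners while two shared_ptr copies are alive.
inline long owners_after_copy() {
    std::shared_ptr<int> a = std::make_shared<int>(5);
    std::shared_ptr<int> b = a;  // copy constructor increments the shared count
    (void)b;
    return a.use_count();        // both a and b own the int
}

// Number of owners after the second copy has been destroyed.
inline long owners_after_scope() {
    std::shared_ptr<int> a = std::make_shared<int>(5);
    {
        std::shared_ptr<int> b = a;  // count goes up
        (void)b;
    }                                // b's destructor brings it back down
    return a.use_count();
}
```

A memcpy of a would produce a second pointer to the control block without incrementing the count, so the int would be freed while one of the "copies" still referred to it.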
EDIT: To sum up the above:
The copy constructor and destructor can have side effects. Those side effects need to be preserved.
You can safely memcpy POD and built-in types. Those are defined to have no semantics beyond being a bag of bits with respect to memory.
Types that have constructors and destructors, however, must be constructed and destructed in order to maintain invariants. While you may be able to create types that can be memcpy'd and have constructors, it is very difficult to express that in the type signature. So instead of possibly violating invariants associated with all types that have constructors and/or destructors, STL errs on the side of instead maintaining invariants by using the copy constructor when necessary.
This is why C++0x added move semantics via rvalue references. It means that STL containers can take advantage of types that provide a move constructor and give the compiler enough information to optimize out otherwise expensive construction.
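A rough sketch of what that buys in practice, assuming a hypothetical MoveCounter type that counts its copies and moves. Because its move constructor is declared noexcept, mainstream std::vector implementations move the elements when the buffer grows instead of copying them:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type that counts how often it is copied or moved.
struct MoveCounter {
    static int copies;
    static int moves;
    std::string payload;

    explicit MoveCounter(std::string s) : payload(std::move(s)) {}
    MoveCounter(const MoveCounter& o) : payload(o.payload) { ++copies; }
    MoveCounter(MoveCounter&& o) noexcept : payload(std::move(o.payload)) { ++moves; }
};
int MoveCounter::copies = 0;
int MoveCounter::moves = 0;

// Appends n temporaries; each push_back moves its argument into the vector,
// and every reallocation moves the existing elements to the new buffer.
inline void fill(std::vector<MoveCounter>& v, int n) {
    for (int i = 0; i < n; ++i)
        v.push_back(MoveCounter("item"));
}
```

The exact number of moves depends on the implementation's growth strategy, so only the absence of copies is meaningful here.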

unique_ptr - major improvement?

In the current C++ standard, creating collections that satisfy the following rules is hard, if not impossible:
(1) exception safety,
(2) cheap internal operations (in current STL containers the operations are copies),
(3) automatic memory management.
To satisfy (1), a collection can't store raw pointers. To satisfy (2), a collection must store raw pointers. To satisfy (3), a collection must store objects by value.
Conclusion: the three items conflict with each other.
Item (2) will not be satisfied when shared_ptrs are used because when a collection will need to move an element, it will need to make two calls: to a constructor and to a destructor. No massive, memcpy()-like copy/move operations are possible.
Am I correct that the described problem will be solved by unique_ptr and std::move()? Collections utilizing the tools will be able to satisfy all 3 conditions:
When a collection will be deleted as a side effect of an exception, it will call unique_ptr's destructors. No memory leak.
unique_ptr does not need any extra space for a reference counter; therefore its body should be exactly the same size as the wrapped pointer,
I am not sure, but it looks like this allows moving groups of unique_ptrs with memmove()-like operations (?),
even if it's not possible, std::move() will allow moving each unique_ptr object without making the constructor/destructor pair calls.
unique_ptr will have exclusive ownership of given memory. No accidental memory leaks will be possible.
Is this true? What are other advantages of using unique_ptr?
I agree entirely. There's at last a natural way of handling heap allocated objects.
In answer to:
I am not sure, but it looks like this allows moving groups of unique_ptrs with memmove()-like operations,
there was a proposal to allow this, but it hasn't made it into the C++11 Standard.
Yes, you are right. I would only add this is possible thanks to r-value references.
When a collection will be deleted as a side effect of an exception, it will call unique_ptr's destructors. No memory leak.
Yes, a container of unique_ptr will satisfy this.
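A minimal sketch of that guarantee. Node, build_and_fail and live_after_failure are hypothetical names, and the live counter exists only to make leaks visible:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical element type that tracks how many instances are alive.
struct Node {
    static int live;
    Node() { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

// Builds a container of owned nodes, then fails part-way through.
inline void build_and_fail() {
    std::vector<std::unique_ptr<Node>> nodes;
    for (int i = 0; i < 3; ++i)
        nodes.push_back(std::unique_ptr<Node>(new Node));
    // Stack unwinding destroys `nodes`, which destroys each unique_ptr,
    // which in turn deletes each Node.
    throw std::runtime_error("simulated failure");
}

// Returns the number of Node objects still alive after the exception.
inline int live_after_failure() {
    try {
        build_and_fail();
    } catch (const std::runtime_error&) {
        // swallow the simulated failure
    }
    return Node::live;
}
```

With a std::vector<Node*> instead, the same exception would leave three Node objects allocated with nothing left to delete them.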
unique_ptr does not need any extra space for a reference counter; therefore its body should be exactly the same size as the wrapped pointer
unique_ptr's size is implementation-defined. While all reasonable implementations of unique_ptr using its default deleter will likely only be a pointer in size, there is no guarantee of this in the standard.
I am not sure, but it looks like this allows moving groups of unique_ptrs with memmove()-like operations (?),
Absolutely not. unique_ptr is not a trivial class; therefore, it cannot be memmoved around. Even if it were, you can't just memmove them, because the destructors for the originals need to be called. It would have to be a memmove followed by a memset.
even if it's not possible, std::move() will allow moving each unique_ptr object without making the constructor/destructor pair calls.
Also incorrect. Movement does not make constructors and destructors not be called. The unique_ptr's that are being destroyed need to be destroyed; that requires a call to their destructors. Similarly, the new unique_ptrs need to have their constructors called; that's how an object's lifetime begins.
There's no avoiding that; it's how C++ works.
However, that's not what you should be worried about. Honestly, if you're concerned about a simple constructor/destructor call, you're either in code that you should be hand-optimizing (and thus writing your own code for), or you're prematurely optimizing your code. What matters is not whether constructors/destructors are called; what matters is how fast the resulting code is.
unique_ptr will have exclusive ownership of given memory. No accidental memory leaks will be possible.
Yes, it will.
Personally, I'd say you're doing one of the following:
Being excessively paranoid about copying objects. This is evidenced by the fact that you consider putting a shared_ptr in a container to be too costly a copy. This is an all-too-common malady among C++ programmers. That's not to say that copying is always good or something, but obsessing over copying a shared_ptr in a container is ridiculous outside of exceptional circumstances.
Not aware of how to properly use move semantics. If your objects are expensive to copy but cheap to move... then move them into the container. There's no reason to have a pointer indirection when your objects already contain pointer indirections. Just use movement with the objects themselves, not unique_ptrs to objects.
Disregarding the alternatives. Namely, Boost's pointer containers. They seem to have everything you want. They own pointers to their objects, but externally they have value semantics rather than pointer semantics. They're exception safe, and any copying happens with pointers. No unique_ptr constructor/destructor "overhead".
It looks like the three conditions I've enumerated in my post are possible to obtain by using Boost Pointer Container Library.
This question illustrates why I so love the Boehm garbage collector (libgc). There's never a need to copy anything for reasons of memory management, and indeed, ownership of memory no longer needs to be mentioned as part of APIs. You have to buy a little more RAM to get the same CPU performance, but you save hundreds of hours of programmers' time. You decide.