In C, we have the functions memcpy and memmove to efficiently copy data around. The former yields undefined behavior if the source and destination regions overlap, but the latter is guaranteed to deal with that "as expected," presumably by noticing the direction of overlap and (if necessary) choosing a different algorithm.
The above functions are available in C++ (as std::memcpy and std::memmove), of course, but they don't really work with non-trivial classes. Instead, we get std::copy and std::copy_backward. Each of these works if the source and destination ranges don't overlap; moreover, each is guaranteed to work for one "direction" of overlap.
What can we use if we want to copy from one region to another and we don't know at compile-time if the ranges may overlap or in what direction that overlap may occur? It doesn't seem that we have an option. For general iterators it may be difficult to determine if ranges overlap, so I understand why no solution is provided in that case, but what about when we're dealing with pointers? Ideally, there'd be a function like:
template<class T>
T * copy_either_direction(const T * inputBegin, const T * inputEnd, T * outputBegin) {
    if ("outputBegin ∈ [inputBegin, inputEnd)") {
        outputBegin += (inputEnd - inputBegin);
        std::copy_backward(inputBegin, inputEnd, outputBegin);
        return outputBegin;
    } else {
        return std::copy(inputBegin, inputEnd, outputBegin);
    }
}
(A similar function with T * replaced with std::vector<T>::iterator would also be nice. Even better would be if this were guaranteed to work if inputBegin == outputBegin, but that's a separate gripe of mine.)
Unfortunately, I don't see a sensible way to write the condition in the if statement, as comparing pointers into separate blocks of memory often yields undefined behavior. On the other hand, the implementation clearly has its own way to do this, as std::memmove inherently requires one. Thus, any implementation could provide such a function, thereby filling a need that the programmer simply can't. Since std::memmove was considered useful, why not copy_either_direction? Is there a solution I'm missing?
memmove works because it traffics in pointers to contiguous bytes, so the ranges of the two blocks to be copied are well defined. copy and move take iterators that don't necessarily point at contiguous ranges. For example, a list iterator can jump around in memory; there's no range that the code can look at, and no meaningful notion of overlap.
I recently learned that std::less is specialized for pointers in a way that provides a total order, presumably to allow one to store pointers in std::sets and related classes. Assuming that this must agree with the standard ordering whenever the latter is defined, I think the following will work:
#include <algorithm>   // std::copy, std::copy_backward
#include <functional>  // std::less

template<class T>
T * copy_either_direction(const T * inputBegin, const T * inputEnd, T * outputBegin) {
    if (std::less<const T *>()(inputBegin, outputBegin)) {
        outputBegin += (inputEnd - inputBegin);
        std::copy_backward(inputBegin, inputEnd, outputBegin);
        return outputBegin;
    } else {
        return std::copy(inputBegin, inputEnd, outputBegin);
    }
}
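For illustration, here is how the function above might be exercised on overlapping ranges inside a single buffer (the buffer and the values are my own, not from the question):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> buf{1, 2, 3, 4, 5, 6, 7, 8};

    // Destination starts after the source and the ranges overlap:
    // shift the first five elements right by two.
    copy_either_direction(buf.data(), buf.data() + 5, buf.data() + 2);

    for (int x : buf) std::cout << x << ' ';   // prints: 1 2 1 2 3 4 5 8
    std::cout << '\n';
}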
What can we use if we want to copy from one region to another and we don't know at compile-time if the ranges may overlap or in what direction that overlap may occur?
This is not a logically consistent concept.
After a copy operation, you will have two objects. And each object is defined by a separate and distinct region of memory. You cannot have objects which overlap in this way (you can have subobjects, but an object type cannot be its own subobject). And therefore, it is impossible to copy an object on top of part of itself.
To move an object on top of itself is also not logically consistent. Why? Because moving is a fiction in C++; after the move, you still have two perfectly functional objects. A move operation is merely a destructive copy, one that steals resources owned by the other object. It is still there, and it is still a viable object.
And since an object cannot overlap with another object, it is again impossible.
Trivially copyable types get around this because they are just blocks of bits, with no destructors or specialized copy operations, so their lifetimes are not as rigid as those of other types. A non-trivially copyable type cannot do this, for the reasons below.
The experience with memmove suggests that there could be a solution in this case (and perhaps also for iterators into contiguous containers).
This is neither possible nor generally desirable for types which are not trivially copyable in C++.
The rules for trivial copyability are that the type has no non-trivial copy/move constructors/assignment operators, as well as no non-trivial destructor. A trivial copy/move constructor/assignment is nothing more than a memcpy, and a trivial destructor does nothing. Therefore, these rules effectively ensure that the type is nothing more than a "block of bits". And one "block of bits" is no different from another, so copying it via memmove is a legal construct.
If a type has a real destructor, then the type is maintaining some sort of invariant that requires actual effort to maintain. It may free up a pointer or release a file handle or whatever. Given that, it makes no sense to copy the bits, because now you have two objects that reference the same pointer/file handle. That's a bad thing to do, because the class will generally want to control how that gets handled.
There is no way to solve this problem without the class itself being involved in the copy operation. Different classes have different behavior with respect to managing their internals. Indeed, that is the entire purpose of objects having copy constructors and assignment operators: so that a class can decide for itself how to maintain the sanity of its own state.
And it doesn't even have to be a pointer or file handle. Maybe each class instance has a unique identifier; such a value is generated at construction time, and it never gets copied (new copies get new values). For you to violate such a restriction with memmove leaves your program in an indeterminate state, because you will have code that expects such identifiers to be unique.
That's why using memmove on non-trivially copyable types yields undefined behavior.
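As a quick illustration of the distinction drawn above (my own example, not part of the answer), std::is_trivially_copyable makes the "block of bits" test concrete:

#include <cstring>
#include <string>
#include <type_traits>

struct Point { int x, y; };   // no user-declared special members: just a block of bits

static_assert(std::is_trivially_copyable<Point>::value,
              "Point may legally be copied with memcpy/memmove");
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string maintains invariants and may not be");

int main() {
    Point a{1, 2}, b{};
    std::memmove(&b, &a, sizeof(Point));   // fine: copying the bits copies the value
    return b.x + b.y;                      // 3
}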
I've got the following code:
std::list<int> some_data;
...
std::list<int> new_data = std::move(some_data);
some_data.clear();
...
The question is whether some_data.clear() is necessary. (For the record, some_data will be reused in the future.)
Yes, it's necessary.
Only the std smart pointers are guaranteed to be in a default constructed state after being moved from.
Containers are left in a valid, but unspecified state. This means you can only call member functions without preconditions, e.g. clear, which puts the object into a fully known state.
The working draft of standard (N4713) states about the state of objects after they are moved from:
20.5.5.15 Moved-from state of library types [lib.types.movedfrom]
1 Objects of types defined in the C++ standard library may be moved from. Move operations may be explicitly specified or implicitly generated. Unless otherwise specified, such moved-from objects shall be placed in a valid but unspecified state.
20.3.25 [defns.valid]
valid but unspecified state: value of an object that is not specified except that the object’s invariants are met and operations on the object behave as specified for its type
The only operations you can safely perform on a moved-from container are those that do not require a precondition. clear has no preconditions. And it will return the object to a known state from which it can be used again.
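A minimal sketch of the pattern under discussion (the names and values are mine):

#include <list>
#include <utility>

int main()
{
    std::list<int> some_data{1, 2, 3};

    std::list<int> new_data = std::move(some_data);   // some_data: valid but unspecified

    some_data.clear();        // no preconditions; back to a fully known (empty) state
    some_data.push_back(42);  // safe to reuse now
}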
No it is not necessary to clear the list. It is just reasonable and convenient to do so.
The list will have N elements, each taking an unspecified value, for some N≥0. So if you want to re-populate the list, you may in theory assign to the elements that are kept rather than clear and re-insert everything from scratch.
Of course chances that N≠0 are vanishingly small, so in practice clearing is the right choice.
Short answer:
Yes, you should clear it, because the standard doesn't explicitly define in what state the source list is after the move.
Furthermore, always calling clear() when a container gets reused after being moved from is a simple rule that should not cause any significant overhead even when it is redundant.
Longer answer:
As I mentioned in some comments, I can't imagine any legal and sensible implementation for which the source std::list will be anything but an empty std::list after being used as the source for a move constructor (thanks @Lightness Races in Orbit for digging through the standard). Move construction must happen in constant time, which doesn't allow any per-element operations, and AFAIK it is also not allowed to have a small, fixed-size in-place buffer that could be a source of left-over "zombie" elements in the source list.
However, even if you can track down all of that in the standard, you'd still have to document that every time you omit the clear() in the code, and be wary of someone refactoring the code by replacing the std::list with a homegrown container that doesn't quite fulfill the requirements in the standard (e.g. to make use of the small buffer optimization). Also, the previous statement is only valid for move construction. On move assignment it is absolutely possible that the moved-from container will not be empty.
In summary: Even if it was technically correct to not use clear in this particular case, it is imho not worth the mental effort to do it (and if you are in an environment where it is worth it, you are probably tuning your code to a particular version of the standard library and not the standard document anyway).
It rather depends on what you mean by "reused". There are many reuses that totally define the state of the object, with no regard to its previous state.
The obvious one is assigning to a moved-from object, which occurs in the naive swap
#include <utility>  // std::move

template <typename T>
void swap(T& lhs, T& rhs)
{
    T temp = std::move(lhs);
    lhs = std::move(rhs); // lhs is moved-from, but we don't care.
    rhs = std::move(temp); // rhs is moved-from, but we don't care.
}
If you instead wish to "reuse" by calling some_data.push_back, then yes, you should clear it first.
I used to think C++'s object model is very robust when best practices are followed.
Just a few minutes ago, though, I had a realization that I hadn't had before.
Consider this code:
#include <set>
#include <vector>

class Foo
{
    std::set<size_t> set;
    std::vector<std::set<size_t>::iterator> vector;
    // ...
    // (assume every method ensures each iterator stored in 'vector' always points to a valid element of 'set')
};
I have written code like this. And until today, I hadn't seen a problem with it.
But, thinking about it some more, I realized that this class is very broken:
Its copy-constructor and copy-assignment copy the iterators inside the vector, which implies that they will still point to the old set! The new one isn't a true copy after all!
In other words, I must manually implement the copy-constructor even though this class isn't managing any resources (no RAII)!
This strikes me as astonishing. I've never come across this issue before, and I don't know of any elegant way to solve it. Thinking about it a bit more, it seems to me that copy construction is unsafe by default -- in fact, it seems to me that classes should not be copyable by default, because any kind of coupling between their instance variables risks rendering the default copy-constructor invalid.
Are iterators fundamentally unsafe to store? Or, should classes really be non-copyable by default?
The solutions I can think of, below, are all undesirable, as they don't let me take advantage of the automatically-generated copy constructor:
Manually implement a copy constructor for every nontrivial class I write. This is not only error-prone, but also painful to write for a complicated class.
Never store iterators as member variables. This seems severely limiting.
Disable copying by default on all classes I write, unless I can explicitly prove they are correct. This seems to run entirely against C++'s design, which is for most types to have value semantics, and thus be copyable.
Is this a well-known problem, and if so, does it have an elegant/idiomatic solution?
C++ copy/move ctor/assign are safe for regular value types. Regular value types behave like integers or other "regular" values.
They are also safe for pointer semantic types, so long as the operation does not change what the pointer "should" point to. Pointing to something "within yourself", or another member, is an example of where it fails.
They are somewhat safe for reference semantic types, but mixing pointer/reference/value semantics in the same class tends to be unsafe/buggy/dangerous in practice.
The rule of zero is that you make classes that behave like either regular value types, or pointer semantic types that don't need to be reseated on copy/move. Then you don't have to write copy/move ctors.
Iterators follow pointer semantics.
The idiomatic/elegant way around this is to tightly couple the iterator container with the pointed-into container, and block or write the copy ctor there. They aren't really separate things once one contains pointers into the other.
Yes, this is a well known "problem" -- whenever you store pointers in an object, you're probably going to need some kind of custom copy constructor and assignment operator to ensure that the pointers are all valid and point at the expected things.
Since iterators are just an abstraction of collection element pointers, they have the same issue.
Is this a well-known problem?
Well, it is known, but I would not say well-known. Sibling pointers do not occur often, and most implementations I have seen in the wild were broken in the exact same way that yours is.
I believe the problem to be infrequent enough to have escaped most people's notice; interestingly, as I follow more Rust than C++ nowadays, it crops up there quite often because of the strictness of the type system (ie, the compiler refuses those programs, prompting questions).
does it have an elegant/idiomatic solution?
There are many types of sibling-pointer situations, so it really depends; however, I know of two generic solutions:
keys
shared elements
Let's review them in order.
When pointing to a class member, or into an indexable container, one can use an offset or key rather than an iterator. It is slightly less efficient (and might require a look-up), but it is a fairly simple strategy. I have seen it used to great effect in shared-memory situations (where using pointers is a no-no since the shared-memory area may be mapped at different addresses).
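For example, here is a hedged sketch of the key-based approach applied to the Foo class from the question (the member functions add and has are names I made up); because only values are stored, the compiler-generated copy and move operations are correct:

#include <cstddef>
#include <set>
#include <vector>

class Foo
{
    std::set<size_t> set;
    std::vector<size_t> keys;   // store the keys themselves, not iterators into 'set'

public:
    void add(size_t value)
    {
        set.insert(value);
        keys.push_back(value);  // a key is meaningful in any copy of 'set'
    }

    bool has(size_t value) const
    {
        return set.count(value) != 0;   // look up on demand instead of caching an iterator
    }
};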
The other solution is used by Boost.MultiIndex, and consists of an alternative memory layout. It stems from the principle of the intrusive container: instead of putting the element into the container (moving it in memory), an intrusive container uses hooks already inside the element to wire it at the right place. Starting from there, it is easy enough to use different hooks to wire a single element into multiple containers, right?
Well, Boost.MultiIndex kicks it two steps further:
It uses the traditional container interface (ie, move your object in), but the node into which the object is moved is an element with multiple hooks
It uses various hooks/containers in a single entity
You can check the various examples; notably, Example 5: Sequenced Indices looks a lot like your own code.
Is this a well-known problem
Yes. Any time you have a class that contains pointers, or pointer-like data like an iterator, you have to implement your own copy-constructor and assignment-operator to ensure the new object has valid pointers/iterators.
and if so, does it have an elegant/idiomatic solution?
Maybe not as elegant as you might like, and probably not the best in performance (but then, copies sometimes are not, which is why C++11 added move semantics), but maybe something like this would work for you (assuming the std::vector contains iterators into the std::set of the same parent object):
#include <algorithm>
#include <set>
#include <vector>

class Foo
{
private:
    std::set<size_t> s;
    std::vector<std::set<size_t>::iterator> v;

    struct findAndPushIterator
    {
        Foo &foo;
        findAndPushIterator(Foo &f) : foo(f) {}

        void operator()(const std::set<size_t>::iterator &iter)
        {
            std::set<size_t>::iterator found = foo.s.find(*iter);
            if (found != foo.s.end())
                foo.v.push_back(found);
        }
    };

public:
    Foo() {}

    Foo(const Foo &src)
    {
        *this = src;
    }

    Foo& operator=(const Foo &rhs)
    {
        if (this == &rhs)   // guard against self-assignment, which would clear v
            return *this;
        v.clear();
        s = rhs.s;
        v.reserve(rhs.v.size());
        std::for_each(rhs.v.begin(), rhs.v.end(), findAndPushIterator(*this));
        return *this;
    }

    //...
};
Or, if using C++11:
#include <algorithm>
#include <set>
#include <vector>

class Foo
{
private:
    std::set<size_t> s;
    std::vector<std::set<size_t>::iterator> v;

public:
    Foo() {}

    Foo(const Foo &src)
    {
        *this = src;
    }

    Foo& operator=(const Foo &rhs)
    {
        if (this == &rhs)   // guard against self-assignment, which would clear v
            return *this;
        v.clear();
        s = rhs.s;
        v.reserve(rhs.v.size());
        std::for_each(rhs.v.begin(), rhs.v.end(),
            [this](const std::set<size_t>::iterator &iter)
            {
                std::set<size_t>::iterator found = s.find(*iter);
                if (found != s.end())
                    v.push_back(found);
            }
        );
        return *this;
    }

    //...
};
Yes, of course it's a well-known problem.
If your class stored pointers, as an experienced developer you would intuitively know that the default copy behaviours may not be sufficient for that class.
Your class stores iterators and, since they are also "handles" to data stored elsewhere, the same logic applies.
This is hardly "astonishing".
The assertion that Foo is not managing any resources is false.
Copy-constructor aside, if an element of set is removed, there must be code in Foo that manages vector so that the respective iterator is removed.
I think the idiomatic solution is to just use one container, a vector<size_t>, and check that the count of an element is zero before inserting. Then the copy and move defaults are fine.
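A minimal sketch of that single-container suggestion (my own interpretation of it):

#include <algorithm>
#include <cstddef>
#include <vector>

class Foo
{
    std::vector<size_t> data;   // one container; the default copy/move operations are correct

public:
    void insert(size_t value)
    {
        // only insert if not already present -- the uniqueness the set used to provide
        if (std::count(data.begin(), data.end(), value) == 0)
            data.push_back(value);
    }
};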
"Inherently unsafe"
No, the features you mention are not inherently unsafe; the fact that you thought of three possible safe solutions to the problem is evidence that there is no "inherent" lack of safety here, even though you think the solutions are undesirable.
And yes, there is RAII here: the containers (set and vector) are managing resources. I think your point is that the RAII is "already taken care of" by the std containers. But you need to then consider the container instances themselves to be "resources", and in fact your class is managing them. You're correct that you're not directly managing heap memory, because this aspect of the management problem is taken care of for you by the standard library. But there's more to the management problem, which I'll talk a bit more about below.
"Magic" default behavior
The problem is that you are apparently hoping that you can trust the default copy constructor to "do the right thing" in a non-trivial case such as this. I'm not sure why you expected the right behavior--perhaps you're hoping that memorizing rules-of-thumb such as the "rule of 3" will be a robust way to ensure that you don't shoot yourself in the foot? Certainly that would be nice (and, as pointed out in another answer, Rust goes much further than other low-level languages toward making foot-shooting much harder), but C++ simply isn't designed for "thoughtless" class design of that sort, nor should it be.
Conceptualizing constructor behavior
I'm not going to try to address the question of whether this is a "well-known problem", because I don't really know how well-characterized the problem of "sister" data and iterator-storing is. But I hope that I can convince you that, if you take the time to think about copy-constructor-behavior for every class you write that can be copied, this shouldn't be a surprising problem.
In particular, when deciding to use the default copy-constructor, you must think about what the default copy-constructor will actually do: namely, it will call the copy-constructor of each non-primitive, non-union member (i.e. members that have copy-constructors) and bitwise-copy the rest.
When copying your vector of iterators, what does std::vector's copy-constructor do? It performs a "deep copy", i.e., the data inside the vector is copied. Now, if the vector contains iterators, how does that affect the situation? Well, it's simple: the iterators are the data stored by the vector, so the iterators themselves will be copied. What does an iterator's copy-constructor do? I'm not going to actually look this up, because I don't need to know the specifics: I just need to know that iterators are like pointers in this (and other respect), and copying a pointer just copies the pointer itself, not the data pointed to. I.e., iterators and pointers do not have deep-copying by default.
Note that this is not surprising: of course iterators don't do deep-copying by default. If they did, you'd get a different, new set for each iterator being copied. And this makes even less sense than it initially appears: for instance, what would it actually mean if uni-directional iterators made deep-copies of their data? Presumably you'd get a partial copy, i.e., all the remaining data that's still "in front of" the iterator's current position, plus a new iterator pointing to the "front" of the new data structure.
Now consider that there is no way for a copy-constructor to know the context in which it's being called. For instance, consider the following code:
using iter = std::set<size_t>::iterator; // use typedef pre-C++11
std::vector<iter> foo = getIters(); // get a vector of iterators
useIters(foo); // pass vector by value
When getIters is called, the return value might be moved, but it might also be copy-constructed. The assignment to foo also invokes a copy-constructor, though this may also be elided. And unless useIters takes its argument by reference, then you've also got a copy constructor call there.
In any of these cases, would you expect the copy constructor to change which std::set is pointed to by the iterators contained by the std::vector<iter>? Of course not! So naturally std::vector's copy-constructor can't be designed to modify the iterators in that particular way, and in fact std::vector's copy-constructor is exactly what you need in most cases where it will actually be used.
However, suppose std::vector could work like this: suppose it had a special overload for "vector-of-iterators" that could re-seat the iterators, and that the compiler could somehow be "told" only to invoke this special constructor when the iterators actually need to be re-seated. (Note that the solution of "only invoke the special overload when generating a default constructor for a containing class that also contains an instance of the iterators' underlying data type" wouldn't work; what if the std::vector iterators in your case were pointing at a different standard set, and were being treated simply as a reference to data managed by some other class? Heck, how is the compiler supposed to know whether the iterators all point to the same std::set?) Ignoring this problem of how the compiler would know when to invoke this special constructor, what would the constructor code look like? Let's try it, using _Ctnr<T>::iterator as our iterator type (I'll use C++11/14isms and be a bit sloppy, but the overall point should be clear):
template <typename T, typename _Ctnr>
std::vector< _Ctnr<T>::iterator> (const std::vector< _Ctnr<T>::iterator>& rhs)
    : _data{ /* ... */ } // initialize underlying data...
{
    for (auto& i : rhs)
    {
        _data.emplace_back( /* ... */ ); // What do we put here?
    }
}
Okay, so we want each new, copied iterator to be re-seated to refer to a different instance of _Ctnr<T>. But where would this information come from? Note that the copy-constructor can't take the new _Ctnr<T> as an argument: then it would no longer be a copy-constructor. And in any case, how would the compiler know which _Ctnr<T> to provide? (Note, too, that for many containers, finding the "corresponding iterator" for the new container may be non-trivial.)
Resource management with std:: containers
This isn't just an issue of the compiler not being as "smart" as it could or should be. This is an instance where you, the programmer, have a specific design in mind that requires a specific solution. In particular, as mentioned above, you have two resources, both std:: containers. And you have a relationship between them. Here we get to something that most of the other answers have stated, and which by this point should be very, very clear: related class members require special care, since C++ does not manage this coupling by default. But what I hope is also clear by this point is that you shouldn't think of the problem as arising specifically because of data-member coupling; the problem is simply that default-construction isn't magic, and the programmer must be aware of the requirements for correctly copying a class before deciding to let the implicitly-generated constructor handle copying.
The elegant solution
...And now we get to aesthetics and opinions. You seem to find it inelegant to be forced to write a copy-constructor when you don't have any raw pointers or arrays in your class that must be manually managed.
But user-defined copy constructors are elegant; allowing you to write them is C++'s elegant solution to the problem of writing correct non-trivial classes.
Admittedly, this seems like a case where the "rule of 3" doesn't quite apply, since there's a clear need to either =delete the copy-constructor or write it yourself, but there's no clear need (yet) for a user-defined destructor. But again, you can't simply program based on rules of thumb and expect everything to work correctly, especially in a low-level language such as C++; you must be aware of the details of (1) what you actually want and (2) how that can be achieved.
So, given that the coupling between your std::set and your std::vector actually creates a non-trivial problem, solving the problem by wrapping them together in a class that correctly implements (or simply deletes) the copy-constructor is actually a very elegant (and idiomatic) solution.
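For instance, a sketch of the "simply delete" option (my own code, assuming moves are still wanted):

#include <cstddef>
#include <set>
#include <vector>

class Foo
{
    std::set<size_t> set;
    std::vector<std::set<size_t>::iterator> vector;

public:
    Foo() = default;

    // Copying would leave 'vector' pointing into the source object's 'set',
    // so forbid it until a correct copy is actually needed.
    Foo(const Foo&) = delete;
    Foo& operator=(const Foo&) = delete;

    // With the default allocator, moving a std::set does not invalidate
    // iterators to its elements, so the defaulted moves preserve the invariant.
    Foo(Foo&&) = default;
    Foo& operator=(Foo&&) = default;
};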
Explicitly defining versus deleting
You mention a potential new "rule of thumb" to follow in your coding practices: "Disable copying by default on all classes I write, unless I can explicitly prove they are correct." While this might be a safer rule of thumb (at least in this case) than the "rule of 3" (especially when your criterion for "do I need to implement the 3" is to check whether a deleter is required), my above caution against relying on rules of thumb still applies.
But I think the solution here is actually simpler than the proposed rule of thumb. You don't need to formally prove the correctness of the default method; you simply need to have a basic idea of what it would do, and what you need it to do.
Above, in my analysis of your particular case, I went into a lot of detail--for instance, I brought up the possibility of "deep-copying iterators". You don't need to go into this much detail to determine whether or not the default copy-constructor will work correctly. Instead, simply imagine what your manually-created copy constructor will look like; you should be able to tell pretty quickly how similar your imaginary explicitly-defined constructor is to the one the compiler would generate.
For example, a class Foo containing a single vector data will have a copy constructor that looks like this:
Foo::Foo(const Foo& rhs)
: data{rhs.data}
{}
Without even writing that out, you know that you can rely on the implicitly-generated one, because it's exactly the same as what you'd have written above.
Now, consider the constructor for your class Foo:
Foo::Foo(const Foo& rhs)
: set{rhs.set}
, vector{ /* somehow use both rhs.set AND rhs.vector */ } // ...????
{}
Right away, given that simply copying vector's members won't work, you can tell that the default constructor won't work. So now you need to decide whether your class needs to be copyable or not.
Copy constructors were traditionally ubiquitous in C++ programs. However, I'm doubting whether there's a good reason to that since C++11.
Even when the program logic didn't need to copy objects, copy constructors (usually defaulted) were often included for the sole purpose of object reallocation. Without a copy constructor, you couldn't store objects in a std::vector or even return an object from a function.
However, since C++11, move constructors have been responsible for object reallocation.
Another use case for copy constructors was, simply, making clones of objects. However, I'm quite convinced that a .copy() or .clone() method is better suited for that role than a copy constructor because...
Copying objects isn't really commonplace. Certainly it's sometimes necessary for an object's interface to contain a "make a duplicate of yourself" method, but only sometimes. And when it is the case, explicit is better than implicit.
Sometimes an object could expose several different .copy()-like methods, because in different contexts the copy might need to be created differently (e.g. shallower or deeper).
In some contexts, we'd want the .copy() methods to do non-trivial things related to program logic (increment some counter, or perhaps generate a new unique name for the copy). I wouldn't accept any code that has non-obvious logic in a copy constructor.
Last but not least, a .copy() method can be virtual if needed, allowing one to solve the problem of slicing.
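A common shape for that virtual-copy idea (a sketch of mine, not from the question):

#include <memory>

struct Shape
{
    virtual ~Shape() = default;

    // Virtual copy: each derived class returns a copy of its full dynamic type,
    // so no slicing can occur.
    virtual std::unique_ptr<Shape> clone() const = 0;
};

struct Circle : Shape
{
    double radius = 1.0;

    std::unique_ptr<Shape> clone() const override
    {
        return std::unique_ptr<Shape>(new Circle(*this));
    }
};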
The only cases where I'd actually want to use a copy constructor are:
RAII handles of copiable resources (quite obviously)
Structures that are intended to be used like built-in types, like math vectors or matrices -- simply because they are copied often and vec3 b = a.copy() is too verbose.
Side note: I've considered the fact that a copy constructor is needed for CAS (copy-and-swap), but CAS is needed for operator=(const T&), which I consider redundant based on the exact same reasoning; .copy() + operator=(T&&) = default would be preferred if you really need this.
For me, that's quite enough incentive to use T(const T&) = delete everywhere by default and provide a .copy() method when needed. (Perhaps also a private T(const T&) = default just to be able to write copy() or virtual copy() without boilerplate.)
Q: Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Specifically, am I correct in that move constructors took over the responsibility of object reallocation in C++11 completely? I'm using "reallocation" informally for all the situations when an object needs to be moved someplace else in the memory without altering its state.
The problem is what is the word "object" referring to.
If objects are the resources that variables refer to (as in Java, or in C++ through pointers, using classical OOP paradigms), every "copy between variables" is a "sharing", and if single ownership is imposed, "sharing" becomes "moving".
If objects are the variables themselves, since each variable has to have its own history, you cannot "move" if you cannot or don't want to impose the destruction of one value in favor of another.
Consider, for example, std::string:
std::string a="Aa";
std::string b=a;
...
b = "Bb";
Do you expect the value of a to change, or that code not to compile? If not, then a copy is needed.
Now consider this:
std::string a="Aa";
std::string b=std::move(a);
...
b = "Bb";
Now a is left empty, since its value (or rather, the dynamic memory that contains it) has been "moved" to b. The value of b is then changed, and the old "Aa" discarded.
In essence, move works only if explicitly requested or if the right-hand argument is a "temporary", as in
a = b+c;
where the resource held by the return value of operator+ is clearly not needed after the assignment, so moving it into a, rather than copying it into a's own storage and deleting the original, is more effective.
Move and copy are two different things. Move is not "THE replacement for copy". It is a more efficient way to avoid a copy in all the cases where an object is not required to produce a clone of itself.
Short answer
Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Automatically generated copy constructors are a great benefit in separating resource management from program logic; classes implementing logic do not need to worry about allocating, freeing or copying resources at all.
In my opinion, any replacement would need to do the same, and doing that for named functions feels a bit weird.
Long answer
When considering copy semantics, it's useful to divide types into four categories:
Primitive types, with semantics defined by the language;
Resource management (or RAII) types, with special requirements;
Aggregate types, which simply copy each member;
Polymorphic types.
Primitive types are what they are, so they are beyond the scope of the question; I'm assuming that a radical change to the language, breaking decades of legacy code, won't happen. Polymorphic types can't be copied (while maintaining the dynamic type) without user-defined virtual functions or RTTI shenanigans, so they are also beyond the scope of the question.
So the proposal is: mandate that RAII and aggregate types implement a named function, rather than a copy constructor, if they should be copied.
This makes little difference to RAII types; they just need to declare a differently-named copy function, and users just need to be slightly more verbose.
However, in the current world, aggregate types do not need to declare an explicit copy constructor at all; one will be generated automatically to copy all the members, or deleted if any are uncopyable. This ensures that, as long as all the member types are correctly copyable, so is the aggregate.
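A small illustration of that point (my own example): the generated copy is correct when every member is copyable, and deleted when one is not.

#include <memory>
#include <string>
#include <vector>

// All members are copyable, so the compiler-generated copy just works.
struct Config
{
    std::string name;
    std::vector<int> values;
};

// One uncopyable member (unique_ptr) and the generated copy constructor is
// deleted, so an incorrect copy can never be produced by accident.
struct Session
{
    std::string user;
    std::unique_ptr<int> token;
};

int main()
{
    Config a{"base", {1, 2, 3}};
    Config b = a;                      // memberwise copy, no user-written code needed
    // Session s1; Session s2 = s1;    // would not compile: Session's copy is deleted
    (void)b;
}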
In your world, there are two possibilities:
Either the language knows about your copy-function, and can automatically generate one (perhaps only if explicitly requested, i.e. T copy() = default;, since you want explicitness). In my opinion, automatically generating named functions based on the same named function in other types feels more like magic than the current scheme of generating "language elements" (constructors and operator overloads), but perhaps that's just my prejudice speaking.
Or it's left to the user to correctly implement copying semantics for aggregates. This is error-prone (since you could add a member and forget to update the function), and breaks the current clean separation between resource management and program logic.
And to address the points you make in favour:
Copying (non-polymorphic) objects is commonplace, although as you say it's less common now that they can be moved when possible. It's just your opinion that "explicit is better" or that T a(b); is less explicit than T a(b.copy());
Agreed, if an object doesn't have clearly defined copy semantics, then it should have named functions to cover whatever options it offers. I don't see how that affects how normal objects should be copied.
I've no idea why you think that a copy constructor shouldn't be allowed to do things that a named function could, as long as they are part of the defined copy semantics. You argue that copy constructors shouldn't be used because of artificial restrictions that you place on them yourself.
Copying polymorphic objects is an entirely different kettle of fish. Forcing all types to use named functions just because polymorphic ones must won't give the consistency you seem to be arguing for, since the return types would have to be different. Polymorphic copies will need to be dynamically allocated and returned by pointer; non-polymorphic copies should be returned by value. In my opinion, there is little value in making these different operations look similar without being interchangeable.
One case where copy constructors come in useful is when implementing the strong exception guarantee.
To illustrate the point, let's consider the resize function of std::vector. The function might be implemented roughly as follows:
void std::vector::resize(std::size_t n)
{
    if (n > capacity())
    {
        T *newData = new T [n];
        for (std::size_t i = 0; i < capacity(); i++)
            newData[i] = std::move(m_data[i]);
        delete[] m_data;
        m_data = newData;
    }
    else
    { /* ... */ }
}
If the resize function is to provide a strong exception guarantee, we need to ensure that, if an exception is thrown, the state of the std::vector before the resize() call is preserved.
If T has no move constructor, then we will default to the copy constructor. In this case, if the copy constructor throws an exception, we can still provide the strong exception guarantee: we simply delete the newData array and no harm to the std::vector has been done.
However, if we were using the move constructor of T and it threw an exception, then we have a bunch of Ts that were moved into the newData array. Rolling this operation back isn't straight-forward: if we try to move them back into the m_data array the move constructor of T may throw an exception again!
To resolve this issue we have the std::move_if_noexcept function. This function will use the move constructor of T if it is marked as noexcept, otherwise the copy constructor will be used. This allows us to implement std::vector::resize in such a way as to provide a strong exception guarantee.
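A small free-standing sketch of that idea (the helper name relocate is mine, not a standard facility):

#include <cstddef>
#include <utility>

// Move each element when T can be moved without risk of throwing (as decided by
// std::move_if_noexcept); otherwise fall back to copying, so that if an exception
// is thrown the source elements are still intact and can be kept.
template <class T>
void relocate(T* dst, T* src, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = std::move_if_noexcept(src[i]);
}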
For completeness, I should mention that C++11 std::vector::resize does not provide a strong exception guarantee in all cases. According to www.cplusplus.com we have the following guarantees:
If n is less than or equal to the size of the container, the function never throws exceptions (no-throw guarantee).
If n is greater and a reallocation happens, there are no changes in the container in case of exception (strong guarantee) if the type of the elements is either copyable or no-throw moveable.
Otherwise, if an exception is thrown, the container is left with a valid state (basic guarantee).
Here's the thing. Moving is the new default -- the new minimum requirement. But copying is still often a useful and convenient operation.
Nobody should bend over backwards to offer a copy constructor anymore. But it is still useful for your users to have copyability if you can offer it simply.
I would not ditch copy constructors any time soon, but I admit that for my own types, I only add them when it becomes clear I need them -- not immediately. So far this is very, very few types.
Suppose I have any kind of class or structure. No virtual functions or anything, just some custom constructors, as well as a few pointers that would require cleanup in the destructor.
Would there be any adverse effects to using memcpy or memmove on this structure? Will deleting a moved structure cause problems? The question assumes that the memory alignment is also correct, and we are copying to safe memory.
In the general case, yes, there will be problems. Both memcpy and memmove are bitwise operations with no further semantics. That might not be sufficient to move the object*, and it is clearly not enough to copy.
In the case of the copy it will break as multiple objects will be referring to the same dynamically allocated memory, and more than one destructor will try to release it. Note that solutions like shared_ptr will not help here, as sharing ownership is part of the further semantics that memcpy/memmove don't offer.
For moving, and depending on the type you might get away with it in some cases. But it won't work if the objects hold pointers/references to the elements being moved (including self-references) as the pointers will be bitwise copied (again, no further semantics of copying/moving) and will refer to the old locations.
The general answer is still the same: don't.
* Don't take move here in the exact C++11 sense. I have seen an implementation of the standard library containers that used special tags to enable moving objects while growing buffers through the use of memcpy, but it required explicit annotations in the stored types marking the objects as safely movable through memcpy. After the objects were placed in the new buffer, the old buffer was discarded without calling any destructors. (A C++11 move requires leaving the object in a destructible state, which cannot be achieved through this hack.)
Generally, using memcpy on a class-based object is not a good idea. The most likely problem would be copying a pointer and then deleting it. You should use a copy constructor or assignment operator instead.
No, don't do this.
If you memcpy a structure whose destructor deletes a pointer within itself, you'll wind up doing a double delete when the second instance of the structure is destroyed in whatever manner.
The C++ idiom is copy constructor for classes and std::copy or any of its friends for copying ranges/sequences/containers.
If you are using C++11, you can use std::is_trivially_copyable to determine if an object can be copied or moved using memcpy or memmove. From the documentation:
Objects of trivially-copyable types are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read(). In general, a trivially copyable type is any type for which the underlying bytes can be copied to an array of char or unsigned char and into a new object of the same type, and the resulting object would have the same value as the original.
Many classes don't fit this description, and you must beware that classes can change. I would suggest that if you are going to use memcpy/memmove on C++ objects, you somehow protect against unwanted usage. For example, if you're implementing a container class, it's easy for the type that the container holds to be modified such that it is no longer trivially copyable (e.g. somebody adds a virtual function). You could do this with a static_assert:
#include <type_traits>

template<typename T>
class MemcopyableArray
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "MemcopyableArray used with object type that is not trivially copyable.");
    // ...
};
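Given the MemcopyableArray sketch above, usage might look like this (the type names are mine):

#include <string>

struct Pixel { unsigned char r, g, b, a; };   // trivially copyable

MemcopyableArray<Pixel> pixels;               // compiles: passes the static_assert
// MemcopyableArray<std::string> names;       // would fail to compile:
                                              // std::string is not trivially copyable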
Aside from safety, which is the most important issue as the other answers have already pointed out, there may also be an issue of performance, especially for small objects.
Even for simple POD types, you may discover that doing proper initialization in the initializer list of your copy constructor (or assignments in the assignment operator, depending on your usage) is actually faster than even an intrinsic version of memcpy. This may very well be due to memcpy's entry code, which may check for word alignment, overlaps, buffer/memory access rights, and so on. In Visual C++ 10.0 and higher, to give you a specific example, you would be surprised by how much preamble code, testing various things, executes before memcpy even begins its logical function.
Why does C++'s vector class call copy constructors? Why doesn't it just memcpy the underlying data? Wouldn't that be a lot faster, and remove half of the need for move semantics?
I can't imagine a use case where this would be worse, but then again, maybe it's just because I'm being quite unimaginative.
Because the object needs to be notified that it is being moved. For example, there could be pointers to that given object that need to be fixed because the object is being copied. Or the reference count on a reference counted smart pointer might need to be updated. Or....
If you just memcpy'd the underlying memory, then you'd end up calling the destructor twice on the same object, which is also bad. What if the destructor controls something like an OS file handle?
EDIT: To sum up the above:
The copy constructor and destructor can have side effects. Those side effects need to be preserved.
You can safely memcpy POD and built-in types. Those are defined to have no semantics beyond being a bag of bits with respect to memory.
Types that have constructors and destructors, however, must be constructed and destructed in order to maintain invariants. While you may be able to create types that can be memcpy'd and have constructors, it is very difficult to express that in the type signature. So instead of possibly violating invariants associated with types that have constructors and/or destructors, the STL errs on the side of maintaining invariants by using the copy constructor when necessary.
This is why C++0x added move semantics via rvalue references. It means that STL containers can take advantage of types that provide a move constructor and give the compiler enough information to optimize out otherwise expensive construction.
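For example (my own sketch of that last point):

#include <string>
#include <type_traits>
#include <vector>

struct Widget
{
    std::string name;
    // No user-declared special members: the compiler generates correct copy and
    // move operations, and the generated move constructor is noexcept.
};

static_assert(std::is_nothrow_move_constructible<Widget>::value,
              "std::vector will move Widgets, not copy them, when it reallocates");

int main()
{
    std::vector<Widget> v;
    v.push_back(Widget{"a"});
    v.push_back(Widget{"b"});   // on reallocation the existing Widget is moved,
                                // running a real constructor rather than a raw memcpy
}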