Can we say bye to copy constructors? - c++

Copy constructors were traditionally ubiquitous in C++ programs. However, I'm doubting whether there's a good reason to that since C++11.
Even when the program logic didn't need copying objects, copy constructors (usu. default) were often included for the sole purpose of object reallocation. Without a copy constructor, you couldn't store objects in a std::vector or even return an object from a function.
However, since C++11, move constructors have been responsible for object reallocation.
Another use case for copy constructors was, simply, making clones of objects. However, I'm quite convinced that a .copy() or .clone() method is better suited for that role than a copy constructor because...
Copying objects isn't really commonplace. Certainly it's sometimes necessary for an object's interface to contain a "make a duplicate of yourself" method, but only sometimes. And when it is the case, explicit is better than implicit.
Sometimes an object could expose several different .copy()-like methods, because in different contexts the copy might need to be created differently (e.g. shallower or deeper).
In some contexts, we'd want the .copy() methods to do non-trivial things related to program logic (increment some counter, or perhaps generate a new unique name for the copy). I wouldn't accept any code that has non-obvious logic in a copy constructor.
Last but not least, a .copy() method can be virtual if needed, allowing to solve the problem of slicing.
The only cases where I'd actually want to use a copy constructor are:
RAII handles of copiable resources (quite obviously)
Structures that are intended to be used like built-in types, like math vectors or matrices -
simply because they are copied often and vec3 b = a.copy() is too verbose.
Side note: I've considered the fact that copy constructor is needed for CAS, but CAS is needed for operator=(const T&) which I consider redundant basing on the exact same reasoning;
.copy() + operator=(T&&) = default would be preferred if you really need this.)
For me, that's quite enough incentive to use T(const T&) = delete everywhere by default and provide a .copy() method when needed. (Perhaps also a private T(const T&) = default just to be able to write copy() or virtual copy() without boilerplate.)
Q: Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Specifically, am I correct in that move constructors took over the responsibility of object reallocation in C++11 completely? I'm using "reallocation" informally for all the situations when an object needs to be moved someplace else in the memory without altering its state.

The problem is what is the word "object" referring to.
If objects are the resources that variables refers to (like in java or in C++ through pointers, using classical OOP paradigms) every "copy between variables" is a "sharing", and if single ownership is imposed, "sharing" becomes "moving".
If objects are the variables themselves, since each variables has to have its own history, you cannot "move" if you cannot / don't want to impose the destruction of a value in favor of another.
Cosider for example std::strings:
std::string a="Aa";
std::string b=a;
...
b = "Bb";
Do you expect the value of a to change, or that code to don't compile? If not, then copy is needed.
Now consider this:
std::string a="Aa";
std::string b=std::move(a);
...
b = "Bb";
Now a is left empty, since its value (better, the dynamic memory that contains it) had been "moved" to b. The value of b is then chaged, and the old "Aa" discarded.
In essence, move works only if explicitly called or if the right argument is "temporary", like in
a = b+c;
where the resource hold by the return of operator+ is clearly not needed after the assignment, hence moving it to a, rather than copy it in another a's held place and delete it is more effective.
Move and copy are two different things. Move is not "THE replacement for copy". It an more efficient way to avoid copy only in all the cases when an object is not required to generate a clone of itself.

Short anwer
Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Automatically generated copy constructors are a great benefit in separating resource management from program logic; classes implementing logic do not need to worry about allocating, freeing or copying resources at all.
In my opinion, any replacement would need to do the same, and doing that for named functions feels a bit weird.
Long answer
When considering copy semantics, it's useful to divide types into four categories:
Primitive types, with semantics defined by the language;
Resource management (or RAII) types, with special requirements;
Aggregate types, which simply copy each member;
Polymorphic types.
Primitive types are what they are, so they are beyond the scope of the question; I'm assuming that a radical change to the language, breaking decades of legacy code, won't happen. Polymorphic types can't be copied (while maintaining the dynamic type) without user-defined virtual functions or RTTI shenanigans, so they are also beyond the scope of the question.
So the proposal is: mandate that RAII and aggregate types implement a named function, rather than a copy constructor, if they should be copied.
This makes little difference to RAII types; they just need to declare a differently-named copy function, and users just need to be slightly more verbose.
However, in the current world, aggregate types do not need to declare an explicit copy constructor at all; one will be generated automatically to copy all the members, or deleted if any are uncopyable. This ensures that, as long as all the member types are correctly copyable, so is the aggregate.
In your world, there are two possibilities:
Either the language knows about your copy-function, and can automatically generate one (perhaps only if explicitly requested, i.e. T copy() = default;, since you want explicitness). In my opinion, automatically generating named functions based on the same named function in other types feels more like magic than the current scheme of generating "language elements" (constructors and operator overloads), but perhaps that's just my prejudice speaking.
Or it's left to the user to correctly implement copying semantics for aggregates. This is error-prone (since you could add a member and forget to update the function), and breaks the current clean separation between resource management and program logic.
And to address the points you make in favour:
Copying (non-polymorphic) objects is commonplace, although as you say it's less common now that they can be moved when possible. It's just your opinion that "explicit is better" or that T a(b); is less explicit than T a(b.copy());
Agreed, if an object doesn't have clearly defined copy semantics, then it should have named functions to cover whatever options it offers. I don't see how that affects how normal objects should be copied.
I've no idea why you think that a copy constructor shouldn't be allowed to do things that a named function could, as long as they are part of the defined copy semantics. You argue that copy constructors shouldn't be used because of artificial restrictions that you place on them yourself.
Copying polymorphic objects is an entirely different kettle of fish. Forcing all types to use named functions just because polymorphic ones must won't give the consistency you seem to be arguing for, since the return types would have to be different. Polymorphic copies will need to be dynamically allocated and returned by pointer; non-polymorphic copies should be returned by value. In my opinion, there is little value in making these different operations look similar without being interchangable.

One case where copy constructors come in useful is when implementing the strong exception guarantees.
To illustrate the point, let's consider the resize function of std::vector. The function might be implemented roughly as follows:
void std::vector::resize(std::size_t n)
{
if (n > capacity())
{
T *newData = new T [n];
for (std::size_t i = 0; i < capacity(); i++)
newData[i] = std::move(m_data[i]);
delete[] m_data;
m_data = newData;
}
else
{ /* ... */ }
}
If the resize function were to have a strong exception guarantee we need to ensure that, if an exception is thrown, the state of the std::vector before the resize() call is preserved.
If T has no move constructor, then we will default to the copy constructor. In this case, if the copy constructor throws an exception, we can still provide strong exception guarantee: we simply delete the newData array and no harm to the std::vector has been done.
However, if we were using the move constructor of T and it threw an exception, then we have a bunch of Ts that were moved into the newData array. Rolling this operation back isn't straight-forward: if we try to move them back into the m_data array the move constructor of T may throw an exception again!
To resolve this issue we have the std::move_if_noexcept function. This function will use the move constructor of T if it is marked as noexcept, otherwise the copy constructor will be used. This allows us to implement std::vector::resize in such a way as to provide a strong exception guarantee.
For completeness, I should mention that C++11 std::vector::resize does not provide a strong exception guarantee in all cases. According to www.cplusplus.com we have the the follow guarantees:
If n is less than or equal to the size of the container, the function never throws exceptions (no-throw guarantee).
If n is greater and a reallocation happens, there are no changes in the container in case of exception (strong guarantee) if the type of the elements is either copyable or no-throw moveable.
Otherwise, if an exception is thrown, the container is left with a valid state (basic guarantee).

Here's the thing. Moving is the new default- the new minimum requirement. But copying is still often a useful and convenient operation.
Nobody should bend over backwards to offer a copy constructor anymore. But it is still useful for your users to have copyability if you can offer it simply.
I would not ditch copy constructors any time soon, but I admit that for my own types, I only add them when it becomes clear I need them- not immediately. So far this is very, very few types.

Related

Safe to std:move a member?

Have found comparable questions but not exactly with such a case.
Take the following code for example:
#include <iostream>
#include <string>
#include <vector>
struct Inner
{
int a, b;
};
struct Outer
{
Inner inner;
};
std::vector<Inner> vec;
int main()
{
Outer * an_outer = new Outer;
vec.push_back(std::move(an_outer->inner));
delete an_outer;
}
Is this safe? Even if those were polymorphic classes or ones with custom destructors?
My concern regards the instance of "Outer" which has a member variable "inner" moved away. From what I learned, moved things should not be touched anymore. However does that include the delete call that is applied to outer and would technically call delete on inner as well (and thus "touch" it)?
Neither std::move, nor move semantics more generally, have any effect on the object model. They don't stop objects from existing, nor prevent you from using those objects in the future.
What they do is ask to borrow encapsulated resources from the thing you're "moving from". For example, a vector, which directly only stores a pointer some dynamically-allocated data: the concept of ownership of that data can be "stolen" by simply copying that pointer then telling the vector to null the pointer and never have anything to do with that data again. It's yielded. The data belongs to you now. You have the last pointer to it that exists in the universe.
All of this is achieved simply by a bunch of hacks. The first is std::move, which just casts your vector expression to vector&&, so when you pass the result of it to a construction or assignment operation, the version that takes vector&& (the move constructor, or move-assignment operator) is triggered instead of the one that takes const vector&, and that version performs the steps necessary to do what I described in the previous paragraph.
(For other types that we make, we traditionally keep following that pattern, because that's how we can have nice things and persuade people to use our libraries.)
But then you can still use the vector! You can "touch" it. What exactly you can do with it is discoverable from the documentation for vector, and this extends to any other moveable type: the constraints emplaced on your usage of a moved-from object depend entirely on its type, and on the decisions made by the person who designed that type.
None of this has any impact on the lifetime of the vector. It still exists, it still takes memory, and it will still be destructed when the time comes. (In this particular example you can actually .clear() it and start again adding data to a new buffer.)
So, even if ints had any sort of concept of this (they don't; they encapsulate no indirectly-stored data, and own no resources; they have no constructors, so they also have no constructors taking int&&), the delete "touch"ing them would be entirely safe. And, more generally, none of this depends on whether the thing you've moved from is a member or not.
More generally, if you had a type T, and an object of that type, and you moved from it, and one of the constraints for T was that you couldn't delete it after moving from it, that would be a bug in T. That would be a serious mistake by the author of T. Your objects all need to be destructible. The mistake could manifest as a compilation failure or, more likely, undefined behaviour, depending on what exactly the bug was.
tl;dr: Yes, this is safe, for several reasons.
std::move is a cast to an rvalue-reference, which primarily changes which constructor/assignment operator overload is chosen. In your example the move-constructor is the default generated move-constructor, which just copies the ints over so nothing happens.
Whether or not this generally safe depends on the way your classes implement move construction/assignment. Assume for example that your class instead held a pointer. You would have to set that to nullptr in the moved-from class to avoid destroying the pointed-to data, if the moved-from class is destroyed.
Because just defining move-semantics is a custom way almost always leads to problems, the rule of five says that if you customize any of:
the copy constructor
the copy assignment operator
the move constructor
the move assignment operator
the destructor
you should usually customize all to ensure that they behave consistently with the expectations a caller would usually have for your class.

Is C++'s default copy-constructor inherently unsafe? Are iterators fundamentally unsafe too?

I used to think C++'s object model is very robust when best practices are followed.
Just a few minutes ago, though, I had a realization that I hadn't had before.
Consider this code:
class Foo
{
std::set<size_t> set;
std::vector<std::set<size_t>::iterator> vector;
// ...
// (assume every method ensures p always points to a valid element of s)
};
I have written code like this. And until today, I hadn't seen a problem with it.
But, thinking about it a more, I realized that this class is very broken:
Its copy-constructor and copy-assignment copy the iterators inside the vector, which implies that they will still point to the old set! The new one isn't a true copy after all!
In other words, I must manually implement the copy-constructor even though this class isn't managing any resources (no RAII)!
This strikes me as astonishing. I've never come across this issue before, and I don't know of any elegant way to solve it. Thinking about it a bit more, it seems to me that copy construction is unsafe by default -- in fact, it seems to me that classes should not be copyable by default, because any kind of coupling between their instance variables risks rendering the default copy-constructor invalid.
Are iterators fundamentally unsafe to store? Or, should classes really be non-copyable by default?
The solutions I can think of, below, are all undesirable, as they don't let me take advantage of the automatically-generated copy constructor:
Manually implement a copy constructor for every nontrivial class I write. This is not only error-prone, but also painful to write for a complicated class.
Never store iterators as member variables. This seems severely limiting.
Disable copying by default on all classes I write, unless I can explicitly prove they are correct. This seems to run entirely against C++'s design, which is for most types to have value semantics, and thus be copyable.
Is this a well-known problem, and if so, does it have an elegant/idiomatic solution?
C++ copy/move ctor/assign are safe for regular value types. Regular value types behave like integers or other "regular" values.
They are also safe for pointer semantic types, so long as the operation does not change what the pointer "should" point to. Pointing to something "within yourself", or another member, is an example of where it fails.
They are somewhat safe for reference semantic types, but mixing pointer/reference/value semantics in the same class tends to be unsafe/buggy/dangerous in practice.
The rule of zero is that you make classes that behave like either regular value types, or pointer semantic types that don't need to be reseated on copy/move. Then you don't have to write copy/move ctors.
Iterators follow pointer semantics.
The idiomatic/elegant around this is to tightly couple the iterator container with the pointed-into container, and block or write the copy ctor there. They aren't really separate things once one contains pointers into the other.
Yes, this is a well known "problem" -- whenever you store pointers in an object, you're probably going to need some kind of custom copy constructor and assignment operator to ensure that the pointers are all valid and point at the expected things.
Since iterators are just an abstraction of collection element pointers, they have the same issue.
Is this a well-known problem?
Well, it is known, but I would not say well-known. Sibling pointers do not occur often, and most implementations I have seen in the wild were broken in the exact same way than yours is.
I believe the problem to be infrequent enough to have escaped most people's notice; interestingly, as I follow more Rust than C++ nowadays, it crops up there quite often because of the strictness of the type system (ie, the compiler refuses those programs, prompting questions).
does it have an elegant/idiomatic solution?
There are many types of sibling pointers situations, so it really depends, however I know of two generic solutions:
keys
shared elements
Let's review them in order.
Pointing to a class-member, or pointing into an indexable container, then one can use an offset or key rather than an iterator. It is slightly less efficient (and might require a look-up) however it is a fairly simple strategy. I have seen it used to great effect in shared-memory situation (where using pointers is a no-no since the shared-memory area may be mapped at different addresses).
The other solution is used by Boost.MultiIndex, and consists in an alternative memory layout. It stems from the principle of the intrusive container: instead of putting the element into the container (moving it in memory), an intrusive container uses hooks already inside the element to wire it at the right place. Starting from there, it is easy enough to use different hooks to wire a single elements into multiple containers, right?
Well, Boost.MultiIndex kicks it two steps further:
It uses the traditional container interface (ie, move your object in), but the node to which the object is moved in is an element with multiple hooks
It uses various hooks/containers in a single entity
You can check various examples and notably Example 5: Sequenced Indices looks a lot like your own code.
Is this a well-known problem
Yes. Any time you have a class that contains pointers, or pointer-like data like an iterator, you have to implement your own copy-constructor and assignment-operator to ensure the new object has valid pointers/iterators.
and if so, does it have an elegant/idiomatic solution?
Maybe not as elegant as you might like, and probably is not the best in performance (but then, copies sometimes are not, which is why C++11 added move semantics), but maybe something like this would work for you (assuming the std::vector contains iterators into the std::set of the same parent object):
class Foo
{
private:
std::set<size_t> s;
std::vector<std::set<size_t>::iterator> v;
struct findAndPushIterator
{
Foo &foo;
findAndPushIterator(Foo &f) : foo(f) {}
void operator()(const std::set<size_t>::iterator &iter)
{
std::set<size_t>::iterator found = foo.s.find(*iter);
if (found != foo.s.end())
foo.v.push_back(found);
}
};
public:
Foo() {}
Foo(const Foo &src)
{
*this = src;
}
Foo& operator=(const Foo &rhs)
{
v.clear();
s = rhs.s;
v.reserve(rhs.v.size());
std::for_each(rhs.v.begin(), rhs.v.end(), findAndPushIterator(*this));
return *this;
}
//...
};
Or, if using C++11:
class Foo
{
private:
std::set<size_t> s;
std::vector<std::set<size_t>::iterator> v;
public:
Foo() {}
Foo(const Foo &src)
{
*this = src;
}
Foo& operator=(const Foo &rhs)
{
v.clear();
s = rhs.s;
v.reserve(rhs.v.size());
std::for_each(rhs.v.begin(), rhs.v.end(),
[this](const std::set<size_t>::iterator &iter)
{
std::set<size_t>::iterator found = s.find(*iter);
if (found != s.end())
v.push_back(found);
}
);
return *this;
}
//...
};
Yes, of course it's a well-known problem.
If your class stored pointers, as an experienced developer you would intuitively know that the default copy behaviours may not be sufficient for that class.
Your class stores iterators and, since they are also "handles" to data stored elsewhere, the same logic applies.
This is hardly "astonishing".
The assertion that Foo is not managing any resources is false.
Copy-constructor aside, if a element of set is removed, there must be code in Foo that manages vector so that the respective iterator is removed.
I think the idiomatic solution is to just use one container, a vector<size_t>, and check that the count of an element is zero before inserting. Then the copy and move defaults are fine.
"Inherently unsafe"
No, the features you mention are not inherently unsafe; the fact that you thought of three possible safe solutions to the problem is evidence that there is no "inherent" lack of safety here, even though you think the solutions are undesirable.
And yes, there is RAII here: the containers (set and vector) are managing resources. I think your point is that the RAII is "already taken care of" by the std containers. But you need to then consider the container instances themselves to be "resources", and in fact your class is managing them. You're correct that you're not directly managing heap memory, because this aspect of the management problem is taken care of for you by the standard library. But there's more to the management problem, which I'll talk a bit more about below.
"Magic" default behavior
The problem is that you are apparently hoping that you can trust the default copy constructor to "do the right thing" in a non-trivial case such as this. I'm not sure why you expected the right behavior--perhaps you're hoping that memorizing rules-of-thumb such as the "rule of 3" will be a robust way to ensure that you don't shoot yourself in the foot? Certainly that would be nice (and, as pointed out in another answer, Rust goes much further than other low-level languages toward making foot-shooting much harder), but C++ simply isn't designed for "thoughtless" class design of that sort, nor should it be.
Conceptualizing constructor behavior
I'm not going to try to address the question of whether this is a "well-known problem", because I don't really know how well-characterized the problem of "sister" data and iterator-storing is. But I hope that I can convince you that, if you take the time to think about copy-constructor-behavior for every class you write that can be copied, this shouldn't be a surprising problem.
In particular, when deciding to use the default copy-constructor, you must think about what the default copy-constructor will actually do: namely, it will call the copy-constructor of each non-primitive, non-union member (i.e. members that have copy-constructors) and bitwise-copy the rest.
When copying your vector of iterators, what does std::vector's copy-constructor do? It performs a "deep copy", i.e., the data inside the vector is copied. Now, if the vector contains iterators, how does that affect the situation? Well, it's simple: the iterators are the data stored by the vector, so the iterators themselves will be copied. What does an iterator's copy-constructor do? I'm not going to actually look this up, because I don't need to know the specifics: I just need to know that iterators are like pointers in this (and other respect), and copying a pointer just copies the pointer itself, not the data pointed to. I.e., iterators and pointers do not have deep-copying by default.
Note that this is not surprising: of course iterators don't do deep-copying by default. If they did, you'd get a different, new set for each iterator being copied. And this makes even less sense than it initially appears: for instance, what would it actually mean if uni-directional iterators made deep-copies of their data? Presumably you'd get a partial copy, i.e., all the remaining data that's still "in front of" the iterator's current position, plus a new iterator pointing to the "front" of the new data structure.
Now consider that there is no way for a copy-constructor to know the context in which it's being called. For instance, consider the following code:
using iter = std::set<size_t>::iterator; // use typedef pre-C++11
std::vector<iter> foo = getIters(); // get a vector of iterators
useIters(foo); // pass vector by value
When getIters is called, the return value might be moved, but it might also be copy-constructed. The assignment to foo also invokes a copy-constructor, though this may also be elided. And unless useIters takes its argument by reference, then you've also got a copy constructor call there.
In any of these cases, would you expect the copy constructor to change which std::set is pointed to by the iterators contained by the std::vector<iter>? Of course not! So naturally std::vector's copy-constructor can't be designed to modify the iterators in that particular way, and in fact std::vector's copy-constructor is exactly what you need in most cases where it will actually be used.
However, suppose std::vector could work like this: suppose it had a special overload for "vector-of-iterators" that could re-seat the iterators, and that the compiler could somehow be "told" only to invoke this special constructor when the iterators actually need to be re-seated. (Note that the solution of "only invoke the special overload when generating a default constructor for a containing class that also contains an instance of the iterators' underlying data type" wouldn't work; what if the std::vector iterators in your case were pointing at a different standard set, and were being treated simply as a reference to data managed by some other class? Heck, how is the compiler supposed to know whether the iterators all point to the same std::set?) Ignoring this problem of how the compiler would know when to invoke this special constructor, what would the constructor code look like? Let's try it, using _Ctnr<T>::iterator as our iterator type (I'll use C++11/14isms and be a bit sloppy, but the overall point should be clear):
template <typename T, typename _Ctnr>
std::vector< _Ctnr<T>::iterator> (const std::vector< _Ctnr<T>::iterator>& rhs)
: _data{ /* ... */ } // initialize underlying data...
{
for (auto i& : rhs)
{
_data.emplace_back( /* ... */ ); // What do we put here?
}
}
Okay, so we want each new, copied iterator to be re-seated to refer to a different instance of _Ctnr<T>. But where would this information come from? Note that the copy-constructor can't take the new _Ctnr<T> as an argument: then it would no longer be a copy-constructor. And in any case, how would the compiler know which _Ctnr<T> to provide? (Note, too, that for many containers, finding the "corresponding iterator" for the new container may be non-trivial.)
Resource management with std:: containers
This isn't just an issue of the compiler not being as "smart" as it could or should be. This is an instance where you, the programmer, have a specific design in mind that requires a specific solution. In particular, as mentioned above, you have two resources, both std:: containers. And you have a relationship between them. Here we get to something that most of the other answers have stated, and which by this point should be very, very clear: related class members require special care, since C++ does not manage this coupling by default. But what I hope is also clear by this point is that you shouldn't think of the problem as arising specifically because of data-member coupling; the problem is simply that default-construction isn't magic, and the programmer must be aware of the requirements for correctly copying a class before deciding to let the implicitly-generated constructor handle copying.
The elegant solution
...And now we get to aesthetics and opinions. You seem to find it inelegant to be forced to write a copy-constructor when you don't have any raw pointers or arrays in your class that must be manually managed.
But user-defined copy constructors are elegant; allowing you to write them is C++'s elegant solution to the problem of writing correct non-trivial classes.
Admittedly, this seems like a case where the "rule of 3" doesn't quite apply, since there's a clear need to either =delete the copy-constructor or write it yourself, but there's no clear need (yet) for a user-defined destructor. But again, you can't simply program based on rules of thumb and expect everything to work correctly, especially in a low-level language such as C++; you must be aware of the details of (1) what you actually want and (2) how that can be achieved.
So, given that the coupling between your std::set and your std::vector actually creates a non-trivial problem, solving the problem by wrapping them together in a class that correctly implements (or simply deletes) the copy-constructor is actually a very elegant (and idiomatic) solution.
Explicitly defining versus deleting
You mention a potential new "rule of thumb" to follow in your coding practices: "Disable copying by default on all classes I write, unless I can explicitly prove they are correct." While this might be a safer rule of thumb (at least in this case) than the "rule of 3" (especially when your criterion for "do I need to implement the 3" is to check whether a deleter is required), my above caution against relying on rules of thumb still applies.
But I think the solution here is actually simpler than the proposed rule of thumb. You don't need to formally prove the correctness of the default method; you simply need to have a basic idea of what it would do, and what you need it to do.
Above, in my analysis of your particular case, I went into a lot of detail--for instance, I brought up the possibility of "deep-copying iterators". You don't need to go into this much detail to determine whether or not the default copy-constructor will work correctly. Instead, simply imagine what your manually-created copy constructor will look like; you should be able to tell pretty quickly how similar your imaginary explicitly-defined constructor is to the one the compiler would generate.
For example, a class Foo containing a single vector data will have a copy constructor that looks like this:
Foo::Foo(const Foo& rhs)
: data{rhs.data}
{}
Without even writing that out, you know that you can rely on the implicitly-generated one, because it's exactly the same as what you'd have written above.
Now, consider the constructor for your class Foo:
Foo::Foo(const Foo& rhs)
: set{rhs.set}
, vector{ /* somehow use both rhs.set AND rhs.vector */ } // ...????
{}
Right away, given that simply copying vector's members won't work, you can tell that the default constructor won't work. So now you need to decide whether your class needs to be copyable or not.

Purpose of assignment operator overloading in C++

I'm trying to understand the purpose of overloading some operators in C++.
Conceptually, an assignment statement can be easily implemented via:
Destruction of the old object followed by copy construction of the new object
Copy construction of the new object, followed by a swap with the old object, followed by destruction of the old object
In fact, often, the copy-and-swap implementation is the implementation of assignment in real code.
Why, then, does C++ allow the programmer to overload the assignment operator, instead of just performing the above?
Was it intended to allow a scenario in which assignment is faster than destruction + construction?
If so, when does that happen? And if not, then what use case was it intended to support?
1) Reference counting
Suppose you have a resource that is ref-counted and it is wrapped in objects.
void operator=(const MyObject& v) {
this->resource = refCount(v->resource);
}
// Here the internal resource will be copied and not the objects.
MyObject A = B;
2) Or you just want to copy the fields without fancy semantics.
void operator=(const MyObject& v) {
this->someField = v->someField;
}
// So this statement should generate just one function call and not a fancy
// collection of temporaries that require construction destruction.
MyObject A = B;
In both cases the code runs much faster. In the second case the effect is similar.
3) Also what about types...
Use the operator to handle assigning other types to your type.
void operator=(const MyObject& v) {
this->someField = v->someField;
}
void operator=(int x) {
this->value = x;
}
void operator=(float y) {
this->floatValue = y;
}
first, note that "Destruction of the old object followed by copy construction of the new object" is not exception safe.
but re "Copy construction of the new object, followed by a swap with the old object, followed by destruction of the old object", that's the swap idiom for implementing an assignment operator, and it's exception safe if done correctly.
in some cases a custom assignment operator can be faster than the swap idiom. for example, direct arrays of POD type can't really be swapped except by way of lower level assignments. so there for the swap idiom you can expect an overhead proportional to the array size.
however, historically there wasn't much focus on swapping and exception safety.
bjarne wanted exceptions originally (if i recall correctly), but they didn't get into the language until 1989 or thereabouts. so the original c++ way of programming was more focused on assignments. to the degree that a failing constructor signalled its failure by assigning 0 to this… i think, that in those days your question would not have made sense. it was just assignments all over.
typewise, some objects have identity, and others have value. it makes sense to assign to value objects, but for identity objects one typically wants to limit the ways that the object can be modified. while this doesn't require the ability to customize copy assignment (only to make it unavailable), with that ability one doesn't need any other language support.
and i think likewise for any other specific reasons one can think of: probably no such reason really requires the general ability, but the general ability is sufficient to cover it all, so it lowers the overall language complexity.
a good source to get more definitive answer than my hunches, recollections and gut feelings, is bjarne's "the design and evolution of c++" book.
probably the question has a definitive answer there.
Destruction of the old object, followed by copy construction of
the new, will not usually work. And the swap idiom is
guaranteed not to work unless the class provides a special swap
function—std::swap uses assignment in its unspecialized
implementation, and using it directly in the assignment operator
will lead to endless recursion.
And of course, the user may want to do something special, e.g.
make the assignment operator private, for example.
And finally, what is almost certainly an overruling reason: the
default assignment operator has to be compatible with C.
Actually, after seeing juanchopanza's answer (which was deleted), I think I ended up figuring it out myself.
Copy-assignment operators allow classes like basic_string to avoid allocating resources unnecessarily when they can re-use them (in this case, memory).
So when you assign to a basic_string, an overloaded copy assignment operator would avoid allocating memory, and would just copy the string data directly to the existing buffer.
If the object had to be destroyed and constructed again, the buffer would have to be reallocated, which would be potentially much more costly for a small string.
(Note that vector could benefit from this too, but only if it knew that the elements' copy constructors would never throw exceptions. Otherwise it would need to maintain its exception safety and actually perform a copy-and-swap.)
It allows you to use assignment of other types as well.
You could have a class Person with an assignment operator that assigns an ID.
But besides that, you don't always want to copy all the members as they are.
The default assignment only does a shallow copy.
For example, if the class contains pointers, or locks, you dont always want to copy them from the other object.
Usually when you have pointers you want to use a deep copy, and maybe create a copy of the object that the pointers are pointed to.
And if you have locks, you want them to be specific to the object, and you don't want to copy their state from the other object.
It is actually a common practice to provide your own copy constructor and assignment operator if your class holds pointers as members.
I have used it often as a conversion constructor but with already existing objects. i.e assigning member variable type, etc to an object.

Does D have something akin to C++0x's move semantics?

A problem of "value types" with external resources (like std::vector<T> or std::string) is that copying them tends to be quite expensive, and copies are created implicitly in various contexts, so this tends to be a performance concern. C++0x's answer to this problem is move semantics, which is conceptionally based on the idea of resource pilfering and technically powered by rvalue references.
Does D have anything similar to move semantics or rvalue references?
I believe that there are several places in D (such as returning structs) that D manages to make them moves whereas C++ would make them a copy. IIRC, the compiler will do a move rather than a copy in any case where it can determine that a copy isn't needed, so struct copying is going to happen less in D than in C++. And of course, since classes are references, they don't have the problem at all.
But regardless, copy construction already works differently in D than in C++. Generally, instead of declaring a copy constructor, you declare a postblit constructor: this(this). It does a full memcpy before this(this) is called, and you only make whatever changes are necessary to ensure that the new struct is separate from the original (such as doing a deep copy of member variables where needed), as opposed to creating an entirely new constructor that must copy everything. So, the general approach is already a bit different from C++. It's also generally agreed upon that structs should not have expensive postblit constructors - copying structs should be cheap - so it's less of an issue than it would be in C++. Objects which would be expensive to copy are generally either classes or structs with reference or COW semantics.
Containers are generally reference types (in Phobos, they're structs rather than classes, since they don't need polymorphism, but copying them does not copy their contents, so they're still reference types), so copying them around is not expensive like it would be in C++.
There may very well be cases in D where it could use something similar to a move constructor, but in general, D has been designed in such a way as to reduce the problems that C++ has with copying objects around, so it's nowhere near the problem that it is in C++.
I think all answers completely failed to answer the original question.
First, as stated above, the question is only relevant for structs. Classes have no meaningful move. Also stated above, for structs, a certain amount of move will happen automatically by the compiler under certain conditions.
If you wish to get control over the move operations, here's what you have to do. You can disable copying by annotating this(this) with #disable. Next, you can override C++'s constructor(constructor &&that) by defining this(Struct that). Likewise, you can override the assign with opAssign(Struct that). In both cases, you need to make sure that you destroy the values of that.
For assignment, since you also need to destroy the old value of this, the simplest way is to swap them. An implementation of C++'s unique_ptr would, therefore, look something like this:
struct UniquePtr(T) {
private T* ptr = null;
#disable this(this); // This disables both copy construction and opAssign
// The obvious constructor, destructor and accessor
this(T* ptr) {
if(ptr !is null)
this.ptr = ptr;
}
~this() {
freeMemory(ptr);
}
inout(T)* get() inout {
return ptr;
}
// Move operations
this(UniquePtr!T that) {
this.ptr = that.ptr;
that.ptr = null;
}
ref UniquePtr!T opAssign(UniquePtr!T that) { // Notice no "ref" on "that"
swap(this.ptr, that.ptr); // We change it anyways, because it's a temporary
return this;
}
}
Edit:
Notice I did not define opAssign(ref UniquePtr!T that). That is the copy assignment operator, and if you try to define it, the compiler will error out because you declared, in the #disable line, that you have no such thing.
D have separate value and object semantics :
if you declare your type as struct, it will have value semantic by default
if you declare your type as class, it will have object semantic.
Now, assuming you don't manage the memory yourself, as it's the default case in D - using a garbage collector - you have to understand that object of types declared as class are automatically pointers (or "reference" if you prefer) to the real object, not the real object itself.
So, when passing vectors around in D, what you pass is the reference/pointer. Automatically. No copy involved (other than the copy of the reference).
That's why D, C#, Java and other language don't "need" moving semantic (as most types are object semantic and are manipulated by reference, not by copy).
Maybe they could implement it, I'm not sure. But would they really get performance boost as in C++? By nature, it don't seem likely.
I somehow have the feeling that actually the rvalue references and the whole concept of "move semantics" is a consequence that it's normal in C++ to create local, "temporary" stack objects. In D and most GC languages, it's most common to have objects on the heap, and then there's no overhead with having a temporary object copied (or moved) several times when returning it through a call stack - so there's no need for a mechanism to avoid that overhead too.
In D (and most GC languages) a class object is never copied implicitly and you're only passing the reference around most of the time, so this may mean that you don't need any rvalue references for them.
OTOH, struct objects are NOT supposed to be "handles to resources", but simple value types behaving similar to builtin types - so again, no reason for any move semantics here, IMHO.
This would yield a conclusion - D doesn't have rvalue refs because it doesn't need them.
However, I haven't used rvalue references in practice, I've only had a read on them, so I might have skipped some actual use cases of this feature. Please treat this post as a bunch of thoughts on the matter which hopefully would be helpful for you, not as a reliable judgement.
I think if you need the source to loose the resource you might be in trouble. However being GC'ed you can often avoid needing to worry about multiple owners so it might not be an issue for most cases.

Using memcpy in the STL

Why does C++'s vector class call copy constructors? Why doesn't it just memcpy the underlying data? Wouldn't that be a lot faster, and remove half of the need for move semantics?
I can't imagine a use case where this would be worse, but then again, maybe it's just because I'm being quite unimaginative.
Because the object needs to be notified that it is being moved. For example, there could be pointers to that given object that need to be fixed because the object is being copied. Or the reference count on a reference counted smart pointer might need to be updated. Or....
If you just memcpy'd the underlying memory, then you'd end up calling the destructor twice on the same object, which is also bad. What if the destructor controls something like an OS file handle?
EDIT: To sum up the above:
The copy constructor and destructor can have side effects. Those side effects need to be preserved.
You can safely memcpy POD and built-in types. Those are defined to have no semantics beyond being a bag of bits with respect to memory.
Types that have constructors and destructors, however, must be constructed and destructed in order to maintain invariants. While you may be able to create types that can be memcpy'd and have constructors, it is very difficult to express that in the type signature. So instead of possibly violating invariants associated with all types that have constructors and/or destructors, STL errs on the side of instead maintaining invariants by using the copy constructor when necessary.
This is why C++0x added move semantics via rvalue references. It means that STL containers can take advantage of types that provide a move constructor and give the compiler enough information to optimize out otherwise expensive construction.