When are R-value references necessary? - c++

I understand that, without R-value references, perfect forwarding in C++ would be impossible.
However, I would like to know: is there anything else that necessitates them?
For example, this page gives an example in which R-value references are apparently necessary:
X foo();
X x;
// perhaps use x in various ways
x = foo();
The last line above:
Destructs the resource held by x,
Clones the resource from the temporary returned by foo,
Destructs the temporary and thereby releases its resource.
However, it seems to me that a simple change would have fixed the problem, if swap were implemented properly:
X foo();
X x;
// perhaps use x in various ways
{
X y = foo();
swap(x, y);
}
So it seems to me that r-value references are not necessary for this optimization. (Is that correct?)
So, what are some problems that could not be solved without r-value references (except for perfect forwarding, about which I already know)?

So, what are some problems that could not be solved without r-value references (except for perfect forwarding, about which I already know)?
Yes. In order for the swap trick to work (or at least, work optimally), the class must be designed to be in an empty state when constructed. Imagine a vector implementation that always reserved a few elements, rather than starting off totally empty. Swapping from such a default-constructed vector with an already existing vector would mean doing an extra allocation (in the default constructor of this vector implementation).
With an actual move, the default-constructed object can have allocated data, but the moved-from object can be left in an unallocated state.
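A minimal sketch of the point above, assuming a hypothetical PreallocVec whose default constructor always reserves storage (the static allocations counter exists only to make the cost visible). Emulating move construction with swap needs a default-constructed target, which allocates; a real move constructor just steals the buffer and leaves the source unallocated.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical vector-like type whose default constructor always reserves
// a few slots. `allocations` counts heap allocations for illustration only.
struct PreallocVec {
    static int allocations;
    int* data;
    std::size_t cap;

    PreallocVec() : data(new int[4]), cap(4) { ++allocations; }

    // A real move: steal the buffer and leave the source unallocated.
    PreallocVec(PreallocVec&& other) noexcept : data(other.data), cap(other.cap) {
        other.data = nullptr;
        other.cap = 0;
    }

    PreallocVec(const PreallocVec&) = delete;
    PreallocVec& operator=(const PreallocVec&) = delete;
    ~PreallocVec() { delete[] data; }
};
int PreallocVec::allocations = 0;

// Member-wise swap, as the swap trick requires.
void swap(PreallocVec& a, PreallocVec& b) noexcept {
    std::swap(a.data, b.data);
    std::swap(a.cap, b.cap);
}

// Swap trick: the target must be default-constructed first (an extra
// allocation), and the source is left holding an allocated buffer.
int allocations_via_swap() {
    PreallocVec::allocations = 0;
    PreallocVec source;   // stands in for the value produced by foo()
    PreallocVec target;   // default construction allocates too
    swap(target, source);
    assert(source.data != nullptr);
    return PreallocVec::allocations;  // 2
}

// Real move: only the source ever allocated; the moved-from object is empty.
int allocations_via_move() {
    PreallocVec::allocations = 0;
    PreallocVec source;
    PreallocVec target(std::move(source));
    assert(source.data == nullptr);
    return PreallocVec::allocations;  // 1
}
```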
Also, consider this code:
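template <typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args) {return std::unique_ptr<T>(new T(std::forward<Args>(args)...));}
std::unique_ptr<T> unique = make_unique<T>(); // T stands for some concrete type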
This is not possible without language-level move semantics. Why? Because of the second statement. By the standard, this is copy initialization. That, as the name suggests, requires the type to be copyable. And the whole point of unique_ptr is that it is, well, unique. I.e., not copyable.
If move semantics didn't exist, you couldn't return it. Oh yes, you can talk about copy elision and such, but remember: elision is an optimization. A conforming compiler must still detect that the copy constructor exists and is callable; the compiler simply has the option to not call it. (That is the C++11/14 rule. Since C++17, initializing an object from a prvalue of the same type is guaranteed not to involve a copy or move, so these two particular statements no longer require an accessible copy or move constructor.)
So no, you can't just emulate move semantics with swap. And quite frankly, even if you could, why would you ever want to?
You have to write swap implementations yourself. So if you've got a class with 6 members, your swap function has to swap each member in turn. Recursively through all base classes and such too.
C++11 (though not VC10 or VC2012) allows the compiler to write the move constructors for you.
Yes, there are circumstances where you can get away without it. But they look incredibly hacky, it is hard to see why you're doing things that way, and, most important of all, they make things difficult for both the reader and the writer.
When people want to do something a lot for performance's sake that makes their code tedious to read, difficult to write, and error-prone to maintain (add another member to the class without adding it to the swap function, for example), then you're looking at something that probably should become part of the language. Especially when it's something that the compiler can handle quite easily.
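To make the maintenance argument concrete, here is a sketch with a hypothetical Record type. With no special members declared, the compiler generates the memberwise move operations itself; the swap-based emulation needs a hand-written swap that has to be kept in sync with the member list.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type for illustration. No special members are declared, so
// the compiler implicitly generates a memberwise move constructor and move
// assignment operator; adding a member later needs no extra code.
struct Record {
    std::string name;
    std::vector<int> values;
    std::string note;
};

// The swap-based alternative: every member has to be listed by hand, and
// forgetting a newly added one silently leaves it un-swapped.
void swap(Record& a, Record& b) noexcept {
    using std::swap;
    swap(a.name, b.name);
    swap(a.values, b.values);
    swap(a.note, b.note);
}
```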

Related

Reducing assignment of temporary object to in-place construction

Using std::list supporting move semantics as an example.
std::list<std::string> X;
... //X is used in various ways
X=std::list<std::string>({"foo","bar","dead","beef"});
The most straightforward way for the compiler to do the assignment since C++11 is:
destroy X
construct std::list
move std::list to X
Now, the compiler isn't allowed to do the following instead:
destroy X
construct std::list in-place
because, while this obviously saves another memcpy, it eliminates the assignment. What is a convenient way of making the second behaviour possible and usable? Is it planned in future versions of C++?
My guess is that C++ still does not offer that except by writing:
using StrList = std::list<std::string>; // X names a variable, so X.~X() would not compile
X.~StrList();
new(&X) StrList({"foo","bar","dead","beef"});
Am I right?
You can actually do it by defining operator= to take an initializer list.
For std::list, just write:
X = {"foo","bar","dead","beef"};
In your case, what was happening is actually:
Construct a temporary
Call move assignment operator on X with the temporary
On most objects, such as std::list, this won't actually be expensive compared to simply constructing an object.
However, it still incurs additional allocations for the internal storage of the second std::list, which could be avoided: we could reuse the internal storage already allocated for X if possible. What is happening is:
Construct: the temporary allocates some space for the elements
Move: the pointer is moved to X; the space used by X before is freed
Some objects overload the assignment operator to take an initializer list, as std::vector and std::list do. Such an operator may reuse the storage already allocated internally, which is the most effective solution here.
// Please insert the usual rambling about premature optimization here
Is it planned in future versions of C++?
No. And thank goodness for that.
Assignment is not the same as destroy-then-create. X is not destroyed in your assignment example. X is a live object; the contents of X may be destroyed, but X itself never is. And nor should it be.
If you want to destroy X, then you have that ability, using the explicit-destructor-and-placement-new. Though thanks to the possibility of const members, you'll also need to launder the pointer to the object if you want to be safe. But assignment should never be considered equivalent to that.
If efficiency is your concern, it's much better to use the assign member function. By using assign, X has the opportunity to reuse its existing allocations. And this would almost certainly make it faster than your "destroy-plus-construct" version. The cost of moving a linked list into another object is trivial; the cost of having to destroy all of those allocations only to allocate them again is not.
This is especially important for std::list, as it has a lot of allocations.
Worst-case scenario, assign will be no less efficient than whatever else you could come up with from outside the class. And best-case, it will be far better.
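A short sketch of the two alternatives discussed in this answer, using the question's std::list<std::string> (the function names are hypothetical). Whether assign actually reuses existing nodes is up to the standard library implementation; only the resulting contents are specified.

```cpp
#include <cassert>
#include <list>
#include <new>
#include <string>

using StrList = std::list<std::string>;

// Preferred: let the live object replace its own contents. The
// implementation may reuse nodes it already owns (not guaranteed).
// `x = {"foo", "bar", "dead", "beef"};` has the same effect.
void refill_with_assign(StrList& x) {
    x.assign({"foo", "bar", "dead", "beef"});
}

// The question's manual alternative: end x's lifetime, then construct a new
// list in the same storage. If the constructor throws, x stays destroyed and
// its owner will later destroy it a second time: undefined behavior.
void refill_with_placement_new(StrList& x) {
    x.~StrList();
    new (&x) StrList{"foo", "bar", "dead", "beef"};
}
```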
When you have a statement involving a move assignment:
x = std::move(y);
The destructor is not called for x before doing the move. However, after the move, at some point the destructor will be called for y. The idea behind a move assignment operator is that it might be able to move the contents of y to x in a simple way (for example, copying a pointer to y's storage into x). It also has to ensure x's previous contents are destroyed properly (it may opt to swap them into y, because y is not expected to be used anymore and y's destructor will eventually be called).
If the move assignment is inlined, the compiler might be able to deduce that all the operations necessary for moving storage from y to x are just equivalent to in-place construction.
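A minimal sketch of a swap-based move assignment, assuming a hypothetical Buffer that owns a heap array. x is never destroyed: its old storage is handed to y, and is freed when y is eventually destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning type used for illustration.
class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(new int[n]()), size_(n) {}
    ~Buffer() { delete[] data_; }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    Buffer(Buffer&& other) noexcept : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    // Move assignment by swapping: no destructor runs on *this; the old
    // contents are handed to `other`, whose destructor will free them.
    Buffer& operator=(Buffer&& other) noexcept {
        std::swap(data_, other.data_);
        std::swap(size_, other.size_);
        return *this;
    }

    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    int* data_;
    std::size_t size_;
};
```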
Re your final question
Am I right?
No.
Your ideas about what's permitted or not are wrong. The compiler is permitted to substitute any optimization as long as it preserves the observable effects. This is called the "as if" rule. Possible optimizations include removing all of that code, if it does not affect anything observable. In particular, your "isn't allowed" for the second example is completely false, and the reasoning "it eliminates assignment" applies also to your first example, where you draw the opposite conclusion, i.e. there's a self-contradiction right there.

Pass by value vs pass by rvalue reference

When should I declare my function as:
void foo(Widget w);
as opposed to:
void foo(Widget&& w);?
Assume this is the only overload (as in, I pick one or the other, not both, and no other overloads). No templates involved. Assume that the function foo requires ownership of the Widget (e.g. const Widget& is not part of this discussion). I'm not interested in any answer outside the scope of these circumstances. (See addendum at end of post for why these constraints are part of the question.)
The primary difference that my colleagues and I can come up with is that the rvalue reference parameter forces you to be explicit about copies. The caller is responsible for making an explicit copy and then passing it in with std::move when you want a copy. In the pass by value case, the cost of the copy is hidden:
//If foo is a pass by value function, calling + making a copy:
Widget x{};
foo(x); //Implicit copy
//Not shown: continues to use x locally
//If foo is a pass by rvalue reference function, calling + making a copy:
Widget x{};
//foo(x); //This would be a compiler error
auto copy = x; //Explicit copy
foo(std::move(copy));
//Not shown: continues to use x locally
Other than forcing people to be explicit about copying and changing how much syntactic sugar you get when calling the function, how else are these different? What do they say differently about the interface? Are they more or less efficient than one another?
Other things that my colleagues and I have already thought of:
The rvalue reference parameter means that you may move the argument, but does not mandate it. It is possible that the argument you passed in at the call site will be in its original state afterwards. It's also possible that the function modifies or consumes the argument without ever calling a move constructor, on the assumption that, because it received an rvalue reference, the caller relinquished control. With pass by value, if you move into it, you must assume that a move happened; there's no choice.
Assuming no elisions, a single move constructor call is eliminated with pass by rvalue.
The compiler has better opportunity to elide copies/moves with pass by value. Can anyone substantiate this claim? Preferably with a link to gcc.godbolt.org showing optimized generated code from gcc/clang rather than a line in the standard. My attempt at showing this was probably not able to successfully isolate the behavior: https://godbolt.org/g/4yomtt
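The first point in that list can be shown with std::unique_ptr, whose moved-from state (null) is guaranteed. The function names maybe_take and take are hypothetical; the sketch shows that an rvalue-reference parameter leaves the decision to move to the callee, whereas a by-value parameter is always move-constructed from the argument.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Rvalue-reference parameter: the callee decides whether to move.
// Here it only takes ownership when `keep` is true.
void maybe_take(std::unique_ptr<int>&& p, bool keep) {
    if (keep) {
        std::unique_ptr<int> owned = std::move(p);  // ownership moves here
    }  // `owned` (if created) is destroyed here
}

// By-value parameter: it is move-constructed from the argument before the
// body runs, so the caller's pointer is always emptied.
void take(std::unique_ptr<int> /*p*/) {}
```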
Addendum: why am I constraining this problem so much?
No overloads - if there were other overloads, this would devolve into a discussion of pass by value vs a set of overloads that include both const reference and rvalue reference, at which point the set of overloads is obviously more efficient and wins. This is well known, and therefore not interesting.
No templates - I'm not interested in how forwarding references fit into the picture. If you have a forwarding reference, you call std::forward anyway. The goal with a forwarding reference is to pass things as you received them. Copies aren't relevant because you just pass an lvalue instead. It's well known, and not interesting.
foo requires ownership of Widget (aka no const Widget&) - We're not talking about read-only functions. If the function was read-only or didn't need to own or extend the lifetime of the Widget, then the answer trivially becomes const Widget&, which again, is well known, and not interesting. I also refer you to why we don't want to talk about overloads.
What do rvalue usages say about an interface versus copying?
rvalue suggests to the caller that the function both wants to own the value and has no intention of letting the caller know of any changes it has made. Consider the following (I know you said no lvalue references in your example, but bear with me):
//Hello. I want my own local copy of your Widget that I will manipulate,
//but I don't want my changes to affect the one you have. I may or may not
//hold onto it for later, but that's none of your business.
void foo(Widget w);
//Hello. I want to take your Widget and play with it. It may be in a
//different state than when you gave it to me, but it'll still be yours
//when I'm finished. Trust me!
void foo(Widget& w);
//Hello. Can I see that Widget of yours? I don't want to mess with it;
//I just want to check something out on it. Read that one value from it,
//or observe what state it's in. I won't touch it and I won't keep it.
void foo(const Widget& w);
//Hello. Ooh, I like that Widget you have. You're not going to use it
//anymore, are you? Please just give it to me. Thank you! It's my
//responsibility now, so don't worry about it anymore, m'kay?
void foo(Widget&& w);
For another way of looking at it:
//Here, let me buy you a new car just like mine. I don't care if you wreck
//it or give it a new paint job; you have yours and I have mine.
void foo(Car c);
//Here are the keys to my car. I understand that it may come back...
//not quite the same... as I lent it to you, but I'm okay with that.
void foo(Car& c);
//Here are the keys to my car as long as you promise to not give it a
//paint job or anything like that
void foo(const Car& c);
//I don't need my car anymore, so I'm signing the title over to you now.
//Happy birthday!
void foo(Car&& c);
Now, if Widgets have to remain unique (as actual widgets in, say, GTK do) then the first option cannot work. The second, third and fourth options make sense, because there's still only one real representation of the data. Anyway, that's what those semantics say to me when I see them in code.
Now, as for efficiency: it depends. rvalue references can save a lot of time if Widget has a data member that points to contents that can be rather large (think an array). Since the caller used an rvalue, they're saying they don't care about what they're giving you anymore. So, if you want to move the caller's Widget's contents into your Widget, just take their pointer. No need to meticulously copy each element in the data structure their pointer points to. This can lead to pretty good improvements in speed (again, think arrays). But if the Widget class doesn't have any such thing, this benefit is nowhere to be seen.
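A sketch of that difference, assuming a hypothetical Widget that owns a heap array of n doubles: the copy constructor allocates and copies every element, while the move constructor only transfers the pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical Widget holding a potentially large heap array.
struct Widget {
    std::size_t n = 0;
    std::unique_ptr<double[]> data;

    explicit Widget(std::size_t count) : n(count), data(new double[count]()) {}

    // Copy: allocate and copy every element (O(n)).
    Widget(const Widget& other) : n(other.n), data(new double[other.n]) {
        std::copy(other.data.get(), other.data.get() + n, data.get());
    }

    // Move: just take the pointer (O(1)); the source is left empty.
    Widget(Widget&& other) noexcept : n(other.n), data(std::move(other.data)) {
        other.n = 0;
    }
};
```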
Hopefully that gets at what you were asking; if not, I can perhaps expand/clarify things.
The rvalue reference parameter forces you to be explicit about copies.
Yes, pass-by-rvalue-reference scores a point.
The rvalue reference parameter means that you may move the argument, but does not mandate it.
Yes, pass-by-value scores a point.
But that also gives pass-by-rvalue the opportunity to offer the strong exception guarantee: if foo throws, the widget is not necessarily consumed.
For move-only types (such as std::unique_ptr), pass-by-value seems to be the norm (mostly because of your second point; the first point does not apply anyway).
EDIT: the standard library contradicts my previous sentence: one of shared_ptr's constructors takes std::unique_ptr<T, D>&&.
For types that are both copyable and movable (such as std::shared_ptr), we can choose between consistency with move-only types and forcing copies to be explicit.
Unless you want to guarantee there is no unwanted copy, I would use pass-by-value for consistency.
Unless you want a guaranteed and/or immediate sink, I would use pass-by-rvalue.
For existing code base, I would keep consistency.
Unless the type is a move-only type, you normally have the option to pass by reference-to-const, and it seems arbitrary to make it "not part of the discussion", but I will try.
I think the choice partly depends on what foo is going to do with the parameter.
The function needs a local copy
Let's say Widget is an iterator and you want to implement your own std::next function. next needs its own copy to advance and then return. In this case your choice is something like:
Widget next(Widget it, int n = 1){
std::advance(it, n);
return it;
}
vs
Widget next(Widget&& it, int n = 1){
std::advance(it, n);
return std::move(it);
}
I think by-value is better here. From the signature you can see it is taking a copy. If the caller wants to avoid a copy, they can use std::move and guarantee the variable is moved from, but they can still pass lvalues if they want to.
With pass-by-rvalue-reference the caller cannot guarantee that the variable has been moved from.
Move-assignment to a copy
Let's say you have a class WidgetHolder:
class WidgetHolder {
Widget widget;
//...
};
and you need to implement a setWidget member function. I'm going to assume you already have an overload that takes a reference-to-const:
void WidgetHolder::setWidget(const Widget& w) {
widget = w;
}
but after measuring performance you decide you need to optimize for r-values. You have a choice between replacing it with:
void WidgetHolder::setWidget(Widget w) {
widget = std::move(w);
}
Or overloading with:
void WidgetHolder::setWidget(Widget&& w) {
widget = std::move(w);
}
This one is a little bit more tricky. It is tempting to choose pass-by-value because it accepts both rvalues and lvalues, so you don't need two overloads. However, it always constructs a new parameter object (a copy, for lvalue arguments), so you can't take advantage of any existing capacity in the member variable. The pass-by-reference-to-const and pass-by-rvalue-reference overloads use assignment without taking a copy, which might be faster.
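A sketch of the two designs, using std::vector<int> as a stand-in for Widget (an assumption for illustration). With the overload pair, an lvalue argument is copy-assigned into the existing member, which standard library implementations typically do without reallocating when the capacity suffices; the by-value version always builds a fresh parameter first.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Widget = std::vector<int>;  // stand-in for Widget, for illustration

class WidgetHolder {
public:
    // Copy-assigns into the existing member, so its current capacity can
    // be reused instead of allocating a new buffer.
    void setWidget(const Widget& w) { widget = w; }
    // Steals the argument's buffer; no element copies.
    void setWidget(Widget&& w) { widget = std::move(w); }
    const Widget& get() const { return widget; }

private:
    Widget widget;
};

// Single by-value alternative: always constructs a fresh parameter
// (copy for lvalues, move for rvalues), then moves it into the member.
class WidgetHolderByValue {
public:
    void setWidget(Widget w) { widget = std::move(w); }
    const Widget& get() const { return widget; }

private:
    Widget widget;
};
```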
Move-construct a copy
Now let's say you are writing the constructor for WidgetHolder and, as before, you have already implemented a constructor that takes a reference-to-const:
WidgetHolder::WidgetHolder(const Widget& w) : widget(w) {
}
and as before you have measured performance and decided you need to optimize for rvalues. You have a choice between replacing it with:
WidgetHolder::WidgetHolder(Widget w) : widget(std::move(w)) {
}
Or overloading with:
WidgetHolder::WidgetHolder(Widget&& w) : widget(std::move(w)) {
}
In this case, the member variable cannot have any existing capacity since this is the constructor. You are move-constructing a copy. Also, constructors often take many parameters, so it can be quite a pain to write all the different permutations of overloads to optimize for rvalue references. So in this case it is a good idea to use pass-by-value, especially if the constructor takes many such parameters.
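For a constructor with several sink parameters, the by-value approach looks like this sketch (the Contact class is hypothetical). One constructor handles every mix of lvalue and rvalue arguments, where a reference-based design would need 2^3 = 8 overloads for three parameters.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical holder with three sink parameters taken by value and
// moved into the members.
class Contact {
public:
    Contact(std::string name, std::string email, std::string phone)
        : name_(std::move(name)), email_(std::move(email)), phone_(std::move(phone)) {}

    const std::string& name() const { return name_; }
    const std::string& email() const { return email_; }
    const std::string& phone() const { return phone_; }

private:
    std::string name_;
    std::string email_;
    std::string phone_;
};
```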
Passing unique_ptr
With unique_ptr the efficiency concerns are less important given that a move is so cheap and it doesn't have any capacity. More important is expressiveness and correctness. There is a good discussion of how to pass unique_ptr here.
When you pass by rvalue reference, object lifetimes get complicated. If the callee does not move out of the argument, the destruction of the argument is delayed. I think this is interesting in two cases.
First, you have an RAII class
void fn(RAII &&);
RAII x{underlying_resource};
fn(std::move(x));
// later in the code
RAII y{underlying_resource};
When initializing y, the resource could still be held by x if fn doesn't move out of the rvalue reference. In the pass-by-value code, we know that x is moved from, and that the resource is released when fn's parameter is destroyed. This is probably a case where you would want to pass by value, and the copy constructor would likely be deleted, so you wouldn't have to worry about accidental copies.
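A sketch of the first case, assuming a hypothetical Handle type whose static held counter tracks whether the underlying resource is still acquired. With the rvalue-reference function the resource stays with x; with the by-value function it is released when the parameter is destroyed, no later than the end of the full-expression containing the call.

```cpp
#include <cassert>
#include <utility>

// Hypothetical RAII handle; `held` counts live acquisitions of the
// underlying resource so the release point is observable.
struct Handle {
    static int held;
    bool owns = true;

    Handle() { ++held; }
    Handle(Handle&& other) noexcept : owns(other.owns) { other.owns = false; }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
    ~Handle() { if (owns) --held; }
};
int Handle::held = 0;

// Takes an rvalue reference but never moves from it.
void use_by_rref(Handle&&) {}

// Takes by value: the parameter owns the resource and releases it
// when the parameter is destroyed.
void use_by_value(Handle) {}
```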
Second, if the argument is a large object and the function doesn't move out of it, the vector's data lives longer than in the case of pass by value.
vector<B> fn1(vector<A> &&x);
vector<C> fn2(vector<B> &&x);
vector<A> va; // large vector
vector<B> vb = fn1(std::move(va));
vector<C> vc = fn2(std::move(vb));
In the example above, if fn1 and fn2 don't move out of x, then you will end up with all of the data in all of the vectors still alive. If you instead pass by value, only the last vector's data will still be alive (assuming vector's move constructor leaves the source vector empty).
One issue not mentioned in the other answers is the idea of exception-safety.
In general, if the function throws an exception, we would ideally like to have the strong exception guarantee, meaning that the call has no effect other than raising the exception. If pass-by-value uses the move constructor, then such an effect is essentially unavoidable. So an rvalue-reference argument may be superior in some cases. (Of course, there are various cases where the strong exception guarantee isn't achievable either way, as well as various cases where the no-throw guarantee is available either way. So this is not relevant in 100% of cases. But it's relevant sometimes.)
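A sketch of the difference, assuming a hypothetical sink that may reject its input before taking ownership. With the rvalue-reference version the caller keeps the std::unique_ptr when an exception is thrown; with the by-value version the parameter has already been move-constructed from the caller's pointer, so ownership is lost.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical sink that may fail before it has taken ownership.
std::vector<std::unique_ptr<int>> sink;

void store_rref(std::unique_ptr<int>&& p, bool fail) {
    if (fail) throw std::runtime_error("rejected");  // nothing moved yet
    sink.push_back(std::move(p));
}

void store_value(std::unique_ptr<int> p, bool fail) {
    // p was already move-constructed from the caller's argument, so the
    // caller's pointer is empty even if we throw here.
    if (fail) throw std::runtime_error("rejected");
    sink.push_back(std::move(p));
}
```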
Choosing between by-value and by-rvalue-ref, with no other overloads, is not meaningful.
With pass by value the actual argument can be an lvalue expression.
With pass by rvalue-ref the actual argument must be an rvalue.
If the function is storing a copy of the argument, then a sensible choice is between pass-by-value, and a set of overloads with pass-by-ref-to-const and pass-by-rvalue-ref. For an rvalue expression as actual argument the set of overloads can avoid one move. It's an engineering gut-feeling decision whether the micro-optimization is worth the added complexity and typing.
One notable difference is that if you move into a pass-by-value function:
void foo(Widget w);
foo(std::move(copy));
The compiler must generate a call to the move constructor Widget(Widget&&) to create the parameter object. In the case of pass-by-rvalue-reference, no such call is needed, as the rvalue reference is passed directly to the method. Usually this does not matter, as move constructors are cheap (often defaulted) and are inlined most of the time.
(You can check it on gcc.godbolt.org: in your example, declare the move constructor Widget(Widget&&); and the call will show up in the assembly.)
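The extra move can also be observed without reading assembly. In this sketch a hypothetical Widget counts its move-constructor calls; the count is deterministic because the by-value parameter is initialized from an xvalue, which is not eligible for copy elision.

```cpp
#include <cassert>
#include <utility>

// Hypothetical Widget that counts move-constructor calls.
struct Widget {
    static int moves;
    Widget() = default;
    Widget(const Widget&) = default;
    Widget(Widget&&) noexcept { ++moves; }
};
int Widget::moves = 0;

void by_value(Widget w) { (void)w; }   // parameter is move-constructed
void by_rref(Widget&& w) { (void)w; }  // reference binds directly
```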
So my rule of thumb is this:
if the object represents a unique resource (without copy semantics) I prefer to use pass-by-rvalue-reference,
otherwise if it logically makes sense to either move or copy the object, I use pass-by-value.

Is C++'s default copy-constructor inherently unsafe? Are iterators fundamentally unsafe too?

I used to think C++'s object model is very robust when best practices are followed.
Just a few minutes ago, though, I had a realization that I hadn't had before.
Consider this code:
class Foo
{
std::set<size_t> set;
std::vector<std::set<size_t>::iterator> vector;
// ...
// (assume every method ensures each iterator in vector always points to a valid element of set)
};
I have written code like this. And until today, I hadn't seen a problem with it.
But, thinking about it more, I realized that this class is very broken:
Its copy-constructor and copy-assignment copy the iterators inside the vector, which implies that they will still point to the old set! The new one isn't a true copy after all!
In other words, I must manually implement the copy-constructor even though this class isn't managing any resources (no RAII)!
This strikes me as astonishing. I've never come across this issue before, and I don't know of any elegant way to solve it. Thinking about it a bit more, it seems to me that copy construction is unsafe by default -- in fact, it seems to me that classes should not be copyable by default, because any kind of coupling between their instance variables risks rendering the default copy-constructor invalid.
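A reduced reproduction of the problem (the add member function is an assumption, since the question elides Foo's methods, and the members are named s and v for brevity). After the implicitly generated copy, the copied iterator still refers to the element inside the original set.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Reduced version of the question's Foo.
struct Foo {
    std::set<std::size_t> s;
    std::vector<std::set<std::size_t>::iterator> v;

    void add(std::size_t value) {
        v.push_back(s.insert(value).first);  // iterator into *this* set
    }
};
```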
Are iterators fundamentally unsafe to store? Or, should classes really be non-copyable by default?
The solutions I can think of, below, are all undesirable, as they don't let me take advantage of the automatically-generated copy constructor:
Manually implement a copy constructor for every nontrivial class I write. This is not only error-prone, but also painful to write for a complicated class.
Never store iterators as member variables. This seems severely limiting.
Disable copying by default on all classes I write, unless I can explicitly prove they are correct. This seems to run entirely against C++'s design, which is for most types to have value semantics, and thus be copyable.
Is this a well-known problem, and if so, does it have an elegant/idiomatic solution?
C++ copy/move ctor/assign are safe for regular value types. Regular value types behave like integers or other "regular" values.
They are also safe for pointer semantic types, so long as the operation does not change what the pointer "should" point to. Pointing to something "within yourself", or another member, is an example of where it fails.
They are somewhat safe for reference semantic types, but mixing pointer/reference/value semantics in the same class tends to be unsafe/buggy/dangerous in practice.
The rule of zero is that you make classes that behave like either regular value types, or pointer semantic types that don't need to be reseated on copy/move. Then you don't have to write copy/move ctors.
Iterators follow pointer semantics.
The idiomatic/elegant way around this is to tightly couple the iterator container with the pointed-into container, and block or write the copy ctor there. They aren't really separate things once one contains pointers into the other.
Yes, this is a well known "problem" -- whenever you store pointers in an object, you're probably going to need some kind of custom copy constructor and assignment operator to ensure that the pointers are all valid and point at the expected things.
Since iterators are just an abstraction of collection element pointers, they have the same issue.
Is this a well-known problem?
Well, it is known, but I would not say well-known. Sibling pointers do not occur often, and most implementations I have seen in the wild were broken in exactly the same way as yours is.
I believe the problem to be infrequent enough to have escaped most people's notice; interestingly, as I follow more Rust than C++ nowadays, it crops up there quite often because of the strictness of the type system (i.e., the compiler refuses those programs, prompting questions).
does it have an elegant/idiomatic solution?
There are many types of sibling pointers situations, so it really depends, however I know of two generic solutions:
keys
shared elements
Let's review them in order.
When pointing to a class member, or into an indexable container, one can use an offset or key rather than an iterator. It is slightly less efficient (and might require a look-up); however, it is a fairly simple strategy. I have seen it used to great effect in shared-memory situations (where using pointers is a no-no since the shared-memory area may be mapped at different addresses).
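A sketch of the key-based approach applied to the question's Foo, with hypothetical add and at members: the vector stores keys rather than iterators, so the implicitly generated copy stays correct, at the cost of one O(log n) lookup each time an entry is resolved.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Keys instead of iterators: v stores the values (keys) of elements in s,
// so the implicitly generated copy and move operations stay correct.
struct Foo {
    std::set<std::size_t> s;
    std::vector<std::size_t> v;  // each entry is a key present in s

    void add(std::size_t value) {
        s.insert(value);
        v.push_back(value);
    }

    // Resolve entry i on demand with one lookup in this object's own set.
    std::set<std::size_t>::const_iterator at(std::size_t i) const {
        return s.find(v[i]);
    }
};
```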
The other solution is used by Boost.MultiIndex, and consists in an alternative memory layout. It stems from the principle of the intrusive container: instead of putting the element into the container (moving it in memory), an intrusive container uses hooks already inside the element to wire it at the right place. Starting from there, it is easy enough to use different hooks to wire a single element into multiple containers, right?
Well, Boost.MultiIndex kicks it two steps further:
It uses the traditional container interface (i.e., move your object in), but the node to which the object is moved is an element with multiple hooks
It uses various hooks/containers in a single entity
You can check the various examples; notably, Example 5: Sequenced Indices looks a lot like your own code.
Is this a well-known problem
Yes. Any time you have a class that contains pointers, or pointer-like data like an iterator, you have to implement your own copy-constructor and assignment-operator to ensure the new object has valid pointers/iterators.
and if so, does it have an elegant/idiomatic solution?
Maybe not as elegant as you might like, and probably is not the best in performance (but then, copies sometimes are not, which is why C++11 added move semantics), but maybe something like this would work for you (assuming the std::vector contains iterators into the std::set of the same parent object):
class Foo
{
private:
std::set<size_t> s;
std::vector<std::set<size_t>::iterator> v;
struct findAndPushIterator
{
Foo &foo;
findAndPushIterator(Foo &f) : foo(f) {}
void operator()(const std::set<size_t>::iterator &iter)
{
std::set<size_t>::iterator found = foo.s.find(*iter);
if (found != foo.s.end())
foo.v.push_back(found);
}
};
public:
Foo() {}
Foo(const Foo &src)
{
*this = src;
}
Foo& operator=(const Foo &rhs)
{
if (this == &rhs) return *this; // self-assignment: v.clear() would also clear rhs.v
v.clear();
s = rhs.s;
v.reserve(rhs.v.size());
std::for_each(rhs.v.begin(), rhs.v.end(), findAndPushIterator(*this));
return *this;
}
//...
};
Or, if using C++11:
class Foo
{
private:
std::set<size_t> s;
std::vector<std::set<size_t>::iterator> v;
public:
Foo() {}
Foo(const Foo &src)
{
*this = src;
}
Foo& operator=(const Foo &rhs)
{
if (this == &rhs) return *this; // self-assignment: v.clear() would also clear rhs.v
v.clear();
s = rhs.s;
v.reserve(rhs.v.size());
std::for_each(rhs.v.begin(), rhs.v.end(),
[this](const std::set<size_t>::iterator &iter)
{
std::set<size_t>::iterator found = s.find(*iter);
if (found != s.end())
v.push_back(found);
}
);
return *this;
}
//...
};
Yes, of course it's a well-known problem.
If your class stored pointers, as an experienced developer you would intuitively know that the default copy behaviours may not be sufficient for that class.
Your class stores iterators and, since they are also "handles" to data stored elsewhere, the same logic applies.
This is hardly "astonishing".
The assertion that Foo is not managing any resources is false.
Copy-constructor aside, if an element of set is removed, there must be code in Foo that manages vector so that the respective iterator is removed.
I think the idiomatic solution is to just use one container, a vector<size_t>, and check that the count of an element is zero before inserting. Then the copy and move defaults are fine.
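A sketch of that single-container design (the Foo interface shown is hypothetical). Since there are no cross-member pointers, the default copy and move operations are correct; the trade-off is that each insertion scans the vector, so it is O(n) instead of the set's O(log n).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One container only: no cross-member pointers, so the default
// copy/move operations are correct.
class Foo {
public:
    // Inserts value only if it is not already present (linear scan).
    bool add(std::size_t value) {
        if (std::count(v.begin(), v.end(), value) != 0) return false;
        v.push_back(value);
        return true;
    }
    const std::vector<std::size_t>& values() const { return v; }

private:
    std::vector<std::size_t> v;
};
```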
"Inherently unsafe"
No, the features you mention are not inherently unsafe; the fact that you thought of three possible safe solutions to the problem is evidence that there is no "inherent" lack of safety here, even though you think the solutions are undesirable.
And yes, there is RAII here: the containers (set and vector) are managing resources. I think your point is that the RAII is "already taken care of" by the std containers. But you need to then consider the container instances themselves to be "resources", and in fact your class is managing them. You're correct that you're not directly managing heap memory, because this aspect of the management problem is taken care of for you by the standard library. But there's more to the management problem, which I'll talk a bit more about below.
"Magic" default behavior
The problem is that you are apparently hoping that you can trust the default copy constructor to "do the right thing" in a non-trivial case such as this. I'm not sure why you expected the right behavior--perhaps you're hoping that memorizing rules-of-thumb such as the "rule of 3" will be a robust way to ensure that you don't shoot yourself in the foot? Certainly that would be nice (and, as pointed out in another answer, Rust goes much further than other low-level languages toward making foot-shooting much harder), but C++ simply isn't designed for "thoughtless" class design of that sort, nor should it be.
Conceptualizing constructor behavior
I'm not going to try to address the question of whether this is a "well-known problem", because I don't really know how well-characterized the problem of "sister" data and iterator-storing is. But I hope that I can convince you that, if you take the time to think about copy-constructor-behavior for every class you write that can be copied, this shouldn't be a surprising problem.
In particular, when deciding to use the default copy-constructor, you must think about what the default copy-constructor will actually do: namely, it will call the copy-constructor of each non-primitive, non-union member (i.e. members that have copy-constructors) and bitwise-copy the rest.
When copying your vector of iterators, what does std::vector's copy-constructor do? It performs a "deep copy", i.e., the data inside the vector is copied. Now, if the vector contains iterators, how does that affect the situation? Well, it's simple: the iterators are the data stored by the vector, so the iterators themselves will be copied. What does an iterator's copy-constructor do? I'm not going to actually look this up, because I don't need to know the specifics: I just need to know that iterators are like pointers in this (and other) respects, and copying a pointer just copies the pointer itself, not the data pointed to. I.e., iterators and pointers do not have deep-copying by default.
Note that this is not surprising: of course iterators don't do deep-copying by default. If they did, you'd get a different, new set for each iterator being copied. And this makes even less sense than it initially appears: for instance, what would it actually mean if uni-directional iterators made deep-copies of their data? Presumably you'd get a partial copy, i.e., all the remaining data that's still "in front of" the iterator's current position, plus a new iterator pointing to the "front" of the new data structure.
Now consider that there is no way for a copy-constructor to know the context in which it's being called. For instance, consider the following code:
using iter = std::set<size_t>::iterator; // use typedef pre-C++11
std::vector<iter> foo = getIters(); // get a vector of iterators
useIters(foo); // pass vector by value
When getIters is called, the return value might be moved, but it might also be copy-constructed. The initialization of foo (which is not an assignment) may also invoke a copy- or move-constructor, though this is usually elided (and, since C++17, is guaranteed to be). And unless useIters takes its argument by reference, then you've also got a copy constructor call there.
In any of these cases, would you expect the copy constructor to change which std::set is pointed to by the iterators contained by the std::vector<iter>? Of course not! So naturally std::vector's copy-constructor can't be designed to modify the iterators in that particular way, and in fact std::vector's copy-constructor is exactly what you need in most cases where it will actually be used.
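To make this concrete, here is a minimal sketch (the Holder struct and the helper function are hypothetical) of a class that stores a std::set together with iterators into it. After a default copy, the copy's iterators still refer to the nodes of the original set, not to the copy's own set.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using iter = std::set<std::size_t>::iterator;

// Hypothetical holder pairing a set with iterators into that same set.
struct Holder {
    std::set<std::size_t> values;
    std::vector<iter> picks;  // intended to point into `values`
};

// Copies a Holder with the implicit copy constructor and reports where the
// copied iterator points: {into the original's set, into the copy's set}.
std::pair<bool, bool> where_do_copied_iterators_point() {
    Holder a;
    a.values = {1, 2, 3};
    a.picks.push_back(a.values.find(2));

    Holder b = a;  // std::set allocates new nodes; the iterators are copied as-is

    const std::size_t* seen = &*b.picks[0];
    return {seen == &*a.values.find(2), seen == &*b.values.find(2)};
}
```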
However, suppose std::vector could work like this: suppose it had a special overload for "vector-of-iterators" that could re-seat the iterators, and that the compiler could somehow be "told" to invoke this special constructor only when the iterators actually need to be re-seated. (Note that the solution of "only invoke the special overload when generating the implicit copy constructor for a containing class that also contains an instance of the iterators' underlying container" wouldn't work: what if the std::vector iterators in your case were pointing at a different std::set, and were being treated simply as references to data managed by some other class? Heck, how is the compiler supposed to know whether the iterators all point into the same std::set?) Ignoring this problem of how the compiler would know when to invoke this special constructor, what would the constructor code look like? Let's try it, using _Ctnr<T>::iterator as our iterator type (I'll use C++11/14-isms and be a bit sloppy, but the overall point should be clear):
// Hypothetical: std::vector cannot actually be specialized like this
template <typename T, template <typename...> class _Ctnr>
std::vector<typename _Ctnr<T>::iterator>::vector(const std::vector<typename _Ctnr<T>::iterator>& rhs)
: _data{ /* ... */ } // initialize underlying data...
{
for (const auto& i : rhs)
{
_data.emplace_back( /* ... */ ); // What do we put here?
}
}
Okay, so we want each new, copied iterator to be re-seated to refer to a different instance of _Ctnr<T>. But where would this information come from? Note that the copy-constructor can't take the new _Ctnr<T> as an argument: then it would no longer be a copy-constructor. And in any case, how would the compiler know which _Ctnr<T> to provide? (Note, too, that for many containers, finding the "corresponding iterator" for the new container may be non-trivial.)
Resource management with std:: containers
This isn't just an issue of the compiler not being as "smart" as it could or should be. This is an instance where you, the programmer, have a specific design in mind that requires a specific solution. In particular, as mentioned above, you have two resources, both std:: containers, and you have a relationship between them. Here we get to something that most of the other answers have stated, and which by this point should be very, very clear: related class members require special care, since C++ does not manage this coupling by default. But what I hope is also clear by this point is that you shouldn't think of the problem as arising specifically because of data-member coupling; the problem is simply that the implicitly-generated copy constructor isn't magic, and the programmer must be aware of the requirements for correctly copying a class before deciding to let the implicitly-generated constructor handle copying.
The elegant solution
...And now we get to aesthetics and opinions. You seem to find it inelegant to be forced to write a copy-constructor when you don't have any raw pointers or arrays in your class that must be manually managed.
But user-defined copy constructors are elegant; allowing you to write them is C++'s elegant solution to the problem of writing correct non-trivial classes.
Admittedly, this seems like a case where the "rule of 3" doesn't quite apply, since there's a clear need to either =delete the copy-constructor or write it yourself, but there's no clear need (yet) for a user-defined destructor. But again, you can't simply program based on rules of thumb and expect everything to work correctly, especially in a low-level language such as C++; you must be aware of the details of (1) what you actually want and (2) how that can be achieved.
So, given that the coupling between your std::set and your std::vector actually creates a non-trivial problem, solving the problem by wrapping them together in a class that correctly implements (or simply deletes) the copy-constructor is actually a very elegant (and idiomatic) solution.
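For the set-plus-iterators case, one way to write that copy constructor is to copy the set and then look each picked element up again in the new copy. The sketch below assumes the container is a std::set of unique keys, so find() identifies the corresponding element; the class IndexedSet and its member functions are hypothetical names, not something from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical wrapper keeping a set together with iterators into it.
class IndexedSet {
public:
    using iter = std::set<std::size_t>::iterator;

    IndexedSet() = default;

    // User-defined copy: copy the set, then re-seat each iterator so it
    // points at the matching element of *this* object's set.
    IndexedSet(const IndexedSet& rhs) : set_(rhs.set_) {
        picks_.reserve(rhs.picks_.size());
        for (const iter& it : rhs.picks_) {
            picks_.push_back(set_.find(*it));  // O(log n) lookup per iterator
        }
    }

    // Copy-and-swap keeps assignment consistent with the copy constructor.
    // std::set::swap does not invalidate iterators, so the swapped-in picks
    // still refer to elements of the swapped-in set.
    IndexedSet& operator=(IndexedSet rhs) {
        set_.swap(rhs.set_);
        picks_.swap(rhs.picks_);
        return *this;
    }

    void insert_and_pick(std::size_t v) { picks_.push_back(set_.insert(v).first); }

    const std::set<std::size_t>& values() const { return set_; }
    const std::vector<iter>& picks() const { return picks_; }

private:
    std::set<std::size_t> set_;
    std::vector<iter> picks_;
};
```

Usage: after IndexedSet b = a;, every iterator in b.picks() refers to an element of b.values(), so destroying a no longer leaves b holding dangling iterators.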
Explicitly defining versus deleting
You mention a potential new "rule of thumb" to follow in your coding practices: "Disable copying by default on all classes I write, unless I can explicitly prove they are correct." While this might be a safer rule of thumb (at least in this case) than the "rule of 3" (especially when your criterion for "do I need to implement the 3" is to check whether a user-defined destructor is required), my above caution against relying on rules of thumb still applies.
But I think the solution here is actually simpler than the proposed rule of thumb. You don't need to formally prove the correctness of the default method; you simply need to have a basic idea of what it would do, and what you need it to do.
Above, in my analysis of your particular case, I went into a lot of detail--for instance, I brought up the possibility of "deep-copying iterators". You don't need to go into this much detail to determine whether or not the default copy-constructor will work correctly. Instead, simply imagine what your manually-created copy constructor will look like; you should be able to tell pretty quickly how similar your imaginary explicitly-defined constructor is to the one the compiler would generate.
For example, a class Foo whose only member is a vector named data will have a copy constructor that looks like this:
Foo::Foo(const Foo& rhs)
: data{rhs.data}
{}
Without even writing that out, you know that you can rely on the implicitly-generated one, because it's exactly the same as what you'd have written above.
Now, consider the constructor for your class Foo:
Foo::Foo(const Foo& rhs)
: set{rhs.set}
, vector{ /* somehow use both rhs.set AND rhs.vector */ } // ...????
{}
Right away, given that simply copying vector's members won't work, you can tell that the implicitly-generated copy constructor won't work. So now you need to decide whether your class needs to be copyable or not.
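If you decide it doesn't need to be copyable, the opt-out is short. A minimal sketch, reusing the hypothetical Foo with a set and a vector of iterators into it. Note that a user-declared (even deleted) copy constructor also suppresses the implicit move constructor, so this Foo is neither copyable nor movable unless you add move operations yourself.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>
#include <vector>

// Hypothetical Foo that opts out of copying instead of re-seating iterators.
class Foo {
public:
    Foo() = default;
    Foo(const Foo&) = delete;             // no implicit shallow copy of the iterators
    Foo& operator=(const Foo&) = delete;  // and no copy assignment either

private:
    std::set<std::size_t> set_;
    std::vector<std::set<std::size_t>::iterator> vector_;
};

static_assert(!std::is_copy_constructible<Foo>::value, "Foo must not be copyable");
static_assert(!std::is_copy_assignable<Foo>::value, "Foo must not be copy-assignable");
```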

Return Value Optimization and private copy constructors

I've written a simple linked list because a recent interview programming challenge showed me how rusty my C++ has gotten. On my list I declared a private copy constructor because I wanted to explicitly avoid making any copies (and of course, laziness). I ran into some trouble when I wanted to return an object by value that owns one of my lists.
class Foo
{
MyList<int> list; // MyList has private copy constructor
public:
Foo() {};
};
class Bar
{
public:
Bar() {};
Foo getFoo()
{
return Foo();
}
};
I get a compiler error saying that MyList has a private copy constructor when I try to return a Foo object by value. Should Return-Value-Optimization negate the need for any copying? Am I required to write a copy constructor? I'd never heard of move constructors until I started looking for solutions to this problem, is that the best solution? If so, I'll have to read up on them. If not, what is the preferred way to solve this problem?
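The standard (C++11/14) explicitly states that the constructor still needs to be accessible, even if it is optimized away. See 12.8/32 in a recent draft. (C++17's guaranteed copy elision removes this requirement when returning a prvalue, as in return Foo();, but not when returning a named local object.)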
I prefer making an object movable and non-copyable in such situations. It makes ownership very clear and explicit.
Otherwise, your users can always use a shared_ptr. Hiding shared ownership is at best a questionable idea (unless you can guarantee all your values are immutable).
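A sketch of the movable, non-copyable approach. The question doesn't show MyList's interface, so the list below is a hypothetical stand-in with only push_front and size; the point is that once MyList has a move constructor, Foo's implicitly generated move constructor moves the list, and getFoo can return by value without any copy constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical minimal list: owns its nodes, movable but not copyable.
template <typename T>
class MyList {
    struct Node { T value; Node* next; };
    Node* head_ = nullptr;
    std::size_t size_ = 0;

public:
    MyList() = default;
    ~MyList() {
        while (head_) { Node* next = head_->next; delete head_; head_ = next; }
    }
    MyList(const MyList&) = delete;
    MyList& operator=(const MyList&) = delete;

    // Move: steal the node chain and leave `other` empty.
    MyList(MyList&& other) noexcept : head_(other.head_), size_(other.size_) {
        other.head_ = nullptr;
        other.size_ = 0;
    }
    // Move assignment by swap: `other` frees our old nodes in its destructor.
    MyList& operator=(MyList&& other) noexcept {
        std::swap(head_, other.head_);
        std::swap(size_, other.size_);
        return *this;
    }

    void push_front(const T& v) { head_ = new Node{v, head_}; ++size_; }
    std::size_t size() const { return size_; }
};

class Foo {
    MyList<int> list;  // Foo's implicit move constructor moves this member
public:
    Foo() = default;
    void add(int v) { list.push_front(v); }
    std::size_t count() const { return list.size(); }
};

class Bar {
public:
    Foo getFoo() {
        Foo f;
        f.add(1);
        return f;  // moved (or elided); no copy constructor is needed
    }
};
```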
The basic problem is that return by value might copy. The standard permits, but does not require, an implementation to apply copy elision where it is allowed. That's why the object still has to be copyable: so that the implementation's decision whether to elide doesn't affect whether the code is well-formed.
Anyway, it doesn't necessarily apply to every copy that the user might like it to. For example there is no elision of copy assignment.
I think your options are:
implement a proper copy. If someone ends up with a slow program due to copying it then their profiler will tell them, you don't have to make it your job to stop them if you don't want to.
implement a proper move, but no copy (C++11 only).
change getFoo to take a Foo& (or maybe Foo*) parameter, and avoid a copy by somehow mutating their object. An efficient swap would come in handy for that. This is fairly pointless if getFoo really returns a default-constructed Foo as in your example, since the caller needs to construct a Foo before they call getFoo.
return a dynamically-allocated Foo wrapped in a smart pointer: either auto_ptr or unique_ptr (auto_ptr was deprecated in C++11 and removed in C++17). Functions defined to create an object and transfer sole ownership to their caller should not return shared_ptr since it has no release() function.
provide a copy constructor but make it blow up somehow (fail to link, abort, throw an exception) if it's ever used. The problems with this are (1) it's doomed to fail but the compiler says nothing, (2) you're enforcing quality of implementation, so your class doesn't work if someone deliberately disables RVO for whatever reason.
I may have missed some.
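The smart-pointer option, sketched with a stand-in Foo that is neither copyable nor movable: only the std::unique_ptr is moved out of getFoo, so Foo itself never needs a copy or move constructor. The value member is a hypothetical addition for the usage check.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the question's Foo: copying is disabled.
class Foo {
public:
    Foo() = default;
    Foo(const Foo&) = delete;
    Foo& operator=(const Foo&) = delete;
    int value = 7;  // hypothetical member, just to have something to inspect
};

class Bar {
public:
    // Sole ownership passes to the caller; only the pointer is moved.
    std::unique_ptr<Foo> getFoo() {
        return std::unique_ptr<Foo>(new Foo());  // std::make_unique<Foo>() in C++14
    }
};
```

Because the caller receives a unique_ptr, it can still call release() if it needs to hand the raw pointer to code that manages ownership some other way.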
The solution would be implementing your own copy constructor that would use other methods of MyList to implement the copy semantics.
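A sketch of such a copy constructor for a hypothetical singly linked list (MyList's real members aren't shown in the question, so the node layout and push_back here are assumptions). The copy walks the source's nodes and appends each value with push_back; assignment reuses it through copy-and-swap.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical singly linked list; only the pieces needed for copying are shown.
template <typename T>
class MyList {
    struct Node { T value; Node* next; };
    Node* head_ = nullptr;
    Node* tail_ = nullptr;
    std::size_t size_ = 0;

public:
    MyList() = default;
    ~MyList() { clear(); }

    // Deep copy: append each of rhs's values with our own push_back.
    MyList(const MyList& rhs) {
        try {
            for (Node* n = rhs.head_; n != nullptr; n = n->next) {
                push_back(n->value);
            }
        } catch (...) {
            clear();  // free the nodes copied so far, then rethrow
            throw;
        }
    }

    // Copy-and-swap: the by-value parameter is built by the copy constructor.
    MyList& operator=(MyList rhs) {
        std::swap(head_, rhs.head_);
        std::swap(tail_, rhs.tail_);
        std::swap(size_, rhs.size_);
        return *this;
    }

    void push_back(const T& v) {
        Node* n = new Node{v, nullptr};
        if (tail_) tail_->next = n; else head_ = n;
        tail_ = n;
        ++size_;
    }

    void clear() {
        while (head_) { Node* next = head_->next; delete head_; head_ = next; }
        tail_ = nullptr;
        size_ = 0;
    }

    std::size_t size() const { return size_; }
    const T& front() const { return head_->value; }
};
```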
... I wanted to explicitly avoid making any copies
You have to choose. Either you can't make copies of an object, like std::istream; then you have to hold such objects in pointers/references, since these can be copied (in C++11, you can use move semantics instead). Or you implement the copy constructor, which is probably easier than solving problems at each place a copy is needed.

Does D have something akin to C++0x's move semantics?

A problem of "value types" with external resources (like std::vector<T> or std::string) is that copying them tends to be quite expensive, and copies are created implicitly in various contexts, so this tends to be a performance concern. C++0x's answer to this problem is move semantics, which is conceptually based on the idea of resource pilfering and technically powered by rvalue references.
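The "resource pilfering" idea can be observed with std::vector itself. In this sketch (the helper function is hypothetical), moving a vector hands its heap buffer to the new vector instead of copying the 1000 strings; the standard guarantees the moved-from vector is left empty.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns true if moving a vector handed over its heap buffer
// (same data pointer) instead of copying the elements.
bool move_pilfers_buffer() {
    std::vector<std::string> source(1000, "payload");
    const std::string* buffer = source.data();

    std::vector<std::string> target = std::move(source);  // move constructor

    // The target now owns the original buffer; the source is left empty.
    return target.data() == buffer && target.size() == 1000 && source.empty();
}
```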
Does D have anything similar to move semantics or rvalue references?
I believe that there are several places in D (such as returning structs) where D manages to do a move whereas C++ would make a copy. IIRC, the compiler will do a move rather than a copy in any case where it can determine that a copy isn't needed, so struct copying is going to happen less in D than in C++. And of course, since classes are references, they don't have the problem at all.
But regardless, copy construction already works differently in D than in C++. Generally, instead of declaring a copy constructor, you declare a postblit constructor: this(this). The struct is first copied bit for bit (a full memcpy), then this(this) is called, and in it you only make whatever changes are necessary to ensure that the new struct is separate from the original (such as doing a deep copy of member variables where needed), as opposed to writing an entirely new constructor that must copy everything. So, the general approach is already a bit different from C++. It's also generally agreed upon that structs should not have expensive postblit constructors (copying structs should be cheap), so it's less of an issue than it would be in C++. Objects which would be expensive to copy are generally either classes or structs with reference or COW semantics.
Containers are generally reference types (in Phobos, they're structs rather than classes, since they don't need polymorphism, but copying them does not copy their contents, so they're still reference types), so copying them around is not expensive like it would be in C++.
There may very well be cases in D where it could use something similar to a move constructor, but in general, D has been designed in such a way as to reduce the problems that C++ has with copying objects around, so it's nowhere near the problem that it is in C++.
I think all answers completely failed to answer the original question.
First, as stated above, the question is only relevant for structs. Classes have no meaningful move. Also stated above, for structs, a certain amount of move will happen automatically by the compiler under certain conditions.
If you wish to get control over the move operations, here's what you have to do. You can disable copying by annotating this(this) with @disable. Next, you can emulate C++'s move constructor T(T&& that) by defining this(Struct that). Likewise, you can emulate move assignment with opAssign(Struct that). In both cases, you need to make sure that that no longer owns the resources you took from it, since its destructor will still run.
For assignment, since you also need to destroy the old value of this, the simplest way is to swap them. An implementation of C++'s unique_ptr would, therefore, look something like this:
import std.algorithm.mutation : swap;

struct UniquePtr(T) {
private T* ptr = null;
@disable this(this); // This disables both copy construction and copy assignment
// The obvious constructor, destructor and accessor
this(T* ptr) {
if(ptr !is null)
this.ptr = ptr;
}
~this() {
freeMemory(ptr); // placeholder: release ptr however it was allocated
}
inout(T)* get() inout {
return ptr;
}
// Move operations
this(UniquePtr!T that) {
this.ptr = that.ptr;
that.ptr = null;
}
ref UniquePtr!T opAssign(UniquePtr!T that) { // Notice no "ref" on "that"
swap(this.ptr, that.ptr); // We change it anyways, because it's a temporary
return this;
}
}
Edit:
Notice I did not define opAssign(ref UniquePtr!T that). That is the copy assignment operator, and if you try to define it, the compiler will error out because you declared, in the @disable line, that you have no such thing.
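For comparison, a rough C++ counterpart of the D struct above (a toy, not std::unique_ptr). Like the D opAssign, the assignment operator takes that by value and swaps, so the old pointer is freed when the parameter dies at the end of the call. Because that by-value operator= counts as the copy-assignment operator and the copy constructor is deleted, only rvalues can be assigned, mirroring what @disable this(this) achieves in D.

```cpp
#include <cassert>
#include <utility>

// Hypothetical C++ counterpart of the D UniquePtr (toy code, not std::unique_ptr).
template <typename T>
class UniquePtr {
    T* ptr_ = nullptr;

public:
    UniquePtr() = default;
    explicit UniquePtr(T* p) : ptr_(p) {}
    ~UniquePtr() { delete ptr_; }

    // Like D's @disable this(this): no copy construction.
    UniquePtr(const UniquePtr&) = delete;

    // Move construction: take the pointer and null out the source.
    UniquePtr(UniquePtr&& that) noexcept : ptr_(that.ptr_) { that.ptr_ = nullptr; }

    // Assignment by value + swap, as in the D opAssign: `that` is move-constructed
    // from the argument, receives our old pointer, and deletes it when it dies.
    UniquePtr& operator=(UniquePtr that) noexcept {
        std::swap(ptr_, that.ptr_);
        return *this;
    }

    T* get() const { return ptr_; }
};
```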
D has separate value and object semantics:
if you declare your type as a struct, it will have value semantics by default;
if you declare your type as a class, it will have object (reference) semantics.
Now, assuming you don't manage the memory yourself (the default case in D is to use a garbage collector), you have to understand that variables of types declared as class are automatically pointers (or "references", if you prefer) to the real object, not the real object itself.
So, when passing vectors around in D, what you pass is the reference/pointer. Automatically. No copy involved (other than the copy of the reference).
That's why D, C#, Java and other languages don't "need" move semantics (most types have object semantics and are manipulated by reference, not by copy).
Maybe they could implement it, I'm not sure. But would they really get a performance boost as in C++? By nature, it doesn't seem likely.
I somehow have the feeling that rvalue references and the whole concept of "move semantics" are actually a consequence of the fact that it's normal in C++ to create local, "temporary" stack objects. In D and most GC languages, it's most common to have objects on the heap, and then there's no overhead with having a temporary object copied (or moved) several times when returning it through a call stack, so there's no need for a mechanism to avoid that overhead either.
In D (and most GC languages) a class object is never copied implicitly and you're only passing the reference around most of the time, so this may mean that you don't need any rvalue references for them.
OTOH, struct objects are NOT supposed to be "handles to resources", but simple value types behaving similar to builtin types - so again, no reason for any move semantics here, IMHO.
This would yield a conclusion - D doesn't have rvalue refs because it doesn't need them.
However, I haven't used rvalue references in practice, I've only read about them, so I might have missed some actual use cases of this feature. Please treat this post as a bunch of thoughts on the matter which hopefully will be helpful for you, not as a reliable judgement.
I think if you need the source to lose the resource, you might be in trouble. However, since D is garbage-collected, you can often avoid worrying about multiple owners, so it might not be an issue in most cases.