A problem of "value types" with external resources (like std::vector<T> or std::string) is that copying them tends to be quite expensive, and copies are created implicitly in various contexts, so this tends to be a performance concern. C++0x's answer to this problem is move semantics, which is conceptually based on the idea of resource pilfering and technically powered by rvalue references.
Does D have anything similar to move semantics or rvalue references?
I believe that there are several places in D (such as returning structs) where D manages to make a move where C++ would make a copy. IIRC, the compiler will do a move rather than a copy in any case where it can determine that a copy isn't needed, so struct copying is going to happen less in D than in C++. And of course, since classes are references, they don't have the problem at all.
But regardless, copy construction already works differently in D than in C++. Generally, instead of declaring a copy constructor, you declare a postblit constructor: this(this). It does a full memcpy before this(this) is called, and you only make whatever changes are necessary to ensure that the new struct is separate from the original (such as doing a deep copy of member variables where needed), as opposed to creating an entirely new constructor that must copy everything. So, the general approach is already a bit different from C++. It's also generally agreed upon that structs should not have expensive postblit constructors - copying structs should be cheap - so it's less of an issue than it would be in C++. Objects which would be expensive to copy are generally either classes or structs with reference or COW semantics.
Containers are generally reference types (in Phobos, they're structs rather than classes, since they don't need polymorphism, but copying them does not copy their contents, so they're still reference types), so copying them around is not expensive like it would be in C++.
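As a rough analogy only (this is not how Phobos implements its containers), the "struct with reference semantics" idea can be sketched in C++ with a handle that shares its payload. RefArray and its members are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical handle type: a struct whose copies all share one payload.
struct RefArray {
    std::shared_ptr<std::vector<int>> payload =
        std::make_shared<std::vector<int>>();

    void push(int x) { payload->push_back(x); }
    std::size_t size() const { return payload->size(); }
};

// Copying the handle copies one pointer and bumps a reference count;
// no element is copied.
inline bool copy_shares_contents() {
    RefArray a;
    a.push(1);
    RefArray b = a;            // cheap handle copy
    b.push(2);                 // the change is visible through 'a' too
    assert(a.payload == b.payload);
    return a.size() == 2 && b.size() == 2;
}
```

The copy is cheap precisely because it is not a value copy: both handles observe the same elements.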
There may very well be cases in D where it could use something similar to a move constructor, but in general, D has been designed in such a way as to reduce the problems that C++ has with copying objects around, so it's nowhere near the problem that it is in C++.
I think all answers completely failed to answer the original question.
First, as stated above, the question is only relevant for structs. Classes have no meaningful move. Also stated above, for structs, a certain amount of move will happen automatically by the compiler under certain conditions.
If you want control over the move operations, here's what you have to do. You can disable copying by annotating this(this) with @disable. Next, you can provide the equivalent of C++'s constructor(constructor &&that) by defining this(Struct that). Likewise, you can provide move assignment with opAssign(Struct that). In both cases, you need to make sure that you reset the values of that, so that it no longer owns what you took from it.
For assignment, since you also need to destroy the old value of this, the simplest way is to swap them. An implementation of C++'s unique_ptr would, therefore, look something like this:
import std.algorithm.mutation : swap;

struct UniquePtr(T) {
    private T* ptr = null;

    @disable this(this); // This disables both copy construction and opAssign

    // The obvious constructor, destructor and accessor
    this(T* ptr) {
        if (ptr !is null)
            this.ptr = ptr;
    }

    ~this() {
        freeMemory(ptr); // freeMemory stands in for whatever deallocator matches the allocation
    }

    inout(T)* get() inout {
        return ptr;
    }

    // Move operations
    this(UniquePtr!T that) {
        this.ptr = that.ptr;
        that.ptr = null;
    }

    ref UniquePtr!T opAssign(UniquePtr!T that) { // Notice no "ref" on "that"
        swap(this.ptr, that.ptr); // We change it anyways, because it's a temporary
        return this;
    }
}
Edit:
Notice I did not define opAssign(ref UniquePtr!T that). That is the copy assignment operator, and if you try to define it, the compiler will error out because you declared, in the @disable line, that you have no such thing.
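For comparison, here is a minimal C++ sketch of the same by-value-plus-swap assignment idea. Owner is a hypothetical name, the sketch assumes C++14 for std::exchange, and real code would normally just use std::unique_ptr.

```cpp
#include <cassert>
#include <utility>

// Hypothetical owning pointer, mirroring the D UniquePtr above.
template <typename T>
class Owner {
    T* ptr_ = nullptr;
public:
    Owner() = default;
    explicit Owner(T* p) : ptr_(p) {}
    ~Owner() { delete ptr_; }

    Owner(const Owner&) = delete;   // no copy construction
    Owner(Owner&& that) noexcept : ptr_(std::exchange(that.ptr_, nullptr)) {}

    // By-value parameter: an rvalue argument is move-constructed into 'that';
    // an lvalue argument is rejected because the copy constructor is deleted.
    Owner& operator=(Owner that) noexcept {
        std::swap(ptr_, that.ptr_); // 'that' now holds the old pointer and frees it
        return *this;
    }

    T* get() const { return ptr_; }
};

inline bool move_assign_transfers_ownership() {
    Owner<int> a(new int(1));
    Owner<int> b(new int(2));
    int* raw = b.get();
    a = std::move(b);               // the old int(1) is freed with the parameter
    assert(b.get() == nullptr);
    return a.get() == raw && *a.get() == 2;
}
```

As in the D version, swapping with the by-value parameter takes care of releasing the old resource without a separate cleanup path.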
D has separate value and object semantics:
if you declare your type as a struct, it will have value semantics by default
if you declare your type as a class, it will have object semantics.
Now, assuming you don't manage the memory yourself, which is the default case in D (it uses a garbage collector), you have to understand that objects of types declared as class are automatically pointers (or "references" if you prefer) to the real object, not the real object itself.
So, when passing vectors around in D, what you pass is the reference/pointer. Automatically. No copy involved (other than the copy of the reference).
That's why D, C#, Java and other languages don't "need" move semantics (as most types have object semantics and are manipulated by reference, not by copy).
Maybe they could implement it, I'm not sure. But would they really get a performance boost as in C++? By nature, it doesn't seem likely.
I somehow have the feeling that actually the rvalue references and the whole concept of "move semantics" is a consequence that it's normal in C++ to create local, "temporary" stack objects. In D and most GC languages, it's most common to have objects on the heap, and then there's no overhead with having a temporary object copied (or moved) several times when returning it through a call stack - so there's no need for a mechanism to avoid that overhead too.
In D (and most GC languages) a class object is never copied implicitly and you're only passing the reference around most of the time, so this may mean that you don't need any rvalue references for them.
OTOH, struct objects are NOT supposed to be "handles to resources", but simple value types behaving similar to builtin types - so again, no reason for any move semantics here, IMHO.
This would yield a conclusion - D doesn't have rvalue refs because it doesn't need them.
However, I haven't used rvalue references in practice, I've only had a read on them, so I might have skipped some actual use cases of this feature. Please treat this post as a bunch of thoughts on the matter which hopefully would be helpful for you, not as a reliable judgement.
I think if you need the source to lose the resource you might be in trouble. However, with a GC you can often avoid needing to worry about multiple owners, so it might not be an issue in most cases.
I have been searching for this on SO and other sources, but I couldn't wrap my head around the issue. Using the resources of rvalues and xvalues is somewhat new to C++ (with C++11).
Now, do we C programmers miss something here? Or is there a corresponding technique in C to get the same resource efficiency?
EDIT: This question is not opinion-based at all. I just couldn't describe my question well. What I am asking is whether or not there is a corresponding technique in C.
Of course, there is a similar technique in C. We have been doing "move semantics" in C for ages.
Firstly, "move semantics" in C++ is based on a bunch of overload resolution rules that describe how functions with rvalue reference parameters behave during overload resolution. Since C does not support function overloading, this specific matter is not applicable to C. You can still implement move semantics in C manually, by writing dedicated data-moving functions with dedicated names and explicitly calling them when you want to move the data instead of copying it. E.g., for your own data type struct HeavyStruct you can write both copy_heavy_struct(dst, src) and move_heavy_struct(dst, src) functions with appropriate implementations. You'll just have to manually choose the most appropriate/efficient one to call in each case.
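A minimal sketch of such a pair of functions, written in a deliberately C-like subset of C++. The fields of HeavyStruct are assumptions made for illustration; the answer only names the type and the two functions.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Assumed layout: one owned heap buffer plus its length.
struct HeavyStruct {
    char*       buf;   // owned; may be NULL
    std::size_t len;
};

// Deep copy: dst ends up with its own buffer. dst must already be initialized.
// Returns 0 on success, 1 on allocation failure (dst is then left unchanged).
int copy_heavy_struct(HeavyStruct* dst, const HeavyStruct* src) {
    char* fresh = NULL;
    if (src->len != 0) {
        fresh = static_cast<char*>(std::malloc(src->len));
        if (fresh == NULL) return 1;
        std::memcpy(fresh, src->buf, src->len);
    }
    std::free(dst->buf);
    dst->buf = fresh;
    dst->len = src->len;
    return 0;
}

// Move: dst adopts src's buffer; src is left empty but safe to free.
void move_heavy_struct(HeavyStruct* dst, HeavyStruct* src) {
    if (dst == src) return;
    std::free(dst->buf);
    *dst = *src;
    src->buf = NULL;
    src->len = 0;
}

bool heavy_demo() {
    HeavyStruct a = { static_cast<char*>(std::malloc(4)), 4 };
    if (a.buf == NULL) return false;
    std::memcpy(a.buf, "abc", 4);
    HeavyStruct b = { NULL, 0 };
    HeavyStruct c = { NULL, 0 };

    bool ok = copy_heavy_struct(&b, &a) == 0
           && b.buf != a.buf && std::strcmp(b.buf, "abc") == 0;

    char* original = a.buf;
    move_heavy_struct(&c, &a);
    assert(a.buf == NULL && a.len == 0);
    ok = ok && c.buf == original;

    std::free(b.buf);
    std::free(c.buf);
    return ok;
}
```

The caller decides per call site which of the two to use, which is exactly the choice C++ overload resolution makes automatically.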
Secondly, the primary purpose of implicit move semantics in C++ is to serve as an alternative to implicit deep-copy semantics in contexts where deep copying is unnecessarily inefficient. Since C does not have implicit deep-copy semantics, the problem does not even arise in C. C always performs shallow copying, which is already pretty similar to move semantics. Basically, you can think of C as an always-move language. It just needs a bit of manual tweaking to bring its move semantics to perfection.
Of course, it is probably impossible to literally reproduce all features of C++ move semantics, since, for example, it is impossible to bind a C pointer to an rvalue. But virtually everything can be "emulated". It just requires a bit more work to be done explicitly/manually.
I don't believe it's move semantics that C is missing. It's all the C++ functionality leading up to move semantics that is "missing" in C. Since you can't do automatic struct copies that call functions to allocate memory, you don't have a system for automatically copying complex and expensive data structures.
Of course, that's the intention. C is a more light-weight language than C++, so the complexity of creating custom copy constructors and assignment operators is not meant to be part of the language - you just write code to do what needs to be done as functions. If you want "deep copy", then you write something that walks your data structure and allocates memory, etc. If you want shallow copy, you write something that copies the pointers in the data structure to the other one (and perhaps sets the source ones to NULL) - just like a move constructor does.
And of course, you only need L and R value in C (it is either on the left or the right of an = sign), there are no references, and clearly no R value references. This is achieved in C by using address of (turning things into pointers).
So it's not really move semantics that C is missing, it's the complex constructors and assignment operators (etc) that come with the design of the C++ language that make move semantics a useful thing in that language. As usual, languages evolve based on their features. If you don't have feature A, and feature B depends on feature A being present, you don't "need" feature B.
Of course, aside from exception handling and const references [and consequently R value references in C++11, which are essentially const references that you are allowed to modify], I don't think there is any major feature in C++ that can't be implemented through C. It's just a bit awkward and messy at times (and will not be as pretty syntactically, and the compiler will not give you neat error messages when you override functions the wrong way, you'll need to manually cast pointers, etc, etc). [After stating something like this, someone will point out that "you obviously didn't think of X", but the overall statement is still correct - C can do 99.9% of what you would want to do in C++]
No. You have to roll-your-own but like other features of C++ (e.g. polymorphism) you can effect the same semantics but with more coding:
#include <stdlib.h>

typedef struct {
    size_t cap;
    size_t len;
    int *data;
} vector;

int create_vector(vector *vec, size_t init_cap) {
    vec->data = malloc(sizeof(int) * init_cap);
    if (vec->data == NULL) {
        return 1;
    }
    vec->cap = init_cap;
    vec->len = 0;
    return 0;
}

void move_vector(vector *to, vector *from) {
    // Moving a vector onto itself must do nothing; otherwise the buffer
    // would be freed and then adopted again.
    if (to == from) {
        return;
    }
    // This effects a move... ('to' must already be initialised.)
    to->cap = from->cap;
    to->len = from->len;
    free(to->data);
    to->data = from->data; // This is where the move explicitly takes place.
    // Can't call destroy_vec() but need to make the object 'safe' to destroy.
    from->data = NULL;
    from->cap = 0;
    from->len = 0;
}

void destroy_vec(vector *vec) {
    free(vec->data);
    vec->data = NULL;
    vec->cap = 0;
    vec->len = 0;
}
Notice how in the move_vector() the data is (well…) moved from one vector to another.
The idea of handing resources between objects is common in C and ultimately amounts to 'move semantics'. C++ just formalised that, cleaned it up and incorporated it in overloading.
You may well even have done it yourself and don't realise because you didn't have a name for it. Anywhere where the 'owner' of a resource is changed can be interpreted as 'move semantics'.
C doesn't have a direct equivalent to move semantics, but the problems that move semantics solves in C++ are much less common in C:
As C also doesn't have copy constructors / assignment operators, copies are shallow by default, whereas in C++ common practice is to implement them as deep copy operations or prevent them in the first place.
C also doesn't have destructors and the RAII pattern, so transferring ownership of a resource comes up less frequently.
The C equivalent to C++ move semantics would be to pass a struct by value, and then to proceed with throwing away the original object without destructing it, relying on the destruction of the copy to be correct.
However, this is very error prone in C, so it's generally avoided. The closest to move semantics that we actually do in C, is when we call realloc() on an array of structs, relying on the bitwise copy to be equivalent to the original. Again, the original is neither destructed nor ever used again.
The difference between the C style copy and C++ move semantics is, that move semantics modify the original, so that its destructor may safely be invoked. With the C bitwise copy approach, we just forget about the contents of the original and don't call a destructor on it.
These stricter semantics make C++ move semantics much easier and safer to use than the C style copy and forget. The only drawback of C++ move semantics is that it's slightly slower than the C style copy and forget approach: move semantics copy element by element rather than bitwise, then proceed to modify the original, so that the destructor becomes a semantic no-op (nevertheless, it's still called). C style copy and forget replaces all this with a simple memcpy().
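A minimal sketch of the "copy and forget" style, written as C-like C++. Buffer and relocate are hypothetical names; the technique only works for types that can be copied bitwise, and only if the caller really never touches the original again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical plain struct that owns 'data' by convention only.
struct Buffer {
    int*        data;
    std::size_t len;
};

// "Copy and forget": bitwise-copy *src into *dst. Afterwards the caller must
// treat *src as dead: it is never freed and never read again.
inline void relocate(Buffer* dst, const Buffer* src) {
    std::memcpy(dst, src, sizeof *dst);
}

inline bool relocate_demo() {
    Buffer old_home = { static_cast<int*>(std::malloc(4 * sizeof(int))), 4 };
    if (old_home.data == NULL) return false;
    old_home.data[0] = 99;
    int* raw = old_home.data;

    Buffer new_home;
    relocate(&new_home, &old_home);    // old_home is now "forgotten"
    assert(new_home.data == raw);
    bool ok = new_home.len == 4 && new_home.data[0] == 99;
    std::free(new_home.data);          // exactly one free, via the new owner
    return ok;
}
```

Nothing in the language enforces the "forget" half; freeing through both structs would be a double free, which is the error-proneness the answer refers to.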
I used to think C++'s object model is very robust when best practices are followed.
Just a few minutes ago, though, I had a realization that I hadn't had before.
Consider this code:
class Foo
{
    std::set<size_t> set;
    std::vector<std::set<size_t>::iterator> vector;
    // ...
    // (assume every method ensures each iterator in vector always points to a valid element of set)
};
I have written code like this. And until today, I hadn't seen a problem with it.
But, thinking about it a bit more, I realized that this class is very broken:
Its copy-constructor and copy-assignment copy the iterators inside the vector, which implies that they will still point to the old set! The new one isn't a true copy after all!
In other words, I must manually implement the copy-constructor even though this class isn't managing any resources (no RAII)!
This strikes me as astonishing. I've never come across this issue before, and I don't know of any elegant way to solve it. Thinking about it a bit more, it seems to me that copy construction is unsafe by default -- in fact, it seems to me that classes should not be copyable by default, because any kind of coupling between their instance variables risks rendering the default copy-constructor invalid.
Are iterators fundamentally unsafe to store? Or, should classes really be non-copyable by default?
The solutions I can think of, below, are all undesirable, as they don't let me take advantage of the automatically-generated copy constructor:
Manually implement a copy constructor for every nontrivial class I write. This is not only error-prone, but also painful to write for a complicated class.
Never store iterators as member variables. This seems severely limiting.
Disable copying by default on all classes I write, unless I can explicitly prove they are correct. This seems to run entirely against C++'s design, which is for most types to have value semantics, and thus be copyable.
Is this a well-known problem, and if so, does it have an elegant/idiomatic solution?
C++ copy/move ctor/assign are safe for regular value types. Regular value types behave like integers or other "regular" values.
They are also safe for pointer semantic types, so long as the operation does not change what the pointer "should" point to. Pointing to something "within yourself", or another member, is an example of where it fails.
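The failure mode can be observed directly. In this minimal sketch (Tracked is a hypothetical cut-down version of the class from the question), the implicitly generated copy constructor leaves the copied iterator pointing at an element of the original set.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical minimal version of the class from the question.
struct Tracked {
    std::set<std::size_t> s;
    std::vector<std::set<std::size_t>::iterator> v;

    void add(std::size_t x) { v.push_back(s.insert(x).first); }
};

// Returns true if the copied iterator still refers to the ORIGINAL set.
inline bool copy_still_points_into_original() {
    Tracked a;
    a.add(42);
    Tracked b = a;   // memberwise copy: the iterators are copied as-is
    assert(b.s.size() == 1 && b.v.size() == 1);

    const std::size_t* through_b = &*b.v[0];
    return through_b == &*a.s.begin()     // aliases a's element
        && through_b != &*b.s.begin();    // not b's own element
}
```

Once a is destroyed, b.v[0] dangles, even though every member was copied "correctly" on its own.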
They are somewhat safe for reference semantic types, but mixing pointer/reference/value semantics in the same class tends to be unsafe/buggy/dangerous in practice.
The rule of zero is that you make classes that behave like either regular value types, or pointer semantic types that don't need to be reseated on copy/move. Then you don't have to write copy/move ctors.
Iterators follow pointer semantics.
The idiomatic/elegant way around this is to tightly couple the iterator container with the pointed-into container, and block or write the copy ctor there. They aren't really separate things once one contains pointers into the other.
Yes, this is a well known "problem" -- whenever you store pointers in an object, you're probably going to need some kind of custom copy constructor and assignment operator to ensure that the pointers are all valid and point at the expected things.
Since iterators are just an abstraction of collection element pointers, they have the same issue.
Is this a well-known problem?
Well, it is known, but I would not say well-known. Sibling pointers do not occur often, and most implementations I have seen in the wild were broken in exactly the same way as yours.
I believe the problem to be infrequent enough to have escaped most people's notice; interestingly, as I follow more Rust than C++ nowadays, it crops up there quite often because of the strictness of the type system (ie, the compiler refuses those programs, prompting questions).
does it have an elegant/idiomatic solution?
There are many types of sibling pointers situations, so it really depends, however I know of two generic solutions:
keys
shared elements
Let's review them in order.
When pointing to a class member, or into an indexable container, one can use an offset or key rather than an iterator. It is slightly less efficient (and might require a look-up), but it is a fairly simple strategy. I have seen it used to great effect in shared-memory situations (where using pointers is a no-no since the shared-memory area may be mapped at different addresses).
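A minimal sketch of the key-based variant, assuming the class from the question. KeyedFoo and its members are hypothetical names; for a set of size_t the key is simply the stored value.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct KeyedFoo {
    std::set<std::size_t> s;
    std::vector<std::size_t> keys;   // keys into s instead of iterators

    void add(std::size_t x) { s.insert(x); keys.push_back(x); }

    // Look the element up on demand: O(log n) per access.
    // Precondition: keys[i] is still present in s.
    const std::size_t& at(std::size_t i) const { return *s.find(keys[i]); }
};

inline bool keys_survive_copy() {
    KeyedFoo a;
    a.add(42);
    KeyedFoo b = a;                       // the implicit copy is correct here
    assert(b.at(0) == 42);
    return &b.at(0) == &*b.s.begin()      // b resolves into its own set...
        && &b.at(0) != &*a.s.begin();     // ...not into a's
}
```

Because a key carries no address, the compiler-generated copy and move operations need no help.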
The other solution is used by Boost.MultiIndex, and consists of an alternative memory layout. It stems from the principle of the intrusive container: instead of putting the element into the container (moving it in memory), an intrusive container uses hooks already inside the element to wire it at the right place. Starting from there, it is easy enough to use different hooks to wire a single element into multiple containers, right?
Well, Boost.MultiIndex takes it two steps further:
It uses the traditional container interface (ie, move your object in), but the node to which the object is moved in is an element with multiple hooks
It uses various hooks/containers in a single entity
You can check various examples and notably Example 5: Sequenced Indices looks a lot like your own code.
Is this a well-known problem
Yes. Any time you have a class that contains pointers, or pointer-like data like an iterator, you have to implement your own copy-constructor and assignment-operator to ensure the new object has valid pointers/iterators.
and if so, does it have an elegant/idiomatic solution?
Maybe not as elegant as you might like, and probably is not the best in performance (but then, copies sometimes are not, which is why C++11 added move semantics), but maybe something like this would work for you (assuming the std::vector contains iterators into the std::set of the same parent object):
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

class Foo
{
private:
    std::set<size_t> s;
    std::vector<std::set<size_t>::iterator> v;

    // Re-finds each source element in this object's own set.
    struct findAndPushIterator
    {
        Foo &foo;
        findAndPushIterator(Foo &f) : foo(f) {}
        void operator()(const std::set<size_t>::iterator &iter)
        {
            std::set<size_t>::iterator found = foo.s.find(*iter);
            if (found != foo.s.end())
                foo.v.push_back(found);
        }
    };

public:
    Foo() {}

    Foo(const Foo &src)
    {
        *this = src;
    }

    Foo& operator=(const Foo &rhs)
    {
        if (this == &rhs)   // self-assignment would clear v before reading it
            return *this;
        v.clear();
        s = rhs.s;
        v.reserve(rhs.v.size());
        std::for_each(rhs.v.begin(), rhs.v.end(), findAndPushIterator(*this));
        return *this;
    }

    //...
};
Or, if using C++11:
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

class Foo
{
private:
    std::set<size_t> s;
    std::vector<std::set<size_t>::iterator> v;

public:
    Foo() {}

    Foo(const Foo &src)
    {
        *this = src;
    }

    Foo& operator=(const Foo &rhs)
    {
        if (this == &rhs)   // self-assignment would clear v before reading it
            return *this;
        v.clear();
        s = rhs.s;
        v.reserve(rhs.v.size());
        // Re-find each source element in this object's own set.
        std::for_each(rhs.v.begin(), rhs.v.end(),
            [this](const std::set<size_t>::iterator &iter)
            {
                std::set<size_t>::iterator found = s.find(*iter);
                if (found != s.end())
                    v.push_back(found);
            }
        );
        return *this;
    }

    //...
};
Yes, of course it's a well-known problem.
If your class stored pointers, as an experienced developer you would intuitively know that the default copy behaviours may not be sufficient for that class.
Your class stores iterators and, since they are also "handles" to data stored elsewhere, the same logic applies.
This is hardly "astonishing".
The assertion that Foo is not managing any resources is false.
Copy-constructor aside, if an element of set is removed, there must be code in Foo that manages vector so that the respective iterator is removed.
I think the idiomatic solution is to just use one container, a vector<size_t>, and check that the count of an element is zero before inserting. Then the copy and move defaults are fine.
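A minimal sketch of that single-container approach. UniqueVec is a hypothetical name, and the linear membership check is an assumption that only suits small collections.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct UniqueVec {
    std::vector<std::size_t> items;   // insertion order kept, no duplicates

    // Returns false (and changes nothing) if x is already present.
    bool insert(std::size_t x) {
        if (std::count(items.begin(), items.end(), x) != 0)   // O(n) check
            return false;
        items.push_back(x);
        return true;
    }
};

inline bool unique_vec_demo() {
    UniqueVec a;
    a.insert(3);
    a.insert(1);
    bool duplicate_accepted = a.insert(3);   // rejected
    assert(!duplicate_accepted);

    UniqueVec b = a;                         // the default copy is a true copy
    b.insert(9);
    return a.items.size() == 2 && b.items.size() == 3;
}
```

With only values stored, there is no coupling between members, so the defaulted copy and move operations are correct.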
"Inherently unsafe"
No, the features you mention are not inherently unsafe; the fact that you thought of three possible safe solutions to the problem is evidence that there is no "inherent" lack of safety here, even though you think the solutions are undesirable.
And yes, there is RAII here: the containers (set and vector) are managing resources. I think your point is that the RAII is "already taken care of" by the std containers. But you need to then consider the container instances themselves to be "resources", and in fact your class is managing them. You're correct that you're not directly managing heap memory, because this aspect of the management problem is taken care of for you by the standard library. But there's more to the management problem, which I'll talk a bit more about below.
"Magic" default behavior
The problem is that you are apparently hoping that you can trust the default copy constructor to "do the right thing" in a non-trivial case such as this. I'm not sure why you expected the right behavior--perhaps you're hoping that memorizing rules-of-thumb such as the "rule of 3" will be a robust way to ensure that you don't shoot yourself in the foot? Certainly that would be nice (and, as pointed out in another answer, Rust goes much further than other low-level languages toward making foot-shooting much harder), but C++ simply isn't designed for "thoughtless" class design of that sort, nor should it be.
Conceptualizing constructor behavior
I'm not going to try to address the question of whether this is a "well-known problem", because I don't really know how well-characterized the problem of "sister" data and iterator-storing is. But I hope that I can convince you that, if you take the time to think about copy-constructor-behavior for every class you write that can be copied, this shouldn't be a surprising problem.
In particular, when deciding to use the default copy-constructor, you must think about what the default copy-constructor will actually do: namely, it will call the copy-constructor of each non-primitive, non-union member (i.e. members that have copy-constructors) and bitwise-copy the rest.
When copying your vector of iterators, what does std::vector's copy-constructor do? It performs a "deep copy", i.e., the data inside the vector is copied. Now, if the vector contains iterators, how does that affect the situation? Well, it's simple: the iterators are the data stored by the vector, so the iterators themselves will be copied. What does an iterator's copy-constructor do? I'm not going to actually look this up, because I don't need to know the specifics: I just need to know that iterators are like pointers in this (and other) respects, and copying a pointer just copies the pointer itself, not the data pointed to. I.e., iterators and pointers do not have deep-copying by default.
Note that this is not surprising: of course iterators don't do deep-copying by default. If they did, you'd get a different, new set for each iterator being copied. And this makes even less sense than it initially appears: for instance, what would it actually mean if uni-directional iterators made deep-copies of their data? Presumably you'd get a partial copy, i.e., all the remaining data that's still "in front of" the iterator's current position, plus a new iterator pointing to the "front" of the new data structure.
Now consider that there is no way for a copy-constructor to know the context in which it's being called. For instance, consider the following code:
using iter = std::set<size_t>::iterator; // use typedef pre-C++11
std::vector<iter> foo = getIters(); // get a vector of iterators
useIters(foo); // pass vector by value
When getIters is called, the return value might be moved, but it might also be copy-constructed. The initialization of foo also nominally invokes a copy-constructor, though this may also be elided. And unless useIters takes its argument by reference, you've also got a copy constructor call there.
In any of these cases, would you expect the copy constructor to change which std::set is pointed to by the iterators contained by the std::vector<iter>? Of course not! So naturally std::vector's copy-constructor can't be designed to modify the iterators in that particular way, and in fact std::vector's copy-constructor is exactly what you need in most cases where it will actually be used.
However, suppose std::vector could work like this: suppose it had a special overload for "vector-of-iterators" that could re-seat the iterators, and that the compiler could somehow be "told" only to invoke this special constructor when the iterators actually need to be re-seated. (Note that the solution of "only invoke the special overload when generating a default copy constructor for a containing class that also contains an instance of the iterators' underlying data type" wouldn't work; what if the std::vector iterators in your case were pointing at a different std::set, and were being treated simply as a reference to data managed by some other class? Heck, how is the compiler supposed to know whether the iterators all point to the same std::set?) Ignoring this problem of how the compiler would know when to invoke this special constructor, what would the constructor code look like? Let's try it, using _Ctnr<T>::iterator as our iterator type (I'll use C++11/14isms and be a bit sloppy, but the overall point should be clear):
// Hypothetical pseudocode for the imagined overload; not valid C++ as written.
template <typename T, typename _Ctnr>
std::vector< _Ctnr<T>::iterator> (const std::vector< _Ctnr<T>::iterator>& rhs)
    : _data{ /* ... */ } // initialize underlying data...
{
    for (const auto& i : rhs)
    {
        _data.emplace_back( /* ... */ ); // What do we put here?
    }
}
Okay, so we want each new, copied iterator to be re-seated to refer to a different instance of _Ctnr<T>. But where would this information come from? Note that the copy-constructor can't take the new _Ctnr<T> as an argument: then it would no longer be a copy-constructor. And in any case, how would the compiler know which _Ctnr<T> to provide? (Note, too, that for many containers, finding the "corresponding iterator" for the new container may be non-trivial.)
Resource management with std:: containers
This isn't just an issue of the compiler not being as "smart" as it could or should be. This is an instance where you, the programmer, have a specific design in mind that requires a specific solution. In particular, as mentioned above, you have two resources, both std:: containers. And you have a relationship between them. Here we get to something that most of the other answers have stated, and which by this point should be very, very clear: related class members require special care, since C++ does not manage this coupling by default. But what I hope is also clear by this point is that you shouldn't think of the problem as arising specifically because of data-member coupling; the problem is simply that the implicitly-generated copy constructor isn't magic, and the programmer must be aware of the requirements for correctly copying a class before deciding to let the implicitly-generated constructor handle copying.
The elegant solution
...And now we get to aesthetics and opinions. You seem to find it inelegant to be forced to write a copy-constructor when you don't have any raw pointers or arrays in your class that must be manually managed.
But user-defined copy constructors are elegant; allowing you to write them is C++'s elegant solution to the problem of writing correct non-trivial classes.
Admittedly, this seems like a case where the "rule of 3" doesn't quite apply, since there's a clear need to either =delete the copy-constructor or write it yourself, but there's no clear need (yet) for a user-defined destructor. But again, you can't simply program based on rules of thumb and expect everything to work correctly, especially in a low-level language such as C++; you must be aware of the details of (1) what you actually want and (2) how that can be achieved.
So, given that the coupling between your std::set and your std::vector actually creates a non-trivial problem, solving the problem by wrapping them together in a class that correctly implements (or simply deletes) the copy-constructor is actually a very elegant (and idiomatic) solution.
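A minimal sketch of the "simply delete" option, assuming C++11. Index and its members are hypothetical names; the point is that an accidental copy becomes a compile-time error instead of a silent aliasing bug.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <type_traits>
#include <vector>

class Index {
    std::set<std::size_t> s_;
    std::vector<std::set<std::size_t>::iterator> v_;   // always points into s_
public:
    Index() = default;
    Index(const Index&) = delete;              // a memberwise copy would alias
    Index& operator=(const Index&) = delete;   // the source object's set

    void add(std::size_t x) { v_.push_back(s_.insert(x).first); }
    std::size_t nth(std::size_t i) const { return *v_[i]; }
};

static_assert(!std::is_copy_constructible<Index>::value,
              "copy construction is rejected at compile time");
static_assert(!std::is_copy_assignable<Index>::value,
              "copy assignment is rejected at compile time");

inline bool index_demo() {
    Index idx;
    idx.add(7);
    idx.add(3);
    assert(idx.nth(0) == 7);    // insertion order is kept by v_
    return idx.nth(1) == 3;
}
```

Declaring the copy operations also suppresses the implicit move operations, so this class is neither copyable nor movable until you decide otherwise.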
Explicitly defining versus deleting
You mention a potential new "rule of thumb" to follow in your coding practices: "Disable copying by default on all classes I write, unless I can explicitly prove they are correct." While this might be a safer rule of thumb (at least in this case) than the "rule of 3" (especially when your criterion for "do I need to implement the 3" is to check whether a deleter is required), my above caution against relying on rules of thumb still applies.
But I think the solution here is actually simpler than the proposed rule of thumb. You don't need to formally prove the correctness of the default method; you simply need to have a basic idea of what it would do, and what you need it to do.
Above, in my analysis of your particular case, I went into a lot of detail--for instance, I brought up the possibility of "deep-copying iterators". You don't need to go into this much detail to determine whether or not the default copy-constructor will work correctly. Instead, simply imagine what your manually-created copy constructor will look like; you should be able to tell pretty quickly how similar your imaginary explicitly-defined constructor is to the one the compiler would generate.
For example, a class Foo containing a single vector data will have a copy constructor that looks like this:
Foo::Foo(const Foo& rhs)
: data{rhs.data}
{}
Without even writing that out, you know that you can rely on the implicitly-generated one, because it's exactly the same as what you'd have written above.
Now, consider the constructor for your class Foo:
Foo::Foo(const Foo& rhs)
: set{rhs.set}
, vector{ /* somehow use both rhs.set AND rhs.vector */ } // ...????
{}
Right away, given that simply copying vector's members won't work, you can tell that the default copy constructor won't work. So now you need to decide whether your class needs to be copyable or not.
According to the Google style guidelines, "Few classes need to be copyable. Most should have neither a copy constructor nor an assignment operator."
They recommend making a class uncopyable (that is, not giving it a copy constructor or assignment operator), and instead passing by reference or pointer in most situations, or using clone() methods, which cannot be invoked implicitly.
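One way such a clone()-only class could look, as a minimal sketch. Config and its members are hypothetical names, not taken from the style guide; returning a std::unique_ptr is one possible choice among several.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

class Config {
    std::string name_;
    Config(const Config&) = default;            // private: only clone() may copy
public:
    explicit Config(std::string name) : name_(std::move(name)) {}
    Config& operator=(const Config&) = delete;

    // The only way to duplicate a Config; it can never be invoked implicitly.
    std::unique_ptr<Config> clone() const {
        return std::unique_ptr<Config>(new Config(*this));
    }

    const std::string& name() const { return name_; }
};

static_assert(!std::is_copy_constructible<Config>::value,
              "outside code cannot copy a Config implicitly");

inline bool clone_demo() {
    Config original("prod");
    std::unique_ptr<Config> copy = original.clone();
    assert(copy.get() != &original);    // a distinct object...
    return copy->name() == "prod";      // ...with the same contents
}
```

Every copy is now visible at the call site as a clone() call, at the cost of a heap allocation per copy.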
However, I've heard some arguments against this:
Accessing a reference is (usually) slower than accessing a value.
In some computations, I might want to leave the original object the way it is and just return the changed object.
I might want to store the value of a computation as a local object in a function and return it, which I couldn't do if I returned it by reference.
If a class is small enough, passing by reference is slower.
What are the positives/negatives of following this guideline? Is there any standard "rule of thumb" for making classes uncopyable? What should I consider when creating new classes?
I have two issues with their advice:
It doesn't apply to modern C++, ignoring move constructors/assignment operators, and so assumes that taking objects by value (which would have copied before) is often inefficient.
It doesn't trust the programmer to do the right thing and design their code appropriately. Instead it limits the programmer until they're forced to break the rule.
Whether your class should be copyable, moveable, both or neither should be a design decision based on the uses of the class itself. For example, a std::unique_ptr is a great example of a class that should only be moveable because copying it would invalidate its entire purpose. When you design a class, ask yourself if it makes sense to copy it. Most of the time the answer will be yes.
The advice seems to be based on the belief that programmers default to passing objects around by value which can be expensive when the objects are complex enough. This is just not true any more. You should default to passing objects around by value when you need a copy of the object, and there's no reason to be scared of this - in many cases, the move constructor will be used instead, which is almost always a constant time operation.
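As a rough illustration of passing by value when you need a copy anyway (the Person class is a made-up example, not something from the style guide):

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type that needs its own copy of the name it is given.
class Person {
public:
    // Take the argument by value because a copy is needed anyway, then
    // move it into the member. An lvalue argument costs one copy plus
    // one move; a temporary argument is not copied at all.
    explicit Person(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

Callers who still need their string pass it as-is and keep it intact; callers who are done with it pass a temporary or use std::move.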
Again, the choice of how you should pass objects around is a design decision that should be influenced by a number of factors, such as:
Am I going to need a copy of this object?
Do I need to modify this object?
What is the lifetime of the object?
Is the object optional?
These questions should be asked with every type you write (parameter, return value, variable, whatever). You should find plenty of uses for passing objects by value that don't lead to poor performance due to copying.
If you follow good C++ programming practices, your copy constructors will be bug free, so that shouldn't be a concern. In fact, many classes can get away with just the defaulted copy/move constructors. If a class owns dynamically allocated resources and you use smart pointers appropriately, implementing the copy constructor is often as simple as copying the objects from the pointers - not much room for bugs.
Of course, this advice from Google is for people working on their code to ensure consistency throughout their codebase. That's fine. I don't recommend blindly adopting it in its entirety for a modern C++ project, however.
I am basically trying to figure out, is the whole "move semantics" concept something brand new, or it is just making existing code simpler to implement? I am always interested in reducing the number of times I call copy/constructors but I usually pass objects through using reference (and possibly const) and ensure I always use initialiser lists. With this in mind (and having looked at the whole ugly && syntax) I wonder if it is worth adopting these principles or simply coding as I already do? Is anything new being done here, or is it just "easier" syntactic sugar for what I already do?
TL;DR
This is definitely something new and it goes well beyond just being a way to avoid copying memory.
Long Answer: Why it's new and some perhaps non-obvious implications
Move semantics are just what the name implies--that is, a way to explicitly declare instructions for moving objects rather than copying. In addition to the obvious efficiency benefit, this also affords a programmer a standards-compliant way to have objects that are movable but not copyable. Objects that are movable and not copyable convey a very clear boundary of resource ownership via standard language semantics. This was possible in the past, but there was no standard/unified (or STL-compatible) way to do this.
This is a big deal because having a standard and unified semantic benefits both programmers and compilers. Programmers don't have to spend time potentially introducing bugs into a move routine that can reliably be generated by compilers (most cases); compilers can now make appropriate optimizations because the standard provides a way to inform the compiler when and where you're doing standard moves.
Move semantics is particularly interesting because it suits the RAII idiom very well, which is a long-standing cornerstone of C++ best practice. RAII encompasses much more than just this example, but my point is that move semantics is now a standard way to concisely express (among other things) movable-but-not-copyable objects.
You don't always have to explicitly define this functionality in order to prevent copying. A compiler feature known as "copy elision" will eliminate quite a lot of unnecessary copies from functions that pass by value.
Criminally-Incomplete Crash Course on RAII (for the uninitiated)
I realize you didn't ask for a code example, but here's a really simple one that might benefit a future reader who might be less familiar with the topic or the relevance of Move Semantics to RAII practices. (If you already understand this, then skip the rest of this answer)
// non-copyable class that manages lifecycle of a resource
// note: non-virtual destructor--probably not an appropriate candidate
// for serving as a base class for objects handled polymorphically.
class res_t {
using handle_t = /* whatever */;
handle_t* handle; // Pointer to owned resource
public:
res_t( const res_t& src ) = delete; // no copy constructor
res_t& operator=( const res_t& src ) = delete; // no copy-assignment
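res_t( res_t&& src ) noexcept; // Move constructor: takes src.handle and nulls it (= default would only copy the raw pointer)
res_t& operator=( res_t&& src ) noexcept; // Move-assignment: releases own handle, then takes src's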
res_t(); // Default constructor
~res_t(); // Destructor
};
Objects of this class will allocate/provision whatever resource is needed upon construction and then free/release it upon destruction. Since the resource pointed to by the data member can never accidentally be transferred to another object, the rightful owner of a resource is never in doubt. In addition to making your code less prone to abuse or errors (and easily compatible with STL containers), your intentions will be immediately recognized by any programmer familiar with this standard practice.
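For a raw pointer member, a defaulted move would only copy the pointer and leave two objects releasing the same resource, so the move operations have to be written by hand. Here is a minimal runnable sketch; the heap-allocated int standing in for the handle and the live counter are assumptions made purely for the demonstration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical resource: a heap-allocated int stands in for handle_t.
class res_t {
    int* handle;
public:
    static int live;   // resources currently allocated (demo only)

    res_t() : handle(new int(0)) { ++live; }
    ~res_t() { release(); }

    res_t(const res_t&) = delete;
    res_t& operator=(const res_t&) = delete;

    // Steal the pointer and leave the source empty, so that only one
    // object ever releases the resource.
    res_t(res_t&& src) noexcept : handle(src.handle) { src.handle = nullptr; }

    res_t& operator=(res_t&& src) noexcept {
        if (this != &src) {
            release();
            handle = src.handle;
            src.handle = nullptr;
        }
        return *this;
    }

    bool owns() const { return handle != nullptr; }

private:
    void release() {
        if (handle) {
            delete handle;
            handle = nullptr;
            --live;
        }
    }
};

int res_t::live = 0;
```

After res_t b = std::move(a); only b owns the resource, and it is released exactly once when b goes out of scope.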
In the Turing Tar Pit, there is nothing new under the sun. Everything that move semantics does, can be done without move semantics -- it just takes a lot more code, and is a lot more fragile.
What move semantics does is take a particular common pattern that massively increases efficiency and safety in a number of situations, and embed it in the language.
It increases efficiency in obvious ways. Moving, be it via swap or move construction, is much faster for many data types than copying. You can create special interfaces to indicate when things can be moved from: but honestly people didn't do that. With move semantics, it becomes relatively easy to do. Compare the cost of moving a std::vector to copying it -- move takes roughly copying 3 pointers, while copying requires a heap allocation, copying every element in the container, and creating 3 pointers.
Even more so, compare reserve on a move-aware std::vector to a copy-only aware one: suppose you have a std::vector of std::vector. In C++03, that was performance suicide if you didn't know the dimensions of every component ahead of time -- in C++11, move semantics makes it as smooth as silk, because it is no longer repeatedly copying the sub-vectors whenever the outer vector resizes.
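You can observe this directly. The sketch below assumes a mainstream standard library where std::vector's move constructor is noexcept (true for the usual implementations); the function name is my own.

```cpp
#include <cassert>
#include <vector>

// Grows an outer vector past its capacity and reports whether the inner
// vector's buffer survived the reallocation without being copied.
bool inner_buffer_survives_reallocation() {
    std::vector<std::vector<int>> outer;
    outer.emplace_back(1000, 7);             // one inner vector of 1000 ints
    const int* before = outer[0].data();     // address of the inner buffer

    outer.reserve(outer.capacity() + 100);   // forces the outer vector to reallocate

    const int* after = outer[0].data();
    return before == after;                  // same buffer: moved, not copied
}
```

In C++03 the same reallocation would have allocated a fresh 1000-element buffer and copied every int.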
Move semantics gives every "pImpl pattern" type blazing fast moves, which means you can start having complex objects that behave like values instead of having to deal with and manage pointers to them.
On top of these performance gains, and opening up complex-class-as-value, move semantics also open up a whole host of safety measures, and allow doing some things that were not very practical before.
std::unique_ptr is a replacement for std::auto_ptr. They both do roughly the same thing, but std::auto_ptr treated copies as moves. This made std::auto_ptr ridiculously dangerous to use in practice. Meanwhile, std::unique_ptr just works. It represents unique ownership of some resource extremely well, and transfer of ownership can happen easily and smoothly.
You know the problem whereby you take a foo* in an interface, and sometimes it means "this interface is taking ownership of the object" and sometimes it means "this interface just wants to be able to modify this object remotely", and you have to delve into API documentation and sometimes source code to figure out which?
std::unique_ptr actually solves this problem -- interfaces that want to take ownership can now take a std::unique_ptr<foo>, and the transfer of ownership is obvious at both the API level and in the code that calls the interface. std::unique_ptr is an auto_ptr that just works, and has the unsafe portions removed, and replaced with move semantics. And it does all of this with nearly perfect efficiency.
std::unique_ptr is a transferable RAII representation of a resource whose value is represented by a pointer.
After you write make_unique<T>(Args&&...), unless you are writing really low level code, it is probably a good idea to never call new directly again. Move semantics basically have made new obsolete.
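A small sketch of how the two kinds of interface look side by side. The foo, owner and bump names are placeholders of mine, and std::make_unique assumes C++14 or your own equivalent.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct foo { int value = 0; };   // hypothetical payload type

// Taking a std::unique_ptr by value says "this interface takes ownership".
class owner {
    std::unique_ptr<foo> held_;
public:
    void adopt(std::unique_ptr<foo> p) { held_ = std::move(p); }
    const foo* get() const { return held_.get(); }
};

// Taking a reference says "this interface only modifies your object".
void bump(foo& f) { ++f.value; }
```

At the call site, o.adopt(std::move(p)) makes the hand-over visible in the code, and p is null afterwards, whereas bump(*p) leaves ownership where it was.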
Other RAII representations are often non-copyable. A port, a print session, an interaction with a physical device -- all of these are resources for whom "copy" doesn't make much sense. Most every one of them can be easily modified to support move semantics, which opens up a whole host of freedom in dealing with these variables.
Move semantics also allows you to put your return values in the return part of a function. The pattern of taking return values by reference (and documenting "this one is out-only, this one is in/out", or failing to do so) can be somewhat replaced by returning your data.
So instead of void fill_vec( std::vector<foo>& ), you have std::vector<foo> get_vec(). This even works with multiple return values -- std::tuple< std::vector<A>, std::set<B>, bool > get_stuff() can be called, and you can load your data into local variables efficiently via std::tie( my_vec, my_set, my_bool ) = get_stuff().
Output parameters can be semantically output-only, with very little overhead (the above, in a worst case, costs 8 pointer and 2 bool copies, regardless of how much data we have in those containers -- and that overhead can be as little as 0 pointer and 0 bool copies with a bit more work), because of move semantics.
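Here is a minimal version of that pattern, with int standing in for A and B and made-up contents:

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical producer: builds its results locally and returns them
// by value, moving the containers into the tuple.
std::tuple<std::vector<int>, std::set<int>, bool> get_stuff() {
    std::vector<int> v{1, 2, 3};
    std::set<int> s{4, 5};
    return std::make_tuple(std::move(v), std::move(s), true);
}

// Unpacks the results into existing locals; each element of the
// returned tuple is move-assigned into its target.
bool unpack_demo() {
    std::vector<int> my_vec;
    std::set<int> my_set;
    bool my_bool = false;
    std::tie(my_vec, my_set, my_bool) = get_stuff();
    return my_vec.size() == 3 && my_set.size() == 2 && my_bool;
}
```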
There is absolutely something new going on here. Consider unique_ptr which can be moved, but not copied because it uniquely holds ownership of a resource. That ownership can then be transferred by moving it to a new unique_ptr if needed, but copying it would be impossible (as you would then have two references to the owned object).
While many uses of moving may have positive performance implications, the movable-but-not-copyable types are a much bigger functional improvement to the language.
In short, use the new techniques where it indicates the meaning of how your class should be used, or where (significant) performance concerns can be alleviated by movement rather than copy-and-destroy.
No answer is complete without a reference to Thomas Becker's painstakingly exhaustive write up on rvalue references, perfect forwarding, reference collapsing and everything related to that.
see here: http://thbecker.net/articles/rvalue_references/section_01.html
I would say yes because a move constructor and move assignment operator are now compiler-defined for classes that do not declare a destructor, copy constructor, or copy assignment operator of their own.
This means that if you have the following code...
#include <vector>
struct intContainer
{
std::vector<int> v;
};
intContainer CreateContainer()
{
intContainer c;
c.v.push_back(3);
return c;
}
The code above would be optimized simply by recompiling with a compiler that supports move semantics. Your container c will have compiler defined move-semantics and thus will call the manually defined move operations for std::vector without any changes to your code.
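The flip side of that rule is that declaring a destructor quietly turns moves back into copies. A small sketch (the type and function names are mine) that makes the difference observable:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct implicit_move {       // no special members declared
    std::vector<int> v;
};

struct with_destructor {     // a user-declared destructor suppresses the implicit move
    std::vector<int> v;
    ~with_destructor() {}
};

// Returns how many elements are left in the source after std::move.
template <typename T>
std::size_t left_in_source_after_move() {
    T src;
    src.v.assign(100, 1);
    T dst = std::move(src);  // move constructor if one exists, else copy constructor
    (void)dst;
    return src.v.size();
}
```

For implicit_move the source vector ends up empty because its buffer was stolen; for with_destructor it still holds all 100 elements, because the copy constructor was used.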
Since move semantics only apply in the presence of rvalue
references, which are declared by a new token, &&, it seems
very clear that they are something new.
In principle, they are purely an optimizing technique, which
means that:
1. you don't use them until the profiler says it is necessary, and
2. in theory, optimizing is the compiler's job, and move
semantics aren't any more necessary than register.
Concerning 1, we may, in time, end up with a ubiquitous
heuristic as to how to use them: after all, passing an argument
by const reference, rather than by value, is also an
optimization, but the ubiquitous convention is to pass class
types by const reference, and all other types by value.
Concerning 2, compilers just aren't there yet. At least, the
usual ones. The basic principles which could be used to make
move semantics irrelevant are (well?) known, but to date, they
tend to result in unacceptable compile times for real programs.
As a result: if you're writing a low level library, you'll
probably want to consider move semantics from the start.
Otherwise, they're just extra complication, and should be
ignored, until the profiler says otherwise.
Copy constructors were traditionally ubiquitous in C++ programs. However, I'm doubting whether there's a good reason to that since C++11.
Even when the program logic didn't need copying objects, copy constructors (usu. default) were often included for the sole purpose of object reallocation. Without a copy constructor, you couldn't store objects in a std::vector or even return an object from a function.
However, since C++11, move constructors have been responsible for object reallocation.
Another use case for copy constructors was, simply, making clones of objects. However, I'm quite convinced that a .copy() or .clone() method is better suited for that role than a copy constructor because...
Copying objects isn't really commonplace. Certainly it's sometimes necessary for an object's interface to contain a "make a duplicate of yourself" method, but only sometimes. And when it is the case, explicit is better than implicit.
Sometimes an object could expose several different .copy()-like methods, because in different contexts the copy might need to be created differently (e.g. shallower or deeper).
In some contexts, we'd want the .copy() methods to do non-trivial things related to program logic (increment some counter, or perhaps generate a new unique name for the copy). I wouldn't accept any code that has non-obvious logic in a copy constructor.
Last but not least, a .copy() method can be virtual if needed, which solves the problem of slicing.
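To illustrate the last point, here is a minimal sketch of a virtual copy method next to the slicing it avoids. Shape and Circle are hypothetical, and I've called the method clone:

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }

    // Virtual copy: each class duplicates its own dynamic type.
    virtual std::unique_ptr<Shape> clone() const {
        return std::unique_ptr<Shape>(new Shape(*this));
    }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }

    std::unique_ptr<Shape> clone() const override {
        return std::unique_ptr<Shape>(new Circle(*this));
    }
};
```

Copy-constructing a Shape from a reference that really refers to a Circle yields a plain Shape, while calling clone() through the same reference yields a Circle.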
The only cases where I'd actually want to use a copy constructor are:
RAII handles of copyable resources (quite obviously)
Structures that are intended to be used like built-in types, like math vectors or matrices -
simply because they are copied often and vec3 b = a.copy() is too verbose.
(Side note: I've considered the fact that a copy constructor is needed for copy-and-swap (CAS), but CAS is only needed for operator=(const T&), which I consider redundant based on the exact same reasoning;
.copy() + operator=(T&&) = default would be preferred if you really need this.)
For me, that's quite enough incentive to use T(const T&) = delete everywhere by default and provide a .copy() method when needed. (Perhaps also a private T(const T&) = default just to be able to write copy() or virtual copy() without boilerplate.)
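Roughly what I have in mind, as a sketch (Document is a made-up class). Note that the private copy constructor takes the place of the deleted one, and that the move operations have to be defaulted explicitly once a copy constructor is declared:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

class Document {
public:
    explicit Document(std::string title) : title_(std::move(title)) {}

    Document(Document&&) = default;             // moving stays implicit and cheap
    Document& operator=(Document&&) = default;  // also disables copy assignment

    // The only public way to duplicate the object.
    Document copy() const { return Document(*this); }

    void add_line(std::string line) { lines_.push_back(std::move(line)); }
    std::size_t line_count() const { return lines_.size(); }

private:
    Document(const Document&) = default;  // memberwise copy, reachable only via copy()

    std::string title_;
    std::vector<std::string> lines_;
};

// Outside the class, the type is not copy-constructible.
static_assert(!std::is_copy_constructible<Document>::value,
              "copies must go through copy()");
```

Usage then reads Document b = a.copy(); and any accidental Document b = a; fails to compile.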
Q: Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Specifically, am I correct in that move constructors took over the responsibility of object reallocation in C++11 completely? I'm using "reallocation" informally for all the situations when an object needs to be moved someplace else in the memory without altering its state.
The problem is what is the word "object" referring to.
If objects are the resources that variables refer to (like in Java, or in C++ through pointers, using classical OOP paradigms), every "copy between variables" is a "sharing", and if single ownership is imposed, "sharing" becomes "moving".
If objects are the variables themselves, since each variable has to have its own history, you cannot "move" if you cannot / don't want to impose the destruction of a value in favor of another.
Consider for example std::strings:
std::string a="Aa";
std::string b=a;
...
b = "Bb";
Do you expect the value of a to change, or that code to fail to compile? If not, then copy is needed.
Now consider this:
std::string a="Aa";
std::string b=std::move(a);
...
b = "Bb";
Now a is left in a valid but unspecified state (in practice, usually empty), since its value (better, the dynamic memory that contains it) has been "moved" to b. The value of b is then changed, and the old "Aa" discarded.
In essence, move works only if explicitly called or if the right argument is "temporary", like in
a = b+c;
where the resource held by the return value of operator+ is clearly not needed after the assignment, hence moving it to a, rather than copying it into another place held by a and then deleting it, is more efficient.
Move and copy are two different things. Move is not "THE replacement for copy". It is a more efficient way to avoid a copy, but only in the cases where an object is not required to generate a clone of itself.
Short answer
Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Automatically generated copy constructors are a great benefit in separating resource management from program logic; classes implementing logic do not need to worry about allocating, freeing or copying resources at all.
In my opinion, any replacement would need to do the same, and doing that for named functions feels a bit weird.
Long answer
When considering copy semantics, it's useful to divide types into four categories:
Primitive types, with semantics defined by the language;
Resource management (or RAII) types, with special requirements;
Aggregate types, which simply copy each member;
Polymorphic types.
Primitive types are what they are, so they are beyond the scope of the question; I'm assuming that a radical change to the language, breaking decades of legacy code, won't happen. Polymorphic types can't be copied (while maintaining the dynamic type) without user-defined virtual functions or RTTI shenanigans, so they are also beyond the scope of the question.
So the proposal is: mandate that RAII and aggregate types implement a named function, rather than a copy constructor, if they should be copied.
This makes little difference to RAII types; they just need to declare a differently-named copy function, and users just need to be slightly more verbose.
However, in the current world, aggregate types do not need to declare an explicit copy constructor at all; one will be generated automatically to copy all the members, or deleted if any are uncopyable. This ensures that, as long as all the member types are correctly copyable, so is the aggregate.
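A quick way to see this rule at work is to ask the compiler directly. The two structs are hypothetical examples:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// Every member is copyable, so the implicit copy constructor is generated.
struct copyable_aggregate {
    std::string name;
    std::vector<int> values;
};

// std::unique_ptr is not copyable, so the implicit copy constructor of
// the aggregate is defined as deleted; moving still works.
struct move_only_aggregate {
    std::string name;
    std::unique_ptr<int> owned;
};

static_assert(std::is_copy_constructible<copyable_aggregate>::value,
              "copyable members give a copyable aggregate");
static_assert(!std::is_copy_constructible<move_only_aggregate>::value,
              "one uncopyable member makes the aggregate uncopyable");
static_assert(std::is_move_constructible<move_only_aggregate>::value,
              "the aggregate can still be moved");
```

Neither struct mentions copying at all; the aggregate simply inherits the behaviour of its members.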
In your world, there are two possibilities:
Either the language knows about your copy-function, and can automatically generate one (perhaps only if explicitly requested, i.e. T copy() = default;, since you want explicitness). In my opinion, automatically generating named functions based on the same named function in other types feels more like magic than the current scheme of generating "language elements" (constructors and operator overloads), but perhaps that's just my prejudice speaking.
Or it's left to the user to correctly implement copying semantics for aggregates. This is error-prone (since you could add a member and forget to update the function), and breaks the current clean separation between resource management and program logic.
And to address the points you make in favour:
Copying (non-polymorphic) objects is commonplace, although as you say it's less common now that they can be moved when possible. It's just your opinion that "explicit is better" or that T a(b); is less explicit than T a(b.copy());
Agreed, if an object doesn't have clearly defined copy semantics, then it should have named functions to cover whatever options it offers. I don't see how that affects how normal objects should be copied.
I've no idea why you think that a copy constructor shouldn't be allowed to do things that a named function could, as long as they are part of the defined copy semantics. You argue that copy constructors shouldn't be used because of artificial restrictions that you place on them yourself.
Copying polymorphic objects is an entirely different kettle of fish. Forcing all types to use named functions just because polymorphic ones must won't give the consistency you seem to be arguing for, since the return types would have to be different. Polymorphic copies will need to be dynamically allocated and returned by pointer; non-polymorphic copies should be returned by value. In my opinion, there is little value in making these different operations look similar without being interchangeable.
One case where copy constructors come in useful is when implementing the strong exception guarantees.
To illustrate the point, let's consider the resize function of std::vector. The function might be implemented roughly as follows:
void std::vector::resize(std::size_t n)
{
if (n > capacity())
{
T *newData = new T [n];
for (std::size_t i = 0; i < capacity(); i++)
newData[i] = std::move(m_data[i]);
delete[] m_data;
m_data = newData;
}
else
{ /* ... */ }
}
If the resize function were to have a strong exception guarantee we need to ensure that, if an exception is thrown, the state of the std::vector before the resize() call is preserved.
If T has no move constructor, then we will default to the copy constructor. In this case, if the copy constructor throws an exception, we can still provide strong exception guarantee: we simply delete the newData array and no harm to the std::vector has been done.
However, if we were using the move constructor of T and it threw an exception, then we have a bunch of Ts that were moved into the newData array. Rolling this operation back isn't straightforward: if we try to move them back into the m_data array the move constructor of T may throw an exception again!
To resolve this issue we have the std::move_if_noexcept function. This function will use the move constructor of T if it is marked as noexcept, otherwise the copy constructor will be used. This allows us to implement std::vector::resize in such a way as to provide a strong exception guarantee.
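Here is a minimal sketch of how std::move_if_noexcept picks between the two constructors. The counting structs and the helper function are invented for the demonstration:

```cpp
#include <cassert>
#include <utility>

// Move constructor is noexcept: safe to use during reallocation.
struct safe_move {
    int copies = 0;
    int moves = 0;
    safe_move() = default;
    safe_move(const safe_move& o) : copies(o.copies + 1), moves(o.moves) {}
    safe_move(safe_move&& o) noexcept : copies(o.copies), moves(o.moves + 1) {}
};

// Move constructor is not noexcept: it might throw halfway through.
struct risky_move {
    int copies = 0;
    int moves = 0;
    risky_move() = default;
    risky_move(const risky_move& o) : copies(o.copies + 1), moves(o.moves) {}
    risky_move(risky_move&& o) : copies(o.copies), moves(o.moves + 1) {}
};

// Relocates one element the way a container offering the strong
// guarantee would, and reports whether the move constructor was chosen.
template <typename T>
bool relocation_uses_move() {
    T src;
    T dst(std::move_if_noexcept(src));  // T&& if the move is noexcept, else const T&
    return dst.moves == 1;
}
```

For safe_move the element is moved; for risky_move the function falls back to the copy constructor, so the original elements stay intact if something throws.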
For completeness, I should mention that C++11 std::vector::resize does not provide a strong exception guarantee in all cases. According to www.cplusplus.com we have the following guarantees:
If n is less than or equal to the size of the container, the function never throws exceptions (no-throw guarantee).
If n is greater and a reallocation happens, there are no changes in the container in case of exception (strong guarantee) if the type of the elements is either copyable or no-throw moveable.
Otherwise, if an exception is thrown, the container is left with a valid state (basic guarantee).
Here's the thing. Moving is the new default: the new minimum requirement. But copying is still often a useful and convenient operation.
Nobody should bend over backwards to offer a copy constructor anymore. But it is still useful for your users to have copyability if you can offer it simply.
I would not ditch copy constructors any time soon, but I admit that for my own types, I only add them when it becomes clear I need them, not immediately. So far this is very, very few types.