I have been searching for this matter on SO and other sources but I couldn't wrap my head around this issue. Using resouces of rvalues and xvalues somewhat new to C++ (with C++11).
Now, do we - C programmers - miss something here? Or there is a corresponding technique in C to benefit from these resource efficiency?
EDIT: This quesiton is not opinion based whatsoever. I just couldn't describe my question. What I am asking is that whether or not there is a corresponding technique in c.
Of course, there is a similar technique in C. We have been doing "move semantics" in C for ages.
Firstly, "move semantics" in C++ is based on a bunch of overload resolution rules that describe how functions with rvalue reference parameters behave during overload resolution. Since C does not support function overloading, this specific matter is not applicable to C. You can still implement move semantics in C manually, by writing dedicated data-moving functions with dedicated names and explicitly calling them when you want to move the data instead of copying it. E.g, for your own data type struct HeavyStruct you can write both a copy_heavy_struct(dst, src) and move_heavy_struct(dst, src) functions with appropriate implementations. You'll just have to manually choose the most appropriate/efficient one to call in each case.
Secondly, the primary purpose of implicit move semantics in C++ is to serve as an alternative to implicit deep-copy semantics in contexts where deep copying is unnecessarily inefficient. Since C does not have implicit deep-copy semantics, the problem does not even arise in C. C always performs shallow copying, which is already pretty similar to move semantics. Basically, you can think of C as an always-move language. It just needs a bit of manual tweaking to bring its move semantics to perfection.
Of course, it is probably impossible to literally reproduce all features of C++ move semantics, since, for example, it is impossible to bind a C pointer to an rvalue. But virtually everything can be "emulated". It just requires a bit more work to be done explicitly/manually.
I don't believe it's move semantics that C is missing. It's all the C++ functionality leading up to move semantics that is "missing" in C. Since you can't do automatic struct copies that call functions to allocate memory, you don't have a system for automatically copy complex and expensive data structures.
Of course, that's the intention. C is a more light-weight language than C++, so the complexity of creating custom copy and assignment constructors is not meant to be part of the language - you just write code to do what needs to be done as functions. If you want "deep copy", then you write something that walks your data structure and allocates memory, etc. If you want shallow copy, you write something that copies the pointers in the data structure to the other one (and perhaps setting the source ones to NULL) - just like a move semantics constructor does.
And of course, you only need L and R value in C (it is either on the left or the right of an = sign), there are no references, and clearly no R value references. This is achieved in C by using address of (turning things into pointers).
So it's not really move semantics that C is missing, it's the complex constructors and assignment operators (etc) that comes with the design of the C++ language that makes move semantics a useful thing in that language. As usual, languages evolve based on their features. If you don't have feature A, and feature B depends on feature A being present, you don't "need" feature B.
Of course, aside from exception handling and const references [and consequently R value references in C++11, which is esentially a const reference that you are allowed to modify], I don't think there is any major feature in C++ that can't be implemented through C. It's just a bit awkward and messy at times (and will not be as pretty syntactically, and the compiler will not give you neat error messages when you override functions the wrong way, you'll need to manually cast pointers, etc, etc). [After stating something like this, someone will point out that "you obviously didn't think of X", but the overall statement is still correct - C can do 99.9% of what you would want to do in C]
No. You have to roll-your-own but like other features of C++ (e.g. polymorphism) you can effect the same semantics but with more coding:
#include<stdlib.h>
typedef struct {
size_t cap;
size_t len;
int* data;
} vector ;
int create_vector(vector *vec,size_t init_cap){
vec->data=malloc(sizeof(int)*init_cap);
if(vec->data==NULL){
return 1;
}
vec->cap=init_cap;
vec->len=0;
return 0;
}
void move_vector(vector* to,vector* from){
//This effects a move...
to->cap=from->cap;
to->len=from->len;
free(to->data);
to->data=from->data;//This is where the move explicitly takes place.
//Can't call destroy_vec() but need to make the object 'safe' to destroy.
from->data=NULL;
from->cap=0;
from->len=0;
}
void destroy_vec(vector *vec){
free(vec->data);
vec->data=NULL;
vec->cap=0;
vec->len=0;
}
Notice how in the move_vector() the data is (well…) moved from one vector to another.
The idea of handing resources between objects is common in C and ultimately amounts to 'move semantics'. C++ just formalised that, cleaned it up and incorporated it in overloading.
You may well even have done it yourself and don't realise because you didn't have a name for it. Anywhere where the 'owner' of a resource is changed can be interpreted as 'move semantics'.
C doesn't have a direct equivalent to move semantics, but the problems that move semantics solve in c++ are much less common in c:
As c also doesn't have copy constructors / assignment operators, copies are by default shallow, whereas in c++ common practice is to implement them as deep copy operations or prevent them in the first place.
C also doesn't have destructors and the RAII pattern, so transferring ownership of a resource comes up less frequently.
The C equivalent to C++ move semantics would be to pass a struct by value, and then to proceed with throwing away the original object without destructing it, relying on the destruction of the copy to be correct.
However, this is very error prone in C, so it's generally avoided. The closest to move semantics that we actually do in C, is when we call realloc() on an array of structs, relying on the bitwise copy to be equivalent to the original. Again, the original is neither destructed nor ever used again.
The difference between the C style copy and C++ move semantics is, that move semantics modify the original, so that its destructor may safely be invoked. With the C bitwise copy approach, we just forget about the contents of the original and don't call a destructor on it.
These more strict semantics make C++ move semantics much easier and safer to use than the C style copy and forget. The only drawback of C++ move semantics is, that it's slightly slower than the C style copy and forget approach: Move semantics copy by element rather than bitwise, then proceed to modify the original, so that the destructor becomes a semantical noop (nevertheless, it's still called). C style copy and forget replace all this by a simple memcpy().
Related
I am basically trying to figure out, is the whole "move semantics" concept something brand new, or it is just making existing code simpler to implement? I am always interested in reducing the number of times I call copy/constructors but I usually pass objects through using reference (and possibly const) and ensure I always use initialiser lists. With this in mind (and having looked at the whole ugly && syntax) I wonder if it is worth adopting these principles or simply coding as I already do? Is anything new being done here, or is it just "easier" syntactic sugar for what I already do?
TL;DR
This is definitely something new and it goes well beyond just being a way to avoid copying memory.
Long Answer: Why it's new and some perhaps non-obvious implications
Move semantics are just what the name implies--that is, a way to explicitly declare instructions for moving objects rather than copying. In addition to the obvious efficiency benefit, this also affords a programmer a standards-compliant way to have objects that are movable but not copyable. Objects that are movable and not copyable convey a very clear boundary of resource ownership via standard language semantics. This was possible in the past, but there was no standard/unified (or STL-compatible) way to do this.
This is a big deal because having a standard and unified semantic benefits both programmers and compilers. Programmers don't have to spend time potentially introducing bugs into a move routine that can reliably be generated by compilers (most cases); compilers can now make appropriate optimizations because the standard provides a way to inform the compiler when and where you're doing standard moves.
Move semantics is particularly interesting because it very well suits the RAII idiom, which is a long-standing a cornerstone of C++ best practice. RAII encompasses much more than just this example, but my point is that move semantics is now a standard way to concisely express (among other things) movable-but-not-copyable objects.
You don't always have to explicitly define this functionality in order to prevent copying. A compiler feature known as "copy elision" will eliminate quite a lot of unnecessary copies from functions that pass by value.
Criminally-Incomplete Crash Course on RAII (for the uninitiated)
I realize you didn't ask for a code example, but here's a really simple one that might benefit a future reader who might be less familiar with the topic or the relevance of Move Semantics to RAII practices. (If you already understand this, then skip the rest of this answer)
// non-copyable class that manages lifecycle of a resource
// note: non-virtual destructor--probably not an appropriate candidate
// for serving as a base class for objects handled polymorphically.
class res_t {
using handle_t = /* whatever */;
handle_t* handle; // Pointer to owned resource
public:
res_t( const res_t& src ) = delete; // no copy constructor
res_t& operator=( const res_t& src ) = delete; // no copy-assignment
res_t( res_t&& src ) = default; // Move constructor
res_t& operator=( res_t&& src ) = default; // Move-assignment
res_t(); // Default constructor
~res_t(); // Destructor
};
Objects of this class will allocate/provision whatever resource is needed upon construction and then free/release it upon destruction. Since the resource pointed to by the data member can never accidentally be transferred to another object, the rightful owner of a resource is never in doubt. In addition to making your code less prone to abuse or errors (and easily compatible with STL containers), your intentions will be immediately recognized by any programmer familiar with this standard practice.
In the Turing Tar Pit, there is nothing new under the sun. Everything that move semantics does, can be done without move semantics -- it just takes a lot more code, and is a lot more fragile.
What move semantics does is takes a particular common pattern that massively increases efficiency and safety in a number of situations, and embeds it in the language.
It increases efficiency in obvious ways. Moving, be it via swap or move construction, is much faster for many data types than copying. You can create special interfaces to indicate when things can be moved from: but honestly people didn't do that. With move semantics, it becomes relatively easy to do. Compare the cost of moving a std::vector to copying it -- move takes roughly copying 3 pointers, while copying requires a heap allocation, copying every element in the container, and creating 3 pointers.
Even more so, compare reserve on a move-aware std::vector to a copy-only aware one: suppose you have a std::vector of std::vector. In C++03, that was performance suicide if you didn't know the dimensions of every component ahead of time -- in C++11, move semantics makes it as smooth as silk, because it is no longer repeatedly copying the sub-vectors whenever the outer vector resizes.
Move semantics makes every "pImpl pattern" type to have blazing fast performance, while means you can start having complex objects that behave like values instead of having to deal with and manage pointers to them.
On top of these performance gains, and opening up complex-class-as-value, move semantics also open up a whole host of safety measures, and allow doing some things that where not very practical before.
std::unique_ptr is a replacement for std::auto_ptr. They both do roughly the same thing, but std::auto_ptr treated copies as moves. This made std::auto_ptr ridiculously dangerous to use in practice. Meanwhile, std::unique_ptr just works. It represents unique ownership of some resource extremely well, and transfer of ownership can happen easily and smoothly.
You know the problem whereby you take a foo* in an interface, and sometimes it means "this interface is taking ownership of the object" and sometimes it means "this interface just wants to be able to modify this object remotely", and you have to delve into API documentation and sometimes source code to figure out which?
std::unique_ptr actually solves this problem -- interfaces that want to take onwership can now take a std::unique_ptr<foo>, and the transfer of ownership is obvious at both the API level and in the code that calls the interface. std::unique_ptr is an auto_ptr that just works, and has the unsafe portions removed, and replaced with move semantics. And it does all of this with nearly perfect efficiency.
std::unique_ptr is a transferable RAII representation of resource whose value is represented by a pointer.
After you write make_unique<T>(Args&&...), unless you are writing really low level code, it is probably a good idea to never call new directly again. Move semantics basically have made new obsolete.
Other RAII representations are often non-copyable. A port, a print session, an interaction with a physical device -- all of these are resources for whom "copy" doesn't make much sense. Most every one of them can be easily modified to support move semantics, which opens up a whole host of freedom in dealing with these variables.
Move semantics also allows you to put your return values in the return part of a function. The pattern of taking return values by reference (and documenting "this one is out-only, this one is in/out", or failing to do so) can be somewhat replaced by returning your data.
So instead of void fill_vec( std::vector<foo>& ), you have std::vector<foo> get_vec(). This even works with multiple return values -- std::tuple< std::vector<A>, std::set<B>, bool > get_stuff() can be called, and you can load your data into local variables efficiently via std::tie( my_vec, my_set, my_bool ) = get_stuff().
Output parameters can be semantically output-only, with very little overhead (the above, in a worst case, costs 8 pointer and 2 bool copies, regardless of how much data we have in those containers -- and that overhead can be as little as 0 pointer and 0 bool copies with a bit more work), because of move semantics.
There is absolutely something new going on here. Consider unique_ptr which can be moved, but not copied because it uniquely holds ownership of a resource. That ownership can then be transferred by moving it to a new unique_ptr if needed, but copying it would be impossible (as you would then have two references to the owned object).
While many uses of moving may have positive performance implications, the movable-but-not-copyable types are a much bigger functional improvement to the language.
In short, use the new techniques where it indicates the meaning of how your class should be used, or where (significant) performance concerns can be alleviated by movement rather than copy-and-destroy.
No answer is complete without a reference to Thomas Becker's painstakingly exhaustive write up on rvalue references, perfect forwarding, reference collapsing and everything related to that.
see here: http://thbecker.net/articles/rvalue_references/section_01.html
I would say yes because a Move Constructor and Move Assignment operator are now compiler defined for objects that do not define/protect a destructor, copy constructor, or copy assignment.
This means that if you have the following code...
struct intContainer
{
std::vector<int> v;
}
intContainer CreateContainer()
{
intContainer c;
c.v.push_back(3);
return c;
}
The code above would be optimized simply by recompiling with a compiler that supports move semantics. Your container c will have compiler defined move-semantics and thus will call the manually defined move operations for std::vector without any changes to your code.
Since move semantics only apply in the presence of rvalue
references, which are declared by a new token, &&, it seems
very clear that they are something new.
In principle, they are purely an optimizing techique, which
means that:
1. you don't use them until the profiler says it is necessary, and
2. in theory, optimizing is the compiler's job, and move
semantics aren't any more necessary than register.
Concerning 1, we may, in time, end up with an ubiquitous
heuristic as to how to use them: after all, passing an argument
by const reference, rather than by value, is also an
optimization, but the ubiquitous convention is to pass class
types by const reference, and all other types by value.
Concerning 2, compilers just aren't there yet. At least, the
usual ones. The basic principles which could be used to make
move semantics irrelevant are (well?) known, but to date, they
tend to result in unacceptable compile times for real programs.
As a result: if you're writing a low level library, you'll
probably want to consider move semantics from the start.
Otherwise, they're just extra complication, and should be
ignored, until the profiler says otherwise.
Copy constructors were traditionally ubiquitous in C++ programs. However, I'm doubting whether there's a good reason to that since C++11.
Even when the program logic didn't need copying objects, copy constructors (usu. default) were often included for the sole purpose of object reallocation. Without a copy constructor, you couldn't store objects in a std::vector or even return an object from a function.
However, since C++11, move constructors have been responsible for object reallocation.
Another use case for copy constructors was, simply, making clones of objects. However, I'm quite convinced that a .copy() or .clone() method is better suited for that role than a copy constructor because...
Copying objects isn't really commonplace. Certainly it's sometimes necessary for an object's interface to contain a "make a duplicate of yourself" method, but only sometimes. And when it is the case, explicit is better than implicit.
Sometimes an object could expose several different .copy()-like methods, because in different contexts the copy might need to be created differently (e.g. shallower or deeper).
In some contexts, we'd want the .copy() methods to do non-trivial things related to program logic (increment some counter, or perhaps generate a new unique name for the copy). I wouldn't accept any code that has non-obvious logic in a copy constructor.
Last but not least, a .copy() method can be virtual if needed, allowing to solve the problem of slicing.
The only cases where I'd actually want to use a copy constructor are:
RAII handles of copiable resources (quite obviously)
Structures that are intended to be used like built-in types, like math vectors or matrices -
simply because they are copied often and vec3 b = a.copy() is too verbose.
Side note: I've considered the fact that copy constructor is needed for CAS, but CAS is needed for operator=(const T&) which I consider redundant basing on the exact same reasoning;
.copy() + operator=(T&&) = default would be preferred if you really need this.)
For me, that's quite enough incentive to use T(const T&) = delete everywhere by default and provide a .copy() method when needed. (Perhaps also a private T(const T&) = default just to be able to write copy() or virtual copy() without boilerplate.)
Q: Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Specifically, am I correct in that move constructors took over the responsibility of object reallocation in C++11 completely? I'm using "reallocation" informally for all the situations when an object needs to be moved someplace else in the memory without altering its state.
The problem is what is the word "object" referring to.
If objects are the resources that variables refers to (like in java or in C++ through pointers, using classical OOP paradigms) every "copy between variables" is a "sharing", and if single ownership is imposed, "sharing" becomes "moving".
If objects are the variables themselves, since each variables has to have its own history, you cannot "move" if you cannot / don't want to impose the destruction of a value in favor of another.
Cosider for example std::strings:
std::string a="Aa";
std::string b=a;
...
b = "Bb";
Do you expect the value of a to change, or that code to don't compile? If not, then copy is needed.
Now consider this:
std::string a="Aa";
std::string b=std::move(a);
...
b = "Bb";
Now a is left empty, since its value (better, the dynamic memory that contains it) had been "moved" to b. The value of b is then chaged, and the old "Aa" discarded.
In essence, move works only if explicitly called or if the right argument is "temporary", like in
a = b+c;
where the resource hold by the return of operator+ is clearly not needed after the assignment, hence moving it to a, rather than copy it in another a's held place and delete it is more effective.
Move and copy are two different things. Move is not "THE replacement for copy". It an more efficient way to avoid copy only in all the cases when an object is not required to generate a clone of itself.
Short anwer
Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Automatically generated copy constructors are a great benefit in separating resource management from program logic; classes implementing logic do not need to worry about allocating, freeing or copying resources at all.
In my opinion, any replacement would need to do the same, and doing that for named functions feels a bit weird.
Long answer
When considering copy semantics, it's useful to divide types into four categories:
Primitive types, with semantics defined by the language;
Resource management (or RAII) types, with special requirements;
Aggregate types, which simply copy each member;
Polymorphic types.
Primitive types are what they are, so they are beyond the scope of the question; I'm assuming that a radical change to the language, breaking decades of legacy code, won't happen. Polymorphic types can't be copied (while maintaining the dynamic type) without user-defined virtual functions or RTTI shenanigans, so they are also beyond the scope of the question.
So the proposal is: mandate that RAII and aggregate types implement a named function, rather than a copy constructor, if they should be copied.
This makes little difference to RAII types; they just need to declare a differently-named copy function, and users just need to be slightly more verbose.
However, in the current world, aggregate types do not need to declare an explicit copy constructor at all; one will be generated automatically to copy all the members, or deleted if any are uncopyable. This ensures that, as long as all the member types are correctly copyable, so is the aggregate.
In your world, there are two possibilities:
Either the language knows about your copy-function, and can automatically generate one (perhaps only if explicitly requested, i.e. T copy() = default;, since you want explicitness). In my opinion, automatically generating named functions based on the same named function in other types feels more like magic than the current scheme of generating "language elements" (constructors and operator overloads), but perhaps that's just my prejudice speaking.
Or it's left to the user to correctly implement copying semantics for aggregates. This is error-prone (since you could add a member and forget to update the function), and breaks the current clean separation between resource management and program logic.
And to address the points you make in favour:
Copying (non-polymorphic) objects is commonplace, although as you say it's less common now that they can be moved when possible. It's just your opinion that "explicit is better" or that T a(b); is less explicit than T a(b.copy());
Agreed, if an object doesn't have clearly defined copy semantics, then it should have named functions to cover whatever options it offers. I don't see how that affects how normal objects should be copied.
I've no idea why you think that a copy constructor shouldn't be allowed to do things that a named function could, as long as they are part of the defined copy semantics. You argue that copy constructors shouldn't be used because of artificial restrictions that you place on them yourself.
Copying polymorphic objects is an entirely different kettle of fish. Forcing all types to use named functions just because polymorphic ones must won't give the consistency you seem to be arguing for, since the return types would have to be different. Polymorphic copies will need to be dynamically allocated and returned by pointer; non-polymorphic copies should be returned by value. In my opinion, there is little value in making these different operations look similar without being interchangable.
One case where copy constructors come in useful is when implementing the strong exception guarantees.
To illustrate the point, let's consider the resize function of std::vector. The function might be implemented roughly as follows:
void std::vector::resize(std::size_t n)
{
if (n > capacity())
{
T *newData = new T [n];
for (std::size_t i = 0; i < capacity(); i++)
newData[i] = std::move(m_data[i]);
delete[] m_data;
m_data = newData;
}
else
{ /* ... */ }
}
If the resize function were to have a strong exception guarantee we need to ensure that, if an exception is thrown, the state of the std::vector before the resize() call is preserved.
If T has no move constructor, then we will default to the copy constructor. In this case, if the copy constructor throws an exception, we can still provide strong exception guarantee: we simply delete the newData array and no harm to the std::vector has been done.
However, if we were using the move constructor of T and it threw an exception, then we have a bunch of Ts that were moved into the newData array. Rolling this operation back isn't straight-forward: if we try to move them back into the m_data array the move constructor of T may throw an exception again!
To resolve this issue we have the std::move_if_noexcept function. This function will use the move constructor of T if it is marked as noexcept, otherwise the copy constructor will be used. This allows us to implement std::vector::resize in such a way as to provide a strong exception guarantee.
For completeness, I should mention that C++11 std::vector::resize does not provide a strong exception guarantee in all cases. According to www.cplusplus.com we have the the follow guarantees:
If n is less than or equal to the size of the container, the function never throws exceptions (no-throw guarantee).
If n is greater and a reallocation happens, there are no changes in the container in case of exception (strong guarantee) if the type of the elements is either copyable or no-throw moveable.
Otherwise, if an exception is thrown, the container is left with a valid state (basic guarantee).
Here's the thing. Moving is the new default- the new minimum requirement. But copying is still often a useful and convenient operation.
Nobody should bend over backwards to offer a copy constructor anymore. But it is still useful for your users to have copyability if you can offer it simply.
I would not ditch copy constructors any time soon, but I admit that for my own types, I only add them when it becomes clear I need them- not immediately. So far this is very, very few types.
A problem of "value types" with external resources (like std::vector<T> or std::string) is that copying them tends to be quite expensive, and copies are created implicitly in various contexts, so this tends to be a performance concern. C++0x's answer to this problem is move semantics, which is conceptionally based on the idea of resource pilfering and technically powered by rvalue references.
Does D have anything similar to move semantics or rvalue references?
I believe that there are several places in D (such as returning structs) that D manages to make them moves whereas C++ would make them a copy. IIRC, the compiler will do a move rather than a copy in any case where it can determine that a copy isn't needed, so struct copying is going to happen less in D than in C++. And of course, since classes are references, they don't have the problem at all.
But regardless, copy construction already works differently in D than in C++. Generally, instead of declaring a copy constructor, you declare a postblit constructor: this(this). It does a full memcpy before this(this) is called, and you only make whatever changes are necessary to ensure that the new struct is separate from the original (such as doing a deep copy of member variables where needed), as opposed to creating an entirely new constructor that must copy everything. So, the general approach is already a bit different from C++. It's also generally agreed upon that structs should not have expensive postblit constructors - copying structs should be cheap - so it's less of an issue than it would be in C++. Objects which would be expensive to copy are generally either classes or structs with reference or COW semantics.
Containers are generally reference types (in Phobos, they're structs rather than classes, since they don't need polymorphism, but copying them does not copy their contents, so they're still reference types), so copying them around is not expensive like it would be in C++.
There may very well be cases in D where it could use something similar to a move constructor, but in general, D has been designed in such a way as to reduce the problems that C++ has with copying objects around, so it's nowhere near the problem that it is in C++.
I think all answers completely failed to answer the original question.
First, as stated above, the question is only relevant for structs. Classes have no meaningful move. Also stated above, for structs, a certain amount of move will happen automatically by the compiler under certain conditions.
If you wish to get control over the move operations, here's what you have to do. You can disable copying by annotating this(this) with #disable. Next, you can override C++'s constructor(constructor &&that) by defining this(Struct that). Likewise, you can override the assign with opAssign(Struct that). In both cases, you need to make sure that you destroy the values of that.
For assignment, since you also need to destroy the old value of this, the simplest way is to swap them. An implementation of C++'s unique_ptr would, therefore, look something like this:
struct UniquePtr(T) {
private T* ptr = null;
#disable this(this); // This disables both copy construction and opAssign
// The obvious constructor, destructor and accessor
this(T* ptr) {
if(ptr !is null)
this.ptr = ptr;
}
~this() {
freeMemory(ptr);
}
inout(T)* get() inout {
return ptr;
}
// Move operations
this(UniquePtr!T that) {
this.ptr = that.ptr;
that.ptr = null;
}
ref UniquePtr!T opAssign(UniquePtr!T that) { // Notice no "ref" on "that"
swap(this.ptr, that.ptr); // We change it anyways, because it's a temporary
return this;
}
}
Edit:
Notice I did not define opAssign(ref UniquePtr!T that). That is the copy assignment operator, and if you try to define it, the compiler will error out because you declared, in the #disable line, that you have no such thing.
D have separate value and object semantics :
if you declare your type as struct, it will have value semantic by default
if you declare your type as class, it will have object semantic.
Now, assuming you don't manage the memory yourself, as it's the default case in D - using a garbage collector - you have to understand that object of types declared as class are automatically pointers (or "reference" if you prefer) to the real object, not the real object itself.
So, when passing vectors around in D, what you pass is the reference/pointer. Automatically. No copy involved (other than the copy of the reference).
That's why D, C#, Java and other language don't "need" moving semantic (as most types are object semantic and are manipulated by reference, not by copy).
Maybe they could implement it, I'm not sure. But would they really get performance boost as in C++? By nature, it don't seem likely.
I somehow have the feeling that actually the rvalue references and the whole concept of "move semantics" is a consequence that it's normal in C++ to create local, "temporary" stack objects. In D and most GC languages, it's most common to have objects on the heap, and then there's no overhead with having a temporary object copied (or moved) several times when returning it through a call stack - so there's no need for a mechanism to avoid that overhead too.
In D (and most GC languages) a class object is never copied implicitly and you're only passing the reference around most of the time, so this may mean that you don't need any rvalue references for them.
OTOH, struct objects are NOT supposed to be "handles to resources", but simple value types behaving similar to builtin types - so again, no reason for any move semantics here, IMHO.
This would yield a conclusion - D doesn't have rvalue refs because it doesn't need them.
However, I haven't used rvalue references in practice, I've only had a read on them, so I might have skipped some actual use cases of this feature. Please treat this post as a bunch of thoughts on the matter which hopefully would be helpful for you, not as a reliable judgement.
I think if you need the source to loose the resource you might be in trouble. However being GC'ed you can often avoid needing to worry about multiple owners so it might not be an issue for most cases.
I have a medium complex C++ class which holds a set of data read from disc. It contains an eclectic mix of floats, ints and structures and is now in general use. During a major code review it was asked whether we have a custom assignment operator or we rely on the compiler generated version and if so, how do we know it works correctly? Well, we didn't write a custom assignment and so a unit test was added to check that if we do:
CalibDataSet datasetA = getDataSet();
CalibDataSet dataSetB = datasetA;
then datasetB is the same as datasetA. A couple of hundred lines or so. Now the customer inists that we cannot rely on the compiler (gcc) being correct for future releases and we should write our own. Are they right to insist on this?
Additional info:
I'm impressed by the answers/comments already posted and the response time.Another way of asking this question might be:
When does a POD structure/class become a 'not' POD structure/class?
It is well-known what the automatically-generated assignment operator will do - that's defined as part of the standard and a standards-compliant C++ compiler will always generate a correctly-behaving assignment operator (if it didn't, then it would not be a standards-compliant compiler).
You usually only need to write your own assignment operator if you've written your own destructor or copy constructor. If you don't need those, then you don't need an assignment operator, either.
The most common reason to explicitly define an assignment operator is to support "remote ownership" -- basically, a class that includes one or more pointers, and owns the resources to which those pointers refer. In such a case, you normally need to define assignment, copying (i.e., copy constructor) and destruction. There are three primary strategies for such cases (sorted by decreasing frequency of use):
Deep copy
reference counting
Transfer of ownership
Deep copy means allocating a new resource for the target of the assignment/copy. E.g., a string class has a pointer to the content of the string; when you assign it, the assignment allocates a new buffer to hold the new content in the destination, and copies the data from the source to the destination buffer. This is used in most current implementations of a number of standard classes such as std::string and std::vector.
Reference counting used to be quite common as well. Many (most?) older implementations of std::string used reference counting. In this case, instead of allocating and copying the data for the string, you simply incremented a reference count to indicate the number of string objects referring to a particular data buffer. You only allocated a new buffer when/if the content of a string was modified so it needed to differ from others (i.e., it used copy on write). With multithreading, however, you need to synchronize access to the reference count, which often has a serious impact on performance, so in newer code this is fairly unusual (mostly used when something stores so much data that it's worth potentially wasting a bit of CPU time to avoid such a copy).
Transfer of ownership is relatively unusual. It's what's done by std::auto_ptr. When you assign or copy something, the source of the assignment/copy is basically destroyed -- the data is transferred from one to the other. This is (or can be) useful, but the semantics are sufficiently different from normal assignment that it's often counterintuitive. At the same time, under the right circumstances, it can provide great efficiency and simplicity. C++0x will make transfer of ownership considerably more manageable by adding a unique_ptr type that makes it more explicit, and also adding rvalue references, which make it easy to implement transfer of ownership for one fairly large class of situations where it can improve performance without leading to semantics that are visibly counterintuitive.
Going back to the original question, however, if you don't have remote ownership to start with -- i.e., your class doesn't contain any pointers, chances are good that you shouldn't explicitly define an assignment operator (or dtor or copy ctor). A compiler bug that stopped implicitly defined assignment operators from working would prevent passing any of a huge number of regression tests.
Even if it did somehow get released, your defense against it would be to just not use it. There's no real room for question that such a release would be replaced within a matter of hours. With a medium to large existing project, you don't want to switch to a compiler until it's been in wide use for a while in any case.
If the compiler isn't generating the assignment properly, then you have bigger problems to worry about than implementing the assignment overload (like the fact that you have a broken compiler). Unless your class contains pointers, it is not necessary to provide your own overload; however, it is reasonable to request an explicit overload, not because the compiler might break (which is absurd), but rather to document your intention that assignment be permitted and behave in that manner. In C++0x, it will be possible to document intent and save time by using = default for the compiler-generated version.
If you have some resources in your class that you are supposed to manage then writing your own assignment operator makes sense else rely on compiler generated one.
I guess the problem of shallow copy deep copy may appear in case you are dealing with strings.
http://www.learncpp.com/cpp-tutorial/912-shallow-vs-deep-copying/
but i think it is always advisable to write assignment overloads for user defined classes.
C++ compilers automatically generate copy constructors and copy-assignment operators. Why not swap too?
These days the preferred method for implementing the copy-assignment operator is the copy-and-swap idiom:
T& operator=(const T& other)
{
T copy(other);
swap(copy);
return *this;
}
(ignoring the copy-elision-friendly form that uses pass-by-value).
This idiom has the advantage of being transactional in the face of exceptions (assuming that the swap implementation does not throw). In contrast, the default compiler-generated copy-assignment operator recursively does copy-assignment on all base classes and data members, and that doesn't have the same exception-safety guarantees.
Meanwhile, implementing swap methods manually is tedious and error-prone:
To ensure that swap does not throw, it must be implemented for all non-POD members in the class and in base classes, in their non-POD members, etc.
If a maintainer adds a new data member to a class, the maintainer must remember to modify that class's swap method. Failing to do so can introduce subtle bugs. Also, since swap is an ordinary method, compilers (at least none I know of) don't emit warnings if the swap implementation is incomplete.
Wouldn't it be better if the compiler generated swap methods automatically? Then the implicit copy-assignment implementation could leverage it.
The obvious answer probably is: the copy-and-swap idiom didn't exist when C++ was developed, and doing this now might break existing code.
Still, maybe people could opt-in to letting the compiler generate swap using the same syntax that C++0x uses for controlling other implicit functions:
void swap() = default;
and then there could be rules:
If there is a compiler-generated swap method, an implicit copy-assignment operator can be implemented using copy-and-swap.
If there is no compiler-generated swap method, an implicit copy-assignment operator would be implemented as before (invoking copy-assigment on all base classes and on all members).
Does anyone know if such (crazy?) things have been suggested to the C++ standards committee, and if so, what opinions committee members had?
This is in addition to Terry's answer.
The reason we had to make swap functions in C++ prior to 0x is because the general free-function std::swap was less efficient (and less versatile) than it could be. It made a copy of a parameter, then had two re-assignments, then released the essentially wasted copy. Making a copy of a heavy-weight class is a waste of time, when we as programmers know all we really need to do is swap the internal pointers and whatnot.
However, rvalue-references relieve this completely. In C++0x, swap is implemented as:
template <typename T>
void swap(T& x, T& y)
{
T temp(std::move(x));
x = std::move(y);
y = std::move(temp);
}
This makes much more sense. Instead of copying data around, we are merely moving data around. This even allows non-copyable types, like streams, to be swapped. The draft of the C++0x standard states that in order for types to be swapped with std::swap, they must be rvalue constructable, and rvalue assignable (obviously).
This version of swap will essentially do what any custom written swap function would do. Consider a class we'd normally write swap for (such as this "dumb" vector):
struct dumb_vector
{
int* pi; // lots of allocated ints
// constructors, copy-constructors, move-constructors
// copy-assignment, move-assignment
};
Previously, swap would make a redundant copy of all our data, before discarding it later. Our custom swap function would just swap the pointer, but can be clumsy to use in some cases. In C++0x, moving achieves the same end result. Calling std::swap would generate:
dumb_vector temp(std::move(x));
x = std::move(y);
y = std::move(temp);
Which translates to:
dumb_vector temp;
temp.pi = x.pi; x.pi = 0; // temp(std::move(x));
x.pi = y.pi; y.pi = 0; // x = std::move(y);
y.pi = temp.pi; temp.pi = 0; // y = std::move(temp);
The compiler will of course get rid of redundant assignment's, leaving:
int* temp = x.pi;
x.pi = y.pi;
y.pi = temp;
Which is exactly what our custom swap would have made in the first place. So while prior to C++0x I would agree with your suggestion, custom swap's aren't really necessary anymore, with the introduction of rvalue-references. std::swap will work perfectly in any class that implements move functions.
In fact, I'd argue implementing a swap function should become bad practice. Any class that would need a swap function would also need rvalue functions. But in that case, there is simply no need for the clutter of a custom swap. Code size does increase (two ravlue functions versus one swap), but rvalue-references don't just apply for swapping, leaving us with a positive trade off. (Overall faster code, cleaner interface, slightly more code, no more swap ADL hassle.)
As for whether or not we can default rvalue functions, I don't know. I'll look it up later or maybe someone else can chime in, but that would sure be helpful. :)
Even so, it makes sense to allow default rvalue functions instead of swap. So in essence, as long as they allow = default rvalue functions, your request has already been made. :)
EDIT: I did a bit of searching, and the proposal for = default move was proposal n2583. According to this (which I don't know how to read very well), it was "moved back." It is listed under the section titled "Not ready for C++0x, but open to resubmit in future ". So looks like it won't be part of C++0x, but may be added later.
Somewhat disappointing. :(
EDIT 2: Looking around a bit more, I found this: Defining Move Special Member Functions which is much more recent, and does look like we can default move. Yay!
swap, when used by STL algorithms, is a free function. There is a default swap implementation: std::swap. It does the obvious. You seem to be under the impression that if you add a swap member function to your data type, STL containers and algorithms will find it and use it. This isn't the case.
You're supposed to specialize std::swap (in the namespace next to your UDT, so it's found by ADL) if you can do better. It is idiomatic to just have it defer to a member swap function.
While we're on the subject, it is also idiomatic in C++0x (in as much as is possible to have idioms on such a new standard) to implement rvalue constructors as a swap.
And yes, in a world where a member swap was the language design instead of a free function swap, this would imply that we'd need a swap operator instead of a function - or else primitive types (int, float, etc) couldn't be treated generically (as they have no member function swap). So why didn't they do this? You'd have to ask the committee members for sure - but I'm 95% certain the reason is the committee has long preferred library implementions of features whenever possible, over inventing new syntax to implement a feature. The syntax of a swap operator would be weird, because unlike =, +, -, etc, and all the other operators, there is no algebraic operator everyone is familiar with for "swap".
C++ is syntactically complex enough. They go to great lengths to not add new keywords or syntax features whenever possible, and only do so for very good reasons (lambdas!).
Does anyone know if such (crazy?) things have been suggested to the C++ standards committee
Send an email to Bjarne. He knows all of this stuff and usually replies within a couple of hours.
Is even compiler-generated move constructor/assignment planned (with the default keyword)?
If there is a compiler-generated swap
method, an implicit copy-assignment
operator can be implemented using
copy-and-swap.
Even though the idiom leaves the object unchanged in case of exceptions, doesn't this idiom, by requiring the creation of a third object, make chances of failure greater in the first place?
There might also be performance implications (copying might be more expensive than "assigning") which is why I don't see such complicated functionality being left to be implemented by the compiler.
Generally I don't overload operator= and I don't worry about this level of exception safety: I don't wrap individual assignments into try blocks - what would I do with the most likely std::bad_alloc at that point? - so I wouldn't care if the object, before they end up destroyed anyway, remained in the original state or not. There may be of course specific situations where you might indeed need it, but I don't see why the principle of "you don't pay for what you don't use" should be given up here.