Related
I'm pretty new to C++, and I have a question regarding some C++ conventions regarding copying. I've googled around and haven't really been able to find good guidance on this, so I'm turning to you fine folks.
Let say you have an object that represents some resource that is technically copyable, but the copy is a expensive and almost always the wrong thing to do. Should you still implement a copy constructor for it? Or is it better to make a member function that is something like make_copy() (for those rare times when you DO want to copy the object).
For instance: lets say you have a class representing a texture stored in video memory. This resource is technically copyable: you can create a new handle for it and copy the memory (either through the CPU or using graphics library calls). But generally speaking, this is not something you really want to do very often. It's expensive, and it's usually the wrong thing to do, and is potentially very wasteful of memory. However, you could imagine corner cases where it would make sense: taking a screenshot and applying filters over it or something. Those cases would be rare, but they would exist.
The reason I'm hesitant making a copy constructor that does this this is that I feel like C++ is a bit too eager to copy stuff. Maybe it reflects the fact that I'm a bit new to the language, but since C++ will call the copy constructor in all sorts of situations where you might not mean for it to do so. Like:
void some_method(Texture t)
{
...
}
Texture t(<arguments>);
Texture t2 = t; // calls copy constructor
some_method(t2); // calls copy constructor
I would much rather these two lines that calls copy constructors be compiler errors (because I feel like they're very easy mistakes to make) and if you want to actually make a copy, you need to be very explicit about it and use a dedicated member function.
Is there some standard practice around this? Some sage advice on when you should (and shouldn't) write copy constructors? Some Scott Meyers chapter I've missed? Or should I just do it whenever it's possible?
EDIT: to be clear about my example: obviously, you should pass the argument by reference, but my point was that this is a very easy mistake to make, just leaving out the ampersand. And if you do that, the compiler will happily replace a cheap pass-by-reference with a very expensive copy, and I would rather the compiler not do that. But I dunno, maybe this isn't a mistake people generally make in C++, and I'm being overly cautious?
Whether you want to use it is your call, but here is a pattern that is relatively light on syntax:
struct ExplicitCopy {
// Implement as usual
ExplicitCopy() = default;
ExplicitCopy(ExplicitCopy &&) = default;
ExplicitCopy &operator = (ExplicitCopy &&) = default;
// Copying happens with an ADL call to this function
friend ExplicitCopy copy(ExplicitCopy const &orig) {
return orig;
}
private:
// Copy operations are private and can't be called accidentally
ExplicitCopy(ExplicitCopy const &) = default;
ExplicitCopy &operator = (ExplicitCopy const &) = default;
};
Then, trying to copy an instance will result in a compiler error from the call to a private constructor.
ExplicitCopy ec;
ExplicitCopy ec2 = ec; // Nope
ExplicitCopy ec3(ec); // Nope
ExplicitCopy ec4 = copy(ec); // Yes
Thanks to copy elision, there is no additional constructor call.
See it live on Coliru
This is kind of a matter of opinion, but if copying makes sense (despite its cost) then I would
provide a copy constructor, noting in comments that it's expensive (which should encourage users of your class to pass by reference where appropriate)
provide a move constructor (which allows users of your class to do things "on the cheap")
Don't forget assignment operators for each case.
And if you follow the rule of zero then the language will do all of this for you.
An example is std::vector. Did you really want to copy 50,000 heavy vector elements? Maybe not, but at least you had the choice to do so.
Counter-examples include standard streams, which are not copyable… but this is because it literally makes no sense to copy a "stream of data" (streams aren't containers!). They're also not moveable, but that's another story.
a class representing a texture stored in video memory.
That sounds like a pointer, so I think it's not unreasonable to expect it should have a pointer semantic for copying, i.e. it does a shallow copy by default. A shallow copy should be very fast, and shouldn't require copying the actual texture in graphic memory, but rather that they become alias for the same texture in GPU memory.
I used to think C++'s object model is very robust when best practices are followed.
Just a few minutes ago, though, I had a realization that I hadn't had before.
Consider this code:
class Foo
{
std::set<size_t> set;
std::vector<std::set<size_t>::iterator> vector;
// ...
// (assume every method ensures p always points to a valid element of s)
};
I have written code like this. And until today, I hadn't seen a problem with it.
But, thinking about it a more, I realized that this class is very broken:
Its copy-constructor and copy-assignment copy the iterators inside the vector, which implies that they will still point to the old set! The new one isn't a true copy after all!
In other words, I must manually implement the copy-constructor even though this class isn't managing any resources (no RAII)!
This strikes me as astonishing. I've never come across this issue before, and I don't know of any elegant way to solve it. Thinking about it a bit more, it seems to me that copy construction is unsafe by default -- in fact, it seems to me that classes should not be copyable by default, because any kind of coupling between their instance variables risks rendering the default copy-constructor invalid.
Are iterators fundamentally unsafe to store? Or, should classes really be non-copyable by default?
The solutions I can think of, below, are all undesirable, as they don't let me take advantage of the automatically-generated copy constructor:
Manually implement a copy constructor for every nontrivial class I write. This is not only error-prone, but also painful to write for a complicated class.
Never store iterators as member variables. This seems severely limiting.
Disable copying by default on all classes I write, unless I can explicitly prove they are correct. This seems to run entirely against C++'s design, which is for most types to have value semantics, and thus be copyable.
Is this a well-known problem, and if so, does it have an elegant/idiomatic solution?
C++ copy/move ctor/assign are safe for regular value types. Regular value types behave like integers or other "regular" values.
They are also safe for pointer semantic types, so long as the operation does not change what the pointer "should" point to. Pointing to something "within yourself", or another member, is an example of where it fails.
They are somewhat safe for reference semantic types, but mixing pointer/reference/value semantics in the same class tends to be unsafe/buggy/dangerous in practice.
The rule of zero is that you make classes that behave like either regular value types, or pointer semantic types that don't need to be reseated on copy/move. Then you don't have to write copy/move ctors.
Iterators follow pointer semantics.
The idiomatic/elegant around this is to tightly couple the iterator container with the pointed-into container, and block or write the copy ctor there. They aren't really separate things once one contains pointers into the other.
Yes, this is a well known "problem" -- whenever you store pointers in an object, you're probably going to need some kind of custom copy constructor and assignment operator to ensure that the pointers are all valid and point at the expected things.
Since iterators are just an abstraction of collection element pointers, they have the same issue.
Is this a well-known problem?
Well, it is known, but I would not say well-known. Sibling pointers do not occur often, and most implementations I have seen in the wild were broken in the exact same way than yours is.
I believe the problem to be infrequent enough to have escaped most people's notice; interestingly, as I follow more Rust than C++ nowadays, it crops up there quite often because of the strictness of the type system (ie, the compiler refuses those programs, prompting questions).
does it have an elegant/idiomatic solution?
There are many types of sibling pointers situations, so it really depends, however I know of two generic solutions:
keys
shared elements
Let's review them in order.
Pointing to a class-member, or pointing into an indexable container, then one can use an offset or key rather than an iterator. It is slightly less efficient (and might require a look-up) however it is a fairly simple strategy. I have seen it used to great effect in shared-memory situation (where using pointers is a no-no since the shared-memory area may be mapped at different addresses).
The other solution is used by Boost.MultiIndex, and consists in an alternative memory layout. It stems from the principle of the intrusive container: instead of putting the element into the container (moving it in memory), an intrusive container uses hooks already inside the element to wire it at the right place. Starting from there, it is easy enough to use different hooks to wire a single elements into multiple containers, right?
Well, Boost.MultiIndex kicks it two steps further:
It uses the traditional container interface (ie, move your object in), but the node to which the object is moved in is an element with multiple hooks
It uses various hooks/containers in a single entity
You can check various examples and notably Example 5: Sequenced Indices looks a lot like your own code.
Is this a well-known problem
Yes. Any time you have a class that contains pointers, or pointer-like data like an iterator, you have to implement your own copy-constructor and assignment-operator to ensure the new object has valid pointers/iterators.
and if so, does it have an elegant/idiomatic solution?
Maybe not as elegant as you might like, and probably is not the best in performance (but then, copies sometimes are not, which is why C++11 added move semantics), but maybe something like this would work for you (assuming the std::vector contains iterators into the std::set of the same parent object):
class Foo
{
private:
std::set<size_t> s;
std::vector<std::set<size_t>::iterator> v;
struct findAndPushIterator
{
Foo &foo;
findAndPushIterator(Foo &f) : foo(f) {}
void operator()(const std::set<size_t>::iterator &iter)
{
std::set<size_t>::iterator found = foo.s.find(*iter);
if (found != foo.s.end())
foo.v.push_back(found);
}
};
public:
Foo() {}
Foo(const Foo &src)
{
*this = src;
}
Foo& operator=(const Foo &rhs)
{
v.clear();
s = rhs.s;
v.reserve(rhs.v.size());
std::for_each(rhs.v.begin(), rhs.v.end(), findAndPushIterator(*this));
return *this;
}
//...
};
Or, if using C++11:
class Foo
{
private:
std::set<size_t> s;
std::vector<std::set<size_t>::iterator> v;
public:
Foo() {}
Foo(const Foo &src)
{
*this = src;
}
Foo& operator=(const Foo &rhs)
{
v.clear();
s = rhs.s;
v.reserve(rhs.v.size());
std::for_each(rhs.v.begin(), rhs.v.end(),
[this](const std::set<size_t>::iterator &iter)
{
std::set<size_t>::iterator found = s.find(*iter);
if (found != s.end())
v.push_back(found);
}
);
return *this;
}
//...
};
Yes, of course it's a well-known problem.
If your class stored pointers, as an experienced developer you would intuitively know that the default copy behaviours may not be sufficient for that class.
Your class stores iterators and, since they are also "handles" to data stored elsewhere, the same logic applies.
This is hardly "astonishing".
The assertion that Foo is not managing any resources is false.
Copy-constructor aside, if a element of set is removed, there must be code in Foo that manages vector so that the respective iterator is removed.
I think the idiomatic solution is to just use one container, a vector<size_t>, and check that the count of an element is zero before inserting. Then the copy and move defaults are fine.
"Inherently unsafe"
No, the features you mention are not inherently unsafe; the fact that you thought of three possible safe solutions to the problem is evidence that there is no "inherent" lack of safety here, even though you think the solutions are undesirable.
And yes, there is RAII here: the containers (set and vector) are managing resources. I think your point is that the RAII is "already taken care of" by the std containers. But you need to then consider the container instances themselves to be "resources", and in fact your class is managing them. You're correct that you're not directly managing heap memory, because this aspect of the management problem is taken care of for you by the standard library. But there's more to the management problem, which I'll talk a bit more about below.
"Magic" default behavior
The problem is that you are apparently hoping that you can trust the default copy constructor to "do the right thing" in a non-trivial case such as this. I'm not sure why you expected the right behavior--perhaps you're hoping that memorizing rules-of-thumb such as the "rule of 3" will be a robust way to ensure that you don't shoot yourself in the foot? Certainly that would be nice (and, as pointed out in another answer, Rust goes much further than other low-level languages toward making foot-shooting much harder), but C++ simply isn't designed for "thoughtless" class design of that sort, nor should it be.
Conceptualizing constructor behavior
I'm not going to try to address the question of whether this is a "well-known problem", because I don't really know how well-characterized the problem of "sister" data and iterator-storing is. But I hope that I can convince you that, if you take the time to think about copy-constructor-behavior for every class you write that can be copied, this shouldn't be a surprising problem.
In particular, when deciding to use the default copy-constructor, you must think about what the default copy-constructor will actually do: namely, it will call the copy-constructor of each non-primitive, non-union member (i.e. members that have copy-constructors) and bitwise-copy the rest.
When copying your vector of iterators, what does std::vector's copy-constructor do? It performs a "deep copy", i.e., the data inside the vector is copied. Now, if the vector contains iterators, how does that affect the situation? Well, it's simple: the iterators are the data stored by the vector, so the iterators themselves will be copied. What does an iterator's copy-constructor do? I'm not going to actually look this up, because I don't need to know the specifics: I just need to know that iterators are like pointers in this (and other respect), and copying a pointer just copies the pointer itself, not the data pointed to. I.e., iterators and pointers do not have deep-copying by default.
Note that this is not surprising: of course iterators don't do deep-copying by default. If they did, you'd get a different, new set for each iterator being copied. And this makes even less sense than it initially appears: for instance, what would it actually mean if uni-directional iterators made deep-copies of their data? Presumably you'd get a partial copy, i.e., all the remaining data that's still "in front of" the iterator's current position, plus a new iterator pointing to the "front" of the new data structure.
Now consider that there is no way for a copy-constructor to know the context in which it's being called. For instance, consider the following code:
using iter = std::set<size_t>::iterator; // use typedef pre-C++11
std::vector<iter> foo = getIters(); // get a vector of iterators
useIters(foo); // pass vector by value
When getIters is called, the return value might be moved, but it might also be copy-constructed. The assignment to foo also invokes a copy-constructor, though this may also be elided. And unless useIters takes its argument by reference, then you've also got a copy constructor call there.
In any of these cases, would you expect the copy constructor to change which std::set is pointed to by the iterators contained by the std::vector<iter>? Of course not! So naturally std::vector's copy-constructor can't be designed to modify the iterators in that particular way, and in fact std::vector's copy-constructor is exactly what you need in most cases where it will actually be used.
However, suppose std::vector could work like this: suppose it had a special overload for "vector-of-iterators" that could re-seat the iterators, and that the compiler could somehow be "told" only to invoke this special constructor when the iterators actually need to be re-seated. (Note that the solution of "only invoke the special overload when generating a default constructor for a containing class that also contains an instance of the iterators' underlying data type" wouldn't work; what if the std::vector iterators in your case were pointing at a different standard set, and were being treated simply as a reference to data managed by some other class? Heck, how is the compiler supposed to know whether the iterators all point to the same std::set?) Ignoring this problem of how the compiler would know when to invoke this special constructor, what would the constructor code look like? Let's try it, using _Ctnr<T>::iterator as our iterator type (I'll use C++11/14isms and be a bit sloppy, but the overall point should be clear):
template <typename T, typename _Ctnr>
std::vector< _Ctnr<T>::iterator> (const std::vector< _Ctnr<T>::iterator>& rhs)
: _data{ /* ... */ } // initialize underlying data...
{
for (auto i& : rhs)
{
_data.emplace_back( /* ... */ ); // What do we put here?
}
}
Okay, so we want each new, copied iterator to be re-seated to refer to a different instance of _Ctnr<T>. But where would this information come from? Note that the copy-constructor can't take the new _Ctnr<T> as an argument: then it would no longer be a copy-constructor. And in any case, how would the compiler know which _Ctnr<T> to provide? (Note, too, that for many containers, finding the "corresponding iterator" for the new container may be non-trivial.)
Resource management with std:: containers
This isn't just an issue of the compiler not being as "smart" as it could or should be. This is an instance where you, the programmer, have a specific design in mind that requires a specific solution. In particular, as mentioned above, you have two resources, both std:: containers. And you have a relationship between them. Here we get to something that most of the other answers have stated, and which by this point should be very, very clear: related class members require special care, since C++ does not manage this coupling by default. But what I hope is also clear by this point is that you shouldn't think of the problem as arising specifically because of data-member coupling; the problem is simply that default-construction isn't magic, and the programmer must be aware of the requirements for correctly copying a class before deciding to let the implicitly-generated constructor handle copying.
The elegant solution
...And now we get to aesthetics and opinions. You seem to find it inelegant to be forced to write a copy-constructor when you don't have any raw pointers or arrays in your class that must be manually managed.
But user-defined copy constructors are elegant; allowing you to write them is C++'s elegant solution to the problem of writing correct non-trivial classes.
Admittedly, this seems like a case where the "rule of 3" doesn't quite apply, since there's a clear need to either =delete the copy-constructor or write it yourself, but there's no clear need (yet) for a user-defined destructor. But again, you can't simply program based on rules of thumb and expect everything to work correctly, especially in a low-level language such as C++; you must be aware of the details of (1) what you actually want and (2) how that can be achieved.
So, given that the coupling between your std::set and your std::vector actually creates a non-trivial problem, solving the problem by wrapping them together in a class that correctly implements (or simply deletes) the copy-constructor is actually a very elegant (and idiomatic) solution.
Explicitly defining versus deleting
You mention a potential new "rule of thumb" to follow in your coding practices: "Disable copying by default on all classes I write, unless I can explicitly prove they are correct." While this might be a safer rule of thumb (at least in this case) than the "rule of 3" (especially when your criterion for "do I need to implement the 3" is to check whether a deleter is required), my above caution against relying on rules of thumb still applies.
But I think the solution here is actually simpler than the proposed rule of thumb. You don't need to formally prove the correctness of the default method; you simply need to have a basic idea of what it would do, and what you need it to do.
Above, in my analysis of your particular case, I went into a lot of detail--for instance, I brought up the possibility of "deep-copying iterators". You don't need to go into this much detail to determine whether or not the default copy-constructor will work correctly. Instead, simply imagine what your manually-created copy constructor will look like; you should be able to tell pretty quickly how similar your imaginary explicitly-defined constructor is to the one the compiler would generate.
For example, a class Foo containing a single vector data will have a copy constructor that looks like this:
Foo::Foo(const Foo& rhs)
: data{rhs.data}
{}
Without even writing that out, you know that you can rely on the implicitly-generated one, because it's exactly the same as what you'd have written above.
Now, consider the constructor for your class Foo:
Foo::Foo(const Foo& rhs)
: set{rhs.set}
, vector{ /* somehow use both rhs.set AND rhs.vector */ } // ...????
{}
Right away, given that simply copying vector's members won't work, you can tell that the default constructor won't work. So now you need to decide whether your class needs to be copyable or not.
I am basically trying to figure out, is the whole "move semantics" concept something brand new, or it is just making existing code simpler to implement? I am always interested in reducing the number of times I call copy/constructors but I usually pass objects through using reference (and possibly const) and ensure I always use initialiser lists. With this in mind (and having looked at the whole ugly && syntax) I wonder if it is worth adopting these principles or simply coding as I already do? Is anything new being done here, or is it just "easier" syntactic sugar for what I already do?
TL;DR
This is definitely something new and it goes well beyond just being a way to avoid copying memory.
Long Answer: Why it's new and some perhaps non-obvious implications
Move semantics are just what the name implies--that is, a way to explicitly declare instructions for moving objects rather than copying. In addition to the obvious efficiency benefit, this also affords a programmer a standards-compliant way to have objects that are movable but not copyable. Objects that are movable and not copyable convey a very clear boundary of resource ownership via standard language semantics. This was possible in the past, but there was no standard/unified (or STL-compatible) way to do this.
This is a big deal because having a standard and unified semantic benefits both programmers and compilers. Programmers don't have to spend time potentially introducing bugs into a move routine that can reliably be generated by compilers (most cases); compilers can now make appropriate optimizations because the standard provides a way to inform the compiler when and where you're doing standard moves.
Move semantics is particularly interesting because it very well suits the RAII idiom, which is a long-standing a cornerstone of C++ best practice. RAII encompasses much more than just this example, but my point is that move semantics is now a standard way to concisely express (among other things) movable-but-not-copyable objects.
You don't always have to explicitly define this functionality in order to prevent copying. A compiler feature known as "copy elision" will eliminate quite a lot of unnecessary copies from functions that pass by value.
Criminally-Incomplete Crash Course on RAII (for the uninitiated)
I realize you didn't ask for a code example, but here's a really simple one that might benefit a future reader who might be less familiar with the topic or the relevance of Move Semantics to RAII practices. (If you already understand this, then skip the rest of this answer)
// non-copyable class that manages lifecycle of a resource
// note: non-virtual destructor--probably not an appropriate candidate
// for serving as a base class for objects handled polymorphically.
class res_t {
using handle_t = /* whatever */;
handle_t* handle; // Pointer to owned resource
public:
res_t( const res_t& src ) = delete; // no copy constructor
res_t& operator=( const res_t& src ) = delete; // no copy-assignment
res_t( res_t&& src ) = default; // Move constructor
res_t& operator=( res_t&& src ) = default; // Move-assignment
res_t(); // Default constructor
~res_t(); // Destructor
};
Objects of this class will allocate/provision whatever resource is needed upon construction and then free/release it upon destruction. Since the resource pointed to by the data member can never accidentally be transferred to another object, the rightful owner of a resource is never in doubt. In addition to making your code less prone to abuse or errors (and easily compatible with STL containers), your intentions will be immediately recognized by any programmer familiar with this standard practice.
In the Turing Tar Pit, there is nothing new under the sun. Everything that move semantics does, can be done without move semantics -- it just takes a lot more code, and is a lot more fragile.
What move semantics does is takes a particular common pattern that massively increases efficiency and safety in a number of situations, and embeds it in the language.
It increases efficiency in obvious ways. Moving, be it via swap or move construction, is much faster for many data types than copying. You can create special interfaces to indicate when things can be moved from: but honestly people didn't do that. With move semantics, it becomes relatively easy to do. Compare the cost of moving a std::vector to copying it -- move takes roughly copying 3 pointers, while copying requires a heap allocation, copying every element in the container, and creating 3 pointers.
Even more so, compare reserve on a move-aware std::vector to a copy-only aware one: suppose you have a std::vector of std::vector. In C++03, that was performance suicide if you didn't know the dimensions of every component ahead of time -- in C++11, move semantics makes it as smooth as silk, because it is no longer repeatedly copying the sub-vectors whenever the outer vector resizes.
Move semantics makes every "pImpl pattern" type to have blazing fast performance, while means you can start having complex objects that behave like values instead of having to deal with and manage pointers to them.
On top of these performance gains, and opening up complex-class-as-value, move semantics also open up a whole host of safety measures, and allow doing some things that where not very practical before.
std::unique_ptr is a replacement for std::auto_ptr. They both do roughly the same thing, but std::auto_ptr treated copies as moves. This made std::auto_ptr ridiculously dangerous to use in practice. Meanwhile, std::unique_ptr just works. It represents unique ownership of some resource extremely well, and transfer of ownership can happen easily and smoothly.
You know the problem whereby you take a foo* in an interface, and sometimes it means "this interface is taking ownership of the object" and sometimes it means "this interface just wants to be able to modify this object remotely", and you have to delve into API documentation and sometimes source code to figure out which?
std::unique_ptr actually solves this problem -- interfaces that want to take onwership can now take a std::unique_ptr<foo>, and the transfer of ownership is obvious at both the API level and in the code that calls the interface. std::unique_ptr is an auto_ptr that just works, and has the unsafe portions removed, and replaced with move semantics. And it does all of this with nearly perfect efficiency.
std::unique_ptr is a transferable RAII representation of resource whose value is represented by a pointer.
After you write make_unique<T>(Args&&...), unless you are writing really low level code, it is probably a good idea to never call new directly again. Move semantics basically have made new obsolete.
Other RAII representations are often non-copyable. A port, a print session, an interaction with a physical device -- all of these are resources for whom "copy" doesn't make much sense. Most every one of them can be easily modified to support move semantics, which opens up a whole host of freedom in dealing with these variables.
Move semantics also allows you to put your return values in the return part of a function. The pattern of taking return values by reference (and documenting "this one is out-only, this one is in/out", or failing to do so) can be somewhat replaced by returning your data.
So instead of void fill_vec( std::vector<foo>& ), you have std::vector<foo> get_vec(). This even works with multiple return values -- std::tuple< std::vector<A>, std::set<B>, bool > get_stuff() can be called, and you can load your data into local variables efficiently via std::tie( my_vec, my_set, my_bool ) = get_stuff().
Output parameters can be semantically output-only, with very little overhead (the above, in a worst case, costs 8 pointer and 2 bool copies, regardless of how much data we have in those containers -- and that overhead can be as little as 0 pointer and 0 bool copies with a bit more work), because of move semantics.
There is absolutely something new going on here. Consider unique_ptr which can be moved, but not copied because it uniquely holds ownership of a resource. That ownership can then be transferred by moving it to a new unique_ptr if needed, but copying it would be impossible (as you would then have two references to the owned object).
While many uses of moving may have positive performance implications, the movable-but-not-copyable types are a much bigger functional improvement to the language.
In short, use the new techniques where it indicates the meaning of how your class should be used, or where (significant) performance concerns can be alleviated by movement rather than copy-and-destroy.
No answer is complete without a reference to Thomas Becker's painstakingly exhaustive write up on rvalue references, perfect forwarding, reference collapsing and everything related to that.
see here: http://thbecker.net/articles/rvalue_references/section_01.html
I would say yes because a Move Constructor and Move Assignment operator are now compiler defined for objects that do not define/protect a destructor, copy constructor, or copy assignment.
This means that if you have the following code...
struct intContainer
{
std::vector<int> v;
}
intContainer CreateContainer()
{
intContainer c;
c.v.push_back(3);
return c;
}
The code above would be optimized simply by recompiling with a compiler that supports move semantics. Your container c will have compiler defined move-semantics and thus will call the manually defined move operations for std::vector without any changes to your code.
Since move semantics only apply in the presence of rvalue
references, which are declared by a new token, &&, it seems
very clear that they are something new.
In principle, they are purely an optimizing techique, which
means that:
1. you don't use them until the profiler says it is necessary, and
2. in theory, optimizing is the compiler's job, and move
semantics aren't any more necessary than register.
Concerning 1, we may, in time, end up with an ubiquitous
heuristic as to how to use them: after all, passing an argument
by const reference, rather than by value, is also an
optimization, but the ubiquitous convention is to pass class
types by const reference, and all other types by value.
Concerning 2, compilers just aren't there yet. At least, the
usual ones. The basic principles which could be used to make
move semantics irrelevant are (well?) known, but to date, they
tend to result in unacceptable compile times for real programs.
As a result: if you're writing a low level library, you'll
probably want to consider move semantics from the start.
Otherwise, they're just extra complication, and should be
ignored, until the profiler says otherwise.
I'm trying to understand the purpose of overloading some operators in C++.
Conceptually, an assignment statement can be easily implemented via:
Destruction of the old object followed by copy construction of the new object
Copy construction of the new object, followed by a swap with the old object, followed by destruction of the old object
In fact, often, the copy-and-swap implementation is the implementation of assignment in real code.
Why, then, does C++ allow the programmer to overload the assignment operator, instead of just performing the above?
Was it intended to allow a scenario in which assignment is faster than destruction + construction?
If so, when does that happen? And if not, then what use case was it intended to support?
1) Reference counting
Suppose you have a resource that is ref-counted and it is wrapped in objects.
void operator=(const MyObject& v) {
this->resource = refCount(v->resource);
}
// Here the internal resource will be copied and not the objects.
MyObject A = B;
2) Or you just want to copy the fields without fancy semantics.
void operator=(const MyObject& v) {
this->someField = v->someField;
}
// So this statement should generate just one function call and not a fancy
// collection of temporaries that require construction destruction.
MyObject A = B;
In both cases the code runs much faster. In the second case the effect is similar.
3) Also what about types...
Use the operator to handle assigning other types to your type.
void operator=(const MyObject& v) {
this->someField = v->someField;
}
void operator=(int x) {
this->value = x;
}
void operator=(float y) {
this->floatValue = y;
}
first, note that "Destruction of the old object followed by copy construction of the new object" is not exception safe.
but re "Copy construction of the new object, followed by a swap with the old object, followed by destruction of the old object", that's the swap idiom for implementing an assignment operator, and it's exception safe if done correctly.
in some cases a custom assignment operator can be faster than the swap idiom. for example, direct arrays of POD type can't really be swapped except by way of lower level assignments. so there for the swap idiom you can expect an overhead proportional to the array size.
however, historically there wasn't much focus on swapping and exception safety.
bjarne wanted exceptions originally (if i recall correctly), but they didn't get into the language until 1989 or thereabouts. so the original c++ way of programming was more focused on assignments. to the degree that a failing constructor signalled its failure by assigning 0 to this… i think, that in those days your question would not have made sense. it was just assignments all over.
typewise, some objects have identity, and others have value. it makes sense to assign to value objects, but for identity objects one typically wants to limit the ways that the object can be modified. while this doesn't require the ability to customize copy assignment (only to make it unavailable), with that ability one doesn't need any other language support.
and i think likewise for any other specific reasons one can think of: probably no such reason really requires the general ability, but the general ability is sufficient to cover it all, so it lowers the overall language complexity.
a good source to get more definitive answer than my hunches, recollections and gut feelings, is bjarne's "the design and evolution of c++" book.
probably the question has a definitive answer there.
Destruction of the old object, followed by copy construction of
the new, will not usually work. And the swap idiom is
guaranteed not to work unless the class provides a special swap
function—std::swap uses assignment in its unspecialized
implementation, and using it directly in the assignment operator
will lead to endless recursion.
And of course, the user may want to do something special, e.g.
make the assignment operator private, for example.
And finally, what is almost certainly an overruling reason: the
default assignment operator has to be compatible with C.
Actually, after seeing juanchopanza's answer (which was deleted), I think I ended up figuring it out myself.
Copy-assignment operators allow classes like basic_string to avoid allocating resources unnecessarily when they can re-use them (in this case, memory).
So when you assign to a basic_string, an overloaded copy assignment operator would avoid allocating memory, and would just copy the string data directly to the existing buffer.
If the object had to be destroyed and constructed again, the buffer would have to be reallocated, which would be potentially much more costly for a small string.
(Note that vector could benefit from this too, but only if it knew that the elements' copy constructors would never throw exceptions. Otherwise it would need to maintain its exception safety and actually perform a copy-and-swap.)
It allows you to use assignment of other types as well.
You could have a class Person with an assignment operator that assigns an ID.
But besides that, you don't always want to copy all the members as they are.
The default assignment only does a shallow copy.
For example, if the class contains pointers, or locks, you dont always want to copy them from the other object.
Usually when you have pointers you want to use a deep copy, and maybe create a copy of the object that the pointers are pointed to.
And if you have locks, you want them to be specific to the object, and you don't want to copy their state from the other object.
It is actually a common practice to provide your own copy constructor and assignment operator if your class holds pointers as members.
I have used it often as a conversion constructor but with already existing objects. i.e assigning member variable type, etc to an object.
C++ compilers automatically generate copy constructors and copy-assignment operators. Why not swap too?
These days the preferred method for implementing the copy-assignment operator is the copy-and-swap idiom:
T& operator=(const T& other)
{
T copy(other);
swap(copy);
return *this;
}
(ignoring the copy-elision-friendly form that uses pass-by-value).
This idiom has the advantage of being transactional in the face of exceptions (assuming that the swap implementation does not throw). In contrast, the default compiler-generated copy-assignment operator recursively does copy-assignment on all base classes and data members, and that doesn't have the same exception-safety guarantees.
Meanwhile, implementing swap methods manually is tedious and error-prone:
To ensure that swap does not throw, it must be implemented for all non-POD members in the class and in base classes, in their non-POD members, etc.
If a maintainer adds a new data member to a class, the maintainer must remember to modify that class's swap method. Failing to do so can introduce subtle bugs. Also, since swap is an ordinary method, compilers (at least none I know of) don't emit warnings if the swap implementation is incomplete.
Wouldn't it be better if the compiler generated swap methods automatically? Then the implicit copy-assignment implementation could leverage it.
The obvious answer probably is: the copy-and-swap idiom didn't exist when C++ was developed, and doing this now might break existing code.
Still, maybe people could opt-in to letting the compiler generate swap using the same syntax that C++0x uses for controlling other implicit functions:
void swap() = default;
and then there could be rules:
If there is a compiler-generated swap method, an implicit copy-assignment operator can be implemented using copy-and-swap.
If there is no compiler-generated swap method, an implicit copy-assignment operator would be implemented as before (invoking copy-assigment on all base classes and on all members).
Does anyone know if such (crazy?) things have been suggested to the C++ standards committee, and if so, what opinions committee members had?
This is in addition to Terry's answer.
The reason we had to make swap functions in C++ prior to 0x is because the general free-function std::swap was less efficient (and less versatile) than it could be. It made a copy of a parameter, then had two re-assignments, then released the essentially wasted copy. Making a copy of a heavy-weight class is a waste of time, when we as programmers know all we really need to do is swap the internal pointers and whatnot.
However, rvalue-references relieve this completely. In C++0x, swap is implemented as:
template <typename T>
void swap(T& x, T& y)
{
T temp(std::move(x));
x = std::move(y);
y = std::move(temp);
}
This makes much more sense. Instead of copying data around, we are merely moving data around. This even allows non-copyable types, like streams, to be swapped. The draft of the C++0x standard states that in order for types to be swapped with std::swap, they must be rvalue constructable, and rvalue assignable (obviously).
This version of swap will essentially do what any custom written swap function would do. Consider a class we'd normally write swap for (such as this "dumb" vector):
struct dumb_vector
{
int* pi; // lots of allocated ints
// constructors, copy-constructors, move-constructors
// copy-assignment, move-assignment
};
Previously, swap would make a redundant copy of all our data, before discarding it later. Our custom swap function would just swap the pointer, but can be clumsy to use in some cases. In C++0x, moving achieves the same end result. Calling std::swap would generate:
dumb_vector temp(std::move(x));
x = std::move(y);
y = std::move(temp);
Which translates to:
dumb_vector temp;
temp.pi = x.pi; x.pi = 0; // temp(std::move(x));
x.pi = y.pi; y.pi = 0; // x = std::move(y);
y.pi = temp.pi; temp.pi = 0; // y = std::move(temp);
The compiler will of course get rid of redundant assignment's, leaving:
int* temp = x.pi;
x.pi = y.pi;
y.pi = temp;
Which is exactly what our custom swap would have made in the first place. So while prior to C++0x I would agree with your suggestion, custom swap's aren't really necessary anymore, with the introduction of rvalue-references. std::swap will work perfectly in any class that implements move functions.
In fact, I'd argue implementing a swap function should become bad practice. Any class that would need a swap function would also need rvalue functions. But in that case, there is simply no need for the clutter of a custom swap. Code size does increase (two ravlue functions versus one swap), but rvalue-references don't just apply for swapping, leaving us with a positive trade off. (Overall faster code, cleaner interface, slightly more code, no more swap ADL hassle.)
As for whether or not we can default rvalue functions, I don't know. I'll look it up later or maybe someone else can chime in, but that would sure be helpful. :)
Even so, it makes sense to allow default rvalue functions instead of swap. So in essence, as long as they allow = default rvalue functions, your request has already been made. :)
EDIT: I did a bit of searching, and the proposal for = default move was proposal n2583. According to this (which I don't know how to read very well), it was "moved back." It is listed under the section titled "Not ready for C++0x, but open to resubmit in future ". So looks like it won't be part of C++0x, but may be added later.
Somewhat disappointing. :(
EDIT 2: Looking around a bit more, I found this: Defining Move Special Member Functions which is much more recent, and does look like we can default move. Yay!
swap, when used by STL algorithms, is a free function. There is a default swap implementation: std::swap. It does the obvious. You seem to be under the impression that if you add a swap member function to your data type, STL containers and algorithms will find it and use it. This isn't the case.
You're supposed to specialize std::swap (in the namespace next to your UDT, so it's found by ADL) if you can do better. It is idiomatic to just have it defer to a member swap function.
While we're on the subject, it is also idiomatic in C++0x (in as much as is possible to have idioms on such a new standard) to implement rvalue constructors as a swap.
And yes, in a world where a member swap was the language design instead of a free function swap, this would imply that we'd need a swap operator instead of a function - or else primitive types (int, float, etc) couldn't be treated generically (as they have no member function swap). So why didn't they do this? You'd have to ask the committee members for sure - but I'm 95% certain the reason is the committee has long preferred library implementions of features whenever possible, over inventing new syntax to implement a feature. The syntax of a swap operator would be weird, because unlike =, +, -, etc, and all the other operators, there is no algebraic operator everyone is familiar with for "swap".
C++ is syntactically complex enough. They go to great lengths to not add new keywords or syntax features whenever possible, and only do so for very good reasons (lambdas!).
Does anyone know if such (crazy?) things have been suggested to the C++ standards committee
Send an email to Bjarne. He knows all of this stuff and usually replies within a couple of hours.
Is even compiler-generated move constructor/assignment planned (with the default keyword)?
If there is a compiler-generated swap
method, an implicit copy-assignment
operator can be implemented using
copy-and-swap.
Even though the idiom leaves the object unchanged in case of exceptions, doesn't this idiom, by requiring the creation of a third object, make chances of failure greater in the first place?
There might also be performance implications (copying might be more expensive than "assigning") which is why I don't see such complicated functionality being left to be implemented by the compiler.
Generally I don't overload operator= and I don't worry about this level of exception safety: I don't wrap individual assignments into try blocks - what would I do with the most likely std::bad_alloc at that point? - so I wouldn't care if the object, before they end up destroyed anyway, remained in the original state or not. There may be of course specific situations where you might indeed need it, but I don't see why the principle of "you don't pay for what you don't use" should be given up here.