Guidance for when to explicitly enable/disable copying in C++ classes? - c++

A colleague is cleaning up a couple of libraries. In doing so he's been reading API design for C++ and it talks about explicitly enabling or disabling copying in C++ classes. This is the same thing that Sutter and Alexandrescu say in their C++ Coding Standards.
He agrees that one should follow this advice, but neither book seems to state the guiding principles for deciding when to enable or disable copying.
Any guidance one way or the other? Thanks!

It depends on the role the classes play in the application. Unless the
class represents a value, where identity isn't significant, you should
ban copy and assignment. Similarly if the class is polymorphic. As a
general rule, if you're allocating objects of the class type
dynamically, it shouldn't be copiable. And inversely, if the class is
copiable, you shouldn't allocate instances of it dynamically. (But
there are some exceptions, and it's not rare to allocate dynamically and
avoid copying big objects, even when the semantics argue otherwise.)
If you're designing a low-level library, the choice is less clear.
Something like std::vector can play many roles in an application; in
most of them, copying wouldn't be appropriate, but banning copy would
make it unusable in the few where it is appropriate.
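
To make the value/entity distinction above concrete, here is a minimal sketch. Money and Account are hypothetical names invented for illustration; the only point is that the value type keeps its compiler-generated copy operations while the polymorphic entity type bans them.

```cpp
#include <string>
#include <type_traits>

// Value type (hypothetical): identity is not significant, so the
// compiler-generated copy operations are exactly what we want.
struct Money {
    long cents;
    std::string currency;
};

// Entity type (hypothetical): polymorphic, identity matters, and it would
// normally be allocated dynamically, so copy and assignment are banned.
class Account {
public:
    explicit Account(Money opening) : balance_(opening) {}
    virtual ~Account() = default;

    Account(const Account&) = delete;
    Account& operator=(const Account&) = delete;

    const Money& balance() const { return balance_; }

private:
    Money balance_;
};

// The compiler can confirm the intent.
static_assert(std::is_copy_constructible<Money>::value, "values copy");
static_assert(!std::is_copy_constructible<Account>::value, "entities do not");
static_assert(!std::is_copy_assignable<Account>::value, "entities do not");
```

Copying the Money held by an Account is still fine; it is only the Account itself that cannot be duplicated.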

Classes which are non-copyable should be the exception, not the rule. Your class should be non-copyable if and only if you cannot retain value semantics while copying: for example, named mutexes or unique-ownership pointers. Otherwise, your class should be copyable. Many C++ libraries depend on copyability, especially pre-C++0x, where types cannot be movable.

Contrary to DeadMG, I believe most classes should be non-copyable.
Here is what Stroustrup wrote in his "The Design and Evolution of C++" book:
"I personally consider it unfortunate that copy operations are defined by default and I prohibit copying of objects of many of my classes"

I think you should try to write as little code as possible to have the class doing what it is supposed to do. If no one is trying to copy the class and it is not going to be copied in the near future then do not add stuff like a copy constructor or assignment operator. Just make the class non copyable.
When someday you actually want to copy the class, then add things like the copy constructor. But until then having the class non copyable means less code to test and maintain.

I sincerely believe that copy semantics should be provided automatically, or not at all.
However, badly written libraries may sometimes benefit from a manual copy constructor.
Note that the situation is very different in C++03 (because copy semantics are usually required by the standard library!) than in C++0x, where my advice pretty much always applies.

Related

Be-friend'ing std::tuple

I have a custom class, and I'd like to minimize the chances that someone on my team accidentally copies it, as that could break certain invariants within our system. To this end, I made the copy constructor private, as there is no reason anyone should need to copy it in any legitimate usage of the class.
However, under-the-hood of the framework that the class is a part of, a copy construction of the object into a std::tuple is required. I tried to use friend, but the compiler still complains, as the inner class(es?) of std::tuple require friend-access as well.
What is the best way to get what I want?
If the framework requires your class to be copyable, you really should provide a copyable class.
If your class really is only movable, or not even that, then maybe the framework should have a std::unique_ptr or similar to the object instead? Or you could create a movable adaptor class around that std::unique_ptr which forwards the interface...
Whether a class is moveable and/or copyable is part of its forward-facing interface to users. Making it non-copyable, unless the target application area really demands it, limits code reuse, and it may confuse potential users of the class as to whether or not it is safe to copy or move it.
It may be that the framework doesn't really need to make a copy, and can be refactored to make moves instead?
It's very unclear from the question why you don't want it to be copyable. You seem to say that bad things will happen, but for some reason you aren't concerned if the framework makes a copy. Is it really okay to make copies or not?
It may be that you need to make a separate system for tracking / enforcing the invariant that you are concerned about, rather than just try to prohibit copying this class.
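
As a rough sketch of the std::unique_ptr suggestion above (assuming C++14 for std::make_unique): Session, Slot and make_slot are hypothetical names, and the real framework may well need something different. The idea is only that the tuple holds an owning pointer, so the tuple can be moved without the object ever being copied.

```cpp
#include <cassert>
#include <memory>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical class whose invariants forbid copying.
class Session {
public:
    explicit Session(int id) : id_(id) {}
    Session(const Session&) = delete;
    Session& operator=(const Session&) = delete;
    int id() const { return id_; }

private:
    int id_;
};

// The tuple stores an owning pointer rather than the Session itself.
using Slot = std::tuple<std::unique_ptr<Session>, int>;

inline Slot make_slot(int id, int priority) {
    return Slot(std::make_unique<Session>(id), priority);
}

// Moves the tuple (as a framework might) and returns the id of the
// Session it still owns afterwards.
inline int relocated_session_id() {
    Slot a = make_slot(7, 1);
    Slot b = std::move(a);  // moves the pointer, never copies the Session
    assert(std::get<0>(b) != nullptr);
    return std::get<0>(b)->id();
}

// The tuple is movable but, like Session, not copyable.
static_assert(std::is_move_constructible<Slot>::value, "Slot moves");
static_assert(!std::is_copy_constructible<Slot>::value, "Slot does not copy");
```

No friend declarations are needed, because nothing inside std::tuple ever touches Session's copy constructor.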

When should you make a class uncopyable?

According to the Google style guidelines, "Few classes need to be copyable. Most should have neither a copy constructor nor an assignment operator."
They recommend making a class uncopyable (that is, not giving it a copy constructor or assignment operator), and instead passing by reference or pointer in most situations, or using clone() methods, which cannot be invoked implicitly.
However, I've heard some arguments against this:
Accessing a reference is (usually) slower than accessing a value.
In some computations, I might want to leave the original object the way it is and just return the changed object.
I might want to store the value of a computation as a local object in a function and return it, which I couldn't do if I returned it by reference.
If a class is small enough, passing by reference is slower.
What are the positives/negatives of following this guideline? Is there any standard "rule of thumb" for making classes uncopyable? What should I consider when creating new classes?
I have two issues with their advice:
It doesn't apply to modern C++, ignoring move constructors/assignment operators, and so assumes that taking objects by value (which would have copied before) is often inefficient.
It doesn't trust the programmer to do the right thing and design their code appropriately. Instead it limits the programmer until they're forced to break the rule.
Whether your class should be copyable, moveable, both or neither should be a design decision based on the uses of the class itself. For example, a std::unique_ptr is a great example of a class that should only be moveable because copying it would invalidate its entire purpose. When you design a class, ask yourself if it makes sense to copy it. Most of the time the answer will be yes.
The advice seems to be based on the belief that programmers default to passing objects around by value which can be expensive when the objects are complex enough. This is just not true any more. You should default to passing objects around by value when you need a copy of the object, and there's no reason to be scared of this - in many cases, the move constructor will be used instead, which is almost always a constant time operation.
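
A small sketch of "pass by value when you need a copy", using a hypothetical Playlist class. The constructor takes its argument by value because it is going to keep it anyway; the caller then chooses between a copy (pass an lvalue) and a move (pass an rvalue).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class that keeps its own copy of the data it is given.
class Playlist {
public:
    // By value: the parameter is already "our" copy, so it is moved
    // into the member instead of being copied a second time.
    explicit Playlist(std::vector<std::string> tracks)
        : tracks_(std::move(tracks)) {}

    std::size_t size() const { return tracks_.size(); }

private:
    std::vector<std::string> tracks_;
};

inline void example_playlist() {
    std::vector<std::string> tracks{"a", "b", "c"};

    Playlist kept(tracks);              // lvalue: one copy, tracks stays intact
    assert(tracks.size() == 3);
    assert(kept.size() == 3);

    Playlist taken(std::move(tracks));  // rvalue: moved, no per-element copy
    assert(taken.size() == 3);
}
```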
Again, the choice of how you should pass objects around is a design decision that should be influenced by a number of factors, such as:
Am I going to need a copy of this object?
Do I need to modify this object?
What is the lifetime of the object?
Is the object optional?
These questions should be asked with every type you write (parameter, return value, variable, whatever). You should find plenty of uses for passing objects by value that don't lead to poor performance due to copying.
If you follow good C++ programming practices, your copy constructors will be bug free, so that shouldn't be a concern. In fact, many classes can get away with just the defaulted copy/move constructors. If a class owns dynamically allocated resources and you use smart pointers appropriately, implementing the copy constructor is often as simple as copying the objects from the pointers - not much room for bugs.
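
To illustrate "copying the objects from the pointers", here is a minimal sketch with hypothetical Car and Engine types (C++14 for std::make_unique). The copy operations duplicate the pointed-to object; the move operations can simply be defaulted.

```cpp
#include <cassert>
#include <memory>

struct Engine {  // hypothetical owned resource
    int horsepower;
};

class Car {
public:
    explicit Car(int hp) : engine_(std::make_unique<Engine>(Engine{hp})) {}

    // Deep copy: copy the pointed-to object, not the pointer.
    Car(const Car& other) : engine_(clone(other.engine_)) {}
    Car& operator=(const Car& other) {
        if (this != &other) {
            engine_ = clone(other.engine_);
        }
        return *this;
    }

    // std::unique_ptr already moves correctly.
    Car(Car&&) noexcept = default;
    Car& operator=(Car&&) noexcept = default;

    int& horsepower() { return engine_->horsepower; }

private:
    // A moved-from Car holds a null pointer, so guard against it.
    static std::unique_ptr<Engine> clone(const std::unique_ptr<Engine>& p) {
        return p ? std::make_unique<Engine>(*p) : nullptr;
    }

    std::unique_ptr<Engine> engine_;
};

inline void example_car() {
    Car a(100);
    Car b = a;             // independent copy of the engine
    b.horsepower() = 200;
    assert(a.horsepower() == 100);
    assert(b.horsepower() == 200);
}
```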
Of course, this advice from Google is for people working on their code to ensure consistency throughout their codebase. That's fine. I don't recommend blindly adopting it in its entirety for a modern C++ project, however.

Are C++11 move semantics doing something new, or just making semantics clearer?

I am basically trying to figure out, is the whole "move semantics" concept something brand new, or it is just making existing code simpler to implement? I am always interested in reducing the number of times I call copy/constructors but I usually pass objects through using reference (and possibly const) and ensure I always use initialiser lists. With this in mind (and having looked at the whole ugly && syntax) I wonder if it is worth adopting these principles or simply coding as I already do? Is anything new being done here, or is it just "easier" syntactic sugar for what I already do?
TL;DR
This is definitely something new and it goes well beyond just being a way to avoid copying memory.
Long Answer: Why it's new and some perhaps non-obvious implications
Move semantics are just what the name implies--that is, a way to explicitly declare instructions for moving objects rather than copying. In addition to the obvious efficiency benefit, this also affords a programmer a standards-compliant way to have objects that are movable but not copyable. Objects that are movable and not copyable convey a very clear boundary of resource ownership via standard language semantics. This was possible in the past, but there was no standard/unified (or STL-compatible) way to do this.
This is a big deal because having a standard and unified semantic benefits both programmers and compilers. Programmers don't have to spend time potentially introducing bugs into a move routine that can reliably be generated by compilers (most cases); compilers can now make appropriate optimizations because the standard provides a way to inform the compiler when and where you're doing standard moves.
Move semantics is particularly interesting because it suits the RAII idiom very well, which is a long-standing cornerstone of C++ best practice. RAII encompasses much more than just this example, but my point is that move semantics is now a standard way to concisely express (among other things) movable-but-not-copyable objects.
You don't always have to explicitly define this functionality in order to prevent copying. A compiler feature known as "copy elision" will eliminate quite a lot of unnecessary copies from functions that pass by value.
Criminally-Incomplete Crash Course on RAII (for the uninitiated)
I realize you didn't ask for a code example, but here's a really simple one that might benefit a future reader who might be less familiar with the topic or the relevance of Move Semantics to RAII practices. (If you already understand this, then skip the rest of this answer)
// non-copyable class that manages lifecycle of a resource
// note: non-virtual destructor--probably not an appropriate candidate
// for serving as a base class for objects handled polymorphically.
class res_t {
    using handle_t = /* whatever */;
    handle_t* handle; // Pointer to owned resource
public:
    res_t( const res_t& src ) = delete; // no copy constructor
    res_t& operator=( const res_t& src ) = delete; // no copy-assignment
    // The move operations are declared here and defined elsewhere: they must
    // null out src.handle, which "= default" would not do for a raw pointer.
    res_t( res_t&& src ) noexcept; // Move constructor
    res_t& operator=( res_t&& src ) noexcept; // Move-assignment
    res_t(); // Default constructor
    ~res_t(); // Destructor
};
Objects of this class will allocate/provision whatever resource is needed upon construction and then free/release it upon destruction. Since the resource pointed to by the data member can never accidentally be transferred to another object, the rightful owner of a resource is never in doubt. In addition to making your code less prone to abuse or errors (and easily compatible with STL containers), your intentions will be immediately recognized by any programmer familiar with this standard practice.
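
For readers who want something they can compile, here is a self-contained variant of the same idea. The resource API (acquire, release, live_handles) is invented purely for illustration and stands in for whatever the handle really refers to. Note that for a raw-pointer member a defaulted move would just copy the pointer, so this sketch writes the move operations by hand and empties the source.

```cpp
#include <cassert>
#include <utility>

// Hypothetical resource API, used only to make ownership observable.
inline int& live_handles() { static int n = 0; return n; }
inline int* acquire() { ++live_handles(); return new int(0); }
inline void release(int* h) { if (h) { --live_handles(); delete h; } }

class resource {
    int* handle_;  // owned; null after being moved from

public:
    resource() : handle_(acquire()) {}
    ~resource() { release(handle_); }

    resource(const resource&) = delete;
    resource& operator=(const resource&) = delete;

    // Moving steals the handle and leaves the source empty, so exactly
    // one object ever releases it.
    resource(resource&& src) noexcept : handle_(src.handle_) {
        src.handle_ = nullptr;
    }
    resource& operator=(resource&& src) noexcept {
        if (this != &src) {
            release(handle_);
            handle_ = src.handle_;
            src.handle_ = nullptr;
        }
        return *this;
    }

    bool owns() const { return handle_ != nullptr; }
};

inline void example_resource() {
    {
        resource a;
        assert(live_handles() == 1);
        resource b = std::move(a);  // ownership transferred, nothing copied
        assert(!a.owns() && b.owns());
        assert(live_handles() == 1);
    }
    assert(live_handles() == 0);    // released exactly once
}
```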
In the Turing Tar Pit, there is nothing new under the sun. Everything that move semantics does, can be done without move semantics -- it just takes a lot more code, and is a lot more fragile.
What move semantics does is takes a particular common pattern that massively increases efficiency and safety in a number of situations, and embeds it in the language.
It increases efficiency in obvious ways. Moving, be it via swap or move construction, is much faster for many data types than copying. You can create special interfaces to indicate when things can be moved from: but honestly people didn't do that. With move semantics, it becomes relatively easy to do. Compare the cost of moving a std::vector to copying it -- move takes roughly copying 3 pointers, while copying requires a heap allocation, copying every element in the container, and creating 3 pointers.
Even more so, compare reserve on a move-aware std::vector to a copy-only aware one: suppose you have a std::vector of std::vector. In C++03, that was performance suicide if you didn't know the dimensions of every component ahead of time -- in C++11, move semantics makes it as smooth as silk, because it is no longer repeatedly copying the sub-vectors whenever the outer vector resizes.
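
The reallocation point can be checked with a small experiment. Tracked is a hypothetical element type that counts how it gets transferred; the outer vector deliberately never calls reserve(). Because the move constructor is noexcept, std::vector is allowed to move the elements to the new buffer when it grows instead of copying them.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Counts {
    int copies = 0;
    int moves = 0;
};
inline Counts& counts() { static Counts c; return c; }

// Hypothetical element type that records how it is transferred.
struct Tracked {
    std::vector<int> payload;

    Tracked() = default;
    Tracked(const Tracked& o) : payload(o.payload) { ++counts().copies; }
    Tracked(Tracked&& o) noexcept : payload(std::move(o.payload)) {
        ++counts().moves;
    }
};

// Grows a vector of n elements without reserving, and reports how the
// elements were transferred during reallocation.
inline Counts grow_outer_vector(int n) {
    counts() = Counts{};
    std::vector<Tracked> outer;                // no reserve() on purpose
    for (int i = 0; i < n; ++i) {
        outer.emplace_back();                  // constructed in place
        outer.back().payload.assign(1000, i);  // give it something heavy
    }
    return counts();
}
```

With a move-aware vector, the copy count should stay at zero no matter how often the outer vector reallocates; each transfer only moves the inner buffer's pointers.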
Move semantics gives every "pImpl pattern" type blazing fast performance, which means you can start having complex objects that behave like values instead of having to deal with and manage pointers to them.
On top of these performance gains, and opening up complex-class-as-value, move semantics also opens up a whole host of safety measures, and allows doing some things that were not very practical before.
std::unique_ptr is a replacement for std::auto_ptr. They both do roughly the same thing, but std::auto_ptr treated copies as moves. This made std::auto_ptr ridiculously dangerous to use in practice. Meanwhile, std::unique_ptr just works. It represents unique ownership of some resource extremely well, and transfer of ownership can happen easily and smoothly.
You know the problem whereby you take a foo* in an interface, and sometimes it means "this interface is taking ownership of the object" and sometimes it means "this interface just wants to be able to modify this object remotely", and you have to delve into API documentation and sometimes source code to figure out which?
std::unique_ptr actually solves this problem -- interfaces that want to take ownership can now take a std::unique_ptr<foo>, and the transfer of ownership is obvious at both the API level and in the code that calls the interface. std::unique_ptr is an auto_ptr that just works, and has the unsafe portions removed, and replaced with move semantics. And it does all of this with nearly perfect efficiency.
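
A short sketch of what that looks like in an interface. The registry class and bump function are hypothetical; the point is that the signature alone tells you whether ownership is transferred.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct foo {
    int value = 0;
};

// "Just wants to modify this object": takes a reference, owns nothing.
inline void bump(foo& f) { ++f.value; }

// "Takes ownership": says so in the signature, and the caller has to
// write std::move, which makes the hand-over visible at the call site.
class registry {
public:
    void adopt(std::unique_ptr<foo> f) { items_.push_back(std::move(f)); }

    std::size_t size() const { return items_.size(); }

    int total() const {
        int sum = 0;
        for (const auto& p : items_) sum += p->value;
        return sum;
    }

private:
    std::vector<std::unique_ptr<foo>> items_;
};

inline void example_registry() {
    auto f = std::make_unique<foo>();
    bump(*f);               // no ownership change
    registry r;
    r.adopt(std::move(f));  // ownership handed over
    assert(f == nullptr);   // a moved-from unique_ptr is empty
    assert(r.size() == 1);
    assert(r.total() == 1);
}
```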
std::unique_ptr is a transferable RAII representation of a resource whose value is represented by a pointer.
After you write make_unique<T>(Args&&...), unless you are writing really low level code, it is probably a good idea to never call new directly again. Move semantics basically have made new obsolete.
Other RAII representations are often non-copyable. A port, a print session, an interaction with a physical device -- all of these are resources for which "copy" doesn't make much sense. Almost every one of them can be easily modified to support move semantics, which opens up a whole host of freedom in dealing with these variables.
Move semantics also allows you to put your return values in the return part of a function. The pattern of taking return values by reference (and documenting "this one is out-only, this one is in/out", or failing to do so) can be somewhat replaced by returning your data.
So instead of void fill_vec( std::vector<foo>& ), you have std::vector<foo> get_vec(). This even works with multiple return values -- std::tuple< std::vector<A>, std::set<B>, bool > get_stuff() can be called, and you can load your data into local variables efficiently via std::tie( my_vec, my_set, my_bool ) = get_stuff().
Output parameters can be semantically output-only, with very little overhead (the above, in a worst case, costs 8 pointer and 2 bool copies, regardless of how much data we have in those containers -- and that overhead can be as little as 0 pointer and 0 bool copies with a bit more work), because of move semantics.
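
Here is a compilable sketch of the multiple-return-value pattern just described. The element types and the contents of get_stuff are made up for the example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Returns several results by value instead of through out-parameters.
inline std::tuple<std::vector<int>, std::set<std::string>, bool> get_stuff() {
    std::vector<int> v{1, 2, 3};
    std::set<std::string> s{"a", "b"};
    // The containers are moved into the tuple, not copied.
    return std::make_tuple(std::move(v), std::move(s), true);
}

inline void example_get_stuff() {
    std::vector<int> my_vec;
    std::set<std::string> my_set;
    bool my_bool = false;

    // std::tie builds a tuple of references; assigning the returned
    // temporary move-assigns each element into the local variables.
    std::tie(my_vec, my_set, my_bool) = get_stuff();

    assert(my_vec.size() == 3);
    assert(my_set.count("b") == 1);
    assert(my_bool);
}
```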
There is absolutely something new going on here. Consider unique_ptr which can be moved, but not copied because it uniquely holds ownership of a resource. That ownership can then be transferred by moving it to a new unique_ptr if needed, but copying it would be impossible (as you would then have two references to the owned object).
While many uses of moving may have positive performance implications, the movable-but-not-copyable types are a much bigger functional improvement to the language.
In short, use the new techniques where it indicates the meaning of how your class should be used, or where (significant) performance concerns can be alleviated by movement rather than copy-and-destroy.
No answer is complete without a reference to Thomas Becker's painstakingly exhaustive write up on rvalue references, perfect forwarding, reference collapsing and everything related to that.
see here: http://thbecker.net/articles/rvalue_references/section_01.html
I would say yes because a Move Constructor and Move Assignment operator are now compiler defined for objects that do not define/protect a destructor, copy constructor, or copy assignment.
This means that if you have the following code...
#include <vector>

struct intContainer
{
    std::vector<int> v;
};

intContainer CreateContainer()
{
    intContainer c;
    c.v.push_back(3);
    return c;
}
The code above would be optimized simply by recompiling with a compiler that supports move semantics. Your container c will have compiler defined move-semantics and thus will call the manually defined move operations for std::vector without any changes to your code.
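
One way to see the implicit move operations at work is to check whether the vector's buffer survives a move. In this sketch, withDestructor and buffer_survives_move are hypothetical additions; declaring a destructor suppresses the implicit move constructor, so that type silently falls back to copying.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Gets compiler-generated move operations.
struct intContainer {
    std::vector<int> v;
};

// The user-declared destructor suppresses the implicit move operations,
// so "moving" this type actually copies it.
struct withDestructor {
    std::vector<int> v;
    ~withDestructor() {}
};

// Returns true if the heap buffer was handed over rather than duplicated.
template <class T>
bool buffer_survives_move() {
    T a;
    a.v.assign(100, 3);
    const int* before = a.v.data();
    T b = std::move(a);  // move if available, otherwise copy
    return b.v.data() == before;
}

inline void example_implicit_move() {
    assert(buffer_survives_move<intContainer>());     // buffer was stolen
    assert(!buffer_survives_move<withDestructor>());  // buffer was copied
}
```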
Since move semantics only apply in the presence of rvalue
references, which are declared by a new token, &&, it seems
very clear that they are something new.
In principle, they are purely an optimizing technique, which
means that:
1. you don't use them until the profiler says it is necessary, and
2. in theory, optimizing is the compiler's job, and move
semantics aren't any more necessary than register.
Concerning 1, we may, in time, end up with a ubiquitous
heuristic as to how to use them: after all, passing an argument
by const reference, rather than by value, is also an
optimization, but the ubiquitous convention is to pass class
types by const reference, and all other types by value.
Concerning 2, compilers just aren't there yet. At least, the
usual ones. The basic principles which could be used to make
move semantics irrelevant are (well?) known, but to date, they
tend to result in unacceptable compile times for real programs.
As a result: if you're writing a low level library, you'll
probably want to consider move semantics from the start.
Otherwise, they're just extra complication, and should be
ignored, until the profiler says otherwise.

How often do you implement the big three?

I was just musing about the number of questions here that either are about the "big three" (copy constructor, assignment operator and destructor) or about problems caused by them not being implemented correctly, when it occurred to me that I could not remember the last time I had implemented them myself. A swift grep on my two most active projects indicate that I implement all three in only one class out of about 150.
That's not to say I don't implement/declare one or more of them - obviously base classes need a virtual destructor, and a large number of my classes forbid copying using the private copy ctor & assignment op idiom. But fully implemented, there is this single lonely class, which does some reference counting.
So I was wondering am I unusual in this? How often do you implement all three of these functions? Is there any pattern to the classes where you do implement them?
I think that it's rare that you need all three. Most classes that require an explicit destructor aren't really suitable for copying.
It's just better design to use self-destructing members (which normally don't require things like copy-construction) than a big explicit destructor.
I rarely implement them, but often declare them private (copy constructors and assignment operators, that is).
Like you, almost never.
But I'm not tied to the STL approach of programming where you copy everything in and around in containers - usually if it's not a primitive, I'll use a pointer, smart or otherwise.
I mainly use RAII patterns, thus avoid writing destructors. Although, I do put empty bodies in my .cc file to help keep code bloat down.
And, like you, I'll declare them private and unimplemented to prevent any accidental invoking.
It really depends on what type of problems you are working on. I have been working on a new project for the past few months and I think every class inherits from boost::noncopyable. Nine months ago I worked on a different project that used PODs quite a bit and I leveraged automatic copy ctor and assignment operator. If you are using boost::shared_ptr (and you should be), it should be rare to write your own copy ctor or assignment operator nowadays.
Most of the time, hardly ever. This is because the members that are used (reference based smart ptr, etc) already implement the proper semantics, or the object is non-copyable.
A few patterns come up when I find myself implementing these:
destructive copy , i.e. move pattern like auto_ptr or lock
dispose pattern, which hardly ever comes up in C++, but I've used it about three times in my career (and just a week ago in fact)
pimpl pattern, where the pimpl is fwd declared in the header, and managed by a smart ptr. Then the empty dtor goes in the .cc file but still classifies as "not compiler generated"
And one other trivial one that prints "I was destroyed" when I think I might have a circular reference somewhere and just want to make sure.
Any class that owns pointer members needs to define these three operations to implement deep copy.
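
For reference, a minimal sketch of the big three on a class that owns a raw array. IntBuffer is a hypothetical name, and the assignment operator uses the copy-and-swap idiom, which is one common way to write it rather than the only one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // 1. Destructor: release the owned array.
    ~IntBuffer() { delete[] data_; }

    // 2. Copy constructor: allocate a fresh array and copy the elements.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // 3. Copy assignment via copy-and-swap: the by-value parameter is the
    //    copy; swapping hands our old array to it for destruction.
    IntBuffer& operator=(IntBuffer other) {
        swap(other);
        return *this;
    }

    void swap(IntBuffer& o) {
        std::swap(size_, o.size_);
        std::swap(data_, o.data_);
    }

    int& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
};

inline void example_int_buffer() {
    IntBuffer a(3);
    a[0] = 1;
    IntBuffer b = a;  // deep copy
    b[0] = 9;
    assert(a[0] == 1);
    a = b;            // deep copy again
    assert(a[0] == 9);
    assert(a.size() == 3);
}
```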

Is it good practice to generally make heavyweight classes non-copyable?

I have a Shape class containing potentially many vertices, and I was contemplating making copy-constructor/copy-assignment private to prevent accidental needless copying of my heavyweight class (for example, passing by value instead of by reference).
To make a copy of Shape, one would have to deliberately call a "clone" or "duplicate" method.
Is this good practice? I wonder why STL containers don't use this approach, as I rarely want to pass them by value.
Restricting your users isn't always a good idea. Just documenting that copying may be expensive is enough. If a user really wants to copy, then using the native syntax of C++ by providing a copy constructor is a much cleaner approach.
Therefore, I think the real answer depends on the context. Perhaps the real class you're writing (not the imaginary Shape) shouldn't be copied, perhaps it should. But as a general approach, I certainly can't say that one should discourage users from copying large objects by forcing them to use explicit method calls.
IMHO, whether or not to provide a copy constructor and assignment operator depends more on what your class models than on the cost of copying.
If your class represents values, that is, if passing an object or a copy of the object doesn't make a difference, then provide them (and provide the equality operator as well).
If it doesn't, that is, if you think that objects of the class have an identity and a state (one also speaks of entities), don't. If a copy makes sense, provide it with a clone or copy member.
There are sometimes classes you can't easily classify. Containers are in that position. It is meaningful to consider them as entities, pass them only by reference, and have special operations to make a copy when needed. You can also consider them simply as aggregations of values, in which case copying makes sense. The STL was designed around value types. And as everything is a value, it makes sense for containers to be so. That allows things like map<int, list<> >, which are useful. (Remember, you can't put non-copyable classes in a pre-C++11 STL container.)
Generally, you do not make classes non-copyable just because they are heavy (you gave a good example yourself: the STL).
You make them non-copyable when they are connected to some non-copyable resource like a socket, file, or lock, or when they are not designed to be copied at all (for example, they have internal structures that can hardly be deep copied).
However, in your case your object is copyable, so leave it as it is.
A small note about clone(): it is used as a polymorphic copy constructor, so it has a different meaning and is used differently.
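
To show what "polymorphic copy constructor" means here, a brief sketch (C++14 for std::make_unique). The Shape hierarchy below is invented for the example and is not the asker's class: clone() copies the dynamic type through a base pointer, which an ordinary copy constructor cannot do.

```cpp
#include <cassert>
#include <memory>

class Shape {
public:
    virtual ~Shape() = default;

    // Copies the object's dynamic type, whatever it is.
    virtual std::unique_ptr<Shape> clone() const = 0;
    virtual double area() const = 0;

protected:
    Shape() = default;
    Shape(const Shape&) = default;             // protected: avoids slicing
    Shape& operator=(const Shape&) = delete;
};

class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}

    std::unique_ptr<Shape> clone() const override {
        return std::make_unique<Square>(*this);
    }
    double area() const override { return side_ * side_; }

private:
    double side_;
};

inline void example_clone() {
    std::unique_ptr<Shape> original = std::make_unique<Square>(2.0);
    std::unique_ptr<Shape> copy = original->clone();  // a Square, not a slice
    assert(copy.get() != original.get());
    assert(copy->area() == 4.0);
}
```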
Most programmers are already aware of the cost of copying various objects, and know how to avoid copies, using techniques such as pass by reference.
Note the STL's vector, string, map, list etc. could all be variously considered 'heavyweight' objects (especially something like a vector with 10,000 elements!). Those classes all still provide copy constructors and assignment operators, so if you know what you're doing (such as making a std::list of vectors), you can copy them when necessary.
So if it's useful, provide them anyway, but be sure to document they are expensive operations.
Depending on your needs...
If you want to ensure that a copy won't happen by mistake, and making a copy would cause a severe bottleneck or simply doesn't make sense, then this is good practice. Compile errors are better than performance investigations.
If you are not sure how your class will be used, and are unsure if it's a good idea or not then it is not good practice. Most of the time you would not limit your class in this way.