There's a problem I've been running into lately, and since I'm a self-taught C++ programmer I'd really like to know how professionals in the real world solve it.
Is it a good idea to write a default constructor for all classes? Aren't there certain parts of the STL that won't work if your classes don't have default constructors?
IF SO, then how does one write a default constructor that does sensible things? That is, how can I assign default values to my private members if there simply are not sensible default values? I can only think of two solutions:
Use pointers (or unique_ptrs) for each member and that way a nullptr means the field is uninitialized.
OR
Add extra fields/logic/methods to do the work of checking to see whether or not a field has been initialized and rely on that (think kind of like unique_ptr's "reset" method).
How do people solve problems like this in the real world?
If it doesn't make sense for your data type to have a default constructor, then don't write one.
(STL is long dead, but I assume you mean the standard library.) Most standard library containers work well even if the contained type doesn't have a default constructor. Some notable gotchas:
std::vector<T>::resize(n) requires T to have a default constructor. But without one, you can use erase and insert instead.
std::map<K,V>::operator[] and std::unordered_map<K,V>::operator[] require V to have a default constructor. But without one, you can use find and insert instead.
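A minimal sketch of both workarounds, assuming a hypothetical Weekday type that deliberately has no default constructor (the helper names grow_to and index_or are made up for illustration):

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical type for illustration: deliberately has no default constructor.
class Weekday {
public:
    explicit Weekday(int index) : index_(index) {}
    int index() const { return index_; }
private:
    int index_;
};

// Grow a vector without resize(n): insert() copies an explicit fill value,
// so no default-constructed Weekday is ever needed.
inline void grow_to(std::vector<Weekday>& days, std::size_t n, const Weekday& fill) {
    if (n > days.size()) {
        days.insert(days.end(), n - days.size(), fill);
    }
}

// Look a key up without operator[]: find() never creates a missing value.
inline int index_or(const std::map<std::string, Weekday>& byName,
                    const std::string& name, int fallback) {
    const auto it = byName.find(name);
    return it == byName.end() ? fallback : it->second.index();
}
```

Filling the map works the same way: emplace or insert takes a fully constructed value, so operator[] is never required.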
Is it a good idea to write a default constructor for all classes?
No. If there is no sensible “default value” for your type, you should not write a default constructor. Doing so would violate the principle of least surprise.
That said, many types do have a sensible default value.
the number zero (more generally: the neutral element)
the empty string
an empty list
a 0 × 0 matrix
the time-zone UTC ± 00:00
…
For such types, you really should define a default constructor.
Other types don't have a natural default value but have an “empty” state that can be reached by performing certain operations. It is sensible to default-construct such an object to have that state.
an I/O stream disconnected from any source / sink that fails every operation (can be reached by reaching the end of the file or encountering an I/O error)
a lock guard that doesn't hold a lock (can be reached by releasing the lock)
a smart pointer that doesn't own an object (can be reached by releasing the managed object)
…
For these types, it is a trade-off whether to define a default constructor. Doing so does no harm but makes your type slightly more complicated. If it is a public type found in a library, then it is probably worth the trouble. If you're going to implement a move constructor (and assignment operator) anyway, you can equally well define the default constructor to construct the object in a state that would be reached by moving away from it.
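As a rough sketch of that last point, here is a hypothetical move-only Buffer (assuming C++14 for std::exchange) whose default constructor produces exactly the state a moved-from object ends up in:

```cpp
#include <cstddef>
#include <utility>

// Hypothetical move-only buffer, for illustration only.
class Buffer {
public:
    Buffer() = default;  // "empty" state: same as a moved-from Buffer
    explicit Buffer(std::size_t n) : data_(new char[n]()), size_(n) {}

    Buffer(Buffer&& other) noexcept
        : data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            delete[] data_;
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    ~Buffer() { delete[] data_; }  // deleting a null pointer is a no-op

    bool empty() const noexcept { return data_ == nullptr; }
    std::size_t size() const noexcept { return size_; }

private:
    char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Because the empty state already has to be handled by the destructor and the move operations, the default constructor adds no new case to reason about.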
For other types, you just cannot define a sensible default.
a day of the week
a name for a baby
…
Do not invent an artificial “null” state for these types just to make them default-constructible. Doing so complicates the code and forces you to provide less useful class invariants than you could without that artificial state. If somebody really needs that additional state, they can easily use an optional<T>, but going the other way round is not possible.
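A small sketch of that idea, assuming C++17 (where std::optional is available) and a hypothetical DayOfWeek class; parse_day is an invented helper:

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical type: no default constructor, and no artificial "null" day.
class DayOfWeek {
public:
    explicit DayOfWeek(int index) : index_(index) {
        if (index < 0 || index > 6) {
            throw std::invalid_argument("day index must be in 0..6");
        }
    }
    int index() const { return index_; }
private:
    int index_;  // invariant: always in [0, 6]
};

// Callers that need "no day" opt in explicitly by wrapping the type.
inline std::optional<DayOfWeek> parse_day(int raw) {
    if (raw < 0 || raw > 6) {
        return std::nullopt;
    }
    return DayOfWeek(raw);
}
```

Every DayOfWeek that exists is valid, so code taking a DayOfWeek needs no validity check; only code that handles the optional does.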
Aren't there certain parts of the STL that won't work if your classes don't have default constructors?
Yes. std::vector::resize is probably the most prominent example. But don't worry about these. If your type does not have a sensible default value, performing that operation isn't meaningful either. It's not the fault of your type. It's inherent to the nature of the concept you're trying to model.
Is it a good idea to write a default constructor for all classes?
No. Sometimes there is no sensible default value for an object.
Aren't there certain parts of the STL that won't work if your classes don't have default constructors?
There are some parts which require DefaultConstructible objects. And there are ways to circumvent that requirement (overloads which take an object to use instead of a default-constructed one).
Related
Lately I see a lot of material about generic programming, and I still cannot wrap my head around one thing, when designing types. I am not sure what is the best way, let me explain.
For some types, it is natural to provide a default constructor.
All the possible constructions of that type will be valid or a default makes sense, so it makes sense to provide a default. This is the case for basic types.
Then there are some types for which default construction does not yield a value, for example std::function<Sig> and std::thread in the standard library. Nevertheless, they are default-constructible, even though they are not holding a value.
Later, we have the proposed optional<T> in the standard. It makes a lot of sense to use it for basic types, since for basic types all the possible assignments represent a valid value (except double and float NaN), but I don't see how you would use it for a thread or a std::function<Sig>, since these types don't hold a "value" when constructed. It is AS IF these types had "optional" embedded in the type directly.
This has these drawbacks. Since there is no "natural" default (or value) construction, such as with an int:
Now I have to litter my class with if (valid) in all my design and signal the error. OR
make it less safe to use if I don't do this check. Precondition -> assign before using if default-constructed.
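To make the first drawback concrete, here is a minimal sketch using std::function; call_or is a hypothetical helper, not something from the standard library:

```cpp
#include <functional>

// Hypothetical helper: the "if (valid)" check that a default-constructed
// (empty) std::function forces on its users.
inline int call_or(const std::function<int()>& callback, int fallback) {
    // An empty std::function converts to false; calling it anyway
    // would throw std::bad_function_call.
    return callback ? callback() : fallback;
}
```

Skipping the check is the second option from the list: the call site is shorter, but it then relies on the precondition that the function was assigned before use.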
So when I want to design a type, I always find the question: should I make it default constructible?
Pros:
Easier to reuse in more generic contexts, because my type will more easily model SemiRegular or Regular if I add the appropriate operations.
Cons:
Litter the whole class with if statements or making a contract with the user in which the class is more unsafe to use.
For example, let's say I have a class Song with ID, artist, title, duration and year. It is very nice for the standard library to make the type default constructible. But:
I can't just find a natural way to construct a "default Song".
I have to litter with if (validsong) or make it unsafe to use.
So my questions are:
How should I design a type that has no "natural (as in value)" defaults? Should I provide a default constructor or not?
In the case I choose to provide a default constructor, how does optional<T> fit into all this puzzle? My view is that making a type that is not "naturally" default constructible provide a default constructor makes optional<T> useless in this case.
Should optional<T> just be used for types whose domain of values is complete, meaning, I cannot assign an invalid value to its representation because all of them hold a value, such as in int?
Why were types such as std::function<Sig> made default constructible in the first place in the standard? When constructed, it does not hold a value, so I don't see why a default constructor should be provided. You could always do: optional<function<void ()>>, for example. Is this just a design choice and both are valid or there is one design, in this case, about choosing default vs non-default constructible superior to the other?
(Note: a problem with lots of questions in one question is that some parts of it can be duplicate. Best to ask smaller questions, and check each for prior posts. "One question per question" is a good policy; easier said than done sometimes, I guess.)
Why were types such as std::function made default constructible in the first place in the standard? When constructed, it does not hold a value, so I don't see why a default constructor should be provided. You could always do: optional<function<void ()>>, for example.
See Why do std::function instances have a default constructor?
How should I design a type that has no "natural (as in value)" defaults? Should I provide a default constructor or not?
Default constructors for types that have a tough time meaningfully defining themselves without some kind of data is how a lot of classes implement a null value. Is optional a better choice? I usually think so, but I'm assuming you're aware that std::optional was voted out of C++14. Even if it were the perfect answer it can't be everyone's answer...it's not soup yet.
It will always add some overhead to do runtime tracking of if the value is bound or not. Perhaps not a lot of overhead. But when you are using a language whose raison d'etre is to allow abstraction while still letting you shoot yourself in the foot as close to the metal as you want...shaving off a byte per value in a giant vector can be important.
So even if optional<T> semantics and compile-time checking were perfect, you still might face a scenario where it's advantageous to scrap it and allow your type to encode its own nullity. Gotta push those pixels or polygons or packets or... pfafftowns.
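A rough illustration of the storage trade-off, assuming C++17 (std::optional did eventually ship there) and a hypothetical PacketId type that reserves one value as its own "null":

```cpp
#include <cstdint>
#include <optional>

// Hypothetical ID type that encodes its own "null" as a reserved sentinel.
struct PacketId {
    static constexpr std::uint32_t kNone = 0xFFFFFFFFu;  // assumed never to be a real ID
    std::uint32_t raw = kNone;
    bool has_value() const { return raw != kNone; }
};

// The sentinel costs no extra storage...
static_assert(sizeof(PacketId) == sizeof(std::uint32_t),
              "sentinel encoding adds no bytes");
// ...whereas optional has to keep an "engaged" flag next to the value.
static_assert(sizeof(std::optional<std::uint32_t>) > sizeof(std::uint32_t),
              "optional stores a flag in addition to the value");
```

The price of the sentinel is the one discussed above: the type itself now has a null state, so the compiler can no longer tell apart code that expects it from code that does not.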
In the case I choose to provide a default constructor, how does optional fit into all this puzzle? My view is that making a type that is not "naturally" default constructible provide a default constructor makes optional useless in this case.
Should optional just be used for types whose domain of values is complete, meaning, I cannot assign an invalid value to its representation because all of them hold a value (except float and double NaN I guess).
In my own case, I found myself wanting to distinguish, through compile-time checking, between routines that could handle null pointers and those which could not. But suddenly an optional<pointer> offered three situations: the optional being unbound, being bound to a null pointer, or being bound to a non-null pointer. The compile-time sanity check seemed less of a win than it had at first.
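The three states can be spelled out in a short sketch (assuming C++17 std::optional; PtrState and classify are names invented for this example):

```cpp
#include <optional>

enum class PtrState { Unbound, BoundToNull, BoundToObject };

// An optional pointer has three distinguishable states, not two.
inline PtrState classify(const std::optional<int*>& p) {
    if (!p.has_value()) {
        return PtrState::Unbound;
    }
    return *p == nullptr ? PtrState::BoundToNull : PtrState::BoundToObject;
}
```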
So how about optional references? They're controversial to the point that last I heard they're one of the sticking points in the set of things that delayed std::optional from C++14. Which was a bit annoying after I'd converted my optional pointers to optional references. :-/
I had a vague idea to write a book about "pathological C++" where you pick some idea and start taking it to its logical conclusions. optional<T> was one kick I got on and going with essentially the principles you identify. Remove the possibility of "nullity" from being encoded in the type itself, and then suddenly you can get the compiler doing the type-checking for whether a given bit of code is prepared to expect a null or not.
(These days I tend toward suspecting if you get very hung up on this kind of "pathological C++" you'll wind up reinventing Haskell. :-/ See the popular Data.Maybe monad.)
According to the Google style guidelines, "Few classes need to be copyable. Most should have neither a copy constructor nor an assignment operator."
They recommend you make a class uncopyable (that is, not giving it a copy constructor or assignment operator), and instead recommending passing by reference or pointer in most situations, or using clone() methods which cannot be invoked implicitly.
However, I've heard some arguments against this:
Accessing a reference is (usually) slower than accessing a value.
In some computations, I might want to leave the original object the way it is and just return the changed object.
I might want to store the value of a computation as a local object in a function and return it, which I couldn't do if I returned it by reference.
If a class is small enough, passing by reference is slower.
What are the positives/negatives of following this guideline? Is there any standard "rule of thumb" for making classes uncopyable? What should I consider when creating new classes?
I have two issues with their advice:
It doesn't apply to modern C++, ignoring move constructors/assignment operators, and so assumes that taking objects by value (which would have copied before) is often inefficient.
It doesn't trust the programmer to do the right thing and design their code appropriately. Instead it limits the programmer until they're forced to break the rule.
Whether your class should be copyable, moveable, both or neither should be a design decision based on the uses of the class itself. For example, a std::unique_ptr is a great example of a class that should only be moveable because copying it would invalidate its entire purpose. When you design a class, ask yourself if it makes sense to copy it. Most of the time the answer will be yes.
The advice seems to be based on the belief that programmers default to passing objects around by value which can be expensive when the objects are complex enough. This is just not true any more. You should default to passing objects around by value when you need a copy of the object, and there's no reason to be scared of this - in many cases, the move constructor will be used instead, which is almost always a constant time operation.
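A minimal sketch of "take by value when you need a copy", using a hypothetical Label class and make_label helper:

```cpp
#include <string>
#include <utility>

// Hypothetical class that keeps its own copy of the text it is given.
class Label {
public:
    // Taken by value: lvalue arguments are copied once, rvalue arguments
    // are moved, and the parameter is then moved into the member.
    explicit Label(std::string text) : text_(std::move(text)) {}
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

// The local string is moved into the Label rather than copied.
inline Label make_label(std::string prefix, int n) {
    prefix += std::to_string(n);
    return Label(std::move(prefix));
}
```

A caller who passes a named string keeps it intact and pays for one copy; a caller who passes a temporary pays only for moves.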
Again, the choice of how you should pass objects around is a design decision that should be influenced by a number of factors, such as:
Am I going to need a copy of this object?
Do I need to modify this object?
What is the lifetime of the object?
Is the object optional?
These questions should be asked with every type you write (parameter, return value, variable, whatever). You should find plenty of uses for passing objects by value that don't lead to poor performance due to copying.
If you follow good C++ programming practices, your copy constructors will be bug free, so that shouldn't be a concern. In fact, many classes can get away with just the defaulted copy/move constructors. If a class owns dynamically allocated resources and you use smart pointers appropriately, implementing the copy constructor is often as simple as copying the objects from the pointers - not much room for bugs.
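Both cases can be sketched as follows, assuming C++14 for std::make_unique; Settings and Document are hypothetical types:

```cpp
#include <memory>
#include <string>
#include <utility>

// Defaulted operations: every member copies itself correctly,
// so nothing is written by hand.
struct Settings {
    std::string name;
    int level = 0;
};

// Owns a heap object through a smart pointer: copying the Document
// means copying the object behind the pointer.
class Document {
public:
    explicit Document(Settings s)
        : settings_(std::make_unique<Settings>(std::move(s))) {}

    Document(const Document& other)
        : settings_(other.settings_
                        ? std::make_unique<Settings>(*other.settings_)
                        : nullptr) {}  // a moved-from source holds nullptr
    Document(Document&&) noexcept = default;

    // Parameter taken by value: serves as both copy and move assignment.
    Document& operator=(Document other) noexcept {
        settings_.swap(other.settings_);
        return *this;
    }

    Settings& settings() { return *settings_; }

private:
    std::unique_ptr<Settings> settings_;
};
```

No destructor is needed, since unique_ptr releases the Settings object on its own.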
Of course, this advice from Google is for people working on their code to ensure consistency throughout their codebase. That's fine. I don't recommend blindly adopting it in its entirety for a modern C++ project, however.
Copy constructors were traditionally ubiquitous in C++ programs. However, I doubt whether there has been a good reason for that since C++11.
Even when the program logic didn't need copying objects, copy constructors (usu. default) were often included for the sole purpose of object reallocation. Without a copy constructor, you couldn't store objects in a std::vector or even return an object from a function.
However, since C++11, move constructors have been responsible for object reallocation.
Another use case for copy constructors was, simply, making clones of objects. However, I'm quite convinced that a .copy() or .clone() method is better suited for that role than a copy constructor because...
Copying objects isn't really commonplace. Certainly it's sometimes necessary for an object's interface to contain a "make a duplicate of yourself" method, but only sometimes. And when it is the case, explicit is better than implicit.
Sometimes an object could expose several different .copy()-like methods, because in different contexts the copy might need to be created differently (e.g. shallower or deeper).
In some contexts, we'd want the .copy() methods to do non-trivial things related to program logic (increment some counter, or perhaps generate a new unique name for the copy). I wouldn't accept any code that has non-obvious logic in a copy constructor.
Last but not least, a .copy() method can be virtual if needed, allowing to solve the problem of slicing.
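For the last point, a short sketch of a virtual copy method (assuming C++14; Shape, Circle and duplicate are hypothetical names, and the method is called clone here):

```cpp
#include <memory>
#include <string>

class Shape {
public:
    virtual ~Shape() = default;
    // Virtual copy: each derived class returns a copy of its own dynamic type.
    virtual std::unique_ptr<Shape> clone() const = 0;
    virtual std::string name() const = 0;
};

class Circle : public Shape {
public:
    std::unique_ptr<Shape> clone() const override {
        return std::make_unique<Circle>(*this);
    }
    std::string name() const override { return "circle"; }
};

// Works through a base reference without slicing off the derived part.
inline std::unique_ptr<Shape> duplicate(const Shape& original) {
    return original.clone();
}
```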
The only cases where I'd actually want to use a copy constructor are:
RAII handles of copiable resources (quite obviously)
Structures that are intended to be used like built-in types, like math vectors or matrices, simply because they are copied often and vec3 b = a.copy() is too verbose.
(Side note: I've considered the fact that a copy constructor is needed for copy-and-swap (CAS), but CAS is needed for operator=(const T&), which I consider redundant based on the exact same reasoning; .copy() + operator=(T&&) = default would be preferred if you really need this.)
For me, that's quite enough incentive to use T(const T&) = delete everywhere by default and provide a .copy() method when needed. (Perhaps also a private T(const T&) = default just to be able to write copy() or virtual copy() without boilerplate.)
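A sketch of what that pattern could look like, with a hypothetical Mesh class; the private defaulted copy constructor is only reachable through copy():

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical class: movable, but copyable only through an explicit copy().
class Mesh {
public:
    explicit Mesh(std::vector<float> vertices) : vertices_(std::move(vertices)) {}

    Mesh(Mesh&&) = default;
    Mesh& operator=(Mesh&&) = default;

    // The only way for outside code to duplicate a Mesh.
    Mesh copy() const { return Mesh(*this); }

    std::size_t size() const { return vertices_.size(); }

private:
    Mesh(const Mesh&) = default;  // private: saves boilerplate inside copy()
    std::vector<float> vertices_;
};

// Outside the class, implicit copies are rejected at compile time.
static_assert(!std::is_copy_constructible<Mesh>::value,
              "Mesh must be copied through copy()");
```

Usage then reads Mesh b = a.copy(); whereas Mesh b = a; fails to compile.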
Q: Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Specifically, am I correct in that move constructors took over the responsibility of object reallocation in C++11 completely? I'm using "reallocation" informally for all the situations when an object needs to be moved someplace else in the memory without altering its state.
The problem is what is the word "object" referring to.
If objects are the resources that variables refer to (as in Java, or in C++ through pointers, using classical OOP paradigms), every "copy between variables" is a "sharing", and if single ownership is imposed, "sharing" becomes "moving".
If objects are the variables themselves, then since each variable has to have its own history, you cannot "move" if you cannot or don't want to impose the destruction of one value in favor of another.
Consider for example std::strings:
std::string a="Aa";
std::string b=a;
...
b = "Bb";
Do you expect the value of a to change, or that code not to compile? If not, then copy is needed.
Now consider this:
std::string a="Aa";
std::string b=std::move(a);
...
b = "Bb";
Now a is typically left empty (the standard only guarantees a valid but unspecified state), since its value (better, the dynamic memory that contains it) has been "moved" to b. The value of b is then changed, and the old "Aa" discarded.
In essence, move works only if explicitly called or if the right argument is "temporary", like in
a = b+c;
where the resource held by the return value of operator+ is clearly not needed after the assignment, hence moving it to a, rather than copying it into another place held by a and then deleting it, is more efficient.
Move and copy are two different things. Move is not "THE replacement for copy". It is a more efficient way to avoid a copy only in the cases where an object is not required to generate a clone of itself.
Short answer
Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Automatically generated copy constructors are a great benefit in separating resource management from program logic; classes implementing logic do not need to worry about allocating, freeing or copying resources at all.
In my opinion, any replacement would need to do the same, and doing that for named functions feels a bit weird.
Long answer
When considering copy semantics, it's useful to divide types into four categories:
Primitive types, with semantics defined by the language;
Resource management (or RAII) types, with special requirements;
Aggregate types, which simply copy each member;
Polymorphic types.
Primitive types are what they are, so they are beyond the scope of the question; I'm assuming that a radical change to the language, breaking decades of legacy code, won't happen. Polymorphic types can't be copied (while maintaining the dynamic type) without user-defined virtual functions or RTTI shenanigans, so they are also beyond the scope of the question.
So the proposal is: mandate that RAII and aggregate types implement a named function, rather than a copy constructor, if they should be copied.
This makes little difference to RAII types; they just need to declare a differently-named copy function, and users just need to be slightly more verbose.
However, in the current world, aggregate types do not need to declare an explicit copy constructor at all; one will be generated automatically to copy all the members, or deleted if any are uncopyable. This ensures that, as long as all the member types are correctly copyable, so is the aggregate.
In your world, there are two possibilities:
Either the language knows about your copy-function, and can automatically generate one (perhaps only if explicitly requested, i.e. T copy() = default;, since you want explicitness). In my opinion, automatically generating named functions based on the same named function in other types feels more like magic than the current scheme of generating "language elements" (constructors and operator overloads), but perhaps that's just my prejudice speaking.
Or it's left to the user to correctly implement copying semantics for aggregates. This is error-prone (since you could add a member and forget to update the function), and breaks the current clean separation between resource management and program logic.
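For comparison, the current scheme can be checked with a few type traits; Track and Session are hypothetical aggregates made up for this sketch:

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// All members are copyable, so the compiler generates a memberwise copy.
struct Track {
    std::string title;
    std::vector<int> samples;
};

// One member is move-only, so the generated copy constructor is deleted.
struct Session {
    Track track;
    std::unique_ptr<int> scratch;
};

static_assert(std::is_copy_constructible<Track>::value,
              "copyable members give a copyable aggregate");
static_assert(!std::is_copy_constructible<Session>::value,
              "one uncopyable member makes the aggregate uncopyable");
static_assert(std::is_move_constructible<Session>::value,
              "the aggregate can still be moved");

// Adding a member to Track later needs no change here.
inline Track duplicate(const Track& t) { return t; }
```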
And to address the points you make in favour:
Copying (non-polymorphic) objects is commonplace, although as you say it's less common now that they can be moved when possible. It's just your opinion that "explicit is better" or that T a(b); is less explicit than T a(b.copy());
Agreed, if an object doesn't have clearly defined copy semantics, then it should have named functions to cover whatever options it offers. I don't see how that affects how normal objects should be copied.
I've no idea why you think that a copy constructor shouldn't be allowed to do things that a named function could, as long as they are part of the defined copy semantics. You argue that copy constructors shouldn't be used because of artificial restrictions that you place on them yourself.
Copying polymorphic objects is an entirely different kettle of fish. Forcing all types to use named functions just because polymorphic ones must won't give the consistency you seem to be arguing for, since the return types would have to be different. Polymorphic copies will need to be dynamically allocated and returned by pointer; non-polymorphic copies should be returned by value. In my opinion, there is little value in making these different operations look similar without being interchangeable.
One case where copy constructors come in useful is when implementing the strong exception guarantees.
To illustrate the point, let's consider the resize function of std::vector. The function might be implemented roughly as follows:
void std::vector::resize(std::size_t n)
{
    if (n > capacity())
    {
        T *newData = new T [n];
        for (std::size_t i = 0; i < capacity(); i++)
            newData[i] = std::move(m_data[i]);  // may throw halfway through
        delete[] m_data;
        m_data = newData;
    }
    else
    { /* ... */ }
}
If the resize function were to have a strong exception guarantee we need to ensure that, if an exception is thrown, the state of the std::vector before the resize() call is preserved.
If T has no move constructor, then we will default to the copy constructor. In this case, if the copy constructor throws an exception, we can still provide strong exception guarantee: we simply delete the newData array and no harm to the std::vector has been done.
However, if we were using the move constructor of T and it threw an exception, then we have a bunch of Ts that were moved into the newData array. Rolling this operation back isn't straight-forward: if we try to move them back into the m_data array the move constructor of T may throw an exception again!
To resolve this issue we have the std::move_if_noexcept function. This function will use the move constructor of T if it is marked as noexcept, otherwise the copy constructor will be used. This allows us to implement std::vector::resize in such a way as to provide a strong exception guarantee.
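The selection rule can be observed with two hypothetical types that record how they were constructed (How, ThrowingMove, NoexceptMove and relocate are names invented for this sketch):

```cpp
#include <utility>

enum class How { Fresh, Copied, Moved };

// Move constructor is NOT noexcept: move_if_noexcept falls back to copying.
struct ThrowingMove {
    How how = How::Fresh;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) : how(How::Copied) {}
    ThrowingMove(ThrowingMove&&) : how(How::Moved) {}
};

// Move constructor IS noexcept: move_if_noexcept allows the move.
struct NoexceptMove {
    How how = How::Fresh;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) : how(How::Copied) {}
    NoexceptMove(NoexceptMove&&) noexcept : how(How::Moved) {}
};

// Builds a new object from source the way a reallocating container might.
template <typename T>
How relocate(T& source) {
    T target(std::move_if_noexcept(source));
    return target.how;
}
```

Note that a type with a throwing move constructor and no copy constructor at all is still moved, which is why only the basic guarantee remains in that case.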
For completeness, I should mention that C++11 std::vector::resize does not provide a strong exception guarantee in all cases. According to www.cplusplus.com we have the following guarantees:
If n is less than or equal to the size of the container, the function never throws exceptions (no-throw guarantee).
If n is greater and a reallocation happens, there are no changes in the container in case of exception (strong guarantee) if the type of the elements is either copyable or no-throw moveable.
Otherwise, if an exception is thrown, the container is left with a valid state (basic guarantee).
Here's the thing. Moving is the new default- the new minimum requirement. But copying is still often a useful and convenient operation.
Nobody should bend over backwards to offer a copy constructor anymore. But it is still useful for your users to have copyability if you can offer it simply.
I would not ditch copy constructors any time soon, but I admit that for my own types, I only add them when it becomes clear I need them- not immediately. So far this is very, very few types.
I have a medium complex C++ class which holds a set of data read from disc. It contains an eclectic mix of floats, ints and structures and is now in general use. During a major code review it was asked whether we have a custom assignment operator or we rely on the compiler generated version and if so, how do we know it works correctly? Well, we didn't write a custom assignment and so a unit test was added to check that if we do:
CalibDataSet datasetA = getDataSet();
CalibDataSet datasetB = datasetA;
then datasetB is the same as datasetA. A couple of hundred lines or so. Now the customer insists that we cannot rely on the compiler (gcc) being correct for future releases and we should write our own. Are they right to insist on this?
Additional info:
I'm impressed by the answers/comments already posted and the response time. Another way of asking this question might be:
When does a POD structure/class become a 'not' POD structure/class?
It is well-known what the automatically-generated assignment operator will do - that's defined as part of the standard and a standards-compliant C++ compiler will always generate a correctly-behaving assignment operator (if it didn't, then it would not be a standards-compliant compiler).
You usually only need to write your own assignment operator if you've written your own destructor or copy constructor. If you don't need those, then you don't need an assignment operator, either.
The most common reason to explicitly define an assignment operator is to support "remote ownership" -- basically, a class that includes one or more pointers, and owns the resources to which those pointers refer. In such a case, you normally need to define assignment, copying (i.e., copy constructor) and destruction. There are three primary strategies for such cases (sorted by decreasing frequency of use):
Deep copy
reference counting
Transfer of ownership
Deep copy means allocating a new resource for the target of the assignment/copy. E.g., a string class has a pointer to the content of the string; when you assign it, the assignment allocates a new buffer to hold the new content in the destination, and copies the data from the source to the destination buffer. This is used in most current implementations of a number of standard classes such as std::string and std::vector.
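A minimal sketch of deep copy, using a hypothetical OwnedText class rather than any real std::string implementation:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical string-like class that owns its character buffer.
class OwnedText {
public:
    explicit OwnedText(const char* s)
        : size_(std::strlen(s)), data_(new char[size_ + 1]) {
        std::memcpy(data_, s, size_ + 1);  // includes the terminating '\0'
    }

    // Deep copy: the new object gets its own buffer.
    OwnedText(const OwnedText& other)
        : size_(other.size_), data_(new char[other.size_ + 1]) {
        std::memcpy(data_, other.data_, size_ + 1);
    }

    OwnedText& operator=(const OwnedText& other) {
        if (this != &other) {
            // Allocate first: if new throws, *this is left unchanged.
            char* fresh = new char[other.size_ + 1];
            std::memcpy(fresh, other.data_, other.size_ + 1);
            delete[] data_;
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

    ~OwnedText() { delete[] data_; }

    const char* c_str() const { return data_; }
    char* data() { return data_; }

private:
    std::size_t size_;  // declared before data_, so it is initialized first
    char* data_;
};
```

Destructor, copy constructor and assignment all appear together here, which is the "remote ownership" case described above.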
Reference counting used to be quite common as well. Many (most?) older implementations of std::string used reference counting. In this case, instead of allocating and copying the data for the string, you simply incremented a reference count to indicate the number of string objects referring to a particular data buffer. You only allocated a new buffer when/if the content of a string was modified so it needed to differ from others (i.e., it used copy on write). With multithreading, however, you need to synchronize access to the reference count, which often has a serious impact on performance, so in newer code this is fairly unusual (mostly used when something stores so much data that it's worth potentially wasting a bit of CPU time to avoid such a copy).
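The sharing half of that strategy can be sketched with std::shared_ptr doing the counting; SharedText is a hypothetical class, and the copy-on-write step is left out because the text is kept immutable:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable text handle: copies share one buffer.
class SharedText {
public:
    explicit SharedText(std::string s)
        : data_(std::make_shared<const std::string>(std::move(s))) {}

    const std::string& str() const { return *data_; }

    // Number of SharedText objects currently referring to the same buffer.
    long owners() const { return data_.use_count(); }

private:
    std::shared_ptr<const std::string> data_;  // copying bumps the count
};
```

The count inside shared_ptr is updated atomically, which is the synchronization cost mentioned above.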
Transfer of ownership is relatively unusual. It's what's done by std::auto_ptr. When you assign or copy something, the source of the assignment/copy is basically destroyed -- the data is transferred from one to the other. This is (or can be) useful, but the semantics are sufficiently different from normal assignment that it's often counterintuitive. At the same time, under the right circumstances, it can provide great efficiency and simplicity. C++0x will make transfer of ownership considerably more manageable by adding a unique_ptr type that makes it more explicit, and also adding rvalue references, which make it easy to implement transfer of ownership for one fairly large class of situations where it can improve performance without leading to semantics that are visibly counterintuitive.
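With the unique_ptr that shipped in C++11, the transfer has to be requested explicitly; a short sketch (assuming C++14 for std::make_unique in the usage; take is a hypothetical helper):

```cpp
#include <memory>
#include <utility>

// Hands the owned int over to the caller. Unlike auto_ptr, the transfer
// must be spelled out with std::move; a plain copy does not compile.
inline std::unique_ptr<int> take(std::unique_ptr<int>& source) {
    return std::move(source);  // source is guaranteed to be null afterwards
}
```

For example, after auto a = std::make_unique<int>(5); auto b = take(a); the pointer a is null and b owns the value.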
Going back to the original question, however, if you don't have remote ownership to start with -- i.e., your class doesn't contain any pointers, chances are good that you shouldn't explicitly define an assignment operator (or dtor or copy ctor). A compiler bug that stopped implicitly defined assignment operators from working would prevent passing any of a huge number of regression tests.
Even if it did somehow get released, your defense against it would be to just not use it. There's no real room for question that such a release would be replaced within a matter of hours. With a medium to large existing project, you don't want to switch to a compiler until it's been in wide use for a while in any case.
If the compiler isn't generating the assignment properly, then you have bigger problems to worry about than implementing the assignment overload (like the fact that you have a broken compiler). Unless your class contains pointers, it is not necessary to provide your own overload; however, it is reasonable to request an explicit overload, not because the compiler might break (which is absurd), but rather to document your intention that assignment be permitted and behave in that manner. In C++0x, it will be possible to document intent and save time by using = default for the compiler-generated version.
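With C++11 or later, documenting the intent could look like this; CalibData and Range are hypothetical stand-ins for the real class, which is not shown in the question:

```cpp
#include <type_traits>

struct Range {
    float lo;
    float hi;
};

// Hypothetical stand-in for a data class holding floats, ints and structs.
struct CalibData {
    float gain;
    int offset;
    Range limits;

    CalibData() = default;
    // Stated explicitly: memberwise copy and assignment are intended.
    CalibData(const CalibData&) = default;
    CalibData& operator=(const CalibData&) = default;
};

// No pointers, no resources: a plain memberwise copy is all there is.
static_assert(std::is_trivially_copyable<CalibData>::value,
              "CalibData is expected to stay trivially copyable");
```

The static_assert doubles as a cheap regression check: it fails to compile if someone later adds a member that needs a hand-written copy.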
If you have some resources in your class that you are supposed to manage, then writing your own assignment operator makes sense; otherwise rely on the compiler-generated one.
I guess the problem of shallow copy versus deep copy may appear if you are dealing with strings.
http://www.learncpp.com/cpp-tutorial/912-shallow-vs-deep-copying/
But I think it is always advisable to write assignment overloads for user-defined classes.
I have a Shape class containing potentially many vertices, and I was contemplating making copy-constructor/copy-assignment private to prevent accidental needless copying of my heavyweight class (for example, passing by value instead of by reference).
To make a copy of Shape, one would have to deliberately call a "clone" or "duplicate" method.
Is this good practice? I wonder why STL containers don't use this approach, as I rarely want to pass them by value.
Restricting your users isn't always a good idea. Just documenting that copying may be expensive is enough. If a user really wants to copy, then using the native syntax of C++ by providing a copy constructor is a much cleaner approach.
Therefore, I think the real answer depends on the context. Perhaps the real class you're writing (not the imaginary Shape) shouldn't be copied, perhaps it should. But as a general approach, I certainly can't say that one should discourage users from copying large objects by forcing them to use explicit method calls.
IMHO, whether to provide a copy constructor and assignment operator depends more on what your class models than on the cost of copying.
If your class represents values, that is, if passing an object or a copy of the object doesn't make a difference, then provide them (and provide the equality operator also).
If it doesn't, that is, if you think that objects of the class have an identity and a state (one also speaks of entities), don't. If a copy makes sense, provide it with a clone or copy member.
There are sometimes classes you can't easily classify. Containers are in that position. It is meaningful to consider them as entities, pass them only by reference and have special operations to make a copy when needed. You can also consider them simply as aggregations of values, and so copying makes sense. The STL was designed around value types, and as everything is a value, it makes sense for containers to be so as well. That allows things like map<int, list<> >, which are useful. (Remember, before C++11 you can't put non-copyable classes in an STL container.)
Generally, you do not make classes non-copyable just because they are heavy (you have shown a good example with the STL).
You make them non-copyable when they are connected to some non-copyable resource like a socket, file or lock, or when they are not designed to be copied at all (for example, when they have internal structures that can hardly be deep copied).
However, in your case your object is copyable, so leave it as it is.
Small note about clone(): it is used as a polymorphic copy constructor, so it has a different meaning and is used differently.
Most programmers are already aware of the cost of copying various objects, and know how to avoid copies, using techniques such as pass by reference.
Note the STL's vector, string, map, list etc. could all be variously considered 'heavyweight' objects (especially something like a vector with 10,000 elements!). Those classes all still provide copy constructors and assignment operators, so if you know what you're doing (such as making a std::list of vectors), you can copy them when necessary.
So if it's useful, provide them anyway, but be sure to document they are expensive operations.
Depending on your needs...
If you want to ensure that a copy won't happen by mistake, and making a copy would cause a severe bottleneck or simply doesn't make sense, then this is good practice. Compile errors are better than performance investigations.
If you are not sure how your class will be used, and are unsure if it's a good idea or not then it is not good practice. Most of the time you would not limit your class in this way.