"CopyConstructible" requirement for C++ stl container element - c++

Regarding to the requirement for C++ stl container element, the standard says: the element type should be CopyConstructible, and there is a table for CopyConstructible requirements. Also by various books (Josuttis, etc.), the generated copy should be "equivalent to" the source.
I think I need some clarity here. What is exactly "equivalent to"? Also I am a bit confused with the relation between the "CopyConstructible" and the "deep/shallow copy". In general, a copy constructor is either shallow copy or deep copy. So which one applies to the "CopyConstructible", and which does not?
Thanks for any comments!

Deep or shallow copy both work. For instance, shared_ptr always does a shallow copy (with some extra reference counting stuff), and you can use them in containers just fine. It depends on the semantics of copy-operation.
Equivalent means your program should not depend on whether it works with the original or with the copy.

If you put something into a container, when you retrieve it you will get something that is equivalent to what you put in. So long as that is meaningful for your objects then you will get out something useful from the container.
Whether that is a shallow or deep copy depends on the semantics that you want for your object type. Your object might be pointer-like, handle-like or perhaps container like. It might contain some mutable cache data that you may or may not choose to duplicate on a copy operation.
So long as your copy constructor is accessible and does what you need it to do to preserve the semantics of your object type then you satisfy the CopyConstructible requirement.

In general, STL containers may copy your elements around at some stage, during some kinds of operations or algorithms, so the litmus test is:
Element original(....); // construct this awesome object
original.this_and_that(); // do stuff to it until the value is perfect...
Element another(original);
Could you use another happily instead of original?
That's effectively what the CopyConstructible requirement's saying: you better be able to have this copied into another object and still be happy with the result. It's not a draconian restriction - you just need to think it through and write your copy constructor correspondingly.
But, it's significant in that some operations like find() may use == to compare elements (for other containers, it may be '<'), so if a side-effect of being copied is that you can't compare elements meaningfully, then your finds et al may stop working - think that through too! (The Standard says for containers, "== is an equivalence relation" (23.1-5).)

Thinking that "a copy constructor performs either a deep or a shallow copy" is slightly limiting, and something of a red herring.
Whilst it's true that, depending on what your object stores as members, you may need to do some deep copying in order to gain equivalence, as far as the type's interface is concerned, it doesn't really matter how you performed the copy... as long as you did perform the copy and you ended up with an equivalent object.
And if A is equivalent to B, then for a properly-designed type, A==B.
The entire requirement is just saying: "the element type must be copyable". All the rest is down to usual common sense of writing a proper copy constructor.

There are a couple of different concepts being discussed here. CopyConstructible only requires that you can create a new element of that type using an existing element as the source from which to copy. It is unrelated to deep or shallow or even equivalence: the container does not care what you are doing while copying as long as it is allowed to perform that copy.
The second concept is the concept of equivalence, when you use an object in a container it will get copied, and the number of copies performed is unknown --the implementation might copy it just once to store it internally, or it might make multiple copies internally. What you want is to be able to extract the element from the container and use it as if it was the original object, here is where equivalence comes in: the n-th copy that you extract should be equivalent to the object that you inserted.
The concepts of deep and shallow copying are directly related to equivalence, depending on the domain that you are modeling, equivalence might require either shallow or deep copying, and depending on other constraints you might have to choose one or the other --as long as in your domain they are equivalent-- or there might even be intermediate options, where a partially deep copy is performed, or some members are not copied at all.

Related post: What is the difference between equality and equivalence?
Edited to make this a "non comment".
Equivalence means "equal for all intents and purposes". For example: 1.0001 and 1 are never equal, but under certain circumstances they are equivalent. This is the general answer.
What the books mean are that the copied objects must satisfy the strict weak ordering condition with the original object: copy < original == false && original < copy == false.

Related

Copying classes between ranges that might overlap

In C, we have the functions memcpy and memmove to efficiently copy data around. The former yields undefined behavior if the source and destination regions overlap, but the latter is guaranteed to deal with that "as expected," presumably by noticing the direction of overlap and (if necessary) choosing a different algorithm.
The above functions are available in C++ (as std::memcpy and std::memmove), of course, but they don't really work with non-trivial classes. Instead, we get std::copy and std::copy_backward. Each of these works if the source and destination ranges don't overlap; moreover, each is guaranteed to work for one "direction" of overlap.
What can we use if we want to copy from one region to another and we don't know at compile-time if the ranges may overlap or in what direction that overlap may occur? It doesn't seem that we have an option. For general iterators it may be difficult to determine if ranges overlap, so I understand why no solution is provided in that case, but what about when we're dealing with pointers? Ideally, there'd be a function like:
template<class T>
T * copy_either_direction(const T * inputBegin, const T * inputEnd, T * outputBegin) {
if ("outputBegin ∈ [inputBegin, inputEnd)") {
outputBegin += (inputEnd - inputBegin);
std::copy_backward(inputBegin, inputEnd, outputBegin);
return outputBegin;
} else {
return std::copy(inputBegin, inputEnd, outputBegin);
}
}
(A similar function with T * replaced with std::vector<T>::iterator would also be nice. Even better would be if this were guaranteed to work if inputBegin == outputBegin, but that's a separate gripe of mine.)
Unfortunately, I don't see a sensible way to write the condition in the if statement, as comparing pointers into separate blocks of memory often yields undefined behavior. On the other hand, the implementation clearly has its own way to do this, as std::memmove inherently requires one. Thus, any implementation could provide such a function, thereby filling a need that the programmer simply can't. Since std::memmove was considered useful, why not copy_either_direction? Is there a solution I'm missing?
memmove works because it traffics in pointers to contiguous bytes, so the ranges of the two blocks to be copied are well defined. copy and move take iterators that don't necessarily point at contiguous ranges. For example, a list iterator can jump around in memory; there's no range that the code can look at, and no meaningful notion of overlap.
I recently learned that std::less is specialized for pointers in a way that provides a total order, presumably to allow one to store pointers in std::sets and related classes. Assuming that this must agree with the standard ordering whenever the latter is defined, I think the following will work:
#include <functional>
template<class T>
T * copy_either_direction(const T * inputBegin, const T * inputEnd, T * outputBegin) {
if (std::less<const T *>()(inputBegin, outputBegin)) {
outputBegin += (inputEnd - inputBegin);
std::copy_backward(inputBegin, inputEnd, outputBegin);
return outputBegin;
} else {
return std::copy(inputBegin, inputEnd, outputBegin);
}
}
What can we use if we want to copy from one region to another and we don't know at compile-time if the ranges may overlap or in what direction that overlap may occur?
This is not a logically consistent concept.
After a copy operation, you will have two objects. And each object is defined by a separate and distinct region of memory. You cannot have objects which overlap in this way (you can have subobjects, but an object type cannot be its own subobject). And therefore, it is impossible to copy an object on top of part of itself.
To move an object on top of itself is also not logically consistent. Why? Because moving is a fiction in C++; after the move, you still have two perfectly functional objects. A move operation is merely a destructive copy, one that steals resources owned by the other object. It is still there, and it is still a viable object.
And since an object cannot overlap with another object, it is again impossible.
Trivially copyable types get around this because they are just blocks of bits, with no destructors or specialized copy operations. So their lifetime is not as rigid as that of others. A non-trivially copyable type cannot do this because:
The experience with memmove suggests that there could be a solution in this case (and perhaps also for iterators into contiguous containers).
This is neither possible nor generally desirable for types which are not trivially copyable in C++.
The rules for trivial copyability are that the type has no non-trivial copy/move constructors/assignment operators, as well as no non-trivial destructor. A trivial copy/move constructor/assignment is nothing more than a memcpy, and a trivial destructor does nothing. And therefore, these rules effectively ensures that the type is nothing more than a "block of bits". And one "block of bits" is no different from another, so copying it via memmove is a legal construct.
If a type has a real destructor, then the type is maintaining some sort of invariant that requires actual effort to maintain. It may free up a pointer or release a file handle or whatever. Given that, it makes no sense to copy the bits, because now you have two objects that reference the same pointer/file handle. That's a bad thing to do, because the class will generally want to control how that gets handled.
There is no way to solve this problem without the class itself being involved in the copy operation. Different class have different behavior with respect to managing their internals. Indeed, that is the entire purpose of objects having copy constructors and assignment operators. So that a class can decide for itself how to maintain the sanity of its own state.
And it doesn't even have to be a pointer or file handle. Maybe each class instance has a unique identifier; such a value is generated at construction time, and it never gets copied (new copies get new values). For you to violate such a restriction with memmove leaves your program in an indeterminate state, because you will have code that expects such identifiers to be unique.
That's why memmoveing for non-trivially copyable types yields undefined behavior.

Why does the STL use the assignment operator to move items around in containers?

For example if we had a 10 element vector, and we decided to erase the 5th element, all elements after the 5th would have to be moved back one element. Why is the assignment operator used here? Why not just copy the bits since we know the old copy is about to be overwritten / invalidated?
Obviously this changed we the move constructor. Let's just assume a move constructor / assignment isn't used here.
The fundamental reason is so that C++ objects can preserve class invariants involving the address of the object (or its sub-objects)
Therefore if the value of an object is to be established at a different address the class might need to execute code to preserve the invariant. That code goes in the move/copy constructor/assignment and the swap function if applicable.
A (possibly over-elaborate) example of such an invariant is, "exactly one sub-object is registered with some global collection of pointers, but which one varies between objects". If all you do is copy the bits, then there's no opportunity to update the global collection with the "right" sub-object according to the object state.
There is a serious limitation in C++03, that there are classes for which it is appropriate to just copy all the bits in the case you describe, but where it is not appropriate to just copy all the bits for a copy assignment in general, and a copy is unnecessarily expensive in your case. That limitation is what C++11 move semantics address, giving the class the chance to behave differently.

Can we say bye to copy constructors?

Copy constructors were traditionally ubiquitous in C++ programs. However, I'm doubting whether there's a good reason to that since C++11.
Even when the program logic didn't need copying objects, copy constructors (usu. default) were often included for the sole purpose of object reallocation. Without a copy constructor, you couldn't store objects in a std::vector or even return an object from a function.
However, since C++11, move constructors have been responsible for object reallocation.
Another use case for copy constructors was, simply, making clones of objects. However, I'm quite convinced that a .copy() or .clone() method is better suited for that role than a copy constructor because...
Copying objects isn't really commonplace. Certainly it's sometimes necessary for an object's interface to contain a "make a duplicate of yourself" method, but only sometimes. And when it is the case, explicit is better than implicit.
Sometimes an object could expose several different .copy()-like methods, because in different contexts the copy might need to be created differently (e.g. shallower or deeper).
In some contexts, we'd want the .copy() methods to do non-trivial things related to program logic (increment some counter, or perhaps generate a new unique name for the copy). I wouldn't accept any code that has non-obvious logic in a copy constructor.
Last but not least, a .copy() method can be virtual if needed, allowing to solve the problem of slicing.
The only cases where I'd actually want to use a copy constructor are:
RAII handles of copiable resources (quite obviously)
Structures that are intended to be used like built-in types, like math vectors or matrices -
simply because they are copied often and vec3 b = a.copy() is too verbose.
Side note: I've considered the fact that copy constructor is needed for CAS, but CAS is needed for operator=(const T&) which I consider redundant basing on the exact same reasoning;
.copy() + operator=(T&&) = default would be preferred if you really need this.)
For me, that's quite enough incentive to use T(const T&) = delete everywhere by default and provide a .copy() method when needed. (Perhaps also a private T(const T&) = default just to be able to write copy() or virtual copy() without boilerplate.)
Q: Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Specifically, am I correct in that move constructors took over the responsibility of object reallocation in C++11 completely? I'm using "reallocation" informally for all the situations when an object needs to be moved someplace else in the memory without altering its state.
The problem is what is the word "object" referring to.
If objects are the resources that variables refers to (like in java or in C++ through pointers, using classical OOP paradigms) every "copy between variables" is a "sharing", and if single ownership is imposed, "sharing" becomes "moving".
If objects are the variables themselves, since each variables has to have its own history, you cannot "move" if you cannot / don't want to impose the destruction of a value in favor of another.
Cosider for example std::strings:
std::string a="Aa";
std::string b=a;
...
b = "Bb";
Do you expect the value of a to change, or that code to don't compile? If not, then copy is needed.
Now consider this:
std::string a="Aa";
std::string b=std::move(a);
...
b = "Bb";
Now a is left empty, since its value (better, the dynamic memory that contains it) had been "moved" to b. The value of b is then chaged, and the old "Aa" discarded.
In essence, move works only if explicitly called or if the right argument is "temporary", like in
a = b+c;
where the resource hold by the return of operator+ is clearly not needed after the assignment, hence moving it to a, rather than copy it in another a's held place and delete it is more effective.
Move and copy are two different things. Move is not "THE replacement for copy". It an more efficient way to avoid copy only in all the cases when an object is not required to generate a clone of itself.
Short anwer
Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Automatically generated copy constructors are a great benefit in separating resource management from program logic; classes implementing logic do not need to worry about allocating, freeing or copying resources at all.
In my opinion, any replacement would need to do the same, and doing that for named functions feels a bit weird.
Long answer
When considering copy semantics, it's useful to divide types into four categories:
Primitive types, with semantics defined by the language;
Resource management (or RAII) types, with special requirements;
Aggregate types, which simply copy each member;
Polymorphic types.
Primitive types are what they are, so they are beyond the scope of the question; I'm assuming that a radical change to the language, breaking decades of legacy code, won't happen. Polymorphic types can't be copied (while maintaining the dynamic type) without user-defined virtual functions or RTTI shenanigans, so they are also beyond the scope of the question.
So the proposal is: mandate that RAII and aggregate types implement a named function, rather than a copy constructor, if they should be copied.
This makes little difference to RAII types; they just need to declare a differently-named copy function, and users just need to be slightly more verbose.
However, in the current world, aggregate types do not need to declare an explicit copy constructor at all; one will be generated automatically to copy all the members, or deleted if any are uncopyable. This ensures that, as long as all the member types are correctly copyable, so is the aggregate.
In your world, there are two possibilities:
Either the language knows about your copy-function, and can automatically generate one (perhaps only if explicitly requested, i.e. T copy() = default;, since you want explicitness). In my opinion, automatically generating named functions based on the same named function in other types feels more like magic than the current scheme of generating "language elements" (constructors and operator overloads), but perhaps that's just my prejudice speaking.
Or it's left to the user to correctly implement copying semantics for aggregates. This is error-prone (since you could add a member and forget to update the function), and breaks the current clean separation between resource management and program logic.
And to address the points you make in favour:
Copying (non-polymorphic) objects is commonplace, although as you say it's less common now that they can be moved when possible. It's just your opinion that "explicit is better" or that T a(b); is less explicit than T a(b.copy());
Agreed, if an object doesn't have clearly defined copy semantics, then it should have named functions to cover whatever options it offers. I don't see how that affects how normal objects should be copied.
I've no idea why you think that a copy constructor shouldn't be allowed to do things that a named function could, as long as they are part of the defined copy semantics. You argue that copy constructors shouldn't be used because of artificial restrictions that you place on them yourself.
Copying polymorphic objects is an entirely different kettle of fish. Forcing all types to use named functions just because polymorphic ones must won't give the consistency you seem to be arguing for, since the return types would have to be different. Polymorphic copies will need to be dynamically allocated and returned by pointer; non-polymorphic copies should be returned by value. In my opinion, there is little value in making these different operations look similar without being interchangable.
One case where copy constructors come in useful is when implementing the strong exception guarantees.
To illustrate the point, let's consider the resize function of std::vector. The function might be implemented roughly as follows:
void std::vector::resize(std::size_t n)
{
if (n > capacity())
{
T *newData = new T [n];
for (std::size_t i = 0; i < capacity(); i++)
newData[i] = std::move(m_data[i]);
delete[] m_data;
m_data = newData;
}
else
{ /* ... */ }
}
If the resize function were to have a strong exception guarantee we need to ensure that, if an exception is thrown, the state of the std::vector before the resize() call is preserved.
If T has no move constructor, then we will default to the copy constructor. In this case, if the copy constructor throws an exception, we can still provide strong exception guarantee: we simply delete the newData array and no harm to the std::vector has been done.
However, if we were using the move constructor of T and it threw an exception, then we have a bunch of Ts that were moved into the newData array. Rolling this operation back isn't straight-forward: if we try to move them back into the m_data array the move constructor of T may throw an exception again!
To resolve this issue we have the std::move_if_noexcept function. This function will use the move constructor of T if it is marked as noexcept, otherwise the copy constructor will be used. This allows us to implement std::vector::resize in such a way as to provide a strong exception guarantee.
For completeness, I should mention that C++11 std::vector::resize does not provide a strong exception guarantee in all cases. According to www.cplusplus.com we have the the follow guarantees:
If n is less than or equal to the size of the container, the function never throws exceptions (no-throw guarantee).
If n is greater and a reallocation happens, there are no changes in the container in case of exception (strong guarantee) if the type of the elements is either copyable or no-throw moveable.
Otherwise, if an exception is thrown, the container is left with a valid state (basic guarantee).
Here's the thing. Moving is the new default- the new minimum requirement. But copying is still often a useful and convenient operation.
Nobody should bend over backwards to offer a copy constructor anymore. But it is still useful for your users to have copyability if you can offer it simply.
I would not ditch copy constructors any time soon, but I admit that for my own types, I only add them when it becomes clear I need them- not immediately. So far this is very, very few types.

Is it safe to use std::auto_ptr in std::map?

I am aware that you have to be careful with auto pointers (and why), especially with the STL. But I don't see a problem with this:
std::map<T1, std::auto_ptr<T2> >;
Is this safe?
I see how it would break in an std::vector, because it has to copy its items from time to time, but is this also true for the value type of an std::map?
Edit: Apparently, regardless whether it's safe, I cannot (technically) populate the map, but I'll leave the question open for theoretical considerations. Otherwise, consider it closed.
It's not safe. Technically, std::auto_ptr doesn't meet the requirement of CopyConstructible or Assignable because copies made using auto_ptr's copy constructor or copy assignment operator aren't equivalent to the source of the copy after the copy operation. These requirements must be met for any type used with any standard container.
You may find that you appear to get the behaviour you expect on one implementation for one particular use case but if you violate the requirements of the container you can't expect your code to work in all situations.
Still not safe. If nothing else, the first time you retrieved the pointer from the map you would transfer ownership and make it invalid.
There is an extremely narrow valid use case for auto_ptr. The only one I can remember coming across is when an object wants to make it explicitly clear that it takes ownership and responsibility for a pointer that you hand it.

Does moving leave the object in a usable state?

Say I have two vectors and I move one unto the other, v1 = std::move(v2); will v2 still be in a usable state after this?
From n3290, 17.6.5.15 Moved-from state of library types [lib.types.movedfrom]
Objects of types defined in the C++ standard library may be moved from (12.8). Move operations may be explicitly specified or implicitly generated. Unless otherwise specified, such moved-from objects shall be placed in a valid but unspecified state.
Since the state is valid, this means you can safely operate on v2 (e.g. by assigning to it, which would put it back to a known state). Since it is unspecified however, it means you cannot for instance rely on any particular value for v2.empty() as long as it is in this state (but calling it won't crash the program).
Note that this axiom of move semantics ("Moved from objects are left in a valid but unspecified state") is something that all code should strive towards (most of the time), not just the Standard Library components. Much like the semantics of copy constructors should be making a copy, but are not enforced to.
No, it is left in an unspecified state.
Excerpt from open-std-org article -
.. move() gives its target the value of its argument, but is not obliged to preserve the value of its source. So, for a vector, move() could reasonably be expected to leave its argument as a zero-capacity vector to avoid having to copy all the elements. In other words, move is a potentially destructive read.
If you want to use v2 after the move, you will want to do something like:
v1 = std::move(v2);
v2.clear();
At this point, v1 will have the original contents of v2 and v2 will be in a well-defined empty state. This works on all of the STL containers (and strings, for that matter), and if you are implementing your own classes that support move-semantics, you'll probably want to do something similar.
If your particular implementation of the STL actually DOES leave the object in an empty state, then the second clear() will be essentially a no-op. In fact, if this is the case, it would be a legal optimization for a compiler to eliminate the clear() after the move.