struct big_struct{
vector<int> a_vector;
map<string, int> a_map;
};
big_struct make_data(){
big_struct return_this;
// do stuff, build that data, etc
return return_this;
}
int main(){
auto data = make_data();
}
I have seen move semantics applied to constructors, but in this bit of code, I'm wondering if the big struct is copied entirely when returned or not. I'm not even sure it is related to move semantics. Does C++ always copies this kind of data, or is it optimized? Could this code be change or improved?
What about a function that returns a vector or a map? Is that map/vector copied?
You don't need to change anything. What you have right now is the rule of zero. Since both std::map and std::vector are moveable your class automatically gets move operations added to it.
Since return_this is a function local object it will be treated as an rvalue and it will either be moved for you or NRVO will kick in and no move or copy will happen.
Your code will either produce a default construction call for return_this and a move constructor call for data or you will see a single default constructor call for data (NRVO makes return_this and data the same thing).
As stated here, your class actually has a move-constructor (implicitly generated one), so it shouldn't be copied in your code, at least once (in main).
One problem is, what you're relying upon is called NRVO, and compilers are not required to implement it (unlike its happier simpler brother, RVO.) So your struct has a chance, quite very small, to be copied in the return statement—but so small that return-by-move (like return std::move(return_this);) is never actually recommended. Chances are quite high the NRVO will actually be applied if you really have a single return statement in your function that returns a single named object.
Related
Let's say I have a simple function returnString that returns a string by value:
std::string returnString() {
std::string s;
// Use s in such a way to defeat return value mandatory copy-elision
return s;
}
I have another function that wants to heap-allocate the result of this. Easy enough.
void caller() {
std::string* heap_allocated_string = new std::string(returnString());
}
Instead of std::string though, consider an arbitrary type T. I believe that by the language rules, the following two statements are true. Are they?
In C++14, I believe this is not ideal, since for some types if the move ctor is not free or not defined, we might be doing unnecessary work compared to just directly constructing on the heap.
In C++17, this triggers mandatory copy elision, so even if the type did not define a move constructor, no extra copy would be created, and the move constructor would not be called.
In general though, for a generic type, is there a better way to do this, without modifying the called function?
In C++14, I believe this is not ideal, since for some types if the move ctor is not free or not defined, we might be doing unnecessary work compared to just directly constructing on the heap.
Define "not ideal".
If the object has to be constructed via a factory function which returns by value, and you want to heap-allocate the object instead, and you have to do this "without modifying the called function," then that is as good as it's going to get.
Plus, you need not worry about the lack of copy/move in C++14. The reason being that it is (almost) impossible to return a non-copyable, non-moveable object by value in C++14. There is technically a way to do it (through the use of list-initialization syntax in the return statement), but if the function is written as you've stated it, then whatever type it returns must be copyable or moveable.
Furthermore, the new expression on your end doesn't even require named RVO; this part is just eliding a temporary, and there's no reason why a compiler wouldn't be able to optimize that move away.
So basically, there has to be a copy/move constructor for the function to compile, and any copy/move will be optimized away on your end for all practical purposes. So there's nothing to be concerned about.
Have found comparable questions but not exactly with such a case.
Take the following code for example:
#include <iostream>
#include <string>
#include <vector>
struct Inner
{
int a, b;
};
struct Outer
{
Inner inner;
};
std::vector<Inner> vec;
int main()
{
Outer * an_outer = new Outer;
vec.push_back(std::move(an_outer->inner));
delete an_outer;
}
Is this safe? Even if those were polymorphic classes or ones with custom destructors?
My concern regards the instance of "Outer" which has a member variable "inner" moved away. From what I learned, moved things should not be touched anymore. However does that include the delete call that is applied to outer and would technically call delete on inner as well (and thus "touch" it)?
Neither std::move, nor move semantics more generally, have any effect on the object model. They don't stop objects from existing, nor prevent you from using those objects in the future.
What they do is ask to borrow encapsulated resources from the thing you're "moving from". For example, a vector, which directly only stores a pointer some dynamically-allocated data: the concept of ownership of that data can be "stolen" by simply copying that pointer then telling the vector to null the pointer and never have anything to do with that data again. It's yielded. The data belongs to you now. You have the last pointer to it that exists in the universe.
All of this is achieved simply by a bunch of hacks. The first is std::move, which just casts your vector expression to vector&&, so when you pass the result of it to a construction or assignment operation, the version that takes vector&& (the move constructor, or move-assignment operator) is triggered instead of the one that takes const vector&, and that version performs the steps necessary to do what I described in the previous paragraph.
(For other types that we make, we traditionally keep following that pattern, because that's how we can have nice things and persuade people to use our libraries.)
But then you can still use the vector! You can "touch" it. What exactly you can do with it is discoverable from the documentation for vector, and this extends to any other moveable type: the constraints emplaced on your usage of a moved-from object depend entirely on its type, and on the decisions made by the person who designed that type.
None of this has any impact on the lifetime of the vector. It still exists, it still takes memory, and it will still be destructed when the time comes. (In this particular example you can actually .clear() it and start again adding data to a new buffer.)
So, even if ints had any sort of concept of this (they don't; they encapsulate no indirectly-stored data, and own no resources; they have no constructors, so they also have no constructors taking int&&), the delete "touch"ing them would be entirely safe. And, more generally, none of this depends on whether the thing you've moved from is a member or not.
More generally, if you had a type T, and an object of that type, and you moved from it, and one of the constraints for T was that you couldn't delete it after moving from it, that would be a bug in T. That would be a serious mistake by the author of T. Your objects all need to be destructible. The mistake could manifest as a compilation failure or, more likely, undefined behaviour, depending on what exactly the bug was.
tl;dr: Yes, this is safe, for several reasons.
std::move is a cast to an rvalue-reference, which primarily changes which constructor/assignment operator overload is chosen. In your example the move-constructor is the default generated move-constructor, which just copies the ints over so nothing happens.
Whether or not this generally safe depends on the way your classes implement move construction/assignment. Assume for example that your class instead held a pointer. You would have to set that to nullptr in the moved-from class to avoid destroying the pointed-to data, if the moved-from class is destroyed.
Because just defining move-semantics is a custom way almost always leads to problems, the rule of five says that if you customize any of:
the copy constructor
the copy assignment operator
the move constructor
the move assignment operator
the destructor
you should usually customize all to ensure that they behave consistently with the expectations a caller would usually have for your class.
Copy constructors were traditionally ubiquitous in C++ programs. However, I'm doubting whether there's a good reason to that since C++11.
Even when the program logic didn't need copying objects, copy constructors (usu. default) were often included for the sole purpose of object reallocation. Without a copy constructor, you couldn't store objects in a std::vector or even return an object from a function.
However, since C++11, move constructors have been responsible for object reallocation.
Another use case for copy constructors was, simply, making clones of objects. However, I'm quite convinced that a .copy() or .clone() method is better suited for that role than a copy constructor because...
Copying objects isn't really commonplace. Certainly it's sometimes necessary for an object's interface to contain a "make a duplicate of yourself" method, but only sometimes. And when it is the case, explicit is better than implicit.
Sometimes an object could expose several different .copy()-like methods, because in different contexts the copy might need to be created differently (e.g. shallower or deeper).
In some contexts, we'd want the .copy() methods to do non-trivial things related to program logic (increment some counter, or perhaps generate a new unique name for the copy). I wouldn't accept any code that has non-obvious logic in a copy constructor.
Last but not least, a .copy() method can be virtual if needed, allowing to solve the problem of slicing.
The only cases where I'd actually want to use a copy constructor are:
RAII handles of copiable resources (quite obviously)
Structures that are intended to be used like built-in types, like math vectors or matrices -
simply because they are copied often and vec3 b = a.copy() is too verbose.
Side note: I've considered the fact that copy constructor is needed for CAS, but CAS is needed for operator=(const T&) which I consider redundant basing on the exact same reasoning;
.copy() + operator=(T&&) = default would be preferred if you really need this.)
For me, that's quite enough incentive to use T(const T&) = delete everywhere by default and provide a .copy() method when needed. (Perhaps also a private T(const T&) = default just to be able to write copy() or virtual copy() without boilerplate.)
Q: Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Specifically, am I correct in that move constructors took over the responsibility of object reallocation in C++11 completely? I'm using "reallocation" informally for all the situations when an object needs to be moved someplace else in the memory without altering its state.
The problem is what is the word "object" referring to.
If objects are the resources that variables refers to (like in java or in C++ through pointers, using classical OOP paradigms) every "copy between variables" is a "sharing", and if single ownership is imposed, "sharing" becomes "moving".
If objects are the variables themselves, since each variables has to have its own history, you cannot "move" if you cannot / don't want to impose the destruction of a value in favor of another.
Cosider for example std::strings:
std::string a="Aa";
std::string b=a;
...
b = "Bb";
Do you expect the value of a to change, or that code to don't compile? If not, then copy is needed.
Now consider this:
std::string a="Aa";
std::string b=std::move(a);
...
b = "Bb";
Now a is left empty, since its value (better, the dynamic memory that contains it) had been "moved" to b. The value of b is then chaged, and the old "Aa" discarded.
In essence, move works only if explicitly called or if the right argument is "temporary", like in
a = b+c;
where the resource hold by the return of operator+ is clearly not needed after the assignment, hence moving it to a, rather than copy it in another a's held place and delete it is more effective.
Move and copy are two different things. Move is not "THE replacement for copy". It an more efficient way to avoid copy only in all the cases when an object is not required to generate a clone of itself.
Short anwer
Is the above reasoning correct or am I missing any good reasons why logic objects actually need or somehow benefit from copy constructors?
Automatically generated copy constructors are a great benefit in separating resource management from program logic; classes implementing logic do not need to worry about allocating, freeing or copying resources at all.
In my opinion, any replacement would need to do the same, and doing that for named functions feels a bit weird.
Long answer
When considering copy semantics, it's useful to divide types into four categories:
Primitive types, with semantics defined by the language;
Resource management (or RAII) types, with special requirements;
Aggregate types, which simply copy each member;
Polymorphic types.
Primitive types are what they are, so they are beyond the scope of the question; I'm assuming that a radical change to the language, breaking decades of legacy code, won't happen. Polymorphic types can't be copied (while maintaining the dynamic type) without user-defined virtual functions or RTTI shenanigans, so they are also beyond the scope of the question.
So the proposal is: mandate that RAII and aggregate types implement a named function, rather than a copy constructor, if they should be copied.
This makes little difference to RAII types; they just need to declare a differently-named copy function, and users just need to be slightly more verbose.
However, in the current world, aggregate types do not need to declare an explicit copy constructor at all; one will be generated automatically to copy all the members, or deleted if any are uncopyable. This ensures that, as long as all the member types are correctly copyable, so is the aggregate.
In your world, there are two possibilities:
Either the language knows about your copy-function, and can automatically generate one (perhaps only if explicitly requested, i.e. T copy() = default;, since you want explicitness). In my opinion, automatically generating named functions based on the same named function in other types feels more like magic than the current scheme of generating "language elements" (constructors and operator overloads), but perhaps that's just my prejudice speaking.
Or it's left to the user to correctly implement copying semantics for aggregates. This is error-prone (since you could add a member and forget to update the function), and breaks the current clean separation between resource management and program logic.
And to address the points you make in favour:
Copying (non-polymorphic) objects is commonplace, although as you say it's less common now that they can be moved when possible. It's just your opinion that "explicit is better" or that T a(b); is less explicit than T a(b.copy());
Agreed, if an object doesn't have clearly defined copy semantics, then it should have named functions to cover whatever options it offers. I don't see how that affects how normal objects should be copied.
I've no idea why you think that a copy constructor shouldn't be allowed to do things that a named function could, as long as they are part of the defined copy semantics. You argue that copy constructors shouldn't be used because of artificial restrictions that you place on them yourself.
Copying polymorphic objects is an entirely different kettle of fish. Forcing all types to use named functions just because polymorphic ones must won't give the consistency you seem to be arguing for, since the return types would have to be different. Polymorphic copies will need to be dynamically allocated and returned by pointer; non-polymorphic copies should be returned by value. In my opinion, there is little value in making these different operations look similar without being interchangable.
One case where copy constructors come in useful is when implementing the strong exception guarantees.
To illustrate the point, let's consider the resize function of std::vector. The function might be implemented roughly as follows:
void std::vector::resize(std::size_t n)
{
if (n > capacity())
{
T *newData = new T [n];
for (std::size_t i = 0; i < capacity(); i++)
newData[i] = std::move(m_data[i]);
delete[] m_data;
m_data = newData;
}
else
{ /* ... */ }
}
If the resize function were to have a strong exception guarantee we need to ensure that, if an exception is thrown, the state of the std::vector before the resize() call is preserved.
If T has no move constructor, then we will default to the copy constructor. In this case, if the copy constructor throws an exception, we can still provide strong exception guarantee: we simply delete the newData array and no harm to the std::vector has been done.
However, if we were using the move constructor of T and it threw an exception, then we have a bunch of Ts that were moved into the newData array. Rolling this operation back isn't straight-forward: if we try to move them back into the m_data array the move constructor of T may throw an exception again!
To resolve this issue we have the std::move_if_noexcept function. This function will use the move constructor of T if it is marked as noexcept, otherwise the copy constructor will be used. This allows us to implement std::vector::resize in such a way as to provide a strong exception guarantee.
For completeness, I should mention that C++11 std::vector::resize does not provide a strong exception guarantee in all cases. According to www.cplusplus.com we have the the follow guarantees:
If n is less than or equal to the size of the container, the function never throws exceptions (no-throw guarantee).
If n is greater and a reallocation happens, there are no changes in the container in case of exception (strong guarantee) if the type of the elements is either copyable or no-throw moveable.
Otherwise, if an exception is thrown, the container is left with a valid state (basic guarantee).
Here's the thing. Moving is the new default- the new minimum requirement. But copying is still often a useful and convenient operation.
Nobody should bend over backwards to offer a copy constructor anymore. But it is still useful for your users to have copyability if you can offer it simply.
I would not ditch copy constructors any time soon, but I admit that for my own types, I only add them when it becomes clear I need them- not immediately. So far this is very, very few types.
Coming from a Java background, I am trying to understand pointers/references in C++. I am trying to return a vector from a function. Writing:
vector<char*> f(){
vector<char*> vec;
return vec;
}
would return the copy of the vector, correct? A better way would be to return a pointer to vector like this:
vector<char*>* f(){
vector<char*>* vec = new vector<char*>;
return vec;
}
Am I correct, or is this totally wrong?
In C++03 the returning by value most likely leads to RVO (Return Value Optimization) which will elide the unnecessary copy. In C++11 move semantics will take care of the copy.
So, why return by value in the first place? Because it prevents unnecessary objects with dynamic lifetimes. Your example code also doesn't respect any allocation policy a user of your function might want to use.
In general, returning a container is even in C++11 still a bad idea: It restricts users to that specific container as it is not possible to move across containers, only to copy. The standard library solves this problem with OutputIteratorS. Your algorithm would most likely be written as:
template<typename OutputIterator>
OutputIterator f(OutputIterator o);
This way you abstract away from the container and also circumvent the original problem.
You're wrong, you do not want to do this in C++. Pretty much every C++ compiler out there has what's called Named Return Value Optimization, which will (effectively) cause the vec to be moved, not copied, by allocating space for the return value on the stack, which is then basically constructed "in place". This eliminates the overhead.
The Wikipedia article on this gives a reasonable rundown.
Am I correct, or is this totally wrong?
This is totally wrong, at least in C++11 where move semantics exists, and as long as you do not need to create aliases of the value you return (which does not seem to be your case and, even if it were, would likely require the use of smart pointers rather than raw pointers).
Returning a vector by value is OK now. Most of the time, even in C++98, the compiler would elide the call to the copy constructor anyway (and to the move constructor in C++11). This is called the (Named) Return Value Optimization.
In C++11, all the containers of the Standard Library support move constructors, so even when a copy or move is not elided, returning a container by value is not expensive.
I've written a simple linked list because a recent interview programming challenge showed me how rusty my C++ has gotten. On my list I declared a private copy constructor because I wanted to explicitly avoid making any copies (and of course, laziness). I ran in to some trouble when I wanted to return an object by value that owns one of my lists.
class Foo
{
MyList<int> list; // MyList has private copy constructor
public:
Foo() {};
};
class Bar
{
public:
Bar() {};
Foo getFoo()
{
return Foo();
}
};
I get a compiler error saying that MyList has a private copy constructor when I try to return a Foo object by value. Should Return-Value-Optimization negate the need for any copying? Am I required to write a copy constructor? I'd never heard of move constructors until I started looking for solutions to this problem, is that the best solution? If so, I'll have to read up on them. If not, what is the preferred way to solve this problem?
The standard explicitly states that the constructor still needs to be accessible, even if it is optimized away. See 12.8/32 in a recent draft.
I prefer making an object movable and non-copyable in such situations. It makes ownership very clear and explicit.
Otherwise, your users can always use a shared_ptr. Hiding shared ownership is at best a questionable idea (unless you can guarantee all your values are immutable).
The basic problem is that return by value might copy. The C++ implementation is not required by the standard to apply copy-elision where it does apply. That's why the object still has to be copyable: so that the implementation's decision when to use it doesn't affect whether the code is well-formed.
Anyway, it doesn't necessarily apply to every copy that the user might like it to. For example there is no elision of copy assignment.
I think your options are:
implement a proper copy. If someone ends up with a slow program due to copying it then their profiler will tell them, you don't have to make it your job to stop them if you don't want to.
implement a proper move, but no copy (C++11 only).
change getFoo to take a Foo& (or maybe Foo*) parameter, and avoid a copy by somehow mutating their object. An efficient swap would come in handy for that. This is fairly pointless if getFoo really returns a default-constructed Foo as in your example, since the caller needs to construct a Foo before they call getFoo.
return a dynamically-allocated Foo wrapped in a smart pointer: either auto_ptr or unique_ptr. Functions defined to create an object and transfer sole ownership to their caller should not return shared_ptr since it has no release() function.
provide a copy constructor but make it blow up somehow (fail to link, abort, throw an exception) if it's ever used. The problems with this are (1) it's doomed to fail but the compiler says nothing, (2) you're enforcing quality of implementation, so your class doesn't work if someone deliberately disables RVO for whatever reason.
I may have missed some.
The solution would be implementing your own copy constructor that would use other methods of MyList to implement the copy semantics.
... I wanted to explicitly avoid making any copies
You have to choose. Either you can't make copies of an object, like std::istream; then you have to hold such objects in pointers/references, since these can be copied (in C++11, you can use move semantics instead). Or you implement the copy constructor, which is probably easier then solving problems on each place a copy is needed.