C++ unique_ptr argument passing

Suppose I have the following code:
class B { /* */ };
class A {
vector<B*> vb;
public:
void add(B* b) { vb.push_back(b); }
};
int main() {
A a;
B* b(new B());
a.add(b);
}
Suppose that in this case, all raw pointers B* can be handled through unique_ptr<B>.
Surprisingly, I wasn't able to find how to convert this code using unique_ptr. After a few tries, I came up with the following code, which compiles:
class A {
vector<unique_ptr<B>> vb;
public:
void add(unique_ptr<B> b) { vb.push_back(move(b)); }
};
int main() {
A a;
unique_ptr<B> b(new B());
a.add(move(b));
}
So my simple question: is this the way to do it and in particular, is move(b) the only way to do it? (I was thinking of rvalue references but I don't fully understand them.)
And if you have a link with complete explanations of move semantics, unique_ptr, etc. that I was not able to find, don't hesitate to share it.
EDIT According to http://thbecker.net/articles/rvalue_references/section_01.html, my code seems to be OK.
Actually, std::move is just syntactic sugar. With object x of class X, move(x) is just the same as:
static_cast <X&&>(x)
These 2 move functions are needed because casting to a rvalue reference:
prevents function "add" from passing by value
makes push_back use the default move constructor of B
Apparently, I do not need the second std::move in my main() if I change my "add" function to pass by reference (ordinary lvalue ref).
I would like some confirmation of all this, though...
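To make the "std::move is just a cast" point concrete, here is a small sketch (the function names are made up for illustration). It suggests that std::move by itself transfers nothing; ownership only moves when a move constructor or move assignment actually consumes the resulting rvalue.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// True if p still owns its object after std::move(p) was merely bound to
// an rvalue reference: no move constructor or move assignment has run.
bool owns_after_cast_only() {
    std::unique_ptr<int> p(new int(42));
    std::unique_ptr<int>&& r = std::move(p);   // only a cast
    (void)r;
    return p != nullptr;
}

// True if p still owns its object after another unique_ptr was
// move-constructed from it through an explicit static_cast.
bool owns_after_move_construction() {
    std::unique_ptr<int> p(new int(42));
    std::unique_ptr<int> q(static_cast<std::unique_ptr<int>&&>(p));  // same as std::move(p)
    (void)q;
    return p != nullptr;
}

void demo_move_is_cast() {
    assert(owns_after_cast_only());            // the cast alone moved nothing
    assert(!owns_after_move_construction());   // the move constructor emptied p
}
```

Calling demo_move_is_cast() from main should pass both assertions.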

I am somewhat surprised that this is not answered very clearly and explicitly here, nor on any place I easily stumbled upon. While I'm pretty new to this stuff, I think the following can be said.
The situation is a calling function that builds a unique_ptr<T> value (possibly by casting the result from a call to new), and wants to pass it to some function that will take ownership of the object pointed to (by storing it in a data structure for instance, as happens here into a vector). To indicate that ownership has been obtained by the caller, and that it is ready to relinquish it, passing a unique_ptr<T> value is appropriate. There are, as far as I can see, three reasonable modes of passing such a value.
1. Passing by value, as in add(unique_ptr<B> b) in the question.
2. Passing by non-const lvalue reference, as in add(unique_ptr<B>& b)
3. Passing by rvalue reference, as in add(unique_ptr<B>&& b)
Passing by const lvalue reference would not be reasonable, since it does not allow the called function to take ownership (and const rvalue reference would be even more silly than that; I'm not even sure it is allowed).
As far as valid code goes, options 1 and 3 are almost equivalent: they force the caller to write an rvalue as argument to the call, possibly by wrapping a variable in a call to std::move (if the argument is already an rvalue, i.e., unnamed as in a cast from the result of new, this is not necessary). In option 2 however, passing an rvalue (possibly from std::move) is not allowed, and the function must be called with a named unique_ptr<T> variable (when passing a cast from new, one has to assign to a variable first).
When std::move is indeed used, the variable holding the unique_ptr<T> value in the caller is conceptually dereferenced (converted to rvalue, respectively cast to rvalue reference), and ownership is given up at this point. In option 1 the dereferencing is real, and the value is moved to a temporary that is passed to the called function (if the called function were to inspect the variable in the caller, it would find that it already holds a null pointer). Ownership has been transferred, and there is no way the called function could decide not to accept it (doing nothing with the argument causes the pointed-to value to be destroyed at function exit; calling the release method on the argument would prevent this, but would just result in a memory leak). Surprisingly, options 2 and 3 are semantically equivalent during the function call, although they require different syntax for the caller. If the called function passes the argument on to another function taking an rvalue (such as the push_back method), std::move must be inserted in both cases, which will transfer ownership at that point. Should the called function forget to do anything with the argument, then the caller will find himself still owning the object if holding a name for it (as is obligatory in option 2); this happens in case 3 too, in spite of the fact that the function prototype asked the caller to agree to the release of ownership (by either calling std::move or supplying a temporary). In summary, the methods do the following:
1. Forces the caller to give up ownership, and is sure to actually claim it.
2. Forces the caller to possess ownership, and to be prepared (by supplying a non-const reference) to give it up; however this is not explicit (no call of std::move is required or even allowed), nor is taking away ownership assured. I would consider this method rather unclear in its intention, unless it is explicitly intended that taking ownership or not is at the discretion of the called function (some use can be imagined, but callers need to be aware).
3. Forces the caller to explicitly indicate giving up ownership, as in 1 (but the actual transfer of ownership is delayed until after the moment of the function call).
Option 3 is fairly clear in its intention; provided ownership is actually taken, it is for me the best solution. It is slightly more efficient than 1 in that no pointer values are moved to temporaries (the calls to std::move are in fact just casts and cost nothing); this might be especially relevant if the pointer is handed through several intermediate functions before its contents are actually moved.
Here is some code to experiment with.
#include <iostream>
#include <memory>
#include <utility>
#include <vector>
class B
{
unsigned long val;
public:
B(const unsigned long& x) : val(x)
{ std::cout << "storing " << x << std::endl;}
~B() { std::cout << "dropping " << val << std::endl;}
};
typedef std::unique_ptr<B> B_ptr;
class A {
std::vector<B_ptr> vb;
public:
void add(B_ptr&& b)
{ vb.push_back(std::move(b)); } // or even better use emplace_back
};
void f() {
A a;
B_ptr b(new B(123)),c;
a.add(std::move(b));
std::cout << "---" <<std::endl;
a.add(B_ptr(new B(4567))); // unnamed argument does not need std::move
}
As written, output is
storing 123
---
storing 4567
dropping 123
dropping 4567
Note that values are destroyed in the order in which they are stored in the vector. Try changing the prototype of the method add (adapting other code if necessary to make it compile), and whether or not it actually passes on its argument b. Several permutations of the lines of output can be obtained.
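The difference between options 1 and 3 when the called function does nothing with its argument can be checked with a sketch like the following. The two ignore_* functions are hypothetical stand-ins for an add that forgets to store its argument.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct B { int val; explicit B(int v) : val(v) {} };
using B_ptr = std::unique_ptr<B>;

// Option 1: by value. The argument is moved into the parameter at the
// call, so the object dies when the function returns, even though the
// body ignores it.
void ignore_by_value(B_ptr) {}

// Option 3: by rvalue reference. Only a reference is bound; since the
// body never moves from it, the caller's pointer is left untouched.
void ignore_by_rvalue_ref(B_ptr&&) {}

bool caller_keeps_object_by_value() {
    B_ptr b(new B(1));
    ignore_by_value(std::move(b));
    return b != nullptr;
}

bool caller_keeps_object_by_rvalue_ref() {
    B_ptr b(new B(2));
    ignore_by_rvalue_ref(std::move(b));
    return b != nullptr;
}

void demo_option_1_vs_3() {
    assert(!caller_keeps_object_by_value());      // ownership was taken regardless
    assert(caller_keeps_object_by_rvalue_ref());  // caller still owns the object
}
```

So with option 3 the std::move in the caller is only a statement of intent; whether ownership really leaves the caller depends on what the called function does.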

Yes, this is how it should be done. You are explicitly transferring ownership from main to A. This is basically the same as your previous code, except it's more explicit and vastly more reliable.

So my simple question: is this the way to do it and in particular, is this "move(b)" the only way to do it? (I was thinking of rvalue references but I don't fully understand it so...)
And if you have a link with complete explanations of move semantics, unique_ptr... that I was not able to find, don't hesitate.
Shameless plug, search for the heading "Moving into members". It describes exactly your scenario.

Your code in main could be simplified a little, since C++14:
a.add( make_unique<B>() );
where you can put arguments for B's constructor inside the inner parentheses.
You could also consider a class member function that takes ownership of a raw pointer:
void take(B *ptr) { vb.emplace_back(ptr); }
and the corresponding code in main would be:
a.take( new B() );
Another option is to use perfect forwarding for adding vector members:
template<typename... Args>
void emplace(Args&&... args)
{
vb.emplace_back( std::make_unique<B>(std::forward<Args>(args)...) );
}
and the code in main:
a.emplace();
where, as before, you could put constructor arguments for B inside the parentheses.
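Putting the make_unique and perfect-forwarding variants together, a complete class might look roughly like this. It is only a sketch: the B used here, with its id member, and the size and id_at helpers are assumptions added so that the result can be inspected. It needs C++14 for std::make_unique.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct B {
    int id;
    explicit B(int i = 0) : id(i) {}
};

class A {
    std::vector<std::unique_ptr<B>> vb;
public:
    // Sink: takes ownership of an already existing unique_ptr.
    void add(std::unique_ptr<B> b) { vb.push_back(std::move(b)); }

    // Forwards constructor arguments and creates the B itself.
    template <typename... Args>
    void emplace(Args&&... args) {
        vb.emplace_back(std::make_unique<B>(std::forward<Args>(args)...));
    }

    std::size_t size() const { return vb.size(); }
    int id_at(std::size_t i) const { return vb[i]->id; }
};

void demo_add_and_emplace() {
    A a;
    a.add(std::make_unique<B>(7));  // temporary: no std::move needed
    a.emplace(9);                   // constructs B(9) inside emplace
    a.emplace();                    // constructs B() with the default id
    assert(a.size() == 3);
    assert(a.id_at(0) == 7 && a.id_at(1) == 9 && a.id_at(2) == 0);
}
```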

Related

Using make_shared with emplace_back and push_back - any difference?

some_vector.push_back(make_shared<ClassName>());
some_vector.emplace_back(make_shared<ClassName>());
I want to check that my understanding is correct: for make_shared, and in general for any other function that returns an object, those two calls are identical. Here make_shared will create a new shared_ptr, and then this pointer will be moved into the container both in push_back and in emplace_back. Is this correct, or will there be some difference?
vector<T>::push_back has a T&& overload, which does the same as the vector<T>::emplace_back T&& version.
The difference is that emplace_back will perfect-forward any set of arguments to the T's constructor, while push_back only takes T&& or T const&. When you actually pass a T&& or T const& the standard specification of their behaviour is the same.
I want to add a small detail to Yakk's answer.
The forwarding of arguments in the emplace_back case can introduce horrible bugs, even for vectors of shared pointers, if it is not used with special care. See for instance:
#include <vector>
#include <memory>
struct SimpleStruct {};
auto main() -> int
{
std::vector<std::shared_ptr<SimpleStruct>> v;
SimpleStruct a;
v.emplace_back(std::addressof(a)); // compiles, UB
v.push_back(std::addressof(a)); // fails to compile
}
Yes, that's a somewhat extreme example, since code like this should always be used with special care or questioned in general, but it emphasizes that one should use emplace_back only if the object to be copied isn't already at hand and its only purpose is to be added to the vector, and use push_back for all common copy/move-construction cases. It would be nice if the language/standard library could enforce that for emplace_back from the outset, i.e. only accept the custom non-copy/move constructors in order to have this clear separation, but even if that were possible in an acceptable way, it would conflict with many template-context scenarios (forwarding), and the error-prone usage would still be possible, although a bit more explicit.
Following my example above, code refactoring is an important point here. Simply imagine that the previous code used raw pointers, i.e. the actual underlying bug was already present there and hidden by the use of emplace_back. It would also have been hidden by the use of push_back there, but no longer as soon as you update your code to use shared pointers.
Even if it's not relevant for your particular use case, I think it's worth mentioning here, since one should be totally confident about the underlying differences between both methods.
Thanks to Human-Compiler in the comments for pointing out the wrong terminology I used previously.
To understand this problem, let's first consider what the result of calling std::make_shared<class_type>() is.
The call expression is a prvalue; when it is bound to a reference parameter, a temporary object is materialized, and that temporary is treated as an xvalue, an eXpiring value whose resources can be reused. Now let's look at both cases.
some_vector.push_back(make_shared<ClassName>());
std::vector has two overloads of push_back, and one of them accepts an rvalue reference, namely constexpr void push_back( T&& value );
It means the value is moved into the new element, but how? The rvalue overload of push_back move-constructs the new element by invoking shared_ptr( shared_ptr&& r ) noexcept; so ownership is taken from r and r becomes empty.
some_vector.emplace_back(make_shared<ClassName>());
In emplace_back( Args&&... args ) the element is constructed through std::allocator_traits::construct by perfect forwarding args... via std::forward<Args>(args)..., which means the rvalue is forwarded as such and the same move constructor shared_ptr( shared_ptr&& r ) noexcept; is invoked.
The conclusion is that both push_back and emplace_back have the same effect.
A related point is that the compiler can avoid temporaries altogether in some contexts: when an object is initialized directly from a prvalue of the same type, it is constructed in place instead of being created as a temporary and then moved (this elision is guaranteed since C++17). This does not remove the move in the vector calls above, because there the argument is first bound to a reference parameter, but that holds equally for both calls.
Again the result is the same in both cases.
Below is code that demonstrates this copy elision; as you can see, the output shows only one constructor call.
#include <iostream>
using std::cout;
using std::endl;
class Object{
public:
explicit Object(int );
Object(const Object& );
Object(Object&& );
};
Object::Object(int ){
cout<< __PRETTY_FUNCTION__<< endl;
}
Object::Object(const Object& ){
cout<< __PRETTY_FUNCTION__<< endl;
}
Object::Object(Object&& ){
cout<< __PRETTY_FUNCTION__<< endl;
}
int main(){
[[maybe_unused]] Object obj(Object(1));
}
Output:
Object::Object(int)
In libstdc++, for example, when an rvalue is passed as in some_vector.push_back(make_shared<ClassName>());, push_back simply calls emplace_back:
void push_back(value_type&& __x)
{ emplace_back(std::move(__x)); }
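One way to check the claim that both calls end in the same move constructor is to count move constructions with an instrumented type. This is a rough sketch; Tracked and the helper functions are made up for the experiment, and reserve is used so that reallocation does not add extra moves.

```cpp
#include <cassert>
#include <vector>

struct Tracked {
    static int moves;
    static int copies;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::moves = 0;
int Tracked::copies = 0;

// Each helper returns how many move constructions the call caused.
int moves_for_push_back_temporary() {
    std::vector<Tracked> v;
    v.reserve(1);               // avoid reallocation noise
    Tracked::moves = 0;
    v.push_back(Tracked{});     // temporary binds to T&&, then is moved in
    return Tracked::moves;
}

int moves_for_emplace_back_temporary() {
    std::vector<Tracked> v;
    v.reserve(1);
    Tracked::moves = 0;
    v.emplace_back(Tracked{});  // forwarded as T&&, same move constructor
    return Tracked::moves;
}

int moves_for_emplace_back_no_args() {
    std::vector<Tracked> v;
    v.reserve(1);
    Tracked::moves = 0;
    v.emplace_back();           // element is default-constructed in place
    return Tracked::moves;
}

void demo_push_vs_emplace() {
    assert(moves_for_push_back_temporary() == 1);
    assert(moves_for_emplace_back_temporary() == 1);
    assert(moves_for_emplace_back_no_args() == 0);
    assert(Tracked::copies == 0);
}
```

The only case where emplace_back saves work is when it is given constructor arguments (here: none) rather than an already constructed object.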

Everything in c++ by default is passed by value

In C++, are all types passed by value unless it comes with a & or * symbol?
For example in Java, passing an array as a function argument would be by default passing by reference. Does C++ give you more control over this?
EDIT: Thanks for all your responses, I think I understand the whole pass-by-value thing more clearly. For anyone who is still confused about how Java passes by value (a copy of the object reference), this answer really cleared it up for me.
In C++, are all types passed by value unless it comes with a & or *
symbol?
No. If you pass something as a * parameter (that is, a pointer to it), the pointer is still passed by value: a copy of the pointer is made, but both the original and the copy point to the same memory. The concept is similar in C#, and I believe also in Java, except that you don't write * there.
That is why, if you make changes to the outer object using this pointer (e.g. by dereferencing it), the changes will be visible in the original object too.
But if you just assign a new value to the pointer, nothing will happen to the outer object, e.g.
void foo(int* ptr)
{
// ...
// Below, nothing happens to original object to which ptr was
// pointing, before function call, just ptr - the copy of original pointer -
// now points to a different object
ptr = &someObj;
// ...
}
For example in Java, passing an array as a function argument would be
by default passing by reference. Does C++ give you more control over
this?
In C++ or C if you pass array (e.g. int arr[]), what is being passed is treated as pointer to the first element of the array. Hence, what I said above holds true in this case too.
About & you are correct. You can even apply & to pointers (e.g., int *&), in which case now, the pointer indeed gets passed by reference - there is no copy made.
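The three cases discussed here (reseating a copied pointer, writing through it, and passing the pointer itself by reference) can be tried out with a sketch like this one. The global other and the function names are only there for illustration.

```cpp
#include <cassert>

int other = 99;  // hypothetical object used as the new pointee

// The pointer is copied: reseating it affects only the local copy.
void reseat_copy(int* ptr) { ptr = &other; }

// Writing through the copied pointer still changes the caller's object.
void write_through(int* ptr) { *ptr = 7; }

// The pointer itself is passed by reference: the caller's pointer changes.
void reseat_ref(int*& ptr) { ptr = &other; }

void demo_pointer_passing() {
    int x = 1;
    int* p = &x;

    reseat_copy(p);
    assert(p == &x);       // caller's pointer is unchanged

    write_through(p);
    assert(x == 7);        // the pointed-to object was modified

    reseat_ref(p);
    assert(p == &other);   // caller's pointer now points elsewhere
    assert(*p == 99);
}
```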
Probably tangential to your question, but I often take another direction to understand what happens when you call a function in C++.
The difference between
void foo(Bar bar); // [1]
void foo(Bar& bar); // [2]
void foo(Bar* bar); // [3]
is that the body in [1] will receive a copy of the original bar (we call this by value, but I prefer to think of it as my own copy).
The body of [2] will be working with the exact same bar object; no copies. Whether we can modify that bar object depends on whether the argument was Bar& bar (as illustrated) or const Bar& bar. Notice that in a well-formed program,[2] will always receive an object (no null references; let's leave dangling references aside).
The body of [3] will receive a copy of the pointer to the original bar. Whether or not I can modify the pointer and/or the object being pointed depends on whether the argument was const Bar* bar, const Bar* const bar, Bar* const bar, or Bar* bar (yes, really). The pointer may or may not be null.
The reason why I make this mental distinction is because a copy of the object may or may not have reference semantics. For example: a copy of an instance of this class:
struct Foo {
std::shared_ptr<FooImpl> m_pimpl;
};
would, by default, have the same "contents" as the original one (a new shared pointer pointing to the same FooImpl object). This, of course, depends on how the programmer designed the class.
For that reason I prefer to think of [1] as "takes a copy of bar", and if I need to know whether such a copy will be what I want and need, I go and study the class directly to understand what that class in particular means by copy.
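Here is a small sketch of that situation, with a made-up FooImpl and a hypothetical set_value function: the function receives its own copy of Foo, and yet the caller observes the modification, because the copy shares the FooImpl.

```cpp
#include <cassert>
#include <memory>

struct FooImpl { int value = 0; };

// Copying Foo copies the shared_ptr, not the FooImpl it points to.
struct Foo {
    std::shared_ptr<FooImpl> m_pimpl = std::make_shared<FooImpl>();
};

// Takes "its own copy" of foo, yet writes through to the shared FooImpl.
void set_value(Foo foo, int v) { foo.m_pimpl->value = v; }

void demo_shared_copy() {
    Foo original;
    set_value(original, 42);                    // passed by value
    assert(original.m_pimpl->value == 42);      // the caller still sees the change
    assert(original.m_pimpl.use_count() == 1);  // the copy is gone again
}
```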

Good practice: Constant to non-constant cast

When a function doesn't modify an object argument, I always make it ask for a constant reference even if the referenced object isn't really constant. Is this wrong?
For a wrapper class, I'd like to write this:
template<class B>
class Wrapper{
private:
B* base_;
public:
Wrapper(const B& b) { base_ = const_cast<B*>(&b); }
void ModifyBase();
};
The constructor doesn't modify the base so it asks for a constant reference.
The wrapper has some methods that will need to modify the base, so it needs to store a non-constant pointer (thus the conversion).
I feel my solution is not the best.
Is there a better way to do this?
Is there any accepted convention?
When you choose your parameter to be a const reference, you're telling the user "You can trust that if you pass me an object, it will not get modified [through this reference]†." You should do that as often as possible, because the user can understand more about what your function will and won't do just from looking at the types. Also, passing around mutable references can lead to code that is difficult to reason about.
However, in your question, your const is not telling the truth. The constructor casts away the constness and stores a non-const pointer, which means the object may very well get modified. You lied to the user! It doesn't matter that the constructor itself does nothing to the object. It allows it to be modified by other member functions. This is bad behaviour. Your constructor should not take a const reference.
Not only that, but your current implementation allows undefined behaviour. Even if an object that was originally declared as const is given to your Wrapper, it doesn't care. It casts away its constness and allows the other member functions to modify it. Modifying an object that was originally const is undefined behaviour.
† See 6502's comment
It doesn't really matter that the ctor won't alter the object in the ctor; what happens after the ctor is done is why you need a non-const object pointer to B. So it has to do with ownership and lifetime of the B object passed in: if you want to take ownership (via the & reference), then the object must be non-const because it can be altered. If you simply want to copy the B object passed in, then don't use a reference; pass by value and store a pointer to the copy.
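A minimal sketch of the honest version of the wrapper follows. Counter is a hypothetical base type, and ModifyBase is given an arbitrary body just so the effect is visible. Taking B& means that const objects and temporaries are rejected at compile time instead of leading to undefined behaviour later.

```cpp
#include <cassert>
#include <type_traits>

struct Counter {   // hypothetical base type for illustration
    int n = 0;
};

template <class B>
class Wrapper {
    B* base_;
public:
    // Non-const reference: the signature admits that the object may be
    // modified later through the wrapper.
    explicit Wrapper(B& b) : base_(&b) {}
    void ModifyBase() { ++base_->n; }   // assumes B has a member n
};

// A const Counter cannot be wrapped any more.
static_assert(!std::is_constructible<Wrapper<Counter>, const Counter&>::value,
              "const objects must not be accepted");

void demo_wrapper() {
    Counter c;
    Wrapper<Counter> w(c);
    w.ModifyBase();
    assert(c.n == 1);   // the wrapped object itself was modified
}
```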

const parameter vs const reference parameter

Implementation 1:
foo(const Bar x);
Implementation 2:
foo(const Bar & x);
If the object will not be changed within the function, why would you ever copy it(implementation 1).
Will this be automatically optimized by the compiler?
Summary: Even though the object is declared as const in the function declaration, it is still possible that the object be edited via some other alias &.
If you are the person writing the library and know that your functions don't do that, or that the object is big enough to justify the dereferencing cost on every operation, then
foo(const Bar & x); is the way to go.
Part 2:
Will this be automatically optimized by the compiler?
Since we established that they are not always equivalent, and the conditions for equivalence are non-trivial, it would generally be very hard for the compiler to ensure them, so almost certainly no.
you ask,
“If the object will not be changed within the function, why would you ever copy it(implementation 1).”
well there are some bizarre situations where an object passed by reference might be changed by other code, e.g.
namespace g { int x = 666; }
void bar( int ) { g::x = 0; }
int foo( int const& a ) { assert( a != 0 ); bar( a ); return 1000/a; } // Oops
int main() { foo( g::x ); }
this has never happened to me though, since the mid 1990s.
so, this aliasing is a theoretical problem for the single argument of that type.
with two arguments of the same type it gets more of a real possibility. for example, an assignment operator might get passed the object that it's called on. when the argument is passed by value (as in the minimal form of the swap idiom) it's no problem, but if not then self-assignment generally needs to be avoided.
you further ask,
“Will this be automatically optimized by the compiler?”
no, not in general, for the above mentioned reason
the compiler can generally not guarantee that there will be no aliasing for a reference argument (one exception, though, is where the machine code of a call is inlined)
however, on the third hand, the language could conceivably have supported the compiler in this, e.g. by providing the programmer with a way to explicitly accept any such optimization, like, a way to say ”this code is safe to optimize by replacing pass by value with pass by reference, go ahead as you please, compiler”
Indeed, in those circumstances you would normally use method 2.
Typically, you would only use method 1 if the object is tiny, so that it's cheaper to copy it once than to pay to access it repeatedly through a reference (which also incurs a cost). In TC++PL, Stroustrup develops a complex number class and passes it around by value for exactly this reason.
It may be optimized in some circumstances, but there are plenty of things that can prevent it. The compiler can't avoid the copy if:
the copy constructor or destructor has side effects and the argument passed is not a temporary.
you take the address of x, or a reference to it, and pass it to some code that might be able to compare it against the address of the original.
the object might change while foo is running, for example because foo calls some other function that changes it. I'm not sure whether this is something you mean to rule out by saying "the object will not be changed within the function", but if not then it's in play.
You'd copy it if any of those things matters to your program:
if you want the side effects of copying, take a copy
if you want "your" object to have a different address from the user-supplied argument, take a copy
if you don't want to see changes made to the original during the running of your function, take a copy
You'd also copy it if you think a copy would be more efficient, which is generally assumed to be the case for "small" types like int. Iterators and predicates in standard algorithms are also taken by value.
Finally, if your code plans to copy the object anyway (including by assigning to an existing object) then a reasonable idiom is to take the copy as the parameter in the first place. Then move/swap from your parameter.
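The "take the copy as the parameter, then move from it" idiom from the last paragraph might look like this. Person and set_name are made-up names, and this is a sketch rather than a recommendation for every setter.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Person {   // hypothetical class for illustration
    std::string name_;
public:
    // The parameter is the copy; it is then moved into the member.
    void set_name(std::string name) { name_ = std::move(name); }
    const std::string& name() const { return name_; }
};

void demo_sink() {
    Person p;
    std::string s = "Ada";

    p.set_name(s);                     // lvalue: one copy, then a move
    assert(s == "Ada");                // the caller's string is untouched
    assert(p.name() == "Ada");

    p.set_name(std::string("Grace"));  // rvalue: the string is moved, not copied
    assert(p.name() == "Grace");
}
```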
What if the object is changed from elsewhere?
void f(const SomeType& s);
void g(const SomeType s);
int main() {
SomeType s;
std::thread t([&](){ /* s is non-const here, and we can modify it */ });
// we get a const reference to the object which we see as const,
// but others might not. So they can modify it.
f(s);
// we get a const *copy* of the object,
// so what anyone else might do to the original doesn't matter
g(s);
t.join(); // a joinable std::thread must be joined before it is destroyed
}
What if the object is const, but has mutable members? Then you can still modify the object, and so it's very important whether you have a copy or a reference to the original.
What if the object contains a pointer to another object? If s is const, the pointer will be const, but what it points to is not affected by the constness of s. But creating a copy will (hopefully) give us a deep copy, so we get our own (const) object with a separate (const) pointer pointing to a separate (non-const) object.
There are a number of cases where a const copy is different than a const reference.
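The points about mutable members and pointer members can be illustrated with a sketch like the one below. Cache and touch are hypothetical names. Note that the compiler-generated copy used here is shallow, so the pointee stays shared; a class has to implement a deep copy itself if that is what it wants.

```cpp
#include <cassert>

struct Cache {
    mutable int hits = 0;   // may change even through a const reference
    int* data;              // in a const Cache this is int* const, not const int*
    explicit Cache(int* d) : data(d) {}
};

// Both modifications compile although c is a reference to const.
void touch(const Cache& c) {
    ++c.hits;
    *c.data = 5;
}

void demo_shallow_const() {
    int payload = 1;
    const Cache c(&payload);
    Cache copy = c;          // member-wise copy: copies hits and the pointer
    touch(c);
    assert(c.hits == 1);     // the original saw the change to the mutable member
    assert(copy.hits == 0);  // the earlier copy did not
    assert(payload == 5);    // the pointee is shared by both (shallow copy)
}
```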

Why is it preferable to write func( const Class &value )?

Why would one use func( const Class &value ) rather than just func( Class value )? Surely modern compilers will do the most efficient thing using either syntax. Is this still necessary or just a hold over from the days of non-optimizing compilers?
Just to add, gcc will produce similar assembler code output for either syntax. Perhaps other compilers do not?
Apparently, this is just not the case. I had the impression from some code long ago that gcc did this, but experimentation proves this wrong. Credit is due to Michael Burr, whose answer to a similar question would be nominated if given here.
There are 2 large semantic differences between the 2 signatures.
The first is the use of & in the type name. This signals the value is passed by reference. Removing this causes the object to be passed by value, which will essentially pass a copy of the object into the function (via the copy constructor). For operations which simply need to read data (typical for a const &), doing a full copy of the object creates unnecessary overhead. For classes which are not small or are collections, this overhead is not trivial.
The second is the use of const. This prevents the function from accidentally modifying the contents of value via the value reference. It allows the caller some measure of assurance the value will not be mutated by the function. Yes, passing a copy gives the caller a much deeper assurance of this in many cases.
The first form doesn't create a copy of the object, it just passes a reference (pointer) to the existing copy. The second form creates a copy, which can be expensive. This isn't something that is optimized away: there are semantic differences between having a copy of an object vs. having the original, and copying requires a call to the class's copy constructor.
For very small classes (like <16 bytes) with no copy constructor it is probably more efficient to use the value syntax rather than pass references. This is why you see void foo(double bar) and not void foo(const double &var). But in the interests of not micro-optimizing code that doesn't matter, as a general rule you should pass all real-deal objects by reference and only pass built-in types like int and void * by value.
There is a huge difference which nobody has mentioned yet: object slicing. In some cases, you may need const& (or &) to get correct behavior.
Consider another class Derived which inherits from Class. In client code, you create an instance of Derived which you pass to func(). If you have func(const Class&), that same instance will get passed. As others have said, func(Class) will make a copy, you will have a new (temporary) instance of Class (not Derived) in func.
This difference in behavior (not performance) can be important if func in turn does a downcast. Compare the results of running the following code:
#include <cstdio>
#include <typeinfo>
struct Class
{
virtual void Foo() {};
};
class Derived : public Class {};
void f(const Class& value)
{
printf("f()\n");
try
{
const Derived& d = dynamic_cast<const Derived&>(value);
printf("dynamic_cast<>\n");
}
catch (std::bad_cast)
{
fprintf(stderr, "bad_cast\n");
}
}
void g(Class value)
{
printf("g()\n");
try
{
const Derived& d = dynamic_cast<const Derived&>(value);
printf("dynamic_cast<>\n");
}
catch (std::bad_cast)
{
fprintf(stderr, "bad_cast\n");
}
}
int main()
{
Derived d;
f(d);
g(d);
return 0;
}
Surely modern compilers will do the
most efficient thing using either
syntax
The compiler doesn't compile what you "mean", it compiles what you tell it to. Compilers are only smart for lower level optimizations and problems the programmer overlooks (such as computation inside a for loop, dead code etc).
What you tell the compiler to do in the second example, is to make a copy of the class - which it will do without thinking - even if you didn't use it, that's what you asked the compiler to do.
The first example explicitly asks the compiler to use the same variable, conserving space and precious cycles (no copy is needed). The const is there to guard against mistakes, since a plain Class &value can be written to (which is sometimes desired).
Here are the differences between some parameter declarations:
                            copied  out  modifiable
func(Class value)             Y      N       Y
func(const Class value)       Y      N       N
func(Class &value)            N      Y       Y
func(const Class &value)      N      N       N
where:
copied: a copy of the input parameter is made when the function is called
out: value is an "out" parameter, which means modifications made within func() will be visible outside the function after it returns
modifiable: value can be modified within func()
So the differences between func(Class value) and func(const Class &value) are:
The first one makes a copy of the input parameter (by calling the Class copy constructor), and allows code inside func() to modify value
The second one does not make a copy, and does not allow code inside func() to modify value
If you use func( const Class &value ) and then try to change value by accident, the compiler will give you an error.
If you use func( Class value ) and then try to change value, it won't.
Thus the const reference form makes it easier to catch mistakes.
The first example is pass by reference. Rather than pass the object itself, C++ will pass a reference to the object (references are generally implemented with pointers, so what is passed is likely the size of a pointer, typically 4 or 8 bytes). In the second example, the object is passed by value; if it is a big, complex object, then this is likely a fairly heavyweight operation, as it involves copy construction of a new "Class".
The reason that an optimizing compiler can't handle this for you is the issue of separate compilation. In C++, when the compiler is generating code for a caller, it may not have access to the code of the function itself. The most common calling convention that I know of usually has the caller invoke the copy-constructor which means it's not possible for the compilation of the function itself to prevent the copy constructor if it's not necessary.
The only time that passing a parameter by value is preferable is when you are going to copy the parameter anyway.
std::string toUpper( const std::string &value ) {
std::string retVal(value);
std::transform(retVal.begin(), retVal.end(), retVal.begin(), charToUpper());
return retVal;
}
Or
std::string toUpper( std::string value ) {
std::transform(value.begin(), value.end(), value.begin(), charToUpper());
return value;
}
In this case the second example is the same speed as the first if the value argument is a regular object (an lvalue), but faster if the value argument is an rvalue.
Although most compilers will do this optimisation already, I don't expect to rely on this feature till C++0x, especially since I expect it could confuse most programmers, who would probably change it back.
See Want Speed? Pass by Value. for a better explanation than I could give.