Implementation 1:
foo(const Bar x);
Implementation 2:
foo(const Bar & x);
If the object will not be changed within the function, why would you ever copy it (implementation 1)?
Will this be automatically optimized by the compiler?
Summary: Even though the parameter is declared const in the function declaration, the object it refers to can still be modified through some other, non-const alias.
If you are the person writing the library and know that your functions don't do that, or that the object is big enough to justify the dereferencing cost on every access, then
foo(const Bar & x); is the way to go.
Part 2:
Will this be automatically optimized by the compiler?
Since we established that they are not always equivalent, and the conditions for equivalence are non-trivial, it would generally be very hard for the compiler to verify them, so almost certainly no.
you ask,
“If the object will not be changed within the function, why would you ever copy it(implementation 1).”
well there are some bizarre situations where an object passed by reference might be changed by other code, e.g.
namespace g { int x = 666; }
void bar( int ) { g::x = 0; }
int foo( int const& a ) { assert( a != 0 ); bar( a ); return 1000/a; } // Oops
int main() { foo( g::x ); }
this has never happened to me though, since the mid 1990s.
so, this aliasing is a theoretical problem for the single argument of that type.
with two arguments of the same type it gets more of a real possibility. for example, an assignment operator might get passed the object that it's called on. when the argument is passed by value (as in the minimal form of the swap idiom) it's no problem, but if not then self-assignment generally needs to be avoided.
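The two-argument case can be sketched as follows. This is a minimal illustration, not code from the answer: the function names normalize_by_ref and normalize_by_value are hypothetical, and the caller deliberately passes an element of the vector as the divisor so that the reference parameter aliases the data being modified.

```cpp
#include <cassert>
#include <vector>

// Divides every element by `divisor`, taken by const reference.
// If `divisor` aliases an element of `v`, its value changes mid-loop.
void normalize_by_ref(std::vector<double>& v, const double& divisor) {
    for (double& e : v) e /= divisor;
}

// Same operation, but `divisor` is a private copy and cannot be aliased.
void normalize_by_value(std::vector<double>& v, double divisor) {
    for (double& e : v) e /= divisor;
}

void demo() {
    std::vector<double> a{2.0, 4.0, 8.0};
    normalize_by_ref(a, a[0]);            // a[0] becomes 1.0 after the first step
    assert(a[0] == 1.0);
    assert(a[1] == 4.0 && a[2] == 8.0);   // later elements were divided by 1.0

    std::vector<double> b{2.0, 4.0, 8.0};
    normalize_by_value(b, b[0]);          // divisor stays 2.0 throughout
    assert(b[0] == 1.0 && b[1] == 2.0 && b[2] == 4.0);
}
```

The compiler cannot rewrite the by-value version into the by-reference one here, because the two visibly compute different results for this call.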
you further ask,
“Will this be automatically optimized by the compiler?”
no, not in general, for the above mentioned reason
the compiler can generally not guarantee that there will be no aliasing for a reference argument (one exception, though, is where the machine code of a call is inlined)
however, on the third hand, the language could conceivably have supported the compiler in this, e.g. by providing the programmer with a way to explicitly accept any such optimization, like, a way to say "this code is safe to optimize by replacing pass by value with pass by reference, go ahead as you please, compiler"
Indeed, in those circumstances you would normally use method 2.
Typically, you would only use method 1 if the object is tiny, so that it's cheaper to copy it once than to pay to access it repeatedly through a reference (which also incurs a cost). In TC++PL, Stroustrup develops a complex number class and passes it around by value for exactly this reason.
It may be optimized in some circumstances, but there are plenty of things that can prevent it. The compiler can't avoid the copy if:
the copy constructor or destructor has side effects and the argument passed is not a temporary.
you take the address of x, or a reference to it, and pass it to some code that might be able to compare it against the address of the original.
the object might change while foo is running, for example because foo calls some other function that changes it. I'm not sure whether this is something you mean to rule out by saying "the object will not be changed within the function", but if not then it's in play.
You'd copy it if any of those things matters to your program:
if you want the side effects of copying, take a copy
if you want "your" object to have a different address from the user-supplied argument, take a copy
if you don't want to see changes made to the original during the running of your function, take a copy
You'd also copy it if you think a copy would be more efficient, which is generally assumed to be the case for "small" types like int. Iterators and predicates in standard algorithms are also taken by value.
Finally, if your code plans to copy the object anyway (including by assigning to an existing object) then a reasonable idiom is to take the copy as the parameter in the first place. Then move/swap from your parameter.
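The "take the copy as the parameter, then swap" idiom for assignment might look like the sketch below. The class Record and its members are hypothetical, chosen only to have something worth swapping; the copy and move constructors are defaulted explicitly because declaring the assignment operator would otherwise suppress the implicit move constructor.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class Record {  // hypothetical example type
    std::string name_;
    std::vector<int> values_;
public:
    Record(std::string name, std::vector<int> values)
        : name_(std::move(name)), values_(std::move(values)) {}
    Record(const Record&) = default;
    Record(Record&&) = default;

    // The parameter is the copy: it is copy-constructed from an lvalue
    // argument and move-constructed from an rvalue argument.
    Record& operator=(Record other) noexcept {
        swap(other);
        return *this;
    }

    void swap(Record& o) noexcept {
        name_.swap(o.name_);
        values_.swap(o.values_);
    }

    const std::string& name() const { return name_; }
};

void demo() {
    Record a("alpha", {1, 2});
    Record b("beta", {3});

    a = b;                      // lvalue: one copy, then a swap
    assert(a.name() == "beta" && b.name() == "beta");

    a = Record("gamma", {});    // rvalue: no copy of the string or vector
    assert(a.name() == "gamma");

    Record& alias = a;
    a = alias;                  // self-assignment is harmless: the copy is made first
    assert(a.name() == "gamma");
}
```

Because the argument is already a separate object by the time the operator body runs, no explicit self-assignment check is needed.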
What if the object is changed from elsewhere?
void f(const SomeType& s);
void g(const SomeType s);
int main() {
SomeType s;
// s is non-const inside the lambda, so this thread may modify it
std::thread t([&](){ /* modify s here */ });
// f gets a const reference to the object, which f sees as const,
// but others might not. So they can modify it while f runs.
f(s);
// g gets a const *copy* of the object,
// so what anyone else might do to the original doesn't matter
g(s);
t.join(); // a std::thread must be joined or detached before it is destroyed
}
What if the object is const, but has mutable members? Then you can still modify the object, and so it's very important whether you have a copy or a reference to the original.
What if the object contains a pointer to another object? If s is const, the pointer will be const, but what it points to is not affected by the constness of s. But creating a copy will (hopefully) give us a deep copy, so we get our own (const) object with a separate (const) pointer pointing to a separate (non-const) object.
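The mutable-member and pointer-member cases can be combined in one small sketch. The type Counter and the two observe functions are hypothetical. Note that Counter relies on the compiler-generated copy, which is shallow, so the "hopefully a deep copy" caveat above applies: the copy still shares the pointee.

```cpp
#include <cassert>

struct Counter {            // hypothetical example type
    mutable int reads = 0;  // may change even through a const access path
    int* target = nullptr;  // const-ness of Counter does not extend to *target
    int peek() const { ++reads; return *target; }
};

// Both functions only see a const Counter, yet both modify state.
int observe_by_ref(const Counter& c) { *c.target += 1; return c.peek(); }
int observe_by_value(const Counter c) { *c.target += 1; return c.peek(); }

void demo() {
    int shared = 10;
    Counter c;
    c.target = &shared;

    observe_by_ref(c);
    assert(shared == 11);   // pointee changed through the const reference
    assert(c.reads == 1);   // mutable member of the caller's object changed

    observe_by_value(c);
    assert(shared == 12);   // the shallow copy still shares the pointee
    assert(c.reads == 1);   // but the mutable member changed only in the copy
}
```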
There are a number of cases where a const copy is different than a const reference.
I've seen a similar question to this, but i'd like some clarification...
Assuming a basic C++ class:
class MyClass
{
public:
struct SomeData
{
std::wstring name;
std::vector<int> someValues;
};
void DoSomething(const SomeData data);
};
I understand that data will be passed as const to DoSomething and that is ok since data will not be modified in any way by the function...but I am used to seeing & specified with const parameters to ensure that they are passed by reference, e.g.
void DoSomething(const SomeData& data);
That seems more efficient to me. If we omit the &, then isn't data being passed by value to DoSomething? I'm not sure why it would ever be preferable to pass a const parameter by value when you can pass by reference and avoid the copy occurring?
Pass by value/reference and const-correctness are two different concepts, but they are often used together.
Pass by Value
void DoSomething (SomeData data);
Pass by value is used when the object is cheap to copy and you do not want to keep references to foreign objects. The function (or its class, if it is a member function) can keep hold of the parameter, knowing that it has its own copy.
Pass by reference
void DoSomething (SomeData& data);
Use pass by reference if you know that copying the struct would cause a performance loss. The function (or its class, if it is a member function) could keep a pointer to the argument, which then points to a foreign object. Keeping pointers to foreign objects means you must be aware of their lifetime and of when the foreign object goes out of scope. More importantly, changes to the foreign object are visible through your pointer.
const correctness
void DoSomething (const SomeData data); // const for local copy
void DoSomething (const SomeData& data); // const for callers object
Adding const to pass by value or pass by reference means the function does not change the parameter. Whether or not the & is present decides which object is protected from modification: the local copy or the caller's object. const is a very helpful tool in C++ for documenting APIs, providing compile-time safety, and in some cases allowing more compiler optimizations.
Read this article.
The biggest problem with void DoSomething(const SomeData data) is that it conflates interface and implementation. From the caller's point of view, the const doesn't change anything; the function receives a copy anyway, and the original object is not modified. What the implementation does or does not with its own, function-internal copy should not bother the caller and should thus not be expressed in the interface.
The const does make the implementation more const-correct if the copy is not changed, but leaking implementation details into the interface is a high price to pay. I recommend not using void DoSomething(const SomeData data).
As always, performance gains or losses should not be overestimated here. It's more about semantics and conventions.
Passing a const value is mostly informational for the caller, it shows intent. This is important to make code easy to read, understand and maintain.
It might also be possible for the compiler to add some extra optimizations if it knows that the function doesn't modify its argument. For example it might cause the compiler to not perform a copy at all.
This:
void DoSomething(const SomeData data);
is a bit unusual, because while it does not change anything for the caller (who can pass a const or non-const value), it restricts what the function can do internally. There's not a lot of value in that, and it's not commonly done.
Passing by reference (const or not) is more efficient if the value is expensive to copy, including if it is larger than approximately two pointers on the target platform. In other words, if SomeData were a struct containing two integers it would probably be more efficient to pass it by value. But if it contains a std::map or some larger data, better pass it by reference.
An exception to this is if the function is going to copy the value anyway, then it is better to take it by value, because the value might be "moved" instead of copied if the caller allows it.
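The "might be moved instead of copied" point can be made visible by counting constructor calls. In this sketch, Tracer, HolderRef and HolderVal are hypothetical names. The exact number of moves for a temporary argument depends on copy elision (guaranteed since C++17), so only the copy counts are asserted exactly in that case.

```cpp
#include <cassert>
#include <utility>

struct Tracer {  // hypothetical type that counts how it is constructed
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

// Keeps its own Tracer; a const reference parameter forces a copy.
struct HolderRef {
    Tracer kept;
    explicit HolderRef(const Tracer& t) : kept(t) {}
};

// Keeps its own Tracer; the by-value parameter lets the caller choose
// between copying (lvalue argument) and moving (rvalue argument).
struct HolderVal {
    Tracer kept;
    explicit HolderVal(Tracer t) : kept(std::move(t)) {}
};

void demo() {
    Tracer t;

    HolderRef r1(t);
    HolderRef r2(Tracer{});   // still a copy, even from a temporary
    assert(Tracer::copies == 2 && Tracer::moves == 0);

    Tracer::copies = Tracer::moves = 0;
    HolderVal v1(t);          // one copy into the parameter, one move into kept
    assert(Tracer::copies == 1 && Tracer::moves == 1);

    HolderVal v2(Tracer{});   // no copy at all, only moves
    assert(Tracer::copies == 1 && Tracer::moves >= 2);
}
```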
Suppose I have the following code:
class B { /* */ };
class A {
vector<B*> vb;
public:
void add(B* b) { vb.push_back(b); }
};
int main() {
A a;
B* b(new B());
a.add(b);
}
Suppose that in this case, all raw pointers B* can be handled through unique_ptr<B>.
Surprisingly, I wasn't able to find how to convert this code using unique_ptr. After a few tries, I came up with the following code, which compiles:
class A {
vector<unique_ptr<B>> vb;
public:
void add(unique_ptr<B> b) { vb.push_back(move(b)); }
};
int main() {
A a;
unique_ptr<B> b(new B());
a.add(move(b));
}
So my simple question: is this the way to do it and in particular, is move(b) the only way to do it? (I was thinking of rvalue references but I don't fully understand them.)
And if you have a link with complete explanations of move semantics, unique_ptr, etc. that I was not able to find, don't hesitate to share it.
EDIT According to http://thbecker.net/articles/rvalue_references/section_01.html, my code seems to be OK.
Actually, std::move is just syntactic sugar. With object x of class X, move(x) is just the same as:
static_cast <X&&>(x)
These two std::move calls are needed because casting to an rvalue reference:
lets the by-value parameter of "add" be move-constructed instead of copied (a unique_ptr cannot be copied)
makes push_back use the move constructor of unique_ptr<B>
Apparently, I do not need the second std::move in my main() if I change my "add" function to pass by reference (ordinary lvalue ref).
I would like some confirmation of all this, though...
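The claim that std::move is just a cast can be checked with a small sketch like the one below (the function name demo is arbitrary). It also shows that the cast by itself moves nothing: ownership only changes hands when a move constructor or move assignment actually consumes the rvalue.

```cpp
#include <cassert>
#include <memory>
#include <utility>

void demo() {
    std::unique_ptr<int> a(new int(1));
    std::unique_ptr<int> b(new int(2));

    // Both initializers are rvalues of type unique_ptr<int>, so both
    // select the move constructor.
    std::unique_ptr<int> from_move = std::move(a);
    std::unique_ptr<int> from_cast = static_cast<std::unique_ptr<int>&&>(b);

    assert(!a && !b);                            // both sources are now empty
    assert(*from_move == 1 && *from_cast == 2);

    // Binding the result to a reference performs no move.
    std::unique_ptr<int> c(new int(3));
    std::unique_ptr<int>&& ref = std::move(c);
    assert(c);                                   // c still owns its object
    assert(*ref == 3);                           // ref is just another name for c
}
```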
I am somewhat surprised that this is not answered very clearly and explicitly here, nor on any place I easily stumbled upon. While I'm pretty new to this stuff, I think the following can be said.
The situation is a calling function that builds a unique_ptr<T> value (possibly by casting the result from a call to new), and wants to pass it to some function that will take ownership of the object pointed to (by storing it in a data structure for instance, as happens here into a vector). To indicate that ownership has been obtained by the caller, and that it is ready to relinquish it, passing a unique_ptr<T> value is appropriate. There are, as far as I can see, three reasonable modes of passing such a value.
1. Passing by value, as in add(unique_ptr<B> b) in the question.
2. Passing by non-const lvalue reference, as in add(unique_ptr<B>& b)
3. Passing by rvalue reference, as in add(unique_ptr<B>&& b)
Passing by const lvalue reference would not be reasonable, since it does not allow the called function to take ownership (and const rvalue reference would be even more silly than that; I'm not even sure it is allowed).
As far as valid code goes, options 1 and 3 are almost equivalent: they force the caller to write an rvalue as argument to the call, possibly by wrapping a variable in a call to std::move (if the argument is already an rvalue, i.e., unnamed as in a cast from the result of new, this is not necessary). In option 2 however, passing an rvalue (possibly from std::move) is not allowed, and the function must be called with a named unique_ptr<T> variable (when passing a cast from new, one has to assign to a variable first).
When std::move is indeed used, the variable holding the unique_ptr<T> value in the caller is conceptually given up (converted to an rvalue, that is, cast to an rvalue reference), and ownership is relinquished at this point. In option 1 the transfer is real: the value is moved into a temporary that is passed to the called function (if the called function were to inspect the variable in the caller, it would find that it already holds a null pointer). Ownership has been transferred, and there is no way the called function could decide not to accept it (doing nothing with the argument causes the pointed-to value to be destroyed at function exit; calling the release method on the argument would prevent this, but would just result in a memory leak). Surprisingly, options 2 and 3 are semantically equivalent during the function call, although they require different syntax from the caller. If the called function passes the argument on to another function taking an rvalue (such as the push_back method), std::move must be inserted in both cases, and ownership is transferred at that point. Should the called function forget to do anything with the argument, then the caller will find itself still owning the object if it holds a name for it (as is obligatory in option 2); this is so even in option 3, although there the function prototype asked the caller to agree to the release of ownership (by either calling std::move or supplying a temporary). In summary, the methods do the following:
1. Forces the caller to give up ownership, and is sure to actually claim it.
2. Forces the caller to possess ownership, and be prepared (by supplying a non-const reference) to give it up; however this is not explicit (no call of std::move required or even allowed), nor is taking away ownership assured. I would consider this method rather unclear in its intention, unless it is explicitly intended that taking ownership or not is at the discretion of the called function (some use can be imagined, but callers need to be aware).
3. Forces the caller to explicitly indicate giving up ownership, as in 1 (but the actual transfer of ownership is delayed until after the moment of the function call).
Option 3 is fairly clear in its intention; provided ownership is actually taken, it is for me the best solution. It is slightly more efficient than option 1 in that no pointer values are moved to temporaries (the calls to std::move are in fact just casts and cost nothing); this might be especially relevant if the pointer is handed through several intermediate functions before its contents are actually moved.
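The difference between the three options shows up most clearly when the called function does nothing with its argument. The following is a minimal sketch of that case. Widget and the three ignore_* functions are hypothetical stand-ins for B and add.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget { int id; };  // hypothetical payload type
using WidgetPtr = std::unique_ptr<Widget>;

// Hypothetical callees that "forget" to store their argument.
void ignore_by_value(WidgetPtr) {}        // option 1
void ignore_by_lvalue_ref(WidgetPtr&) {}  // option 2
void ignore_by_rvalue_ref(WidgetPtr&&) {} // option 3

void demo() {
    WidgetPtr p1(new Widget{1});
    ignore_by_value(std::move(p1));
    assert(!p1);                 // ownership left the caller; the Widget is gone

    WidgetPtr p2(new Widget{2});
    ignore_by_lvalue_ref(p2);    // std::move is not allowed here
    assert(p2 && p2->id == 2);   // caller still owns the object

    WidgetPtr p3(new Widget{3});
    ignore_by_rvalue_ref(std::move(p3));
    assert(p3 && p3->id == 3);   // caller still owns it despite std::move
}
```

Only the by-value signature guarantees that the caller's pointer is empty after the call, whatever the function body does.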
Here is some code to experiment with.
#include <iostream>
#include <memory>
#include <utility>
#include <vector>
class B
{
unsigned long val;
public:
B(const unsigned long& x) : val(x)
{ std::cout << "storing " << x << std::endl;}
~B() { std::cout << "dropping " << val << std::endl;}
};
typedef std::unique_ptr<B> B_ptr;
class A {
std::vector<B_ptr> vb;
public:
void add(B_ptr&& b)
{ vb.push_back(std::move(b)); } // or even better use emplace_back
};
void f() {
A a;
B_ptr b(new B(123)),c;
a.add(std::move(b));
std::cout << "---" <<std::endl;
a.add(B_ptr(new B(4567))); // unnamed argument does not need std::move
}
As written, output is
storing 123
---
storing 4567
dropping 123
dropping 4567
Note that values are destroyed in the order in which they are stored in the vector. Try changing the prototype of the method add (adapting other code if necessary to make it compile), and whether or not it actually passes on its argument b. Several permutations of the lines of output can be obtained.
Yes, this is how it should be done. You are explicitly transferring ownership from main to A. This is basically the same as your previous code, except it's more explicit and vastly more reliable.
So my simple question: is this the way to do it and in particular, is this "move(b)" the only way to do it? (I was thinking of rvalue references but I don't fully understand it so...)
And if you have a link with complete explanations of move semantics, unique_ptr... that I was not able to find, don't hesitate.
Shameless plug, search for the heading "Moving into members". It describes exactly your scenario.
Your code in main could be simplified a little, since C++14:
a.add( make_unique<B>() );
where you can put arguments for B's constructor inside the inner parentheses.
You could also consider a class member function that takes ownership of a raw pointer:
void take(B *ptr) { vb.emplace_back(ptr); }
and the corresponding code in main would be:
a.take( new B() );
Another option is to use perfect forwarding for adding vector members:
template<typename... Args>
void emplace(Args&&... args)
{
vb.emplace_back( std::make_unique<B>(std::forward<Args>(args)...) );
}
and the code in main:
a.emplace();
where, as before, you could put constructor arguments for B inside the parentheses.
Prior to C++11, if I had a function that operated on large objects, my instinct would be to write functions with this kind of prototype.
void f(A &return_value, A const &parameter_value);
(Here, return_value is just a blank object which will receive the output of the function. A is just some class which is large and expensive to copy.)
In C++11, taking advantage of move semantics, the default recommendation (as I understand it) is the more straightforward:
A f(A const &parameter_value);
Is there ever still a need to do it the old way, passing in an object to hold the return value?
Others have covered the case where A might not have a cheap move constructor. I'm assuming your A does. But there is still one more situation where you might want to pass in an "out" parameter:
If A is some type like vector or string and it is known that the "out" parameter already has resources (such as memory) that can be reused within f, then it makes sense to reuse that resource if you can. For example consider:
void get_info(std::string&);
bool process_info(const std::string&);
void
foo()
{
std::string info;
for (bool not_done = true; not_done;)
{
info.clear();
get_info(info);
not_done = process_info(info);
}
}
vs:
std::string get_info();
bool process_info(const std::string&);
void
foo()
{
for (bool not_done = true; not_done;)
{
std::string info = get_info();
not_done = process_info(info);
}
}
In the first case, capacity will build up in the string as the loop executes, and that capacity is then potentially reused on each iteration of the loop. In the second case a new string is allocated on every iteration (neglecting the small string optimization buffer).
Now this isn't to say that you should never return std::string by value. Just that you should be aware of this issue and apply engineering judgment on a case by case basis.
It is possible for an object to be large and expensive to copy, and for which move semantics cannot improve on copying. Consider:
struct A {
std::array<double,100000> m_data;
};
It may not be a good idea to design your objects this way, but if you have an object of this type for some reason and you want to write a function to fill the data in then you might do it using an out param.
It depends: does your compiler support return-value-optimization, and is your function f designed to be able to use the RVO your compiler supports?
If so, then yes, by all means return by value. You will gain nothing at all by passing a mutable parameter, and you'll gain a great deal of code clarity by doing it this way. If not, then you have to investigate the definition of A.
For some types, a move is nothing more than a copy. If A doesn't contain anything that is actually worth moving (pointers transferring ownership and so forth), then you're not going to gain anything by moving. A move isn't free, after all; it's simply a copy that knows that anything owned by the original is being transferred to the copy. If the type doesn't own anything, then a move is just a copy.
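That a move can degenerate into a plain copy is easy to check for a type that owns nothing. The sketch below uses a hypothetical struct Samples, a smaller cousin of the array-holding struct discussed above.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct Samples {  // hypothetical type: all data is stored inline
    std::array<double, 1000> data;
};

// Nothing is owned indirectly, so "moving" is a memberwise copy.
static_assert(std::is_trivially_copyable<Samples>::value,
              "moving Samples is the same as copying it");

void demo() {
    Samples a{};
    a.data[0] = 3.5;
    Samples b = std::move(a);    // copies all 1000 doubles
    assert(b.data[0] == 3.5);
    assert(a.data[0] == 3.5);    // the source is left untouched

    // Contrast: std::string owns a heap buffer that a move can steal.
    std::string s(100, 'x');
    std::string t = std::move(s);
    assert(t.size() == 100);     // s is left valid but unspecified
}
```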
What's better as default, to return a copy (1) or a reference (2) from a getter function?
class foo {
public:
std::string str () { // (1)
return str_;
}
const std::string& str () { // (2)
return str_;
}
private:
std::string str_;
};
I know (2) could be faster, but it doesn't have to be, due to (N)RVO. (1) is safer with respect to dangling references, but the object will probably outlive the reference, or the reference is never stored.
What's your default when you write a class and don't know (yet) whether performance and lifetime issues matter?
Additional question: Does the game change when the member is not a plain string but rather a vector?
Well it really depends on what you expect the behaviour to be, by default.
Do you expect the caller to see changes made to str_ unbeknownst (what a word!) to them? Then you need to pass back a reference. Might be good if you can have a refcounted data member and return that.
If you expect the caller to get a copy, do 1).
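The behavioural difference between the two getters can be sketched as follows. The class Foo here is a hypothetical variant of the one in the question, with the two getters given distinct names (str_copy and str_ref) so that both can coexist, plus a setter to change the member afterwards.

```cpp
#include <cassert>
#include <string>

class Foo {  // hypothetical variant of the class in the question
    std::string str_ = "initial";
public:
    std::string str_copy() const { return str_; }        // (1) returns a copy
    const std::string& str_ref() const { return str_; }  // (2) returns a reference
    void set(const std::string& s) { str_ = s; }
};

void demo() {
    Foo f;
    std::string snapshot = f.str_copy();
    const std::string& view = f.str_ref();
    assert(&view == &f.str_ref());   // always the same underlying object

    f.set("changed");
    assert(snapshot == "initial");   // the copy is unaffected
    assert(view == "changed");       // the reference observes the change
}
```

The reference is only valid while f is alive, which is the lifetime concern raised in the question.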
My rule of thumb is to return a copy for simple basic datatypes such as int, string etc. For a bit more complicated structures where copying may be costlier (like vector you mentioned) I prefer to return a const-reference.
The compiler will not be able to perform (N)RVO in this case. The (named) return value optimization is an optimization where the compiler creates the function auto variables in the place of the return value to avoid having to copy:
std::string f()
{
std::string result;
//...
return result;
}
When the compiler sees the code above (and assuming that any other return statement present also returns the result variable), it knows that the only possible fate of the variable result is to be copied into the returned temporary and then destroyed. The compiler can then remove the result variable altogether and use the return temporary as the only variable. I insist: the compiler does not remove the return temporary, it removes the local function variable. The return temporary is required to fulfill the compiler's calling convention.
When you are returning a member of your class, the member must exist, and the calling convention requires the returned object to be in a particular location (usually a stack address). The compiler cannot create the member in the location of the returned object, nor can it elide the copy.
I'm returning a reference, because a string seems not "cheap to copy" to me. It's a complex data type with dynamic memory management and all that.
The "if you want the caller to get a copy, you should return by value" argument is moot, because it doesn't preclude copies at all. The caller can still do the following and get a copy anyway
string s = obj.str();
You need to explicitly create a reference on the caller side to be able to refer to the data member directly afterwards - but why would you do that? There definitely are enough user defined types that are cheap to copy
Smart Pointers
Iterators
All of the non-class types.
Returning a reference to an object's internals as part of its public interface can be a code smell if not outright bad design.
Before returning a reference to an internal object in a public interface, the designer should pause. Doing so couples users of your class to part of your design. Often it is outright unnecessary, sometimes it indicates further design work is needed. At times it is necessary, as commenters have noted.
If there is no special reason to use a value type as return value, I always return a const reference. If I need (or expect to need) a (writable) copy, I add a copy ctor and an assignment operator to the returned class if not already available. For the usage think of:
const MyClass & ref = container.GetAt( 1234 ); // need only reference
MyClass copy = container.GetAt( 1234 ); // get writable copy
Actually this is quite straightforward, isn't it?
if it's a small basic type - primitives like int and long and their wrappers and other basic things like 'Point' - return a copy
if it's a string, or any other complex type - return a reference.
The only problem I have with returning a const-reference, which is something I would typically do for non basic types, is that there is nothing to stop the caller removing the "const"ness and then modifying the value.
Personally, I'd suggest that such code is a bug. If they know you're returning a reference and continue to cast away the const then it's on their head.
Why would one use func( const Class &value ) rather than just func( Class value )? Surely modern compilers will do the most efficient thing using either syntax. Is this still necessary or just a hold over from the days of non-optimizing compilers?
Just to add, gcc will produce similar assembler code output for either syntax. Perhaps other compilers do not?
Apparently, this is just not the case. I had the impression from some code long ago that gcc did this, but experimentation proves this wrong. Credit is due to Michael Burr, whose answer to a similar question would be nominated if given here.
There are 2 large semantic differences between the 2 signatures.
The first is the use of & in the type name. This signals the value is passed by reference. Removing this causes the object to be passed by value, which will essentially pass a copy of the object into the function (via the copy constructor). For operations which simply need to read data (typical for a const &), doing a full copy of the object creates unnecessary overhead. For classes which are not small or are collections, this overhead is not trivial.
The second is the use of const. This prevents the function from accidentally modifying the contents of value via the value reference. It allows the caller some measure of assurance the value will not be mutated by the function. Yes, passing a copy gives the caller a much deeper assurance of this in many cases.
The first form doesn't create a copy of the object, it just passes a reference (pointer) to the existing copy. The second form creates a copy, which can be expensive. This isn't something that is optimized away: there are semantic differences between having a copy of an object vs. having the original, and copying requires a call to the class's copy constructor.
For very small classes (like <16 bytes) with no copy constructor it is probably more efficient to use the value syntax rather than pass references. This is why you see void foo(double bar) and not void foo(const double &var). But in the interests of not micro-optimizing code that doesn't matter, as a general rule you should pass all real-deal objects by reference and only pass built-in types like int and void * by value.
There is a huge difference which nobody has mentioned yet: object slicing. In some cases, you may need const& (or &) to get correct behavior.
Consider another class Derived which inherits from Class. In client code, you create an instance of Derived which you pass to func(). If you have func(const Class&), that same instance will get passed. As others have said, func(Class) will make a copy, you will have a new (temporary) instance of Class (not Derived) in func.
This difference in behavior (not performance) can be important if func in turn does a downcast. Compare the results of running the following code:
#include <cstdio>   // printf, fprintf
#include <typeinfo> // std::bad_cast
struct Class
{
virtual void Foo() {};
};
class Derived : public Class {};
void f(const Class& value)
{
printf("f()\n");
try
{
const Derived& d = dynamic_cast<const Derived&>(value);
printf("dynamic_cast<>\n");
}
catch (std::bad_cast)
{
fprintf(stderr, "bad_cast\n");
}
}
void g(Class value)
{
printf("g()\n");
try
{
const Derived& d = dynamic_cast<const Derived&>(value);
printf("dynamic_cast<>\n");
}
catch (std::bad_cast)
{
fprintf(stderr, "bad_cast\n");
}
}
int main()
{
Derived d;
f(d);
g(d);
return 0;
}
“Surely modern compilers will do the most efficient thing using either syntax”
The compiler doesn't compile what you "mean", it compiles what you tell it to. Compilers are only smart for lower level optimizations and problems the programmer overlooks (such as computation inside a for loop, dead code etc).
What you tell the compiler to do in the second example, is to make a copy of the class - which it will do without thinking - even if you didn't use it, that's what you asked the compiler to do.
The first example explicitly asks the compiler to use the same variable - conserving space and precious cycles (no copy is needed). The const is there to guard against mistakes - since Class &value can be written to (sometimes it's desired).
Here are the differences between some parameter declarations:
                            copied  out  modifiable
func(Class value)             Y      N       Y
func(const Class value)       Y      N       N
func(Class &value)            N      Y       Y
func(const Class &value)      N      N       N
where:
copied: a copy of the input parameter is made when the function is called
out: value is an "out" parameter, which means modifications made within func() will be visible outside the function after it returns
modifiable: value can be modified within func()
So the differences between func(Class value) and func(const Class &value) are:
The first one makes a copy of the input parameter (by calling the Class copy constructor), and allows code inside func() to modify value
The second one does not make a copy, and does not allow code inside func() to modify value
If you use the latter, and then try to change value by accident, the compiler will give you an error.
If you use the former, and then try to change value, it won't.
Thus the latter makes it easier to catch mistakes.
The first example is pass by reference. Rather than passing the object itself, C++ passes a reference to it (references are generally implemented with pointers, so what is passed is likely the size of a pointer, typically 4 or 8 bytes). In the second example, the object is passed by value. If it is a big, complex object, that is likely a fairly heavyweight operation, as it involves copy construction of a new "Class".
The reason that an optimizing compiler can't handle this for you is the issue of separate compilation. In C++, when the compiler is generating code for a caller, it may not have access to the code of the function itself. The most common calling convention that I know of usually has the caller invoke the copy-constructor which means it's not possible for the compilation of the function itself to prevent the copy constructor if it's not necessary.
The only time that passing a parameter by value is preferable is when you are going to copy the parameter anyway.
std::string toUpper( const std::string &value ) {
std::string retVal(value);
std::transform(retVal.begin(), retVal.end(), retVal.begin(), charToUpper());
return retVal;
}
Or
std::string toUpper( std::string value ) {
std::transform(value.begin(), value.end(), value.begin(), charToUpper());
return value;
}
In this case the second example is the same speed as the first if the value parameter is a regular object, but faster if the value parameter is an rvalue.
Although most compilers will do this optimisation already, I don't expect to rely on this feature until C++0x, especially since I expect it could confuse most programmers, who would probably change it back.
See Want Speed? Pass by Value. for a better explanation than I could give.