I ran across this article on copy elision in C++ and I've seen comments about it in the Boost library. This is appealing, as I prefer my functions to look like
verylargereturntype DoSomething(...)
rather than
void DoSomething(..., verylargereturntype& retval)
So, I have two questions about this:
Google turns up virtually no documentation on this. How real is it?
How can I check that this optimization is actually occurring? I assume it involves looking at the assembly, but let's just say that isn't my strong suit. If anyone can give a very basic example of what successful elision looks like, that would be very useful.
I won't be using copy elision just to prettify things, but if I can be guaranteed that it works, it sounds pretty useful.
I think this is a very commonly applied optimization because:
it's not difficult for the compiler to do
it can be a huge gain
it's an area of C++ that was commonly critiqued before the optimization became common
If you're just curious, put a debug printf() in your copy constructor:
#include <cstdio>
class foo {
public:
foo(): x(0) {}
foo(int x_) : x( x_) {}
foo( foo const& other) : x( other.x) {
printf( "copied a foo\n");
}
static foo foobar() {
foo tmp( 2);
return tmp;
}
private:
int x;
};
int main()
{
foo myFoo;
myFoo = foo::foobar();
return 0;
}
Prints out "copied a foo" when I run an unoptimized build, but nothing when I build optimized.
From your cited article:
Although copy elision is never required by the standard, recent versions of every compiler I’ve tested do perform these optimizations today. But even if you don’t feel comfortable returning heavyweight objects by value, copy elision should still change the way you write code.
It is better known as Return Value Optimization.
The only way to know for sure is to look at the assembly, but you're asking the wrong question. You don't need to know if the compiler is eliding the copy unless it matters to the program timing. A profiler should easily tell you if you're spending too much time in the copy constructor.
The poor man's way to figure it out is to put a static counter in the copy constructor and try both forms of your function. If the counts are the same, you've successfully avoided the copy.
Google "Named Return Value Optimization" and "Return Value Optimization" instead. Modern compilers will in fact not perform the copy in many cases.
You can check if it's occurring by returning a type with side effects -- such as printing a message. Wikipedia has some good examples of where program output changes when RVO and/or NRVO is in effect.
Example of what it looks like:
#include <iostream>
struct Foo {
int a;
Foo(int a) : a(a) {}
Foo(const Foo &rhs) : a(rhs.a) { std::cout << "copying\n"; }
};
int main() {
Foo f = Foo(1);
}
If you see no output, then copy elision has taken place. That's elision of a copy from an initializer. The other legal case of copy elision is a return value, and is tested by:
Foo getFoo() {
return Foo(1);
}
int main() {
Foo f = getFoo();
}
or more excitingly for a named return value:
Foo getFoo() {
Foo f(1);
return f;
}
int main() {
Foo f = getFoo();
}
g++ performs all those elisions for me with no optimisation flags, but you can't really know whether more complex code will outwit the compiler.
Note that copy elision doesn't help with assignment, so the following will always call operator= (which you can observe if that operator prints anything):
Foo f(1);
f = getFoo();
Returning by value therefore can still result in "a copy", even if copy constructor elision is performed. So for chunking great classes it's still a performance consideration at the design stage. You don't want to write your code such that fixing it later will be a big deal if it turns out your app spends a significant proportion of its time in copying that could have been avoided.
To answer question 2, you could write a demo program where you write a class DemoReturnType; which has instrumented constructors and destructors which just write to cout when they are called. This should give you enough information about what your compiler is capable of.
Rvalue references solve this problem in C++0x. Whether or not you can obtain an rvalue-enabled compiler is another question - last time I checked only Visual Studio 2010 supports it.
Related
(This question is inspired by Nicolai Josuttis' CppCon 2017 talk.)
Consider the following source file (for an object, not a complete program):
#include <string>
class C {
std::string s_;
public:
C(std::string s) : s_(s) { };
void bar();
};
void foo() {
std::string hello { "The quick brown fox jumped over the lazy dog" };
C c { hello };
c.bar();
}
And its compilation result on GodBolt.
Even with -O2 (and even with -O3), it seems a string constructor is called three times. Specifically, s is constructed, used only to construct s_, then destructed. My questions:
Is the compiler allowed to simply construct s_ from the arguments to the ctor, not constructing s at all?
If not, is the compiler allowed to move-construct s_ from s, seeing how the latter is unused?
If any of the previous answers is "yes" - why aren't gcc and clang doing so?
If s is properly constructed, can't the compiler avoid the construction of hello, seeing how it has no other use? Or at least move from it?
Under as-if, I'm certain most of what you ask for could be done, assuming you go to link time and you make bar empty and never override new anywhere.
But then, under as-if, your program is an empty program, it has no observable effects.
The compiler is not permitted to move construct s_ from s under the rules of the abstract machine. If you want it to be move constructed, std::move it.
The situations where an lvalue can be treated as an rvalue are limited and specific and involve return x; statements. This is not a return x; statement.
So your code must copy s into s_. Quite possibly it should also generate a warning as a quality of implementation issue.
The compiler is not permitted to elide s into s_. There have been some C++20 proposals to permit much more aggressive elision rules.
But as of right now, elision is only permitted under as-if, with prvalues, or with return x; statements. As-if elimination is really, really hard to prove with something as complex as allocation, so most compilers don't try. And it isn't possible at object file generation, because someone could replace the global allocator.
Imagine a global allocator override that prints out how many allocations are done. Then those "never used" objects are used in that they should print out the allocations they do.
Or a global allocator that calls exit after 2 allocations. The resulting abstract machine should never call bar(); if we eliminate your extra objects, the program doesn't behave as the standard mandates.
As this wiki page says (code excerpted below), return value optimization is allowed by the C++ standard, but whether it happens still depends on the implementation. To reduce the cost of copying, is it recommended to optimize manually (binding the function's result to a reference, like const C& obj = f();) or to leave this optimization to the compiler in practice?
#include <iostream>
struct C {
C() {}
C(const C&) { std::cout << "A copy was made.\n"; }
};
C f() {
return C();
}
int main() {
std::cout << "Hello World!\n";
C obj = f();
}
EDIT: Updated the reference to be a const reference.
You can't (portably) use the temporary return value to initialise a non-const reference, so that's certainly not recommended.
Using it to initialise a const reference wouldn't have any effect on whether or not the copy/move of the return expression's value might be elided; although it would eliminate the notional copy/move used to initialise the variable from the returned value, whether or not that might have been elided. Of course, that's not the same as initialising a (non-reference) variable, since you can't modify it.
In practice, any compiler with a decent optimiser will elide copies and moves wherever it's allowed to. If you're not using a decent optimiser, then you can't expect decent performance anyway.
To manually make sure you don't get any redundant copies of objects, you need to do a bit more than what you did so far. Earlier I answered that it wouldn't be possible, but I was wrong. Also, your use of const& may disallow certain operations on the returned value that you do want to allow. Here's what I would do if you need to do the optimisations manually:
#include <iostream>
struct S {
S()
{ std::cout << "default constructor\n"; }
S(const S &)
{ std::cout << "copy constructor\n"; }
S(S &&)
{ std::cout << "move constructor\n"; }
~S()
{ std::cout << "destructor\n"; }
};
S f() { return {}; }
int main() {
auto&& s = f();
std::cout << "main\n";
}
This prints "default constructor", followed by "main", and then "destructor". This is the output regardless of whether any copy elision takes place. Inside main, s is a named reference, so it is an lvalue, and it is not const-qualified. You can do everything with it that you otherwise could.
Given that it turns out fairly easy to avoid relying on copy elision in cases such as these, as long as you take care to pay attention to it from the start, it may be worth your efforts if you have to worry about other compilers not performing copy elision. Most compilers are capable of that, and there is a fair chance that if a compiler doesn't, it will have other, bigger, problems anyway, so there is a good argument for not worrying about it.
However, at the same time, copy elision is somewhat unreliable: even current optimising compilers do not always perform it, simply because there may be corner cases where copy elision would make sense, but is not permitted by the standard or not possible for that particular implementation. Forcing yourself to write code that doesn't rely on copy elision means you cannot get stuck in that situation.
That said, there are still some cases where redundant copies can only realistically be eliminated by optimising compilers, so you may have no choice but to rely on copy elision:
Suppose we add void m(); to S's definition. Suppose we now edit f to
S f() {
S s;
s.m();
return s;
}
This is more difficult to rewrite into a form that guarantees no redundant copies. Yet at the same time, copies are unnecessary, as can easily be determined from the fact that with GCC (and probably other compilers too), by default, no copies are made.
My final conclusion is that it's probably not worth optimising for compilers that don't perform RVO, but it is worth thinking carefully about what exactly makes it work, and writing code in such a way that RVO remains not only possible, but becomes something a compiler is very likely to do.
What does it take to use the move assignment operator of std::string (in VC11)?
I hoped it'd be used automatically as v isn't needed after the assignment anymore.
Is std::move required in this case? If so, I might as well use the non-C++11 swap.
#include <string>
struct user_t
{
void set_name(std::string v)
{
name_ = v;
// swap(name_, v);
// name_ = std::move(v);
}
std::string name_;
};
int main()
{
user_t u;
u.set_name("Olaf");
return 0;
}
I hoped it'd be used automatically as v isn't needed after the assignment anymore. Is std::move required in this case?
Movement always must be explicitly stated for lvalues, unless they are being returned (by value) from a function.
This prevents accidentally moving something. Remember: movement is a destructive act; you don't want it to just happen.
Also, it would be strange if the semantics of name_ = v; changed based on whether this was the last line in a function. After all, this is perfectly legal code:
name_ = v;
v[0] = 5; //Assuming v has at least one character.
Why should the first line execute a copy sometimes and a move other times?
If so, I might as well use the non-C++11 swap.
You can do as you like, but std::move is more obvious as to the intent. We know what it means and what you're doing with it.
The accepted answer is a good answer (and I've upvoted it). But I wanted to address this question in a little more detail:
The core of my question is: Why doesn't it pick the move assignment operator automatically? The compiler knows v isn't used after the assignment, doesn't it? Or does C++11 not require the compiler to be that smart?
This possibility was looked at during the design of move semantics. At an extreme, you might want the compiler to do some static analysis and move from objects whenever possible:
void set_name(std::string v)
{
name_ = v; // move from v if it can be proven that some_event is false?
if (some_event)
f(v);
}
Ultimately, demanding this kind of analysis from the compiler is very tricky. Some compilers may be able to make the proof and others may not, leading to code that isn't really portable.
Ok, so what about some simpler cases without if statements?
void foo()
{
X x;
Y y(x);
X x2 = x; // last use? move?
}
Well, it is difficult to know if y.~Y() will notice x has been moved from. And in general:
void foo()
{
X x;
// ...
// lots of code here
// ...
X x2 = x; // last use? move?
}
it is difficult for the compiler to analyze this to know if x is truly no longer used after the copy construction to x2.
So the original "move" proposal gave a rule for implicit moves that was really simple, and very conservative:
lvalues can only be implicitly moved from in cases where copy elision is already permissible.
For example:
#include <cassert>
struct X
{
int i_;
X() : i_(1) {}
~X() {i_ = 0;}
};
struct Y
{
X* x_;
Y() : x_(0) {}
~Y() {assert(x_ != 0); assert(x_->i_ != 0);}
};
X foo(bool some_test)
{
Y y;
X x;
if (some_test)
{
X x2;
return x2;
}
y.x_ = &x;
return x;
}
int main()
{
X x = foo(false);
}
Here, by C++98/03 rules, this program may or may not assert, depending on whether or not copy elision at return x happens. If it does happen, the program runs fine. If it doesn't happen, the program asserts.
And so it was reasoned: When RVO is allowed, we are already in an area where there are no guarantees regarding the value of x. So we should be able to take advantage of this leeway and move from x. The risk looked small and the benefit looked huge. Not only would this mean that many existing programs would become much faster with a simple recompile, but it also meant that we could now return "move only" types from factory functions. This is a very large benefit to risk ratio.
Late in the standardization process, we got a little greedy and also said that implicit move happens when returning a by-value parameter (and the type matches the return type). The benefits seem relatively large here too, though the chance for code breakage is slightly larger since this is not a case where RVO was (or is) legal. But I don't have a demonstration of breaking code for this case.
So ultimately, the answer to your core question is that the original design of move semantics took a very conservative route with respect to breaking existing code. Had it not, it would surely have been shot down in committee. Late in the process, there were a few changes that made the design a bit more aggressive. But by this time the core proposal was firmly entrenched in the standard with a majority (but not unanimous) support.
In your example, set_name takes the string by value. Inside set_name, however, v is an lvalue. Let's treat these cases separately:
user_t u;
std::string str("Olaf"); // Creates string by copying a char const*.
u.set_name(std::move(str)); // Moves string.
Inside set_name you invoke the assignment operator of std::string, which incurs an unnecessary copy. But there is also an rvalue overload of operator=, which makes more sense in your case:
void set_name(std::string v)
{
name_ = std::move(v);
}
This way, the only copying that takes place is the string construction (std::string("Olaf")).
The most interesting C++ question I've encountered recently goes as follows:
We determined (through profiling) that our algorithm spends a lot of time in debug mode in MS Visual Studio 2005 with functions of the following type:
MyClass f(void)
{
MyClass retval;
// some computation to populate retval
return retval;
}
As most of you probably know, the return here calls a copy constructor to pass out a copy of retval and then the destructor on retval. (Note: the reason release mode is very fast for this is because of the return value optimization. However, we want to turn this off when we debug so that we can step in and nicely see things in the debugger IDE.)
So, one of our guys came up with a cool (if slightly flawed) solution to this, which is, create a conversion operator:
MyClass::MyClass(MyClass *t)
{
// construct "*this" by transferring the contents of *t to *this
// the code goes something like this
this->m_dataPtr = t->m_dataPtr;
// then clear the pointer in *t so that its destruction still works
// but becomes 'trivial'
t->m_dataPtr = 0;
}
and also changing the function above to:
MyClass f(void)
{
MyClass retval;
// some computation to populate retval
// note the ampersand here which calls the conversion operator just defined
return &retval;
}
Now, before you cringe (which I am doing as I write this), let me explain the rationale. The idea is to create a conversion operator that basically does a "transfer of contents" to the newly constructed variable. The savings happen because we're no longer doing a deep copy, but simply transferring the memory by its pointer. The code goes from a 10 minute debug time to a 30 second debug time, which, as you can imagine, has a huge positive impact on productivity. Granted, the return value optimization does a better job in release mode, but at the cost of not being able to step in and watch our variables.
Of course, most of you will say "but this is abuse of a conversion operator, you shouldn't be doing this kind of stuff" and I completely agree. Here's an example of why you shouldn't be doing it (this actually happened):
void BigFunction(void)
{
MyClass *SomeInstance = new MyClass;
// populate SomeInstance somehow
g(SomeInstance);
// some code that uses SomeInstance later
...
}
where g is defined as:
void g(MyClass &m)
{
// irrelevant what happens here.
}
Now this happened accidentally, i.e., the person who called g() should not have passed in a pointer when a reference was expected. However, there was no compiler warning (of course). The compiler knew exactly how to convert, and it did so. The problem is that the call to g() (because we've passed it a MyClass * when it was expecting a MyClass &) calls the conversion operator, which is bad, because it sets the internal pointer in SomeInstance to 0 and renders SomeInstance useless for the code that occurs after the call to g(). ... and time-consuming debugging ensued.
So, my question is, how do we gain this speedup in debug mode (which has a direct debugging-time benefit) with clean code that doesn't let other terrible errors like this slip through the cracks?
I'm also going to sweeten the pot on this one and offer my first bounty on this one once it becomes eligible. (50 pts)
You need to use something called "swaptimization".
MyClass f(void)
{
MyClass retval;
// some computation to populate retval
return retval;
}
int main() {
MyClass ret;
f().swap(ret);
}
This will prevent a copy and keep the code clean in all modes.
You can also try the same trick as auto_ptr, but that's more than a little iffy.
If your definition of g is written the same as in your code base I'm not sure how it compiled since the compiler isn't allowed to bind unnamed temporaries to non-const references. This may be a bug in VS2005.
If you make the converting constructor explicit then you can use it in your function(s) (you would have to say return MyClass(&retval);) but it won't be allowed to be called in your example unless the conversion was explicitly called out.
Alternately move to a C++11 compiler and use full move semantics.
(Do note that the actual optimization used is Named Return Value Optimization or NRVO).
The problem is occurring because you're using MyClass* as a magic device, sometimes but not always. Solution: use a different magic device.
class MyClass;
class TempClass { //all private except destructor, no accidental copies by callees
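friend class MyClass;
mutable stuff* m_dataPtr; //unfortunately requires duplicate data
//can't really be tricked due to circular dependencies.
TempClass() : m_dataPtr(NULL) {}
TempClass(stuff* p) : m_dataPtr(p) {}
//a copy takes over ownership from its source, so the data is never deleted twice
TempClass(const TempClass& p) : m_dataPtr(p.m_dataPtr) { p.m_dataPtr = NULL; }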
public:
~TempClass() {delete m_dataPtr;}
};
class MyClass {
stuff* m_dataPtr;
MyClass(const MyClass& b) {
m_dataPtr = new stuff();
}
MyClass(TempClass& b) {
m_dataPtr = b.m_dataPtr ;
b.m_dataPtr = NULL;
}
~MyClass() {delete m_dataPtr;}
//be sure to overload operator= too.
TempClass f(void) //note: returns hack. But it's safe
{
MyClass retval;
// some computation to populate retval
return retval;
}
operator TempClass() {
TempClass r(m_dataPtr);
m_dataPtr = NULL;
return r;
}
};
Since TempClass is almost all private (friending MyClass), other objects cannot create or copy a TempClass. This means the hack can only be created by your special functions when clearly told to, preventing accidental usage. Also, since this doesn't use pointers, memory can't be accidentally leaked.
Move semantics have been mentioned, you've agreed to look them up for education, so that's good. Here's a trick they use.
There's a function template std::move which turns an lvalue into an rvalue reference, that is to say it gives "permission" to move from an object[*]. I believe you can imitate this for your class, although I won't make it a free function:
struct MyClass;
struct MovableMyClass {
MyClass *ptr;
MovableMyClass(MyClass *ptr) : ptr(ptr) {}
};
struct MyClass {
MyClass(const MovableMyClass &tc) {
// unfortunate, we need const reference to bind to temporary
MovableMyClass &t = const_cast<MovableMyClass &>(tc);
this->m_dataPtr = t.ptr->m_dataPtr;
t.ptr->m_dataPtr = 0;
}
MovableMyClass move() {
return MovableMyClass(this);
}
};
MyClass f(void)
{
MyClass retval;
return retval.move();
}
I haven't tested this, but something along those lines. Note the possibility of doing something const-unsafe with a MovableMyClass object that actually is const, but it should be easier to avoid ever creating one of those than it is to avoid creating a MyClass* (which you've found out is quite difficult!)
[*] Actually I'm pretty sure I've over-simplified that to the point of being wrong, it's actually about affecting what overload gets chosen rather than "turning" anything into anything else as such. But causing a move instead of a copy is what std::move is for.
A different approach, given your special scenario:
Change MyClass f(void) (or operator+) to something like the following:
MyClass f(void)
{
MyClass c;
inner_f(c);
return c;
}
And let inner_f(c) hold the actual logic:
#ifdef TESTING
# pragma optimize("", off)
#endif
inline void inner_f(MyClass& c)
{
// actual logic here, setting c to whatever needed
}
#ifdef TESTING
# pragma optimize("", on)
#endif
Then, create an additional build configuration for this kind of testing, in which TESTING is included in the preprocessor definitions.
This way, you can still take advantage of RVO in f(), but the actual logic will not be optimized on your testing build. Note that the testing build can either be a release build or a debug build with optimizations turned on. Either way, the sensitive parts of the code will not be optimized (you can use the #pragma optimize in other places too, of course - in the code above it only affects inner_f itself, and not code called from it).
Possible solutions
Set higher optimization options for the compiler so it optimizes out the copy construction
Use heap allocation and return pointers or pointer wrappers, preferably with garbage collection
Use the move semantics introduced in C++11; rvalue references, std::move, move constructors
Use some swap trickery, either in the copy constructor or the way DeadMG did, but I don't recommend them with a good conscience. An inappropriate copy constructor like that could cause problems, and the latter is a bit ugly and needs easily destructible default objects which might not be true for all cases.
+1: Check and optimize your copy constructors, if they take so long then something isn't right about them.
I would prefer to simply pass the object by reference to the called function when MyClass is too big to copy:
void f(MyClass &retval) // <--- no worries !
{
// some computation to populate retval
}
Just the simple KISS principle.
Okay I think I have a solution to bypass the Return Value Optimization in release mode, but it depends on the compiler and is not guaranteed to work. It is based on this.
MyClass f (void)
{
MyClass retval;
MyClass dummy;
// ...
volatile bool b = true;
return b ? retval : dummy;
}
As for why the copy construction takes so long in DEBUG mode, I have no idea. The only possible way to speed it up while remaining in DEBUG mode is to use rvalue references and move semantics. You already discovered move semantics with your "move" constructor that accepts pointer. C++11 gives a proper syntax for this kind of move semantics. Example:
// Suppose MyClass has a pointer to something that would be expensive to clone.
// With move construction we simply move this pointer to the new object.
MyClass (MyClass&& obj) :
ptr (obj.ptr)
{
// We set the source object to some trivial state so it is easy to delete.
obj.ptr = NULL;
}
MyClass& operator = (MyClass&& obj)
{
// Here we simply swap the pointer so the old object will be destroyed instead of the temporary.
std::swap(ptr, obj.ptr);
return *this;
}
When you pass an object to a function by value or return an object from a function by value, the copy constructor must be called. However, with some compilers this does not happen. Is there any explanation?
I assume they are referring to return-value optimization implemented in many compilers where the code:
CThing DoSomething();
gets turned into
void DoSomething(CThing& thing);
with thing being declared on the stack and passed in to DoSomething:
CThing thing;
DoSomething(thing);
which prevents CThing from needing to be copied.
It often doesn't happen because it doesn't need to happen. This is called copy elision. In many cases, the function doesn't need to make copies, so the compiler optimizes them away. For example, with the following function:
big_type foo(big_type bar)
{
return bar + 1;
}
big_type a = foo(b);
This will get converted to something like:
void foo(const big_type& bar, big_type& out)
{
out = bar + 1;
}
big_type a;
foo(b, a);
The removal of the return value is called the "Return Value Optimization" (RVO), and is implemented by most compilers, even when optimizations are turned off!
The compiler may call the copy constructor for pass-by-value or return-by-value, but it doesn't have to. The standard allows for optimizing it away (in standardese it's called copy elision) and in practice many compilers will do so, even if you don't have optimizations turned on. The explanation is pretty detailed, so I'll point you at C++ FAQ LITE.
Short version:
struct Foo
{
int a, b;
Foo(int A, int B) : a(A), b(B) {}
};
Foo make_me_a_foo(int x)
{
// ...blah, blah blah...
return Foo(x, x+1); // (1)
}
Foo bar = make_me_a_foo(42); // (2)
The trick here is the compiler is allowed to construct bar from line (2) directly in line (1) without incurring any overhead of constructing temporary Foo objects.