A function needs to return two values to the caller. What is the best way to implement this?
Option 1:
pair<U,V> myfunc()
{
...
return make_pair(getU(),getV());
}
pair<U,V> mypair = myfunc();
Option 1.1:
// Same defn
U u; V v;
tie(u,v) = myfunc();
Option 2:
void myfunc(U& u , V& v)
{
u = getU(); v= getV();
}
U u; V v;
myfunc(u,v);
I know that with Option 2 there are no copies or moves, but it looks ugly. Will any copies or moves occur in Options 1 and 1.1? Let's assume U and V are huge objects supporting both copy and move operations.
Q: Is it theoretically possible for any RVO/NRVO optimizations to apply as per the standard? If yes, has gcc or any other compiler implemented them yet?
Will RVO happen when returning std::pair?
Yes it can.
Is it guaranteed to happen?
No it is not.
C++11 standard: Section 12.8/31:
When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects.
Copy elision is not a guaranteed feature in C++11. It is an optimization compilers are allowed to perform whenever they can. There is nothing special about std::pair in this respect. If a compiler is good enough to detect an optimization opportunity, it will use it. So your question is compiler-specific, but the same rule applies to std::pair as to any other class.
While RVO is not guaranteed, I believe that in C++11 the function as you have defined it MUST at the very least move-return, so I would suggest keeping the clearer definition rather than warping it to accept output variables (unless you have a specific policy for using them).
Also, even if this example did use RVO, your explicit use of make_pair means you will always have at least one extra pair construction and thus a move operation. Change it to return a brace-initialized expression:
return { getU(), getV() };
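One way to check the copy/move question empirically is to instrument a type and count its operations. The following is a minimal sketch, not a benchmark: Counted and the helper names are hypothetical stand-ins for U and V. The exact number of moves may vary with the compiler and language standard, but no copies should occur in either option.

```cpp
#include <tuple>
#include <utility>

// Hypothetical instrumented type standing in for U and V.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
    Counted& operator=(const Counted&) { ++copies; return *this; }
    Counted& operator=(Counted&&) noexcept { ++moves; return *this; }
};
int Counted::copies = 0;
int Counted::moves = 0;

Counted getU() { return Counted(); }
Counted getV() { return Counted(); }

std::pair<Counted, Counted> myfunc() {
    // The temporaries are moved into the pair; the pair itself is a prvalue.
    return std::make_pair(getU(), getV());
}

// Option 1: initialize a pair directly from the call.
int copies_option1() {
    Counted::copies = 0;
    std::pair<Counted, Counted> mypair = myfunc();
    (void)mypair;
    return Counted::copies;
}

// Option 1.1: assign into existing objects through std::tie.
int copies_option1_1() {
    Counted::copies = 0;
    Counted u, v;
    std::tie(u, v) = myfunc();  // move-assigns from the returned pair
    return Counted::copies;
}
```

Both helpers should report zero copies. Option 1.1 still pays for default-constructing u and v and for two move assignments.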
RVO, or copy elision, is dependent on the compiler, so if you want to reliably avoid calls to the copy constructor, the best option is to use pointers.
In our product we use pointers and Boost pointer containers to avoid the copy constructor, and this indeed gives a performance boost of around 10%.
Coming to your question,
In Option 1, U's and V's copy constructors will not be called, because you are returning a std::pair object rather than U or V. It is the pair's copy constructor that would be called, and most compilers will definitely use RVO here to avoid that.
Thanks
Niraj Rathi
If you need to do additional work on u and v after having created the pair, I find the following pattern pretty flexible in C++17:
pair<U,V> myfunc()
{
auto out = make_pair(getU(),getV());
auto& [u, v] = out;
// Work with u and v
return out;
}
This should be a pretty easy case for the compiler to apply named return value optimization.
Consider the following piece of code:
std::vector<int> Foo() {
std::vector<int> v = Bar();
return v;
}
return v is O(1), since NRVO will omit the copy, constructing v directly in the storage where the function's return value would otherwise be moved or copied to. Now consider the functionally analogous code:
void Foo(std::vector<int> * to_be_filled) {
std::vector<int> v = Bar();
*to_be_filled = v;
}
A similar argument could be made here, as *to_be_filled = v could conceivably be compiled to an O(1) move-assign, since it's a local variable that's going out of scope (it should be easy enough for the compiler to verify that v has no external references in this case, and thus promote it to an rvalue on its last use). Is this the case? Is there a subtle reason why not?
Furthermore, it feels like this pattern can be extended to any context where an lvalue goes out of scope:
void Foo(std::vector<int> * to_be_filled) {
if (Baz()) {
std::vector<int> v = Bar();
*to_be_filled = v;
}
...
}
Do / can / is it useful / reasonable to expect compilers to find patterns such as the *to_be_filled = v and then automatically optimize them to assume rvalue semantics?
Edit:
g++ 7.3.0 does not perform any such optimizations in -O3 mode.
The compiler is not permitted to arbitrarily decide to transform an lvalue name into an rvalue to be moved from. It can only do so where the C++ standard permits it, such as in a return statement (and only when it's return <identifier>;).
*to_be_filled = v; will always perform a copy. Even if it's the last statement that can access v, it is always a copy. Compilers aren't allowed to change that.
My understanding is that return v is O(1), since NRVO will (in effect) make v into an rvalue, which then makes use of std::vector's move-constructor.
That's not how it works. NRVO would eliminate the move/copy entirely. But the ability for return <identifier>; to be an rvalue is not an "optimization". It's actually a requirement that compilers treat them as rvalues.
Compilers have a choice about copy elision. Compilers don't have a choice about what return <identifier>; does. So the above will either not move at all (if NRVO happens) or will move the object.
Is there a subtle reason why not?
One reason this isn't allowed is because the location of a statement should not arbitrarily change what that statement is doing. See, return <identifier>; will always move from the identifier (if it's a local variable). It doesn't matter where it is in the function. By virtue of being a return statement, we know that if the return is executed, nothing after it will be executed.
That's not the case for arbitrary statements. The behavior of the expression *to_be_filled = v; should not change based on where it happens to be in code. You shouldn't be able to turn a move into a copy just because you add another line to the function.
Another reason is that arbitrary statements can get really complicated really quickly. return <identifier>; is very simple; it copies/moves the identifier to the return value and returns.
By contrast, what happens if you have a reference to v, and that gets used by to_be_filled somehow? Sure, that can't happen in your case, but what about other, more complex cases? The last expression could conceivably read from a reference to a moved-from object.
It's a lot harder to do that in return <identifier>; cases.
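The difference between the plain assignment and an explicit opt-in can be observed with an instrumented type. This is a minimal sketch: Counted, fill_by_copy and fill_by_move are hypothetical names, not anything from the question.

```cpp
#include <utility>

// Hypothetical type that counts copy and move operations.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
    Counted& operator=(const Counted&) { ++copies; return *this; }
    Counted& operator=(Counted&&) noexcept { ++moves; return *this; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Mirrors the question: v is an lvalue, so this is a copy assignment,
// even though v is never used again.
void fill_by_copy(Counted* to_be_filled) {
    Counted v;
    *to_be_filled = v;
}

// The programmer has to request the move explicitly.
void fill_by_move(Counted* to_be_filled) {
    Counted v;
    *to_be_filled = std::move(v);
}
```

Because the counters are observable side effects, the compiler cannot silently turn the first assignment into a move.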
I am reading the official CppCoreGuidelines to understand when it's reliable to count on RVO and when it is not.
At F.20 it is written:
If a type is expensive to move (e.g., array), consider allocating it on the free store and return a handle (e.g., unique_ptr), or passing it in a reference to non-const target object to fill (to be used as an out-parameter)
I understand that the non-STL types are not optimized to move, but how can I easily detect other types that are expensive to move, so that I will not rely on RVO for them?
You seem to have misunderstood what "RVO" is. "RVO" stands for "return value optimization" and it's a compiler optimization that prevents any move or copy constructor from being invoked. E.g.
std::vector<huge_thing> foo()
{
std::vector<huge_thing> result{/* ... */};
return result;
}
void bar()
{
auto v = foo(); // (0)
}
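Any decent compiler will not execute any copy/move operation here and will simply construct v in place at (0). In C++17, initializing v from the prvalue returned by foo() is guaranteed to involve no copy or move; eliding the move of the named local result (NRVO) is still optional, although mainstream compilers perform it in a case like this.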
In terms of expensive moves: sure, there can be types expensive to move - but I cannot think of any instance where a move would be more expensive than a copy.
Therefore:
Rely on RVO, especially in C++17 - this does not incur any cost even for types "expensive to move".
If a type is expensive to move, it's also expensive to copy - so you don't really have a choice there. Redesign your code so that you don't need the copy/move if possible.
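For the rare type that really is expensive to move, the two alternatives from the guideline quoted in the question might look roughly like this. It is only a sketch: BigTable, make_table and fill_table are hypothetical names, and a std::array of doubles is used because moving it has to touch every element.

```cpp
#include <array>
#include <memory>

// Hypothetical "expensive to move" type: a std::array has no heap buffer
// to steal, so moving it moves (here: copies) all 65536 elements.
using BigTable = std::array<double, 65536>;

// Alternative 1: allocate on the free store and return a handle.
// Moving a unique_ptr transfers a single pointer.
std::unique_ptr<BigTable> make_table(double value) {
    std::unique_ptr<BigTable> table(new BigTable);
    table->fill(value);
    return table;
}

// Alternative 2: let the caller own the storage and fill it in place.
void fill_table(BigTable& out, double value) {
    out.fill(value);
}
```

A caller would write something like `auto t = make_table(1.5);` and later reuse the same storage with `fill_table(*t, 2.0);`.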
It is sometimes claimed that C++11/14 can get you a performance boost even when merely compiling C++98 code. The justification is usually along the lines of move semantics, as in some cases the rvalue constructors are automatically generated or now part of the STL. Now I'm wondering whether these cases were previously actually already handled by RVO or similar compiler optimizations.
My question, then, is whether you could give me an actual example of a piece of C++98 code that, without modification, runs faster using a compiler supporting the new language features. I do understand that a standard-conforming compiler is not required to do the copy elision, and for that reason alone move semantics might bring about speed, but I'd like to see a less pathological case, if you will.
EDIT: Just to be clear, I am not asking whether new compilers are faster than old compilers, but rather whether there is code that would run faster merely by adding -std=c++14 to my compiler flags (by avoiding copies, but if you can come up with anything else besides move semantics, I'd be interested, too).
I am aware of 5 general categories where recompiling C++03 code as C++11 can cause unbounded performance increases that are practically unrelated to quality of implementation. These are all variations of move semantics.
std::vector reallocate
struct bar{
std::vector<int> data;
};
std::vector<bar> foo(1);
foo.back().data.push_back(3);
foo.reserve(10); // two allocations and a delete occur in C++03
Every time foo's buffer is reallocated in C++03, every vector inside each bar is copied.
In C++11 it instead moves the bar::datas, which is basically free.
In this case, this relies on optimizations inside the std container vector. In every case below, the use of std containers is just because they are C++ objects that have efficient move semantics in C++11 "automatically" when you upgrade your compiler. Objects that don't block it that contain a std container also inherit the automatic improved move constructors.
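A rough way to see the move during reallocation is to check whether the inner vector keeps its heap buffer. This sketch assumes C++11 or later and a standard library whose std::vector move constructor is noexcept (true for the mainstream implementations); the function name is made up for illustration.

```cpp
#include <vector>

struct bar {
    std::vector<int> data;
};

// Returns true if the inner vector still owns the same heap buffer after
// the outer vector has reallocated, i.e. the bar was moved, not copied.
bool inner_buffer_survives_reallocation() {
    std::vector<bar> foo(1);
    foo.back().data.push_back(3);
    const int* before = foo.back().data.data();
    foo.reserve(10);  // forces foo to move its elements into a larger buffer
    const int* after = foo.back().data.data();
    return before == after;
}
```

In C++03 the element would have been copied, so the inner vector would have ended up with a freshly allocated buffer.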
NRVO failure
When NRVO (named return value optimization) fails, C++03 falls back on a copy, while C++11 falls back on a move. Failures of NRVO are easy to trigger:
std::vector<int> foo(int count){
std::vector<int> v; // oops
if (count<=0) return std::vector<int>();
v.reserve(count);
for(int i=0;i<count;++i)
v.push_back(i);
return v;
}
or even:
std::vector<int> foo(bool which) {
std::vector<int> a, b;
// do work, filling a and b, using the other for calculations
if (which)
return a;
else
return b;
}
We have three values -- the return value, and two different values within the function. Elision allows the values within the function to be 'merged' with the return value, but not with each other. They both cannot be merged with the return value without merging with each other.
The basic issue is that NRVO elision is fragile, and code with changes not near the return site can suddenly have massive performance reductions at that spot with no diagnostic emitted. In most NRVO failure cases C++11 ends up with a move, while C++03 ends up with a copy.
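The fallback can be made visible with an instrumented type. This is a minimal sketch with hypothetical names (Counted, pick, copies_when_picking); it only checks that no copy happens, since whether a move occurs depends on the compiler.

```cpp
// Hypothetical type that counts copy and move constructions.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Two candidates for the return value: the compiler cannot construct
// both a and b in the return slot, so NRVO typically fails here.
Counted pick(bool which) {
    Counted a, b;
    if (which)
        return a;
    else
        return b;
}

// In C++11 and later the returned local is treated as an rvalue,
// so the fallback is a move and this should return 0.
int copies_when_picking(bool which) {
    Counted::copies = 0;
    Counted result = pick(which);
    (void)result;
    return Counted::copies;
}
```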
Returning a function argument
Elision is also impossible here:
std::set<int> func(std::set<int> in){
return in;
}
In C++11 this is cheap; in C++03 there is no way to avoid the copy. Arguments to functions cannot be elided with the return value, because the lifetime and location of the parameter and the return value are managed by the calling code.
However, C++11 can move from one to the other. (In a less toy example, something might be done to the set).
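The same counting trick shows the move out of a by-value parameter. This is a sketch with a hypothetical Counted type in place of std::set<int>; the exact move count can depend on the compiler and standard, so only "no copies, at least one move" is claimed.

```cpp
// Hypothetical type that counts copy and move constructions.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// The parameter lives in storage managed by the caller, so it cannot be
// elided into the return value; C++11 moves from it instead.
Counted func(Counted in) {
    return in;
}
```

Calling `Counted r = func(Counted());` should leave Counted::copies at 0 and Counted::moves at 1 or more.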
push_back or insert
Finally, elision into containers does not happen, but C++11 adds rvalue (move) overloads of push_back and insert, which saves copies.
struct whatever {
std::string data;
int count;
whatever( std::string d, int c ):data(d), count(c) {}
};
std::vector<whatever> v;
v.push_back( whatever("some long string goes here", 3) );
In C++03 a temporary whatever is created, then it is copied into the vector v. 2 std::string buffers are allocated, each with identical data, and one is discarded.
In C++11 a temporary whatever is created. The whatever&& push_back overload then moves that temporary into the vector v. One std::string buffer is allocated, and moved into the vector. An empty std::string is discarded.
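A small sketch of the two push_back overloads, again using a hypothetical counting type rather than whatever. The reserve call is only there so that reallocation does not add extra operations to the counts.

```cpp
#include <vector>

// Hypothetical type that counts copy and move constructions.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// push_back(T&&): the temporary is moved into the vector.
int copies_pushing_temporary() {
    std::vector<Counted> v;
    v.reserve(4);
    Counted::copies = 0;
    v.push_back(Counted());
    return Counted::copies;
}

// push_back(const T&): a named object is copied into the vector.
int copies_pushing_named() {
    std::vector<Counted> v;
    v.reserve(4);
    Counted named;
    Counted::copies = 0;
    v.push_back(named);
    return Counted::copies;
}
```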
Assignment
Stolen from @Jarod42's answer below.
Elision cannot occur with assignment, but move-from can.
std::set<int> some_function();
std::set<int> some_value;
// code
some_value = some_function();
Here some_function returns a candidate to elide from, but because it is not used to construct an object directly, it cannot be elided. In C++03, the above results in the contents of the temporary being copied into some_value. In C++11, it is moved into some_value, which basically is free.
For the full effect of the above, you need a compiler that synthesizes move constructors and assignment for you.
MSVC 2013 implements move constructors in std containers, but does not synthesize move constructors on your types.
So types containing std::vectors and similar do not get such improvements in MSVC2013, but will start getting them in MSVC2015.
clang and gcc have long since implemented implicit move constructors. Intel's 2013 compiler will support implicit generation of move constructors if you pass -Qoption,cpp,--gen_move_operations (they don't do it by default in an effort to be cross-compatible with MSVC2013).
If you have something like:
std::vector<int> foo(); // function declaration.
std::vector<int> v;
// some code
v = foo();
You get a copy in C++03, whereas you get a move assignment in C++11.
So you get a free optimisation in that case.
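One way to observe the move assignment is to check that v ends up owning the very buffer that was allocated inside foo. This is a sketch assuming C++11 or later and the default allocator; g_last_buffer and assignment_reuses_buffer are hypothetical helpers added purely for the check.

```cpp
#include <vector>

// Hypothetical global used to remember the buffer allocated inside foo().
const int* g_last_buffer = nullptr;

std::vector<int> foo() {
    std::vector<int> result(1000, 7);
    g_last_buffer = result.data();
    return result;
}

// Returns true if v took over the buffer built inside foo(), which is
// what a move assignment does; a copy would allocate a separate buffer.
bool assignment_reuses_buffer() {
    std::vector<int> v(10, 1);
    // some code
    v = foo();
    return v.data() == g_last_buffer;
}
```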
In general, I would like to know when and why a modern compiler, say gcc 4.7 and up using C++11, cannot apply NRVO.
EDIT: I oversimplified this code, mistakenly not returning any local variables. A better example was supplied by @cooky451 below; see ideone.com/APySue
I saw some snippets of code to answers on other questions that were as such
A f(A&& v)
{
return v;
}
and they were changed to be
A f(A&& v)
{
return std::move(v);
}
because they said that the rvalue passed in, once bound to the named parameter v, is an lvalue inside the function and has to be moved explicitly. However, others wrote that this will remove the ability for NRVO. Why is this? If the compiler knows that a temporary is being returned, can't it construct it directly in place without moving anything? I guess I don't understand why case one would have NRVO but not case two. I might have the facts wrong, hence the question. Also, I read that case two is an anti-pattern for this reason and that you shouldn't return std::move like this. Any additional insight would be helpful. I was told that behind the scenes the compiler will create something like the code below, where A& __hidden__ is the target of the function's result, myValue in this case.
A myValue = f(A());
// behind the scenes pseudo code for direct in place construction
void f(A&& v, A& __hidden__ )
{
__hidden__ = v;
return;
}
Neither will use RVO, because it's impossible. The problem is that && is still just a reference. The variable you're returning is not in your function's local scope! So, without std::move you'll copy (C++20 later changed this case to an implicit move), and with it you'll move. By the way, I would advise against taking parameters by rvalue reference unless you're writing a move constructor/assignment operator or some perfect-forwarding template code. Just take the argument by value. It has a small overhead in some cases, but it's really not going to be significant. It also makes the code a lot more readable, and simpler, as the caller can either copy or move the arguments and you don't have to provide additional const& overloads.
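The two variants that avoid the copy can be compared with a counting version of A. This is only a sketch: the counters and the names f_move and f_value are made up for illustration. The plain `return v;` form is left out on purpose, because it copies before C++20 and moves from C++20 on.

```cpp
#include <utility>

// Hypothetical counting version of A.
struct A {
    static int copies;
    static int moves;
    A() = default;
    A(const A&) { ++copies; }
    A(A&&) noexcept { ++moves; }
};
int A::copies = 0;
int A::moves = 0;

// Rvalue-reference parameter: v is a named reference, so the move has to
// be requested explicitly (it is implicit only from C++20 on).
A f_move(A&& v) {
    return std::move(v);
}

// By-value parameter: the caller decides whether to copy or move into v,
// and returning the parameter moves from it automatically.
A f_value(A v) {
    return v;
}
```

With a temporary argument, as in `A a = f_move(A());` or `A b = f_value(A());`, neither version should copy.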
I'm confused as to what is going on in the following code snippet. Is move really necessary here? What would be the most optimal + safe way of returning the temporary set?
set<string> getWords()
{
set<string> words;
for (auto iter = wordIndex.begin(); iter != wordIndex.end(); ++iter)
{
words.insert(iter->first);
}
return move(words);
}
My calling code simply does set<string> words = foo.getWords();
First off, the set is not temporary, but local.
Second, the correct way to return it is via return words;.
Not only is this the only way you allow for return-value optimization, but moreover, the local variable will also bind to the move constructor of the returned object in the (unusual) case where the copy is not elided altogether. So it's a true triple-win scenario.
There's no need to use move here. Simply return "words". It will participate in the so-called "return value optimization".
Section 12.8 of the C++11 standard requires the move constructor to be used (if it exists) when a local variable is returned and the copy is not elided. In essence, the compiler will take care of calling std::move for you.
No, the explicit move is not necessarily the most optimal way to move the set. Since the set is being returned by value, the compiler may perform named return value optimization on the set, meaning it may elide the copy and directly construct the set in-place at the call-site where the return value is to be stored. Explicit moving will inhibit this.
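The effect of the explicit move can be measured with a counting type in place of set<string>. This is a minimal sketch with hypothetical names; the number of moves for the plain return depends on whether the compiler applies NRVO, so the only claims are that the explicit version always moves at least once and that the plain version never does worse.

```cpp
#include <utility>

// Hypothetical type that counts copy and move constructions.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Plain return: eligible for NRVO, otherwise moved implicitly.
Counted return_plain() {
    Counted words;
    return words;
}

// Explicit move: the operand is no longer just a local variable's name,
// so NRVO is not permitted and a move construction always happens.
Counted return_moved() {
    Counted words;
    return std::move(words);
}

// Counts the move constructions needed to initialize a variable from make().
int moves_for(Counted (*make)()) {
    Counted::moves = 0;
    Counted result = make();
    (void)result;
    return Counted::moves;
}
```

GCC and Clang can flag the second function with -Wpessimizing-move, which is a useful hint that the std::move should simply be dropped.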