It is sometimes claimed that C++11/14 can get you a performance boost even when merely compiling C++98 code. The justification is usually along the lines of move semantics, as in some cases the rvalue constructors are automatically generated or now part of the STL. Now I'm wondering whether these cases were previously actually already handled by RVO or similar compiler optimizations.
My question then is if you could give me an actual example of a piece of C++98 code that, without modification, runs faster using a compiler supporting the new language features. I do understand that a standard conforming compiler is not required to do the copy elision and just by that reason move semantics might bring about speed, but I'd like to see a less pathological case, if you will.
EDIT: Just to be clear, I am not asking whether new compilers are faster than old compilers, but rather if there is code whereby adding -std=c++14 to my compiler flags it would run faster (avoid copies, but if you can come up with anything else besides move semantics, I'd be interested, too)
I am aware of 5 general categories where recompiling a C++03 compiler as C++11 can cause unbounded performance increases that are practically unrelated to quality of implementation. These are all variations of move semantics.
std::vector reallocate
struct bar{
std::vector<int> data;
};
std::vector<bar> foo(1);
foo.back().data.push_back(3);
foo.reserve(10); // two allocations and a delete occur in C++03
every time the foo's buffer is reallocated in C++03 it copied every vector in bar.
In C++11 it instead moves the bar::datas, which is basically free.
In this case, this relies on optimizations inside the std container vector. In every case below, the use of std containers is just because they are C++ objects that have efficient move semantics in C++11 "automatically" when you upgrade your compiler. Objects that don't block it that contain a std container also inherit the automatic improved move constructors.
NRVO failure
When NRVO (named return value optimization) fails, in C++03 it falls back on copy, on C++11 it falls back on move. Failures of NRVO are easy:
std::vector<int> foo(int count){
std::vector<int> v; // oops
if (count<=0) return std::vector<int>();
v.reserve(count);
for(int i=0;i<count;++i)
v.push_back(i);
return v;
}
or even:
std::vector<int> foo(bool which) {
std::vector<int> a, b;
// do work, filling a and b, using the other for calculations
if (which)
return a;
else
return b;
}
We have three values -- the return value, and two different values within the function. Elision allows the values within the function to be 'merged' with the return value, but not with each other. They both cannot be merged with the return value without merging with each other.
The basic issue is that NRVO elision is fragile, and code with changes not near the return site can suddenly have massive performance reductions at that spot with no diagnostic emitted. In most NRVO failure cases C++11 ends up with a move, while C++03 ends up with a copy.
Returning a function argument
Elision is also impossible here:
std::set<int> func(std::set<int> in){
return in;
}
in C++11 this is cheap: in C++03 there is no way to avoid the copy. Arguments to functions cannot be elided with the return value, because the lifetime and location of the parameter and return value is managed by the calling code.
However, C++11 can move from one to the other. (In a less toy example, something might be done to the set).
push_back or insert
Finally elision into containers does not happen: but C++11 overloads rvalue move insert operators, which saves copies.
struct whatever {
std::string data;
int count;
whatever( std::string d, int c ):data(d), count(c) {}
};
std::vector<whatever> v;
v.push_back( whatever("some long string goes here", 3) );
in C++03 a temporary whatever is created, then it is copied into the vector v. 2 std::string buffers are allocated, each with identical data, and one is discarded.
In C++11 a temporary whatever is created. The whatever&& push_back overload then moves that temporary into the vector v. One std::string buffer is allocated, and moved into the vector. An empty std::string is discarded.
Assignment
Stolen from #Jarod42's answer below.
Elision cannot occur with assignment, but move-from can.
std::set<int> some_function();
std::set<int> some_value;
// code
some_value = some_function();
here some_function returns a candidate to elide from, but because it is not used to construct an object directly, it cannot be elided. In C++03, the above results in the contents of the temporary being copied into some_value. In C++11, it is moved into some_value, which basically is free.
For the full effect of the above, you need a compiler that synthesizes move constructors and assignment for you.
MSVC 2013 implements move constructors in std containers, but does not synthesize move constructors on your types.
So types containing std::vectors and similar do not get such improvements in MSVC2013, but will start getting them in MSVC2015.
clang and gcc have long since implemented implicit move constructors. Intel's 2013 compiler will support implicit generation of move constructors if you pass -Qoption,cpp,--gen_move_operations (they don't do it by default in an effort to be cross-compatible with MSVC2013).
if you have something like:
std::vector<int> foo(); // function declaration.
std::vector<int> v;
// some code
v = foo();
You got a copy in C++03, whereas you got a move assignment in C++11.
so you have free optimisation in that case.
Related
Consider the following piece of code:
std::vector<int> Foo() {
std::vector<int> v = Bar();
return v;
}
return v is O(1), since NRVO will omit the copy, constructing v directly in the storage where the function's return value would otherwise be moved or copied to. Now consider the functionally analogous code:
void Foo(std::vector<int> * to_be_filled) {
std::vector<int> v = Bar();
*to_be_filled = v;
}
A similar argument could be made here, as *to_be_filled = v could conceivably be compiled to an O(1) move-assign, since it's a local variable that's going out of scope (it should be easy enough for the compiler to verify that v has no external references in this case, and thus promote it to an rvalue on its last use). Is this the case? Is there a subtle reason why not?
Furthermore, it feels like this pattern can be extended to any context where an lvalue goes out of scope:
void Foo(std::vector<int> * to_be_filled) {
if (Baz()) {
std::vector<int> v = Bar();
*to_be_filled = v;
}
...
}
Do / can / is it useful / reasonable to expect compilers to find patterns such as the *to_be_filled = v and then automatically optimize them to assume rvalue semantics?
Edit:
g++ 7.3.0 does not perform any such optimizations in -O3 mode.
The compiler is not permitted to arbitrarily decide to transform an lvalue name into an rvalue to be moved from. It can only do so where the C++ standard permits it to do so. Such as in a return statement (and only when its return <identifier>;).
*to_be_filled = v; will always perform a copy. Even if it's the last statement that can access v, it is always a copy. Compilers aren't allowed to change that.
My understanding is that return v is O(1), since NRVO will (in effect) make v into an rvalue, which then makes use of std::vector's move-constructor.
That's not how it works. NRVO would eliminate the move/copy entirely. But the ability for return <identifier>; to be an rvalue is not an "optimization". It's actually a requirement that compilers treat them as rvalues.
Compilers have a choice about copy elision. Compilers don't have a choice about what return <identifier>; does. So the above will either not move at all (if NRVO happens) or will move the object.
Is there a subtle reason why not?
One reason this isn't allowed is because the location of a statement should not arbitrarily change what that statement is doing. See, return <identifier>; will always move from the identifier (if it's a local variable). It doesn't matter where it is in the function. By virtue of being a return statement, we know that if the return is executed, nothing after it will be executed.
That's not the case for arbitrary statements. The behavior of the expression *to_be_filled = v; should not change based on where it happens to be in code. You shouldn't be able to turn a move into a copy just because you add another line to the function.
Another reason is that arbitrary statements can get really complicated really quickly. return <identifier>; is very simple; it copies/moves the identifier to the return value and returns.
By contrast, what happens if you have a reference to v, and that gets used by to_be_filled somehow. Sure that can't happen in your case, but what about other, more complex cases? The last expression could conceivably read from a reference to a moved-from object.
It's a lot harder to do that in return <identifier>; cases.
I am reading the official CPPCoreGuidelines to understand correctly when it's reliable to count on RVO and when not.
At F20 it is written:
If a type is expensive to move (e.g., array), consider
allocating it on the free store and return a handle (e.g.,
unique_ptr), or passing it in a reference to non-const target object
to fill (to be used as an out-parameter)
I understand that the non-STL types are not optimized to move, but how can I easy detect other types expensive to move, so I will not use RVO on them?
You seem to have misunderstood what "RVO" is. "RVO" stands for "return value optimization" and it's a compiler optimization that prevents any move or copy constructor from being invoked. E.g.
std::vector<huge_thing> foo()
{
std::vector<huge_thing> result{/* ... */};
return result;
}
void bar()
{
auto v = foo(); // (0)
}
Any decent compiler will not execute any copy/move operation and simply construct v in place at (0). In C++17, this is mandatory thanks to the changes to prvalues.
In terms of expensive moves: sure, there can be types expensive to move - but I cannot think of any instance where a move would be more expensive than a copy.
Therefore:
Rely on RVO, especially in C++17 - this does not incur any cost even for types "expensive to move".
If a type is expensive to move, it's also expensive to copy - so you don't really have a choice there. Redesign your code so that you don't need the copy/move if possible.
I have the following function:
void read_int(std::vector<int> &myVector)
Which allows me to fill myVector through it reference. It is used like this:
std::vector<int> myVector;
read_int(myVector);
I want to refactor a bit the code (keeping the original function) to in the end have this:
auto myVector = read_int(); // auto is std::vector<int>
What would be the best intermediate function to achieve this?
It seems to me that the following straight-forward answer is suboptimal:
std::vector<int> read_int() {
std::vector<int> myVector_temp;
read_int(myVector_temp);
return myVector_temp;
}
The obvious answer is correct, and basically optimal.
void do_stufF(std::vector<int>& on_this); // (1)
std::vector<int> do_stuff_better() { // (2)
std::vector<int> myVector_temp; // (3)
do_stuff(myVector_temp); // (4)
return myVector_temp; // (5)
}
At (3) we create a named return value in automatic storage (on the stack).
At (5) we only ever return the named return value from the function, and we never return anything else but that named return value anywhere else in the function.
Because of (3) and (5), the compiler is allowed to (and most likely will) elide the existence of the myVector_temp object. It will directly construct the return value of the function, and call it myVector_temp. It still needs there to be an existing move or copy constructor, but it does not call it.
On the other end, when calling do_stuff_better, some compilers can also elide the assignment at call:
std::vector<int> bob = do_stuff_better(); // (6)
The compiler is allowed to effectively pass a "pointer to bob" and tell do_stuff_better() to construct its return value in bob's location, eliding this copy construction as well (well, it can arrange how the call occurs such that the location that do_stuff_better() is asked to construct its return value in is the same as the location of bob).
And in C++11, even if the requirements for both elisions are not met, or the compiler chooses not to use them, in both cases a move must be done instead of a copy.
At line (5) we are returning a locally declared automatic storage duration variable in a plain and simple return statement. This makes the return an implicit move if not elided.
At line (6), the function returns an unnamed object, which is an rvalue. When bob is constructed from it, it move-constructs.
moveing a std::vector consists of copying the value of ~3 pointers, and then zeroing the source, regardless of how big the vector is. No elements need be copied or moved.
Both of the above elisions, where we remove the named local variable within do_stuff_better(), and we remove the return value of do_stuff_better() and instead directly construct bob, are somewhat fragile. Learning the rules under which your compiler is allowed to do those elisions, and also the situations where your compiler actually does the elisions, is worthwhile.
As an example of how it is fragile, if you had a branch where you did a return std::vector<int>() in your do_stuff_better() after checking an error state, the in-function elision would probably be blocked.
Even if elision is blocked or your compiler doesn't implement it for a case, the fact that the container is move'd means that the run time costs are going to be minimal.
I think, you have to read more about move semantics (link to Google query, there are a lot of papers on this - just choose one).
In short, in C++ all STL containers are written in such way, that returning them from function will cause their contents to be moved from the returned value (so called right-hand reference) to the variable you are assigning it to. In effect you'll only copy a few fields of the std::vector instead of its data. That's a lot faster than copying its contents.
I saw code somewhere in which someone decided to copy an object and subsequently move it to a data member of a class. This left me in confusion in that I thought the whole point of moving was to avoid copying. Here is the example:
struct S
{
S(std::string str) : data(std::move(str))
{}
};
Here are my questions:
Why aren't we taking an rvalue-reference to str?
Won't a copy be expensive, especially given something like std::string?
What would be the reason for the author to decide to make a copy then a move?
When should I do this myself?
Before I answer your questions, one thing you seem to be getting wrong: taking by value in C++11 does not always mean copying. If an rvalue is passed, that will be moved (provided a viable move constructor exists) rather than being copied. And std::string does have a move constructor.
Unlike in C++03, in C++11 it is often idiomatic to take parameters by value, for the reasons I am going to explain below. Also see this Q&A on StackOverflow for a more general set of guidelines on how to accept parameters.
Why aren't we taking an rvalue-reference to str?
Because that would make it impossible to pass lvalues, such as in:
std::string s = "Hello";
S obj(s); // s is an lvalue, this won't compile!
If S only had a constructor that accepts rvalues, the above would not compile.
Won't a copy be expensive, especially given something like std::string?
If you pass an rvalue, that will be moved into str, and that will eventually be moved into data. No copying will be performed. If you pass an lvalue, on the other hand, that lvalue will be copied into str, and then moved into data.
So to sum it up, two moves for rvalues, one copy and one move for lvalues.
What would be the reason for the author to decide to make a copy then a move?
First of all, as I mentioned above, the first one is not always a copy; and this said, the answer is: "Because it is efficient (moves of std::string objects are cheap) and simple".
Under the assumption that moves are cheap (ignoring SSO here), they can be practically disregarded when considering the overall efficiency of this design. If we do so, we have one copy for lvalues (as we would have if we accepted an lvalue reference to const) and no copies for rvalues (while we would still have a copy if we accepted an lvalue reference to const).
This means that taking by value is as good as taking by lvalue reference to const when lvalues are provided, and better when rvalues are provided.
P.S.: To provide some context, I believe this is the Q&A the OP is referring to.
To understand why this is a good pattern, we should examine the alternatives, both in C++03 and in C++11.
We have the C++03 method of taking a std::string const&:
struct S
{
std::string data;
S(std::string const& str) : data(str)
{}
};
in this case, there will always be a single copy performed. If you construct from a raw C string, a std::string will be constructed, then copied again: two allocations.
There is the C++03 method of taking a reference to a std::string, then swapping it into a local std::string:
struct S
{
std::string data;
S(std::string& str)
{
std::swap(data, str);
}
};
that is the C++03 version of "move semantics", and swap can often be optimized to be very cheap to do (much like a move). It also should be analyzed in context:
S tmp("foo"); // illegal
std::string s("foo");
S tmp2(s); // legal
and forces you to form a non-temporary std::string, then discard it. (A temporary std::string cannot bind to a non-const reference). Only one allocation is done, however. The C++11 version would take a && and require you to call it with std::move, or with a temporary: this requires that the caller explicitly creates a copy outside of the call, and move that copy into the function or constructor.
struct S
{
std::string data;
S(std::string&& str): data(std::move(str))
{}
};
Use:
S tmp("foo"); // legal
std::string s("foo");
S tmp2(std::move(s)); // legal
Next, we can do the full C++11 version, that supports both copy and move:
struct S
{
std::string data;
S(std::string const& str) : data(str) {} // lvalue const, copy
S(std::string && str) : data(std::move(str)) {} // rvalue, move
};
We can then examine how this is used:
S tmp( "foo" ); // a temporary `std::string` is created, then moved into tmp.data
std::string bar("bar"); // bar is created
S tmp2( bar ); // bar is copied into tmp.data
std::string bar2("bar2"); // bar2 is created
S tmp3( std::move(bar2) ); // bar2 is moved into tmp.data
It is pretty clear that this 2 overload technique is at least as efficient, if not more so, than the above two C++03 styles. I'll dub this 2-overload version the "most optimal" version.
Now, we'll examine the take-by-copy version:
struct S2 {
std::string data;
S2( std::string arg ):data(std::move(x)) {}
};
in each of those scenarios:
S2 tmp( "foo" ); // a temporary `std::string` is created, moved into arg, then moved into S2::data
std::string bar("bar"); // bar is created
S2 tmp2( bar ); // bar is copied into arg, then moved into S2::data
std::string bar2("bar2"); // bar2 is created
S2 tmp3( std::move(bar2) ); // bar2 is moved into arg, then moved into S2::data
If you compare this side-by-side with the "most optimal" version, we do exactly one additional move! Not once do we do an extra copy.
So if we assume that move is cheap, this version gets us nearly the same performance as the most-optimal version, but 2 times less code.
And if you are taking say 2 to 10 arguments, the reduction in code is exponential -- 2x times less with 1 argument, 4x with 2, 8x with 3, 16x with 4, 1024x with 10 arguments.
Now, we can get around this via perfect forwarding and SFINAE, allowing you to write a single constructor or function template that takes 10 arguments, does SFINAE to ensure that the arguments are of appropriate types, and then moves-or-copies them into the local state as required. While this prevents the thousand fold increase in program size problem, there can still be a whole pile of functions generated from this template. (template function instantiations generate functions)
And lots of generated functions means larger executable code size, which can itself reduce performance.
For the cost of a few moves, we get shorter code and nearly the same performance, and often easier to understand code.
Now, this only works because we know, when the function (in this case, a constructor) is called, that we will be wanting a local copy of that argument. The idea is that if we know that we are going to be making a copy, we should let the caller know that we are making a copy by putting it in our argument list. They can then optimize around the fact that they are going to give us a copy (by moving into our argument, for example).
Another advantage of the 'take by value" technique is that often move constructors are noexcept. That means the functions that take by-value and move out of their argument can often be noexcept, moving any throws out of their body and into the calling scope (who can avoid it via direct construction sometimes, or construct the items and move into the argument, to control where throwing happens). Making methods nothrow is often worth it.
This is probably intentional and is similar to the copy and swap idiom. Basically since the string is copied before the constructor, the constructor itself is exception safe as it only swaps (moves) the temporary string str.
You don't want to repeat yourself by writing a constructor for the move and one for the copy:
S(std::string&& str) : data(std::move(str)) {}
S(const std::string& str) : data(str) {}
This is much boilerplate code, especially if you have multiple arguments. Your solution avoids that duplication on the cost of an unnecessary move. (The move operation should be quite cheap, however.)
The competing idiom is to use perfect forwarding:
template <typename T>
S(T&& str) : data(std::forward<T>(str)) {}
The template magic will choose to move or copy depending on the parameter that you pass in. It basically expands to the first version, where both constructor were written by hand. For background information, see Scott Meyer's post on universal references.
From a performance aspect, the perfect forwarding version is superior to your version as it avoids the unnecessary moves. However, one can argue that your version is easier to read and write. The possible performance impact should not matter in most situations, anyway, so it seems to be a matter of style in the end.
A function needs to return two values to the caller. What is the best way to implement?
Option 1:
pair<U,V> myfunc()
{
...
return make_pair(getU(),getV());
}
pair<U,V> mypair = myfunc();
Option 1.1:
// Same defn
U u; V v;
tie(u,v) = myfunc();
Option 2:
void myfunc(U& u , V& v)
{
u = getU(); v= getV();
}
U u; V v;
myfunc(u,v);
I know with Option2, there are no copies/moves but it looks ugly. Will there be any copies/moves occur in Option1, 1.1? Lets assume U and V are huge objects supporting both copy/move operations.
Q: Is it theoretically possible for any RVO/NRVO optimizations as per the standard? If yes, has gcc or any other compiler implemented yet?
Will RVO happen when returning std::pair?
Yes it can.
Is it guaranteed to happen?
No it is not.
C++11 standard: Section 12.8/31:
When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects.
Copy elision is not a guaranteed feature. It is an optimization compilers are allowed to perform whenever they can. There is nothing special w.r.t std::pair. If a compiler is good enough to detect an optimization opportunity it will do so. So your question is compiler specific but yes same rule applies to std::pair as to any other class.
While RVO is not guaranteed, in C++11 the function as you have defined it I believe MUST move-return at the very least, so I would suggest leaving the clearer definition rather than warping it to accept output-variables (Unless you have a specific policy for using them).
Also, even if this example did use RVO, your explicit use of make_pair means you will always have at least one extra pair construction and thus a move operation. Change it to return a brace-initialized expression:
return { getU(), getV() };
RVO or Copy elision is dependant on compiler so if you want to have RVO and avoid call to Copy constructor best option is to use pointers.
In our product we use use pointers and boost containers pointer to avoid Copy constructor. and this indeed gives performance boost of around 10%.
Coming to your question,
In option 1 U and V's copy constructor will not be called as you are not returning U or V but returning std::pair object so it's copy constructor will be called and most compilers will definately use RVO here to avoid that.
Thanks
Niraj Rathi
If you need to do additional work on u and v after having created the pair, I find the following pattern pretty flexible in C++17:
pair<U,V> myfunc()
{
auto out = make_pair(getU(),getV());
auto& [u, v] = out;
// Work with u and v
return out;
}
This should be a pretty easy case for the compiler to use named return value optimization