What kind of optimizations does rvalue guarantee? - c++

I want to construct an object with another using rvalue.
class BigDataClass{
public:
BigDataClass(); //some default BigData
BigDataClass(BigDataClass&& anotherBigData);
private:
BigDataClass(BigDataClass& anotherBigData);
BigDataPtr m_data;
};
So now I want to do something like:
BigDataClass someData;
BigDataClass anotherData(std::move(someData));
So now anotherData gets rValue. It's an eXpiring Value in fact, so as http://en.cppreference.com/w/cpp/utility/move states compiler now
has an oppourtunity to optimize the initialization of anotherData with moving
someData to another.
In my opinion we can in fact get 2 different things:
Optimized approach: data moved. It's optimized, fast and we're happy
Nonoptimized approach: data not moved. We have to copy data from object to another AND delete data from the first one(as far as I know after changing object to rvalue once we cannot use it, because it has got no ownership of data, that it held). In fact it can be even slower than initialization with lvalue referrence due to deletion operation.
Can we really get so unoptimized way of data initialization?

You said:
So now anotherData gets rValue. It's an eXpiring Value in fact, so as http://en.cppreference.com/w/cpp/utility/move states compiler now has an oppourtunity to optimize the initialization of anotherData with moving someData to another.
Actually, what it stated was:
Code that receives such an xvalue has the opportunity to optimize away unnecessary overhead by moving data out of the argument, leaving it in a valid but unspecified state.
That is, it's the code that's responsible for optimization here, not the compiler. All std::move(someData) does is cast its argument to an rvalue reference. Given that BigDataClass has a constructor that takes an rvalue reference, that constructor is preferred, and that constructor will be the one that is called. There isn't any room for change here from the compiler's point of view. Thus the code will do whatever the BigDataClass(BigDataClass&&) constructor does.

Looks like you are confused with what is optimization and what is not. Using move constructor (when available) is not an optimization, it is mandated by standard. It is not that the compiler has this opportunity, it has to do this.
On the other hand, copy elision is an optimization which compiler has an opportunity to perform. How relibale it is, depends on your compiler (though they are applying it pretty uniformely) and the actual function code.

You think about what the optimizer can do with move semantics. Simply nothing itself! You, the coder, has to implement the code which is the optimization compared against the constructor with a const ref.
The question can go to the opposite:
If the compiler already knows that you have a rvalue which is passed as const ref to a constructor, the compiler is able to do the construction as if the value is generated in the constructor itself. Copy eliding is done very often by up to date compilers. The question here is ( for me ) how many effort I should spend to write some rvalue reference constructions to get the same result as the compiler already builds on the fly for me.
OK, in c++11 you have a lot of opportunities to handle code for forwarding and moving by your algorithms. And yes, some benefit can be generated. But I see the benefit only for templated code where I have the need to move/forward some of the parameters to (meta)template functions.
And on the opposite: Handling rvalue references must taken with care and the meaning of a valid but undefined state rise some questions on every user who use your interface implementation. See also: What can I do with a moved-from object?
What kind of optimizations does rvalue guarantee
Simply nothing. You have to implement it!

Related

Why can't the C++ compiler elide the move when moving a POD into an optional with RVO?

Consider the following code (godbolt):
#include <optional>
#include <array>
struct LargeType {
std::array<int, 256> largeContents;
};
LargeType doSomething();
std::optional<LargeType> wrapIntoOptional(){
return std::optional<LargeType> {doSomething()};
}
As you see, there is a function returning a large POD and then a function wrapping it into a std::optional. As visible in godbolt, the compiler creates a memcpy here, so it cannot fully elide moving the object. Why is this?
If I understand it correctly, the C++ language would allow eliding the move due to the as-if rule, as there are no visible side effects to it. So it seems that the compiler really cannot avoid it. But why?
My (probably incorrect) understanding how the compiler could optimize the memcpy out is to hand a reference to the storage inside the optional to doSomething() (as I guess such large objects get passed by hidden reference anyway). The optional itself would already lie on the stack of the caller of wrapIntoOptional due to RVO. As the definition of the constructor of std::optional is in the header, it is available to the compiler, so it should be able to inline it, so it can hand that storage location to doSomething in the first place. So what's wrong about my intuition here?
To clarify: I don't argue that the C++ language requires the compiler to inline this. I just thought it would be a reasonable optimization and given that wrapping things into optionals is a common operation, it would be an optimization that is implemented in modern compilers.
It is impossible to elide any copy/move through a constructor call into storage managed by the constructed object.
The constructor takes the object as a reference. In order to bind the reference to something there must be an object, so the prvalue from doSomething() must be materialized into a temporary object to bind to the reference and the constructor then must copy/move from that temporary into its own storage.
It is impossible to elide through function parameters. That would require knowing the implementation of the function and the way C++ is specified it is possible to compile each function only knowing the declarations of other functions (aside from constant expression evaluation). This would break that or require a new type of annotation in the declaration.
None of this prevents the compiler from optimizing in a way that doesn't affect the observable behavior though. If your compiler is not figuring out that the extra copy can be avoided and has no observable side effects when seeing all relevant function/constructor definitions, then that's something you could complain to your compiler vendor about. The concept of copy elision is about allowing the compiler to optimize away a copy/move even though it would have had observable side effects.
You can add noexcept to elide copy:
https://godbolt.org/z/rrGEfrdzc

C++: Can the Compiler Optimize a Passing by Value?

One commonly known compiler optimisation is is the so-called return value optimisation. This optimisation basically allows the compiler to not copy a local variable that is being returned from a function, but instead moving it.
However, I was wondering if the same is also possible for passing arguments to a function by value if it is known that the return value of the function will overwrite the original argument.
Here is an example. Let's assume we have the following function:
std::vector<Foo> modify(std::vector<Foo> data) {
/* Do some funny things to data */
return data;
}
This function is then used in the following way:
std::vector<Foo> bigData = /* big data */;
bigData = modify(bigData); // Here copying the data into the function could be omitted
Now, in this case it can be clearly determined that the return value of the function call will override the argument that is passed into the function per value. My question is whether current compilers are able to optimise this code in a way so that the argument data is not copied when passed to the function, or if this might even be a part of the so-called return value optimisation.
Update
Let's take C++11 into account. I wonder if the following understanding is correct: If the value passed to a function parameter by value is an r-value, and the type of the parameter has a move-constructor, the move constructor will be used instead of the copy constructor.
For example:
std::vector<Foo> bigData = /* big data */;
bigData = modify(std::move(bigData));
If this is assumption is correct, this eliminates the copy operation when passing the value. From the answers already given it seems that the optimisation I referred to earlier is not commonly undertaken. Looking at this manual approach I don't really understand why, as appears to be pretty straightforward to apply.
It's hard to say for sure because in principle compilers can optimize many things, as long as they are certain it has the same behavior. However, in my experience, this optimization will not occur without inlining. Consider the following code:
__attribute__((noinline)) std::vector<double> modify(std::vector<double> data) {
std::sort(data.begin(), data.end());
return data;
}
std::vector<double> blah(std::vector<double> v) {
v = modify(v);
return v;
}
You can look at the assembly generated for this for various compilers; here I have clang 4.0 with O3 optimization: https://godbolt.org/g/xa2Dhf. If you look at the assembly carefully, you'll see a call to operator new in blah. This proves that blah is indeed performing a copy in order to call modify.
Of course, if inlining occurs, it should be pretty trivial for the compiler to remove the copy.
In C++11 the compiler could determine that bigData is reassigned after use in the function and pass it as rvalue, but there is no guarantee for that, unlike for the RVO (from c++17).
For std::vector at least you can make sure this happens by calling the function as modify(std::move(bigData)), which will construct the value in modify from the rvalue reference, which it cannot optimize with the RVO afaik, because it is the function parameter, which is explicitly excluded from this optimization (3rd point here). However the compiler should understand that the return value is an r-value, and move it into big-data again.
Whether some compilers elide a move from an object into a function and out of the function back into the object I don't know for sure, but I know nothing that explicitly allows it, and since the move-constructor could have observable side-effects, that probably means, that it is not allowed (cf. the Notes section in above link).
That is really compiler specific and depends on how you perform operations(whether we are modifying the data or not) with the data. Mostly you shouldn't expect the compiler to do such kind of optimizations unless you really benchmark it. I did some tests with VS2012 compiler that performs copy operations though we don't modify it.
Please have a look at this post(Does the compiler optimize the function parameters passed by value?), that may give you a better idea I hope.

How to avoid copy-constructing a return value

I'm a newcomer to C++ and I ran into a problem recently returning a reference to a local variable. I solved it by changing the return value from std::string& to an std::string. However, to my understanding this can be very inefficient. Consider the following code:
string hello()
{
string result = "hello";
return result;
}
int main()
{
string greeting = hello();
}
To my understanding, what happens is:
hello() is called.
Local variable result is assigned a value of "hello".
The value of result is copied into the variable greeting.
This probably doesn't matter that much for std::string, but it can definitely get expensive if you have, for example, a hash table with hundreds of entries.
How do you avoid copy-constructing a returned temporary, and instead return a copy of the pointer to the object (essentially, a copy of the local variable)?
Sidenote: I've heard that the compiler will sometimes perform return-value optimization to avoid calling the copy constructor, but I think it's best not to rely on compiler optimizations to make your code run efficiently.)
The description in your question is pretty much correct. But it is important to understand that this is behavior of the abstract C++ machine. In fact, the canonical description of abstract return behavior is even less optimal
result is copied into a nameless intermediate temporary object of type std::string. That temporary persists after the function's return.
That nameless intermediate temporary object is then copied to greeting after function returns.
Most compilers have always been smart enough to eliminate that intermediate temporary in full accordance with the classic copy elision rules. But even without that intermediate temporary the behavior has always been seen as grossly suboptimal. Which is why a lot of freedom was given to compilers in order to provide them with optimization opportunities in return-by-value contexts. Originally it was Return Value Optimization (RVO). Later Named Return Value Optimization was added to it (NRVO). And finally, in C++11, move semantics became an additional way to optimize the return behavior in such cases.
Note that under NRVO in your example the initialization of result with "hello" actually places that "hello" directly into greeting from the very beginning.
So in modern C++ the best advice is: leave it as is and don't avoid it. Return it by value. (And prefer to use immediate initialization at the point of declaration whenever you can, instead of opting for default initialization followed by assignment.)
Firstly, the compiler's RVO/NRVO capabilities can (and will) eliminate the copying. In any self-respecting compiler RVO/NRVO is not something obscure or secondary. It is something compiler writers do actively strive to implement and implement properly.
Secondly, there's always move semantics as a fallback solution if RVO/NRVO somehow fails or is not applicable. Moving is naturally applicable in return-by-value contexts and it is much less expensive than full-blown copying for non-trivial objects. And std::string is a movable type.
I disagree with the sentence "I think it's best not to rely on compiler optimizations to make your code run efficiently." That's basically the compiler's whole job. Your job is to write clear, correct, and maintainable source code. For every performance issue I've ever had to fix, I've had to fix a hundred or more issues caused by a developer trying to be clever instead of doing something simple, correct, and maintainable.
Let's take a look at some of the things you could do to try to "help" the compiler and see how they affect the maintainability of the source code.
You could return the data via reference
For example:
void hello(std::string& outString)
Returning data using a reference makes the code at the call-site hard to read. It's nearly impossible to tell what function calls mutate state as a side effect and which don't. Even if you're really careful with const qualifying the references it's going to be hard to read at the call site. Consider the following example:
void hello(std::string& outString); //<-This one could modify outString
void out(const std::string& toWrite); //<-This one definitely doesn't.
. . .
std::string myString;
hello(myString); //<-This one maybe mutates myString - hard to tell.
out(myString); //<-This one certainly doesn't, but it looks identical to the one above
Even the declaration of hello isn't clear. Does it modify outString, or was the author just sloppy and forgot to const qualify the reference? Code that is written in a functional style is easier to read and understand and harder to accidentally break.
Avoid returning the data via reference
You could return a pointer to the object instead of returning the object.
Returning a pointer to the object makes it hard to be sure your code is even correct. Unless you use a unique_ptr you have to trust that anybody using your method is thorough and makes sure to delete the pointer when they're done with it, but that isn't very RAII. std::string is already a type of RAII wrapper for a char* that abstracts away the data lifetime issues associated with returning a pointer. Returning a pointer to a std::string just re-introduces the problems that std::string was designed to solve. Relying on a human being to be diligent and carefully read the documentation for your function and know when to delete the pointer and when not to delete the pointer is unlikely to have a positive outcome.
Avoid returning a pointer to the object instead of returning the object
That brings us to move constructors.
A move constructor will just transfer ownership of the pointed-to data from 'result' to its final destination. Afterwards, accessing the 'result' object is invalid but that doesn't matter - your method ended and the 'result' object went out of scope. No copy, just a transfer of ownership of the pointer with clear semantics.
Normally the compiler will call the move constructor for you. If you're really paranoid (or have specific knowledge that the compiler isn't going to help you) you can use std::move.
Use move constructors if at all possible
Finally modern compilers are amazing. With a modern C++ compiler, 99% of the time the compiler is going to do some sort of optimization to eliminate the copy. The other 1% of the time it's probably not going to matter for performance. In specific circumstances the compiler can re-write a method like std::string GetString(); to void GetString(std::string& outVar); automatically. The code is still easy to read, but in the final assembly you get all of the real or imagined speed benefits of returning by reference. Don't sacrifice readability and maintainability for performance unless you have specific knowledge that the solution doesn't meet your business requirements.
There are plenty of ways to achieve that:
1) Return some data by the reference
void SomeFunc(std::string& sResult)
{
sResult = "Hello world!";
}
2) Return pointer to the object
CSomeHugeClass* SomeFunc()
{
CSomeHugeClass* pPtr = new CSomeHugeClass();
//...
return(pPtr);
}
3) C++ 11 could utilize a move constructor in such cases. See this this and this for the additional info.

Functions returning a collection of objects in C++

In my current project I need to implement quite a few functions/methods that take some parameters and generate a collection of results (rather large). So in order to return this collection without copying, I can either create a new collection and return a smart pointer:
boost::shared_ptr<std::vector<Stuff> > generate();
or take a reference to a vector which will be populated:
void generate(std::vector<Stuff> &output);
Both approaches have benefits. The first clearly shows that the vector is the output of the function, it is trivial to use in a parallelized scenario, etc. The second might be more efficient when called in a loop (because we don't allocate memory every time), but then it is not that obvious that the parameter is the output, and someone needs to clean the old data from the vector...
Which would be more customary in real life (i.e. what is the best practise)? In C#/java I would argue that the first one, what is the case in C++?
Also, is it possible to effectively return a vector by value using C++11? What would the pitfalls be?
do correctness first, then optimize if necessary
with both move semantics and Return Value Optimization conspiring to make an ordinary function result non-copying, you would probably have to work at it to make it sufficiently inefficient to be worth optimization work
so, just return the collection as a function result, then MEASURE if you feel that it's too slow
You should return by value.
is it possible to effectively return a vector by value using C++11?
Yes, C++11 supports move semantics. You return a value, but the compiler knows it's a temporary, and therefore can invoke a special constructor (move constructor) that is especially designed to simply "steal the guts" of the returned object. After all, you won't use that temporary object anymore, so why copying it when you can just move its content?
Apart from this, it may be worth mentioning that most C++ compilers, even pre-C++11, implement (Named) Return Value Optimization, which would elide the copy anyway, incurring in no overhead. Thus, you may want to actually measure the performance penalty you (possibly) get before optimizing.
I think you should pass by reference, or return a shared pointer, only when you need reference semantics. This does not seem to be your case.
There is an alternative approach. If you can make your functions template, make them take an output iterator (whose type is a template argument) as argument:
tempalte<class OutputIterator>
void your_algorithm(OutputIterator out) {
for(/*condition*/) {
++out = /* calculation */;
}
}
This has the advantage that the caller can decide in what kind of collection he wants to store the result (the output iterator could for instance write directly to a file, or store the result in a std::vector, or filter it, etc.).
The best practise will probably be surprising to you. I would recommend returning by value in both C++03 and C++11.
In C++03, if you create a std::vector local to generate and return it, the copy may be elided by the compiler (and almost certainly will be). See C++03 §12.8/15:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object with the same cv-unqualified type as the function return type, the copy operation can be omitted by constructing the automatic object directly into the function's return value
In C++11, if you create a std::vector local to generate and return it, the copy will first be considered as a move first (which will already be very fast) and then that may be elided (and almost certainly will be). See C++11 §12.8/31:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value
And §12.8/32:
When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue.
So return by value!
Believe it or not, I'm going to suggest that instead of either of those approaches just take the obvious implementation and return by value! Compilers are very often able to optimize away the notional copy that would be induced, removing it completely. By writing the code in the most obvious manner you make it very clear to future maintainers what the intent is.
But let's say you try return by value and your program runs too slow and let's further suppose that your profiler shows that the return by value is in fact your bottleneck. In this case I would allocate the container on the heap and return as an auto_ptr in C++03 or a unique_ptr in C++11 to clearly indicate that ownership is being transferred and that the generate isn't keeping a copy of that shared_ptr for its own purposes later.
Finally, the series at http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ provides a great perspective on almost the exact same question.

Does the compiler optimize the function parameters passed by value?

Lets say I have a function where the parameter is passed by value instead of const-reference. Further, lets assume that only the value is used inside the function i.e. the function doesn't try to modify it. In that case will the compiler will be able to figure out that it can pass the value by const-reference (for performance reasons) and generate the code accordingly? Is there any compiler which does that?
If you pass a variable instead of a temporary, the compiler is not allowed to optimize away the copy if the copy constructor of it does anything you would notice when running the program ("observable behavior": inputs/outputs, or changing volatile variables).
Apart from that, the compiler is free to do everything it wants (it only needs to resemble the observable behavior as-if it wouldn't have optimized at all).
Only when the argument is an rvalue (most temporary), the compiler is allowed to optimize the copy to the by-value parameter even if the copy constructor has observable side effects.
Only if the function is not exported there is a chance the compiler to convert call-by-reference to call-by-value (or vise-versa).
Otherwise, due to the calling convention, the function must keep the call-by-value/reference semantic.
I'm not aware of any general guarantees that this will be done, but if the called function is inlined, then this would then allow the compiler to see that an unnecessary copy is being made, and if the optimization level is high enough, the copy operation would be eliminated. GCC can do this at least.
You might want to think about whether the class of this parameter value has a copy constructor or not. If it doesn't, then the performance difference between pass-by-value and pass-by-const-ref is probably neglible.
On the other hand, if class does have a copy constructor that does stuff, then the optimization you are hoping for probably will not happen because the compiler cannot remove the call to the constructor--it cannot know that the side effects of the constructor are not important to you.
You might be able to get more useful answers if you say what the class of the parameter is, or if it is a custom class, describe what fields it has and whether it has a copy constructor.
With all optimisations the answer is generally "maybe". The only way to check is to examine the output assembly and see what it's really doing. If the standard allows it, whether or not it really happens is down to the whims of the compiler. You should not rely on it happening because an arbitrary change elsewhere in your codebase may change the heuristics used by the optimizer which might cause it to stop performing a certain optimization.
Play it safe: code it how you intend - pass by reference if that's what you want. However, if you're writing templated code which could work on types of any size, the choice is not so clear. Personally I'd side with passing by const reference - the compiler could also perform a different optimisation, where a small type which can fit inside the size of a reference is passed by value, rather than by const reference. But again, it might happen, it might not.
This post is an excellent reference to this kind of optimization:
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/