When is RVO guaranteed to apply / does apply with C++20 compilers - c++

The C++ Core Guidelines state that
F.20: For “out” output values, prefer return values to output parameters
But then gives the following exception:
struct Package { // exceptional case: expensive-to-move object
char header[16];
char load[2024 - 16];
};
Package fill(); // Bad: large return value
void fill(Package&); // OK
Isn't this supposed to be a case where the return value optimization kicks in? Is RVO prevented in this case? Or is it still not as efficient as passing by reference? Or is it that some compilers don't manage to do it?
More generally, when can I rely on the compiler optimizing return values as efficiently as the pass-by-reference technique?

"Plain" RVO (i.e., returning a prvalue or "temporary" in common parlance) is guaranteed in C++17 and well-supported even before that.
NRVO (i.e., returning a local variable) can be finicky and is not guaranteed, and if it's not performed then you get a move instead. If your move is expensive, you may want to avoid that.
In the example, there's a decent chance that fill needs to use the latter.
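The difference between the two cases can be made visible with a small counting type. This is only a sketch: Tracked, g_copies, g_moves, make_prvalue and make_named are hypothetical names introduced for illustration, and the behaviour described in the comments assumes a C++17 (or later) compiler.

```cpp
// Counters for the hypothetical Tracked type below.
int g_copies = 0;
int g_moves = 0;

// Hypothetical type that records how often it is copied or moved.
struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_copies; }
    Tracked(Tracked&&) noexcept { ++g_moves; }
};

// "Plain" RVO: the operand is a prvalue, so since C++17 neither the
// copy nor the move constructor is called.
Tracked make_prvalue() {
    return Tracked{};
}

// NRVO: the operand names a local variable. Elision is allowed but
// optional; if it is not performed, one move construction happens.
Tracked make_named() {
    Tracked t;
    return t;
}
```

After `Tracked a = make_prvalue();` both counters are still zero. After `Tracked b = make_named();` the copy counter is still zero, and the move counter is zero or one depending on whether the compiler performed NRVO.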

Or still not as efficient as passing by reference?
If RVO applies, then it is equally efficient to return a value, as it is to use an output reference.
Is RVO prevented in this case?
No. Being "big" does not prevent the object from being RVO'd.
When is RVO guaranteed to apply / does apply with C++20 compilers
A case where it does not apply:
... A return statement can involve an invocation of a constructor to perform a copy or move of the operand if it is not a prvalue or if its type differs from the return type of the function.
So, it depends on the implementation of the function whether copy-elision is guaranteed.
The guidelines indeed fail to explain why the recommendation should be followed.
Note that the exception says:
Exceptions
If a type is expensive to move (e.g., array<BigPOD>), consider allocating it on the free store and return a handle (e.g., unique_ptr), or passing it in a reference to non-const target object to fill (to be used as an out-parameter).
The highlighted suggestion in the exception makes more sense to me. It makes it clear that the object is too big for the stack, and thus reduces the chance of stack overflows.
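A minimal sketch of the "return a handle" alternative from the exception, using the Package struct from the question. The function name make_package and the "PKG" header contents are assumptions made for illustration; std::make_unique requires C++14.

```cpp
#include <cstring>
#include <memory>
#include <utility>

struct Package {
    char header[16];
    char load[2024 - 16];
};

// Allocates the Package on the free store and returns an owning handle.
// Moving the unique_ptr transfers one pointer, not 2024 bytes.
std::unique_ptr<Package> make_package() {
    auto p = std::make_unique<Package>();  // value-initialized: all bytes zero
    std::strncpy(p->header, "PKG", sizeof p->header - 1);  // illustrative contents
    return p;
}
```

The caller writes `auto pkg = make_package();` and the large object never lives on the stack, at the cost of one heap allocation per call.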

Related

C++: Can the Compiler Optimize a Passing by Value?

One commonly known compiler optimisation is the so-called return value optimisation. This optimisation basically allows the compiler to not copy a local variable that is being returned from a function, but to move it instead.
However, I was wondering if the same is also possible for passing arguments to a function by value if it is known that the return value of the function will overwrite the original argument.
Here is an example. Let's assume we have the following function:
std::vector<Foo> modify(std::vector<Foo> data) {
/* Do some funny things to data */
return data;
}
This function is then used in the following way:
std::vector<Foo> bigData = /* big data */;
bigData = modify(bigData); // Here copying the data into the function could be omitted
Now, in this case it can be clearly determined that the return value of the function call will override the argument that is passed into the function per value. My question is whether current compilers are able to optimise this code in a way so that the argument data is not copied when passed to the function, or if this might even be a part of the so-called return value optimisation.
Update
Let's take C++11 into account. I wonder if the following understanding is correct: If the value passed to a function parameter by value is an r-value, and the type of the parameter has a move-constructor, the move constructor will be used instead of the copy constructor.
For example:
std::vector<Foo> bigData = /* big data */;
bigData = modify(std::move(bigData));
If this assumption is correct, this eliminates the copy operation when passing the value. From the answers already given it seems that the optimisation I referred to earlier is not commonly undertaken. Looking at this manual approach I don't really understand why, as it appears to be pretty straightforward to apply.
It's hard to say for sure because in principle compilers can optimize many things, as long as they are certain it has the same behavior. However, in my experience, this optimization will not occur without inlining. Consider the following code:
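#include <algorithm>
#include <vector>

// noinline (a GCC/Clang attribute) keeps the optimizer from seeing through the call
__attribute__((noinline)) std::vector<double> modify(std::vector<double> data) {
std::sort(data.begin(), data.end());
return data;
}
std::vector<double> blah(std::vector<double> v) {
v = modify(v); // v is an lvalue here, so the argument is copied
return v;
}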
You can look at the assembly generated for this by various compilers; here I have clang 4.0 with -O3 optimization: https://godbolt.org/g/xa2Dhf. If you look at the assembly carefully, you'll see a call to operator new in blah. This shows that blah is indeed performing a copy in order to call modify.
Of course, if inlining occurs, it should be pretty trivial for the compiler to remove the copy.
In C++11 the compiler could determine that bigData is reassigned after use in the function and pass it as an rvalue, but there is no guarantee of that, unlike for RVO (from C++17).
For std::vector at least, you can make sure this happens by calling the function as modify(std::move(bigData)), which constructs the parameter of modify from the rvalue reference. As far as I know, that move cannot be removed by RVO, because a function parameter is explicitly excluded from this optimization (3rd point here). However, the compiler should understand that the return value is an rvalue and move it into bigData again.
Whether some compilers elide a move from an object into a function and out of the function back into the object I don't know for sure, but I know nothing that explicitly allows it, and since the move-constructor could have observable side-effects, that probably means, that it is not allowed (cf. the Notes section in above link).
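The effect of passing an lvalue versus std::move(...) to a by-value parameter can be sketched with a counting type. Blob, g_copies, g_moves, roundtrip_copy and roundtrip_move are hypothetical names introduced for illustration; modify mirrors the shape of the function in the question.

```cpp
#include <utility>

// Counters for the hypothetical Blob type below.
int g_copies = 0;
int g_moves = 0;

// Hypothetical stand-in for an expensive-to-copy type such as a big vector.
struct Blob {
    Blob() = default;
    Blob(const Blob&) { ++g_copies; }
    Blob(Blob&&) noexcept { ++g_moves; }
    Blob& operator=(const Blob&) { ++g_copies; return *this; }
    Blob& operator=(Blob&&) noexcept { ++g_moves; return *this; }
};

// Takes its argument by value and returns it, like modify() in the question.
Blob modify(Blob data) {
    return data;  // a by-value parameter is moved out; it is never elided
}

// Passing an lvalue: the parameter is copy-constructed.
void roundtrip_copy(Blob& b) { b = modify(b); }

// Passing an rvalue: the parameter is move-constructed, so no copy at all.
void roundtrip_move(Blob& b) { b = modify(std::move(b)); }
```

With these definitions, roundtrip_copy performs one copy plus two moves, while roundtrip_move performs three moves and no copy. For a type like std::vector, where a move is a few pointer assignments, that is the difference that matters.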
That is really compiler-specific and depends on what you do with the data (whether you modify it or not). Mostly you shouldn't expect the compiler to do this kind of optimization unless you have actually benchmarked it. I did some tests with the VS2012 compiler, which performs the copy even though the data is not modified.
Please have a look at this post (Does the compiler optimize the function parameters passed by value?), which I hope will give you a better idea.

When is return by value ok?

I'm still not quite sure when return-by-value is a good idea in C++ and when not. In the following case, is it ok?
vector<int> to_vec(const Eigen::MatrixXi& in){
vector<int> out;
// copy contents of in into out
return out;
}
Eigen::MatrixXi to_eigen(const vector<int>& in){
Eigen::MatrixXi out;
// copy contents of in into out
return out;
}
Depending on how those objects vector and MatrixXi actually work, it could result in an expensive copy. On the other hand, I assume that they leverage C++'s move functionality to inexpensively copy by reusing the underlying data.
Without exactly knowing the implementation, what can I assume?
In such a situation where you're declaring a local variable, initializing it and returning it by value, you can be pretty safe in assuming that your compiler will elide the copy.
This case is known as named return value optimization. Essentially, instead of allocating the return value in the function call, it'll be done at the call site and passed in as a reference. Returning by value is the best choice here, as you don't need to declare a variable at the call site to pass in, but the performance will be as if you had.
In C++17, copy elision will be mandatory in most cases involving prvalues (e.g. T t = get_t(); or return get_t()), but is still optional for NRVO.
The rules of thumb regarding return values in C++ are:
1. never return a reference to a local variable
2. never return a pointer to a local variable
3. don't return a named value using move semantics
As for (3): this is a known concern with C++. We all learned that when an object is returned by value, the copy constructor is invoked. This is theoretically true but practically wrong: the compiler will use copy elision on objects when optimizations are turned on.
Copy elision is an optimization technique that creates the value in the caller's scope rather than in the callee's scope, thereby avoiding an expensive copy. Modifications to that object still take place in the callee's scope.
As for (1) and (2), there is also a corner case regarding coroutines and generators, but unless you know you're dealing with them, (1) and (2) are always valid.
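Rule (3) can be illustrated with a small sketch. Widget, g_copies, g_moves, make_plain and make_with_move are hypothetical names introduced for illustration. Note that compilers may warn about the second function (for example GCC and Clang with -Wpessimizing-move), which is exactly the point.

```cpp
#include <utility>

// Counters for the hypothetical Widget type below.
int g_copies = 0;
int g_moves = 0;

struct Widget {
    Widget() = default;
    Widget(const Widget&) { ++g_copies; }
    Widget(Widget&&) noexcept { ++g_moves; }
};

// Preferred: the compiler may elide the move entirely (NRVO), and
// falls back to a move, not a copy, if it does not.
Widget make_plain() {
    Widget w;
    return w;
}

// Discouraged: std::move turns the operand into an xvalue, which
// rules out NRVO and forces a move construction of the return value.
Widget make_with_move() {
    Widget w;
    return std::move(w);
}
```

Neither version copies, but only make_plain gives the compiler the chance to construct the result directly in the caller's storage.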

Functions returning a collection of objects in C++

In my current project I need to implement quite a few functions/methods that take some parameters and generate a collection of results (rather large). So in order to return this collection without copying, I can either create a new collection and return a smart pointer:
boost::shared_ptr<std::vector<Stuff> > generate();
or take a reference to a vector which will be populated:
void generate(std::vector<Stuff> &output);
Both approaches have benefits. The first clearly shows that the vector is the output of the function, it is trivial to use in a parallelized scenario, etc. The second might be more efficient when called in a loop (because we don't allocate memory every time), but then it is not that obvious that the parameter is the output, and someone needs to clean the old data from the vector...
Which would be more customary in real life (i.e. what is the best practise)? In C#/java I would argue that the first one, what is the case in C++?
Also, is it possible to effectively return a vector by value using C++11? What would the pitfalls be?
do correctness first, then optimize if necessary
with both move semantics and Return Value Optimization conspiring to make an ordinary function result non-copying, you would probably have to work at it to make it sufficiently inefficient to be worth optimization work
so, just return the collection as a function result, then MEASURE if you feel that it's too slow
You should return by value.
is it possible to effectively return a vector by value using C++11?
Yes, C++11 supports move semantics. You return a value, but the compiler knows it's a temporary, and therefore can invoke a special constructor (the move constructor) that is designed to simply "steal the guts" of the returned object. After all, you won't use that temporary object anymore, so why copy it when you can just move its contents?
Apart from this, it may be worth mentioning that most C++ compilers, even pre-C++11, implement (Named) Return Value Optimization, which would elide the copy anyway, incurring no overhead. Thus, you may want to actually measure the performance penalty you (possibly) get before optimizing.
I think you should pass by reference, or return a shared pointer, only when you need reference semantics. This does not seem to be your case.
There is an alternative approach. If you can make your functions template, make them take an output iterator (whose type is a template argument) as argument:
template<class OutputIterator>
void your_algorithm(OutputIterator out) {
for(/*condition*/) {
*out++ = /* calculation */; // write one result, then advance the iterator
}
}
This has the advantage that the caller can decide in what kind of collection he wants to store the result (the output iterator could for instance write directly to a file, or store the result in a std::vector, or filter it, etc.).
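A concrete, compilable version of the output-iterator idea might look like the following sketch. The algorithm (squares of 0..n-1) and the names generate_squares and squares_as_vector are made up for illustration.

```cpp
#include <iterator>
#include <vector>

// Hypothetical algorithm: writes the squares of 0..n-1 to any output iterator.
template <class OutputIterator>
void generate_squares(int n, OutputIterator out) {
    for (int i = 0; i < n; ++i) {
        *out++ = i * i;  // write one result, then advance
    }
}

// The caller picks the destination, here a std::vector via back_inserter.
std::vector<int> squares_as_vector(int n) {
    std::vector<int> result;
    generate_squares(n, std::back_inserter(result));
    return result;
}
```

The same generate_squares can also write into a plain array (`int buf[3]; generate_squares(3, buf);`) or a std::ostream_iterator without any change to the algorithm.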
The best practise will probably be surprising to you. I would recommend returning by value in both C++03 and C++11.
In C++03, if you create a std::vector local to generate and return it, the copy may be elided by the compiler (and almost certainly will be). See C++03 §12.8/15:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object with the same cv-unqualified type as the function return type, the copy operation can be omitted by constructing the automatic object directly into the function's return value
In C++11, if you create a std::vector local to generate and return it, the copy will first be considered as a move (which will already be very fast) and then that may be elided (and almost certainly will be). See C++11 §12.8/31:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value
And §12.8/32:
When the criteria for elision of a copy operation are met or would be met save for the fact that the source object is a function parameter, and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue.
So return by value!
Believe it or not, I'm going to suggest that instead of either of those approaches just take the obvious implementation and return by value! Compilers are very often able to optimize away the notional copy that would be induced, removing it completely. By writing the code in the most obvious manner you make it very clear to future maintainers what the intent is.
But let's say you try return by value and your program runs too slow and let's further suppose that your profiler shows that the return by value is in fact your bottleneck. In this case I would allocate the container on the heap and return as an auto_ptr in C++03 or a unique_ptr in C++11 to clearly indicate that ownership is being transferred and that the generate isn't keeping a copy of that shared_ptr for its own purposes later.
Finally, the series at http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ provides a great perspective on almost the exact same question.

Regarding returning containers in C++: pointer VS non-pointer

I need to get this straight. With the code below here:
vector<unsigned long long int> getAllNumbersInString(string line){
vector<unsigned long long int> v;
string word;
stringstream stream(line);
unsigned long long int num;
while(getline(stream, word, ',')){
num = atol(word.c_str());
v.push_back(num);
}
return v;
}
This sample code simply turns an input string into a series of unsigned long long int stored in vector.
In this case above, if I have another function calls this function, and we appear to have about 100,000 elements in the vector, does this mean, when we return it, a new vector will be created and will have elements created identically to the one in the function, and then the original vector in the function will be eliminated upon returning? Is my understanding correct so far?
Normally, I will write the code in such a way that all functions will return pointer when it comes to containers, however, program design-wise, and with my understanding above, should we always return a pointer when it comes to container?
The std::vector will most likely (if your compiler optimizations are turned on) be constructed directly in the function's return value. This is known as copy/move elision and is an optimization the compiler is allowed to make:
in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value
This quote is taken from the C++11 standard but is similar for C++03. It is important to note that copy/move elision does not have to occur at all - it is entirely up to the compiler. Most modern compilers will handle your example with no problems at all.
If elision does not occur, C++11 will still provide you with a further benefit over C++03:
In C++03, without copy elision, returning a std::vector like this would have involved, as you say, copying all of the elements over to the returned object and then destroyed the local std::vector.
In C++11, the std::vector will be moved out of the function. Moving allows the returned std::vector to steal the contents of the std::vector that is about to be destroyed. This is much more efficient than copying the contents over.
You may have expected that the object would just be copied because it is an lvalue, but there is a special rule that makes copies like this first be considered as moves:
When the criteria for elision of a copy operation are met [...] and the object to be copied is designated by an lvalue, overload resolution to select the constructor for the copy is first performed as if the object were designated by an rvalue.
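What "stealing the contents" means for std::vector can be checked with a short sketch. The helper name move_reuses_buffer is hypothetical, and the buffer reuse it observes is how mainstream standard library implementations behave with the default allocator.

```cpp
#include <utility>
#include <vector>

// Moves `source` into a new vector and reports whether the new vector
// took over the original heap buffer instead of allocating its own.
bool move_reuses_buffer(std::vector<int>& source) {
    const int* before = source.data();
    std::vector<int> target = std::move(source);  // move constructor
    return target.data() == before;
}
```

For a vector of 100,000 elements this returns true: no element is copied, only the internal pointers change hands.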
As for whether you should return a pointer to your container: the answer is almost certainly no. You shouldn't be passing around pointers unless it's completely necessary, and when it is necessary, you're much better off using smart pointers. As we've seen, in your case it's not necessary at all because there's little to no overhead in returning it by value.
It is safe, and I would say preferable, to return by value with any reasonable compiler. The C++ standard allows copy elision, in this case named return value optimization (NRVO), which means this extra copy you are worried about doesn't take place.
Note that this is a case of an optimization that is allowed to modify the observable behaviour of a program.
Note 2. As has been mentioned in other answers, C++11 introduces move semantics, which means that, in cases where RVO doesn't apply, you may still have a very cheap operation where the contents of the object being returned are transferred to the caller. In the case of std::vector, this is extremely cheap. But bear in mind that not all types can be moved.
Your understanding is correct.
But compilers can apply copy elision through RVO and NRVO and remove the extra copy being generated.
Should we always return a pointer when it comes to container?
If you can, of course you should avoid return by value, especially for non-POD types.
That depends on whether or not you need reference semantics.
In general, if you do not need reference semantics, I would say you should not use a pointer, because in C++11 container classes support move semantics, so returning a collection by value is fast. Also, the compiler can elide the call to the move constructor (this is called Named Return Value Optimization or NRVO), so that no overhead at all will be introduced.
However, if you do need to create separate, consistent views of your collection (i.e. aliases), so that for instance insertions into the returned vector will be "seen" in several places that share the ownership of that vector, then you should consider returning a smart pointer.

Should I return std::strings?

I'm trying to use std::string instead of char* whenever possible, but I worry I may be degrading performance too much. Is this a good way of returning strings (no error checking for brevity)?
std::string linux_settings_provider::get_home_folder() {
return std::string(getenv("HOME"));
}
Also, a related question: when accepting strings as parameters, should I receive them as const std::string& or const char*?
Thanks.
Return the string.
I think the better abstraction is worth it. Until you can measure a meaningful performance difference, I'd argue that it's a micro-optimization that only exists in your imagination.
It took many years to get a good string abstraction into C++. I don't believe that Bjarne Stroustrup, so famous for his conservative "only pay for what you use" dictum, would have permitted an obvious performance killer into the language. Higher abstraction is good.
Return the string, like everyone says.
when accepting strings as parameters, should I receive them as const std::string& or const char*?
I'd say take any const parameters by reference, unless either they're lightweight enough to take by value, or in those rare cases where you need a null pointer to be a valid input meaning "none of the above". This policy isn't specific to strings.
Non-const reference parameters are debatable, because from the calling code (without a good IDE), you can't immediately see whether they're passed by value or by reference, and the difference is important. So the code may be unclear. For const params, that doesn't apply. People reading the calling code can usually just assume that it's not their problem, so they'll only occasionally need to check the signature.
In the case where you're going to take a copy of the argument in the function, your general policy should be to take the argument by value. Then you already have a copy you can use, and if you would have copied it into some specific location (like a data member) then you can move it (in C++11) or swap it (in C++03) to get it there. This gives the compiler the best opportunity to optimize cases where the caller passes a temporary object.
For string in particular, this covers the case where your function takes a std::string by value, and the caller specifies as the argument expression a string literal or a char* pointing to a nul-terminated string. If you took a const std::string& and copied it in the function, that would result in the construction of two strings.
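The "take by value, then move into place" policy can be sketched as follows. The class User and its member name_ are hypothetical and only serve as an example of a function that keeps a copy of its argument.

```cpp
#include <string>
#include <utility>

// Hypothetical class that stores a copy of the name it is given.
class User {
public:
    // By-value "sink" parameter: the caller's argument is copied or moved
    // into `name`, which is then moved (not copied again) into the member.
    explicit User(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

A call such as `User u("alice");` constructs exactly one std::string from the literal and moves it into the member, whereas a const std::string& parameter followed by a copy would construct two strings.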
The cost of copying strings by value varies based on the STL implementation you're working with:
std::string under MSVC uses the short string optimisation, so that short strings (< 16 characters iirc) don't require any memory allocation (they're stored within the std::string itself), while longer ones require a heap allocation every time the string is copied.
std::string under GCC (libstdc++ before the ABI change in GCC 5) uses a reference counted implementation: when constructing a std::string from a char*, a heap allocation is done every time, but when passing by value to a function, a reference count is simply incremented, avoiding the memory allocation. Newer libstdc++ versions use the short string optimisation instead.
In general, you're better off just forgetting about the above and returning std::strings by value, unless you're doing it thousands of times a second.
re: parameter passing, keep in mind that there's a cost from going from char*->std::string, but not from going from std::string->char*. In general, this means you're better off accepting a const reference to a std::string. However, the best justification for accepting a const std::string& as an argument is that then the callee doesn't have to have extra code for checking vs. null.
Seems like a good idea.
If this is not part of a realtime software (like a game) but a regular application, you should be more than fine.
Remember, "Premature optimization is the root of all evil"
It's human nature to worry about performance especially when programming language supports low-level optimization.
What we shouldn't forget as programmers, though, is that program performance is just one thing among many that we can optimize and admire. In addition to program speed we can find beauty in our own performance. We can minimize our efforts while trying to achieve maximum visual output and user-interface interactivity. Don't you think that could be more motivating than worrying about bits and cycles in the long run? So yes, return strings. They minimize your code size and your efforts, and make the amount of work you put in less depressing.
In your case Return Value Optimization will take place so std::string will not be copied.
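For completeness, here is a sketch of the function from the question with the omitted error check added. The helper env_or and the empty-string fallback are assumptions made for illustration, and get_home_folder is written as a free function rather than a member. std::getenv returns a null pointer when the variable is not set, and constructing a std::string from a null pointer is undefined behaviour.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of the environment variable `name`, or `fallback`
// when the variable is not set.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    if (value == nullptr) {
        return fallback;        // copies the fallback into the return value
    }
    return std::string(value);  // prvalue: constructed directly in the result
}

// Free-function variant of get_home_folder() from the question.
std::string get_home_folder() {
    return env_or("HOME", "");
}
```

Both functions still return std::string by value; the check adds a branch, not an extra string copy on the common path.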
Beware when you cross module boundaries.
Then it's best to return primitive types since C++ types are not necessarily binary compatible across even different versions of the same compiler.
I agree with the other posters, that you should use string.
But know that, depending on how aggressively your compiler optimizes temporaries, you will probably have some extra overhead (over using a dynamic array of chars). (Note: The good news is that in C++0x, the judicious use of rvalue references will not require compiler optimizations to buy efficiency here, and programmers will be able to make some additional performance guarantees about their code without relying on the quality of the compiler.)
In your situation, is the extra overhead worth introducing manual memory management? Most reasonable programmers would disagree - but if your application does end up having performance issues, the next step would be to profile your application - thus, if you do introduce complexity, you only do it once you have good evidence that it is needed to improve overall efficiency.
Someone mentioned that Return Value optimization (RVO) is irrelevant here - I disagree.
The standard text (C++03) on this reads (12.2):
[Begin Standard Quote]
Temporaries of class type are created in various contexts: binding an rvalue to a reference (8.5.3), returning an rvalue (6.6.3), a conversion that creates an rvalue (4.1, 5.2.9, 5.2.11, 5.4), throwing an exception (15.1), entering a handler (15.3), and in some initializations (8.5). [Note: the lifetime of exception objects is described in 15.1. ] Even when the creation of the temporary object is avoided (12.8), all the semantic restrictions must be respected as if the temporary object was created. [Example: even if the copy constructor is not called, all the semantic restrictions, such as accessibility (clause 11), shall be satisfied. ]
[Example:
struct X {
X(int);
X(const X&);
~X();
};
X f(X);
void g()
{
X a(1);
X b = f(X(2));
a = f(a);
}
Here, an implementation might use a temporary in which to construct X(2) before passing it to f() using X’s copy-constructor; alternatively, X(2) might be constructed in the space used to hold the argument. Also, a temporary might be used to hold the result of f(X(2)) before copying it to b using X’s copy-constructor; alternatively, f()’s result might be constructed in b. On the other hand, the expression a=f(a) requires a temporary for either the argument a or the result of f(a) to avoid undesired aliasing of a. ]
[End Standard Quote]
Essentially, the text above says that you can possibly rely on RVO in initialization situations, but not in assignment situations. The reason is, when you are initializing an object, there is no way that what you are initializing it with could ever be aliased to the object itself (which is why you never do a self check in a copy constructor), but when you do an assignment, it could.
There is nothing about your code, that inherently prohibits RVO - but read your compiler documentation to ensure that you can truly rely on it, if you do indeed need it.
I agree with duffymo. You should make an understandable working application first and then, if there is a need, attack optimization. It is at this point that you will have an idea where the major bottlenecks are and will be able to more efficiently manage your time in making a faster app.
I agree with duffymo. Don't optimize until you have measured; this holds doubly true for micro-optimizations. And always measure before and after you've optimized, to see if you actually changed things for the better.
Return the string. It's not that big of a loss in terms of performance, and it will surely ease your job afterward.
Plus, you could always inline the function, but most optimizers will do that anyway.
If you pass a referenced string and you work on that string you don't need to return anything. ;)