So for example, which of these is faster:
if(object.GetMemberX().IsSomething())
{
    Store(object.GetMemberX());
    object.GetMemberX().FlagSomething();
}
or
typeX x = object.GetMemberX();
if(x.IsSomething())
{
    Store(x);
    x.FlagSomething();
}
I imagine that if GetMemberX() returns a pointer or a reference, then in the first example the compiler can't optimize away the repeated calls, because it has no guarantee that the returned pointer/reference is the same for each invocation?
But in the second example I'm storing it?
If this is true, does it only apply to methods that return a pointer/reference? If my methods return by value will they benefit / be hindered by the different calls?
This question cannot be answered in the general C++ context, because it depends on the compiler implementation and optimisation level.
You cannot even be sure that a particular compiler at a particular optimisation level would not generate exactly the same code for both versions, because ultimately the actions should be the same.
My advice is to follow the common rule: first write code that is readable (by you and, if possible, your peers). When the question of performance comes up, profile the program and only optimize the small parts that deserve it, always by benchmarking the different possibilities.
Of course, the above mainly concerns low-level optimisation like what you are asking about here. You should always choose algorithms coherent with the expected usage.
The second version is more readable. However, it may invoke the copy constructor of typeX. To be readable and fast, use a reference or pointer:
typeX& x = object.GetMemberX();
if(x.IsSomething())
{
    Store(x);
    x.FlagSomething();
}
The only case where you shouldn't use a reference is when object.GetMemberX() returns by value; there, your second version is the way to go. Storing the return value then does not incur overhead, because the compiler can optimize the copy away (copy elision).
Related
We have a code base that uses out params extensively because every function can fail with some error enum.
This is getting very messy and the code is sometimes unreadable.
I want to eliminate this pattern and bring a more modern approach.
The goal is to transform:
error_t fn(param_t *out) {
    //filling 'out'
}

param_t param;
error_t err = fn(&param);
into something like:
std::expected<param_t, error_t> fn() {
    param_t ret;
    //filling 'ret'
    return ret;
}
auto result = fn(); // then check result.has_value() / result.error()
The following questions are in order to convince myself and others this change is for the best:
I know that on the standard level, NRVO is not mandatory (unlike RVO in c++17) but practically is there any chance it won't happen in any of the major compilers?
Are there any advantages of using out parameters instead of NRVO?
Assuming NRVO happens, is there a significant change in the generated assembly (assuming an optimized expected implementation, perhaps with the boolean representing whether an error occurred disappearing completely)?
First off, a few assumptions:
We are looking at functions that are not being inlined. If the function is inlined, the two forms are almost guaranteed to compile to equivalent code.
We are going to assume that the call sites of the function actually check the error condition before using the returned value.
We are going to assume that the returned value has not been pre-initialized with partial data.
We are going to assume that we only care about optimized code here.
That being established:
I know that on the standard level, NRVO is not mandatory (unlike RVO in c++17) but practically is there any chance it won't happen in any of the major compilers?
Assuming that NRVO is being performed is a safe bet at this point. I'm sure someone could come up with a contrived situation where it wouldn't happen, but I generally feel confident that in almost all use-cases, NRVO is being performed on current modern compilers.
That being said, I would never rely on this behavior for program correctness, i.e., I wouldn't write a weird copy constructor with side effects on the assumption that it never gets invoked thanks to NRVO.
Are there any advantages of using out parameters instead of NRVO?
In general no, but like all things in C++, there are edge-case scenarios where it could come up. Explicit memory layouts for maximizing cache coherency would be a good use-case for "returning by pointer".
Assuming NRVO happens, is there a significant change in the generated assembly (assuming an optimized expected implementation, perhaps with the boolean representing whether an error occurred disappearing completely)?
That question doesn't make much sense to me. expected<> behaves a lot more like a variant<> than a tuple<>, so "the boolean representing whether an error occurred disappearing completely" doesn't really apply.
That being said, I think we can use std::variant to estimate:
https://godbolt.org/g/XpqLLG
It's "different" but not necessarily better or worse in my opinion.
When returning a container, I always had to decide whether to use a return value or an output parameter. If performance mattered, I chose the second option; otherwise I always chose the first, because it is more intuitive.
Frankly, I have always personally objected to output parameters, possibly because of my mathematical background, but it was kind of OK to use them when I had no other option.
However, things changed completely when it came to generic programming. There are situations I encountered where a function may not know whether the object it returns is a huge container or just a simple value.
Consistently using output parameters may be the solution, and I want to know whether I can avoid it. It is just so awkward if I have to write
int a;
f(a, other_arguments);
compared to
auto a = f(other_arguments);
Furthermore, sometimes the return type of f() has no default constructor. If output parameters are used, there is no graceful way to deal with that case.
I wonder if it is possible to return a "modifier object": a functor taking output parameters and modifying them appropriately. (Perhaps this is a kind of lazy evaluation?) Returning such objects is not a problem; the problem is that I can't add an appropriate overload of the assignment operator (or constructor) that takes such an object and triggers it to do its job when the return type belongs to a library I can't touch, e.g., std::vector. Of course, conversion operators are not helpful, as they have no access to the existing resources prepared for the target object.
Some people might ask: why not use assign()? That is, define a "generator object" which has begin() and end(), and pass those iterators to std::vector::assign. It is not a solution. First, the "generator object" does not have full access to the target object, which may limit what can be done. Second, and more importantly, the call site of my function f() may itself be a template that does not know the exact return type of f(), so it cannot determine whether the assignment operator or the assign() member function should be used.
I think that the "modifier object" approach to modify containers should have been already discussed in the past as it is not at all a new idea.
To sum up,
Is it possible to use return values to simulate what would happen when output parameters are used instead, in particular, when outputs are containers?
If not, has adding such support to the standard been discussed before? If it was, what were the issues? Is it a terrible idea?
Edit
The code example I've put above is misleading. The function f() may be used to initialize a local variable, but it may be also used to modify existing variables defined elsewhere. For the first case, as Rakete1111 mentioned, there is no problem with return by value as copy elision comes into play. But for the second case, there may be unnecessary resource releasing/acquiring.
I don't think your "modifier object" was ever proposed (AFAIK). And it will never go into the standard. Why? Because we already have a way to get rid of the expensive copy, and that is return by value (plus compiler optimizations).
Before C++17, compilers were already allowed to do essentially the same thing. This optimization is known as (N)RVO, which elides the copy when returning a (named) temporary from a function.
auto a = f(other_arguments);
will not return a temporary and then copy it into a. The compiler will optimize the copy away entirely; it is not needed. In theory you cannot assume that your compiler supports this, but the three major ones (clang, gcc, MSVC) do, so there's no need to worry. I don't know about ICC and the others, so I can't say.
So, as there is no copy (or move) involved, there is no performance penalty for using return values instead of output parameters (and if for some reason your compiler doesn't perform the elision, you'll most likely get a move instead). You should always use return values if possible, and only use output parameters or some other technique if you measure that you get significantly better performance otherwise.
(Edited, based on comments)
You are right that you should avoid output parameters if possible, because code using them is harder to read and to debug.
Since C++11 we have move constructors (see reference). You can use std::move on all primitive types and STL containers. It is efficient (see: Efficiency difference between copy and move constructor), because you don't actually copy the values; only the internal pointers are swapped. For your own complex types, you can write your own move constructor.
On the other hand, the only alternative is to return a reference to a temporary, which is undefined behavior, for example:
#include <iostream>
#include <utility>

int&& add(int initial, int howMany) {
    return std::move(initial + howMany); // dangling: refers to an expired temporary
}

int main() {
    std::cout << add(5, 2) << std::endl; // undefined behavior
    return 0;
}
Its output could be:
7
You could avoid the problem of temporary variables by using globals or statics, but that is bad too. Rakete1111 is right: there is no good way of achieving this.
In this talk (sorry about the sound) Chandler Carruth suggests not passing by reference, even const reference, in the vast majority of cases due to the way in which it limits the back-end to perform optimisation.
He claims that in most cases the copy is negligible, which I am happy to believe: most data structures/classes etc. have a very small part allocated on the stack, especially compared with the back-end having to assume pointer aliasing and all the nasty things that could be done through a reference type.
Let's say that we have a large object on the stack, say ~4kB, and a function that does something to an instance of this object (assume a free-standing function).
Classically I would write:
void DoSomething(ExpensiveType* inOut);
ExpensiveType data;
...
DoSomething(&data);
He's suggesting:
ExpensiveType DoSomething(ExpensiveType in);
ExpensiveType data;
...
data = DoSomething(data);
According to what I got from the talk, the second would tend to optimise better. Is there a limit to how big I make something like this though, or is the back-end copy-elision stuff just going to prefer the values in almost all cases?
EDIT: To clarify I'm interested in the whole system, since I feel that this would be a major change to the way I write code, I've had use of refs over values drilled into me for anything larger than integral types for a long time now.
EDIT2: I tested it too, results and code here. No competition really; as we've been taught for a long time, the pointer is a far faster way of doing things. What intrigues me now is why pass-by-value was suggested during that talk, but since the numbers don't support it, it's not something I'm going to do.
I have now watched parts of Chandler's talk. I think the general discussion along the lines "should I now always pass by value" does not do his talk justice. Edit: And actually his talk has been discussed before, here value semantics vs output params with large data structures and in a blog from Eric Niebler, http://ericniebler.com/2013/10/13/out-parameters-vs-move-semantics/.
Back to Chandler. In the key note he specifically (around the 4x-5x minute mark mentioned elsewhere) mentions the following points:
If the optimizer cannot see the code of the called function you have much bigger problems than passing refs or values. It pretty much prevents optimization. (There is a follow-up question at that point about link time optimization which may be discussed later, I don't know.)
He recommends the "new classical" way of returning values using move semantics. Instead of the old-school way of passing a reference to an existing object as an in-out parameter, the value should be constructed locally and moved out. The big advantage is that the optimizer can be sure that no part of the object is aliased, since only the function has access to it.
He mentions threads, storing a variable's value in globals, and observable behaviour like output as examples for unknowns which prevent optimization when only refs/pointers are passed. I think an abstract description could be "the local code can not assume that local value changes are undetected elsewhere, and it cannot assume that a value which is not changed locally has not changed at all". With local copies these assumptions could be made.
Obviously, when passing (and possibly, if objects cannot be moved, when returning) by value, there is a trade-off between the copy cost and the optimization benefits. Size and other things making copying costly will tip the balance towards reference strategies, while lots of optimizable work on the object in the function tips it towards value passing. (His examples involved pointers to ints, not to 4k sized objects.)
Based on the parts I watched I do not think Chandler promoted passing by value as a one-fits-all strategy. I think he dissed passing by reference mostly in the context of passing an out parameter instead of returning a new object. His example was not about a function which modified an existing object.
On a general note:
A program should express the programmer's intent. If you need a copy, by all means do copy! If you want to modify an existing object, by all means use references or pointers. Only if side effects or run-time behavior become unbearable, and really only then, should you try to do something smart.
One should also be aware that compiler optimizations are sometimes surprising. Other platforms, compilers, compiling options, class libraries or even just small changes in your own code may all prevent the compiler from coming to the rescue. The run-time cost of the change would in many cases come totally unexpected.
Perhaps you took that part of the talk out of context, or something. For large objects, typically it depends on whether the function needs a copy of the object or not. For example:
void DoSomething(ExpensiveType in)
{
    cout << in.member;
}
you wasted a lot of resource copying the object unnecessarily, when you could have passed by const reference instead.
But if the function is:
void DoSomething(ExpensiveType in)
{
    in.member = 5;
    do_something_else(in);
}
and we did not want to modify the calling function's object, then this code is likely to be more efficient than:
void DoSomething(ExpensiveType const& inr)
{
    ExpensiveType in = inr;
    in.member = 5;
    do_something_else(in);
}
The difference comes when invoked with an rvalue, e.g. DoSomething(ExpensiveType(6)). The latter creates a temporary, makes a copy, then destroys both; whereas the former creates a temporary and uses it to move-construct in. (In fact, this copy can even be elided.)
NB. Don't use pointers as a hack to implement pass-by-reference. C++ has native pass by reference.
I tend do use the following notation a lot:
const Datum& d = a.calc();
// continue to use d
This works when calc returns by value: binding the returned temporary to a const reference extends its lifetime, see http://herbsutter.com/2008/01/01/gotw-88-a-candidate-for-the-most-important-const/. Even though compilers would probably optimize here anyway, it feels nice to explicitly avoid temporary objects.
Today I realized that the content of d became invalid after data was written to a member of a. In this particular case the get() function simply returned a reference to another member, one completely unrelated to the write. Like this:
const Datum& d = a.get();
// ... some operation ...
a.somedata = Datum2();
// now d is invalid.
Again, somedata has nothing to do with d or get() here.
Now I ask myself:
Which side-effects could lead to the invalidation?
Is it bad-practice to assign return values to a const reference? (Especially when I don't know the interior of the function)
My application is single-threaded except for a Qt GUI-Thread.
You seem to be afraid of elision failing. That auto x = some_func(); will result in an extra move construct over auto&& x = some_func(); when some_func() returns a temporary object.
You shouldn't be.
If elision fails, it means your compiler is incompetent, or compiling with frankly hostile settings. And you cannot survive hostile settings or incompetent compilers: incompetent compilers can turn a+=b with integers into for (int i = 0; i < abs(b); ++i) {if (b>0) ++a; else --a;} and violate not one iota of the standard.
Elision is a core language feature. Don't write bad code just because you don't trust it will happen.
You should capture by reference when you want a reference to the data provided by the function, and not an independently stable copy of the data. If you don't understand the function's return value's lifetime, capturing by reference is simply not safe.
Even if you know that the data will be stable, more time is spent maintaining code than writing it: the person reading your code must be able to see at a glance that your assumption holds. And non-local bugs are bad: a seemingly innocuous change to the function you call should not break your code.
The end result is, take things by value unless you have a good reason not to.
Taking things by value makes it easier for you, and the compiler, to reason about your code. It increases locality.
When you have a good reason not to, then take things by reference.
Depending on context, this good reason might not have to be very strong. But it shouldn't be based off an assumption of an incompetent compiler.
Premature pessimization should be avoided, but so should premature optimization. Taking (or storing) references instead of values should be something you do when and if you have identified performance problems. Write clean, easy to understand code. Shove complexity into tightly written types, and have the exterior interface be clean and simple. Take things by value, because values decouple state.
Optimization is fungible. By making more of your code simpler, you can make it easier to work with (and be more productive). Then, when you identify parts where performance matters, you can expend effort there to make the code faster.
A big example is foundational code: foundational code (which is used everywhere) quickly becomes a general drag on performance if not written with performance and ease of use in mind. In that case, you want to hide complexity inside the type, and expose a simple easy to use exterior interface that doesn't require the user to understand the guts.
But code in a random function somewhere? Use values, the easiest-to-use containers with the friendliest complexity for the most expensive operation you do, and the simplest interface. Vectors if reasonable (avoid premature pessimization), but don't sweat a few maps.
Find the 1%-10% of your code that takes up 90%-99% of the time, and make that fast. If the rest of your code has good O-notation performance (so it won't become shockingly slower with larger data sets than you test with), you'll be in good shape. Then start testing with ridiculous data-sets, and find the slow parts then.
Which side-effects could lead to the invalidation?
Holding a reference to the internal state of a class (i.e., versus extending the lifetime of a temporary) and then calling any non-const member functions may potentially invalidate the reference.
Is it bad-practice to assign return values to a const reference? (Especially when I don't know the interior of the function)
I would say the bad practice is holding a reference to the internal state of a class instance, mutating the instance, and then continuing to use the original reference (unless it is documented that the non-const functions don't invalidate references).
I am not sure that I am answering exactly what you expected to hear but...
The const keyword has nothing to do with the unsafety here.
Even if you return a non-const reference, it may become invalid. const only means that you are not allowed to modify the referent, for example if your get() returns a const member, or get() itself is declared const like this: const some_reference_t& get() const { return m_class_member; }.
Now regarding your questions at the end:
Which side-effects could lead to the invalidation?
There could be many side effects if the original value changes. For example, the returned value might be a reference to an object on the heap which is then deleted, or the returned value might be cached while the original gets updated. All of these are design problems. If by design such a case can occur, then the function should return by value (and in the caching case, the value must not be cached! :) ).
Is it bad-practice to assign return values to a const reference?
(Especially when I don't know the interior of the function)
Same thing. If by design you don't have to modify the object you get (by reference or by value), then it is best to declare it const. Once you declare something const, the compiler will make sure you don't try to modify it anywhere in the code.
It is rather a question of knowing what your functions return. You should have a full understanding of the return value's type and its semantics.
This is probably a simple question, but this came across my mind. It is regarding the difference between the two functions below:
T func_one(T obj) {   //for the purpose of this question,
    return obj + obj; //T is a large object and has an overloaded '+' operator
}

T func_two(T obj) {
    T output = obj + obj;
    return output;
}
In func_one(), rather than creating an object T, assigning it a value and then returning the object, I just return the value itself without creating a new object. If T was a large object, would func_one() be more efficient than func_two() or does func_one() make an object T anyways when returning the sum of the two objects?
The compiler would optimize func_two into something similar to func_one, which would then be optimized further. Long story short: you need not worry about this unless you really do need to worry about it, in which case you can look at the asm output.
Short answer: We can't know
Long answer: it depends highly on how T works and your compilers support for return value optimization.
Any function which returns by value can have RVO or NRVO optimization applied to it.
This means that it will construct the return value directly into the calling function, eliminating the copy constructor. As this is the problem with returning large objects by value, this will mean a substantial gain in performance.
The difference between func_one and func_two is that func_one returns an anonymous temporary value, an r-value; this means RVO can trivially be used. func_two returns a named value, an l-value, so NRVO, a much harder optimization, will be used. However, func_two is trivial, so it will almost certainly have NRVO applied, and both functions will be basically identical.
This is assuming you have a modern or even semi-modern compiler; if not, it will depend highly on how you implemented T.
If T has move semantics, your compiler will instead be able to move rather than copy. This should apply to both functions, as temporaries exist in both; however, as func_two returns a named value, it may not be capable of using move semantics. It's up to the compiler, and if the compiler isn't doing RVO or NRVO, I doubt it's doing move.
Finally, it depends on how the + operator and = operator are implemented. If, for example, they were implemented as expression templates, then func_two still requires an assignment, which will slow it down, whereas func_one will simply return a highly optimized temporary.
In Summary
In almost all practical contexts, these are identical. In the vanishingly small window where your compiler is acting very strange, func_one is almost universally faster.
Modern compilers can transform the version with the extra variable to the one without (named return value optimization, this is quite a frequent source of questions here on SO, Why isn't the copy-constructor called when returning LOCAL variable for example). So this is not the overhead you should worry about.
The overhead you should worry about is the function call overhead. An addition takes a modern CPU at most a single cycle; a function call takes between 10 and 20 cycles, depending on the number of arguments.
I am a bit unsure what you mean with T in your question (is it a template parameter? is it a class? is it a placeholder for a type that you didn't want to disclose in your question?). However, the question whether you have a function call overhead problem depends on that type. And it depends on whether your compiler can inline your function.
Obviously, if it's inlined, you're fine, there's no function call overhead.
If T is a complex type with an expensive operator+() overload, then you are fine as well.
However, if T is int, for instance, and your function is not inlined, then you have roughly 90% overhead in your function.