Out params vs. NRVO in the presence of error codes - C++

We have a code base that uses out params extensively, because every function can fail with some error enum.
This is getting very messy, and the code is sometimes unreadable.
I want to eliminate this pattern and bring in a more modern approach.
The goal is to transform:
error_t fn(param_t *out) {
//filling 'out'
}
param_t param;
error_t err = fn(&param);
into something like:
std::expected<param_t, error_t> fn() {
param_t ret;
//filling 'ret'
return ret;
}
auto& [err, param] = fn();
The following questions are meant to convince myself and others that this change is for the best:
I know that at the standard level, NRVO is not mandatory (unlike RVO in C++17), but practically, is there any chance it won't happen in any of the major compilers?
Are there any advantages of using out parameters instead of NRVO?
Assuming NRVO happens, is there a significant change in the generated assembly (assuming an optimized expected implementation, perhaps with the boolean representing whether an error occurred disappearing completely)?

First off, a few assumptions:
We are looking at functions that are not being inlined; if a call is inlined, the two approaches are almost guaranteed to produce equivalent code.
We are going to assume that the call sites of the function actually check the error condition before using the returned value.
We are going to assume that the returned value has not been pre-initialized with partial data.
We are going to assume that we only care about optimized code here.
That being established:
I know that at the standard level, NRVO is not mandatory (unlike RVO in C++17), but practically, is there any chance it won't happen in any of the major compilers?
Assuming that NRVO is being performed is a safe bet at this point. I'm sure someone could come up with a contrived situation where it wouldn't happen, but I generally feel confident that in almost all use-cases, NRVO is being performed on current modern compilers.
That being said, I would never rely on this behavior for program correctness, i.e., I wouldn't write a copy constructor with side effects and assume it never gets invoked due to NRVO.
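As a hypothetical illustration (Tracked is a made-up type), counting copies like this would be fragile, because the standard permits but does not require eliding the copy:

struct Tracked {
    static int copy_count;
    Tracked() = default;
    Tracked(const Tracked&) { ++copy_count; } // side effect in the copy constructor
};
int Tracked::copy_count = 0;

Tracked make() {
    Tracked t;
    return t; // NRVO may or may not fire, so copy_count is unreliable
}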
Are there any advantages of using out parameters instead of NRVO?
In general no, but like all things in C++, there are edge-case scenarios where it could come up. Explicit memory layouts for maximizing cache coherency would be a good use-case for "returning by pointer".
Assuming NRVO happens, is there a significant change in the generated assembly (assuming an optimized expected implementation, perhaps with the boolean representing whether an error occurred disappearing completely)?
That question doesn't make that much sense to me. expected<> behaves a lot more like a variant<> than a tuple<>, so "the boolean representing whether an error occurred disappearing completely" doesn't really make sense.
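As an aside, here is a minimal sketch of what the idiomatic C++23 std::expected version would look like. Note that the value type comes first and the result is checked rather than destructured; error_t and param_t are the question's types, and use / handle are hypothetical helpers:

#include <expected>

std::expected<param_t, error_t> fn() {
    param_t ret;
    // fill 'ret'; on failure, return the error instead:
    // return std::unexpected(some_error_value);
    return ret;
}

void caller() {
    auto result = fn();
    if (result.has_value())
        use(*result);           // success path: access the param_t
    else
        handle(result.error()); // failure path: access the error_t
}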
That being said, I think we can use std::variant to estimate:
https://godbolt.org/g/XpqLLG
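A rough reconstruction of that kind of estimate (my sketch, not necessarily the exact code behind the link):

#include <variant>

std::variant<error_t, param_t> fn() {
    param_t ret;
    // fill 'ret'; on failure: return some_error_value;
    return ret;
}

void caller() {
    auto result = fn();
    if (auto* param = std::get_if<param_t>(&result))
        use(*param);                       // success path (hypothetical consumer)
    else
        handle(std::get<error_t>(result)); // failure path (hypothetical handler)
}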
It's "different" but not necessarily better or worse in my opinion.

Related

Is method chaining slow in C++?

So for example, which of these is faster:
if(object.GetMemberX().IsSomething())
{
Store(object.GetMemberX());
object.GetMemberX().FlagSomething();
}
or
typeX x = object.GetMemberX();
if(x.IsSomething())
{
Store(x);
x.FlagSomething();
}
I imagine that if GetMemberX() returns a pointer or a reference, then in the first example the compiler can't optimize away the repeated calls, because it has no guarantee that the returned pointer/reference is the same for each invocation?
But in the second example I'm storing it?
If this is true, does it only apply to methods that return a pointer/reference? If my methods return by value will they benefit / be hindered by the different calls?
This question cannot be answered in a general C++ context, because it depends on the compiler implementation and optimisation level!
You cannot even be sure that a particular compiler at a particular optimisation level would not generate exactly the same code for both versions, because ultimately the actions should be the same.
My advice is to just use the common rule: first write code that is readable (by you and, if possible, your peers). When the question of performance comes up, profile the program and only optimize the small parts that deserve it, always by benchmarking the different possibilities.
Of course, the above mainly concerns low-level optimisation like what you are asking about here. You should always choose algorithms coherent with the expected usage.
The second version is more readable. However, it may invoke the copy constructor of typeX. To be readable and fast, use a reference or pointer:
typeX& x = object.GetMemberX();
if(x.IsSomething())
{
Store(x);
x.FlagSomething();
}
The only case where you shouldn't use a reference is when object.GetMemberX() returns by value; in that case, your second version is the way to go. Storing the return value there does not incur overhead, because the compiler can optimize the copy away (copy elision).
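For concreteness, a self-contained sketch of both getter shapes (typeX and the member functions mirror the question; GetMemberXCopy is a hypothetical by-value variant):

struct typeX {
    bool IsSomething() const { return true; }
    void FlagSomething() {}
};

struct Obj {
    typeX member;
    typeX& GetMemberX() { return member; }     // by reference: bind typeX& at the call site
    typeX  GetMemberXCopy() { return member; } // by value: the copy into the caller is elided
};

void Store(const typeX&) {}

int main() {
    Obj object;
    typeX& x = object.GetMemberX(); // one call, no copy
    if (x.IsSomething()) {
        Store(x);
        x.FlagSomething();
    }
}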

Would the following code be optimized to one function call?

I have very little (read no) compiler expertise, and was wondering if the following code snippet would automatically be optimized by a relatively recent (VS2008+/GCC 4.3+) compiler:
Object* objectPtr = getPtrSomehow();
if (objectPtr->getValue() == something1) // call 1
    doSomething1();
else if (objectPtr->getValue() == something2) // call N (there are a few more)
    doSomething2();
return;
where getValue() simply returns a member variable holding one of an enum's values. (The call has no observable side effects.)
My coding style would be to make one call before the "switch" and save the value to compare it against each of the somethingX's, but I was wondering if this was a moot point with today's compilers.
I was also unsure of what to google to find the answer to this myself.
Thank you,
AK
It's not moot, especially if the method is not const.
If getValue is not declared const, the call can't be optimized away, as subsequent calls could return different values.
If it is declared const, it's easier, but still not trivial for the compiler to optimize the call away. It would need access to the implementation to make sure the call doesn't have side effects. There's also the chance that it returns a different value even if marked const (e.g., it modifies and returns a global).
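For illustration, the favourable case might look like this (a sketch, with names loosely based on the question):

enum Value { something1, something2 };

class Object {
public:
    Value getValue() const { return value_; } // const, trivial, defined in the header
private:
    Value value_ = something1;
};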
Unless the compiler can examine the definition of getValue() while it compiles that piece of code, it can't elide the second call because it doesn't know whether that call has observable effects and whether it returns the same value the second time around.
Even if it sees the definition, it probably (this is my wild guess from having a few peeks at some compilers' internals) won't go out of its way to check that. The only chance you stand is the implementation being trivial and inlined twice, and then caught by common subexpression elimination. EDIT: Since the definition is in the header, and quite small, it's likely that this (inlining and subsequent CSE) will occur. Still, if you want to be sure, check the output of g++ -O2 -S or your compiler's equivalent.
So in summary, you shouldn't expect the optimization to occur. Then again, getValue is probably quite cheap, so it's unlikely to be worth the manual optimizations. What's an extra line compared to a couple of machine cycles? Not much, in most cases. If you're writing code where it is much, you shouldn't be asking but just checking it (disassembly/profiling).
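For reference, the manual hoist the asker describes would look like this (a sketch reusing the question's names; doSomething1/doSomething2 stand in for the branches):

const auto value = objectPtr->getValue(); // call the getter exactly once
if (value == something1)
    doSomething1();
else if (value == something2)
    doSomething2();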
As other answers have noted, the compiler generally cannot eliminate the second call since there may be side effects.
However, some compilers let you declare that a function has no side effects, so that this optimization is allowed. In GCC, a function may be declared pure. For example:
int square(int) __attribute__((pure));
says that the function has “no effects except to return a value, and [the] return value depends only on the parameters and/or global variables.”
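Applied to the question's getter, it might look like this (a sketch; this is a GCC/Clang extension, not standard C++):

class Object {
public:
    int getValue() const __attribute__((pure)); // promises: no side effects
};

int use(Object* objectPtr) {
    // With the attribute, the two calls below may be merged into one.
    if (objectPtr->getValue() == 1)
        return 1;
    if (objectPtr->getValue() == 2)
        return 2;
    return 0;
}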
You wrote:
My coding style would be to make one call before the "switch" and save the value to compare it against each of the somethingX's, but I was wondering if this was a moot point with today's compilers.
Yes, it's a moot point. What the compiler does is its business. Your hands will be full trying to write maintainable code without also trying to micromanage a piece of software that is far better at its job than any of us will ever hope to be.
Focus on writing maintainable code and trust the compiler to carry out its task. If you later find your code is too slow, then you can worry about optimizing.
Remember the proverb:
Premature optimization is the root of all evil.

Is it meaningful to optimize i++ as ++i to avoid the temporary variable?

Someone told me that I can write
for (iterator it = somecontainer.begin(); it != somecontainer.end(); ++it)
instead of
for (iterator it = somecontainer.begin(); it != somecontainer.end(); it++)
...since the latter one has the cost of an extra unused temporary variable. Is this optimization useful with modern compilers? Do I need to consider this optimization when writing code?
It's a good habit to get into, since iterators may be arbitrarily complex. For vector::iterator or int indexes, no, it won't make a difference.
Formal copy elision can never remove this copy, because elision only applies to return values and copied temporaries, not to values that merely go unused. For lightweight objects, including most iterators, the compiler can still optimize out the code implementing the copy under the as-if rule. However, it isn't always obvious when it can't. For example, post-incrementing istream_iterator<string> is guaranteed to copy the last string object read. On a non-reference-counted string implementation, that will allocate, copy, and immediately free memory. The issue is even more likely to apply to heavier, non-iterator classes supporting post-increment.
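A sketch of that istream_iterator case (illustrative only):

#include <iostream>
#include <iterator>
#include <string>

int main() {
    std::istream_iterator<std::string> it(std::cin), end;
    if (it != end)
        it++; // the discarded temporary copies the last string read
    if (it != end)
        ++it; // pre-increment advances without that copy
}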
There is certainly no disadvantage. It seems to have become the predominant style over the past decade or two.
I don't normally consider the prefix ++ an optimization. It's just what I write by default, because it might be faster, it's just as easy to write, and there's no downside.
But I doubt it's worth going back and changing existing code to use the prefix version.
Yes, that's conceptually the right thing to do. Do you care if it's i++ or ++i? No, you don't. Which one is better? The second one is better since it's potentially faster. So you choose the second (pre-increment).
In typical cases there will be no difference in the emitted code, but iterators can have whatever implementation. You can't control that, and you can't control whether the compiler will emit good code. What you can control is how you convey your intent. You don't need the post-increment here.
No. (Stylistically I prefer it, because my native language is English which is mostly a verb-precedes-noun language, and so "increment it" reads more easily than "it increment". But that's style and subjective.)
However, unless you're changing somecontainer's contents in your loop, you might consider grabbing the return value of somecontainer.end() into a temporary variable. As written, the function is called on every iteration.
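For example (a sketch, assuming somecontainer is not modified inside the loop):

const auto last = somecontainer.end(); // call end() once
for (auto it = somecontainer.begin(); it != last; ++it) {
    // loop body
}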
It depends on the type to which you're applying operator++. If you are talking about a user defined type, that's going to involve copying the entire UDT, which can be very expensive.
However, if you are talking about a builtin type, then there's likely no difference at all.
It's probably good to get in the habit of using pre-increment where possible, even when post-increment would be fine, simply to be consistent and give your code a uniform look. You should be leaning on the compiler to optimize some things for you, but there's no reason to make its job harder than it has to be.
Consider how post-increment is typically implemented in a user-defined type:
MyIterator operator++(int) {
    MyIterator retval(*this); // copy the current state
    ++*this;                  // advance using pre-increment
    return retval;            // return the old value
}
So we have two questions: can my compiler optimise this in cases where the return value is unused, and will my compiler optimise this in cases where the return value is unused?
As for "can it?", there certainly are cases where it can't. If the code isn't inlined, we're out of luck. If the copy constructor of MyIterator has observable side effects then we're out of luck - copy constructor elision allows return values and copies of temporaries to be constructed in place, so at least the value might only be copied once. But it doesn't allow return values not to be constructed at all. Depending on your compiler, memory allocation might well be an observable side-effect.
Of course iterators usually are intended to be inlined, and iterators usually are lightweight and won't take any effort to copy. As long as we're OK on these two counts, I think we're in business - the compiler can inline the code, spot that retval is unused and that its creation is side-effect free, and remove it. This just leaves a pre-increment.
As for "will it?" - try it on your compiler. I can't be bothered, because I always use pre-increment where the two are equivalent ;-)
For "do I need to consider this optimization when writing code?", I would say not as such, but that premature pessimization is as bad a habit as premature optimization. Just because one is (at best) a waste of time does not mean that the other is noble - don't go out of your way to make your code slower just because you can. If someone genuinely prefers to see i++ in a loop then it's fairly unlikely ever to make their code slower, so they can optimise for aesthetics if they must. Personally I'd prefer that people improve their taste...
It will actually make a difference in unoptimized code, which is to say debug builds.

Trusting the Return Value Optimization

How do you go about using the return value optimization?
Are there any cases where I can trust a modern compiler to use the optimization, or should I always go the safe way and return a pointer of some type / use a reference as a parameter?
Are there any known cases where the return value optimization can't be made?
Seems to me that the return value optimization would be fairly easy for a compiler to perform.
Whenever compiler optimizations are enabled (and in most compilers, even when optimizations are disabled), RVO will take place. NRVO is slightly less common, but most compilers will perform this optimization as well, at least when optimizations are enabled.
You're right, the optimization is fairly easy for a compiler to perform, which is why compilers almost always do it. The only cases where it "can't be made" are the ones where the optimization doesn't apply: RVO only applies when you return an unnamed temporary. If you want to return a named local variable, NRVO applies instead, and while it is slightly more complex for a compiler to implement, it's doable, and modern compilers have no problem with it.
See: http://en.wikipedia.org/wiki/Return_value_optimization#Compiler_support
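For contrast, a minimal sketch of the NRVO case with a named local (Foo is a placeholder type):

Foo makeFoo() {
    Foo result;     // named local variable
    // ... fill in 'result' ...
    return result;  // NRVO: 'result' may be constructed directly in the caller's storage
}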
To have the best chance that it occurs, you can return an object constructed directly in the return statement [can anyone remember the name for this idiom - I've forgotten it]:
Foo f() {
    ....
    return Foo( ... ); // unnamed temporary: plain RVO applies
}
But as with all optimisations, the compiler can always choose not to do it. And at the end of the day, if you need to return a value, there is no alternative to trusting the compiler - pointers and references won't cut it.

Will the c++ compiler optimize away unused return value?

If I have a function that returns an object, but this return value is never used by the caller, will the compiler optimize away the copy? (Possibly an always/sometimes/never answer.)
Elementary example:
ReturnValue MyClass::FunctionThatAltersMembersAndNeverFails()
{
//Do stuff to members of MyClass that never fails
return successfulResultObject;
}
void MyClass::DoWork()
{
// Do some stuff
FunctionThatAltersMembersAndNeverFails();
// Do more stuff
}
In this case, will the ReturnValue object get copied at all? Does it even get constructed? (I know it probably depends on the compiler, but let's narrow this discussion down to the popular modern ones.)
EDIT: Let's simplify this a bit, since there doesn't seem to be a consensus in the general case. What if ReturnValue is an int, and we return 0 instead of successfulResultObject?
If the ReturnValue class has a non-trivial copy constructor, the compiler must not eliminate the call to the copy constructor - copy elision does not apply when the returned object is a class member, so the language mandates that it is invoked.
If the copy constructor is inline, the compiler might be able to inline the call, which in turn might cause the elimination of much of its code (also depending on whether FunctionThatAltersMembersAndNeverFails is inline).
They most likely will if the optimization level causes them to inline the code. If not, they would have to generate two different translations of the same code to make it work, which could open up a lot of edge case problems.
The linker can take care of this sort of thing, even if the original caller and callee are in different compilation units.
If you have a good reason to be concerned about the CPU load dedicated to a method call (premature optimization is the root of all evil), you might consider the many inlining options available to you, including (gasp!) a macro.
Do you REALLY need to optimize at this level?
If the return value is an int and you return 0 (as in the edited question), then this may get optimized away.
You have to look at the underlying assembly. If the function is not inlined, then the underlying assembly will execute a mov eax, 0 (or xor eax, eax) to set eax (which is usually used for integer return values) to 0. If the function is inlined, this will certainly get optimized away.
But this scenario isn't too useful if you're worried about what happens when you return objects larger than 32 bits. You'll need to refer to the answers to the unedited question, which paint a pretty good picture: if everything is inlined, then most of it will be optimized out. If it is not inlined, then the functions must be called even if they don't really do anything, and that includes the constructor of an object (since the compiler doesn't know whether the constructor modified global variables or did something else weird).
I doubt most compilers could do that if the functions were in different compilation units (i.e., different files). Maybe if they were both in the same file, they could.
There is a pretty good chance that a peephole optimizer will catch this. Many (most?) compilers implement one, so the answer is probably "yes".
As others have noted, this is not a trivial question at the AST-rewriting level.
Peephole optimizers work on a representation of the code at a level equivalent to assembly language (but before generation of actual machine code). There is a chance to notice the load of the return value into a register followed by an overwrite with no intermediate read, and to just remove the load. This is done on a case-by-case basis.
I just tried this example on Compiler Explorer, and at -O3 the mov is not generated when the return value is not used.
https://gcc.godbolt.org/z/v5WGPr
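A sketch of the kind of test that shows this (my reconstruction, not necessarily the code behind the link):

struct ReturnValue { int v; };

ReturnValue f() { return ReturnValue{0}; }

void caller() {
    f(); // return value unused: at -O3 no mov materializes the 0
}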