I have very little (read no) compiler expertise, and was wondering if the following code snippet would automatically be optimized by a relatively recent (VS2008+/GCC 4.3+) compiler:
Object* objectPtr = getPtrSomehow();
if (objectPtr->getValue() == something1) // call 1
    dosomething1;
else if (objectPtr->getValue() == something2) // call N (there are a few more)
    dosomething2;
return;
where getValue() simply returns a member variable of an enum type. (The call has no observable effect.)
My coding style would be to make one call before the "switch" and save the value to compare it against each of the somethingX's, but I was wondering if this was a moot point with today's compilers.
I was also unsure of what to google to find the answer to this myself.
Thank you,
AK
It's not moot, especially if the method is not const.
If getValue is not declared const, the call can't be optimized away, as subsequent calls could return different values.
If it is declared const, it's easier, but also not trivial for the compiler to optimize the call. It would need access to the implementation, to make sure the call doesn't have side effects. There's also the chance that it returns a different value even if marked const (modifies and returns a global).
Unless the compiler can examine the definition of getValue() while it compiles that piece of code, it can't elide the second call because it doesn't know whether that call has observable effects and whether it returns the same value the second time around.
Even if it sees the definition, it probably (this is my wild guess from having a few peeks at some compilers' internals) won't go out of its way to check that. The only chance you stand is the implementation being trivial and inlined twice, and then caught by common subexpression elimination. EDIT: Since the definition is in the header, and quite small, it's likely that this (inlining and subsequent CSE) will occur. Still, if you want to be sure, check the output of g++ -O2 -S or your compiler's equivalent.
So in summary, you shouldn't expect the optimization to occur. Then again, getValue is probably quite cheap, so it's unlikely to be worth the manual optimizations. What's an extra line compared to a couple of machine cycles? Not much, in most cases. If you're writing code where it is much, you shouldn't be asking but just checking it (disassembly/profiling).
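For comparison, the manual version the question describes (call once, then compare) might look like the sketch below. The Kind enum, the Object class and the dispatch function are hypothetical stand-ins for the asker's types, not code from the question.

```cpp
// Hypothetical enum standing in for something1, something2, ...
enum class Kind { First, Second, Other };

// Hypothetical class: getValue() just returns a member, as in the question.
class Object {
public:
    explicit Object(Kind k) : kind_(k) {}
    Kind getValue() const { return kind_; }
private:
    Kind kind_;
};

// Reads the value exactly once, so the result does not depend on whether
// the compiler manages to merge repeated getValue() calls.
int dispatch(const Object* objectPtr) {
    const Kind value = objectPtr->getValue();  // the single call
    switch (value) {
        case Kind::First:  return 1;  // stands in for dosomething1
        case Kind::Second: return 2;  // stands in for dosomething2
        default:           return 0;
    }
}
```

Whether this compiles to different machine code than the repeated-call version is something to check with g++ -O2 -S, as suggested above.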
As other answers have noted, the compiler generally cannot eliminate the second call since there may be side effects.
However, some compilers provide a way to declare that a function has no side effects, which makes this optimization permissible. In GCC, a function may be declared pure. For example:
int square(int) __attribute__((pure));
says that the function has “no effects except to return a value, and [the] return value depends only on the parameters and/or global variables.”
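Applied to the getter from the question, that could look roughly like this. It is only a sketch: the PURE_FN macro, the Object struct and the free-function form of getValue are assumptions made for illustration, and the attribute is a GCC extension (also accepted by Clang) that other compilers may not recognize.

```cpp
// Hypothetical helper macro: expands to the GCC attribute where available.
#if defined(__GNUC__)
#define PURE_FN __attribute__((pure))
#else
#define PURE_FN
#endif

// Hypothetical stand-in for the asker's class.
struct Object {
    int value;
};

// The declaration carries the attribute; the definition could live in
// another translation unit where the caller's compiler cannot see it.
int getValue(const Object* obj) PURE_FN;

int getValue(const Object* obj) {
    return obj->value;
}

// With the attribute, the compiler is permitted (not required) to reuse
// the first call's result for the second comparison.
int classify(const Object* obj) {
    if (getValue(obj) == 1)
        return 10;
    else if (getValue(obj) == 2)
        return 20;
    return 0;
}
```

The attribute is a promise made by the programmer; the compiler does not verify it, so it belongs only on functions that really have no side effects.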
You wrote:
My coding style would be to make one call before the "switch" and save the value to compare
it against each of the somethingX's, but I was wondering if this was a moot point
with today's compilers.
Yes, it's a moot point. What the compiler does is its business. Your hands will be full trying to write maintainable code without trying to micromanage a piece of software that is far better at its job than any of us will ever hope to be.
Focus on writing maintainable code and trust the compiler to carry out its task. If you later find your code is too slow, then you can worry about optimizing.
Remember the proverb:
Premature optimization is the root of all evil.
We have a code base that uses out params extensively because every function can fail with some error enum.
This is getting very messy and the code is sometimes unreadable.
I want to eliminate this pattern and bring a more modern approach.
The goal is to transform:
error_t fn(param_t *out) {
    // filling 'out'
}
param_t param;
error_t err = fn(&param);
into something like:
std::expected<param_t, error_t> fn() {
    param_t ret;
    // filling 'ret'
    return ret;
}
auto& [err, param] = fn();
The following questions are in order to convince myself and others this change is for the best:
I know that at the standard level NRVO is not mandatory (unlike RVO in C++17), but in practice is there any chance it won't happen in any of the major compilers?
Are there any advantages of using out parameters instead of NRVO?
Assuming NRVO happens, is there a significant change in the generated assembly (assuming an optimized expected implementation [perhaps with the boolean representing whether an error occurred completely disappearing])?
First off, a few assumptions:
We are looking at functions that are not being inlined. If they are inlined, the two forms are almost guaranteed to compile to equivalent code.
We are going to assume that the call sites of the function actually check the error condition before using the returned value.
We are going to assume that the returned value has not been pre-initialized with partial data.
We are going to assume that we only care about optimized code here.
That being established:
I know that at the standard level NRVO is not mandatory (unlike RVO in C++17), but in practice is there any chance it won't happen in any of the major compilers?
Assuming that NRVO is being performed is a safe bet at this point. I'm sure someone could come up with a contrived situation where it wouldn't happen, but I generally feel confident that in almost all use-cases, NRVO is being performed on current modern compilers.
That being said, I would never rely on this behavior for program correctness. That is, I wouldn't write a weird copy constructor with side effects on the assumption that it doesn't get invoked due to NRVO.
Are there any advantages of using out parameters instead of NRVO?
In general no, but like all things in C++, there are edge-case scenarios where it could come up. Explicit memory layouts for maximizing cache coherency would be a good use-case for "returning by pointer".
Assuming NRVO happens, is there a significant change in the generated assembly (assuming an optimized expected implementation [perhaps with the boolean representing whether an error occurred completely disappearing])?
That question doesn't make much sense to me. expected<> behaves a lot more like a variant<> than a tuple<>, so the idea of the "boolean representing whether an error occurred" completely disappearing doesn't really apply.
That being said, I think we can use std::variant to estimate:
https://godbolt.org/g/XpqLLG
It's "different" but not necessarily better or worse in my opinion.
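A comparison along those lines can be reproduced locally with something like the sketch below (C++17). It is not the code behind the link: Error, Param, fill_out and fill_ret are hypothetical stand-ins for error_t, param_t and the two flavours of fn.

```cpp
#include <variant>

enum class Error { None, BadInput };  // hypothetical stand-in for error_t
struct Param { int a; int b; };       // hypothetical stand-in for param_t

// Old style: the status is returned, the result goes through an out parameter.
Error fill_out(int seed, Param* out) {
    if (seed < 0)
        return Error::BadInput;
    out->a = seed;
    out->b = seed * 2;
    return Error::None;
}

// New style: the function returns either a Param or an Error by value.
// The variant stores a discriminator next to the payload, which is why
// the "is it an error" information does not simply vanish.
std::variant<Param, Error> fill_ret(int seed) {
    if (seed < 0)
        return Error::BadInput;
    Param ret;
    ret.a = seed;
    ret.b = seed * 2;
    return ret;  // converted into the variant's Param alternative
}
```

Compiling both functions with optimizations enabled and comparing the assembly shows how the calling conventions differ for a given compiler and target.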
I'm working on optimizing a code where most of the objects are allocated on heap.
What I'm trying to understand is: if/why the compiler might not inline a function call that potentially manipulates data on heap.
To make things more clear, suppose you have the following code:
class A
{
public:
    void foo() // non-const function
    {
        // modify data
        i++;
        ...
    }
private:
    int i;
    // can be anything here, including pointers
};
int main()
{
    A a; // allocate something on stack
    auto ptr = std::make_unique<A>(); // allocate something on heap
    a.foo(); // case 1
    ptr->foo(); // case 2
    return 0;
}
Is it possible that a.foo() gets inlined while ptr->foo() does not?
My guess is that this might be related to the fact the compiler does not have any guarantee that data on heap won't be modified by another thread. However, I don't understand if/why it can have any impact on inlining.
Assume that there are no virtual functions
EDIT: I guess my question is partially theoretical. Suppose you are implementing a compiler, can you think of any legitimate reason why you won't optimize ptr->foo() while optimizing a.foo()?
My guess is that this might be related to the fact the compiler does not have any guarantee that data on heap won't be modified by another thread. However, I don't understand if/why it can have any impact on inlining.
That is not relevant. Inlined and "regular" function calls have the same effect on the heap.
The implementation, inline or not, is in the code segment anyway.
Is it possible that a.foo() gets inlined while ptr->foo() does not?
Highly unlikely. Both of these calls will probably be inlined if the implementation is visible to the compiler and the compiler decides that it would be beneficial.
I used "case 2" in my code numerous times and it was always inlined using g++.
Although it is mostly implementation specific, there is no real limitation that restricts a call through a pointer compared to a call on a stack object (besides the virtual functions which you already mentioned).
You should note that the produced inlined code might still be different. Case 2 will have to first determine the actual address which will have an impact on the performance, but it should be pretty much the same from there.
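To check this on a particular compiler, a complete version of the question's example can be compiled with optimizations and inspected, for instance with g++ -O2 -S. The sketch below is one way to set that up; the value() accessor and the run_both function are additions for illustration and are not part of the original code.

```cpp
#include <memory>

class A {
public:
    void foo() { ++i; }               // non-const, modifies the object
    int value() const { return i; }   // hypothetical accessor, added so the result is used
private:
    int i = 0;
};

// Calls foo() once on a stack object and once on a heap object, then
// returns a value that depends on both so neither call is dead code.
int run_both() {
    A a;                               // on the stack
    auto ptr = std::make_unique<A>();  // on the heap
    a.foo();                           // case 1
    ptr->foo();                        // case 2
    return a.value() + ptr->value();
}
```

If neither call site shows up as a call instruction in the assembly for run_both, both were inlined.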
if/why the compiler might not inline a function call that potentially manipulates data on heap.
The compiler is free to inline a function call or not (and might decide that after devirtualization). The inlining decision belongs to the compiler (so the inline keyword, like register, is often ignored when making optimization decisions). The compiler usually decides whether to inline at each particular call site (that is, at every occurrence of the called function's name).
Suppose you are implementing a compiler, can you think of any legitimate reason why you won't optimize ptr->foo() while optimizing a.foo()?
This is really easy. Often (among other criteria), inlining is decided according to the depth of previously inlined nested function calls, or according to the current size of the expanded internal representation. So it does happen that a particular occurrence of ptr->foo() would be inlined (e.g. because it occurs in a small function) but another occurrence of a.foo() won't be inlined.
Remember, inlining decisions are generally made at each call site. And on some compilers, the thresholds used may vary or can be tuned.
But inlining does not always speed up execution (because of CPU cache and branch predictor issues, and many other mysteries....), and that is yet another reason why sometimes a compiler won't inline a particular call.
For GCC, read about inline functions and the various optimization options (notice that -finline-limit=100 and -finline-limit=200 will give different inlining decisions; you could even play with different --param options; the MILEPOST GCC project used machine learning techniques to tune these....).
Perhaps some compilers can more easily do devirtualization for stack allocated data (I really don't know, and compilers are making progress on such issues). This is probably the reason why (perhaps!) heap vs stack allocation could influence inlining decisions.
If there is a C or C++ code like this:
if (func())
;
can the compiler optimise out the call to func() if it cannot be sure whether the function has any side effects?
Origin of my question: I sometimes call assert macros in a way like this:
if (func())
assert(0);
if I want to make sure that func() is always called and that the assertion fails in debug mode if func() returns the wrong value. But recently I was warned that my code doesn't guarantee that the function is always called.
If the compiler cannot prove that optimizing away the call to func does not change the observable behavior of your program, it is not allowed to make the optimization.
So unless the compiler can prove that not calling the function has no observable effect, the call will take place. Note that compilers can be smart sometimes, so if you want to be sure, make sure the function actually does have a side effect. (On the other hand, if it doesn't, you need not care.)
This is known as the as-if rule.
(This is a C++ answer. Please post a question for one programming language only, not two.)
No, a function that may have side effects cannot be optimised out, because then you may be "optimising out" side effects. And since by "side effects" we really mean "the things that your program does", a compiler permitted to do such a thing would not be particularly useful. That's why the standard's "as-if" rule prevents the sort of optimisation you're talking about.
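A small sketch of the pattern from the question, with a visible side effect so the call can be observed. The counter g_calls and the wrapper functions are hypothetical names added for illustration.

```cpp
#include <cassert>

static int g_calls = 0;  // hypothetical counter that makes the side effect visible

// Hypothetical function: returns nonzero on failure and has a side effect.
int func() {
    ++g_calls;
    return 0;
}

// The pattern from the question. func() is evaluated in every build;
// only the assert(0) statement expands to nothing when NDEBUG is defined.
void run_checked() {
    if (func())
        assert(0);
}

// For contrast: here the call sits inside the macro argument, so it is
// not evaluated at all when NDEBUG is defined.
void run_inside_assert() {
    assert(func() == 0);
}
```

The warning the asker received applies to the second form, not the first: in run_checked the call is ordinary code that the as-if rule protects.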
Let's say I have a function where the parameter is passed by value instead of by const reference. Further, let's assume that only the value is used inside the function, i.e. the function doesn't try to modify it. In that case, will the compiler be able to figure out that it can pass the value by const reference (for performance reasons) and generate the code accordingly? Is there any compiler which does that?
If you pass a variable instead of a temporary, the compiler is not allowed to optimize away the copy if the copy constructor of it does anything you would notice when running the program ("observable behavior": inputs/outputs, or changing volatile variables).
Apart from that, the compiler is free to do everything it wants (it only needs to resemble the observable behavior as-if it wouldn't have optimized at all).
Only when the argument is an rvalue (typically a temporary) is the compiler allowed to optimize away the copy to the by-value parameter even if the copy constructor has observable side effects.
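The lvalue case can be made visible with a type that counts its copies. This is a minimal sketch; Tracked, by_value and by_const_ref are hypothetical names, and the counter exists only to make the copy observable.

```cpp
// Hypothetical type whose copy constructor has an effect we can observe.
struct Tracked {
    static int copies;  // number of copy constructions so far
    int payload = 0;

    Tracked() = default;
    Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
};

int Tracked::copies = 0;

// Takes its argument by value: passing a named variable copies it.
int by_value(Tracked t) {
    return t.payload;
}

// Takes its argument by const reference: no copy is made.
int by_const_ref(const Tracked& t) {
    return t.payload;
}
```

Because the counter is read afterwards, the compiler has to preserve the copy in by_value when it is called with a variable, whether or not the function is inlined.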
Only if the function is not exported is there a chance for the compiler to convert call-by-value to call-by-reference (or vice versa).
Otherwise, due to the calling convention, the function must keep its call-by-value/reference semantics.
I'm not aware of any general guarantees that this will be done, but if the called function is inlined, then this would then allow the compiler to see that an unnecessary copy is being made, and if the optimization level is high enough, the copy operation would be eliminated. GCC can do this at least.
You might want to think about whether the class of this parameter value has a copy constructor or not. If it doesn't, then the performance difference between pass-by-value and pass-by-const-ref is probably negligible.
On the other hand, if the class does have a copy constructor that does stuff, then the optimization you are hoping for probably will not happen because the compiler cannot remove the call to the constructor--it cannot know that the side effects of the constructor are not important to you.
You might be able to get more useful answers if you say what the class of the parameter is, or if it is a custom class, describe what fields it has and whether it has a copy constructor.
With all optimisations the answer is generally "maybe". The only way to check is to examine the output assembly and see what it's really doing. If the standard allows it, whether or not it really happens is down to the whims of the compiler. You should not rely on it happening because an arbitrary change elsewhere in your codebase may change the heuristics used by the optimizer which might cause it to stop performing a certain optimization.
Play it safe: code it how you intend - pass by reference if that's what you want. However, if you're writing templated code which could work on types of any size, the choice is not so clear. Personally I'd side with passing by const reference - the compiler could also perform a different optimisation, where a small type which can fit inside the size of a reference is passed by value, rather than by const reference. But again, it might happen, it might not.
This post is an excellent reference to this kind of optimization:
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/
If I have a function that returns an object, but this return value is never used by the caller, will the compiler optimize away the copy? (Possibly an always/sometimes/never answer.)
Elementary example:
ReturnValue MyClass::FunctionThatAltersMembersAndNeverFails()
{
    //Do stuff to members of MyClass that never fails
    return successfulResultObject;
}
void MyClass::DoWork()
{
    // Do some stuff
    FunctionThatAltersMembersAndNeverFails();
    // Do more stuff
}
In this case, will the ReturnValue object get copied at all? Does it even get constructed? (I know it probably depends on the compiler, but let's narrow this discussion down to the popular modern ones.)
EDIT: Let's simplify this a bit, since there doesn't seem to be a consensus in the general case. What if ReturnValue is an int, and we return 0 instead of successfulResultObject?
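If the ReturnValue class has a non-trivial copy constructor, the compiler may not remove a copy under the as-if rule unless it can prove that the constructor has no observable effect. The exception is copy elision (RVO), which the language explicitly permits even when the copy constructor has side effects.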
If the copy constructor is inline, the compiler might be able to inline the call, which in turn might allow the elimination of much of its code (also depending on whether FunctionThatAltersMembersAndNeverFails is inline).
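A quick way to see whether the returned object is still constructed when the caller ignores it is to give the constructor an observable effect. The sketch below assumes a counting constructor and a member_ field, both hypothetical additions to the question's class names.

```cpp
// Hypothetical ReturnValue whose constructor has an observable effect.
struct ReturnValue {
    static int constructed;  // number of default constructions so far
    ReturnValue() { ++constructed; }
};

int ReturnValue::constructed = 0;

class MyClass {
public:
    ReturnValue FunctionThatAltersMembersAndNeverFails() {
        ++member_;             // "do stuff to members"
        return ReturnValue();  // the result object is created here
    }

    void DoWork() {
        FunctionThatAltersMembersAndNeverFails();  // result is discarded
    }

    int member() const { return member_; }  // hypothetical accessor

private:
    int member_ = 0;
};
```

The object is constructed once even though DoWork() throws it away, because the counter makes the construction observable. With a constructor that has no visible effect and an inlined call, the compiler is free to drop it.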
They most likely will if the optimization level causes them to inline the code. If not, they would have to generate two different translations of the same code to make it work, which could open up a lot of edge case problems.
The linker can take care of this sort of thing, even if the original caller and called are in different compilation units.
If you have a good reason to be concerned about the CPU load dedicated to a method call (premature optimization is the root of all evil,) you might consider the many inlining options available to you, including (gasp!) a macro.
Do you REALLY need to optimize at this level?
If the return value is an int and you return 0 (as in the edited question), then this may get optimized away.
You have to look at the underlying assembly. If the function is not inlined then the underlying assembly will execute a mov eax, 0 (or xor eax, eax) to set eax (which is usually used for integer return values) to 0. If the function is inlined, this will certainly get optimized away.
But this scenario isn't too useful if you're worried about what happens when you return objects larger than 32 bits. You'll need to refer to the answers to the unedited question, which paint a pretty good picture: if everything is inlined, then most of it will be optimized out. If it is not inlined, then the functions must be called even if they don't really do anything, and that includes the constructor of an object (since the compiler doesn't know whether the constructor modified global variables or did something else weird).
I doubt most compilers could do that if the functions were in different translation units (i.e. different files). Maybe if they were both in the same file, they could.
There is a pretty good chance that a peephole optimizer will catch this. Many (most?) compilers implement one, so the answer is probably "yes".
As others have noted, this is not a trivial question at the AST rewriting level.
Peephole optimizers work on a representation of the code at a level equivalent to assembly language (but before generation of actual machine code). There is a chance to notice the load of the return value into a register followed by an overwrite with no intermediate read, and just remove the load. This is done on a case by case basis.
I just tried this example on Compiler Explorer, and at -O3 the mov is not generated when the return value is not used.
https://gcc.godbolt.org/z/v5WGPr