Can an "unused" object be optimized away? - c++

In code like this:
void foo() {
    SomeObject obj;
}
one might argue that obj is "unused" and therefore can be optimized away, just like an unused local int might be. That seems like an error to me though, because unlike with an int, there could be important side effects of the SomeObject constructor. So, I am wondering, does the language explicitly require that such local variables not be optimized away? Or does a programmer have to take precautions to prevent such optimization?

If the compiler has the definition of the SomeObject::SomeObject() constructor and the SomeObject destructor available (i.e. if they're defined inline) and can see there are no side effects, then yes, this can be optimised out (provided you don't do anything else with obj that requires it to be fully constructed.)
Otherwise, if the constructor is defined in another translation unit, then the compiler can't know that there are no side effects, so the call will be made (and the destructor too, if that's not inline).
In general, the compiler is at liberty to perform any optimisation that doesn't alter the semantics of the program. In this case, removing an unused local variable whose constructor and destructor do not touch any other code won't alter the meaning of your program, so it's perfectly safe to do.
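To make the distinction concrete, here is a minimal sketch. The names Plain, Tracked, g_events and effects_of_one_call are hypothetical and not from the question; the global counter merely stands in for any effect the rest of the program can see.

```cpp
#include <cassert>

// Hypothetical counter standing in for any externally visible effect.
int g_events = 0;

// Constructing and destroying this type affects nothing else, so an
// optimizer is free to drop an unused local of this type entirely.
struct Plain {
    int value = 0;
};

// Constructor and destructor modify global state. The optimizer has to
// preserve those updates even though the object itself is never used.
struct Tracked {
    Tracked()  { ++g_events; }
    ~Tracked() { ++g_events; }
};

void foo() {
    Plain   unused_plain;    // may vanish from the generated code
    Tracked unused_tracked;  // its two increments must still happen
    (void)unused_plain;      // only silences unused-variable warnings
}

// Returns how many constructor/destructor effects one call to foo() produced.
int effects_of_one_call() {
    int before = g_events;
    foo();
    return g_events - before;
}
```

Calling effects_of_one_call() should report two effects at any optimization level, even if the compiler never reserves stack space for either object.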

First, let's correct the example:
void foo() {
    SomeObject obj; // not obj()
}
Second, the 'as-if' rule applies to optimizers. The compiler may therefore optimize away the object's storage entirely, but every side effect of the constructor(s) / destructor(s), including those of base classes, must still show up. This means that you may end up using no additional memory (as long as you don't take the address of obj), while any constructor or destructor with side effects will still run.

Yes. Modern compilers are pretty good at removing dead code (assuming you build with optimizations enabled). That includes unused objects, provided the constructor and destructor do not have side effects and the compiler can see that (i.e. their definitions are not hidden away in a library).

Related

Why can't the C++ compiler elide the move when moving a POD into an optional with RVO?

Consider the following code (godbolt):
#include <optional>
#include <array>
struct LargeType {
    std::array<int, 256> largeContents;
};
LargeType doSomething();
std::optional<LargeType> wrapIntoOptional() {
    return std::optional<LargeType> {doSomething()};
}
As you see, there is a function returning a large POD and then a function wrapping it into a std::optional. As visible in godbolt, the compiler creates a memcpy here, so it cannot fully elide moving the object. Why is this?
If I understand it correctly, the C++ language would allow eliding the move due to the as-if rule, as there are no visible side effects to it. So it seems that the compiler really cannot avoid it. But why?
My (probably incorrect) understanding how the compiler could optimize the memcpy out is to hand a reference to the storage inside the optional to doSomething() (as I guess such large objects get passed by hidden reference anyway). The optional itself would already lie on the stack of the caller of wrapIntoOptional due to RVO. As the definition of the constructor of std::optional is in the header, it is available to the compiler, so it should be able to inline it, so it can hand that storage location to doSomething in the first place. So what's wrong about my intuition here?
To clarify: I don't argue that the C++ language requires the compiler to inline this. I just thought it would be a reasonable optimization and given that wrapping things into optionals is a common operation, it would be an optimization that is implemented in modern compilers.
It is impossible to elide any copy/move through a constructor call into storage managed by the constructed object.
The constructor takes the object as a reference. In order to bind the reference to something there must be an object, so the prvalue from doSomething() must be materialized into a temporary object to bind to the reference and the constructor then must copy/move from that temporary into its own storage.
It is impossible to elide through function parameters. That would require knowing the implementation of the function and the way C++ is specified it is possible to compile each function only knowing the declarations of other functions (aside from constant expression evaluation). This would break that or require a new type of annotation in the declaration.
None of this prevents the compiler from optimizing in a way that doesn't affect the observable behavior though. If your compiler is not figuring out that the extra copy can be avoided and has no observable side effects when seeing all relevant function/constructor definitions, then that's something you could complain to your compiler vendor about. The concept of copy elision is about allowing the compiler to optimize away a copy/move even though it would have had observable side effects.
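The materialize-then-move step can be made visible with an instrumented type. This is only a sketch: Tracked, make and the two helper functions are hypothetical names, and it assumes a C++17 compiler (for std::optional and guaranteed elision of the prvalue).

```cpp
#include <cassert>
#include <optional>

// Hypothetical instrumented type; the counter makes each move observable,
// so the compiler is not allowed to remove it under the as-if rule.
struct Tracked {
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) = default;
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::moves = 0;

Tracked make() { return Tracked{}; }

// Initializing an object directly from the prvalue needs no move.
int moves_for_direct_init() {
    Tracked::moves = 0;
    Tracked t = make();
    (void)t;
    return Tracked::moves;
}

// optional's converting constructor takes its argument by reference, so the
// prvalue is materialized as a temporary and then moved into the storage.
int moves_for_optional_wrap() {
    Tracked::moves = 0;
    std::optional<Tracked> o{make()};
    return o.has_value() ? Tracked::moves : -1;
}
```

With this type the wrap costs exactly one move and the direct initialization none. For a trivially copyable type such as LargeType the same move shows up as the memcpy, which the optimizer may remove only if it can prove nothing observable changes.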
You can add noexcept to elide copy:
https://godbolt.org/z/rrGEfrdzc

Can functions be optimized away if they have side effects?

I want to initialize some static data on the main thread.
int32_t GetFoo(ptime t)
{
    static HugeBarData data;
    return data.Baz(t);
}
int main()
{
    GetFoo(); // Avoid data race on static field.
    // But will it be optimized away as unnecessary?
    // Spawn threads. Call 'GetFoo' on the threads.
}
If the compiler may decide to remove it, how can I force it to stay there?
The only side-effecting functions that a C++ compiler can optimize away are unnecessary constructor calls, particularly copy constructors.
Cf Under what conditions does C++ optimize out constructor calls?
Compilers must optimize according to the "as-if" rule. That is, after any optimization, the program must still behave (in the logical sense) as if the code were not optimized.
If there are side effects to a function, any optimization must preserve them. However, if the compiler can determine that the results of the side effects don't affect the rest of the program, it can optimize away even the side effects. Compilers are very conservative in this area. If your compiler optimizes away side effects of the HugeBarData constructor or Baz call that are required elsewhere in the program, that is a bug in the compiler.
There are some exceptions where the compiler can make optimizations which alter the behaviour of the program from the non-optimized case, usually involving copies. I don't think any of those exceptions apply here.
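A simplified sketch of the warm-up call, under stated assumptions: the HugeBarData below is a hypothetical stand-in whose constructor bumps a counter, and an int parameter replaces ptime so the example is self-contained. As a side note, since C++11 the initialization of a function-local static is itself thread-safe, so the warm-up call is mostly about controlling when the cost is paid.

```cpp
#include <cassert>

// Hypothetical counter that records how often the static is constructed.
int g_constructions = 0;

// Stand-in for the real HugeBarData from the question.
struct HugeBarData {
    HugeBarData() { ++g_constructions; }
    int Baz(int t) const { return t * 2; }
};

int GetFoo(int t) {
    static HugeBarData data;  // initialized on the first call only
    return data.Baz(t);
}

// Mimics main(): one warm-up call whose result is discarded, then real use.
int warm_up_then_use() {
    GetFoo(0);  // result unused, but the initialization still has to happen
    int constructed_after_warm_up = g_constructions;
    GetFoo(1);
    GetFoo(2);
    assert(g_constructions == constructed_after_warm_up);  // no re-initialization
    return g_constructions;
}
```

The warm-up call cannot be dropped here because initializing data has an effect the later calls depend on.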

Copy constructor is not always called when passing or returning variables

In this answer, it's been mentioned that copy constructor is not necessarily called when passing variables by value into functions or as return values out of functions. Can someone explain when this happens and why? Also how does compiler manage to return the result in such cases?
As said, that is the Return value optimization and Copy elision.
This can happen on passing when the object is newly created and then copied. In this case, the compiler is allowed to optimize that so that the new object is directly created at the right place and no copying is needed (and also the copy constructor will not be called).
For example:
struct A {};
void test(A a) {}
int main() {
    test(A()); // probably there will be no copy here
}
For returning, it is similar. You create a new object and then if you return it, that would involve a copy but the compiler is allowed to optimize that copy away (and thus also the call to the copy constructor).
For example:
A returnANewA() {
    return A(); // copying would take place here
}
int main() {
    A a = returnANewA(); // the compiler is allowed to do that without copying
}
How does the compiler do this: Depending on the calling convention, it knows where the return value must be stored on the stack. In other cases, it of course helps the compiler if it knows the function code. But all that depends on the architecture (x86 or others) and compiler (GCC, Microsoft, or others). The standard just says that the compiler is allowed to omit the call to the copy-constructor.
If you are interested in some platform-dependent details about calling conventions, here are some links. Note however that these details don't really matter. All you have to know is that the compiler is allowed to optimize the copy-constructor-call away (and in most cases will do so).
Wikipedia x86 calling conventions
Apple Developer docs: IA-32 Function Calling Conventions
Question on SO: Calling convention for function returning struct
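To check the elision described in this answer on your own compiler, a counting copy constructor is enough. This sketch reuses the names A, test and returnANewA from the examples; the copies counter and copies_observed are hypothetical additions. It assumes C++17 or later, where both cases are guaranteed to involve no copy; older standards merely permit the elision.

```cpp
#include <cassert>

struct A {
    static int copies;          // hypothetical counter, not in the original
    A() = default;
    A(const A&) { ++copies; }   // visible side effect on every copy
};
int A::copies = 0;

void test(A) {}

A returnANewA() {
    return A();  // constructed directly in the caller's object
}

// Counts the copies made by one pass-by-value and one return-by-value.
int copies_observed() {
    A::copies = 0;
    test(A());            // temporary passed by value
    A a = returnANewA();  // initialized from the returned prvalue
    (void)a;
    return A::copies;
}
```

Note that the copy constructor has a visible side effect and is still skipped, which is exactly what distinguishes copy elision from ordinary as-if optimizations.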

C++: Calling a constructor to a temporary object

Suppose I have the following:
int main() {
    SomeClass();
    return 0;
}
Without optimization, the SomeClass() constructor will be called, and then its destructor will be called, and the object will be no more.
However, according to an IRC channel that constructor/destructor call may be optimized away if the compiler thinks there's no side effect to the SomeClass constructors/destructors.
I suppose the obvious way to go about this is not to use some constructor/destructor function (e.g use a function, or a static method or so), but is there a way to ensure the calling of the constructors/destructors?
You wrote that, according to an IRC channel, the constructor/destructor call "may be optimized away if the compiler thinks there's no side effect to the SomeClass constructors/destructors."
The "thinks there's no side effect" part is wrong. That should be: knows there is no observable behaviour
E.g. from § 1.9 of the latest standard (there are more relevant quotes):
A conforming implementation executing a well-formed program shall produce the same observable behavior
as one of the possible executions of the corresponding instance of the abstract machine with the same program
and the same input. However, if any such execution contains an undefined operation, this International
Standard places no requirement on the implementation executing that program with that input (not even
with regard to operations preceding the first undefined operation).
As a matter of fact, this whole mechanism underpins the single most ubiquitous C++ language idiom: Resource Acquisition Is Initialization
Backgrounder
Having the compiler optimize away the trivial case-constructors is extremely helpful. It is what allows iterators to compile down to exactly the same performance code as using raw pointer/indexers.
It is also what allows a function object to compile down to the exact same code as inlining the function body.
It is what makes C++11 lambdas perfectly optimal for simple use cases:
factorial = std::accumulate(begin, end, 1, [] (int a, int b) { return a*b; });
The lambda compiles down to a functor object similar to
struct lambda_1
{
    int operator()(int a, int b) const
    { return a*b; }
};
The compiler sees that the constructor/destructor can be elided and the function body gets inlined. The end result is optimal [1].
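A self-contained version of the comparison, as a sketch: Multiply plays the role of the compiler-generated closure type, and the two function names are hypothetical. Whether both really compile to identical machine code depends on the compiler and flags, so this only shows that they are interchangeable at the source level.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hand-written equivalent of the closure type the compiler generates.
struct Multiply {
    int operator()(int a, int b) const { return a * b; }
};

// std::accumulate takes the initial value as its third argument;
// 1 is the identity for multiplication.
int product_with_lambda(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 1,
                           [](int a, int b) { return a * b; });
}

int product_with_functor(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 1, Multiply{});
}
```

Both functions construct and destroy a stateless function object; since that has no observable effect, an optimizing compiler has nothing left to emit for it.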
More (un)observable behaviour
The standard contains a very entertaining example to the contrary, to spark your imagination.
§ 20.7.2.2.3
[ Note: The use count updates caused by the temporary object construction and destruction are not
observable side effects, so the implementation may meet the effects (and the implied guarantees) via
different means, without creating a temporary. In particular, in the example:
shared_ptr<int> p(new int);
shared_ptr<void> q(p);
p = p;
q = p;
both assignments may be no-ops. —end note ]
IOW: Don't underestimate the power of optimizing compilers. This in no way means that language guarantees are to be thrown out of the window!
[1] Though there could be faster algorithms to get a factorial, depending on the problem domain :)
I'm sure that if 'SomeClass::SomeClass()' is not implemented as 'inline', the compiler has no way of knowing that the constructor/destructor has no side effects, and it will always call the constructor/destructor.
If the compiler is optimizing away a visible effect of the constructor/destructor call, it is buggy. If it has no visible effect, then you shouldn't notice it anyway.
However let's assume that somehow your constructor or destructor does have a visible effect (so construction and subsequent destruction of that object isn't effectively a no-op) in such a way that the compiler could legitimately think it wouldn't (not that I can think of such a situation, but then, it might be just a lack of imagination on my side). Then any of the following strategies should work:
Make sure that the compiler cannot see the definition of the constructor and/or destructor. If the compiler doesn't know what the constructor/destructor does, it cannot assume it does not have an effect. Note, however, that this also disables inlining. If your compiler does not do cross-module optimization, just putting the constructor/destructor into a different file should suffice.
Make sure that your constructor/destructor actually does have observable behaviour, e.g. through use of volatile variables (every read or write of a volatile variable is considered observable behaviour in C++).
However let me stress again that it's very unlikely that you have to do anything, unless your compiler is horribly buggy (in which case I'd strongly advise you to change the compiler :-)).
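The volatile strategy might look like the following sketch. g_marker and marker_after_temporary are hypothetical names; the only point is that each write to a volatile object counts as observable behaviour and therefore has to survive optimization.

```cpp
#include <cassert>

// Hypothetical marker; writes to a volatile object are observable behaviour.
volatile int g_marker = 0;

struct SomeClass {
    SomeClass()  { g_marker = 1; }  // cannot be removed by the optimizer
    ~SomeClass() { g_marker = 2; }  // likewise
};

// Creates and immediately destroys a temporary, then reads the marker.
int marker_after_temporary() {
    SomeClass();      // temporary lives only for this statement
    return g_marker;  // reflects the destructor's write
}
```

Even with the definitions fully visible and inlined, both writes have to be emitted in order, so the construction and destruction cannot silently disappear.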

C++ ctors: What's the point of using initializer list in a .cpp file?

C++ has this funky quirk of supporting initializer lists for a ctor, such as:
class Foo
{
public:
    Foo(int x) : m_x(x) { }
private:
    SomeComplexObjectThatTakesAnIntForConstruction m_x;
};
Makes sense so far. More efficient because the member is only initialized once, rather than being default-constructed, and then operator= assigned a value later.
But I commonly come across programmers who put the ctor in their .cpp file, where I can hardly believe it actually has the intended (efficient) effect of actually using the initializer list correctly:
// Foo.cpp
Foo::Foo(int x) : m_x(x)
{
    // complex set of things needed to be done, or perhaps dependency-inducing references here...
}
As I understand things, the above won't necessarily generate a single construction for m_x, because the initializer-list is not visible outside of this translation unit, and will result in construction + assignment, no?
// user.cpp
Foo my_foo(9); // how can the ctor for m_x be effectively inlined here?
Or have I misunderstood how initializer-lists function?
Thanks for your help with this ;)
I have chosen to split the initializer-list and body of the construction into two pieces, such as:
class Foo
{
public:
    Foo(int x) : m_x(x) { Initialize(); }
private:
    void Initialize(); // defined in our .cpp thus isolating dependencies and creating a common call-point for multiple ctors (if present)
    SomeComplexObjectThatTakesAnIntForConstruction m_x;
};
You have misunderstood.
The initializer list doesn't need to be visible from other translation units the same way that the constructor body doesn't need to be visible from other translation units. It affects the code which is generated for the constructor itself, not the code which is generated to call the constructor.
Maybe this will clear up the confusion:
Inlining is one particular optimization. It is not the only type of optimization possible. Modern C++ compilers are capable of performing all sorts of other optimizations (loop unrolling, reordering of statements when they don't affect the program's behavior, etc).
The "short cut" or "efficiency gain" that inlining gives you is the elimination of the need to create a new frame on the call stack. Typically, the code generated for a function call looks something like this, where lines prefixed by -- are part of the called function (assuming the C calling convention).
Push the arguments on to the stack
Push the current code address onto the stack
Jump to the address of the function
-- Move the stack pointer forward to create space for local variables
-- Execute the body of the function
-- Move the stack pointer back to remove the local variables
-- Pop the caller's address from the stack and jump to it
Pop the arguments from the stack
If the function is inlined, only the three steps performed by the called function remain:
-- Move the stack pointer forward to create space for local variables
-- Execute the body of the function
-- Move the stack pointer back to remove the local variables
This optimization relies on the ability of the compiler and/or linker to change where code is generated, not what code is generated.
In contrast, the initializer list affects what code is generated, not where it is generated. The compiler can still generate calls to non-default constructors for member variables whether it is doing it directly at the call site or in a separate section of the program code that the call will jump to.
Initializer lists work fine when implemented in .cpp files - what makes you believe they wouldn't?
An initializer list is still part of the constructor 'call'. It's just a syntax that formalizes how construction of class members will take place (note for novices - it doesn't direct or influence the order of class member construction, but it allows parameters to be passed to the member's constructors). This makes possible the simple rule that when the first statement after the opening brace is reached, all class members have been through their construction, but it doesn't mean that the initializer list needs to occur before the constructor is called.
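The note about ordering can be checked with a small sketch. Member, g_order and construction_order are hypothetical names; the initializer list is deliberately written in the "wrong" order, which many compilers flag with a reorder warning.

```cpp
#include <cassert>
#include <string>

// Hypothetical log recording the order in which members are constructed.
std::string g_order;

struct Member {
    explicit Member(char tag) { g_order += tag; }
};

struct Foo {
    // The list names 'second' before 'first', but members are always
    // constructed in the order they are declared in the class.
    Foo() : second('b'), first('a') {}
    Member first;
    Member second;
};

std::string construction_order() {
    g_order.clear();
    Foo f;
    (void)f;
    return g_order;
}
```

The log ends up as "ab": the list supplies the arguments for each member's constructor, while the declaration order decides when each one runs.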
To address Mordachai's comment:
Having the init list in the header vs. in the .cpp file would affect the 'inline-ability' of the constructor (or the initialization list, if you're deferring the main work of the constructor to a function call in the inlined ctor). However, that's true of any in-header implementation of a member function vs. an implementation in a .cpp file.
I suspect that for most ctors, performance concerns will be due to resource allocation (if they aren't acquiring a resource, they're probably not going to have perf issues), and that's going to take the same amount of time whether the ctor is inlined or not. Note that this still means that init lists are important, whether they're inlined or not, because they prevent the situation (that you mentioned in your question) where:
a member object is default initialized (possibly acquiring a resource in an expensive operation)
the member object is then re-initialized (which may result in releasing the resource, then acquiring a new resource)
Since resource acquisition/release is typically expensive (whether that resource is memory, a network connection, opening a file) compared to many other things, this is an important anti-pattern to avoid. However, the performance difference between whether these resource acquisitions are inlined or not is probably not significant in most cases, I'd think.
Of course, there are also correctness issues that are addressed by the initialization list. For example, since const members can't be modified, they must be initialized in an initialization list.
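The difference between the two styles can be counted directly. In this sketch Member is a hypothetical stand-in for SomeComplexObjectThatTakesAnIntForConstruction, and WithList / WithAssignment are made-up names for the two ways of writing the constructor. Everything sits in one file here, but nothing changes if the constructor definitions are moved into a .cpp file.

```cpp
#include <cassert>

// Hypothetical member type that counts every operation performed on it.
struct Member {
    static int default_ctors, value_ctors, assignments;
    Member()    { ++default_ctors; }
    Member(int) { ++value_ctors; }
    Member& operator=(const Member&) { ++assignments; return *this; }
};
int Member::default_ctors = 0;
int Member::value_ctors = 0;
int Member::assignments = 0;

// Initializer list: m_x is constructed once, directly from x.
struct WithList {
    WithList(int x) : m_x(x) {}
    Member m_x;
};

// Assignment in the body: m_x is default-constructed first, then a
// temporary is built from x and assigned over it.
struct WithAssignment {
    WithAssignment(int x) { m_x = Member(x); }
    Member m_x;
};

int total_operations() {
    return Member::default_ctors + Member::value_ctors + Member::assignments;
}

void reset_counters() {
    Member::default_ctors = Member::value_ctors = Member::assignments = 0;
}

int operations_with_list() {
    reset_counters();
    WithList f(9);
    (void)f;
    return total_operations();
}

int operations_with_assignment() {
    reset_counters();
    WithAssignment f(9);
    (void)f;
    return total_operations();
}
```

The initializer-list version performs one operation on the member, the assignment version three, and that ratio is a property of the constructor's own code, not of where it is compiled.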
The purpose of the initializer-list is not simply a matter of efficiency.
Aside from the cases where a member must be initialized there because there is no way to do otherwise (references, const-members, class members with no default constructor), it is generally "preferred" in the same way you initialise variables when you first declare them.
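The three "no other way" cases can be shown in one hypothetical class (Widget, NoDefault and demo are illustrative names only). Trying to assign any of these members in the constructor body instead would not compile.

```cpp
#include <cassert>

// Has no default constructor, so it cannot be default-initialized first.
struct NoDefault {
    explicit NoDefault(int v) : value(v) {}
    int value;
};

class Widget {
public:
    Widget(int& counter, int id)
        : m_counter(counter),  // references must be bound at initialization
          m_id(id),            // const members cannot be assigned later
          m_part(id * 10)      // no default constructor available
    {}

    // Increments the referenced counter and combines it with the other members.
    int touch() { return ++m_counter + m_id + m_part.value; }

private:
    int&      m_counter;
    const int m_id;
    NoDefault m_part;
};

int demo() {
    int counter = 0;
    Widget w(counter, 2);
    return w.touch();  // 1 + 2 + 20
}
```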
There are occasions where it is better to use the constructor body to set variables to their correct values, for example if you have two pointers that will point to objects created with new, and you are scared the second new may throw. In this case you should still "initialize" them - to NULL - then create them in the body, the first one inside an auto_ptr just in case (which you release after the second one works).
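For comparison, here is a sketch of the same two-allocation situation using std::unique_ptr members (C++11) instead of raw pointers and auto_ptr. This is a swapped-in technique, not what the answer describes, and First, Second and Holder are hypothetical types. With owning members, both allocations can stay in the initializer list, because an already constructed member is destroyed automatically if a later initializer throws.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical resource that tracks how many instances are alive.
struct First {
    static int alive;
    First()  { ++alive; }
    ~First() { --alive; }
};
int First::alive = 0;

// Hypothetical resource whose construction always fails.
struct Second {
    Second() { throw std::runtime_error("second allocation failed"); }
};

struct Holder {
    // If 'new Second' throws, the already constructed 'first' member is
    // destroyed during stack unwinding, which releases the First object.
    Holder() : first(new First), second(new Second) {}
    std::unique_ptr<First>  first;
    std::unique_ptr<Second> second;
};

// Returns true if the failed construction leaked nothing.
bool construction_fails_cleanly() {
    try {
        Holder h;
    } catch (const std::runtime_error&) {
        return First::alive == 0;
    }
    return false;  // not reached: Second's constructor always throws
}
```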
The purpose of moving the constructor body into the compilation unit is to hide the implementation detail from the interface. This is generally preferred for maintainability, which is often far more important than a minor gain in runtime efficiency that saves microseconds.
I think initializer lists are there not for efficiency, but for semantics.
For one, they are the only way to choose which superclass constructor runs and with what arguments. (Note that in C++ the superclass is always constructed before any members, whatever order the list is written in, so the list cannot be used to initialize members ahead of the superclass.)
For another, they are a way of guaranteeing that certain fields are initialized to non-garbage values, unlike assignment statements inside the constructor code that swim in a syntax of if-statements, loops, etc, that the compiler can't be sure will get executed.
For another, it lets you declare a member as const while still allowing you to initialize it. (But I'm not positive about that one.)