Suppose I have the following:
int main() {
SomeClass();
return 0;
}
Without optimization, the SomeClass() constructor will be called, and then its destructor will be called, and the object will be no more.
However, according to an IRC channel that constructor/destructor call may be optimized away if the compiler thinks there's no side effect to the SomeClass constructors/destructors.
I suppose the obvious way around this is not to rely on a constructor/destructor for the work (e.g. use a free function or a static method instead), but is there a way to ensure that the constructor/destructor are actually called?
However, according to an IRC channel that constructor/destructor call may be optimized away if the compiler thinks there's no side effect to the SomeClass constructors/destructors.
The bolded part ("thinks there's no side effect") is wrong. That should be: knows there is no observable behaviour.
E.g. from § 1.9 of the latest standard (there are more relevant quotes):
A conforming implementation executing a well-formed program shall produce the same observable behavior
as one of the possible executions of the corresponding instance of the abstract machine with the same program
and the same input. However, if any such execution contains an undefined operation, this International
Standard places no requirement on the implementation executing that program with that input (not even
with regard to operations preceding the first undefined operation).
As a matter of fact, this whole mechanism underpins the single most ubiquitous C++ language idiom: Resource Acquisition Is Initialization (RAII).
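For illustration, here is a minimal RAII sketch (an assumed example, not from the question): the constructor and destructor perform I/O, which is observable behaviour, so the compiler cannot elide them even though the object is otherwise unused.
#include <cstdio>

struct File {
    std::FILE* handle;
    explicit File(const char* name) : handle(std::fopen(name, "r")) {}  // observable: I/O
    ~File() { if (handle) std::fclose(handle); }                        // observable: I/O
};

void touch() {
    File log("data.txt");  // opened here, closed when log goes out of scope
}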
Backgrounder
Having the compiler optimize away the trivial cases (constructors/destructors with no observable behaviour) is extremely helpful. It is what allows iterators to compile down to code with exactly the same performance as raw pointers/indexing.
It is also what allows a function object to compile down to the exact same code as inlining the function body.
It is what makes C++11 lambdas perfectly optimal for simple use cases:
factorial = std::accumulate(begin, end, 1, [] (int a, int b) { return a*b; });
The lambda compiles down to a functor object similar to
struct lambda_1
{
int operator()(int a, int b) const
{ return a*b; }
};
The compiler sees that the constructor/destructor can be elided and the function body gets inlined. The end result is optimal.1
More (un)observable behaviour
The standard contains a very entertaining example to the contrary, to spark your imagination.
§ 20.7.2.2.3
[ Note: The use count updates caused by the temporary object construction and destruction are not
observable side effects, so the implementation may meet the effects (and the implied guarantees) via
different means, without creating a temporary. In particular, in the example:
shared_ptr<int> p(new int);
shared_ptr<void> q(p);
p = p;
q = p;
both assignments may be no-ops. —end note ]
IOW: Don't underestimate the power of optimizing compilers. This in no way means that language guarantees are to be thrown out of the window!
1 Though there could be faster algorithms to get a factorial, depending on the problem domain :)
I'm sure that if SomeClass::SomeClass() is not implemented inline, the compiler has no way of knowing whether the constructor/destructor has side effects, and it will always call the constructor/destructor.
If the compiler is optimizing away a visible effect of the constructor/destructor call, it is buggy. If it has no visible effect, then you shouldn't notice it anyway.
However let's assume that somehow your constructor or destructor does have a visible effect (so construction and subsequent destruction of that object isn't effectively a no-op) in such a way that the compiler could legitimately think it wouldn't (not that I can think of such a situation, but then, it might be just a lack of imagination on my side). Then any of the following strategies should work:
Make sure that the compiler cannot see the definition of the constructor and/or destructor. If the compiler doesn't know what the constructor/destructor does, it cannot assume it does not have an effect. Note, however, that this also disables inlining. If your compiler does not do cross-module optimization, just putting the constructor/destructor into a different file should suffice.
Make sure that your constructor/destructor actually does have observable behaviour, e.g. through use of volatile variables (every read or write of a volatile variable is considered observable behaviour in C++).
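A minimal sketch of the second strategy (the class name SomeClass is assumed): every access to a volatile variable counts as observable behaviour, so the constructor and destructor below cannot be elided.
struct SomeClass {
    static volatile int counter;
    SomeClass()  { counter = counter + 1; }  // volatile read and write: observable
    ~SomeClass() { counter = counter - 1; }  // volatile read and write: observable
};
volatile int SomeClass::counter = 0;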
However, let me stress again that it's very unlikely that you have to do anything, unless your compiler is horribly buggy (in which case I'd strongly advise you to change the compiler :-)).
Related
Why won't the compiler automatically deduce that a variable is about to go out of scope, and therefore let it be considered an rvalue-reference?
Take for example this code:
#include <string>
int foo(std::string && bob);
int foo(const std::string & bob);
int main()
{
std::string bob(" ");
return foo(bob);
}
Inspecting the assembly code clearly shows that the const & version of "foo" is called at the end of the function.
Compiler Explorer link here: https://godbolt.org/g/mVi9y6
Edit: To clarify, I'm not looking for suggestions for alternative ways to move the variable. Nor am I trying to understand why the compiler chooses the const& version of foo. Those are things that I understand fine.
I'm interested in knowing of a counter example where the compiler converting the last usage of a variable before it goes out of scope into an rvalue-reference would introduce a serious bug into the resulting code. I'm unable to think of code that breaks if a compiler implements this "optimization".
If there's no code that breaks when the compiler automatically makes the last usage of a variable about to go out of scope an rvalue-reference, then why wouldn't compilers implement that as an optimization?
My assumption is that there is some code that would break were compilers to implement that "optimization", and I'd like to know what that code looks like.
The code that I detail above is an example of code that I believe would benefit from an optimization like this.
The order of evaluation of function arguments, as in operator+(foo(bob), foo(bob)), is unspecified. As such, code such as
return foo(bob) + foo(std::move(bob));
is dangerous, because the compiler that you're using may evaluate the right hand side of the + operator first. That would result in the string bob potentially being moved from, leaving it in a valid but unspecified state. Subsequently, foo(bob) would be called with the resulting, modified string.
On another implementation, the non-move version might be evaluated first, and the code would behave the way a non-expert would expect.
If we make the assumption that some future version of the C++ standard implements an optimization that allows the compiler to treat the last usage of a variable as an rvalue reference, then
return foo(bob) + foo(bob);
would work with no surprises (assuming appropriate implementations of foo, anyway).
Such a compiler, no matter what order of evaluation it uses for function arguments, would always evaluate the second (and thus last) usage of bob in this context as an rvalue-reference, whether that was the left hand side, or right hand side of the operator+.
Here's a piece of perfectly valid existing code that would be broken by your change:
// launch a thread that does the calculation, moving v to the thread, and
// returns a future for the result
std::future<Foo> run_some_async_calculation_on_vector(std::pmr::vector<int> v);
std::future<Foo> run_some_async_calculation() {
char buffer[2000];
std::pmr::monotonic_buffer_resource rsrc(buffer, 2000);
std::pmr::vector<int> vec(&rsrc);
// fill vec
return run_some_async_calculation_on_vector(vec);
}
Move constructing a container always propagates its allocator, but copy constructing one doesn't have to, and polymorphic_allocator is an allocator that doesn't propagate on container copy construction. Instead, it always reverts to the default memory resource.
This code is safe with copying because run_some_async_calculation_on_vector receives a copy allocated from the default memory resource (which hopefully persists throughout the thread's lifetime), but is completely broken by a move, because then it would have kept rsrc as the memory resource, which will disappear once run_some_async_calculation returns.
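A minimal sketch of that difference (assumed names, C++17 <memory_resource>): copying falls back to the default memory resource, while moving keeps the original one.
#include <memory_resource>
#include <vector>

void demo() {
    char buffer[256];
    std::pmr::monotonic_buffer_resource rsrc(buffer, sizeof buffer);
    std::pmr::vector<int> src({1, 2, 3}, &rsrc);

    std::pmr::vector<int> copied(src);            // copy: allocator reverts to
                                                  // std::pmr::get_default_resource()
    std::pmr::vector<int> moved(std::move(src));  // move: still uses &rsrc
}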
The answer to your question is because the standard says it's not allowed to. The compiler can only do that optimization under the as if rule. String has a large constructor and so the compiler isn't going to do the verification it would need to.
To build on this point a bit: all that it takes to write code that "breaks" under this optimization is to have the two different versions of foo print different things. That's it. The compiler produces a program that prints something different than the standard says that it should. That's a compiler bug. Note that RVO does not fall under this category because it is specifically addressed by the standard.
It might make more sense to ask why the standard doesn't say so, e.g. why not extend the rule governing returning at the end of a function, where the returned local is implicitly treated as an rvalue. The answer is most likely that it rapidly becomes complicated to define correct behavior. What do you do if the last line were return foo(bob) + foo(bob)? And so on.
Because the fact it will go out of scope does not make it not-an-lvalue while it is in scope. So the - very reasonable - assumption is that the programmer wants the second version of foo() for it. And the standard mandates this behavior AFAIK.
So just write:
int main()
{
std::string bob(" ");
return foo(std::move(bob));
}
... however, it's possible that the compiler will be able to optimize the code further if it can inline foo(), to get about the same effect as your rvalue-reference version. Maybe.
Why won't the compiler automatically deduce that a variable is about to go out of scope, and can therefore be considered an rvalue-reference?
At the time the function gets called, the variable is still in scope. If the compiler changes the logic of which function is a better fit based on what happens after the function call, it will be violating the standard.
In code like this:
void foo() {
SomeObject obj;
}
one might argue that obj is "unused" and therefore can be optimized away, just like an unused local int might be. That seems like an error to me though, because unlike with an int, there could be important side effects of the SomeObject constructor. So, I am wondering, does the language explicitly require that such local variables not be optimized away? Or does a programmer have to take precautions to prevent such optimization?
If the compiler has the definition of the SomeObject::SomeObject() constructor and the SomeObject destructor available (i.e. if they're defined inline) and can see there are no side effects, then yes, this can be optimised out (provided you don't do anything else with obj that requires it to be fully constructed.)
Otherwise, if the constructor is defined in another translation unit, then the compiler can't know that there are no side effects, so the call will be made (and the destructor too, if that's not inline).
In general, the compiler is at liberty to perform any optimisation that doesn't alter the semantics of the program. In this case, removing an unused local variable whose constructor and destructor do not touch any other code won't alter the meaning of your program, so it's perfectly safe to do.
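A minimal sketch of both situations (assumed type names): with the definitions visible, the first object can be removed entirely, while the second has observable behaviour and cannot be.
#include <cstdio>

struct Trivial { Trivial() {} ~Trivial() {} };
struct Logging {
    Logging()  { std::puts("ctor"); }  // observable I/O
    ~Logging() { std::puts("dtor"); }  // observable I/O
};

void foo() {
    Trivial t;  // may be optimised away completely
    Logging l;  // constructor/destructor calls must remain
}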
First, let's correct the example:
void foo() {
SomeObject obj; // not obj()
}
Second, the 'as-if' rule applies to optimizers. Thus, the compiler might optimize out the entire object; however, all side effects of the constructor(s) and destructor(s), including those of base classes, must still show up. This means it's possible that you end up not using additional memory (as long as you don't take the address of obj), but your constructor(s) and destructor(s) will still run.
Yes. Modern compilers are pretty good at removing dead code (assuming you build with optimizations enabled). That includes unused objects, if the constructor and destructor do not have side effects and the compiler can see that (i.e. they are not hidden away in a library).
Note: No multithreading at all here. Just optimized single-threaded code.
A function call introduces a sequence point. (Apparently.)
Does it follow that a compiler (if the optimizer inlines the function) is not allowed to move/intermingle instructions before or after the call with the function's own instructions? (As long as it can "prove" there are no observable effects, obviously.)
Explanatory background:
Now, there is a nice article wrt. a benchmarking class for C++, where the author stated:
The code we time won’t be rearranged by the optimizer and will always
lie between those start / end calls to now(), so we can guarantee our
timing will be valid.
to which I asked how he can be sure, and nick replied:
You can check the comment in this answer
https://codereview.stackexchange.com/a/48884. I quote : “I would be
careful about timing things that are not functions because of
optimizations that the compiler is allowed to do. I am not sure about
the sequencing requirements and the observable behavior understanding
of such a program. With a function call the compiler is not allowed to
move statements across the call point (they are sequenced before or
after the call).”
What we do is basically abstract the callable (function, lambda, block
of code surrounded by lambda) and have a single call
callable(factor) inside the measure structure that acts as a
barrier (not the barrier in multithreading, I believe I convey the
message).
I am quite unsure about this, especially the quote:
With a function call the compiler is not allowed to
move statements across the call point (they are sequenced before or
after the call).
Now, I was always under the impression that when an optimizer inlines some function (which may very well be the case in a (simple) benchmark scenario), it is free to rearrange whatever it likes as long as it does not affect observable behavior.
That is, as far as the language / the optimizer are concerned, these two snippets are exactly the same:
void f() {
// do stuff / Multiple statements
}
auto start = ...;
f();
auto stop = ...;
vs.
auto start = ...;
// do stuff / Multiple statements
auto stop = ...;
Now, I was always under the impression that when an optimizer inlines
some function (which may very well be the case in a (simple) benchmark
scenario), it is free to rearrange whatever it likes as long as it
does not affect observable behavior.
It absolutely is. The optimizer doesn't even need to inline it for this to occur in theory.
However, timing functions are observable behaviour- specifically, they are I/O on the part of the system. The optimizer cannot know that that I/O will produce the same outcome (it obviously won't) if performed in a different order to other I/O calls, which can include non-obvious things like even memory allocation calls that can invoke syscalls to get their memory.
What this basically means is that by and large, for most function calls, the optimizer can't do a great deal of re-arranging because there's potentially a vast quantity of state involved that it can't reason about.
Furthermore, the optimizer can't really know that re-arranging your function calls will actually make the code run faster, and it will make debugging it harder, so they don't have a great deal of incentive to go screwing around with the program's stated order.
Basically, in theory the optimizer can do this, but in reality it won't because doing so would be a massive undertaking for not a lot of benefit.
You'll only encounter conditions like this if your benchmark is fairly trivial or consists virtually entirely of primitive operations like integer addition- in which case you'll want to check the assembly anyway.
Your concern is perfectly valid, the optimizer is allowed to move anything past a function call if it can prove that this does not change observable behavior (other than runtime, that is).
The point of using a function to stop the optimizer from doing things is to not tell the optimizer about the function. That is, the function must not be inlined, and it must not be defined in the same compilation unit. Since optimizers are generally a compiler feature, moving the function definition to a different compilation unit deprives the optimizer of the information necessary to prove anything about the function, and consequently stops it from moving anything across the function call.
Beware that this assumes that there is no linker doing global analysis for optimization. If there is, it can still screw you.
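A minimal sketch of that setup (file and function names are assumed): only a declaration of the benchmarked function is visible in the timing translation unit.
// work.h
void do_work();  // defined in work.cpp; the optimizer compiling main.cpp
                 // cannot prove anything about it

// main.cpp
#include <chrono>
#include "work.h"

void benchmark() {
    auto start = std::chrono::steady_clock::now();
    do_work();   // opaque call: the compiler cannot prove that moving code
                 // across it preserves observable behaviour
    auto stop = std::chrono::steady_clock::now();
    // report (stop - start)
}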
What the comment you quoted has not considered is that sequence points are not primarily about order of execution (although they do constrain it, they don't act as full barriers), but rather about values of expressions.
C++11 actually gets rid of the "sequence point" terminology completely, and instead discussed ordering of "value computation" and "side effects".
To illustrate, the following code exhibits undefined behavior because it doesn't respect ordering:
int a = 5;
int x = a++ + a;
This version is well-defined:
int a = 5;
a++;
int x = a + a;
What the sequence point / ordering of side effects and value computations guarantees us is that the a used in x = a + a is 6, not 5. So the compiler cannot rewrite it to:
int a = 5;
int x = a + a;
a++;
However, it's perfectly legal to rewrite it as:
int a = 5;
int x = (a+1) + (a+1);
a++;
The order of execution between assigning x and assigning a isn't constrained, because neither of them is volatile or atomic<T> and they aren't externally visible side effects.
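By contrast, here is a minimal sketch (assumed variable names) where the ordering is constrained: volatile accesses are observable behaviour and must occur in the order prescribed by the abstract machine.
volatile int x_out;
volatile int a_out;

void compute() {
    int a = 5;
    a++;            // side effect sequenced before the next statement
    x_out = a + a;  // volatile write: observable
    a_out = a;      // volatile write: observable; cannot be swapped with the one above
}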
The standard definitely leaves room for the optimizer to sequence operations across the boundary of a function:
1.9/15 Every evaluation in the calling function (including other function calls) that is not otherwise specifically sequenced before or
after the execution of the body of the called function is
indeterminately sequenced with respect to the execution of the called
function.
as long as the as-if rule is respected:
1.9/5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible
executions of the corresponding instance of the abstract machine with
the same program and the same input.
The practice of leaving the optimizer in the blind as suggested by cmaster is in general very effective. By the way, the global optimization issue at linking can also be circumvented using dynamic linking of the benchmarked function.
There is, however, another hard sequencing constraint that can be used to achieve the same purpose, even within the same compilation unit:
1.9/15 When calling a function (whether or not the function is inline), every value computation and side effect associated with any
argument expression, or with the postfix expression designating the
called function, is sequenced before execution of every expression
or statement in the body of the called function.
So you may safely use an expression like:
my_timer_off(stop, f( my_timer_on(start) ) );
This "functional" writing ensures that:
my_timer_on() is evaluated before any statement of f() is executed,
f() is called before the body of my_timer_off() is executed
thus ensuring the sequence timer-on / f / timer-off (the my_timer_xx would take the start/stop by value).
Of course, this assumes that the signature of the benchmarked function f() can be changed to allow the expression above.
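One possible implementation of that pattern (a sketch under assumed signatures, not taken from the original answer): the data dependencies force the on / f / off sequencing.
#include <chrono>

using clk = std::chrono::steady_clock;

clk::time_point my_timer_on(clk::time_point& start) {
    start = clk::now();
    return start;                 // f's argument depends on this call
}

int my_timer_off(clk::time_point& stop, int result) {
    stop = clk::now();            // runs after f's result has been computed
    return result;
}

int f(clk::time_point);           // the benchmarked function (assumed signature)

void benchmark() {
    clk::time_point start, stop;
    my_timer_off(stop, f(my_timer_on(start)));
    // elapsed time is stop - start
}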
I want to initialize some static data on the main thread.
int32_t GetFoo(ptime t)
{
static HugeBarData data;
return data.Baz(t);
}
int main()
{
GetFoo(ptime()); // Avoid data race on static field.
// But will it be optimized away as unnecessary?
// Spawn threads. Call 'GetFoo' on the threads.
}
If the complier may decide to remove it, how can I force it to stay there?
The only calls with side effects that a C++ compiler can optimize away are unnecessary constructor calls, particularly copy constructors.
Cf Under what conditions does C++ optimize out constructor calls?
Compilers must optimize according to the "as-if" rule. That is, after any optimization, the program must still behave (in the logical sense) as if the code were not optimized.
If there are side-effects to a function, any optimization must preserve the side effects. However, if the compiler can determine that the result of the side-effects don't affect the rest of the program, it can optimize away even the side-effects. Compilers are very conservative about this area. If your compiler optimizes away side-effects of the HugeBarData constructor or Baz call, which are required elsewhere in the program, this is a bug in the compiler.
There are some exceptions where the compiler can make optimizations which alter the behaviour of the program from the non-optimized case, usually involving copies. I don't think any of those exceptions apply here.
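If you want to be extra safe, a minimal sketch (ptime is assumed to be default-constructible, and the sink variable is hypothetical): making the result observable removes any doubt that the call could be discarded.
volatile int32_t g_warmup_sink;

int main() {
    g_warmup_sink = GetFoo(ptime());  // forces GetFoo to run, constructing the
                                      // static HugeBarData on the main thread
    // Spawn threads. Call 'GetFoo' on the threads.
}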
One of the goals of C++ is to allow user-defined types to behave as nicely as built-in types. One place where this seems to fail is in compiler optimization. If we assume that a const nonvolatile member function is the moral equivalent of a read (for a user-defined type), then why not allow a compiler to eliminate repeated calls to such a function? For example
class C {
...
public:
int get() const;
};
int main() {
C c;
int x{c.get()};
x = c.get(); // why not allow the compiler to eliminate this call
}
The argument for allowing this is the same as the argument for copy elision: while it changes the operational semantics, it should work for code that follows good semantic practice, and provides substantial improvement in efficiency/modularity. (In this example it is obviously silly, but it becomes quite valuable in, say, eliminating redundant iterative safety checks when functions are inlined.)
Of course it wouldn't make sense to allow this for functions that return non-const references, only for functions that return values or const references.
My question is whether there is a fundamental technical argument against this that doesn't equally apply to copy elision.
Note: just to be clear, I am not suggesting the compiler look inside of the definition of get(). I'm saying that the declaration of get() by itself should allow the compiler to elide the extra call. I'm not claiming that it preserves the as-if rule; I'm claiming that, just as in copy elision, this is a case where we want to allow the compiler to violate the as-if rule. If you are writing code where you want a side effect to be semantically visible, and don't want redundant calls to be eliminated, you shouldn't declare your method as const.
New answer based on clarification on the question
C::get would need a stronger annotation than const. As it stands today, const is a promise that the method doesn't (conceptually) modify the object. It makes no guarantees about interaction with global state or side effects.
Thus if the new version of the C++ standard carved out another exception to the as-if rule, as it did for copy elision, based solely on the fact that a method is marked const, it would break a lot of existing code. The standards committee seems to try pretty hard not to break existing code.
(Copy elision probably broke some code, too, but I think it's actually a pretty narrow exception compared to what you're proposing.)
You might argue that we should re-specify what const means on a method declaration, giving it this stronger meaning. That would mean you could no longer have a C::print method that's const, so it seems this approach would also break a lot of existing code.
So we would have to invent a new annotation, say pure_function. To get that into the standard, you'd have to propose it and probably convince at least one compiler maker to implement it as an extension to illustrate that it's feasible and useful.
I suspect that the incremental utility is pretty low. If your C::get were trivial (no interaction with global state and no observable side effects), then you may as well define it in the class definition, thus making it available for inlining. I believe inlining would allow the compiler to generate code as optimal as a pure_function tag on the declaration (and maybe even more so), so I wouldn't expect the incremental benefit of a pure_function tag to be significant enough to convince the standards committee, compiler makers, and language users to adopt it.
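A minimal sketch of that alternative (the member name is assumed): with the trivial getter defined in the class definition, every call site can inline it, and the redundant call disappears without any new annotation.
class C {
    int value_ = 0;
public:
    int get() const { return value_; }  // visible at every call site, so inlineable
};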
Original answer
C::get could depend on global state and it might have observable side effects, either of which would make it a mistake to elide the second call. It would violate the as-if rule.
The question is whether the compiler knows this at the time it's optimizing at the call site. As your example is written, only the declaration of C::get is in scope. The definition is elsewhere, presumably in another compilation unit. Thus the compiler must assume the worst when it compiles and optimizes the calling code.
Now if the definition of C::get were both trivial and in view, then I suppose it's theoretically possible for the compiler to realize there are no side effects or non-deterministic behavior, but I doubt most optimizers get that aggressive. Unless C::get were inlined, I imagine there would be an exponential growth in the paths to analyze.
And if you want to skip the entire assignment statement (as opposed to just the second call of C::get), then the compiler would also have to examine the assignment operator for side effects and reliance on global state in order to ensure the optimization wouldn't violate the as-if rule.
First of all, the const-ness of methods (or of references) is totally irrelevant to the optimizer, because constness can be cast away legally (using const_cast) and because, in the case of references, there could be aliasing. Const correctness was designed to help programmers, not the optimizer (whether it really helps or not is a separate, unrelated discussion).
Moreover, to elide a call to a function, the optimizer would also need to be sure that the result doesn't depend on and doesn't influence global state.
Compilers sometimes have a way to declare that a function is "pure", i.e. that the result depends only on the arguments and doesn't influence global state (like sin(x)), but how you declare them is implementation-dependent because the C++ standard doesn't cover this semantic concept.
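For example, GCC and Clang provide such annotations as extensions (a sketch; this is implementation-specific, not standard C++):
int lookup(const char* key) __attribute__((pure));  // result depends only on the
                                                    // arguments and global memory,
                                                    // with no side effects
int square(int x) __attribute__((const));           // result depends only on the
                                                    // argument values

// With these hints the optimizer may fold repeated calls:
// int y = square(n) + square(n);  // the second call can reuse the first result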
Note also that the word const in "const reference" describes a property of the reference, not of the referenced object. Nothing is known about the const-ness of the object you're given a const reference to, and the object can indeed change, or even cease to exist, while you still hold the reference. A const reference simply means that you cannot change the object through that reference, not that the object is constant or that it will remain constant for a while.
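A two-line illustration of that point:
int i = 0;
const int& r = i;  // r is a read-only view of i; i itself is not const
i = 42;            // perfectly legal; r now reads 42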
For a description of why a const reference and a value are two very different semantic concepts and of the subtle bugs you can meet if you confuse them see this more detailed answer.
The first answer to your question from Adrian McCarthy was just about as clear as possible:
The const-ness of a member function is a promise that no modification of externally visible state will be made (barring mutable variables in an object instance, for example).
You would expect a const member function which just reported the internal state of an object to always return the same answer. However, it could also interact with the ever changing real world and return a different answer every time.
What if it is a function to return the current time?
Let us put this into an example.
This is a class which converts a timestamp (double) into a human readable string.
class time_str {
// time and its format
double time;
string time_format;
public:
void set_format(const string& time_format);
void set_time(double time);
string get_time() const;
string get_current_time() const;
};
And it is used (clumsily) like so:
time_str a;
a.set_format("hh:mm:ss");
a.set_time(89.432);
cout << a.get_time() << endl;
So far so good. Each invocation of a.get_time() will return the same result.
However, at some point, we decide to introduce a convenience function which returns the current time in the same format:
cout << a.get_time() << " is different from " << a.get_current_time() << endl;
It is const because it doesn't change the state of the object in any way (though it accesses the time format). However, obviously each call to get_current_time() must return a different answer.
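For completeness, a sketch of how get_current_time() might be implemented (assumed; format_timestamp is a hypothetical helper shared with get_time()): it is const, yet it consults the system clock, so repeated calls can return different strings.
#include <ctime>

string time_str::get_current_time() const {
    double now = static_cast<double>(std::time(nullptr));  // current time, seconds since epoch
    return format_timestamp(now, time_format);             // hypothetical formatting helper
}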