In the question Will a static variable always use up memory? it is stated that compilers are allowed to optimize away a static variable if its address is never taken, e.g. like the following:
void f() {
    static int i = 3;
    printf( "%d", i );
}
If there exists a function which takes its argument by reference, is the compiler still allowed to optimize away the variable, e.g. as in
void ref( int & i ) {
    printf( "%d", i );
}

void f() {
    static int i = 3;
    ref( i );
}
Is the situation different for the "perfect forwarding" case? Here the function body is empty on purpose:
template< typename T >
void fwd( T && i ) {
}

void f() {
    static int i = 3;
    fwd( i );
}
Furthermore, would the compiler be allowed to optimize the call in the following case? (The function body is empty on purpose again.)
void ptr( int * i ) {
}

void f() {
    static int i = 3;
    ptr( &i );
}
My questions arise from the fact that references are not pointers according to the standard - but they are usually implemented as pointers.
Apart from "is the compiler allowed to?", I am actually more interested in whether compilers do this kind of optimization.
that compilers are allowed to optimize away a static variable if the address is never taken
You seem to have concentrated on the wrong part of the answer. The answer states:
the compiler can do anything it wants to your code so long as the observable behavior is the same
The end. You can take the address or not take it, calculate the meaning of life and calculate how to cure cancer - the only thing that matters is the observable effect. As long as you don't actually cure cancer (or output the results of the calculations...), all those calculations are just no-ops.
If there exists a function which takes its argument by reference, is the compiler still allowed to optimize away the variable
Yes. The code is just putc('3').
Is the situation different for the "perfect forwarding" case
No - the variable can still be optimized away. Since fwd has an empty body here, f() in fact has no observable effect at all and can be reduced to nothing.
would the compiler be allowed to optimize the call in the following case
Yes. Like the forwarding example, this code has no observable effect; the call to f() can just be removed.
in whether compilers do this kind of optimization?
Copy your code to https://godbolt.org/ and inspect the assembly. Even with no experience in reading assembly, you will see differences between different code snippets and compilers.
Choose x86 gcc (trunk) and remember to enable optimizations (-O). Copy the code with static, then remove static - did the generated code change? Repeat for all code snippets.
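For contrast, here is a minimal sketch (the function name opaque is hypothetical) of the situation where the variable usually cannot be removed: its address escapes to code the compiler cannot see, so the object normally has to exist in memory.

// Hypothetical sketch: 'opaque' lives in another translation unit,
// so the compiler cannot see what it does with the pointer.
void opaque(int *p);

void f_escaped() {
    static int i = 3;  // the address escapes below, so the object
    opaque(&i);        // normally has to be kept in memory
}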
Compilers are allowed to optimize out variables under the "as-if" rule, meaning that the compiler is allowed to do any optimization that doesn't alter the observable behaviour of the program. Whether the optimization actually occurs depends on how good the compiler's optimizer is, what optimization level you request, and whether the optimization belongs to a class of optimizations that actually improve performance (humans are not very good at predicting this).
In all of the examples you gave, the as-if rule gives the compiler latitude to eliminate the static variable.
In example 1, the definition of f is equivalent to void f() { printf("%d", 3); }. Since this has the exact same observable behaviour as the f you wrote, the compiler is allowed to replace one by the other, optimizing out the variable.
In example 2, since fwd does nothing, the definition of f is equivalent to void f() {}. Again, the as-if rule allows the compiler to replace the f you wrote with this empty function.
Example 3 is very similar to example 2 in terms of the implications of the as-if rule.
If you want to see whether a compiler will actually perform these optimizations, Godbolt is very useful. For example, if you look here, you'll see that at -O2, both GCC and Clang will perform the optimization described for example 1. They probably do this by first inlining ref into f.
Related
If I define a static instance of a class, is there an optimization in compilers (particularly g++/clang) to omit the base register (for thiscalls) when data members are accessed directly or indirectly (I mean the [base + index * scale + displacement] addressing formula) and just use a single displacement constant for all of them? All member functions may become static (in the case of a sole instance of the class this is reasonable).
I can't check this, because on godbolt.org the compiler aggressively optimizes the following code to xor eax, eax; ret:
struct A
{
    int i;
    void f()
    {
        ++i;
    }
};

static A a;

int main(int argc, char * argv[])
{
    a.i = argc;
}
Short answer: Maybe.
Long answer: A modern compiler certainly has the ability to optimize away fetching the this pointer, and using complex addressing modes is definitely within the reach of all modern compilers that I'm aware of (including, but not limited to: gcc, clang and MS Visual C).
Whether a particular compiler chooses to do so on a specific construct is down to how well the compiler "understands" the code presented to it. As you've just experienced, the compiler removes all of your code, because it doesn't actually "do" anything. You're just assigning a member of a global struct, which is never again used, so the compiler can reason that "well, you never use it again, so I won't do that". Remove static, and it's plausible that the compiler can't know that it's not used elsewhere. Or print the value of a.i, or pass it to an external function that can't be inlined, etc, etc.
In your example, I would really just expect the compiler to store the value of argc into the address of a.i, and that can probably be done in two instructions, move argc from stack into a register, and move that register into the memory calculated for a.i - which is probably a constant address according to the compiler. So no fancy addressing modes needed in this case.
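To actually see the member access survive on godbolt, one rough sketch (not the original code) is to make a.i observable, for example by printing it, so the compiler cannot discard the stores:

#include <cstdio>

struct A
{
    int i;
    void f() { ++i; }
};

static A a;

int main(int argc, char * argv[])
{
    a.i = argc;
    a.f();
    std::printf("%d\n", a.i); // the value is now observable, so the
                              // stores to a.i must actually happen
}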
GCC can suggest functions for attribute pure and attribute const with the flags -Wsuggest-attribute=pure and -Wsuggest-attribute=const.
The GCC documentation says:
Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.
But what can happen if you attach __attribute__((__pure__)) to a function that doesn't match the above description, and does have side effects? Is it simply the possibility that the function will be called fewer times than you would want it to be, or is it possible to create undefined behaviour or other kinds of serious problems?
Similarly for __attribute__((__const__)) which is stricter again - the documentation states:
Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.
But what can actually happen if you attach __attribute__((__const__)) to a function that does access global memory?
I would prefer technical answers with explanations of actual possible scenarios within the scope of GCC / G++, rather than the usual "nasal demons" handwaving that appears whenever undefined behaviour gets mentioned.
But what can happen if you attach __attribute__((__pure__))
to a function that doesn't match the above description,
and does have side effects?
Exactly. Here's a short example:
extern __attribute__((pure)) int mypure(const char *p);

int call_pure() {
    int x = mypure("Hello");
    int y = mypure("Hello");
    return x + y;
}
My version of GCC (4.8.4) is clever enough to remove the second call to mypure (the result is computed as 2*mypure(...)). Now imagine if mypure were printf - the side effect of printing the string "Hello" would be lost.
Note that if I replace call_pure with
char s[1];

int call_pure() {
    int x = mypure("Hello");
    s[0] = 1;
    int y = mypure("Hello");
    return x + y;
}
both calls will be emitted (because the assignment to s[0] may change the return value of mypure).
Is it simply the possibility that the function will be called fewer times
than you would want it to be, or is it possible to create
undefined behaviour or other kinds of serious problems?
Well, it can cause UB indirectly. E.g. here
extern __attribute__((pure)) int get_index();

char a[1];
int i;

void foo() {
    i = get_index(); // Returns -1
    a[get_index()];  // Returns 0
}
The compiler will most likely drop the second call to get_index() and reuse the first returned value, -1, which will result in a buffer overflow (well, technically an underflow).
But what can actually happen if you attach __attribute__((__const__))
to a function that does access global memory?
Let's again take the above example with
int call_pure() {
    int x = mypure("Hello");
    s[0] = 1;
    int y = mypure("Hello");
    return x + y;
}
If mypure were annotated with __attribute__((const)), the compiler would again drop the second call and optimize the return to 2*mypure(...). If mypure actually reads s, this will result in a wrong result being produced.
EDIT
I know you asked to avoid hand-waving, but here's some generic explanation. By default a function call blocks a lot of optimizations inside the compiler, as it has to be treated as a black box which may have arbitrary side effects (modify any global variable, etc.). Annotating a function with const or pure instead allows the compiler to treat it more like an expression, which allows for more aggressive optimization.
Examples are really too numerous to give. The one I gave above is common subexpression elimination, but we could just as easily demonstrate the benefits for loop invariants, dead code elimination, alias analysis, etc.
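As one concrete illustration of the loop-invariant case (a sketch with a hypothetical, strlen-like function length), declaring the function pure tells the compiler its result depends only on its argument and globals, so the call can be hoisted out of the loop instead of being repeated every iteration:

extern __attribute__((pure)) unsigned length(const char *s); // hypothetical

unsigned count_upper(const char *s) {
    unsigned n = 0;
    for (unsigned i = 0; i < length(s); ++i)  // length(s) may be evaluated once, before the loop
        if (s[i] >= 'A' && s[i] <= 'Z')
            ++n;
    return n;
}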
Can a compiler do automatic lvalue-to-rvalue conversion if it can prove that the lvalue won't be used again? Here's an example to clarify what I mean:
void Foo(vector<int> values) { ... }

void Bar() {
    vector<int> my_values {1, 2, 3};
    Foo(my_values); // may the compiler pretend I used std::move here?
}
If a std::move is added to the commented line, then the vector can be moved into Foo's parameter, rather than copied. However, as written, I didn't use std::move.
It's pretty easy to statically prove that my_values won't be used after the commented line. So is the compiler allowed to move the vector, or is it required to copy it?
The compiler is required to behave as-if the copy occurred from the vector to the call of Foo.
If the compiler can prove that there is a valid abstract machine behavior with no observable side effects (within the abstract machine behavior, not on a real computer!) that involves moving the std::vector into Foo, it can do this.
In your above case, this (moving has no abstract machine visible side effects) is true; the compiler may not be able to prove it, however.
The possibly observable behavior when copying a std::vector<T> is:
Invoking copy constructors on the elements. Doing so with int cannot be observed
Invoking the default std::allocator<> at different times. This invokes ::new and ::delete (maybe[1]). In any case, ::new and ::delete have not been replaced in the above program, so you cannot observe this under the standard.
Calling the destructor of T more times on different objects. Not observable with int.
The vector being non-empty after the call to Foo. Nobody examines it, so it being empty is as-if it was not.
References, pointers or iterators to the elements of the exterior vector being different from those inside. No references, pointers or iterators to the elements of the vector are taken outside Foo.
While you may say "but what if the system is out of memory, and the vector is large, isn't that observable?":
The abstract machine does not have an "out of memory" condition, it simply has allocation sometimes failing (throwing std::bad_alloc) for non-constrained reasons. It not failing is a valid behavior of the abstract machine, and not failing by not allocating (actual) memory (on the actual computer) is also valid, so long as the non-existence of the memory has no observable side effects.
A slightly more toy case:
int main() {
    int* x = new int[std::size_t(-1)];
    delete[] x;
}
while this program clearly allocates way too much memory, the compiler is free to not allocate anything.
We can go further. Even:
#include <iostream>

int main() {
    int* x = new int[std::size_t(-1)];
    x[std::size_t(-2)] = 2;
    std::cout << x[std::size_t(-2)] << '\n';
    delete[] x;
}
can be turned into std::cout << 2 << '\n';. That large buffer must exist abstractly, but as long as your "real" program behaves as-if the abstract machine would, it doesn't actually have to allocate it.
Unfortunately, doing so at any reasonable scale is difficult. There are lots and lots of ways information can leak from a C++ program. So relying on such optimizations (even if they happen) is not going to end well.
[1] There was some stuff about coalescing calls to new that might confuse the issue; I am uncertain if it would be legal to skip calls even if there was a replaced ::new.
An important fact is that there are situations that the compiler is not required to behave as-if there was a copy, even if std::move was not called.
When you return a local variable from a function in a statement of the form return X; where X is just the identifier, and that local variable has automatic storage duration (is on the stack), the operation is implicitly a move, and the compiler (if it can) may elide the existence of the return value and the local variable into one object (and even omit the move).
The same is true when you construct an object from a temporary -- the operation is implicitly a move (as it is binding to an rvalue) and it can elide away the move completely.
In both these cases, the compiler is required to treat it as a move (not a copy), and it can elide the move.
std::vector<int> foo() {
    std::vector<int> x = {1,2,3,4};
    return x;
}
Note that x has no std::move, yet it is moved into the return value, and that operation can be elided (x and the return value can be turned into one object).
This:
std::vector<int> foo() {
    std::vector<int> x = {1,2,3,4};
    return std::move(x);
}
blocks elision, as does this:
std::vector<int> foo(std::vector<int> x) {
    return x;
}
and we can even block the move:
std::vector<int> foo() {
    std::vector<int> x = {1,2,3,4};
    return (std::vector<int> const&)x;
}
or even:
std::vector<int> foo() {
    std::vector<int> x = {1,2,3,4};
    return 0,x;
}
as the rules for implicit move are intentionally fragile. (0,x is a use of the much maligned , operator).
Now, relying on implicit move not occurring in cases like this last comma-based one is not advised: the standard committee has already changed one implicit-copy case into an implicit-move case since implicit move was added to the language, because they deemed it harmless (the case where the function returns a type A with an A(B&&) constructor, and the return statement is return b; where b is of type B; at C++11's release that did a copy, now it does a move). Further expansion of implicit move cannot be ruled out: casting explicitly to a const& is probably the most reliable way to prevent it, now and in the future.
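A sketch of that committee change, with hypothetical types A and B, for illustration:

struct B {};
struct A {
    A(const B&) {} // construct A from an lvalue B
    A(B&&) {}      // construct A from an rvalue B
};

A make() {
    B b;
    return b; // C++11 as published: a copy, via A(const B&);
              // after the defect fix: an implicit move, via A(B&&)
}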
In this case, the compiler could move out of my_values. This is because that causes no difference in observable behaviour.
Quoting the C++ standard's definition of observable behaviour:
The least requirements on a conforming implementation are:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
Interpreting this slightly: "files" here includes the standard output stream, and for calls of functions that are not defined by the C++ Standard (e.g. operating system calls, or calls to third party libraries), it must be assumed that those functions might write to a file, so a corollary of this is that non-standard function calls must also be considered observable behaviour.
However your code (as you have shown it) has no volatile variables and no calls to non-standard functions. So the two versions (move or not-move) must have identical observable behaviour and therefore the compiler could do either (or even optimize the function out entirely, etc.)
In practice, of course, it's generally not so easy for a compiler to prove that no non-standard function calls occur, so many optimization opportunities like this are missed. For example, in this case the compiler may not yet know whether or not the default ::operator new has been replaced with a function that generates output.
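For example, here is a rough sketch (not from the original question) of a replaced global operator new that produces output; with such a replacement in the program, allocations become observable behaviour, so the copy of the vector can no longer be silently dropped:

#include <cstdio>
#include <cstdlib>
#include <new>
#include <vector>

void* operator new(std::size_t n) {
    std::printf("allocating %zu bytes\n", n); // output makes every allocation observable
    if (void* p = std::malloc(n))
        return p;
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept {
    std::free(p);
}

void Foo(std::vector<int> values) {}

void Bar() {
    std::vector<int> my_values {1, 2, 3};
    Foo(my_values); // copying the vector allocates, and that allocation prints
}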
This question is about a different aspect (and is also limited to gcc). My question is meant only for unnamed objects. Return Value Optimization is allowed to change the observable behavior of the resulting program. This seems to be mentioned in the standard as well.
However, this "allowed to" wording is confusing. Does it mean that RVO is guaranteed to happen on every compiler? Due to RVO, the code below changes its observable behavior:
#include <iostream>

int global = 0;

struct A {
    A(int *p) {}
    A(const A &obj) { ++global; }
};

A foo () { return A(0); } // <--- RVO happens

int main () {
    A obj = foo();
    std::cout << "global = " << global << "\n"; // prints 0 instead of 2
}
Is this program supposed to print global = 0 for all implementations, irrespective of compiler optimizations and the size of the method foo?
According to the standard, the program can print 0, 1 or 2. The specific paragraph in C++11 is 12.8p31 that starts with:
When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects.
Note that copy elision is not an optimization that falls under the as-if rule (which requires the behavior of the program to be consistent with the behavior of the same program as if no optimization had taken place). The standard explicitly allows the implementation to generate different observable behaviors, and it is up to you, the programmer, to have your program not depend on that (or to accept all three possible outcomes).
Note 2: 1 is not mentioned in any of the answers, but it is a possible outcome. There are two potential copies taking place, one from the temporary in the function to the returned object and one from the returned object to the object in main; the compiler can elide none, one, or both of these copies, generating all three possible outputs.
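Annotating the original snippet, these are the two copies being referred to (under C++17's guaranteed copy elision both are mandatory, so a C++17 compiler always prints 0, but under the C++11/14 rules each one may or may not be elided):

A foo () { return A(0); } // copy #1: the temporary A(0) into foo's return value
int main () {
    A obj = foo();        // copy #2: foo's return value into obj
    // eliding neither copy prints 2, eliding exactly one prints 1,
    // eliding both prints 0
}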
It cannot be guaranteed. If you tried to write such a guarantee coherently, you would find it impossible to do so.
For example, consider this code:
std::string f() {
    std::string first("first");
    std::string second("second");
    return FunctionThatIsAlwaysFalse() ? first : second;
}
The function FunctionThatIsAlwaysFalse always returns false, but you can only tell that if you do inter-module optimizations. Should the standard require every single compiler to do inter-module optimization so that it can use RVO in this case? How would that work? Or should it prohibit any compiler from using RVO when inter-module optimizations are needed? How would that work? How can it stop compilers that are smart enough to see that RVO is possible from doing it and those that are not from not doing it?
Should the standard list every optimization compilers are required to support with RVO? And should it prohibit RVO in other cases? Wouldn't that kind of defeat the point of optimizing compilers?
And what about the cases where the compiler believes RVO will reduce performance? Should the compiler be required to do an optimization it believes is bad? For example:
if(FunctionCompilerKnowsHasNoSideEffectsAndThinksMostlyReturnsFalse())
    return F(3); // Here RVO is a pessimization
else
{
    Foo j=F(3);
    return Foo(j);
}
Here, if the compiler is not required to do RVO, it can avoid the if and the function call, since without RVO the code is the same in both halves. Should you force the compiler to do an optimization it thinks makes things worse? Why?
There's really no way to make such a guarantee work.
Pedantically speaking, it's implementation defined. Modern compilers are intelligent enough to do this kind of optimization.
But there is no guarantee that the behavior will be exactly the same across implementations. That's what implementation defined behavior is all about.
"allowed to" in this context means that 0 or 1 or 2 are standard conforming outputs.
EDITED and refined my question after Johannes's valuable answer
bool b = true;
volatile bool vb = true;

void f1() { }
void f2() { b = false; }

void(* volatile pf)() = &f1; // a volatile pointer to function

int main()
{
    // different threads start here, some of which may change pf
    while(b && vb)
    {
        pf();
    }
}
So, let's forget synchronization for a while. The question is whether b has to be declared volatile. I have read the standard and sort of know the formal definition of volatile semantics (I even almost understand it, the word almost being the key). But let's be a bit informal here. If the compiler sees that in the loop there is no way for b to change, then unless b is volatile it can optimize it away and assume the loop is equivalent to while(vb). The question is: in this case pf is itself volatile, so is the compiler allowed to assume that b won't change in the loop even if b is not volatile?
Please refrain from comments and answers which address the style of this piece of code, this is not a real-world example, this is an experimental theoretical question.
Comments and answers which, apart from answering my question, also address the semantics of volatile in greater detail which you think I have misunderstood are very much welcome.
I hope my question is clear. TIA
Editing once more:
what about this?
#include <iostream>

bool b = true;
volatile bool vb = true;

void f1() {}
void f2() { b = false; }

void (*pf) () = &f1;

int main()
{
    // threads here
    while(b && vb)
    {
        int x;
        std::cin >> x;
        if(x == 0)
            pf = &f1;
        else
            pf = &f2;
        pf();
    }
}
Is there a principal difference between the two programs? If yes, what is the difference?
The question is, in this case pf is itself volatile, so is the compiler allowed to assume that b won't change in the loop even if b is not volatile?
It can't, because you say that pf might be changed by other threads, and this indirectly changes b if pf is then called by the while loop. So while it is theoretically not required to read b normally, in practice it must read it to determine whether it should short-circuit (when b becomes false it must not read vb another time).
Answer to the second part
In this case pf is not volatile anymore, so the compiler can get rid of it and see that f1 has an empty body and f2 sets b to false. It could optimize main as follows
int main()
{
    // threads here (which you say can only change "vb")
    while(vb)
    {
        int x;
        std::cin >> x;
        if(x != 0)
            break;
    }
}
Answer to older revision
One condition for the compiler to be allowed to optimize the loop away is that the loop does not access or modify any volatile object (See [stmt.iter]p5 in n3126). You do that here, so it can't optimize the loop away. In C++03 a compiler wasn't allowed to optimize even the non-volatile version of that loop away (but compilers did it anyway).
Note that another condition for being able to optimize it away is that the loop contains no synchronization or atomic operations. In a multithreaded program, such should be present anyway though. So even if you get rid of that volatile, if your program is properly coded I don't think the compiler can optimize it away entirely.
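As a side note, the properly coded modern variant alluded to here would typically use std::atomic rather than volatile for the cross-thread flag; a minimal sketch, with a hypothetical flag name:

#include <atomic>

std::atomic<bool> keep_running{true};

void stop() { keep_running.store(false); } // called from another thread

void spin() {
    while (keep_running.load()) {
        // unlike a plain bool, the atomic load keeps the compiler from
        // assuming the flag never changes, so the loop is not optimized away
    }
}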
The exact requirements on volatile in the current C++ standard in a case like this are, as I understand it, not entirely well-defined by the standard, since the standard doesn't really deal with multi-threading. It's basically a compiler hint. So, instead, I'll address what happens in a typical compiler.
First, suppose the compiler is compiling your functions independently, and then linking them together. In either example, you have a loop in which you're checking a variable, and calling a function pointer. Within the context of that function, the compiler has no idea what the function behind that function pointer will do, and thus it must always re-load b from memory after calling it. Thus, volatile is irrelevant there.
Expanding that to your first actual case, and allowing the compiler to make whole-program optimizations, because pf is volatile the compiler still has no idea what it's going to be pointing at (it can't even assume it's either f1 or f2!), and thus likewise cannot make any assumptions about what will be unmodified across the function-pointer call -- and so volatile on b is still irrelevant.
Your second case is actually simpler -- vb in it is a red herring. If you eliminate that, you can see that even in completely single-threaded semantics, the function-pointer call may modify b. You're not doing anything with undefined behavior, and so the program must operate correctly without volatile -- remember that, if you aren't considering a situation with external thread tweaks, volatile is a no-op. Therefore, without vb in the picture, you cannot possibly need volatile, and it's pretty clear that adding vb changes nothing.
Thus, in summary: you don't need volatile in either case. The difference, insofar as there is one, is that in the first case, if pf were not volatile, a sufficiently advanced compiler could possibly optimize b away, whereas in the second case it cannot do so even without volatile anywhere in the program. In practice, I do not expect any compilers would actually make that optimization.
volatile only hurts you if you think you could have benefited from an optimization that can't be done or if it communicates something that isn't true.
In your case, you said that these variables can be changed by other threads. Reading code, that's my assumption when I see volatile, so from a maintainer's perspective, that's good -- it's giving me extra information (which is true).
I don't know whether the optimizations are worth trying to salvage since you said this isn't the real code, but if they aren't then there aren't any reasons to not use volatile.
Not using volatile when you are supposed to results in incorrect behavior, since the optimizations are changing the meaning of the code.
I worry about coding to the minutiae of the standard and the behavior of your compilers, because things like this can change, and even if they don't, your code changes (which could affect the compiler) -- so, unless you are looking for micro-optimization improvements on this specific code, I'd just leave it volatile.