Question about optimization in C++ - c++

I've read that the C++ standard allows optimizations that can actually interfere with expected functionality. I'm talking about return value optimization, where you might have some logic in the copy constructor, yet the compiler optimizes the call away.
I find this somewhat bad, since someone who doesn't know about it might spend quite some time fixing a bug that results from it.
What I want to know is whether there are any other situations where over-optimization from the compiler can change functionality.
For example, something like:
int x = 1;
x = 1;
x = 1;
x = 1;
might be optimized to a single x=1;
Suppose I have:
class A;
A a = b;
a = b;
a = b;
Could this possibly also be optimized? Probably not the best example, but I hope you know what I mean...

Eliding copy operations is the only case where a compiler is allowed to optimize to the point where side effects visibly change. Do not rely on copy constructors being called; the compiler might optimize those calls away.
For everything else, the "as-if" rule applies: The compiler might optimize as it pleases, as long as the visible side effects are the same as if the compiler had not optimized at all.
("Visible side effects" include, for example, stuff written to the console or the file system, but not runtime and CPU fan speed.)

It might be optimized, yes. But you still have some control over the process. For example, consider this code:
int x = 1;
x = 1;
x = 1;
x = 1;
volatile int y = 1;
y = 1;
y = 1;
y = 1;
Provided that neither x nor y is used below this fragment, VS 2010 generates this code:
int x = 1;
x = 1;
x = 1;
x = 1;
volatile int y = 1;
010B1004 xor eax,eax
010B1006 inc eax
010B1007 mov dword ptr [y],eax
y = 1;
010B100A mov dword ptr [y],eax
y = 1;
010B100D mov dword ptr [y],eax
y = 1;
010B1010 mov dword ptr [y],eax
That is, the optimizer strips all the lines with "x" and keeps all four lines with "y". This is how volatile works, but the point is that you still have control over what the compiler does for you.
Whether it is a class or a primitive type, it all depends on the compiler and on how sophisticated its optimization capabilities are.
Another code fragment for study:
class A
{
private:
int c;
public:
A(int b)
{
*this = b;
}
A& operator = (int b)
{
c = b;
return *this;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
int b = 0;
A a = b;
a = b;
a = b;
return 0;
}
With Visual Studio 2010, optimization strips all of this code away: in a release build with "full optimization", _tmain does nothing and immediately returns zero.

This depends on how class A is implemented, whether the compiler can see the implementation, and whether it is smart enough. For example, if operator=() in class A has side effects, optimizing the calls out would change the program's behavior and is not allowed.

Optimization does not, properly speaking, "remove calls to copy or assignments".
It converts one finite state machine into another finite state machine with the same external behaviour.
Now, if you repeatedly call
a=b; a=b; a=b;
what the compiler does depends on what operator= actually is.
If the compiler finds that a call has no chance of altering the state of the program (where the "state of the program" is everything that outlives a scope and that the scope can access), it will strip the call out.
If this cannot be demonstrated, the call stays in place.
Whatever the compiler does, don't worry too much about it: the compiler cannot (by contract) change the external logic of a program or of any part of it.

I don't know C++ that well, but I am currently reading Compilers: Principles, Techniques, and Tools.
Here is a snippet from its section on code optimization:
The machine-independent code-optimization phase attempts to improve
intermediate code so that better target code will result. Usually
better means faster, but other objectives may be desired, such as
shorter code, or target code that consumes less power. For example, a
straightforward algorithm generates the intermediate code (1.3), using
an instruction for each operator in the tree representation that comes
from the semantic analyzer. A simple intermediate code generation
algorithm followed by code optimization is a reasonable way to
generate good target code. The optimizer can deduce that the
conversion of 60 from integer to floating point can be done once and
for all at compile time, so the inttofloat operation can be eliminated
by replacing the integer 60 by the floating-point number 60.0.
Moreover, t3 is used only once to transmit its value to id1, so the
optimizer can transform (1.3) into the shorter sequence (1.4).
1.3
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
1.4
t1 = id3 * 60.0
id1 = id2 + t1
All in all, I mean to say that code optimization happens at a much deeper level, and because the code is in such a simple state there, it doesn't affect what your code does.

I had some trouble with const variables and const_cast. The compiler produced incorrect results when the variable was used to calculate something else: the const variable was optimized away and its old value was turned into a compile-time constant. Truly "unexpected behavior". Okay, perhaps not ;) since modifying an object that was declared const is undefined behavior.
Example:
const int x = 2;
const_cast<int&>(x) = 3;
int y = x * 2;
cout << y << endl;

Related

Evaluation of constants in for loop condition

for(int i = 0; i < my_function(MY_CONSTANT); ++i){
//code using i
}
In this example, will my_function(MY_CONSTANT) be evaluated at each iteration, or will it be stored automatically? Would this depend on the optimization flags used?
It has to work as if the function is called each time.
However, if the compiler can prove that the function result will be the same each time, it can optimize under the “as if” rule.
E.g. this usually happens with calls to .end() for standard containers.
General advice: when in doubt about whether to micro-optimize a piece of code,
Don't do it.
If you're still thinking of doing it, measure.
Well, there was a third point but I've forgotten it; maybe it was: still, wait.
In other words, decide whether to use a variable based on how clear the code then is, not on imagined performance.
It will be evaluated each iteration. You can save the extra computation time by doing something like
const int stop = my_function(MY_CONSTANT);
for(int i = 0; i < stop; ++i){
//code using i
}
A modern optimizing compiler may, under the as-if rule, be able to optimize away the function call in the case you outlined in your comment. The as-if rule says that a conforming compiler only has to emulate the observable behavior. We can see this in the draft C++ standard, section 1.9 Program execution, which says:
[...]Rather, conforming implementations are required to emulate (only)
the observable behavior of the abstract machine as explained below.
So if you are using a constant expression and my_function does not have observable side effects it could be optimized out. We can put together a simple test (see it live on godbolt):
#include <stdio.h>
#define blah 10
int func( int x )
{
return x + 20 ;
}
void withConstant( int y )
{
for(int i = 0; i < func(blah); i++)
{
printf("%d ", i ) ;
}
}
void withoutConstant(int y)
{
for(int i = 0; i < func(i+y); i++)
{
printf("%d ", i ) ;
}
}
In the case of withConstant we can see it optimizes the computation:
cmpl $30, %ebx #, i
and even in the case of withoutConstant it inlines the calculation instead of performing a function call:
leal 0(%rbp,%rbx), %eax #, D.2605
If my_function is declared constexpr and the argument really is a constant, the value can be calculated at compile time, thereby fulfilling the "as-if" and "sequential-consistency with no data-race" rules.
constexpr int my_function(const int c);
If your function has side effects, the compiler cannot move it out of the for loop, because that would violate the "as-if" rule, unless the compiler can reason its way out of it.
The compiler might inline my_function, treat its body as if it were part of the loop and, through constant folding, find out that it really is only a constant, in effect removing the call and replacing it with a constant.
int my_function(const int c) {
return 17+c; // inline and constant reduced to the value.
}
So the answer to your question is ... maybe!

constant propagation after register allocation

I am wondering why it is not advisable to do constant propagation after register allocation (RA) as well. After several optimization passes (post RA) there is scope for peephole optimizations like constant propagation/dead-code elimination etc.
I can think of only two reasons,
that these optimizations are easy to do on SSA form.
peephole opt. post RA will result in increased compilation time.
Are there any other reasons?
If it is okay to perform peephole opt. post RA then what should be the data structures/algorithms (any paper, reference etc. would be helpful).
EDIT:
in response to 500 - Internal Server Error's comment.
After optimization passes like phi-elimination (which in LLVM/Clang, for example, is merged with register allocation), or global scheduling such as pulling instructions up into parent basic blocks.
EDIT2:
In the example shown in figure:
The register allocator figures out that v1 and v2 have the same value and hence assigns the same register (r1) to them. After register allocation, a common sub-expression elimination pass can eliminate r2 = r1 from basic block #4.
See: Constant folding
The example given,
int x = 14;
int y = 7 - x / 2;
return y * (28 / x + 2);
The value x is completely unused after constant folding. If RA ran first, it would allocate a register for x. So there is a chance for some pruning before running the RA phase, even if the results are the same. If there are even more variables, spills could be avoided. These would be difficult to undo after the registers are allocated.
I think that instead of constant propagation you are thinking of strength reduction? This is more in the spirit of peephole optimizations; or I don't understand what you mean by constant propagation during the peephole phase, which is usually a back-end portion.
Any constant folding that was applied before register allocation should still be identical afterwards, unless variables have been made constant or code was found dead, i.e. the CFG has changed (per Mystical).
SSA Elimination after Register Allocation describes the LLVM structure. I believe that the SSA could be annotated with constant values so that on Phi elimination unneeded moves can be avoided. This is probably an artifact of the SSA elimination after RA and other compilers won't be experiencing this issue. A separate pass will slow compilation, so addressing the issue in existing passes would be better. I think the following code illustrates the issue,
int foo(int a, int b)
{
int c;
if(a > 0)
c = 7;
else
c = a * b + 10;
return a + c;
}
Upon phi elimination the code looks like,
int foo(int a, int b)
{
int c;
if(a > 0) {
c = 7;
return a + c; /* Should reduce to "a+7" */
} else {
c = a * b + 10;
return a + c;
}
}

What, in short words, does the GCC option -fipa-pta do?

According to the GCC manual, the -fipa-pta optimization does:
-fipa-pta: Perform interprocedural pointer analysis and interprocedural modification and reference analysis. This option can cause excessive
memory and compile-time usage on large compilation units. It is not
enabled by default at any optimization level.
What I assume is that GCC tries to differentiate mutable and immutable data based on pointers and references used in a procedure. Can someone with more in-depth GCC knowledge explain what -fipa-pta does?
I think the word "interprocedural" is the key here.
I'm not intimately familiar with gcc's optimizer, but I've worked on optimizing compilers before. The following is somewhat speculative; take it with a small grain of salt, or confirm it with someone who knows gcc's internals.
An optimizing compiler typically performs analysis and optimization only within each individual function (or subroutine, or procedure, depending on the language). For example, given code like this contrived example:
double *ptr = ...;
void foo(void) {
...
*ptr = 123.456;
some_other_function();
printf("*ptr = %f\n", *ptr);
}
the optimizer will not be able to determine whether the value of *ptr has been changed by the call to some_other_function().
If interprocedural analysis is enabled, then the optimizer can analyze the behavior of some_other_function(), and it may be able to prove that it can't modify *ptr. Given such analysis, it can determine that the expression *ptr must still evaluate to 123.456, and in principle it could even replace the printf call with puts("*ptr = 123.456000");.
(In fact, with a small program similar to the above code snippet I got the same generated code with -O3 and -O3 -fipa-pta, so I'm probably missing something.)
Since a typical program contains a large number of functions, with a huge number of possible call sequences, this kind of analysis can be very expensive.
As quoted from this article:
The "-fipa-pta" optimization takes the bodies of the called functions into account when doing the analysis, so compiling
void __attribute__((noinline))
bar(int *x, int *y)
{
*x = *y;
}
int foo(void)
{
int a, b = 5;
bar(&a, &b);
return b + 10;
}
with -fipa-pta makes the compiler see that bar does not modify b, and the compiler optimizes foo by changing b+10 to 15:
int foo(void)
{
int a, b = 5;
bar(&a, &b);
return 15;
}
A more relevant example is the “slow” code from the “Integer division is slow” blog post
std::random_device entropySource;
std::mt19937 randGenerator(entropySource());
std::uniform_int_distribution<int> theIntDist(0, 99);
for (int i = 0; i < 1000000000; i++) {
volatile auto r = theIntDist(randGenerator);
}
Compiling this with -fipa-pta makes the compiler see that theIntDist is not modified within the loop, and the inlined code can thus be constant-folded in the same way as the “fast” version – with the result that it runs four times faster.

is it always faster to store multiple class calls in a variable?

If you have a method such as this:
float method(myClass foo)
{
return foo.PrivateVar() + foo.PrivateVar();
}
is it always faster/better to do this instead?:
float method(myClass foo)
{
float temp = foo.PrivateVar();
return temp + temp;
}
I know you're not supposed to put a call like foo.PrivateVar() in a for loop, for example, because it is evaluated many times when you actually only need the value once (in some cases):
for (int i = 0; i < foo.PrivateInt(); i++)
{
//iterate through stuff with 'i'
}
From this I assumed I should change code like the first example into the second, but then people told me not to try to be smarter than the compiler, and that it could very well inline the calls.
I don't want to profile anything, I just want a few simple rules for good practice on this. I'm writing a demo for a job application and I don't want anyone to look at the code and see some rookie mistake.
That completely depends on what PrivateVar() is doing, where it's defined, and so on. If the compiler has access to the code in PrivateVar() and can guarantee that calling the function has no side effects, it can do CSE, which is basically what you've done in your second code example.
Exactly the same is true for your for loop. So if you want to be sure it's only evaluated once because it's a hugely expensive function (which also means that guaranteeing the absence of side effects gets tricky even if there aren't any), write it explicitly.
If PrivateVar() is just a getter, write the clearer code. Even if the compiler does not do CSE, the performance difference won't matter in 99.9999% of all cases.
Edit: CSE stands for common subexpression elimination and does exactly what the name says ;) The wiki page shows an example for a simple multiplication, but we can do this for larger code constructs just as well, for example a function call.
In all cases we have to guarantee that evaluating the code only once doesn't change the semantics. For example, doing CSE for this code:
a = b++ * c + g;
d = b++ * c * d;
and changing it to:
tmp = b++ * c;
a = tmp + g;
d = tmp * d;
would obviously be illegal (for function calls this is obviously a bit more complex, but it's the same principle).

c++ for loop optimization question

I have code like the following in VC++:
for (int i = (a - 1) * b; i < a * b && i < someObject->someFunction(); i++)
{
// ...
}
As far as I know, compilers optimize all these arithmetic operations so they won't be executed on each iteration, but I'm not sure whether they can tell that the function above also returns the same value each time and doesn't need to be called each time.
Is it better practice to save all the calculations into variables, or to rely on compiler optimizations and have more readable code?
int start = (a - 1) * b;
int expra = a * b;
int exprb = someObject->someFunction();
for (int i = start; i < expra && i < exprb; i++)
{
// ...
}
Short answer: it depends. If the compiler can deduce that running someObject->someFunction() every time and caching the result once both produce the same effects, it is allowed (but not guaranteed) to do so. Whether this static analysis is possible depends on your program: specifically, what the static type of someObject is and what its dynamic type is expected to be, as well as what someFunction() actually does, whether it's virtual, and so on.
In general, if it only needs to be done once, write your code in such a way that it can only be done once, bypassing the need to worry about what the compiler might be doing:
int start = (a - 1) * b;
int expra = a * b;
int exprb = someObject->someFunction();
for (int i = start; i < expra && i < exprb; i++)
// ...
Or, if you're into being concise:
for (int i = (a - 1) * b, expra = a * b, exprb = someObject->someFunction();
i < expra && i < exprb; i++)
// ...
In my experience, the VC++ compiler won't optimize the function call out unless it can see the function's implementation when compiling the calling code. So moving the call outside the loop is a good idea.
If a function resides within the same compilation unit as its caller, the compiler can often deduce some facts about it - e.g. that its output might not change for subsequent calls. In general, however, that is not the case.
In your example, assigning variables for these simple arithmetic expressions does not really change anything with regards to the produced object code and, in my opinion, makes the code less readable. Unless you have a bunch of long expressions that cannot reasonably be put within a line or two, you should avoid using temporary variables - if for no other reason, then just to reduce namespace pollution.
Using temporary variables implies a significant management overhead for the programmer, in order to keep them separate and avoid unintended side-effects. It also makes reusing code snippets harder.
On the other hand, assigning the result of the function to a variable can help the compiler optimise your code better by explicitly avoiding multiple function calls.
Personally, I would go with this:
int expr = someObject->someFunction();
for (int i = (a - 1) * b; i < a * b && i < expr; i++)
{
// ...
}
The compiler cannot assume that your function returns the same value each time. Imagine that your object is a socket: how could the compiler possibly know what its output will be?
Also, the optimization that a compiler can perform on such loops depends strongly on whether a and b are declared const and whether they are local. With advanced optimization schemes, it may be able to infer that a and b are modified neither in the loop nor in your function (again, you might imagine that your object holds some reference to them).
Well, in short: go for the second version of your code!
It is very likely that the compiler will call the function each time.
If you are concerned with the readability of code, what about using:
int maxindex = min(expra, exprb);
for (int i = start; i < maxindex; i++)
IMHO, long lines do not improve readability.
Writing short lines and taking multiple steps to get a result does not hurt performance; this is exactly why we use compilers.
Effectively what you might be asking is whether the compiler will inline the function someFunction() and whether it will see that someObject is the same instance in each loop, and if it does both it will potentially "cache" the return value and not keep re-evaluating it.
Much of this may depend on what optimisation settings you use, with VC++ as well as any other compiler, although I am not sure VC++ gives you quite as many flags as gnu.
I often find it incredible that programmers rely on compilers to optimise things they can easily optimise themselves. If you know the expression evaluates to the same value each time, just move it into the first section of the for loop and don't rely on the compiler:
for (int i = (a - 1) * b, iMax = someObject->someFunction();
i < a * b && i < iMax; ++i)
{
// body
}