Understanding compiler optimisations for compile-time logic - C++

If we have a program which specifies certain fixed conditions at compile time, does the compiler determine which 'branch' of the decision tree the program will always take, and fix it?
For example, if the following program is compiled with the -Ofast flag, does the program spend any time at all actually evaluating the if (aFixedCondition) check?
void Run_A();
void Run_B();

int main() {
    bool aFixedCondition = true;
    if (aFixedCondition)
        Run_A();
    else
        Run_B();
}
Does this also extend to the case in which we perform a large number of repeated checks of a fixed condition that is unchanging over our program's lifetime? Something like:
int main() {
    bool aFixedCondition = true;
    for (int i = 0; i < 100000; i++) {
        if (aFixedCondition)
            Run_A();
        else
            Run_B();
    }
}
According to this post, it is not a 'bad thing' per se to rely on compiler optimisations. Personally I'd rather not do this, but it is unclear how to reorganise the structure of the above program when it is embedded in more realistic/complicated code. I also could not find anything relevant about the -Ofast flag (here) in relation to the above.

In the code you show, every good compiler will, with optimization enabled, recognize that the if condition is always true and that the Run_B(); code is unreachable. It will then remove the evaluation of aFixedCondition and the Run_B(); code from the program, as well as the bool aFixedCondition = true; definition itself.
The conditions you show are of course simplistic. It is possible there are controlling expressions in if statements or loops that are always true (or always false) but that the compiler cannot recognize this due to various complications in the program. The state of the art is such that if we can easily see that a condition is always true in some direct line of logic, the compiler should be able to as well, and so we might rely on compiler optimizations in such cases. However, if the expression is not easily seen to be always true (or always false), the optimization is more subject to compiler quality and the particular circumstances of the program.
It is not unusual to use this in testing conditions that are compile-time constants but cannot be done with preprocessor tests. For example, in:
if (sizeof x == 4)
    DoCodeA();
else
    DoCodeB();
we would expect the compiler to know the size of x (neglecting the possibility it has some run-time variable size, as do C’s variable length arrays) and remove the unselected code from the program.
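For cases like the question's, C++17 also lets you state the selection explicitly instead of trusting the optimizer: with if constexpr, the condition must be a compile-time constant and the not-taken branch is discarded during compilation. A minimal sketch, reusing the Run_A/Run_B names from the question:
void Run_A() { /* ... */ }
void Run_B() { /* ... */ }

int main() {
    constexpr bool aFixedCondition = true;
    if constexpr (aFixedCondition)
        Run_A();   // selected at compile time
    else
        Run_B();   // discarded statement: never executed, no run-time test
}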

Related

How much do C/C++ compilers optimize conditional statements?

I recently ran into a situation where I wrote the following code:
for(int i = 0; i < (size - 1); i++)
{
    // do whatever
}
// Assume 'size' will be constant during the duration of the for loop
When looking at this code, it made me wonder how exactly the for-loop condition is evaluated on each iteration. Specifically, I'm curious as to whether or not the compiler would 'optimize away' any additional arithmetic that has to be done for each iteration. In my case, would this code get compiled such that (size - 1) has to be evaluated on every loop iteration? Or is the compiler smart enough to realize that the 'size' variable won't change, and so precalculate the value rather than re-evaluating it each iteration?
This then got me thinking about the general case where you have a conditional statement that may specify more operations than necessary.
As an example, how would the following two pieces of code compile:
if(6)
if(1+1+1+1+1+1)
int foo = 1;
if(foo + foo + foo + foo + foo + foo)
How smart is the compiler? Will the 3 cases listed above be converted into the same machine code?
And while I'm at it, why not list another example: what does the compiler do if you are doing an operation within a conditional that won't have any effect on the end result? Example:
if(2*(val))
// Assume val is an int that can take on any value
In this example, the multiplication is completely unnecessary. While this case seems a lot stupider than my original case, the question still stands: will the compiler be able to remove this unnecessary multiplication?
Question:
How much optimization is involved with conditional statements?
Does it vary based on compiler?
Short answer: the compiler is exceptionally clever, and will generally optimise those cases that you have presented (including utterly ignoring irrelevant conditions).
One of the biggest hurdles newcomers face in truly understanding C++ is that there is not a one-to-one relationship between their code and what the computer executes. The entire purpose of the language is to create an abstraction. You are defining the program's semantics, but the computer has no responsibility to actually follow your C++ code line by line; indeed, if it did so, it would be abhorrently slow compared to the speed we can expect from modern computers.
Generally speaking, unless you have a reason to micro-optimise (game developers come to mind), it is best to almost completely ignore this facet of programming, and trust your compiler. Write a program that takes the inputs you want, and gives the outputs you want, after performing the calculations you want… and let your compiler do the hard work of figuring out how the physical machine is going to make all that happen.
Are there exceptions? Certainly. Sometimes your requirements are so specific that you do know better than the compiler, and you end up optimising. You generally do this after profiling and determining what your bottlenecks are. And there's also no excuse to write deliberately silly code. After all, if you go out of your way to ask your program to copy a 50MB vector, then it's going to copy a 50MB vector.
But, assuming sensible code that means what it looks like, you really shouldn't spend too much time worrying about this. Modern compilers are so good at optimising that you'd be a fool to try to keep up.
The C++ language specification permits the compiler to make any optimization that results in no observable changes to the expected results.
If the compiler can determine that size is constant and will not change during execution, it can certainly make that particular optimization.
Alternatively, if the compiler can also determine that i is not used inside the loop (and its value is not used afterwards), so that it serves only as a counter, it might very well rewrite the loop to:
for(int i = 1; i < size; i++)
because comparing against size directly avoids recomputing size - 1 and might produce smaller code. Even if this i is used in some fashion, the compiler can still make this change and then adjust all other usage of i so that the observable results are still the same.
To summarize: anything goes. The compiler may or may not make any optimization change as long as the observable results are the same.
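For instance, under this as-if rule the loop from the original question may effectively be compiled as though the invariant bound had been hoisted by hand (a sketch; size and the body are as in the question):
int limit = size - 1;              // loop-invariant bound, computed once
for (int i = 0; i < limit; i++)
{
    // do whatever
}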
Yes, there is a lot of optimization, and it is very complex.
It varies based on the compiler, and it also varies based on the compiler options.
Check https://meta.stackexchange.com/questions/25840/can-we-stop-recommending-the-dragon-book-please for some book recommendations if you really want to understand what a compiler may do. It is a very complex subject.
You can also compile to assembly with the -S option (gcc / g++) to see what the compiler is really doing. Use -O3 / ... / -O0 / -O to experiment with different optimization levels.

Performance impact of using 'break' inside 'for-loop'

I have done my best and read a lot of Q&As on SO.SE, but I haven't found an answer to my particular question. Most for-loop and break-related questions refer to nested loops, while I am concerned with performance.
I want to know if using a break inside a for-loop has an impact on the performance of my C++ code (assuming the break gets almost never called). And if it has, I would also like to know tentatively how big the penalization is.
I am quite suspicious that it does indeed impact performance (although I do not know how much). So I wanted to ask you. My reasoning goes as follows:
Independently of the extra code for the conditional statements that trigger the break (like an if), it necessarily adds additional instructions to my loop.
Further, it probably also messes around when my compiler tries to unroll the for-loop, as it no longer knows the number of iterations that will run at compile time, effectively rendering it into a while-loop.
Therefore, I suspect it does have a performance impact, which could be considerable for very fast and tight loops.
So this takes me to a follow-up question. Is a for-loop & break performance-wise equal to a while-loop? Like in the following snippet, where we assume that checkCondition() evaluates 99.9% of the time as true. Do I lose the performance advantage of the for-loop?
// USING WHILE
int i = 100;
while (i-- && checkCondition())
{
    // do stuff
}

// USING FOR
for (int i = 100; i; --i)
{
    if (checkCondition()) {
        // do stuff
    } else {
        break;
    }
}
I have tried it on my computer, but I get the same execution time. And being wary of the compiler and its optimization voodoo, I wanted to know the conceptual answer.
EDIT:
Note that I have measured the execution time of both versions in my complete code, without any real difference. Also, I do not trust compiling with -S (which I usually do) for this matter, as I am not interested in the particular result of my compiler. I am rather interested in the concept itself (in an academic sense), as I am not sure I got this completely right :)
The principal answer is to avoid spending time on similar micro optimizations until you have verified that such condition evaluation is a bottleneck.
The real answer is that CPUs have powerful branch-prediction circuits which empirically work really well.
What will happen is that your CPU will guess whether the branch is going to be taken or not and execute the code as if the if condition were not even present. Of course this relies on multiple assumptions, such as the condition calculation having no side effects that part of the loop body depends on, and the condition always evaluating to false up to a certain point, at which it becomes true and stops the loop.
Some compilers also allow you to specify the likeliness of an evaluation as a hint to the branch predictor.
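For example, with GCC or Clang you could annotate the loop from the question using the non-standard __builtin_expect (a sketch; checkCondition is the function from the question):
for (int i = 100; i; --i) {
    // Hint that the break path is rare: the condition holds 99.9% of the time.
    if (__builtin_expect(!checkCondition(), 0))
        break;
    // do stuff
}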
If you want to see the semantic difference between the two code versions, just compile them with -S and examine the generated asm code; there's no other magic way to do it.
The only sensible answer to "what is the performance impact of ...", is "measure it". There are very few generic answers.
In the particular case you show, it would be rather surprising if an optimising compiler generated significantly different code for the two examples. On the other hand, I can believe that a loop like:
unsigned sum = 0;
unsigned stop = -1;
for (int i = 0; i < 32; i++)
{
    stop &= checkcondition(); // returns 0 or all-bits-set
    sum += (stop & x[i]);
}
might be faster than:
unsigned sum = 0;
for (int i = 0; i < 32; i++)
{
    if (!checkcondition())
        break;
    sum += x[i];
}
for a particular compiler, for a particular platform, with the right optimization levels set, and for a particular pattern of "checkcondition" results.
... but the only way to tell would be to measure.

Is there a compiler hint for GCC to force branch prediction to always go a certain way?

For the Intel architectures, is there a way to instruct the GCC compiler to generate code that always forces branch prediction a particular way in my code? Does the Intel hardware even support this? What about other compilers or hardware?
I would use this in C++ code where I know the case I wish to run fast and do not care about the slow down when the other branch needs to be taken even when it has recently taken that branch.
for (;;) {
    if (normal) { // How to tell compiler to always branch predict true value?
        doSomethingNormal();
    } else {
        exceptionalCase();
    }
}
As a follow-on question for Evdzhan Mustafa: can the hint just apply the first time the processor encounters the instruction, with all subsequent branch prediction functioning normally?
GCC supports the function __builtin_expect(long exp, long c) to provide this kind of feature. You can check the documentation here.
Here exp is the condition used and c is the expected value. For example, in your case you would want
if (__builtin_expect(normal, 1))
Because of the awkward syntax, this is usually wrapped in two custom macros like
#define likely(x) __builtin_expect (!!(x), 1)
#define unlikely(x) __builtin_expect (!!(x), 0)
just to ease the task.
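With those macros, the loop from the question could then be written like this (a sketch using the names from the question):
for (;;) {
    if (likely(normal)) {     // expected to be taken almost always
        doSomethingNormal();
    } else {
        exceptionalCase();    // expected cold path
    }
}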
Mind that:
this is non-standard
the compiler and the CPU's branch predictor are likely more skilled than you at deciding such things, so this could be a premature micro-optimization
No, there is not. (At least on modern x86 processors.)
__builtin_expect mentioned in other answers influences the way gcc arranges the assembly code. It does not directly influence the CPU's branch predictor. Of course, there will be indirect effects on branch prediction caused by reordering the code. But on modern x86 processors there is no instruction that tells the CPU "assume this branch is/isn't taken".
See this question for more detail: Intel x86 0x2E/0x3E Prefix Branch Prediction actually used?
To be clear, __builtin_expect and/or the use of -fprofile-arcs can improve the performance of your code, both by giving hints to the branch predictor through code layout (see Performance optimisations of x86-64 assembly - Alignment and branch prediction), and also improving cache behaviour by keeping "unlikely" code away from "likely" code.
gcc has long __builtin_expect (long exp, long c) (emphasis mine):
You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.
The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:
if (__builtin_expect (x, 0))
    foo ();
indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as
if (__builtin_expect (ptr != NULL, 1))
    foo (*ptr);
when testing pointer or floating-point values.
As the documentation notes, you should prefer to use actual profile feedback, and this article shows a practical example of this and how, in their case at least, it ends up being an improvement over using __builtin_expect. Also see How to use profile guided optimizations in g++?.
We can also find a Linux kernel newbies article on the kernel macros likely() and unlikely() which use this feature:
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
Note the !! used in the macro; we can find the explanation for it in Why use !!(condition) instead of (condition)?.
Just because this technique is used in the Linux kernel does not mean it always makes sense to use it. We can see from this question I recently answered, difference between the function performance when passing parameter as compile time constant or variable, that many hand-rolled optimization techniques don't work in the general case. We need to profile code carefully to understand whether a technique is effective. Many old techniques may not even be relevant with modern compiler optimizations.
Note that although builtins are not portable, clang also supports __builtin_expect.
Also on some architectures it may not make a difference.
The correct way to define likely/unlikely macros in C++11 is the following:
#define LIKELY(condition) __builtin_expect(static_cast<bool>(condition), 1)
#define UNLIKELY(condition) __builtin_expect(static_cast<bool>(condition), 0)
This method is compatible with all C++ versions, unlike [[likely]], but relies on non-standard extension __builtin_expect.
When such a macro is defined without a proper cast to bool, as in:
#define LIKELY(condition) __builtin_expect((condition), 1)
it may change the meaning of if statements and break the code. Consider the following code:
#include <iostream>

struct A
{
    explicit operator bool() const { return true; }
    operator int() const { return 0; }
};

#define LIKELY(condition) __builtin_expect((condition), 1)

int main() {
    A a;
    if (a)
        std::cout << "if(a) is true\n";
    if (LIKELY(a))
        std::cout << "if(LIKELY(a)) is true\n";
    else
        std::cout << "if(LIKELY(a)) is false\n";
}
And its output:
if(a) is true
if(LIKELY(a)) is false
As you can see, this definition of LIKELY breaks the semantics of if: __builtin_expect takes a long, so a is converted through operator int() and yields 0, whereas plain if(a) contextually converts a to bool through the explicit operator bool().
The point here is not that operator int() and operator bool() should agree (though keeping them consistent is good practice). Rather, passing the condition to __builtin_expect without a cast loses the contextual conversion to bool that a C++11 if performs; static_cast<bool>(condition) preserves it.
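For comparison, a sketch of the same test with the static_cast-based macro (LIKELY2 is a hypothetical name chosen here to avoid redefining LIKELY); with the struct A above, both checks should now agree:
#define LIKELY2(condition) __builtin_expect(static_cast<bool>(condition), 1)

int main() {
    A a;                 // the same struct A as above
    if (LIKELY2(a))      // static_cast<bool>(a) uses the explicit operator bool
        std::cout << "if(LIKELY2(a)) is true\n";   // this branch is taken
    else
        std::cout << "if(LIKELY2(a)) is false\n";
}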
As the other answers have all adequately suggested, you can use __builtin_expect to give the compiler a hint about how to arrange the assembly code. As the official docs point out, in most cases, the assembler built into your brain will not be as good as the one crafted by the GCC team. It's always best to use actual profile data to optimize your code, rather than guessing.
Along similar lines, but not yet mentioned, is a GCC-specific way to force the compiler to generate code on a "cold" path. This involves the use of the noinline and cold attributes, which do exactly what they sound like they do. These attributes can only be applied to functions, but with C++11, you can declare inline lambda functions and these two attributes can also be applied to lambda functions.
Although this still falls into the general category of a micro-optimization, and thus the standard advice applies—test don't guess—I feel like it is more generally useful than __builtin_expect. Hardly any generations of the x86 processor use branch prediction hints (reference), so the only thing you're going to be able to affect anyway is the order of the assembly code. Since you know what is error-handling or "edge case" code, you can use this annotation to ensure that the compiler won't ever predict a branch to it and will link it away from the "hot" code when optimizing for size.
Sample usage:
void FooTheBar(void* pFoo)
{
    if (pFoo == nullptr)
    {
        // Oh no! A null pointer is an error, but maybe this is a public-facing
        // function, so we have to be prepared for anything. Yet, we don't want
        // the error-handling code to fill up the instruction cache, so we will
        // force it out-of-line and onto a "cold" path.
        [&]() __attribute__((noinline,cold)) {
            HandleError(...);
        }();
    }

    // Do normal stuff
    ⋮
}
Even better, GCC will automatically ignore this in favor of profile feedback when it is available (e.g., when compiling with -fprofile-use).
See the official documentation here: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
As of C++20, the likely and unlikely attributes are standardized, and they are already supported in g++ 9. So, as discussed here, you can write
if (a > b) {
    /* code you expect to run often */
    [[likely]] /* last statement here */
}
e.g. in the following code, the else block gets inlined thanks to the [[unlikely]] in the if block:
int oftendone( int a, int b );
int rarelydone( int a, int b );
int finaltrafo( int );

int divides( int number, int prime ) {
    int almostreturnvalue;
    if ( ( number % prime ) == 0 ) {
        auto k = rarelydone( number, prime );
        auto l = rarelydone( number, k );
        [[unlikely]] almostreturnvalue = rarelydone( k, l );
    } else {
        auto a = oftendone( number, prime );
        almostreturnvalue = oftendone( a, a );
    }
    return finaltrafo( almostreturnvalue );
}
godbolt link comparing the presence/absence of the attribute
__builtin_expect can be used to tell the compiler which way you expect a branch to go. This can influence how the code is generated. Typical processors run code faster sequentially. So if you write
if (__builtin_expect (x == 0, 0)) ++count;
if (__builtin_expect (y == 0, 0)) ++count;
if (__builtin_expect (z == 0, 0)) ++count;
the compiler will generate code like
if (x == 0) goto if1;
back1: if (y == 0) goto if2;
back2: if (z == 0) goto if3;
back3: ;
...
if1: ++count; goto back1;
if2: ++count; goto back2;
if3: ++count; goto back3;
If your hint is correct, this will execute the code without any branches actually performed. It will run faster than the normal sequence, where each if statement would branch around the conditional code and would execute three branches.
Newer x86 processors have instructions for branches that are expected to be taken, or for branches that are expected not to be taken (there's an instruction prefix; not sure about the details). Not sure if the processor uses that. It is not very useful, because branch prediction will handle this just fine. So I don't think you can actually influence the branch prediction.
With regards to the OP: no, there is no way in GCC to tell the processor to always assume the branch is or isn't taken. What you have is __builtin_expect, which does what others say it does. Furthermore, I think you don't want to tell the processor whether the branch is always taken or not. Today's processors, such as those built on the Intel architecture, can recognize fairly complex patterns and adapt effectively.
However, there are times you want to assume control of whether by default a branch is predicted taken or not: when you know the code will be called "cold" with respect to branching statistics.
One concrete example: exception-management code. By definition, the management code will run exceptionally, but perhaps when it does, maximum performance is desired (there may be a critical error to take care of as soon as possible), hence you may want to control the default prediction.
Another example: you may classify your input and jump into the code that handles the result of your classification. If there are many classifications, the processor may collect statistics but lose them, because the same classification does not happen again soon enough and the prediction resources are devoted to recently called code. I wish there were a primitive to tell the processor "please do not devote prediction resources to this code", the way you can sometimes say "do not cache this".

c++ attempt to optimize code by replacing tests

I am looking at code that someone else wrote, and it has a lot of debug sections, of type
if(0) { code }
or if(1) { code }
or if(false) { code }
There is even
#if(0)
#endif
(which did not turn gray though - I thought that it should)
I was wondering: if I replace these with #if 0 (or #ifdef _DEBUG), is it possible to optimize the code, or will it not make any difference?
I think that it may help, since I have seen code within these sections being grayed out - and I thought that this code is removed from the Release executable, therefore making it faster. Is that true?
The code that I am thinking of is inside functions that could be called lots of times...
Edit: The code I am referring to is being run millions of times. I am aware that the contents of the if(0) will be ignored...
I am also aware of the benefit of being able to easily debug an issue by switching a test from 0 to 1...
My question was whether the if(0) test itself, executed millions upon millions of times, adds overhead... I am trying to figure out all the things that could make this code take fewer hours.
If the expressions placed inside those ifs are constant and determinable at compile time, then you may be almost sure that the compiler has already removed them from the code for you.
Of course, if you compile in debug mode, and/or if you have the optimization level set to zero, then the compiler may skip that and leave those tests in - but with plain zero/one/true/false values it is highly unlikely.
For compile-time-constant branches, you may be sure that the compiler has removed the dead ones.
It is able to remove even complex-looking cases like:
const int x = 5;
if( 3 * x * x < 10 ) // ~ 75 < 10
{
    doBlah(); // skipped
}
However, without that 'const' marker on x, the expression's value may not be determinable at compile time, and the branch may 'leak' into the actual final product.
Also, the value of the expression in the following code is not necessarily a compile-time constant:
const int x = aFunction();
if( 3 * x * x < 10 )
{
    doBlah(); // may or may not be skipped
}
X is a constant, but it is initialized with a value from a function, so it will most probably not be determinable at compile time. At run time the function could return any value*), so the compiler must assume that X is unknown.
Therefore, if you have the possibility, use the preprocessor. In trivial cases that won't do much, because the compiler already knew the answer. But cases are not always trivial, and you will notice the change very often. When the optimizer fails to deduce the values, it leaves the code in, even if it is dead. The preprocessor, on the other hand, is guaranteed to remove disabled sections before they get compiled and optimized. Also, using the preprocessor will at least speed up compilation: the compiler/optimizer will not have to trace constants, calculate expressions, check branches, etc.
*) it is possible to write a method/function whose return value will actually be determinable at the compilation and optimization phases: if the function is simple and gets inlined, its result value might be propagated along with some branches. But even if you can somewhat rely on removing the if-0 clauses, you cannot rely on the inlining as much.
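In modern C++ there is a way to force that guarantee: mark the function constexpr, which requires its value to be computable at compile time. A sketch, reusing the aFunction/doBlah names from the example above:
constexpr int aFunction() { return 5; }

void doBlah();

void test() {
    constexpr int x = aFunction();  // now guaranteed a compile-time constant
    if (3 * x * x < 10)             // ~ 75 < 10: provably false
    {
        doBlah();                   // dead code; the compiler can drop it
    }
}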
If you have code inside an if (0) block, the code generated by the compiler will be the same as if that block wasn't there on any reasonable compiler. The code will still be checked for compile-time errors. (Assuming you don't have any jump labels inside it or something weird like that.)
If you have code inside an if (1) block, the code generated by the compiler will be the same as if the code was just inside braces. It's a common way to give a block of code its own scope so that local variables are destructed where desired.
If you ifdef out code, then the compiler ignores it completely. The code can be completely nonsense, contain syntax errors, or whatever and the compiler will not care.
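For example, the following compiles fine, because the preprocessor drops the block before the compiler proper ever sees it:
#if 0
this text is not valid C++ at all, yet the build succeeds
#endif

int main() {}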
Typically, #if 0 is used to remove code whilst still keeping it around - for example, to easily compare two options, I sometimes do:
#if 1
some sort of code
#else
some other code
#endif
That way, I can quickly switch between the two alternatives.
In this case, the preprocessor will just leave one of the two options in the code.
The construct if(0) or if(1) is similar - the compiler will pretty much remove the if, and in the case of 0 it will also remove the rest of the if statement.
I think it's rather sloppy to leave this sort of stuff in "completed" code, but it's very useful for debugging/development.
Say for example you are trying a new method for doing something that is much faster:
if (1)
{
    fast_function();
}
else
{
    slower_function();
}
Now, in one of your test cases, the result shows an error. So you want to quickly go back to slower_function and see if the result is the same or not. If it's the same, then you have to look at what else has changed since the test last passed. If it's OK with slower_function, you go back and look at why fast_function() is not working as it should in this case.
It's true (depending on your build settings and preprocessor).
Putting debug code in #ifdef _DEBUG (or similar) is a standard way to keep these completely out of your release builds. Usually the debug build #defines it, and the release build does not.
Usually, though, a compiler should also remove code such as if (0), if given the proper optimization flags, but this puts extra work on the compiler, and on the programmer (now you have to go change them all!). I'd definitely leave this to the preprocessor.
You're correct. If you compile with #define DEBUG 0, then you will actually be removing all the #if DEBUG blocks at compile time. Hence, there will be a lot less code, and it will run faster.
Just make sure you build the release version with #define DEBUG 0.
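A minimal sketch of that pattern (DEBUG and log_state are placeholder names):
#define DEBUG 0           // set to 1 for a debug build

void log_state();         // hypothetical debug helper

void process() {
#if DEBUG
    log_state();          // the preprocessor strips this block when DEBUG is 0
#endif
    // real work ...
}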
A good optimizing compiler (GCC, MSVC) will remove if(0) and if(1) from the code completely... the translation to machine code will NOT test for these conditions...

Is Loop Hoisting still a valid manual optimization for C code?

Using the latest gcc compiler, do I still have to think about these types of manual loop optimizations, or will the compiler take care of them for me well enough?
If your profiler tells you there is a problem with a loop, and only then, a thing to watch out for is a memory reference in the loop which you know is invariant across the loop but the compiler does not. Here's a contrived example, bubbling an element out to the end of an array:
for ( ; i < a->length - 1; i++)
    swap_elements(a, i, i+1);
You may know that the call to swap_elements does not change the value of a->length, but if the definition of swap_elements is in another source file, it is quite likely that the compiler does not. Hence it can be worthwhile hoisting the computation of a->length out of the loop:
int n = a->length;
for ( ; i < n - 1; i++)
    swap_elements(a, i, i+1);
On performance-critical inner loops, my students get measurable speedups with transformations like this one.
Note that there's no need to hoist the computation of n-1; any optimizing compiler is perfectly capable of discovering loop-invariant computations among local variables. It's memory references and function calls that may be more difficult. And the code with n-1 is more manifestly correct.
As others have noted, you have no business doing any of this until you've profiled and have discovered that the loop is a performance bottleneck that actually matters.
Write the code, profile it, and only think about optimising it when you have found something that is not fast enough, and you can't think of an alternative algorithm that will reduce/avoid the bottleneck in the first place.
With modern compilers, this advice is even more important - if you write simple clean code, the compiler's optimiser can often do a better job of optimising the code than it can if you try to give it snazzy "pre-optimised" code.
Check the generated assembly and see for yourself. See if the computation for the loop-invariant code is being done inside the loop or outside the loop in the assembly code that your compiler generates. If it's failing to do the loop hoisting, do the hoisting yourself.
But as others have said, you should always profile first to find your bottlenecks. Once you've determined that this is in fact a bottleneck, only then should you check to see if the compiler's performing loop hoisting (aka loop-invariant code motion) in the hot spots. If it's not, help it out.
Compilers generally do an excellent job with this type of optimization, but they do miss some cases. Generally, my advice is: write your code to be as readable as possible (which may mean that you hoist loop invariants -- I prefer to read code written that way), and if the compiler misses optimizations, file bugs to help fix the compiler. Only put the optimization into your source if you have a hard performance requirement that can't wait on a compiler fix, or the compiler writers tell you that they're not going to be able to address the issue.
Where they are likely to be important to performance, you still have to think about them.
Loop hoisting is most beneficial when the value being hoisted takes a lot of work to calculate. If it takes a lot of work to calculate, it's probably a call out of line. If it's a call out of line, the latest version of gcc is much less likely than you are to figure out that it will return the same value every time.
Sometimes people tell you to profile first. They don't really mean it, they just think that if you're smart enough to figure out when it's worth worrying about performance, then you're smart enough to ignore their rule of thumb. Obviously, the following code might as well be "prematurely optimized", whether you have profiled or not:
#include <iostream>

bool isPrime(int p) {
    for (int i = 2; i*i <= p; ++i) {
        if ((p % i) == 0) return false;
    }
    return true;
}

int countPrimesLessThan(int max) {
    int count = 0;
    for (int i = 2; i < max; ++i) {
        if (isPrime(i)) ++count;
    }
    return count;
}

int main() {
    for (int i = 0; i < 10; ++i) {
        std::cout << "The number of primes less than 1 million is: ";
        std::cout << countPrimesLessThan(1000*1000);
        std::cout << std::endl;
    }
}
It takes a "special" approach to software development not to manually hoist that call to countPrimesLessThan out of the loop, whether you've profiled or not.
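For contrast, here is the hand-hoisted version (isPrime and countPrimesLessThan as defined above); whether a given compiler would do this by itself depends on it proving the call free of side effects:
int main() {
    // Hoisted by hand: the expensive, loop-invariant call runs exactly once.
    const int count = countPrimesLessThan(1000*1000);
    for (int i = 0; i < 10; ++i) {
        std::cout << "The number of primes less than 1 million is: ";
        std::cout << count;
        std::cout << std::endl;
    }
}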
Early optimizations are bad only if other aspects - like readability, clarity of intent, or structure - are negatively affected.
If you have to declare it anyway, loop hoisting can even improve clarity, and it explicitly documents your assumption "this value doesn't change".
As a rule of thumb I wouldn't hoist the count/end iterator for a std::vector, because it's a common scenario easily optimized. I wouldn't hoist anything that I can trust my optimizer to hoist, and I wouldn't hoist anything known to be non-critical - e.g. when running through a list of a dozen windows to respond to a button click. Even if it takes 50ms, it will still appear "instantaneous" to the user. (But even that is a dangerous assumption: if a new feature requires looping 20 times over this same code, it suddenly is slow.) You should still hoist expensive operations such as opening a file handle for appending, etc.
In many cases - loop hoisting is a prime example - it helps a lot to consider relative cost: what is the cost of the hoisted calculation compared to the cost of running through the body?
As for optimizations in general, there are quite some cases where the profiler doesn't help. Code may have very different behavior depending on the call path. Library writers often don't know their call path or frequency. Isolating a piece of code to make things comparable can already alter the behavior significantly. The profiler may tell you "Loop X is slow", but it won't tell you "Loop X is slow because call Y is thrashing the cache for everyone else". A profiler couldn't tell you "this code is fast because of your snazzy CPU, but it will be slow on Steve's computer".
A good rule of thumb is usually that the compiler performs the optimizations it is able to.
Does the optimization require any knowledge about your code that isn't immediately obvious to the compiler? Then it is hard for the compiler to apply the optimization automatically, and you may want to do it yourself
In most cases, loop hoisting is a fully automatic process requiring no high-level knowledge of the code -- just a lot of lifetime and dependency analysis, which is what the compiler excels at in the first place.
It is possible to write code where the compiler is unable to determine whether something can be hoisted out safely though -- and in those cases, you may want to do it yourself, as it is a very efficient optimization.
As an example, take the snippet posted by Steve Jessop:
for (int i = 0; i < 10; ++i) {
    std::cout << "The number of primes less than 1 billion is: ";
    std::cout << countPrimesLessThan(1000*1000*1000);
    std::cout << std::endl;
}
Is it safe to hoist out the call to countPrimesLessThan? That depends on how and where the function is defined. What if it has side effects? It may make an important difference whether it is called once or ten times, as well as when it is called. If we don't know how the function is defined, we can't move it outside the loop. And the same is true if the compiler is to perform the optimization.
Is the function definition visible to the compiler? And is the function short enough that we can trust the compiler to inline it, or at least analyze the function for side effects? If so, then yes, it will hoist it outside the loop.
If the definition is not visible, or if the function is very big and complicated, then the compiler will probably assume that the function call can not be moved safely, and then it won't automatically hoist it out.
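One way around that, with GCC and Clang, is to promise the compiler that the call has no side effects, so hoisting becomes legal even when the body is not visible. A sketch using the non-standard pure function attribute on the countPrimesLessThan example:
// 'pure' promises the result depends only on the arguments (and unchanged
// global memory) and that the call has no side effects, so the compiler may
// merge or hoist repeated calls with the same argument.
int countPrimesLessThan(int max) __attribute__((pure));

int sumTenTimes() {
    int total = 0;
    for (int i = 0; i < 10; ++i)
        total += countPrimesLessThan(1000*1000*1000);  // hoistable call
    return total;
}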
Remember the 80-20 rule: 80% of execution time is spent in 20% of a program's code.
There is no point in optimizing code which has no significant effect on the program's overall efficiency.
One should not bother about this kind of local optimization in the code. The best approach is to profile the code to figure out the critical parts of the program that consume the most CPU cycles, and try to optimize those. That kind of optimization really makes sense and will result in improved program efficiency.