I would like to know all the possible ways (or at least the popular ones) in which compilers can and do optimize away code written in C++. I would also like to know how exactly the optimization is done in each case!
So far, I'm aware of two optimizations, viz. the Empty Base Optimization (EBO) and the Return Value Optimization (RVO). What else is there? I've also heard of "const" optimization and "unused variable" optimization. What are they?
All possible ways? Surely you're joking. For that, look through years of compiler research and practice.
For concrete examples, look up each of the options here: http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
From the Standard, Section 1.9 (footnote 4):
"This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced."
So a standards-compliant compiler can actually perform any kind of optimization, as long as the program still produces its required observable behavior.
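A minimal sketch (my own example, not from the Standard) of what the last sentence of that quote permits:

int observable_example(int x) {
    int unused = x * 12345;   // the value is never read and computing it has no
                              // side effects, so the compiler need not emit the multiply
    return x + 1;             // only this contributes to observable behavior
}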
Incredibly broad, because there are many optimizations and compiler writers are always thinking up more. There are tons of them: some optimize for run time, others for binary size. Many aren't specific to C++ either; general compiler optimization techniques are implemented in compilers and interpreters for many different languages.
Just a handful (a rough sketch of a couple of these follows the list):
Dead code removal
Loop unrolling
Constant folding
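As a rough illustration (my own examples; exact behavior varies by compiler and flags):

int folded() {
    const int a = 6, b = 7;
    if (false) {                 // dead code removal: this branch can never
        return -1;               // execute, so it can be dropped entirely
    }
    return a * b;                // constant folding: typically emitted as "return 42"
}

int sum4(const int *p) {
    int s = 0;
    for (int i = 0; i < 4; ++i)  // loop unrolling: a small fixed trip count is
        s += p[i];               // often expanded into four straight-line adds
    return s;
}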
More info:
Optimizing C++ Applications with Intel C++ Compiler
New Optimizations in VC++ 2010
Generic Compiler Optimizations Wiki
Related
I read an article in which different compilers were compared to infer which is the best in different circumstances. It got me thinking, and even though I tried to google it, I didn't manage to find a clear and lucid answer: will the program run faster or slower if I use different compilers to compile it? Suppose it's some uncommon, complicated algorithm that is used along with templates.
Yes. The compiler is what writes a program that implements the behavior you've described with your C or C++ code. Different compilers (or even the same compiler, given different options) can come up with vastly different programs that implement the same behavior.
Remember, your CPU does not execute C or C++ code. It only executes machine code. There is no defined standard for how the former gets transformed into the latter.
It may depend on the compiler, compiler version, compiler optimization settings, C++ language version used when compiling, the linker used, linker optimization options and much more. So in short, the answer to your question is Yes.
So we've all heard the don't-use-register line, the reasoning being that trying to out-optimize a compiler is a fool's errand.
register, from what I know, doesn't actually state anything about CPU registers, just that a given variable can't be referenced indirectly. I'll hazard a guess that it's often referred to as obsolete because compilers can detect the lack of addressing automatically, thus making such optimizations transparent.
But if we're firm on that argument, can't it be levelled at every optimization-driven keyword in C? Why do we use inline and C99's restrict for example?
I suppose that some things like aliasing make deducing some optimizations hard or even impossible, so where is the line drawn before we start venturing into Sufficiently Smart Compiler territory?
Where should the line be drawn in C and C++ between spoon-feeding a compiler optimization information and assuming it knows what it's doing?
EDIT: Jens Gustedt pointed out that my conflating of C and C++ isn't right since two of the keywords have semantic differences and one doesn't exist in standard C++. I had a good link about register in C++ which I'll add if I find it...
I would agree that register and inline are somewhat similar in this respect. If the compiler can see the body of the callee while compiling a call site, it should be able to make a good decision on inlining. The use of the inline keyword in both C and C++ has more to do with the mechanics of making the body of the function visible than with anything else.
restrict, however, is different. When compiling a function, the compiler has no idea of what the call sites are going to be. Being able to assume no aliasing can enable optimizations that would otherwise be impossible.
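A rough sketch (my own example, written with the __restrict extension that GCC, Clang and MSVC accept in C++; standard C spells it restrict) of the kind of optimization this enables:

// Without a no-alias guarantee the compiler must assume the store to *out
// may change in[], so it reloads and re-stores on every iteration. With the
// no-alias promise it may keep the running total in a register and store once.
void accumulate(float *__restrict out, const float *__restrict in, int n) {
    for (int i = 0; i < n; ++i)
        *out += in[i];
}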
inline is used in the scenario where you implement a non-templated function within a header which is then included from multiple compilation units.
This ensures that the compiler should create just one instance of the function as though it were inlined, so you do not get a link error for multiply defined symbol. It does not however require the compiler to actually inline it.
There are compiler-specific ways to force inlining (GCC's always_inline attribute, for example), but those are language extensions.
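A minimal sketch of the scenario described above (the header and function names are mine):

// utils.h, included from several .cpp files
#ifndef UTILS_H
#define UTILS_H

inline int clamp01(int x) {   // 'inline' here is mostly about linkage: every
    if (x < 0) return 0;      // translation unit that includes this header may
    if (x > 1) return 1;      // emit a definition, and the linker folds them
    return x;                 // into one, so there is no "multiply defined symbol" error
}

#endif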
register doesn't even say that you can't reference the variable indirectly (at least in C++). It said that in the original C, but that has been dropped.
Whether trying to out-optimize the compiler is a fool's errand depends on the optimization. Not many compilers, for example, will convert sin(x) * sin(x) + cos(x) * cos(x) into 1.
Today, most compilers ignore register, and no one uses it, because compilers have become good enough at register allocation to do a better job than you can with register. In fact, respecting register would typically make the generated code slower. This is not the case for inline or restrict: in both cases, there exist techniques, at least theoretically, which could result in the compiler doing a better job than you can. Such techniques are not widespread, however, and (as far as I know, at least) have a very high compile time overhead, in some cases with compile times which grow exponentially with the size of the program (which makes them more or less unusable on most real programs; compile times measured in years really aren't acceptable).
As to where to draw the line... it changes in time. When I first started programming in C, register made a significant difference, and was widely used. Today, no. I imagine that in time, the same may happen with inline or restrict; some experimental compilers are very close with inline already.
This is a flame-bait question but I will dive in anyway.
Compilers are a lot better at optimising than your average programmer. There was a time I programmed on a 25MHz 68030 and I got some advantage from the use of register because the compiler's optimizer was so poor. But that was back in 1990.
I see inline as just as bad as register.
In general, measure first before you modify. If you find that your code performs so poorly you want to use register or inline, take a deep breath, stand back and look for a better algorithm first.
In recent times (i.e. the last 5 years) I have gone through code bases and removed inline functions galore, with no perceptible change in performance. Code size, however, always benefits from the removal of inline methods. That isn't a big issue for your standard x86-style monster multicore marvel of the modern age, but it does matter if you work in the embedded space.
It is a moving target, because compiler technology is improving. (Well, sometimes it is more changing than improving, but that has some of the same effect of rendering your optimization attempts moot, or worse.)
Generally, you should not guess at whether an optimization keyword or other optimization technique is good or not. One has to learn quite a bit about how computers work, including the particular platform you are targeting, and how compilers work.
So a rule about using various optimization techniques is to ask: do I know the compiler will not do the best job here? Am I willing to commit to that for a while? Will the compiler remain stable while this code is in use, and am I willing to rewrite the code when the compiler changes the situation? Typically, you have to be an experienced and knowledgeable software engineer to know when you can do better than the compiler. It also helps if you can talk to the compiler developers.
This means people cannot give you an answer here that has a definite guideline. It depends on what compiler you are using, what your project is, what your resources are, and what your goals are, and so on.
Although some people say not to try to out-optimize the compiler, there are various areas of software engineering where people do better than a compiler and in which it is worth the expense of paying people for this.
The difference is as follows:
register is a very local optimization (i.e. within a single function). Register allocation is a relatively solved problem, both because of smarter compilers and because of the larger number of registers (mostly the former, but x86-64 has more registers than x86, for example, and both have more than, say, an 8-bit processor).
inline is harder, as it is an inter-procedural optimization. However, since it involves a relatively small depth of recursion and a small number of procedures (if the inlined procedure is too big, there is no sense in inlining it), it may safely be left to the compiler.
restrict is much harder. To know for certain that two pointers don't alias, you would need to analyse the whole program (including libraries, the system, plug-ins etc.) - and even then you would run into problems. However, the information is clearer to the programmer AND it is part of the specification.
Consider very simple code:
void my_memcpy(void *dst, const void *src, size_t size) {
    for (size_t i = 0; i < size; i++) {
        ((char *)dst)[i] = ((const char *)src)[i];
    }
}
Is there a benefit to making this code efficient? Yes - memcpy tends to be very useful (say, for a copying GC). Can this code be vectorized (here, moved by words - say 128 bits instead of 8 bits at a time)? The compiler would have to deduce that dst and src do not alias in any way and that the regions they point to are independent. size may depend on user input, runtime behaviour or other elements, which makes the analysis practically impossible - we hit problems similar to the Halting Problem: in general we cannot analyse everything without running it. Or the function might be part of a C library (assuming shared libraries) and called by the program, so not all call sites are even known at compile time. Without such analysis, the program would exhibit different behaviour with optimization on. On the other hand, the programmer might ensure that they are different objects simply by knowing the (even higher-level) design, with no need for bottom-up analysis.
restrict can also be part of the documentation, since it might be that the programmer wrote the procedure in a way that cannot handle two aliasing pointers. For example, if we want to copy memory between overlapping regions, the above code is incorrect.
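For illustration, here is my annotation of the snippet above (I use the __restrict spelling, which GCC, Clang and MSVC accept in C++, since standard C++ has no restrict keyword and C99 spells it restrict):

#include <cstddef>   // size_t

void my_memcpy(void *__restrict dst, const void *__restrict src, size_t size) {
    // The programmer promises that dst and src never overlap, so the compiler
    // may copy in wide chunks / vectorize without any overlap check.
    for (size_t i = 0; i < size; i++) {
        ((char *)dst)[i] = ((const char *)src)[i];
    }
}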
So to sum up: a Sufficiently Smart Compiler would not be able to deduce restrict (unless we move to compilers that understand the meaning of the code) without knowing the whole program. Even then it would be close to undecidable. However, for local optimization the compilers are already sufficiently smart. My guess is that a Sufficiently Smart Compiler with whole-program analysis would nevertheless be able to deduce it in many interesting cases.
PS. By local I mean single function. So local optimization cannot assume anything about arguments, global variables etc.
One thing that hasn't been mentioned is that many non-x86 compilers aren't nearly as good at optimizing as gcc and other "modern" C-compilers are.
For instance, the compilers for PIC are absolutely terrible at optimizing. Also, the optimizer for cicc (the CUDA compiler), though much better, still seems to miss a lot of fairly simple optimizations.
For these cases, I've found optimization hints like register, inline, and #pragma unroll to be extremely useful.
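As a rough sketch of what such a hint looks like (my own example; the exact pragma spelling and whether it is honored depend on the compiler - this form is accepted by nvcc and clang, while GCC uses a different spelling):

float dot4(const float *a, const float *b) {
    float s = 0.0f;
    #pragma unroll               // ask for this fixed-count loop to be fully
    for (int i = 0; i < 4; ++i)  // unrolled instead of relying on a weak
        s += a[i] * b[i];        // optimizer to do it on its own
    return s;
}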
From what I saw back in the days when I was more involved with C/C++, these are merely orders given directly to the compiler. The compiler may try to inline a function even if it is not given a direct order to do so. That really depends on the compiler and may even raise some cross-compiler issues. As an example, Visual Studio provides different levels of optimization, which correspond to different levels of intelligence in the compiler. I have read that member functions defined inside the class body are implicitly inline, to give the compiler a hint to minimize function call overhead. In any case, these directives are extremely helpful when you are using a less intelligent compiler, while for an intelligent one the optimizations may already be obvious without them.
Also, you can be sure that these keywords are safe to use. Some compiler optimizations may not work with some libraries such as OpenGL (as I have seen myself). So in cases where you feel a compiler optimization may be harmful, you can use these keywords to make sure things are done the way you want.
Compilers such as g++ optimize code very well these days. You might do better to look for optimizations elsewhere, for example in the methods and algorithms you use, or by using TBB or CUDA to make your code parallel.
Does making a member function private allow the compiler to inline it, knowing that only functions in the same class can access it? Or is it only for the programmer's convenience?
The compiler can (but is not required to) optimize as you suggest, but that's not the point. The point of access modifiers is to catch certain classes (no pun intended) of programming errors at compile time. Private functions are functions which it would be a bug to call from outside the class, and you want to know about such a bug as early as possible.
(Any time you ask the question "could the compiler make optimizations based on this information available to it", the answer is "yes, unless there's a specific rule in the standard that says it's not allowed to" (such as the rules for volatile, whose entire purpose is to inhibit optimizations). However, compilers do not necessarily bother optimizing based on any given piece of information. There is, after all, no requirement for compilers to do any optimization in the first place! How clever your compiler is, nowadays, largely depends on how long you are willing to let it run; MSVC's whole-program PGO mode is capable of inlining through virtual method dispatch -- it guesses the most likely target, and falls back to a regular virtual call at runtime if the guess was wrong -- but slows down compiles by at least a factor of two.)
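A minimal sketch of the situation in the question (class and member names are mine): whether the helper is private or public makes no difference to the inlining decision here, because the compiler can see the body either way.

class Meter {
public:
    int reading() const { return scale(raw_); }   // calls the private helper
private:
    // Private or not, the body is visible in this translation unit, so the
    // compiler may inline it into reading(); the access level exists to catch
    // misuse at compile time, not to enable the optimization.
    int scale(int r) const { return r * 10; }
    int raw_ = 0;
};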
Access specifiers are part of the C++ mechanism for implementing the OOP principles of encapsulation and abstraction; they are not an optimization aid for compilers.
Some intelligent compiler may well implement some optimization based on them, but it is not required to do so by the C++ Standard. The purpose of access specifiers is not optimization, but to provide the language constructs for the principles the C++ language supports.
Code duplication is usually bad and often quite easy to spot. I suppose that compilers could detect it automatically in the easiest cases - they already parse the text and build an intermediate representation that they analyze in various ways - detecting suspicious patterns like uninitialized variables, optimizing the emitted code, etc. I guess they could often detect functionally duplicate code this way as well and account for it when emitting machine code.
Are there C++ compilers that can detect duplicate code and only emit corresponding machine code once instead of for each duplicate in the source text?
Some do, some don't.
From the LLVM optimizations page: -mergefunc (MergeFunctions pass, how it works)
The functions are separated into small blocks in the LLVM Intermediate Representation, and this optimization pass tries to merge similar blocks. It's not guaranteed to succeed, though.
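As a rough illustration (my own example; whether the pass actually merges these depends on the LLVM version and code generation details):

// Two helpers with byte-for-byte identical bodies: a MergeFunctions-style pass
// may emit the machine code once and turn the other symbol into an alias or a
// tiny thunk that jumps to the shared body.
int area(int width, int height) { return width * height; }
int cells(int rows, int cols)   { return rows * cols; }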
You'll find plenty of other optimizations on this page, even though some of them may appear cryptic at first glance.
I would add a note though, that duplicate code isn't so bad for the compiler / executable, it's bad from a maintenance point of view, and there is nothing a compiler can do about it.
I think the question makes the false assumption that compilers would always want to eliminate code duplication. Code duplication is bad for the readability/maintainability of the source code, not necessarily for the performance of the compiled code; indeed, one could consider loop unrolling as the compiler adding duplicate code to increase speed. Compiled code does not need to follow the same principles as source code and generally doesn't, since it is for the machine, not for humans, to read.
Generally, compilers are busy compiling, not transforming source code; of course, IDEs may allow both.
From what I know, code elimination does not usually happen across functions. So if you write the same piece of code in two different functions, there is very little chance (close to none) that it will be eliminated.
There are some optimizations, like return value optimization and function inlining, which can happen across functions. However, most optimization is done within the function itself. This is not usually done at the level of the high-level language; by this I mean that the compiler won't look at the C++ code and start optimizing it. Compilers mostly have an intermediate representation, between the high-level language (C++) and machine language. This intermediate representation (IR) is somewhat similar to machine language but is not exactly the machine language of the system on which the code is compiled. Refer to the wiki page http://en.wikipedia.org/wiki/Compiler_optimization, which lists some of those optimizations.
Visual C++ does this if you specify 'minimize code size' (/O1). The feature is described in the docs for /Og, which is deprecated in favour of the simpler catch-all options to favor size (/O1) or favor speed (/O2).
When can optimizations done by the compiler cause my C++ code to exhibit wrong behaviour which would not be present had those optimizations not been performed? For example, not using volatile in certain circumstances can cause the program to behave incorrectly (e.g. not re-reading the value of a variable from memory, instead reading it only once and keeping it in a register). But are there other pitfalls one should know about before turning on the most aggressive optimization flag and afterwards wondering why the program doesn't work anymore?
Compiler optimizations should not affect the observable behavior of your program, so in theory you don't need to worry. In practice, if your program strays into undefined behavior, anything can happen, so if your program breaks when you enable optimizations, you've merely exposed existing bugs - it wasn't the optimization that broke it.
One common optimization point is the return value optimization (RVO) and the named return value optimization (NRVO), which basically mean that objects returned by value from functions get constructed directly in the object which is receiving them, rather than making a copy. This adjusts the order and number of constructor, copy constructor and destructor calls - but usually, with those functions correctly written, there's still no observable difference in the behavior.
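A small sketch of that adjustment (my own example): with (N)RVO the copy below can be constructed directly in the receiving object, so a side effect in the copy constructor may simply never happen.

#include <cstdio>

struct Tracer {
    Tracer()              { std::puts("construct"); }
    Tracer(const Tracer&) { std::puts("copy"); }   // may never run: the copy
};                                                 // can be elided by (N)RVO

Tracer make() {
    Tracer t;                 // NRVO: t may be built directly in the caller's object
    return t;
}

int main() {
    Tracer result = make();   // typically prints only "construct"
    (void)result;
}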
Besides the case you mentioned, timing can change in multi-threaded code such that what appears to be working no longer does. Placement of local variables can vary such that harmful behaviour like a memory buffer overrun occurs in debug but not release, optimized or non-optimized, or vice versa. But all of these are bugs that were there already, just exposed by compiler option changes.
This is assuming the compiler has no bugs in its optimizer.
I've only run into it with floating point math. Sometimes the optimizations for speed can change the answer a little. Of course with floating point math, the definition of "right" is not always easy to come up with so you have to run some tests and see if the optimizations are doing what you're expecting. The optimizations don't necessarily make the result wrong, just different.
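A sketch of the kind of difference meant here (my own example; the behavior depends on the compiler and on fast-math style flags): reassociating a sum changes rounding, so the "optimized" answer is a little different rather than outright wrong.

#include <cstdio>

int main() {
    float a = 1e8f, b = -1e8f, c = 1.0f;
    float strict  = (a + b) + c;   // (1e8 + -1e8) + 1  ==  1
    float relaxed = a + (b + c);   // 1e8 + (-1e8 + 1)  ==  0, because 1.0f is
                                   // lost against -1e8f in float precision
    std::printf("%g vs %g\n", strict, relaxed);   // prints "1 vs 0"
}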
Other than that, I've never seen any optimizations break correct code. Compiler writers are pretty smart and know what they're doing.
Bugs caused by compiler optimizations that are not rooted in bugs in your own code are unpredictable and hard to pin down (I managed to find one once, when examining the assembly code the compiler had created while optimizing a certain area of my code). The common case is that if an optimization makes your program unstable, it just reveals a flaw in your program.
Just don't work from the assumption that the optimizer destroys your code. That's just not what it was made to do. If you do observe problems, automatically suspect unintentional UB first.
Yes, threading can play havoc with the kind of assumptions you are used to. You get no help from either the language or the compiler, although that's changing. What you do about that is not piss around with volatile, you use a good threading library. And you use one of its synchronization primitives wherever two or more threads can both touch variables. Trying to take short-cuts or optimizing this yourself is a one-way ticket into threading hell.
Failing to include the volatile keyword when declaring access to a volatile memory location or IO device is a bug in your code; even if the bug is only evident when your code gets optimized.
Your compiler will document any "unsafe" optimizations where it documents the command-line switches and pragmas that turn them on and off. Unsafe optimizations usually relate to assumptions about floating-point math (rounding, edge cases like NaN) or aliasing, as others have already mentioned.
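For instance, a generic sketch of the volatile case (my example; the device address is made up):

// Hypothetical memory-mapped status register at a made-up address.
// Without volatile the optimizer may read *STATUS once, hoist the load out of
// the loop and spin forever on a stale value; volatile forces every read.
volatile unsigned *const STATUS = reinterpret_cast<volatile unsigned *>(0x40001000);

void wait_until_ready() {
    while ((*STATUS & 0x1u) == 0) {
        // busy-wait until the device sets the ready bit
    }
}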
Constant folding can introduce aliasing, making bugs in your code appear. For example, if you have code like:
static char *caBuffer = "        ";  /* a literal of several spaces */
...
strcpy(caBuffer, ...)
Your code basically has an error where you scribble over a constant (literal). Without constant folding, the error won't really affect anything. But much like the volatile bug you mentioned, when your compiler folds constants to save space, you might scribble over another literal, like the spaces in:
printf("%s%s%s",cpName," ",cpDescription);
because the compiler might point the literal argument to the printf call at the last 4 characters of the literal used to initialize caBuffer.
As long as your code does not rely on specific manifestations of undefined/unspecified behavior, and as long as the functionality of your code is defined in terms of the observable behavior of a C++ program, C++ compiler optimizations cannot possibly destroy the functionality of your code, with only one exception:
When a temporary object is created with the only purpose of being immediately copied and destroyed, the compiler is allowed to eliminate the creation of such temporary object even if the constructor/destructor of the object has side-effects affecting the observable behavior of the program.
In newer versions of the C++ standard that permission is extended to cover named objects, in the so-called Named Return Value Optimization (NRVO).
That's the only way the optimizations can destroy the functionality of conforming C++ code. If your code suffers from optimizations in any other way, it is either a bug in your code or a bug in the compiler.
One can argue though, that relying on this behavior is actually nothing else than relying on a specific manifestation of unspecified behavior. This is a valid argument, which can be used to support the assertion that under the above conditions optimizations can never break the functionality of the program.
Your original example with volatile is not a valid example. You are basically accusing the compiler of breaking guarantees that never existed in the first place. If your question should be interpreted in that specific way (i.e. what random, fake, non-existent, imaginary guarantees could the optimizer possibly break), then the number of possible answers is virtually infinite. The question simply wouldn't make much sense.
Strict aliasing is an issue you might run into with gcc. From what I understand, with certain versions of gcc (gcc 4.4) it gets automatically enabled with optimizations. This site http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html does a very good job at explaining strict aliasing rules.
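A common shape of code that runs into this (my own example): type-punning through a pointer cast violates the strict aliasing rules, and with the optimization enabled the compiler may assume the two types never alias.

#include <cstring>

float pun_bad(unsigned bits) {
    // Violates strict aliasing: an unsigned object is read through a float
    // lvalue. It may appear to work at -O0 and misbehave once the optimizer
    // starts exploiting the no-alias assumption.
    return *reinterpret_cast<float *>(&bits);
}

float pun_ok(unsigned bits) {
    // The well-defined alternative: copy the bytes.
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}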
I just recently saw that (in C++0x) the compiler is allowed to assume that certain classes of loops will always terminate (to allow optimizations). I can't find the reference right now but I'll try to link it if I can track it down. This can cause observable program changes.
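This likely refers to the C++11 forward-progress rule, which lets the implementation assume that any loop free of I/O, volatile accesses, atomics and synchronization will terminate. A hedged sketch of the consequence (my own example):

// No I/O, no volatile access, no atomics or synchronization: under the C++11
// forward-progress assumption the compiler may treat this loop as terminating,
// and in practice may delete it, so the function can simply fall straight
// through instead of spinning forever.
void spin_forever() {
    for (unsigned i = 0;; ++i) {
        // nothing observable happens here
    }
}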
At a meta level, if your code relies on behavior that is based on undefined aspects of the C++ standard, a standards-conforming compiler is free to destroy your C++ code (as you put it). If you don't have a standards-conforming compiler, then it can also do non-standard things, like destroying your code anyway.
Most compilers publish what subset of the C++ standard they conform to, so you can always write your code to that particular standard and mostly assume you are safe. However, you can't really guard against bugs in the compiler without having encountered them in the first place, so you still aren't really guaranteed anything.
I do not have the exact details (maybe someone else can chime in), but I have heard tell of a bug caused by loop unrolling/optimization when the loop counter variable is of char/uint8_t type (in a gcc context, that is).