Can C++ compilers automatically eliminate duplicate code? - c++

Code duplication is usually bad and often quite easy to spot. I suppose compilers could detect it automatically in the easiest cases: they already parse the text and build an intermediate representation that they analyze in various ways, for example to detect suspicious patterns like uninitialized variables or to optimize the emitted code. I guess they could often detect functionally duplicate code this way as well and account for it while emitting machine code.
Are there C++ compilers that can detect duplicate code and only emit corresponding machine code once instead of for each duplicate in the source text?

Some do, some don't.
From the LLVM optimization passes page: -mergefunc (the MergeFunctions pass; the page also explains how it works).
LLVM lowers each function into basic blocks in its Intermediate Representation, and this pass compares functions and tries to merge those it finds equivalent. It is not guaranteed to succeed, though.
You'll find plenty of other optimizations on that page, even though some of them may appear cryptic at first glance.
I would add a note though, that duplicate code isn't so bad for the compiler / executable, it's bad from a maintenance point of view, and there is nothing a compiler can do about it.
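To make the discussion concrete, here is a minimal sketch of the kind of duplication a function-merging pass could target. The function names are hypothetical and chosen only for illustration; whether a given toolchain actually emits the body once depends on the compiler, the linker and the flags used.

```cpp
#include <cstddef>

// Two hypothetical functions with different names but identical bodies.
// A function-merging pass (or linker-level identical code folding) may
// emit the machine code only once; this is an illustration, not a
// guarantee about any particular compiler.
int total_price(const int* items, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += items[i];
    }
    return sum;
}

int total_weight(const int* items, std::size_t count) {
    int sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sum += items[i];
    }
    return sum;
}
```

From a maintenance point of view the duplication is still there in the source, whatever the compiler does with it.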

I think the question makes the false assumption that compilers would always want to eliminate code duplication. Code duplication is bad for the readability and maintainability of source code, not necessarily for the performance of compiled code; indeed, one could consider loop unrolling as the compiler adding duplicate code to increase speed. Compiled code does not need to follow the same principles as source code, and generally doesn't, as it is for the machine, not for humans, to read.
Generally, compilers are busy compiling, not transforming source code; of course, IDEs may allow both.
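As a rough illustration of the loop unrolling point, the sketch below writes out by hand what unrolling by four looks like. The function names are hypothetical, and a real compiler performs this on its internal representation rather than on source code.

```cpp
#include <cstddef>

// Straightforward loop, as a programmer would normally write it.
int sum_simple(const int* data, std::size_t n) {
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}

// Hand-written sketch of unrolling by four: the loop body is duplicated
// to reduce loop overhead, and a tail loop handles the leftover elements.
int sum_unrolled(const int* data, std::size_t n) {
    int sum = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += data[i];
        sum += data[i + 1];
        sum += data[i + 2];
        sum += data[i + 3];
    }
    for (; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Both functions compute the same result; the second simply contains more (duplicated) code, which is exactly what nobody would want to maintain by hand.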

To my knowledge, this kind of code elimination does not usually happen across functions. So if you write a duplicate piece of code in two different functions, there is very little chance (close to none) that it will be eliminated.
Some optimizations, like return value optimization and function inlining, can happen across functions. However, most optimization is done within a single function. It is not usually done at the level of the high-level language; by this I mean that the compiler won't look at the C++ code and start optimizing it. Compilers mostly have an intermediate representation (IR) between the high-level language (C++) and machine language. This IR is somewhat similar to machine language but is not exactly the machine language of the system on which the code is compiled. Refer to the wiki page http://en.wikipedia.org/wiki/Compiler_optimization; it lists some of those optimizations.

Visual C++ does this if you specify 'minimize code size' (/O1). The function provided is described in the docs for /Og, which is deprecated in favour of simpler catch-all options to favor size or favor speed (/O2).

Related

What does an optimizer optimize c++ or assembly

Do optimizers (generally speaking here) take my c/c++ code and write better c/c++ code or do they translate it straight into assembly and then optimize that. Or is it a combo?
EDIT:
I am using gcc (but I would like to know what others do also)
Optimizers can be at different levels, but usually they won't generate new readable code (although sometimes this happens with other languages, like JavaScript for example.)
GCC generates an intermediate representation:
http://www.tldp.org/HOWTO/GCC-Frontend-HOWTO-4.html
Optimizations are then applied to this structure. See more here, for example:
https://gcc.gnu.org/onlinedocs/gccint/Tree-SSA.html
From there, the backend translates it to final machine code (although I believe this part also involves optimizations, as well.)
"Do optimizers ..."
Well, optimizers (or rather, optimization strategies) come with particular compiler implementations.
There's no general answer to your question.
"and write better c/c++ code or do they translate it straight into assembly"
No, their job is to optimize the backend code, which might be target assembly or some intermediate machine code. Thus there is no intermediate optimized C++ code to be expected.
Optimizers don't rewrite C/C++ code.
The compiler does a lexical analysis, then a semantic analysis, using some kind of internal graph representation of your code. The optimizer first works on this internal representation to identify and optimize the flow of execution (for example, constant propagation).
Once code generation can start, the optimizer intervenes again to make machine-dependent optimizations (register allocation, special instruction sets such as Intel's MMX, etc.).
Only at the end does it generate assembler code.
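Constant propagation, mentioned above, can be pictured with a small before/after pair. This is only a sketch with hypothetical function names: the "after" version is written as C++ for readability, whereas a real optimizer performs the rewrite on its internal representation and never produces such source.

```cpp
// What the programmer writes: every operand is known at compile time.
int buffer_bytes_written() {
    const int width = 640;
    const int height = 480;
    const int bytes_per_pixel = 4;
    return width * height * bytes_per_pixel;
}

// What the optimizer can conceptually reduce it to after propagating the
// constants and folding the arithmetic (640 * 480 * 4).
int buffer_bytes_folded() {
    return 1228800;
}
```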

Can merely using (stable) third party library render my code not working

Say I have a C++ project which has been working well for years.
Say also that this project might (I need to verify) contain undefined behaviour.
So maybe the compiler was kind to us and didn't make the program misbehave even though there is UB.
Now imagine I want to add some features to the project, e.g. add the Crypto++ library to it.
But the actual code I add to it, say from Crypto++, is legitimate.
Here I read:
Your code, if part of a larger project, could conditionally call some
3rd party code (say, a shell extension that previews an image type in
a file open dialog) that changes the state of some flags (floating
point precision, locale, integer overflow flags, division by zero
behavior, etc). Your code, which worked fine before, now exhibits
completely different behavior.
But I can't gauge exactly what the author means. Is he saying that merely by adding, say, the Crypto++ library to my project, even though the Crypto++ code I add is legitimate, my project can suddenly start working incorrectly?
Is this realistic?
Are there any links which can confirm this?
It is hard for me to explain to the people involved that just adding a library might increase risk. Maybe someone can help me formulate how to explain this?
When source code invokes undefined behaviour, the standard gives no guarantee about what will happen. It can work perfectly in one compilation run, but simply compiling it again with a newer version of the compiler or of a library could make it break. Changing the optimisation level of the compiler can have the same effect.
A common example is reading one element past the end of an array. Suppose you expect it to be null, and by chance the next memory location contains a 0 under normal conditions (say it is an error flag). It will work without problems. But suppose that on another compilation run, after changing something totally unrelated, the memory layout changes slightly and the location after the array is no longer that flag (which kept a constant value) but a variable taking other values. Your program will break and will be hard to debug, because if that variable is used as a pointer, you could overwrite memory at random places.
TL/DR: If one version works but you suspect UB in it, the only correct way is to systematically remove all possible UB from the code before making any change. Alternatively, you can keep the working version untouched, but beware: you may have to change it later...
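To illustrate the "one element past the end" case, here is a minimal sketch using a hypothetical lookup function. The faulty loop condition is shown only in a comment, since actually running it would be undefined behaviour; the code itself is the corrected form.

```cpp
#include <array>
#include <cstddef>

// Hypothetical lookup that used to scan one element too far:
//     for (std::size_t i = 0; i <= values.size(); ++i)  // reads values[size()]: UB
// Such a loop may appear to work for years if the next memory location
// happens to hold a harmless value. The corrected loop stays in bounds.
bool contains(const std::array<int, 4>& values, int wanted) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] == wanted) {
            return true;
        }
    }
    return false;
}
```

Tools such as the -fsanitize=address and -fsanitize=undefined options of GCC and Clang can help find this kind of access before a layout change exposes it.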
Over the years, C has mutated into a weird hybrid of a low-level language and a high-level language, where code provides a low-level description of a way of performing a task, and modern compilers then try to convert that into a high-level description of what the task is and then implement efficient code to perform that task (possibly in a way very different from what was specified). In order to facilitate the translation from the low-level sequence of steps into the higher-level description of the operations being performed, the compiler needs to make certain assumptions about the conditions under which those low-level steps will be performed. If those assumptions do not hold, the compiler may generate code which malfunctions in very weird and bizarre ways.
Complicating the situation is the fact that there are many common programming constructs which might be legal if certain parts of the rules were a little better thought-out, but which as the rules are written would authorize compilers to do anything they want. Identifying all the places where code does things which arguably should be legal, and which have historically worked correctly 99.999% of the time, but might break for arbitrary reasons can be very difficult.
Thus, one may wish for the addition of a new library not to break anything, and most of the time one's wish might come true, but unfortunately it's very difficult to know whether any code may have lurking time bombs within it.

Are the Optimization Keywords in C and C++ Reasonable?

So we've all heard the don't-use-register line, the reasoning being that trying to out-optimize a compiler is a fool's errand.
register, from what I know, doesn't actually state anything about CPU registers, just that a given variable can't be referenced indirectly. I'll hazard a guess that it's often referred to as obsolete because compilers can detect a lack of addressing automatically thus making such optimizations transparent.
But if we're firm on that argument, can't it be levelled at every optimization-driven keyword in C? Why do we use inline and C99's restrict for example?
I suppose that some things like aliasing make deducing some optimizations hard or even impossible, so where is the line drawn before we start venturing into Sufficiently Smart Compiler territory?
Where should the line be drawn in C and C++ between spoon-feeding a compiler optimization information and assuming it knows what it's doing?
EDIT: Jens Gustedt pointed out that my conflating of C and C++ isn't right since two of the keywords have semantic differences and one doesn't exist in standard C++. I had a good link about register in C++ which I'll add if I find it...
I would agree that register and inline are somewhat similar in this respect. If the compiler can see the body of the callee while compiling a call site, it should be able to make a good decision on inlining. The use of the inline keyword in both C and C++ has more to do with the mechanics of making the body of the function visible than with anything else.
restrict, however, is different. When compiling a function, the compiler has no idea of what the call sites are going to be. Being able to assume no aliasing can enable optimizations that would otherwise be impossible.
inline is used in the scenario where you implement a non-templated function within the header then include it from multiple compilation units.
This ensures that the compiler should create just one instance of the function as though it were inlined, so you do not get a link error for multiply defined symbol. It does not however require the compiler to actually inline it.
GCC does have a way to force inlining, __attribute__((always_inline)), but that is a language extension.
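The header scenario described above can be sketched as follows. The header name and the function are hypothetical; the point is only that the definition lives in the header and is marked inline.

```cpp
// Contents of a hypothetical header, e.g. "math_utils.h".
#ifndef MATH_UTILS_H
#define MATH_UTILS_H

// Defined in the header, so every translation unit that includes it sees
// the body. `inline` tells the linker that the repeated definitions are
// the same function; without it, including this header from two .cpp
// files would typically produce a "multiple definition" link error.
// The keyword does not oblige the compiler to inline any call.
inline int square(int x) {
    return x * x;
}

#endif  // MATH_UTILS_H
```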
register doesn't even say that you can't reference the
variable indirectly (at least in C++). It said that in the
original C, but that has been dropped.
Whether trying to out-optimize the compiler is a fool's errand
depends on the optimization. Not many compilers, for example,
will convert sin(x) * sin(x) + cos(x) * cos(x) into 1.
Today, most compilers ignore register, and no one uses it,
because compilers have become good enough at register allocation
to do a better job than you can with register. In fact,
respecting register would typically make the generated code
slower. This is not the case for inline or restrict: in
both cases, there exist techniques, at least theoretically,
which could result in the compiler doing a better job than you
can. Such techniques are not widespread, however, and (as far
as I know, at least), have a very high compile time overhead,
with in some cases compile times which grow exponentially with
the size of the program (which makes them more or less unusable
on most real programs—compile times which are measured in
years really aren't acceptable).
As to where to draw the line... it changes in time. When
I first started programming in C, register made a significant
difference, and was widely used. Today, no. I imagine that in
time, the same may happen with inline or restrict—some
experimental compilers are very close with inline already.
This is a flame-bait question but I will dive in anyway.
Compilers are a lot better at optimising than your average programmer. There was a time when I programmed on a 25MHz 68030 and got some advantage from the use of register because the compiler's optimizer was so poor. But that was back in 1990.
I see inline as just as bad as register.
In general, measure first before you modify. If you find that your code performs so poorly that you want to use register or inline, take a deep breath, stand back and look for a better algorithm first.
In recent times (i.e. the last 5 years) I have gone through code bases and removed inline functions galore with no perceptible change in performance being visible. Code size, however, always benefits from the removal of inline methods. That isn't a big issue for your standard x86-style monster multicore marvel of the modern age but it does matter if you work in the embedded space.
It is a moving target, because compiler technology is improving. (Well, sometimes it is more changing than improving, but that has some of the same effect of rendering your optimization attempts moot, or worse.)
Generally, you should not guess at whether an optimization keyword or other optimization technique is good or not. One has to learn quite a bit about how computers work, including the particular platform you are targeting, and how compilers work.
So a rule about using various optimization techniques is to ask do I know the compiler will not do the best job here? Am I willing to commit to that for a while—will the compiler remain stable while this code is in use, am I willing to rewrite the code when the compiler changes this situation? Typically, you have to be an experienced and knowledgeable software engineer to know when you can do better than the compiler. It also helps if you can talk to the compiler developers.
This means people cannot give you an answer here that has a definite guideline. It depends on what compiler you are using, what your project is, what your resources are, and what your goals are, and so on.
Although some people say not to try to out-optimize the compiler, there are various areas of software engineering where people do better than a compiler and in which it is worth the expense of paying people for this.
The difference is as follows:
register is a very local optimization (i.e. inside one function). Register allocation is a relatively solved problem, thanks both to smarter compilers and to larger numbers of registers (mostly the former, but x86-64, say, has more registers than x86, and both have more than, say, an 8-bit processor).
inline is harder as it is an inter-procedural optimization. However, as it involves a relatively small depth of recursion and a small number of procedures (if the inlined procedure is too big there is no sense in inlining it), it may safely be left to the compiler.
restrict is much harder. To fully know that two pointers don't alias you would need to analyse the whole program (including libraries, system, plug-ins, etc.), and even then you would run into problems. However, the information is clearer to the programmer AND it is part of the specification.
Consider very simple code:
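#include <stddef.h>  /* size_t */

/* Naive forward byte copy; callers are expected to pass non-overlapping regions. */
void my_memcpy(void *dst, const void *src, size_t size) {
    for (size_t i = 0; i < size; i++) {
        ((char *)dst)[i] = ((const char *)src)[i];
    }
}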
Is there a benefit to making this code efficient? Yes: memcpy tends to be very useful (say, for a copying GC). Can this code be vectorized (here, moved by larger words, say 128 bits at a time instead of 8)? The compiler would have to deduce that dst and src do not alias in any way and that the regions they point to are independent. size may depend on user input, runtime behaviour or other elements, which makes the analysis practically impossible; it is similar to the Halting Problem in that, in general, we cannot analyse everything without running it. Or the function might be part of the C library (I assume shared libraries) and be called by other programs, so not all call sites are even known at compile time. If the compiler vectorized without such analysis, the program could exhibit different behaviour with optimization on. On the other hand, the programmer can ensure that they are different objects simply by knowing the (even higher-level) design, with no need for bottom-up analysis.
restrict can also serve as documentation, since the programmer may have written the procedure in a way that cannot handle two aliasing pointers. For example, if we want to copy memory between aliasing locations, the above code is incorrect.
So to sum up: a Sufficiently Smart Compiler would not be able to deduce restrict (unless we move to compilers understanding the meaning of code) without knowing the whole program, and even then it would be close to undecidable. For local optimization, however, compilers are already sufficiently smart. My guess is that a Sufficiently Smart Compiler with whole-program analysis would nonetheless be able to deduce it in many interesting cases.
PS. By local I mean a single function. So local optimization cannot assume anything about arguments, global variables, etc.
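For comparison with my_memcpy above, here is a sketch of the same loop with the no-alias promise spelled out. Standard C++ has no restrict keyword, so this assumes the __restrict extension accepted by GCC, Clang and MSVC; the function name is hypothetical.

```cpp
#include <cstddef>

// `__restrict` is a compiler extension, not standard C++. The qualifier is
// a promise from the caller that dst and src do not overlap, which is the
// information the compiler cannot deduce on its own. Breaking the promise
// is undefined behaviour.
void copy_bytes(char* __restrict dst, const char* __restrict src,
                std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        dst[i] = src[i];
    }
}
```

The qualifier also acts as documentation: a reader of the signature learns that overlapping regions are not supported.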
One thing that hasn't been mentioned is that many non-x86 compilers aren't nearly as good at optimizing as gcc and other "modern" C-compilers are.
For instance, the compilers for PIC are absolutely terrible at optimizing. Also, the optimizer for cicc (the CUDA compiler), though much better, still seems to miss a lot of fairly simple optimizations.
For these cases, I've found optimization hints like register, inline, and #pragma unroll to be extremely useful.
From what I saw back in the days when I was more involved with C/C++, these keywords are merely requests given directly to the compiler. The compiler may try to inline a function even if it is not asked to do so. That really depends on the compiler and may even raise some cross-compiler issues. As an example, Visual Studio provides different levels of optimization which correspond to different levels of intelligence in the compiler. I have read that member functions defined inside the class body are implicitly inline, which hints to the compiler to minimize function call overhead. In any case, these directives are extremely helpful when you are using a less intelligent compiler, while with an intelligent one the optimization may already be obvious to the compiler.
Also, you can be sure that these keywords are safe to use. Some compiler optimizations may not work with some libraries such as OpenGL (as I have seen myself). So in cases where you feel that compiler optimization may be harmful, you can use these keywords to make sure it is done the way you want.
Compilers such as g++ optimize code very well these days. You might as well look for optimizations elsewhere, maybe in the methods and algorithms you use, or by using TBB or CUDA to make your code parallel.

Get optimized source code from GCC

I have a task to create optimized C++ source code and give it to friend for compilation. It means, that I do not control the final compilation, I just write the source code of C++ program.
I know that I can enable optimization during compilation with the -O1 (and -O2 and other) options of GCC. But how can I get this optimized source code instead of a compiled program? I am not able to configure the parameters of my friend's compiler, which is why I need to produce good source on my side.
The optimizations performed by GCC are low level; that means you won't get C++ code back, only assembly code in the best case, and you won't be able to convert that back into C++.
In sum: optimize the source code at the code level, not at the object level.
You could ask GCC to dump its internal (Gimple, ...) representations, at various "stages". The middle-end of GCC is made of hundreds of passes, and you could ask GCC to dump them, with arguments like -fdump-tree-all or -fdump-gimple-all; beware that you can get hundreds of dump files for a single compilation!
However, GCC internal representations are quite low level, and you should not expect to understand them without reading a lot of material.
The dump options I am mentioning are mostly useful to those working inside GCC, or extending it through plugins coded in C or extensions coded in MELT (a high-level domain-specific language to extend GCC). I am not sure they will be very useful to your friend. However, they can help you understand that optimization passes do a lot of complex processing.
And don't forget that premature optimization is evil: you should first make your program run correctly, then benchmark and profile it, and only then optimize the few parts worth the effort. You probably won't be able to write correct and efficient programs without testing and running them yourself before giving them to your friend.
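For anyone who wants to look at what GCC produces, here is a small sketch: a hypothetical file plus two commands, given as comments. The file name is an assumption; -S and -fdump-tree-optimized are existing GCC options, and the exact names of the generated dump files vary between GCC versions.

```cpp
// example.cpp: a hypothetical file used only to have something to inspect.
//
// Commands one could try:
//   g++ -O2 -S example.cpp
//       writes the generated assembly to example.s
//   g++ -O2 -c -fdump-tree-optimized example.cpp
//       writes a GIMPLE dump to a file whose name ends in ".optimized"
//
// Neither output is C++ source that could be handed to another compiler.
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```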
Easy - choose the best algorithm possible, let the rest be handled by the optimizer.
Optimizing the source code is different from optimizing the binary. You optimize the source code; the compiler will optimize the binary.
For anything more than algorithm choice, you'll need to do some profiling. Sure, there are practices that can speed up code, but some make the code less readable. Only optimize when you have to, and after you measure.

Runtime optimization of static languages: JIT for C++?

Is anyone using JIT tricks to improve the runtime performance of statically compiled languages such as C++? It seems like hotspot analysis and branch prediction based on observations made during runtime could improve the performance of any code, but maybe there's some fundamental strategic reason why making such observations and implementing changes during runtime are only possible in virtual machines. I distinctly recall overhearing C++ compiler writers mutter "you can do that for programs written in C++ too" while listening to dynamic language enthusiasts talk about collecting statistics and rearranging code, but my web searches for evidence to support this memory have come up dry.
Profile guided optimization is different than runtime optimization. The optimization is still done offline, based on profiling information, but once the binary is shipped there is no ongoing optimization, so if the usage patterns of the profile-guided optimization phase don't accurately reflect real-world usage then the results will be imperfect, and the program also won't adapt to different usage patterns.
You may be interested in looking for information on HP's Dynamo. That system focused on native binary -> native binary translation, but since C++ is almost exclusively compiled to native code, I suppose that's exactly what you are looking for.
You may also want to take a look at LLVM, a compiler framework and intermediate representation that supports JIT compilation and runtime optimization, although I'm not sure whether there are any LLVM-based runtimes yet that can compile C++, execute it and optimize it at runtime.
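The offline profile-guided flow described at the start of this answer can be sketched with GCC as follows. The function, file names and input are hypothetical; -fprofile-generate and -fprofile-use are existing GCC options.

```cpp
// Hypothetical hot function whose branch bias is only known from real runs.
//
// Offline profile-guided flow with GCC:
//   g++ -O2 -fprofile-generate app.cpp -o app   (instrumented build)
//   ./app typical-input                         (the run writes profile data)
//   g++ -O2 -fprofile-use app.cpp -o app        (rebuild using the profile)
//
// After the last step the binary is fixed; it does not adapt at run time.
int count_large(const int* values, int n, int threshold) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (values[i] > threshold) {  // the profile records how often this holds
            ++count;
        }
    }
    return count;
}
```

If the training input is not representative of real usage, the rebuilt binary is tuned for the wrong pattern, which is the limitation pointed out above.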
I have done that kind of optimization quite a lot in recent years. It was for a graphics rendering API that I implemented. Since the API defined several thousand different drawing modes, a general-purpose function was way too slow.
I ended up writing my own little JIT compiler for a domain-specific language (very close to asm, but with some high-level control structures and local variables thrown in).
The performance improvement I got was between factor 10 and 60 (depended on the complexity of the compiled code), so the extra work paid off big time.
On the PC I would not start to write my own jit-compiler but use either LIBJIT or LLVM for the jit-compilation. It wasn't possible in my case due to the fact that I was working on a non mainstream embedded processor that is not supported by LIBJIT/LLVM, so I had to invent my own.
The likely answer: no one has done more than PGO for C++ because the benefits would probably be unnoticeable.
Let me elaborate: JIT engines/runtimes have both blessings and drawbacks from their developers' point of view: they have more information at runtime but much less time to analyze it.
Some optimizations are really expensive, and a JIT is unlikely to apply them without a huge impact on start-up time: loop unrolling, auto-vectorization (which in most cases is also based on loop unrolling), and instruction selection (to use SSE4.1 on CPUs that support SSE4.1) combined with instruction scheduling and reordering (to make better use of superscalar CPUs). These kinds of optimizations combine very well with C-like code (which is accessible from C++).
The only full-blown compiler architecture that does advanced compilation (as far as I know) is Java HotSpot, along with architectures built on similar principles using tiered compilation (Azul's Java systems, and the JaegerMonkey JS engine, popular to this day).
But one of the biggest optimizations at runtime is the following:
Polymorphic inline caching: if you run a loop the first time with some types, then the second time the code of the loop will be specialized for the types seen in the previous run. The JIT puts in a guard and makes the inlined types the default branch. From this specialized form, an SSA-based engine applies constant folding/propagation, inlining and dead-code elimination, and, depending on how "advanced" the JIT is, does a more or less refined CPU register assignment.
As you may notice, a JIT (HotSpot) mostly improves branchy code, and with runtime information it can do better than C++ code there; but a static compiler, which has the time to do analysis and instruction reordering, will likely get slightly better performance on simple loops. Also, the areas of C++ code that need to be fast typically tend not to be OOP, so the information behind the JIT optimizations would not bring such an amazing improvement.
Another advantage of JITs is that a JIT works across assemblies, so it has more information if it wants to do inlining.
Let me elaborate: let's say that you have a base class A with just one implementation, B, which lives in another package/assembly/gem/etc. and is loaded dynamically.
When the JIT sees that B is the only implementation of A, it can replace calls to A with calls to B's code everywhere in its internal representation, so the method calls no longer need a dispatch (a vtable lookup) but become direct calls. Those direct calls may also be inlined. For example, if B has a method getLength() which returns 2, all calls of getLength() may be reduced to the constant 2 everywhere. C++ code, in the end, will not be able to skip the virtual call to a B that comes from another DLL.
Some implementations of C++ do not support optimizing across multiple .cpp files (although recent versions of GCC have the -flto flag, which makes this possible). But if you are a C++ developer concerned about speed, you will likely put all the sensitive classes in the same static library or even in the same file so the compiler can inline them nicely. The extra information that a JIT has by design is then provided by the developer, so there is no performance loss.
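The A/B example above can be sketched in C++ as follows. The helper function names are hypothetical, and marking B as final is one way (an assumption on my part, not something the answer prescribes) for the developer to hand the compiler the "only implementation" information statically. Whether a given compiler actually devirtualizes or inlines the call is up to its optimizer.

```cpp
// Interface with a single implementation, mirroring the A/B example.
class A {
public:
    virtual ~A() = default;
    virtual int getLength() const = 0;
};

// `final` tells the compiler that nothing can override B::getLength, so a
// call through a B reference can be resolved statically and even inlined.
class B final : public A {
public:
    int getLength() const override { return 2; }
};

// Call through the base: in general this needs a vtable lookup.
int length_via_base(const A& a) { return a.getLength(); }

// Call through the final class: a candidate for a direct, inlinable call.
int length_via_final(const B& b) { return b.getLength(); }
```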
Visual Studio has an option for doing runtime profiling that can then be used to optimize the code.
"Profile Guided Optimization"
Microsoft Visual Studio calls this "profile guided optimization"; you can learn more about it at MSDN. Basically, you run the program a bunch of times with a profiler attached to record its hotspots and other performance characteristics, and then you can feed the profiler's output into the compiler to get appropriate optimizations.
I believe LLVM attempts to do some of this. It attempts to optimize across the whole lifetime of the program (compile-time, link-time, and run-time).
Reasonable question - but with a doubtful premise.
As in Nils' answer, sometimes "optimization" means "low-level optimization", which is a nice subject in its own right.
However, it is based on the concept of a "hot-spot", which has nowhere near the relevance it is commonly given.
Definition: a hot-spot is a small region of code where a process's program counter spends a large percentage of its time.
If there is a hot-spot, such as a tight inner loop occupying a lot of time, it is worth trying to optimize at the low level, if it is in code that you control (i.e. not in a third-party library).
Now suppose that inner loop contains a call to a function, any function. Now the program counter is not likely to be found there, because it is more likely to be in the function. So while the code may be wasteful, it is no longer a hot-spot.
There are many common ways to make software slow, of which hot-spots are one. However, in my experience, that is the only one of which most programmers are aware, and the only one to which low-level optimization applies.
See this.