-fno-inline and compilation time - c++

I'm working on a big project where most files are longer than 7000 lines. If I use the -fno-inline option, compilation time goes down by a factor of three.
Actual numbers:
without -fno-inline: 340 sec
with -fno-inline: ~115 sec
I didn't find anything about the impact of -fno-inline on compilation performance. Is there any explanation for this?
Some background:
I use macros pretty extensively (for logging purposes)
There is one global exception try/catch block inherited from old code (I need to rework this piece)
There are a few try/catch blocks inside, mostly to catch exceptions from stof/stoi
I tested compilation time with and without -pipe, -O0 to -O3, -g / no -g, and -ggdb / no -ggdb. Nothing brings compilation time down as much as -fno-inline.
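For illustration, the logging macros are roughly of this shape (a simplified, hypothetical sketch, not the actual code):
#include <iostream>
#include <sstream>

// Hypothetical logging macro: every use expands into stream formatting code,
// which drags in many small inline functions from <sstream>/<iostream>.
#define LOG(level, msg)                                                  \
    do {                                                                 \
        std::ostringstream oss_;                                         \
        oss_ << "[" << (level) << "] " << __FILE__ << ":" << __LINE__    \
             << " " << msg;                                              \
        std::cerr << oss_.str() << '\n';                                 \
    } while (0)

void frobnicate(int x) {
    LOG("debug", "frobnicate called with x=" << x);  // thousands of such call sites
}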

I'm working on a big project; most files are longer than 7000 lines.
That is quite big. You might (I am not sure) win a bit of compilation time by avoiding files bigger than 5KLOC (splitting C++ files larger than 8KLOC into several smaller ones), and by compiling several translation units in parallel (using make -j or ninja). This requires some refactoring work. On the other hand, with genuine C++, don't make your files too small either (because standard container headers like <vector> can pull in many thousands of lines; you might also consider a precompiled header). Pragmatically, 3KLOC to 7KLOC per C++ source file is a nice trade-off in practice.
Use the -ftime-report option to g++ to get a detailed timing of each compilation phase (or pass). You may need to understand the internals of GCC to decipher the resulting table.
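For example (big_file.cpp is just a placeholder name), comparing the two reports should show which passes account for the difference:
g++ -O2 -ftime-report -c big_file.cpp
g++ -O2 -fno-inline -ftime-report -c big_file.cpp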
I didn't find anything about the impact of -fno-inline on compilation performance. Is there any explanation for this?
Inline expansion happens several times inside GCC. It generally works on some GIMPLE or SSA internal representation. Of course, inlining improves the runtime performance of your program. By disabling it, you could lose 50% of your executable's speed (perhaps even more, since inline member functions such as getters and setters are used extensively in C++, notably in standard container templates).
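As a rough sketch of why this matters (a hypothetical example, not from the question's code):
class Point {
public:
    int x() const { return x_; }   // trivial getter, implicitly inline
    int y() const { return y_; }
private:
    int x_ = 0;
    int y_ = 0;
};

long sum(const Point* pts, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i)
        s += pts[i].x() + pts[i].y();  // with inlining: two direct loads;
                                       // with -fno-inline: two real calls per iteration
    return s;
}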
FWIW, my old GCC MELT web pages (GCC MELT is now a dead project) have several slides and references explaining GCC internals, and right now (October 2018) I am writing the draft of a technical report on bismon (currently funded by the CHARIOT H2020 project); that draft happens to have a section §1.3.2 explaining some interesting GCC optimizations.
See also CppCon 2017 talk: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”

Related

Optimized Execution Time

For a school assignment I have to convert some C++ code to assembly (ARMv8). Then I have to compile the C++ code using GCC's -O0, -O1, -O2, -O3 and -Os optimization levels, write down the times and compare them with the execution time of my assembly code. As far as I know, -O3 should be faster than -O1 and -O2. However, I find that -O2 is the fastest, followed by -O1, -O3, -Os and -O0. Is that usual? (The measured times are about 30 seconds.)
Notice that GCC has many other optimization flags.
There is no guarantee that -O3 gives faster code than -O2; a compiler can apply more optimization passes, but they are all heuristics and might be unsuccessful (or even slightly slow down your particular code). Hence it does happen that -O3 gives slightly slower code than -O2 (on some particular input source code).
You could try a more recent version of GCC (the latest, in November 2017, is GCC 7; GCC 8 will come out in a few months). You could also try a better -march= or -mtune= option.
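For example, you could time each level on your own program along these lines (bench.cpp is a placeholder; repeat for the other levels):
g++ -O2 -march=native -o bench bench.cpp
time ./bench
g++ -O3 -march=native -o bench bench.cpp
time ./bench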
Finally, with a GCC plugin you could add your own optimization pass, or change the order (and the set) of applied optimization passes (there are several hundred different optimization passes in GCC). But you would need a lot of work (perhaps a year or two) to be able to extend GCC.
You could tune optimization parameters, and some project (MILEPOST) has even used machine learning techniques to improve them.
See also slides and references on my (old) GCC MELT documentation.
Yes, it is usual. Take the -Ox levels as guidelines: on average they produce the optimization that is advertised, but a lot depends on the style in which the code is written, the memory layout, as well as the compiler itself.
Sometimes, you need to try and fail many times before getting the optimal code.
-O2 indeed gives the best optimization in most of the cases.

Can a large number of warnings increase compilation time?

I have heard from someone that big projects with a large number of warnings in the code build significantly more slowly than ones with few warnings (assuming, of course, that the compiler is set to a high warning level).
Is there any reasonable explanation, or maybe anyone can share their experience about this topic?
On the GCC compiler (e.g. gcc for C or g++ for C++), warnings do take a small amount of CPU time. Use e.g. gcc -ftime-report if you want a detailed report of compiler timing. Warning diagnostics do depend upon the optimization level.
But optimizations (especially at high level, like -O2 or more) take much more time than warnings. Empirically, optimized compilation time is proportional to the compilation unit size and to the square of the size (e.g. in number of Gimple instructions, or in lines of C code) of the biggest function. So if you have huge functions (e.g. some function of ten thousand lines in some generated C code) you may want to split them into smaller pieces.
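For illustration, the usual workaround for generated code looks roughly like this (names are invented):
// Instead of one generated initialization function of ~10,000 lines,
// the generator can emit many smaller pieces:
static void init_part_0() { /* ...a few hundred generated statements... */ }
static void init_part_1() { /* ... */ }
static void init_part_2() { /* ... */ }

void init_all() {
    // Each piece stays small enough that the (roughly quadratic)
    // per-function optimization cost remains manageable.
    init_part_0();
    init_part_1();
    init_part_2();
}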
In the early days of MELT (a GCC plugin and GCC experimental branch, GPLv3+ licensed, implementing a DSL to extend GCC, which I have developed and am still working on), it generated huge initialization functions in C (today this is less the case: the initialization is split into many C++ functions; see e.g. gcc/melt/generated/warmelt-base.cc from the MELT branch of GCC as an example). At that time, I plotted the -O2 compilation time against the length of that initialization function. You could also experiment with the manydl.c code. Again, the square of the biggest function's length is an experimental measure, but it might be explained by register allocation issues. J. Pitrat has also observed that huge generated C functions (from his interesting CAIA system) exhaust the compiler.
Also, warnings are output, and sometimes, the IDE or the terminal reading the compiler output may be slowed down if you have a lot of warnings.
Of course, as commented several times, compiler warnings are your friends (so always compile with e.g. gcc -Wall). So please improve your code until it has no warnings at all. In particular, initialize most of your local variables (I usually initialize all of them); the compiler can remove an initialization if it can prove it is useless.
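For example (a trivial, hypothetical sketch):
int parse_flag(const char *s) {
    int value = 0;              // initialized up front; if the store turns out
    if (s && *s == 'y')         // to be useless, the optimizer removes it anyway
        value = 1;
    return value;
}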
BTW, you could customize GCC with e.g. MELT to add your own customized warnings (e.g. to check some coding rules conformance).
Also, in C++ with weird templates, you could write a few dozen lines which take many hours to compile (or even crash the compiler because of lack of memory; see this question).
NB. In 2019, GCC MELT is dead, its domain gcc-melt.org disappeared but the web pages are archived here.
It depends a lot on what the warnings actually are.
For example, if there are lots of "variable is unused" warnings and "condition in 'if' is always true/false" warnings, that may mean there's a lot of unnecessary code that the compiler has to parse and then remove during optimisation.
For other warnings there may be other consequences. For example, consider a "variable is self initialising" warning caused by something like int i = i;. I'd imagine this could add a whole pile of complications/overhead (where the compiler attempts to determine if the variable is "live" or can be optimised out).
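A minimal sketch of that case, assuming GCC (selfinit.cpp is a made-up file name):
int bump() {
    int i = i;      // "self-initializing": i is still effectively uninitialized
    return i + 1;
}
// g++ -Wall -Winit-self -O1 -c selfinit.cpp
// (whether this is diagnosed depends on the GCC version and optimization level)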
This will likely depend extensively on the compiler, and how it is implemented.
That being said, there are two sure sources of slow-down:
Printing the warnings themselves is a non-trivial task: it requires extensive formatting, potentially accessing the file again, plus all those notes (macro expansion, template instantiation), and finally pushing it all to an I/O device.
Emitting said warnings, with all that macro expansion and template instantiation data, might be non-trivial too. Furthermore, if they are collected first and only emitted at the end of the compilation process (instead of being streamed as they are produced), then the growing memory consumption will also slow you down (requiring more pages to be provided by the OS, ...).
In general, in terms of engineering, I do not expect compiler writers to worry much about the cost of emitting diagnostics; as long as the cost is reasonable, there is little incentive to optimize away a couple of milliseconds when human intervention is going to be required anyway.
Who cares if your build takes 10% longer due to the compiler printing tons of warnings? The problem is the code you're being warned about, not the extra time it takes. Besides, 10% is probably a huge overestimate for the overhead of printing even very large numbers of warnings.

Are there conclusive studies/experiments on C compilation using a C++ compiler?

I've seen a lot of arguments over the general performance of C code compiled with a C++ compiler -- I'm curious as to whether there are any solid experimental studies buried beneath all the anecdotal flame wars you find in web searches. I'm particularly interested in the GCC suite, but any data points would be interesting. (Comparing the assembly of "Hello, World!" is not as robust as I'd like. :-)
I'm generally assuming you use the "embedded style" flags -- no exceptions or RTTI. I also wouldn't mind knowing if there are studies on the compilation time itself. TIA!
Adding a datapoint (or at least an anecdote):
We were recently writing a math library for a small embedded-like target, and started writing it in C. About halfway through the project, we switched some of the files to C++, largely in order to use templates for some of the functions where we'd otherwise be writing many nearly-identical pieces of code (or else embedding 40-line functions in preprocessor macros).
At the point where we started switching over, we had a very careful look at the generated assembly code (using GCC) on a number of the functions, and confirmed that it was in fact essentially identical whether the file was compiled as C or C++ -- where by "essentially identical" I mean the differences were in things like symbol names and the stuff at the beginning and end of the assembly file; the actual instructions in the middle of the functions were exactly identical.
Sorry that I don't have a more solid answer.
Edit to add, 2013-03-24: Recently I came across an article where Rusty Russell compared performance on GCC compiled with a C compiler and compiled with a C++ compiler, in response to the recent switch to compiling GCC as C++: http://rusty.ozlabs.org/?p=330. The conclusions are interesting: The version compiled with a C++ compiler was very slightly slower; the difference was about 0.3%. However, that was entirely explained by load time differences caused by larger debug info; when he stripped the binaries and removed the debug info, the differences were less than 0.1% -- i.e., essentially indistinguishable from measurement noise.
I don't know of any studies off-hand, but given the C++ philosophy that you don't pay the price for features you don't use, I doubt there'd be any significant difference between compiling C code with the C compiler and with the C++ compiler.
I don't know of any studies, and I doubt that anyone will spend the time to do them. Basically, when compiling with a C++ compiler, the code has the same semantics as when compiling with a C compiler, so it comes down to optimization and code generation. But IMO these are much too compiler-specific to allow any general statements about C vs. C++.
What you mainly gain when you compile C code with a C++ compiler is much stricter checking (function declarations etc.). IMO this makes compiling C code with a C++ compiler quite attractive. But note that, if you have a large C code base that has never been run through a C++ compiler, you're likely facing a very steep uphill battle until the code compiles cleanly enough for you to see any meaningful warnings.
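A classic sketch of the kind of thing that stricter checking catches (c_only.c is a hypothetical file name):
/* c_only.c */
#include <stdlib.h>

int main(void) {
    /* Implicit conversion from void * to int * is fine in C,
       but a C++ compiler rejects it, forcing an explicit cast. */
    int *p = malloc(10 * sizeof *p);
    free(p);
    return 0;
}
gcc -c c_only.c compiles this cleanly; compiling the same file as C++ rejects the malloc line.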
The GCC project is currently transitioning from C to C++ - that is, GCC is currently written in C but may be implemented in C++ in the future. The next release of GCC will be written in the subset of C which is also valid C++.
Some performance tests were performed on g++ vs gcc, on GCC's codebase. They compared the "bootstrap" time, which means compiling gcc with the system compiler, then compiling it with the resulting compiler, then repeating and checking that the results are the same.
Summary: using g++ was 20% slower. The compiler versions were slightly different, but it was thought that this wouldn't account for the 20% difference.
Note that this measures different programs, gcc vs g++, which, although they mostly share the same code, have different front ends.
I've not tried it from a performance standpoint, but I think compiling C applications with the C++ compiler is a good idea, as it will prevent you from doing "naughty" things such as calling functions that have not been declared.
However, the output won't be the same - at the very least, you'll get different symbols, which will render it (mostly) unlinkable with code from the C compiler.
So I think what you really mean is "Is it ok from a performance standpoint, to write C++ code which is very C-like and compile it with the C++ compiler" ?
You would also have to avoid some C99 things, such as _Bool (bool from <stdbool.h>), which C++ handles differently because it has its own bool type.
Don't do it if the code has not been designed for it. The same valid language constructs can lead to different behavior when interpreted as C or as C++, and you would potentially introduce bugs that are very difficult to understand. Less problematic, but still a maintainability nightmare: some C constructs (especially from C99) are not valid in C++.
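One small, well-known example of the first point (same construct, different meaning):
#include <stdio.h>

int main(void) {
    /* In C, 'a' has type int, so this typically prints 4;
       in C++, 'a' has type char, so it prints 1. */
    printf("%zu\n", sizeof('a'));
    return 0;
}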
In the past I have done things like look at the size of the binary, which for C++ was huge; that doesn't mean much by itself, since they may simply have linked in a bunch of unused libraries. The easiest comparison might be to use gcc -S myprog.cpp vs gcc -S myprog.c and diff the assembler output.
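Spelled out, that comparison might look like this (file names as above; note that the C++ output will have mangled symbol names, so the interesting part of the diff is the instructions inside the functions):
gcc -O2 -S -o myprog_c.s myprog.c
gcc -O2 -S -o myprog_cpp.s myprog.cpp    # gcc picks the C++ front end from the .cpp extension
diff myprog_c.s myprog_cpp.s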

Verifying compiler optimizations in gcc/g++ by analyzing assembly listings

I just asked a question related to how the compiler optimizes certain C++ code, and I was looking around SO for any questions about how to verify that the compiler has performed certain optimizations. I was trying to look at the assembly listing generated with g++ (g++ -c -g -O2 -Wa,-ahl=file.s file.c) to possibly see what is going on under the hood, but the output is too cryptic to me. What techniques do people use to tackle this problem, and are there any good references on how to interpret the assembly listings of optimized code or articles specific to the GCC toolchain that talk about this problem?
GCC's optimization passes work on an intermediary representation of your code in a format called GIMPLE.
Using the -fdump-* family of options, you can ask GCC to output intermediary states of the tree.
For example, feed this to gcc -c -fdump-tree-all -O3
unsigned fib(unsigned n) {
    if (n < 2) return n;
    return fib(n - 2) + fib(n - 1);
}
and watch as it is gradually transformed from a simple exponential algorithm into a complex polynomial algorithm. (Really!)
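For instance, assuming the function above is saved as fib.c (dump file names vary between GCC versions):
gcc -c -O3 -fdump-tree-all fib.c
ls fib.c.*                 # one dump file per pass
less fib.c.*t.optimized    # the final GIMPLE after the tree-level passes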
A useful technique is to run the code under a good sampling profiler, e.g. Zoom under Linux or Instruments (with Time Profiler instrument) under Mac OS X. These profilers not only show you the hotspots in your code but also map source code to disassembled object code. Highlighting a source line shows the (not necessarily contiguous) lines of generated code that map to the source line (and vice versa). Online opcode references and optimization tips are a nice bonus.
Instruments: developer.apple.com
Zoom: www.rotateright.com
Not gcc, but when debugging in Visual Studio you have the option to intersperse assembly and source, which gives a good idea of what has been generated for what statement. But sometimes it's not quite aligned correctly.
The output of the gcc tool chain and objdump -dS isn't at the same granularity. This article on getting gcc to output source and assembly has the same options as you are using.
Adding the -L option (eg, gcc -L -ahl) may provide slightly more intelligible listings.
The equivalent MSVC option is /FAcs (and it's a little better because it intersperses the source, machine language, and binary, and includes some helpful comments).
About one third of my job consists of doing just what you're doing: juggling C code around and then looking at the assembly output to make sure it's been optimized correctly (which is preferred to just writing inline assembly all over the place).
Game-development blogs and articles can be a good resource for the topic since games are effectively real-time applications in constant memory -- I have some notes on it, so does Mike Acton, and others. I usually like to keep Intel's instruction set reference up in a window while going through listings.
The most helpful thing is to get a good ground-level understanding of assembly programming generally first -- not because you want to write assembly code, but because having done so makes reading disassembly much easier. I've had a hard time finding a good modern textbook though.
In order to see the optimizations that were applied, you can use:
-fopt-info-optimized
To see those that were not applied:
-fopt-info-missed
Beware that the output is sent to the standard error stream, so to see it you may have to redirect it (hint: 2>&1).
Here is a nice example:
g++ -O3 -std=c++11 -march=native -mtune=native
-fopt-info-optimized h2d.cpp -o h2d 2>&1
h2d.cpp:225:3: note: loop vectorized
h2d.cpp:213:3: note: loop vectorized
h2d.cpp:198:3: note: loop vectorized
h2d.cpp:186:3: note: loop vectorized
You can check the interleaved output, after compiling with -g, using objdump -dS | c++filt, but that will not get you very far. Enjoy!
Zoom from RotateRight ( http://rotateright.com ) is mentioned in another answer, but to expand on that: it shows you the mapping of source to assembly in what they call the "code browser". It's incredibly handy even if you're not an asm expert because they have also integrated assembly documentation into the app. And the assembly listing is annotated with comments and timing for several CPU types.
You can just open your object or executable file with Zoom and take a look at what the compiler has done with your code.
Victor, in your case the optimization you are looking for is just a smaller allocation of local memory on the stack. You should see a smaller allocation at function entry and a smaller deallocation at function exit if the space used by the empty class is optimized away.
As for the general question, I've been reading (and writing) assembly language for more than (gulp!) 30 years and all I can say is that it takes practice, especially to read the output of a compiler.
Instead of trying to read through an assembler dump, run your program inside a debugger. You can pause execution, single-step through instructions, set breakpoints on the code you want to check, etc. Many debuggers can display your original C code alongside the generated assembly so you can more easily see what the compiler did to optimize your code.
Also, if you are trying to test a specific compiler optimization you can create a short dummy function that contains the type of code that fits the optimization you are interested in (and not much else, the simpler it is the easier the assembly is to read). Compile the program once with optimizations on and once with them off; comparing the generated assembly code for the dummy function between builds should show you what the compiler's optimizers did.
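For example, to check whether a division by a constant power of two is turned into a shift, a dummy function like this is enough (a hypothetical sketch):
// dummy.cpp -- compile once with -O0 and once with -O2, then compare
// the output of g++ -S dummy.cpp; at -O2 the division by a power of two
// should become a shift rather than an actual div instruction.
unsigned divide_by_16(unsigned x) {
    return x / 16;
}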

When to use -O2 flag for gcc?

If I use "-O2" flag, the performance improves, but the compilation time gets longer.
How can I decide, whether to use it or not?
Maybe -O2 makes the most difference in certain types of code (e.g. math calculations?), and I should use it only for those parts of the project?
EDIT: I want to emphasize the fact that setting -O2 for all components of my project changes the total compilation time from 10 minutes to 30 minutes.
I would recommend using -O2 most of the time, benefits include:
Usually reduces size of generated code (unlike -O3).
More warnings (some warnings require analysis that is only done during optimization; see the example after this list)
Often measurably improved performance (which may not matter).
If release-level code will have optimization enabled, it's best to have optimization enabled throughout the development/test cycle.
Source-level debugging is more difficult with optimizations enabled, occasionally it is helpful to disable optimization when debugging a problem.
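As an example of the warnings point above, GCC's -Wmaybe-uninitialized diagnostic relies on data-flow analysis that runs during optimization, so (depending on the GCC version) code like this tends to be flagged only when optimizing:
int maybe_used(int flag) {
    int value;          // only assigned on one path
    if (flag)
        value = 42;
    return value;       // with optimization enabled, -Wall typically reports
                        // "'value' may be used uninitialized"; at -O0 it often stays silent
}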
I'm in bioinformatics so my advice may be biased. That said, I always use the -O3 switch (for release and test builds, that is; not usually for debugging). True, it has certain disadvantages, namely increasing compile-time and often the size of the executable.
However, the first factor can be partially mitigated by a good build strategy and other tricks reducing the overall build time. Also, since most of the compilation is really I/O bound, the increase of compile time is often not that pronounced.
The second disadvantage, the executable's size, often simply doesn't matter at all.
Never.
Use -O3 -Wall -Werror -std=[whatever your code base should follow]
Always, except when you're programming and just want to test something you just wrote.
We usually have our build environment set up so that we can build debug builds that use -O0 and release builds that use -O3 (the build environment preserves the objects and libraries of all configurations, so that one can switch easily between configurations). During development one mostly builds and runs the debug configuration for faster build speed (and more accurate debug information), and less frequently also builds and tests the release configuration.
Is the increased compilation time really noticeable? I use -O2 all the time as the default; anything less just leaves a lot of "friction" in your code. Also note that the -O1 and -O2 optimization levels tend to be the best tested, as they are the most interesting. -O0 tends to be more buggy, and you can debug pretty well at -O2 in my experience, provided you have some idea of what a compiler can do in terms of code reordering, inlining, etc.
-Werror -Wall is necessary.