g++ Optimization Flags: -fuse-linker-plugin vs -fwhole-program - c++

I am reading:
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
It first suggests:
In combination with -flto using this option (-fwhole-program) should not be used. Instead relying on a linker plugin should provide safer and more precise information.
And then, it suggests:
If the program does not require any symbols to be exported, it is possible to combine -flto and -fwhole-program to allow the interprocedural optimizers to use more aggressive assumptions which may lead to improved optimization opportunities. Use of -fwhole-program is not needed when linker plugin is active (see -fuse-linker-plugin).
Does it mean that in theory, using -fuse-linker-plugin with -flto always gets a better optimized executable than using -fwhole-program with -flto?
I tried to use ld to link with -fuse-linker-plugin and -fwhole-program separately, and the executables' sizes at least are different.
P.S. I am using gcc 4.6.2, and ld 2.21.53.0.1 on CentOS 6.

UPDATE: See @PeterCordes's comment below. Essentially, passing -fuse-linker-plugin explicitly is no longer necessary.
These differences are subtle. First, understand what -flto actually does. It essentially creates an output that can be optimized later (at "link-time").
-fwhole-program makes GCC assume "that the current compilation unit represents the whole program being compiled", whether or not that is actually the case. GCC will therefore assume that it knows every place that calls a particular function. As the documentation says, this lets it use more aggressive interprocedural optimizations. I'll explain that in a bit.
Lastly, -fuse-linker-plugin lets the linker drive the link-time optimization through GCC's LTO plugin. The linker tells the compiler which symbols are referenced from outside the LTO objects, so the optimizer knows what must stay visible and what it can treat as local. It is designed to pair with -flto: -flto saves enough information to do the optimizations later, and the plugin supplies the symbol information they rely on when they actually run.
So, where do they differ? As the GCC documentation suggests, there is in principle no advantage to using -fwhole-program, because that option assumes something that you then have to ensure is true. To break it, simply define a function in one .cpp file and use it in another. You will get a linker error.
Is there any advantage to -fwhole-program? Well, if you only have one compilation unit then you can use it, but honestly, it won't be any better. I was able to get different sized executables by using equivalent programs, but when checking the actual generated machine code, they were identical. In fact, the only differences that I saw were that line numbers with debugging information were different.
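To make the "whole program" assumption concrete, here is a minimal sketch, assuming GCC and a single-file program. The file name, the function names, the DEMO_EXTERNALLY_VISIBLE macro, and the build lines in the comments are illustrative assumptions, not something taken from the question. Under -fwhole-program, GCC may treat square as if it had been declared static, while the GCC-specific externally_visible attribute keeps exported_cube visible.

```cpp
// whole_program_demo.cpp (hypothetical file name)
//
// Possible build lines once a main() that calls these functions is added
// to this file, assuming a GCC with LTO support:
//   g++ -O2 -flto -fwhole-program whole_program_demo.cpp -o demo
//   g++ -O2 -flto -fuse-linker-plugin whole_program_demo.cpp -o demo
#include <cassert>

// GCC-specific attribute; expands to nothing on other compilers.
#if defined(__GNUC__) && !defined(__clang__)
#define DEMO_EXTERNALLY_VISIBLE __attribute__((externally_visible))
#else
#define DEMO_EXTERNALLY_VISIBLE
#endif

// External linkage in the source. With -fwhole-program GCC may treat this
// like a static function, inline it, and drop the standalone copy.
int square(int x) { return x * x; }

// Marked so that it stays visible to other objects even under -fwhole-program.
DEMO_EXTERNALLY_VISIBLE int exported_cube(int x) { return x * x * x; }

// Sums 1*1 + 2*2 + ... + n*n using square().
int sum_of_squares(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += square(i);
    }
    return total;
}
```

If another .cpp file called square() while this one was compiled with -fwhole-program, the linker error described above would be the likely outcome, because the standalone definition may no longer exist.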

What is a Linux equivalent of Visual Studio's 'build code in release mode'? [duplicate]

What are the specific options I would need to build in "release mode" with full optimizations in GCC? If there are more than one option, please list them all. Thanks.
http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
There is no 'one size fits all' - you need to understand your application, your requirements and the optimisation flags to determine the correct subset for your binary.
Or the answer you want:
-O3
Here is a part from a Makefile that I use regularly (in this example, it's trying to build a program named foo).
If you run it like $ make BUILD=debug or $ make debug
then the Debug CFLAGS will be used. These turn off optimization (-O0) and include debugging symbols (-g).
If you omit these flags (by running $ make without any additional parameters), you'll build the Release CFLAGS version, where optimization is turned on (-O2), debugging symbols are stripped (-s) and assertions are disabled (-DNDEBUG).
As others have suggested, you can experiment with different -O* settings depending on your specific needs.
ifeq ($(BUILD),debug)
# "Debug" build - no optimization, and debugging symbols
CFLAGS += -O0 -g
else
# "Release" build - optimization, and no debug symbols
CFLAGS += -O2 -s -DNDEBUG
endif
all: foo
debug:
	$(MAKE) BUILD=debug
foo: foo.o
# The rest of the makefile comes here...
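As a small illustration of what -DNDEBUG in the release flags changes, here is a sketch in C++. The function names are made up for this example. NDEBUG itself is the standard macro that <cassert> checks, so defining it on the command line turns every assert() in the translation unit into a no-op.

```cpp
#include <cassert>

// Reports whether assert() is active in this translation unit.
// Passing -DNDEBUG to the compiler defines the macro tested here.
constexpr bool assertions_enabled() {
#ifdef NDEBUG
    return false;
#else
    return true;
#endif
}

// Integer division with a debug-only precondition check.
// In a build with -DNDEBUG the assert expands to nothing.
int checked_divide(int numerator, int denominator) {
    assert(denominator != 0 && "denominator must be non-zero");
    return numerator / denominator;
}
```

Built with the debug flags, a call such as checked_divide(1, 0) aborts at the assert. Built with the release flags, the check is gone and the division by zero is undefined behaviour, so release builds should not rely on asserts for input validation.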
Note that gcc doesn't have a "release mode" and a "debug mode" like MSVC does. All code is just code. The presence of the various optimization options (-O2 and -Os are the only ones you generally need to care about unless you're doing very fine tuning) modifies the generated code, but not in a way to prevent interoperability with other ABI-compliant code. Generally you want optimization on stuff you want to release.
The presence of the "-g" option will cause extended symbol and source code information to be placed in the generated files, which is useful for debugging but increases the size of the file (and reveals your source code), which is something you often don't want in "released" binaries.
But they're not exclusive. You can have a binary compiled with optimization and debug info, or one with neither.
-O2 turns on all optimizations that don't require a space/speed trade-off, and tends to be the one I see used most often. -O3 trades some space for speed (for example, more aggressive function inlining). -Os enables the -O2 optimizations except those that tend to increase code size, and tunes for size; this can make things faster than -O3 by improving cache use (test to find out whether it works for you). Note that there are a large number of options that none of the -O switches touch. They are left out because whether they help often depends on what kind of code you are writing, or because they are very architecture dependent.

How to mark a *standard library* function/method as deprecated (or disabled altogether) in my project?

I'm trying to somehow disable/mark as deprecated the hideous std::string::operator=(char) overload (which in my experience is used only when mistakenly assigning an integer to a string, and causes subtle and difficult-to-track bugs).
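For readers who have not hit this overload before, a minimal sketch of the problem follows. The function names are hypothetical, and the expected character assumes an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <string>

// Compiles without error: the int is implicitly converted to char and
// std::string::operator=(char) is selected.
std::string assign_from_int(int value) {
    // Demo assumption: the value is in the ASCII range.
    assert(value >= 0 && value <= 127);
    std::string s;
    s = value;
    return s;
}

// What was most likely intended: the decimal text of the number.
std::string assign_as_text(int value) {
    return std::to_string(value);
}
```

On an ASCII system, assign_from_int(65) yields the one-character string "A", while assign_as_text(65) yields "65".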
I tried with:
an explicit specialization with a static assert in it
#include <string>
#include <type_traits>
template<> std::basic_string<char> &std::basic_string<char>::operator=(char c) {
static_assert(false, "Don't use this!");
}
which fails as <string> already does an explicit instantiation of std::string
the [[deprecated]] attribute, applied to a similar declaration as above in various positions; no position I tried seemed to yield any reasonable result;
=delete, which fails for reasons similar to above;
I thought about using linker tricks (in a similar vein, in the same project we have runtime checks on stray setlocale usages using the --wrap ld linker option), but the fact that this is a template and inline method complicates the matter.
Now to the questions:
is there a standard method to somehow disable (as would happen with =delete) any function or method in the standard library (read: in a library where you cannot alter the declarations in the headers)?
as above, but, instead of disable, add a warning (as would happen with [[deprecated]]);
failing the standard method, is there something g++-specific?
if there's no "general" (=applicable to any method, any class, any function, ...) solution, is there something that we could apply to this specific case (=disable a method of a template class, possibly even just a specific instantiation)?
You can use the following compiler/linker option:
$ g++ -O0 test.cpp -Wl,--wrap=_ZNSsaSEc
Explanation:
The _ZNSsaSEc is the decorated name of your offending function:
$ echo _ZNSsaSEc | c++filt
std::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(char)
The -Wl compiler option is to pass options to the linker.
And the --wrap=<symbol> linker option transforms any reference to the given symbol to the alternative __wrap_<symbol>. And since you are (hopefully) not defining a function named __wrap__ZNSsaSEc, you will get a nice linker error:
test.cpp:(.text+0x26): undefined reference to `__wrap__ZNSsaSEc'
And -O0 disables optimizations and prevents the compiler from inlining the function. The linker trick will not work if the call is inlined, as @SergeBallesta pointed out in a comment.
Maybe a bit of a hack, but hey, it works!
Well, I'm afraid that the standard library is intended to be... standard, and as such does not provide hooks that let developers tweak it.
An ugly way (never say I advised you to use it ;-) ) would be to use the fact that standard library headers are just text files, so you can easily change them in your local development environment. A possibly less bad way would be to set up a folder containing links to the original headers, except for the modified header, and instruct the compiler to use that folder for system headers.
That way, you can change anything you want, but... portability and maintainability... It's really a desperado solution...
This is clang++ specific, since I don't know what the equivalent functionality is called in the gnu toolchain. It's also somewhat overkill.
Rodrigo's suggestion of using the linker to swap out the symbol is great for non-inlined cases. If you build everything at O0 occasionally that'll suffice.
Otherwise, the llvm (clang) toolchain offers surprising amounts of control over the optimisation pipeline. For example, you can compile without optimisations, then run the optimisations yourself using opt, then convert to an object file.
clang++ -c test.cpp -O0 -emit-llvm -o test.bc
opt -O3 test.bc -o faster.bc
clang++ -c faster.bc -o test.o
The opt tool is extensible. I can't honestly say it's trivial to extend, but the process is well documented. You can write a compiler pass that warns when your standard library function is seen. The end result could be invoked something like:
clang++ -c test.cpp -O0 -emit-llvm -o test.bc
opt --load DeprecationPass.so test.bc -o /dev/null
opt -O3 test.bc -o faster.bc
clang++ -c faster.bc -o test.o
If you're confident the custom pass is correct (not merely useful) you can use a single invocation of opt. It's probably possible to pass flags through to opt via the clang front end, but it's not immediately obvious how to.
Overall, following rodrigo's suggestion and occasionally building the entire product at O0 is probably a better plan - but it is exciting that clang lets you do things like this.

Trace gcc compilation and what code slows it down

I want to find out what code causes slow compilation times in gcc. I previously had some code that compiled slowly, and someone told me the command-line switch that makes gcc print each step as it compiles, including each function/variable/symbol and so on. That helped a lot (I could literally see in the console where gcc chokes), but I forgot what the switch was.
I found it (from the gcc man page):
-Q
Makes the compiler print out each function name as it is compiled, and print some statistics about each pass when it finishes.
See also this answer to a quite similar question.
You very probably want to invoke GCC with -time or, more probably, -ftime-report, which shows the time spent in each internal phase or pass of cc1 or cc1plus (the compiler proper, started by the gcc or g++ command). Don't forget optimization, debugging, and warning flags (e.g. -Wall -O -g); they all slow down the compilation.
You'll learn that for C programs, parsing is a small fraction of the compilation time, as soon as you ask for some optimization, e.g. -O1 or -O2. (This is less true for C++, when parsing can take half of the time, notably since template expansion is considered as parsing).
Empirically, what slows down GCC are very long function bodies. Better have 50 functions of 1000 lines each than one single function of 50000 lines (and this could happen in programs generating some of their C++ code, e.g. RefPerSys or perhaps -in spring 2021- Bismon).
Try the -v (verbose) compilation.
See this link:
http://www.network-theory.co.uk/docs/gccintro/gccintro_75.html
edit:
I understand. Maybe this will help:
gcc -fdump-tree-all -fdump-rtl-all
and the like (-fdump-passes). See here: http://fizz.phys.dal.ca/~jordan/gcc-4.0.1/gcc/Debugging-Options.html

How to remove unused C/C++ symbols with GCC and ld?

I need to optimize the size of my executable severely (ARM development) and
I noticed that in my current build scheme (gcc + ld) unused symbols are not getting stripped.
The usage of the arm-strip --strip-unneeded for the resulting executables / libraries doesn't change the output size of the executable (I have no idea why, maybe it simply can't).
What would be the way (if it exists) to modify my building pipeline, so that the unused symbols are stripped from the resulting file?
I wouldn't even think of this, but my current embedded environment isn't very "powerful" and
saving even 500K out of 2M results in a very nice loading performance boost.
Update:
Unfortunately the current gcc version I use doesn't have the -dead-strip option and the -ffunction-sections... + --gc-sections for ld doesn't give any significant difference for the resulting output.
I'm shocked that this even became a problem, because I was sure that gcc + ld should automatically strip unused symbols (why do they even have to keep them?).
For GCC, this is accomplished in two stages:
First compile the data but tell the compiler to separate the code into separate sections within the translation unit. This will be done for functions, classes, and external variables by using the following two compiler flags:
-fdata-sections -ffunction-sections
Link the translation units together using the linker optimization flag (this causes the linker to discard unreferenced sections):
-Wl,--gc-sections
So if you had one file called test.cpp that had two functions declared in it, but one of them was unused, you could omit the unused one with the following command to gcc(g++):
gcc -Os -fdata-sections -ffunction-sections test.cpp -o test -Wl,--gc-sections
(Note that -Os is an additional compiler flag that tells GCC to optimize for size)
If this thread is to be believed, you need to supply -ffunction-sections and -fdata-sections to gcc, which will put each function and data object in its own section. Then you give --gc-sections to GNU ld to remove the unused sections.
You'll want to check your docs for your version of gcc & ld:
However for me (OS X gcc 4.0.1) I find these for ld
-dead_strip
Remove functions and data that are unreachable by the entry point or exported symbols.
-dead_strip_dylibs
Remove dylibs that are unreachable by the entry point or exported symbols. That is, suppresses the generation of load command commands for dylibs which supplied no symbols during the link. This option should not be used when linking against a dylib which is required at runtime for some indirect reason such as the dylib has an important initializer.
And this helpful option
-why_live symbol_name
Logs a chain of references to symbol_name. Only applicable with -dead_strip. It can help debug why something that you think should be dead strip removed is not removed.
There's also a note in the gcc/g++ man that certain kinds of dead code elimination are only performed if optimization is enabled when compiling.
While these options/conditions may not hold for your compiler, I suggest you look for something similar in your docs.
Programming habits could help too; e.g. add static to functions that are not accessed outside a specific file; use shorter names for symbols (can help a bit, likely not too much); use const char x[] where possible; ... this paper, though it talks about dynamic shared objects, can contain suggestions that, if followed, can help to make your final binary output size smaller (if your target is ELF).
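The first two habits can be sketched as follows. All names are invented for the example, and the size benefit depends on the compiler, the flags, and the target, so treat it as an assumption to measure rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Internal linkage: not exported, so the compiler is free to inline the
// helper and discard its standalone copy.
static int scale(int x) { return 3 * x; }

// An array places the characters directly in read-only data. A
// `const char *label = "..."` would add a separate pointer object that
// may also need a relocation in position-independent code.
static const char kLabel[] = "sensor";

// Length of the label without the terminating NUL, known at compile time.
constexpr std::size_t label_length() { return sizeof(kLabel) - 1; }

// Accessor so that other files never need the array's symbol itself.
const char *label() { return kLabel; }

// The only arithmetic function meant to be called from other files.
int scaled_reading(int raw) {
    assert(raw >= 0);
    return scale(raw) + static_cast<int>(label_length());
}
```

An unnamed namespace would give the same internal linkage as static for scale and kLabel, which is the more common spelling in C++ code.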
The answer is -flto. You have to pass it to both your compilation and link steps, otherwise it doesn't do anything.
It actually works very well - reduced the size of a microcontroller program I wrote to less than 50% of its previous size!
Unfortunately it did seem a bit buggy - I had instances of things not being built correctly. It may have been due to the build system I'm using (QBS; it's very new), but in any case I'd recommend you only enable it for your final build if possible, and test that build thoroughly.
While not strictly about symbols, if going for size - always compile with -Os and -s flags. -Os optimizes the resulting code for minimum executable size and -s removes the symbol table and relocation information from the executable.
Sometimes - if small size is desired - playing around with different optimization flags may - or may not - have significance. For example toggling -ffast-math and/or -fomit-frame-pointer may at times save you even dozens of bytes.
It seems to me that the answer provided by Nemo is the correct one. If those instructions do not work, the issue may be related to the version of gcc/ld you're using, as an exercise I compiled an example program using instructions detailed here
#include <stdio.h>
void deadcode() { printf("This is d dead codez\n"); }
int main(void) { printf("This is main\n"); return 0 ; }
Then I compiled the code using progressively more aggressive dead-code removal switches:
gcc -Os test.c -o test.elf
gcc -Os -fdata-sections -ffunction-sections test.c -o test.elf -Wl,--gc-sections
gcc -Os -fdata-sections -ffunction-sections test.c -o test.elf -Wl,--gc-sections -Wl,--strip-all
These compilation and linking parameters produced executables of size 8457, 8164 and 6160 bytes, respectively, the most substantial contribution coming from the 'strip-all' declaration. If you cannot produce similar reductions on your platform, then maybe your version of gcc does not support this functionality. I'm using gcc (4.5.2-8ubuntu4), ld (2.21.0.20110327) on Linux Mint 2.6.38-8-generic x86_64.
strip --strip-unneeded only operates on the symbol table of your executable. It doesn't actually remove any executable code.
The standard libraries achieve the result you're after by splitting all of their functions into separate object files, which are combined using ar. If you then link the resultant archive as a library (i.e. give the option -l your_library to ld), then ld will only include the object files, and therefore the symbols, that are actually used.
You may also find some of the responses to this similar question of use.
I don't know if this will help with your current predicament as this is a recent feature, but you can specify the visibility of symbols in a global manner. Passing -fvisibility=hidden -fvisibility-inlines-hidden at compilation can help the linker to later get rid of unneeded symbols. If you're producing an executable (as opposed to a shared library) there's nothing more to do.
More information (and a fine-grained approach for e.g. libraries) is available on the GCC wiki.
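As a rough sketch of the fine-grained approach, the visibility attribute can re-export individual functions when everything else is hidden with -fvisibility=hidden. The DEMO_API macro and both function names are hypothetical; the attribute syntax is the GCC/Clang extension.

```cpp
#include <cassert>

// Hypothetical export macro; the attribute is a GCC/Clang extension.
#if defined(__GNUC__)
#define DEMO_API __attribute__((visibility("default")))
#else
#define DEMO_API
#endif

// With -fvisibility=hidden this function gets hidden visibility, so it is
// not exported from a shared object built from this file.
int clamp_to_byte(int value) {
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}

// Explicitly exported regardless of the -fvisibility setting.
DEMO_API int brightness(int percent) {
    assert(percent >= 0);
    return clamp_to_byte(percent * 255 / 100);
}
```

For a plain executable the macro is unnecessary, as noted above; it matters mainly when the same sources are also built into a shared library.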
From the GCC 4.2.1 manual, section -fwhole-program:
Assume that the current compilation unit represents whole program being compiled. All public functions and variables with the exception of main and those merged by attribute externally_visible become static functions and in a affect gets more aggressively optimized by interprocedural optimizers. While this option is equivalent to proper use of static keyword for programs consisting of single file, in combination with option --combine this flag can be used to compile most of smaller scale C programs since the functions and variables become local for the whole combined compilation unit, not for the single source file itself.
You can run the strip binary on an object file (e.g. an executable) to strip all symbols from it.
Note: it modifies the file in place and does not create a copy.
