Does unnecessary C++ code end up in my completed program?

Let's say I'm including a header file that has tons of functions.
#include "1000Functions.h"
Function1(42);
Function2("Hello");
Function1000("geeks!");
But, I just want to use a few of the functions from the header. After preprocessing, compiling, and linking (for example, with g++), would my program include all 1000 functions, or just the 3 that I used?

I found this article useful. Using objdump -tC ProgramName can show you the unnecessary code that ends up in the .text section when your program is loaded into memory.
Link-time optimization was what I was looking for. It worked for me once I added both of these flags to the link command, not just -flto:
-O2 -flto

Related

gfortran: multiple definitions of... first defined here

I have code that includes main program and many modules in separate files that I am linking. Currently I have a makefile that creates .o files for each module (one on separate line) and then I put them all together, such as here:
mpif90 - modutils
mpif90 -c modvarsym
mpif90 -c s1_Phi.f90
mpif90 -c s2_Lambda.f90
mpif90 maincode.f90 modutils.o modvarsym.o s1_Phi.o s2_Lambda.o -o maincode
The above compiles fine and runs OK - except that I suspect array bound problems in my variables. So I include -fbounds-check in the maincode statement, such as here:
mpif90 maincode.f90 modutils.o modvarsym.o s1_Phi.o s2_Lambda.o -o -fbounds-check maincode
That's when numerous "multiple definition" errors appear, and the code will no longer compile. I imagine that is because of -fbounds-check: rather than just enabling checking for array bounds, it probably does some additional checks. I also suspect that the error is in the way that I enter files in the makefile, but I could not find a way that would work. In these files, both modvarsym and modutils are used by the main code as well as by the other two modules. The main code uses all four modules.
There is no include statement in these files. Maincode is the only file with the program statement; the variables are declared only once, in modvarsym. Overall, the code compiles and runs without -fbounds-check. However, I really want to use -fbounds-check to make sure the arrays do not overrun. Would anybody be able to put me on the right track? Thank you.
This is the answer @dave_thompson_085 gave in his comments; it seems to solve the problem.
First, I assume your first command is meant to have -c, and your first two are meant to have .f90 (or .f95 or similar) suffix as otherwise the compiler shouldn't do anything for them. Second, -o -fbounds-check maincode (in the absence of -c) means to put the linked output in file -fbounds-check and include maincode (if it exists) among the files linked. Since you have already linked all your routines into maincode, linking those same routines again PLUS maincode produces duplicates.
Move -fbounds-check before the -o at least; even better, it is usual style (though not required) to put options that affect parsing and code generation before the source file(s) as well, and in your example that is maincode.f90. Also note that this generates bound checks only for the routines in maincode; if there are any subscripting errors in the other routines they won't be caught. When you have a bug in a compiled language, the place where a problem is detected may not be the actual origin, and it is usually best to apply debugging options to everything you can.

GCC: how to find why an object file is not discarded

I have an executable which links to a big .a archive that contains lots of functions. The executable only uses a small fraction of the functions in this archive, but for some reason it pulls everything from it and ends up being very big.
My suspicion is that some of the functionality that the executable is using somehow references something it shouldn't and that causes everything else to be pulled.
Is it possible to make gcc tell me what reference causes a specific symbol to be added in the executable? Why else can this happen?
I've tried using --gc-sections with no effect.
I've tried using --version-script to make all the symbols in the executable local with no effect
I'm not interested in -ffunction-sections and -fdata-sections, since it is whole object files I want to discard, not individual functions.
Other answers mention -why_live, but that seems to be implemented only for Darwin, and I am on Linux x86_64.
Use -Wl,-M to pass -M to the linker, causing it to print a link map. This will show you the reasons (or at least the first-found reason) for every object file that gets linked from an archive.

Functions only getting inlined if defined in a header. Am I missing something?

Using gcc v4.8.1
If I do:
//func.hpp
#ifndef FUNC_HPP
#define FUNC_HPP
int func(int);
#endif
//func.cpp
#include "func.hpp"
int func(int x){
    return 5*x+7;
}
//main.cpp
#include <iostream>
#include "func.hpp"
using std::cout;
using std::endl;
int main(){
    cout<<func(5)<<endl;
    return 0;
}
Even the simple function func will not get inlined. No combination of inline, extern, static, and __attribute__((always_inline)) on the prototype and/or the definition changes this (obviously some combinations of these specifiers cause it to not even compile and/or produce warnings; I'm not talking about those). I'm using g++ *.cpp -O3 -o run and g++ *.cpp -O3 -S for assembly output. When I look at the assembly output, I still see call func.
It appears the only way I can get the function to be properly inlined is to have the prototype (probably not necessary) and the definition of the function in the header file. If the header is only included by one file in the whole program (only by main.cpp, for example), it will compile and the function will be properly inlined without even needing the inline specifier. If the header is to be included by multiple files, the inline specifier appears to be needed to resolve multiple definition errors, and that appears to be its only purpose. The function is, of course, inlined properly.
So my question is: am I doing something wrong? Am I missing something? Whatever happened to:
"The compiler is smarter than you. It knows when a function should be inlined better than you do. And never ever use C arrays. Always use std::vector!"
-Every other StackOverflow user
Really? So calling func(5) and printing the result is faster than just printing 32? I will blindly follow you off the edge of a cliff, almighty, all-knowing, and all-wise gcc.
For the record, the above code is just an example. I am writing a ray tracer and when I moved all of the code of my math and other utility classes to their header files and used the inline specifier, I saw massive performance gains. Literally like 10 times faster for some scenes.
Recent GCC is able to inline across compilation units through link-time optimizations (LTO). You need to compile - and link - with -flto; see Link-time optimization and inline and GCC optimize options.
(Actually, LTO is done by a special variant of the compiler, lto1, at link time. LTO works by serializing, inside the object files, some of GCC's internal representations, which are also used by lto1. So what happens with -flto is that when compiling src1.c with it, the generated src1.o contains the GIMPLE representation in addition to the object binary, and when linking with gcc -flto src*.o the lto1 "front-end" extracts those GIMPLE representations from inside the src*.o files and almost recompiles everything again...)
You need to explicitly pass -flto both at compile time AND at link time (see this). If using a Makefile you could try make CC='gcc -flto'; otherwise, compile each translation unit with e.g. gcc -Wall -flto -O2 -c src1.c (and likewise for src2.c etc...) and link all of your program (or library) with gcc -Wall -flto -O2 src1.o src2.o -o prog -lsomelib
Notice that -flto will significantly slow down your build (it is not enabled by -O3, so you need to pass it explicitly, and you need to link with it too). Often you get a 5% or 10% performance improvement in the built program at the expense of nearly doubling the build time. Sometimes you can get bigger improvements.
The compiler can't inline what it doesn't have. It needs the full body of the function to inline its code.
You have to remember that the compiler only works on one source file at a time (more precisely, one translation unit at a time), and has no idea about other source files and what's in them.
The linker might be able to do it though, as it sees all the code, and some linkers have flags that allows some link-time optimizations.
The inline keyword is nothing more than a suggestion to the compiler: "I want this function to be inlined." It can ignore this keyword without even a warning.
In order for your function func(...) to be inlined, your compiler/linker HAS TO support some form of link-time code generation (and optimization). Because func() and main() lie in different translation units, the C++ compiler can't see them both at the same time, and therefore can't inline one function within the other. It NEEDS the LINKER's SUPPORT to do so.
Consult your build tool manuals on how to switch link time code gen features on, if they are supported at all.

gcc: Linking C library in a C++ app results in "multiple definition of" errors

I have a working C library which I want to link to a C++ application using gcc but the linker (g++) is giving me the "multiple definition" error. With a C application and gcc it works.
The headers defining the interface all contain the usual:
#ifdef __cplusplus
extern "C" {
#endif
I checked the library using the "nm" command and it does have multiple definitions of the method (the method in question is not from the public interface).
My questions are:
Why does my library have multiple definitions (some have the T while others have U)?
Why it works if the application including the file is a C application (I'm using -Wall to build)?
Do I need any special attribute or use a specific file extension to make it work or is the case that I need to go back to programming school :) ?
Paying more attention to the lib.a file I can see that one of the objects is included twice. For example, I have two sections for the same object:
obj1.o
00000000 T Method
obj2.o
00000000 T Hello
obj1.o
00000000 T Method
I guess this is the problem?
Any help is really appreciated.
My wild guess is that the "#define BLAHBLAH_H" and "#ifndef BLAHBLAH_H / #endif" are set outside the 'extern "C" {}' block.
After playing around, I found that the full command line (it's kind of a complex application with automated compilation and linking) contained the --whole-archive parameter before the inclusion of the C library. Moving the library after --no-whole-archive fixed the problem.
Original command
gcc -Wl,--whole-archive -l:otherlibs -Llibpath -l:libname -Wl,--no-whole-archive -o myApp hello.c
Fixed command
gcc -Wl,--whole-archive -l:otherlibs -Wl,--no-whole-archive -Llibpath -l:libname -o myApp hello.c
Thank you for everyone's help guys and sorry if I didn't provide enough/accurate information.
Best Regards

Garbage from other linking units

I asked myself the following question when I was discussing this topic.
Are there cases when some unused code from translation units will be linked into the final executable (in release mode, of course) for popular compilers like GCC and VC++?
For example suppose we have 2 compilation units:
//A.hpp
//Here are declarations of some classes, functions, extern variables etc.
And source file
//A.cpp
//definitions of the A.hpp declarations
And finally main
//main.cpp
//including A.hpp library
#include "A.hpp"
//here we will use some stuff from A.hpp library, but not everything
My question is: what if not all the stuff from A.hpp is used in main.cpp? Will the linker remove all unused code, or are there cases when some unused code can end up linked into the executable file?
Edit: I'm interested in G++ and VC++ linkers.
Edit: Of course I mean in release mode.
Edit: I'm starting a bounty for this question to get a good and full answer. I'm expecting an answer that explains in which cases the g++ and VC++ linkers link in junk, what kind of code they are able to remove from the executable file (unneeded functions, unneeded global variables, unneeded class definitions, etc.), and why they aren't able to remove some kinds of unneeded stuff.
As other posters have indicated, the linker typically does not remove dead code before building the final executable. However, there are often optimization settings you can use to force the linker to try extra hard to do this.
For GCC, this is accomplished in two stages:
First, compile the code but tell the compiler to separate it into per-symbol sections within the translation unit. This will be done for functions, classes, and external variables by using the following two compiler flags:
-fdata-sections -ffunction-sections
Link the translation units together using the linker optimization flag (this causes the linker to discard unreferenced sections):
-Wl,--gc-sections
So if you had one file called test.cpp that had two functions defined in it, but one of them was unused, you could omit the unused one with the following command to gcc (g++):
gcc -Os -fdata-sections -ffunction-sections test.cpp -o test -Wl,--gc-sections
(Note that -Os is an additional compiler flag that tells GCC to optimize for size.)
I have also read somewhere that linking static libraries is different, though: that GCC automatically omits unused symbols in this case. Perhaps another poster can confirm/disprove this.
As for MSVC, as others have mentioned, function level linking accomplishes the same thing.
I believe the compiler flag for this is (to sort things into sections):
/Gy
And then the linker flag (to discard unused sections):
/OPT:REF
EDIT: After further research, I think that bit about GCC automatically doing this for static libraries is false.
The linker will not remove code.
You can still access it via dlsym dynamically in your code.
In general, linkers tend to include everything from the object files explicitly passed on the command line, but only pull in those object files from a static library that contain symbols needed to resolve external references from object files already linked.
However, a linker may decide to discard functions that are never called, or data which is never referenced. The precise details will depend on the compiler and linker switches.
In C++ code, if a source file is explicitly compiled and linked in to your application then I would expect that the objects with static storage duration that have constructors and/or destructors will be included, and their constructors/destructors run at the appropriate times. Consequently, any code called from those constructors or destructors must be in the final executable. However, if the code is not called from anywhere then you cannot write a program to tell whether or not the code is included without using things like dlsym, so the linker may well omit to include it in the final executable.
I would also expect that any symbols defined with global visibility such that they could be found via dlsym (as opposed to "hidden" symbols which are only visible within the executable) would be present in the final executable. However, this is an expectation rather than something I have confirmed by testing or reading the docs.
If you wanted to ensure code was in your executable even if it isn't called from inside it, you could load it as a load-time ("statically aware") dynamic link library: one which is loaded automatically into memory as the program starts, as opposed to passing a string to a function that loads a library at run time and then manually searching for the hooks you need.