When someone statically links a .lib, will the linker copy the whole contents of the library into the final executable, or just the functions used in the object files?
The whole library? -- No.
Just the functions you called? -- No.
Something else? -- Yes.
It certainly doesn't throw in the whole library.
But it doesn't necessarily include just "the functions used in the object files" either.
The linker will make a recursively built list of which object modules in the library satisfy your undefined symbols.
Then, it will include each of those object modules.
Typically, a given object module will include more than one function, and if some of these are not called by the ones that you do call, you will get some number of functions (and data objects) that you didn't need.
The linker typically does not remove dead code before building the final executable; that is, it will usually link in ALL symbols whether they are used in the final executable or not. However, linkers often provide optimization settings you can use to force them to try harder to strip unused code.
For GCC, this is accomplished in two stages:
First, compile the code, but tell the compiler to separate it into individual sections within the translation unit. This will be done for functions, classes, and external variables by using the following two compiler flags:
-fdata-sections -ffunction-sections
Link the translation units together using the linker optimization flag (this causes the linker to discard unreferenced sections):
-Wl,--gc-sections
So if you had one file called test.cpp that had two functions defined in it, but one of them was unused, you could omit the unused one with the following command to gcc (g++):
gcc -Os -fdata-sections -ffunction-sections test.cpp -o test -Wl,--gc-sections
(Note that -Os is an additional compiler flag that tells GCC to optimize for size)
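If it helps to see the two stages separately, here is a sketch with an explicit compile step and an explicit link step (same hypothetical test.cpp):
g++ -Os -ffunction-sections -fdata-sections -c test.cpp -o test.o
g++ -Wl,--gc-sections test.o -o test
The first command only compiles test.cpp, placing each function and data object in its own section; the second links test.o and lets the linker throw away any section that is never referenced.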
As for MSVC, function level linking accomplishes the same thing.
I believe the compiler flag for this is (to sort things into sections):
/Gy
And then the linker flag (to discard unused sections):
/OPT:REF
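Put together, a minimal MSVC command line might look something like this (test.cpp is just a placeholder name):
cl /O1 /Gy /c test.cpp
link /OPT:REF test.obj
Here /O1 asks the compiler to favour small code, and /Gy puts each function in its own COMDAT so the linker can drop the unreferenced ones via /OPT:REF.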
Linkers were invented in ancient times, when memory was especially precious. One of their primary functions was to prune out the modules you weren't using. That ability has been carried forward to the present day.
It's quite common for some library functions to rely on others though, and all the dependencies will be linked.
Sort of. It will, however, also need to fix up all the function call pointers, especially if those function calls exist outside of the static library (i.e. in another static library or in the executable).
Depends on the linker. Some linkers are lazy and just throw the whole library in. The other extreme is linkers that throw in only the necessary code into an executable.
A sample test is to write a program that uses puts and compare with a program that uses printf. If the executables are the same size, you have more of a lazy linker.
Example:
puts_test.cpp
#include <cstdio>
using namespace std;
int main(void)
{
puts("Hello World\n");
return 0;
}
printf_test.cpp
#include <cstdio>
using namespace std;
int main(void)
{
printf("%s\n", "Hello World");
return 0;
}
With the above example, the puts function does not require extra code for parsing format strings or converting numerics into text. This is the baseline because it requires a minimal library function.
The example using printf requires more functionality. The printf function requires parsing the format string and outputting text.
The expected result is that the printf executable should be larger than the puts executable. Most compilers will haul in all the code for the printf function to resolve symbols (such as for displaying floats) even though that portion of the code is not used. More intelligent (and costly) compilers will break up the printf function and only include the parts that are used or required. In the example above, the compiler should only include the parts for processing text and not include code to format integers and floating point values.
A lazy compiler, or a debug build, will copy in the entire library even for the puts example, thus making the two executables the same size.
Symbol comparison
The *nix platforms and Cygwin provide tools for obtaining the symbols from executables. One such utility is nm. Run nm on each executable, directing output to a text file, then compare the two text files. A lazy compiler should produce the same symbols in both; only their locations may differ (which is not important to the issue).
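For example, assuming the two test programs above are built into executables named puts_test and printf_test (the names are arbitrary), the comparison could look like this:
g++ -O2 puts_test.cpp -o puts_test
g++ -O2 printf_test.cpp -o printf_test
size puts_test printf_test           # compare the overall section sizes
nm puts_test > puts_syms.txt         # dump the symbol table of each executable
nm printf_test > printf_syms.txt
diff puts_syms.txt printf_syms.txt   # near-identical symbol lists suggest a lazy linker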
It will use only the used functions & symbols (unless told otherwise, but that can be tricky).
Side issue:
This can actually be a problem if you, for example, have some classes that just register themselves with a factory. No one calls these classes directly, so they won't be included and thus not registered with the factory. There are ways around this (usually by declaring some anonymous variable in the header file that references the source file).
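A rough sketch of that usual workaround, with made-up names (the actual factory registration is stubbed out with a printout), could look like this:
// plugin.cpp -- in a real program the static initializer would register a class with a factory
#include <cstdio>
static bool do_register() { std::puts("plugin registered"); return true; }
static bool registered = do_register();   // runs at start-up, but only if plugin.o is linked in
int force_link_plugin = 0;                // dummy symbol referenced from the header below

// plugin.h -- included by code that must not lose the plugin
extern int force_link_plugin;
namespace {
    // the initializer references the symbol defined in plugin.cpp, forcing the linker to
    // pull plugin.o (and its registration code) out of the static library;
    // __attribute__((used)) is a GCC extension that keeps the unused anchor from being discarded
    __attribute__((used)) int plugin_anchor = force_link_plugin;
}
The cost is that every translation unit including plugin.h carries its own little anchor variable, which is usually negligible.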
Related
Motivation
I have 2 static libraries, libStatic1.a and libStatic2.a. In addition, I have many SOs (Shared Objects) that are compiled with libStatic1.a.
Up until now, libStatic1.a and libStatic2.a were independent and everything was okay. But now I have added to the code that generates libStatic1.a a dependency on the code that generates libStatic2.a. Therefore, any SO that depends on libStatic1.a now also needs to be compiled with libStatic2.a, just so that compilation succeeds and nothing crashes at runtime. This is undesirable, because it adds a dependency on libStatic2.a to every build target that until now depended only on libStatic1.a. This creates unnecessary coupling and I would like to avoid it.
Therefore, I need to somehow "embed" the object code of libStatic2.a in libStatic1.a. If I were to just build libStatic1.a with all the object files of libStatic2.a (in addition to its own), it would basically contain it, but this creates another problem: if some user of libStatic1.a decides to use libStatic2.a and links it, they will get a weird "multiple definitions" error. If I could somehow tell the compiler to generate the object files of libStatic2.a with weak symbols (only for the use in libStatic1.a), this would solve the problem: no one would get multiple definitions, and no makefile of the many SOs that use libStatic1.a would need to change.
My thinking: I know that it is possible (using GCC/g++ extensions to the C language) to declare a function with the keyword __attribute__ and the weak attribute as follows:
void __attribute__((weak)) foo(int j);
Is there a way to tell the compiler (g++) to compile an entire compilation unit as "weak", meaning all its global symbols in the symbol table will be considered weak when linking?
Alternatively, is there a way to tell the linker (ld) to consider all the symbols of some object file/library as if they are weak?
If your library is small, the simplest way is still to change the declarations by manually adding __attribute__((weak)).
Another possibility might be to ask g++ to emit the assembly code (with -S) and have some (perhaps awk or ed) script work on it.
You could also code a GCC plugin (assuming your g++ is a 4.6 version) or a GCC MELT extension for that.
Compile it normally and then objcopy the object file with --weaken.
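For instance (the object file names here are made up), the objects that come from the libStatic2.a code could be weakened before being archived into libStatic1.a:
gcc -c static2_code.c -o static2_code.o
objcopy --weaken static2_code.o static2_code_weak.o    # all global symbols in the copy become weak
ar rcs libStatic1.a static1_code.o static2_code_weak.o
If a user later links the real libStatic2.a as well, its strong definitions should take precedence over the weakened ones instead of producing "multiple definition" errors.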
No, there doesn't appear to be; are there so many weak external functions that it's not practical to set their attributes individually?
I asked myself the following question when I was discussing this topic.
Are there cases where unused code from translation units ends up linked into the final executable (in release mode, of course) with popular compilers like GCC and VC++?
For example suppose we have 2 compilation units:
//A.hpp
//Here are declarations of some classes, functions, extern variables etc.
And source file
//A.cpp
//definitions of the A.hpp declarations
And finally main
//main.cpp
//including A.hpp library
#include "A.hpp"
//here we will use some stuff from A.hpp library, but not everything
My question is: what if not all the stuff from A.hpp is used in main.cpp? Will the linker remove all the unused code, or are there cases when some unused code can get linked into the executable file?
Edit: I'm interested in G++ and VC++ linkers.
Edit: Of course I mean in release mode.
Edit: I'm starting a bounty for this question to get a good and full answer. I'm expecting an answer that explains in which cases the g++ and VC++ linkers link in junk, what kind of code they are able to remove from the executable file (unneeded functions, unneeded global variables, unneeded class definitions, etc.), and why they aren't able to remove some kinds of unneeded stuff.
As other posters have indicated, the linker typically does not remove dead code before building the final executable. However, there are often optimization settings you can use to force the linker to try harder to do this.
For GCC, this is accomplished in two stages:
First, compile the code, but tell the compiler to separate it into individual sections within the translation unit. This will be done for functions, classes, and external variables by using the following two compiler flags:
-fdata-sections -ffunction-sections
Link the translation units together using the linker optimization flag (this causes the linker to discard unreferenced sections):
-Wl,--gc-sections
So if you had one file called test.cpp that had two functions defined in it, but one of them was unused, you could omit the unused one with the following command to gcc (g++):
gcc -Os -fdata-sections -ffunction-sections test.cpp -o test -Wl,--gc-sections
(Note that -Os is an additional compiler flag that tells GCC to optimize for size)
I have also read somewhere that linking static libraries is different, though: that GCC automatically omits unused symbols in that case. Perhaps another poster can confirm or disprove this.
As for MSVC, as others have mentioned, function level linking accomplishes the same thing.
I believe the compiler flag for this is (to sort things into sections):
/Gy
And then the linker flag (to discard unused sections):
/OPT:REF
EDIT: After further research, I think that bit about GCC automatically doing this for static libraries is false.
The linker will not remove code.
You can still access it via dlsym dynamically in your code.
In general, linkers tend to include everything from the object files explicitly passed on the command line, but only pull in those object files from a static library that contain symbols needed to resolve external references from object files already linked.
However, a linker may decide to discard functions that are never called, or data which is never referenced. The precise details will depend on the compiler and linker switches.
In C++ code, if a source file is explicitly compiled and linked in to your application then I would expect that the objects with static storage duration that have constructors and/or destructors will be included, and their constructors/destructors run at the appropriate times. Consequently, any code called from those constructors or destructors must be in the final executable. However, if the code is not called from anywhere then you cannot write a program to tell whether or not the code is included without using things like dlsym, so the linker may well omit to include it in the final executable.
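For instance, here is a minimal illustration (the names are arbitrary) of code that must survive linking even though nothing calls it explicitly:
#include <cstdio>

struct Tracer {
    Tracer()  { std::puts("constructed before main"); }
    ~Tracer() { std::puts("destroyed after main"); }
};

// an object with static storage duration: its constructor and destructor run
// automatically, so the code they call cannot be discarded by the linker
static Tracer g_tracer;

int main() { return 0; }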
I would also expect that any symbols defined with global visibility such that they could be found via dlsym (as opposed to "hidden" symbols which are only visible within the executable) would be present in the final executable. However, this is an expectation rather than something I have confirmed by testing or reading the docs.
If you want to ensure code is in your executable even if it isn't called from inside it, you could load it as a statically aware dynamic link library (a statically aware library is one that is loaded automatically into memory as the program loads, as opposed to one you load manually by passing a string to a loading function and then searching for hooks yourself).
I have an ARM project that I'm building with make. I'm creating the list of object files to link based on the names of all of the .c and .cpp files in my source directory. However, I would like to exclude objects from being linked if they are never used. Will the linker exclude these objects from the .elf file automatically even if I include them in the list of objects to link? If not, is there a way to generate a list of only the objects that need to be linked?
You have to compile your code differently to strip out functions and data that aren't used. Usually all the objects are compiled into the same section, so they can't be individually omitted if they're not used.
Add the two following switches to your compiler line:
-ffunction-sections -fdata-sections
When you compile, the compiler will now put individual functions and data into their own sections instead of lumping them all in one module section.
Then, in your linker, specify the following:
--gc-sections
This instructs the linker to remove unused sections ("gc" is for garbage collection). It will garbage collect parts of files and entire files. For example, if you're compiling an object, but only use 1 function of 100 in the object, it will toss out the other 99 you're not using.
If you run into issues with functions not being found (it happens for various reasons, such as externs between libraries), you can use KEEP() directives in your linker script (*.ld) to prevent garbage collection of the sections holding those functions.
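As a sketch of what that looks like in a GNU ld script fragment (the section name .isr_vector is just an example, typical of ARM vector tables):
SECTIONS
{
    .text :
    {
        KEEP(*(.isr_vector))   /* never garbage-collect the interrupt vector table */
        *(.text*)              /* everything else here is fair game for --gc-sections */
    }
}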
If you are using RealView, it seems that it is possible. This section discusses it:
3.3.3 Unused section elimination
Unused section elimination removes code that is never executed, or data that is not referred to by the code, from the final image. This optimization can be controlled by the --remove, --no_remove, --first, --last, and --keep linker options. Use the --info unused linker option to instruct the linker to generate a list of the unused sections that have been eliminated.
Like many people said, the answer is "depends". In my experience, RVCT is very good about dead code stripping. Unused code and data will almost always be removed in the final link stage. GCC, on the other hand (at least without the LLVM back end), is rather poor at whole image static analysis and will not do a very good job at removing unused code (and woe be it to you if your code is in different sections requiring long jumps). You can take some steps to mitigate it, such as using function-sections, which creates a separate section for each function and enables some better dead code stripping.
Have your linker generate a map file of your binary so you can see what made it in there and what got stripped out.
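With a GNU toolchain, for example, the map file can be requested at link time like this (the arm-none-eabi prefix and the file names are just assumptions):
arm-none-eabi-gcc main.o utils.o -Wl,--gc-sections -Wl,-Map,output.map -o program.elf
The resulting output.map lists every section that made it into program.elf and which object file it came from.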
Depending on the sophistication of the compiler/linker and the optimization level, the linker may not link in code that isn't being called.
What compiler/linker are you using? Some linkers do this automatically, and some provide the feature as a command-line option.
In my experience, many compilers will not include unused code on an object file basis. Some may not have this resolution and will include entire libraries ("because this makes the build process faster").
For example, take a file junk.c that has three functions: Func1, Func2, and Func3. The build process creates an object file, junk.o, which has all three functions in it. If Func2 is not used, it will be included anyway, because the linker can't exclude one function out of an object file.
On the other hand, given the files Func1.c, Func2.c, and Func3.c, with the functions above split one per file, if Func2 in Func2.c is not used, the linker will not include it.
Some linkers are intelligent enough to exclude files from libraries. However, each linker differs in its granularity of file inclusion (and thus file exclusion). Read your linker's manual or contact its customer support for exact information.
I suggest moving the suspect functions into a separate file (one function per file) and rebuild. Measure the code size before and after. Also, there may be a difference between Debug and Release linking. The Debug linking could be lazy and just throw everything in while the Release linking puts more effort into removing unused code.
Just my thoughts and experience, Your Mileage May Vary (YMMV).
Traditionally, linkers link in all object files that are explicitly specified on the command line, even if they could be left out and the program would not have any unresolved symbols. This means that you can deliberately change the behaviour of a program by including an object file that does something triggered from static initialization but is not called directly or indirectly from main.
Typically, if you place most of your object files in a static library and link this library with a single object file containing your entry point, the linker will only pick out those members of the library (iteratively) that help resolve an unresolved symbol reference, either from the original object file or from a member pulled in subsequently because it resolved a previous unresolved symbol.
In short, place most of your object files in a library and just link this with one object containing your entry point.
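A minimal sketch of that layout (file names are illustrative):
gcc -c main.c module1.c module2.c module3.c     # compile everything to object files
ar rcs libapp.a module1.o module2.o module3.o   # archive everything except the entry point
gcc -o program main.o libapp.a                  # only the needed members of libapp.a get pulled in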
I have some doubt about how programs use shared libraries.
When I build a shared library (with the -shared -fPIC switches), I make some functions available to an external program.
Usually I do a dlopen() to load the library and then dlsym() to bind the said functions to function pointers.
This approach does not involve including any .h file.
Is there a way to avoid doing dlopen() & dlsym() and just include the .h of the shared library?
I guess this may be how C++ programs use code stored in system shared libraries, i.e. just including stdlib.h etc.
Nick, I think all the other answers are actually answering your question, which is how you link libraries, but the way you phrase your question suggests you have a misunderstanding of the difference between header files and libraries. They are not the same. You need both, and they are not doing the same thing.
Building an executable has two main phases, compilation (which turns your source into an intermediate form, containing executable binary instructions, but is not a runnable program), and linking (which combines these intermediate files into a single running executable or library).
When you do gcc -c program.c, you are compiling, and you generate program.o. This step is where headers matter. You need to #include <stdlib.h> in program.c to (for example) use malloc and free. (Similarly you need #include <dlfcn.h> for dlopen and dlsym.) If you don't do that, the compiler will complain that it doesn't know what those names are, and halt with an error. But if you do #include the header, the compiler does not insert the code for the functions you call into program.o; it merely inserts references to them. The reason is to avoid duplication of code: the code only needs to exist once, no matter how many parts of your program call it, so if you had further files (module1.c, module2.c and so on) that all used malloc, you would merely end up with many references to a single copy of malloc. That single copy is present in the standard library in either its shared or static form (libc.so or libc.a), but these are not referenced in your source, and the compiler is not aware of them.
The linker is. In the linking phase you do gcc -o program program.o. The linker will then search all libraries you pass it on the command line and find the single definition of all functions you've called which are not defined in your own code. That is what the -l does (as the others have explained): tell the linker the list of libraries you need to use. Their names often have little to do with the headers you used in the previous step. For example to get use of dlsym you need libdl.so or libdl.a, so your command-line would be gcc -o program program.o -ldl. To use malloc or most of the functions in the std*.h headers you need libc, but because that library is used by every C program it is automatically linked (as if you had done -lc).
Sorry if I'm going into a lot of detail but if you don't know the difference you will want to. It's very hard to make sense of how C compilation works if you don't.
One last thing: dlopen and dlsym are not the normal method of linking. They are used for special cases where you want to dynamically determine what behavior you want based on information that is, for whatever reason, only available at runtime. If you know what functions you want to call at compile time (true in 99% of the cases) you do not need to use the dl* functions.
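For contrast, here is what the dlopen()/dlsym() route looks like in the runtime-decision case; the library name ./libplugin.so and the symbol plugin_run are hypothetical:
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* the library to load is only known at run time */
    void *handle = dlopen("./libplugin.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }
    /* look the symbol up by name and call it through a function pointer */
    void (*run)(void) = (void (*)(void))dlsym(handle, "plugin_run");
    if (run)
        run();
    dlclose(handle);
    return 0;
}
Link this with -ldl on systems where dlopen lives in a separate library.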
You can link shared libraries like static ones. They are then searched for when launching the program. As a matter of fact, by default -lXXX will prefer libXXX.so over libXXX.a.
You need to give the linker the proper instructions to link your shared library.
The shared library names are like libNAME.so, so for linking you should use -lNAME
Call it libmysharedlib.so and then link your main program as:
gcc -o myprogram myprogram.c -lmysharedlib
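To make the contrast with dlopen()/dlsym() concrete, a minimal sketch could look like this (the function hello() is hypothetical):
/* mysharedlib.h */
void hello(void);

/* mysharedlib.c -- built with: gcc -shared -fPIC mysharedlib.c -o libmysharedlib.so */
#include <stdio.h>
void hello(void) { puts("hello from the shared library"); }

/* myprogram.c -- no dlopen()/dlsym(); the call is resolved by the dynamic linker at load time */
#include "mysharedlib.h"
int main(void)
{
    hello();
    return 0;
}
Link as shown above (adding -L. if the library lives in the current directory); at run time the loader still has to find libmysharedlib.so, so you may need to set LD_LIBRARY_PATH or an rpath.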
If you use CMake to build your project, you can use
TARGET_LINK_LIBRARIES(targetname libraryname)
As in:
TARGET_LINK_LIBRARIES(myprogram mylibrary)
To create the library "mylibrary", you can use
ADD_LIBRARY(targetname sourceslist)
As in:
ADD_LIBRARY(mylibrary ${mylibrary_SRCS})
Additionally, this method is cross-platform (whereas simply passing flags to gcc is not).
Shared libraries (.so) are object files where the actual compiled code of the functions/classes/... is stored (in binary form).
Header files (.h) are files indicating (as a reference) where the compiler can find the functions/classes/... (in the .so) that are required by the main code.
Therefore, you need both of them.
I am writing a hello world C++ application. The #include directive helps the compiler or linker import the C++ library, and my cout << "hello world"; uses cout from that library. The question is: after compiling, the generated exe is about 96k in size, so what instructions are actually contained in this exe file? Does this file also contain the iostream library?
Thanks
In the general case, the linker will only bring in what it needs. Once the compiler phase has turned your source code into an object file, it's treated much the same as all other object files. You have:
the C start-up code, which prepares the execution environment (sets up argc, argv and so on) then calls your main or equivalent.
your code itself.
whatever object files need to be dragged in from libraries (dynamic linking is a special case of linking that happens at runtime and I won't cover that here since you asked specifically about static linking).
The linker will include all the object files you explicitly specify (unless it's a particularly smart linker and can tell you're not using the object file).
With libraries, it's a little different. Basically, you start with a list of unresolved symbols (like cout). The linker will search all the object files in all the libraries you specify and, when it finds an object file that satisfies that symbol, it will drag it in and fix up the symbol references.
This may, of course, add even more unresolved symbols if, for example, there was something in the object file that relies on the C printf function (unlikely but possible).
The linker continues like this until all symbols are satisfied (when it gives you an executable) or one cannot be satisfied (when it complains to you bitterly about your coding practices).
So as to what is in your executable, it may be the entire iostream library or it may just be the minimum required to do what you asked. It will usually depend on how many object files the iostream library was built into.
I've seen code where an entire subsystem went into one object file so that, if you wanted to use just one tiny bit, you still got the lot. Alternatively, you can put every single function into its own object file and the linker will probably create an executable as small as possible.
There are options to the linker which can produce a link map which will show you how things are organised. You probably won't generally see it if you're using the IDE but it'll be buried deep within the compile-time options dialogs under MSVC.
And, in terms of your added comment, the code:
cout << "hello";
will quite possibly bring in sizeable chunks of both the iostream and string processing code.
Use cl /EHsc hello.cpp -link /MAP. The .map file generated will give you a rough idea which pieces of the static library are present in the .exe.
Some of the space is used by C++ startup code, and the portions of the static library that you use.
On Windows, the library, or the parts of the libraries that are used, is also usually included in the .exe; the case is different on Linux. However, there are optimization options.
I guess this Wiki link will be useful: Static Libraries