gcc -fPIC vs. -shared - c++

When compiling a shared library with gcc / g++, why is -fPIC not implied by -shared option? Or, said differently, is the option -fPIC required at link time?
For short, should I write:
gcc -c -fPIC foo.c -o foo.o
gcc -shared -fPIC foo.o -o libfoo.so // with -fPIC
or is the following sufficient:
gcc -c -fPIC foo.c -o foo.o
gcc -shared foo.o -o libfoo.so // without -fPIC

Code that is built into shared libraries should normally but not mandatory be position independent code, so that the shared library can readily be loaded at any address in memory. The -fPIC option ensures that GCC produces such code. It is not required, thus it is not implied in -shared so GCC gives a freedom of choice. Without this option the compiler can make some optimizations on that position dependent code.
Position dependent code may occur an error if one process wants to load more than one shared library at the same virtual address. Since libraries cannot predict what other libraries could be loaded, this problem is unavoidable with the traditional shared library concept. Virtual address space doesn't help here. If your application does not use a lot of other shared libraries and they are loaded before yours, you can predict the loading address of your library and you can define it as a base address of your position dependent library.
The shared library is supposed to be shared between processes, it may not always be possible to load the library at the same address in both. If the code were not position independent, then each process would require its own copy.

Related

Is the address of a C function symbol constant between compiles

I have been experimenting with symbol visibility in my shared library and noticed that the address / value of an exported function symbol does not seem to change. Are these addresses constant between compiles, or is this a coincidence?
The addresses where obtained on a Virtual Machine running Arch Linux using the command readelf with option -W and --dyn-syms.
The reason I'm asking is that I am wondering if the address of a templated C++ function could be used as an uuid for an object type. This is of interest in my serialization routine where I would like to setup an id system which is constant between compiles (object types are registered statically at initialization time, so order is not defined).
If build process is unchanged (i.e. compiler, linker, Makefiles and code remain the same) the static address in ELF file will not change either. But if any component changes, all bets are off.
More importantly, dynamic address (assigned by dynamic loader) will be different on each run due to address-space randomization in modern Linux distros so you should not rely on it.
When you build your code you can choose either to build it position dependent or position independent this has nothing to do with static build (though you can't build a position independent static binary). Position dependent binaries (given the same sources, compiler and build flags) will always generate the same addresses, but as I say further down, I wouldn't rely on it in release.
This is supplied by GCC's options -fPIE (Position independent executable), -fPIC (Position independent code), -pie. ELF executable files can be built as either position dependent or independent but shared objects (libraries) will always be built as position independent as you need to be able to load them in a random location given to you by the OS. From GCC's MAN page:
-fPIC
If supported for the target machine, emit position-independent code, suitable for dynamic linking and avoiding any limit on the size of the global offset table.
-fpie
-fPIE
These options are similar to -fpic and -fPIC, but generated position independent code can be only linked into executables. Usually these options are used when -pie GCC option will be used during linking.
-pie
Produce a position independent executable on targets which support it. For predictable results, you must also specify the same set of options that were used to generate code (-fpie, -fPIE, or model suboptions) when you specify this option.
When loading a PIC shared object you cannot assume it will reside in the same place for each run, as it might be affected by ASLR that is driven by the kernel.
In any way I don't think it's a good practice to use memory addresses as uuids to classes as these might change, even more so if these template classes are implemented as part of a shared object.

Is -fPIC for shared libraries ONLY?

I know -fPIC is necessary for shared libraries and know why.
However, I am not clear on the question:
Should -fPIC never be used during building an executable or a static library?
Should -fPIC never be used during building an executable or a static library?
Never is a strong word, and the statement above is false.
Code built with -fPIC is (slightly) less optimal, so why would you want to put it into anything other than a shared library?
Let's start with a static library, which has an easy answer.
Suppose you want to give your users a static library that can be linked into either an executable, or into a shared library of their own?
In that case you must either give them 3 separate archive libraries (one built with -fPIC for linking into shared libraries, one built with -fPIE for linking into PIE executables, and a "regular" one), or you can give them a single archive library (which must have code built with -fPIC).
Now, it could be argued that you should instead give them a shared library, but that forces your end users to distribute 2 binaries, and they may prefer to not do that.
But suppose you want to build a regular (non-PIE) executable. What could be the reason to link in -fPIC code into such executable?
Well, suppose you are in the development stage, and don't care that much about optimizing the code yet. Suppose further that you want to test your code as both a shared library, and as part of PIE and non-PIE executable.
Under above conditions, you could either compile your code 3 times (with and without -fPIC, and with -fPIE), or you could compile it once (with -fPIC) and link it into all 3 of shared library, PIE and non-PIE executable. Doing this saves a lot of compilation time, and some build system complexity.
TL;DR: putting -fPIC objects into executables and static libraries has its place, and you should understand the reason why you are doing it (if you end up doing it).
Update:
Code in an object file is always relocatable
Correct.
is it position-independent code?
No: not all relocatable code is position-independent.
Position-independent code is a subset of relocatable code. Relocatable code can have relocations that apply to any section. Position-independent code must not have any relocations against .text (and .rodata).

How can I statically link standard library to my C++ program?

I'm using Code::Blocks IDE(v13.12) with GNU GCC Compiler.
I want to the linker to link static versions of required runtime libraries for my programs, how may I do this?
I already know that my executable size will increase. Would you please tell me other the downsides?
What about doing this in Visual C++ Express?
Since nobody else has come up with an answer yet, I will give it a try. Unfortunately, I don't know that Code::Blocks IDE so my answer will only be partial.
1 How to Create a Statically Linked Executable with GCC
This is not IDE specific but holds for GCC (and many other compilers) in general. Assume you have a simplistic “hello, world” program in main.cpp (no external dependencies except for the standard library and runtime library). You'd compile and statically link it via:
Compile main.cpp to main.o (the output file name is implicit):
$ g++ -c -Wall main.cpp
The -c tells GCC to stop after the compilation step (not run the linker). The -Wall turns on most diagnostic messages. If novice programmers would use it more often and pay more attention to it, many questions on this site would not have been asked. ;-)
Link main.o (could list more than one object file) statically pulling in the standard and runtime library and put the executable in the file main:
$ g++ -o main main.o -static
Without using the -o main switch, GCC would have put the final executable in the not so well-named file a.out (which once eventually stood for “assembly output”).
Especially at the beginning, I strongly recommend doing such things “by hand” as it will help get a better understanding of the build tool-chain.
As a matter of fact, the above two commands could have been combined into just one:
$ g++ -Wall -o main main.cpp -static
Any reasonable IDE should have options for specifying such compiler / linker flags.
2 Pros and Cons of Static Linking
Reasons for static linking:
You have a single file that can be copied to any machine with a compatible architecture and operating system and it will just work, no matter what version of what library is installed.
You can execute the program in an environment where the shared libraries are not available. For example, putting a statically linked CGI executable into a chroot() jail might help reduce the attack surface on a web server.
Since no dynamic linking is needed, program startup might be faster. (I'm sure there are situations where the opposite is true, especially if the shared library was already loaded for another process.)
Since the linker can hard-code function addresses, function calls might be faster.
On systems that have more than one version of a common library (LAPACK, for example) installed, static linking can help make sure that a specific version is always used without worrying about setting the LD_LIBRARY_PATH correctly. Obviously, this is also a disadvantage since now you cannot select the library any more without recompiling. If you always wanted the same version, why would you have installed more than one in the first place?
Reasons against static linking:
As you have already mentioned, the size of the executable might grow dramatically. This depends of course heavily on what libraries you link in.
The operating system might be smart enough to load the text section of a shared library into the RAM only once if several processes need the library at the same time. By linking statically, you void this advantage and the system might run short of memory more quickly.
Your program no longer profits from library upgrades. Instead of simply replacing one shared library with a (hopefully ABI compatible) newer release, a system administrator will have to recompile and reinstall every program that uses it. This is the most severe drawback in my opinion.
Consider for example the OpenSSL library. When the Heartbleed bug was discovered and fixed earlier this year, system administrators could install a patched version of OpenSSL and restart all services to fix the vulnerability within a day as soon as the patch was out. That is, if their services were linking dynamically against OpenSSL. For those that have been linked statically, it would have taken weeks until the last one was fixed and I'm pretty sure that there is still proprietary “all in one” software out in the wild that did not see a fix up to the present day.
Your users cannot replace a shared library on the fly. For example, the torsocks script (and associated library) allows users to replace (via setting LD_PRELOAD appropriately) the networking system library by one that routes their traffic through the Tor network. And this even works for programs whose developers never even thought of that possibility. (Whether this is secure and a good idea is subject of an unrelated debate.) An other common use-case is debugging or “hardening” applications by replacing malloc and the like with specialized versions.
In my opinion, the disadvantages of static linking outweigh the advantages in all but very special cases. As a rule of thumb: link dynamically if you can and statically if you have to.
A Addendum
As Alf has pointed out (see comments), there is a special GCC option to selectively link in the C++ standard library statically but not link the whole program statically. From the GCC manual:
-static-libstdc++
When the g++ program is used to link a C++ program, it normally automatically links against libstdc++. If libstdc++ is available as a shared library, and the -static option is not used, then this links against the shared version of libstdc++. That is normally fine. However, it is sometimes useful to freeze the version of libstdc++ used by the program without going all the way to a fully static link. The -static-libstdc++ option directs the g++ driver to link libstdc++ statically, without necessarily linking other libraries statically.
In Visual C++, the /MT option does a static link and the /MD option does a dynamic link. (see http://msdn.microsoft.com/en-us/library/2kzt1wy3.aspx)
I'd recommend using /MD and redistributing the C++ runtime, which is freely available from Microsoft. Once the C++ runtime is installed, than any program requiring the run time will continue to work. You would need to pass the proper option to tell the compiler which runtime to use. There is a good explanation here, Should I compile with /MD or /MT?
On Linux, I'd recommend redistributing libstdc++ instead of a static link. If their system libstdc++ works, I'd let the user just use that. System libraries, such as libpthread and libgcc should just use the system default. This requires compiling the program on a system with symbols compatible with all linux versions you are distributing for.
On Mac OS X, just redistribute the app with dynamic linking to libstdc++. Anyone using the same OS version should be able to use your program.

Dynamic linking: is it possible to disable automatic loading of non used shared objects?

I have a limited knowledge of dynamic libraries and I usually have problems related to libraries that I do not understand.
I recently learned of libraries from google search and especially from the following links:
Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?.
http://www.ibm.com/developerworks/library/l-dynamic-libraries/. That article was very useful in understanding the dynamic libraries and their usage:
If I understood well (correct me if I am wrong), there are two possible usages of shared objects:
dynamic linking: the shared object is automatically loaded by the dynamic linker when the program starts.
dynamic loading: the share object is loaded and used under the program control at runtime through the dynamic loading API (dlopen, dlerror, dlsym and dlclose). That option is useful for plugins.
If I got everything right, in the case of dynamic linking, all the symbols are verified at compilation time. This allows the compiler/linker to know exactly which shared object is effectively used by the program and which one is not used.
Now, it happens that the dynamic linker is always invoked at runtime even if the shared object is not used. It can be verified by linking an empty program against libraries that are not in locations searchable at runtime, and the execution will fail. Linking a program against library that is not actually used in the program can happen when there are updates and the use of a library is no longer necessary. It also happen when one isolates a part of the program for debugging, and link against all the libraries of the main program.
My question is: is there an option to ask the compiler/linker to not include reference to shared objects that do not have symbols referred to in the program?
Is there any issue that prevent the compiler from doing that?
The following posts share some similarities with the present question, but none of them has an accepted answer, nor an answer that satisfies my curiosity:
https://stackoverflow.com/questions/22617744/how-to-disable-the-runtime-checking-of-shared-object-if-they-are-not-used
Delay-Load equivalent in unix based systems
If you happen to use g++/ld there are a few suggestions spelled out on How to remove unused C/C++ symbols with GCC and ld?
For example:
gcc -Os -fdata-sections -ffunction-sections test.cpp -o test.o -Wl,--gc-sections
-dead_strip
-dead_strip_dylibs
However I'm actually not sure it's possible for the compiler to do this in the general case. Consider a dependent shared library that has a weak reference to the library that you want to remove from your link line: How would the compiler know that it's safe to remove the library and/or symbols at that point?

Can I bind shared libraries with "gcc -llibnamehere", in addition to static ones?

Two projects:
The loader, a standalone executable (only loads modules)
any module, a shared library (librainbowdash.so) (there can be many modules)
Now, the module is linked with -lpthreads, but I get some weird errors which make me think pthreads is bound as a shared object only, and when the loader loads a module pthreads are not being loaded. (debugging with GDB is impossible, that kind of errors).
I thought the -l switch only allows static libraries? Does it? Doesn't it?
-l specifies library names. It is up to the linker to resolve the library names into static libraries or shared objects to be linked against as appropriate. And it is the loader's job to load any shared libraries used.
If you look in the ld and gcc manpages, it is possible to define 'option groups', I might be a bit rusty, but it should go something like
gcc -o yourprog -Wl,-Bstatic yourprog.c -lstatic_lib -Wl,-Bdynamic -ldynamic_lib
The exact incantation is probably wrong.
From experience, passing the full path of the static library has proven to be much less of a headache than figuring out the exact form of the above mentioned incantation.
That being said, I doubt you would gain much benefit by statically linking pthreads.
I think you might also use
gcc -pthread ...
as well.
Using a simple -static will make the output and all its dependencies static. This is probably not what you want.
It might be that your shared library points to lpthread in a wrong location. Using ldd tool e.g., ldd libfoo.so is often a very effective way to find such linking problems.