What does -fPIC mean when building a shared library? - c++

I know the '-fPIC' option has something to do with resolving addresses and independence between individual modules, but I'm not sure what it really means. Can you explain?

PIC stands for Position Independent Code.
To quote man gcc:
If supported for the target machine, emit position-independent code, suitable for dynamic linking and avoiding any limit on the size of the global offset table. This option makes a difference on AArch64, m68k, PowerPC and SPARC.
Use this when building shared objects (*.so) on those mentioned architectures.

The f is the gcc prefix for options that "control the interface conventions used
in code generation"
The PIC stands for "Position Independent Code", it is a specialization of the fpic for m68K and SPARC.
Edit: After reading page 11 of the document referenced by 0x6adb015, and the comment by coryan, I made a few changes:
This option only makes sense for shared libraries and you're telling the OS you're using a Global Offset Table, GOT. This means all your address references are relative to the GOT, and the code can be shared accross multiple processes.
Otherwise, without this option, the loader would have to modify all the offsets itself.
Needless to say, we almost always use -fpic/PIC.

man gcc says:
-fpic
Generate position-independent code (PIC) suitable for use in a shared
library, if supported for the target machine. Such code accesses all
constant addresses through a global offset table (GOT). The dynamic
loader resolves the GOT entries when the program starts (the dynamic
loader is not part of GCC; it is part of the operating system). If
the GOT size for the linked executable exceeds a machine-specific
maximum size, you get an error message from the linker indicating
that -fpic does not work; in that case, recompile with -fPIC instead.
(These maximums are 8k on the SPARC and 32k on the m68k and RS/6000.
The 386 has no such limit.)
Position-independent code requires special support, and therefore
works only on certain machines. For the 386, GCC supports PIC for
System V but not for the Sun 386i. Code generated for the
IBM RS/6000 is always position-independent.
-fPIC
If supported for the target machine, emit position-independent code,
suitable for dynamic linking and avoiding any limit on the size of
the global offset table. This option makes a difference on the m68k
and the SPARC.
Position-independent code requires special support, and therefore
works only on certain machines.

Related

Is the address of a C function symbol constant between compiles

I have been experimenting with symbol visibility in my shared library and noticed that the address / value of an exported function symbol does not seem to change. Are these addresses constant between compiles, or is this a coincidence?
The addresses where obtained on a Virtual Machine running Arch Linux using the command readelf with option -W and --dyn-syms.
The reason I'm asking is that I am wondering if the address of a templated C++ function could be used as an uuid for an object type. This is of interest in my serialization routine where I would like to setup an id system which is constant between compiles (object types are registered statically at initialization time, so order is not defined).
If build process is unchanged (i.e. compiler, linker, Makefiles and code remain the same) the static address in ELF file will not change either. But if any component changes, all bets are off.
More importantly, dynamic address (assigned by dynamic loader) will be different on each run due to address-space randomization in modern Linux distros so you should not rely on it.
When you build your code you can choose either to build it position dependent or position independent this has nothing to do with static build (though you can't build a position independent static binary). Position dependent binaries (given the same sources, compiler and build flags) will always generate the same addresses, but as I say further down, I wouldn't rely on it in release.
This is supplied by GCC's options -fPIE (Position independent executable), -fPIC (Position independent code), -pie. ELF executable files can be built as either position dependent or independent but shared objects (libraries) will always be built as position independent as you need to be able to load them in a random location given to you by the OS. From GCC's MAN page:
-fPIC
If supported for the target machine, emit position-independent code, suitable for dynamic linking and avoiding any limit on the size of the global offset table.
-fpie
-fPIE
These options are similar to -fpic and -fPIC, but generated position independent code can be only linked into executables. Usually these options are used when -pie GCC option will be used during linking.
-pie
Produce a position independent executable on targets which support it. For predictable results, you must also specify the same set of options that were used to generate code (-fpie, -fPIE, or model suboptions) when you specify this option.
When loading a PIC shared object you cannot assume it will reside in the same place for each run, as it might be affected by ASLR that is driven by the kernel.
In any way I don't think it's a good practice to use memory addresses as uuids to classes as these might change, even more so if these template classes are implemented as part of a shared object.

I have questions about gdb memory address

When I use gcc to compile a C++ program to a 32 bit and I run it through gdb. When I disassemble the main function the gdb reads out the memory addresses EXAMPLE: 0x585583d0 and in other peoples examples of 32 bit it reads out 0x080483d0. Im using Kali linux and am wondering if its just because its a different distribution or am I missing some C libraries?
am wondering if its just because its a different distribution or am I missing some C libraries?
This is because you built a position independent executable, while other people didn't.
The default load address for non-PIE binaries on 32-bit x86 systems is 0x08048000. The default load address for PIE binaries under GDB is somewhere in the 0x5855.... region (it can be very random outside of GDB; if you set disable-randomization off, you'll observe that the executable starts "jumping around" to different addresses).
Some newer distributions default to building PIE binaries. You can avoid this with:
gcc -no-pie main.c
The resulting binary should now start around 0x08048xxx.
You can check whether you have a PIE binary or not with file a.out -- it will say executable for non-PIE binary, and shared library for a PIE binary. See also this answer.

Object code relocation and Intel Pin interaction

I am working on a multiprocessor architectural simulator that uses Intel Pin to instrument C++ executable binaries and report interesting events (e.g., some function calls, thread create/finish, etc.). Basically, I build an instruction-decode cache of all instructions when their images are loaded and analyze instruction execution afterwards. So it is important for instruction addresses at image-load time to be the same as (or at least get updated synchronously with) instruction addresses at run-time.
Intel Pin API (e.g., IMG_AddInstrumentFunction) enables me to get information about the loaded images (executables and shared libraries) such as entry points, low/high address, etc.
However, I noticed that the instrumented program executes instructions at addresses that do not belong to any of the loaded images. By inspection, I am suspecting that the dynamic loader (image /lib64/ld-linux-x86-64.so.2 on 64-bit Centos 6.3) is relocating the main executable in memory by calling routine _dl_relocate_object.
I understand the need for relocatable code and all that stuff. I just need pointers to a good documentation (or just a brief description/advice) on how/when these relocations might happen (at load-time and runtime) so that I can take them into account in my architectural simulator. In other words, the mechanism used to achieve it (library functions that I need to instrument, conditions, or maybe randomization if there is any, g++ compiler switches that can be used to suppress relocation, etc).
P.S.: I am only targeting x86/x86_64 architectures
Relocation are processor specific, so ARM and x86-64 and x86 have different relocations (because their instruction set is different).
Relocation are also operating system specific, but some related OSes try to have the same relocations, e.g. Solaris and Linux for x86-64
They are described in detail in the ABI (application binary interface) specification "System V Application Binary Interface AMD64 Architecture Processor Supplement". The original x86-64 ABI used to be on http://www.x86-64.org/documentation.html
but that site is not responding since several weeks. An old copy is on this link and a newer one is here
There is also the X32 ABI
See also this question.

position independent executable (-pie) for arm(cortex-m3)

I'm programming for stm32 (Cortex-m3) with codesourcery g++ lite(based on gcc4.7.2 version). And I want the executables to be loaded dynamically.
I knew I have two options available:
1. relocatable elf, which needs a elf parser.
2. position independent code (PIC) with a global offset register
I prefer PIC with global offset register, because it seems it's easier to implement and I'm not familiar with elf or any elf library. Also, It's easy to generate a .bin file from an elf file with some tools.
I've tried building my program with "-msingle-pic-base -fpic" compiling options and "-pie" linking options, but then I got a linking error:
...path...ld.exe: ...path...thumb2\libstdc++.a(pure.o): relocation
R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not be used when
making a shared object; recompile with -fPIC
I don't quite understand the error message. It seems the default standard c/c++ library can't go with my options and I need to get the source of the library and rebuild for my own purpose.
So,
1. Could anyone provide me any useful information/link on how to work with the position independent executable ?
2. with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
Note: Without the "-pie" linking option I can build the program. But the program fails when calling a c++ virtual function (when I'm using the IDE(keil)'s simulator to debug my program). I don't understand what's going on and what I've been missing.
----------------------------------------------------------------------
-- added 20130314
with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
From my experiments, the register (r9 is used in my program) should point to the beginning of the got.plt sections. Delete the "-pie" option, the linking will success, (with r9 properly set) then the c++ virtual function is called successfully. However, I still think the "-pie" option is important, which may ensure that the current standard library is position independent. Could anyone explain this for me?
----------------------------------------------------------------------
-- added 20130315
I took a look at the documents on ABI from ARM's website. But it was of little help because they are not targeting a specific platform. There seems to be a concept of EABI (I'm using sourcery's arm-none-eabi edition), but I couldn't find any documentation on "EABI" from arm's website. I can't neither find documentation on this topic from sourcery and gcc's. There're more than one implementation of PIC, so which one is the sourcery g++ using in the none-eabi case? I think the behaviors of the "-msingle-pic-base", "-fpie", "-pie" options are so poorly documented !
-----------------------------------------------------------------------
From the dis-assembly code, I just figured out that, whit the "-msingle-pic-base", the r9 should point to the base address of the ".got" section, the pointers in the .got sections are absolute pointer and the addressing of variable is similar to the description in the article : Position Independent Code (PIC) in shared libraries. So I still need to modify the ".got" sections on loading. I don't know what is the ".got.plt" section used for in my program. It seems that function calls are using PC-relative addressing.
How to build with the "-pie" or how to link a standard library compiled with "-fpic" is still a problem for me.
The error message tells you to recompile the libstdc++ library, which is most often built, when the gcc compiler is built.
Thus you must recompile your standard libraries (libstdc++, libgcc_*, libc, libm and the all) with -fPIC and link your project against them.
If you rely on prebuilt compiler packages, you're mostly out of the game in the microcontroller world. If you build your compiler yourself (which is, by the way, not too difficult, but an advanced/expert task) you are on the go.
It is also possible to compile your stdandard libraries yourself with the compiler you have. You will need the sources of libraries and figure out, how the compiler package build system builds them and you have to mimic this. Perhaps here are some experts, who can advise you on this way.
There's a nice blog post on this topic, eight years after asking the question initially, but it's there: https://mcuoneclipse.com/2021/06/05/position-independent-code-with-gcc-for-arm-cortex-m/
The general outline is that you have to:
Set up GOT from linker-generated information
Set up PLT from Program Header information
Implement a binder based on the GOT entries
Compile your library as a shared relocatable binary: -msingle-pic-base -mpic-register=r9 -mno-pic-data-is-text-relative -fPIC
Set R9 accordingly

boost testing fpic linking error

I have been staring and googling this but I cannot see what I have done.
I have a working project on a 32 bit machine. I have just pulled the repository to a 64 bit machine (which was the original development machine for the project) and I am now getting the following linking errors when trying to build the testing binary
/usr/bin/ld: error: /usr/lib/libboost_test_exec_monitor-mt.a(unit_test_log.o): requires dynamic R_X86_64_PC32 reloc against 'std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&)' which may overflow at runtime; recompile with -fPIC
/usr/bin/ld: error: /usr/lib/libboost_test_exec_monitor-mt.a(unit_test_log.o): requires unsupported dynamic reloc 11; recompile with -fPIC
I really can't see what I could have changed. The boost libraries are pulled straight from the ubuntu repositories. Anyone with any clues.
You are linking a static library (the Boost one) into a dynamic library. Static libraries are not typically built with -fPIC, as they are assumed to be linked into a program only, not another library.
On 32 bit x86, such code is silently fixed up by relocating the portions of the code that are not position-independent to the load address; this makes the affected pages unshareable. For this to work, the relocation entry needs to be converted from a link time to a run time relocation.
This conversion fails on x86 64 bit; the two error messages mean
The relocation is applied to a 32 bit value, but the displacement may be larger than that (shared libraries live at random addresses for security reasons, which places them wide apart on 64 bit platforms, and
for this reason, there is no dynamic relocation type corresponding to the relocation entry from the static library.
Thus, the linker cannot generate code that would be loadable, and rightfully refuses to do so.
To solve this problem, you need to link against the shared libboost_test_exec_monitor-mt, or build a static library yourself.
Shared libraries can be set up in two ways. One is with absolute addresses, so that each binary that loads the shared object gets it own copy of the shared code, but the calls have no extra indirection and are as fast as possibly. The other way is with "PIC" or position independent code. This adds an extra layer of indirection but then one copy of the shared library code can serve all applications that need it (because the extra layer of indirection is per application binary).
What you're seeing is that when you try to build in 64-bit, the absolute addresses from the first option aren't able to force a particular 64-bit address (possibly some object file in your code doesn't support 64-bit addresses) and the compiler is telling you that you have to use option 2 with PIC enabled. In order to do this you'll need to compile all your code and libraries with -fPIC assuming g++/gcc. You may also need to link the library with -shared but I can't recall the precise times you have to do that.
Okay, Simon's answer really helped me along the way.
The ultimate solution to this particular problem was to use
libboost_unit_test_framework
(which comes with a shared library) in place of
libboost_test_exec_monitor
(which does not)