Is the address of a C function symbol constant between compiles - c++

I have been experimenting with symbol visibility in my shared library and noticed that the address / value of an exported function symbol does not seem to change. Are these addresses constant between compiles, or is this a coincidence?
The addresses were obtained on a virtual machine running Arch Linux, using readelf with the options -W and --dyn-syms.
The reason I'm asking is that I am wondering whether the address of a templated C++ function could be used as a UUID for an object type. This is of interest in my serialization routine, where I would like to set up an ID system that is constant between compiles (object types are registered statically at initialization time, so the order is not defined).

If the build process is unchanged (i.e. the compiler, linker, Makefiles and code all remain the same), the static address in the ELF file will not change either. But if any component changes, all bets are off.
More importantly, the dynamic address (assigned by the dynamic loader) will be different on each run because of address-space layout randomization in modern Linux distros, so you should not rely on it.
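To make the difference between the static and the dynamic address concrete, here is a minimal sketch that splits a run-time address into the module's load base and an offset within the module. It assumes a glibc-style dladdr() is available (older glibc versions need -ldl at link time); ModuleLocation, locate and sample_function are hypothetical names used only for illustration.

```cpp
#include <dlfcn.h>   // dladdr: a glibc/BSD extension, not part of ISO C++
#include <cstdint>

// Where an address sits at run time, split into "where the module was
// loaded" and "how far into the module the address is".
struct ModuleLocation {
    std::uintptr_t runtime_address;  // may change from run to run under ASLR
    std::uintptr_t load_base;        // chosen by the dynamic loader
    std::uintptr_t offset;           // stays the same for a given build
};

// Fills `out` and returns true if the loader knows which module holds
// `address`; returns false otherwise (for example in a static executable).
bool locate(const void* address, ModuleLocation& out) {
    Dl_info info{};
    if (dladdr(address, &info) == 0 || info.dli_fbase == nullptr) {
        return false;
    }
    out.runtime_address = reinterpret_cast<std::uintptr_t>(address);
    out.load_base = reinterpret_cast<std::uintptr_t>(info.dli_fbase);
    out.offset = out.runtime_address - out.load_base;
    return true;
}

// Hypothetical function whose address is inspected.
int sample_function(int x) { return x + 1; }
```

Calling locate(reinterpret_cast<const void*>(&sample_function), loc) in a PIE or a shared object and printing the three fields on several runs should show runtime_address and load_base moving together while offset stays put. For a module whose first segment is linked at address 0, that offset is expected to match the value readelf reports, but treat that as something to verify on your own toolchain.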

When you build your code, you can build it either position dependent or position independent. This is separate from static versus dynamic linking (static executables have traditionally been position dependent, although newer GCC versions offer -static-pie). Position-dependent binaries (given the same sources, compiler and build flags) will always get the same addresses, but as I say further down, I wouldn't rely on that in a release.
This is controlled by GCC's options -fPIE (position-independent executable), -fPIC (position-independent code) and -pie. ELF executables can be built as either position dependent or position independent, but shared objects (libraries) are normally built as position independent, because they must be loadable at whatever address the OS gives them. From the GCC man page:
-fPIC
If supported for the target machine, emit position-independent code, suitable for dynamic linking and avoiding any limit on the size of the global offset table.
-fpie
-fPIE
These options are similar to -fpic and -fPIC, but generated position independent code can be only linked into executables. Usually these options are used when -pie GCC option will be used during linking.
-pie
Produce a position independent executable on targets which support it. For predictable results, you must also specify the same set of options that were used to generate code (-fpie, -fPIE, or model suboptions) when you specify this option.
When loading a PIC shared object you cannot assume it will reside in the same place for each run, as it might be affected by ASLR that is driven by the kernel.
In any case, I don't think it's good practice to use memory addresses as UUIDs for classes, because they might change, even more so if these template classes are implemented as part of a shared object.
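One alternative that does stay constant between compiles is to derive the ID from a name you choose rather than from an address. The sketch below is not what the asker's code does; it swaps in a different technique (a compile-time FNV-1a hash of a registered string). TypeName, REGISTER_TYPE, type_id and the "myapp.*" names are all hypothetical.

```cpp
#include <cstdint>

// 64-bit FNV-1a hash of a NUL-terminated string, usable at compile time
// (needs C++14 or later for the loop inside a constexpr function).
constexpr std::uint64_t fnv1a(const char* text) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (; *text != '\0'; ++text) {
        hash ^= static_cast<unsigned char>(*text);
        hash *= 1099511628211ULL;                  // FNV prime
    }
    return hash;
}

// Primary template is left undefined, so using an unregistered type
// fails at compile time instead of silently producing an ID.
template <typename T>
struct TypeName;

// Hypothetical registration macro: ties a type to a name chosen by hand.
#define REGISTER_TYPE(T, NAME) \
    template <> struct TypeName<T> { static constexpr const char* value = NAME; }

// The ID depends only on the registered string, not on addresses,
// link order or static-initialization order.
template <typename T>
constexpr std::uint64_t type_id() { return fnv1a(TypeName<T>::value); }

// Hypothetical serializable types.
struct Point { int x, y; };
struct Circle { double r; };
REGISTER_TYPE(Point, "myapp.Point");
REGISTER_TYPE(Circle, "myapp.Circle");
```

The ID changes only if the registered string changes, so renaming or moving the C++ type does not invalidate already serialized data. Hash collisions between two names are unlikely with 64 bits but not impossible, so a registry that checks for duplicates at start-up would be a sensible addition.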

Related

Does removing relocation data using `-s` affect a position dependent executable

I need to know if using -s in GCC (g++) will have any effects on the PIE. I also want to know its effects on a position-dependent executable. As far as I know, not using any linking option (like -pie and -fpie) results in a non-PIE just like when using -no-pie. Now I have an executable and that's probably non-PIE since I have not specified -pie in the link command. Can -s cause any problems? Will it improve the performance (since the exe will be smaller)?
I also checked this question and in the answer it says:
It seems pretty clear that removing relocation information would interfere with ASLR.
But ASLR only deals with position-independent executables, right? Could removing relocation data from a position-dependent executable interfere with ASLR?
After doing a bit of research, I found some info that might be correct.
From GCC Options for Linking:
-pie
Produce a dynamically linked position independent executable on targets that support it.
-no-pie
Don’t produce a dynamically linked position independent executable.
Looking at this, my guess is that both options produce a position-independent executable, and the only difference is that the former is dynamically linked while the latter is not (statically linked?). Therefore, in both cases the executable contains relocation data. However, it is still unclear to me whether an executable generated with -s interferes with ASLR or not.

gcc -fPIC vs. -shared

When compiling a shared library with gcc / g++, why is -fPIC not implied by -shared option? Or, said differently, is the option -fPIC required at link time?
In short, should I write:
gcc -c -fPIC foo.c -o foo.o
gcc -shared -fPIC foo.o -o libfoo.so   # with -fPIC
or is the following sufficient:
gcc -c -fPIC foo.c -o foo.o
gcc -shared foo.o -o libfoo.so   # without -fPIC
Code that is built into shared libraries should normally, though not necessarily, be position-independent code, so that the shared library can readily be loaded at any address in memory. The -fPIC option ensures that GCC produces such code. Because it is not strictly required, it is not implied by -shared, which leaves the choice to you. Without this option the compiler can apply some optimizations that are only valid for position-dependent code. Note that on some targets (x86-64, for example) the linker refuses to build a shared library from objects that were not compiled as PIC.
Position-dependent code can cause an error if one process wants to load more than one shared library at the same virtual address. Since a library cannot predict which other libraries will be loaded, this problem is unavoidable with the traditional shared library concept. Virtual address space doesn't help here. If your application does not use a lot of other shared libraries and they are loaded before yours, you can predict the load address of your library and define it as the base address of your position-dependent library.
A shared library is supposed to be shared between processes, and it may not always be possible to load it at the same address in each of them. If the code were not position independent, each process would require its own copy.
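As a small illustration of what ends up in such a library, here is a sketch of a one-file library with the build commands from the question in its comments (adapted to g++). The file name foo.cpp and the names foo_add and foo_call_count are hypothetical. GCC's documentation for -shared also suggests repeating the code-generation options at link time for predictable results, so the "with -fPIC" variant of the link command is the conservative choice.

```cpp
// foo.cpp: a hypothetical one-function library.
//
// Build sketch, mirroring the commands in the question:
//   g++ -c -fPIC foo.cpp -o foo.o      # compile step: -fPIC changes the generated code
//   g++ -shared foo.o -o libfoo.so     # link step: the object is already PIC
//
// extern "C" keeps the exported symbol names unmangled, so they show up
// in `readelf -W --dyn-syms libfoo.so` exactly as written here.
extern "C" {

// An exported global. In PIC code on many targets, accesses to a symbol
// like this go through the global offset table (GOT) rather than through
// an absolute address baked into the instructions.
int foo_call_count = 0;

// An exported function that touches the global above.
int foo_add(int a, int b) {
    ++foo_call_count;
    return a + b;
}

}  // extern "C"
```

Compiling the same file with and without -fPIC and comparing the output of objdump -d is a quick way to see the extra indirection for yourself.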

position independent executable (-pie) for arm(cortex-m3)

I'm programming for an STM32 (Cortex-M3) with CodeSourcery G++ Lite (based on GCC 4.7.2), and I want the executables to be loaded dynamically.
I know I have two options available:
1. relocatable ELF, which needs an ELF parser.
2. position independent code (PIC) with a global offset register
I prefer PIC with a global offset register, because it seems easier to implement and I'm not familiar with ELF or any ELF library. Also, it's easy to generate a .bin file from an ELF file with some tools.
I've tried building my program with "-msingle-pic-base -fpic" compiling options and "-pie" linking options, but then I got a linking error:
...path...ld.exe: ...path...thumb2\libstdc++.a(pure.o): relocation
R_ARM_THM_MOVW_ABS_NC against `a local symbol' can not be used when
making a shared object; recompile with -fPIC
I don't quite understand the error message. It seems the default standard c/c++ library can't go with my options and I need to get the source of the library and rebuild for my own purpose.
So,
1. Could anyone provide me any useful information/link on how to work with the position independent executable ?
2. with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
Note: Without the "-pie" linking option I can build the program. But the program fails when calling a c++ virtual function (when I'm using the IDE(keil)'s simulator to debug my program). I don't understand what's going on and what I've been missing.
----------------------------------------------------------------------
-- added 20130314
with the -msingle-pic-base option, I don't need to care too much about the GOT and ld script anymore, right?
From my experiments, the register (r9 is used in my program) should point to the beginning of the .got.plt section. If I delete the "-pie" option, linking succeeds, and (with r9 properly set) the C++ virtual function is called successfully. However, I still think the "-pie" option is important, since it may ensure that the current standard library is position independent. Could anyone explain this for me?
----------------------------------------------------------------------
-- added 20130315
I took a look at the ABI documents on ARM's website, but they were of little help because they do not target a specific platform. There seems to be a concept of EABI (I'm using Sourcery's arm-none-eabi edition), but I couldn't find any documentation on "EABI" on ARM's website. I can't find documentation on this topic from Sourcery or GCC either. There is more than one implementation of PIC, so which one is Sourcery G++ using in the none-eabi case? I think the behavior of the "-msingle-pic-base", "-fpie" and "-pie" options is very poorly documented!
-----------------------------------------------------------------------
From the disassembly, I just figured out that, with "-msingle-pic-base", r9 should point to the base address of the ".got" section. The pointers in the .got section are absolute pointers, and variables are addressed much as described in the article "Position Independent Code (PIC) in shared libraries". So I still need to modify the ".got" section on loading. I don't know what the ".got.plt" section is used for in my program. It seems that function calls use PC-relative addressing.
How to build with "-pie", or how to link against a standard library compiled with "-fpic", is still a problem for me.
The error message tells you to recompile the libstdc++ library, which is usually built at the same time as the GCC compiler itself.
Thus you must recompile your standard libraries (libstdc++, libgcc_*, libc, libm and all the rest) with -fPIC and link your project against them.
If you rely on prebuilt compiler packages, you're mostly out of luck in the microcontroller world. If you build your compiler yourself (which is, by the way, not too difficult, but an advanced/expert task), you are good to go.
It is also possible to compile your standard libraries yourself with the compiler you have. You will need the sources of the libraries, and you will have to figure out how the compiler package's build system builds them and mimic that. Perhaps there are some experts here who can advise you on this approach.
There's a nice blog post on this topic, eight years after asking the question initially, but it's there: https://mcuoneclipse.com/2021/06/05/position-independent-code-with-gcc-for-arm-cortex-m/
The general outline is that you have to:
Set up GOT from linker-generated information
Set up PLT from Program Header information
Implement a binder based on the GOT entries
Compile your library as a shared relocatable binary: -msingle-pic-base -mpic-register=r9 -mno-pic-data-is-text-relative -fPIC
Set R9 accordingly
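The "set up GOT" step from the outline above can be sketched on a host machine as follows. This is a simplified illustration, not the code from the blog post: relocate_got and its parameters are hypothetical, and it assumes that every non-zero GOT entry is an absolute address inside the image and that a single offset applies to all of them.

```cpp
#include <cstddef>
#include <cstdint>

// Rewrites each GOT entry from the address assumed at link time to the
// address the image actually occupies after loading.
//
// Assumptions (hypothetical, for illustration only):
//   - every non-zero entry points into this image,
//   - a zero entry marks an unused or reserved slot and is left alone,
//   - code and data move by the same amount.
void relocate_got(std::uintptr_t* got, std::size_t entry_count,
                  std::uintptr_t link_base, std::uintptr_t load_base) {
    // Unsigned arithmetic wraps, so this also works when the image is
    // loaded below its link-time address.
    const std::uintptr_t delta = load_base - link_base;
    for (std::size_t i = 0; i < entry_count; ++i) {
        if (got[i] != 0) {
            got[i] += delta;
        }
    }
}
```

For example, an image linked at 0x0 and loaded at 0x20000000 would have the entry 0x1000 rewritten to 0x20001000. On a real Cortex-M target, code usually stays in flash while data is copied to RAM, so a loader would likely need separate offsets for the two, and R9 would have to be loaded with the address of the relocated table before any PIC code runs.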

What does -fPIC mean when building a shared library?

I know the '-fPIC' option has something to do with resolving addresses and independence between individual modules, but I'm not sure what it really means. Can you explain?
PIC stands for Position Independent Code.
To quote man gcc:
If supported for the target machine, emit position-independent code, suitable for dynamic linking and avoiding any limit on the size of the global offset table. This option makes a difference on AArch64, m68k, PowerPC and SPARC.
Use this when building shared objects (*.so) on those mentioned architectures.
The f is the gcc prefix for options that "control the interface conventions used in code generation".
PIC stands for "Position Independent Code"; -fPIC is a variant of -fpic that makes a difference on m68k and SPARC.
Edit: After reading page 11 of the document referenced by 0x6adb015, and the comment by coryan, I made a few changes:
This option only makes sense for shared libraries, and you're telling the OS you're using a Global Offset Table (GOT). This means all your address references are relative to the GOT, and the code can be shared across multiple processes.
Otherwise, without this option, the loader would have to modify all the offsets itself.
Needless to say, we almost always use -fpic/PIC.
man gcc says:
-fpic
Generate position-independent code (PIC) suitable for use in a shared
library, if supported for the target machine. Such code accesses all
constant addresses through a global offset table (GOT). The dynamic
loader resolves the GOT entries when the program starts (the dynamic
loader is not part of GCC; it is part of the operating system). If
the GOT size for the linked executable exceeds a machine-specific
maximum size, you get an error message from the linker indicating
that -fpic does not work; in that case, recompile with -fPIC instead.
(These maximums are 8k on the SPARC and 32k on the m68k and RS/6000.
The 386 has no such limit.)
Position-independent code requires special support, and therefore
works only on certain machines. For the 386, GCC supports PIC for
System V but not for the Sun 386i. Code generated for the
IBM RS/6000 is always position-independent.
-fPIC
If supported for the target machine, emit position-independent code,
suitable for dynamic linking and avoiding any limit on the size of
the global offset table. This option makes a difference on the m68k
and the SPARC.
Position-independent code requires special support, and therefore
works only on certain machines.

GCC / Linux: adding a static library to a .so?

I've a program that implements a plugin system by dynamically loading a function from some plugin_name.so (as usual).
But in turn I have a static "helper" library (let's call it helper.a) whose functions are used both from the main program and from the main function in the plugin. They don't have to inter-operate in any way; they are just helper functions for text manipulation and such.
This program, once started, cannot be reloaded or restarted, which is why I'm expecting to get new "helper" functionality from the plugin, not from the main program.
So my question is: is it possible to force this "plugin function code" in the .so to use (statically link against?) a different (perhaps newer) version of "helper" than the main program?
How could this be done? Perhaps by statically linking or otherwise adding helper.a to plugin_name.so?
Nick Meyer's answer is correct on Windows and AIX, but is unlikely to be correct on every other UNIX platform by default.
On most UNIX platforms, the runtime loader maintains a single name space for all symbols, so if you define foo_helper in a.out, and also in plugin.so, and then call foo_helper from either, the first definition visible to the runtime loader (usually that from a.out) is used by default for both calls.
In addition, the picture is complicated by the fact that foo_helper may not be exported from a.out (and thus may be invisible to the runtime loader) unless you use the -rdynamic flag or some other shared library references it. In other words, things may appear to work as Nick described them, then you add a shared library to the a.out link line, and they don't work that way anymore.
On ELF platforms (such as Linux), you have great control over symbol visibility and binding. See the description of -fvisibility=hidden and -rdynamic in the GCC man page, and also -Bsymbolic in the linker man page.
Most other UNIX platforms have some way to control symbol bindings as well, but this is necessarily platform-specific.
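As a sketch of the visibility control mentioned above, the following shows plugin-side code in which the helper is given hidden visibility, so the plugin's calls bind to its own copy rather than to a foo_helper exported by a.out. The function bodies and the name plugin_decorated_length are hypothetical. For a helper that lives in a separate helper.a, the same effect would come from compiling the helper's sources with -fvisibility=hidden rather than from editing them.

```cpp
#include <cstddef>
#include <string>

// Hidden visibility (a GCC/Clang attribute): this definition is not placed
// in the plugin's dynamic symbol table, so the runtime loader cannot
// substitute another module's foo_helper for it.
__attribute__((visibility("hidden")))
std::string foo_helper(const std::string& text) {
    return "[plugin helper v2] " + text;
}

// The one symbol the host program is meant to look up (for example with
// dlsym). Default visibility keeps it exported even if the rest of the
// plugin is built with -fvisibility=hidden.
extern "C" __attribute__((visibility("default")))
std::size_t plugin_decorated_length(const char* input) {
    // This call resolves inside the plugin because foo_helper is hidden.
    return foo_helper(input).size();
}
```

Running readelf -W --dyn-syms on the resulting plugin .so should list plugin_decorated_length but not foo_helper, which is a quick way to check that the helper really stayed private.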
If your main program and dynamic library both statically link to helper.a, then you shouldn't need to worry about mixing versions of helper.a (as long as you don't do things like pass pointers allocated in helper.a between the .exe and .so boundaries).
The code required from the helper.a is inserted to the actual binary when you link against it. So when you call into helper.a from the .exe, you will be executing code from the code segment of your executable image, and when you call into helper.a from the .so, you will be executing code from the portion of the address space where the .so was loaded. Even if you're calling the same function inside helper.a, you're calling two different 'instances' of that function depending on whether the call was made from the .exe or the .so.
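I think this question is the same as yours: How to force symbols from a static library to be included in a shared library build?
The --whole-archive linker option should do this. You'd use it as e.g.
gcc -shared -o libmyshared.so foo.o -lanothersharedlib -Wl,--whole-archive -lmystaticlib -Wl,--no-whole-archive
and it works for me. Closing with -Wl,--no-whole-archive keeps the option from also applying to the libraries GCC appends to the link line implicitly.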