GCC -fPIC option - c++

I have read about GCC's Options for Code Generation Conventions, but could not understand what "Generate position-independent code (PIC)" does. Please give an example that explains what it means.

Position Independent Code means that the generated machine code is not dependent on being located at a specific address in order to work.
E.g. jumps would be generated as relative rather than absolute.
Pseudo-assembly:
PIC: This would work whether the code was at address 100 or 1000
100: COMPARE REG1, REG2
101: JUMP_IF_EQUAL CURRENT+10
...
111: NOP
Non-PIC: This will only work if the code is at address 100
100: COMPARE REG1, REG2
101: JUMP_IF_EQUAL 111
...
111: NOP
EDIT: In response to comment.
If your code is compiled with -fPIC, it is suitable for inclusion in a library. The library must be able to be relocated from its preferred location in memory to another address, because another already-loaded library may occupy the address your library prefers.
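To make the pseudo-assembly above concrete, here is a small C++ model of the two jumps. It is only an illustration under assumed names (JumpKind, jump_target and lands_on_nop are made up for this sketch); it does not show what GCC actually emits.

```cpp
#include <cassert>

enum class JumpKind { Relative, Absolute };

// The jump is the second instruction, so it sits at load_base + 1,
// and the NOP it is supposed to reach sits at load_base + 11.
long jump_target(JumpKind kind, long load_base) {
    const long jump_address = load_base + 1;
    if (kind == JumpKind::Relative) {
        return jump_address + 10;  // JUMP_IF_EQUAL CURRENT+10
    }
    return 111;                    // JUMP_IF_EQUAL 111 (hard-coded address)
}

// True when the jump reaches the NOP for code loaded at load_base.
bool lands_on_nop(JumpKind kind, long load_base) {
    return jump_target(kind, load_base) == load_base + 11;
}
```

With this model, lands_on_nop(JumpKind::Relative, base) holds for base 100 and base 1000 alike, while the absolute variant only holds for base 100.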

I'll try to explain what has already been said in a simpler way.
Whenever a shared lib is loaded, the loader (the OS code that loads any program you run) changes some addresses in the code depending on where the object was loaded.
In the above example, the "111" in the non-PIC code is written by the loader when the code is first loaded.
For non-shared objects, you may want it to be like that, because the compiler can make some optimizations on that code.
For a shared object, if another process wants to "link" to that code, it must map it at the same virtual addresses or the "111" will make no sense. But that virtual address range may already be in use in the second process.
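The address patching described above can be sketched as a toy model. This is an assumption-laden illustration, not the real ELF relocation format: Image, its fields and apply_relocations are hypothetical names, and the "image" is just a vector of words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy image: a sequence of words, some of which hold absolute addresses
// computed as if the image were loaded at preferred_base.
struct Image {
    std::uint64_t preferred_base;
    std::vector<std::uint64_t> words;
    std::vector<std::size_t> relocations;  // indices of words that hold absolute addresses
};

// What a loader conceptually does: add the load delta to every recorded slot.
// Unsigned arithmetic wraps, so a lower actual_base works as well.
void apply_relocations(Image& image, std::uint64_t actual_base) {
    const std::uint64_t delta = actual_base - image.preferred_base;
    for (std::size_t index : image.relocations) {
        assert(index < image.words.size());
        image.words[index] += delta;
    }
}
```

For an image with preferred base 100 whose slot holds 111, loading at 1000 rewrites the slot to 1011. Because the code itself is modified, the patched pages can no longer be shared with a process that loaded the library elsewhere.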

Code that is built into shared libraries should normally be position-independent code, so that the shared library can readily be loaded at (more or less) any address in memory. The -fPIC option ensures that GCC produces such code.

The link to a function in a dynamic library is resolved when the library is loaded or at run time. Therefore, both the executable file and dynamic library are loaded into memory when the program is run.
The memory address at which a dynamic library is loaded cannot be determined in advance, because a fixed address might clash with another dynamic library requiring the same address.
There are two commonly used methods for dealing with this problem:
1. Relocation. All pointers and addresses in the code are modified, if necessary, to fit the actual load address. Relocation is done by the linker and the loader.
2. Position-independent code. All addresses in the code are relative to the current position. Shared objects in Unix-like systems use position-independent code by default. This is less efficient than relocation if the program runs for a long time, especially in 32-bit mode.
The name "position-independent code" actually implies the following:
The code section contains no absolute addresses that need relocation, but only self-relative addresses. Therefore, the code section can be loaded at an arbitrary memory address and shared between multiple processes.
The data section is not shared between multiple processes because it often contains writeable data. Therefore, the data section may contain pointers or addresses that need relocation.
All public functions and public data can be overridden in Linux. If a function in the main executable has the same name as a function in a shared object, then the version in main will take precedence, not only when called from main, but also when called from the shared object. Likewise, when a global variable in main has the same name as a global variable in the shared object, then the instance in main will be used, even when accessed from the shared object. This so-called symbol interposition is intended to mimic the behavior of static libraries.
A shared object has a table of pointers to its functions, called the procedure linkage table (PLT), and a table of pointers to its variables, called the global offset table (GOT), in order to implement this "override" feature.
All accesses to functions and public variables go through these tables.
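As a rough illustration of the "override" behaviour described above, here is a toy resolver in C++. It is not how the dynamic linker is implemented; Module, resolve and the helper functions are hypothetical names, and the only point is that the first definition in load order (main executable first) fills the table slot.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Function = int (*)();

// One loaded module (main executable or shared object) and the symbols it exports.
struct Module {
    std::string name;
    std::map<std::string, Function> exports;
};

// Toy resolver: search modules in load order and return the first definition found.
Function resolve(const std::vector<Module>& load_order, const std::string& symbol) {
    for (const Module& mod : load_order) {
        auto it = mod.exports.find(symbol);
        if (it != mod.exports.end()) {
            return it->second;
        }
    }
    return nullptr;
}

int helper_in_main() { return 1; }
int helper_in_library() { return 2; }

// A library function that calls "helper" through a table slot, PLT/GOT style,
// instead of calling its own definition directly.
int library_entry(Function helper_slot) { return helper_slot(); }
```

If both the main executable and the library export "helper", the slot resolves to the version in main, so library_entry ends up calling helper_in_main even though it lives in the library.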
p.s. Where dynamic linking cannot be avoided, there are various ways to avoid the time-consuming features of the position-independent code.
You can read more from this article: http://www.agner.org/optimize/optimizing_cpp.pdf

Adding further...
Every process has the same virtual address space layout (if randomization of virtual addresses is disabled using a flag in Linux).
(For more details, see "Disable and re-enable address space layout randomization only for myself".)
So if it is a single exe with no shared linking (a hypothetical scenario), then we can always give the same virtual address to the same asm instruction without any harm.
But when we want to link a shared object to the exe, we cannot be sure of the start address assigned to the shared object, as it depends on the order in which the shared objects were linked. As a result, an asm instruction inside a .so may have a different virtual address depending on the process it is linked into.
So one process can give the .so a start address of 0x45678910 in its own virtual space while another process gives it a start address of 0x12131415, and if the code does not use relative addressing, the .so will not work at all.
So shared objects always have to use relative addressing, hence the -fPIC option.

A minor addition to the answers already posted: object files not compiled to be position independent are relocatable; they contain relocation table entries.
These entries allow the loader (that bit of code that loads a program into memory) to rewrite the absolute addresses to adjust for the actual load address in the virtual address space.
An operating system will try to share a single copy of a "shared object library" loaded into memory with all the programs that are linked to that same shared object library.
Since the code address space (unlike sections of the data space) need not be contiguous, and because most programs that link to a specific library have a fairly fixed library dependency tree, this succeeds most of the time. In those rare cases where there is a discrepancy, yes, it may be necessary to have two or more copies of a shared object library in memory.
Obviously, any attempt to randomize the load address of a library between programs and/or program instances (so as to reduce the possibility of creating an exploitable pattern) will make such cases common, not rare, so where a system has enabled this capability, one should make every attempt to compile all shared object libraries to be position independent.
Since calls into these libraries from the body of the main program will also be made relocatable, this makes it much less likely that a shared library will have to be copied.

Related

Getting the base address of a dlopen'd library [duplicate]

This question already has an answer here:
Get loaded address of a ELF binary, dlopen is not working as expected
On Windows, the HMODULE returned from LoadLibrary is the base pointer of the loaded DLL.
The shared library I use is a headless version of a game. To save its state, I parse the DLL to locate the .data and .bss sections, add their VAs to the DLL's base address, then copy the right amount of data from each section.
In principle, the same should be doable on Linux. However, I'm stuck on how to get the base address of a dlopen()ed ELF library, since the void* returned from dlopen() is a pointer to the shared library's link_map AFAIK.
How might I accomplish this?
EDIT 1: The "state" of the shared library is the state of all the static variables in it. To save that state, I copy the sections that contain them (.data and .bss) to an alternate buffer (in memory). When I restore that state, I write the alternate buffer's data back to the shared library's .data and .bss.
Read and parse the file /proc/self/maps.
That said, the notion of saving the state of a noncooperating library by dumping its data section is rather questionable. Imagine there is a pointer. It points at some other variable in the data section. There is no way you can tell it's a pointer: from the standpoint of the controlling program, it's just a word in memory that by pure accident contains a value that corresponds to a valid address. If the library is loaded at some other address and the state is restored, the said pointer won't be valid anymore.
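Parsing /proc/self/maps as suggested above could look roughly like this. It is a minimal sketch: find_base_address is a made-up helper, it matches the needle against the whole line rather than only the pathname column, and it assumes the lowest mapping of the file is the library's base, which is typically but not necessarily the case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Scans text in /proc/<pid>/maps format and returns the lowest start address
// of any line containing `needle`, or 0 if no line matches.
std::uintptr_t find_base_address(std::istream& maps, const std::string& needle) {
    std::uintptr_t lowest = 0;
    std::string line;
    while (std::getline(maps, line)) {
        if (line.find(needle) == std::string::npos) {
            continue;
        }
        // Each line starts with "start-end" in hexadecimal.
        const std::size_t dash = line.find('-');
        if (dash == std::string::npos) {
            continue;
        }
        const std::uintptr_t start =
            static_cast<std::uintptr_t>(std::stoull(line.substr(0, dash), nullptr, 16));
        if (lowest == 0 || start < lowest) {
            lowest = start;
        }
    }
    return lowest;
}
```

In a real program you would pass a std::ifstream opened on "/proc/self/maps" and the file name of the dlopen()ed library (for example a hypothetical "libgame.so") as the needle.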

Does dlopen re-load already loaded dependencies? If so, what are the implications?

I have a program, code-named foo. foo depends on common.so and is linked to it in the normal way (sorry I don't know the technical way to say that). When foo is running it then dynamically loads bar.so using dlopen(). So far so good.
But, bar.so also depends on common.so. Will dlopen() re-load common.so (from what I've read it loads any required dependencies recursively), or will it detect that it is already loaded? If it does re-load it, could that cause problems in my program? Both foo and bar.so need to see the changes in common.so that either of them make to static variables there.
Maybe my design needs to be changed or requires use of -rdynamic (which I also don't quite understand properly yet)?
The POSIX spec for dlopen() says:
Only a single copy of an executable object file shall be brought into
the address space, even if dlopen() is invoked multiple times in
reference to the executable object file, and even if different
pathnames are used to reference the executable object file.
On Linux, this is implemented using a reference count; until dlclose is called an equal number of times, the shared object will remain resident.
[update]
I realize you are asking about shared objects implicitly loaded as dependencies, but the same principle applies. Otherwise, many things would break... In particular, global constructors in the shared object would run multiple times, which would wreak havoc.

All writable and executable program memory segment types

In "Secure coding in C and C++", the author mentions,
"The W^X policy allows a memory segment to be writable or executable, but not both. This policy cannot prevent overwriting targets such as those required by atexit() that need to be both writable at runtime and executable."
I have two questions:
atexit() registers a function through a function pointer argument. The function it points to is either defined in the current program, where the linker will find the definition, or it is found by the runtime loader. In either case, the function definition is known, so it only needs to be executable. So why does the memory segment for atexit() need to be both writable at runtime and executable?
Can any C/C++ expert tell me what other types of APIs have this property (writable at runtime and executable)? (Let's limit the scope to Linux only.)
Fundamentally, memory that can be written AND executed is very easily tampered with and makes exploits easier, since there is no need to use ROP or other fancy methods: you can simply write the code to execute anywhere in the segment and branch to it.
In your quote, the meaning of targets in this context is very likely to be a list of function pointers called on exit. The list itself needs to be writable/mutable as per the C API. The code these functions point to need only be executable. Here again, because the list is mutable, you could exploit a program by simply modifying this list by inserting a pointer to your code and force the program to exit which would execute your code. In this context, keeping all memory segments writable OR executable will not save you since 2 different segments are used here (one writable with the function pointer list, the other executable with the code).
Writable & executable memory segments are required by anything that generates code dynamically at runtime: JIT, kernel, executable unpackers, etc. For each of these, there is no technical requirement that the segments hold both properties at the same time. The memory can be allocated writable first, the code copied or generated into it, and then made executable with a call to mprotect() (which also removes the writable property). The only scenario I can see that would benefit from having both properties at the same time is perhaps a memory-constrained environment (e.g. unpacking an executable in place).
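The write-first, execute-later sequence can be sketched with mmap() and mprotect(). The helper names load_code_wx and run_generated_code are made up for this example, and the six code bytes are an x86-64-only assumption (mov eax, 42; ret); on other architectures the demo simply reports -1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>

// Maps writable (not executable) memory, copies the code in, then switches the
// mapping to read+execute so it is never writable and executable at once.
// Returns nullptr on failure.
void* load_code_wx(const unsigned char* code, std::size_t len) {
    assert(code != nullptr && len > 0);
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        return nullptr;
    }
    std::memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return nullptr;
    }
    return mem;
}

// Runs the generated code on x86-64; returns -1 elsewhere or on failure.
int run_generated_code() {
#if defined(__x86_64__)
    static const unsigned char code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00,  // mov eax, 42
                                         0xC3};                          // ret
    void* mem = load_code_wx(code, sizeof code);
    if (mem == nullptr) {
        return -1;
    }
    int (*fn)() = reinterpret_cast<int (*)()>(mem);
    const int result = fn();
    munmap(mem, sizeof code);
    return result;
#else
    return -1;
#endif
}
```

A hardened system may refuse the mprotect() call, in which case the helper returns nullptr and nothing is executed.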
Note that some platforms do not allow executable memory to be allocated in user space: Xbox 360 and PS3, for example, do not support JIT. (The kernel/API supports it, but you will not be able to release your software: Microsoft and Sony will refuse your submission, so the feature can only be used in development.)

Sharing memory between modules

I was wondering how to share some memory between different program modules. Let's say I have a main application (exe) and some module (dll). They both link to the same static library. This static library will have some manager that provides various services. What I would like to achieve is to have this manager shared between all application modules, and to do this transparently during the library initialization.
Between processes I could use shared memory, but I want this to be shared in the current process only.
Could you think of some cross-platform way to do this? Possibly using boost libraries, if they provide some facilities to do this.
The only solution I can think of right now is to use a shared library of the respective OS that all other modules link to at runtime, and have the manager stored there.
EDIT:
To clarify what I actually need:
I need to find out, if the shared manager was already created (the answers below already provided some ways to do that)
Get the pointer to the manager, if it exists, or Set the pointer somewhere to the newly created manager object.
I think you're going to need assistance from a shared library to do this in any portable fashion. It doesn't necessarily need to know anything about the objects being shared between modules, it just needs to provide some globally-accessible mapping from a key (probably a string) to a pointer.
However, if you're willing to call OS APIs, this is feasible, and I think you may only need two implementations of the OS-specific part (one for Windows DLLs and GetProcAddress, one for OSes which use dlopen).
As each module loads, it walks the list of previously loaded modules looking for any that export a specially-named function. If it finds one (any, doesn't matter which, because the invariant is that all fully-loaded modules are aware of the common object), it gets the address of the common object from the previously loaded module, then increments the reference count. If it's unable to find any, it allocates new data and initializes the reference count. During module unload, it decrements the reference count and frees the common object if the reference count reached zero.
Of course it's necessary to use the OS allocator for the common object, because, although unlikely, it's possible that it is deallocated from a different library than the one which first loaded it. This also implies that the common object cannot contain any virtual functions or any other sort of pointer to segments of the different modules. All its resources must be dynamically allocated using the OS process-wide allocator. This is probably less of a burden on systems where libc++ is a shared library, but you said you're statically linking the CRT.
Functions needed in Win32 would include EnumProcessModules, GetProcAddress, HeapAlloc, and HeapFree, GetProcessHeap and GetCurrentProcess.
Everything considered, I think I would stick to putting the common object in its own shared library, which leverages the loader's data structures to find it. Otherwise you're re-inventing the loader. This will work even when the CRT is statically linked into several modules, but I think you're setting yourself up for ODR violations. Be really particular about keeping the common data POD.
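For the "mapping from a key to a pointer" that the small shared library would provide, a minimal sketch might look like the following. SharedRegistry, get_or_register, find and shared_registry are hypothetical names, and the single-instance guarantee only holds under the assumption that this code is compiled into exactly one shared library that every module links against.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Process-wide map from a string key to an opaque pointer.
class SharedRegistry {
public:
    // Returns the pointer already registered under `key`; if there is none,
    // registers `candidate` and returns it.
    void* get_or_register(const std::string& key, void* candidate) {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_.emplace(key, candidate).first->second;
    }

    // Returns the registered pointer, or nullptr if the key is unknown.
    void* find(const std::string& key) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(key);
        return it == entries_.end() ? nullptr : it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<std::string, void*> entries_;
};

// Accessor the shared library would export. The function-local static is
// created once, on first use.
SharedRegistry& shared_registry() {
    static SharedRegistry instance;
    return instance;
}
```

Each module would call shared_registry().get_or_register("manager", its_own_manager) during initialization and then use whatever pointer comes back, so the first module to register wins. The registry knows nothing about the manager type, which keeps the shared library small.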
For use from the current process only, you don't need to devise any special function or structure.
You could do it even without any function, but it is safer and more cross-platform friendly to define a set of functions providing access to the shared data. These functions could be implemented by the common static library.
I think the only concern with this setup is: "Who will own the data?" There must exist one and only one owner of the shared data.
With this basic idea, we could sketch the API like this:
IsSharedDataExist // check whether or not shared data exists
CreateSharedData // create (possibly dynamically) shared data
DestroySharedData // destroy shared data
... various data access API ...
Alternatively, a C++ class using the Singleton pattern would be appropriate.
UPDATE
I was confused. The real problem can be defined as "How to implement a Singleton class in a static library that will be linked into multiple dynamically loaded libraries (used in the same process) in a platform-independent way".
I think the basic idea is not much different, but making sure the singleton really is single is the additional problem in this setup.
For this purpose, you could employ Boost.Interprocess.
#include <boost/config.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
...
boost::interprocess::named_mutex* singleton_check = 0;
// in the Create function of the singleton
try {
    singleton_check = new boost::interprocess::named_mutex(
        boost::interprocess::create_only, "name_of_the_mutex");
    // if no exception is thrown, this is the first execution
}
catch (...)
{
    // creation failed, most likely because the named mutex already exists
}
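Freeing the named_mutex object is as simple as delete singleton_check. Removing the named mutex itself from the system additionally requires boost::interprocess::named_mutex::remove("name_of_the_mutex").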
UPDATE#2
Another suggestion.
I think we should not place shared data in the common static library. If we cannot ensure globally unique data, we face not only tricky platform-dependent implementation problems but also a waste of memory and global resources.
If you prefer a static library implementation, you should make two static libraries: one for the server/creator of the shared data, and one for users of that shared data. The server library defines and provides access to the Singleton. The client library provides the various data access methods.
This is effectively the same as the Singleton implementation without static libraries.
You can use boost::interprocess http://www.boost.org/doc/libs/1_45_0/doc/html/interprocess.html
and on Windows you can create a shared segment in your DLL that will be shared by all processes using #pragma's: http://www.codeproject.com/KB/DLL/data_seg_share.aspx
As per MSDN, I see there are only two ways to share data between modules:
Using the data_seg pragma
Using shared memory
As someone pointed out, a shared segment works only between instances of the same DLL, so we are left with only one choice: the memory-mapped files technique.

Static Global Fields in a Shared Library - Where do they go?

I have a cpp file from which I am generating a shared library (using autofoo and the like). Within the cpp file, I have declared a couple of static fields that I use throughout the library functions.
My question is 2-part:
1) Where are these fields stored in memory? It's not as if the system instantiates the entire library and keeps it in memory... the library, after all, really is just a bunch of hooks.
2) Is there a better way to do this? The reason I did it to begin with is that I want to avoid requiring the user to pass the fields into every library function call as parameters.
Thanks!
The code used to load shared libraries:
Generally (each has minor technical differences):
Loads the shared lib into memory
Walks the symbol table and updates the addresses of the functions in the DLL
Initializes any global static members using their constructor.
Note: The shared lib loader need not do all this at the load point.
It may do some of these jobs lazily (implementation detail). But they will be done before use.
Any global static POD variables (things with no constructor) will be stored in special memory segments depending on whether they are initialized or not (again an implementation detail). If they were initialized, they will be loaded from disk (or the shared lib source) with that value already defined.
So the answer to your questions:
1) Undefined. The library consists of:
code segments
initialized data segments
uninitialized data segments
some utility code that knows how to link it into a running application.
2) Better than what, exactly? Good practice would suggest passing values to a function rather than relying on global state. But to be honest, that is an overgeneralization, and it really comes down to the problem.
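To make the trade-off concrete, here is a small sketch of both styles. All names (describe_with_globals, Context, describe_with_context) are invented for illustration and do not come from the question's library.

```cpp
#include <cassert>
#include <string>

// Option A: file-scope state, private to this translation unit.
// Every function in the file can use it, and callers never see it.
namespace {
int call_count = 0;
std::string prefix = "lib";
}

std::string describe_with_globals() {
    ++call_count;
    return prefix + "#" + std::to_string(call_count);
}

// Option B: the same state carried in a context object owned by the caller.
struct Context {
    std::string prefix;
    int call_count = 0;
};

std::string describe_with_context(Context& context) {
    ++context.call_count;
    return context.prefix + "#" + std::to_string(context.call_count);
}
```

Option A keeps call sites short, but all users of the library in a process share one hidden state. Option B costs one extra parameter and lets two callers keep independent state.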
Logically speaking, it is as if the system instantiates the entire library. In practice, only the code is really "shared" in a shared library; anybody who links against it will get a copy of the data (well, maybe not read-only data). So, as far as your questions go:
1) Your process will get a copy of the variable somehow (dependent on how the shared library system on your OS works).
2) I don't see a problem with this approach.