Load dynamic library from memory - c++

Is it possible to load a library from memory instead of from the filesystem on mac/gcc?
With windows I'm using MemoryModule but it's obviously not cross-platform compatible.

First things first: I advise you to read the OS X ABI Dynamic Loader Reference.
To do this, you must use the NSCreateObjectFileImageFromMemory API.
Given a pointer to a Mach-O file in memory, this function creates and returns an NSObjectFileImage reference. The current implementation works only with bundles, so you must build the Mach-O executable file using the -bundle linker option.
The memory block that the address parameter points to must be allocated with vm_allocate (/usr/include/mach/vm_map.h); make sure to abide by this requirement.
Once you acquire the object file image, use the NSLinkModule function to link the module into the program.
When you call this function, all libraries referenced by the given module are added to the library search list. Unless you pass the NSLINKMODULE_OPTION_PRIVATE option, NSLinkModule adds all global symbols in the module to the global symbol list.
After linking, don't forget to clean up by calling the NSDestroyObjectFileImage function.
When this function is called, the dynamic loader calls vm_deallocate (/usr/include/mach/vm_map.h) on the memory pointed to by the objectFileImage parameter.
Note that while these functions are deprecated, there is (to the best of my knowledge) no substitute for them among the suggested alternatives, dlopen et al., which only load from the filesystem.
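Putting the steps above together, a rough sketch might look like the following (macOS only, with most error handling elided; the module name "memory_module" and the symbol "_my_plugin_func" are placeholders of mine, and on modern SDKs these calls produce deprecation warnings):

```cpp
#include <mach-o/dyld.h>   // NSCreateObjectFileImageFromMemory, NSLinkModule, ...
#include <mach/mach.h>     // vm_allocate, mach_task_self
#include <cstring>

// 'data'/'size' hold a Mach-O bundle (built with the -bundle linker option)
// that has already been read into memory from somewhere.
void* load_from_memory(const void* data, size_t size) {
    // 1. The buffer handed to the API must come from vm_allocate, not malloc.
    vm_address_t buf = 0;
    if (vm_allocate(mach_task_self(), &buf, size, VM_FLAGS_ANYWHERE) != KERN_SUCCESS)
        return nullptr;
    std::memcpy(reinterpret_cast<void*>(buf), data, size);

    // 2. Create the object file image from the memory block.
    NSObjectFileImage image;
    if (NSCreateObjectFileImageFromMemory(reinterpret_cast<void*>(buf), size, &image)
            != NSObjectFileImageSuccess)
        return nullptr;

    // 3. Link the module into the program, keeping its symbols private.
    NSModule module = NSLinkModule(image, "memory_module",
                                   NSLINKMODULE_OPTION_PRIVATE |
                                   NSLINKMODULE_OPTION_RETURN_ON_ERROR);

    // 4. Destroy the image; the dynamic loader vm_deallocates the buffer for us.
    NSDestroyObjectFileImage(image);
    if (!module)
        return nullptr;

    // Look up an exported symbol (note the leading underscore for C symbols).
    NSSymbol sym = NSLookupSymbolInModule(module, "_my_plugin_func");
    return sym ? NSAddressOfSymbol(sym) : nullptr;
}
```

The returned address can then be cast to the appropriate function-pointer type and called.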

Related

All writable and executable program memory segment types

In "Secure coding in C and C++", the author mentions,
"The W^X policy allows a memory segment to be writable or executable, but not both. This policy cannot prevent overwriting targets such as those required by atexit() that need to be both writable at runtime and executable."
I have two questions:
atexit() registers a function via a function-pointer argument. The function pointed to is either defined in the current program, where the linker finds its definition, or the runtime loader finds its body. In either case the function definition is known, so it should only need to be executable. Why, then, does the memory used by atexit() need to be both writable at runtime and executable?
Can any C/C++ expert tell me what other types of APIs have this property (writable at runtime and executable)? (Let's limit the scope to Linux only.)
Fundamentally, memory that can be written AND executed is very easily tampered with and leads to exploits more easily, since there is no need to use ROP or other fancy methods: you can simply write the code to execute anywhere in the segment and branch to it.
In your quote, the meaning of targets in this context is very likely a list of function pointers called on exit. The list itself needs to be writable/mutable as per the C API; the code these functions point to need only be executable. Here again, because the list is mutable, you could exploit a program by simply inserting a pointer to your code into this list and forcing the program to exit, which would execute your code. In this context, keeping all memory segments writable OR executable will not save you, since two different segments are used here (one writable, holding the function-pointer list; the other executable, holding the code).
Writable & executable memory segments are required by anything that generates code dynamically at runtime: JITs, kernels, executable unpackers, etc. For each of these, there is no technical requirement that the segment hold both properties at the same time. The memory can be allocated writable first, the code copied/generated into it, and then, with a call to mprotect(), made executable (removing the writable property). The only scenario I can see that would benefit from having both properties at the same time is perhaps a memory-constrained environment (e.g., unpacking an executable in place).
Note that some platforms do not support executable memory allocated in user space: the Xbox 360 & PS3, for example, do not support JIT. (The kernel/API supports it, but you will not be able to release your software: Microsoft and Sony will refuse such submissions, so the feature can only be used in development.)

Dynamically load a class from a library and use it after closing the library

TL;DR: Is it possible to load a class object from a library at runtime, close the library and then use the object as a "normal" object (after closing)?
I am trying to implement a plug-in system with some sort of "hot swap" functionality. Suppose my program expects a doSomething() function from its plugins. My idea would be to scan the filesystem for any libs in a specific folder, extract the functions and then close the lib (before using the functions!). This way, a monitor thread could just monitor changes on the filesystem and reset the function pointer in case something changed, and thus plug-ins could be "hot swapped".
I believe that the function pointer would become invalid as soon as I close the library (Is that so?). Therefore my idea is to let the library return a copy of an object which does the desired functionality. In this case, I would call the lib to create the object before closing it and save the copy of the object in my program. However, since the object can use other objects/functions of the library, I am not sure if this would work, since these objects/functions would not be available, would they?
You cannot copy the object and close the library, since only the data, not the code, of those objects is copied. Instead, the OS loads the code of the library into memory, and all function pointers point into this region of memory. What happens if the OS unloads the library?
You can implement something like this. You can have a Proxy object that contains a pointer to current loaded implementation. If a new library is detected, you can load new library, create instance of a new implementation, delete old instance of implementation, close old library. In this way you implement a "hot swap" mechanism and avoid problem with shared libraries code.
If you choose the approach described above, beware of concurrency problems (what if another thread is scheduled after the old implementation is deleted, but before the pointer is changed?).
An object is data, not code. A copy of an object is a copy of the data, but it still refers to the original code. As soon as you unload a dynamic library, its code is gone from memory, and any objects still referencing that code (i.e. of a type provided by the library) will be in trouble as soon as they are asked to execute a member function (such as the destructor).
So no, it's not possible to unload a library and keep using its code.

GCC -fPIC option

I have read about GCC's Options for Code Generation Conventions, but could not understand what "Generate position-independent code (PIC)" does. Please give an example to explain what it means.
Position Independent Code means that the generated machine code is not dependent on being located at a specific address in order to work.
E.g. jumps would be generated as relative rather than absolute.
Pseudo-assembly:
PIC: This would work whether the code was at address 100 or 1000
100: COMPARE REG1, REG2
101: JUMP_IF_EQUAL CURRENT+10
...
111: NOP
Non-PIC: This will only work if the code is at address 100
100: COMPARE REG1, REG2
101: JUMP_IF_EQUAL 111
...
111: NOP
EDIT: In response to comment.
If your code is compiled with -fPIC, it's suitable for inclusion in a library: the library must be able to be relocated from its preferred location in memory to another address, since another already-loaded library may occupy the address your library prefers.
I'll try to explain what has already been said in a simpler way.
Whenever a shared lib is loaded, the loader (the code in the OS which loads any program you run) changes some addresses in the code depending on where the object was loaded to.
In the above example, the "111" in the non-PIC code is written by the loader the first time it is loaded.
For non-shared objects, you may want it to be like that because the compiler can make some optimizations on that code.
For shared objects, if another process wants to "link" to that code, it must load it at the same virtual addresses or the "111" will make no sense. But that virtual address range may already be in use in the second process.
Code that is built into shared libraries should normally be position-independent code, so that the shared library can readily be loaded at (more or less) any address in memory. The -fPIC option ensures that GCC produces such code.
The link to a function in a dynamic library is resolved when the library is loaded or at run time. Therefore, both the executable file and dynamic library are loaded into memory when the program is run.
The memory address at which a dynamic library is loaded cannot be determined in
advance, because a fixed address might clash with another dynamic library requiring the same address.
There are two commonly used methods for dealing with this problem:
1. Relocation. All pointers and addresses in the code are modified, if necessary, to fit the actual load address. Relocation is done by the linker and the loader.
2. Position-independent code. All addresses in the code are relative to the current position. Shared objects in Unix-like systems use position-independent code by default. This is less efficient than relocation if the program runs for a long time, especially in 32-bit mode.
The name "position-independent code" actually implies the following:
The code section contains no absolute addresses that need relocation, only self-relative addresses. Therefore, the code section can be loaded at an arbitrary memory address and shared between multiple processes.
The data section is not shared between multiple processes because it often contains writable data. Therefore, the data section may contain pointers or addresses that need relocation.
All public functions and public data can be overridden in Linux. If a function in the main executable has the same name as a function in a shared object, then the version in main will take precedence, not only when called from main, but also when called from the shared object. Likewise, when a global variable in main has the same name as a global variable in the shared object, the instance in main will be used, even when accessed from the shared object. This so-called symbol interposition is intended to mimic the behavior of static libraries.
To implement this "override" feature, a shared object has a table of pointers to its functions, called the procedure linkage table (PLT), and a table of pointers to its variables, called the global offset table (GOT). All accesses to functions and public variables go through these tables.
P.S. Where dynamic linking cannot be avoided, there are various ways to avoid the time-consuming features of position-independent code.
You can read more in this article: http://www.agner.org/optimize/optimizing_cpp.pdf
Adding further...
Every process has the same virtual address space (if randomization of virtual addresses is disabled via a flag in the Linux OS).
(For more details, see "Disable and re-enable address space layout randomization only for myself".)
So if it is one exe with no shared linking (a hypothetical scenario), then we can always give the same virtual address to the same asm instruction without any harm.
But when we want to link a shared object to the exe, then we are not sure of the start address assigned to the shared object, as it will depend on the order in which the shared objects were linked. In other words, an asm instruction inside the .so will have a different virtual address depending on the process it is linked into.
So one process can give the .so a start address of 0x45678910 in its own virtual space, and another process at the same time can give it a start address of 0x12131415; if they do not use relative addressing, the .so will not work at all.
So they always have to use relative addressing, and hence the -fPIC option.
A minor addition to the answers already posted: object files not compiled to be position independent are relocatable; they contain relocation table entries.
These entries allow the loader (that bit of code that loads a program into memory) to rewrite the absolute addresses to adjust for the actual load address in the virtual address space.
An operating system will try to share a single copy of a "shared object library" loaded into memory with all the programs that are linked to that same shared object library.
Since the code address space (unlike sections of the data space) need not be contiguous, and because most programs that link to a specific library have a fairly fixed library dependency tree, this succeeds most of the time. In those rare cases where there is a discrepancy, yes, it may be necessary to have two or more copies of a shared object library in memory.
Obviously, any attempt to randomize the load address of a library between programs and/or program instances (so as to reduce the possibility of creating an exploitable pattern) will make such cases common, not rare, so where a system has enabled this capability, one should make every attempt to compile all shared object libraries to be position independent.
Since calls into these libraries from the body of the main program will also be made relocatable, this makes it much less likely that a shared library will have to be copied.

Sharing memory between modules

I was wondering how to share some memory between different program modules. Let's say I have a main application (exe), and then some module (dll). They both link to the same static library. This static library will have some manager that provides various services. What I would like to achieve is to have this manager shared between all application modules, and to do this transparently during the library initialization.
Between processes I could use shared memory, but I want this to be shared in the current process only.
Could you think of some cross-platform way to do this? Possibly using boost libraries, if they provide some facilities to do this.
Only solution I can think of right now, is to use shared library of the respective OS, that all other modules will link to at runtime, and have the manager saved there.
EDIT:
To clarify what I actually need:
I need to find out, if the shared manager was already created (the answers below already provided some ways to do that)
Get the pointer to the manager, if it exists, or Set the pointer somewhere to the newly created manager object.
I think you're going to need assistance from a shared library to do this in any portable fashion. It doesn't necessarily need to know anything about the objects being shared between modules, it just needs to provide some globally-accessible mapping from a key (probably a string) to a pointer.
However, if you're willing to call OS APIs, this is feasible, and I think you may only need two implementations of the OS-specific part (one for Windows DLLs and GetProcAddress, one for OSes which use dlopen).
As each module loads, it walks the list of previously loaded modules looking for any that export a specially-named function. If it finds one (any, doesn't matter which, because the invariant is that all fully-loaded modules are aware of the common object), it gets the address of the common object from the previously loaded module, then increments the reference count. If it's unable to find any, it allocates new data and initializes the reference count. During module unload, it decrements the reference count and frees the common object if the reference count reached zero.
Of course it's necessary to use the OS allocator for the common object, because although unlikely, it's possible that it is deallocated from a different library than the one which first allocated it. This also implies that the common object cannot contain any virtual functions or any other sort of pointer into segments of the different modules. All its resources must be dynamically allocated using the OS process-wide allocator. This is probably less of a burden on systems where libc++ is a shared library, but you said you're statically linking the CRT.
Functions needed in Win32 would include EnumProcessModules, GetProcAddress, HeapAlloc, HeapFree, GetProcessHeap, and GetCurrentProcess.
Everything considered, I think I would stick to putting the common object in its own shared library, which leverages the loader's data structures to find it. Otherwise you're re-inventing the loader. This will work even when the CRT is statically linked into several modules, but I think you're setting yourself up for ODR violations. Be really particular about keeping the common data POD.
For use from the current process only, you don't need to devise any special function or structure.
You could do it even without any function, but it is safer and more cross-platform friendly to define a set of functions providing access to the shared data. These functions can be implemented by the common static library.
I think the only concern with this setup is: who will own the data? There must exist one and only one owner of the shared data.
With these basic idea, we could sketch the API like this:
IsSharedDataExist // check whether of not shared data exist
CreateSharedData // create (possibly dynamically) shared data
DestroySharedData // destroy shared data
... various data access API ...
Or C++ class with the Singleton pattern will be appropriate.
UPDATE
I was confused. The real problem can be defined as: "How do you implement a Singleton class in a static library that will be linked into multiple dynamically loaded libraries (used in the same process), in a platform-independent way?"
I think the basic idea is not much different, but making sure the singleton really is single is the additional problem of this setup.
For this purpose, you could employ Boost.Interprocess.
#include <boost/config.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
...
boost::interprocess::named_mutex* singleton_check = 0;

// in the Create function of the singleton
try {
    singleton_check = new boost::interprocess::named_mutex(
        boost::interprocess::create_only, "name_of_the_mutex");
    // if no exception is thrown, this is the first-time execution
}
catch (...)
{
    // the mutex already exists: another module created the singleton first
}
Freeing the named_mutex is as simple as delete singleton_check.
UPDATE#2
Another suggestion.
I think we should not place the shared data in the common static library. If we cannot ensure the data is globally unique, we face not only tricky platform-dependent implementation problems but also a waste of memory and global resources.
If you prefer a static-library implementation, you should make two static libraries: one for the server/creator of the shared data, and one for users of that shared data. The server library defines the Singleton and provides access to it; the client library provides the various data-access methods.
This is effectively the same as the Singleton implementation without static libraries.
You can use boost::interprocess http://www.boost.org/doc/libs/1_45_0/doc/html/interprocess.html
and on Windows you can create a shared segment in your DLL that will be shared by all processes using #pragma's: http://www.codeproject.com/KB/DLL/data_seg_share.aspx
As per MSDN, I see there are only two ways to share data between modules:
1. Using the data_seg pragma
2. Using shared memory
As someone pointed out, a shared segment works only between instances of the same dll, so we are left with only one choice: the memory-mapped-files technique.

Static Global Fields in a Shared Library - Where do they go?

I have a cpp file from which I am generating a shared library (using autofoo and the like). Within the cpp file, I have declared a couple of static fields that I use throughout the library functions.
My question is 2-part:
1) Where are these fields stored in memory? It's not as if the system instantiates the entire library and keeps it in memory... the library, after all, really is just a bunch of hooks.
2) Is there a better way to do this? The reason I did it to begin with is that I want to avoid requiring the user to pass the fields into every library function call as parameters.
Thanks!
The code used to load shared libraries:
Generally (each has minor technical differences):
Loads the shared lib into memory
Walks the symbol table and updates the addresses of functions in the DLL
Initializes any global static members using their constructors.
Note: the shared lib loader need not do all this at load time.
It may do some of these jobs lazily (an implementation detail), but they will be done before use.
Any global static POD variables (things with no constructor) will be stored in special memory segments depending on whether they are initialized or not (again, an implementation detail). If they were initialized, they will be loaded from disk (or the shared lib source) with that value already defined.
So the answers to your questions:
1) Undefined; but in practice the library is:
code segments
initialized data segments
uninitialized data segments
some utility code that knows how to link it into a running application.
2) Better than what, exactly?
Good practice would suggest passing values to a function rather than relying on global state. But to be honest, that is an over-generalization and really comes down to the problem.
Logically speaking, it is as if the system instantiates the entire library. In practice, only the code is really "shared" in a shared library; anybody who links against it gets their own copy of the data (well, maybe not the read-only data). So, as far as your questions go:
1) Your process will get a copy of the variable somehow (dependent on how the shared library system on your OS works).
2) I don't see a problem with this approach.