C/C++ dynamic-link library overload

In my project, I need to modify some functions in the glibc source code.
I only need to modify part of pthread. For example, I modified multithreading-related functions such as those in pthread_create.c or pthread_mutex_lock.c. Then, when my program runs, I want it to use the modified versions of these functions without affecting any other functions. I also do not want to substitute an entire modified glibc at run time.
Is there a good solution to this problem?
Thanks!!
Ding

This is a job for a shared library interposer. Here is an excellent article.
If a function is in a shared library, the runtime linker can be instructed to call another 'interposed' function instead. The interposer can totally replace the functionality or it can augment it. A great example is the malloc family of functions: a memory leak detector and heap reporting tool can be built from a set of interposers that sit between the user program and the C library's allocator.
Interposers only work for shared (.so) libraries. Static (.a) libraries are linked directly into the executable, so the calls cannot be easily intercepted.
All major flavors of Linux support interposers through the LD_PRELOAD mechanism.
Here is an example interposer for pthread_create.
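A minimal sketch of what such an interposer could look like, assuming glibc on Linux. The counter `created_threads`, the helper `interposed_thread_count` and the log message are hypothetical additions for illustration; only `dlsym` with `RTLD_NEXT` is the actual mechanism.

```c
#define _GNU_SOURCE          /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical bookkeeping: how many threads went through the interposer. */
static atomic_int created_threads;

typedef int (*pthread_create_fn)(pthread_t *, const pthread_attr_t *,
                                 void *(*)(void *), void *);

/* Same name and signature as the real function, so the dynamic linker
 * binds callers to this definition when it comes first in the search order. */
int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine)(void *), void *arg)
{
    static pthread_create_fn real_create;

    if (!real_create) {
        /* RTLD_NEXT: the next definition after this object, i.e. the C library's.
         * Not synchronized; concurrent first calls store the same value. */
        real_create = (pthread_create_fn)dlsym(RTLD_NEXT, "pthread_create");
        if (!real_create)
            return EAGAIN;
    }

    atomic_fetch_add(&created_threads, 1);
    fprintf(stderr, "[interposer] pthread_create called\n");

    /* Augment, then forward to the original implementation. */
    return real_create(thread, attr, start_routine, arg);
}

/* Hypothetical accessor for the counter above. */
int interposed_thread_count(void)
{
    return atomic_load(&created_threads);
}
```

Built as a shared object (for example `gcc -shared -fPIC interpose.c -o libinterpose.so -ldl`) and run with `LD_PRELOAD=./libinterpose.so ./myprogram`, only `pthread_create` is replaced; every other glibc function is untouched. A modified implementation could go where the sketch forwards to the original.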

Related

is there a way to make dlopen fail intentionally

I would like to be able to control, from inside my library, whether it is allowed to be loaded, without using exceptions. In some cases I would like dlopen("mylib.so") to return NULL, and it should succeed only if all conditions are right.
Many have asked about the motivation: inside my library I use dlopen several times, and I want to make sure all needed components have been loaded before my library can be loaded.
Keep in mind that I have to use standard solutions, meaning I can't use external plugins or do things like rewriting dlopen.

This is probably some XY problem. We cannot guess your motivation and goals (and these are what really matters). What you want to do is not possible on Linux.
But read the dlopen(3) man page carefully, several times. You'll notice that since "mylib.so" contains no /, it is looked up in a search path that includes the LD_LIBRARY_PATH environment variable. That is why I generally use an absolute file path for dlopen. See e.g. realpath(3), glob(3), wordexp(3). Notice that there is no documented way to make dlopen fail outside of the documented failure cases, that is:
On success, dlopen() and dlmopen() return a non-NULL handle for the
loaded library. On error (file could not be found, was not readable,
had the wrong format, or caused errors during loading), these
functions return NULL.
Then you can use dlerror(3) to understand the error cause.
So you could play wild tricks with that LD_LIBRARY_PATH but you should not. You might be weird (actually crazy) enough to e.g. use putenv(3) on that, or add into that path a directory inside some FUSE filesystem managed by your program, or do some LD_PRELOAD trick. But you really should not do such insane tricks.
So be reasonable: solve your actual need in some other ways. Expect dlopen to behave as documented (so to usually succeed), and don't call it if you don't want to. When coding, it is important to use your common sense.
Be aware of rpath (it could be explicitly set at link time), and read carefully Program Library HowTo and Drepper's How To Write Shared Libraries. Read also the C++ dlopen minihowto and be aware of name mangling.
Notice that in practice dlopen is part of your C standard library on Linux, and that libc and ld-linux.so(8) are generally free software (e.g. GNU glibc or musl-libc). So if you are not happy with the system's dlopen, you could in principle change it (but I don't recommend doing that, since libc is the cornerstone of every Linux system).
You could consider (probably not a good idea) using some ELF parsing library (like libelf, libbfd, ...) or some ELF analyzing program (like readelf(1) or objdump(1) ...) on that shared object before your dlopen (but a malicious process or user might still alter the shared library after the analysis but before dlopen). You could study the elf(5) format yourself and do such parsing by hand (probably an even worse idea).
If you are writing that mylib.so library (on Linux, and perhaps some other similar OSes; but this behaviour is non-standard since it is not specified in POSIX dlopen), you could be interested in function attributes like __attribute__((constructor)) & __attribute__((visibility)) (see also this and that). If you want to "reject" being dlopen-ed (when some conditions are met) from your mylib.so, you could consider having some constructor function test these conditions and call exit when they fail. If your mylib.so is a plugin loaded from some other program that you could improve, you might simply look up some initialization function with dlsym, call it after the dlopen, and fail the main program if that initialization function failed. BTW, throwing some C++ exception from such a constructor function (or some obsolete _init one) is unwise, because the dlopen machinery might consume internal resources that won't be released in that case.
Lastly, in theory, you could re-implement dlopen yourself (on top of open(2), mmap(2) etc., taking care of the ELF relocations explicitly). That could take a few years (and is processor specific). Study the appropriate x86 ABI.
You probably can achieve your (unstated) goal by just using the usual dlopen, and do some test before it, and perhaps some test (using dlsym) after it. Most programs using plugins are doing that.
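The "test after dlopen" idea can be sketched as follows. The symbol name `mylib_init` and its convention (return 0 on success) are assumptions made for this example, not something dlopen defines.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical convention: the plugin exports `int mylib_init(void)`
 * and returns 0 when all of its own preconditions are satisfied. */
typedef int (*plugin_init_fn)(void);

/* Returns a handle only if the library loads AND its init function succeeds. */
void *load_plugin(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    plugin_init_fn init = (plugin_init_fn)dlsym(handle, "mylib_init");
    if (!init || init() != 0) {
        /* Missing or failing init: treat the plugin as rejected and unload it. */
        fprintf(stderr, "plugin %s rejected\n", path);
        dlclose(handle);
        return NULL;
    }
    return handle;
}
```

From the caller's point of view `load_plugin` behaves like a dlopen that can fail for the library's own reasons, while dlopen itself is used exactly as documented.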
Perhaps you might have every exported function of mylib.so do appropriate checks when running. Maybe you could have some static boolean flag set by some function with __attribute__((constructor)) (so it would be called once at dlopen time) and have other public functions check that flag.
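A small sketch of that flag pattern, assuming GCC or Clang. The environment variable `MYLIB_DISABLE` and the function `mylib_double` are hypothetical stand-ins for the real conditions and the real API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Set once at load time, read by every public function. */
static bool mylib_usable = false;

/* Runs when the shared object is loaded (a GCC/Clang extension, not POSIX). */
__attribute__((constructor))
static void mylib_check_conditions(void)
{
    /* Hypothetical condition: the library is disabled via MYLIB_DISABLE. */
    mylib_usable = (getenv("MYLIB_DISABLE") == NULL);
}

/* Public entry point: refuses to work if the load-time check failed.
 * Returns 0 and stores 2*x in *out on success, -1 otherwise. */
int mylib_double(int x, int *out)
{
    if (!mylib_usable || !out)
        return -1;
    *out = 2 * x;
    return 0;
}
```

The dlopen call still succeeds in every case; the library only makes itself inert when its conditions are not met.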
In a recent edit you explain at last:
inside my library I use dlopen several times and I want to make sure all needed components have been loaded before my library can be loaded.
There is no need to play with dlopen or its constructors (and you probably don't need to use dlopen inside your library; and if you do that, you need to explain why, how, and where). You just link mylib.so with all the required shared libraries it uses (see this). If they are not loadable or accessible at dlopen time the entire dlopen of mylib.so fails (intuitively, on Linux, dynamic loading is somehow "recursive").
BTW, if you indeed call dlopen inside your mylib.so, that dlopen happens after mylib.so has been dlopen-ed (unless you call dlopen from some constructor function of mylib.so, which is weird but should be ok and makes a different question).

As far as I remember, no, you can't, at least not in a normal way. If the library exists at the given path, it will be loaded. dlopen is not designed to do any "business checks" for you at your discretion. At most, it will abide by any filesystem permissions etc. and return an error if the process has no access to the file, and that's it.
If you have full control on the code that will load your library, then wrap it like Atterson suggested and use dlopen2 and it's done.
If you don't have full control, then dlopen2 would still not prevent anyone from using the original dlopen and bypassing the checks. You could try to make it smarter, for example by making your dlopen2 do something detectable so that the library can refuse to work if it was opened by dlopen instead of dlopen2, but then someone could fake that "something detectable", then use dlopen, done. Then it boils down to making that "something detectable" hard for attackers to reproduce.
Simply, it was not intended to do these things. It's meant to load the library if the OS allows (~filesystem permissions, etc).
For any other "access checks" like "do you have license? no? then go away" you have to implement it inside the library. Let them load it via dlopen, then make the library check the permissions, for example at each call to its functions. Use exceptions, or just do nothing and return NULLs. Or even better, you could probably use an initialization function (see https://stackoverflow.com/a/1602459/717732 + http://tldp.org/HOWTO/Program-Library-HOWTO/miscellaneous.html#INIT-AND-CLEANUP) and do the check once when the lib gets loaded. Note that these functions take and return void, so there's still no way to make the dlopen fail, but at least the library gets a moment to disable its functions.

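You could always wrap it in a function:
#include <dlfcn.h>
int conditions_met(void);  /* hypothetical: your own checks go here */
/* Call dlopen only when the conditions hold; otherwise report failure. */
void *dlopen2(const char *filename, int flags) {
    if (conditions_met())
        return dlopen(filename, flags);
    return NULL;
}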

The correct way to deal with this kind of scenario is to use seccomp and limit what the dlopen call is allowed to do.
http://man7.org/linux/man-pages/man3/seccomp_rule_add.3.html
Naturally such configuration requires root access.

Load-time dynamic link library dispatching

I'd like my Windows application to be able to reference an extensive set of classes and functions wrapped inside a DLL, but I need to be able to guide the application into choosing the correct version of this DLL before it's loaded. I'm familiar with using dllexport / dllimport and generating import libraries to accomplish load-time dynamic linking, but I cannot seem to find any information on the interwebs with regard to possibly finding some kind of entry point function into the import library itself, so I can, specifically, use CPUID to detect the host CPU configuration, and make a decision to load a particular DLL based on that information. Even more specifically, I'd like to build 2 versions of a DLL, one that is built with /ARCH:AVX and takes full advantage of SSE - AVX instructions, and another that assumes nothing is available newer than SSE2.
One requirement: Either the DLL must be linked at load-time, or there needs to be a super easy way of manually binding the functions referenced from outside the DLL, and there are many, mostly wrapped inside classes.
Bonus question: Since my libraries will be cross-platform, is there an equivalent for Linux based shared objects?

I recommend that you avoid dynamic resolution of your DLL from your executable if at all possible, since it is just going to make your life hard, especially since you have a lot of exposed interfaces and they are not pure C.
Possible Workaround
Create a "chooser" process that presents the necessary UI for deciding which DLL you need, or maybe it can even determine it automatically. Let that process move whatever DLL has been decided on into the standard location (and name) that your main executable is expecting. Then have the chooser process launch your main executable; it will pick up its DLL from your standard location without having to know which version of the DLL is there. No delay loading, no wonkiness, no extra coding; very easy.
If this just isn't an option for you, then here are your starting points for delay loading DLLs. It's a much rockier road.
Windows
LoadLibrary() to get the DLL in memory: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress() to get pointer to a function: https://msdn.microsoft.com/en-us/library/windows/desktop/ms683212(v=vs.85).aspx
OR possibly special delay-loaded DLL functionality using a custom helper function, although there are limitations and potential behavior changes.. never tried this myself: https://msdn.microsoft.com/en-us/library/151kt790.aspx (suggested by Igor Tandetnik and seems reasonable).
Linux
dlopen() to get the SO in memory: http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html
dlsym() to get pointer to a function: http://man7.org/linux/man-pages/man3/dlsym.3.html
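On the Linux side, the CPU-based choice could be sketched like this, assuming GCC or Clang on x86 (`__builtin_cpu_supports` is a compiler builtin, not part of the dl API). The two file names are hypothetical placeholders for the AVX and SSE2 builds.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical file names for the two builds of the same library. */
#define LIB_AVX  "libmylib_avx.so"
#define LIB_SSE2 "libmylib_sse2.so"

/* Pick the build that matches the host CPU; fall back to the SSE2 one. */
const char *choose_library(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();                 /* GCC/Clang builtin */
    if (__builtin_cpu_supports("avx"))
        return LIB_AVX;
#endif
    return LIB_SSE2;
}

/* Load the chosen build; individual functions are then looked up with dlsym. */
void *load_best_library(void)
{
    const char *name = choose_library();
    void *handle = dlopen(name, RTLD_NOW);
    if (!handle)
        fprintf(stderr, "cannot load %s: %s\n", name, dlerror());
    return handle;
}
```

Every function used from the library still has to be bound by hand with dlsym, which is the "rockier road" mentioned above when the interface consists of many C++ classes.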

To add to qexyn's answer, one can mimic delay loading on Linux by generating a small static stub library which calls dlopen on the first call to any of its functions and then forwards the actual execution to the shared library. Such a stub library can be generated automatically by a custom project-specific script or by Implib.so:
# Generate stub
$ implib-gen.py libxyz.so
# Link it instead of -lxyz
$ gcc myapp.c libxyz.tramp.S libxyz.init.c
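To illustrate what one such stub does, here is a hand-written sketch for a single function. It is not the code Implib.so generates; it assumes glibc's `libm.so.6` as the target library and uses the hypothetical name `lazy_cos` for the stub.

```c
#include <dlfcn.h>
#include <math.h>    /* only for the NAN macro */
#include <stddef.h>

static void *libm_handle;
static double (*real_cos)(double);

/* Stub with the same shape as cos(): the library is loaded on the first
 * call, later calls go straight through the cached pointer.
 * Not thread-safe; a real stub would guard the first call. */
double lazy_cos(double x)
{
    if (!real_cos) {
        libm_handle = dlopen("libm.so.6", RTLD_LAZY);  /* assumes glibc naming */
        if (libm_handle)
            real_cos = (double (*)(double))dlsym(libm_handle, "cos");
        if (!real_cos)
            return NAN;   /* library or symbol unavailable */
    }
    return real_cos(x);
}
```

A generated stub library contains one such forwarder per exported function, so the application links against the stubs and the real shared library is only opened when first needed.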

Check compatibility of dynamic library at runtime

I am developing a C++ application which is required to load a dynamic library at runtime using dlopen. This library generally won't be written by me.
What method do people recommend to ensure future binary compatibility between this library and my application?
The options as I see them are:
Put the version number in the library file name, and attempt to load it (through a symbolic link) no matter what. If dlopen fails, report an error.
Maintain a second interface which returns a version number. However, if this interface changes for some reason, we run into the same problems as before.
Are there any other options?

You should define a convention about the dynamically loaded (i.e. dlopen-ed) library.
You might have the convention that the library is required to provide a const char mylib_version_str[]; symbol which gives the version of the API etc etc. Of course you could have your own preprocessor tricks to help about this.
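A sketch of how the host could check such a symbol. The "MAJOR.MINOR" string format and the rule "same major number means compatible" are assumptions for this example; `api_major` and `plugin_is_compatible` are hypothetical helper names.

```c
#include <dlfcn.h>
#include <stdlib.h>

/* Extract the major number from a "MAJOR.MINOR..." string; -1 if malformed. */
int api_major(const char *version)
{
    char *end;
    long major;

    if (!version || *version < '0' || *version > '9')
        return -1;
    major = strtol(version, &end, 10);
    if (*end != '.' && *end != '\0')
        return -1;
    return (int)major;
}

/* Returns 1 if the library at `path` exports mylib_version_str and its
 * major number equals `required_major`, 0 otherwise. */
int plugin_is_compatible(const char *path, int required_major)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle)
        return 0;

    /* The symbol is a char array, so its address is the string itself. */
    const char *version = (const char *)dlsym(handle, "mylib_version_str");
    int ok = version && api_major(version) == required_major;

    dlclose(handle);   /* a real loader would keep the handle on success */
    return ok;
}
```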
For your inspiration, you might look at what GCC requires from its plugins (e.g. the plugin_is_GPL_compatible symbol).
If the dynamically loaded library is in C++, you might use demangling to check the signature of functions....

Why not use both options? For instance, a few libraries already do that (Lua, for example: the old DLL is Lua51.dll, then you have Lua52 etc., and you can also query its version).
A good interface can change, but not so often. Why should 2 simple static methods
const char* getLibraryName();
uint32 getLibraryVersion();
change over time?

If you/they are using libtool to build the library/application, you may recommend this way: http://www.gnu.org/software/libtool/manual/libtool.html#Versioning

Catching calls from a program in runtime and mapping them to other calls

A program usually depends on several libraries and might sometimes depend on other programs as well. I look at projects like Wine and think how do they figure out what calls a program is making?
In a Linux environment, what are the approaches used to know what calls an executable is making in runtime in order to catch and map them to other calls?
Any code snippets or references to resources for extra reading is greatly appreciated :)

On Linux you're looking for the LD_PRELOAD environment variable. This will load your libraries before any requested by the program. If you provide a function definition that matches one loaded by the target program then your version will be called instead.
You can't really detect what functions a program is calling however. You can however get all the functions in a shared library and implement all of those. You aren't really catching the functions, you are simply reimplementing them.
Projects like Wine do this in some cases, but not in all. They also rewrite some of the dynamic libraries. So when a Win32 program loads some DLL it is actually loading the Wine version and not the native version. This is essentially the same concept of replacing the functions with your own.
Look up LD_PRELOAD for more information.

are runtime linking library globals shared among plugins loaded with dlopen?

I have a C++ program that links at runtime with, let's say, mylib.so. Then, the same program uses dlopen()/dlsym() to load a function from myplugin.so, a dynamic library that in turn depends on mylib.so.
My question is: will the program AND the function in the plugin access the same globals defined in mylib.so, in the same memory area reserved for the program, or will each be assigned different, unrelated copies in its own memory space? If the latter is the default behaviour, is it possible to change that?
Thanks in advance =)!

Globals in the main program that does the dlopen should be visible to the code that is dynamically loaded. However, the best advice I've seen to date (especially if you ever want to have even vaguely portable code) is to only have function calls be passed across the linker divide, and to not export any variables in either direction. It's also best if there is an API for the loaded code to register the interesting parts of its API with the loader (e.g., "Here is how I provide this SPI for drawing foobars on a baz") as that's a much saner way of doing callbacks rather than just mashing everything together.
[EDIT]: The other reason for doing this is if you're simulating weak linking on a platform that doesn't support it. That's a lot like the other one I list, except that it is the main program that is building the SPI out of the API exported by the dynamic library rather than the .so exporting it explicitly on startup. It's second best really, but you make do with what you've got rather than wishing (well, unless you're prepared to do the work by writing some sort of connection library).
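The registration approach could look roughly like this. All names (`drawer_spi`, `host_register_drawer`, `host_draw`, `plugin_init`) are hypothetical; the point is only that function pointers, not variables, cross the boundary between host and plugin.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical SPI: what a plugin must provide in order to draw something. */
struct drawer_spi {
    const char *name;
    int (*draw)(int x, int y);
};

#define MAX_DRAWERS 8
static const struct drawer_spi *drawers[MAX_DRAWERS];
static size_t drawer_count;

/* Exported by the host; each plugin calls it from its init function. */
int host_register_drawer(const struct drawer_spi *spi)
{
    if (!spi || !spi->name || !spi->draw || drawer_count == MAX_DRAWERS)
        return -1;
    drawers[drawer_count++] = spi;
    return 0;
}

/* Host side: dispatch by name, using only the registered function pointers. */
int host_draw(const char *name, int x, int y)
{
    for (size_t i = 0; i < drawer_count; i++)
        if (strcmp(drawers[i]->name, name) == 0)
            return drawers[i]->draw(x, y);
    return -1;
}

/* What a plugin would contain (normally in its own .so). */
static int baz_draw(int x, int y) { return x + y; }
static const struct drawer_spi baz_drawer = { "baz", baz_draw };

/* Looked up by the host with dlsym and called right after dlopen. */
int plugin_init(void)
{
    return host_register_drawer(&baz_drawer);
}
```

With this layout neither side needs to know how the other stores its globals, which sidesteps the question of whether they are shared.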
