Catching calls from a program in runtime and mapping them to other calls

Catching calls from a program in runtime and mapping them to other calls - c++

A program usually depends on several libraries and might sometimes depend on other programs as well. I look at projects like Wine and think how do they figure out what calls a program is making?
In a Linux environment, what are the approaches used to know what calls an executable is making in runtime in order to catch and map them to other calls?
Any code snippets or references to resources for extra reading is greatly appreciated :)

On Linux you're looking for the LD_PRELOAD environment variable. This will load your libraries before any requested by the program. If you provide a function definition that matches one loaded by the target program then your version will be called instead.
You can't really detect what functions a program is calling however. You can however get all the functions in a shared library and implement all of those. You aren't really catching the functions, you are simply reimplementing them.
Projects like Wine do this in some cases, but not in all. They also rewrite some of the dynamic libraries. So when a Win32 loads some DLL it is actually loading the Wine version and not the native version. This is essentially the same concept of replacing the functions with your own.
Lookup LD_PRELOAD for more information.

Related

is there a way to make dlopen fail intentionally

I would like to be able to control from inside my library if it is allowed to be loaded or not without using exceptions, meaning for some cases i would like dlopen("mylib.so") to return NULL and only if all conditions are right it will succeed.
Many have asked about the motivation, inside my library i use dlopen several times and i want to make sure all needed components have been loaded before my library can be loaded.
Take in mind i have to use standard solutions, meaning i can't use external plugins or do things like rewriting dlopen.

This is probably some XY problem. We cannot guess your motivation and goals (and these are what really matters). What you want to do is not possible on Linux.
But read carefully and several times the dlopen(3) man page. You'll notice that since "mylib.so" has no / it is handled specifically by using the LD_LIBRARY_PATH environment variable. That is why I generally use an absolute file path for dlopen. See e.g. realpath(3), glob(3), wordexp(3). Notice that there is no documented way to make the dlopen fail outside of the documented failure cases, that is:
On success, dlopen() and dlmopen() return a non-NULL handle for the
loaded library. On error (file could not be found, was not readable,
had the wrong format, or caused errors during loading), these
functions return NULL.
Then you can use dlerror(3) to understand the error cause.
So you could play wild tricks with that LD_LIBRARY_PATH but you should not. You might be weird (actually crazy) enough to e.g. use putenv(3) on that, or add into that path a directory inside some FUSE filesystem managed by your program, or do some LD_PRELOAD trick. But you really should not do such insane tricks.
So be reasonable: solve your actual need in some other ways. Expect dlopen to behave as documented (so to usually succeed), and don't call it if you don't want to. When coding, it is important to use your common sense.
Be aware of rpath (it could be explicitly set at link time), and read carefully Program Library HowTo and Drepper's How To Write Shared Libraries. Read also the C++ dlopen minihowto and be aware of name mangling.
Notice that in practice dlopen is part of your C standard library on Linux, and that libc and ld-linux.so(8) is generally some free software (e.g. GNU glibc or musl-libc). So if you are not happy with the system's dlopen, you could in principle change it (but I don't recommend doing that, since libc is the cornerstone of every Linux system).
You could consider (probably not a good idea) to use some ELF parsing library (like libelf, libbfd, ...) or some ELF analyzing program (like readelf(1) or objdump(1) ...) on that shared object before your dlopen (but a malicious process or user might still alter the shared library after the analysis but before dlopen). You could study the elf(5) format yourself and do such a parsing by hand (probably even more bad idea).
If you are writing that mylib.so library (on Linux, and perhaps some other similar OSes; but this behaviour is non-standard since non-specified in POSIX dlopen), you could be interested by function attributes like __attribute__((constructor)) & __attribute__((visibility)) (see also this and that). If you want to "reject" being dlopen-ed (when some conditions are met) from your mylib.so you could consider having some constructor function testing these conditions and calling exit when they fail. If your mylib.so is a plugin loaded from some other program that you could improve, you might simply seek some initialization function with dlsym, call it after the dlopen, and fail the main program if that initialization function failed. BTW, throwing some C++ exception from such a constructor function (or some obsolete _init one) is unwise, because the dlopen machinery might consume internal resources that won't be released in that case.
At last, in theory, you could re-implement dlopen yourself (above open(2), mmap(2) etc... and care about the relocations in ELF explicitly). That could take a few years (and is processor specific). Study the appropriate x86 ABI.
You probably can achieve your (unstated) goal by just using the usual dlopen, and do some test before it, and perhaps some test (using dlsym) after it. Most programs using plugins are doing that.
Perhaps you might have every exported function of mylib.so do appropriate checks when running. Maybe you could have some static boolean flag set by some function with __attribute__((constructor)) (so it would be called once at dlopen time) and have other public functions check that flag.
In a recent edit you explain at last:
inside my library i use dlopen several times and i want to make sure all needed components have been loaded before my library can be loaded.
There is no need to play with dlopen or its constructors (and you probably don't need to use dlopen inside your library; and if you do that, you need to explain why, how, and where). You just link mylib.so with all the required shared libraries it uses (see this). If they are not loadable or accessible at dlopen time the entire dlopen of mylib.so fails (intuitively, on Linux, dynamic loading is somehow "recursive").
BTW, if you indeed call dlopen inside your mylib.so, that dlopen happens after mylib.so has been dlopen-ed (unless you call dlopen from some constructor function of mylib.so, which is weird but should be ok and makes a different question).

As far as I remember, no, you can't, at least in a normal way. If the library exists at given path, it will be loaded. The dlopen it is not designed to do any "business checks" for you at your discretion. At most, it will abide any filesystem permissions/etc and return error if process has no access to the file, and that's it.
If you have full control on the code that will load your library, then wrap it like Atterson suggested and use dlopen2 and it's done.
If you don't have full control, then dlopen2 would still not prevent anyone from using the original dlopen and bypass the checks. You could try to make it more smart, for example, make your dlopen2 do something detectable so then the library can deny working if it was opened by dlopen instead of dlopen2, but then.. someone could fake up that "something detectable", then use dlopen, done. Then it boils down to making that "something detectable" hard to reproduce by attackers.
Simply, it was not intended to do these things. It's meant to load the library if the OS allows (~filesystem permissions, etc).
For any other "access checks" like "do you have license? no? then go away" you have to implement it inside the library. Let them load it via dlopen, then make the library check the permissions for example, at each call to its functions. Do use exceptions, or just do-nothing and return NULLs. Or even better, you probably could use initialization function (see https://stackoverflow.com/a/1602459/717732 + http://tldp.org/HOWTO/Program-Library-HOWTO/miscellaneous.html#INIT-AND-CLEANUP) and do the check once when the lib gets loaded. Note that this functions take and return void so there's still no way to make the dlopen fail, but at least the library can have its moment to disable its functions.

You could always wrap it into a function
void *dlopen2(const char *filename, int flags){
if(/*your conditions*/)
return void *dlopen(filename, flags);
return (void *) NULL;
}

The correct way to deal with this kind of scenarios, is to make use of seccomp and limit what the dlopen call is allowed to do.
http://man7.org/linux/man-pages/man3/seccomp_rule_add.3.html
Naturally such configuration requires root access.

C/C++ dynamic-link library overload

In my porject, I need to modify some functions of the glibc source code.
I only need to modify part of the pthread. For example, I modified multithreaded related functions such as pthread_create.c or pthread_mutex_lock.c in the source code. Then, when my concrete program is running,I want to specify it to use the modified functions when it needs to using these functions ,and it won't affect other functions.Also,I do not want to specify an entire version of glibc when program is running.
I need to ask for your help is there any good solution for this problem?
Thanks!!
Ding

This is a job for a shared library interposer. Here is an excellent article.
If a function is in a shared library, the runtime linker can be instructed to call another 'interposed' function instead. The interposer can totally replace the functionality or it can augment it. A great example is the malloc family of functions, a memory leak detector and heap reporting tool can be based on a set of interposers between the user program and the system calls.
Interposers only work for shared (.so) libraries. Static (.a) libraries directly link into the executable and the calls cannot be easily intercepted.
All major flavors of Linux support interposes for the LD_PRELOAD functionality.
Here is an example interposer for pthread_create.

Load-time dynamic link library dispatching

I'd like my Windows application to be able to reference an extensive set of classes and functions wrapped inside a DLL, but I need to be able to guide the application into choosing the correct version of this DLL before it's loaded. I'm familiar with using dllexport / dllimport and generating import libraries to accomplish load-time dynamic linking, but I cannot seem to find any information on the interwebs with regard to possibly finding some kind of entry point function into the import library itself, so I can, specifically, use CPUID to detect the host CPU configuration, and make a decision to load a paricular DLL based on that information. Even more specifically, I'd like to build 2 versions of a DLL, one that is built with /ARCH:AVX and takes full advantage of SSE - AVX instructions, and another that assumes nothing is available newer than SSE2.
One requirement: Either the DLL must be linked at load-time, or there needs to be a super easy way of manually binding the functions referenced from outside the DLL, and there are many, mostly wrapped inside classes.
Bonus question: Since my libraries will be cross-platform, is there an equivalent for Linux based shared objects?

I recommend that you avoid dynamic resolution of your DLL from your executable if at all possible, since it is just going to make your life hard, especially since you have a lot of exposed interfaces and they are not pure C.
Possible Workaround
Create a "chooser" process that presents the necessary UI for deciding which DLL you need, or maybe it can even determine it automatically. Let that process move whatever DLL has been decided on into the standard location (and name) that your main executable is expecting. Then have the chooser process launch your main executable; it will pick up its DLL from your standard location without having to know which version of the DLL is there. No delay loading, no wonkiness, no extra coding; very easy.
If this just isn't an option for you, then here are your starting points for delay loading DLLs. Its a much rockier road.
Windows
LoadLibrary() to get the DLL in memory: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress() to get pointer to a function: https://msdn.microsoft.com/en-us/library/windows/desktop/ms683212(v=vs.85).aspx
OR possibly special delay-loaded DLL functionality using a custom helper function, although there are limitations and potential behavior changes.. never tried this myself: https://msdn.microsoft.com/en-us/library/151kt790.aspx (suggested by Igor Tandetnik and seems reasonable).
Linux
dlopen() to get the SO in memory: http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html
dladdr() to get pointer to a function: http://man7.org/linux/man-pages/man3/dladdr.3.html

To add to qexyn's answer, one can mimic delay loading on Linux by generating a small static stub library which would dlopen on first call to any of it's functions and then forward actual execution to shared library. Generation of such stub library can be automatically generated by custom project-specific script or Implib.so:
# Generate stub
$ implib-gen.py libxyz.so
# Link it instead of -lxyz
$ gcc myapp.c libxyz.tramp.S libxyz.init.c

dlopen and implicit library loading: two copies of the same library

I have 3 things: open source application (let's call it APP),
closed source shared library (let's call it OPENGL)
and open source plugin for OPENGL (let's call it PLUGIN)[also shared library].
OS: Linux.
There is need to share data between APP and PLUGIN,
so APP linking with PLUGIN, and when I run it,
system load it automatically.
After that APP call eglInitialize that belongs to OPENGL,
and after that this function load PLUGIN again.
And after that I have two copies of PLUGIN in the APP memory.
I know that because of PLUGIN have global data, and after debugging
I saw that there are two copies of global data.
So question how I can fix this behaviour?
I want one instance of PLUGIN, that used by APP and OPENGL.
And I can not change OPENGL library.

It obviously depends a lot on exactly what the libraries are doing, but in general some solution should be possible.
First note that normally if a shared library with the same name is loaded multiple times it will continue to use the same library. This of coruse primarily applies to loading via the standard loading/linking mechanism. If the library calls dlopen on its own it still can get the same library but it depends on the flags to dlopen. Try reading the docs on dlopen to get an understanding of how it works and how you can manipulate it.
You can also try positioning the PLUGIN earlier in the linker command so that it gets loaded first and thus might avoid a double load later one. If you must load the PLUGIN dynamically this obviously won't help. You can also check if LD_PRELOAD might resolve the linking order.
As a last resort you may have to resort to using LD_LIBARY_PATH and putting an interface library in from of the real one. This one will simply pass calls to the real one but will intercept duplicate loads and shunt them to the previous load.
This is just a general direction to consider. Your actual answer will depend highly on your code and what the other shared libraries do. Always investigate linker load ordering first, as it is the easiest to check, and then dlopen flags, before going into the other options.

I suspect that OPENGL is loading PLUGIN with the RTLD_LOCAL flag. This
is normally what you want when loading a plugin, so that multiple
plugins don't conflict.
We've had similar problems with loading code under Java: we'd load a
dozen or so different modules, and they couldn't communicate with one
another. It's possible that our solution would work for you: we wrote a
wrapper for the plugin, and told Java that the wrapper was the plugin.
That plugin then loaded each of the other shared objects, using dlopen
with RTLD_GLOBAL. This worked between plugins. I'm not sure that it
will allow the plugins to get back to the main, however (but I think it
should). And IIRC, you'll need special options when linking main for
its symbols to be available. I think Linux treats the symbols in main
as if main had been loaded with RTLD_LOCAL otherwise. (Maybe
--export-dynamic? It's been a while since I've had to do this, and I
can't remember exactly.)

are runtime linking library globals shared among plugins loaded with dlopen?

I've a C++ program that links at runtime with, lets say, mylib.so. then, the same program uses dlopen()/dlsym() to load a function from myplugin.so, dynamic library that in turn has dependencies to mylib.so.
My question is: will the program AND the function in the plugin access the same globals defined in mydlib.so in the same memory area reserved for the program, or each will be assigned different, unrelated copies in its own memory space? if the latter is the default behaviour, is it possible to change that?
Thanks in advance =)!

Globals in the main program that does the dlopen should be visible to the code that is dynamically loaded. However, the best advice I've seen to date (especially if you ever want to have even vaguely portable code) is to only have function calls be passed across the linker divide, and to not export any variables in either direction. It's also best if there is an API for the loaded code to register the interesting parts of its API with the loader (e.g., "Here is how I provide this SPI for drawing foobars on a baz") as that's a much saner way of doing callbacks rather than just mashing everything together.
[EDIT]: The other reason for doing this is if you're simulating weak linking on a platform that doesn't support it. That's a lot like the other one I list, except that it is the main program that is building the SPI out of the API exported by the dynamic library rather than the .so exporting it explicitly on startup. It's second best really, but you make do with what you've got rather than wishing (well, unless you're prepared to do the work by writing some sort of connection library).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js