Detecting and intercepting linked library dependencies at runtime - c++

On a UNIX system is there a simple way to identify whether a dynamic (shared) library depends on other dynamic libraries?
I'm exploring system level APIs such as dlopen and friends in C and C++. I have a controlled environment where I will be calling dlopen and mapping specific functions. The idea is that people will be able to write native plugins to the application (and let us assume for the moment that it is in the author's best interest to make sure the library actually performs the correct function and is not malicious).
However, as a security measure I would like to ensure at runtime that the loaded dynamic library does not link to any other dynamic libraries (the intention being to disallow system calls and only permit functions from the local maths library) - and if it does, don't load it.
Some (long and/or difficult) solutions I've come up with:
Write a compiler shim that compiles the plugin code with a very specific set of compiler options, and injects some sort of checksum function that can be used to validate that the compiler shim was used.
Individually slice out all function calls referencing function names I don't want to be used.
Intercept all syscalls with some linker path trickery.
Is there a simple(r) way of checking dynamic libraries' dependencies?

However, as a security measure I would like to ensure at runtime that the loaded dynamic library does not link to any other dynamic libraries
Note that a dynamic library can import a symbol without explicitly linking to any other dynamic library. For example:
int foo() { return open("/etc/passwd", O_RDONLY); }
gcc -fPIC -shared -o foo.so foo.c -nostdlib
Now foo.so does not have any DT_NEEDED shared library dependencies, yet will still open /etc/passwd at will.
(the intention being to disallow system calls and only permit functions from the local maths library) - and if it does, don't load it.
There is so much wrong with your intention, it's not even funny.
For a start, you don't need to link to any external library to execute system calls directly, it can be trivially done in assembly.
Also, the plugin can find and modify dynamic loader's data, and once it does, it can make your main program do arbitrary things. For example, it can hijack all of your main program's calls into libc.so, and redirect them elsewhere.
Once you run untrusted code in your process, it's game over.
solutions I've come up with
None of them have even remote chance of working. Any time you expend on this is time totally wasted.

Related

How do I create a static library which automatically links to a dynamic library?

I maintain our in-house infrastructure library - lets call it libcluracan. This library has to be statically linked because it doesn't exist on outside computers where the code is used.
This means that instead of creating a libcluracan.so file using the linker I create a libcluracan.a file using the ar command
Now I'm trying to add some new functionality to libcluracan, but it requires linking with an outside library - specifically -lfltk, but the specifics aren't important. What is important is that I can assume the outside computers have this library (and any other publicly available library I need).
The biggest problem is that I can't change how the in-house programmers compile (well, link) their code.
Had we been using a dynamic libcluracan.so library I'd just add -lfltk to the linker and forget about it - the programmers would continue to link with -lcluracan and get -lfltk automatically.
I need to find a way to do the same thing with our static libcluracan.a library.
TL;DR
Is there a way to create a static .a library that automatically links to another dynamic .so library when used?
The biggest problem is that I can't change how the in-house programmers compile (well, link) their code.
That is indeed a big problem. Your in-house programmers should be using make (or a similar automated build system), and the change would be trivial.
There is no way to achieve what you want on an arbitrary UNIX system.
IF you are using GNU toolchain, and in particular GNU-ld or gold, THEN you can achieve what you want by linking with libcluracan.so, where libcluracan.so is not a shared library, but a linker script, which looks like this:
GROUP ( libcluracan.a libfltk.so )

Merge Mach-O executable with a static lib?

Suppose you have
a pre-built iOS executable app (for simulator or device).
a pre-built static archive library static library which among other things contains c++ static initializers.
Now it should be possible to merge the two built products to produce the a new iOS executable which is like the old one, except that it is now also linked with the additional static library, and on execution will run the static library's static initializers.
Which tool (if any) could help solve this merge problem?
Edit: An acceptable solution is also to dynamically load the library using dlopen. The whole purpose of this is for application testing, so the re-linked app will never see app store.
How a compiler work (in a simple explanation)
The most popular C++ compilers (like say, GCC), work by translating all the C++ (and Obj-C, C, etc...) code to ASM.
Then it calls the appropriate assembler for the target processor, and create the object binaries.
Then it calls the linker, that search on those binaries for the symbols that explain what links with what. A common optimisation that linkers can do, is also strip of the final binary anything from the statically linked libraries that was not used, other common optimisation is not attempt to link at all unused libraries.
Also finally, the linker removes the things that only it needed.
What this mean in your case
You have a library, the library has the linking symbols. You also has a executable, that one had its linking symbols stripped, in fact depending on how it was optimised the internal jumps might be only a couple of jmp instructions to arbitrary addresses on the code. No machine, can do what you want in a automatic manner, because you don't have the needed information on the executable.
How to do it anyway
You need to disassemble the executable, figure on your own where are the function calls, and then manually reassemble it with your library, changing those functions call to jump to addresses in your library instead.
This process is sometimes used by game moders to change the video drivers of old games (for example to update their OpenGL version, or to force Glide games to use some newer drivers, and so on).
So if you want to do that anyway (I warn you: it is absurdly crazy to do though...) ask those guys :) I don't remember right now anyone to point to you, but they exist.
Analogy
When you are in normal linking phase, the compiled object files are like a source code that the machine understands, full of function calls as needed.
After it is compiled, all function calls became goto.
So if you are a linker tasked in doing what you want to do, imagine that you would be reading a source code filled with goto to random places in the code (sometimes even to inside loops) and that you have to somehow figure what ones of those you want to change to jump to the new part you are trying to paste there.

Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?

I have been involved in some debate with respect to libraries in Linux, and would like to confirm some things.
It is to my understanding (please correct me if I am wrong and I will edit my post later), that there are two ways of using libraries when building an application:
Static libraries (.a files): At link time, a copy of the entire library is put into the final application so that the functions within the library are always available to the calling application
Shared objects (.so files): At link time, the object is just verified against its API via the corresponding header (.h) file. The library isn't actually used until runtime, where it is needed.
The obvious advantage of static libraries is that they allow the entire application to be self-contained, while the benefit of dynamic libraries is that the ".so" file can be replaced (ie: in case it needs to be updated due to a security bug) without requiring the base application to be recompiled.
I have heard some people make a distinction between shared objects and dynamic link libraries (DLL's), even though they are both ".so" files. Is there any distinction between shared objects and DLLs when it comes to C/C++ development on Linux or any other POSIX compliant OS (ie: MINIX, UNIX, QNX, etc)? I am told that one key difference (so far) is that shared objects are just used at runtime, while DLL's must be opened first using the dlopen() call within the application.
Finally, I have also heard some developers mention "shared archives", which, to my understanding, are also static libraries themselves, but are never used by an application directly. Instead, other static libraries will link against the "shared archives" to pull some (but not all) functions/resources from the shared archive into the static library being built.
Thank you all in advance for your assistance.
Update
In the context in which these terms were provided to me, it was effectively erroneous terms used by a team of Windows developers that had to learn Linux. I tried to correct them, but the (incorrect) language norms stuck.
Shared Object: A library that is automatically linked into a program when the program starts, and exists as a standalone file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.so). The library must be present at compile time, and when the application starts.
Static Library: A library that is merged into the actual program itself at build time for a single (larger) application containing the application code and the library code that is automatically linked into a program when the program is built, and the final binary containing both the main program and the library itself exists as a single standalone binary file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.a). The library must be present at compile time.
DLL: Essentially the same as a shared object, but rather than being included in the linking list at compile time, the library is loaded via dlopen()/dlsym() commands so that the library does not need to be present at build time for the program to compile. Also, the library does not need to be present (necessarily) at application startup or compile time, as it is only needed at the moment the dlopen/dlsym calls are made.
Shared Archive: Essentially the same as a static library, but is compiled with the "export-shared" and "-fPIC" flags. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylibS for a library file named mylibS.a). The distinction between the two is that this additional flag is required if a shared object or DLL wants to statically link the shared archive into its own code AND be able to make the functions in the shared object available to other programs, rather than just using them internal to the DLL. This is useful in the case when someone provides you with a static library, and you wish to repackage it as an SO. The library must be present at compile time.
Additional Update
The distinction between "DLL" and "shared library" was just a (lazy, inaccurate) colloquialism in the company I worked in at the time (Windows developers being forced to shift to Linux development, and the term stuck), adhering to the descriptions noted above.
Additionally, the trailing "S" literal after the library name, in the case of "shared archives" was just a convention used at that company, and not in the industry in general.
A static library(.a) is a library that can be linked directly into the final executable produced by the linker,it is contained in it and there is no need to have the library into the system where the executable will be deployed.
A shared library(.so) is a library that is linked but not embedded in the final executable, so will be loaded when the executable is launched and need to be present in the system where the executable is deployed.
A dynamic link library on windows(.dll) is like a shared library(.so) on linux but there are some differences between the two implementations that are related to the OS (Windows vs Linux) :
A DLL can define two kinds of functions: exported and internal. The exported functions are intended to be called by other modules, as well as from within the DLL where they are defined. Internal functions are typically intended to be called only from within the DLL where they are defined.
An SO library on Linux doesn't need special export statement to indicate exportable symbols, since all symbols are available to an interrogating process.
I've always thought that DLLs and shared objects are just different terms for the same thing - Windows calls them DLLs, while on UNIX systems they're shared objects, with the general term - dynamically linked library - covering both (even the function to open a .so on UNIX is called dlopen() after 'dynamic library').
They are indeed only linked at application startup, however your notion of verification against the header file is incorrect. The header file defines prototypes which are required in order to compile the code which uses the library, but at link time the linker looks inside the library itself to make sure the functions it needs are actually there. The linker has to find the function bodies somewhere at link time or it'll raise an error. It ALSO does that at runtime, because as you rightly point out the library itself might have changed since the program was compiled. This is why ABI stability is so important in platform libraries, as the ABI changing is what breaks existing programs compiled against older versions.
Static libraries are just bundles of object files straight out of the compiler, just like the ones that you are building yourself as part of your project's compilation, so they get pulled in and fed to the linker in exactly the same way, and unused bits are dropped in exactly the same way.
I can elaborate on the details of DLLs in Windows to help clarify those mysteries to my friends here in *NIX-land...
A DLL is like a Shared Object file. Both are images, ready to load into memory by the program loader of the respective OS. The images are accompanied by various bits of metadata to help linkers and loaders make the necessary associations and use the library of code.
Windows DLLs have an export table. The exports can be by name, or by table position (numeric). The latter method is considered "old school" and is much more fragile -- rebuilding the DLL and changing the position of a function in the table will end in disaster, whereas there is no real issue if linking of entry points is by name. So, forget that as an issue, but just be aware it's there if you work with "dinosaur" code such as 3rd-party vendor libs.
Windows DLLs are built by compiling and linking, just as you would for an EXE (executable application), but the DLL is meant to not stand alone, just like an SO is meant to be used by an application, either via dynamic loading, or by link-time binding (the reference to the SO is embedded in the application binary's metadata, and the OS program loader will auto-load the referenced SO's). DLLs can reference other DLLs, just as SOs can reference other SOs.
In Windows, DLLs will make available only specific entry points. These are called "exports". The developer can either use a special compiler keyword to make a symbol an externally-visible (to other linkers and the dynamic loader), or the exports can be listed in a module-definition file which is used at link time when the DLL itself is being created. The modern practice is to decorate the function definition with the keyword to export the symbol name. It is also possible to create header files with keywords which will declare that symbol as one to be imported from a DLL outside the current compilation unit. Look up the keywords __declspec(dllexport) and __declspec(dllimport) for more information.
One of the interesting features of DLLs is that they can declare a standard "upon load/unload" handler function. Whenever the DLL is loaded or unloaded, the DLL can perform some initialization or cleanup, as the case may be. This maps nicely into having a DLL as an object-oriented resource manager, such as a device driver or shared object interface.
When a developer wants to use an already-built DLL, she must either reference an "export library" (*.LIB) created by the DLL developer when she created the DLL, or she must explicitly load the DLL at run time and request the entry point address by name via the LoadLibrary() and GetProcAddress() mechanisms. Most of the time, linking against a LIB file (which simply contains the linker metadata for the DLL's exported entry points) is the way DLLs get used. Dynamic loading is reserved typically for implementing "polymorphism" or "runtime configurability" in program behaviors (accessing add-ons or later-defined functionality, aka "plugins").
The Windows way of doing things can cause some confusion at times; the system uses the .LIB extension to refer to both normal static libraries (archives, like POSIX *.a files) and to the "export stub" libraries needed to bind an application to a DLL at link time. So, one should always look to see if a *.LIB file has a same-named *.DLL file; if not, chances are good that *.LIB file is a static library archive, and not export binding metadata for a DLL.
You are correct in that static files are copied to the application at link-time, and that shared files are just verified at link time and loaded at runtime.
The dlopen call is not only for shared objects, if the application wishes to do so at runtime on its behalf, otherwise the shared objects are loaded automatically when the application starts. DLLS and .so are the same thing. the dlopen exists to add even more fine-grained dynamic loading abilities for processes. You dont have to use dlopen yourself to open/use the DLLs, that happens too at application startup.
I suspect some kind of misunderstanding here, but header files, at least of the .h variety used for compiling source code, are most definitely NOT checked during link time.
.h, and for that matter, .c/.cpp files, are only involved during the compilation phase, which includes preprocessing. Once the object code has been created the header file is long gone well before the linker gets around to dealing with things.

How can I link with a directory which might not exist while running?

I'm writing a library for Linux which has some functionality which relies on the a shared library libfoo.so.
I'm trying to case where libfoo.so doesn't exist into consideration. I have a well defined behavior for such a case, but I don't know how to properly implement it, linkage wise.
Currently, my library is shipped compiled against libfoo.so, when a client tries to compile it's code against it on an environment which doesn't include libfoo.so he gets linkage errors.
My question is, how can I build my library in such a way that it would compile even if libfoo.so doesn't exist, but behave differently. The only solution I was able to come up with myself is to branch from it a version which doesn't support foo, but there must be a better way...
Thanks in advance
A follow up due to the given answer:
It seems that not linking against libfoo but rather dynamically loading it with dlopen solves the problem, but it requires me to manually export all the symbols, and is limited in scope... Is there any "less painful" to achieve that?
There are two different problems that kind of fit the description, depending on whether the presence of the library is to be detected at compile time or at runtime.
At compile time you will have to use some tool to detect whether the library is present and modify the build scripts to pass that information down to the code (think defines).
At runtime, you can avoid linking against the library, and rather dynamically load it. The code should handle failures to locate/load the library and fall back to the alternative version.

GCC / Linux: adding a static library to a .so?

I've a program that implements a plugin system by dynamically loading a function from some plugin_name.so (as usual).
But in turn I've a static "helper" library (lets call it helper.a) whose functions are used both from the main program and the main function in the plugin. They don't have to inter-operate in any way, they are just helper functions for text manipulation and such.
This program, once started, cannot be reloaded or restarted, that's why I'm expecting to have new "helper" functionality from the plugin, not from the main program.
So my questin is.. is it possible to force this "plugin function code" in the .so to use (statically link against?) a different (perhaps newer) version of "helper" than the main program?
How could this be done? perhaps by statically linking or otherwise adding helper.a to plugin_name.so?
Nick Meyer's answer is correct on Windows and AIX, but is unlikely to be correct on every other UNIX platform by default.
On most UNIX platforms, the runtime loader maintains a single name space for all symbols, so if you define foo_helper in a.out, and also in plugin.so, and then call foo_helper from either, the first definition visible to the runtime loader (usually that from a.out) is used by default for both calls.
In addition, the picture is complicated by the fact that foo_helper may not be exported from a.out (and thus may be invisible to runtime loader), unless you use -rdynamic flag, or some other shared library references it. In other words, things may appear to work as Nick described them, then you add a shared library to the a.out link line, and they don't work that way anymore.
On ELF platforms (such as Linux), you have great control over symbol visibility and binding. See description of -fvisibility=hidden and -rdynamic in GCC man page, and also -Bsymbolic in linker man page.
Most other UNIX platforms have some way to control symbol bindings as well, but this is necessarily platform-specific.
If your main program and dynamic library both statically link to helper.a, then you shouldn't need to worry about mixing versions of helper.a (as long as you don't do things like pass pointers allocated in helper.a between the .exe and .so boundaries).
The code required from the helper.a is inserted to the actual binary when you link against it. So when you call into helper.a from the .exe, you will be executing code from the code segment of your executable image, and when you call into helper.a from the .so, you will be executing code from the portion of the address space where the .so was loaded. Even if you're calling the same function inside helper.a, you're calling two different 'instances' of that function depending on whether the call was made from the .exe or the .so.
i think this question is the same as yours. How to force symbols from a static library to be included in a shared library build?
The --whole-archive linker option should do this. You'd use it as e.g.
gcc -o libmyshared.so foo.o -lanothersharedlib -Wl,--whole-archive -lmystaticlib
and it works for me.