Run code before shared library is loaded in C++


I have a proprietary shared object library that runs some code before my application starts. It complains that an environment variable is not defined. Although I could define the variable in the system environment, I would like to define it at run time from within my application executable, before this library is loaded.
I've read that GCC can define _init functions that run before the library loads. However, I can't find out how to call these functions.
Is there some way to run code in the executable before the loader calls the init sections of the libraries?

Related

Understanding lua extension dll building/loading in statically linked embedded lua environment

I have a relatively complex lua environment and I'm trying to understand how the following would/could work. The starting setup includes the following two modules:
Main application (no lua environment)
DLL (statically linked to lua lib, including interpreter)
The dll is loaded into the main application and runs a lua console interpreter and a lua API accessible from the console.
Now, let's say I want to expand this setup to include another dll that would extend that lua API, luasql for example. The new dll needs to link against lua in order to build, and my understanding is that I cannot link against lua statically since there would now be two unshared copies of the lua code in process when I load the extension dll. However, even if I built the lua core lib as a dll and linked against it with the extension dll, that lua core dll would not be loaded at runtime by the main application or the primary dll. So my questions are:
What happens if I load that extension dll from the lua interpreter in the primary dll, considering that the lua core dll will not be loaded?
If I loaded the lua core dll at runtime, how would that conflict with the statically linked lua lib?
Would both scenarios (linking statically in extension dll and dynamically linking/loading the lua dll) result in having two copies of the lua core code in process?
In that event, what would happen if I tried to call an API function from the primary dll's lua environment/interpreter that was built/loaded in the extension dll?
OR does lua have some kind of special mechanism for loading native dlls that provide new C API functions that allows it to bypass normal linking rules?
Hopefully I have provided enough details to make the questions specific, if not I will be happy to refine the scenario/questions further.
Edit: I have looked at Bundling additional Lua libraries for embedded and statically linked Lua runtime and I believe it may be helpful in providing a solution ultimately but I'd like to understand it at the linker level.

You can't have a situation in which one interpreter is already loaded (say, statically linked) and you then load a module X linked against a dll containing a Lua interpreter, which brings in a second copy of the interpreter. This is likely to crash the application. You need to make the loaded dll use the interpreter that is already loaded, either by linking against the dll that contains the interpreter or by using a proxy dll (see below).
You have two main options: (1) make dllA that is loaded by the main application that in turn depends on Lua dll; you can then link all other lua modules against Lua dll without any issues; or (2) include Lua dll into dllA, but keep Lua methods exposed so that lua modules can be linked against that dllA.
I think the first option is much simpler and likely not to require any changes to the Lua modules (as long as you can keep the name of the Lua dll the same as the one that the modules are compiled against).
Another option I should mention is that you can still use Lua modules compiled against a Lua DLL even with applications that have the Lua interpreter statically compiled. You need to use a proxy DLL; see this maillist thread for the solution and related discussion.

The answer boils down to this:
Don't try to load any Lua extensions from a dll linked against a different Lua-core. Doing so will cause utter chaos.
As long as any Lua extension loaded resolves all its dependencies to the proper Lua core, it does not matter (aside from bloat) how many Lua cores you use.
Keep in mind that Windows always resolves symbols according to their name and the dll that provides them.

How to replace the usage of LD_PRELOAD with dlopen()?

I'm working in C++ with shared libraries.
Currently I'm using "LD_PRELOAD", and I set this environment variable with a setenv() call.
But I want to use the dlopen() API to load the shared library instead. It should behave the same as setting the environment variable (i.e. LD_PRELOAD) using setenv().
Can I use dlopen() to meet these requirements? Or is there a difference between loading a library with LD_PRELOAD and loading it with dlopen()?

I'm not 100% sure about this, but as I understand it, using LD_PRELOAD makes the program loader load the library specified by LD_PRELOAD before all other libraries, so its symbols take precedence. This makes it possible to override system libraries with your own.
Using dlopen loads the shared object after your program is loaded, so it cannot be used to override system objects.
If the environment variable has to be set for the program to work correctly, then it has to be set before the program is loaded, either in the shell or by your LD_PRELOAD file. If the program doesn't need the environment variable immediately, then you can set it either in the program or in the "on-load" function of the shared object loaded by dlopen.
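The "set it in the program" route can be sketched roughly as follows. This is a minimal sketch, assuming a POSIX system and assuming the library is not linked at build time (otherwise the loader initializes it before main runs). The helper name load_with_env is hypothetical.

```cpp
#include <stdlib.h>   // setenv (POSIX)
#include <dlfcn.h>    // dlopen, dlerror
#include <iostream>

// Hypothetical helper: define NAME=VALUE in this process, then load the
// library at run time so its initialization code sees the variable.
// Passing nullptr as path asks dlopen for a handle to the main program.
void* load_with_env(const char* name, const char* value, const char* path) {
    if (setenv(name, value, /*overwrite=*/1) != 0) {
        std::cerr << "setenv failed\n";
        return nullptr;
    }
    void* handle = dlopen(path, RTLD_NOW);
    if (handle == nullptr) {
        std::cerr << "dlopen failed: " << dlerror() << '\n';
    }
    return handle;
}
```

The caller would then look up the entry points it needs with dlsym on the returned handle. On older glibc versions the program may need to be linked with -ldl.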

Two projects linking same SQLite library statically causes problems

I have a weird problem.
I am working on a shared library written in C and a GUI application written in C++. The GUI application uses the shared library. The shared library uses the SQLite amalgamation and links it statically. The GUI also uses SQLite for some configuration purposes, and it is also statically linked. Both of them use the latest SQLite version.
My shared library uses FTS4. I have enabled FTS4 by providing the compile-time options when compiling the shared library. All works well with the shared library, and all the tests in the shared library codebase pass.
The problem appears when I start using the library in the GUI program. I get an error like "Unknown module FTS4". This is weird because SQLite is linked statically into my shared library, and all the GUI program does is dynamically link to my library. When I add the FTS compilation options to the GUI program, the error goes away and all works well.
In short,
libfoo.so - Statically links SQLite with FTS4 options turned on
foo - Statically links SQLite without any special compile-time options. Dynamically links to libfoo.
I am not sure why this is happening. Any help would be great!

It sounds like all the sqlite functions in the shared library are being exported. As a result, when you load the shared object, all of these functions get resolved to the main application, which also defines identical copies of the symbol names, but with different functionality.
You may have better luck compiling your shared object with a map file looking something like:
{
global:
*;
local:
sqlite3*;
};
Put it into a file called foo.map and, when linking libfoo.so (assuming gcc), use:
gcc -Wl,--version-script=foo.map -o libfoo.so <dependent files>
This should hopefully cause the use of the internal symbols within the .so rather than the ones defined in the main application.

Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?

I have been involved in some debate with respect to libraries in Linux, and would like to confirm some things.
It is to my understanding (please correct me if I am wrong and I will edit my post later), that there are two ways of using libraries when building an application:
Static libraries (.a files): At link time, a copy of the entire library is put into the final application so that the functions within the library are always available to the calling application
Shared objects (.so files): At link time, the object is just verified against its API via the corresponding header (.h) file. The library isn't actually used until runtime, where it is needed.
The obvious advantage of static libraries is that they allow the entire application to be self-contained, while the benefit of dynamic libraries is that the ".so" file can be replaced (ie: in case it needs to be updated due to a security bug) without requiring the base application to be recompiled.
I have heard some people make a distinction between shared objects and dynamic link libraries (DLL's), even though they are both ".so" files. Is there any distinction between shared objects and DLLs when it comes to C/C++ development on Linux or any other POSIX compliant OS (ie: MINIX, UNIX, QNX, etc)? I am told that one key difference (so far) is that shared objects are just used at runtime, while DLL's must be opened first using the dlopen() call within the application.
Finally, I have also heard some developers mention "shared archives", which, to my understanding, are also static libraries themselves, but are never used by an application directly. Instead, other static libraries will link against the "shared archives" to pull some (but not all) functions/resources from the shared archive into the static library being built.
Thank you all in advance for your assistance.
Update
In the context in which these terms were provided to me, they were effectively erroneous terms used by a team of Windows developers who had to learn Linux. I tried to correct them, but the (incorrect) language norms stuck.
Shared Object: A library that is automatically linked into a program when the program starts, and exists as a standalone file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.so). The library must be present at compile time, and when the application starts.
Static Library: A library that is merged into the program itself at build time, so that the final binary, containing both the main program and the library code, exists as a single standalone (larger) file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.a). The library must be present at compile time.
DLL: Essentially the same as a shared object, but rather than being included in the linking list at compile time, the library is loaded via dlopen()/dlsym() commands so that the library does not need to be present at build time for the program to compile. Also, the library does not need to be present (necessarily) at application startup or compile time, as it is only needed at the moment the dlopen/dlsym calls are made.
Shared Archive: Essentially the same as a static library, but is compiled with the "export-shared" and "-fPIC" flags. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylibS for a library file named mylibS.a). The distinction between the two is that this additional flag is required if a shared object or DLL wants to statically link the shared archive into its own code AND be able to make the functions in the shared object available to other programs, rather than just using them internal to the DLL. This is useful in the case when someone provides you with a static library, and you wish to repackage it as an SO. The library must be present at compile time.
Additional Update
The distinction between "DLL" and "shared library" was just a (lazy, inaccurate) colloquialism in the company I worked in at the time (Windows developers being forced to shift to Linux development, and the term stuck), adhering to the descriptions noted above.
Additionally, the trailing "S" literal after the library name, in the case of "shared archives" was just a convention used at that company, and not in the industry in general.

A static library (.a) is a library that can be linked directly into the final executable produced by the linker. It is contained in the executable, so the library does not need to be present on the system where the executable is deployed.
A shared library (.so) is a library that is linked but not embedded in the final executable, so it is loaded when the executable is launched and needs to be present on the system where the executable is deployed.
A dynamic link library on Windows (.dll) is like a shared library (.so) on Linux, but there are some differences between the two implementations that are related to the OS (Windows vs Linux):
A DLL can define two kinds of functions: exported and internal. The exported functions are intended to be called by other modules, as well as from within the DLL where they are defined. Internal functions are typically intended to be called only from within the DLL where they are defined.
An SO library on Linux doesn't need special export statement to indicate exportable symbols, since all symbols are available to an interrogating process.

I've always thought that DLLs and shared objects are just different terms for the same thing - Windows calls them DLLs, while on UNIX systems they're shared objects, with the general term - dynamically linked library - covering both (even the function to open a .so on UNIX is called dlopen() after 'dynamic library').
They are indeed only linked at application startup. However, your notion of verification against the header file is incorrect. The header file defines prototypes, which are required in order to compile the code that uses the library, but at link time the linker looks inside the library itself to make sure the functions it needs are actually there. The linker has to find the function bodies somewhere at link time or it'll raise an error. It ALSO does that at runtime, because, as you rightly point out, the library itself might have changed since the program was compiled. This is why ABI stability is so important in platform libraries, as the ABI changing is what breaks existing programs compiled against older versions.
Static libraries are just bundles of object files straight out of the compiler, just like the ones that you are building yourself as part of your project's compilation, so they get pulled in and fed to the linker in exactly the same way, and unused bits are dropped in exactly the same way.

I can elaborate on the details of DLLs in Windows to help clarify those mysteries to my friends here in *NIX-land...
A DLL is like a Shared Object file. Both are images, ready to load into memory by the program loader of the respective OS. The images are accompanied by various bits of metadata to help linkers and loaders make the necessary associations and use the library of code.
Windows DLLs have an export table. The exports can be by name, or by table position (numeric). The latter method is considered "old school" and is much more fragile -- rebuilding the DLL and changing the position of a function in the table will end in disaster, whereas there is no real issue if linking of entry points is by name. So, forget that as an issue, but just be aware it's there if you work with "dinosaur" code such as 3rd-party vendor libs.
Windows DLLs are built by compiling and linking, just as you would for an EXE (executable application), but the DLL is meant to not stand alone, just like an SO is meant to be used by an application, either via dynamic loading, or by link-time binding (the reference to the SO is embedded in the application binary's metadata, and the OS program loader will auto-load the referenced SO's). DLLs can reference other DLLs, just as SOs can reference other SOs.
In Windows, DLLs will make available only specific entry points. These are called "exports". The developer can either use a special compiler keyword to make a symbol externally visible (to other linkers and the dynamic loader), or the exports can be listed in a module-definition file which is used at link time when the DLL itself is being created. The modern practice is to decorate the function definition with the keyword to export the symbol name. It is also possible to create header files with keywords which declare a symbol as one to be imported from a DLL outside the current compilation unit. Look up the keywords __declspec(dllexport) and __declspec(dllimport) for more information.
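A common way to arrange those keywords is a single macro that switches between export and import. The following is a minimal sketch, not a definitive layout: MYLIB_API, MYLIB_BUILD and mylib_add are hypothetical names, and the non-Windows branch assumes GCC or Clang.

```cpp
// MYLIB_BUILD is a hypothetical macro that only the DLL's own build defines.
// It is defined here so that this single file compiles as "the library".
#define MYLIB_BUILD

#if defined(_WIN32)
#  if defined(MYLIB_BUILD)
#    define MYLIB_API __declspec(dllexport)   // building the DLL: export
#  else
#    define MYLIB_API __declspec(dllimport)   // using the DLL: import
#  endif
#else
   // GCC/Clang on ELF platforms: mark the symbol as visible by default.
#  define MYLIB_API __attribute__((visibility("default")))
#endif

// What the shared header would declare.
extern "C" MYLIB_API int mylib_add(int a, int b);

// What the library's source file would define.
extern "C" MYLIB_API int mylib_add(int a, int b) {
    return a + b;
}
```

Client code would include the same header without defining MYLIB_BUILD, so on Windows the declaration becomes an import.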
One of the interesting features of DLLs is that they can declare a standard "upon load/unload" handler function. Whenever the DLL is loaded or unloaded, the DLL can perform some initialization or cleanup, as the case may be. This maps nicely into having a DLL as an object-oriented resource manager, such as a device driver or shared object interface.
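On the shared-object side, GCC and Clang offer a comparable hook through function attributes. The sketch below assumes one of those compilers, and the function names are hypothetical. Note that the loader runs a library's constructors before those of the executable that depends on it, so this does not let an executable run code ahead of a library it is linked against.

```cpp
#include <cstdio>

// Set by the load-time hook below; constant-initialized to false first.
static bool g_initialized = false;

// Runs when the object containing it is loaded (before main, if the
// object is the executable itself or a library linked at build time).
__attribute__((constructor))
static void on_load() {
    g_initialized = true;
}

// Runs when the object is unloaded or the process exits normally.
__attribute__((destructor))
static void on_unload() {
    std::fputs("cleanup on unload\n", stderr);
}

bool library_is_initialized() {
    return g_initialized;
}
```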
When a developer wants to use an already-built DLL, she must either reference an "export library" (*.LIB) created by the DLL developer when she created the DLL, or she must explicitly load the DLL at run time and request the entry point address by name via the LoadLibrary() and GetProcAddress() mechanisms. Most of the time, linking against a LIB file (which simply contains the linker metadata for the DLL's exported entry points) is the way DLLs get used. Dynamic loading is reserved typically for implementing "polymorphism" or "runtime configurability" in program behaviors (accessing add-ons or later-defined functionality, aka "plugins").
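For comparison, the POSIX counterparts of LoadLibrary() and GetProcAddress() are dlopen() and dlsym(). The following is a minimal sketch, assuming a dynamically linked program on a POSIX system. The helper name call_abs_by_name is hypothetical, and it looks up the C library's abs only because that symbol is normally already loaded.

```cpp
#include <dlfcn.h>   // dlopen, dlsym, dlclose

// Signature of the entry point we expect to find by name.
using abs_fn = int (*)(int);

// Resolve "abs" by name at run time and call it.
// Returns -1 if the lookup fails (abs itself never returns a negative
// value for ordinary inputs).
int call_abs_by_name(int value) {
    // nullptr yields a handle for the main program; dlsym on it searches
    // the program and the shared objects loaded at startup.
    void* handle = dlopen(nullptr, RTLD_NOW);
    if (handle == nullptr) {
        return -1;
    }
    void* symbol = dlsym(handle, "abs");
    if (symbol == nullptr) {
        dlclose(handle);
        return -1;
    }
    abs_fn fn = reinterpret_cast<abs_fn>(symbol);
    int result = fn(value);
    dlclose(handle);
    return result;
}
```

A plugin loader would pass the plugin's path to dlopen instead of nullptr and look up whatever entry point name the plugin interface agrees on.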
The Windows way of doing things can cause some confusion at times; the system uses the .LIB extension to refer to both normal static libraries (archives, like POSIX *.a files) and to the "export stub" libraries needed to bind an application to a DLL at link time. So, one should always look to see if a *.LIB file has a same-named *.DLL file; if not, chances are good that *.LIB file is a static library archive, and not export binding metadata for a DLL.

You are correct in that static libraries are copied into the application at link time, and that shared libraries are just verified at link time and loaded at runtime.
The dlopen call is not required in order to use shared objects. It is there for applications that wish to load a library themselves at runtime; otherwise the shared objects are loaded automatically when the application starts. DLLs and .so files are the same thing. dlopen exists to add even more fine-grained dynamic loading abilities for processes. You don't have to use dlopen yourself to open or use them; that loading also happens at application startup.

I suspect some kind of misunderstanding here, but header files, at least of the .h variety used for compiling source code, are most definitely NOT checked during link time.
.h, and for that matter, .c/.cpp files, are only involved during the compilation phase, which includes preprocessing. Once the object code has been created the header file is long gone well before the linker gets around to dealing with things.
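The split can be illustrated in a single file. This is only a sketch with hypothetical names: the first declaration stands in for what a header would provide, and the definition at the bottom stands in for what the library provides at link time.

```cpp
// What a header contributes: a declaration. This is all the compiler
// needs in order to compile the call below.
int triple(int x);

// Code that uses the function. It compiles with only the declaration
// in scope.
int call_triple(int v) {
    return triple(v);
}

// What the linker needs: the definition. In a real project this body
// would live in the library (.a or .so), not in the caller's source.
// If it were missing, compilation would still succeed and the link
// step would fail with an undefined reference.
int triple(int x) {
    return 3 * x;
}
```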

Link dll to static library and load it into an application linked against the same static library

I am creating an application that supports modules in the form of dlls that are loaded dynamically at runtime. The code is laid out in the following way:
core - static library
This has a mechanism to load shared libraries and call a "create" function that returns a new Module object (uses a shared header).
module shared library (linked against core static library)
This module uses the shared Module header and also uses other classes from the core library (hence why it is linked against the core library). It is built to include all symbols from static libraries.
test application executable (linked against core static library)
I am getting funky and seemingly sporadic behavior. It always ends in access violations, and member variables (integers) that I very explicitly set print out as garbage in later functions (I have verified that they are not being deleted earlier). This only ever seems to happen if the dynamic library is loaded (even if I never call the create function).
My main question is: is there a danger here that the symbols in the shared library will conflict with the symbols in the executable and cause problems, even though they come from the exact same static library?

I can't speak for Linux and OS X behavior, but on Windows, the following is exactly what is happening. Since you say you also want to compile on Windows, this is relevant.
The problem you are experiencing is that you actually have multiple versions of everything in the core. Each module and the application itself has its own copy of the core, and their variables are not shared. This includes the C runtime, so things like new/delete across module boundaries are fraught with peril.
To verify that this is what is happening, create a simple test: set a global in the core to a value in your test application, then from your dynamically loaded code try to access that global and see what you get. I will wager that you will see that your store to the global is not reflected!
Solutions:
1) Make core a shared dynamic library. This may or may not be an option for you.
2) Operate extremely carefully with the knowledge of the above; All CRT and/or your own core state will not be shared, so you must make sure things will be allocated/destroyed on their own side of the module boundaries, among other things.
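One way to keep allocation and destruction on the same side of the boundary is to have the module export a matching create/destroy pair. The following is a minimal sketch with hypothetical names (Widget, module_create_widget, module_destroy_widget), written as one file for brevity.

```cpp
#include <memory>

// A type shared between application and module via a common header.
struct Widget {
    int value;
};

// Functions the (hypothetical) module would export. Both new and delete
// run inside the module, so they use the module's own runtime and heap.
extern "C" Widget* module_create_widget(int value) {
    return new Widget{value};
}

extern "C" void module_destroy_widget(Widget* widget) {
    delete widget;
}

// Application side: a deleter that hands the object back to the module
// instead of calling delete in the application.
struct WidgetDeleter {
    void operator()(Widget* widget) const {
        module_destroy_widget(widget);
    }
};

using WidgetPtr = std::unique_ptr<Widget, WidgetDeleter>;

WidgetPtr make_widget(int value) {
    return WidgetPtr(module_create_widget(value));
}
```

With this arrangement the application never frees memory that the module allocated, which is the hazard described above.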
My own application is designed almost identically to yours; ie a static library with shared code needed by both the application and the modules, and then dynamically loaded plugins loaded by the application core.
What I do for all shared core state that must be accessed across modules is that the first thing each module does after loading is have its "core pointer" set to an instantiation of the core libraries in the application. This ensures that all modules are working with the same data.
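The "core pointer" idea might look roughly like this. It is a minimal sketch under assumed names (Core, plugin::set_core, plugin::bump), not the actual code of that application, and it is collapsed into one file for brevity.

```cpp
// Shared state that must be the same object for every module.
struct Core {
    int counter = 0;
};

// Module side: the module keeps a pointer to the application's core
// rather than relying on its own statically linked copy of the state.
namespace plugin {
    static Core* g_core = nullptr;

    // Called by the application right after the module is loaded.
    void set_core(Core* core) {
        g_core = core;
    }

    // Module code always goes through the pointer.
    void bump() {
        if (g_core != nullptr) {
            ++g_core->counter;
        }
    }
}

// Application side: create the one real core and hand it to the module.
int run_demo() {
    Core app_core;
    plugin::set_core(&app_core);
    plugin::bump();
    plugin::bump();
    int seen_by_app = app_core.counter;  // the module's writes are visible here
    plugin::set_core(nullptr);           // do not leave a dangling pointer
    return seen_by_app;
}
```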
