Dilemma about shared libraries on Unix - c++

If I build a shared library (shared object) I can use it in two following ways:
First way is to use a shared library like I would use a static library.
#include "myLib.h"
//...
//afterwards I can use functions defined in mylib.h
myFunction();
The second way of using shared library is by calling dynamic loader API functions: dlopen, dlsym, and dlclose from dlfcn.h. I would use shared library in this way when I want to implement a plugin pattern, for example. Listing would look like this:
#include <dlfcn.h>
void *myLib; /* Handle to shared lib file */
void (*myFunction)(); /* Pointer to loaded function */
//...
//load shared object
myLib = dlopen("/home/dlTest/myLib.so",RTLD_LAZY);
dlerror();
//get handle to function
myFunction = dlsym( myLib, "myFunction");
dlerror();
//execute function
(*myFunction)();
//close lib
dlclose(myLib);
dlerror();
Now my first question is: what is the difference between these two usages of shared object in terms of loading time? By using shared library in first way, we're linking/loading shared library to main app in load-time and in the second way we're doing the same thing in run-time?
Second question. What is the name of these two usages?
First one is called statically linked shared library and the second one is dynamically linked/loaded shared library?
Third question If I've built a shared library without -fPIC flag (osition independent code), would I be able to use it in a second way?
Cheers

These two use modes are typically called implicit and explicit. As you correctly stated the difference in loading is that explicitly linked dynamic library is loaded when dlopen is executed and implicitly linked library is loaded at the time the application is loaded in memory. Each dlopen may take milliseconds to finish, unless library was already loaded, in which case it's very fast, so if you have very stringent latency requirements or need to do frequent loading/unloading then you may decide to make linking implicit or explicitly load the library on program start and don't unload it until it's no longer used.

The main difference is in the error handling. Implicit is easier, but if there's a problem (the library is missing or the function is not in the library) the program will fail to run at all. With explicit loading, you can check the dlopen/dlsym calls for errors and if there's a problem, fall back on some alternative.
To answer your third question, it actually depends on the architecture, but on most ABIs you can still load a shared object compiled without -PIC, but it may be slower to load and require more memory.

Related

difference between Load DLL and Direct Call

i think is very stupid, but I can't understand,
for example, I want use Windows API like GetWindowsDirectory, GetSystemInfo and etc... I can use Api directly or calling through GetProcAddress :
Method 1
here I can calling APIs with LoadLibrary and GetProcAddress :
#include <windows.h>
typedef UINT (WINAPI *GET_WIN_DIR)(LPWSTR lpBuffer, UINT size);
TCHAR infoBuffer[MAX_PATH + 1];
HINSTANSE dllLoad = LoadLibrary("Kernel32.dll");
GET_WIN_DIR function = (GET_WIN_DIR )GetProcAddress(dllLoad, "GetWindowsDirectoryW");
int result = function2(infoBuffer, MAX_PATH + 1);
Method 2
here I can calling directly APIs like GetWindowsDirectory :
#include <windows.h>
TCHAR infoBuffer[MAX_PATH + 1];
GetWindowsDirectory(infoBuffer, MAX_PATH);
I have 2 question :
What is the difference between the two methods above?
is it load Library impact on executable file?(.exe)(I did test, but it'snot changed)
Microsoft calls
Method 1 ... Explicit linking
Method 2 ... Implicit linking
From MSDN Linking an Executable to a DLL:
Implicit linking is sometimes referred to as static load or load-time dynamic linking. Explicit linking is sometimes referred to as dynamic load or run-time dynamic linking.
With implicit linking, the executable using the DLL links to an import library (.lib file) provided by the maker of the DLL. The operating system loads the DLL when the executable using it is loaded. The client executable calls the DLL's exported functions just as if the functions were contained within the executable.
With explicit linking, the executable using the DLL must make function calls to explicitly load and unload the DLL and to access the DLL's exported functions. The client executable must call the exported functions through a function pointer.
An executable can use the same DLL with either linking method. Furthermore, these mechanisms are not mutually exclusive, as one executable can implicitly link to a DLL and another can attach to it explicitly.
In our projects we use implicit linking in any common case.
We use the explicit linking exceptionally in two situations:
for plug-in DLLs which are loaded explicitly at run-time
in special cases where the implicit linked function is not the right one.
The 2nd case may happen if we use DLLs which themselves link to distinct versions of other DLLs (e.g. from Microsoft). This is, of course, a bit critical. Actually, we try to prevent the 2nd case.
No, I don't think it's stupid at all. If you don't understand, ask. That's what this site is for. Maybe you'll get downvoted, who knows, but not by me. Goes with the territory. No pain, no gain, ask me how I know.
Anyway, the main purpose of what #Scheff calls 'explicit linking' is twofold:
If you're not sure whether the the DLL you want to use to is going to be present on the machine at runtime (although you can also use /DELAYLOAD for this which is a lot more convenient).
If you're not sure if the function you want to call is present in (for example) all versions of Windows on which you want your application to run.
Regard point 1, an example of this might be reading or writing WMA files. Some older versions of Windows did not include WMA support by default (we're going back quite a long way here) and if you implicitly link to WMA.DLL then your application won't start up if it's not present. Using explicit linking (or /DELAYLOAD) lets you check for this at runtime and put up a polite message if it's missing while still allowing the rest of your app to function as normal.
As for point 2, you might, for example, want to make use of the LoadIconWithScaleDown() function because it generally produces a nicer scaled icon than LoadIcon(). However, if you just blindly call it then, again, your app wont run on XP because XP doesn't support it, so you would instead call it conditionally, via GetProcAddress(), if it's available and fall back to LoadIcon() if not.
Okay, so to round off, what's the deal with /DELAYLOAD? Well, this is a linker switch that lets you tell the linker which DLL's are optional for your app. Once you've done that, then you can do something like this:
if (LoadIconWithScaleDown)
LoadIconWithScaleDown (...);
else
LoadIcon (...);
And that is pretty neat.
So I hope you can now see that this question is really about the utility of explicit linking versus the inconvenience involved (all of which goes way anyway with /DELAYLOAD). What goes on under the covers is, for me, less interesting.
And yes, the end result, in terms of the way the program behaves, is the same. Explicit linking or delay loading might involve a small (read: tiny) performance overhead but I really wouldn't worry about that, and delay loading involves a few potential 'gotchas' (which won't affect most normal mortals) as detailed here.

Is there a way to uniquely identify a static library at runtime in C++?

With dynamic/shared libraries we can retrieve the DLL/so/dylib handle containing a function address at runtime (with GetModuleHandleEx or dladdr). This could be used to associate some code with the dynamic library which contains it.
The problem comes With static libraries because every link operation is deferred until the final executable link operation and every function gets its resolved final pointer at this time. We can't then associate any address or function pointer to the static lib.
Multiple inline function pointers get an unique pointer at link time (duplicates will be thrown away). So we cannot use one of them to identify the static library.
Multiple static will get a pointer for each translation unit, which is also not identifying the static library.
My only solution for now is to provide an UUID with -DUUID= different for each static library used in the project, but that would be cool to automatise this with an internal compile time feature/trick.
My question is : is there a trick to uniquely identify a static library at runtime on windows and unix platforms (linux/OSX/iOS/Android...) ?
(not tested yet, but for windows, the "__ImageBase" symbol trick could do the job even if i don't like it, but for other platforms i'm stuck).
To be more precise of what i try to achieve, i'm looking to map at runtime some global variable with the respective static library they belong too.
For example: in lib0.lib built from this lib0.cpp code:
namespace {
AClass global0;
}
and in lib1.lib built from this lib1.cpp code:
namespace {
AClass global1;
}
I want to be able at runtime to map in a generic way :
- &global0 with "lib0.lib"
- &global1 with "lib1.lib"
Hope my question is a bit more clear...

MSVCR and CRT Initialization

Out of curiosity, what exactly happens when an application compiled with the MSVCR is loaded, resp. how does the loader of Windows actually initialize the CRT? For what I have gathered so far, when the program as well as all the imported libraries are loaded into memory and all relocations are done, the CRT startup code (_CRT_INIT()?) initializes all global initializers in the .CRT$XC* sections and calls the user defined main() function. I hope this is correct so far.
But let's assume, for the sake of explanation, a program that is not using the MSVCR (e.g. an application built with Cygwin GCC or other compilers) tries to load a library at runtime, requiring the CRT, using a custom loader/runtime linker, so no LoadLibrary() involved. How would the loader/linker has to handle CRT initialization? Would it have to manually initialize all "objects" in said sections, does it have to do something else to make the internal wiring of the library work properly, or would it have to just call _CRT_INIT() (which unpractically is defined in the runtime itself and not exported anywhere as far as I am aware). Would this mix-up even work in any way, assuming that the non-CRT application and the CRT-library would not pass any objects, exceptions and things the like between them?
I would be very interested in finding out, because I can't quite make out what the CRT has an effect on the actual loading process...
Any information is very appreciated, thanks!
The entrypoint for an executable image is selected with the /ENTRY linker option. The defaults it uses are documented well in the MSDN Library article. They are the CRT entrypoint.
If you want to replace the CRT then either pick the same name or use the /ENTRY option explicitly when you link. You'll also need /NODEFAULTLIB to prevent it from linking the regular .lib
Each library compiled against the C++ runtime is calling _DllMainCRTStartup when it's loaded. _DllMainCRTStartup calls _CRT_INIT, which initializes the C/C++ run-time library and invokes C++ constructors on static, non-local variables.
The PE format contains an optional header that has a slot called 'addressofentrypoint', this slot calls a function that will call _DllMainCRTStartup which fires the initialization chain.
after _DllMainCRTStartup finishes the initialization phase it will call your own implemented DllMain() function.
When you learn about programming, someone will tell you that "the first thing that happens is that the code runs in main. But that's a bit like when you learn about atoms in School, they are fairly well organized and operate acorrding to strict rules. If you later go to a Nuclear/Particle Physics class at university, those simple/strict rules are much more detailed and don't always apply, etc.
When you link a C or C++ program the CRT contains some code something like this:
start()
{
CRT_init();
...
Global_Object_Constructors();
...
exit(main());
}
So the initialization is done by the C runtime library itself, BEFORE it calls your main.
A DLL has a DllMain that is executed by LoadLibrary() - this is responsible for initializing/creating global objects in the DLL, and if you don't use LoadLibrary() [e.g. loading the DLL into memory yourself] then you would have to ensure that objects are created and initialized.

How to compile app in C with module?

I want to do application, which can be compiled with external modules, for example like in php. In php you can load modules in runtime, or compile php with modules together, so modules are available without loading in runtime. But i don't understand how this can be done. If i have module in module.c and there is one function, called say_hello, how can i register it to main application, if you understand what i mean?
/* module.c */
#include <stdio.h>
// here register say_hello function, but how, if i can't in global scope
// call another function?
void say_hello()
{
printf("hello!");
}
If i compile all that files(main app + modules) together, there isn't some reference to say_hello function from main app, because it is called only if user call it in its code. So how can i say to my app, hey, there is say_hello function, if someone want to call it, you know it exists.
EDIT1: I need to have something like table at runtime, where i can see if user called function exists (have C equivavent). Header files doesn't help to me.
EDIT2: My app is interpret for my script langugage.
EDIT3: If someone call function in php, php interpret must know that function exists. I know about dynamic linking and if .so or .dll is loaded, then some start routine is called and you can simple register function in that dll, so php interpret can see, if some module registred for example function called "say_hello". But if i want compile php with for example gd support, then how gd functions are registred to some php function list, hashtable or whatever?
I guess what you are looking for is dynamic libraries (we call runtime loadable modules as dynamic/shared libraries in C and in the OS world, in general). Take, for example, Pidgin which supports plugins to extend it's functionalities. It gives a particular interface to it's plugin-makers to abide by, say functions to register, load, unload and use, which the plugins will have to follow.
When the program loads, it looks for such dynamic libraries in it's plugins directory, if present, it'll load and use it, else it'll skip exposing the functionality. The reason why an interface is needed is that since different modules can have different functionalities which are unknown uptil runtime, an app. has to have a common, agreed-upon way of "talking" to it's plugins/modules.
Every C program can be linked to a static or a dynamic library; static will copy the code from the library to the said program, there by leaving no dependencies for the program to run, while linking to a dynamic library expects the dynamic library to be present when the program is launched. A third way of doing it, is not to link to a DLL, but just asking the OS to perform a load operation of the library. If this succeeds, then the dynamic module is used, else ignored. The functionality which the dynamic library should perform is exposed to the user, only if the load call succeeds.
It is to be noted that this is a operating system provided feature and it has nothing to do with the language used (C or C++ or Python doesn't matter here); as far as C is concered, the compiler still links to known code i.e. code which is available # compile time. This is the reason for different operating system, one needs to write different code to load a dynamic module. Even more, the file type/format of syuch libraries vary from system to system. In Linux it's called shared objects (.so), in Mac it's called dynamic libraries (.dylib) and in Windows as Dynamic link libraries (.dll).
C is not interpreted language. So you need linking, you may want static linking or dynamic linking.
Program building consists of 2 major phases: compiling and linking. During compiling all c-files are translated into machine code, leaving called functions unresolved (obj or o files). Then linker merges all these files into one executable, resolving what was unresolved.
This is static linking. Linked module becomes integral part of executable.
Dynamic linking is platform specific. Under windows these are DLLs. You should issue a system call to load DLL after which you will be able to call functions from it.
What you need is dynamic library. Let's first take a look at the example provided in the Linux manpage of dlopen(3):
/* Load the math library, and print the cosine of 2.0: */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
int main(int argc, char **argv) {
void *handle;
double (*cosine)(double);
char *error;
handle = dlopen("libm.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
dlerror(); /* Clear any existing error */
/* Writing: cosine = (double (*)(double)) dlsym(handle, "cos");
would seem more natural, but the C99 standard leaves
casting from "void *" to a function pointer undefined.
The assignment used below is a workaround. */
*(void **) (&cosine) = dlsym(handle, "cos");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(EXIT_FAILURE);
}
printf("%f\n", (*cosine)(2.0));
dlclose(handle);
exit(EXIT_SUCCESS);
}
There's also a C++ dlopen mini HOWTO.
For more general information about dynamic loading, start from the wikipedia page first.
I think it is impossible, if i understand what you mean. Because it is compiled language.

Alternatives to dlsym() and dlopen() in C++

I have an application a part of which uses shared libraries. These libraries are linked at compile time.
At Runtime the loader expects the shared object to be in the LD_LIBRARY_PATH , if not found the entire application crashes with error "unable to load shared libraries".Note that there is no guarantee that client would be having the library, in that case I want the application to leave a suitable error message also the independent part should work correctly.
For this purpose I am using dlsym() and dlopen() to use the API in the shared library. The problem with this is if I have a lot of functions in the API, i have to access them Individually using dlsym() and ptrs which in my case are leading to memory corruption and code crashes.
Are there any alternatives for this?
The common solution to your problem is to declare a table of function pointers, to do a single dlsym() to find it, and then call all the other functions through a pointer to that table. Example (untested):
// libfoo.h
struct APIs {
void (*api1)(void);
void *(*api2)(int);
long (*api3)(int, void *);
};
// libfoo.cc
void fn1(void) { ... }
void *fn2(int) { ... }
long fn3(int, void *) { ... }
APIs api_table = { fn1, fn2, fn3 };
// client.cc
#include "libfoo.h"
...
void *foo_handle = dlopen("libfoo.so", RTLD_LAZY);
if (!foo_handle) {
return false; // library not present
}
APIs *table = dlsym(foo_handle, "api_table");
table->api1(); // calls fn1
void *p = table->api2(42); // calls fn2
long x = table->api3(1, p); // calls fn3
P.S. Accessing your API functions individually using dlsym and pointers does not in itself lead to memory corruption and crashes. Most likely you just have bugs.
EDIT:
You can use this exact same technique with a 3rd-party library. Create a libdrmaa_wrapper.so and put the api_table into it. Link the wrapper directly against libdrmaa.so.
In the main executable, dlopen("libdrmaa_wrapper.so", RTLD_NOW). This dlopen will succeed if (and only if) libdrmaa.so is present at runtime and provides all API functions you used in the api_table. If it does succeed, a single dlsym call will give you access to the entire API.
You can wrap your application with another one which first checks for all the required libraries, and if something is missing it errors out nicely, but if everything is allright it execs the real application.
Use below type of code
Class DynLib
{
/* All your functions */
void fun1() {};
void fun2() {};
.
.
.
}
DynLib* getDynLibPointer()
{
DynLib* x = new Dynlib;
return x;
}
use dlopen() for loading this library at runtime.
and use dlsym() and call getDynLibPointer() which returns DynLib object.
from this object you can access all your functions jst as obj.fun1().....
This is ofcource a C++ style of struct method proposed earlier.
You are probably looking for some form of delay library load on Linux. It's not available out-of-the-box but you can easily mimic it by creating a small static stub library that would try to dlopen needed library on first call to any of it's functions (emitting diagnostic message and terminating if dlopen failed) and then forwarding all calls to it.
Such stub libraries can be written by hand, generated by project/library-specific script or generated by universal tool Implib.so:
$ implib-gen.py libxyz.so
$ gcc myapp.c libxyz.tramp.S libxyz.init.c ...
Your problem is that the resolution of unresolved symbols is done very early on - on Linux I believe the data symbols are resolved at process startup, and the function symbols are done lazily. Therefore depending on what symbols you have unresolved, and on what sort of static initialization you have going on - you may not get a chance to get in with your code.
My suggestion would be to have a wrapper application that traps the return code/error string "unable to load shared libraries", and then converts this into something more meaningful. If this is generic, it will not need to be updated every time you add a new shared library.
Alternatively you could have your wrapper script run ldd and then parse the output, ldd will report all libraries that are not found for your particular application.