Pybind11 and shared libraries [duplicate]

Pybind11 and shared libraries [duplicate] - c++

Routine is not loaded until it is called. All routines are kept on disk in a re-locatable load format. The main program is loaded into memory & is executed. This is called Dynamic Linking.
Why this is called Dynamic Linking? Shouldn't it be Dynamic Loading because Routine is not loaded until it is called in dynamic loading where as in dynamic linking, Linking postponed until execution time.

This answer assumes that you know basic Linux command.
In Linux, there are two types of libraries: static or shared.
In order to call functions in a static library you need to statically link the library into your executable, resulting in a static binary.
While to call functions in a shared library, you have two options.
First option is dynamic linking, which is commonly used - when compiling your executable you must specify the shared library your program uses, otherwise it won't even compile. When your program starts it's the system's job to open these libraries, which can be listed using the ldd command.
The other option is dynamic loading - when your program runs, it's the program's job to open that library. Such programs are usually linked with libdl, which provides the ability to open a shared library.
Excerpt from Wikipedia:
Dynamic loading is a mechanism by which a computer program can, at run
time, load a library (or other binary) into memory, retrieve the
addresses of functions and variables contained in the library, execute
those functions or access those variables, and unload the library from
memory. It is one of the 3 mechanisms by which a computer program can
use some other software; the other two are static linking and dynamic
linking. Unlike static linking and dynamic linking, dynamic loading
allows a computer program to start up in the absence of these
libraries, to discover available libraries, and to potentially gain
additional functionality.
If you are still in confusion, first read this awesome article: Anatomy of Linux dynamic libraries and build the dynamic loading example to get a feel of it, then come back to this answer.
Here is my output of ldd ./dl:
linux-vdso.so.1 => (0x00007fffe6b94000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f400f1e0000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f400ee10000)
/lib64/ld-linux-x86-64.so.2 (0x00007f400f400000)
As you can see, dl is a dynamic executable that depends on libdl, which is dynamically linked by ld.so, the Linux dynamic linker when you run dl. Same is true for the other 3 libraries in the list.
libm doesn't show in this list, because it is used as a dynamically loaded library. It isn't loaded until ld is asked to load it.

Dynamic loading means loading the library (or any other binary for that matter) into the memory during load or run-time.
Dynamic loading can be imagined to be similar to plugins , that is an exe can actually execute before the dynamic loading happens(The dynamic loading for example can be created using LoadLibrary call in C or C++)
Dynamic linking refers to the linking that is done during load or run-time and not when the exe is created.
In case of dynamic linking the linker while creating the exe does minimal work.For the dynamic linker to work it actually has to load the libraries too.Hence it's also called linking loader.
Hence the sentences you refer may make sense but they are still quite ambiguous as we cannot infer the context in which it is referring in.Can you inform us where did you find these lines and at what context is the author talking about?

Dynamic loading refers to mapping (or less often copying) an executable or library into a process's memory after it has started. Dynamic linking refers to resolving symbols - associating their names with addresses or offsets - after compile time.
Here is the link to the full answer by Jeff Darcy at quora
http://www.quora.com/Systems-Programming/What-is-the-exact-difference-between-Dynamic-loading-and-dynamic-linking/answer/Jeff-Darcy

I am also reading the "dinosaur book" and was confused with the loading and linking concept. Here is my understanding:
Both dynamic loading and linking happen at runtime, and load whatever they need into memory.
The key difference is that dynamic loading checks if the routine was loaded by the loader while dynamic linking checks if the routine is in the memory.
Therefore, for dynamic linking, there is only one copy of the library code in the memory, which may be not true for dynamic loading. That's why dynamic linking needs OS support to check the memory of other processes. This feature is very important for language subroutine libraries, which are shared by many programs.

Dynamic linker is a run time program that loads and binds all of the dynamic dependencies of a program before starting to execute that program. Dynamic linker will find what dynamic libraries a program requires, what libraries those libraries require (and so on), then it will load all those libraries and make sure that all references to functions then correctly point to the right place. For example, even the most basic “hello world” program will usually require the C library to display the output and so the dynamic linker will load the C library before loading the hello world program and will make sure that any calls to printf() go to the right code.

Dynamic Loading: Load routine in main memory on call.
Dynamic Linking: Load routine in main memory during execution time,if call happens before execution time it is postponed till execution time.
Dynamic loading does not require special support from Operating system, it is the responsibility of the programmer to check whether the routine that is to be loaded does not exist in main memory.
Dynamic Linking requires special support from operating system, the routine loaded through dynamic linking can be shared across various processes.
Routine is not loaded until it is called. All routines are kept on disk in a re-locatable load format. The main program is loaded into memory & is executed. This is called Dynamic Linking.
The statement is incomplete."The main program is loaded into main memory & is executed." does not specify when the program is loaded.
If we consider that it is loaded on call as 1st statement specifies then its Dynamic Loading

We use dynamic loading to achieve better space utilization
With dynamic loading a program is not loaded until it is called.All routines are kept on a disk in a relocatable load format.The main program is loaded into memory and is executed.
When a routine needs to call another routine, the calling routine first checks to see whether has been loaded.If not , the relocatable linking loader is called to load the desired routine into memory and update program's address tables to reflect this change.Then control is passed to newly loaded routine
Advantages
An unused routine is never loaded .This is most useful when the program code
is large where infrequently occurring cases are needed to handle such as
error routines.In this case although the program code is large ,used code
will be small.
Dynamic loading doesn't need special support from O.S.It is the
responsibility of user to design their program to take advantage of
method.However, O.S can provide libraries to help the programmer

There are two types of Linking Static And Dynamic ,when output file is executed without any dependencies(files=Library) at run time this type of linking is called Static where as Dynamic is of Two types 1.Dynamic Loading Linking 2.Dynamic Runtime Linking.These are Described Below
Dynamic linking refers to linking while runtime where library files are brought to primary memory and linked ..(Irrespective of Function call these are linked).
Dynamic Runtime Linking refers to linking when required,that means whenever there is a function call happening at that time linking During runtime..Not all Functions are linked and this differs in Code writing .

Related

C++ Static Libraries

Aside from inclusion of 3rd party software, why would you make a static library for a project. If your writing the source yourself you could just build it as a part of the project and if it's a library to be used more than once wouldn't it make more sense to dynamically link and sit on a run-time library?

Dynamic libraries have a run-time cost due to relocations† because the base and relative load address of the library is unknown until run-time. That is, the function calls and variable access to dynamic libraries are indirect. For this reason the code for shared libraries must be compiled as position-independent code (-fPIC flag in gcc).
Whereas with static libraries it can use cheaper program counter relative access even with address-space randomization because the relative position of that static library (object files really) is available to the linker.
Note that calls to virtual functions are resolved through the vtable (which the dynamic linker can patch on load), so that the cost of calling a virtual function is always the same regardless of where that function resides. (IIRC, I may need to double-check this statement).
See How To Write Shared Libraries by Ulrich Drepper for full details.
Linking to shared libraries is easier though because they contain a list of other shared libraries they depend upon.
Whereas when linking against a static library one must also link explicitly the dependencies of that static library (because a .a is just a bunch of .o files).
A build system should do extra handling for static libraries so that the user does not have to list static library dependencies every time when linking it.
When linking against a static library the linker only pulls in those .o files from the .a that resolve any unresolved symbols, whereas an entire shared library is loaded at run-time. So that if you have a global object in a .o with constructor/destructor side-effects, those side effects will not happen with a static library unless that global object is linked in. Extra care must be taken to make sure that global object is always linked in.
When linking against a shared librarie residing in a non-standard location, along with -L<path> one must specify -Wl,-rpath=<path> as well for the run-time linker to find the shared library there and/or use -Wl,-rpath=$ORIGIN if the shared library is shipped with the executable. Having to set LD_LIBRARY_PATH is a wrong way.
† What is PLT/GOT?

The use of dynamic libraries has three main advantages: a) When you release an update of your app it can live in a DL, which is smaller for downloading from Internet than the whole app. b) If your app is a great RAM eater, then you can load and unload DL as needed. c) Its obvious purpose: share the same code in different apps, in a machine with low resources.
a) May lead to dll hell, where different files, same or different versions, populate the directory tree and mess what app uses what .dll
b) Is only possible if you reserve an excesive amount of stack RAM. Likely bad design.
c) This may be right for broad used libs, like stdio, drivers, and most of OS helpers.
The usage of static libraries avoids a) and b). The disadvantage are that they make the final executable bigger and that, when code changes, they require likely a full re-compilation of the project

"dynamically linked at run time but statically aware" - how to control which .so file loaded?

In reference to this answer:
There are two Linux C/C++ library types.
Static libraries (*.a) are archives of object code which are linked with and becomes part of the application. They are created
with and can be manipulated using the ar(1) command (i.e. ar
-t libfoo.a will list the files in the library/archive).
Dynamically linked shared object libraries (*.so) can be used in two ways.
The shared object libraries can be dynamically linked at run time but statically aware. The libraries must be available during
compile/link phase. The shared objects are not included into the
binary executable but are tied to the execution.
The shared object libraries can be dynamically loaded/unloaded and linked during execution using the dynamic linking loader system
functions.
what does it mean to make a dynamic lib tied to the execution?
Is this like Windows manifest files that allow the application to load in a specific dll?
What's the mechanism to control the .so loaded?
There must be such a mechanism otherwise the "compiled" .so is the only one ever allowed to be loaded which defeats the purpose of making it dynamic?

It means that the library is available at link time, so the linker can verify that the functions that you reference from the .so are present in the .so. The advantage is that calls to these functions are done transparently to you. In other words, if you are linking to an .so with
int foo(double bar);
you call it like this
int res = foo(4.2);
The linker makes sure that foo is there, and that it takes one argument of type double. After that it "links" the call site int res = ... to the function.
Dynamically loading/unloading during the execution time lets you link without .so present on the build system (hence, no "static awareness"). In exchange for this added flexibility you open your system up to a possibility of not finding the functions that you want in the target .so. Your calls sequence also look considerably trickier than foo(4.2), because you need to go through dlopen preparation step. More info on calling functions from .so is in this Q&A.

Difference between a shared object and a dll

I have a library which at compile time is building a shared object, called libEXAMPLE.so (in the so.le folder), and a dll by the name of EXAMPLE.so (in the dll folder). The two shared objects are quite similar in size and appear to be exactly the same thing. Scouring the internet revealed that there might be a difference in the way programs use the dll to do symbol resolution vs the way it is done with the shared object.
Can you guys please help me out in understanding this?

"DLL" is how windows like to name their dynamic library
"SO" is how linux like to name their dynamic library
Both have same purpose: to be loaded dynamically.
Windows uses PE binary format and linux uses ELF.
PE:
http://en.wikipedia.org/wiki/Portable_Executable
ELF:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format

I suppose a Linux OS.
In Linux, static libraries (.a, also called archives) are used for linking at compile time while shared objects (.so) are used both for linking at load time and at run time.
In your case, it seems that for some reason the library differentiate the files for linking at load time (libEXAMPLE.so) and linking at run time (EXAMPLE.so) even though those 2 files are exactly the same.

Difference between shared objects (.so), static libraries (.a), and DLL's (.so)?

I have been involved in some debate with respect to libraries in Linux, and would like to confirm some things.
It is to my understanding (please correct me if I am wrong and I will edit my post later), that there are two ways of using libraries when building an application:
Static libraries (.a files): At link time, a copy of the entire library is put into the final application so that the functions within the library are always available to the calling application
Shared objects (.so files): At link time, the object is just verified against its API via the corresponding header (.h) file. The library isn't actually used until runtime, where it is needed.
The obvious advantage of static libraries is that they allow the entire application to be self-contained, while the benefit of dynamic libraries is that the ".so" file can be replaced (ie: in case it needs to be updated due to a security bug) without requiring the base application to be recompiled.
I have heard some people make a distinction between shared objects and dynamic link libraries (DLL's), even though they are both ".so" files. Is there any distinction between shared objects and DLLs when it comes to C/C++ development on Linux or any other POSIX compliant OS (ie: MINIX, UNIX, QNX, etc)? I am told that one key difference (so far) is that shared objects are just used at runtime, while DLL's must be opened first using the dlopen() call within the application.
Finally, I have also heard some developers mention "shared archives", which, to my understanding, are also static libraries themselves, but are never used by an application directly. Instead, other static libraries will link against the "shared archives" to pull some (but not all) functions/resources from the shared archive into the static library being built.
Thank you all in advance for your assistance.
Update
In the context in which these terms were provided to me, it was effectively erroneous terms used by a team of Windows developers that had to learn Linux. I tried to correct them, but the (incorrect) language norms stuck.
Shared Object: A library that is automatically linked into a program when the program starts, and exists as a standalone file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.so). The library must be present at compile time, and when the application starts.
Static Library: A library that is merged into the actual program itself at build time for a single (larger) application containing the application code and the library code that is automatically linked into a program when the program is built, and the final binary containing both the main program and the library itself exists as a single standalone binary file. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylib for a library file named mylib.a). The library must be present at compile time.
DLL: Essentially the same as a shared object, but rather than being included in the linking list at compile time, the library is loaded via dlopen()/dlsym() commands so that the library does not need to be present at build time for the program to compile. Also, the library does not need to be present (necessarily) at application startup or compile time, as it is only needed at the moment the dlopen/dlsym calls are made.
Shared Archive: Essentially the same as a static library, but is compiled with the "export-shared" and "-fPIC" flags. The library is included in the linking list at compile time (ie: LDOPTS+=-lmylibS for a library file named mylibS.a). The distinction between the two is that this additional flag is required if a shared object or DLL wants to statically link the shared archive into its own code AND be able to make the functions in the shared object available to other programs, rather than just using them internal to the DLL. This is useful in the case when someone provides you with a static library, and you wish to repackage it as an SO. The library must be present at compile time.
Additional Update
The distinction between "DLL" and "shared library" was just a (lazy, inaccurate) colloquialism in the company I worked in at the time (Windows developers being forced to shift to Linux development, and the term stuck), adhering to the descriptions noted above.
Additionally, the trailing "S" literal after the library name, in the case of "shared archives" was just a convention used at that company, and not in the industry in general.

A static library(.a) is a library that can be linked directly into the final executable produced by the linker,it is contained in it and there is no need to have the library into the system where the executable will be deployed.
A shared library(.so) is a library that is linked but not embedded in the final executable, so will be loaded when the executable is launched and need to be present in the system where the executable is deployed.
A dynamic link library on windows(.dll) is like a shared library(.so) on linux but there are some differences between the two implementations that are related to the OS (Windows vs Linux) :
A DLL can define two kinds of functions: exported and internal. The exported functions are intended to be called by other modules, as well as from within the DLL where they are defined. Internal functions are typically intended to be called only from within the DLL where they are defined.
An SO library on Linux doesn't need special export statement to indicate exportable symbols, since all symbols are available to an interrogating process.

I've always thought that DLLs and shared objects are just different terms for the same thing - Windows calls them DLLs, while on UNIX systems they're shared objects, with the general term - dynamically linked library - covering both (even the function to open a .so on UNIX is called dlopen() after 'dynamic library').
They are indeed only linked at application startup, however your notion of verification against the header file is incorrect. The header file defines prototypes which are required in order to compile the code which uses the library, but at link time the linker looks inside the library itself to make sure the functions it needs are actually there. The linker has to find the function bodies somewhere at link time or it'll raise an error. It ALSO does that at runtime, because as you rightly point out the library itself might have changed since the program was compiled. This is why ABI stability is so important in platform libraries, as the ABI changing is what breaks existing programs compiled against older versions.
Static libraries are just bundles of object files straight out of the compiler, just like the ones that you are building yourself as part of your project's compilation, so they get pulled in and fed to the linker in exactly the same way, and unused bits are dropped in exactly the same way.

I can elaborate on the details of DLLs in Windows to help clarify those mysteries to my friends here in *NIX-land...
A DLL is like a Shared Object file. Both are images, ready to load into memory by the program loader of the respective OS. The images are accompanied by various bits of metadata to help linkers and loaders make the necessary associations and use the library of code.
Windows DLLs have an export table. The exports can be by name, or by table position (numeric). The latter method is considered "old school" and is much more fragile -- rebuilding the DLL and changing the position of a function in the table will end in disaster, whereas there is no real issue if linking of entry points is by name. So, forget that as an issue, but just be aware it's there if you work with "dinosaur" code such as 3rd-party vendor libs.
Windows DLLs are built by compiling and linking, just as you would for an EXE (executable application), but the DLL is meant to not stand alone, just like an SO is meant to be used by an application, either via dynamic loading, or by link-time binding (the reference to the SO is embedded in the application binary's metadata, and the OS program loader will auto-load the referenced SO's). DLLs can reference other DLLs, just as SOs can reference other SOs.
In Windows, DLLs will make available only specific entry points. These are called "exports". The developer can either use a special compiler keyword to make a symbol an externally-visible (to other linkers and the dynamic loader), or the exports can be listed in a module-definition file which is used at link time when the DLL itself is being created. The modern practice is to decorate the function definition with the keyword to export the symbol name. It is also possible to create header files with keywords which will declare that symbol as one to be imported from a DLL outside the current compilation unit. Look up the keywords __declspec(dllexport) and __declspec(dllimport) for more information.
One of the interesting features of DLLs is that they can declare a standard "upon load/unload" handler function. Whenever the DLL is loaded or unloaded, the DLL can perform some initialization or cleanup, as the case may be. This maps nicely into having a DLL as an object-oriented resource manager, such as a device driver or shared object interface.
When a developer wants to use an already-built DLL, she must either reference an "export library" (*.LIB) created by the DLL developer when she created the DLL, or she must explicitly load the DLL at run time and request the entry point address by name via the LoadLibrary() and GetProcAddress() mechanisms. Most of the time, linking against a LIB file (which simply contains the linker metadata for the DLL's exported entry points) is the way DLLs get used. Dynamic loading is reserved typically for implementing "polymorphism" or "runtime configurability" in program behaviors (accessing add-ons or later-defined functionality, aka "plugins").
The Windows way of doing things can cause some confusion at times; the system uses the .LIB extension to refer to both normal static libraries (archives, like POSIX *.a files) and to the "export stub" libraries needed to bind an application to a DLL at link time. So, one should always look to see if a *.LIB file has a same-named *.DLL file; if not, chances are good that *.LIB file is a static library archive, and not export binding metadata for a DLL.

You are correct in that static files are copied to the application at link-time, and that shared files are just verified at link time and loaded at runtime.
The dlopen call is not only for shared objects, if the application wishes to do so at runtime on its behalf, otherwise the shared objects are loaded automatically when the application starts. DLLS and .so are the same thing. the dlopen exists to add even more fine-grained dynamic loading abilities for processes. You dont have to use dlopen yourself to open/use the DLLs, that happens too at application startup.

I suspect some kind of misunderstanding here, but header files, at least of the .h variety used for compiling source code, are most definitely NOT checked during link time.
.h, and for that matter, .c/.cpp files, are only involved during the compilation phase, which includes preprocessing. Once the object code has been created the header file is long gone well before the linker gets around to dealing with things.

Link dll to static library and load it into an application linked against the same static library

I am creating an application that supports modules in the form of dlls that are loaded dynamically at runtime. The code is laid out in the following way:
core - static library
This has a mechanism to load shared libraries and call a "create" function that returns a new Module object (uses a shared header).
module shared library (linked against core static library)
This module uses the shared Module header and also uses other classes from the core library (hence why it is linked against the core library). It is built to include all symbols from static libraries.
test application executable (linked against core static library)
I am getting funky, and seemingly sporadic behavior. They always end up in access violations but it seems that member variables that I very explicitly set (integers) will print out in later functions as garbage (i have verified that they are not being deleted earlier). This only ever seems to happen if they dynamic library is loaded (even if I never call the create function).
My main question is, is there are danger here that the symbols in the shared library will conflict with the symbols in the executable (since they came from the same static library) and cause problems even though they are from the exact same static library?

I can't speak for Linux and OS X behavior, but on Windows, the following is exactly what is happening. Since you say you also want to compile on Windows, this is relevant.
The problem you are experiencing is that you actually have multiple versions of everything in the core. Each module and the application itself has its own copy of the core, and their variables are not shared. This includes the C runtime, so things like new/delete across module boundaries are fraught with peril.
To verify that this is what is happening, create a simple test: set a global in the core to a value in your test application, then from from your dynamically loaded code try to access that global and see what you get. I will wager that you will see that your store to the global will not be reflected!
Solutions:
1) Make core a shared dynamic library. This may or may not be an option for you.
2) Operate extremely carefully with the knowledge of the above; All CRT and/or your own core state will not be shared, so you must make sure things will be allocated/destroyed on their own side of the module boundaries, among other things.
My own application is designed almost identically to yours; ie a static library with shared code needed by both the application and the modules, and then dynamically loaded plugins loaded by the application core.
What I do for all shared core state that must be accessed across modules is that the first thing each module does after loading is have its "core pointer" set to an instantiation of the core libraries in the application. This ensures that all modules are working with the same data.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js