Load-time dynamic link library dispatching

Load-time dynamic link library dispatching - c++

I'd like my Windows application to be able to reference an extensive set of classes and functions wrapped inside a DLL, but I need to be able to guide the application into choosing the correct version of this DLL before it's loaded. I'm familiar with using dllexport / dllimport and generating import libraries to accomplish load-time dynamic linking, but I cannot seem to find any information on the interwebs with regard to possibly finding some kind of entry point function into the import library itself, so I can, specifically, use CPUID to detect the host CPU configuration, and make a decision to load a paricular DLL based on that information. Even more specifically, I'd like to build 2 versions of a DLL, one that is built with /ARCH:AVX and takes full advantage of SSE - AVX instructions, and another that assumes nothing is available newer than SSE2.
One requirement: Either the DLL must be linked at load-time, or there needs to be a super easy way of manually binding the functions referenced from outside the DLL, and there are many, mostly wrapped inside classes.
Bonus question: Since my libraries will be cross-platform, is there an equivalent for Linux based shared objects?

I recommend that you avoid dynamic resolution of your DLL from your executable if at all possible, since it is just going to make your life hard, especially since you have a lot of exposed interfaces and they are not pure C.
Possible Workaround
Create a "chooser" process that presents the necessary UI for deciding which DLL you need, or maybe it can even determine it automatically. Let that process move whatever DLL has been decided on into the standard location (and name) that your main executable is expecting. Then have the chooser process launch your main executable; it will pick up its DLL from your standard location without having to know which version of the DLL is there. No delay loading, no wonkiness, no extra coding; very easy.
If this just isn't an option for you, then here are your starting points for delay loading DLLs. Its a much rockier road.
Windows
LoadLibrary() to get the DLL in memory: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
GetProcAddress() to get pointer to a function: https://msdn.microsoft.com/en-us/library/windows/desktop/ms683212(v=vs.85).aspx
OR possibly special delay-loaded DLL functionality using a custom helper function, although there are limitations and potential behavior changes.. never tried this myself: https://msdn.microsoft.com/en-us/library/151kt790.aspx (suggested by Igor Tandetnik and seems reasonable).
Linux
dlopen() to get the SO in memory: http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html
dladdr() to get pointer to a function: http://man7.org/linux/man-pages/man3/dladdr.3.html

To add to qexyn's answer, one can mimic delay loading on Linux by generating a small static stub library which would dlopen on first call to any of it's functions and then forward actual execution to shared library. Generation of such stub library can be automatically generated by custom project-specific script or Implib.so:
# Generate stub
$ implib-gen.py libxyz.so
# Link it instead of -lxyz
$ gcc myapp.c libxyz.tramp.S libxyz.init.c

Related

Do I need to distribute a header file and a lib file with a DLL?

I'm updating a DLL for a customer and, due to corporate politics - among other things - my company has decided to no longer share source code with the customer.
Previously. I assume they had all the source and imported it as a VC++6 project. Now they will have to link to the pre-compiled DLL. I would imagine, at minimum, that I'll need to distribute the *.lib file with the DLL so that DLL entry-points can be defined. However, do I also need to distribute the header file?
If I can get away with not distributing it, how would the customer go about importing the DLL into their code?

Yes, you will need to distribute the header along with your .lib and .dll
Why ?
At least two reasons:
because C++ needs to know the return type and arguments of the functions in the library (roughly said, most compilers use name mangling, to map the C++ function signature to the library entry point).
because if your library uses classes, the C++ compiler needs to know their layout to generate code in you the library client (e.g. how many bytes to put on the stack for parameter passing).
Additional note: If you're asking this question because you want to hide implementation details from the headers, you could consider the pimpl idiom. But this would require some refactoring of your code and could also have some consequences in terms of performance, so consider it carefully

However, do I also need to distribute the header file?
Yes. Otherwise, your customers will have to manually declare the functions themselves before they can use it. As you can imagine, that will be very error prone and a debugging nightmare.

In addition to what others explained about header/LIB file, here is different perspective.
The customer will anyway be able to reverse-engineer the DLL, with basic tools such as Dependency Walker to find out what system DLLs your DLL is using, what functions being used by your DLL (for example some function from AdvApi32.DLL).
If you want your DLL to be obscured, your DLL must:
Load all custom DLLs dynamically (and if not possible, do the next thing anyhow)
Call GetProcAddress on all functions you want to call (GetProcessToken from ADVAPI32.DLL for example
This way, at least dependency walker (without tracing) won't be able to find what functions (or DLLs) are being used. You can load the functions of system DLL by ordinal, and not by name so it becomes more difficult to reverse-engineer by text search in DLL.
Debuggers will still be able to debug your DLL (among other tools) and reverse engineer it. You need to find techniques to prevent debugging the DLL. For example, the very basic API is IsDebuggerPresent. Other advanced approaches are available.
Why did I say all this? Well, if you intend to not to deliver header/DLL, the customer will still be able to find exported functions and would use it. You, as a DLL provider, must also provide programming elements with it. If you must hide, then hide it totally.

One alternative you could use is to pass only the DLL and make the customer to load it dynamically using LoadLibrary() + GetProcAddress(). Although you still need to let your customer know the signature of the functions in the DLL.
More detailed sample here:
Dynamically load a function from a DLL

dlopen and implicit library loading: two copies of the same library

I have 3 things: open source application (let's call it APP),
closed source shared library (let's call it OPENGL)
and open source plugin for OPENGL (let's call it PLUGIN)[also shared library].
OS: Linux.
There is need to share data between APP and PLUGIN,
so APP linking with PLUGIN, and when I run it,
system load it automatically.
After that APP call eglInitialize that belongs to OPENGL,
and after that this function load PLUGIN again.
And after that I have two copies of PLUGIN in the APP memory.
I know that because of PLUGIN have global data, and after debugging
I saw that there are two copies of global data.
So question how I can fix this behaviour?
I want one instance of PLUGIN, that used by APP and OPENGL.
And I can not change OPENGL library.

It obviously depends a lot on exactly what the libraries are doing, but in general some solution should be possible.
First note that normally if a shared library with the same name is loaded multiple times it will continue to use the same library. This of coruse primarily applies to loading via the standard loading/linking mechanism. If the library calls dlopen on its own it still can get the same library but it depends on the flags to dlopen. Try reading the docs on dlopen to get an understanding of how it works and how you can manipulate it.
You can also try positioning the PLUGIN earlier in the linker command so that it gets loaded first and thus might avoid a double load later one. If you must load the PLUGIN dynamically this obviously won't help. You can also check if LD_PRELOAD might resolve the linking order.
As a last resort you may have to resort to using LD_LIBARY_PATH and putting an interface library in from of the real one. This one will simply pass calls to the real one but will intercept duplicate loads and shunt them to the previous load.
This is just a general direction to consider. Your actual answer will depend highly on your code and what the other shared libraries do. Always investigate linker load ordering first, as it is the easiest to check, and then dlopen flags, before going into the other options.

I suspect that OPENGL is loading PLUGIN with the RTLD_LOCAL flag. This
is normally what you want when loading a plugin, so that multiple
plugins don't conflict.
We've had similar problems with loading code under Java: we'd load a
dozen or so different modules, and they couldn't communicate with one
another. It's possible that our solution would work for you: we wrote a
wrapper for the plugin, and told Java that the wrapper was the plugin.
That plugin then loaded each of the other shared objects, using dlopen
with RTLD_GLOBAL. This worked between plugins. I'm not sure that it
will allow the plugins to get back to the main, however (but I think it
should). And IIRC, you'll need special options when linking main for
its symbols to be available. I think Linux treats the symbols in main
as if main had been loaded with RTLD_LOCAL otherwise. (Maybe
--export-dynamic? It's been a while since I've had to do this, and I
can't remember exactly.)

Catching calls from a program in runtime and mapping them to other calls

A program usually depends on several libraries and might sometimes depend on other programs as well. I look at projects like Wine and think how do they figure out what calls a program is making?
In a Linux environment, what are the approaches used to know what calls an executable is making in runtime in order to catch and map them to other calls?
Any code snippets or references to resources for extra reading is greatly appreciated :)

On Linux you're looking for the LD_PRELOAD environment variable. This will load your libraries before any requested by the program. If you provide a function definition that matches one loaded by the target program then your version will be called instead.
You can't really detect what functions a program is calling however. You can however get all the functions in a shared library and implement all of those. You aren't really catching the functions, you are simply reimplementing them.
Projects like Wine do this in some cases, but not in all. They also rewrite some of the dynamic libraries. So when a Win32 loads some DLL it is actually loading the Wine version and not the native version. This is essentially the same concept of replacing the functions with your own.
Lookup LD_PRELOAD for more information.

Difference between GetModuleHandle and including header

maybe stupid question, but i dont know answer. What is difference between using GetModuleHandle or LoadLibrary to load dll(and then to use function of that dll) and to include directly desired header. For example, with using GetModuleHandle:
typedef void (WINAPI *PGNSI)(LPSYSTEM_INFO);
// Call GetNativeSystemInfo if supported or GetSystemInfo otherwise.
PGNSI pGNSI;
SYSTEM_INFO si;
ZeroMemory(&si, sizeof(SYSTEM_INFO));
pGNSI = (PGNSI) GetProcAddress(
GetModuleHandle(TEXT("kernel32.dll")),
"GetNativeSystemInfo");
if(NULL != pGNSI)
pGNSI(&si); //calling function through pointer
else GetSystemInfo(&si);
But I can include windows.h header to directly call that function from my code:
#include <windows.h>
SYSTEM_INFO si;
ZeroMemory(&si, sizeof(SYSTEM_INFO));
GetNativeSystemInfo(&si);
The same applies for example to opengl32.dll, i dont know if it is better to include header file for opengl functions in my project or to use Getmodulehandle and GetProcAdress to invoke desired functions. What is the difference? Is first method with using getmodulehandle benefit in some way? Thanks for answers.

First, make sure you understand that GetModuleHandle and LoadLibrary are not exactly equivalent. But since that's not directly part of your question, I'll leave off a big explanation and just suggest you make sure to understand the documentation in those two links.
To use a dll function directly as if it were like any other function, you don't just include the header. In addition to the header, somewhere in your project it is being told to link against a corresponding lib file. In your example, it would be kernel32.lib. This might be done through various means, such as linker settings in the project or having a #pragma comment (lib, ...) in a file.
When a program is built like that, code is written by the compiler to load that dll when the program starts. If the dll in question cannot be found when you actually try to run the program, then it will fail with an error message. You have no way to write code to catch the failure and take some alternative action.
For dlls that are part of the operating system (like kernel32.dll) or at least typically supplied with it, this immediate loading behavior isn't an issue since you can safely assume the dll will always be there. On the other hand, if you were to build against a dll which is not typically present, then you would have more of a concern. Either you would have to make sure such a dll is distributed with your program or else somehow try to make sure the user installs whatever other package is necessary to get that dll on their system.
Also, if the dll loads but any of the functions you're trying to use from that dll don't actually exist in it, then it will immediately fail with an error message. (It doesn't wait until your program tries to call the function. It finds this discrepancy when the program starts and aborts then.) This can be an issue if different versions of the dll exist out in the world.
Now, when you use LoadLibrary/GetProcAddress, you are asking for the dll to be loaded at the time of your choosing and asking to find a particular function provided by that dll. If either of those steps fails, you have the ability to write code to deal with it in some reasonable manner.
This can be used for various purposes. For instance, you could make a plugin mechanism where the program searches for and loads plugin dlls from some particular folder on the fly. Since the program doesn't know ahead of time which plugins will be present, LoadLibrary is the only way to do this.
Another thing LoadLibrary/GetProcAddress can be used for is to load a dll and call a function from it even when you don't have the proper header and lib files. If you know the name of the dll, the name of the function, and exact signature of the function (parameter types, return type, calling convention) then you have enough to write the code to load that dll and call the function successfully. Occasionally this can be useful. For instance, it's one way that people are able to use certain "undocumented" functions provided by Windows dlls.
Lastly, LoadLibrary/GetModuleHandle/GetProcAddress can be used to let you use functionality that doesn't necessarily exist on all the operating systems you want to support. This appears to be the reason for the code snippet you have that calls either GetNativeSystemInfo or GetSystemInfo. The former is only available from WinXP/2003 on up while the latter is available on Win2000. If the code had just been written as a direct call to GetNativeSystemInfo, then the program would fail to run on Windows 2000. But instead, what you have there checks to see if GetNativeSystemInfo exists on the current operation system and only uses it if so, otherwise it falls back on the more widely supported GetSystemInfo.
So in the case of your example, which technique you choose to call that function depends on which operating systems you intend to support. If your software does not need to run on Windows 2000, then just directly calling GetNativeSystemInfo is a lot easier (and most likely the preferable way to go).

Some functions don't exist in older versions of Windows and when you call a function directly, that function ends up in the import table of your program. If Windows fails to find one of the functions in the import table, the program will not run at all.
In your specific example, GetNativeSystemInfo was added in Windows XP and if you call it directly then your program will not run on Windows 2000 or older.
LoadLibrary is also useful if you want to support 3rd-party plug-ins etc.

GetModuleHandle enables you to load dlls dynamically, what can be used for instance for implementing plug-ins or loading some resources on-demand.
Underneath, there is no difference between the two methods -- static library that you link just contains code that does dynamic linking when program starts (in C).

It depends, in this case I'd say its being used so that the dll/exe won't get loading errors from the windows LDR if the system(kernel32.dll etc) doesn't have exports that the binary might use, onr good example of this is the DEP functions for windows XP, which only exist in SP 2+, but its not a good idea to force a required SP, as it can cut down on the avaiable audience for the program.
OpenGL uses the same sorta principle, due to the fact one can't predict the API support (and extensions), so one must check for it, then import it or an alternative with wglGetProcAddress
Another reason, which is more 'cheap' is so that one doesn't not have to link to certain libs and include certain headers, especially if your using say only 1 function from a very large SDK, this prevents other developers wasting time tring to get the huge SDK(they ofc will need the binary contianing the export)

LoadLibrary and GetModulehandle both will work for the same stuff i.e; they will map the module on to the process at run time but in case of LoadLibrary it increases the reference count in the kernel perspective where as the later will not do so..

Benefits of exporting a class from a dll vs. static library

I have a C++ class I'm writing now that will be used all over a project I'm working on. I have the option to put it in a static library, or export the class from a dll. What are the benefits/penalties for each approach. The only one I can think of is compiled code size which I don't really care about. Thanks!

Advantages of a DLL:
You can have multiple different exe's that access this functionality, so you will have a smaller project size overall.
You can dynamically update your component without replacing the whole exe. If you do this though be careful that the interface remains the same.
Sometimes like in the case of LGPL you are forced into using a DLL.
You could have some components as C#, Python or other languages that tie into your DLL.
You can build programs that consume your DLL that work with different versions of the DLL. For example you could check if a function exists in a certain operating system DLL and only call it if it exists, and otherwise do some other processing.
Advantages of Static library:
You cannot have dll verisoning problems that way
Less to distribute, you aren't forced into a full installer if you only have a small application.
You don't have to worry about anyone else tying into your code that would have been accessible if it was a DLL.
Easier to develop a static library as you don't need to worry about exports and imports.
Memory management is easier.

One of the most significant and often unnoted features of dynamic libraries on Windows is that DLLs have their own heap. This can be an advantage or a disadvantage depending on your point of view but you need to be aware of it. For example, a global variable in a DLL will be shared among all the processes attaching to that library which can be a useful form of de facto interprocess communication or the source of an obscure run time error.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js