crash in Windows dll plugin on load - c++

Looking for some advice as to what can cause a crash of this type in Windows when loading a dll.
I'm writing a Windows VST3 .dll plugin in C++, and experiencing a crash on startup when my .dll is loaded. When running the host application and the plugin via Visual Studio 2019 debugger I'm getting the Access violation executing location ... dialog shown above, and the call stack is empty, making me think memory is completely corrupted.
Maybe this is caused by linking options I need to tweak when building the .dll?
Here are some facts:
The plugin itself works fine. I can get it to load and unload without issues when all 3rd party API calls are commented out.
If I link some 3rd party libraries into my plugin, then again everything works fine.
Once I make a call to any of those 3rd party libraries, then that causes the segfault on startup.
Plugin is built with /MD, but I also tried /MT and saw the same behaviour.
I'm using JUCE C++ to get the VST3 framework and GUI components.
An example of one of the 3rd party library I'm linking against and attempting to call into is the SuperpoweredSDK C++ library. But I've tried other libraries, and I'm seeing the same behaviour with each one.
Unfortunately, getting an entire blank VST3 plugin together is too much code to post in a StackOverflow question. But since I can get the plugin to work if I comment out all calls to 3rd party libraries, I don't think the code itself is the issue, I think the way the dll (.vst3) file is built is the problem.
The critical portion of the CMakeLists.txt file that builds the .vst3 dll file looks like this:
FIND_LIBRARY ( SuperpoweredSDK NAMES SuperpoweredWin141_Debug_MD_x64.lib PATHS ${SUPERPOWERED_DIR}/libWindows )
SET ( TEST_EXTERNAL_DEPS PRIVATE Threads::Threads ${SuperpoweredSDK} )
FILE ( GLOB JUCE_SOURCE ${CMAKE_HOME_DIRECTORY}/JuceLibraryCode/*.cpp )
LIST ( FILTER JUCE_SOURCE EXCLUDE REGEX ".+/JuceLibraryCode/include_juce_audio_plugin_client_[^u].+" )
FILE ( GLOB PLUGIN_SOURCE *.cpp )
ADD_LIBRARY ( TestVST3 SHARED ${JUCE_SOURCE} ${PLUGIN_SOURCE} ${CMAKE_HOME_DIRECTORY}/JuceLibraryCode/include_juce_audio_plugin_client_VST3.cpp )
SET_TARGET_PROPERTIES ( TestVST3 PROPERTIES OUTPUT_NAME "Test" SUFFIX ".vst3" )
TARGET_LINK_LIBRARIES ( TestVST3 PUBLIC ${TEST_EXTERNAL_DEPS} )

what can cause a crash of this type in Windows when loading a dll
Typical reason is initializer of some global variable. Example code that will crash when loading the DLL:
class C
{
public:
C() { __debugbreak(); } // Compiles into `int 3` interrupt, will definitely crash
};
static C g_c;
When you link an external library but don’t call it, linker’s dead code elimination feature often drops complete .obj file which are linked but unused. This also drops variables there, along with their constructors.
One way to debug, set breakpoint on DllMain. But not the DllMain in your code, another one, that’s actually exported. In VC++ 2017 CRT that one is called dllmain_dispatch, the source file for that is in VC++ runtime, crt\src\vcruntime\dll_dllmain.cpp. That dllmain_dispatch calls a couple of functions before calling your DllMain, one of them, dllmain_crt_process_attach, calls _initterm and that one actually calls constructors of all the global stuff. At least one of them fails in your case.
P.S. A possible reason of a crash like this, your third-party library is trying to use COM, and the thread which called LoadLibrary never called CoInitialize().

We eventually did find the cause of this crash. With VST3 files, the DAW loads each .dll (or .vst3), queries the capabilities, and immediately unloads the dll again. It then moves on to the next file, and the process starts all over until all plugins have been queried.
In this case, one of the 3rd party libs I was using (SuperpoweredSDK) starts a background thread to verify the license against a web server somewhere. This was not documented. With the background thread started, the DAW unknowingly proceeds to unload the DLL, leaving the thread hanging there running code that no longer exists! And thus, the crash without any call stack.
Several possible solutions:
Move the SuperpoweredSDK initialization code until later when it really is needed.
In a future not-yet-released version of SuperpoweredSDK (current version is 2.0.1), they're apparently adding an API call to shutdown Superpowered, and that shutdown call will wait until all background threads have finished running. Make sure you call this new API.

Related

Linker /delayload is not fully working with winhttp.dll

I'm trying to prevent dll-hijacking vulnerability in my executable, which has only one explicit dynamic dependence - winhttp.dll.
My attempt is to use linker /delayload:winhttp.dll option and call SetDefaultDllDirectories in the very beginning of the program. Seems like it is working for some modules AND for winhttp itself, but not for it's dependencies - binary is still loading mpr.dll and a few other modules before main (and it is first trying to load it from my executable folder).
It is not helping to add mpr.dll to /delayload list - I'm getting LINK : warning LNK4199: /DELAYLOAD:mpr.dll ignored; no imports found from mpr.dll.
Dependency Walker is showing me this dependence chain: winhttp -> oleaut32 -> combase -> ole32 -> mpr.
Why is this module loading before any of winhttp functions usage?
P.S. I really want to avoid using LoadLibrary + GetProcAddress manually.
Win 8.1, MSVS 2013 Update 5
Procmon:
Dep. walker:
It is crazy, but the true reason for this behaviour appears to be my executable name. It does contain a word "launcher", which seems to be the one, triggering Windows AppCompat layer.
The same executable renamed to "test1.exe" works all well.

Why does my DLL require my program to have a specific name?

I've created some classes in a C++ application that I've developed that allow for third party developers to create their own DLL that subclasses off one my objects and have their subclass get loaded by the application through that DLL at runtime. This part all works.
The problem comes in when the name of the application executable is changed. Say I pass this application, call it "App.exe", to tester and he renames it to App-02.exe to distinguish it from another version he's tested. When App-02.exe attempts to load the DLL it throws the error:
"The program can't start because App.exe is missing from your computer. Try reinstalling the program to fix this problem."
There's an obvious solution, just don't rename the executable. But why should the DLL even care what the program loading it is called?
It sounds like the base class is defined inside App.exe, so you've got a circular dependency in your binaries (app.exe -> plug-in.dll -> app.exe). You can break this cycle by using the dependency inversion principle. Basically, move your base class out of app.exe and into a new .dll. Then you have something like this: app.exe -> plug-in-api.dll, plug-in.dll -> plug-in-api.dll.
Of course, if you do this, your tester might just rename plug-in-api.dll and everything would break again.
https://en.wikipedia.org/wiki/Dependency_inversion_principle

Is using an absolute path to load system DLLs with LoadLibrary() and GetModuleHandle() better to prevent DLL hijacking?

In many cases to load some newer API one would use a construct as such:
(FARPROC&)pfnZwQueryVirtualMemory = ::GetProcAddress(::GetModuleHandle(L"ntdll.dll"), "ZwQueryVirtualMemory");
But then, considering a chance of Dll hijacking, is it better to specify a DLL's absolute path, as such. Or is it just an overkill?
WCHAR buff[MAX_PATH];
buff[0] = 0;
if(::GetSystemDirectory(buff, MAX_PATH) &&
::PathAddBackslash(buff) &&
SUCCEEDED(::StringCchCat(buff, MAX_PATH, L"ntdll.dll")))
{
(FARPROC&)pfnZwQueryVirtualMemory = ::GetProcAddress(::GetModuleHandle(buff), "ZwQueryVirtualMemory");
}
else
{
//Something went wrong
pfnZwQueryVirtualMemory = NULL;
}
The problem with the latter method is that it doesn't always work (for instance with Comctl32.dll.)
You don't have to do anything special for ntdll.dll and kernel32.dll because they are going to be loaded before you get the chance to do anything, they are also on the known-dlls list.
The dll hijacking issues often include auxiliary libraries. Take version.dll for example, it is no longer on the known-dlls list so explicitly linking to it is problematic, it needs to be loaded dynamically.
The best solution is a combination of 3 things:
Call SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_SYSTEM32) if it is available (Win8+ and updated Win7).
Call LoadLibrary with the full path (GetSystemDirectory) before calling GetModuleHandle.
Don't explicitly link to anything other than kernel32, user32, gdi32, shlwapi, shell32, ole32, comdlg32 and comctl32.
If SetDefaultDllDirectories is not available then it is really hard to protect yourself if you don't control the application directory because various Windows functions will delay-load dlls like shcore.dll without full paths (especially the shell APIs). SetDllDirectory("") helps against the current/working directory but there is no good application directory workaround for unpatched pre-Win8 systems, you just have to test with Process Monitor and manually load the problematic libraries early in WinMain.
The application directory is a problem because some users just put everything in the downloads folder and run it from there. This means you might end up with a malicious dll in your application directory.

CreateProcess STATUS_DLL_NOT_FOUND - which dll?

I have a process which calls CreateProcess. It appears that CreateProcess returns nonzero indicating success. However, the HANDLE to the process then gets immediately set, indicating the process has exited. When I call GetExitCodeProcess, STATUS_DLL_NOT_FOUND is then returned.
I understand that a DLL is missing. I even know exactly which one. However, what I don't understand is how to figure that out programmatically.
I noticed that Windows will present a dialog saying that the process failed to start because it couldn't find the specified DLL (screenshot: http://www.mediafire.com/view/?kd9ddq0e2dlvlb9 ). In the dialog, Windows specifies which DLL is missing. However, I find no way to get that information myself programmatically.
If a process fails to start and would return STATUS_DLL_NOT_FOUND, how do I programmatically retrieve the library name to which the target process was linked which couldn't be found? That way I can automatically record in an error report what DLL appears to be missing or corrupt in a given installation.
CreateProcess returns 0 indicating success.
CreateProcess() returns a BOOL, where 0 is FALSE, aka failure not success.
If a process fails to start and would return STATUS_DLL_NOT_FOUND, how do I programmatically retrieve the library name to which the target process was linked which couldn't be found?
Unfortunately, there is no API for that. Your only option would be to manually access and enumerate the executable's IMPORTS table to find out what DLLs it uses, and then recursively access and enumerate their IMPORTS tables, manually checking every DLL reference you find to see whether that DLL file exists on the OS's search path or not.
If the dll is statically linked you can walk the iat and see if the dll exists. If the dll is dynamically loaded then starting the process suspended and hooking LoadLibrary (or instead of hooking emulate a debugger) is the only way I see.
The best way is to use loader snaps. Basically you use gflags.exe (which is included with windbg) to enable loader snaps; then, run the process with the debugger attached. Loader snaps will enable the loader to print out dbg messages of the process and it will print the failures.
gflags.exe -i yourcode.exe +sls
windbg yourcode.exe
I know this is not a "programmatic" way to find out the problem, but what the loader does is complicated, and you don't really want to be redoing its logic to find the failure. That is why loader snaps were invented.
The very hard way would be: Parsing the .EXE and .DLL files and create the dependency tree of .DLL files.
I don't think there is a way to get a list of DLL files that are missing: When Windows finds one missing DLL file it stops loading so if one DLL file is missing you won't find out if more DLL files are missing.
Another problem you could have is that old DLL versions could have missing "exports" (functions). This is even harder to detect than the dependency tree.
Just since this is somehow the top stackoverflow result on Google for "STATUS_DLL_NOT_FOUND". How to trace and solve any random occurence:
Download SysInternals procmon64.exe (or just the entire set). After startup immediately hit the looking glass 'Stop capture' button (Ctrl+E). And 'Clear' (Ctrl+X).
Set filters for:
'Process name' is to whatever the mentioned process name was (for me it was 'build-script-build.exe') [Add]
'Result' is not 'SUCCESS' [Add]
'Path' ends with '.dll' [Add] [OK]
Start capture again (Ctrl+E).
Run the thing that had a problem again (for me: build cargo). Google for the last listed DLL file.
For me that was VCRUNTIME140.dll, so I installed the VC++ 2015 to 2019 redistributable.
ProcMon is kind of like unix strace.

DLL file loaded twice with DLL redirection through manifest

I'm including python.h in my Visual C++ DLL file project which causes an implicit linking with python25.dll. However, I want to load a specific python25.dll (several can be present on the computer), so I created a very simple manifest file named test.manifest:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<assembly xmlns='urn:schemas-microsoft-com:asm.v1' manifestVersion='1.0'>
<file name="python25.dll" />
</assembly>
And I'm merging it with the automatically embedded manifest file generated by Visual Studio thanks to:
Configuration Properties -> Manifest Tool -> Input and Output -> Additional Manifest Files
-->$(ProjectDir)\src\test.manifest
python25.dll is now loaded twice: the one requested by the manifest, and the one that Windows should find through its search order.
Screendump of Process Explorer http://dl.dropbox.com/u/3545118/python25_dll.png
Why is that happening and how can I just load the DLL file pointed by the manifest?
After exhaustive battle with WinSxS and DLL redirection, here's my advice for you:
Some background
Various things can cause a DLL file to be loaded under Windows:
Explicit linking (LoadLibrary) -- the loader uses the current activation context of the running EXE file. This is intuitive.
Implicit linking ("load time linkage", the "auto" ones) -- the loader uses the default activation context of the depending DLL file. If A.exe depends on B.dll depends on C.dll (all implicit linkage), the loader will use B.dll's activation context when loading C.dll. IIRC, it means if B's DllMain loads C.dll, it can be using B.dll's activation context -- most of the time it means the system-wide default activation context. So you get your Python DLL from %SystemRoot%.
COM (CoCreateInstance) -- this is the nasty one. Extremely subtle. It turns out the loader can look up the full path of a DLL file from the registry using COM (under HKCR\CLSID). LoadLibrary will not do any searching if the user gives it a full path, so the activation context can't affect the DLL file resolution. Those can be redirected with the comClass element and friends, see [reference][msdn_assembly_ref].
Even though you have the correct manifest, sometimes someone can still change the activation context at run time using the Activation Context API. If this is the case, there is usually not much you can do about it (see the ultimate solution below); this is just here for completeness. If you want to find out who is messing with the activation context, WinDbg bp kernel32!ActivateActCtx.
Now on to finding the culprit
The easiest way to find out what causes a DLL file to load is to use Process Monitor. You can watch for "Path containing python25.dll" or "Detail containing python25.dll" (for COM lookups). Double clicking an entry will actually show you a stack trace (you need to set the symbol search paths first, and also set Microsoft's PDB server). This should be enough for most of your needs.
Sometimes the stack trace obtained from above could be spawned from a new thread. For this purpose you need WinDbg. That can be another topic, but suffice to say you can sxe ld python25 and look at what other threads are doing (!findstack MyExeModuleName or ~*k) that causes a DLL file to load.
Real world solution
Instead of fiddling with this WinSxS thing, try hooking LoadLibraryW using Mhook or EasyHook. You can just totally replace that call with your custom logic. You can finish this before lunch and find the meaning of life again.
[msdn_assembly_ref]: Assembly Manifests
I made some progress for the understanding of the issue.
First let me clarify the scenario:
I'm building a DLL file that both embeds and extends Python, using the Python C API and Boost.Python.
Thus, I'm providing a python25.dll in the same folder as my DLL file, as well as a boost_python-vc90-mt-1_39.dll.
Then I have an EXE file which is a demo to show how to link to my DLL file: this EXE file doesn't have to be in the same folder as my DLL file, as long as the DLL file can be found in the PATH (I'm assuming that the end user may or may not put it in the same folder).
Then, when running the EXE file, the current directory is not the one containing python25.dll, and that's why the search order is used and some other python25.dll can be found before mine.
Now I figured out that the manifest technique was the good approach: I managed to redirect the loading to "my" python25.dll.
The problem is that this is the Boost DLL file boost_python-vc90-mt-1_39.dll that's responsible for the "double" loading!
If I don't load this one, then python25.dll is correctly redirected. Now I somehow have to figure out how to tell the Boost DLL file not to load another python25.dll...
Dependency Walker is usually the best tool for resolving this kind of problem. I'm not too sure how well it handles manifests though...
Where in this entangled mess is the actual process executable file?
Two possibilities come to mind:
You are writing a Python extension DLL file. So the Python process is loading your DLL file, and it would already have its own python25.dll dependency.
The EXE file loading your DLL file is being built with header files and libraries provided by the DLL file project. So it is inheriting the #pragma comment(lib,"python25.lib") from your header file and as a result is loading the DLL file itself.
My problem with the second scenario is, I'd expect the EXE file, and your DLL file, to be in the same folder in the case that the EXE file is implicitly loading your DLL file. In which case the EXE file, your DLL file and the python25.dll are all already in the same folder. Why then would the system32 version ever be loaded? The search order for implicitly loaded DLL files is always in the application EXE file's folder.
So the actual interesting question implicit in your query is: How is the system32 python26.dll being loaded at all?
Recently, I hit a very similar problem:
My application embedding Python loads the python32.dll from a known location, that is a side-by-side assembly (WinSxS) with Python.manifest
Attempt to import tkinter inside the embedded Python interpreter caused second loading of the very same python32.dll, but under a different non-default address.
The initialisation function of tkinter module (specifically, _tkinter.pyd) was failing because to invalid Python interpreter thread state (_PyThreadState_Current == NULL). Obviously, Py_Initialize() was never called for the second Python interpreter loaded from the duplicate python32.dll.
Why was the python32.dll loaded twice? As I explained in my post on python-capi, this was caused by the fact the application was loading python32.dll from WinSxS, but _tkinter.pyd did not recognise the assembly, so python32.dll was loaded using the regular DLL search path.
The Python.manifest + python32.dll assembly was recognised by the DLL loading machinery as a different module (under different activation context), than the python32.dll requested by _tkinter.pyd.
Removing the reference to Python.manifest from the application embedding Python and allowing the DLL search path to look for the DLLs solved the problem.