CreateThread in 64bit DLL won't work - c++

I have a 32-bit and a 64-bit executable. Each loads a DLL of the same bitness, so the 64-bit executable loads a 64-bit DLL. The 32-bit DLL works perfectly: it creates a thread and pops up a hello-world message box. In the 64-bit DLL, however, that piece of code never executes. It's as if CreateThread fails.
return TRUE;
void myFunc()
HANDLE hThread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)&MyThread, NULL, 0, NULL);
MessageBoxA(0, "HELLO 64", 0,0);
Those are some snippets from the DLL. I've googled, and all I can come up with is that the stack alignment is failing. If that is the reason, how do I properly call CreateThread to make it work? If that isn't the reason, does anyone know what might be wrong?
I'd be most grateful for any help, thanks in advance!

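You have the wrong signature for MyThread. You should not cast it; instead, make sure your function matches the signature that CreateThread expects. The correct code would be:
DWORD WINAPI MyThread(LPVOID lpParameter)   // matches LPTHREAD_START_ROUTINE, so no cast is needed
{ /* etc */ return 0; }
CreateThread(NULL, 0, MyThread, NULL, 0, NULL);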
Apart from that, you should not do anything complex in your DllMain, as @GSerg comments, because a lock is held while you are in there. By doing anything complex you can inadvertently load another DLL, causing a deadlock.
Instead you would usually have a separate initialization function in your DLL that your calling code can call after it has loaded the DLL.

OK, the fix was simple: the thread exited too early. Adding WaitForSingleObject(hThread, INFINITE); solved the issue. It wasn't necessary in 32-bit for some reason. :)


Is it safe to call std::thread::join function under Win32 DLL_PROCESS_DETACH? [duplicate]

I've stumbled upon unexpected behavior of the Windows thread mechanism when a DLL is unloaded. I have a set of worker thread objects and I'm trying to shut them down gracefully when the DLL is unloaded (via DllMain DLL_PROCESS_DETACH). The code is very simple (I do send an event to end the thread's wait loop):
WaitForSingleObject( ThrHandle, INFINITE );
CloseHandle( ThrHandle );
Yet the WaitForSingleObject call hangs the whole thing. It works fine if I perform it before the DLL is unloaded. How can this behavior be fixed?
You can't wait for a thread to exit in DllMain(). Unless the thread had already exited by the time the DLL_PROCESS_DETACH was received, doing so will always deadlock. This is the expected behaviour.
The reason for this is that calls to DllMain() are serialized, via the loader lock. When ExitThread() is called, it claims the loader lock so that it can call DllMain() with DLL_THREAD_DETACH. Until that call has finished, the thread is still running.
So DllMain is waiting for the thread to exit, and the thread is waiting for DllMain to exit, a classic deadlock situation.
See also Dynamic-Link Library Best Practices on MSDN.
The solution is to add a new function to your DLL for the application to call before unloading the DLL. As you have noted, your code already works perfectly well when called explicitly.
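A minimal sketch of that approach, assuming the workers are std::thread objects. The names StartWorkers and StopWorkers are hypothetical; in a real DLL they would be exported, and the application would call StopWorkers before FreeLibrary rather than relying on DllMain.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical module state; in a real DLL this would live inside the DLL.
namespace workers {

std::mutex               g_mutex;
std::condition_variable  g_wake;
bool                     g_stop = false;   // guarded by g_mutex
std::atomic<int>         g_running{0};     // workers that have started but not finished
std::vector<std::thread> g_threads;

void worker_loop() {
    ++g_running;
    {
        std::unique_lock<std::mutex> lock(g_mutex);
        // Sleep until shutdown is requested (real code would also wait for work).
        g_wake.wait(lock, [] { return g_stop; });
    }
    --g_running;
}

// Called by the application after it has loaded the library.
void StartWorkers(int count) {
    assert(g_threads.empty());
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_stop = false;
    }
    for (int i = 0; i < count; ++i)
        g_threads.emplace_back(worker_loop);
}

// Called by the application BEFORE it unloads the library, never from DllMain.
void StopWorkers() {
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_stop = true;
    }
    g_wake.notify_all();
    for (auto& t : g_threads)
        t.join();               // safe here: no loader lock is held
    g_threads.clear();
}

} // namespace workers
```

Because the join happens in an ordinary function call, the exiting threads can take the loader lock for their DLL_THREAD_DETACH notifications without anyone blocking them.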
In the case where backwards compatibility requirements make adding such a function impossible, and if you must have the worker threads, consider splitting your DLL into two parts, one of which is dynamically loaded by the other. The dynamically loaded part would contain (at a minimum) all of the code needed by the worker threads.
When the DLL that was loaded by the application itself receives DLL_PROCESS_DETACH, you just set the event to signal the threads to exit and then return immediately. One of the threads would have to be designated to wait for all the others and then free the second DLL, which you can do safely using FreeLibraryAndExitThread().
(Depending on the circumstances, and in particular if worker threads are exiting and/or new ones being created as part of regular operations, you may need to be very careful to avoid race conditions and/or deadlocks; this would likely be simpler if you used a thread pool and callbacks rather than creating worker threads manually.)
In the special case where the threads do not need to use any but the very simplest Windows APIs, it might be possible to use a thread pool and work callbacks to avoid the need for a second DLL. Once the callbacks have exited, which you can check using WaitForThreadpoolWorkCallbacks(), it is safe for the library to be unloaded - you do not need to wait for the threads themselves to exit.
The catch here is that the callbacks must avoid any Windows APIs that might take the loader lock. It is not documented which API calls are safe in this respect, and it varies between different versions of Windows. If you are calling anything more complicated than SetEvent or WriteFile, say, or if you are using a library rather than native Windows API functions, you must not use this approach.
I had the same problem when injecting code into another desktop process: WaitForSingleObject deadlocked inside my thread. I solved it by subclassing the target window's message procedure and creating the thread from there. I hope it helps others.
#define WM_INSIDER (WM_USER + 2021)
WNDPROC prev_proc = nullptr;
HANDLE t = nullptr;   // worker thread handle

HWND FindTopWindow(DWORD pid)
{
    struct Find { HWND win; DWORD pid; } find = { nullptr, pid };
    EnumWindows([](HWND hwnd, LPARAM lParam) -> BOOL {
        auto p = (Find*)(lParam);
        DWORD id = 0;
        if (GetWindowThreadProcessId(hwnd, &id) && id == p->pid) {
            // done
            p->win = hwnd;
            return FALSE;
        }
        // continue
        return TRUE;
    }, (LPARAM)&find);
    return find.win;
}

// thread entry (declared with the signature CreateThread expects, so no cast is needed)
DWORD WINAPI insider(LPVOID)
{
    // do whatever you want as a normal thread
    return (0);
}

LRESULT CALLBACK insider_proc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    switch (uMsg) {
    case WM_INSIDER:
        // runs on the window's own thread, after DllMain has returned
        t = CreateThread(NULL, 0, insider, 0, 0, NULL);
        break;
    }
    return CallWindowProc(prev_proc, hwnd, uMsg, wParam, lParam);
}

void setup() {
    auto pid = GetCurrentProcessId();
    auto win = FindTopWindow(pid);
    // GWLP_WNDPROC rather than GWL_WNDPROC, which is not defined in 64-bit builds
    prev_proc = (WNDPROC)SetWindowLongPtr(win, GWLP_WNDPROC, (LONG_PTR)&insider_proc);
    // signal to create thread later
    PostMessage(win, WM_INSIDER, 0, 0);
}

BOOL APIENTRY DllMain(HMODULE hModule,
    DWORD ul_reason_for_call,
    LPVOID lpReserved
)
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:   // assumed: the case label and call were lost from the original
        setup();
        break;
    }
    return TRUE;
}

CreateThread inside another thread

I am having an issue creating a thread inside another thread. Normally I would be able to do this, but the issue arises because I've incremented the reference count of the DLL that starts these threads. I need to start multiple threads inside this DLL. How can I work around this and issue multiple CreateThread() calls when needed in my project without running into problems caused by the incremented reference count?
Here is the function I've written to increment the reference count of my DLL:
BOOL IncrementReference( HMODULE hModule )
{
    if ( hModule == NULL )
        return FALSE;
    TCHAR ModulePath[ MAX_PATH + 1 ];
    if ( GetModuleFileName( hModule , ModulePath , MAX_PATH ) == 0 )
        return FALSE;
    if ( LoadLibrary( ModulePath ) == NULL )
        return FALSE;
    return TRUE;
}
As requested, here is a PoC program that reproduces the issue I am facing. I am really hoping this will help you point me to a solution. Also note that the DLL is being unloaded because of conditions in the application I am targeting (hooks that are already set in that application), so incrementing the reference count is required for my thread to run in the first place.
Also, I can't run more than one operation in the main thread, as it has its own functionality to take care of, and another thread is required on the side to take care of something else. They must also run simultaneously, so I need to fix this issue of creating more than one thread in a DLL whose reference count has been incremented.
// dllmain.cpp : Defines the entry point for the DLL application.
#pragma comment( linker , "/Entry:DllMain" )
#include <Windows.h>
#include <process.h>

UINT CALLBACK SecondThread( PVOID pParam )
{
    MessageBox( NULL , __FUNCTION__ , "Which Thread?" , 0 );
    return 0;
}

UINT CALLBACK FirstThread( PVOID pParam )
{
    MessageBox( NULL , __FUNCTION__ , "Which Thread?" , 0 );
    _beginthreadex(0, 0, &SecondThread, 0, 0, 0);
    return 0;
}

BOOL IncrementReference( HMODULE hModule )
{
    if ( hModule == NULL )
        return FALSE;
    TCHAR ModulePath[ MAX_PATH + 1 ];
    if ( GetModuleFileName( hModule , ModulePath , MAX_PATH ) == 0 )
        return FALSE;
    if ( LoadLibrary( ModulePath ) == NULL )
        return FALSE;
    return TRUE;
}

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        if (IncrementReference(0))
            _beginthreadex(0, 0, &FirstThread, 0, 0, 0);
        break;
    }
    return TRUE;
}
As you can see, the code never executes the SecondThread function. The question is, why? And what can be done to fix it?
#pragma comment( linker , "/Entry:DllMain" )
That was a very bad idea, the proper entrypoint for a DLL is not in fact DllMain(). You have to keep in mind that WinMain and DllMain are just place-holder names. A way for Microsoft to document the relevance of executable file entrypoints. By convention you use those same names in your program, everybody will understand what they do.
But there's a very important additional detail in a C or C++ program, the CRT (C runtime library) needs to be initialized first. Before you can run any code that might make CRT function calls. Like _beginthreadex().
In other words, the default /ENTRY linker option is not DllMain(). The real entrypoint of a DLL is _DllMainCRTStartup(). A function inside the CRT that takes care of the required initialization, then calls DllMain(). If you wrote one in your program then that's the one that runs. If you didn't then a dummy one in the CRT gets linked.
All bets are off when you make CRT function calls and the CRT wasn't initialized. You must remove that #pragma so the linker will use the correct entrypoint.
According to MSDN you should call neither LoadLibrary nor CreateThread inside DllMain, and your code does both!
The MCVE as posted has three problems:
The first is a simple mistake, you're calling IncrementReference(0) instead of IncrementReference(hModule).
The second is that there is no entry point for rundll32 to use; the entry point argument is mandatory, or rundll32 won't work (I don't think it even loads the DLL).
The third is the #pragma as pointed out by Hans.
After fixing the IncrementReference() call, removing the #pragma and adding an entry point:
extern "C" __declspec(dllexport) void __stdcall EntryPoint(HWND, HINSTANCE, LPSTR, INT)
{
    MessageBoxA( NULL , __FUNCTION__ , "Which Thread?" , 0 );
}
You can then run the DLL like this:
rundll32 testdll.dll,_EntryPoint@16
This works on my machine; EntryPoint, FirstThread and SecondThread all generate message boxes. Make sure you do not dismiss the message box from EntryPoint prematurely, as that will cause the application to exit, taking the other threads with it.
The call to LoadLibrary is still improper, however it does not appear to have any side-effects in this scenario (probably because the library in question is guaranteed to already be loaded).
(Previous) Answer:
The MCVE can be fixed by simply moving the call to IncrementReference from DllMain to FirstThread. That is the only safe and correct way to resolve the problem.
Addendum: as Hans pointed out, you'll also need to remove the /Entry pragma.
(Redundant?) Commentary:
If the application that is loading the DLL is misbehaving to the extent where the DLL is being unloaded before FirstThread can run, and assuming for the sake of argument that you can't fix it, the only realistic option is to work around the problem - for example, DllMain could suspend all the other threads in the process so that they cannot unload the DLL, and resume them from FirstThread after the call to IncrementReference.
Or you could try hooking FreeLibrary, or reverse engineering the loader and messing with the reference count directly, or removing the hooks the application has placed, or loading a separate copy of the DLL by hand inside DllMain (with your own DLL loader rather than the one Windows provides) or starting a separate process and working from there or, oh, no doubt there's any number of other possibilities, but at that point I'm afraid the question really is too broad for Stack Overflow, particularly since you can't give us the real details of what the application is doing.

Several programs crash when unhooking with UnhookWindowsHookEx()

I am doing a global hook to add my DLL to the hook chain:
HHOOK handle = SetWindowsHookEx(WH_CALLWNDPROC, addr, dll, 0);
Inside my DLL I am using Detours to intercept several WINAPI function calls. Everything works fine, except for WaitForSingleObject calls. Whenever I add WaitForSingleObject to the detoured functions, several programs crash when I unhook my DLL (Chrome, Skype, ...).
Here is how the DLL looks:
DWORD (WINAPI* Real_WaitForSingleObject)( HANDLE hHandle, DWORD dwMilliseconds) = WaitForSingleObject;
DWORD WINAPI Mine_WaitForSingleObject(HANDLE hHandle, DWORD dwMilliseconds);

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD Reason, LPVOID lpReserved) {
    switch(Reason) {
    case DLL_PROCESS_ATTACH:
        // Detours transaction calls are not shown in the original snippet
        DetourAttach(&(PVOID&)Real_WaitForSingleObject, Mine_WaitForSingleObject);
        break;
    case DLL_PROCESS_DETACH:
        DetourDetach(&(PVOID&)Real_WaitForSingleObject, Mine_WaitForSingleObject);
        break;
    }
    return TRUE;
}
DWORD WINAPI Mine_WaitForSingleObject(HANDLE hHandle, DWORD dwMilliseconds) {
    return Real_WaitForSingleObject(hHandle, dwMilliseconds);
}
extern "C" __declspec(dllexport) int meconnect(int code, WPARAM wParam, LPARAM lParam) {
    return CallNextHookEx(NULL, code, wParam, lParam);
}
Could someone help me understand why this is happening and how I can get around this problem? Thanks!
I think this is happening because many programs (Chrome, Skype, ...) have a thread pool in which background threads wait in WaitForSingleObject() for something interesting to happen, and when it does, those threads wake up and do something.
So your thread A is calling DetourDetach while another thread B of the same process is currently inside Mine_WaitForSingleObject(). Then the DLL unloads, and everything crashes. You can verify this with a debugger: attach to the problematic process, set a breakpoint in DLL_PROCESS_DETACH, and when the breakpoint hits, look through the stacks of the other threads for Mine_WaitForSingleObject.
I'm not sure how to fix it.
But one thing you might try is to enumerate the threads and call DetourUpdateThread() for every thread of the process. That way, Detours may be able to do something about it.
You are detouring a function that almost any process uses. And it is particularly dangerous since it is very likely that such a process has a call on that function active. A blocking call in almost any case. As soon as it unblocks, the code will resume into your detour that is no longer there.
Realistically, the only way to unload your detour is by logging out so that every process that could have been detoured is no longer running.

Multithreading with _beginthread and CreateThread

I am trying to write a multithreaded Win32 application in C++, but I have run into difficulties.
One of the window procedures creates a thread that manages the output of its window. When this window procedure receives a message (from the other window procedures), it should forward it to that thread. At first I used the _beginthread(...) function, which didn't work.
Then I tried the CreateThread(...) function, and it worked. What did I do wrong?
(My English isn't very good; I hope you understand my problem.)
Code with CreateThread(...):
DWORD thHalloHandle; // global
HWND hwndHallo; // Hwnd of WndProc4
LRESULT APIENTRY WndProc4 (HWND hwnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    static PARAMS params ;
    switch (message)
    {
    case WM_CREATE: {
        params.hwnd = hwnd ;
        params.cyChar = HIWORD (GetDialogBaseUnits ()) ;
        CreateThread(NULL, 0, thHallo, &params, 0, &thHalloHandle);
        return 0 ;
    }
    case WM_SPACE: {
        PostThreadMessage(thHalloHandle, WM_SPACE, 0, 0);
        return 0;
    }
    // ...
    }
}
Code with _beginthread(...):
    case WM_CREATE: {
        params.hwnd = hwnd ;
        params.cyChar = HIWORD (GetDialogBaseUnits ()) ;
        thHalloHandle = (DWORD)_beginthread (thHallo, 0, &params) ;
        return 0;
    }
    case WM_SPACE: {
        PostThreadMessage(thHalloHandle, WM_SPACE, 0, 0);
        return 0;
    }
thHallo for CreateThread:
DWORD WINAPI thHallo(void *pvoid)
{
    static TCHAR *szMessage[] = { TEXT(...), ...};
    // Some Declaration
    pparams = (PPARAMS) pvoid;
    MsgReturn = GetMessage(&msg, NULL, 0, 0);
    hdc = GetDC(pparams->hwnd);
    // case....
    return 0;
}
thHallo for _beginthread(...):
void thHallo(void *pvoid)
{
    // The same as for CreateThread
}
The _beginthread/ex() function is proving to be radically difficult to eliminate. It was necessary back in the previous century, VS6 was the last Visual Studio version that required it. It was a band-aid to allow the CRT to allocate thread-local state for internal CRT variables. Like the ones used for strtok() and gmtime(), CRT functions that maintain internal state. That state must be stored separately for each thread so that the use of, say, strtok() in one thread doesn't screw up the use of strtok() in another thread. It must be stored in thread-local state. _beginthread/ex() ensures that this state is allocated and cleaned-up again.
That has been worked on, necessarily so when Windows 2000 introduced the thread-pool. There is no possible way to get that internal CRT state initialized when your code gets called by a thread-pool thread. Quite an effort btw, the hardest problem they had to solve was to ensure that the thread-local state is automatically getting cleaned-up again when the thread stops running. Many a program has died on that going wrong, Apple's QuickTime is a particularly nasty source of these crashes.
So forget that _beginthread() ever existed, using CreateThread() is fine.
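To illustrate why functions like strtok() need per-thread state, here is a minimal portable sketch. It is not how the CRT is actually implemented; next_token and split are hypothetical helpers, and the C++11 thread_local keyword stands in for the thread-local storage the CRT manages internally.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// strtok-like helper. Pass the text on the first call and nullptr afterwards.
// The cursor is thread_local, so each thread remembers its own position.
bool next_token(const char* text, char delim, std::string& token) {
    thread_local const char* cursor = nullptr;
    if (text) cursor = text;
    if (!cursor) return false;
    while (*cursor == delim) ++cursor;            // skip leading delimiters
    if (*cursor == '\0') { cursor = nullptr; return false; }
    const char* start = cursor;
    while (*cursor != '\0' && *cursor != delim) ++cursor;
    token.assign(start, cursor);                  // copy [start, cursor)
    return true;
}

// Collects all tokens of `text` using the stateful helper above.
std::vector<std::string> split(const char* text, char delim) {
    std::vector<std::string> parts;
    std::string token;
    for (bool ok = next_token(text, delim, token); ok;
         ok = next_token(nullptr, delim, token))
        parts.push_back(token);
    return parts;
}
```

If cursor were an ordinary static variable, two threads tokenizing different strings at the same time would overwrite each other's position. With thread_local, the state is created and destroyed per thread, whichever API started the thread.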
There's a serious problem with your use of PostThreadMessage(). You used the wrong argument in your _beginthread() code, which is why it didn't work. But there are bigger problems with it. The message that is posted can only ever be retrieved in your message loop. Which works fine, until it is no longer your message loop that is dispatching messages. That happens in many cases in a GUI app. Simple examples are using MessageBox(), DialogBox() or the user resizing the window. Modal code that works by Windows itself pumping the message loop.
A big problem is the message loop in that code knows beans about the messages you posted. They just fall in the bit-bucket and disappear without trace. The DispatchMessage() call inside that modal loop fails, the message you posted has a NULL window handle.
You must fix this by using PostMessage() instead. Which requires a window handle. You can use any window handle, the handle of your main window is a decent choice. Better yet, you can create a dedicated window, one that just isn't visible, with its own WndProc() that just handles these inter-thread messages. A very common choice. DispatchMessage() can now no longer fail, solves your bug as well.
Your call to CreateThread puts the thread ID into thHalloHandle. The call to _beginthread puts the thread handle into thHalloHandle.
Now, the thread ID is not the same as the thread handle. When you call PostThreadMessage you do need to supply a thread ID. You only do that for the CreateThread variant which I believe explains the problem.
Your code lacks error checking. Had you checked for errors on the call to PostThreadMessage you would have found that PostThreadMessage returned FALSE. Had you then gone on to call GetLastError that would have returned ERROR_INVALID_THREAD_ID. I do urge you to include proper error checking.
In order to address this you must first be more clear on the difference between thread ID and thread handle. You should give thHalloHandle a different name: thHalloThreadId perhaps. If you wish to use _beginthread you will have to call GetThreadId, passing the thread handle, to obtain the thread ID. Alternatively, use _beginthreadex which yields the thread ID, or indeed CreateThread.
Your problem is that you need a TID (thread identifier) to use PostThreadMessage.
_beginthread doesn't return a TID; it returns a thread handle.
The solution is to use the GetThreadId function.
HANDLE hThread = (HANDLE)_beginthread (thHallo, 0, &params) ;
thHalloHandle = GetThreadId( hThread );
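Better code (see the _beginthreadex documentation); note that thHallo must then be declared as unsigned __stdcall thHallo(void *), and the last parameter is an unsigned *:
HANDLE hThread = (HANDLE)_beginthreadex(NULL, 0, thHallo, &params, 0, (unsigned *)&thHalloHandle ) ;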