Unloading an Injected DLL - c++

I have a DLL I inject into other processes using SetWindowsHookEx. Inside the DLL I increment the module's reference counter by calling GetModuleHandleEx so I can control when the module is unloaded.
At this point the module reference count "should be" 2 from both of those API calls. When the calling process shuts down, it calls UnhookWindowsHookEx, decrementing the reference count to 1. The DLL has a thread that waits on a few things, one of them being the handle of the process that called SetWindowsHookEx. When the process goes away the DLL does some cleanup, terminates all threads, cleans up memory and handles and then calls FreeLibraryAndExitThread. This decrements the counter and the DLL gets unloaded.
Here's my problem. There are a few processes, especially those without a UI, where the DLL never gets unloaded. I'm pretty confident I have cleaned up everything. And I know for a fact that none of my threads are running.
First of all, if you have any troubleshooting tips to help uncover the cause, that would be helpful. Otherwise, I was thinking about using some API like NtQueryInformationProcess to get the module address and confirm the module handle count is in fact zero, then call CreateRemoteThread to inject a call to LdrUnloadDll to unload the module address from within the process. What are your thoughts to this approach? Does anyone have any example code? I'm having difficulty finding out how to get the module handle count.

Okay.. here goes.. There are many ways to get the module info from a process. The undocumented way and the "documented" way.
Results (documented):
Here is the "documented" way..
#include <windows.h>
#include <TlHelp32.h>
#include <iostream>
#include <sstream>
int strcompare(const char* One, const char* Two, bool CaseSensitive)
{
#if defined _WIN32 || defined _WIN64
return CaseSensitive ? strcmp(One, Two) : _stricmp(One, Two);
#else
return CaseSensitive ? strcmp(One, Two) : strcasecmp(One, Two);
#endif
}
PROCESSENTRY32 GetProcessInfo(const char* ProcessName)
{
void* hSnap = nullptr;
PROCESSENTRY32 Proc32 = {0};
if ((hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0)) == INVALID_HANDLE_VALUE)
return Proc32;
Proc32.dwSize = sizeof(PROCESSENTRY32);
while (Process32Next(hSnap, &Proc32))
{
if (!strcompare(ProcessName, Proc32.szExeFile, false))
{
CloseHandle(hSnap);
return Proc32;
}
}
CloseHandle(hSnap);
Proc32 = { 0 };
return Proc32;
}
MODULEENTRY32 GetModuleInfo(std::uint32_t ProcessID, const char* ModuleName)
{
void* hSnap = nullptr;
MODULEENTRY32 Mod32 = {0};
if ((hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, ProcessID)) == INVALID_HANDLE_VALUE)
return Mod32;
Mod32.dwSize = sizeof(MODULEENTRY32);
while (Module32Next(hSnap, &Mod32))
{
if (!strcompare(ModuleName, Mod32.szModule, false))
{
CloseHandle(hSnap);
return Mod32;
}
}
CloseHandle(hSnap);
Mod32 = {0};
return Mod32;
}
std::string ModuleInfoToString(MODULEENTRY32 Mod32)
{
auto to_hex_string = [](std::size_t val, std::ios_base &(*f)(std::ios_base&)) -> std::string
{
std::stringstream oss;
oss << std::hex << std::uppercase << val;
return oss.str();
};
std::string str;
str.append(" =======================================================\r\n");
str.append(" Module Name: ").append(Mod32.szModule).append("\r\n");
str.append(" =======================================================\r\n\r\n");
str.append(" Module Path: ").append(Mod32.szExePath).append("\r\n");
str.append(" Process ID: ").append(std::to_string(Mod32.th32ProcessID).c_str()).append("\r\n");
str.append(" Load Count (Global): ").append(std::to_string(static_cast<int>(Mod32.GlblcntUsage != 0xFFFF ? Mod32.GlblcntUsage : -1)).c_str()).append("\r\n");
str.append(" Load Count (Process): ").append(std::to_string(static_cast<int>(Mod32.ProccntUsage != 0xFFFF ? Mod32.ProccntUsage : -1)).c_str()).append("\r\n");
str.append(" Base Address: 0x").append(to_hex_string(reinterpret_cast<std::size_t>(Mod32.modBaseAddr), std::hex).c_str()).append("\r\n");
str.append(" Base Size: 0x").append(to_hex_string(Mod32.modBaseSize, std::hex).c_str()).append("\r\n\r\n");
str.append(" =======================================================\r\n");
return str;
}
int main()
{
PROCESSENTRY32 ProcessInfo = GetProcessInfo("notepad.exe");
MODULEENTRY32 ME = GetModuleInfo(ProcessInfo.th32ProcessID, "uxtheme.dll");
std::cout<<ModuleInfoToString(ME);
}
The problem with the undocumented API is that I've never figured out why the load counts are always "6" for dynamic modules and "-1" for static modules.. For this reason, I will not post it..
It is BEST NOT to use undocumented API if you want just the load count. The undocumented API's only advantage is that you can use it to "un-link/hide" a module within a process (like viruses do).. It will "unlink/hide" it.. NOT "unload" it. This means that at any time, you can "re-link" it back into the process's module list.
Since you only need the module-reference-count, I've only included "documented" API which does exactly that.

I found the cause of the problem and the solution. I honestly feel quite stupid for missing it and struggling with it for so long.
As I mentioned in the original problem, the processes that are problematic do not have a UI. Turns out they do have a message pump running. The problem is nothing is sending messages to these processes without a UI after we call UnhookWindowsHookEx that would trigger the unloading. (In fact, I believe MSDN does state that window messages are not sent to processes when calling UnhookWindowsHookEx.)
By broadcasting WM_NULL to all processes after the injecting process calls UnhookWindowsHookEx the message pump wakes up in the injected processes and the DLL reference count is decremented. The DLL is unloaded immediately when the injected DLL finally calls FreeLibraryAndExitThread.
This is only part of the solution. If the injecting process is killed or crashes, no message is broadcasted so the DLL does not get unloaded from processes that do not have a UI. As I mention before I have a thread running in the DLL that waits on the injecting process handle. When the injecting process ends the DLL is signaled and then calls PostThreadMessage to send WM_NULL to each thread in the process. Then it waits until the DLL reference count is decremented before continuing and cleaning up before calling FreeLibraryAndExitThread. As a result, the DLL is unloaded almost immediately from all processes, UI or no UI.

Related

Replacing vector with array or static vector for a better approach

My application runs particular number of processes(different executables) using createprocess(windows api) in parallel(using threads). User can refresh / close my application at any time. As of now, I am pushing the process handles into vector and whenever close request received, I am iterating the vector and terminating(using GetExitCodeProcess and TerminateProcess APIs) and closing(using CloseHandle API) the process handles. Also I am closing the handle of the process when it is completed. The problem with current model is, whenever process completed handle will be closed and when close request received again I will try to close it using vector(handle is not updated). To solve this, I have to update/remove the handle in/from the vector. To do this, need to maintain index.
Since I know the number of process, I want to create a static vector and update it rather than pushing a local object to a vector. Can someone suggest a best approach.
Below is the sample code.
//member object
std::vector<PROCESS_INFORMATION> mProcessHandles;
//this is a thread and will be called multiple times with different executable names in the application
void method(std::string executable)
{
STARTUPINFO startInfo{};
PROCESS_INFORMATION procInfo{};
bool ret = CreateProcess(NULL, executable, NULL, NULL, TRUE, CREATE_NO_WINDOW, NULL, NULL, &startInfo, &procInfo);
mProcessInfo.push_back(procInfo);
if(ret)
{
WaitForSingleObject(procInfo.hProcess, INFINITE);
CloseHandle(procInfo.hProcess);
procInfo.hProcess = NULL;
CloseHandle(procInfo.hThread);
procInfo.hThread = NULL;
}
return;
}
//this will be called when application close requested
void forceKill()
{
for (auto &processHandlesIt : mProcessHandles)
{
DWORD errorcode = 0;
GetExitCodeProcess(processHandlesIt.hProcess, &errorcode);
if (errorcode == STILL_ACTIVE)
{
TerminateProcess(processHandlesIt.hProcess, errorcode);
}
CloseHandle(processHandlesIt.hProcess);
processHandlesIt.hProcess = NULL;
CloseHandle(processHandlesIt.hThread);
processHandlesIt.hThread = NULL;
}
}
You should not use handles (in GetExitCodeProcess for example) after they are closed.
I would simply not close those process handles in the threads, and just leave them for the forceKill or other clean-up function to close.
Also, since you are not using procInfo.hThread, you could close it right after CreateProcess returns.
I guess you are not using any other members of the procInfo, so you could only store the process' handles in your vector.

Ejecting dll by calling CreateRemoteThread : crash

I am trying to make for myself a tool for extracting/releasing dlls from processes. I have already experienced with LoadLibrary and injecting but this time the logic doesn't seem to apply.
This is my code:
HMODULE findModuleOffset(HANDLE proc, char *mod_name) {
//Finds module address in specified process. 0 if not found
HMODULE hMods[2048];
DWORD modules_byte_size;
if (EnumProcessModules(proc, hMods, sizeof(hMods), &modules_byte_size))
{
for (unsigned long i = 0; i < (modules_byte_size / sizeof(HMODULE)); i++) {
CHAR module_name[MAX_PATH];
// Get the full path to the module's file.
if (GetModuleFileNameExA(proc, hMods[i], module_name, sizeof(module_name))) {
if (strcmp(strrchr(module_name,'.')+1,"exe")!=0 && compareExeName(module_name, mod_name)) {
return hMods[i];
}
}
}
}
return 0;
}
bool compareExeName(char *path, char *partial_name) {
//This will substract the filename from path and compare it with partial_name
char *lastSlash = strrchr(path, '\\') + 1;
if (lastSlash != NULL && strstr(lastSlash, partial_name) == lastSlash) return 1;
return 0;
}
void unload_all_dll(char *dll_name) {
DWORD process_ids[2048];
DWORD process_byte_size; //size of filled process_ids in BYTES (after the call)
DWORD process_count; //count of all elements in process_ids
HMODULE ext_dll_module;
HANDLE opened_process;
HANDLE Hthread;
DWORD thread_exit_code = 1;
CHAR exe_path[1024];
if (EnumProcesses(process_ids, sizeof(process_ids), &process_byte_size)) {
process_count = process_byte_size / sizeof(DWORD);
for (int i = 0; i < process_count; i++) {
thread_exit_code = 0;
if ((opened_process = OpenProcess(PROCESS_ALL_ACCESS, false, process_ids[i])) == NULL) continue;
GetModuleFileNameExA(opened_process, 0, exe_path, MAX_PATH);
if ((ext_dll_module = findModuleOffset(opened_process, dll_name)) != 0) {
while (thread_exit_code == 0) {
if ((Hthread = CreateRemoteThread(opened_process, NULL, 0, (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("kernel32.dll"), "FreeLibrary"), (void*)ext_dll_module, 0, NULL)) == NULL) {
cout<<"Process closed meanwhile or dll unloaded";
break; //process has closed meanwhile
}
while (WaitForSingleObject(Hthread, 1000) == WAIT_TIMEOUT);
GetExitCodeThread(Hthread, &thread_exit_code);
}
cout << "Dll unloaded from " << exe_path << endl;
}
}
}
}
Warning:some variables names might be confusing(I am in hurry)
But every time I try to eject a dll everything crashes(of course,only the apps that contained the specfied dll). I tested everything I could and everything seems fine: the module address returned by findModuleOffset is good(checked against the value given by process explorer). I have no ideea what the return value of createremotethread or thread_exit_code is because the app crashes(it contasins the dll to be ejected..so...). Can you help me?
(moving from the comments)
Given that there are threads in the target process that are running code from the dll being unloaded, they are going to crash immediately after the dll gets freed - after all, the very pages of code that the CPU is executing are being unmapped!
To avoid the problem, the running threads have to be notified in some way, so they can terminate before unloading the dll; Windows provides many IPC methods, one rarely used is particularly well-fitted for this case, namely mailslots.
When the dll is injected, the "master" thread that gets created will create a mailslot with a well-known name, and periodically check if there's any message for him. When you want to unload the dlls, instead of brutally injecting a thread that forcefully frees the dll, just ask to your "inside man": post a message to the mailslot asking it to terminate1.
The thread will see that there's a message in the mailslot, take care to possibly terminate the other threads that were started inside the target process (a shared atomic variable + WaitForSingleObject can be used) and, when the cleanup is finished, call FreeLibraryAndExitThread to suicide both the last thread and the dll.
Notes
A particularly interesting peculiarity of mailslots is that, if they are created multiple times with the same name, messages sent to such a name will be delivered to all them, so if, as it seems, you want to shut down all the injected dlls at the same time, this greatly simplifies the controlling program - there's not even need to enumerate the running processes.

Crashes after Injecting std functions in a process

i am currently trying out PE injection and noticed that as soon as i use stuff like std::cout or std::string my target process which i injected in crashes.
Messageboxes or even printf() works fine. The code compiles without an error and i read about the import table not being at the same location in the injected process could cause it to crash but i have no idea what to do in order to fix it (re load the import table). Thanks in advance and here is the injection example:
#include <iostream>
#include <stdio.h>
#include <Windows.h>
void ThreadProc(PVOID p)
{
MessageBox(NULL,"Message from injected code!","Message",MB_ICONINFORMATION); //funktioniert einwandfrei
RedirectOutput();
std::cout << "hi"; //crashed
}
int main(int argc,char* argv[])
{
PIMAGE_DOS_HEADER pIDH;
PIMAGE_NT_HEADERS pINH;
PIMAGE_BASE_RELOCATION pIBR;
HANDLE hProcess,hThread;
PUSHORT TypeOffset;
PVOID ImageBase,Buffer,mem;
ULONG i,Count,Delta,*p;
printf("\nOpening target process\n");
hProcess=OpenProcess(
PROCESS_CREATE_THREAD|PROCESS_QUERY_INFORMATION|PROCESS_VM_OPERATION|PROCESS_VM_READ|PROCESS_VM_WRITE,
FALSE,
13371337);
if(!hProcess)
{
printf("\nError: Unable to open target process (%u)\n",GetLastError());
return -1;
}
ImageBase=GetModuleHandle(NULL);
printf("\nImage base in current process: %#x\n",ImageBase);
pIDH=(PIMAGE_DOS_HEADER)ImageBase;
pINH=(PIMAGE_NT_HEADERS)((PUCHAR)ImageBase+pIDH->e_lfanew);
printf("\nAllocating memory in target process\n");
mem=VirtualAllocEx(hProcess,NULL,pINH->OptionalHeader.SizeOfImage,MEM_COMMIT|MEM_RESERVE,PAGE_EXECUTE_READWRITE);
if(!mem)
{
printf("\nError: Unable to allocate memory in target process (%u)\n",GetLastError());
CloseHandle(hProcess);
return 0;
}
printf("\nMemory allocated at %#x\n",mem);
Buffer=VirtualAlloc(NULL,pINH->OptionalHeader.SizeOfImage,MEM_COMMIT|MEM_RESERVE,PAGE_READWRITE);
memcpy(Buffer,ImageBase,pINH->OptionalHeader.SizeOfImage);
printf("\nRelocating image\n");
pIBR=(PIMAGE_BASE_RELOCATION)((PUCHAR)Buffer+pINH->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);
Delta=(ULONG)mem-(ULONG)ImageBase;
printf("\nDelta: %#x\n",Delta);
while(pIBR->VirtualAddress)
{
if(pIBR->SizeOfBlock>=sizeof(IMAGE_BASE_RELOCATION))
{
Count=(pIBR->SizeOfBlock-sizeof(IMAGE_BASE_RELOCATION))/sizeof(USHORT);
TypeOffset=(PUSHORT)(pIBR+1);
for(i=0;i<Count;i++)
{
if(TypeOffset[i])
{
p=(PULONG)((PUCHAR)Buffer+pIBR->VirtualAddress+(TypeOffset[i] & 0xFFF));
*p+=Delta;
}
}
}
pIBR=(PIMAGE_BASE_RELOCATION)((PUCHAR)pIBR+pIBR->SizeOfBlock);
}
printf("\nWriting relocated image into target process\n");
if(!WriteProcessMemory(hProcess,mem,Buffer,pINH->OptionalHeader.SizeOfImage,NULL))
{
printf("\nError: Unable to write process memory (%u)\n",GetLastError());
VirtualFreeEx(hProcess,mem,0,MEM_RELEASE);
CloseHandle(hProcess);
return -1;
}
VirtualFree(Buffer,0,MEM_RELEASE);
printf("\nCreating thread in target process\n");
hThread=CreateRemoteThread(hProcess,NULL,0,(LPTHREAD_START_ROUTINE)((PUCHAR)ThreadProc+Delta),NULL,0,NULL);
if(!hThread)
{
printf("\nError: Unable to create thread in target process (%u)\n",GetLastError());
VirtualFreeEx(hProcess,mem,0,MEM_RELEASE);
CloseHandle(hProcess);
return -1;
}
printf("\nWaiting for the thread to terminate\n");
WaitForSingleObject(hThread,INFINITE);
printf("\nThread terminated\n\nFreeing allocated memory\n");
VirtualFreeEx(hProcess,mem,0,MEM_RELEASE);
CloseHandle(hProcess);
return 0;
}
I think that answer simple - STL library request some initializations of global data. Via constructors of global objects, for example. But you just copy your code to target process. It don't invoke any initializations, that normally performed before call main function. Just try DLL injection instead.
You don't seem to be loading the CRT dll in the target process, so what I'm assuming is that when you try to call the cout function, you are jumping to unallocated memory.
If the DLL is in fact loaded in the target process, make sure it's loaded at the same address as it is in your own process. Otherwise you'll have to patch your import table to match that of the target process.

Getting Base Address not working

I need the base Address of the exe "tibia.exe". This is what I got so far but it doesn't work. It always returns 0.
What's wrong?
DWORD MainWindow::getBaseAddress(DWORD dwProcessIdentifier)
{
TCHAR lpszModuleName[] = {'t','i','b','i','a','.','e','x','e','\0'}; //tibia.exe
HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE,
dwProcessIdentifier);
DWORD dwModuleBaseAddress = 0;
if(hSnapshot != INVALID_HANDLE_VALUE)
{
MODULEENTRY32 ModuleEntry32;
ModuleEntry32.dwSize = sizeof(MODULEENTRY32);
if(Module32First(hSnapshot, &ModuleEntry32))
{
do
{
if( wcscmp(ModuleEntry32.szModule, lpszModuleName) == 0)
{
dwModuleBaseAddress = (DWORD)ModuleEntry32.modBaseAddr;
break;
}
}
while(Module32Next(hSnapshot, &ModuleEntry32));
}
CloseHandle(hSnapshot);
}
return dwModuleBaseAddress;
}
//Call it here
tibiaWindow = FindWindow( L"TibiaClient", NULL);
DWORD PID;
GetWindowThreadProcessId( tibiaWindow, &PID );
DWORD baseAddress = getBaseAddress( PID );
if( baseAddress == 0 )
return false ;
Perhaps it's just because I was using them before ToolHelp32 was available (at least on the NT-based operating systems), but I tend to use the PSAPI functions for this kind of task. Using them, the code would look like this:
#include <windows.h>
#include <string>
#include <psapi.h>
#include <iostream>
int main(int argc, char **argv) {
HANDLE process = GetCurrentProcess();
if (argc != 1)
process = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, false, atoi(argv[1]));
HMODULE handles[2048];
DWORD needed;
EnumProcessModules(process, handles, sizeof(handles), &needed);
for (int i = 0; i < needed / sizeof(handles[0]); i++) {
MODULEINFO info;
char name[1024];
GetModuleBaseName(process, handles[i], name, sizeof(name));
if (std::string(name).find(".exe") != std::string::npos) {
GetModuleInformation(process, handles[i], &info, sizeof(info));
std::cout << name << ": " << info.lpBaseOfDll << "\n";
break;
}
}
}
As it stands right now, this will let you enter a process ID on the command line, and show the load address of the first module it finds in that process with a name that includes ".exe". If you don't specify a process ID, it'll search through its own process (demos how the functions work, but otherwise pretty much useless).
Using either ToolHelp32 or PSAPI, you end up with a similar limitation: you need to compile this into a 64-bit executable for it to be able to "see" other 64-bit processes (i.e., when compiled as 32-bit code, they see only other 32-bit processes).
There are also some processes (e.g., CSRSS.exe) that neither will be able to open/enumerate successfully. As far as I know, the same processes will succeed/fail with PSAPI vs. ToolHelp32.
PSAPI does have one bit of clumsiness compared to ToolHelp32: dealing (well) with processes that have lots of modules is clumsy (at best). You call EnumProcessModules, and if you haven't given room for enough modules, the "Needed" parameter will be set to the space needed for the number of modules it contains. There's a race condition though: between the time that returns and the time you call EnumProcessModules again, the process could have loaded more DLLs, so that second call could fail the same way.
For the moment, I've just assumed that no process will use more than 2048 modules. To be really correct, you should have a while loop (or maybe a do/while loop) that starts with zero space, calls EnumProcessModules to find out how much space is needed, allocate that (perhaps with a little extra in case it loads more DLLs) and repeat until it succeeds.

Close handle to a mutex in another process

I want to close a handle to a mutex located in another process, so I can run more than one instance of the application.
I already know this can be done, see Process Explorer. Example: Windows Minesweeper (Windows 7) uses a mutex to only allow one game, so I thought I would use it as an example since it's pre-installed with Windows and therefore easier for you guys to guide me.
The mutex that I need to close is \Sessions\1\BaseNamedObjects\Oberon_Minesweeper_Singleton, which I found using Process Explorer.
After closing this mutex I was able to launch two games of Minesweeper, but I want to do this in my program using C++.
After some searching I have found that I might need the API DuplicateHandle. So far I haven't been able to close the handle on this mutex.
Here is my code so far:
#include <Windows.h>
#include <iostream>
using namespace std;
void printerror(LPSTR location){
printf("Error: %s_%d", location, GetLastError());
cin.get();
}
int main(){
DWORD pid = 0;
HWND hMineWnd = FindWindow("Minesweeper", "Minesveiper");
GetWindowThreadProcessId(hMineWnd, &pid);
HANDLE hProc =OpenProcess(PROCESS_DUP_HANDLE, 0, pid);
if(hProc == NULL){
printerror("1");
return 1;
}
HANDLE hMutex = OpenMutex(MUTEX_ALL_ACCESS, TRUE, "Oberon_Minesweeper_Singleton");
if(hMutex == NULL){
printerror("2");
return 2;
}
if(DuplicateHandle(hProc, hMutex, NULL, 0, 0, FALSE, DUPLICATE_CLOSE_SOURCE) == 0){
printerror("3");
return 3;
}
if(CloseHandle(hMutex) == 0){
printerror("4");
return 4;
}
return 0;
}
This code returns 0, but the mutex is still there, and I am not able to launch more games of Minesweeper. I think some of my parameters to DuplicateHandle are wrong.
The second argument to DuplicateHandle expects "an open object handle that is valid in the context of the source process", however I believe the handle you're passing in would only be valid within the current process (OpenMutex creates a new handle to an existing mutex object). You'll likely need to determine what the mutex's handle is in the remote process, and use that value when calling DuplicateHandle.