threading in someone else's address space - c++

I'm building a monitor app and am having some threading issues.
Using a CBT hook, I have injected a DLL into another process's memory. I am reading the other application's memory at certain addresses. The trouble is that I was using a loop to watch the process, so the application being watched wasn't free to carry on. So I thought I would put my watch code in a thread. I am using the code below to create the thread:
void readAddresses(DWORD addr)
{
LPDWORD dwThreadID;
HANDLE hThread = CreateThread(NULL,0,ThreadProc,&addr,0,dwThreadID);
}
I tried CreateRemoteThread(...) as well and got the same error. With the thread running, the call to the ReadProcessMemory() API fails, and I am not really sure what I am doing wrong.
//going to pass in an address, dword
DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
DWORD pid;
GetWindowThreadProcessId(targetWindow,&pid);
HANDLE hProcess = ::OpenProcess(PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION |
PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_VM_READ,
FALSE, pid);
...
ReadProcessMemory(hProcess,(void *)_start, data, 255, &lpRead);
...
}
The trouble is that when I call ReadProcessMemory I now get an access violation. What I am curious about is whether the thread is operating in the same address space as the process into which the DLL has been injected. As I said, without the thread code it works fine, but I need the monitor code to run in the background and I am wondering how to achieve this. Should I use CreateRemoteThread?
As Remus says, use _beginthread() or _beginthreadex()...
Thanks

One thing is sure: addresses to read and write are definitely not of type DWORD. From the code above, it seems that you pass a DWORD addr as the address to read from, then start a thread to which you pass the address of your local addr parameter. Most likely the thread proc then attempts to read from the address where the addr parameter once lived on the original thread's stack in the current process (a meaningless address by now, in any process), and the result is random (sometimes you will hit the jackpot and read some innocent victim location in the remote process).
Pass in the address to read as a proper address (LPVOID). DWORD cannot be right.
Pass the background thread the address you want to read, not some local stack frame garbage it cannot use.
void readAddresses(LPVOID addr)
{
DWORD dwThreadID = 0; // CreateThread writes the new thread's ID here
HANDLE hThread = CreateThread(NULL,0,myThreadProc,addr,0,&dwThreadID);
}
DWORD WINAPI myThreadProc(LPVOID addr)
{
...
ReadProcessMemory (..., addr);
}
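The lifetime problem is not specific to CreateThread. As a portable illustration (a sketch that uses std::thread instead of the Win32 call; worker and start_reader are hypothetical names), the target address travels to the thread by value, so nothing the thread relies on lives in the caller's stack frame:

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical worker: receives the target address by value, packed in a
// pointer-sized parameter, like the LPVOID argument of a thread proc.
// It only reports the address it was given; it does not dereference it.
std::uintptr_t worker(void* addr)
{
    return reinterpret_cast<std::uintptr_t>(addr);
}

// Starts the worker on a new thread and returns the address it saw.
std::uintptr_t start_reader(std::uintptr_t target)
{
    std::uintptr_t seen = 0;

    // The pointer value itself is copied into the thread. Passing &target
    // instead would hand the thread a pointer into this stack frame.
    // Capturing `seen` by reference is safe only because we join below.
    std::thread t([&seen](void* addr) { seen = worker(addr); },
                  reinterpret_cast<void*>(target));
    t.join();
    return seen;
}
```

Calling start_reader(0x1000) hands the worker the value 0x1000 itself, not the location of a variable that happens to contain it.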

Related

Resume completion port notification after they were stopped

In the MSDN doc for the lpOverlapped parameter of GetQueuedCompletionStatus it is said that the application can prevent completion port notification by setting the low-order bit of the hEvent member of the OVERLAPPED structure. But is it possible to resume the notifications after they were stopped?
I need to use this for monitoring network folders for changes:
When GetQueuedCompletionStatus returns FALSE and GetLastError() returns ERROR_NETNAME_DELETED, I do this (works):
di->Overlapped.hEvent = CreateEvent( NULL, FALSE, FALSE, di->lpszDirName );
reinterpret_cast<uintptr_t &>(di->Overlapped.hEvent) |= 0x1;
And when the network problem was resolved, I tried to do the reverse operation - but it DID NOT work:
reinterpret_cast<uintptr_t &>(di->Overlapped.hEvent) &= ~(0x1);
(It would be good if the solution were compatible with Windows 7.)
First of all, completion port notification cannot be "suspended" or "resumed". As the MSDN documentation puts it:
Even if you have passed the function a file handle associated with a
completion port and a valid OVERLAPPED structure, an application can
prevent completion port notification. This is done by specifying a
valid event handle for the hEvent member of the OVERLAPPED structure,
and setting its low-order bit. A valid event handle whose low-order
bit is set keeps I/O completion from being queued to the completion
port.
This means the following: when we call a Win32 I/O API that takes a pointer to OVERLAPPED as an in/out parameter (such as ReadFile, ReadDirectoryChangesW, LockFileEx, etc.) and the file handle passed to it is associated with a completion port, we can still prevent the completion port notification for that call by supplying an event handle with its low-order bit set. This applies only to that particular API call and does not affect any other call. None of this is related to GetQueuedCompletionStatus.
(Strictly speaking, we can simply pass 1 in place of hEvent too. But then the question is how we get notified that the I/O has completed if the API returns a pending status. It is possible to wait on the file handle alone and then call GetOverlappedResult, but this is correct only if no other I/O is in flight on this file concurrently.)
In any case, you need to understand how this works internally. All native I/O APIs have the following signature:
NTSTATUS NTAPI SomeIoApi(
_In_ HANDLE FileHandle,
_In_opt_ HANDLE Event,
_In_opt_ PIO_APC_ROUTINE ApcRoutine,
_In_opt_ PVOID ApcContext,
_Out_ PIO_STATUS_BLOCK IoStatusBlock,
...
);
All of them start with these five common parameters. For an I/O completion to be queued as a result of the call, several conditions must be met. FileHandle must, of course, be associated with some completion port (that is the port the packet is sent to). The other mandatory condition is that ApcContext is not zero (ApcContext != 0). If these two conditions are met and the device does not return an error status (if FILE_SKIP_COMPLETION_PORT_ON_SUCCESS is set on the file, the status must be pending), then when the I/O completes, the ApcContext pointer is pushed to the port. It can then be removed by
NTSTATUS
NTAPI
NtRemoveIoCompletion(
_In_ HANDLE IoCompletionHandle,
_Out_ PVOID *KeyContext,
_Out_ PVOID *ApcContext,
_Out_ PIO_STATUS_BLOCK IoStatusBlock,
_In_opt_ PLARGE_INTEGER Timeout
);
or by its Win32 wrapper, GetQueuedCompletionStatus.
So the way to avoid sending a packet to the port (even if the file handle is associated with a completion port) is to set ApcContext = 0. The Win32 layer does this as follows (pseudo-code):
BOOL WINAPI SomeWin32Api(
HANDLE FileHandle,
LPOVERLAPPED lpOverlapped,
LPOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine
)
{
HANDLE hEvent = lpOverlapped->hEvent;
PVOID ApcContext = lpOverlapped;
if ((ULONG_PTR)hEvent & 1)
{
reinterpret_cast<uintptr_t&>(hEvent) &= ~1;
ApcContext = 0;
}
NTSTATUS status = SomeIoApi(
FileHandle,
hEvent,
lpCompletionRoutine, // not exactly, but by sense
ApcContext,
(PIO_STATUS_BLOCK)lpOverlapped,...);
}
It checks the low-order bit of hEvent in the OVERLAPPED: if it is set, 0 is passed in place of ApcContext; otherwise lpOverlapped (the pointer to the OVERLAPPED) is passed as the context (ApcContext = lpOverlapped;).
Note that the NT layer lets you pass any void* pointer as ApcContext, but the Win32 layer always passes either a pointer to the OVERLAPPED structure or 0. This is why GetQueuedCompletionStatus returns this pointer back as _Out_ LPOVERLAPPED *lpOverlapped (compare with NtRemoveIoCompletion, which returns it as _Out_ PVOID *ApcContext).
In any case, this trick affects only one particular Win32 I/O call. If you later reset the low-order bit of hEvent in the OVERLAPPED (reinterpret_cast<uintptr_t &>(di->Overlapped.hEvent) &= ~(0x1);), it can no longer have any effect: the 0 has already been passed in place of ApcContext.
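That the bit is examined only once, when the request is submitted, can be modelled without any Windows headers. The following is a sketch only: FakeOverlapped, SubmittedRequest, submit and queues_packet are hypothetical stand-ins that mirror the pseudo-code above, not real API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for OVERLAPPED and for what the native call receives.
struct FakeOverlapped   { void* hEvent; };
struct SubmittedRequest { void* event; void* apcContext; };

// Mirrors the pseudo-code: the low-order bit is read once, at submission
// time, and the outcome is baked into the request that gets issued.
SubmittedRequest submit(FakeOverlapped* ov)
{
    const auto bits = reinterpret_cast<std::uintptr_t>(ov->hEvent);
    SubmittedRequest req;
    if (bits & 1) {
        req.event = reinterpret_cast<void*>(bits & ~std::uintptr_t{1});
        req.apcContext = nullptr;   // no packet will be queued to the port
    } else {
        req.event = ov->hEvent;
        req.apcContext = ov;        // packet carries the OVERLAPPED pointer
    }
    return req;
}

// True if a completion packet would be queued for this request.
bool queues_packet(const SubmittedRequest& req)
{
    return req.apcContext != nullptr;
}
```

Clearing the bit in ov.hEvent after submit() has returned changes nothing for the request already issued; only the next submit() sees the new value.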
Also, in general it is rare to associate a file handle with a completion port and yet not want to use the port for some call; usually that call is to a different API. For example, we can create an asynchronous file handle, associate it with a completion port, and use port notifications for WriteFile calls, but before we begin writing we may set or remove compression on the file via FSCTL_SET_COMPRESSION. Because the file is asynchronous, FSCTL_SET_COMPRESSION can also complete asynchronously, but we may want to prevent the completion port notification for this ioctl and instead wait in place (on an event) for it to complete. This trick can be used in such a situation.
In most cases, applications (unless they are servers with a huge number of I/O requests) can, instead of calling GetQueuedCompletionStatus manually, bind a callback to the file via BindIoCompletionCallback or CreateThreadpoolIo. The system then creates the IOCP and a thread pool that listens on it (via GetQueuedCompletionStatus or NtRemoveIoCompletion) and calls your callback. This greatly simplifies your source code and logic.
Findings:
I am almost sure (even without seeing your code) that you do not need the event low-order bit trick at all.
If you use this trick in some I/O request (say ReadDirectoryChangesW), it affects only that particular request.
You cannot change the behaviour by resetting the low-order bit in the event handle after the request has been sent, or in any other way.
In general you do not need GetQueuedCompletionStatus and your own thread pool at all; instead, simply call BindIoCompletionCallback for the file.

How to obtain handles for all children process of current process in Windows?

For the purposes of performance monitoring on Windows OS, I need a program which can report both user and kernel times for an arbitrary process. On POSIX systems, standard time utility is perfectly OK as it reports wall clock time, user time and kernel time.
For Windows, there is no such utility by default. I looked around and found at least three alternatives. As I explain below, none of them actually suits my needs.
timeit from Windows SDK (cannot recall what exact version). It is no longer distributed, supported, or guaranteed to work on modern systems. I was not able to test it.
Cygwin's time. Almost identical to POSIX counterpart with similar output formatting.
timep.exe by Johnson (John) Hart, available in source code and binaries for his book "Windows System Programming, 4th Edition". This is a pretty simple utility that uses WinAPI's GetProcessTimes() to obtain the very same three values. I suspect that Cygwin's time is no different in that regard.
Now the problem: GetProcessTimes() only reports times for the PID directly spawned by timep, but not its children. This makes both time and timep useless for me.
My target EXE application is typically spawned through a BAT file which invokes one more BAT file; both BATs are meant to tune environment or alter command line arguments:
timep.exe
|
+---wrapper.bat
|
+--- real-wrapper.bat
|
+--- application.exe
Times reported for wrapper.bat alone tell nothing about application.exe.
Obviously, process creation models of POSIX (fork-exec) and Win32 (CreateProcess) are very different, which makes my goal that hard to achieve on Windows.
I want to try to write my own variant of time. It has to sum up times for the given process and all its children, grandchildren etc., recursively. So far I can imagine the following approach:
CreateProcess() and get its PID (root PID) and handle; add this handle to a list
Enumerate all processes in system; for each process
Compare its parent PID with the root PID. If equal, get its PID and handle, and add the handle to the list.
For every new PID, repeat process scan phase to collect more children handles
Recurse down until no new process handles are added to the list
Wait for all collected handles from the list to terminate.
For each handle, call GetProcessTimes() and sum them up
Report results
This algorithm is bad because it is racy — children processes may be created late in the life of any process, or they can terminate before we get a chance to obtain their handle. In both cases, reported result times will be incorrect.
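The scan phase of that approach amounts to a breadth-first walk over (PID, parent PID) pairs. Below is a minimal sketch on plain data; the names Pid, ProcessRow and collect_descendants are hypothetical, and on Windows the rows would have to be filled from a process snapshot, which is not shown. It does nothing to remove the race described above, since it only sees processes that exist in that one snapshot.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <set>
#include <utility>
#include <vector>

using Pid = std::uint32_t;

// One snapshot row: (process id, parent process id).
using ProcessRow = std::pair<Pid, Pid>;

// Breadth-first walk: returns root plus every process whose chain of parent
// ids leads back to root within this single snapshot.
std::set<Pid> collect_descendants(const std::vector<ProcessRow>& snapshot, Pid root)
{
    std::set<Pid> found{root};
    std::queue<Pid> pending;
    pending.push(root);

    while (!pending.empty()) {
        const Pid parent = pending.front();
        pending.pop();
        for (const ProcessRow& row : snapshot) {
            // insert().second is false for pids already seen, which also
            // keeps the loop finite if recycled pids form a cycle.
            if (row.second == parent && found.insert(row.first).second)
                pending.push(row.first);
        }
    }
    return found;
}
```

For rows {10,1}, {20,10}, {30,20}, {40,1}, {50,20} and root 10, the walk collects 10, 20, 30 and 50, and leaves out 40 because its parent is outside the tree.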
My question is: Is there a better solution?
EDIT: I was able to achieve my goal by using Job Objects. Below is a code snippet extracted from my application, relevant to obtaining kernel and user times from a process and all of its children. Hopefully it will save some time for someone.
I tested it with Windows 8.1 x64 and VS 2015, but it should be backwards-portable to at least Windows 7. Some fiddling might be required for 32-bit hosts (I am not sure) in regard to long long types - I am not familiar with CL.EXE's ways of dealing with them on such platforms.
#include <windows.h>
#include <string>
#include <cassert>
#include <iostream>
/* ... */
STARTUPINFO startUp = { sizeof(startUp) }; /* cb must hold the structure size */
PROCESS_INFORMATION procInfo = {};
/* Start program in paused state */
if (!CreateProcess(NULL, CmdParams, NULL, NULL, TRUE,
CREATE_SUSPENDED | NORMAL_PRIORITY_CLASS, NULL, NULL, &startUp, &procInfo)) {
DWORD err = GetLastError();
// TODO format error message
std::cerr << "Unable to start the process: " << err << std::endl;
return 1;
}
HANDLE hProc = procInfo.hProcess;
/* Create job object and attach the process to it */
HANDLE hJob = CreateJobObject(NULL, NULL); // XXX no security attributes passed
assert(hJob != NULL);
int ret = AssignProcessToJobObject(hJob, hProc);
assert(ret);
/* Now run the process and allow it to spawn children */
ResumeThread(procInfo.hThread);
/* Block until the process terminates */
if (WaitForSingleObject(hProc, INFINITE) != WAIT_OBJECT_0) {
DWORD err = GetLastError();
// TODO format error message
std::cerr << "Failed waiting for process termination: " << err << std::endl;
return 1;
}
DWORD exitcode = 0;
ret = GetExitCodeProcess(hProc, &exitcode);
assert(ret);
/* Calculate wallclock time in nanoseconds. FILETIME counts 100-nanosecond ticks,
so scale the difference. Ignore kernel and user times (last two parameters) */
FILETIME createTime, exitTime, unusedTime;
ret = GetProcessTimes(hProc, &createTime, &exitTime, &unusedTime, &unusedTime);
assert(ret);
LONGLONG createTicks = (LONGLONG)createTime.dwHighDateTime << 32 | createTime.dwLowDateTime;
LONGLONG exitTicks = (LONGLONG)exitTime.dwHighDateTime << 32 | exitTime.dwLowDateTime;
LONGLONG wallclockTimeNs = (exitTicks - createTicks) * 100;
/* Get total user and kernel times for all processes of the job object */
JOBOBJECT_BASIC_ACCOUNTING_INFORMATION jobInfo;
ret = QueryInformationJobObject(hJob, JobObjectBasicAccountingInformation,
&jobInfo, sizeof(jobInfo), NULL);
assert(ret);
if (jobInfo.ActiveProcesses != 0) {
std::cerr << "Warning: there are still "
<< jobInfo.ActiveProcesses
<< " alive children processes" << std::endl;
/* We may kill survived processes, if desired */
TerminateJobObject(hJob, 127);
}
/* Get kernel and user times in nanoseconds (the job reports 100-nanosecond ticks) */
LONGLONG kernelTimeNs = jobInfo.TotalKernelTime.QuadPart * 100;
LONGLONG userTimeNs = jobInfo.TotalUserTime.QuadPart * 100;
/* Clean up a bit */
CloseHandle(hProc);
CloseHandle(hJob);
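A note on units: FILETIME values and the job accounting totals count 100-nanosecond ticks, not nanoseconds. The sketch below shows the conversion on plain integers so it can be tried without the Windows headers; FileTimeParts, to_ticks, ticks_to_ns and elapsed_ns are hypothetical names. The difference is taken before scaling because an absolute present-day timestamp multiplied by 100 would not fit in a signed 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two 32-bit halves of a Win32 FILETIME.
struct FileTimeParts {
    std::uint32_t low;
    std::uint32_t high;
};

// Combines the halves into a count of 100-nanosecond ticks.
std::int64_t to_ticks(FileTimeParts ft)
{
    return static_cast<std::int64_t>(
        (static_cast<std::uint64_t>(ft.high) << 32) | ft.low);
}

// One tick is 100 ns. Intended for durations, not absolute timestamps.
std::int64_t ticks_to_ns(std::int64_t ticks)
{
    return ticks * 100;
}

// Elapsed time between two FILETIME values, in nanoseconds.
// Subtracts first, then scales, to stay within the 64-bit range.
std::int64_t elapsed_ns(FileTimeParts start, FileTimeParts end)
{
    return ticks_to_ns(to_ticks(end) - to_ticks(start));
}
```

The same ticks_to_ns scaling applies to TotalKernelTime.QuadPart and TotalUserTime.QuadPart from the job accounting structure.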
Yes, from timep.exe create a job, and use job accounting. Child processes (unless created in their own jobs) share the job with their parent process.
This pretty much skips your steps 2-4
I've packaged the solution for this problem into a standalone program for Windows called chronos. It creates a job object and then spawns a requested process inside it. All the children spawned later stay in the same job object and thus can be accounted later.

Invalid HANDLE from AfxBeginThread

I am writing a multi-threaded networked application, and I'm using a separate thread with a blocking socket to receive data asynchronously from the server.
When I need to shutdown the socket I use a function which checks if the receive thread is still running and if it is calls TerminateThread to end it as follows:
DWORD dwExitCode = 0;
if( GetExitCodeThread( theApp.m_hRecvThread, &dwExitCode ) && dwExitCode == STILL_ACTIVE )
TerminateThread( theApp.m_hRecvThread, 0 );
However, GetExitCodeThread returns FALSE, and GetLastError() then returns 6 (ERROR_INVALID_HANDLE), which suggests that I do not have the THREAD_QUERY_INFORMATION or THREAD_QUERY_LIMITED_INFORMATION access right on the m_hRecvThread handle.
My m_hRecvThread handle is set when creating the thread like so:
m_hRecvThread = AfxBeginThread( RecvThread, hWndMainFrame );
This successfully creates the thread, and the thread is running fine and exhibiting expected functionality. The TerminateThread and GetExitCodeThread are being called from the same thread which created the Receive thread in the first place.
My understanding was that when using AfxBeginThread, the HANDLE returned had THREAD_ALL_ACCESS access rights, is this the case, and if so, why am I still getting ERROR_INVALID_HANDLE?
Thanks in advance!
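The call
m_hRecvThread = AfxBeginThread( RecvThread, hWndMainFrame )
returns a pointer to a CWinThread object, not a thread HANDLE. GetExitCodeThread() requires the handle to the thread, which is the m_hThread member of that object. For example, keep the returned CWinThread* (say pRecvThread, a name used here only for illustration) and pass pRecvThread->m_hThread, which will solve the issue.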

WinHTTP Async Callback

I'm not very good at C++, so if you see something in the code fragment that could be better, please educate me!
I'm implementing WinHTTP in an asynchronous fashion, but I'm having trouble retrieving the response and can't figure it out, because you should be able to parse the whole response at once. Since multiple concurrent requests can occur, buffering the response (headers + body) in a global variable is not the way to go.
How can I retrieve the response of the HTTP GET request? Or else, is it good practice to execute WinHTTP synchronously on a new thread (so the main loop doesn't get blocked) and then call a function when done?
void __stdcall cb(HINTERNET h, DWORD_PTR d, DWORD dwInternetStatus, LPVOID lpvStatusInformation, DWORD dwStatusInformationLength){
char* s=new char[1];
DWORD dwSize = 0;
if (dwInternetStatus==WINHTTP_CALLBACK_STATUS_DATA_AVAILABLE){
MessageBoxA(0,s,"",0);
WinHttpQueryDataAvailable( h, &dwSize);
.....
}
}
And the call in the main:
...winhttpopen...
WinHttpSetStatusCallback(request, (WINHTTP_STATUS_CALLBACK)whCallback,WINHTTP_CALLBACK_FLAG_ALL_NOTIFICATIONS,0);
...winhttpsend....
Check this sample code on MSDN - Asynchronous Completion in WinHTTP.
The call to WinHttpQueryDataAvailable in QueryData generates a status
callback with a WINHTTP_CALLBACK_STATUS_DATA_AVAILABLE completion in
the dwInternetStatus parameter. By checking the value pointed to by
the lpvStatusInformation parameter, the callback can determine how
much data is left to be read, and if there is no remaining data, can
proceed to display all the data that has been read.
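This shows that the callback receives its data through lpvStatusInformation and dwStatusInformationLength: for WINHTTP_CALLBACK_STATUS_DATA_AVAILABLE, lpvStatusInformation points to a DWORD holding the number of bytes available, and for WINHTTP_CALLBACK_STATUS_READ_COMPLETE it is the buffer pointer, with the length of the data in dwStatusInformationLength.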
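On the worry about a global buffer: the callback's DWORD_PTR context parameter can carry a pointer to per-request state, so each request accumulates into its own buffer. The following is a portable sketch of that pattern only. Status, RequestContext and on_status are hypothetical stand-ins, and the real WinHTTP calls (issuing the next read or the next availability query from inside the callback) are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical status codes standing in for the WinHTTP callback statuses.
enum class Status { DataAvailable, ReadComplete };

// One of these is allocated per request and passed as the context value.
struct RequestContext {
    std::string body;     // response bytes accumulated so far
    bool done = false;    // set once no data is left
};

// Shaped like the status callback: the context arrives as an integer and is
// cast back to the per-request state, so no global buffer is needed.
void on_status(std::uintptr_t context, Status status,
               const void* info, std::size_t length)
{
    auto* req = reinterpret_cast<RequestContext*>(context);
    switch (status) {
    case Status::DataAvailable:
        // info points to the number of bytes available; zero means finished.
        if (*static_cast<const std::uint32_t*>(info) == 0)
            req->done = true;
        break;
    case Status::ReadComplete:
        // info is the buffer that was filled, length the bytes read.
        req->body.append(static_cast<const char*>(info), length);
        break;
    }
}
```

Two requests whose callbacks interleave each end up with their own body, because every invocation is routed through its own RequestContext.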

Process Id's and process names

I'm creating a windows program that basically scans the system to see if a particular process is running or not. I have the process name (AcroRd32.exe) and nothing else.
From what I've read, the easiest way is to create a snapshot of all processes using CreateToolhelp32Snapshot and then iterate through each process looking for the process name.
My application is highly performance-centric, so is there a better, more efficient way to do this?
The application collects a snapshot every few seconds. Iterating through hundreds of processes in the snapshot doesn't seem efficient. Is there a direct API that can find the process through its process name (and retrieve the process handle or ID through the name)?
I've searched extensively without much luck. Has anyone tried this before?
The fastest way to scan for processes is via NTDLL's NtQuerySystemInformation call. It provides you with a list of names and process IDs of all processes on the system with a single call (or more in rare cases, i.e. large # of processes). You can combine NtQuerySystemInformation and use a hash to do string comparisons instead of comparing each byte.
// headers # http://pastebin.com/HWzJYpbv
NtQuerySystemInformation = (_RT_NAPI_QUERYSYSINFO)GetProcAddress(GetModuleHandleA("NTDLL.DLL"), "NtQuerySystemInformation");
// Get process information buffer
do {
// Allocate buffer for process info
pBuffer = HeapAlloc(hHeap, HEAP_ZERO_MEMORY, cbBuffer);
if (pBuffer == NULL) {
// Cannot allocate enough memory for buffer (CRITICAL ERROR)
return 1;
}
// Obtain system process snapshot
Status = NtQuerySystemInformation(5, pBuffer, cbBuffer, NULL);
// Allocate bigger buffer for moar data
if (Status == STATUS_INFO_LENGTH_MISMATCH) {
HeapFree(hHeap, 0, pBuffer);
cbBuffer *= 2; // Increase the size of the buffer :-)
} else if ((Status) != 0x00) {
// Can't query process information (probably rootkit or anti-virus)
HeapFree(hHeap, 0, pBuffer);
return 1;
}
} while (Status == STATUS_INFO_LENGTH_MISMATCH);
// Get pointer to first system process info structure
pInfo = (PSYSTEM_PROCESS_INFORMATION)pBuffer;
// Loop over each process
for (;;) {
// Get process name (Buffer is NULL for the System Idle Process entry)
pszProcessName = pInfo->ImageName.Buffer;
// ... do work. For a fast string compare, calculate a 32-bit hash of the string, then compare to a static hash.
if(pszProcessName != NULL && CRC32(pszProcessName) == 0xDEADBEEF /* <- hash of adobe reader process name goez here */) {
// Found process
}
// Load next entry
if (pInfo->NextEntryOffset == 0)
break;
pInfo = (PSYSTEM_PROCESS_INFORMATION)(((PUCHAR)pInfo)+ pInfo->NextEntryOffset);
}
Tested on Windows 2000 - Windows 7 English editions, x64/x86 (except Win XP x64)
Note: When called from a 32-bit WOW64 process on a 64-bit system, it still returns all processes.
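The "hash, then compare" idea can be sketched portably. Note the swap: this uses 32-bit FNV-1a rather than the CRC32 in the snippet above, folds ASCII case because Windows image names are case-insensitive, and confirms a hash hit with a full comparison so that a collision cannot yield a false match. The names ascii_lower, hash_name, equal_ignore_case and is_target are hypothetical. The name is taken as pointer plus length, since a UNICODE_STRING buffer is not guaranteed to be null-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// ASCII-only lowercase; assumed to be enough for typical executable names.
wchar_t ascii_lower(wchar_t c)
{
    return (c >= L'A' && c <= L'Z') ? static_cast<wchar_t>(c - L'A' + L'a') : c;
}

// 32-bit FNV-1a over the two bytes of each (lowercased) 16-bit code unit.
std::uint32_t hash_name(const wchar_t* s, std::size_t len)
{
    std::uint32_t h = 2166136261u;                 // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        const std::uint32_t unit =
            static_cast<std::uint32_t>(ascii_lower(s[i])) & 0xFFFFu;
        h = (h ^ (unit & 0xFFu)) * 16777619u;      // low byte, FNV prime
        h = (h ^ (unit >> 8)) * 16777619u;         // high byte
    }
    return h;
}

bool equal_ignore_case(const wchar_t* a, std::size_t alen, const std::wstring& b)
{
    if (alen != b.size()) return false;
    for (std::size_t i = 0; i < alen; ++i)
        if (ascii_lower(a[i]) != ascii_lower(b[i])) return false;
    return true;
}

// Cheap hash test first, full compare only on a hash hit.
bool is_target(const wchar_t* name, std::size_t len,
               const std::wstring& target, std::uint32_t targetHash)
{
    if (name == nullptr) return false;             // entry without a name
    return hash_name(name, len) == targetHash &&
           equal_ignore_case(name, len, target);
}
```

The target hash is computed once, for example with hash_name(target.c_str(), target.size()), and reused for every entry in every scan.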
No.
Each process has a unique ID but not unique name. There could be multiple processes with the same name. So it is impossible to get the process handle out of its name directly without iterating over all processes.
Internally all processes are linked together somehow, e.g., in a linked list. Even if a function such as GetProcessByName() were provided, it would still traverse that list internally on your behalf to find the processes with that name. So it would not make a big difference in performance.
Aside
Give EnumProcesses() a shot; it has less overhead and is simpler.
BOOL WINAPI EnumProcesses(
__out DWORD *pProcessIds,
__in DWORD cb,
__out DWORD *pBytesReturned
);
MSDN has an example for this.