Get 32bit PEB of another process from a x64 process - c++

I have a 64 bit process that needs to read the 32bit PEB of a Wow64 process.
I'm able to get it with NtQueryInformationProcess, but I realized that Wow64 processes have two PEBs (64 and 32 bit) and NtQueryInformationProcess returns the PEB corresponding to the bitness of the caller (64 bit in my case), as @Anders commented in this answer:
How to get the Process Environment Block (PEB) from extern process?
That's my scenario: I'm trying to get the 32bit PEB of a Wow64 process, from inside a x64 process. Any suggestions that involve changing that scenario are useless. I'm also aware that this kind of solution is not recommended for production and that's not my intention.
Any ideas?
Thanks in advance.

If you read the NtQueryInformationProcess() documentation on MSDN, there is a comment that says:
It appears that when querying a process running under wow64 in (at least) windows Vista the PebBaseAddress returned is actually for the 64-bit modules loaded under wow64. From some initial investigations I've done it appears that the PEB which pertains to 32-bit modules can be found by taking the PebBaseAddress and subtracting one page (0x1000) from its value. I have minimally confirmed this hypothesis by inspecting the process's TIB's and following their PEB pointers back to an address which, so far, has always shown to be -0x1000 from the PebBaseAddress value returned by this function.
Update: I just found this code that states the above will not work from Windows 8 onwards, but does provide an alternative solution:
#define TEB32OFFSET 0x2000
void interceptNtDll32(HANDLE hProcess)
{
    THREAD_BASIC_INFORMATION tbi;
    NTSTATUS ntrv;
    TEB32 teb32;
    void *teb32addr;
    PEB_LDR_DATA32 ldrData;
    PEB32 peb32;
    LIST_ENTRY32 *pMark = NULL;
    LDR_DATA_TABLE_ENTRY32 ldrDataTblEntry;
    size_t bytes_read;
    HANDLE hThread = getThreadHandle(hProcess);

    /* Used to be able to get the 32 bit PEB from the PEB64 with a 0x1000 offset,
       but Windows 8 changed that, so we do it indirectly via the TEB */
    if(!hThread)
        return;

    /* Get thread basic information to get the 64 bit TEB */
    ntrv = NtQueryInformationThread(hThread, ThreadBasicInformation, &tbi, sizeof(tbi), NULL);
    if(ntrv != 0){
        goto out;
    }

    /* Use magic to find the 32 bit TEB */
    teb32addr = (char *)tbi.TebBaseAddress + TEB32OFFSET; // Magic...
    ntrv = NtReadVirtualMemory(hProcess, teb32addr, &teb32, sizeof(teb32), NULL);
    if(ntrv != 0 || teb32.NtTib.Self != (DWORD)teb32addr){ // Verify magic...
        goto out;
    }

    /* The TEB32 holds the address of the 32 bit PEB. */
    ntrv = NtReadVirtualMemory(hProcess, (void *)teb32.ProcessEnvironmentBlock, &peb32, sizeof(peb32), NULL);
    if(ntrv != 0){
        goto out;
    }
    ...

You can use NtQueryInformationProcess
(see https://learn.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntqueryinformationprocess#ulong_ptr):
set ProcessInformationClass to ProcessWow64Information
pass a pointer-sized (ULONG_PTR) variable as ProcessInformation
When the call returns, that variable holds the 32-bit PEB base address of the Wow64 process; if it is zero, the target is not a Wow64 process.
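A minimal sketch of that approach (an illustration, not production code: error handling is trimmed, and the helper name GetPeb32Address is made up here; ProcessWow64Information is declared in winternl.h):

```cpp
#include <windows.h>
#include <winternl.h>

// Sketch: returns the 32-bit PEB address of a Wow64 target process, or 0.
// hProcess needs PROCESS_QUERY_INFORMATION (or PROCESS_QUERY_LIMITED_INFORMATION) access.
ULONG_PTR GetPeb32Address(HANDLE hProcess)
{
    typedef NTSTATUS (NTAPI *PFN_NTQIP)(HANDLE, PROCESSINFOCLASS, PVOID, ULONG, PULONG);
    PFN_NTQIP pNtQIP = (PFN_NTQIP)GetProcAddress(
        GetModuleHandleW(L"ntdll.dll"), "NtQueryInformationProcess");
    if (!pNtQIP)
        return 0;

    ULONG_PTR peb32 = 0;  // receives the WOW64 PEB base; stays 0 for native 64-bit targets
    NTSTATUS st = pNtQIP(hProcess, ProcessWow64Information,
                         &peb32, sizeof(peb32), NULL);
    return (st == 0) ? peb32 : 0;
}
```

From there you can NtReadVirtualMemory/ReadProcessMemory a PEB32 structure at the returned address.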

Related

Shared memory "Too many open files" but ipcs doesn't show many allocations

I'm writing unit tests for code which creates shared memory.
I only have a couple of tests. I make 4 allocations of shared memory and then it fails on the fifth.
After calling shmat() perror() says Too many open files:
template <typename T>
bool Attach(T** ptr, const key_type& key)
{
    // shmemId was 262151
    int32_t shmemId = shmget( key.key( ), ( size_t )0, 0644 );
    if (shmemId < 0)
    {
        perror("Error: ");
        return false;
    }
    *ptr = ( T * ) shmat( shmemId, 0, 0 );
    if ( ( int64_t ) * ptr < 0 )
    {
        // Problem is here. perror() says 'Too many open files'
        perror( "Error: ");
        return false;
    }
    return true;
}
However, when I check ipcs -m -p I only have a couple of shared memory allocations.
T ID KEY MODE OWNER GROUP CPID LPID
Shared Memory:
m 262151 0x0000a028 --rw-r--r-- 3229 0
m 262152 0x0000a029 --rw-r--r-- 3229 0
In addition, when I check my OS shared memory limits sysctl -A | grep shm I get:
kern.sysv.shmall: 1024
kern.sysv.shmmax: 4194304
kern.sysv.shmmin: 1
kern.sysv.shmmni: 32
kern.sysv.shmseg: 8
security.mac.posixshm_enforce: 1
security.mac.sysvshm_enforce: 1
Are these variables large enough/are they the cause/what values should I have?
I'm sure I edited the file to increase them and restarted machine but perhaps it hasn't accepted (this is on Mac/OSX).
Your problem may be elsewhere.
Edit: This may be a shmmni limit of macOS. See below.
When I run your [simplified] code on my system (linux), the shmget fails.
You didn't specify IPC_CREAT to the third argument. If another process has created the segment, this may be okay.
But, it doesn't/shouldn't like a size of 0. The [linux] man page states that it returns an error (errno set to EINVAL) if the size is less than SHMMIN (which is 1).
That is what happened on my system. So, I adjusted the code to use a size of 1.
This was done [as I mentioned] on linux.
macOS may allow a size of 0, even if that doesn't make practical sense. (e.g.) It may round it up to a page size.
For shmat, on error it returns (void *) -1.
But, some systems can have valid addresses that have the high bit set. (e.g.) 0xFFE0000000000000 is a valid address, but would fail your if test because casting that to int64_t will test negative.
Better to do:
if ((int64_t) *ptr == (int64_t) -1)
Or [possibly better]:
if ((void *) *ptr == (void *) -1)
Note that errno is not set/changed if the call succeeds.
To verify this, do: errno = 0 before the shmat call. If perror says "Success", then the shmat is okay. And, your current test needs to be adjusted as above--I'd do that change regardless.
You could also do (e.g):
printf("ptr=%p\n",*ptr);
Normally, errno starts as 0.
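Putting those fixes together (a size of at least 1 for shmget, a comparison against (void *) -1, and clearing errno first), a corrected sketch of the Attach() helper might look like this. Note this is an assumption-laden rewrite: key_type is swapped for a plain key_t to keep the example self-contained.

```cpp
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cerrno>
#include <cstdio>

// Hypothetical corrected version of the Attach() helper from the question.
template <typename T>
bool Attach(T** ptr, key_t key)
{
    int shmemId = shmget(key, 1, 0644);      // size 1, not 0 (Linux rejects 0)
    if (shmemId < 0) {
        perror("shmget");
        return false;
    }
    errno = 0;                               // so a later perror() is meaningful
    *ptr = static_cast<T*>(shmat(shmemId, nullptr, 0));
    if (*ptr == reinterpret_cast<T*>(-1)) {  // compare against (void *) -1, not < 0
        perror("shmat");
        return false;
    }
    return true;
}
```

The (void *) -1 comparison avoids the false negative on addresses with the high bit set described above.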
Note that there are some differences between macOS and linux.
So, if errno is ever set to "too many open files", this can be because the process has too many open files (EMFILE).
It might be because the system-wide limit is reached (ENFILE) but that is "file table overflow".
Note that under linux shmat can not generate EMFILE. However, it appears that under macOS it can.
However, if the number of calls to shmat is limited [as you mention], the shmat should succeed.
The macOS man page is a little vague as to what the limit is based on. However, I checked the FreeBSD man page for shmat and that says it is limited by the sysctl parameter: kern.ipc.shmseg. Your grep should have caught that [if applicable].
It is possible some other syscall elsewhere in the code is opening too many files. And, that syscall is not checking the error return.
Again, I realize you're running macOS.
But, if available, you may want to try your program under linux. For example, it has much larger limits from the sysctl:
kernel.shm_next_id = -1
kernel.shm_rmid_forced = 0
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
vm.hugetlb_shm_group = 0
Note that shmmni is the system-wide maximum number of shared memory segments.
Note that for macOS, shmmni is 32 (vs. 4096 for linux)!?!?
That means that the entire system can only have 32 open shared memory segments for any/all processes???
That seems very low. You can probably set this to a larger number and see if that helps.
Linux has the strace program and you could use it to monitor the syscalls.
But, macOS has dtruss: How to trace system calls of a program in Mac OS X?

How to obtain handles for all children process of current process in Windows?

For the purposes of performance monitoring on Windows OS, I need a program which can report both user and kernel times for an arbitrary process. On POSIX systems, standard time utility is perfectly OK as it reports wall clock time, user time and kernel time.
For Windows, there is no such utility by default. I looked around and found at least three alternatives. As I explain below, none of them actually suits my needs.
timeit from Windows SDK (cannot recall what exact version). It is no longer distributed, supported, or guaranteed to work on modern systems. I was not able to test it.
Cygwin's time. Almost identical to POSIX counterpart with similar output formatting.
timep.exe by Johnson (John) Hart, available in source code and binaries for his book "Windows System Programming, 4th Edition". This is a pretty simple utility that uses WinAPI's GetProcessTimes() to obtain the very same three values. I suspect that Cygwin's time is no different in that regard.
Now the problem: GetProcessTimes() only reports times for the PID directly spawned by timep, but not its children. This makes both time and timep useless for me.
My target EXE application is typically spawned through a BAT file which invokes one more BAT file; both BATs are meant to tune environment or alter command line arguments:
timep.exe
|
+---wrapper.bat
|
+--- real-wrapper.bat
|
+--- application.exe
Times reported for wrapper.bat alone tell nothing about application.exe.
Obviously, process creation models of POSIX (fork-exec) and Win32 (CreateProcess) are very different, which makes my goal that hard to achieve on Windows.
I want to try to write my own variant of time. It has to sum up the times for a given process and all of its children, grandchildren, etc., recursively. So far I can imagine the following approach:
CreateProcess() and get its PID (root PID) and handle; add this handle to a list
Enumerate all processes in system; for each process
Compare its parent PID with the root PID. If equal, get its PID and handle and add them to the handle list.
For every new PID, repeat process scan phase to collect more children handles
Recurse down until no new process handles are added to the list
Wait for all collected handles from the list to terminate.
For each handle, call GetProcessTimes() and sum them up
Report results
This algorithm is bad because it is racy: child processes may be created late in the life of any process, or they may terminate before we get a chance to obtain their handle. In either case, the reported times will be incorrect.
My question is: Is there a better solution?
EDIT: I was able to achieve my goal by using Job Objects. Below is a code snippet extracted from my application, relevant to obtaining kernel and user times from a process and all of its children. Hopefully it will save some time for someone.
I tested it with Windows 8.1 x64 and VS 2015, but it should be backwards-portable to at least Windows 7. Some fiddling might be required for 32-bit hosts (I am not sure) in regard to long long types - I am not familiar with CL.EXE's ways of dealing with them on such platforms.
#include <windows.h>
#include <string>
#include <cassert>
#include <iostream>
/* ... */
STARTUPINFO startUp = { sizeof(startUp) }; // cb must be set before CreateProcess
PROCESS_INFORMATION procInfo = { 0 };

/* Start the program in a suspended state */
if (!CreateProcess(NULL, CmdParams, NULL, NULL, TRUE,
                   CREATE_SUSPENDED | NORMAL_PRIORITY_CLASS, NULL, NULL, &startUp, &procInfo)) {
    DWORD err = GetLastError();
    // TODO format error message
    std::cerr << "Unable to start the process: " << err << std::endl;
    return 1;
}
HANDLE hProc = procInfo.hProcess;

/* Create a job object and attach the process to it */
HANDLE hJob = CreateJobObject(NULL, NULL); // XXX no security attributes passed
assert(hJob != NULL);
int ret = AssignProcessToJobObject(hJob, hProc);
assert(ret);

/* Now run the process and allow it to spawn children */
ResumeThread(procInfo.hThread);

/* Block until the process terminates */
if (WaitForSingleObject(hProc, INFINITE) != WAIT_OBJECT_0) {
    DWORD err = GetLastError();
    // TODO format error message
    std::cerr << "Failed waiting for process termination: " << err << std::endl;
    return 1;
}
DWORD exitcode = 0;
ret = GetExitCodeProcess(hProc, &exitcode);
assert(ret);

/* Calculate wall-clock time in FILETIME units (100 ns ticks, not true nanoseconds).
   Ignore user and kernel times (third and fourth return parameters) */
FILETIME createTime, exitTime, unusedTime;
ret = GetProcessTimes(hProc, &createTime, &exitTime, &unusedTime, &unusedTime);
assert(ret);
LONGLONG createTimeNs = (LONGLONG)createTime.dwHighDateTime << 32 | createTime.dwLowDateTime;
LONGLONG exitTimeNs = (LONGLONG)exitTime.dwHighDateTime << 32 | exitTime.dwLowDateTime;
LONGLONG wallclockTimeNs = exitTimeNs - createTimeNs;

/* Get total user and kernel times for all processes of the job object */
JOBOBJECT_BASIC_ACCOUNTING_INFORMATION jobInfo;
ret = QueryInformationJobObject(hJob, JobObjectBasicAccountingInformation,
                                &jobInfo, sizeof(jobInfo), NULL);
assert(ret);
if (jobInfo.ActiveProcesses != 0) {
    std::cerr << "Warning: there are still "
              << jobInfo.ActiveProcesses
              << " child processes alive" << std::endl;
    /* We may kill surviving processes, if desired */
    TerminateJobObject(hJob, 127);
}

/* Get kernel and user times (also in 100 ns ticks) */
LONGLONG kernelTimeNs = jobInfo.TotalKernelTime.QuadPart;
LONGLONG userTimeNs = jobInfo.TotalUserTime.QuadPart;

/* Clean up a bit */
CloseHandle(hProc);
CloseHandle(hJob);
Yes: from timep.exe, create a job object and use job accounting. Child processes (unless created in jobs of their own) share the job with their parent process.
This pretty much skips your steps 2-4
I've packaged the solution for this problem into a standalone program for Windows called chronos. It creates a job object and then spawns a requested process inside it. All the children spawned later stay in the same job object and thus can be accounted later.

hidapi: Sending packet smaller than caps.OutputReportByteLength

I am working with a device (the wiimote) that takes commands through the DATA pipe, and only accepts command packets that are EXACTLY as long as the command itself. For example, it will accept:
0x11 0x10
but it will not accept:
0x11 0x10 0x00 0x00 0x00 ... etc.
This is a problem on windows, as WriteFile() on windows requires that the byte[] passed to it is at least as long as caps.OutputReportByteLength. On mac, where this limitation isn't present, my code works correctly. Here is the code from hid.c that causes this issue:
/* Make sure the right number of bytes are passed to WriteFile. Windows
   expects the number of bytes which are in the _longest_ report (plus
   one for the report number) bytes even if the data is a report
   which is shorter than that. Windows gives us this value in
   caps.OutputReportByteLength. If a user passes in fewer bytes than this,
   create a temporary buffer which is the proper size. */
if (length >= dev->output_report_length) {
    /* The user passed the right number of bytes. Use the buffer as-is. */
    buf = (unsigned char *) data;
} else {
    /* Create a temporary buffer and copy the user's data
       into it, padding the rest with zeros. */
    buf = (unsigned char *) malloc(dev->output_report_length);
    memcpy(buf, data, length);
    memset(buf + length, 0, dev->output_report_length - length);
    length = dev->output_report_length;
}
res = WriteFile(dev->device_handle, buf, length, NULL, &ol);
Removing the above code, as mentioned in the comments, results in an error from WriteFile().
Is there any way that I can pass data to the device of arbitrary size? Thanks in advance for any assistance.
Solved. I used a solution similar to the guys over at Dolphin, a Wii emulator. Apparently, on the Microsoft bluetooth stack, WriteFile() doesn't work correctly, causing the Wiimote to return with an error. By using HidD_SetOutputReport() on the MS stack and WriteFile() on the BlueSoleil stack, I was able to successfully connect to the device (at least on my machine).
I haven't tested this on the BlueSoleil stack, but Dolphin is using this method so it is safe to say it works.
Here is a gist containing an ugly implementation of this fix:
https://gist.github.com/Flafla2/d261a156ea2e3e3c1e5c
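For reference, the Microsoft-stack path boils down to a call like the following (a sketch, not the gist's actual code: hDevice and the SendShortReport wrapper are assumptions here; hDevice would be a handle opened with CreateFile on the HID device interface, and you must link against hid.lib):

```c
#include <windows.h>
#include <hidsdi.h>   /* HidD_SetOutputReport; link with hid.lib */

/* Sketch: send a short output report without padding it up to
   caps.OutputReportByteLength. buf[0] must be the report ID. */
BOOLEAN SendShortReport(HANDLE hDevice, const unsigned char *buf, size_t len)
{
    /* HidD_SetOutputReport goes through IOCTL_HID_SET_OUTPUT_REPORT rather
       than a write request, which the Microsoft Bluetooth stack accepts. */
    return HidD_SetOutputReport(hDevice, (PVOID)buf, (ULONG)len);
}
```

On the BlueSoleil stack you would keep the WriteFile path shown above.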

pcap_next call fills in pcap_pkthdr with len equal to zero

I'm using libpcap version 1.1.1 built as a static library (libpcap.a). When I try to execute the following block of code on RHEL 6 64-bit (the executable module itself is built as a 32-bit ELF image), I get a segmentation fault:
const unsigned char* packet;
pcap_pkthdr pcap_header = {0};
unsigned short ether_type = 0;

while ( ether_type != ntohs( 0x800 ) )
{
    packet = pcap_next ( m_pcap_handle, &pcap_header );
    if (packet != NULL)
    {
        memcpy ( &ether_type, &( packet[ 12 ] ), 2 );
    }
    else
    {
        /* Sleep call goes here */
    }
}

if ( raw_buff->data_len >= pcap_header.caplen )
{
    memcpy ( raw_buff->data, &( packet[ 14 ] ), pcap_header.len - 14 );
    raw_buff->data_len = pcap_header.len - 14;
    raw_buff->timestamp = pcap_header.ts;
}
A bit of investigation revealed that the pcap_header.len field is equal to zero upon pcap_next's return. The caplen field, in fact, seems to reflect the packet size correctly, and if I dump memory from the packet address the data looks valid. The len field of zero, however, I know is invalid: it is supposed to be at least as large as caplen. Is this a bug? What steps should I take to get this fixed?
GDB shows pcap_header contents as:
(gdb) p pcap_header
$1 = {ts = {tv_sec = 5242946, tv_usec = 1361456997}, caplen = 66, len = 0}
Maybe I can have some workaround applied? I don't want to upgrade libpcap version.
Kernels prior to the 2.6.27 kernel do not support running 32-bit binaries using libpcap 1.0 or later on a 64-bit kernel.
libpcap 1.0 and later use the "memory-mapped" capture mechanism on Linux kernels that have it available, and the first version of that mechanism did not ensure that the data structures shared between the kernel and code using the "memory-mapped" capture mechanism were laid out in memory the same way in 32-bit and 64-bit mode.
2.6 kernels prior to the 2.6.27 kernel have only the first version of that mechanism. The 2.6.27 kernel has the second version of that mechanism, which does ensure that the data structures are laid out in memory the same way in 32-bit and 64-bit mode, so that 32-bit user-mode code works the same atop 32-bit and 64-bit kernels.
I found this defect description by searching: https://bugzilla.redhat.com/show_bug.cgi?id=557728 — it seems it is still relevant nowadays. The problem went away when I linked my application against the shared library version of libpcap instead of the static one. The system then links my app at runtime against the libpcap shipped with RHEL.
Sincerely yours, Alexander Chernyaev.

Process Id's and process names

I'm creating a windows program that basically scans the system to see if a particular process is running or not. I have the process name (AcroRd32.exe) and nothing else.
From what I've read the easiest way to create a snapshot of all processes using CreateToolhelp32Snapshot and then iterate through each process looking for the process name.
My application is highly performance-centric, so: is there a better, more efficient way to do this?
The application collects a snapshot every few seconds. Iterating through 100's of processes in the snapshot doesn't seem efficient. Is there a direct API that can find the Process through its process name (and retrieve process handle or id through the name)?
I've searched extensively without much luck. Has anyone tried this before?
The fastest way to scan for processes is via NTDLL's NtQuerySystemInformation call. It provides you with a list of names and process IDs of all processes on the system with a single call (or more in rare cases, i.e. large # of processes). You can combine NtQuerySystemInformation and use a hash to do string comparisons instead of comparing each byte.
// headers # http://pastebin.com/HWzJYpbv
NtQuerySystemInformation = (_RT_NAPI_QUERYSYSINFO)GetProcAddress(GetModuleHandleA("NTDLL.DLL"), "NtQuerySystemInformation");

// Get process information buffer
do {
    // Allocate buffer for process info
    pBuffer = HeapAlloc(hHeap, HEAP_ZERO_MEMORY, cbBuffer);
    if (pBuffer == NULL) {
        // Cannot allocate enough memory for buffer (CRITICAL ERROR)
        return 1;
    }
    // Obtain system process snapshot (5 = SystemProcessInformation)
    Status = NtQuerySystemInformation(5, pBuffer, cbBuffer, NULL);
    // Allocate bigger buffer for moar data
    if (Status == STATUS_INFO_LENGTH_MISMATCH) {
        HeapFree(hHeap, 0, pBuffer);
        cbBuffer *= 2; // Increase the size of the buffer :-)
    } else if (Status != 0x00) {
        // Can't query process information (probably rootkit or anti-virus)
        HeapFree(hHeap, 0, pBuffer);
        return 1;
    }
} while (Status == STATUS_INFO_LENGTH_MISMATCH);

// Get pointer to first system process info structure
pInfo = (PSYSTEM_PROCESS_INFORMATION)pBuffer;

// Loop over each process
for (;;) {
    // Get process name
    pszProcessName = pInfo->ImageName.Buffer;
    // ... do work. For a fast string compare, calculate a 32-bit hash of the
    // string, then compare to a static hash.
    if (CRC32(pszProcessName) == 0xDEADBEEF /* <- hash of the Adobe Reader process name goes here */) {
        // Found process
    }
    // Load next entry
    if (pInfo->NextEntryOffset == 0)
        break;
    pInfo = (PSYSTEM_PROCESS_INFORMATION)((PUCHAR)pInfo + pInfo->NextEntryOffset);
}
Tested on Windows 2000 - Windows 7 English editions, x64/x86 (except Win XP x64)
Note: even when called from a 32-bit WOW64 process on a 64-bit system, it will return all processes (including 64-bit ones).
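The "hash once, compare integers" trick can be sketched portably. The CRC32() helper is not shown in the answer, so the following is an assumed table-less CRC-32 that hashes only the low byte of each UTF-16 code unit, which is fine for ASCII image names like AcroRd32.exe:

```c
#include <stdint.h>
#include <wchar.h>

/* Bitwise (table-less) CRC-32, IEEE polynomial, over the low byte of each
   wide character. Precompute the hash of the target name once; each
   per-process comparison is then a single integer compare. */
uint32_t crc32_name(const wchar_t *s)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (; *s; s++) {
        crc ^= (uint8_t)*s;  /* low byte only -- assumes ASCII names */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}
```

One caveat: image names from the snapshot can vary in case, so a real implementation should lower-case (or otherwise normalize) the name before hashing.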
No.
Each process has a unique ID but not unique name. There could be multiple processes with the same name. So it is impossible to get the process handle out of its name directly without iterating over all processes.
Internally all processes are linked together somehow, e.g., in a linked list. Even if a function GetProcessByName() were provided, it would internally traverse that list on your behalf to find the processes with that name, so it would not make a big difference in performance.
Aside
Give EnumProcesses() a shot; it has less overhead and is simpler. Check here.
BOOL WINAPI EnumProcesses(
    __out DWORD *pProcessIds,
    __in  DWORD cb,
    __out DWORD *pBytesReturned
);
MSDN has an example for this.
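A minimal sketch of the name lookup along those lines (an illustration, not the MSDN sample: error handling is trimmed, the fixed-size PID array is a simplification, and you must link against psapi.lib):

```c
#include <windows.h>
#include <psapi.h>   /* EnumProcesses, GetModuleBaseNameW; link with psapi.lib */
#include <wchar.h>

/* Sketch: return the PID of the first process whose image name matches
   (case-insensitively), or 0 if none was found. */
DWORD FindProcessByName(const wchar_t *name)
{
    DWORD pids[1024], cbReturned;
    if (!EnumProcesses(pids, sizeof(pids), &cbReturned))
        return 0;

    DWORD count = cbReturned / sizeof(DWORD);
    for (DWORD i = 0; i < count; i++) {
        HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                               FALSE, pids[i]);
        if (!h)
            continue;  /* no access to this process; skip it */

        wchar_t base[MAX_PATH] = L"";
        if (GetModuleBaseNameW(h, NULL, base, MAX_PATH) &&
            _wcsicmp(base, name) == 0) {
            CloseHandle(h);
            return pids[i];
        }
        CloseHandle(h);
    }
    return 0;
}
```

Usage would be, e.g., FindProcessByName(L"AcroRd32.exe"). Note this still iterates over all processes, as the answer above explains is unavoidable; there may also be several matches, of which this returns only the first.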