Writing to read only address with kernel driver c++ - c++

Im trying to write memory to a user mode process with kernel driver,
the current address im trying to write memory for is read only, I want to write 4 bytes to the current address,
the thing is if i change protection ( page ) of the process with VirtualProtectEx , it works and it writes the memory but this is only on user mode level, my intention is to change the protection of the process from kernel mode, i want to make it READWRITE, then change it back to READ from kernel space,
Now what I tried to do is giving me a BSOD ( blue screen of death ) with error : Kmode_exception_not_handld
I cant understand what in my code is triggering this BSOD my PC have very limited specs and i cant debug in VM to know..
I will write the code that works but in user mode , and what the code is not working for me in kernel space:
here the code that works:
void dispatch::handler(void* info_struct)
{
PINFO_STRUCT info = (PINFO_STRUCT)info_struct;
if (info->code == CODE_READ_MEMORY)
{
PEPROCESS target_process = NULL;
if (NT_SUCCESS(PsLookupProcessByProcessId((HANDLE)info->process_id, &target_process)))
{
memory::read_memory(target_process, (void*)info->address, &info->buffer, info->size);
}
DbgPrintEx(0, 0, "[TEST]: Read Memory\n");
}
else if (info->code == CODE_WRITE_MEMORY)
{
PEPROCESS target_process = NULL;
if (NT_SUCCESS(PsLookupProcessByProcessId((HANDLE)info->process_id, &target_process)))
{
memory::write_memory(target_process, &info->buffer, (void*)info->address, info->size);
}
DbgPrintEx(0, 0, "[TEST]: Write Memory\n");
}
}
NTSTATUS memory::write_memory(PEPROCESS target_process, void* source, void* target, size_t size)
{
if (!target_process) { return STATUS_INVALID_PARAMETER; }
size_t bytes = 0;
NTSTATUS status = MmCopyVirtualMemory(IoGetCurrentProcess(), source, target_process, target, size, KernelMode, &bytes);
if (!NT_SUCCESS(status) || !bytes)
{
return STATUS_INVALID_ADDRESS;
}
return status;
}
int main()
{
DWORD oldprt
ULONG writeTest1 = 3204497152;
VirtualProtectEx(ProcManager::hProcess, (PVOID)(testAddr), 4, PAGE_READWRITE, &oldprt);
driver_control::write_memory(process_id, testAddr, writeTest1);
VirtualProtectEx(ProcManager::hProcess, (PVOID)(testAddr), 4, PAGE_READONLY, &oldprt);
return 0;
}
Now what I want to do is stop using the VirtualProtectEx, and change the PAGE protection to READWRITE from kernel space, so what i did is add this in the dispatch::handler function:
else if (info->code == CODE_WRITE_MEMORY)
{
PEPROCESS target_process = NULL;
if (NT_SUCCESS(PsLookupProcessByProcessId((HANDLE)info->process_id, &target_process)))
{
PMDL Mdl = IoAllocateMdl((void*)info->address, info->size, FALSE, FALSE, NULL);
if (!Mdl)
return false;
// Locking and mapping memory with RW-rights:
MmProbeAndLockPages(Mdl, KernelMode, IoReadAccess);
PVOID Mapping = MmMapLockedPagesSpecifyCache(Mdl, KernelMode, MmNonCached, NULL, FALSE, NormalPagePriority);
MmProtectMdlSystemAddress(Mdl, PAGE_READWRITE);
memory::write_memory(target_process, &info->buffer, (void*)info->address, info->size);
// Resources freeing:
MmUnmapLockedPages(Mapping, Mdl);
MmUnlockPages(Mdl);
IoFreeMdl(Mdl);
}
DbgPrintEx(0, 0, "[TEST]: Write Read Only Memory\n");
}
So this what I've added caused the BSOD, but why I cannot understand, what am i doing wrong here?
here is the info struct if needed to understand more, :
#define CODE_READ_MEMORY 0x1
#define CODE_WRITE_MEMORY 0x2
typedef struct _INFO_STRUCT
{
ULONG code;
ULONG process_id;
ULONG address;
ULONG buffer;
ULONG size;
}INFO_STRUCT, * PINFO_STRUCT;
any suggestions on solving this problem?

Related

Driver causing a system crash every time unloaded

The driver I wrote is taken from the Windows Kernel Programming book but for some reason it just doesn't work.
Here is my driver entry:
extern "C" NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
UNREFERENCED_PARAMETER(RegistryPath);
auto status = STATUS_SUCCESS;
InitializeListHead(&g_Globals.ItemsHead);
g_Globals.Mutex.Init();
PDEVICE_OBJECT DeviceObject = nullptr;
UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\sysmon");
bool symLinkCreated = false;
do {
UNICODE_STRING devName = RTL_CONSTANT_STRING(L"\\Device\\sysmon");
status = IoCreateDevice(DriverObject, 0, &devName,
FILE_DEVICE_UNKNOWN, 0, TRUE, &DeviceObject);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to create device (0x%08X)\n",
status));
break;
}
DeviceObject->Flags |= DO_DIRECT_IO;
status = IoCreateSymbolicLink(&symLink, &devName);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to create sym link (0x%08X)\n",
status));
break;
}
symLinkCreated = true;
// register for process notifications
status = PsSetCreateProcessNotifyRoutineEx(OnProcessNotify, FALSE);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to register process callback (0x%08X)\n",
status));
break;
}
status = PsSetCreateThreadNotifyRoutine(OnThreadNotify);
if (!NT_SUCCESS(status)) {
KdPrint((DRIVER_PREFIX "failed to set thread callbacks (status=%08X)\n", status)\
);
break;
}
} while (false);
if (!NT_SUCCESS(status)) {
if (symLinkCreated)
IoDeleteSymbolicLink(&symLink);
if (DeviceObject)
IoDeleteDevice(DeviceObject);
}
DriverObject->DriverUnload = SysMonUnload;
DriverObject->MajorFunction[IRP_MJ_CREATE] =
DriverObject->MajorFunction[IRP_MJ_CLOSE] = SysMonCreateClose;
DriverObject->MajorFunction[IRP_MJ_READ] = SysMonRead;
return status;
}
now this function works well but i'm pretty sure the problem lies around here.
here is the SysMonCrearteClose function which also fails (I must say that I don't fully understand what I need to code in this function however even without calling it the driver failes, I'm sure there is a problem in here as well):
NTSTATUS SysMonCreateClose(_In_ PDEVICE_OBJECT DeviceObject, _In_ PIRP Irp) {
UNREFERENCED_PARAMETER(DeviceObject);
Irp->IoStatus.Status = STATUS_SUCCESS;
Irp->IoStatus.Information = 0;
IoCompleteRequest(Irp, IO_NO_INCREMENT);
return STATUS_SUCCESS;
}
now here is the SysMonUnload function:
void SysMonUnload(PDRIVER_OBJECT DriverObject) {
// unregister process notifications
PsSetCreateProcessNotifyRoutineEx(OnProcessNotify, TRUE);
UNICODE_STRING symLink = RTL_CONSTANT_STRING(L"\\??\\sysmon");
IoDeleteSymbolicLink(&symLink);
IoDeleteDevice(DriverObject->DeviceObject);
// free remaining items
while (!IsListEmpty(&g_Globals.ItemsHead)) {
auto entry = RemoveHeadList(&g_Globals.ItemsHead);
ExFreePool(CONTAINING_RECORD(entry, FullItem<ItemHeader>, Entry));
}
}
Here is the last function that matters, the SysMonRead function:
NTSTATUS SysMonRead(PDEVICE_OBJECT, PIRP Irp) {
UNREFERENCED_PARAMETER(Irp);
auto stack = IoGetCurrentIrpStackLocation(Irp);
auto len = stack->Parameters.Read.Length;
auto status = STATUS_SUCCESS;
auto count = 0;
NT_ASSERT(Irp->MdlAddress); // we're using Direct I/O
auto buffer = (UCHAR*)MmGetSystemAddressForMdlSafe(Irp->MdlAddress,
NormalPagePriority);
if (!buffer) {
status = STATUS_INSUFFICIENT_RESOURCES;
}
else {
AutoLock lock(g_Globals.Mutex);
while (true) {
if (IsListEmpty(&g_Globals.ItemsHead)) // can also check g_Globals.ItemCount
break;
auto entry = RemoveHeadList(&g_Globals.ItemsHead);
auto info = CONTAINING_RECORD(entry, FullItem<ItemHeader>, Entry);
auto size = info->Data.Size;
if (len < size) {
// user's buffer is full, insert item back
InsertHeadList(&g_Globals.ItemsHead, entry);
break;
}
g_Globals.ItemCount--;
::memcpy(buffer, &info->Data, size);
len -= size;
buffer += size;
count += size;
// free data after copy
ExFreePool(info);
}
}
Irp->IoStatus.Status = status;
Irp->IoStatus.Information = count;
IoCompleteRequest(Irp, 0);
return status;
}
throughout the program I allocate memory that is freed in the unload routine and open handles to files and mutexes which I also close. Whenever I load this driver the functionality intended is working properly (when I write to a file it works, the CreateClose function does not work) however when I unload the driver the system crashes. I used windbg to try and see what the issue is to no avail, here is the error message:
*** Fatal System Error: 0x000000ce
(0xFFFFF80260A71390,0x0000000000000010,0xFFFFF80260A71390,0x0000000000000000)
Driver at fault: ProcessThreadMonitorDriver.sys.
Break instruction exception - code 80000003 (first chance)
nt_exe!DbgBreakPointWithStatus:
I noticed sometimes it said something about an invalid memory access but it was inconsistent and unclear.

Irp information return PVOID

I'm writing Windows kernel driver in C++ and I have to return PVOID which has information about address in memory. Unfortunately, Irp->IoStatus.Information is only able to handle ULONG which results in shortened address for example: 0x2e341990000 is shortened to 0x41990000. It is very important to keep the address full otherwise user mode client would not be able to find address in memory. Is there any way to return full PVOID to client?
Driver code:
NTSTATUS status = STATUS_SUCCESS;
ULONG bytesIO = 0;
auto stack = IoGetCurrentIrpStackLocation(Irp);
switch (stack->Parameters.DeviceIoControl.IoControlCode)
{
case IOCTL_SHELL:
{
auto len = stack->Parameters.DeviceIoControl.InputBufferLength;
if (len < sizeof(Data))
{
DbgPrint("[-] Received too small buffer\n");
status = STATUS_BUFFER_TOO_SMALL;
break;
}
auto data = (Data*)stack->Parameters.DeviceIoControl.Type3InputBuffer;
if (data == nullptr)
{
DbgPrint("[-] Received empty buffer\n");
status = STATUS_INVALID_PARAMETER;
break;
}
PVOID buf = SetMemoryAddress(data);
bytesIO = (ULONG)buf; // Buffer is shortened here
DbgBreakPoint();
break;
}
}
Irp->IoStatus.Status = status;
Irp->IoStatus.Information = bytesIO;
IoCompleteRequest(Irp, IO_NO_INCREMENT);
Client code:
HANDLE hDevice;
BOOL success;
hDevice = CreateFile(L"\\\\.\\Driver", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL);
if (hDevice == INVALID_HANDLE_VALUE)
return FALSE;
Data data;
// Fill data structure here
PVOID retn;
PVOID buffer = { 0 };
success = DeviceIoControl(hDevice, IOCTL_SHELL, &data, sizeof(data), NULL, 0, (LPDWORD)&retn, NULL);
printf("0x%x\n", retn); // Shortened address
return success;
I tried using buffered IOCTL methods.
Your retn variable is not a memory address it is the number of bytes returned from the DeviceIoControl call. It is also of double word size (32bits) not equivalent with a void pointer on a modern 64bit machine.
The output data is written into the fifth argument which is optional and you seem to have provided NULL to.
You might want to initialize the value at retn so you can see if it is changed by your DeviceIoControl call, even if you provide nowhere to write output to.
DWORD bufffer_size_ouf = 0;
PVOID retn = (PVOID)&buff_size_out;

How to accelerate C++ writing speed to the speed tested by CrystalDiskMark?

Now I get about 3.6GB data per second in memory, and I need to write them on my SSD continuously. I used CrystalDiskMark to test the writing speed of my SSD, it is almost 6GB per second, so I had thought this work should not be that hard.
![my SSD test result][1]:
[1]https://plus.google.com/u/0/photos/photo/106876803948041178149/6649598887699308850?authkey=CNbb5KjF8-jxJQ "test result":
My computer is Windows 10, using Visual Studio 2017 community.
I found this question and tried the highest voted answer. Unfortunately, the writing speed was only about 1s/GB for his option_2, far slower than tested by CrystalDiskMark. And then I tried memory mapping, this time writing becomes faster, about 630ms/GB, but still much slower. Then I tried multi-thread memory mapping, it seems that when the number of threads is 4, the speed was about 350ms/GB, and when I add the threads' number, the writing speed didn't go up anymore.
Code for memory mapping:
#include <fstream>
#include <chrono>
#include <vector>
#include <cstdint>
#include <numeric>
#include <random>
#include <algorithm>
#include <iostream>
#include <cassert>
#include <thread>
#include <windows.h>
#include <sstream>
// Generate random data
std::vector<int> GenerateData(std::size_t bytes) {
assert(bytes % sizeof(int) == 0);
std::vector<int> data(bytes / sizeof(int));
std::iota(data.begin(), data.end(), 0);
std::shuffle(data.begin(), data.end(), std::mt19937{ std::random_device{}() });
return data;
}
// Memory mapping
int map_write(int* data, int size, int id){
char* name = (char*)malloc(100);
sprintf_s(name, 100, "D:\\data_%d.bin",id);
HANDLE hFile = CreateFile(name, GENERIC_READ | GENERIC_WRITE, 0, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);//
if (hFile == INVALID_HANDLE_VALUE){
return -1;
}
Sleep(0);
DWORD dwFileSize = size;
char* rname = (char*)malloc(100);
sprintf_s(rname, 100, "data_%d.bin", id);
HANDLE hFileMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, dwFileSize, rname);//create file
if (hFileMap == NULL) {
CloseHandle(hFile);
return -2;
}
PVOID pvFile = MapViewOfFile(hFileMap, FILE_MAP_WRITE, 0, 0, 0);//Acquire the address of file on disk
if (pvFile == NULL) {
CloseHandle(hFileMap);
CloseHandle(hFile);
return -3;
}
PSTR pchAnsi = (PSTR)pvFile;
memcpy(pchAnsi, data, dwFileSize);//memery copy
UnmapViewOfFile(pvFile);
CloseHandle(hFileMap);
CloseHandle(hFile);
return 0;
}
// Multi-thread memory mapping
void Mem2SSD_write(int* data, int size){
int part = size / sizeof(int) / 4;
int index[4];
index[0] = 0;
index[1] = part;
index[2] = part * 2;
index[3] = part * 3;
std::thread ta(map_write, data + index[0], size / 4, 10);
std::thread tb(map_write, data + index[1], size / 4, 11);
std::thread tc(map_write, data + index[2], size / 4, 12);
std::thread td(map_write, data + index[3], size / 4, 13);
ta.join();
tb.join();
tc.join();
td.join();
}
//Test:
int main() {
const std::size_t kB = 1024;
const std::size_t MB = 1024 * kB;
const std::size_t GB = 1024 * MB;
for (int i = 0; i < 10; ++i) {
std::vector<int> data = GenerateData(1 * GB);
auto startTime = std::chrono::high_resolution_clock::now();
Mem2SSD_write(&data[0], 1 * GB);
auto endTime = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
std::cout << "1G writing cost: " << duration << " ms" << std::endl;
}
system("pause");
return 0;
}
So I'd like to ask, is there any faster writing method for C++ to writing huge files? Or, why can't I write as fast as tested by CrystalDiskMark? How does CrystalDiskMark write?
Any help would be greatly appreciated. Thank you!
first of all this is not c++ question but os related question. for get maximum performance need need use os specific low level api call, which not exist in general c++ libs. from your code clear visible that you use windows api, so search solution for windows how minimum.
from CreateFileW function:
When FILE_FLAG_NO_BUFFERING is combined with FILE_FLAG_OVERLAPPED,
the flags give maximum asynchronous performance, because the I/O does
not rely on the synchronous operations of the memory manager.
so we need use combination of this 2 flags in call CreateFileW or FILE_NO_INTERMEDIATE_BUFFERING in call NtCreateFile
also extend file size and valid data length take some time, so better if final file at begin is known - just set file final size via NtSetInformationFile with FileEndOfFileInformation
or via SetFileInformationByHandle with FileEndOfFileInfo. and then set valid data length with SetFileValidData or via NtSetInformationFile with FileValidDataLengthInformation. set valid data length require SE_MANAGE_VOLUME_NAME privilege enabled when opening a file initially (but not when call SetFileValidData)
also look for file compression - if file compressed (it will be compressed by default if created in compressed folder) this is very slow writting. so need disbale file compression via FSCTL_SET_COMPRESSION
then when we use asynchronous I/O (fastest way) we not need create several dedicated threads. instead we need determine number of I/O requests run in concurrent. if you use CrystalDiskMark it actually run CdmResource\diskspd\diskspd64.exe for test and this is coresponded to it -o<count> parameter (run diskspd64.exe /? > h.txt for look parameters list).
use non Buffering I/O make task more hard, because exist 3 additional requirements:
Any ByteOffset passed to WriteFile must be a multiple of the sector
size.
The Length passed to WriteFile must be an integral of the sector
size
Buffers must be aligned in accordance with the alignment requirement
of the underlying device. To obtain this information, call
NtQueryInformationFile with FileAlignmentInformation
or GetFileInformationByHandleEx with FileAlignmentInfo
in most situations, page-aligned memory will also be sector-aligned,
because the case where the sector size is larger than the page size is
rare.
so almost always buffers allocated with VirtualAlloc function and multiple page size (4,096 bytes ) is ok. in concrete test for smaller code size i use this assumption
struct WriteTest
{
enum { opCompression, opWrite };
struct REQUEST : IO_STATUS_BLOCK
{
WriteTest* pTest;
ULONG opcode;
ULONG offset;
};
LONGLONG _TotalSize, _BytesLeft;
HANDLE _hFile;
ULONG64 _StartTime;
void* _pData;
REQUEST* _pRequests;
ULONG _BlockSize;
ULONG _ConcurrentRequestCount;
ULONG _dwThreadId;
LONG _dwRefCount;
WriteTest(ULONG BlockSize, ULONG ConcurrentRequestCount)
{
if (BlockSize & (BlockSize - 1))
{
__debugbreak();
}
_BlockSize = BlockSize, _ConcurrentRequestCount = ConcurrentRequestCount;
_dwRefCount = 1, _hFile = 0, _pRequests = 0, _pData = 0;
_dwThreadId = GetCurrentThreadId();
}
~WriteTest()
{
if (_pData)
{
VirtualFree(_pData, 0, MEM_RELEASE);
}
if (_pRequests)
{
delete [] _pRequests;
}
if (_hFile)
{
NtClose(_hFile);
}
PostThreadMessageW(_dwThreadId, WM_QUIT, 0, 0);
}
void Release()
{
if (!InterlockedDecrement(&_dwRefCount))
{
delete this;
}
}
void AddRef()
{
InterlockedIncrementNoFence(&_dwRefCount);
}
void StartWrite()
{
IO_STATUS_BLOCK iosb;
FILE_VALID_DATA_LENGTH_INFORMATION fvdl;
fvdl.ValidDataLength.QuadPart = _TotalSize;
NTSTATUS status;
if (0 > (status = NtSetInformationFile(_hFile, &iosb, &_TotalSize, sizeof(_TotalSize), FileEndOfFileInformation)) ||
0 > (status = NtSetInformationFile(_hFile, &iosb, &fvdl, sizeof(fvdl), FileValidDataLengthInformation)))
{
DbgPrint("FileValidDataLength=%x\n", status);
}
ULONG offset = 0;
ULONG dwNumberOfBytesTransfered = _BlockSize;
_BytesLeft = _TotalSize + dwNumberOfBytesTransfered;
ULONG ConcurrentRequestCount = _ConcurrentRequestCount;
REQUEST* irp = _pRequests;
_StartTime = GetTickCount64();
do
{
irp->opcode = opWrite;
irp->pTest = this;
irp->offset = offset;
offset += dwNumberOfBytesTransfered;
DoWrite(irp++);
} while (--ConcurrentRequestCount);
}
void FillBuffer(PULONGLONG pu, LONGLONG ByteOffset)
{
ULONG n = _BlockSize / sizeof(ULONGLONG);
do
{
*pu++ = ByteOffset, ByteOffset += sizeof(ULONGLONG);
} while (--n);
}
void DoWrite(REQUEST* irp)
{
LONG BlockSize = _BlockSize;
LONGLONG BytesLeft = InterlockedExchangeAddNoFence64(&_BytesLeft, -BlockSize) - BlockSize;
if (0 < BytesLeft)
{
LARGE_INTEGER ByteOffset;
ByteOffset.QuadPart = _TotalSize - BytesLeft;
PVOID Buffer = RtlOffsetToPointer(_pData, irp->offset);
FillBuffer((PULONGLONG)Buffer, ByteOffset.QuadPart);
AddRef();
NTSTATUS status = NtWriteFile(_hFile, 0, 0, irp, irp, Buffer, BlockSize, &ByteOffset, 0);
if (0 > status)
{
OnComplete(status, 0, irp);
}
}
else if (!BytesLeft)
{
// write end
ULONG64 time = GetTickCount64() - _StartTime;
WCHAR sz[64];
StrFormatByteSizeW((_TotalSize * 1000) / time, sz, RTL_NUMBER_OF(sz));
DbgPrint("end:%S\n", sz);
}
}
static VOID NTAPI _OnComplete(
_In_ NTSTATUS status,
_In_ ULONG_PTR dwNumberOfBytesTransfered,
_Inout_ PVOID Ctx
)
{
reinterpret_cast<REQUEST*>(Ctx)->pTest->OnComplete(status, dwNumberOfBytesTransfered, reinterpret_cast<REQUEST*>(Ctx));
}
VOID OnComplete(NTSTATUS status, ULONG_PTR dwNumberOfBytesTransfered, REQUEST* irp)
{
if (0 > status)
{
DbgPrint("OnComplete[%x]: %x\n", irp->opcode, status);
}
else
switch (irp->opcode)
{
default:
__debugbreak();
case opCompression:
StartWrite();
break;
case opWrite:
if (dwNumberOfBytesTransfered == _BlockSize)
{
DoWrite(irp);
}
else
{
DbgPrint(":%I64x != %x\n", dwNumberOfBytesTransfered, _BlockSize);
}
}
Release();
}
NTSTATUS Create(POBJECT_ATTRIBUTES poa, ULONGLONG size)
{
if (!(_pRequests = new REQUEST[_ConcurrentRequestCount]) ||
!(_pData = VirtualAlloc(0, _BlockSize * _ConcurrentRequestCount, MEM_COMMIT, PAGE_READWRITE)))
{
return STATUS_INSUFFICIENT_RESOURCES;
}
ULONGLONG sws = _BlockSize - 1;
LARGE_INTEGER as;
_TotalSize = as.QuadPart = (size + sws) & ~sws;
HANDLE hFile;
IO_STATUS_BLOCK iosb;
NTSTATUS status = NtCreateFile(&hFile,
DELETE|FILE_GENERIC_READ|FILE_GENERIC_WRITE&~FILE_APPEND_DATA,
poa, &iosb, &as, 0, 0, FILE_OVERWRITE_IF,
FILE_NON_DIRECTORY_FILE|FILE_NO_INTERMEDIATE_BUFFERING, 0, 0);
if (0 > status)
{
return status;
}
_hFile = hFile;
if (0 > (status = RtlSetIoCompletionCallback(hFile, _OnComplete, 0)))
{
return status;
}
static USHORT cmp = COMPRESSION_FORMAT_NONE;
REQUEST* irp = _pRequests;
irp->pTest = this;
irp->opcode = opCompression;
AddRef();
status = NtFsControlFile(hFile, 0, 0, irp, irp, FSCTL_SET_COMPRESSION, &cmp, sizeof(cmp), 0, 0);
if (0 > status)
{
OnComplete(status, 0, irp);
}
return status;
}
};
void WriteSpeed(POBJECT_ATTRIBUTES poa, ULONGLONG size, ULONG BlockSize, ULONG ConcurrentRequestCount)
{
BOOLEAN b;
NTSTATUS status = RtlAdjustPrivilege(SE_MANAGE_VOLUME_PRIVILEGE, TRUE, FALSE, &b);
if (0 <= status)
{
status = STATUS_INSUFFICIENT_RESOURCES;
if (WriteTest * pTest = new WriteTest(BlockSize, ConcurrentRequestCount))
{
status = pTest->Create(poa, size);
pTest->Release();
if (0 <= status)
{
MessageBoxW(0, 0, L"Test...", MB_OK|MB_ICONINFORMATION);
}
}
}
}
These are the suggestions that come to my mind:
stop all running processes that are using the disk, in particular
disable Windows Defender realtime protection (or other anti virus/malware)
disable pagefile
use Windows Resource Monitor to find processes reading or writing to your disk
make sure you write continuous sectors on disk
don't take into account file opening and closing times
do not use multithreading (your disk is using DMA so the CPU won't matter)
write data that is in RAM (obviously)
be sure to disable all debugging features when building (build a release)
if using M.2 PCIe disk (seems to be your case) make sure other PCIe
devices aren't stealing PCIe lanes to your disk (the CPU has a
limited number AND mobo too)
don't run the test from your IDE
disable Windows file indexing
Finally, you can find good hints on how to code fast writes in C/C++ in this question's thread: Writing a binary file in C++ very fast
One area that might give you improvement is to have your threads running constantly and each reading from a queue.
At the moment every time you go to write you spawn 4 threads (which is slow) and then they're deconstructed at the end of the function. You'll see a speedup of at least the cpu time of your function if you spawn the threads at the start and have them all reading from separate queue's in an infinite loop.
They'll simply check after a SMALL delay if there's anything in their queue, if their is they'll write it all. Your only issue then is making sure order of data is maintained.

How to tell if a virtual memory page has been locked?

Say, if at some point a range of the virtual memory in my process was locked as such:
//Memory was reserved & committed as such
void* pMem = ::VirtualAlloc(NULL, 4096, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
//...
//And then
::VirtualLock(pMem, 4096);
So having an arbitrary address of a page in a virtual memory in my process, can I tell if it was locked?
by using win32 api this is impossible. but if use ZwQueryVirtualMemory with undocumented MEMORY_INFORMATION_CLASS values - this is possible. for data structures definition - see ntmmapi.h
we need use MemoryWorkingSetExInformation with MEMORY_WORKING_SET_EX_BLOCK and use ULONG_PTR Locked : 1; member
demo test:
NTSTATUS IsPageLocked(PVOID BaseAddress, BOOLEAN& Locked)
{
MEMORY_WORKING_SET_EX_INFORMATION mwsei = { BaseAddress };
NTSTATUS status = ZwQueryVirtualMemory(NtCurrentProcess(), 0, MemoryWorkingSetExInformation, &mwsei, sizeof(mwsei), 0);
if (0 <= status)
{
if (mwsei.VirtualAttributes.Valid)
{
Locked = mwsei.VirtualAttributes.Locked;
}
else
{
status = STATUS_INVALID_ADDRESS;
}
}
return status;
}
NTSTATUS IsPageLockedEx(PVOID BaseAddress, BOOLEAN& Locked)
{
NTSTATUS status = IsPageLocked(BaseAddress, Locked);
if (0 > status)
{
DbgPrint("IsPageLocked - error %x\n", status);
}
else
{
DbgPrint("IsPageLocked = %x\n", Locked);
}
return status;
}
void CheckVA()
{
if (PVOID pv = VirtualAlloc(NULL, PAGE_SIZE, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE))
{
BOOLEAN Locked;
NTSTATUS status = IsPageLockedEx(pv, Locked);
if (status == STATUS_INVALID_ADDRESS)
{
// at this point physical memory for pv not yet committed. reference memory for commit physical page
if (*(PBYTE)pv) __nop();
status = IsPageLockedEx(pv, Locked);
}
if (VirtualLock(pv, PAGE_SIZE))
{
IsPageLockedEx(pv, Locked);
if (VirtualUnlock(pv, PAGE_SIZE))
{
IsPageLockedEx(pv, Locked);
}
else
{
__debugbreak();
}
}
VirtualFree(pv, 0, MEM_RELEASE);
}
}
and DbgPrint output
IsPageLocked - error c0000141
IsPageLocked = 0
IsPageLocked = 1
IsPageLocked = 0
For anyone looking at this in the future, XP and later now include APIs that do this: QueryWorkingSet and QueryWorkingSetEx.
Look for the Locked bit in the PSAPI_WORKING_SET_EX_BLOCK structure, similar to the answer relying on ZwQueryVirtualMemory.

Windows Driver IOCTL Code Bluescreens / Crashes The Computer

I have a device driver which I utilize to read other process virtual memory from kernel space so I do not have to use functions like ReadProcessMemory or WriteProcessMemory.
This works fine when I use a structure as a medium to pass the arguments to the kernel via DeviceIoControl, but the driver crashes my computer when I use plain variables like an unsigned long.
Here is an example of perfectly working code
(Driver):
#define IO_KERNEL_READ_REQUEST CTL_CODE(FILE_DEVICE_UNKNOWN, 0x0701, METHOD_BUFFERED, FILE_SPECIAL_ACCESS)
typedef struct _KERNEL_READ_REQUEST
{
ULONG ProcessId;
ULONG Address;
ULONG Response;
ULONG Size;
} KERNEL_READ_REQUEST, *PKERNEL_READ_REQUEST;
if (ControlCode == IO_KERNEL_READ_REQUEST)
{
PKERNEL_READ_REQUEST ReadInput = (PKERNEL_READ_REQUEST)Irp->AssociatedIrp.SystemBuffer;
PKERNEL_READ_REQUEST ReadOutput = (PKERNEL_READ_REQUEST)Irp->AssociatedIrp.SystemBuffer;
PEPROCESS Process;
PsLookupProcessByProcessId(ReadInput->ProcessId, &Process);
KeReadVirtualMemory(Process, ReadInput->Address, &ReadOutput->Response, ReadInput->Size);
DbgPrintEx(0, 0, "Read Params: %lu, %#010x \n", ReadInput->ProcessId, ReadInput->Address);
DbgPrintEx(0, 0, "Value: %lu \n", ReadOutput->Response);
status = STATUS_SUCCESS;
bytesIO = sizeof(KERNEL_READ_REQUEST);
}
(Program):
template <typename type>
type KernelRead(HANDLE hDriver, ULONG ProcessId, ULONG ReadAddress, SIZE_T ReadSize)
{
if (hDriver == INVALID_HANDLE_VALUE)
return (type)false;
DWORD Return;
DWORD Bytes;
KERNEL_READ_REQUEST ReadRequest;
ReadRequest.ProcessId = ProcessId;
ReadRequest.Address = ReadAddress;
ReadRequest.Size = ReadSize;
if (DeviceIoControl(hDriver, IO_KERNEL_READ_REQUEST, &ReadRequest, sizeof(ReadRequest),
&ReadRequest, sizeof(ReadRequest), &Bytes, NULL)) {
return (type)ReadRequest.Response;
}
else
return (type)false;
}
This is what causes the problem
#define IO_KERNEL_GET_ID CTL_CODE(FILE_DEVICE_UNKNOWN, 0x0703, METHOD_BUFFERED, FILE_SPECIAL_ACCESS)
else if (ControlCode == IO_KERNEL_GET_ID)
{
// ProcessId is an ULONG initialized at the driver entry
PULONG OutPut = (PULONG)Irp->AssociatedIrp.SystemBuffer;
OutPut = &ProcessId;
DbgPrintEx(0, 0, "Kernel Get Id: %d \n", *OutPut);
status = STATUS_SUCCESS;
bytesIO = sizeof(OutPut);
}
DWORD KernelGetProcessId(HANDLE hDriver)
{
if (hDriver == INVALID_HANDLE_VALUE)
return false;
ULONG Id;
if (DeviceIoControl(hDriver, IO_KERNEL_GET_ID, &, sizeof(Id),
&Id, sizeof(Id), 0, NULL))
return Id;
else
return false;
}
Calling KernelGetProcessId crashes my driver and the whole computer, how can this be fixed? What am I doing wrong here?
Check DeviceIoControl. If lpOverlapped is NULL, lpBytesReturned cannot be NULL. Even when an operation returns no output data and lpOutBuffer is NULL, DeviceIoControl makes use of lpBytesReturned. After such an operation, the value of lpBytesReturned is meaningless.
Maybe it can be one of the reason.
Other issue which I an see is passing only &, you should pass ULONG variable in it.
Check some thing like this
DWORD KernelGetProcessId(HANDLE hDriver)
{
if (hDriver == INVALID_HANDLE_VALUE)
return false;
ULONG Id;
DWORD Bytes;
if (DeviceIoControl(hDriver, IO_KERNEL_GET_ID, &Id, sizeof(Id),
&Id, sizeof(Id), &Bytes, NULL))
return Id;
else
return false;
}
The Issue:
Apparently the IOCTLs work by using the stack of both the
driver & the user program. To my understanding the stack of the
function calling DeviceIoControl() is copied to kernel space
and then dissected using the arguments of DeviceIoControl()
to know which stack variables we are working with & the buffer is
finally set to Irp->AssociatedIrp.SystemBuffer.
After the IOCTL operation is finished on the kernel side,
IoCompleteRequest() is made which copies the stack of the
kernel module to userspace which then is again dissected to
the form we want it in.
(please correct me if I am wrong)
The Solution:
The crash is caused by this code in the kernel module:
PULONG OutPut = (PULONG)Irp->AssociatedIrp.SystemBuffer;
OutPut = &ProcessId;
in which the global variable's address is set as the value of the output data.
Now when the stack is copied the address obviously does not point anywhere since
the "ProcessId" variable resides in 64 bit kernel space. This is my understanding of the problem.
This fixes the problem:
PULONG OutPut = (PULONG)Irp->AssociatedIrp.SystemBuffer;
*OutPut = ProcessId;
DbgPrintEx(0, 0, "Kernel Get Id: %d \n", *OutPut);
status = STATUS_SUCCESS;
bytesIO = sizeof(OutPut);