Read access violation using lock cmpxchg16b _InterlockedCompareExchange128 - c++

I'm trying to hook a function with a lock cmpxchg16b. I have tried about 20 different things.
Expected result:
In real func
In hook func
Result in Debug build:
Exception thrown at 0x..: 0xC0000005 Access violation reading 0xFFFFFFFFFFFFFFFF
I'm not sure why it is trying to read from 0xFFFFFFFFFFFFFFFF, none of the pointers go there.
In a Release build, it doesn't crash! But it doesn't hook the function either.
Source:
#include <stdio.h>
#include <Windows.h>
int RealFunc()
{
printf("In real func\n");
return 2;
}
int HookFunc()
{
printf("In hook func\n");
return 1;
}
int main()
{
DWORD dwOld;
if (!VirtualProtect(&RealFunc, 0x1000, PAGE_EXECUTE_READWRITE, &dwOld))
{
printf("Unable to make mem RWX.\n");
return 0;
}
RealFunc();
__declspec(align(16)) PVOID ProcAddress = &RealFunc;
__declspec(align(16)) LONG64 Restore[2];
Restore[0] = 0x0000000025ff9090; // nop, nop, jmp [rip + 0]
Restore[1] = (LONG64)&HookFunc;
_InterlockedCompareExchange128((LONG64*)ProcAddress, Restore[0], Restore[1], Restore);
RealFunc();
system("PAUSE");
return 0;
}
Here is the function documentation: https://msdn.microsoft.com/en-us/library/windows/desktop/hh972640(v=vs.85).aspx

_InterlockedCompareExchange128((LONG64*)ProcAddress,
Restore[0], Restore[1], Restore);
This call is wrong. Look at the function signature:
unsigned char __cdecl InterlockedCompareExchange128(
_Inout_ LONGLONG volatile *Destination,
_In_ LONGLONG ExchangeHigh,
_In_ LONGLONG ExchangeLow,
_Inout_ LONGLONG *ComparandResult
);
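The second argument is ExchangeHigh and the third is ExchangeLow, so the call must pass Restore[1], Restore[0], not Restore[0], Restore[1]. ComparandResult must also hold the current contents of the destination (the original function bytes), because the exchange only happens when the comparand matches the destination. So it cannot be Restore.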
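To make the operand order concrete, here is a minimal, portable sketch of the 16 bytes the exchange is meant to store. The helper name buildJmpPatch and the sample address in the usage note are hypothetical. The function only builds the byte image, it does not patch anything, and it assumes the x64 encoding FF 25 00 00 00 00 for jmp [rip+0], which reads its target from the 8 bytes that follow the instruction.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: builds the 16-byte image that the 128-bit exchange
// should store at the start of the hooked function.
//   bytes 0..7  (ExchangeLow):  90 90 FF 25 00 00 00 00 -> nop; nop; jmp [rip+0]
//   bytes 8..15 (ExchangeHigh): absolute address of the hook function
std::array<std::uint8_t, 16> buildJmpPatch(std::uint64_t hookAddress) {
    const std::uint64_t low = 0x0000000025ff9090ULL;
    const std::uint64_t high = hookAddress;
    std::array<std::uint8_t, 16> bytes{};
    for (std::size_t i = 0; i < 8; ++i) {
        // x64 is little-endian: the least significant byte is stored first.
        bytes[i] = static_cast<std::uint8_t>(low >> (8 * i));
        bytes[8 + i] = static_cast<std::uint8_t>(high >> (8 * i));
    }
    return bytes;
}
```

Calling buildJmpPatch(0x1122334455667788) yields the jump in the first eight bytes and the address in the last eight. Swapping the two halves, as the call in the question does, would put the address first and the jump after it.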
Also note this, from MSDN:
The parameters for this function must be aligned on a 16-byte
boundary; otherwise, the function will behave unpredictably on x64
systems.
But which parameters? Obviously not all of them. ExchangeHigh and ExchangeLow are passed by value; they can be immediate values that have no address at all, so talking about alignment is meaningless for the second and third parameters. InterlockedCompareExchange128 actually compiles to a lock cmpxchg16b instruction. From the Intel manual:
Note that CMPXCHG16B requires that the destination (memory) operand be
16-byte aligned.
So only Destination must be 16-byte aligned. ComparandResult need not be: its value is loaded into the RDX:RAX register pair, while the exchange value goes into RCX:RBX.
So __declspec(align(16)) LONG64 Restore[2]; is not needed at all; you can pass the values directly to InterlockedCompareExchange128. Next,
__declspec(align(16)) PVOID ProcAddress = &RealFunc;
with _InterlockedCompareExchange128((LONG64*)ProcAddress..
is wrong and pointless. The alignment of the ProcAddress variable itself is irrelevant; it is the memory that ProcAddress points to that must be 16-byte aligned. Again, no temporary variable is needed here; we can directly use
_InterlockedCompareExchange128((LONG64*)RealFunc, ...)
Of course, RealFunc must be 16-byte aligned. Otherwise the instruction faults, and we get exactly the 0xC0000005 Access violation reading 0xFFFFFFFFFFFFFFFF exception from the question.
So I guess that in the Debug build RealFunc is not 16-byte aligned.
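The alignment condition can be checked before attempting the exchange. The following is a small portable sketch; isAligned16 is a hypothetical name, and the test applies to the address the pointer holds, not to the address of the pointer variable.

```cpp
#include <cstdint>

// Hypothetical helper: true when `p` sits on a 16-byte boundary, which is
// what CMPXCHG16B requires of its memory operand.
bool isAligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}
```

A caller would test isAligned16(reinterpret_cast<const void*>(&RealFunc)) and fall back to another patching strategy when it returns false.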
In a Release build, it doesn't crash! But it doesn't hook the function
either.
It does not hook because you pass Restore as ComparandResult (the comparison fails, so nothing is written), and there is no exception because RealFunc happened to be 16-byte aligned.
Because a function can in general have any address and need not be 16-byte aligned, _InterlockedCompareExchange128 is not generally useful here. It is also available only on x64, not on x86.
Code (which still will not hook the function if RealFunc is not 16-byte aligned) can look like this:
int RealFunc()
{
printf("In real func\n");
return 2;
}
int HookFunc()
{
printf("In hook func\n");
return 1;
}
int xxx()
{
DWORD dwOld;
if (VirtualProtect(RealFunc, 2*sizeof(PVOID), PAGE_EXECUTE_READWRITE, &dwOld))
{
RealFunc();
#if defined(_M_X64)
if (!((LONG_PTR)RealFunc & 15))
{
LONG64 Comparand[2] = { ((LONG64*)RealFunc)[0], ((LONG64*)RealFunc)[1] };
InterlockedCompareExchange128((LONG64*)RealFunc, (LONG64)HookFunc, 0x0000000025ff9090, Comparand);
}
else
{
printf("bad function address %p\n", RealFunc);
}
#elif defined(_M_IX86)
static PVOID pvHookFunc = HookFunc;
LARGE_INTEGER Exchange = { 0x25ff9090, (LONG)&pvHookFunc };
LONG64 Comparand;
memcpy(&Comparand, RealFunc, sizeof(Comparand));
InterlockedCompareExchange64((LONG64*)RealFunc, Exchange.QuadPart, Comparand);
#else
#error not implemented
#endif
FlushInstructionCache(GetCurrentProcess(), RealFunc, 2*sizeof(PVOID));
if (dwOld != PAGE_EXECUTE_READWRITE)
{
VirtualProtect(RealFunc, 2*sizeof(PVOID), dwOld, &dwOld);
}
RealFunc();
}
return 0;
}

Related

Getting BSOD When Trying To Print The Next Process Name In LIST_ENTRY List

I am trying to print the next process name after my process in the LIST_ENTRY list.
But I am always getting BSOD.
#include <Ntifs.h>
#include <ntddk.h>
#include <WinDef.h>
void SampleUnload(_In_ PDRIVER_OBJECT DriverObject) {
UNREFERENCED_PARAMETER(DriverObject);
DbgPrint("Sample driver Unload called\n");
}
extern "C"
NTSTATUS
DriverEntry(_In_ PDRIVER_OBJECT DriverObject, _In_ PUNICODE_STRING RegistryPath) {
UNREFERENCED_PARAMETER(RegistryPath);
DriverObject->DriverUnload = SampleUnload;
DbgPrint("Sample driver Load called\n");
PEPROCESS EP;
if (::PsLookupProcessByProcessId(::PsGetCurrentProcessId(), &EP) == STATUS_INVALID_PARAMETER) {
DbgPrint("Can't get EPROCESS");
return STATUS_INVALID_PARAMETER;
}
LIST_ENTRY list_entry = *((LIST_ENTRY*)(LPBYTE)EP + 0x448);
UCHAR* fileName;
fileName = ((UCHAR*)(LPBYTE)list_entry.Flink - 0x448 + 0x5a8);
for (int i = 0; i < 15; i++)
DbgPrint("%u" , fileName[i]);
DbgPrint("Finish");
return STATUS_SUCCESS;
}
In the EPROCESS structure there is a LIST_ENTRY object at offset 0x448.
So I created a LIST_ENTRY object and assigned it the contents at EPROCESS + 0x448, and then I added 0x5a8 - 0x448 to LIST_ENTRY.Flink.
That is supposed to reach the ImageFileName array at offset 0x5a8.
But it doesn't work for some reason.
You have multiple nonsensical casts in the code:
*((LIST_ENTRY*)(LPBYTE)EP + 0x448);
and
((UCHAR*)(LPBYTE)list_entry.Flink - 0x448 + 0x5a8);
The first one, for example, converts EP to a byte pointer, then (since casts associate right to left and bind tighter than +) immediately discards that conversion and converts to LIST_ENTRY* instead. Only then is the pointer arithmetic performed, on LIST_ENTRY objects. That is 0x448 * sizeof(LIST_ENTRY) bytes rather than 0x448 bytes.
I guess you actually meant to do this:
LPBYTE lpb = (LPBYTE)EP + 0x448; // EP is already a pointer to the EPROCESS
LIST_ENTRY list_entry = *(LIST_ENTRY*)lpb;
That's possibly a strict aliasing violation bug though. Or an alignment bug.
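The difference between the two cast placements can be shown without any kernel headers. In this sketch Node is a hypothetical stand-in for LIST_ENTRY (two pointers), and the array exists only so that both computed addresses stay in bounds; nothing is dereferenced.

```cpp
#include <cstddef>

// Stand-in for LIST_ENTRY: two pointers (an assumption for illustration).
struct Node {
    Node* flink;
    Node* blink;
};

// Storage large enough that both computed addresses stay in bounds.
static Node g_nodes[0x448 + 1];

// The cast binds tighter than +, so this advances by 0x448 Node objects.
std::ptrdiff_t scaledOffsetBytes() {
    Node* base = g_nodes;
    Node* p = (Node*)(unsigned char*)base + 0x448;
    return reinterpret_cast<unsigned char*>(p) -
           reinterpret_cast<unsigned char*>(base);
}

// Parenthesizing the byte arithmetic advances by exactly 0x448 bytes.
std::ptrdiff_t byteOffsetBytes() {
    Node* base = g_nodes;
    Node* p = (Node*)((unsigned char*)base + 0x448);
    return reinterpret_cast<unsigned char*>(p) -
           reinterpret_cast<unsigned char*>(base);
}
```

The first function returns 0x448 * sizeof(Node), the second returns 0x448, which is the offset the question intended.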

How to trap stack overflow in a Windows x64 C++ application

I am trying to compile an application for the x64 platform architecture on Windows. A couple of threads, which handle the parsing of a scripting language, use this code recommended by Microsoft to trap stack overflows and avoid access violation exceptions:
__try
{
DoSomethingThatMightUseALotOfStackMemory();
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
LPBYTE lpPage;
static SYSTEM_INFO si;
static MEMORY_BASIC_INFORMATION mi;
static DWORD dwOldProtect;
// Get page size of system
GetSystemInfo(&si);
// Find SP address
_asm mov lpPage, esp;
// Get allocation base of stack
VirtualQuery(lpPage, &mi, sizeof(mi));
// Go to page beyond current page
lpPage = (LPBYTE)(mi.BaseAddress)-si.dwPageSize;
// Free portion of stack just abandoned
if (!VirtualFree(mi.AllocationBase,
(LPBYTE)lpPage - (LPBYTE)mi.AllocationBase,
MEM_DECOMMIT))
{
exit(1);
}
// Reintroduce the guard page
if (!VirtualProtect(lpPage, si.dwPageSize,
PAGE_GUARD | PAGE_READWRITE,
&dwOldProtect))
{
exit(1);
}
Sleep(2000);
}
Unfortunately it uses one line of inline assembler to get the stack pointer. Visual Studio does not support inline assembly in x64 mode, and I can't find a compiler intrinsic for getting the stack pointer either.
Is it possible to do this in an x64-friendly manner?
As pointed out in a comment to the question, the whole "hack" above can be replaced by the _resetstkoflw function. This works fine in both x86 and x64 mode.
The code snippet above then becomes:
// Filter for the stack overflow exception. This function traps
// the stack overflow exception, but passes all other exceptions through.
int stack_overflow_exception_filter(int exception_code)
{
if (exception_code == EXCEPTION_STACK_OVERFLOW)
{
// Do not call _resetstkoflw here, because at this point
// the stack is not yet unwound. Instead, signal that the
// handler (the __except block) is to be executed.
return EXCEPTION_EXECUTE_HANDLER;
}
else
return EXCEPTION_CONTINUE_SEARCH;
}
int example()
{
int result = 0;
__try
{
DoSomethingThatMightUseALotOfStackMemory();
}
__except(stack_overflow_exception_filter(GetExceptionCode()))
{
// Here, it is safe to reset the stack.
result = _resetstkoflw();
}
// Terminate if _resetstkoflw failed (returned 0)
if (!result)
return 3;
return 0;
}

boost set name from string [duplicate]

Is it possible to give a name to a boost::thread so that the debuggers tables and the crash logs can be more readable? How?
You would need to access the underlying thread primitive and assign a name in a system dependent manner. Debugging and crash logs are inherently system dependent and boost::thread is more about non-system-dependency, i.e. about portability.
It seems ( http://www.boost.org/doc/libs/1_43_0/doc/html/thread.html ) that there is no documented way to access underlying system resources for a boost thread. (But I have never used it myself so I may miss something.)
Edit: (As David writes in the comment) http://www.boost.org/doc/libs/1_43_0/doc/html/thread/thread_management.html#thread.thread_management.thread.nativehandle
I'm using Boost 1.50.0 on Win32 + VS2010, and thread::native_handle contains a number that I could not match to anything in the system. On the other hand, the value returned by thread::get_id(), when written to a stream, yields the Windows thread ID directly as a hexadecimal string. Note that the returned value is platform specific, though. The following code does work under Boost 1.50.0 + Win32 + VS2010. Parts of the code are reused from MSDN.
const DWORD MS_VC_EXCEPTION = 0x406D1388;
#pragma pack(push, 8)
typedef struct THREADNAME_INFO {
DWORD dwType; // Must be 0x1000.
LPCSTR szName; // Pointer to name (in user addr space).
DWORD dwThreadID; // Thread ID (-1=caller thread).
DWORD dwFlags; // Reserved for future use, must be zero.
} THREADNAME_INFO;
#pragma pack(pop)
void _SetThreadName(DWORD threadId, const char* threadName) {
THREADNAME_INFO info;
info.dwType = 0x1000;
info.szName = threadName;
info.dwThreadID = threadId;
info.dwFlags = 0;
__try {
RaiseException( MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(ULONG_PTR), (ULONG_PTR*)&info );
}
__except(EXCEPTION_EXECUTE_HANDLER) {
}
}
void SetThreadName(boost::thread::id threadId, std::string threadName) {
// convert string to char*
const char* cchar = threadName.c_str();
// convert HEX string to DWORD
unsigned int dwThreadId;
std::stringstream ss;
ss << std::hex << threadId;
ss >> dwThreadId;
// set thread name
_SetThreadName((DWORD)dwThreadId, cchar);
}
Call like this:
boost::thread* thr = new boost::thread(boost::bind(...));
SetThreadName(thr->get_id(), "MyName");
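The conversion step above relies on the thread id streaming as a hexadecimal string, which is an implementation detail rather than a documented guarantee. A minimal sketch of that parsing step on its own, with a failure check added, might look like this; parseHexId is a hypothetical helper name, not part of Boost.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: parses a hexadecimal string such as "1a2c" into an
// unsigned value. Returns false when the text is not a valid hex number.
bool parseHexId(const std::string& text, unsigned int& out) {
    std::istringstream in(text);
    in >> std::hex >> out;
    return !in.fail();
}
```

Checking the return value avoids silently passing a garbage thread ID on a platform where the id is printed in some other format.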
There is a proposal to add this to boost which has had a slow start:
https://github.com/boostorg/thread/issues/84

Convert assembly to machine code in C++

I am looking for a library or function that converts a string of assembly code to machine code,
like the following:
char asmString[] = {"mov eax,13H"};
byte[] output; // array of byte
output = asm2mach(asmString); // {0xB8, 0x13, 0x00, 0x00, 0x00}
The motivation is to inject machine code that calls an asm function in the program. The injection has three main steps: VirtualAllocEx, WriteProcessMemory and CreateRemoteThread. Here is the code:
bool injectAsm(const char* exeName,const byte* code, int size)
{
LPVOID allocAddr = NULL;
HANDLE ThreadProcess = NULL;
HANDLE hProcess = OpenProcessEasy(exeName);
allocAddr = VirtualAllocEx(hProcess, NULL, size, MEM_COMMIT, PAGE_READWRITE);
if(allocAddr){
if(WriteProcessMemory(hProcess, allocAddr, code, size, NULL)) {
ThreadProcess = CreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)allocAddr, NULL, 0, NULL);
WaitForSingleObject(ThreadProcess, INFINITE);
VirtualFreeEx(hProcess,allocAddr, 0, MEM_RELEASE);
CloseHandle(ThreadProcess);
return true;
}
}
if(allocAddr){
VirtualFreeEx(hProcess, allocAddr, 0, MEM_RELEASE);
}
return false;
}
int main()
{
byte code[] = {0xB8, 0x10, 0xED, 0x4A, 0x00, 0xFF, 0xD0, 0xC3, 0x90};
injectAsm("game.exe",code,sizeof(code));
system("pause");
return 0;
}
I would recommend using both AsmJit and AsmTK. AsmTK is a new project that uses AsmJit as a toolkit and adds functionality on top of it. Note that at the moment AsmTK requires the asmjit:next branch to work, as this is new functionality.
This is a minimal example that uses AsmTK to parse some asm:
#include <stdio.h>
#include <stdlib.h>
#include <asmtk/asmtk.h>
using namespace asmjit;
using namespace asmtk;
static const char someAsm[] =
"test eax, eax\n"
"jz L1\n"
"mov eax, ebx\n"
"mov eax, 0xFFFFFFFF\n"
"pand mm0, mm1\n"
"paddw xmm0, xmm1\n"
"vpaddw ymm0, ymm1, ymm7\n"
"vaddpd zmm0 {k1}{z}, zmm1, [rax] {1tox}\n"
"L1:\n";
int main(int argc, char* argv[]) {
CodeHolder code;
// Here select the target architecture - either X86 or X64.
code.init(CodeInfo(Arch::kTypeX64));
X86Assembler a(&code);
AsmParser p(&a);
Error err = p.parse(someAsm);
if (err) {
printf("ERROR: %0.8x (%s)\n", err, DebugUtils::errorAsString(err));
return 1;
}
// The machine-code is now stored in CodeHolder's first section:
code.sync();
CodeBuffer& buffer = code.getSection(0)->buffer;
// You can do whatever you need with the buffer:
uint8_t* data = buffer.data;
size_t length = buffer.length;
return 0;
}
AsmJit uses JitRuntime to allocate executable memory and to relocate the resulting machine code into it. However, AsmJit's virtual memory allocator can be created with hProcess, which could be your remote process handle, so it can also allocate memory of that process. Here is a small example of how it could be done:
bool addToRemote(HANDLE hRemoteProcess, CodeHolder& code) {
// VMemMgr is AsmJit's low-level VM manager.
VMemMgr vm(hRemoteProcess);
// This will tell `vm` to not destroy allocated blocks when it
// gets destroyed.
vm.setKeepVirtualMemory(true);
// Okay, suppose we have the CodeHolder from previous example.
size_t codeSize = code.getCodeSize();
// Allocate a permanent memory of `hRemoteProcess`.
uint64_t remoteAddr = (uint64_t)
vm.alloc(codeSize, VMemMgr::kAllocPermanent);
// Temporary buffer for relocation.
uint8_t* tmp = static_cast<uint8_t*>(::malloc(codeSize));
if (!tmp) return false;
// First argument is where to relocate the code (it must be
// current's process memory), second argument is the base
// address of the relocated code - it's the remote process's
// memory. We need `tmp` as it will temporarily hold code
// that we want to write to the remote process.
code.relocate(tmp, remoteAddr);
// Now write to the remote process.
SIZE_T bytesWritten;
BOOL ok = WriteProcessMemory(
hRemoteProcess, (LPVOID)remoteAddr, tmp, codeSize, &bytesWritten);
// Release temporary resources.
::free(tmp);
// Now the only thing needed is the CreateRemoteThread thingy...
return ok;
}
I would recommend wrapping such functionality into a new RemoteRuntime to make it a matter of a single function call, I can help with that if needed.
You should define what you really want:
Do you want to generate machine code at runtime? Then use some JIT compilation library like libgccjit, libjit, LLVM, GNU lightning, or asmjit. asmjit is a library emitting x86 machine code, probably what you need. There is no absolute need to use a string containing assembler code.
Or do you want to translate some assembler syntax (and there are several assembler syntaxes even for x86) to object code or machine code? Then you'd better run a real assembler as an external program. The produced object code will contain relocation directives, and you'll need something to handle these (e.g. a linker).
Alternatively, you could generate some (e.g.) C code at runtime, fork a compilation, and dynamically load and use the resulting function at runtime (e.g. with dlopen(3) and dlsym).
Details are obviously operating system, ABI, and processor specific.
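To illustrate the first option (emitting machine code directly instead of parsing a string), here is a hand-rolled sketch for a single x86 instruction form. encodeMovR32Imm32 is a hypothetical helper, not an API of any of the libraries mentioned; it only covers mov r32, imm32, whose encoding is the opcode 0xB8 plus the register number followed by the 32-bit immediate in little-endian order.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes `mov r32, imm32`.
// Register numbers follow the x86 encoding: eax=0, ecx=1, edx=2, ebx=3, ...
std::vector<std::uint8_t> encodeMovR32Imm32(unsigned reg, std::uint32_t imm) {
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(0xB8 + (reg & 7)));
    for (int i = 0; i < 4; ++i) {
        // Immediate is stored least significant byte first.
        out.push_back(static_cast<std::uint8_t>(imm >> (8 * i)));
    }
    return out;
}
```

With reg = 0 and imm = 0x13 this produces B8 13 00 00 00, the bytes the question expects for mov eax,13H. A real JIT library handles the full instruction set, operand sizes, and relocations, which is why one is recommended over extending a sketch like this.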
Here is a project that can convert a string of assembly code (Intel or ARM) into its corresponding bytes.
https://github.com/bsmt/Assembler
It's written in Objective-C, but the source is there. I hope this helps.
