Porting VC++ inline assembler to x64 (of a __stdcall hook) - c++

I need to port the inline assembler below so that it compiles on x64.
I'm trying to get familiar with the x64 intrinsics, but I guess someone who knows them could easily help me out.
void __stdcall Hook(P1, P2)
{
__asm pushad
static void* OriginalFunctionPointer =
GetProcAddress(GetModuleHandleA("Some.dll"), "[...]");
// [...]
__asm popad
__asm push (P2)
__asm push (P1)
__asm call (OriginalFunctionPointer)
}

It seems you need a hooking library (one with a C++ API, if you prefer) along with a function prototype; then no inline assembly is needed, in 32-bit or 64-bit mode. Also, those pushad/popad pairs aren't needed when you write inline assembly.
typedef void (__stdcall*myfp)(int,int);
void __stdcall MyHook(int arg1, int arg2)
{
static myfp TheFP = (myfp)GetProcAddress(GetModuleHandleA("Some.dll"), "[...]");
//your extra code
TheFP(arg1,arg2);
}
Of course, the injection of this hook needs to take place somewhere else.
For hooking class methods you need to account for the hidden this pointer (pDevice in this case):
#define D3D8FUNC(name,...) typedef HRESULT (__stdcall * name)(__VA_ARGS__)
D3D8FUNC(D3D8SetTexture,void* pDevice, DWORD dwStage, void* pTexture);
HRESULT __stdcall D3DSetTexture(void* pDevice, DWORD dwStage, void* pTexture)
{
LOG("[D3DSetTexture][0x%p] Device: 0x%p Stage: %u Texture: 0x%p\n",_ReturnAddress(),pDevice,dwStage,pTexture);
return Direct3D::gpfD3D8SetTexture(pDevice,dwStage,pTexture);
}
//in the init
Direct3D::gpfD3D8SetTexture = System::VirtualFunctionHook<Direct3D::D3D8SetTexture>(Direct3D::gpDevice,61,D3DSetTexture);
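The forwarding pattern from the first snippet can be tried without any DLL. What follows is a minimal, portable sketch under stated assumptions, not the answer's actual hook: RealTarget and ResolveOriginal are hypothetical stand-ins for the exported function and for the GetProcAddress lookup, and the calling-convention keyword is left out so that it builds with any compiler. It shows that a plain call through a typed function pointer lets the compiler take care of argument passing and the return value, which is why no inline assembly is needed.

```cpp
// Hypothetical stand-in for the function exported by Some.dll.
using TargetFn = int (*)(int, int);
int RealTarget(int a, int b) { return a + b; }

// Counts how often the hook ran; purely illustrative.
int g_hookCalls = 0;

// Hypothetical stand-in for GetProcAddress(GetModuleHandleA(...), ...).
TargetFn ResolveOriginal() { return &RealTarget; }

int MyHook(int a, int b) {
    // Resolved once, on the first call.
    static TargetFn original = ResolveOriginal();
    ++g_hookCalls;          // "your extra code" goes here
    return original(a, b);  // the compiler emits the correct call sequence
}
```

Calling MyHook(2, 3) runs the extra code and then returns whatever the original returns.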

Related

make compiler copy function's code inside a other function passed as argument

My question is very specific: I want to force the compiler to take the code of a function and copy it inside another one, as the inline or __forceinline keywords can do, but I want to pass the function to be copied to the other function as an argument. Here is a simple example.
using pFunc = void(*)();
void func_1() { /*some code*/ }
void func_2(pFunc function) { /*some code*/ } // after compiling, I want this function to take no argument and to contain a copy of func_1.
int main()
{
func_2(func_1);
}
With this example the compiler passes the pointer to func_1 as an argument to func_2, as expected.
I tried adding the inline keyword to func_1 and also tried passing the argument of func_2 by reference, but the compiler didn't copy func_1 into func_2.
Any idea how I can do that?
I use the Visual Studio compiler (MSVC) with the 2017 toolset (v141).
My project platform is x64.
You can use a noinline template function to get the asm you want
So you want the compiler to do constant-propagation into a clone of void func_2(pFunc f){ f(); }? Like what GCC might do with __attribute__((noinline)) but not noclone?
For example,
using pFunc = void(*)();
int sink, sink2;
#ifdef _MSC_VER
#define NOINLINE __declspec(noinline)
#define ALWAYS_INLINE /* */
#else
#define NOINLINE __attribute__((noinline)) // noclone and/or noipa
#define ALWAYS_INLINE __attribute__((always_inline))
#endif
ALWAYS_INLINE // without this, GCC chooses to clone .constprop.0 with just a jmp func_1
void func_1() { sink = 1; sink2 = 2; }
NOINLINE static void func_2(pFunc function) { function(); }
int main()
{
func_2(func_1);
}
produces, with GCC11.3 -O2 or higher, or -O1 -fipa-cp, on Godbolt. (Clang is similar):
# GCC11 -O3 with C++ name demangling
func_1():
mov DWORD PTR sink[rip], 1
mov DWORD PTR sink2[rip], 2
ret
func_2(void (*)()) [clone .constprop.0]:
mov DWORD PTR sink[rip], 1
mov DWORD PTR sink2[rip], 2
ret
main:
# note no arg passed, calling a special version of the function
# specialized for function = func_1
call func_2(void (*)()) [clone .constprop.0]
xor eax, eax
ret
Of course if we hadn't disabled inlining of func_2, main would just call func_1. Or inline that body of func_1 into main and not do any calls.
MSVC might not be willing to do that "optimization", instead preferring to just inline func_2 into main as call func_1.
If you want to force it to make clunky asm that duplicates func_1 unnecessarily, you could use a template to do the same thing as constprop, taking the function pointer as a template arg, so you can instantiate func_2<func_1> as a stand-alone non-inline function if you really want. (Perhaps with __declspec(noinline)).
Your func_2 can accept func_1 as an unused argument if you want.
using pFunc = void(*)();
int sink, sink2;
#ifdef _MSC_VER
#define NOINLINE __declspec(noinline)
#define ALWAYS_INLINE /* */
#else
#define NOINLINE __attribute__((noinline)) // not noclone or noipa, we *want* those to happen
#define ALWAYS_INLINE __attribute__((always_inline))
#endif
//ALWAYS_INLINE // Seems not needed for this case, with the template version
void func_1() { sink = 1; sink2 = 2; }
template <pFunc f>
NOINLINE void func_2() { f(); }
int main()
{
func_2<func_1>();
}
Compiles as desired with MSVC -O2 (Godbolt), and GCC/clang
int sink DD 01H DUP (?) ; sink
int sink2 DD 01H DUP (?) ; sink2
void func_2<&void func_1(void)>(void) PROC ; func_2<&func_1>, COMDAT
mov DWORD PTR int sink, 1 ; sink
mov DWORD PTR int sink2, 2 ; sink2
ret 0
void func_2<&void func_1(void)>(void) ENDP ; func_2<&func_1>
void func_1(void) PROC ; func_1, COMDAT
mov DWORD PTR int sink, 1 ; sink
mov DWORD PTR int sink2, 2 ; sink2
ret 0
void func_1(void) ENDP ; func_1
main PROC ; COMDAT
$LN4:
sub rsp, 40 ; 00000028H
call void func_2<&void func_1(void)>(void) ; func_2<&func_1>
xor eax, eax
add rsp, 40 ; 00000028H
ret 0
main ENDP
Note the duplicated bodies of func_1 and func_2.
You should check (with a disassembler) that the linker doesn't do identical code folding and just attach both symbol names to one block of machine code.
I don't think this looks like much of an obfuscation technique; IDK why having a 2nd copy of a function with identical machine code would be a problem to reverse engineer. I guess it would maybe create more overall work, and people wouldn't notice that two calls to different functions are actually doing the same thing.
I mostly answered as an exercise in making a compiler spit out the asm I wanted it to, whether or not that has value to anyone else.
Obviously it only works for compile-time-constant function pointers; commenters have been discussing self-modifying code and scripting languages. If you wanted this for non-const function pointer args to func_2, you're completely out of luck in a language like C++ that's designed for strictly ahead-of-time compilation.
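To make the compile-time-constant restriction concrete, here is a small sketch using the same template technique with two different targets. The names set_one, set_two and run are made up for this illustration. Each distinct template argument yields its own instantiation, and taking its address forces a stand-alone function to exist; a function pointer that is only known at run time could not be used as the template argument.

```cpp
using pFunc = void (*)();

int sink = 0;

void set_one() { sink = 1; }
void set_two() { sink = 2; }

// The callee is a template argument, so it must be a compile-time constant.
template <pFunc f>
void run() { f(); }

// Taking the address forces each specialization to be instantiated
// as a separate function with no parameters.
pFunc run_one = &run<set_one>;
pFunc run_two = &run<set_two>;
```

Calling run_one() ends with sink equal to 1 and run_two() with sink equal to 2, and neither takes a function pointer argument at the call site. Whether the callee body is actually copied into each instantiation still depends on the optimizer.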

How to redefine c++ pointer function?

I have code from BASS lib.
#ifndef BASSDEF
#define BASSDEF(f) WINAPI f
#else
#define NOBASSOVERLOADS
#endif
HSAMPLE BASSDEF(BASS_SampleLoad)(BOOL mem, const void *file, QWORD offset, DWORD length, DWORD max, DWORD flags);
I need to redefine BASSDEF so that the functions are resolved through dlsym. How can I do this?
Update:
I am using this on the Android NDK (Linux). I loaded the bass module with dlopen, and I need to make all functions (here is the original header file of the bass lib: https://pastebin.com/Z2Ty9UsY ) point into this loaded module via dlsym. I need this so that I can easily call all functions of the bass.so module from JNI.
Actually, BASSDEF is not a function. It's a macro, which is expanded at compile time. So let's unwrap it ourselves:
HSAMPLE WINAPI BASS_SampleLoad(BOOL mem, const void *file, QWORD offset, DWORD length, DWORD max, DWORD flags);
Whoa, it's just a function declaration. Now, "WINAPI" is basically the __stdcall calling convention (Microsoft-specific). But looking at the BASS header you provided, one can find this for non-WIN32 systems:
#define WINAPI
Basically, under Linux it's just a placeholder which expands to nothing. Now the function declaration looks like this:
HSAMPLE BASS_SampleLoad(BOOL mem, const void *file, QWORD offset, DWORD length, DWORD max, DWORD flags);
What's next? You would like to find this function in some shared library via dlsym?
I assume you want something like this:
// Requires <dlfcn.h> and <type_traits>.
// Declare a function pointer type in C++11 style
using BASS_SampleLoad_FuncPtr = std::add_pointer<decltype(BASS_SampleLoad)>::type;
// Open the library you want
void* soHandle = dlopen("your_lib_here.so", RTLD_LAZY);
// Error check!
if (nullptr == soHandle) {
// Fail here
}
// Finally, get a pointer to the function! The variable gets its own name so that
// it cannot clash with the BASS_SampleLoad declaration from the header.
BASS_SampleLoad_FuncPtr pBASS_SampleLoad = reinterpret_cast<BASS_SampleLoad_FuncPtr>(dlsym(soHandle, "BASS_SampleLoad"));
// Error check!
if (nullptr == pBASS_SampleLoad) {
// Fail here
}
// Only from here on is it safe to call through "pBASS_SampleLoad" with the required params
auto sample = pBASS_SampleLoad(...);
...
// Don't forget to close the lib (the pointer must not be used afterwards)
dlclose(soHandle);
Please note!
The code provided is not tested and might contain errors. It also requires C++11.
For C++14 and higher, you can replace 'std::add_pointer<...>::type' with 'std::add_pointer_t<...>'.
P.S. This code is valid because BASS is a cross-platform library and all the WinAPI-lookalike stuff (WINAPI, QWORD, BOOL, DWORD, etc.) is defined for Linux in the BASS header.
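As for redefining the macro itself, one possible approach is to make it expand to a function-pointer declarator, so that each prototype line in the header becomes a pointer variable that dlsym can fill in. The sketch below shows the idea on a POSIX system without BASS: DYNDEF is a hypothetical macro playing the role of BASSDEF, p_strlen and LoadSymbols are made-up names, and strlen from the already-loaded C library stands in for a BASS function. Whether the real header tolerates such a redefinition should be checked against the header itself.

```cpp
#include <dlfcn.h>
#include <cstddef>

// Hypothetical macro in the role of BASSDEF: it turns a prototype
// into the declarator of a function-pointer variable.
#define DYNDEF(f) (*f)

// Same shape as a header prototype line; with DYNDEF this defines
// the variable  std::size_t (*p_strlen)(const char*).
std::size_t DYNDEF(p_strlen)(const char*) = nullptr;

// Fills in the pointer. Returns false if anything could not be resolved.
bool LoadSymbols() {
    // nullptr asks for the main program together with its loaded dependencies,
    // which include the C library. The handle is deliberately not closed,
    // because the pointer must stay valid for the lifetime of the program.
    void* handle = dlopen(nullptr, RTLD_LAZY);
    if (handle == nullptr) {
        return false;
    }
    p_strlen = reinterpret_cast<decltype(p_strlen)>(dlsym(handle, "strlen"));
    return p_strlen != nullptr;
}
```

After LoadSymbols() succeeds, p_strlen("abc") is called exactly like a normal function. On older glibc versions the program may need to be linked with -ldl.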

C++ - Method/Member access

We all know that private methods and members are only accessible inside the class, in the same way that protected methods and members are accessible inside the class and in classes derived from it. But where does this «access control» take place? Does it happen at compile time, or does the compiler add additional machine code that enforces it at run time?
Can I create a class like this:
class Print
{
public:
void printPublic();
private:
void printPrivate();
};
int main()
{
Print print;
print.printPublic(); // Change this to printPrivate() after compiling the code
return(EXIT_SUCCESS);
}
And then after compiling the code edit the machine code to call printPrivate() instead of printPublic() method without error?
Once you've fiddled around with the machine code, you're no longer compiling C++, but you're programming directly in machine code.
Your question is therefore somewhat moot.
You can regard the access specifiers as being essentially compile time directives, but note that the compiler can make optimisation choices based on them. In other words, it could be either. The C++ standard doesn't have to say anything about this either.
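A way to see the compile-time nature without touching machine code is a pointer to member. The following sketch is an illustration under assumptions: privateHandle and callThroughPointer are made-up names, and printPrivate returns a value only so that the result can be observed. Access is checked where the name printPrivate is written, which here is inside the class; the call through the pointer happens outside the class and is not checked by anything at run time.

```cpp
class Print {
public:
    using Method = int (Print::*)() const;

    // Code inside the class may name the private member and hand out its address.
    static Method privateHandle() { return &Print::printPrivate; }

private:
    int printPrivate() const { return 42; }
};

// Outside the class: writing p.printPrivate() here would not compile,
// but calling through the pointer does, because no name lookup of a
// private member takes place at this point.
int callThroughPointer(const Print& p) {
    Print::Method m = Print::privateHandle();
    return (p.*m)();
}
```

callThroughPointer runs the private method even though the caller is not a member or friend of Print.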
The «access control» happens at compile time, and only for C++ code. You don't even need to edit the machine code: you can easily call private methods from assembly language, which demonstrates that the restriction exists only in C++. And of course there is no additional machine code that enforces it at run time; a method has no way to control who calls it.
A simple demo follows. Note that the function names are mangled differently depending on whether you compile for x86 or x64, and probably on the compiler too. My demo is for the CL compiler and the x64 platform, but it can easily be changed to x86 or another compiler.
C++ code:
class Print
{
public:
void printPublic();
private:
void printPrivate();
};
// must not be inline, or must be referenced from C++ code, otherwise the compiler will drop it!
void Print::printPrivate()// thiscall
{
DbgPrint("%s<%p>\n", __FUNCTION__, this);
}
void Print::printPublic()// thiscall
{
DbgPrint("%s<%p>\n", __FUNCTION__, this);
}
extern "C"
{
// stubs implemented in asm
void __fastcall Print_printPrivate(Print* This);
void __fastcall Print_printPublic(Print* This);
};
Print p;
//p.printPrivate();//error C2248
p.printPublic();
Print_printPrivate(&p);
Print_printPublic(&p);
and asm code (for ml64)
_TEXT segment 'CODE'
extern ?printPrivate@Print@@AEAAXXZ:proc
extern ?printPublic@Print@@QEAAXXZ:proc
Print_printPrivate proc
jmp ?printPrivate@Print@@AEAAXXZ
Print_printPrivate endp
Print_printPublic proc
jmp ?printPublic@Print@@QEAAXXZ
Print_printPublic endp
_TEXT ENDS
END
Also note, for x86 only, that all C++ methods use the thiscall calling convention: the first parameter, this, is passed in the ECX register and the rest go on the stack as for __stdcall. So if the method has no parameters (really one, this), we can use __fastcall for the asm function as is, and if there are parameters we need to push EDX onto the stack in the assembler stub. x64 does not have this problem, because there is only one calling convention, but all of this is no longer related to the main question.
An example for x86 code with extra parameters, to show how to transform __fastcall into __thiscall:
class Print
{
public:
void printPublic(int a, int b)// thiscall
{
DbgPrint("%s<%p>(%x, %x)\n", __FUNCTION__, this, a, b);
}
private:
void printPrivate(int a, int b);
};
// must not be inline, or must be referenced from C++ code, otherwise the compiler will drop it!
void Print::printPrivate(int a, int b)// thiscall
{
DbgPrint("%s<%p>(%x, %x)\n", __FUNCTION__, this, a, b);
}
extern "C"
{
// stubs implemented in asm
void __fastcall Print_printPrivate(Print* This, int a, int b);
void __fastcall Print_printPublic(Print* This, int a, int b);
};
Print p;
//p.printPrivate(1,2);//error C2248
p.printPublic(1, 2);
Print_printPrivate(&p, 1, 2);
Print_printPublic(&p, 1, 2);
and asm
.686p
_TEXT segment
extern ?printPublic@Print@@QAEXHH@Z:proc
extern ?printPrivate@Print@@AAEXHH@Z:proc
@Print_printPrivate@12 proc
xchg [esp],edx
push edx
jmp ?printPrivate@Print@@AAEXHH@Z
@Print_printPrivate@12 endp
@Print_printPublic@12 proc
xchg [esp],edx
push edx
jmp ?printPublic@Print@@QAEXHH@Z
@Print_printPublic@12 endp
_TEXT ends
end

__cdecl, __stdcall and __fastcall are all called the exact same way?

I am using Visual C++ 2010, and MASM as my x64-Assembler.
This is my C++ code:
// include directive
#include "stdafx.h"
// functions
extern "C" int Asm();
extern "C" int (convention) sum(int x, int y) { return x + y; }
// main function
int main()
{
// print asm
printf("Asm returned %d.\n", Asm());
// get char, return
_getch();
return EXIT_SUCCESS;
}
And my assembly code:
; external functions
extern sum : proc
; code segment
.code
Asm proc
; create shadow space (32 bytes) plus 8 bytes to keep rsp 16-byte aligned
sub rsp, 28h
; setup parameters
mov ecx, 10
mov edx, 15
; call
call sum
; clean-up shadow space
add rsp, 28h
; return
ret
Asm endp
end
The reason I am doing this is so I can learn the different calling conventions.
I would make sum's calling convention stdcall, and modify the asm code so it would call sum the "stdcall" way. Once I got that working, I would make it, say, fastcall, and then call it in asm the "fastcall" way.
But look at my assembly code right now. When I use that code, no matter if sum is stdcall, fastcall or cdecl, it will compile, execute fine, and print 25 as my sum.
My question: How, and why can __cdecl, __stdcall and __fastcall all be called the exact same way?
The problem is that you're compiling for x64 targets. From MSDN
Given the expanded register set, x64 just uses the __fastcall calling
convention and a RISC-based exception-handling model. The __fastcall
model uses registers for the first four arguments and the stack frame
to pass the other parameters.
Switch over to compiling for x86 targets, and you should be able to see the various calling conventions in action.
As far as I know, x64 only uses the __fastcall-style convention. __cdecl and __stdcall are accepted but ignored there, so all three compile to the same code.

Determine whether caller is called from EXE or DLL

I need to determine whether the calling code comes from the EXE or from the DLL.
DLL
#ifdef DLL_EXPORTS
__declspec(dllexport) void say_hello();
__declspec(dllexport) void getCurrentModuleName();
#else
// Consumers of the DLL import both functions.
__declspec(dllimport) void say_hello();
__declspec(dllimport) void getCurrentModuleName();
#endif
#include <cstdio>
#include <windows.h>
#include <Dbghelp.h>
#include <iostream>
#include <tchar.h>
#include "dll.h"
#include "Psapi.h"
__declspec(naked) void *GetStackPointer()
{
__asm
{
mov eax, esp
ret
}
}
void getCurrentModuleName()
{
BOOL result = SymInitialize(GetCurrentProcess(), NULL , TRUE);
DWORD64 dwBaseAddress = SymGetModuleBase64(GetCurrentProcess(), (DWORD64)GetStackPointer());
TCHAR szBuffer[50];
GetModuleBaseName(GetCurrentProcess(), (HMODULE) dwBaseAddress, szBuffer, sizeof(szBuffer) / sizeof(szBuffer[0])); // size is in characters, not bytes
std::wcout << _T("--->") << szBuffer << std::endl;
}
void say_hello() {
getCurrentModuleName();
}
EXE
#include <windows.h>
#include <cstdio>
#include "dll.h"
int main() {
printf ("ENTERING EXE CODE...\n");
getCurrentModuleName();
printf ("ENTERING DLL CODE...\n");
say_hello();
getchar();
}
Here is the output.
ENTERING EXE CODE...
--->exe.exe
ENTERING DLL CODE...
--->exe.exe
I wish I could get
ENTERING EXE CODE...
--->exe.exe
ENTERING DLL CODE...
--->dll.dll
since the last caller is in the DLL itself (say_hello in the DLL).
Is there any way I can achieve this?
GetStackPointer returns the value of ESP, which is an address on the stack. The stack is allocated per thread, independently of any modules loaded in the process.
What you need to do is extract the return address from the stack, which will be an address in the calling module.
Given that the usual prefix code in a function is:
push ebp
mov ebp,esp
sub esp, bytes_of_local_variables
esp is going to be somewhat random, but [ebp] should hold the previous ebp, and [ebp+4] should hold the current frame's return address.
So, you could try this:
__declspec(naked) void *GetReturnAddressAssumingStandardFramePointers()
{
__asm
{
mov eax, [ebp+4]
ret
}
}
Just make sure that the functions that call it aren't compiled with /Oy (frame-pointer omission).
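If relying on the frame-pointer layout is a concern, the compiler intrinsics for the return address avoid hand-written assembly and also work on x64, where MSVC has no inline assembler. The sketch below is an assumption-laden illustration rather than part of the answer above: WhoCalledMe and the two macros are made-up names, _ReturnAddress() is the MSVC intrinsic from <intrin.h>, and __builtin_return_address(0) is the GCC/Clang counterpart.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>
#define NOINLINE __declspec(noinline)
#define RETURN_ADDRESS() _ReturnAddress()
#else
#define NOINLINE __attribute__((noinline))
#define RETURN_ADDRESS() __builtin_return_address(0)
#endif

// Returns an address inside whichever function called WhoCalledMe().
// NOINLINE keeps this a real call, so that there is a return address to read.
NOINLINE void* WhoCalledMe() {
    return RETURN_ADDRESS();
}
```

The returned address lies in the caller's module, so on Windows it could then be handed to SymGetModuleBase64 as in the question, or to GetModuleHandleEx with GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS.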
In that case use the return address of the function, which you can figure out by looking directly at the stack. The rest of the answer still applies.
You read the stack pointer inside getCurrentModuleName(), which is in the DLL, but what you need is the return address on the stack at the beginning of getCurrentModuleName(), which shows where getCurrentModuleName() was called from.
Use EnumProcessModules(). For each one call GetModuleInformation(). Compare the address of the function that you're executing (using a function pointer) to the lpBaseOfDll and SizeOfImage members of the MODULEINFO struct. If it falls within the range, you know that's the current module. If so, use GetModuleBaseName to retrieve the name of the module.
Here is the solution. The limitation is that it can only trace up to 62 frames.
// Required so that we can turn an address into a module name.
SymInitialize(GetCurrentProcess(), NULL , TRUE);
// Limitation of RtlCaptureStackBackTrace.
const int kMaxCallers = 62;
void* callers[kMaxCallers];
int count = RtlCaptureStackBackTrace(0, kMaxCallers, callers, NULL);
for (int i = 0; i < count; i++) {
TCHAR szBuffer[50];
DWORD64 dwBaseAddress = SymGetModuleBase64(GetCurrentProcess(), (DWORD64)callers[i]);
GetModuleBaseName(GetCurrentProcess(), (HMODULE) dwBaseAddress, szBuffer, sizeof(szBuffer) / sizeof(szBuffer[0])); // size is in characters, not bytes
std::wcout << _T("--->") << szBuffer << std::endl;
}