I'm working on a complex program whose plugins will call functions in the main executable. The implementation behind each of these functions is selected at start-up and assigned through a function pointer.
Rather than passing function pointers around, I would like efficient wrapper functions in the main executable that call the appropriate function.
As this is a plugin interface, the calling convention will be defined as either __cdecl or __stdcall depending on the build target (using macros), and the functions will be declared extern "C".
Basically, I want to declare a symbol in my executable that the plugins can load as needed. The application solves a complex scientific problem that breaks down into several tasks, and each task can be solved by a whole range of methods. These methods are stored in the plugins themselves, so it's easy to add new ones (no recompiling the entire application). This also makes it easier to share new methods, as anyone with the base code can add a plugin without needing any experience themselves.
Anyway, I worked out that I could either use this concept or pass a function map to the plugins when I load them. However, the contents of that function map depend on the config and on which plugins are loaded, so I don't know what it is until I've finished loading plugins, which is a problem. Hence my solution is to store the map as a set of global variables in the main executable, accessible through wrapper functions.
However, this is not straightforward. The functions have calling conventions that involve manipulating the stack after calling and before returning, and the wrapper should skip all of that. It should also perform an unconditional jump (jmp in Intel x86 ASM) rather than a function call (call in Intel x86 ASM), so that control returns from the jumped-to function to the calling code, not to the wrapper. And I need C/C++ code that does this independent of compiler, platform and processor.
Below is a basic concept example I threw together to test my idea and demonstrate what I want to do:
C++ code (Microsoft Visual C++ 2010 (specific))
#include <iostream>
void * pFunc;
int doit(int,int);
int wrapper(int, int);
int main() {
pFunc = (void*)doit;
std::cout << "Wrapper(2,3): " << wrapper(2,3) << std::endl;
std::cout << "doit(2,3): " << doit(2,3) << std::endl;
return 0; }
int doit(int a,int b) { return a*b; }
__declspec(naked) int wrapper(int, int) { __asm jmp pFunc }
The code has been tested and works properly; both calls output 6.
ASM Output for wrapper and doit
PUBLIC ?wrapper@@YAHHH@Z ; wrapper
; Function compile flags: /Odtp
; COMDAT ?wrapper@@YAHHH@Z
_TEXT SEGMENT
___formal$ = 8 ; size = 4
___formal$ = 12 ; size = 4
?wrapper@@YAHHH@Z PROC ; wrapper, COMDAT
; File c:\users\glen fletcher\documents\visual studio 2010\projects\test_wrapper\test_wrapper.cpp
; Line 15
jmp DWORD PTR ?pFunc@@3PAXA ; pFunc
?wrapper@@YAHHH@Z ENDP ; wrapper
_TEXT ENDS
PUBLIC ?doit@@YAHHH@Z ; doit
; Function compile flags: /Ogtp
; COMDAT ?doit@@YAHHH@Z
_TEXT SEGMENT
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
?doit@@YAHHH@Z PROC ; doit, COMDAT
; Line 14
push ebp
mov ebp, esp
mov eax, DWORD PTR _a$[ebp]
imul eax, DWORD PTR _b$[ebp]
pop ebp
ret 0
?doit@@YAHHH@Z ENDP ; doit
; Function compile flags: /Ogtp
_TEXT ENDS
ASM for wrapper when compiled as an ordinary (non-naked) function
PUBLIC wrapper
_1$ = 8
_2$ = 12
_TEXT SEGMENT
wrapper PROC
push ebp
mov ebp, esp
mov eax, DWORD PTR _2$[ebp]
push eax
mov ecx, DWORD PTR _1$[ebp]
push ecx
call DWORD PTR pFunc
add esp, 8
pop ebp
ret 0
wrapper ENDP
_TEXT ENDS
How can I get the original code generated in a cross-platform and cross-compiler manner, as opposed to a standard C/C++ function with the compiler-generated prolog and epilog code? Note that I don't want to make assumptions about the processor either, so I can't use a separate ASM file; I want the compiler to generate code containing just the unconditional jump.
goto doesn't work, as pFunc is a variable, not a label; I'm not even sure goto would work between functions anyway.
As far as your question,
How can I get the original code generated in a cross-platform and cross-compiler manner?
goes, the answer is "not at all".
Function calling conventions are deep into platform and compiler / language specifics. You're touching what's called the ABI (Application binary interface); issues like:
how / where are parameters passed from caller to called function, for all numbers / types / ordering of parameters ?
how are "hidden" features of the language (like C++ this) implemented ?
what are the rules for register usage (which regs are clobbered by making a function call to the "target context") ?
how / where are return values, for all types of "returning a value" ?
does source (caller) and target (callee) context use the same data structure layout rules ?
how can you deal with processor operating state changes (like would occur if you try to call 32bit code while executing in 64bit mode, and/or vice versa) ?
I've given a similar answer in this SO thread, specifically for a question about doing "downcalls" from 64bit Windows to 32bit Windows stdcall. Alas, not much to add there except "it's complicated, not generally possible and always very strongly code/compiler and OS-dependent".
This can be done in a specific case (the technical term is "thunking"). Every "thunk" is very specific: say, if you know the called function uses 32bit Windows/x86 style fastcall and has a single parameter, you can write a "thunk" doing the interfacing (and possibly a processor state switch) that'd allow you to call it from, say, 64bit Linux code. That thunk would be different from one where the first parameter is a floating-point value passed in XMM0, though ... and so on.
For the general case ... refer to the infinite heap of programming knowledge that's SO again, sorry, there is no generic function pointer :(
Edit:
if the concern is code generation, then try the following:
/* sourcefile 1 */
extern void (*p)(char *, ...);
static __inline__ void wrapper(char *arg, char *s) {
return p(arg, s);
}
int main(int argc, char **argv)
{
wrapper("Hello, I am %s\n", argv[0]);
return 0;
}
/* sourcefile 2 */
extern void printf(char*, ...);
void (*p)(char *, ...) = printf;
If I compile those two, using gcc with optimization, the compiler creates the following code for main:
0000000000400500 <main>:
400500: 48 83 ec 08 sub $0x8,%rsp
400504: 48 8b 36 mov (%rsi),%rsi
400507: bf 0c 06 40 00 mov $0x40060c,%edi
40050c: ff 15 d6 03 10 00 callq *1049558(%rip) # 5008e8 <p>
400512: 31 c0 xor %eax,%eax
400514: 48 83 c4 08 add $0x8,%rsp
400518: c3 retq
which is pretty much what you want, except that it eliminates wrapper() and inlines the call through the function pointer directly.
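The same idea can be sketched in C++ for the two-int case from the question. The names BinaryOp, select_method, multiply and add are hypothetical; only pFunc and wrapper come from the question. This is a minimal sketch, not a claim about what any particular compiler emits:

```cpp
// Portable sketch: a global function pointer chosen at start-up,
// plus a thin inline wrapper that forwards to it.
typedef int (*BinaryOp)(int, int);   // hypothetical signature

static int multiply(int a, int b) { return a * b; }
static int add(int a, int b) { return a + b; }

// Set once during start-up, before anything calls wrapper().
static BinaryOp pFunc = nullptr;

// With optimization enabled, compilers typically inline this into the
// caller or turn it into a tail call; no naked function is required.
static inline int wrapper(int a, int b) { return pFunc(a, b); }

// Hypothetical start-up hook that picks the method.
static void select_method(bool use_multiply) {
    pFunc = use_multiply ? multiply : add;
}
```

After select_method(true), wrapper(2, 3) forwards to multiply; after select_method(false), it forwards to add.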
I worked out a solution to my problem, rather than using naked functions or passing a list of function pointers.
I can pass a pointer to a struct of function pointers i.e.
struct Functions {
    bool (AppAPI *logInfo)(std::string, ...);
    bool (AppAPI *logWarn)(std::string, ...);
    bool (AppAPI *logError)(std::string, ...);
    bool (AppAPI *registerFunction)(std::string, void *);
    ...
} PluginFunctions;
// hand every plugin the address of the table
for (int i = 0; i < plugins; i++) {
    plugin[i].initialize(&PluginFunctions);
}
// entries can still be filled in afterwards
PluginFunctions.logInfo = LogInfo;
...
Because the plugin's init function is passed a pointer to the struct, it can store that pointer and load the current value of each function pointer from memory whenever it needs it. The struct is just a table of pointers in memory, so the function pointers can be set after the struct has been passed to the plugin, and the plugin still sees the updated values.
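A minimal, self-contained sketch of this late-filled table follows. HostFunctions, Plugin, log_info, solve_double and demo are hypothetical names used only for illustration, and the signatures are simplified (no varargs, no calling-convention macro):

```cpp
#include <string>
#include <vector>

// Hypothetical table of host services; every name here is illustrative.
struct HostFunctions {
    bool (*logInfo)(const std::string&);
    int  (*solve)(int);
};

// A plugin keeps only the address of the table, never a copy of its entries.
struct Plugin {
    const HostFunctions* host;
    void initialize(const HostFunctions* table) { host = table; }
    int run(int x) const { host->logInfo("running"); return host->solve(x); }
};

static std::vector<std::string> g_log;
static bool log_info(const std::string& msg) { g_log.push_back(msg); return true; }
static int  solve_double(int x) { return 2 * x; }

static int demo() {
    HostFunctions table = {};   // all entries start out null
    Plugin p;
    p.initialize(&table);       // plugin is loaded before the table is filled
    table.logInfo = log_info;   // filled in once all plugins are known
    table.solve   = solve_double;
    return p.run(21);           // the plugin sees the updated entries
}
```

The design choice is that the plugin dereferences the table at call time, so the host is free to finish loading every plugin before deciding which implementation goes into each slot.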
I took over an inactive project and have already fixed a lot in it, but I can't get an intrinsics-based replacement to work for the inline assembly it uses, which MSVC no longer supports when targeting x64.
#define XCALL(uAddr) \
__asm { mov esp, ebp } \
__asm { pop ebp } \
__asm { mov eax, uAddr } \
__asm { jmp eax }
Use cases:
static oCMOB * CreateNewInstance() {
XCALL(0x00718590);
}
int Copy(class zSTRING const &, enum zTSTR_KIND const &) {
XCALL(0x0046C2D0);
}
void TrimLeft(char) {
XCALL(0x0046C630);
}
This snippet goes at the bottom of a function (which can't inline, and must be compiled with ebp as a frame pointer, and no other registers that need restoring). It looks quite brittle, or else it's only useful in cases where you didn't need inline asm at all.
Instead of returning, it jumps to uAddr, which is equivalent to making a tailcall.
There aren't intrinsics for arbitrary jumps or manipulation of the stack. If you need that, you're out of luck. It doesn't make sense to ask about this snippet by itself, only with enough context to see how it's being used. i.e. is it important which return address is on the stack, or is it ok for it to compile to call/ret instead of jmp to that address? (See the first version of this answer for a simple example of using it as a function pointer.)
From your update, your use-cases are just a very clunky way to make wrappers for absolute function pointers.
We can instead define static const function pointers of the right types, so no wrapper is needed and the compiler can call directly from wherever you use these. static const is how we let the compiler know it can fully inline the function pointers and doesn't need to store them anywhere as data if it doesn't want to, just like a normal static const int xyz = 2;
struct oCMOB;
class zSTRING;
enum zTSTR_KIND { a, b, c }; // enum forward declarations are illegal
// C syntax
//static oCMOB* (*const CreateNewInstance)() = (oCMOB *(*const)())0x00718590;
// C++11
static const auto CreateNewInstance = reinterpret_cast<oCMOB *(*)()>(0x00718590);
// passing an enum by const-reference is dumb. By value is more efficient for integer types
static const auto Copy = reinterpret_cast<int (*)(class zSTRING const &, enum zTSTR_KIND const &)>(0x0046C2D0);
static const auto TrimLeft = reinterpret_cast<void (*)(char)> (0x0046C630);
void foo() {
oCMOB *inst = CreateNewInstance();
(void)inst; // silence unused warning
zSTRING *dummy = nullptr; // work around instantiating an incomplete type
int result = Copy(*dummy, c);
(void) result;
TrimLeft('a');
}
It also compiles just fine with x86-64 and 32-bit x86 MSVC, and gcc/clang 32 and 64-bit on the Godbolt compiler explorer. (And also non-x86 architectures). This is the 32-bit asm output from MSVC, so you could compare with what you get for your nasty wrapper functions. You can see that it's basically inlined the useful part (mov eax, uAddr / jmp or call) into the caller.
;; x86 MSVC -O3
$T1 = -4 ; size = 4
?foo@@YAXXZ PROC ; foo
push ecx
mov eax, 7439760 ; 00718590H
call eax
lea eax, DWORD PTR $T1[esp+4]
mov DWORD PTR $T1[esp+4], 2 ; the by-reference enum
push eax
push 0 ; the dummy nullptr
mov eax, 4637392 ; 0046c2d0H
call eax
push 97 ; 00000061H
mov eax, 4638256 ; 0046c630H
call eax
add esp, 16 ; 00000010H
ret 0
?foo@@YAXXZ ENDP
For repeated calls to the same function, the compiler would keep the function pointer in a call-preserved register.
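The cast from a numeric address to a typed function pointer can be tried out safely without a hard-coded address. In this sketch the number is taken from a real function in the same program; triple, IntFn, address_of_triple and call_by_address are hypothetical names, and the round trip through std::uintptr_t is assumed to work as it does on mainstream platforms:

```cpp
#include <cstdint>

static int triple(int x) { return 3 * x; }

typedef int (*IntFn)(int);

// Stand-in for a hard-coded address such as 0x00718590: the number comes
// from a real function so that calling through it is valid in this process.
static std::uintptr_t address_of_triple() {
    return reinterpret_cast<std::uintptr_t>(&triple);
}

static int call_by_address(std::uintptr_t addr, int arg) {
    IntFn fn = reinterpret_cast<IntFn>(addr);  // same kind of cast as above
    return fn(arg);
}
```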
For some reason even with 32-bit position-dependent code, we don't get a direct call rel32. The linker can calculate the relative offset from the call-site to the absolute target at link time, so there's no reason for the compiler to use a register-indirect call.
Since we didn't tell the compiler to create position-independent code, reaching absolute addresses with code-relative displacements would be a useful optimization for jumps/calls in this case.
In 32-bit code, every possible destination address is in range from every possible source address, but in 64-bit it's harder. In 32-bit mode, clang does spot this optimization! But even in 32-bit mode, MSVC and gcc miss it.
I played around with some stuff with gcc/clang:
// don't use
oCMOB * CreateNewInstance(void) asm("0x00718590");
Kind of works, but only as a total hack. Gcc just uses that string as if it were a symbol, so it feeds call 0x00718590 to the assembler, which handles it correctly (generating an absolute relocation which links just fine in a non-PIE executable). But with -fPIE, it emits 0x00718590@GOTPCREL as a symbol name, so we're screwed.
Of course, in 64-bit mode a PIE executable or library will be out of range of that absolute address so only non-PIE makes sense anyway.
Another idea was to define the symbol in asm with an absolute address, and provide a prototype that would get gcc to only use it directly, without @PLT or going through the GOT. (I maybe could have done that for the func() asm("0x..."); hack, too, using hidden visibility.)
I only realized after hacking this up with the "hidden" attribute that this is useless in position-independent code, so you can't use this in a shared library or PIE executable anyway.
extern "C" is not necessary, but means I didn't have to mess with name mangling in the inline asm.
#ifdef __GNUC__
extern "C" {
// hidden visibility means that even in a PIE executable, or shared lib,
// calls will go *directly* to that address, not via the PLT or GOT.
oCMOB * CNI(void) __attribute__((__visibility__("hidden")));
}
//asm("CNI = 0x718590"); // set the address of a symbol, like `org 0x71... / CNI:`
asm(".set CNI, 0x718590"); // alternate syntax for the same thing
void *test() {
CNI(); // works
return (void*)CNI; // gcc: RIP+0x718590 instead of the relative displacement needed to reach it?
// clang appears to work
}
#endif
disassembly of compiled+linked gcc output for test, from Godbolt, using the binary output to see how it assembled+linked:
# gcc -O3 (non-PIE). Clang makes pretty much the same code, with a direct call and mov imm.
sub rsp,0x8
call 718590 <CNI>
mov eax,0x718590
add rsp,0x8
ret
With -fPIE, gcc+gas emits lea rax,[rip+0x718590] # b18ab0 <CNI+0x400520>, i.e. it uses the absolute address as an offset from RIP, instead of subtracting. I guess that's because gcc literally emits lea CNI(%rip),%rax, and we've defined CNI as an assemble-time symbol with that numeric value. Oops. So it's not quite like a label with that address like you'd get with .org 0x718590; CNI:.
But since we can only use rel32 call in non-PIE executables, this is ok unless you compile with -no-pie but forget -fno-pie, in which case you're screwed. :/
Providing a separate object file with the symbol definition might have worked.
Clang appears to do exactly what we want, though, even with -fPIE, with its built-in assembler. This machine code could only have linked with -fno-pie (the default on Godbolt, not the default on many distros.)
# disassembly of clang -fPIE machine-code output for test()
push rax
call 718590 <CNI>
lea rax,[rip+0x3180b3] # 718590 <CNI>
pop rcx
ret
So this is actually safe (but sub-optimal because lea rel32 is worse than mov imm32.) With -m32 -fPIE, it doesn't even assemble.
I have the following C++ code:
int main()
{
int i;
int j;
i = 1111;
j = 2222;
return 0;
}
I wanted to view the Assembly code that this C++ code compiles to, so I chose the following option:
This option outputs each C++ statement with the Assembly instruction(s) it corresponds to directly under it. But some C++ statements don't correspond to any Assembly instructions (for example: int i;). So I want to make sure that my following assumption is correct when reading the generated Assembly code:
int i; and int j; are just variable declarations. They are not initialized at the point of declaration, so in that sense there are no explicit assembly instructions for those two lines. Do note, though, that declaring local variables does lead to space being allocated for them on the stack.
And yes, for the latter part of your question, mov DWORD PTR _i$[ebp], 1111 corresponds only to i = 1111;.
I think (for educational purposes) you should put those statements in a function and call the function from main and then (in the main function):
sub esp, 216 ; 000000d8H
becomes:
sub esp, 192 ; 000000c0H
and:
lea edi, DWORD PTR [ebp-216]
becomes:
lea edi, DWORD PTR [ebp-192]
What is happening is that those instructions are reserving memory on the stack for i and j. So there are machine instructions for the declarations (they will always be there, though usually with other values), but you need to understand what is happening to recognize them. The 216 value will then appear in the function containing the definitions of i and j (assuming that there are no other definitions).
Note that the mov instruction that sets a value for i addresses it relative to the ebp register, which points into the current stack frame. So I think you can assume that this mov is the only instruction for the assignment.
I'm trying to write a trampoline hook for a Win32 API function. When I write the JMP instruction at the start of the original function, I want it to jump to a code cave instead of calling a function.
The original function start looks like this in OllyDBG:
PUSH 14
MOV EAX, 12345678
...
And I patch it to:
JMP 87654321
NOP
NOP
where 87654321 is the address of the following function:
int HookFunc(int param)
{
DoStuff(param);
return ExecuteOriginal(param);
}
ExecuteOriginal looks like this:
unsigned long address = AddressOfOriginalFunction + 7;
int ExecuteOriginal(int param)
{
__asm
{
PUSH 0x14
MOV EAX, 0x12345678
JMP address
}
}
This executes the overwritten code and then jumps into the original function right after the patched bytes. The problem is that, since it's a function, it messes up the stack: the caller is supposed to clean up, and instead of returning, the function jumps into another function's code. I guess that's why the program crashes.
Is there a way, using the Visual C++ compiler, to place assembly code in the code section of the program without it being inside a function? That way I could jump there, execute whatever I need, and return without the risk of messing up the stack.
Solution: __declspec(naked)
For functions declared with the naked attribute, the compiler generates code without prolog and epilog code. You can use this feature to write your own prolog/epilog code sequences using inline assembler code.
Example:
__declspec( naked ) int ExecuteOriginal(int param)
{
    __asm
    {
        PUSH 0x14           // the instructions overwritten by the patch
        MOV EAX, 0x12345678
        JMP address         // continue in the original function
    }
}
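For reference, the 7-byte patch described in the question (a 5-byte JMP rel32 followed by two NOPs, replacing the 2-byte PUSH and the 5-byte MOV) can be built as below. This sketch only computes the bytes and does not write them into a running process. make_jmp_patch and its parameters are hypothetical names; the encoding facts it relies on are that JMP rel32 is opcode E9, NOP is 90, and the displacement is measured from the end of the JMP instruction:

```cpp
#include <array>
#include <cstdint>

// Builds the 7 bytes that overwrite the start of the hooked function:
// E9 <rel32> (JMP rel32) followed by two NOPs (0x90).
// rel32 is measured from the end of the 5-byte JMP instruction.
static std::array<std::uint8_t, 7> make_jmp_patch(std::uint32_t patch_addr,
                                                  std::uint32_t target_addr) {
    // Unsigned arithmetic wraps modulo 2^32, which also covers backward jumps.
    const std::uint32_t rel32 = target_addr - (patch_addr + 5u);
    std::array<std::uint8_t, 7> bytes = {};
    bytes[0] = 0xE9;
    for (int i = 0; i < 4; ++i)  // little-endian displacement
        bytes[1 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));
    bytes[5] = 0x90;
    bytes[6] = 0x90;
    return bytes;
}
```

The trampoline then resumes at patch_addr + 7, which matches the AddressOfOriginalFunction + 7 in the question.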
I have been learning IA-32 assembly programming, and I would like to write a function in assembly and call it from C++.
The tutorial I am following is actually for x64 assembly, but I am working on IA-32. For x64, it says function arguments are passed in registers like RCX, RDX, R8, R9 etc.
But after searching a little, I understand that in IA-32 arguments are passed on the stack, not in registers.
Below is my C++ code :
#include <iostream>
#include <conio.h>
using namespace std;
extern "C" int PassParam(int a,int b);
int main()
{
cout << "z is " << PassParam(15,13) << endl;
_getch();
return 0;
}
Below is the assembly code for the PassParam() function (it just adds its two arguments; it is only for learning purposes):
PassParam() in assembly :
.model C,flat
.code
PassParam proc
mov eax,[ebp-212]
add eax,[ebp-216]
ret
PassParam endp
end
In my assembly code, you can see I load the first argument from [ebp-212] into eax. That value was obtained as follows:
I wrote the PassParam() function in C++ itself and disassembled it. Then I checked where ebp points and where the second argument is stored (arguments are pushed from right to left). I could see there is a difference of 212, so that is how I got that value. Then, as usual, the first argument is stored 4 bytes later. And it works fine.
Question:
Is this the correct method to access arguments from assembly? I mean, is the argument always stored at [ebp-212]?
If not, can anyone explain the correct method to pass arguments from C++ to assembly?
Note :
I am working with Visual C++ 2010, on Windows 7 machine.
On 32bit architectures, it depends on the calling convention. Windows, for example, has both __fastcall and __thiscall, which use register and stack args, and __cdecl and __stdcall, which use only stack args but differ in who does the cleanup. MSDN has a nice listing here (or the more assembly orientated version). Note that FPU/SSE operations also have their own conventions.
For ease and simplicity, try to use __stdcall for everything. This allows you to use stack frames to access args via MOV r32,[EBP+4+(arg_index * 4)], or, if you aren't using stack frames, via MOV r32,[ESP+local_stack_offset+(arg_index * 4)], with arg_index counted from 1 in both cases. The annotated C++ -> x86 Assembly example here should be of help.
So as a simple example, let's say we have the function MulAdd in assembly, with the C++ prototype int __stdcall MulAdd(int base, int mul, int add). It would look something like:
MOV EAX,[ESP+4] //get the first arg('base') off the stack
MOV ECX,[ESP+8] //get the second arg('mul') off the stack
IMUL EAX,ECX //base * mul
MOV ECX,[ESP+12] //get arg 3 off the stack
ADD EAX,ECX
RETN 12 //cleanup the 3 args and return
Or if you use a stack frame:
PUSH EBP
MOV EBP,ESP //set up the stack frame
MOV EAX,[EBP+8] //get the first arg('base') off the stack
MOV ECX,[EBP+12] //get the second arg('mul') off the stack
IMUL EAX,ECX //base * mul
MOV ECX,[EBP+16] //get arg 3 off the stack
ADD EAX,ECX
MOV ESP,EBP //restore the stack
POP EBP
RETN 12 //cleanup the 3 args and return (still __stdcall)
Using the stack frame avoids needing to adjust for changes made to the stack by PUSH'ing of args, spilling of registers or stack allocations made for local variables. Its downside is that it reduces the number of registers you have to work with.
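As a cross-check for the two listings above, here is a small C++ sketch of what the routine computes and of the offset arithmetic used to reach each argument. MulAddRef, MULADD_CALL and the two offset helpers are hypothetical names; the macro assumes MSVC targeting 32-bit x86 and expands to nothing elsewhere. The helpers count arguments from 0 and assume 4-byte stack slots:

```cpp
// Reference semantics for the assembly routine: base * mul + add.
// The calling-convention macro is an assumption for MSVC on 32-bit x86;
// on other targets it expands to nothing.
#if defined(_MSC_VER) && defined(_M_IX86)
#define MULADD_CALL __stdcall
#else
#define MULADD_CALL
#endif

static int MULADD_CALL MulAddRef(int base, int mul, int add) {
    return base * mul + add;
}

// Byte offset of the n-th 4-byte stack argument (n counted from 0).
// Without a frame, at function entry: [ESP + 4 + 4*n]
//   (the return address sits at [ESP]).
// After PUSH EBP / MOV EBP,ESP:       [EBP + 8 + 4*n]
//   (the saved EBP sits at [EBP], the return address at [EBP+4]).
constexpr int arg_offset_from_esp(int n) { return 4 + 4 * n; }
constexpr int arg_offset_from_ebp(int n) { return 8 + 4 * n; }
```

For the three arguments of MulAdd this gives ESP offsets 4, 8, 12 and EBP offsets 8, 12, 16, the same constants that appear in the assembly.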
I'm trying to learn reverse engineering, and I'm stuck on this little thing. I have code like this:
.text:10003478 mov eax, HWHandle
.text:1000347D lea ecx, [eax+1829B8h] <------
.text:10003483 mov dword_1000FA64, ecx
.text:10003489 lea esi, [eax+166A98h]<------
.text:1000348F lea edx, [eax+11FE320h]
.text:10003495 mov dword_1000FCA0, esi
and I'm wondering what it looks like in C or C++, especially the two instructions marked by arrows. HWHandle is a variable which holds the value returned from the GetModuleHandle() function.
More interesting is that a couple of lines below these instructions, dword_1000FCA0 is used as a function:
.text:1000353C mov eax, dword_1000FCA0
.text:10003541 mov ecx, [eax+0A0h]
.text:10003547 push offset asc_1000C9E4 ; "\r\n========================\r\n"
.text:1000354C call ecx
This will draw this text in my game console. Have you got any ideas, guys?
LEA is nothing more than an arithmetic operation: in this case, ECX is just filled with EAX+offset (the address itself, not the pointed-to contents). If HWHandle pointed to a (very large) structure, ECX would just hold the address of one of its members.
This could be an associated source code:
extern A* HWHandle; // mov eax, HWHandle
B* ECX = &HWHandle->someStructure; // lea ecx, [eax+1829B8h]
and later, one of B’s members is used as a function.
(*ECX->ptrFunction)(someArg); // mov ecx, [eax+0A0h]
// call ecx
Since HWHandle is a module handle, which is just the base address of a DLL, it looks as if the constants that are being added to this are offsets for functions or static data inside the DLL. The code is computing the addresses of these functions or data items and storing them for later use.
Since this is typically the job of a dynamic linker, I'm not sure that this assembly code corresponds to actual C++ code. It would be helpful to know what environment you're working in exactly -- since you refer to games consoles, is this Xbox code? Unfortunately, I don't know how exactly dynamic linking works on Xbox, but it looks as if this may be what is going on here.
In the specific case of dword_1000FCA0, it looks as if this is the location of a jump table (i.e. essentially a list of function pointers) inside the DLL. Your second code snippet is getting a function pointer from offset 0xA0 inside this table, then calling it -- apparently, the function being called outputs strings to the screen. (The pointer to the string to be output is pushed to the stack, which is the usual x86 calling convention.) The C++ code corresponding to this would be something like
my_print_function("\r\n========================\r\n");
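The load-then-call pattern from the second snippet can be imitated portably. In this sketch a plain byte buffer stands in for the DLL's data and g_console for the game console; fake_module, console_print, call_slot and demo_table_call are hypothetical names, and only the 0xA0 offset is taken from the disassembly:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

typedef void (*PrintFn)(const char*);

static std::string g_console;  // stands in for the game console
static void console_print(const char* s) { g_console += s; }

// Mimics: mov eax, table / mov ecx, [eax+0A0h] / push text / call ecx
static void call_slot(const unsigned char* table, std::size_t offset,
                      const char* text) {
    PrintFn fn;
    std::memcpy(&fn, table + offset, sizeof fn);  // load the pointer stored at table+offset
    fn(text);
}

static void demo_table_call() {
    static unsigned char fake_module[0x100] = {};  // hypothetical stand-in for DLL data
    PrintFn fn = console_print;
    // What the DLL's own initialization would already have done:
    std::memcpy(fake_module + 0xA0, &fn, sizeof fn);
    call_slot(fake_module, 0xA0, "hello");
}
```

std::memcpy is used for the load so that the sketch does not depend on the alignment of the slot inside the byte buffer.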
Edit:
If you want to call functions in a DLL yourself, the canonical way of getting at the function pointer is to use GetProcAddress():
FARPROC func=GetProcAddress(HWHandle, "MyFunction");
However, the code you posted is calculating offsets itself, and if you really want to do the same, you could use something like this:
DWORD func=(DWORD)HWHandle + myOffset;
myOffset is the offset you want to use -- of course, you'd need to have some way of determining this offset, and this can change every time the DLL is recompiled, so it's not a technique I would recommend -- but it is, after all, what you were asking about.
Regardless of which of these two ways you use to get at the address of the function, you need to call it. To do this, you need to declare a function pointer -- and to do that, you need to know the signature of your function (its parameters and return types). For example:
typedef void (*print_func_type)(const char *);
print_func_type my_func_pointer=(print_func_type)func;
my_func_pointer("\r\n========================\r\n");
Beware -- if you get the address of the function or its signature wrong, your code will likely crash. All part of the fun of this kind of low-level work.
It looks like HWHandle is a pointer to some structure (a big one). The lea instruction computes the addresses of members of that structure, e.g.:
mov eax, HWHandle
lea ecx, [eax+1829B8h]
mov dword_1000FA64, ecx
means:
Compute the address HWHandle + 0x1829B8 and put it into ecx
Put that address (from ecx) into some (global) variable dword_1000FA64
The rest looks similar.
In C++ you can get a lea almost anywhere, and you really cannot predict where (it depends on the compiler and optimizations), e.g.:
int x;
int* pX = &x;
The second line may generate lea.
Another example:
struct s
{
int x;
int y;
};
s my_s;
int* pY = &my_s.y; // here: probably lea <something>, [address(my_s) + 0x4]
Hope that helps.
In C++ this is roughly equivalent to
char *ecx, *eax, *esi;
ecx = eax + 0x1829B8; // lea ecx, [eax+1829B8h]
esi = eax + 0x166A98; // lea esi, [eax+166A98h]
This assumes that eax, esi and ecx really hold pointers to memory locations. Of course, the lea instruction can be used to do simple arithmetic too, and in fact compilers often use it for addition. The advantage compared to a simple add: it can have up to three input operands and a different destination.
For example, foo = &bar->baz is the same as (simplified) foo = (char *)bar + offsetof(typeof(*bar), baz), which can be translated to lea foo, [bar+offsetofbaz].
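That equivalence can be checked directly in C++. The struct Bar and the helper baz_via_offset are hypothetical, and the standard offsetof macro takes the place of the typeof-based expression:

```cpp
#include <cstddef>

struct Bar {  // hypothetical layout
    int  id;
    char tag[12];
    int  baz;
};

// Address of a member computed the "lea" way: base pointer plus a
// constant offset, without reading any memory.
static int* baz_via_offset(Bar* bar) {
    return reinterpret_cast<int*>(reinterpret_cast<char*>(bar) + offsetof(Bar, baz));
}
```

For any Bar object b, baz_via_offset(&b) yields the same pointer as &b.baz.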
It really is compiler and optimization dependent, but IIRC, lea can be emitted just for additions. So lea ecx, [eax+1829B8h] can be understood as ecx = eax + 0x1829B8.