Accessing function parameters in C++ from assembly in IA-32 - c++

I have been learning IA-32 assembly programming, and I would like to write a function in assembly and call it from C++.
The tutorial I am following is for x64 assembly, but I am working on IA-32. For x64, it says function arguments are passed in registers such as RCX, RDX, R8 and R9.
After searching a bit, I understand that in IA-32 arguments are passed on the stack, not in registers.
Below is my C++ code :
#include <iostream>
#include <conio.h>
using namespace std;
extern "C" int PassParam(int a,int b);
int main()
{
cout << "z is " << PassParam(15,13) << endl;
_getch();
return 0;
}
Below is the assembly code for the PassParam() function (it just adds the two arguments; it is only for learning purposes):
.model C,flat
.code
PassParam proc
mov eax,[ebp-212]
add eax,[ebp-216]
ret
PassParam endp
end
In my assembly code, you can see I moved first argument from [ebp-212] to eax. That value is obtained as follows :
I wrote the PassParam() function in C++ itself and disassembled it. Then I checked where ebp points and where the second argument is stored (arguments are pushed from right to left). I could see a difference of 212, so that is how I got that value. The first argument is then, as usual, stored 4 bytes later. And it works fine.
Question :
Is this the correct method to access arguments from assembly? I mean, is the argument always stored at [ebp-212]?
If not, can anyone explain the correct method to pass arguments from C++ to assembly?
Note :
I am working with Visual C++ 2010, on Windows 7 machine.

On 32-bit architectures, it depends on the calling convention. Windows, for example, has __fastcall and __thiscall, which use both register and stack args, and __cdecl and __stdcall, which use only stack args but differ in who does the cleanup. MSDN has a nice listing here (or the more assembly-oriented version). Note that FPU/SSE operations also have their own conventions.
For ease and simplicity, try to use __stdcall for everything. This allows you to use stack frames to access args via MOV r32,[EBP+4+(arg_index * 4)], or, if you aren't using stack frames, via MOV r32,[ESP+local_stack_offset+(arg_index * 4)], where arg_index counts from 1 (the return address sits just below the first arg). The annotated C++ -> x86 Assembly example here should be of help.
So as a simple example, let's say we have the function MulAdd in assembly, with the C++ prototype int __stdcall MulAdd(int base, int mul, int add). It would look something like:
MOV EAX,[ESP+4] ; get the first arg ('base') off the stack
MOV ECX,[ESP+8] ; get the second arg ('mul') off the stack
IMUL EAX,ECX ; base * mul
MOV ECX,[ESP+12] ; get the third arg ('add') off the stack
ADD EAX,ECX ; (base * mul) + add, returned in EAX
RETN 12 ; clean up the 3 args (12 bytes) and return
Or if you use a stack frame:
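PUSH EBP ; save the caller's frame pointer
MOV EBP,ESP ; set up the stack frame
MOV EAX,[EBP+8] ; get the first arg ('base') off the stack
MOV ECX,[EBP+12] ; get the second arg ('mul') off the stack
IMUL EAX,ECX ; base * mul
MOV ECX,[EBP+16] ; get the third arg ('add') off the stack
ADD EAX,ECX ; (base * mul) + add, returned in EAX
MOV ESP,EBP ; restore the stack pointer
POP EBP ; restore the caller's frame pointer
RETN 12 ; __stdcall: clean up the 3 args (12 bytes) and return to caller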
Using the stack frame avoids needing to adjust for changes made to the stack by PUSH'ing of args, spilling of registers, or stack allocations made for local variables. Its downside is that it reduces the number of registers you have to work with.
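To see where the +8, +12 and +16 offsets come from, here is a minimal C++ sketch that models the 32-bit stack as an array and replays the call sequence. SimStack and simulate_mul_add are hypothetical names made up for this illustration, and the return address value is a placeholder; a real call is of course carried out by the CPU, not by code like this.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a 32-bit stack: byte addresses, 4-byte slots, grows downward.
struct SimStack {
    std::vector<std::uint32_t> mem = std::vector<std::uint32_t>(64);
    std::uint32_t esp = 64 * 4;  // starts one slot past the highest address

    void push(std::uint32_t v) { esp -= 4; mem[esp / 4] = v; }
    std::uint32_t load(std::uint32_t addr) const { return mem[addr / 4]; }
};

// Replays: caller pushes the args right to left, CALL pushes the return address,
// the callee does PUSH EBP / MOV EBP,ESP and then reads its args relative to EBP.
int simulate_mul_add(int base, int mul, int add) {
    SimStack s;
    s.push(static_cast<std::uint32_t>(add));   // third argument is pushed first
    s.push(static_cast<std::uint32_t>(mul));
    s.push(static_cast<std::uint32_t>(base));  // first argument is pushed last
    s.push(0xDEADBEEF);                        // placeholder for the return address
    s.push(0);                                 // PUSH EBP (caller's frame pointer)
    const std::uint32_t ebp = s.esp;           // MOV EBP,ESP

    const int a = static_cast<int>(s.load(ebp + 8));   // [EBP+8]  = base
    const int b = static_cast<int>(s.load(ebp + 12));  // [EBP+12] = mul
    const int c = static_cast<int>(s.load(ebp + 16));  // [EBP+16] = add
    assert(s.load(ebp + 4) == 0xDEADBEEF);             // [EBP+4]  = return address
    return a * b + c;
}
```

Calling simulate_mul_add(3, 4, 5) walks the same slots the assembly above reads. The saved EBP and the return address occupy the two slots below the first argument, which is why the first argument is at +8 with a frame and at +4 without one.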

Related

Why is the __stdcall calling convention ignored in x64?

I know what the differences between __cdecl and __stdcall are, but I'm not quite sure as to why __stdcall is ignored by the compiler in x64 builds.
The functions in the following code
int __stdcall stdcallFunc(int a, int b, int c, int d, int e, int f, int g)
{
return a + b + c + d + e + f + g;
}
int __cdecl cdeclFunc(int a, int b, int c, int d, int e, int f, int g)
{
return a + b + c + d + e + f + g;
}
int main()
{
stdcallFunc(1, 2, 3, 4, 5, 6, 7);
cdeclFunc(1, 2, 3, 4, 5, 6, 7);
return 0;
}
have enough parameters to exceed the available CPU registers. Therefore, some arguments must be passed via the stack. I'm not fluent in assembly but I noticed some differences between x86 and x64 assembly.
x64
main PROC
$LN3:
sub rsp, 72 ; 00000048H
mov DWORD PTR [rsp+48], 7
mov DWORD PTR [rsp+40], 6
mov DWORD PTR [rsp+32], 5
mov r9d, 4
mov r8d, 3
mov edx, 2
mov ecx, 1
call ?stdcallFunc@@YAHHHHHHHH@Z ; stdcallFunc
mov DWORD PTR [rsp+48], 7
mov DWORD PTR [rsp+40], 6
mov DWORD PTR [rsp+32], 5
mov r9d, 4
mov r8d, 3
mov edx, 2
mov ecx, 1
call ?cdeclFunc@@YAHHHHHHHH@Z ; cdeclFunc
xor eax, eax
add rsp, 72 ; 00000048H
ret 0
main ENDP
x86
_main PROC
push ebp
mov ebp, esp
push 7
push 6
push 5
push 4
push 3
push 2
push 1
call ?stdcallFunc@@YGHHHHHHHH@Z ; stdcallFunc
push 7
push 6
push 5
push 4
push 3
push 2
push 1
call ?cdeclFunc@@YAHHHHHHHH@Z ; cdeclFunc
add esp, 28 ; 0000001cH
xor eax, eax
pop ebp
ret 0
_main ENDP
The first 4 arguments are, as expected, passed via registers in x64.
The remaining arguments are put on the stack in the same order as in x86.
Contrary to x86, in x64 we don't use push instructions. Instead we reserve enough stack space at the beginning of main and use mov instructions to add the arguments to the stack.
In x64, no stack cleanup is happening after both calls, but at the end of main.
This brings me to my questions:
Why does x64 use mov rather than push? I assume it's just more efficient and wasn't available in x86.
Why is there no stack cleanup after the call instructions in x64?
What's the reason that Microsoft chose to ignore __stdcall in x64 assembly?
From the docs:
On ARM and x64 processors, __stdcall is accepted and ignored by the compiler
Here is the example code and assembly.
Why does x64 use mov rather than push? I assume it's just more efficient and wasn't available in x86.
That is not the reason. Both of these instructions also exist in x86 assembly language.
The reason why your compiler is not emitting a push instruction for the x64 code is probably that it must adjust the stack pointer directly anyway, in order to create 32 bytes of "shadow space" for the called function. See this link (which was provided by @NateEldredge) for further information on "shadow space".
Allocating 32 bytes of "shadow space" with push instructions would take 4 64-bit push instructions, but only one sub instruction. That is why it prefers to use the sub instruction. Since it is using the sub instruction anyway to create 32 bytes of shadow space, there is no penalty for changing the operand of the sub instruction from 32 to 72, which allocates 72 bytes of memory on the stack, which is enough to also pass 3 parameters on the stack (the other 4 are passed in CPU registers).
I don't understand why it is allocating 72 bytes on the stack, though, as, according to my calculations, it only has to be 56 bytes (32 bytes of "shadow space" and 24 bytes for the 3 parameters that are passed on the stack). Possibly, the compiler is reserving those extra 16 bytes for local variables or for exception handling, which may be optimized away when compiler optimizations are active.
Why is there no stack cleanup after the call instructions in x64?
There is stack cleanup after the call instructions. This is what the line
add rsp, 72
does.
However, for some reason (probably increased performance), the x64 compiler only performs the cleanup at the end of the calling function, instead of after every function call. This means that with the x64 compiler, all function calls share the same stack space for their parameters, whereas with the x86 compiler, the stack space is allocated and cleaned up at every function call.
What's the reason that Microsoft chose to ignore __stdcall in x64 assembly?
The keywords __stdcall and __cdecl specify 32-bit calling conventions. That's why they are not relevant for 64-bit programs (i.e. x64). On x64, there is only the standard calling convention and the extended __vectorcall calling convention.
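If the same source has to build for both targets, one common pattern is to hide the keyword behind a macro. The following is only a sketch: APP_CALL and sum7 are hypothetical names, and the check relies on the predefined MSVC macros _MSC_VER and _M_IX86. The keyword is applied only where it has an effect (32-bit x86) and compiles away elsewhere.

```cpp
#include <cassert>

// APP_CALL is a hypothetical macro name. It expands to __stdcall only for
// 32-bit x86 builds with MSVC and to nothing on every other target.
#if defined(_MSC_VER) && defined(_M_IX86)
#define APP_CALL __stdcall
#else
#define APP_CALL
#endif

// Same shape as stdcallFunc from the question: seven int parameters.
int APP_CALL sum7(int a, int b, int c, int d, int e, int f, int g) {
    return a + b + c + d + e + f + g;
}
```

The call site looks the same on every target, e.g. sum7(1, 2, 3, 4, 5, 6, 7); only the generated argument passing and cleanup differ.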

Cross Compiler/Platform Naked Wrapper Function, Unconditional Jump to Function Pointer

I'm working on a complex program that will have plugins calling functions. However, the implementation of each of these functions will be selected at start-up and assigned using a function pointer.
Rather than passing around function pointers, I would like to have some efficient wrapper functions in the main executable that call the appropriate function.
As this is for a plugin interface, the calling convention will be defined as either __cdecl or __stdcall depending on the build target (using macros), and the functions will be declared as extern "C".
Basically, I want to be able to declare a symbol in my executable that the plugins can load as needed. The program solves a complex scientific problem that consists of different tasks, and there is a whole range of solutions or methods for getting the results of these tasks. These will be stored in the plugins themselves, so it's easy to add new methods (no recompiling of the entire application). This also makes it easier to share new methods, as anyone with the base code can add any plugin without needing any experience themselves.
Anyway, I worked out that I could either use this concept or pass a function map to the plugins when I load them. However, the specifics of that function map depend on the config and the plugins that are loaded, so I don't actually know what it is until I have finished loading plugins, which would be a problem. Hence my solution is to store the map as a set of global variables in the main executable, accessible through wrapper functions.
However, this is not straightforward, as the functions have calling conventions that involve manipulating the stack after calling and before returning, which the wrapper should not do. The wrapper should also perform an unconditional jump (jmp in Intel x86 assembly) rather than a function call (call in Intel x86 assembly), so that control returns from the jumped-to function to the calling code, not to the wrapper. However, I need C/C++ code that does this independently of compiler, platform and processor.
Below is a basic example I threw together to test my idea and demonstrate what I want to do:
C++ code (Microsoft Visual C++ 2010 (specific))
#include <iostream>
void * pFunc;
int doit(int,int);
int wrapper(int, int);
int main() {
pFunc = (void*)doit;
std::cout << "Wrapper(2,3): " << wrapper(2,3) << std::endl;
std::cout << "doit(2,3): " << doit(2,3) << std::endl;
return 0; }
int doit(int a,int b) { return a*b; }
__declspec(naked) int wrapper(int, int) { __asm jmp pFunc }
Code has been tested to work properly, both calls output 6
ASM Output for wrapper and doit
PUBLIC ?wrapper@@YAHHH@Z ; wrapper
; Function compile flags: /Odtp
; COMDAT ?wrapper@@YAHHH@Z
_TEXT SEGMENT
___formal$ = 8 ; size = 4
___formal$ = 12 ; size = 4
?wrapper@@YAHHH@Z PROC ; wrapper, COMDAT
; File c:\users\glen fletcher\documents\visual studio 2010\projects\test_wrapper\test_wrapper.cpp
; Line 15
jmp DWORD PTR ?pFunc@@3PAXA ; pFunc
?wrapper@@YAHHH@Z ENDP ; wrapper
_TEXT ENDS
PUBLIC ?doit@@YAHHH@Z ; doit
; Function compile flags: /Ogtp
; COMDAT ?doit@@YAHHH@Z
_TEXT SEGMENT
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
?doit@@YAHHH@Z PROC ; doit, COMDAT
; Line 14
push ebp
mov ebp, esp
mov eax, DWORD PTR _a$[ebp]
imul eax, DWORD PTR _b$[ebp]
pop ebp
ret 0
?doit@@YAHHH@Z ENDP ; doit
; Function compile flags: /Ogtp
_TEXT ENDS
Nonwrapper ASM for wrapper
PUBLIC wrapper
_1$ = 8
_2$ = 12
_TEXT SEGMENT
wrapper PROC
push ebp
mov ebp, esp
mov eax, DWORD PTR _2$[ebp]
push eax
mov ecx, DWORD PTR _1$[ebp]
push ecx
call DWORD PTR pFunc
add esp, 8
pop ebp
ret 0
wrapper ENDP
_TEXT ENDS
How can I get the original code generated in a cross-platform and cross-compiler manner, as opposed to a standard C/C++ function with the prolog and epilog code generated by the compiler? Note that I don't want to make assumptions about the processor either, so I can't use a separate ASM file; I want the compiler to generate the code with just the unconditional jump.
goto doesn't work, as pFunc is a variable and not a label, and I'm not even sure goto would work between functions anyway.
As far as your question,
How can I get the original code generated in a cross-platform and cross-compiler manner?
goes, the answer is "not at all".
Function calling conventions are deep into platform and compiler / language specifics. You're touching what's called the ABI (Application binary interface); issues like:
how / where are parameters passed from caller to called function, for all numbers / types / ordering of parameters ?
how are "hidden" features of the language (like C++ this) implemented ?
what are the rules for register usage (which regs are clobbered by making a function call to the "target context") ?
how / where are return values, for all types of "returning a value" ?
does source (caller) and target (callee) context use the same data structure layout rules ?
how can you deal with processor operating state changes (like would occur if you try to call 32bit code while executing in 64bit mode, and/or vice versa) ?
I've given a similar answer in this SO thread, which specifically targeted a question about doing "downcalls" from 64bit Windows to 32bit Windows stdcall. Alas, there is not much to add except "it's complicated, not generally possible and always very strongly code/compiler and OS-dependent".
This can be done in a specific case (the technical term is "thunking"). Every "thunk" is very specific: say, if you know the called function uses 32bit Windows/x86 style fastcall and has a single parameter, you can write a "thunk" doing the interfacing (and possibly processor state switch) that'd allow you to call it from, say, 64bit Linux code. That thunk would be different from one where the first parameter is a floating-point value passed in XMM0, though ... and so on.
For the general case ... refer to the infinite heap of programming knowledge that's SO again, sorry, there is no generic function pointer :(
Edit:
if the concern is code generation, then try the following:
/* sourcefile 1 */
extern void (*p)(char *, ...);
static __inline__ void wrapper(char *arg, char *s) {
return p(arg, s);
}
int main(int argc, char **argv)
{
wrapper("Hello, I am %s\n", argv[0]);
return 0;
}
/* sourcefile 2 */
extern void printf(char*, ...);
void (*p)(char *, ...) = printf;
If I compile those two, using gcc with optimization, the compiler creates the following code for main:
0000000000400500 <main>:
400500: 48 83 ec 08 sub $0x8,%rsp
400504: 48 8b 36 mov (%rsi),%rsi
400507: bf 0c 06 40 00 mov $0x40060c,%edi
40050c: ff 15 d6 03 10 00 callq *1049558(%rip) # 5008e8 <p>
400512: 31 c0 xor %eax,%eax
400514: 48 83 c4 08 add $0x8,%rsp
400518: c3 retq
which is pretty much what you want, except that it eliminates wrapper() entirely and inlines the call through the function pointer directly into main.
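Applied to the int(int, int) case from the question, the same idea might look like the sketch below. BinaryFn is a hypothetical alias, and pFunc is given a typed function pointer instead of void*; whether the compiler inlines the wrapper or emits a tail jump is up to its optimizer and is not guaranteed by the language.

```cpp
#include <cassert>

using BinaryFn = int (*)(int, int);  // hypothetical alias for the target signature

int doit(int a, int b) { return a * b; }

BinaryFn pFunc = nullptr;  // selected at start-up

// Plain C++ wrapper: no naked functions or inline assembly. With optimization
// enabled, compilers typically inline this or turn the call into a jump.
inline int wrapper(int a, int b) {
    assert(pFunc != nullptr);  // calling through an unset pointer would crash
    return pFunc(a, b);
}
```

After pFunc = doit; a call to wrapper(2, 3) returns the same value as doit(2, 3).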
I worked out a solution to my problem, rather than using naked functions or passing a list of function pointers.
I can pass a pointer to a struct of function pointers i.e.
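// Table of function pointers shared with every plugin
struct Functions {
bool (AppAPI *logInfo)(std::string, ...);
bool (AppAPI *logWarn)(std::string, ...);
bool (AppAPI *logError)(std::string, ...);
bool (AppAPI *registerFunction)(std::string, void *);
...
} PluginFunctions;
// Each plugin receives (and keeps) a pointer to the table
for (int i = 0; i < plugins; i++) {
plugin[i].initialize(&PluginFunctions);
}
// Entries can still be filled in after the plugins have been initialized
PluginFunctions.logInfo = LogInfo;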
...
As the plugin init function is passed a pointer to the struct, it can store that pointer and load the current value of a function pointer from memory whenever it needs it. The struct is just a table of pointers in memory, so the function pointers can be set after the struct has been passed to the plugin and the plugin still sees the updated values.
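A minimal, self-contained sketch of that late-binding behaviour follows. The names PluginFunctions, Plugin, countChars and countCharsImpl are hypothetical stand-ins for illustration, not the real interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical API table; the single member is illustrative only.
struct PluginFunctions {
    int (*countChars)(const std::string&) = nullptr;
};

// Hypothetical plugin: it stores the table pointer, not a copy of the entries.
class Plugin {
public:
    void initialize(const PluginFunctions* api) { api_ = api; }

    int run(const std::string& text) const {
        assert(api_ != nullptr && api_->countChars != nullptr);
        return api_->countChars(text);  // pointer is read from the table at call time
    }

private:
    const PluginFunctions* api_ = nullptr;
};

int countCharsImpl(const std::string& s) { return static_cast<int>(s.size()); }
```

Because run() reads the entry through the stored pointer each time, the host can call initialize(&table) first and assign table.countChars = countCharsImpl afterwards, and the plugin still reaches the right function.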

Inline assembly language

I am doing a 64-bit migration and I need to port inline assembly code to C++. Here is the code:
void ExternalFunctionCall::callFunction(ArgType resultType, void* resultBuffer)
{
// I386
// just copy the args buffer to the stack (it's already layed out correctly)
int* begin = m_argsBegin;
int* ptr = m_argsEnd;
while (ptr > begin) {
int val = *(--ptr);
__asm push val
}
}
I want to migrate this __asm push val to C++. This function is called four times, and for every call we get different values of m_argsBegin and m_argsEnd (both m_argsBegin and m_argsEnd are dynamic arrays).
The while loop executes 4 times for every call of callFunction, so in total 4x4 = 16 values are to be stored in a contiguous memory location; I guess this is what __asm push val does. I need to implement this in C++. I tried every way I could think of (stack, array, linked list, queue; I even moved this into a separate asm file), but none of them work.
Can anyone help?
I separated this inline assembly function into a separate assembly file . Here is the code:
.386
.model c,flat
public callFunction_asm
CSEG segment public 'CODE'
callFunction_asm PROC
push ebp
mov ebp, esp
mov ecx, [ebp+8] ;val
push dword ptr [ecx]
mov esp, ebp
pop ebp
RETN
callFunction_asm ENDP
CSEG ends
END
where callFunction_asm is an extern function , I declared it as:
extern "C"
void callFunction_asm(int val);
and I am calling this function as:
while (ptr > begin) {
int val = *(--ptr);
callFunction_asm(val); //possible replacement
}
but even this is not working. Can anyone tell me where I am going wrong? I am new to assembly coding.
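push decrements the stack pointer and then stores its operand at the new top of the stack.
Right after the push, the value is at the address held in the stack pointer (that is, at [esp]); if you want it back, you'd typically use pop. Note that in callFunction_asm the mov esp, ebp before the return discards the pushed value again, so nothing is left on the stack for the caller.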
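As a rough illustration of what the original loop leaves in memory, the sketch below replays the pushes into an ordinary array that is filled from the top down. simulate_pushes is a hypothetical helper, and it only models the memory layout: it does not place anything on the real machine stack, so on its own it cannot set up arguments for a later call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: replays "push val" for every element from end down to
// begin, using a vector as a downward-growing stack.
std::vector<int> simulate_pushes(const int* begin, const int* end) {
    const std::size_t count = static_cast<std::size_t>(end - begin);
    std::vector<int> stack(count);
    std::size_t sp = count;        // "stack pointer": one past the top slot
    const int* ptr = end;
    while (ptr > begin) {
        stack[--sp] = *(--ptr);    // push: decrement first, then store
    }
    assert(sp == 0);               // every slot has been written
    return stack;
}
```

Read from low to high addresses, the result has the same order as the source buffer, because the last element is pushed first and ends up at the highest address. So the contiguous block the pushes produce is effectively a copy of [m_argsBegin, m_argsEnd).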

Calling Win32's Sleep function from assembly creates access violation error

I'm using MASM and Visual C++, and I'm compiling in x64. This is my C++ code:
// include directive
#include "stdafx.h"
// external functions
extern "C" int Asm();
// main function
int main()
{
// call asm
Asm();
// get char, return success
_getch();
return EXIT_SUCCESS;
}
and my assembly code:
extern Sleep : proc
; code segment
.code
; assembly procedure
Asm proc
; sleep for 1 second
mov ecx, 1000 ; ecx = sleep time
sub rsp, 8 ; 8 bytes of shadow space
call Sleep ; call sleep
add rsp, 8 ; get rid of shadow space
; return
ret
Asm endp
end
Using breakpoints, I've pinpointed the line of code where the access violation occurs: right after the ret statement in my assembly code.
Extra info:
I'm using the fastcall convention to pass my parameters into Sleep (even though it is declared as stdcall), because from what I have read, x64 will always use the fastcall convention.
My Asm procedure compiles and executes with no errors when I get rid of the Sleep related code.
Even when I try to call Sleep with the stdcall convention, I still get an access violation error.
So obviously, my question is, how do I get rid of the access violation error, what am I doing wrong?
Edit:
This is the generated assembly for Sleep(500); in C++:
mov ecx,1F4h
call qword ptr [__imp_Sleep (13F54B308h)]
This generated assembly is confusing me... it looks like fastcall because it moves the parameter into ecx, but at the same time it doesn't create any shadow space. And I have no clue what this means: qword ptr [__imp_Sleep (13F54B308h)].
And again, edit, the full disassembly for main.
int main()
{
000000013F991020 push rdi
000000013F991022 sub rsp,20h
000000013F991026 mov rdi,rsp
000000013F991029 mov ecx,8
000000013F99102E mov eax,0CCCCCCCCh
000000013F991033 rep stos dword ptr [rdi]
Sleep(500); // this here is the asm generated by the compiler!
000000013F991035 mov ecx,1F4h
000000013F99103A call qword ptr [__imp_Sleep (13F99B308h)]
// call asm
Asm();
000000013F991040 call @ILT+5(Asm) (13F99100Ah)
// get char, return success
_getch();
000000013F991045 call qword ptr [__imp__getch (13F99B540h)]
return EXIT_SUCCESS;
000000013F99104B xor eax,eax
}
If Asm() were a normal C/C++ function, eg:
void Asm()
{
Sleep(1000);
}
The following is what my x64 compiler generates for it:
Asm proc
push rbp ; re-aligns the stack to a 16-byte boundary (CALL pushed 8 bytes for the caller's return address) as well as prepares for setting up a stack frame
sub rsp, 32 ; 32 bytes of shadow space
mov rbp, rsp ; finalizes the stack frame using the current stack pointer
; sleep for 1 second
mov ecx, 1000 ; ecx = sleep time
call Sleep ; call sleep
lea rsp, [rbp+32] ; get rid of shadow space
pop rbp ; clears the stack frame and sets the stack pointer back to the location of the caller's return address
ret ; return to caller
Asm endp
MSDN says:
The caller is responsible for allocating space for parameters to the callee, and must always allocate sufficient space for the 4 register parameters, even if the callee doesn’t have that many parameters.
Have a look at the following page for more information about how x64 uses the stack:
Stack Allocation
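To put a number on the rule quoted above, here is a small sketch that computes how many bytes a function would subtract from RSP before a call. outgoing_frame_bytes is a hypothetical helper, and it assumes the function pushes no registers and has no locals of its own, so the only thing already on the stack is the 8-byte return address.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: stack bytes to reserve before calling a function that
// takes `integer_args` integer/pointer arguments under the Windows x64 convention.
// Assumes the caller has pushed nothing and has no local variables.
int outgoing_frame_bytes(int integer_args) {
    // Space for at least the 4 register parameters (shadow space), 8 bytes each.
    const int slots = std::max(integer_args, 4);
    int bytes = slots * 8;
    // On entry RSP is 8 bytes past a 16-byte boundary (the return address),
    // so pad until RSP is 16-byte aligned again at the point of the CALL.
    if ((bytes + 8) % 16 != 0) {
        bytes += 8;
    }
    return bytes;
}
```

For Sleep, which takes one argument, this gives 40, so sub rsp, 40 and add rsp, 40 would take the place of the 8 used in the question. For a 7-argument call it gives 56.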

How can I get the "lea" instruction from a C++ function by disassembly?

I'm trying to learn reverse engineering, and I'm stuck on this little thing. I have code like this:
.text:10003478 mov eax, HWHandle
.text:1000347D lea ecx, [eax+1829B8h] <------
.text:10003483 mov dword_1000FA64, ecx
.text:10003489 lea esi, [eax+166A98h]<------
.text:1000348F lea edx, [eax+11FE320h]
.text:10003495 mov dword_1000FCA0, esi
and I'm wondering what it would look like in C or C++, especially the two instructions marked by arrows. HWHandle is a variable which holds the value returned from the GetModuleHandle() function.
More interesting is that a couple of lines below these instructions, dword_1000FCA0 is used as a function:
.text:1000353C mov eax, dword_1000FCA0
.text:10003541 mov ecx, [eax+0A0h]
.text:10003547 push offset asc_1000C9E4 ; "\r\n========================\r\n"
.text:1000354C call ecx
This will draw this text in my game console. Have you got any ideas, guys?
LEA is nothing more than an arithmetic operation: in this case, ECX is just filled with EAX+offset (the address itself, not the contents it points to). If HWHandle pointed to a (very large) structure, ECX would just hold the address of one of its members.
This could be the associated source code:
extern A* HWHandle; // mov eax, HWHandle
B* ECX = &HWHandle->someStructure; // lea ecx, [eax+1829B8h]
and later, one of B's members is used as a function:
(*ECX->ptrFunction)(someArg); // mov ecx, [eax+0A0h]
// call ecx
Since HWHandle is a module handle, which is just the base address of a DLL, it looks as if the constants that are being added to this are offsets for functions or static data inside the DLL. The code is computing the addresses of these functions or data items and storing them for later use.
Since this is typically the job of a dynamic linker, I'm not sure that this assembly code corresponds to actual C++ code. It would be helpful to know what environment you're working in exactly -- since you refer to games consoles, is this Xbox code? Unfortunately, I don't know how exactly dynamic linking works on Xbox, but it looks as if this may be what is going on here.
In the specific case of dword_1000FCA0, it looks as if this is the location of a jump table (i.e. essentially a list of function pointers) inside the DLL. Your second code snippet is getting a function pointer from offset 0xA0 inside this table, then calling it -- apparently, the function being called outputs strings to the screen. (The pointer to the string to be output is pushed onto the stack, which is usual for x86 calling conventions.) The C++ code corresponding to this would be something like
my_print_function("\r\n========================\r\n");
Edit:
If you want to call functions in a DLL yourself, the canonical way of getting at the function pointer is to use GetProcAddress():
FARPROC func=GetProcAddress(HWHandle, "MyFunction");
However, the code you posted is calculating offsets itself, and if you really want to do the same, you could use something like this:
DWORD func=(DWORD)HWHandle + myOffset;
myOffset is the offset you want to use -- of course, you'd need to have some way of determining this offset, and this can change every time the DLL is recompiled, so it's not a technique I would recommend -- but it is, after all, what you were asking about.
Regardless of which of these two ways you use to get at the address of the function, you need to call it. To do this, you need to declare a function pointer -- and to do that, you need to know the signature of your function (its parameters and return types). For example:
typedef void (*print_func_type)(const char *);
print_func_type my_func_pointer=(print_func_type)func;
my_func_pointer("\r\n========================\r\n");
Beware -- if you get the address of the function or its signature wrong, your code will likely crash. All part of the fun of this kind of low-level work.
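The jump-table reading of mov ecx, [eax+0A0h] / call ecx can be sketched in portable C++ as an indexed call through an array of function pointers. Everything here is hypothetical: the PrintFn signature, the fake_print stand-in and the table itself are made up for illustration, and the 4-byte pointer size is that of the 32-bit target, not necessarily of the machine compiling this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

using PrintFn = int (*)(const char*);  // hypothetical signature of the table entries

constexpr std::size_t kPointerSize32 = 4;   // pointer size in the 32-bit target
constexpr std::size_t kByteOffset = 0xA0;   // offset seen in the disassembly
constexpr std::size_t kSlotIndex = kByteOffset / kPointerSize32;  // entry number 40

// Stand-in for the real console print routine; returns the text length.
int fake_print(const char* text) { return static_cast<int>(std::strlen(text)); }

// Equivalent of "mov ecx, [table + offset]; push text; call ecx".
int call_slot(const PrintFn* table, std::size_t slot, const char* text) {
    assert(table[slot] != nullptr);
    return table[slot](text);
}
```

With a table whose entry kSlotIndex is set to fake_print, call_slot(table, kSlotIndex, "abc") ends up in fake_print, just as the disassembly ends up in whatever function sits at byte offset 0xA0.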
It looks like HWHandle is a pointer to some (big) structure. The lea instruction computes the addresses of members of that structure, e.g.:
mov eax, HWHandle
lea ecx, [eax+1829B8h]
mov dword_1000FA64, ecx
means:
Compute the address HWHandle + 0x1829B8 and put it into ecx
Put that address (from ecx) into some (global) variable dword_1000FA64
The rest looks similar.
In C++ you can get it almost anywhere and you really cannot predict where (depends on a compiler and optimizations), e.g.:
int x;
int* pX = &x;
The second line may generate lea.
Another example:
struct my_s
{
int x;
int y;
};
my_s s;
int* pY = &s.y; // here: probably lea <something>, [address(s) + 0x4]
Hope that helps.
In C++ this is roughly equivalent to
char *ecx, *eax, *esi;
ecx = eax + 0x1829B8; // lea ecx, [eax+1829B8h]
esi = eax + 0x166A98; // lea esi, [eax+166A98h]
This is under the assumption that eax, esi and ecx really hold pointers to memory locations. Of course, the lea instruction can be used to do simple arithmetic too, and in fact compilers often use it for addition. The advantage compared to a simple add: it can have up to three input operands and a different destination.
For example, foo = &bar->baz is the same as (simplified) foo = (char *)bar + offsetof(typeof(*bar), baz), which can be translated to lea foo, [bar+offsetofbaz].
It really is compiler and optimization dependent, but IIRC lea can be emitted just for additions. So lea ecx, [eax+1829B8h] can be understood as ecx = eax + 0x1829B8.
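The equivalence between taking a member's address and adding an offset to the base pointer can be checked directly in C++. The sketch below uses hypothetical types (Inner, Big) with an arbitrary padding size; whether a given compiler actually emits lea for either function depends on the target and optimization level.

```cpp
#include <cassert>
#include <cstddef>

struct Inner { int x; int y; };
struct Big { char padding[0x100]; Inner inner; };  // hypothetical large structure

// Taking the address of a member: the kind of expression that is often
// compiled to "lea reg, [base + offset]".
Inner* member_address(Big* p) { return &p->inner; }

// The same address computed with explicit byte arithmetic.
Inner* member_address_by_offset(Big* p) {
    return reinterpret_cast<Inner*>(reinterpret_cast<char*>(p) + offsetof(Big, inner));
}
```

Both functions return the same pointer for any Big object, and neither reads any memory: they only compute an address, which is what lea does.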