C++/Assembly pushing arguments onto the stack for function call

C++/Assembly pushing arguments onto the stack for function call - c++

i have a special case where i need to push arguments onto the stack one at a time and then call a function which takes a callable as an argument and pass those arguments that were pushed to the stack to the callable/function. to solve this i created a few functions in assembly to push arguments onto the stack and call a function passing those arguments. however it seams to work but my code started breaking in weird places whenever i made calls to other functions inside the callable passed to CallFunction.
i need this function to take any callable. any help would be much appreciated.
please note that std::bind and other similar functions are not an option. each argument needs to be pushed onto the stack one at a time using a function call and i need to be able to call functions between calls to PushArg and CallFunction should return the return value of the function it executes.
What ive tried:
saving the stack pointer at the beginning of the call and restoring it before i return (current state)
creating my own stack in memory for storing arguments (too complex and messy and not as efficient as just using the registers and existing stack. would like to avoid if possible)
Code:
.data
FunctionPointer QWORD 0
ReturnPointer QWORD 0
StackPointer QWORD 0
.code
ALIGN 16
FT_StartCall PROC
mov [StackPointer],rsp
ret
FT_StartCall ENDP
FT_PushIntPointer PROC
pop ReturnPointer
push rcx
push ReturnPointer
ret
FT_PushIntPointer ENDP
FT_CallFunction PROC
;save return address in Memory
pop ReturnPointer
mov FunctionPointer,rcx
pop rcx
pop rdx
pop r8
pop r9
push r9
push r8
push rdx
push rcx
call FunctionPointer
mov rcx,0
mov rdx,0
mov r8,0
mov r9,0
mov rsp, [StackPointer]
push ReturnPointer
ret
FT_CallFunction ENDP
END
C++:
extern "C" void* __fastcall FT_CallFunction(void* Function);
extern "C" void __fastcall FT_PushIntPointer(void* ArgOrPointer);
extern "C" void __fastcall FT_StartCall();
int ADD(int first,int second){
return first+second;
}
int main(){
FT_StartCall();
FT_PushIntPointer(3);
FT_PushIntPointer(2);
int res = FT_CallFunction(ADD);
return res;
}
Output:
5

As everyone agrees, you are delving into UB territory with this code, but it does have some experimental value, I guess.
You should consider allocating your own private stack for something a bit more robust.
This bit of code:
pop rcx
pop rdx
pop r8
pop r9
push r9
push r8
push rdx
push rcx
Does nothing, besides scrambling rcd, rdx, r8 and r9. You should remove it.
The same goes for this pîece of code:
mov rcx,0
mov rdx,0
mov r8,0
mov r9,0
Did you try something like this?
FT_CallFunction PROC
; save return address in Memory
pop [ReturnPointer]
; call and replace with our own return point.
call [rcx]
mov rsp,[StackPointer]
; return to caller
push [ReturnPointer]
ret
FT_CallFunction ENDP

Related

Implementation of Stackful Coroutine in C++

Now I am trying to implement stackful coroutine in C++17 on Windows x64 OS, but, unfortunately, I have encountered the problem: I can't throw exception in my coroutine, if I do so, the program is immediately terminated with a bad exit code.
Implementation
At the begining, I allocate a stack for a new coroutine, the code looks something like that:
void* Allocate() {
static constexpr std::size_t kStackSize{524'288};
auto new_stack{::operator new(kStackSize)};
return static_cast<std::byte *>(new_stack) + kStackSize;
}
The next step is setting a trampoline function on the recently allocated stack. The code is written using MASM, since I utilize MVSC (I would like to use GCC and NASM but I have the problem with thread_local variables, see question, if it is interesting):
SetTrampoline PROC
mov rax, rsp ; saves the current stack pointer
mov rsp, [rcx] ; sets the new stack pointer
sub rsp, 20h ; shadow stack
push rdx ; saves the function pointer
; place for nonvolatile registers
sub rsp, 0e0h
mov [rcx], rsp ; saves the moved stack pointer
mov rsp, rax ; returns the initial stack pointer
ret
SetTrampoline ENDP
Then I switch machine context with this assembly function (I read this calling convetion):
SwitchContext PROC
; saves all nonvolatile registers to the caller stack
push rbx
push rbp
push rdi
push rsi
push r12
push r13
push r14
push r15
sub rsp, 10h
movdqu [rsp], xmm6
; ... pushes xmm7 - xmm14 in here, removed for brevity
sub rsp, 10h
movdqu [rsp], xmm15
mov [rdx], rsp ; saves the caller stack pointer
SwitchContextFinally PROC
mov rsp, [rcx] ; sets the callee stack pointer
; takes out the callee registers
movdqu xmm15, [rsp]
add rsp, 10h
; ... pops xmm7 - xmm14 in here, removed for brevity
movdqu xmm6, [rsp]
add rsp, 10h
pop r15
pop r14
pop r13
pop r12
pop rsi
pop rdi
pop rbp
pop rbx
ret
SwitchContextFinally ENDP
SwitchContext ENDP
Problem
Inside the trampoline I just invoke any passed function and within these functions I can't throw exceptions and catch them instantly in the same fucntion. What have I done wrong? Is it possible to throw exceptions in my case? Should I have shadow stack in SetTrampoline?
Also, I guarantee that the exception thrown don't go outside the trampoline function.

Why are parameters allocated below the frame pointer instead of above?

I have tried to understand this basing on a square function in c++ at godbolt.org . Clearly, return, parameters and local variables use “rbp - alignment” for this function.
Could someone please explain how this is possible?
What then would rbp + alignment do in this case?
int square(int num){
int n = 5;// just to test how locals are treated with frame pointer
return num * num;
}
Compiler (x86-64 gcc 11.1)
Generated Assembly:
square(int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi. ;\\Both param and local var use rbp-*
mov DWORD PTR[rbp-4], 5. ;//
mov eax, DWORD PTR [rbp-20]
imul eax, eax
pop rbp
ret

This is one of those cases where it’s handy to distinguish between parameters and arguments. In short: arguments are the values given by the caller, while parameters are the variables holding them.
When square is called, the caller places the argument in the rdi register, in accordance with the standard x86-64 calling convention. square then allocates a local variable, the parameter, and places the argument in the parameter. This allows the parameter to be used like any other variable: be read, written into, having its address taken, and so on. Since in this case it’s the callee that allocated the memory for the parameter, it necessarily has to reside below the frame pointer.
With an ABI where arguments are passed on the stack, the callee would be able to reuse the stack slot containing the argument as the parameter. This is exactly what happens on x86-32 (pass -m32 to see yourself):
square(int): # #square(int)
push ebp
mov ebp, esp
push eax
mov eax, dword ptr [ebp + 8]
mov dword ptr [ebp - 4], 5
mov eax, dword ptr [ebp + 8]
imul eax, dword ptr [ebp + 8]
add esp, 4
pop ebp
ret
Of course, if you enabled optimisations, the compiler would not bother with allocating a parameter on the stack in the callee; it would just use the value in the register directly:
square(int): # #square(int)
mov eax, edi
imul eax, edi
ret

GCC allows "leaf" functions, those that don't call other functions, to not bother creating a stack frame. The free stack is fair game to do so as these fns wish.

Stack must be clean before function epilogue confusion

I'm studying assembly language from the book "Assembly Language Step-by-Step: Programming with Linux" by Jeff Dunteman, and have come across an interesting paragraph in the book which I'm most likely misunderstanding, hence would appreciate some clarification on:
"The stack must be clean before we destroy the stack frame and return control. This simply means that any temporary values that we may have pushed onto the stack during the program’s run must be gone. All that is left on the stack should be the caller’s EBP, EBX, ESI, and EDI values.
...
Once the stack is clean, to destroy the stack frame we must first pop the caller’s register values back into their registers, ensuring that the pops are in the correct order.
...
We restore the caller’s ESP by moving the value from EBP into ESP, and finally pop the caller’s EBP value off the stack."
Consider the following code generated from Visual Studio 2008:
int myFuncSum( int a, int b)
{
001B1020 push ebp
001B1021 mov ebp,esp
001B1023 push ecx <------------------
int c;
c = a + b;
001B1024 mov eax,dword ptr [ebp+8]
001B1027 add eax,dword ptr [ebp+0Ch]
001B102A mov dword ptr [ebp-4],eax
return c;
001B102D mov eax,dword ptr [ebp-4]
}
001B1030 mov esp,ebp
001B1032 pop ebp
001B1033 ret
The value of ecx (indicated), pushed to make space on the stack for my variable c, is, as far as I can see, only gone from the stack when we reset ESP; however, as quoted, the book states that the stack must be clean before we reset ESP. Can someone please clarify whether or not I am missing something?

The example from Visual Studio 2008 doesn't contradict the book. The book is covering the most elaborate case of a call. See the x86-32 Calling Convention as a cross-reference which spells it out with pictures.
In your example, there were no caller registers saved on the stack, so there are no pop instructions to be performed. This is part of the "clean up" that must occur before mov esp, ebp that the book is referring to. So more specifically, let's suppose the callee is saving si and di for the caller, then the prelude and postlude for the function might look like this:
push ebp ; save base pointer
mov ebp, esp ; setup stack frame in base pointer
sub esp, 4 ; reserve 4 bytes of local data
push si ; save caller's registers
push di
; do some stuff, reference 32-bit local variable -4(%ebp), etc
; Use si and di for our own purposes...
; clean up
pop di ; do the stack clean up
pop si ; restoring the caller's values
mov esp, ebp ; restore the stack pointer
pop ebp
ret
In your simple example, there were no saved caller registers, so no final pop instructions needed at the end.
Perhaps because it's simpler or faster, the compiler elected to do the following instruction in place of sub esp, 4:
push ecx
But the effect is the same: reserve 4 bytes for a local variable.

Notice the instruction:
push ebp
mov ebp,esp ; <<<<=== saves the stack base pointer
and the instruction:
mov esp,ebp ; <<<<<== restore the stack base pointer
pop ebp
So after this sequence the stack is clean again

How to use variables in __asm?

I'm compiling this C++ code with the VC compiler. I'm trying to call a function that takes two WORD (aka unsigned short) parameters using the __asm statement, like this:
__declspec(naked) void __stdcall CallFunction(WORD a, WORD b)
{
__asm {
PUSH EBP
MOV EBP, ESP
PUSH a
PUSH b
CALL functionAddress
LEAVE
RETN
}
}
The function at functionAddress simply outputs the result of doing a + b. Then calling CallFuncion(5, 5); prints "64351" or something like that. The problem is when using the a and b variables inside the __asm statement because this works:
PUSH EBP
MOV EBP, ESP
PUSH 5
PUSH 5
CALL functionAddress
LEAVE
This is the function at functionAddress:
void __stdcall Add(WORD a, WORD b)
{
WORD c;
c = a + b;
printf("The result is %d\n", c);
}
How can I do this the right way? So the __asm statement interpretate the a and b values?

Since you're using __declspec(naked) and setting up your own stack frame, I don't believe the compiler will let you refer to a and b by name. Using __declspec(naked) basically means you're responsible for dealing with the stack frame, parameters, etc., on your own.
You probably want code more on this general order:
__asm {
PUSH EBP
MOV EBP, ESP
mov eax, [ebp+8]
mov ebx, [ebp+12]
push eax
push ebx
CALL functionAddress
LEAVE
RETN
}
I'ts been a while since I've handled things like this by hand, so you might want to re-check those offsets, but if I recall correctly, the return address should be at [ebp+4]. Parameters are (usually) pushed from right to left, so the the left-most parameter should be next at [ebp+8], and the next parameter at [ebp+12] (keeping in mind that the stack grows downward).
Edit: [I should have looked more carefully at the function heading.]
You've marked CallFunction as using the __stdcall calling convention. That means it's required to clean up the parameters that were passed to it. So, since it receives 8 bytes of parameters, it needs to remove 8 bytes from the stack as it returns:
PUSH EBP
MOV EBP, ESP
mov eax, [ebp+8]
mov ebx, [ebp+12]
push eax
push ebx
CALL Add_f
LEAVE
RET 8

Static methods save memory? (unmanaged code)

In the wake of this question about static methods in managed code, I'm interesting if the answers there is relevant to unmanaged code like c++.
I make thousands of instances, and my question is mainly about static methods. Do this methods save memory compared regular methods?
thank you, and sorry about my poor English.

All methods require their binary code to be in memory in order to run. The executable code for static and non-static methods is (largely) the same.
Both types of methods require only one place in memory, so they're not replicated with every instance of the class.
Let's now take a look at some code:
class A
{
public:
void foo();
static void goo();
};
void A::foo()
{
004113D0 push ebp
004113D1 mov ebp,esp
004113D3 sub esp,0CCh
004113D9 push ebx
004113DA push esi
004113DB push edi
004113DC push ecx
004113DD lea edi,[ebp-0CCh]
004113E3 mov ecx,33h
004113E8 mov eax,0CCCCCCCCh
004113ED rep stos dword ptr es:[edi]
004113EF pop ecx
004113F0 mov dword ptr [ebp-8],ecx
}
004113F3 pop edi
004113F4 pop esi
004113F5 pop ebx
004113F6 mov esp,ebp
004113F8 pop ebp
004113F9 ret
void A::goo()
{
00411530 push ebp
00411531 mov ebp,esp
00411533 sub esp,0C0h
00411539 push ebx
0041153A push esi
0041153B push edi
0041153C lea edi,[ebp-0C0h]
00411542 mov ecx,30h
00411547 mov eax,0CCCCCCCCh
0041154C rep stos dword ptr es:[edi]
}
0041154E pop edi
0041154F pop esi
00411550 pop ebx
00411551 mov esp,ebp
00411553 pop ebp
00411554 ret
int main()
{
A a;
a.foo();
0041141E lea ecx,[a]
00411421 call foo (4111E5h)
a.goo();
00411426 call A::goo (4111EAh)
return 0;
}
There are only minor differences, such as pushing the this pointer onto the stack for the non-static function, but they are minor, and probably a decent optimizer will reduce the differences even further.
A decision about whether or not to use static functions should be strictly design-driven, not memory-driven.

Static methods are essentially just free functions and so their memory footprint is the same. Member functions have an extra parameter and so the added memory is slightly larger, although it's meaningless to care about such things.
The amount of memory a function takes up is per-class, not per-instance. You shouldn't be concerned.

Short answer: No. A method is a function with an implicit first argument equal to its class, and a static function lacks this first argument. Actually, the situation is just the same as in garbage collected languages, so the answers to the other question apply fully.

The difference between a static and instance method is just the first parameter. In C++ all instance methods compile to a normal function with a substituted first parameter called this which is a pointer to the object on which the method was called.
On most architectures this will be an 8-byte value, so it's not really significant unless you're doing some very resource-strict embedded systems coding.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

C++/Assembly pushing arguments onto the stack for function call - c++

Related

Implementation of Stackful Coroutine in C++

Why are parameters allocated below the frame pointer instead of above?

Stack must be clean before function epilogue confusion

How to use variables in __asm?

Static methods save memory? (unmanaged code)

Categories

Resources