What happens in assembly language when you call a method/function? - c++

Suppose I have a program like this in C or C++ (the language doesn't matter much; it's only needed to illustrate the concept):
#include <stdio.h>

void foo() {
    printf("in foo");
}

int main() {
    foo();
    return 0;
}
What happens in the assembly? I'm not actually looking for assembly code as I haven't gotten that far in it yet, but what's the basic principle?

In general, this is what happens:
Arguments to the function are stored on the stack, in a platform-specific order.
A location for the return value is "allocated" on the stack.
The return address for the function is also stored on the stack or in a special-purpose CPU register.
The function (or actually, the address of the function) is called, either through a CPU-specific call instruction or through a normal jmp or br instruction (jump/branch).
The function reads the arguments (if any) from the stack and then runs the function code.
The return value from the function is stored in the specified location (stack or special-purpose CPU register).
Execution jumps back to the caller and the stack is cleared (by restoring the stack pointer to its initial value).
The details of the above vary from platform to platform and even from compiler to compiler (see e.g. the STDCALL vs. CDECL calling conventions). For instance, in some cases CPU registers are used instead of the stack for arguments and return values. The general idea is the same, though.

You can see it for yourself:
Under Linux 'compile' your program with:
gcc -S myprogram.c
and you'll get a listing of the program in assembly (myprogram.s).
Of course, you need to know a little assembly to understand it (but it's worth learning, because it helps you understand how your computer works). Calling a function (on the x86 architecture) is basically:
put variable a on stack
put variable b on stack
put variable n on stack
jump to address of the function
load variables from stack
do stuff in function
clean stack
jump back to main

What happens in the assembly?
A brief explanation: the current stack state is saved, a new stack frame is created, and control transfers to the function's code, which then runs. This involves inconveniencing a few registers of your microprocessor and some frantic reads and writes to memory; once the function is done, the calling function's stack state is restored.

What happens? In x86, the first line of your main function might look something like:
call foo
The call instruction will push the return address on the stack and then jmp to the location of foo.

Arguments are pushed onto the stack and a "call" instruction is executed.
A call is simply a "jmp" that first pushes the address of the next instruction onto the stack (the "ret" at the end of a function pops that address and jumps to it).

I think you want to take a look at the call stack to get a better idea of what happens during a function call: http://en.wikipedia.org/wiki/Call_stack

A very good illustration:
http://www.cs.uleth.ca/~holzmann/C/system/memorylayout.pdf

What happens?
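C mimics what will occur in assembly...
It is so close to the machine that you can work out what will occur:
#include <stdio.h>

void foo() {
    printf("in foo");
    /*
    Roughly, in 32-bit x86 (MASM-like syntax, cdecl; real compiler output differs):

    mystring db 'in foo', 0     ; the string constant, NUL-terminated
    mov  eax, offset mystring   ; address of the string
    mov  edx, offset _printf    ; address of printf
    push eax                    ; pass the string as the only argument
    call edx                    ; pushes the return address and jumps
    add  esp, 4                 ; caller removes the one 4-byte argument
    ret                         ; that's it
    */
}

int main() {
    foo();
    return 0;
}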

1- a calling context is established on the stack
2- parameters are pushed on the stack
3- a "call" is performed to the method

The general idea is that you need to
Save the current local state
Pass the arguments to a function
Call the actual function. This involves putting the return address somewhere so the RET instruction knows where to continue.
The specifics vary from architecture to architecture, and the finer details may vary between languages, although there are usually ways of controlling this to some extent to allow for interoperability between different languages.
A pretty useful starting point is the Wikipedia article on calling conventions. On 32-bit x86, for example, the stack is almost always used for passing arguments to functions. On many RISC architectures, however, registers are mainly used, while the stack is only needed in exceptional cases.

The common idea is that the registers used by the calling function are pushed onto the stack (the stack pointer is in the ESP register); this is called "pushing the registers". Sometimes they're also zeroed, but that depends. Assembly programmers tend to free more registers than the common four (EAX, EBX, ECX and EDX on x86) to have more room to work with inside the function.
When the function ends, the same happens in reverse: the stack is restored to the state it was in before the call. This is called "popping the registers".
Update: this process does not necessarily have to happen. Compilers can optimize it away and inline your functions.
Update: normally the parameters of a function are pushed onto the stack in reverse order, so that when they are retrieved from the stack they appear in normal order. This order is not guaranteed by C. (ref: Inner Loops by Rick Booth)

Related

How to mark that C++ function modifies all possible registers?

If I have a non-inline function and the C++ compiler knows that this function modifies some registers, then the compiler will save all necessary registers before doing the function CALL.
At least I expect the compiler to do this (saving), as long as it knows which registers will be modified inside the called function.
Now imagine that my function modifies ALL possible registers of the CPU (general purpose, SIMD, FPU, etc.). How can I force the compiler to save everything it needs before any CALL to this function? As a reminder, my function is non-inline, i.e. it is called through a CALL instruction.
Of course, through asm I can push all possible registers onto the stack at the start of my function and pop them all back before the function returns.
Although I can save ALL possible registers, I would prefer that the compiler save only the necessary registers, those used by the function's caller, for performance (speed) and memory usage reasons.
Inside my function I don't know in advance who will use it, hence I would have to save every possible register. But at the call site the compiler knows exactly which registers are used in the calling function, so it may need to save far fewer registers, because surely not all registers will be in use.
Hence I want to mark my function as "modifying all registers", so that the C++ compiler pushes onto the stack just the registers it needs before calling my function.
Is there any way to do this? Any GCC/Clang/MSVC function attribute? Or maybe listing all registers in the clobber section of an asm statement?
The main thing is that I don't want to save registers myself inside this function (for a specific reason); instead I want all callers to save all needed registers before calling my function, and I want all callers to be aware that my function modifies everything possible.
I'm looking for some imaginary modifies-all attribute like:
__attribute__((modifies_all_registers)) void f();
I did the following experiment:
__attribute__((noinline)) int modify(int i) {
    asm volatile(
        ""
        : "+m" (i) ::
        "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
        "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
        "xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
        "ymm0", "ymm1", "ymm2", "ymm3", "ymm4", "ymm5", "ymm6", "ymm7",
        "ymm8", "ymm9", "ymm10", "ymm11", "ymm12", "ymm13", "ymm14", "ymm15",
        "zmm0", "zmm1", "zmm2", "zmm3", "zmm4", "zmm5", "zmm6", "zmm7",
        "zmm8", "zmm9", "zmm10", "zmm11", "zmm12", "zmm13", "zmm14", "zmm15"
    );
    return i + 1;
}

int main(int argc, char ** argv) {
    auto volatile x = modify(argc);
}
In other words, I asm-clobbered almost all possible registers, and the compiler generated the following push sequence inside modify() (and the matching pop sequence at the end):
push rbp
mov rbp, rsp
push r15
push r14
push r13
push r12
push rbx
Nothing else was pushed, so I can see that somehow the compiler (Clang) didn't care about any registers except rbx, rbp, and r12-r15. Does it mean that there is some C++ calling convention that says that I can modify any other registers besides these few, without restoring them on function return?
Does it mean that there is some C++ calling convention that says that I can modify any other registers besides these few, without restoring them on function return?
Yes. Among other things, the ABI specification used on a given platform defines function calling conventions. Calling conventions define a set of registers that the function is allowed to clobber and a set of registers that the function is required to preserve. If registers of the former set contain useful data for the caller, the caller is expected to save that data before the call. If registers from the latter set have to be used in the called function, the function must save them and restore them before returning.
There are also conventions regarding which registers, in what order, are used to pass arguments to the function and to receive the returned value. You can consider those registers as clobbered, since the caller must initialize them with parameter values (and thus save any useful data that was in those registers before the call) and the callee is allowed to modify them.
In your case, the asm statement marks all registers as clobbered, and the compiler only saves and restores the registers that it is required to preserve across the function call. Note that by default the caller will always save any useful data held in registers from the clobber set before a function call, whether those registers are actually modified by the function or not. In some cases, the optimizer may be able to avoid saving registers that are not actually modified, for example if the function call is inlined or the compiler is able to analyze the function body (e.g. in the case of LTO). However, if the function body is not known at compile time, the compiler must assume the worst and adhere to the ABI specification.
So, in general, you do not need to mark the function in any special way: the ABI rules already work in such a way that registers are saved and restored as needed. And, as you witnessed yourself, even with asm statements the compilers are able to tell which registers are used in a function. If you still want to save specific, or all, registers for some reason, your only option is to write in assembler. Or, if you're implementing some sort of context switching, use specialized instructions like XSAVE/XRSTOR or APIs like ucontext.

How can one protect a register from being overwritten during a function call?

I am trying to write assembly code for the following function:
#include <iostream>

void f(int x) {
    if (x > 0) {
        std::cout << x << std::endl;
        f(x-1);
        std::cout << x << std::endl;
    }
}

int main() {
    f(1);
}
The output of this program is 1 1. I am trying to write the assembly code for the so-called "low-cost computer" assembler, a computer invented by Anthony Dos Reis for his book "C and C++ under the hood". The assembly code I wrote is:
startup   jsr main
          halt                ; back to operating system
;==============================================================
                              ; #include <stdio.h>
greater   dout
          nl
          sub r1, r0, 1
          push lr
          push fp
          mov fp, sp
          push r1
          jsr f
          add sp, sp, 1
          dout
          nl
          mov sp, fp
          pop fp
          pop lr
          ret
;==============================================================
f         push lr             ; int f()
          push fp             ; {
          mov fp, sp
          ldr r0, fp, 2
          cmp r0, 0
          brgt greater
          mov sp, fp
          pop fp
          pop lr
          ret
;==============================================================
main      push lr
          push fp
          mov fp, sp
          mov r0, 1
          push r0
          jsr f
          add sp, sp, 1
          mov sp, fp
          pop fp
          pop lr
          ret
The code prints 1 0 to stdout, which is obviously wrong. The reason for the wrong output is that the register r0 contains 1 before the code at the label greater jumps to the function f, and then f modifies the register, setting r0 to 0 for the comparison cmp. This made me wonder how the assembler can keep registers invariant across function calls. I thought of the following:
By simply pushing the entire register onto the stack and loading it again afterwards.
By somehow gleaning what the called function does and then "protecting" the registers if needed.
I thought solution 1 is very defensive and likely costly, whilst solution 2 is probably complicated and assumes a lot of knowledge. Most likely, I simply made a mistake in writing the assembly, but I still don't understand how the assembler can keep its registers invariant when it needs to in general cases, or how one can address a problem like the one outlined above. Does somebody know what to do in this case? I am grateful for any help or suggestions!
As the others are saying, usually each register is assigned a usage model by an agreement called the calling convention.  There are several usage models:
Call clobbered: sometimes also called "scratch", these registers are understood to be clobbered by a function call, and as such, they can be used in between calls, and are free to be used by any function. Sometimes these are called "caller saves" because the caller is responsible for preserving any of their values that are needed after the call; they are also known as "volatile". In practice, however, once moved to memory, they don't need to be restored until their values are actually needed; those values don't need to be restored to the same register they were in when stored to memory, and, on some architectures, the values can be used directly from memory as well.
Call preserved: these registers are understood to be preserved across function calls, which means that in order to use one of them, the original contents of the register must be saved (typically on function entry) and restored later (typically at function exit). Sometimes also called "callee saves" because, as the caller can rely on their values being preserved, a callee must save and restore them if it uses them; they are also known as "non-volatile".
Others: on some processors certain registers are dedicated to parameter passing and return values; because of these uses, they don't necessarily behave strictly as call clobbered or call preserved, i.e. it is not necessarily the called function that clobbers them, but the caller may be required to clobber them merely in making the call, i.e. in parameter passing before the call. A function can have formal parameter values in these registers that are needed after a call, and yet need to place actual arguments into them in the instruction sequence of calling another function. When this occurs, the parameter values needed after a call are typically relocated (to memory or to call preserved registers).
By somehow gleaning what the called function does and then "protecting" the registers if needed.
This can work.  The calling convention is a general purpose agreement that is particularly useful when caller or callee does not know the implementation details of the other, or when an indirect function call (call by pointer) could call one of several different actual functions.  However, when both a callee and caller are known in implementation, then as an optimization we can deviate from the standard calling convention.  For example, we can use a scratch register to hold a value live across a particular call if we know the called function does not modify that scratch register.
Commonly, a computing platform includes an Application Binary Interface (ABI) that specifies, among other things, protocols for function calls. The ABI specifies that certain processor registers are used for passing arguments or return results, certain registers may be freely used by the called routine (scratch or volatile registers), certain registers may be used by the called routine but must be restored to their original values (preserved or non-volatile registers), and/or certain registers must not be altered by the called routine. Rules about calling routines may also be called a “calling convention.”
If your routine uses one of the registers that called functions are free to use, and it wants the value of that register to be retained across a function call, it must save it before the function call and restore it afterward.
Generally, the ABI and functions seek a balance in which registers they use. If the ABI said all registers had to be saved and restored, then a called function would have to save and restore each register it used. Conversely, if the ABI said no registers have to be saved and restored, then a calling function would have to save and restore each register it needed. With a mix, a calling routine may often have some of its data in registers that are preserved (so it does not have to save them before the call and restore them afterward), and a called routine will not use all of the registers it must preserve (so it does not have to save and restore them; it uses the scratch registers instead), so the overall number of register saves and restores that are performed is reduced.
Architecture/platform combinations such as Windows on x64 or Linux on ARM32 have what are called ABIs, or Application Binary Interfaces.
The ABI specifies precisely how registers are used, how functions are called, how exceptions work, and how function arguments are passed. An important aspect of this is, during a function call, what registers must be saved, and who saves them, and what registers may be destroyed.
Registers that can be destroyed during a function call are called volatile registers. Registers that must be preserved by a function call are called non-volatile registers. Typically, if a caller wants to preserve a volatile register, it pushes the value onto the stack, calls the function, and then pops it off when done. If a called function (callee) wants to use a non-volatile register, it must save it similarly.
Read the Windows x64 calling convention here.

Why does `RtlGetFullPathName_U` look different in ntdll.dll and reactos' docs?

I'm hooking an undocumented Windows API function, RtlGetFullPathName_U (residing in ntdll.dll), to detect process injections in my game. However, the function signature looks different in IDA than in the only information I could find about the function (from ReactOS's docs).
When looking in IDA:
The file analyzed above is ntdll.dll found through x32dbg:
When looking in ReactOS' docs, I see RtlGetFullPathName_U looks like this:
ULONG
NTAPI
RtlGetFullPathName_U(
    IN PCWSTR FileName,
    IN ULONG Size,
    IN PWSTR Buffer,
    OUT PWSTR *ShortName
);
Using ReactOS's version of RtlGetFullPathName_U works when I hook, but I notice a difference in the number of parameters. Why is that? My usual approach would be to look at the exported functions through IDA, not through ReactOS's documentation.
A last question; are there other relevant functions I could hook to detect process injections? Besides LoadLibraryA/W/Ex?
As you can see in the disassembly, the function uses push ecx early on, followed by saving the address of the just-pushed value in eax. The address in eax is then pushed onto the stack as an argument for the next function.
So what you read in the decompiler output is not technically wrong: it stores the value of ecx in a local variable and then passes the address of that local variable to RtlGetFullPathName_UEx.
To capture this, IDA assumes that the value passed to the function in ecx might matter and marks it as a parameter.
However, most likely, the real purpose of the push ecx instruction here is not to save the value of ecx, but simply to reserve four bytes on the stack for a local variable (a more common idiom for which would be sub esp, 4). Using push is an optimization.
To confirm this definitively, you would have to analyze the called function, RtlGetFullPathName_UEx, and see whether it ever reads the contents of the memory pointed to by its last parameter. If, as I strongly suspect, it does not, and this parameter is only used for output, then the value in the caller can simply be considered uninitialized.
After you've confirmed this (or if for some other reason, e.g. trusting ReactOS's declaration, you believe this is the case), you can modify the function prototype to use __stdcall and remove the void *this parameter in IDA, and it will show it as what it (probably) is: passing a pointer to an uninitialized local variable.

Pop{pc} in assembly

This may be a stupid question, but in my assembly code, during debugging, I have
pop{r2-r6,pc}
and I think it is giving me a hard fault exception. I understand what pop does, but I am unsure what the pc part means. I cannot find it explained anywhere on the internet, and it is not a variable anywhere in my code.
I am using Keil on an STM32, in C++.
pc or r15 is the program counter, the register which gives the address that the processor fetches instructions from. Changing it to another address makes the program execution jump to that address.
In this case, the address is read off the stack to return from a function call; the return address would have been pushed onto the stack (from the link register lr or r14) at the start of the function.
If that's causing a crash, then it's probably because the address on the stack has been corrupted. Perhaps you're writing outside the bounds of a local array, or overflowing the stack with too deep a function call level.
The PC register is the program counter; it holds the address of the next instruction to be executed on an ARM architecture (STM32 uses the ARM architecture).
The default in ARM assembly is to simply overwrite the PC register when a function is to return. What you are seeing with the pop statement is just a direct way to return, see here.
The rest of your question is neatly explained in Mike's post.

How does the compiler know where control should return to after a function call?

Consider the following functions:
void func1();
void func2();

int main()
{
    //statement(s);
    func1();
    //statement(s);
}

void func1()
{
    //statement(s);
    func2();
    //statement(s);
}

void func2()
{
    //statement(s);
}
How does the compiler know where to return to after func2 has performed all its operations? I know that control transfers back to func1 (and to exactly which statement), but how does the compiler know it? What tells the compiler where to return to?
This is typically implemented using a call stack:
When control is being transferred to a function, the address to return to is pushed onto the stack.
When the function finishes, the address is popped off the stack and used to transfer control back to the caller.
The details are typically mandated by the hardware architecture for which the code is being compiled.
Actually, the compiler doesn't run the code; the machine does. When it calls a new function, it stores on the stack the address of the instruction to be executed after the called function returns, so that when the function returns, the machine can pop that address back into the Instruction Pointer (IP) and resume from there.
I've simplified things a bit for the sake of explanation.
When a function is called, the correct return address in the calling function is placed somewhere that is used precisely for the purpose of storing the return address; usually this is the stack, though the standard does not mandate that.
It is the compiler's duty to ensure that its calling conventions are such that unless something goes wrong (for example, a stack overflow), then the called function knows how to return to the calling function.
The runtime makes use of something called a 'call stack', which basically holds the address of the next statement to execute after the function being called returns. So when a function call is made, and before control jumps to the new instruction address, the address of the next instruction in the calling function is pushed onto the stack. This process is repeated for every subsequent call to any function. Now why a stack? Because it's necessary to get back to the point where execution left off, which is basically 'last in, first out' behavior, and a stack is the data structure that does that. You can actually look at this call stack when you are debugging a program in Visual Studio: there's a separate window called 'Call Stack' which shows the entries placed in the call stack.