Function Prologue and Epilogue in C - c++

I know data in nested function calls go to the Stack.The stack itself implements a step-by-step method for storing and retrieving data from the stack as the functions get called or returns.The name of these methods is most known as Prologue and Epilogue.
I tried with no success to search material on this topic. Do you guys know any resource ( site,video, article ) about how function prologue and epilogue works generally in C ? Or if you can explain would be even better.
P.S : I just want some general view, not too detailed.

There are lots of resources out there that explain this:
Function prologue (Wikipedia)
x86 Disassembly/Calling Conventions (WikiBooks)
Considerations for Writing Prolog/Epilog Code (MSDN)
to name a few.
Basically, as you somewhat described, "the stack" serves several purposes in the execution of a program:
Keeping track of where to return to, when calling a function
Storage of local variables in the context of a function call
Passing arguments from calling function to callee.
The prolouge is what happens at the beginning of a function. Its responsibility is to set up the stack frame of the called function. The epilog is the exact opposite: it is what happens last in a function, and its purpose is to restore the stack frame of the calling (parent) function.
In IA-32 (x86) cdecl, the ebp register is used by the language to keep track of the function's stack frame. The esp register is used by the processor to point to the most recent addition (the top value) on the stack. (In optimized code, using ebp as a frame pointer is optional; other ways of unwinding the stack for exceptions are possible, so there's no actual requirement to spend instructions setting it up.)
The call instruction does two things: First it pushes the return address onto the stack, then it jumps to the function being called. Immediately after the call, esp points to the return address on the stack. (So on function entry, things are set up so a ret could execute to pop that return address back into EIP. The prologue points ESP somewhere else, which is part of why we need an epilogue.)
Then the prologue is executed:
push ebp ; Save the stack-frame base pointer (of the calling function).
mov ebp, esp ; Set the stack-frame base pointer to be the current
; location on the stack.
sub esp, N ; Grow the stack by N bytes to reserve space for local variables
At this point, we have:
...
ebp + 4: Return address
ebp + 0: Calling function's old ebp value
ebp - 4: (local variables)
...
The epilog:
mov esp, ebp ; Put the stack pointer back where it was when this function
; was called.
pop ebp ; Restore the calling function's stack frame.
ret ; Return to the calling function.

C Function Call Conventions and the Stack explains well the concept of a call stack
Function prologue briefly explains the assembly code and the hows and whys.
The gen on function perilogues

I am quite late to the party & I am sure that in the last 7 years since the question was asked, you'd have gotten a way clearer understanding of things, that is of course if you chose to pursue the question any further. However, I thought I would still give a shot at especially the why part of the prolog & the epilog.
Also, the accepted answer elegantly & quite simply explains the how of the epilog & the prolog, with good references. I only intend to supplement that answer with the why (at least the logical why) part.
I will quote the below from the accepted answer & try to extend it's explanation.
In IA-32 (x86) cdecl, the ebp register is used by the language to keep
track of the function's stack frame. The esp register is used by the
processor to point to the most recent addition (the top value) on the
stack.
The call instruction does two things: First it pushes the return
address onto the stack, then it jumps to the function being called.
Immediately after the call, esp points to the return address on the
stack.
The last line in the quote above says immediately after the call, esp points to the return address on the stack.
Why's that?
So let's say that our code that's getting currently executed has the following situation, as shown in the (really badly drawn) diagram below
So our next instruction to be executed is, say at the address 2. This is where the EIP is pointing. The current instruction has a function call (that would internally translate to the assembly call instruction).
Now ideally, because the EIP is pointing to the very next instruction, that would indeed be the next instruction to get executed. But since there's sort of a diversion from the current execution flow path, (that is now expected because of the call) the EIP's value would change. Why? Because now another instruction, that may be somewhere else, say at the address 1234 (or whatever), may need to get executed. But in order to complete the execution flow of the program as was intended by the programmer, after the diversion activities are done, the control must return back to the address 2 as that is what should have been executed next should the diversion have not happened. Let us call this address 2 as the return address in the context of the call that is being made.
Problem 1
So, before the diversion actually happens, the return address, 2, would need to be stored somewhere temporarily.
There could have been many choices of storing it in any of the available registers, or some memory location etc. But for (I believe good reason) it was decided that the return address would be stored onto the stack.
So what needs to be done now is increment the ESP (the stack pointer) such that the top of the stack now points at the next address on the stack. So TOS' (TOS before the increment) which was pointing to the address, say 292, now gets incremented & starts pointing to the address 293. That is where we put our return address 2. So something like this:
So it looks like now we have achieved our goal of temporarily storing the return address somewhere. We should now just go about making the diversion call. And we could. But there's a small problem. During the execution of the called function, the stack pointer, along with the other register values, could be manipulated multiple times.
Problem 2
So, although the return address of ours, is still stored on the stack, at location 293, after the called function finishes off executing, how would the execution flow know that it should now goto 293 & that's where it would find the return address?
So (I believe for good reason again) one of the ways of solving the above problem could be to store the stack address 293 (where the return address is) in a (designated) register called EBP. But then what about the contents of EBP? Would that not be overwritten? Sure, that's a valid point. So let's store the current contents of EBP on to the stack & then store this stack address into EBP. Something like this:
The stack pointer is incremented. The current value of EBP (denoted as EBP'), which is say xxx, is stored onto the top of the stack, i.e. at the address 294. Now that we have taken a backup of the current contents of EBP, we can safely put any other value onto the EBP. So we put the current address of the top of the stack, that is the address 294, in EBP.
With the above strategy in place, we solve for the Problem 2 discussed above. How? So now when the execution flow wants to know where from should it fetch the return address, it would :
first get the value from EBP out and point the ESP to that value. In our case, this would make TOS (top of stack) point to the address 294 (since that is what is stored in EBP).
Then it would restore the previous value of EBP. To do this it would simply take the value at 294 (the TOS), which is xxx (which was actually the older value of EBP), & put it back to EBP.
Then it would decrement the stack pointer to go to the next lower address in the stack which is 293 in our case. Thus finally reaching 293 (see that's what our problem 2 was). That's where it would find the return address, which is 2.
It will finally pop this 2 out into the EIP, that's the instruction that should have ideally been executed should the diversion have not happened, remember.
And the steps that we just saw being performed, with all the jugglery, to store the return address temporarily & then retrieve it is exactly what gets done with the function prolog (before the function call) & the epilog (before the function ret). The how was already answered, we just answered the why as well.
Just an end note: For the sake of brevity, I have not taken care of the fact that the stack addresses may grow the other way round.

Every function has an identical prologue(The starting of function code) and epilogue ( The ending of a function).
Prologue: The structure of Prologue is look like:
push ebp
mov esp,ebp
Epilogue: The structure of Prologue is look like:
leave
ret
More in detail : what is Prologue and Epilogue

Related

segmentation fault after linking c++ file with asm file [duplicate]

I am currently learning x86 assembly. Something is not clear to me still however when using the stack for function calls. I understand that the call instruction will involve pushing the return address on the stack and then load the program counter with the address of the function to call. The ret instruction will load this address back to the program counter.
My confusion is, does it matter when the ret instruction is called within the procedure/function? Will it always find the correct return address stored on the stack, or must the stack pointer be currently pointing to where the return address was stored? If that's the case, can't we just use push and pop instead of call and ret?
For example, the code below could be the first on entering the function , if we push different registers on the stack, must the ret instruction only be called after the registers are popped in the reverse order so that after the pop %ebp instruction , the stack pointer will point to the correct place on the stack where the return address is, or will it still find it regardless where it is called? Thanks in advance
push %ebp
mov %ebp, %esp
//push other registers
...
//pop other registers
mov %esp, %ebp
(could ret instruction go here for example and still pop the correct return address?)
pop %ebp
ret
You must leave the stack and non-volatile registers as you found them. The calling function has no clue what you might have done with them otherwise - the calling function will simply continue to its next instruction after ret. Only ret after you're done cleaning up.
ret will always look to the top of the stack for its return address and will pop it into EIP. If the ret is a "far" return then it will also pop the code segment into the CS register (which would also have been pushed by call for a "far" call). Since these are the first things pushed by call, they must be the last things popped by ret. Otherwise you'll end up reting somewhere undefined.
The CPU has no idea what is function/etc... The ret instruction will fetch value from memory pointed to by esp a jump there. For example you can do things like (to illustrate the CPU is not interested into how you structurally organize your source code):
; slow alternative to "jmp continue_there_address"
push continue_there_address
ret
continue_there_address:
...
Also you don't need to restore the registers from stack, (not even restore them to the original registers), as long as esp points to the return address when ret is executed, it will be used:
call SomeFunction
...
SomeFunction:
push eax
push ebx
push ecx
add esp,8 ; forget about last 2 push
pop ecx ; ecx = original eax
ret ; returns back after call
If your function should be interoperable from other parts of code, you may still want to store/restore the registers as required by the calling convention of the platform you are programming for, so from the caller point of view you will not modify some register value which should be preserved, etc... but none of that bothers CPU and executing instruction ret, the CPU just loads value from stack ([esp]), and jumps there.
Also when the return address is stored to stack, it does not differ from other values pushed to stack in any way, all of them are just values written in memory, so the ret has no chance to somehow find "return address" in stack and skip "values", for CPU the values in memory look the same, each 32 bit value is that, 32 bit value. Whether it was stored by call, push, mov, or something else, doesn't matter, that information (origin of value) is not stored, only value.
If that's the case, can't we just use push and pop instead of call and ret?
You can certainly push preferred return address into stack (my first example). But you can't do pop eip, there's no such instruction. Actually that's what ret does, so pop eip is effectively the same thing, but no x86 assembly programmer use such mnemonics, and the opcode differs from other pop instructions. You can of course pop the return address into different register, like eax, and then do jmp eax, to have slow ret alternative (modifying also eax).
That said, the complex modern x86 CPUs do keep some track of call/ret pairings (to predict where the next ret will return, so it can prefetch the code ahead quickly), so if you will use one of those alternative non-standard ways, at some point the CPU will realize it's prediction system for return address is off the real state, and it will have to drop all those caches/preloads and re-fetch everything from real eip value, so you may pay performance penalty for confusing it.
In the example code, if the return was done before pop %ebp, it would attempt to return to the "address" that was in ebp at the start of the function, which would be the wrong address to return to.

Understanding the assembly language for if-else in following code [duplicate]

What happens if i say 'call ' instead of jump? Since there is no return statement written, does control just pass over to the next line below, or is it still returned to the line after the call?
start:
mov $0, %eax
jmp two
one:
mov $1, %eax
two:
cmp %eax, $1
call one
mov $10, %eax
The CPU always executes the next instruction in memory, unless a branch instruction sends execution somewhere else.
Labels don't have a width, or any effect on execution. They just allow you to make reference to this address from other places. Execution simply falls through labels, even off the end of your code if you don't avoid that.
If you're familiar with C or other languages that have goto (example), the labels you use to mark places you can goto to work exactly the same as asm labels, and jmp / jcc work exactly like goto or if(EFLAGS_condition) goto. But asm doesn't have special syntax for functions; you have to implement that high-level concept yourself.
If you leave out the ret at the end of a block of code, execution keeps doing and decodes whatever comes next as instructions. (Maybe What would happen if a system executes a part of the file that is zero-padded? if that was the last function in an asm source file, or maybe execution falls into some CRT startup function that eventually returns.)
(In which case you could say that the block you're talking about isn't a function, just part of one, unless it's a bug and a ret or jmp was intended.)
You can (and maybe should) try this yourself in a debugger. Single-step through that code and watch RSP and RIP change. The nice thing about asm is that the total state of the CPU (excluding memory contents) is not very big, so it's possible to watch the entire architectural state in a debugger window. (Well, at least the interesting part that's relevant for user-space integer code, so excluding model-specific registers that the only the OS can tweak, and excluding the FPU and vector registers.)
call and ret aren't "special" (i.e. the CPU doesn't "remember" that it's inside a "function").
They just do exactly what the manual says they do, and it's up to you to use them correctly to implement function calls and returns. (e.g. make sure the stack pointer is pointing at a return address when ret runs.) It's also up to you to get the calling convention correct, and all that stuff. (See the x86 tag wiki.)
There's also nothing special about a label that you jmp to vs. a label that you call. An assembler just assembles bytes into the output file, and remembers where you put label markers. It doesn't truly "know" about functions the way a C compiler does. You can put labels wherever you want, and it doesn't affect the machine code bytes.
Using the .globl one directive would tell the assembler to put an entry in the symbol table so the linker could see it. That would let you define a label that's usable from other files, or even callable from C. But that's just meta-data in the object file and still doesn't put anything between instructions.
Labels are just part of the machinery that you can use in asm to implement the high-level concept of a "function", aka procedure or subroutine: A label for callers to call to, and code that will eventually jump back to a return address the caller passed, one way or another. But not every label is the start of a function. Some are just the tops of loops, or other targets of conditional branches within a function.
Your code would run exactly the same way if you emulated call with an equivalent push of the return address and then a jmp.
one:
mov $1, %eax
# missing ret so we fall through
two:
cmp %eax, $1
# call one # emulate it instead with push+jmp
pushl $.Lreturn_address
jmp one
.Lreturn_address:
mov $10, %eax
# fall off into whatever comes next, if it ever reaches here.
Note that this sequence only works in non-PIC code, because the absolute return address is encoded into the push imm32 instruction. In 64-bit code with a spare register available, you can use a RIP-relative lea to get the return address into a register and push that before jumping.
Also note that while architecturally the CPU doesn't "remember" past CALL instructions, real implementations run faster by assuming that call/ret pairs will be matched, and use a return-address predictor to avoid mispredicts on the ret.
Why is RET hard to predict? Because it's an indirect jump to an address stored in memory! It's equivalent to pop %internal_tmp / jmp *%internal_tmp, so you can emulate it that way if you have a spare register to clobber (e.g. rcx is not call-preserved in most calling conventions, and not used for return values). Or if you have a red-zone so values below the stack-pointer are still safe from being asynchronously clobbered (by signal handlers or whatever), you could add $8, %rsp / jmp *-8(%rsp).
Obviously for real use you should just use ret, because it's the most efficient way to do that. I just wanted to point out what it does using multiple simpler instructions. Nothing more, nothing less.
Note that functions can end with a tail-call instead of a ret:
(see this on Godbolt)
int ext_func(int a); // something that the optimizer can't inline
int foo(int a) {
return ext_func(a+a);
}
# asm output from clang:
foo:
add edi, edi
jmp ext_func # TAILCALL
The ret at the end of ext_func will return to foo's caller. foo can use this optimization because it doesn't need to make any modifications to the return value or do any other cleanup.
In the SystemV x86-64 calling convention, the first integer arg is in edi. So this function replaces that with a+a, then jumps to the start of ext_func. On entry to ext_func, everything is in the correct state just like it would be if something had run call ext_func. The stack pointer is pointing to the return address, and the args are where they're supposed to be.
Tail-call optimizations can be done more often in a register-args calling convention than in a 32-bit calling convention that passes args on the stack. You often run into situations where you have a problem because the function you want to tail-call takes more args than the current function, so there isn't room to rewrite our own args into args for the function. (And compilers don't tend to create code that modifies its own args, even though the ABI is very clear that functions own the stack space holding their args and can clobber it if they want.)
In a calling convention where the callee cleans the stack (with ret 8 or something to pop another 8 bytes after the return address), you can only tail-call a function that takes exactly the same number of arg bytes.
Your intuition is correct: the control just passes to the next line below after the function returns.
In your case, after call one, your function will jump to mov $1, %eax and then continue down to cmp %eax, $1 and end up in an infinite loop as you will call one again.
Beyond just an infinite loop, your function will eventually go beyond its memory constraints since a call command writes the current rip (instruction pointer) to the stack. Eventually, you'll overflow the stack.

Does function stack save the return address?

I'm learning things about function stack based on assembly using the system Linux x86.
I've read some article, which told me that a function stack (callee) would save the return address where it is called by the caller so that the computer could know the point where to continue when the function returns.
This is why there is a kind of attack: stack smashing.
The stack smashing means that if we can overflow a function stack, especially overflow the return address with a designed address, the program will execute the codes of hackers.
However, today I'm trying to use gdb to check a simple c++ program as below but I can't find any saved return address in any function stack.
Here is the code:
void func(int x)
{
int a = x;
int b = 0; // set a breakpoint
}
int main()
{
func(10); // set a breakpoint
return 0;
}
Then I use gdb to get its assembly:
main:
func:
Now we can see that there is no return address being saved in the two function stacks (at least this is my view).
If a hacker wants to hack this program with stack smashing, which address in the function stack will be edited illegally by him?
Yes there is. Examine the stack immediately before the callq and immediately after it. You will find that the address of the instruction following the callq now appears on the top of the stack and RSP has decremented by 8.
The callq instruction causes the address of the following instruction to be pushed onto the stack. Inversely, the retq instruction at the end of the function causes the address on the stack to be popped into the RIP.
Now we can see that there is no return address being saved in the two function stacks (at least this is my view).
What you are actually showing is the disassembled code, not the stack.
The return address is pushed on the stack by the caller by means of the callq instruction. At the moment of entering the callee function it is at the top of the stack, i.e.: at that moment, rsp contains the address where the return address is stored.
Inspecting the stack with GDB
p/x $rsp displays the value of the rsp register, i.e.: the address of the top of the stack, since rsp points to the top of the stack.
x/x $rsp displays the memory contents located at the top of stack (i.e.: the contents located at the address pointed by rsp).
With this information in mind you can run the command x/x $rsp at the moment of entering the callee function (before anything else is pushed onto the stack) to obtain the return address.
You can also use the command info frame to inspect the current stack frame. The displayed field with the name saved rip corresponds to the current function's return address. However, you need to run this command after the stack frame for the current function has been created and before it is destroyed (i.e.: after mov %rsp,%rbp but before pop %rbp inside the callee).

Strange behavior when obtaining addresses of functions

I've yet again been faced with the challenge of a slight hack in my code to accommodate for an unmodifiable library of some poorly written code.
After hours of research (after finding out you cannot pass any sort of state to a function pointer), I need to NOP out the ASM bytes within the function header that reset EAX to 0xCCCCCCCC.
By using the built in VC++ debugger to obtain the 'address' of the function and manually creating an array of bytes entry in cheat engine (which is surprisingly useful for this sort of thing), it successfully pulled up the bytes and I could NOP the 5 byte sequence manually.
However, doing this programmatically is a little different.
When I do any of the following, the address is significantly higher than what the debugger is reporting, this giving me a pointer to the wrong bytes.
unsigned int ptr = reinterpret_cast<unsigned int>(testhack);
unsigned char* tstc = (unsigned char*)&testhack;
FARPROC prc = GetProcAddress(GetModuleHandle("mod.dll"), "testhack"); // Still
// returns the incorrect address (same as the previous two)
What is the debugger doing to find the correct address? How can I find the correct address programmatically?
Presumably the function address is in a DLL. In that case what you're seeing is the loader relocating the addresses of for example &testhack into the correct location for the memory in which your program is loaded. The debugger probably already knows about the relocations and is helpfully automatically taking care of that for you, allowing you to modify the bytes.
Unfortunately I'm not aware of any mechanism to work around this, but if you link statically the addresses will be fixed at link time rather than runtime.
Figured it out.
Totally forgot about (what I believe are called) lookup tables. The address I was receiving was the pointer to the lookup table entry, which was a simple 5-byte unconditional JMP to the real function body. All I had to do was get the offset, add it to the end of the lookup table entry, and then I had my correct address.
Here is the code to reach the correct start address of the function.
// Get the lookup table address
const char* ptr = (const char*)&testhack;
// Get the offset (+ 1 byte retrieved as a long)
unsigned int offset = (unsigned int)(*((unsigned int*)(ptr + 1)));
// Add to the original look up pointer, +5 (the offset starts after the
// end of the instruction)
const char* tst = (const char*)(ptr + offset + 5);
tst then holds the correct address!
As of now, the calling ASM instructions put the pointer to the struct containing the function pointers into EAX. Normally, the callback function would reset EAX. However, I've applied this hack and NOP'd out that part. The first lines of code within the function create a local pointer, and then inline assembly to mov [ptr],eax the address into the pointer. I can then use the pointer normally.

VirtualAlloc C++ , injected dll, asm

I want to reserve space for my codecave in application.
I use VirtualAlloc function to reserve this space.
I have X questions.
What parameters (sllocation type and protection) should I use to allocate memory for code-cave?
As return value I get address of my codecave. In other part of the program I want to JMP to that codecave. How to do it? I know (correct me if I'm wrong) that JMP takes as agument nuber that is offset from current location. But I want to JMP to ma codecave. How to calculate this offset.
Just stumbled across. To clearify this topic for the rest of us: Calculating the relative JMP offset to a codecave patch works by subtracting your patch address with your current programm counter address:
uint32_t patch_address = (uint32_t) VirtualAlloc(...);
uint32_t jmp_offset = patch_address - (current_offset + current_len);
Note: current_len is the number of bytes your JMP instruction takes. This depends on the fact if its a short jmp (EB) or a long jump (E9). In your example 2 bytes, but a regular JMP (E8 0x12345678) takes 5 bytes.
So here we see that your example wont work easily, because you would have to override the next bytes that belong to the following MOV and even the CALL instruction(s). This relies on the fact that your codecave has a greater distance to the current instruction offset because it is allocated in a different region in the address space.
So what you can do is to copy the overwritten 7 Bytes into your cave. That can only work if you dont mess with EDI register in your patch (because of the "MOV ECX, EDI"). And you would have to correct the CALLs address you are overwriting. So this is probably not the best location to place a codecave, but its doable.
i wrote my own hooking library that cares for generic register arguments, stack cleanup and overwritten asm paddings, but i suggest to use the above mentioned frameworks.
regards,
michael
Subtracting the address of your jump target from the address of the instruction after the jump will give you the jump offset.
If you don't get such things, use a library like MS Detours, N-CodeHook, or something else.