How is an exception transferred to find a handler? - c++

When an exception is thrown, stack unwinding is initiated until handling code is encountered, but I am a little unclear on the mechanics of the whole process.
1 - where is the exception stored? I don't mean the actual exception object, which may be quite big, e.g. have a message string or something, but the actual reference or pointer, if you will. It must be in some uniform storage location so that it can survive the stack unwinding and reach the handling location?
2 - how does the program flow determine whether it has to unwind the particular function frame and call the appropriate destructors associated with the program-counter-indicated location, or seek exception handling before it unwinds further?
3 - how does the actual check between what is thrown and what exceptions are being caught happen?
I am aware that the answer might include platform specific stuff, in which case such will be appreciated. No need to go beyond x86/x64 and ARM though.

These are all implementation details, to be decided during the (non-trivial) process of designing an exception handling mechanism. I can only give a sketch of how one might (or might not) choose to implement this.
If you want a detailed description of one implementation, you could read the specification for the Itanium ABI used by GCC and other popular compilers.
1 - The exception object is stored in an unspecified place, which must last until the exception has been handled. Pointers or references are passed around within the exception handling code like any other variable, before being passed to the handler (if it takes a reference) by some mechanism similar to passing a function argument.
2 - There are two common approaches: a static data structure mapping the program location to information about the stack frame; or a dynamic stack-like data structure containing information about active handlers and non-trivial stack objects that need destroying.
In the first case, on throwing it will look at that information to see if there are any local objects to destroy, and any local handlers; if not, it will find the function return address on the local stack frame and apply the same process to the calling function's stack frame until a handler is found. Once the handler is found, the CPU registers are updated to refer to that stack frame, and the program can jump to the handler's code.
In the second case it will pop entries from the stack structure, using them to tell it how to destroy stack objects, until it finds a suitable handler. Once the handler is found, and all unwound stack objects destroyed, it can use longjmp or a similar mechanism to jump to the handler.
Other approaches are possible.
3 - The exception handling code will use some kind of data structure to identify a type, allowing it to compare the type being thrown with the type for a handler. This is somewhat complicated by inheritance; the test can't be a simple comparison. I don't know the details for any particular implementation.

Source: How do exceptions work (behind the scenes) in c++ (I read the assembly and answered the questions by what I understood)
Question 1#:
movl $1, (%esp)
call __cxa_allocate_exception
movl $_ZN11MyExceptionD1Ev, 8(%esp)
movl $_ZTI11MyException, 4(%esp)
_ZTI11MyException is the typeinfo for the exception, and _ZN11MyExceptionD1Ev is its destructor. The exception object itself gets its own allocation outside the stack via __cxa_allocate_exception, which returns the pointer in the register named eax.
Question 2#:
.LFE9:
.size _Z20my_catching_functionv, .-_Z20my_catching_functionv
.section .gcc_except_table,"a",#progbits
.align 4
It looks like a table stored in the program's static data, so the runtime can know where it can catch. There was nothing about how objects get destroyed after unwinding frames, so this is from Visual Studio (the link at the top is from Linux):
MyClass s, s2, s3, s4;
mov dword ptr [ebp-4],3
try {
{
MyClass s, s2, s3, s4;
mov byte ptr [ebp-4],7
}
It looks like it saves a state number that tells the runtime which objects need destroying. For example when it finishes:
call MyClass::~MyClass (0DC1163h)
mov dword ptr [ebp-4],0FFFFFFFFh
0FFFFFFFFh means there is nothing to destroy. If I find something about how it actually finds and destroys them I will add it here.
Question 3#:
As in the previous question, you can see there's a table for it, so the runtime can know whether it's in the right handler.

Related

How can one protect a register from being overwritten during a function call?

I am trying to write assembly code for the following function:
#include <iostream>
void f(int x) {
    if (x > 0) {
        std::cout << x << std::endl;
        f(x - 1);
        std::cout << x << std::endl;
    }
}
int main() {
    f(1);
}
The output of this program is 1 1. I am trying to write the assembly code for the so-called "low-cost computer" assembler, a machine invented by Anthony Dos Reis for his book "C and C++ under the hood". The assembly code I wrote is:
startup jsr main
halt ; back to operating system
;==============================================================
; #include <stdio.h>
greater dout
nl
sub r1, r0, 1
push lr
push fp
mov fp, sp
push r1
jsr f
add sp, sp, 1
dout
nl
mov sp, fp
pop fp
pop lr
ret
;==============================================================
f push lr ; int f()
push fp ; {
mov fp, sp
ldr r0, fp, 2
cmp r0, 0
brgt greater
mov sp, fp
pop fp
pop lr
ret
;==============================================================
main push lr
push fp
mov fp, sp
mov r0, 1
push r0
jsr f
add sp, sp, 1
mov sp, fp
pop fp
pop lr
ret
The code prints 1 0 to stdout, which is obviously wrong. The reason lies in the fact that the register r0 contains 1 before the jump to the function f during evaluation of the branch greater, and then the function f modifies the register and sets r0 to 0 when doing the comparison cmp. This made me wonder how one can keep the registers invariant during function calls. I thought of the following:
By simply pushing the register onto the stack before the call and loading it again afterwards
By somehow gleaning which registers the called function clobbers and then "protecting" only those that need it
I thought solution 1 is very defensive and likely costly, whilst solution 2 is probably complicated and assumes a lot of knowledge. Most likely I simply made a mistake in writing the assembly, but I still don't understand how one can keep registers invariant across function calls in general, or how one can address such a problem as outlined above. Does somebody know what to do in this case? I am grateful for any help or suggestions!
As the others are saying, usually each register is assigned a usage model by an agreement called the calling convention.  There are several usage models:
Call clobbered — sometimes also called "scratch", these registers are understood to be clobbered by a function call, and as such they can be used in between calls and are free to be used by any function. Sometimes these are called "caller saves" because the caller is responsible for preserving their values if they are needed after the call; they are also known by the term "volatile".  In practice, however, once moved to memory, they don't need to be restored until their values are actually needed; those values don't need to be restored to the same register they were in when stored to memory, and, on some architectures, the values can be used directly from memory as well.
Call preserved — these registers are understood to be preserved across function calls, which means that in order to use one of these, the original contents of the register must be saved (typically on function entry) and restored later (typically at function exit).  Sometimes also called "callee saves" because, as the caller can rely on their values being preserved, a callee must save & restore them if used; also known by the term "non-volatile".
Others — on some processors certain registers are dedicated to parameter passing and return values; because of these uses, they don't necessarily behave strictly as call clobbered or call preserved — i.e. it is not necessarily the called function that clobbers them but the caller may be required to clobber them in merely making the call, i.e. in parameter passing before the call.  A function can have formal parameter values in these registers that are needed after a call, and yet need to place actual arguments into them in the instruction sequence of calling another function.  When this occurs, the parameter values needed after a call are typically relocated (to memory or to call preserved registers).
By somehow gleaning which registers the called function clobbers and then "protecting" only those that need it.
This can work.  The calling convention is a general purpose agreement that is particularly useful when caller or callee does not know the implementation details of the other, or when an indirect function call (call by pointer) could call one of several different actual functions.  However, when both a callee and caller are known in implementation, then as an optimization we can deviate from the standard calling convention.  For example, we can use a scratch register to hold a value live across a particular call if we know the called function does not modify that scratch register.
Commonly, a computing platform includes an Application Binary Interface (ABI) that specifies, among other things, protocols for function calls. The ABI specifies that certain processor registers are used for passing arguments or return results, certain registers may be freely used by the called routine (scratch or volatile registers), certain registers may be used by the called routine but must be restored to their original values (preserved or non-volatile registers), and/or certain registers must not be altered by the called routine. Rules about calling routines may also be called a “calling convention.”
If your routine uses one of the registers that called functions are free to use, and it wants the value of that register to be retained across a function call, it must save it before the function call and restore it afterward.
Generally, the ABI and functions seek a balance in which registers they use. If the ABI said all registers had to be saved and restored, then a called function would have to save and restore each register it used. Conversely, if the ABI said no registers have to be saved and restored, then a calling function would have to save and restore each register it needed. With a mix, a calling routine may often have some of its data in registers that are preserved (so it does not have to save them before the call and restore them afterward), and a called routine will not use all of the registers it must preserve (so it does not have to save and restore them; it uses the scratch registers instead), so the overall number of register saves and restores that are performed is reduced.
Architecture/platform combinations such as Windows-on-x64 or Linux-on-ARM32 have what's called an ABI, or Application Binary Interface.
The ABI specifies precisely how registers are used, how functions are called, how exceptions work, and how function arguments are passed. An important aspect of this is which registers must be preserved across a function call, who saves them, and which registers may be destroyed.
Registers that can be destroyed during a function call are called volatile registers. Registers that must be preserved by a function call are called non-volatile registers. Typically, if a caller wants to preserve a volatile register, it pushes the value onto the stack, calls the function, and then pops it off when done. If a called function (callee) wants to use a non-volatile register, it must save it similarly.
Read the Windows x64 calling convention here.
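Applied to the code in the question: x lives in the scratch register r0, so greater (the caller) must save it around the call. A sketch of a caller-saved fix, assuming the LCC push/pop and jsr/lr semantics from the book:

greater dout              ; print x (currently in r0)
        nl
        sub r1, r0, 1     ; r1 = x - 1
        push lr
        push fp
        mov fp, sp
        push r0           ; caller-save: preserve x across the call
        push r1           ; argument x - 1
        jsr f
        add sp, sp, 1     ; pop the argument
        pop r0            ; restore x
        dout              ; print x again -- now 1 1, not 1 0
        nl
        mov sp, fp
        pop fp
        pop lr
        ret

The saved r0 sits below the argument, so f still finds its parameter at fp + 2; only the caller's bookkeeping changes.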

Base of Global Call Stack in C/C++

I have read that each function invocation leads to pushing a stack frame onto the global call stack, and once the function call is completed the frame is popped off and control passes to the address we get from the popped stack frame. If a called function calls yet another function, it will push another return address onto the top of the same call stack, and so on, with the information stacking up and unstacking as the program dictates.
I was wondering what's at the base of global call stack in a C or C++ program?
I did some searching on the internet but none of the sources explicitly mention it. Is the call stack empty when our program starts, with usage only beginning once a function is called? OR does the address where main() has to return get implicitly pushed as the base of our call stack, making main() a stack frame in it? I expect main() would also have a stack frame, since we always return something at the end of main() and there needs to be some address to return to. OR is this dependent on compiler/OS and differs according to implementation?
It would be helpful if someone has some informative links about this or could provide details on the process that goes into it.
main() is invoked by the libc code that handles setting up the environment for the executable etc. So by the time main() is called, the stack already has at least one frame created by the caller.
I'm not sure if there is a universal answer, as the stack is something that may be implemented differently per architecture. For example, a stack may grow upwards (i.e. the stack pointer value increases when pushing onto the stack) or downwards.
Exiting main() is usually done by calling an operating system function to indicate the program wishes to terminate (with the specified return code), so I don't expect a return address for main() to be present on the stack, but this may differ per operating system and even compiler.
I'm not sure why you need to know this, as this is typically something you leave up to the system.
First of all, there is no such thing as a "global call stack". Each thread has a stack, and the stack of the main thread often looks quite different from the stack of any thread spawned later on. And mostly, each of these "stacks" is just an arbitrary memory segment currently declared to be used as such, sub-allocated from any arbitrary suitable memory pool.
And due to compiler optimizations, many function calls will not even end up on the stack, usually. Meaning there isn't necessarily a distinguishable stack frame. You are only guaranteed that you can reference variables you put on the stack, but not that the compiler must preserve anything you didn't explicitly reference.
There is not even a guarantee that the memory layout of your call stack must be organized in distinguishable frames. Return addresses are never guaranteed to be part of the stack frame; that just happens to be an implementation detail on architectures where data and code pointers may co-exist in the address space. (There are architectures which require return addresses to be stored in a different address space than the data used in the call stack.)
That aside, yes, there is code which is executed outside of the main() function. Specifically initializers for global static variables, code to set up the runtime environment (env, call parameters, stdin/stdout) etc.
E.g. when having linked to libc, there is __libc_start_main which will call your main function after initialization is done. And clean up when your main function returns.
__libc_start_main is about the point where the "stack" starts being used, as far as you can see from within the program. That's not actually true though: some loader code has already been executed in kernel space, reserving memory for your process to operate in initially (including memory for the future stack), initializing registers and memory to well defined values, etc.
Right before actually "starting" your process, after dropping out of kernel mode, arbitrary pointers to a future stack, and the first instruction of your program, are loaded into the corresponding processor registers. Effectively, that's where __libc_start_main (or any other initialization function, depending on your runtime) starts running, and the stack visible to you starts building up.
Getting back into the kernel usually involves an interrupt now, which doesn't follow the stack either, but may just directly swap the contents of the corresponding processor registers. (E.g. if you call a function in the kernel, the memory required by the call stack inside the function call is not allocated from your stack, but from one you don't even have access to.)
Either way, everything that happens before main() is called, and whenever you enter a syscall, is implementation dependent, and you are not guaranteed any specific observable behavior. And messing around with processor registers, and thereby altering the program flow, is also far outside defined behavior as far as a pure C / C++ runtime is concerned.
Every system I have seen sets up a stack by the time main() is called. It has to, or just declaring a variable inside main would fail. A stack is set up once a thread or process is created; thus any thread of execution has a stack. Further, in every assembly language I know, a register or fixed memory location is used to hold the current value of the stack pointer, so the concept of a stack always exists (the stack pointer might be bad, but stack operations always exist since they are built into every mainstream assembly language).


Struct RUNTIME_FUNCTION

I found a large array of RUNTIME_FUNCTION structures in the .pdata segment using IDA.
So, where can I find information about what it's compiled from, how I can create this, and how to use it in C++?
Please give me books, or links with good descriptions and tutorials, for exception handling and unwinding with this structure.
Windows x64 SEH
The compiler puts an exception directory in the .pdata section of an .exe image, but it also can be placed in any section such as .rdata and it is pointed to by the PE header NtHeaders64->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXCEPTION].VirtualAddress. The compiler fills the exception directory with RUNTIME_FUNCTIONs.
typedef struct _RUNTIME_FUNCTION {
    ULONG BeginAddress;
    ULONG EndAddress;
    ULONG UnwindData;
} RUNTIME_FUNCTION, *PRUNTIME_FUNCTION;
Each RUNTIME_FUNCTION describes a function in the image. Every function in the program (apart from leaf functions) has one, regardless of whether there is a SEH exception clause in it, because an exception can occur in a callee function, therefore you need unwind codes to get to a caller function which might have the SEH handler, so most functions will have unwind codes but not a scope table. BeginAddress points to the start of the function and EndAddress points to the end of the function.
UnwindData points to an _UNWIND_INFO table structure.
typedef struct _UNWIND_INFO {
    UBYTE Version : 3;
    UBYTE Flags : 5;
    UBYTE SizeOfProlog;
    UBYTE CountOfCodes; // so the beginning of ExceptionData is known, as they're both FAMs
    UBYTE FrameRegister : 4;
    UBYTE FrameOffset : 4;
    UNWIND_CODE UnwindCode[1];
    union {
        //
        // If (Flags & UNW_FLAG_EHANDLER)
        //
        OPTIONAL ULONG ExceptionHandler;
        //
        // Else if (Flags & UNW_FLAG_CHAININFO)
        //
        OPTIONAL ULONG FunctionEntry;
    };
    //
    // If (Flags & UNW_FLAG_EHANDLER)
    //
    OPTIONAL ULONG ExceptionData[];
} UNWIND_INFO, *PUNWIND_INFO;
Flags can be one of:
#define UNW_FLAG_NHANDLER 0
#define UNW_FLAG_EHANDLER 1
#define UNW_FLAG_UHANDLER 2
#define UNW_FLAG_FHANDLER 3
#define UNW_FLAG_CHAININFO 4
If UNW_FLAG_EHANDLER is set then ExceptionHandler points to a generic handler called __C_specific_handler (which is an import from libcmt.lib) whose purpose is to parse the ExceptionData which is a flexible array member of type SCOPE_TABLE. If UNW_FLAG_UHANDLER is set then it indicates the __C_specific_handler is also to be used to call a finally block, i.e. there is a finally block within the function. If the UNW_FLAG_CHAININFO flag is set, then an unwind info structure is a secondary one, and contains an image-relative pointer in the shared exception handler/chained info address field which points to the RUNTIME_FUNCTION entry pointing to the primary unwind info. This is used for noncontiguous functions. UNW_FLAG_FHANDLER indicates it is a 'frame handler' and I don't know what that is.
typedef struct _SCOPE_TABLE {
    ULONG Count;
    struct
    {
        ULONG BeginAddress;
        ULONG EndAddress;
        ULONG HandlerAddress;
        ULONG JumpTarget;
    } ScopeRecord[1];
} SCOPE_TABLE, *PSCOPE_TABLE;
The SCOPE_TABLE structure is a variable-length structure with a ScopeRecord for each try block in the function, containing the start and end address (probably RVAs) of the try block. HandlerAddress is an offset to code that evaluates the exception filter expression in the parentheses of __except (EXCEPTION_EXECUTE_HANDLER means always run the except, so it's analogous to except Exception), and JumpTarget is the offset to the first instruction in the __except block associated with the __try block. CountOfCodes is needed because UnwindCode is also a flexible array member and there's no other way of knowing where the data after it begins. If it is a try/finally block, then because there is no filter in a finally, HandlerAddress is used instead of JumpTarget to point to a copy of the finally block embellished with a prologue and epilogue (the copy is needed for when it is called in the context of an exception rather than normally after reaching the end of the try block; the exception-path copy is kept separate because it is never run after successful completion).
Once the exception is raised by the processor, the exception handler in the IDT will pass exception information to a main exception handling function in Windows, which will find the RUNTIME_FUNCTION for the offending instruction pointer and call the ExceptionHandler. If the exception falls within the function and not the epilogue or prologue then it will call the __C_specific_handler. __C_specific_handler will then begin walking all of the SCOPE_TABLE entries searching for a match on the faulting instruction, and will hopefully find an __except statement that covers the offending code. (Source)
To add to this, for nested exceptions I'd imagine the __C_specific_handler would always find the smallest range that covers the current faulting instruction and will unwind through the larger ranges if the exception is not handled. The implementation of the __C_specific_handler in the source above shows a simple iteration through the records, which would not happen in practice.
It is also not made clear how the OS exception handler knows which DLL's exception directory to look in. I suppose it could use the RIP and consult the process VAD, then get the first address of the particular allocation and call RtlLookupFunctionEntry on it. The RIP may also be a kernel address in a driver or ntoskrnl.exe; in which case, the Windows exception handler will consult the exception directory of those images, but I'm not sure how it gets the image base from the RIP, as kernel allocations aren't tracked in a VAD.
Exception Filters
An example function that uses SEH:
BOOL SafeDiv(INT32 dividend, INT32 divisor, INT32 *pResult)
{
    __try
    {
        *pResult = dividend / divisor;
    }
    __except(GetExceptionCode() == EXCEPTION_INT_DIVIDE_BY_ZERO ?
             EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
    {
        return FALSE;
    }
    return TRUE;
}
If catch (ArithmeticException a){//do something} in Java were C++ code, it would translate to the following and then compile (only theoretically, because in reality EXCEPTION_INT_DIVIDE_BY_ZERO doesn't seem to be produced by the compiler for any C++ exception object):
__except(GetExceptionCode() == EXCEPTION_INT_DIVIDE_BY_ZERO ?
EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) {//do something}
The filter expression in the parentheses is pointed to by the HandlerAddress value in the scope record for the try block. The filter always evaluates to EXCEPTION_CONTINUE_SEARCH, EXCEPTION_EXECUTE_HANDLER or EXCEPTION_CONTINUE_EXECUTION. GetExceptionCode gets the ExceptionCode (a Windows-specific error constant) from the _EXCEPTION_RECORD, which was probably created by the specific exception handler in the IDT using the error code and exception number (_EXCEPTION_RECORD is stored somewhere such that it is accessible through the call). It is compared against the specific error, EXCEPTION_INT_DIVIDE_BY_ZERO being what would be used by ArithmeticException. If the filter expression evaluates to EXCEPTION_EXECUTE_HANDLER then it will jump to JumpTarget; otherwise, if it evaluates to EXCEPTION_CONTINUE_SEARCH, I'd imagine the __C_specific_handler looks for a ScopeRecord with a wider scope. If it runs out of ScopeRecords that cover the RIP of the faulting instruction, then __C_specific_handler returns EXCEPTION_CONTINUE_SEARCH and the Windows exception handler unwinds the stack prologue and continues with the new RIP in the context record, which it changes while unwinding, checking the _RUNTIME_FUNCTION structs.
There is a SEH block in mainCRTStartup, but not in BaseThreadInitThunk. Eventually, the base of the stack will be reached – RtlUserThreadStart, which has a filter expression containing a call to RtlpUnhandledExceptionFilter(GetExceptionInformation()) by the OS (RtlpUnhandledExceptionFilter is initialised to UnhandledExceptionFilter in kernel32!_BaseDllInitialize, and GetExceptionInformation is passed in rcx to the filter expression HandlerAddress by the __C_specific_handler), which will call the filter specified in SetUnhandledExceptionFilter, which is the variable BasepCurrentTopLevelFilter (which is what SetUnhandledExceptionFilter sets), which gets initialised on the dynamic linking of kernel32.dll. If the application is not currently being debugged then the user-specified unhandled filter will be called and must return EXCEPTION_EXECUTE_HANDLER, which causes the except block to be called by __C_specific_handler, and the except block terminates the whole process using ZwTerminateProcess.
Prologue and Epilogue exceptions
Within a function described by a _RUNTIME_FUNCTION structure, an exception can occur in the prologue or the epilogue of the function, as well as in the body of the function, which may or may not be in a try block. The prologue is the part of the function that saves registers and stores parameters on the stack (if -O0). The epilogue is the reversal of this process, i.e. returning from the function. The compiler stores each action that takes place in the prologue in an UnwindCodes array; each action is represented by a 2-byte UNWIND_CODE structure which contains a member for the offset in the prologue (1 byte), the unwind operation code (4 bits) and the operation info (4 bits).
After finding a RUNTIME_FUNCTION for which the RIP is between the BeginAddress and EndAddress, before invoking __C_specific_handler, the OS exception handling code checks whether the RIP lies between BeginAddress and BeginAddress + SizeOfProlog of the function, defined in the RUNTIME_FUNCTION and _UNWIND_INFO structures respectively. If it is, then it is in the prologue, and the code looks at the UnwindCodes array for the first entry with an offset less than or equal to the offset of the RIP from the function start. It then undoes all of the actions described in the array in order. One of these actions might be UWOP_PUSH_MACHFRAME, which signifies that a trap frame has been pushed, which might be the case in kernel code. The ultimate result is restoring the RIP to what it was before the call instruction was executed, by eventually undoing the call instruction, as well as restoring the values of other registers to what they were before the call. While doing so, it updates the CONTEXT_RECORD. The process is restarted using the RIP before the function call once the actions have been undone; the OS exception handling will now use this RIP to find the RUNTIME_FUNCTION, which will be that of the calling function. This will now be in the body of the calling function, so the __C_specific_handler of the parent _UNWIND_INFO can now be invoked to scan the ScopeRecords, i.e. the try blocks in the function.
If the RIP is not in the range BeginAddress – BeginAddress + SizeOfProlog, the code stream after the RIP is examined; if it matches the trailing portion of a legitimate epilogue, the RIP is in an epilogue, so the remaining portion of the epilogue is simulated, updating the CONTEXT record as each instruction is processed. The RIP will then be the address after the call instruction in the calling function; the RUNTIME_FUNCTION containing this RIP will be the parent's, and the scope records in it are used to handle the exception.
If the RIP is in neither a prologue nor an epilogue, the __C_specific_handler named in the unwind info structure is invoked to examine the try-block scope records (the ExceptionHandler field of the UNWIND_INFO structure is only treated as valid when the UNW_FLAG_EHANDLER bit is set, so a function with no try blocks will have no handler). If there are try blocks but the RIP is not within the range of any of them, the whole prologue is unwound. If the RIP is within a try block, the filter code pointed to by HandlerAddress is evaluated, and based on its return value __C_specific_handler either looks for a parent scope record, if the return value is EXCEPTION_CONTINUE_SEARCH (and if there isn't one, unwinds the prologue and looks for a parent RUNTIME_FUNCTION; by parent I mean an enclosing try scope or a calling function), or jumps to JumpTarget, if the return value is EXCEPTION_EXECUTE_HANDLER. If it is a try/finally block, it simply jumps to HandlerAddress (the finally code) instead of evaluating a filter expression, and then it is done.
Another scenario worth mentioning: if the function is a leaf function it will not have a RUNTIME_FUNCTION record, because a leaf function calls no other functions and allocates no local variables on the stack. Hence RSP directly addresses the return pointer. The return pointer at [RSP] is stored in the updated context, the simulated RSP is incremented by 8, and the search continues for another RUNTIME_FUNCTION.
Unwinding
When __C_specific_handler returns EXCEPTION_CONTINUE_SEARCH rather than EXCEPTION_EXECUTE_HANDLER, the dispatcher needs to return from the function, which is called unwinding: it has to undo the prologue of the function. (The counterpart of unwinding is 'simulating', which is what is done to the epilogue.) It goes through the UnwindCodes array as stated earlier and undoes all of the actions to restore the state of the CPU to what it was before the function call; it doesn't have to worry about locals because they are lost when it moves down a stack frame. The unwind code array is used to unwind (modify) the context record that was initially snapshotted by the Windows exception handler. The dispatcher then looks at the new RIP in the context record, which will fall in the range of the RUNTIME_FUNCTION of the parent function, and calls that function's __C_specific_handler. If the exception gets handled, control passes to the except block at JumpTarget and execution continues as normal. If it is not handled (i.e. no filter expression evaluates to EXCEPTION_EXECUTE_HANDLER), it continues unwinding the stack until the RIP falls within the bounds of RtlUserThreadStart, which means the exception is unhandled.
There is a very good diagrammatic example of this on this page.
IDA Pro seems to show an __unwind{} clause when either an exception or a termination handler is present and the function has unwind codes.
Windows x86 SEH
x86 uses stack-based exception handling rather than the table-based scheme that x64 uses. This made it vulnerable to buffer overflow attacks that overwrite the on-stack handler chain.
You can find more information on RUNTIME_FUNCTION and related structures at Microsoft's MSDN.
These structures are generated by the compiler and used to implement structured exception handling. During the execution of your code an exception may occur, and the runtime system needs to be able to walk up the call stack to find a handler for it. To do so, the runtime system needs to know the layout of the function prologs and which registers they save, in order to correctly unwind the individual stack frames. More details are here.
The RUNTIME_FUNCTION is the structure which describes a single function, and it contains the data required to unwind it.
If you generate code at runtime and need to make that code available to the runtime system (because your code calls out to already compiled code which may raise an exception) then you create RUNTIME_FUNCTION instances for each of your generated functions, fill in the UNWIND_INFO for each, and then tell the runtime system by calling RtlAddFunctionTable.

What happens in assembly language when you call a method/function?

If I have a program in C++/C that (language doesn't matter much, just needed to illustrate a concept):
#include <cstdio>
void foo() {
    printf("in foo");
}
int main() {
    foo();
    return 0;
}
What happens in the assembly? I'm not actually looking for assembly code as I haven't gotten that far in it yet, but what's the basic principle?
In general, this is what happens:
Arguments to the function are stored on the stack, in a platform-specific order.
Location for the return value is "allocated" on the stack
The return address for the function is also stored on the stack or in a special-purpose CPU register.
The function (or actually, the address of the function) is called, either through a CPU-specific call instruction or through a normal jmp or br instruction (jump/branch)
The function reads the arguments (if any) from the stack and then runs the function code
The return value from the function is stored in the specified location (stack or special-purpose CPU register)
Execution jumps back to the caller and the stack is cleared (by restoring the stack pointer to its initial value).
The details of the above vary from platform to platform and even from compiler to compiler (see e.g. STDCALL vs CDECL calling conventions). For instance, in some cases, CPU registers are used instead of storing stuff on the stack. The general idea is the same though
You can see it for yourself:
Under Linux 'compile' your program with:
gcc -S myprogram.c
And you'll get a listing of the program in assembler (myprogram.s).
Of course you should know a little bit about assembler to understand it (but it's worth learning because it helps to understand how your computer works). Calling a function (on x86 architecture) is basically:
put variable a on stack
put variable b on stack
put variable n on stack
jump to address of the function
load variables from stack
do stuff in function
clean stack
jump back to main
What happens in the assembly?
A brief explanation: the current stack state is saved, a new stack frame is created, and the code for the function to be executed is loaded and run. This involves inconveniencing a few registers of your microprocessor and some frantic to-and-fro reads/writes to memory; once done, the calling function's stack state is restored.
What happens? In x86, the first line of your main function might look something like:
call foo
The call instruction will push the return address on the stack and then jmp to the location of foo.
Arguments are pushed onto the stack and a "call" instruction is executed.
Call is simply a "jmp" that also pushes the address of the next instruction onto the stack ("ret" at the end of a method pops it and jumps to it).
I think you want to take a look at call stack to get a better idea what happens during a function call: http://en.wikipedia.org/wiki/Call_stack
A very good illustration:
http://www.cs.uleth.ca/~holzmann/C/system/memorylayout.pdf
What happens?
C mimics what will occur in assembly...
It is so close to the machine that you can see what will happen
void foo() {
    printf("in foo");
    /* roughly, in x86 assembly:
    mystring db 'in foo', 0

    mov eax, offset mystring  ; address of the string
    push eax                  ; push the argument
    call _printf
    add esp, 4                ; caller pops the argument (cdecl)
    ret
    //thats it
    */
}
int main() {
    foo();
    return 0;
}
1- a calling context is established on the stack
2- parameters are pushed on the stack
3- a "call" is performed to the method
The general idea is that you need to
Save the current local state
Pass the arguments to a function
Call the actual function. This involves putting the return address somewhere so the RET instruction knows where to continue.
The specifics vary from architecture to architecture, and the finer details can vary between languages, although there are usually ways of controlling this to some extent to allow for interoperability between different languages.
A pretty useful starting point is the Wikipedia article on calling conventions. On x86 for example the stack is almost always used for passing arguments to functions. On many RISC architectures, however, registers are mainly used while the stack is only needed in exceptional cases.
The common idea is that the registers used in the calling method are pushed onto the stack (the stack pointer is kept in the ESP register); this process is called "pushing the registers". Sometimes they are also zeroed, but that depends. Assembly programmers tend to free up more registers than the common four (EAX, EBX, ECX and EDX on x86) to have more possibilities within the function.
When the function ends, the same happens in reverse: the stack is restored to its state from before the call. This is called "popping the registers".
Update: this process does not necessarily have to happen. Compilers can optimize it away and inline your functions.
Update: normally the parameters of the function are pushed onto the stack in reverse order; when they are retrieved from the stack, they appear in normal order. This order is not guaranteed by the C standard. (ref: Inner Loops by Rick Booth)