How to replace alloca in an implementation of execvp()? - c++

Take a look at the NetBSD implementation of execvp here:
http://cvsweb.netbsd.se/cgi-bin/bsdweb.cgi/src/lib/libc/gen/execvp.c?rev=1.30.16.2;content-type=text%2Fplain
Note the comment at line 130, in the special case for handling ENOEXEC:
/*
* we can't use malloc here because, if we are doing
* vfork+exec, it leaks memory in the parent.
*/
if ((memp = alloca((cnt + 2) * sizeof(*memp))) == NULL)
goto done;
memp[0] = _PATH_BSHELL;
memp[1] = bp;
(void)memcpy(&memp[2], &argv[1], cnt * sizeof(*memp));
(void)execve(_PATH_BSHELL, __UNCONST(memp), environ);
goto done;
I am trying to port this implementation of execvp to standalone C++. alloca is nonstandard so I want to avoid it. (Actually the function I want is execvpe from FreeBSD, but this demonstrates the problem more clearly.)
I think I understand why it would leak memory if plain malloc were used: while the caller of execvp can execute code in the parent, the inner call to execve never returns, so the function cannot free the memp pointer, and there's no way to get the pointer back to the caller. However, I can't think of a way to replace alloca - it seems to be necessary magic to avoid this memory leak. I have heard that C99 provides variable length arrays, which sadly I cannot use as the eventual target is C++.
Is it possible to replace this use of alloca? If it's mandated to stay within C++/POSIX, is there an inevitable memory leak when using this algorithm?

Edit: As Michael has pointed out in the comments, what is written below really won't work in the real world due to stack-relative addressing by an optimizing compiler. Therefore a production-level alloca needs the help of the compiler to actually "work". But hopefully the code below gives some idea of what's happening under the hood, and how a function like alloca might have worked if there were no stack-relative addressing optimizations to worry about.
BTW, just in case you were still curious about how you could make a simple version of alloca for yourself: since that function basically returns a pointer to allocated space on the stack, you can write a function in assembly that manipulates the stack directly and returns a pointer you can use in the current scope of the caller. (Once the caller returns, the pointer from this version of alloca is invalidated, since the return from the caller cleans up the stack.)
Assuming you're using some flavor of Linux on a x86_64 platform using the Unix 64-bit ABI, place the following inside a file called "my_alloca.s":
.section .text
.global my_alloca
my_alloca:
movq (%rsp), %r11 # save the return address in temp register
subq %rdi, %rsp # allocate space on stack from first argument
movq $0x10, %rax
negq %rax
andq %rax, %rsp # align the stack to 16-byte boundary
movq %rsp, %rax # save address in return register
pushq %r11 # push return address on stack
ret # return back to caller
Then inside your C/C++ code module (i.e, your ".cpp" files), you can use it the following way:
extern "C" void* my_alloca(unsigned long size);
void function()
{
void* stack_allocation = my_alloca(BUFFERSIZE);
//...do something with the allocated space
return; //WARNING: stack_allocation will be invalid after return
}
You can compile "my_alloca.s" using gcc -c my_alloca.s. This will give you a file named "my_alloca.o" that you can then use to link with your other object files using gcc -o or using ld.
The main "gotcha" I could think of with this implementation is that it assumes the compiler allocates space on the stack using an activation record and a stack base pointer (i.e., RBP on x86_64). If the compiler instead explicitly allocates and releases stack memory for each function call, it won't be aware of the memory we've allocated on the stack. Then, when it cleans up the stack on the caller's return and tries to jump back through what it believes is the return address pushed at the start of the call, it will jump to an instruction pointer that points to no-wheres-ville, and you'll most likely crash with a bus error or some type of access error, since you'll be trying to execute code in a memory location you're not allowed to.
There are actually other dangerous things that could happen, such as the compiler using stack space to allocate the arguments (it shouldn't for this function, per the Unix 64-bit ABI, since there's only a single argument), as that would again cause a stack clean-up right after the function call, invalidating the pointer. But with a function like execvp(), which won't return unless there's an error, this shouldn't be so much of an issue.
All-in-all, a function like this will be platform-dependent.

You can replace the call to alloca with a call to malloc made before the call to vfork. (This is safe because vfork will not return in the parent until exec has been called and the new program started.) After vfork returns, the caller can free the memory it allocated with malloc.
This doesn't leak memory in the child because the exec call completely replaces the child's image with the image of the new program, implicitly releasing the memory that the forked process was holding.
Another possible solution is to switch to fork instead of vfork. This will require a little extra code in the caller, because fork returns before the exec call is complete, so the caller will need to wait for it. But once forked, the new process can use malloc safely. My understanding is that vfork was basically a poor man's fork, because fork was expensive in the days before kernels had copy-on-write pages. Modern kernels implement fork very efficiently, and there's no need to resort to the somewhat dangerous vfork.

Related

How to mark that C++ function modifies all possible registers?

If I have some non-inline function and the C++ compiler knows that this function modifies some registers, then the compiler will save all necessary registers before making the function CALL.
At least I expect the compiler to do this saving, as far as it knows which registers will be modified inside the called function.
Now imagine that my function modifies ALL possible registers of the CPU (general purpose, SIMD, FPU, etc.). How can I force the compiler to save everything it needs before any CALL to this function? To remind you, my function is non-inline, i.e. it is called through a CALL instruction.
Of course through asm I can push all possible registers on stack at my function start and pop all registers back before function return.
Although I can save ALL possible registers, I would prefer the compiler to save only the necessary registers - those actually used by the function's caller - for performance (speed) and memory usage reasons.
Inside my function I don't know in advance who will call it, hence I would have to save every possible register. But at the call site the compiler knows exactly which registers are used in the caller's function, so it can save far fewer registers, because certainly not all of them will be live.
Hence I want to mark my function as "modifying all registers" so that the C++ compiler pushes to the stack just the registers it needs before calling my function.
Is there any way to do this? Any GCC/CLang/MSVC attribute of function? Or maybe listing all registers in clobber section of asm statement?
The main thing is that I don't want to save registers myself inside this function (for a specific reason); instead I want all callers to save all needed registers before calling my function, while being aware that my function modifies everything it possibly can.
I'm looking for some imaginary modifies-all attribute like:
__attribute__((modifies_all_registers)) void f();
I did following experiment:
Try it online!
__attribute__((noinline)) int modify(int i) {
asm volatile(
""
: "+m" (i) ::
"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
"r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7",
"xmm8", "xmm9", "xmm10", "xmm11", "xmm12", "xmm13", "xmm14", "xmm15",
"ymm0", "ymm1", "ymm2", "ymm3", "ymm4", "ymm5", "ymm6","ymm7",
"ymm8", "ymm9", "ymm10", "ymm11", "ymm12", "ymm13", "ymm14", "ymm15",
"zmm0", "zmm1", "zmm2", "zmm3", "zmm4", "zmm5", "zmm6", "zmm7",
"zmm8", "zmm9", "zmm10", "zmm11", "zmm12", "zmm13", "zmm14", "zmm15"
);
return i + 1;
}
int main(int argc, char ** argv) {
auto volatile x = modify(argc);
}
in other words, I asm-clobbered almost all possible registers, and the compiler generated the following push sequence inside modify() (and the matching pop sequence at the end):
push rbp
mov rbp, rsp
push r15
push r14
push r13
push r12
push rbx
nothing else was pushed, so I can see that somehow the compiler (Clang) didn't care about any registers other than rbx, rbp, r12-r15. Does it mean that there is some C++ calling convention saying I can modify any registers besides these few without restoring them on function return?
Does it mean that there is some C++ calling convention saying I can modify any registers besides these few without restoring them on function return?
Yes. Among other things, ABI specification that is used on a given platform defines function calling conventions. Calling conventions define a set of registers that are allowed to be clobbered by the function and a set of registers that are required to be preserved by the function. If registers of the former set contain useful data for the caller, the caller is expected to save that data before the call. If registers from the latter set have to be used in the called function, the function must save and restore these registers before returning.
There are also conventions regarding which registers, in what order, are used to pass arguments to the function and to receive the returned value. You can consider those registers as clobbered, since the caller must initialize them with parameter values (and thus save any useful data that was in those registers before the call) and the callee is allowed to modify them.
In your case, the asm statement marks all registers as clobbered, and the compiler only saves and restores registers that it is required to preserve across the function call. Note that by default the caller will always save the registers from the clobber set before a function call, whether they are actually modified by the function or not. In some cases, the optimizer may be able to remove saving the registers that are not actually modified - for example, if the function call is inlined or the compiler is able to analyze the function body (e.g. in case of LTO). However, if the function body is not known at compile time, the compiler must assume the worst and adhere to the ABI specification.
So, in general, you do not need to mark the function in any special way - the ABI rules already work in such a way that registers are saved and restored as needed. And, as you witnessed yourself, even with asm statements the compilers are able to tell which registers are used in a function. If you still want to save specific, or all, registers for some reason, your only option is to write in assembler. Or, in case if you're implementing some sort of context switching, use specialized instructions like XSAVE/XRSTOR or APIs like ucontext.

How can one protect a register from being overwritten during a function call?

I am trying to write assembly code for the following function:
#include <iostream>
void f(int x) {
if (x > 0) {
std::cout << x << std::endl;
f(x-1);
std::cout << x << std::endl;
}
}
int main() {
f(1);
}
The output of this program is 1 1. I am trying to write the assembly code for the so-called "low-cost computer" assembler, a machine invented by Anthony Dos Reis for his book "C and C++ under the hood". The assembly code I wrote is:
startup jsr main
halt ; back to operating system
;==============================================================
; #include <stdio.h>
greater dout
nl
sub r1, r0, 1
push lr
push fp
mov fp, sp
push r1
jsr f
add sp, sp, 1
dout
nl
mov sp, fp
pop fp
pop lr
ret
;==============================================================
f push lr ; int f()
push fp ; {
mov fp, sp
ldr r0, fp, 2
cmp r0, 0
brgt greater
mov sp, fp
pop fp
pop lr
ret
;==============================================================
main push lr
push fp
mov fp, sp
mov r0, 1
push r0
jsr f
add sp, sp, 1
mov sp, fp
pop fp
pop lr
ret
The code prints 1 0 to stdout, which is obviously wrong. The reason for the false output lies in the fact that the register r0 contains 1 before the jump to function f during evaluation of the branch greater, and then the function f modifies the register and sets r0 to 0 when doing the comparison cmp. This left me wondering how the assembler can keep registers invariant during function calls. I thought of the following:
By simply pushing the registers to the stack and loading them again afterwards
By somehow gleaning what the function call does and then "protecting" the registers if it needs to.
I thought solution 1 was very defensive and likely costly, whilst solution 2 is probably complicated and assumes a lot of knowledge. Most likely I simply made a mistake in writing the assembly, but I still don't understand how one can keep registers invariant across function calls in general, or how one should address a problem like the one outlined above. Does somebody know what to do in this case? I am grateful for any help or suggestions!
As the others are saying, usually each register is assigned a usage model by an agreement called the calling convention.  There are several usage models:
Call clobbered — sometimes also called "scratch", these registers are understood to be clobbered by a function call, and as such they can be used in between calls and are free to be used by any function. Sometimes these are called "caller saves", because the caller is responsible for preserving any of their values that are needed after the call; they are also known by the term "volatile". In practice, however, once moved to memory, their values don't need to be restored until they are actually needed; those values don't need to be restored to the same register they were in when stored to memory, and, on some architectures, the values can be used directly from memory as well.
Call preserved — these registers are understood to be preserved across a function call, which means that in order to use one of them the original contents of the register must be saved (typically on function entry) and restored later (typically at function exit). Sometimes also called "callee saves", because the caller can rely on their values being preserved, so a callee must save & restore them if used; also known by the term "non-volatile".
Others — on some processors certain registers are dedicated to parameter passing and return values; because of these uses, they don't necessarily behave strictly as call clobbered or call preserved — i.e. it is not necessarily the called function that clobbers them but the caller may be required to clobber them in merely making the call, i.e. in parameter passing before the call.  A function can have formal parameter values in these registers that are needed after a call, and yet need to place actual arguments into them in the instruction sequence of calling another function.  When this occurs, the parameter values needed after a call are typically relocated (to memory or to call preserved registers).
By somehow gleaning what the function call does and then "protecting" the registers if it needs to.
This can work.  The calling convention is a general purpose agreement that is particularly useful when caller or callee does not know the implementation details of the other, or when an indirect function call (call by pointer) could call one of several different actual functions.  However, when both a callee and caller are known in implementation, then as an optimization we can deviate from the standard calling convention.  For example, we can use a scratch register to hold a value live across a particular call if we know the called function does not modify that scratch register.
Commonly, a computing platform includes an Application Binary Interface (ABI) that specifies, among other things, protocols for function calls. The ABI specifies that certain processor registers are used for passing arguments or return results, certain registers may be freely used by the called routine (scratch or volatile registers), certain registers may be used by the called routine but must be restored to their original values (preserved or non-volatile registers), and/or certain registers must not be altered by the called routine. Rules about calling routines may also be called a “calling convention.”
If your routine uses one of the registers that called functions are free to use, and it wants the value of that register to be retained across a function call, it must save it before the function call and restore it afterward.
Generally, the ABI and functions seek a balance in which registers they use. If the ABI said all registers had to be saved and restored, then a called function would have to save and restore each register it used. Conversely, if the ABI said no registers have to be saved and restored, then a calling function would have to save and restore each register it needed. With a mix, a calling routine may often have some of its data in registers that are preserved (so it does not have to save them before the call and restore them afterward), and a called routine will not use all of the registers it must preserve (so it does not have to save and restore them; it uses the scratch registers instead), so the overall number of register saves and restores that are performed is reduced.
Architecture/Platform combinations such as Windows-on-x64, or Linux-on-ARM32, have what's called ABIs, or Application Binary Interfaces.
The ABI specifies precisely how registers are used, how functions are called, how exceptions work, and how function arguments are passed. An important aspect of this is, during a function call, what registers must be saved, and who saves them, and what registers may be destroyed.
Registers that can be destroyed during a function call are called volatile registers. Registers that must be preserved by a function call are called non-volatile registers. Typically, if a caller wants to preserve a volatile register, it pushes the value onto the stack, calls the function, and then pops it off when done. If a called function (callee) wants to use a non-volatile register, it must save it similarly.
Read the Windows x64 calling convention here.

How is the stack unwound and the exception handler found? [duplicate]

When an exception is thrown stack unwinding is initiated until handling code is encountered, but I am a little unclear on the mechanics of the whole process.
1 - Where is the exception stored? I don't mean the actual exception object, which may be quite big, e.g. have a message string or something, but the actual reference or pointer if you will. It must be some uniform storage location so that it can survive as the stack is unwinding and reach a handling location?
2 - How does the program flow determine whether it has to unwind the particular function frame and call the appropriate destructors associated with the location indicated by the program counter, or seek exception handling before it unwinds further?
3 - How does the actual check between what is thrown and what exceptions are being caught happen?
I am aware that the answer might include platform specific stuff, in which case such will be appreciated. No need to go beyond x86/x64 and ARM though.
These are all implementation details, to be decided during the (non-trivial) process of designing an exception handling mechanism. I can only give a sketch of how one might (or might not) choose to implement this.
If you want a detailed description of one implementation, you could read the specification for the Itanium ABI used by GCC and other popular compilers.
1 - The exception object is stored in an unspecified place, which must last until the exception has been handled. Pointers or references are passed around within the exception handling code like any other variable, before being passed to the handler (if it takes a reference) by some mechanism similar to passing a function argument.
2 - There are two common approaches: a static data structure mapping the program location to information about the stack frame; or a dynamic stack-like data structure containing information about active handlers and non-trivial stack objects that need destroying.
In the first case, on throwing it will look at that information to see if there are any local objects to destroy, and any local handlers; if not, it will find the function return address on the local stack frame and apply the same process to the calling function's stack frame until a handler is found. Once the handler is found, the CPU registers are updated to refer to that stack frame, and the program can jump to the handler's code.
In the second case it will pop entries from the stack structure, using them to tell it how to destroy stack objects, until it finds a suitable handler. Once the handler is found, and all unwound stack objects destroyed, it can use longjmp or a similar mechanism to jump to the handler.
Other approaches are possible.
3 - The exception handling code will use some kind of data structure to identify a type, allowing it to compare the type being thrown with the type for a handler. This is somewhat complicated by inheritance; the test can't be a simple comparison. I don't know the details for any particular implementation.
Source: How do exceptions work (behind the scenes) in c++ (I read the assembly and answered the questions by what I understood)
Question 1#:
movl $1, (%esp)
call __cxa_allocate_exception
movl $_ZN11MyExceptionD1Ev, 8(%esp)
movl $_ZTI11MyException, 4(%esp)
_ZTI11MyException is the type info for the exception. It looks as if the exception has its own allocation, not on the stack, and it places the pointer in the register named eax.
Question 2#:
.LFE9:
.size _Z20my_catching_functionv, .-_Z20my_catching_functionv
.section .gcc_except_table,"a",#progbits
.align 4
It looks like a table that is stored in the program's static data. So it can know where it can catch. There was nothing about how objects destroy themselves after unwinding frames, so this is from Visual Studio (the link at the top is from Linux):
MyClass s, s2, s3, s4;
mov dword ptr [ebp-4],3
try {
{
MyClass s, s2, s3, s4;
mov byte ptr [ebp-4],7
}
It looks like it saves the number of objects to destroy. For example when it finishes:
call MyClass::~MyClass (0DC1163h)
mov dword ptr [ebp-4],0FFFFFFFFh
0FFFFFFFFh means there is nothing left to destruct. If I find something about how it actually finds and destroys them, I will add it here.
Question 3#:
As in the previous question, you can see there's a table for it, so it can know whether it's in the right function.

What happens in assembly language when you call a method/function?

If I have a program in C++/C that (language doesn't matter much, just needed to illustrate a concept):
#include <iostream>
void foo() {
printf("in foo");
}
int main() {
foo();
return 0;
}
What happens in the assembly? I'm not actually looking for assembly code as I haven't gotten that far in it yet, but what's the basic principle?
In general, this is what happens:
Arguments to the function are stored on the stack. In platform specific order.
Location for return value is "allocated" on the stack
The return address for the function is also stored in the stack or in a special purpose CPU register.
The function (or actually, the address of the function) is called, either through a CPU specific call instruction or through a normal jmp or br instruction (jump/branch)
The function reads the arguments (if any) from the stack and then runs the function code
Return value from function is stored in the specified location (stack or special purpose CPU register)
Execution jumps back to the caller and the stack is cleared (by restoring the stack pointer to its initial value).
The details of the above vary from platform to platform and even from compiler to compiler (see e.g. the STDCALL vs CDECL calling conventions). For instance, in some cases CPU registers are used instead of storing stuff on the stack. The general idea is the same, though.
You can see it for yourself:
Under Linux 'compile' your program with:
gcc -S myprogram.c
And you'll get a listing of the program in assembler (myprogram.s).
Of course you should know a little bit about assembler to understand it (but it's worth learning because it helps to understand how your computer works). Calling a function (on x86 architecture) is basically:
put variable a on stack
put variable b on stack
put variable n on stack
jump to address of the function
load variables from stack
do stuff in function
clean stack
jump back to main
What happens in the assembly?
A brief explanation: the current stack state is saved, a new stack frame is created, and the code for the function to be executed is loaded and run. This involves inconveniencing a few registers of your microprocessor and some frantic to-and-fro reads/writes to memory; once done, the calling function's stack state is restored.
What happens? In x86, the first line of your main function might look something like:
call foo
The call instruction will push the return address on the stack and then jmp to the location of foo.
Arguments are pushed on the stack and a "call" instruction is executed.
call is a simple "jmp" that also pushes the address of the next instruction onto the stack ("ret" at the end of a method pops it and jumps to it).
I think you want to take a look at call stack to get a better idea what happens during a function call: http://en.wikipedia.org/wiki/Call_stack
A very good illustration:
http://www.cs.uleth.ca/~holzmann/C/system/memorylayout.pdf
What happens?
C mimics what happens in assembly...
It is so close to the machine that you can see what will occur:
void foo() {
printf("in foo");
/*
mystring db 'in foo',0
mov eax, offset mystring   ; address of the string
mov edx, dword ptr _printf
push eax                   ; one 4-byte argument
call edx
add esp, 4                 ; caller pops the one argument (cdecl)
ret
; that's it
*/
}
int main() {
foo();
return 0;
}
1- a calling context is established on the stack
2- parameters are pushed on the stack
3- a "call" is performed to the method
The general idea is that you need to
Save the current local state
Pass the arguments to a function
Call the actual function. This involves putting the return address somewhere so the RET instruction knows where to continue.
The specifics vary from architecture to architecture. And the even more specific specifics might vary between various languages. Although there usually are ways of controlling this to some extent to allow for interoperability between different languages.
A pretty useful starting point is the Wikipedia article on calling conventions. On x86 for example the stack is almost always used for passing arguments to functions. On many RISC architectures, however, registers are mainly used while the stack is only needed in exceptional cases.
The common idea is that the registers used in the calling method are pushed onto the stack (the stack pointer is in the ESP register); this process is called "pushing the registers". Sometimes they're also zeroed, but that depends. Assembly programmers tend to free up more registers than the common 4 (EAX, EBX, ECX and EDX on x86) to have more possibilities within the function.
When the function ends, the same happens in the reverse: the stack is restored to the state from before calling. This is called "popping the registers".
Update: this process does not necessarily have to happen. Compilers can optimize it away and inline your functions.
Update: normally the parameters of the function are pushed onto the stack in reverse order; when they are retrieved from the stack, they appear in normal order. This order is not guaranteed by C. (ref: Inner Loops by Rick Booth)