what's on the stack when a function is called? - c++

I can only imagine
1) parameters;
2) local variables;
what else?
1) function return address?
2) function name?

It really depends on platform and architecture, but typically:
Function return address
Saved values of caller's CPU registers - most importantly, caller's stack frame pointer value
Variables allocated with alloca().
Sometimes - extra stuff for exception handling, this is VERY platform-dependent.
Sometimes - guard values to detect stack clobbering
The function name is never on the stack, to the best of my knowledge, unless your code places it there.


It depends on the calling convention; for Unix, you typically look up this information in the SYSV ABI (Application Binary Interface).
You may find:
Return address (if the machine is a popular Intel architecture). On more modern architectures, the return address is passed in a register.
Callee-saves registers—these are registers that "belong" to the caller which the callee has chosen to borrow and must therefore save and restore.
Any incoming parameters that could not be passed in registers. In IA-32, no parameters are passed in registers; they all go on the stack. In x86-64, up to six integer and eight floating-point parameters can be passed in registers, so it is seldom necessary to use the stack for that purpose.
You may or may not find a saved stack pointer or frame pointer. Most modern calling conventions go without a frame pointer in order to save an extra register. In this design, the size of each frame is known at compile time, so restoring the old stack pointer is just a matter of adding a constant. But it makes it harder to implement alloca().
The older Intel calling conventions use both stack pointer and frame pointer, which burns an extra register, but it simplifies alloca() and also stack unwinding.
Local variables with storage class auto are allocated on the stack.
The stack may contain compiler temporaries that hold values which are "spilled" if the hardware does not provide enough registers to hold all the intermediate results of computations. (This happens if at any point the number of live intermediate results—the ones that will be needed later in the program—exceeds the number of registers available to the compiler for storing intermediate results.)
You may find variables allocated with alloca().
You may find metadata that says which PC ranges are in scope for which exception handlers, or other very platform-dependent exception stuff.
C and C++ do not support garbage collection, but in a language that does, you will often find metadata that identifies where in the stack frame you will find pointers.
Finally, the stack may contain "padding" used to ensure that the stack pointer is aligned on an 8-byte or 16-byte boundary.
Calling conventions are complex beasts, and stack-frame layout is not for the faint of heart!
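To make this concrete, here is a hedged sketch of what a single frame might look like for a small function, assuming an unoptimized x86-64 SysV build that keeps a frame pointer; the exact offsets, and whether a frame pointer exists at all, are entirely up to the compiler and ABI:

long sum(long a, long b)            // a arrives in RDI, b in RSI under the SysV ABI
{
    long locals[2] = {a, b};        // spilled to the frame in an unoptimized build
    return locals[0] + locals[1];
}
// Hypothetical frame for sum(), higher addresses first:
//   [rbp+8]            return address pushed by the caller's CALL instruction
//   [rbp]              saved caller RBP (only if the compiler keeps a frame pointer)
//   [rbp-16], [rbp-8]  locals[0], locals[1] (illustrative offsets chosen by the compiler)
//   below that         padding, if needed, so RSP is 16-byte aligned at the next call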


C++ garbage collection

There are a number of garbage collection libraries for C++.
I am kind of confused about how the pointer tracking works.
In particular, suppose we have a base pointer P and a list of other pointers that are computed as offsets from P using an array, e.g.
P2 = P + offset[0]
How does the garbage collector know P2 is still in scope? It has no direct reference, but it's still accessible.
Probably the most popular C++ GC is
https://en.m.wikipedia.org/wiki/Boehm_garbage_collector
but following their example syntax it seems very easy to break, so I must not be understanding something.
This question cannot be answered in general. There are different systems that may be regarded as garbage collection for C++; for example, Herb Sutter's deferred_ptr is basically a garbage collecting smart pointer. I've personally implemented another version of this idea, similar to Sutter's but less fancy.
I can answer about Boehm, however. How the Boehm garbage collector recognizes pointers when it does its "mark" phase, is basically by scanning memory and assuming that things that look like pointers are pointers.
The garbage collector knows all the areas of memory where user data is, and it knows all of the pointers that it has allocated and how big those allocations were. It just looks for chains of pointers starting from the "root segments" defined below, where by "look" we mean explicitly scanning memory for 64-bit values that equal the address of one of the allocations it has made.
From here:
Since it cannot generally tell where pointer variables are located, it
scans the following root segments for pointers:
The registers. Depending on the architecture, this may be done using assembly code, or by calling a setjmp-like function which
saves register contents on the stack.
The stack(s). In the case of a single-threaded application, on most platforms this is done by scanning the memory between (an
approximation of) the current stack pointer and GC_stackbottom. (For
Itanium, the register stack is scanned separately.) The GC_stackbottom
variable is set in a highly platform-specific way depending on the
appropriate configuration information in gcconfig.h. Note that the
currently active stack needs to be scanned carefully, since
callee-save registers of client code may appear inside collector
stack frames, which may change during the mark process. This is
addressed by scanning some sections of the stack "eagerly",
effectively capturing a snapshot at one point in time.
Static data region(s). In the simplest case, this is the region between DATASTART and DATAEND, as defined in gcconfig.h. However, in
most cases, this will also involve static data regions associated
with dynamic libraries. These are identified by the mostly
platform-specific code in dyn_load.c.
The address space for 64-bit pointers is huge, so false positives will be rare. Even if one occurs, a false positive is just a leak, lasting as long as some value in the memory scanned by the mark phase happens to be exactly the same as the address of an allocation the garbage collector has made.
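To connect this back to the question's P2 = P + offset[0] example, here is a minimal sketch using the collector's documented C API (gc.h, GC_INIT, GC_MALLOC, linked with -lgc); treat it as illustrative rather than a statement about every Boehm configuration:

#include <gc.h>
#include <cstdio>
#include <cstddef>

int main()
{
    GC_INIT();
    char* P = static_cast<char*>(GC_MALLOC(1024));  // the collector records this 1024-byte block
    std::size_t offset[] = {128};
    char* P2 = P + offset[0];                       // derived "interior" pointer

    // During a mark phase the collector scans the stack and registers.
    // It does not need to know how P2 was computed: as long as some scanned
    // word (here P, and on most configurations P2 as well) points into the
    // block, the block is marked reachable and will not be collected.
    std::printf("%p %p\n", static_cast<void*>(P), static_cast<void*>(P2));
    return 0;
}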

Function return pointers [duplicate]

Recently, I came across this question in an interview: How can we determine how much storage on the stack a particular function is consuming?
The "stack" is famously an implementation detail of the platform that is not inspectable or in any way queryable from within the language itself. It is essentially impossible to guarantee within any part of a C or C++ program whether it will be possible to make another function call. The "stack size", or maybe better called "function call and local variable storage depth", is one of the implementation limits whose existence is acknowledged by the language standard but considered out of scope. (E.g. for C++ see [implimits], Annex B.)
Individual platforms may offer APIs to allow programs to introspect the platform limitations, but neither C nor C++ specify that or how this should be possible.
Exceeding the implementation-defined resource limits leads to undefined behaviour, and you cannot know whether you will exceed the limits.
It's completely implementation defined - the standard does not in any way impose requirements on the possible underlying mechanisms used by a program.
On an x86 machine, one stack frame consists of a return address (4/8 bytes), parameters and local variables.
The parameters, if e.g. scalars, may be passed through registers, so we can't say for sure whether they contribute to the storage taken up. The locals may be padded (and often are); we can only deduce a minimum amount of storage for these.
The only way to know for sure is to actually analyze the assembler code a compiler generates, or look at the absolute difference of the stack pointer values at runtime - before and after a particular function was called.
E.g.
#include <iostream>

void f()
{
    register void* foo asm ("esp");
    std::cout << foo << '\n';
}

int main()
{
    register void* foo asm ("esp");
    std::cout << foo << '\n';
    f();
}
Now compare the outputs. GCC on Coliru gives
0x7fffbcefb410
0x7fffbcefb400
A difference of 16 bytes. (The stack grows downwards on x86.)
As stated by other answers, the program stack is a concept which is not specified within the language itself. However, with a knowledge of how a typical implementation works, you can assume that the address of the first argument of a function is the beginning of its stack frame. The address of the first argument of the next called function is the beginning of the next stack frame. So, they probably wanted to see code like:
#include <stdio.h>
#include <stdlib.h>   /* for llabs */

void bar(void *b) {
    /* distance between foo's first argument and bar's first argument */
    printf("Foo stack frame is around %lld bytes\n",
           llabs((long long)b - (long long)&b));
}
void foo(int x) {
    bar(&x);
}
The size increase of the stack, for those implementations that use a stack, is:
size of variables that don't fit in the available registers
size of variables declared up front in the function that live for the life of the function
size of other local variables declared along the way or in statement blocks
the maximum stack size used by functions called by this function
everything above * the number of recursive calls
size of the return address
Return Address
Most implementations push the return address on the stack before any other data. So this address takes up space.
Available Registers
Some processors have many registers; however, only a few may be available for passing parameters. For example, if the convention allows 2 parameters in registers but there are 5 parameters, 3 of them will be placed on the stack.
When large objects are passed by value, they will take up space on the stack.
Function Local Variables
This is tricky to calculate, because variables may be pushed onto the stack and then popped off when not used.
Some variables may not be pushed onto the stack until they are declared. So if a function returns midway through, it may not use the remaining variables, so the stack size won't increase for those variables.
The compiler may elect to use registers to hold values or place constants directly into the executable code. In this case, they don't add any length to the stack.
Calling Other Functions
The function may call other functions. Each called function may increase the amount of data on the stack. Those functions that are called may call other functions, and so on.
This again, depends on the snapshot in time of the execution. However, one can produce an approximate maximum increase of the stack by the other called functions.
Recursion
As with calling other functions, a recursive call may increase the size of the stack. A recursive call at the end of the function may increase the stack more than a recursive call near the beginning.
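As a rough, hedged illustration of the recursion point (not a reliable measurement: inlining or tail-call elimination can shrink the difference to zero, so build without optimization), one can compare the address of a local at two recursion depths:

#include <cstdio>
#include <cstdint>

std::uintptr_t addr_at_depth(int depth)
{
    volatile char marker = 0;                        // force a stack slot in this frame
    if (depth == 0)
        return reinterpret_cast<std::uintptr_t>(&marker);
    return addr_at_depth(depth - 1);
}

int main()
{
    std::uintptr_t a0 = addr_at_depth(0);
    std::uintptr_t a1 = addr_at_depth(1);            // one extra frame deep
    // Stacks usually grow downward, so the deeper call sees a lower address.
    std::printf("approx. frame size: %lld bytes\n",
                static_cast<long long>(a0 > a1 ? a0 - a1 : a1 - a0));
}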
Register Value Saving
Sometimes, the compiler may need more space for data than the allocated registers allow. Thus the compiler may push variables on the stack.
The compiler may push registers on the stack for convenience, such as swapping registers or changing the value's order.
Summary
The exact size of stack space required for a function is very difficult to calculate and may depend on where the execution is. There are many items to consider in stack size calculation, such as parameter quantity and size as well as any other functions called. Due to the variability, most stack size measurements are based on a maximum size, or worst case size. Stack allocation is usually based on the worst case scenario.
For an interview question, I would mention all of the above, which usually makes the interviewer want to move on to the next question quickly.

C++ how are variables accessed in memory?

When I create a new variable in a C++ program, e.g. a char:
char c = 'a';
how does C++ then have access to this variable in memory? I would imagine that it would need to store the memory location of the variable, but then that would require a pointer variable, and this pointer would again need to be accessed.
See the docs:
When a variable is declared, the memory needed to store its value is
assigned a specific location in memory (its memory address).
Generally, C++ programs do not actively decide the exact memory
addresses where its variables are stored. Fortunately, that task is
left to the environment where the program is run - generally, an
operating system that decides the particular memory locations on
runtime. However, it may be useful for a program to be able to obtain
the address of a variable during runtime in order to access data cells
that are at a certain position relative to it.
You can also refer to this article on Variables and Memory
The Stack
The stack is where local variables and function parameters reside. It
is called a stack because it follows the last-in, first-out principle.
As data is added or pushed to the stack, it grows, and when data is
removed or popped it shrinks. In reality, memory addresses are not
physically moved around every time data is pushed or popped from the
stack, instead the stack pointer, which as the name implies points to
the memory address at the top of the stack, moves up and down.
Everything below this address is considered to be on the stack and
usable, whereas everything above it is off the stack, and invalid.
This is all accomplished automatically by the operating system, and as
a result it is sometimes also called automatic memory. On the
extremely rare occasions that one needs to be able to explicitly
invoke this type of memory, the C++ key word auto can be used.
Normally, one declares variables on the stack like this:
void func () {
int i; float x[100];
...
}
Variables that are declared on the stack are only valid within the
scope of their declaration. That means when the function func() listed
above returns, i and x will no longer be accessible or valid.
There is another limitation to variables that are placed on the stack:
the operating system only allocates a certain amount of space to the
stack. As each part of a program that is being executed comes into
scope, the operating system allocates the appropriate amount of memory
that is required to hold all the local variables on the stack. If this
is greater than the amount of memory that the OS has allowed for the
total size of the stack, then the program will crash. While the
maximum size of the stack can sometimes be changed by compile time
parameters, it is usually fairly small, and nowhere near the total
amount of RAM available on a machine.
Assuming this is a local variable, it is allocated on the stack - i.e. in RAM. The compiler keeps track of the variable's offset on the stack. In the basic scenario, if any computation is then performed with the variable, it is moved to one of the processor's registers and the CPU performs the computation. Afterwards the result is written back to RAM. Modern processors can keep parts of the stack frame in registers and have multiple levels of caches, so it can get quite complex.
Please note the name "c" is no longer mentioned in the binary (unless you have debugging symbols). The binary then only works with memory locations. E.g. it would look like this (simple addition):
a = b + c
take the value at memory location 2 and put it in register 1
take the value at memory location 3 and put it in register 2
sum registers 1 and 2 and store the result in register 3
copy register 3 to memory location 1
The binary doesn't know "a", "b" or "c". The compiler just said "a is in memory 1, b is in memory 2, c is in memory 3", and the CPU just blindly executes the commands the compiler has generated.
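As a hedged sketch of the same idea in real code (the offsets, the registers, and even whether the values touch memory at all are entirely up to the compiler; compile with g++ -S to see what yours actually emits):

int add(int b, int c)
{
    int a = b + c;      // "a", "b", "c" exist only in the source and in debug info
    return a;
}
// A plausible unoptimized x86-64 translation, with b at [rbp-8], c at [rbp-12]
// and a at [rbp-4] (purely illustrative offsets):
//   mov eax, DWORD PTR [rbp-8]     ; load b
//   add eax, DWORD PTR [rbp-12]    ; add c
//   mov DWORD PTR [rbp-4], eax     ; store the result into a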
C++ itself (or, the compiler) would have access to this variable in terms of the program structure, represented as a data structure. Perhaps you're asking how other parts in the program would have access to it at run time.
The answer is that it varies. It can be stored either in a register, on the stack, on the heap, or in the data/bss sections (global/static variables), depending on its context and the platform it was compiled for: If you needed to pass it around by reference (or pointer) to other functions, then it would likely be stored on the stack. If you only need it in the context of your function, it would probably be handled in a register. If it's a member variable of an object on the heap, then it's on the heap, and you reference it by an offset into the object. If it's a global/static variable, then its address is determined once the program is fully loaded into memory.
C++ eventually compiles down to machine language, and often runs within the context of an operating system, so you might want to brush up a bit on Assembly basics, or even some OS principles, to better understand what's going on under the hood.
Let's say our program starts with a stack address of 4000000.
When you call a function, depending on how much stack you use, it will "allocate" it like this.
Let's say we have 2 ints (8 bytes):
void function()
{
    int a = 0;
    int b = 0;
}
then what's going to happen in assembly is
MOV EBP,ESP //Here we store the original value of the stack address (4000000) in EBP, and we restore it at the end of the function back to 4000000
SUB ESP, 8 //here we "allocate" 8 bytes in the stack, which basically just decreases the ESP addr by 8
so our ESP address was changed from
4000000
to
3999992
that's how the program knows the stack address for the first int is 3999992 and the second int goes from 3999996 to 4000000.
Even though this pretty much has nothing to do with the compiler, it's really important to know, because once you understand how the stack is "allocated", you realize how cheap it is to do things like
char my_array[20000];
since all it's doing is sub esp, 20000, which is a single assembly instruction.
But if you actually use all those bytes, e.g. memset(my_array, 0, 20000), that's a different story.
how does C++ then have access to this variable in memory?
It doesn't!
Your computer does, and it is instructed on how to do that by loading the location of the variable in memory into a register. This is all handled by assembly language. I shan't go into the details here of how such languages work (you can look it up!) but this is rather the purpose of a C++ compiler: to turn an abstract, high-level set of "instructions" into actual technical instructions that a computer can understand and execute. You could sort of say that assembly programs contain a lot of pointers, though most of them are literals rather than "variables".

Why can't we allocate dynamic memory on the stack?

Allocating stuff on the stack is awesome because then we have RAII and don't have to worry about memory leaks and such. However, sometimes we must allocate on the heap:
If the data is really big (recommended) - because the stack is small.
If the size of the data to be allocated is only known at runtime (dynamic allocation).
Two questions:
Why can't we allocate dynamic memory (i.e. memory of size that is
only known at runtime) on the stack?
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable? I.e. Thing t;.
Edit: I know some compilers support Variable Length Arrays - which is dynamically allocated stack memory. But that's really an exception to the general rule. I'm interested in understanding the fundamental reasons why, generally, we can't allocate dynamic memory on the stack - the technical reasons for it and the rationale behind it.
Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
It's more complicated to achieve this. The size of each stack frame is burned-in to your compiled program as a consequence of the sort of instructions the finished executable needs to contain in order to work. The layout and whatnot of your function-local variables, for example, is literally hard-coded into your program through the register and memory addresses it describes in its low-level assembly code: "variables" don't actually exist in the executable. To let the quantity and size of these "variables" change between compilation runs greatly complicates this process, though it's not completely impossible (as you've discovered, with non-standard variable-length arrays).
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable
This is just a consequence of the syntax. C++'s "normal" variables happen to be those with automatic or static storage duration. The designers of the language could technically have made it so that you can write something like Thing t = new Thing and just use t all day, but they did not; again, this would have been more difficult to implement. How do you distinguish between the different kinds of objects, then? Remember, your compiled executable has to remember to auto-destruct one kind and not the other.
I'd love to go into the details of precisely why and why not these things are difficult, as I believe that's what you're after here. Unfortunately, my knowledge of assembly is too limited.
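A small, hedged illustration of that syntactic difference (Thing here is just a placeholder type): automatic objects get their cleanup generated by the compiler, while heap objects need something - a delete, or a smart pointer - to remember to destroy them.

#include <memory>

struct Thing { int value = 0; };

void demo()
{
    Thing t;                                 // automatic storage: destroyed when demo() returns
    Thing* raw = new Thing;                  // dynamic storage: nothing destroys this for you...
    delete raw;                              // ...unless you (or a smart pointer) do it
    auto owned = std::make_unique<Thing>();  // RAII: unique_ptr's destructor calls delete
}                                            // t and owned are cleaned up here automatically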
Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
Technically, this is possible, but it is not approved by the C++ standard. Variable length arrays (VLAs) allow you to create dynamically sized constructs in stack memory. Most compilers allow this as a compiler extension.
example:
int array[n];
//where n is only known at run-time
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable? I.e. Thing t;.
We can. Whether you do it or not depends on implementation details of a particular task at hand.
example:
int i;
int *ptr = &i;
We can allocate variable-length space dynamically in stack memory by using the function _alloca. This function allocates memory from the program stack. It simply takes the number of bytes to be allocated and returns a void* to the allocated space, just like a malloc call. This allocated memory is freed automatically on function exit,
so it need not be freed explicitly. One has to keep the allocation size in mind here, as a stack overflow exception may occur. Stack overflow exception handling can be used for such calls; in case of a stack overflow exception, one can use _resetstkoflw() to restore it.
So our new code with _alloca would be:
int NewFunctionA()
{
    char* pszLineBuffer = (char*) _alloca(1024 * sizeof(char));
    // ... program logic ...
    // no need to free pszLineBuffer
    return 1;
}
Every variable that has a name becomes, after compilation, a dereferenced pointer whose address is computed by adding (or, depending on the platform, subtracting) an offset to a stack pointer (a register that holds the address the stack has currently reached; the current function's return address is usually stored near there).
int i,j,k;
becomes
(SP-12) ;i
(SP-8) ;j
(SP-4) ;k
For this "sum" to be efficient, the offsets have to be constants, so that they can be encoded directly in the instruction opcode:
k=i+j;
becomes
MOV (SP-12),A; i-->>A
ADD A,(SP-8) ; A+=j
MOV A,(SP-4) ; A-->>k
You see here how 4, 8 and 12 are now "code", not "data".
That implies that a variable that comes after another requires that the "other" retain a fixed, compile-time-defined size.
Dynamically sized arrays can be an exception, but they can only be the last variable of a function. Otherwise, all the variables that follow would have an offset that has to be adjusted at run time after that array's allocation.
This creates the complication that dereferencing the addresses requires arithmetic (not just a plain offset) or the capability to modify the opcode as variables are declared (self-modifying code).
Both solutions are sub-optimal in terms of performance, since they can break the locality of addressing or add more computation for each variable access.
Why can't we allocate dynamic memory (i.e. memory of size that is only known at runtime) on the stack?
You can with Microsoft compilers using _alloca() or _malloca(). For gcc, it's alloca()
I'm not sure it's part of the C / C++ standards, but variations of alloca() are included with many compilers. If you need aligned allocation, such as n bytes of memory starting on an m-byte boundary (where m is a power of 2), you can allocate n+m bytes of memory, add m to the pointer and mask off the lower bits. Example to allocate hex 1000 bytes of memory on a hex 100 boundary. You don't need to preserve the value returned by _alloca() since it's stack memory and is automatically freed when the function exits.
char *p;
p = (char *) _alloca(0x1000 + 0x100);
p = (char *)(((size_t)p + (size_t)0x100) & ~(size_t)0xff);  /* round up to a 0x100 boundary */
The most important reason is that heap memory can be deallocated in any order, but the stack requires deallocation of memory in a fixed order, i.e. LIFO order. Hence, practically, it would be difficult to implement this.
Virtual memory is a virtualization of memory, meaning that it behaves as the resource it is virtualizing (memory). In a system, each process has a different virtual memory space:
32-bit programs: 2^32 bytes (4 gigabytes)
64-bit programs: 2^64 bytes (16 exabytes)
Because the virtual space is so big, only some regions of that virtual space are usable (meaning that only some regions can be read/written just as if they were real memory). Virtual memory regions are initialized and made usable through mapping. Virtual memory does not consume resources and can be considered unlimited (for 64-bit programs), BUT usable (mapped) virtual memory is limited and uses up resources.
For every process, some mapping is done by the kernel and other mapping by the user code. For example, before the code even starts executing, the kernel maps specific regions of the virtual memory space of a process for the code instructions, global variables, shared libraries, the stack space, etc. The user code uses dynamic allocation (allocation wrappers such as malloc and free) or garbage collectors (automatic allocation) to manage the virtual memory mapping at application level (for example, if there is not enough free usable virtual memory available when calling malloc, new virtual memory is automatically mapped).
You should differentiate between mapped virtual memory (the total size of the stack, the total current size of the heap...) and allocated virtual memory (the part of the heap that malloc explicitly told the program that can be used)
Regarding this, I reinterpret your first question as:
Why can't we save dynamic data (i.e. data whose size is only known at runtime) on the stack?
First, as others have said, it is possible: Variable Length Arrays are just that (at least in C; I figure also in C++). However, they have some technical drawbacks, and maybe that's the reason why they are an exception:
The size of the stack used by a function becomes unknown at compile time. This adds complexity to stack management, additional registers (variables) must be used, and it may impede some compiler optimizations.
The stack is mapped at the beginning of the process and it has a fixed size. That size should be increased greatly if variable-size-data is going to be placed there by default. Programs that do not make extensive use of the stack would waste usable virtual memory.
Additionally, data saved on the stack must be saved and deleted in Last-In-First-Out order, which is perfect for local variables within functions but unsuitable if we need a more flexible approach.
Why can we only refer to memory on the heap through pointers, while memory on the stack can be referred to via a normal variable?
As this answer explains, we can.
Read a bit about Turing Machines to understand why things are the way they are. Everything was built around them as the starting point.
https://en.wikipedia.org/wiki/Turing_machine
Anything outside of this is technically an abomination and a hack.

Understanding stack frame of function call in C/C++? [closed]

I am new to C/C++ and assembly language as well.
This could also be a very basic question.
I am trying to understand how stack frames are built and which variables (params) are pushed to the stack, and in what order.
Some search results showed that the C/C++ compiler decides based on the operations performed within a function. For example, if the function were supposed to just increment the passed int parameter by 1 and return it (similar to the ++ operator), it would put the function's parameter and local variables in registers and perform the addition. I'm wondering which register is used for return/pass by value, how references are returned, and what the difference is between eax, ebx, ecx and edx.
Requesting a book/blog/link or any kind of material to understand how registers, the stack and heap references are used/built and destroyed during function calls, and also how the main function is stored.
Thanks In Advance
Your question is borderline here; Programmers could be a better place.
A good book for understanding the concepts of the stack etc. might be Queinnec's Lisp In Small Pieces (it explains quite well what a stack is for Lisp). SICP is also a good book to read.
D.Knuth's books and MMIX is also a good read.
Read carefully Wikipedia Call stack page.
In theory, no call stack is needed, and some languages and implementations (e.g. old SML/NJ) did not use any stack (but allocated the call frame in the garbage collected heap). See A.Appel's old paper Garbage Collection Can be Faster than Stack Allocation (and learn more about garbage collection in general).
Usually C and C++ implementations have a stack (and often use the hardware stack). Some C local variables might not have any stack location (because they have been optimized away, or are kept in a register). Sometimes, the stack location of a C local variable may change (the compiler would use one call stack slot for some occurrences, and another call stack slot for other occurrences of the same local variable). And of course some temporary values may be compiled like your local variables (so they stay in a register, or in one stack slot then another one, etc.). When optimizing, the compiler can do weird tricks with variables.
On some old machines (IBM/360 or IBM z/Series), there is no hardware stack; the stack used by the C compiler is a software convention (e.g. some register is dedicated to that usage, without specific hardware support).
Think about the execution (or interpretation) of a recursively defined function (like the good old factorial naively coded). Read about recursion (in general, in computer science), primitive recursive functions, lambda calculus, denotational semantics, stack automata, register allocation, tail calls, continuations, ABIs, interrupts, POSIX signals, sigaltstack(2), getcontext(2), longjmp(3), etc.
Read also books about Computer Architecture. In practice, the call stack is so important that several hardware resources (including the stack pointer register, often the call frame base pointer register, and perhaps hidden machinery e.g. cache related) are dedicated to it on common processors.
You could also look at the intermediate representations used by the GCC compiler. Then use -fdump-tree-all or the GCC MELT probe. If looking at the generated assembly be sure to pass -S -fverbose-asm to your gcc command.
See also the linux assembly howto.
I gave a lot of links. It is difficult to answer better, because I have no idea of your background.
I am trying to understand how stack frames are built and which
variables(params) are pushed to stack in what order?
This is dependent on the architecture of the processor. However, typically, the stack grows from a high address towards a lower address (if we look at memory addresses as numeric values). One stack frame is "whatever this function puts on the stack".
The "stuff" that gets put on the stack typically is:
Return address back to the calling function.
A frame-pointer, pointing to the stack-frame at the start of the call.
Saved registers that need to be "preserved" for when this function returns.
Local variables.
Parameters to the "next" function in the call-stack.
The C/C++ compiler decides based on the operations performed within a function. For example, if the function were supposed to just increment the passed int parameter by 1 and return it (similar to the ++ operator), it would put the function's parameter and local variables in registers and perform the addition. I'm wondering which register is used for return/pass by value, and how references are returned.
The compiler has rules for how parameters are passed, and for regular function calls [that is, not "inlined" functions], the parameters are always passed in the same order, in the same combination of registers and stack memory. If that weren't the case, the compiler would have to know exactly what the function does before it could decide how to pass the arguments.
Different processor architectures have different rules. x86-32 typically has one or two registers used for input parameters, and typically one register for the return value. x86-64 (under the SysV ABI) uses up to six integer registers for passing the first arguments to the function; any further arguments are passed on the stack.
Returning a reference is no different from returning any other value; the value returned is, in this case, the address of the object being referenced. In x86-32, return values are in EAX. In x86-64, return values are in RAX. In ARM, R0 is used for the return value. In 29K, R96 is used for the return value.
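To tie that last point back to C++ source, here is a hedged example of "returning a reference is returning an address" (on x86-64 the address typically comes back in RAX, but that is an ABI detail the C++ code never observes directly):

int& pick_larger(int& a, int& b)
{
    return (a > b) ? a : b;   // under the hood: return the chosen object's address
}

int main()
{
    int x = 1, y = 2;
    pick_larger(x, y) = 10;   // writes through the returned reference; y becomes 10
    return y;                 // exit status 10, just to use the result
}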