Why is the C++ function parameter stored 20 bytes below rbp in x86-64 when the function body only has one 4-byte variable?

Consider the following program, compiled using x86-64 GCC 12.2 with flags --std=c++17 -O0:
int square(int num, int num2) {
    int foo = 37;
    return num * num;
}

int main() {
    return square(10, 5);
}
The resulting assembly using godbolt is:
square(int, int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-24], esi
mov DWORD PTR [rbp-4], 37
mov eax, DWORD PTR [rbp-20]
imul eax, eax
pop rbp
ret
main:
push rbp
mov rbp, rsp
mov esi, 5
mov edi, 10
call square(int, int)
pop rbp
ret
I read about shadow spaces and it appears that in x64 there must be at minimum 32 bytes allocated: "32 bytes above the return address which the called function owns" ...
With that said, how is the offset -20 determined for the parameter num? If there's 32 bytes from rbp, wouldn't that be -24?
I noticed even if you add more local variables, it'll remain -20 until it gets pushed over to -36, but I cannot understand why. Thanks!


Why is the object prefix converted to function argument?

In the learncpp article about the hidden this pointer, the author mentioned that the compiler converts the object prefix to an argument passed by address to the function.
In the example:
simple.setID(2);
Will be converted to:
setID(&simple, 2); // note that simple has been changed from an object prefix to a function argument!
Why does the compiler do this? I've tried searching other documentation about it but couldn't find any. I've asked other people but they say it is a mistake or the compiler doesn't do that.
I have a second question on this topic. Let's go back to the example:
simple.setID(2); //Will be converted to setID(&simple, 2);
If the compiler converts it, won't it just look exactly like a function that has a name of setID and has two parameters?
void setID(MyClass* obj, int id) {
}

int main() {
    MyClass simple;
    simple.setID(2);   // Will be converted to setID(&simple, 2);
    setID(&simple, 2);
}
The last two statements in main (the member call and the free-function call) would look exactly the same.
"object prefix to an argument passed by address to the function"
This describes how implementations usually translate a member function call to machine code (although they are free to do it some other way).
"Why does the compiler do this?"
The called member function needs some way to refer to the object, and one way is to simply pass it like an argument.
"If the compiler converts it, won't it just look exactly like a function that has a name of setID and has two parameters?"
If you have this code:
struct Test {
    int v = 0;
    Test(int v) : v(v) {}
    void test(int a) {
        int v = this->v;
        int r = a;
    }
};

void test(Test* t, int a) {
    int v = t->v;
    int r = a + v;
}

int main() {
    Test a(2);
    a.test(1);
    test(&a, 1);
    return 0;
}
gcc-12 will create this assembly code (for x86-64, with optimizations turned off):
Test::Test(int) [base object constructor]:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov DWORD PTR [rbp-12], esi
mov rax, QWORD PTR [rbp-8]
mov edx, DWORD PTR [rbp-12]
mov DWORD PTR [rax], edx
pop rbp
ret
Test::test(int a):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-28], esi
// int v = this->v;
mov rax, QWORD PTR [rbp-24]
mov eax, DWORD PTR [rax]
mov DWORD PTR [rbp-4], eax
// int r = a;
mov eax, DWORD PTR [rbp-28]
mov DWORD PTR [rbp-8], eax
// end of function
pop rbp
ret
test(Test* t, int a):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-24], rdi
mov DWORD PTR [rbp-28], esi
// int v = t->v;
mov rax, QWORD PTR [rbp-24]
mov eax, DWORD PTR [rax]
mov DWORD PTR [rbp-4], eax
// int r = a + v;
mov edx, DWORD PTR [rbp-28]
mov eax, DWORD PTR [rbp-4]
add eax, edx
mov DWORD PTR [rbp-8], eax
// end of function
pop rbp
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-4]
mov esi, 2
mov rdi, rax
call Test::Test(int) [complete object constructor]
// a.test(1);
lea rax, [rbp-4]
mov esi, 1
mov rdi, rax
call Test::test(int)
// test(&a, 1);
lea rax, [rbp-4]
mov esi, 1
mov rdi, rax
call test(Test*, int)
// end of main
mov eax, 0
leave
ret
So the machine code generated without optimizations looks identical for test(&a, 1) and a.test(1), and that is what the statement refers to.
But again, that is an implementation detail of how the compiler translates C++ to machine code; it is not part of C++ itself.

Stack allocation is bigger than expected in the disassembled code

I have the following code. I expected the stack frame of main to take 8 bytes on a 64-bit system, but the disassembly shows something strange: it reserves 16. I am using https://godbolt.org/ with x86-64 GCC 9.3. Why is that?
#include <cstdio>   // printf
#include <cstdlib>  // malloc
#include <memory>

struct my_struct {
    char a[10];
    int b;
    char c;
    short d;
};

int main() {
    struct my_struct* s = (struct my_struct*)malloc(sizeof(struct my_struct));
    printf("%lu\n", sizeof(s));
    return 0;
}
.LC0:
.string "%lu\n"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov edi, 20
call malloc
mov QWORD PTR [rbp-8], rax
mov esi, 8
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret

Why is there no `leave` instruction in the function epilogue on x64?

I'm trying to understand how the stack works on x86 and x64 machines. However, I have noticed that when I write code myself and disassemble it, the result differs from the code other people show (e.g. in their questions and tutorials). Here is a small example:
int add(int a, int b) {
    int c = 16;
    return a + b + c;
}

int main() {
    add(3, 4);
    return 0;
}
add(int, int):
push ebp
mov ebp, esp
sub esp, 16
mov DWORD PTR [ebp-4], 16
mov edx, DWORD PTR [ebp+8]
mov eax, DWORD PTR [ebp+12]
add edx, eax
mov eax, DWORD PTR [ebp-4]
add eax, edx
leave (!)
ret
main:
push ebp
mov ebp, esp
push 4
push 3
call add(int, int)
add esp, 8
mov eax, 0
leave (!)
ret
Now for x64:
add(int, int):
push rbp
mov rbp, rsp
(?) where is `sub rsp, X`?
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-24], esi
mov DWORD PTR [rbp-4], 16
mov edx, DWORD PTR [rbp-20]
mov eax, DWORD PTR [rbp-24]
add edx, eax
mov eax, DWORD PTR [rbp-4]
add eax, edx
(?) where is `mov rsp, rbp` before popping rbp?
pop rbp
ret
main:
push rbp
mov rbp, rsp
mov esi, 4
mov edi, 3
call add(int, int)
mov eax, 0
(?) where is `mov rsp, rbp` before popping rbp?
pop rbp
ret
As you can see, my main confusion is that when I compile for x86, I see what I expect. When I compile for x64, the leave instruction, or the equivalent sequence mov rsp, rbp followed by pop rbp, is missing. What's wrong?
It seems that leave is missing simply because rsp was never changed after the prologue. But that raises another question: why is no space allocated for local variables in the frame?
@melpomene gives a pretty straightforward answer to this question: the "red zone". This basically means that a function that calls no other functions (a leaf function) can use the 128 bytes below the stack pointer without allocating space. So if I insert a call to any other dummy function inside add(), sub rsp, X and add rsp, X will be added to the prologue and epilogue respectively.

A temporary array is assigned but not a temporary primary value

I am amazed that this C++ code is compiled:
int main()
return 0;
The equivalent assembly is
main:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-48], 0
mov QWORD PTR [rbp-40], 0
mov QWORD PTR [rbp-32], 0
mov QWORD PTR [rbp-24], 0
mov QWORD PTR [rbp-16], 0
mov DWORD PTR [rbp-48], 15
mov eax, 0
pop rbp
ret
According to this code, an array is defined without having any name and then assigned.
Interestingly, when there is no array, the code does not compile:
int main()
{
    (int){}=15; /* <Compilation failed> */
    return 0;
}
1- Why is the first expression (you might call it assigning to an xvalue) legal in C++ for a temporary array, but not the second one for a fundamental type? Why is the language designed this way?
2- What is the application of such a temporary array?

GCC generated assembly

Why does calling printf change the function prologue?
C code_1:
#include <cstdio>
int main(){
    int a = 11;
    printf("%d", a);
}
GCC -m32 generated one:
.LC0:
.string "%d"
main:
lea ecx, [esp+4] // What is the purpose of these three
and esp, -16 // lines?
push DWORD PTR [ecx-4] //
push ebp
mov ebp, esp
push ecx
sub esp, 20 // why sub 20?
mov DWORD PTR [ebp-12], 11
sub esp, 8
push DWORD PTR [ebp-12]
push OFFSET FLAT:.LC0
call printf
add esp, 16
mov eax, 0
mov ecx, DWORD PTR [ebp-4]
leave
lea esp, [ecx-4]
ret
C code_2:
#include <cstdio>
int main(){
    int a = 11;
}
GCC -m32:
main:
push ebp
mov ebp, esp
sub esp, 16
mov DWORD PTR [ebp-4], 11
mov eax, 0
leave
ret
What is the purpose of the first three lines added in the first listing?
Please explain the first assembly listing, if you can.
64-bit mode:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 11
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
The key insight is that the compiler keeps the stack aligned at function calls.
The alignment is 16 bytes.
lea ecx, [esp+4] ;Save original ESP to ECX (ESP+4 actually)
and esp, -16 ;Align stack on 16 bytes (Lower esp)
push DWORD PTR [ecx-4] ;Push main return address (Stack at 16B + 4)
;My guess is to aid debugging tools that expect the RA
;to be at [ebp+04h]
push ebp
mov ebp, esp ;Prolog (Stack at 16B+8)
push ecx ;Save ECX (Original stack pointer) (Stack at 16B+12)
sub esp, 20 ;Reserve 20 bytes (Stack at 16B+0, ALIGNED AGAIN)
;4 for alignment + 1x16 for a variable (variable space is
;allocated in multiple of 16)
mov DWORD PTR [ebp-12], 11 ;a = 11
sub esp, 8 ;Stack at 16B+8 for later alignment
push DWORD PTR [ebp-12] ;a
push OFFSET FLAT:.LC0 ;"%d" (Stack at 16B)
call printf
add esp, 16 ;Remove args+pad from the stack (Stack at 16B)
mov eax, 0 ;Return 0
mov ecx, DWORD PTR [ebp-4] ;Restore ECX without the need to add to esp
leave ;Restore EBP
lea esp, [ecx-4] ;Restore original ESP
ret
I don't know why the compiler saves esp+4 in ecx instead of esp (esp+4 is the address of the first parameter of main).