I'm experimenting with code on my own, across different compilers.
I've been trying to look up the advantages of disabling exceptions on certain functions (judging by the binary footprint) and to compare those functions with ones that don't disable exceptions, and I've stumbled onto a weird case where it's better to have exceptions than not.
I've been using Matt Godbolt's Compiler Explorer for these checks, with x86-64 clang 12.0.1 and no flags (on GCC this weird behavior doesn't occur).
Looking at this simple code:
auto* allocated_int()
{
return new int{};
}
int main()
{
delete allocated_int();
return 0;
}
Very straightforward: it deletes the pointer returned by allocated_int().
As expected, the binary footprint is minimal as well:
allocated_int(): # #allocated_int()
push rbp
mov rbp, rsp
mov edi, 4
call operator new(unsigned long)
mov rcx, rax
mov rax, rcx
mov dword ptr [rcx], 0
pop rbp
ret
Also very straightforward.
But the moment I add the noexcept keyword to allocated_int(), the binary bloats. Here is the resulting assembly:
allocated_int(): # #allocated_int()
push rbp
mov rbp, rsp
sub rsp, 16
mov edi, 4
call operator new(unsigned long)
mov rcx, rax
mov qword ptr [rbp - 8], rcx # 8-byte Spill
jmp .LBB0_1
.LBB0_1:
mov rcx, qword ptr [rbp - 8] # 8-byte Reload
mov rax, rcx
mov dword ptr [rcx], 0
add rsp, 16
pop rbp
ret
mov rdi, rax
call __clang_call_terminate
__clang_call_terminate: # #__clang_call_terminate
push rax
call __cxa_begin_catch
call std::terminate()
Why is clang doing this extra code for us? I didn't request any other action but calling new(), and I was expecting the binary to reflect that.
Thanks to anyone who can explain!
Why is clang doing this extra code for us?
Because the behaviour of the function is different.
I didn't request any other action but calling new()
By declaring the function noexcept, you've requested std::terminate to be called in case an exception propagates out of the function.
allocated_int in the first program never calls std::terminate, while
allocated_int in the second program may call std::terminate. Note that the amount of added code is much less if you remember to enable the optimiser. Comparing non-optimised assembly is mostly futile.
You can use non-throwing allocation to prevent that:
return new(std::nothrow) int{};
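A minimal sketch of the nothrow variant, assuming the caller is willing to check for nullptr instead of catching std::bad_alloc. Because the nothrow form of new reports failure by returning nullptr, no exception can reach the noexcept boundary. use_allocated_int is a hypothetical caller added only to show the null check.

```cpp
#include <cassert>
#include <new>

// Non-throwing allocation: on failure this returns nullptr instead of
// throwing std::bad_alloc, so nothing can escape the noexcept function.
int* allocated_int() noexcept {
    return new (std::nothrow) int{};
}

// Hypothetical caller: failure is now signalled by nullptr, not an exception.
bool use_allocated_int() noexcept {
    int* p = allocated_int();
    if (p == nullptr) {
        return false;   // allocation failed
    }
    *p = 42;
    delete p;
    return true;
}
```

The trade-off is that every caller must now handle the null case explicitly.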
It's indeed an astute observation that doing potentially throwing things inside a non-throwing function can introduce extra work that wouldn't be needed if the same things were done in a potentially throwing function.
I've been trying to look up the advantages of disabling exceptions on certain functions
The advantage of declaring a function non-throwing is potentially realised where the function is called, not within the function itself.
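A classic place where noexcept pays off at the call site is std::vector growth. When the vector relocates its elements, it moves them only if the move constructor is noexcept; otherwise it copies them to keep its strong exception guarantee. The two types below are hypothetical and differ only in that one move constructor is declared noexcept. The exact number of copies depends on the library's growth policy, so only "some" versus "none" is meaningful.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Counts copy constructions so we can see which constructor vector picks.
static int copy_count = 0;

struct MayThrowMove {
    MayThrowMove() = default;
    MayThrowMove(const MayThrowMove&) { ++copy_count; }
    MayThrowMove(MayThrowMove&&) {}            // not declared noexcept
};

struct NoThrowMove {
    NoThrowMove() = default;
    NoThrowMove(const NoThrowMove&) { ++copy_count; }
    NoThrowMove(NoThrowMove&&) noexcept {}     // declared noexcept
};

// Appends n elements and returns how many copies reallocation performed.
template <typename T>
int copies_during_growth(int n) {
    copy_count = 0;
    std::vector<T> v;
    for (int k = 0; k < n; ++k) {
        v.push_back(T{});   // new element is moved in; growth relocates old ones
    }
    return copy_count;
}
```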
Without noexcept, your function just acts as a front end to the allocation function it calls. It doesn't have any real behavior of its own; in fact, in a real executable built with link-time optimization, there's a pretty good chance it will disappear completely.
When you add noexcept, your code is silently transformed into something roughly like this:
auto* allocated_int()
{
try {
return new int{};
}
catch(...) {
std::terminate();
}
}
The extra code you see generated is what's needed to catch the exception and call terminate when/if needed.
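To watch that implicit handler fire, the sketch below replaces `new int{}` with a hypothetical allocate_or_throw that throws std::bad_alloc on demand, wraps it in a noexcept function of the same shape as allocated_int(), and installs a terminate handler that reports and exits cleanly. The handler and the fail parameter are illustrative assumptions, not part of the original code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <new>

// Hypothetical stand-in for `new int{}` that can be told to fail.
int* allocate_or_throw(bool fail) {
    if (fail) {
        throw std::bad_alloc{};
    }
    return new int{};
}

// Same shape as the noexcept allocated_int(): if an exception reaches this
// boundary, the runtime calls std::terminate instead of propagating it.
int* allocated_int_noexcept(bool fail) noexcept {
    return allocate_or_throw(fail);
}

// Terminate handler that reports and ends the process with status 0,
// so the demonstration can be observed without an abort.
[[noreturn]] void report_terminate_and_exit() {
    std::puts("std::terminate was called");
    std::fflush(stdout);
    std::_Exit(0);
}
```

Install the handler with std::set_terminate(report_terminate_and_exit) and call allocated_int_noexcept(true): the exception never reaches a surrounding catch block.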
I think I get the functionality -- passing a reference into a function passes the address, so modifications to a_val and b_val in get_point below change the values of variables in calling_func.
What I don't understand is how this is actually achieved -- are the values moved to heap space and their addresses passed into get_point? Or can addresses from the calling_func stack frame be passed into get_point and modified there?
void get_point(float& a_val, float& b_val); // declared before its first use
void calling_func() {
float a, b;
get_point(a,b);
}
void get_point(float& a_val, float& b_val) {
a_val = 5.5;
b_val = 6.6;
}
Or can addresses from the calling_func stack frame be passed into get_point and modified there?
Exactly. The stack grows downwards as each function is called, and the caller's stack space above the callee's frame remains valid while the callee runs. The compiler usually passes the variable's address wherever the argument would otherwise have been passed, computing it with an lea instruction:
lea rcx, [rsp + offset to a]
lea rdx, [rsp + offset to b]
call get_point
Inside get_point, rcx and rdx (assuming the Win64 calling convention) hold the addresses of a and b, and every access to a_val or b_val goes through them. Reading a referenced float into an xmm register, for example, dereferences the pointer with movss:
movss xmm0, [rcx] ; this is where the references are actually dereferenced
movss xmm1, [rdx]
The assignments in get_point use the store direction instead, e.g. movss [rcx], xmm0 after loading the constant into xmm0.
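The same mechanism can be observed from C++ itself. In this sketch, get_point_ptr is a hypothetical hand-written pointer equivalent of get_point, and address_seen_by_callee is a hypothetical helper that returns the address a reference parameter is bound to; it shows that the callee sees the caller's own stack variable, not a copy on the heap.

```cpp
#include <cassert>

// Reference version, as in the question.
void get_point(float& a_val, float& b_val) {
    a_val = 5.5f;
    b_val = 6.6f;
}

// Hand-written pointer equivalent: roughly what the compiler passes in rcx/rdx.
void get_point_ptr(float* a_val, float* b_val) {
    *a_val = 5.5f;
    *b_val = 6.6f;
}

// Returns the address the reference parameter is bound to.
const float* address_seen_by_callee(float& ref) {
    return &ref;
}
```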
I also suggest trying Compiler Explorer (https://godbolt.org/) if you want to see the actual assembly your compiler generates.
I created a minimal C++ program:
int main() {
return 1234;
}
and compiled it with clang++ 5.0 with optimization disabled (the default, -O0). The resulting assembly code is:
pushq %rbp
movq %rsp, %rbp
movl $1234, %eax # imm = 0x4D2
movl $0, -4(%rbp)
popq %rbp
retq
I understand most of the lines, but I do not understand the "movl $0, -4(%rbp)". It seems the program initializes some local variable to 0. Why?
What compiler-internal detail leads to this store that doesn't correspond to anything in the source?
TL;DR: In unoptimized code, your clang++ sets aside 4 bytes for the return value of main and sets it to zero, as the C++ standards (including C++11) require. It generates that code even for a main function that doesn't need it. This is a side effect of not optimizing: an unoptimized compiler often generates code it may need, ends up not needing it, and does nothing to clean it up.
Because you are compiling with -O0, only a bare minimum of optimization is done (even -O0 may remove some dead code). Trying to understand artifacts in unoptimized code is usually a wasted exercise: unoptimized code is full of extra loads and stores and other artifacts of raw code generation.
In this case main is special, because in C99/C11 and C++ the standards effectively say that when control reaches the closing } of main, the default return value is 0. The C11 standard says:
5.1.2.2.3 Program termination
1 If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0. If the return type is not compatible with int, the termination status returned to the host environment is unspecified.
The C++11 standard says:
3.6.1 Main function
5) A return statement in main has the effect of leaving the main function (destroying any objects with automatic storage duration) and calling std::exit with the return value as the argument. If control reaches the end of main without encountering a return statement, the effect is that of executing
return 0;
In the version of clang++ you are using, unoptimized 64-bit code places the default return value of 0 at dword ptr [rbp-4].
The problem is that your test code is a bit too trivial to show how this default return value comes into play. Here is an example that should be a better demonstration:
int main() {
int a = 3;
if (a > 3) return 5678;
else if (a == 3) return 42;
}
This code has two explicit exit points, via return 5678 and return 42, but there isn't a final return at the end of the function. If the closing } is reached, the default is to return 0. If we examine the godbolt output we see this:
main: # #main
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], 0 # Default return value of 0
mov dword ptr [rbp - 8], 3
cmp dword ptr [rbp - 8], 3 # Is a > 3
jle .LBB0_2
mov dword ptr [rbp - 4], 5678 # Set return value to 5678
jmp .LBB0_5 # Go to common exit point .LBB0_5
.LBB0_2:
cmp dword ptr [rbp - 8], 3 # Is a == 3?
jne .LBB0_4
mov dword ptr [rbp - 4], 42 # Set return value to 42
jmp .LBB0_5 # Go to common exit point .LBB0_5
.LBB0_4:
jmp .LBB0_5 # Extraneous unoptimized jump artifact
# This is common exit point of all the returns from `main`
.LBB0_5:
mov eax, dword ptr [rbp - 4] # Use return value from memory
pop rbp
ret
As one can see, the compiler has generated a common exit point that sets the return value (EAX) from the stack address dword ptr [rbp-4]. At the beginning of the code, dword ptr [rbp-4] is explicitly set to 0. In the simpler case, the unoptimized code still generates that store, but the stored value goes unused.
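The structure of that unoptimized code can be modelled in plain C++. This is a hypothetical hand-written sketch (main_like is not something the compiler emits, and a is a parameter here so it can vary): a return slot is initialized to 0 on entry, each return stores into it, and all paths leave through one exit that loads the slot.

```cpp
#include <cassert>

// Model of the -O0 code above. Comments map each line to the listing.
int main_like(int a) {
    int return_slot = 0;          // mov dword ptr [rbp - 4], 0
    if (a > 3) {
        return_slot = 5678;       // mov dword ptr [rbp - 4], 5678
    } else if (a == 3) {
        return_slot = 42;         // mov dword ptr [rbp - 4], 42
    }
    // Falling off the end: the slot still holds the default 0.
    return return_slot;           // common exit: mov eax, dword ptr [rbp - 4]
}
```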
If you build the code with the option -ffreestanding you should see the default return value for main no longer set to 0. This is because the requirement for a default return value of 0 from main applies to a hosted environment and not a freestanding one.
What is the difference between
int x=7;
and
register int x=7;
?
I am using C++.
register is a hint to the compiler, advising it to store that variable in a processor register instead of memory (for example, instead of the stack).
The compiler may or may not follow that hint.
According to Herb Sutter in "Keywords That Aren't (or, Comments by Another Name)":
A register specifier has the same semantics as an auto specifier...
According to Herb Sutter, register is "exactly as meaningful as whitespace" and has no effect on the semantics of a C++ program.
In C++ as it existed in 2010, any valid program that uses the keywords "auto" or "register" is semantically identical to one with those keywords removed (unless they appear in stringized macros or similar contexts). In that sense the keywords are useless for correctly compiling programs. On the other hand, they might be useful in certain macro contexts, to ensure that improper use of a macro causes a compile-time error rather than producing bogus code.
In C++11 and later, the auto keyword was repurposed as a placeholder type for initialized objects: the compiler replaces it with the type of the initializing expression. In C++03, the declaration auto int i=(unsigned char)5; was equivalent to int i=5; within a block, and auto i=(unsigned char)5; was ill-formed. In C++11, auto int i=(unsigned char)5; became ill-formed, while auto i=(unsigned char)5; became equivalent to unsigned char i=5;.
With today's compilers, probably nothing. It was originally a hint to place a variable in a register for faster access, but most compilers today ignore that hint and decide for themselves.
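A small check of the C++11 meaning of auto described above: the deduced type is the type of the initializer, so the cast to unsigned char is preserved. The variable names are illustrative.

```cpp
#include <type_traits>

// C++11 and later: auto deduces the type from the initializer.
auto deduced = (unsigned char)5;    // unsigned char, not int
auto plain = 5;                     // int

static_assert(std::is_same<decltype(deduced), unsigned char>::value,
              "auto keeps the initializer's type");
static_assert(std::is_same<decltype(plain), int>::value,
              "an int literal deduces int");

// `auto int x = 5;` was valid C++03 (auto as a storage class) but is
// ill-formed from C++11 on, so it is not shown as code here.
```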
register is deprecated in C++11. It is unused and reserved in C++17.
Source: http://en.cppreference.com/w/cpp/keyword/register
Almost certainly nothing.
register is a hint to the compiler that you plan on using x a lot, and that you think it should be placed in a register.
However, compilers are now far better than the average (or even expert) programmer at deciding which values should be placed in registers, so they just ignore the keyword and do what they want.
The register keyword was useful for:
Inline assembly.
Expert C/C++ programming.
Cacheable variables declaration.
An example from a production system, where the register keyword was required:
typedef unsigned long long Out;
volatile Out out,tmp;
Out register rax asm("rax");
asm volatile("rdtsc":"=A"(rax));
out=out*tmp+rax;
It has been deprecated since C++11 and is unused and reserved in C++17.
As of gcc 9.3, compiling with -std=c++2a, register produces a compiler warning, but it still has the desired effect, and in the respects covered by this answer it behaves identically to C's register when compiling without any of the -O1 to -Ofast optimisation flags. Using clang++-7 causes a compiler error, however. So yes, register optimisations only make a difference when compiling with no -O optimisation flag, but they are basic optimisations that the compiler would figure out on its own even at -O1.
The only difference is that in C++ you are allowed to take the address of a register variable. The optimisation therefore only occurs if you don't take the address of the variable or of its aliases (to create a pointer) and don't bind a reference to it; otherwise it behaves as if you hadn't used register, and the value is stored on the stack. The reference case matters only at -O0: a reference is typically implemented as a const pointer on the stack, so it also needs an address. With -Ofast, pointers can be optimised off the stack, and references never appear on the stack at all because, unlike pointers, they cannot be made volatile and their addresses cannot be taken.
At -O0, another difference is that const register does not behave the same in gcc C and gcc C++. In gcc C, const register behaves like register, because gcc does not optimise block-scope consts in C. In clang C, register does nothing and only the const block-scope optimisations apply. In gcc C, register optimisations apply but const at block scope has no optimisation. In gcc C++, the register and const block-scope optimisations combine.
#include <stdio.h> // yes, it's C-style code compiled as C++
int main(void) {
const register int i = 3;
printf("%d", i);
return 0;
}
int i = 3; gives:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 3
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
register int i = 3; gives:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
push rbx
sub rsp, 8
mov ebx, 3
mov esi, ebx
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
mov rbx, QWORD PTR [rbp-8] //callee restoration
leave
ret
const int i = 3; gives:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 3 //still saves to stack
mov esi, 3 //immediate substitution
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
const register int i = 3; gives:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov esi, 3 //loads straight into esi saving rbx push/pop and extra indirection (because C++ block-scope const is always substituted immediately into the instruction)
mov edi, OFFSET FLAT:.LC0 // can't optimise away because printf only takes const char*
mov eax, 0 //zeroed: https://stackoverflow.com/a/6212755/7194773
call printf
mov eax, 0 //default return value of main is 0
pop rbp //nothing else pushed to stack -- more efficient than leave (rsp == rbp already)
ret
At -O0, register tells gcc to 1) store a local variable in a callee-saved register (here rbx) and 2) optimise out stack writes if the variable's address is never taken. const tells it to substitute the value directly as an immediate operand (instead of assigning it a register or loading it from memory), while still writing the local variable to the stack by default. const register combines these two optimisations. This is as slim as it gets.
Also, in gcc C and C++, register on its own seems to create a seemingly arbitrary 16-byte gap on the stack for the first local, which doesn't happen with const register.
When compiling with -Ofast, however, register has no optimisation effect: if a value can be put in a register or made an immediate, it always will be, and if it can't, it won't be. const still optimises out the load in C and C++, but only at file scope; volatile still forces the values to be stored to and loaded from the stack. The -Ofast output is:
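One language rule that plausibly explains why C++ substitutes block-scope consts immediately while C does not (the answer does not state this; it is offered as an assumption): in C++, a const integer initialized with a constant expression is itself usable in constant expressions, so the compiler already knows its value at every use. In C it is not, and `int buffer[i]` would be a variable-length array. block_scope_const_demo is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// In C++, `i` below is a constant expression, so it can appear in a
// static_assert and as an array bound; in C neither would be allowed.
std::size_t block_scope_const_demo() {
    const int i = 3;
    static_assert(i == 3, "i is usable in a constant expression");
    int buffer[i] = {};   // array bound must be a constant expression
    return sizeof buffer / sizeof buffer[0];
}
```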
.LC0:
.string "%d"
main:
//optimises out push and change of rbp
sub rsp, 8 //https://stackoverflow.com/a/40344912/7194773
mov esi, 3
mov edi, OFFSET FLAT:.LC0
xor eax, eax //xor 2 bytes vs 5 for mov eax, 0
call printf
xor eax, eax
add rsp, 8
ret
Consider a case where the compiler's optimizer has two variables and is forced to spill one onto the stack, and both variables happen to have the same weight to the compiler. Given there is no difference, the compiler will arbitrarily spill one of them. The register keyword, on the other hand, gives the compiler a hint about which variable will be accessed more frequently. It is similar to the x86 prefetch instruction, but aimed at the compiler's optimizer.
register hints are similar to user-provided branch probability hints, and can be inferred from them: if the compiler knows that some branch is taken often, it will keep the variables related to that branch in registers. So I suggest caring more about branch hints and forgetting about register. Ideally, your profiler should somehow communicate with the compiler and spare you from even thinking about such nuances.
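In standard C++, user-provided branch probability hints take the form of the C++20 [[likely]] and [[unlikely]] attributes. The sketch below assumes a hypothetical workload in which negative values are rare; sum_non_negative is an illustrative name. The attribute only guides code layout and does not change the result.

```cpp
#include <cassert>
#include <vector>

// Sums non-negative values; negatives are assumed to be rare, so that
// branch is marked [[unlikely]] (C++20 attribute).
long sum_non_negative(const std::vector<int>& values) {
    long total = 0;
    for (int v : values) {
        if (v < 0) [[unlikely]] {
            continue;   // rare path: skip invalid input
        }
        total += v;
    }
    return total;
}
```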