Function attribute [[clang::minsize]] does not behave as expected

[[clang::minsize]]:
This attribute suggests that optimization passes and code generator passes make choices that keep the code size of this function as small as possible and perform optimizations that may sacrifice runtime performance in order to minimize the size of the generated code.
With the following code:
#include <cstdlib>
[[gnu::noinline]] int bar()
{
return rand();
}
[[gnu::noinline]] int laa()
{
return 129345;
}
[[clang::minsize]] int foo()
{
return bar() + laa();
}
When using the minsize attribute, it doesn't appear to perform any optimization (see the clang example).
I'm expecting something vaguely similar to GCC's [[gnu::optimize("s")]], which works nicely (see the gcc example).
When clang is invoked with -Os:
foo():
push rax
call bar()
add eax, 129345
pop rcx
ret
However, with -O0 plus the attribute:
foo():
push rbp
mov rbp, rsp
sub rsp, 16
call bar()
mov dword ptr [rbp - 4], eax # 4-byte Spill
call laa()
mov ecx, eax
mov eax, dword ptr [rbp - 4] # 4-byte Reload
add eax, ecx
add rsp, 16
pop rbp
ret
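A hedged reading of the documentation quoted above: minsize only tells optimization and code-generation passes how to make their choices, and at -O0 clang runs essentially none of those passes (to our understanding it also marks every function optnone at -O0), so there is nothing for the attribute to steer. The sketch below assumes the opposite arrangement: the file is compiled with optimization enabled (for example -O2) and individual functions are annotated. bar, laa and foo are adapted from the question, with rand() replaced by a parameter so the result is deterministic; debug_me is a hypothetical helper showing [[clang::optnone]], clang's attribute for excluding a single function from optimization. GCC ignores the clang:: attributes with a warning.

```cpp
// Assumption: this translation unit is built with optimization on (e.g. -O2),
// because minsize only influences passes that actually run.
[[gnu::noinline]] int bar(int x) { return x * 3; }
[[gnu::noinline]] int laa() { return 129345; }

// Ask the optimizer to favor code size for this one function.
[[clang::minsize]] int foo(int x) { return bar(x) + laa(); }

// Hypothetical helper: optnone opts a single function out of optimization,
// roughly the inverse of what the question tried with -O0 + minsize.
[[clang::optnone]] int debug_me(int x) { return bar(x) + laa(); }
```

Comparing the assembly of foo and debug_me on Compiler Explorer at -O2 is one way to see whether the attribute has any effect for a given clang version.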

Related

C++ compiler: inline usage of a non-inline function defined in the same module

T.hpp
class T
{
int _i;
public:
int get() const;
int some_fun();
};
T.cpp
#include "T.hpp"
int T::get() const
{ return _i; }
int T::some_fun()
{
// noise
int i = get(); // (1)
// noise
return i;
}
get() is a non-inline function, however, it's defined in the same module as some_fun. Since the compiler can see the definition of get in the context of some_fun, do compilers, in optimized builds at least, apply the optimization of replacing get() by just _i in line (1)?
If I'm not wrong, I think that, with the exception of templates, the compiler only does a one-pass parsing. What if get is defined after some_fun?
Ok, I answered myself. I thought I didn't speak assembly but it wasn't that hard to try.
Code:
class T
{
int _i = 5;
public:
int get() const;
int some_fun();
};
int T::get() const { return _i; }
int T::some_fun()
{
int i = get();
return i;
}
int main()
{
T o;
return o.some_fun();
}
Non-optimized assembly output (using godbolt.org). A lot of stuff but you can see the explicit calls:
T::get() const:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
pop rbp
ret
T::some_fun():
push rbp
mov rbp, rsp
sub rsp, 24
mov QWORD PTR [rbp-24], rdi
mov rax, QWORD PTR [rbp-24]
mov rdi, rax
call T::get() const // !!!!
mov DWORD PTR [rbp-4], eax
mov eax, DWORD PTR [rbp-4]
leave
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 5
lea rax, [rbp-4]
mov rdi, rax
call T::some_fun() // !!!!
nop
leave
ret
Optimized output (-O3):
T::get() const:
mov eax, DWORD PTR [rdi]
ret
T::some_fun():
mov eax, DWORD PTR [rdi]
ret
main:
mov eax, 5
ret
Here, some_fun has inlined the call to get (the call instruction has been removed and its body is now the same as get's), but the get function is still emitted, because it is a non-inline function with external linkage that other translation units might call.
main went even further by doing an inline substitution of the call to some_fun and then realizing that o hasn't changed and at that point it still retains its default value of 5, so main directly returns 5 without even creating o.
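On the follow-up question about definition order: to our understanding, GCC and Clang read the whole translation unit before running their optimizers, so defining get after some_fun should not by itself prevent the inlining shown above. A minimal sketch with the order reversed, reusing the class from the answer; comparing its -O3 output on Compiler Explorer is the way to confirm this for a particular compiler.

```cpp
class T
{
    int _i = 5;
public:
    int get() const;
    int some_fun();
};

// some_fun is defined before get. Because the optimizer sees the whole
// translation unit, get() can still be inlined here at -O1 and above.
int T::some_fun() { return get(); }

// Defined later in the same file; still visible to the optimizer.
int T::get() const { return _i; }
```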

Why can't the constructor return void expression? [duplicate]

This question already has answers here:
Why do constructors not return values?
The following example is ill-formed according to the standard:
void f();
struct A {
A() {
return f(); // should be replaced to `f(); return;`
}
};
But when the constructor is replaced with a function that returns void, this is legal.
I know this is required by the standard, as follows:
12.1 Constructors
12 No return type (not even void) shall be specified for a constructor. A return statement in the body of a constructor shall not specify a return value.
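The rule is easiest to see side by side. In this minimal sketch (the counter calls is a hypothetical addition, only there so the effect is observable), g is an ordinary void function that may return the void expression f(), while A's constructor has to split the call and the bare return, exactly as the comment in the question suggests.

```cpp
int calls = 0;

void f() { ++calls; }

// Allowed: a function returning void may return an expression of type void.
void g() { return f(); }

struct A {
    // A constructor may only use a bare `return;` (the 12.1 rule),
    // so `return f();` must be written as two separate statements.
    A() {
        f();
        return;
    }
};
```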
But why?
A constructor is a special member function designed to initialize a new instance of its class. Under the hood it is not called like an ordinary function (see this question), so any return value would be inaccurate, since nothing is actually returned. This "nothing" is different from void: void is a type, and returning something typed from a block of code that does not return a value would be confusing and misleading syntax.
Further, constructors are called as part of initialization and only write the values of the arguments to sections of memory, in the same way that writing int n = 5 writes the value 5 to a block of memory that is referenced when n is used.
To the user, the initialization process looks like just a function call, but in reality it is a completely different process.
You had stated:
But when the constructor is replaced with a function that returns void, this is legal.
However, I don't believe that it is: Here's a link to Compiler Explorer that demonstrates that your code fails to compile.
I've tested this with x86-64 clang 10.0.1, x86-64 gcc 10.2, and x64 msvc v19.24, and all three fail to compile. Here are the errors that they each report:
clang: - error: constructor A must not return void expression
where the highlight is under the return keyword.
gcc: - error: return a value from a constructor
where the highlight is under the value 0.
msvc: - error C2534: A constructor cannot return a value
where the highlight is under the entire line of code for that constructor.
This statement from your question seems a bit misleading... I don't see how it is legal at all!
There are only three ways a constructor can finish:
Early return (a bare return;)
Throw an exception
Fall off the closing } of the constructor's body, at which point control returns to the caller.
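A minimal sketch of those three exits, using a hypothetical Widget class. None of them hands a value back to the caller; the object either finishes construction or, in the exception case, never comes into existence.

```cpp
#include <stdexcept>

struct Widget {
    int value = 0;

    explicit Widget(int v) {
        if (v == 0) return;  // 1. early return: only a bare `return;` is allowed
        if (v < 0) throw std::invalid_argument("negative");  // 2. throw
        value = v;
    }  // 3. fall off the closing brace; control returns to the caller
};
```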
All functions must have a return type, even those that don't return a value, such as:
void print(){/*...*/ return;}
int add(int a, int b) { return (a+b); }
etc...
However, constructors and destructors are never declared with a type.
You will never see:
struct A{
void A() { return; } // Fails to compile
int A() { return 0; } // Fails to compile
void ~A() { return; } // Fails to compile
int ~A() {return 0; } // Fails to compile
};
There are NO return types associated with ctors and dtors, at least not in C++.
Now if you remove the (void)0 from the code on Compiler Explorer, you will see the foo(): and bar(): labels and their stack frames. You will also see the main: label and its stack frame. Yet you will see nothing for A. If you then declare an instance of A in main (A a;), you will see the change in the assembly within main's stack frame, since the object is local to main().
Here is clang's assembly:
Without A a being declared...
foo(): # @foo()
push rbp
mov rbp, rsp
pop rbp
ret
bar(): # @bar()
push rbp
mov rbp, rsp
xor eax, eax
pop rbp
ret
main: # @main
push rbp
mov rbp, rsp
xor eax, eax
mov dword ptr [rbp - 4], 0
pop rbp
ret
With A a; being declared...
foo(): # @foo()
push rbp
mov rbp, rsp
pop rbp
ret
bar(): # @bar()
push rbp
mov rbp, rsp
xor eax, eax
pop rbp
ret
main: # @main
push rbp
mov rbp, rsp
sub rsp, 16
mov dword ptr [rbp - 4], 0
lea rdi, [rbp - 8]
call A::A() [base object constructor]
xor eax, eax
add rsp, 16
pop rbp
ret
A::A() [base object constructor]: # @A::A() [base object constructor]
push rbp
mov rbp, rsp
mov qword ptr [rbp - 8], rdi
pop rbp
ret
Here's gcc's assembly:
Without:
foo():
push rbp
mov rbp, rsp
nop
pop rbp
ret
bar():
push rbp
mov rbp, rsp
mov eax, 0
pop rbp
ret
main:
push rbp
mov rbp, rsp
mov eax, 0
pop rbp
ret
With:
foo():
push rbp
mov rbp, rsp
nop
pop rbp
ret
bar():
push rbp
mov rbp, rsp
mov eax, 0
pop rbp
ret
A::A() [base object constructor]:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
nop
pop rbp
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16
lea rax, [rbp-1]
mov rdi, rax
call A::A() [complete object constructor]
mov eax, 0
leave
ret
Here's MSVC's assembly:
Without:
void foo(void) PROC ; foo
ret 0
void foo(void) ENDP ; foo
int bar(void) PROC ; bar
xor eax, eax
ret 0
int bar(void) ENDP ; bar
main PROC
xor eax, eax
ret 0
main ENDP
With:
void foo(void) PROC ; foo
ret 0
void foo(void) ENDP ; foo
int bar(void) PROC ; bar
xor eax, eax
ret 0
int bar(void) ENDP ; bar
this$ = 8
A::A(void) PROC ; A::A, COMDAT
mov QWORD PTR [rsp+8], rcx
mov rax, QWORD PTR this$[rsp]
ret 0
A::A(void) ENDP ; A::A
a$ = 32
main PROC
$LN3:
sub rsp, 56 ; 00000038H
lea rcx, QWORD PTR a$[rsp]
call A::A(void) ; A::A
xor eax, eax
add rsp, 56 ; 00000038H
ret 0
main ENDP
As you can see, with all three compilers no code is generated for A's constructor when you do not have an instance of the object. When you do have an instance, all three compilers emit a call to A::A() or A::A(void)... Execution control enters these constructors just like a function; however, they have no return type because they are not ordinary functions, they are just treated like one...
They do have a stack frame, a scope, and a lifetime like a function, but they are only invoked when an object of the class or struct type is being created. The assembly instructions for class constructors are only generated when an instance is being created.
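They are not like a regular function that you can simply call by name:
void foo() {
A::A(); // not valid: a constructor cannot be called by name
A(); // valid, but this creates and destroys a temporary A rather than calling a function
}
However, this is valid:
void foo() {
A a; // Valid (note that A a(); would declare a function named a, not an object)
}
Here the constructor is invoked on the object named a of type A.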
I hope this helps to clarify why constructors or ctors don't have return types associated with them. The same thing goes for their destructors or dtors.
Edit
I think people were misinterpreting what I was trying to get at... I made a slight modification to the code; maybe this will illustrate my intent more clearly...
Here's the C++ code:
struct A {
//A() { return (void)0; }
A() {};
};
void foo() {
A a;
return (void)0;
}
int bar() {
A a;
return 0;
}
int main() {
foo();
int baz = bar();
A a;
return 0;
}
And here's clang's version of the assembly:
foo(): # @foo()
push rbp
mov rbp, rsp
sub rsp, 16
lea rdi, [rbp - 8]
call A::A() [base object constructor]
add rsp, 16
pop rbp
ret
A::A() [base object constructor]: # @A::A() [base object constructor]
push rbp
mov rbp, rsp
mov qword ptr [rbp - 8], rdi
pop rbp
ret
bar(): # @bar()
push rbp
mov rbp, rsp
sub rsp, 16
lea rdi, [rbp - 8]
call A::A() [base object constructor]
xor eax, eax
add rsp, 16
pop rbp
ret
main: # @main
push rbp
mov rbp, rsp
sub rsp, 16
mov dword ptr [rbp - 4], 0
call foo()
call bar()
mov dword ptr [rbp - 8], eax
lea rdi, [rbp - 16]
call A::A() [base object constructor]
xor eax, eax
add rsp, 16
pop rbp
ret
Here's the updated link to Compiler Explorer. If you look closely at the generated assembly, when A::A() is called there is no information, nor any assembly code, regarding a type. When foo() is called, no code is needed for its void return, and when bar() is called, there is assembly code to store its return value.

calls to constexpr vs inline functions compile to different assembly with optimization disabled

I came across this awesome online Compiler Explorer, https://godbolt.org/, which shows the assembly version of your code.
I was also reading about new C++11 features and found out about constexpr.
Take a look at the square function below:
constexpr int square(int num) {
return num * num;
}
int main()
{
int result = square(2);
return 0;
}
And here is the assembly generated for the two versions (constexpr and inline):
CONSTEXPR https://godbolt.org/z/c69qrevET
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 4 ; compile time constant 4 = 2*2
mov eax, 0
pop rbp
ret
INLINE https://godbolt.org/z/czaKT8fhY
square(int):
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov eax, DWORD PTR [rbp-4]
imul eax, DWORD PTR [rbp-4]
pop rbp
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov edi, 2
call square(int)
mov DWORD PTR [rbp-4], eax
mov eax, 0
leave
ret
I read everywhere that functions like this can be inlined, so why is there a function call in the asm version? According to the definition of inline, it should be avoided, right?
constexpr functions are not guaranteed to be executed at compile-time unless they're used in a context where a constant expression is required. Change your code to
int main()
{
constexpr int result = square(2);
return 0;
}
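and you'll see a difference, because constexpr variables must be initialized with a constant expression.
Note that optimization level also matters: the inline keyword is only a hint as far as inline expansion goes, and with optimization disabled (as in the listings above) GCC does not inline ordinary functions, so the call survives in the inline version.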
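A sketch of contexts that force compile-time evaluation, assuming C++11 or later (kResult, lookup_table and square_at_runtime are hypothetical names): a constexpr variable, a static_assert, and an array bound all require constant expressions, whereas a call with a runtime argument is just an ordinary call that the optimizer may or may not inline.

```cpp
constexpr int square(int num) { return num * num; }

// A constexpr variable must be initialized by a constant expression,
// so square(2) is evaluated during compilation here.
constexpr int kResult = square(2);
static_assert(kResult == 4, "square(2) is evaluated at compile time");

// An array bound is another context that requires a constant expression.
int lookup_table[square(3)];

// With a runtime argument, square behaves like an ordinary function call.
int square_at_runtime(int n) { return square(n); }
```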

Why are duplicate implementations of templates generated when the type info is not used?

Consider:
void* global_ptr;
template<typename T>
void set_global_ptr(T* ptr)
{
global_ptr = ptr;
}
int main()
{
int foo = 123;
float bar = 456;
set_global_ptr(&foo);
set_global_ptr(&bar);
return 0;
}
On gcc 8.1 with flags -O3 -fno-inline this gets compiled to:
void set_global_ptr<int>(int*):
mov QWORD PTR global_ptr[rip], rdi
ret
void set_global_ptr<float>(float*):
mov QWORD PTR global_ptr[rip], rdi
ret
main:
sub rsp, 24
lea rdi, [rsp+8]
mov DWORD PTR [rsp+8], 123
mov DWORD PTR [rsp+12], 0x43e40000
call void set_global_ptr<int>(int*)
lea rdi, [rsp+12]
call void set_global_ptr<float>(float*)
xor eax, eax
add rsp, 24
ret
global_ptr:
.zero 8
Clang 6.0 produces something similar. I disabled inlining, otherwise no functions get generated at all.
It would make sense that if the type is not used, or gets type-erased, only one implementation should be generated. However, I can see that two identical implementations of set_global_ptr were generated. Why is that?
I used Compiler Explorer to produce the assembly.
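If a single body is what you want, one way to get it without relying on the compiler is the "thin template" idea, sketched minimally here: the type-independent work moves into one non-template function, and the template only converts T* to void* and forwards. set_global_ptr_erased is a hypothetical name introduced for this sketch.

```cpp
void* global_ptr = nullptr;

// Non-template worker: exactly one out-of-line definition exists,
// no matter how many types the wrapper below is instantiated with.
void set_global_ptr_erased(void* ptr) { global_ptr = ptr; }

// Thin template wrapper: each instantiation only converts T* to void*
// and forwards the call.
template <typename T>
inline void set_global_ptr(T* ptr) { set_global_ptr_erased(ptr); }
```

Under -fno-inline each wrapper instantiation still exists, but it shrinks to a forwarding call; in a normal optimized build the wrappers are typically inlined, leaving one out-of-line body.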

Why do gcc and clang produce very different code for member function template parameters?

I am trying to understand what is going on when a member function pointer is used as a template parameter. I always thought that function pointers (or member function pointers) are a run-time concept, so I was wondering what happens when they are used as template parameters. For this reason I took a look at the output produced by this code:
struct Foo { void foo(int i){ } };
template <typename T,void (T::*F)(int)>
void callFunc(T& t){ (t.*F)(1); }
void callF(Foo& f){ f.foo(1);}
int main(){
Foo f;
callF(f);
callFunc<Foo,&Foo::foo>(f);
}
where callF is for comparison. gcc 6.2 produces the exact same output for both functions:
callF(Foo&): // void callFunc<Foo, &Foo::foo>(Foo&):
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov esi, 1
mov rdi, rax
call Foo::foo(int)
nop
leave
ret
while clang 3.9 produces almost the same output for callF():
callF(Foo&): # @callF(Foo&)
push rbp
mov rbp, rsp
sub rsp, 16
mov esi, 1
mov qword ptr [rbp - 8], rdi
mov rdi, qword ptr [rbp - 8]
call Foo::foo(int)
add rsp, 16
pop rbp
ret
but very different output for the template instantiation:
void callFunc<Foo, &Foo::foo>(Foo&): # @void callFunc<Foo, &Foo::foo>(Foo&)
push rbp
mov rbp, rsp
sub rsp, 32
xor eax, eax
mov cl, al
mov qword ptr [rbp - 8], rdi
mov rdi, qword ptr [rbp - 8]
test cl, 1
mov qword ptr [rbp - 16], rdi # 8-byte Spill
jne .LBB3_1
jmp .LBB3_2
.LBB3_1:
movabs rax, Foo::foo(int)
sub rax, 1
mov rcx, qword ptr [rbp - 16] # 8-byte Reload
mov rdx, qword ptr [rcx]
mov rax, qword ptr [rdx + rax]
mov qword ptr [rbp - 24], rax # 8-byte Spill
jmp .LBB3_3
.LBB3_2:
movabs rax, Foo::foo(int)
mov qword ptr [rbp - 24], rax # 8-byte Spill
jmp .LBB3_3
.LBB3_3:
mov rax, qword ptr [rbp - 24] # 8-byte Reload
mov esi, 1
mov rdi, qword ptr [rbp - 16] # 8-byte Reload
call rax
add rsp, 32
pop rbp
ret
Why is that? Is gcc taking some (possibly non-standard) shortcut?
gcc was able to figure out what the template was doing and generated the simplest code possible; clang didn't. A compiler is permitted to perform any optimization as long as the observable results comply with the C++ specification. If that means optimizing away an intermediate function pointer, so be it. Nothing else in the code references the temporary function pointer, so it can be optimized away completely, and the whole thing replaced with a simple function call.
gcc and clang are different compilers, written by different people, with different approaches and algorithms for compiling C++.
It is natural, and expected to see different results from different compilers. In this case, gcc was able to figure things out better than clang. I'm sure there are other situations where clang will be able to figure things out better than gcc.
This test was done without any optimizations requested.
One compiler generated more verbose unoptimized code.
Unoptimized code is, quite simply, uninteresting. It is intended to be correct and easy to debug and derive directly from some intermediate representation that is easy to optimize.
The details of optimized code are what matter, barring a ridiculous and widespread slowdown that makes debugging painful.
There is nothing of interest to see or explain here.
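A plausible explanation for clang's extra branches, offered as a reading of the listing rather than something the answers state: under the Itanium C++ ABI used on x86-64, a pointer to member function is stored as a function-pointer field plus a this-adjustment, and a set low bit in that field marks a virtual function (the field then holds the vtable offset plus one). Clang's unoptimized code appears to emit that generic check (test cl, 1, then sub rax, 1 and a vtable load on the virtual path) even though the bit is known at compile time for &Foo::foo. The sketch below, with hypothetical Base, Derived, plain and virt, shows why the check exists in general: the same template must also work when F names a virtual function.

```cpp
struct Base {
    int hits = 0;
    void plain(int i) { hits += i; }               // non-virtual member
    virtual void virt(int i) { hits += 10 * i; }   // virtual member
    virtual ~Base() = default;
};

struct Derived : Base {
    void virt(int i) override { hits += 100 * i; }
};

// F is a compile-time constant, but it may name a virtual function,
// so (t.*F)(1) must still honor dynamic dispatch in that case.
template <typename T, void (T::*F)(int)>
void callFunc(T& t) { (t.*F)(1); }
```

Both calls go through the same callFunc template with only the constant F differing; once optimization is enabled, the branch on the known bit is normally folded away.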