Why doesn't `clang -S` generate assembly code for member functions? - c++

Let's say I want to get clang to show me what assembly it generates for Node::Destroy in the following code:
struct Node {
    Node* next = nullptr;
    int x;

    ~Node() {
        delete next;
    }

    void Destroy() {
        delete this;
    }
};
If I run clang++ -S foo.cc then it gives essentially empty output:
.text
.file "foo.cc"
.ident "Debian clang version 14.0.6-2"
.section ".note.GNU-stack","",@progbits
.addrsig
But if I change Destroy to a free-standing function that accepts Node* rather than a member function, then it actually does generate assembly:
.text
.file "foo.cc"
.globl _Z7DestroyP4Node # -- Begin function _Z7DestroyP4Node
.p2align 4, 0x90
.type _Z7DestroyP4Node,@function
_Z7DestroyP4Node: # #_Z7DestroyP4Node
.cfi_startproc
# %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movq %rdi, -8(%rbp)
[...]
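For reference, the free-standing variant I mean is essentially this (a sketch, keeping the same Node definition):
struct Node {
    Node* next = nullptr;
    int x;

    ~Node() {
        delete next;
    }
};

// Free function instead of a member function; this is what mangles to
// _Z7DestroyP4Node in the output above.
void Destroy(Node* node) {
    delete node;
}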
The same is true on Compiler Explorer. What's the reason for this difference?

Related

C++ [[noreturn]] function call and destructors

I have some C++ code in which I must be sure that a specific destructor is called before exiting, and I was wondering whether it would be called before a [[noreturn]] function.
So I wrote this simple dummy example:
#include <cstdio>
#include <cstdlib>
class A {
    char *i;
public:
    A() : i{new char[4]} {}
    ~A() { delete[] i; }
    void hello() { puts(i); }
};

int func()
{
    A b;
    exit(1);
    b.hello(); // Not reached
}
I compiled with g++ /tmp/l.cc -S -O0 and I got this assembly
.file "l.cc"
.text
.section .text._ZN1AC2Ev,"axG",@progbits,_ZN1AC5Ev,comdat
.align 2
.weak _ZN1AC2Ev
.type _ZN1AC2Ev, @function
_ZN1AC2Ev:
.LFB18:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
movq %rdi, -8(%rbp)
movl $4, %edi
call _Znam
movq %rax, %rdx
movq -8(%rbp), %rax
movq %rdx, (%rax)
nop
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE18:
.size _ZN1AC2Ev, .-_ZN1AC2Ev
.weak _ZN1AC1Ev
.set _ZN1AC1Ev,_ZN1AC2Ev
.text
.globl func
.type func, @function
func:
.LFB24:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $16, %rsp
leaq -8(%rbp), %rax
movq %rax, %rdi
call _ZN1AC1Ev
movl $1, %edi
call exit
.cfi_endproc
.LFE24:
.size func, .-func
.ident "GCC: (GNU) 12.2.1 20221121 (Red Hat 12.2.1-4)"
.section .note.GNU-stack,"",@progbits
There was clearly no call to the destructor.
In this stupid case it doesn't matter much, but what if I had to close a file before exiting?
Apart from the fact that terminating a program with exit() is generally considered bad practice, you could try the following:
int func()
{
    {
        A b;
        /* ... */
    } // Leaving scope => destructing b
    exit(1);
}
PS: Assuming that you aren't writing a driver, most kernels (including Microsoft Windows NT, Unix (e.g. BSD), XNU (macOS) and Linux) automatically deallocate any allocated memory as the program exits.
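A quick way to verify that b's destructor now runs before exit() is to make it print something (my own sketch, not the original code):
#include <cstdio>
#include <cstdlib>

class A {
public:
    ~A() { puts("~A ran"); }   // stand-in for closing a file, flushing a log, etc.
};

int func()
{
    {
        A b;
    }          // leaving the scope destroys b: "~A ran" is printed here
    exit(1);   // exit() never returns, so nothing after it runs
}

int main() { return func(); }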

What is the meaning of the assembly output of the new operator?

The new operator returns the address of newly allocated heap memory, which holds the data until it is no longer needed.
Using GCC, I stopped the compiler immediately after preprocessing and compilation proper, and this is the assembly code generated.
What (if anything) does the compiler generate in order to request memory dynamically?
Since memory management is handled by the OS, there must be some platform + CPU specific processes happening; when does the process transition from platform independent (source code) to dependent (memory management, executable, etc.)?
C++
int main()
{
    int* x = new int;
}
Assembly
.file "main.cpp"
.def __main; .scl 2; .type 32; .endef
.text
.globl main
.def main; .scl 2; .type 32; .endef
.seh_proc main
main:
.LFB0:
pushq %rbp
.seh_pushreg %rbp
movq %rsp, %rbp
.seh_setframe %rbp, 0
subq $48, %rsp
.seh_stackalloc 48
.seh_endprologue
call __main
movl $4, %ecx
call _Znwy
movq %rax, -8(%rbp)
movl $0, %eax
addq $48, %rsp
popq %rbp
ret
.seh_endproc
.ident "GCC: (x86_64-posix-seh-rev1, Built by MinGW-W64 project) 7.2.0"
.def _Znwy; .scl 2; .type 32; .endef
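For what it's worth, _Znwy is the mangled name of operator new(unsigned long long) (size_t is unsigned long long on 64-bit MinGW), so in C++ terms the statement lowers to roughly the following sketch (ignoring exception-handling details):
#include <new>   // ::operator new, ::operator delete

int main()
{
    // `new int` asks the global allocation function for sizeof(int) bytes
    // and treats the result as an int*; no constructor runs for a plain int.
    void* raw = ::operator new(sizeof(int));   // the `call _Znwy` above
    int* x = static_cast<int*>(raw);
    ::operator delete(x);   // not in the original snippet; added so the sketch doesn't leak
}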

Virtual function compiler optimization c++

class Base
{
public:
    virtual void fnc(size_t nm)
    {
        // do some work here
    }
    void process()
    {
        for(size_t i = 0; i < 1000; i++)
        {
            fnc(i);
        }
    }
};
Can and will the C++ compiler optimize calls to the fnc function from the process function, considering it's going to be the same function every time it's invoked inside the loop? Or is it going to fetch the function address from the vtable every time the function is invoked?
I checked an example on godbolt.org. The result is that no, none of the compilers optimize that.
Here's the test source:
class Base
{
public:
    // made it pure virtual to decrease clutter
    virtual void fnc(int nm) = 0;
    void process()
    {
        for(int i = 0; i < 1000; i++)
        {
            fnc(i);
        }
    }
};
void test(Base* b) {
    return b->process();
}
and the generated asm:
test(Base*):
push rbp ; setup function call
push rbx
mov rbp, rdi ; Base* rbp
xor ebx, ebx ; int ebx=0;
sub rsp, 8 ; advance stack ptr
.L2:
mov rax, QWORD PTR [rbp+0] ; read 8 bytes from our Base*
; rax now contains vtable ptr
mov esi, ebx ; int parameter for fnc
add ebx, 1 ; i++
mov rdi, rbp ; (Base*) this parameter for fnc
call [QWORD PTR [rax]] ; read vtable and call fnc
cmp ebx, 1000 ; back to the top of the loop
jne .L2
add rsp, 8 ; reset stack ptr and return
pop rbx
pop rbp
ret
As you can see, it reads the vtable on each call. I guess that's because the compiler can't prove you don't change the vtable inside the function call (e.g. by calling placement new or something equally silly), so, technically, the virtual function being called could change between iterations.
Usually, compilers are allowed to optimize anything that doesn't change the observable behavior of a program. There are some exceptions, such as eliding non-trivial copy constructors when returning from a function, but in general the compiler may make any change to the generated code that does not alter the output or side effects of the program as defined by the C++ abstract machine.
So, can devirtualizing a function change the observable behavior? According to this article, yes.
Relevant passage:
[...] optimizer will have to assume that [virtual function] might
change the vptr in passed object. [...]
void A::foo() { // virtual
static_assert(sizeof(A) == sizeof(Derived));
new(this) Derived;
}
This is a call to the placement new operator - it doesn't allocate new memory, it just creates a new object in the provided location. So, by constructing a Derived object in the place where an object of type A was living, we change the vptr to point to Derived's vtable. Is this code even legal? The C++ Standard says yes.
Therefore, if the compiler does not have access to the definition of the virtual function (and cannot know the concrete type of *this at compile time), then this optimization is risky.
According to the same article, you can use -fstrict-vtable-pointers on Clang to allow this optimization, at the risk of making your code less standard-compliant.
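As an aside (my own sketch, not from the article): the other direction is to give the compiler enough type information that devirtualization becomes provably safe, for example by marking the implementation class final:
struct Base {
    virtual void fnc(int nm) = 0;
    virtual ~Base() = default;
};

struct Impl final : Base {                  // nothing can derive from Impl
    void fnc(int nm) override { /* ... */ }
};

void test(Impl& obj) {
    obj.fnc(42);   // the dynamic type can only be Impl, so the compiler
                   // may emit a direct call instead of going through the vtable
}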
I wrote a very small implementation and compiled it using g++ --save-temps opt.cpp. This flag keeps the temporary preprocessed file, the assembly file, and the object file. I ran it once with the virtual keyword and once without. Here's the program:
class Base
{
public:
    virtual int fnc(int nm)
    {
        int i = 0;
        i += 3;
        return i;
    }
    void process()
    {
        int x = 9;
        for(int i = 0; i < 1000; i++)
        {
            x += i;
        }
    }
};

int main(int argc, char* argv[]) {
    Base b;
    return 0;
}
When I ran with the virtual keyword the resulting assembly on an x86_64 Linux box was:
.file "opt.cpp"
.section .text._ZN4Base3fncEi,"axG",@progbits,_ZN4Base3fncEi,comdat
.align 2
.weak _ZN4Base3fncEi
.type _ZN4Base3fncEi, @function
_ZN4Base3fncEi:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movq %rdi, -24(%rbp)
movl %esi, -28(%rbp)
movl $0, -4(%rbp)
addl $3, -4(%rbp)
movl -4(%rbp), %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size _ZN4Base3fncEi, .-_ZN4Base3fncEi
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
leaq 16+_ZTV4Base(%rip), %rax
movq %rax, -16(%rbp)
movl $0, %eax
movq -8(%rbp), %rdx
xorq %fs:40, %rdx
je .L5
call __stack_chk_fail@PLT
.L5:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
.weak _ZTV4Base
.section .data.rel.ro.local._ZTV4Base,"awG",@progbits,_ZTV4Base,comdat
.align 8
.type _ZTV4Base, @object
.size _ZTV4Base, 24
_ZTV4Base:
.quad 0
.quad _ZTI4Base
.quad _ZN4Base3fncEi
.weak _ZTI4Base
.section .data.rel.ro._ZTI4Base,"awG",@progbits,_ZTI4Base,comdat
.align 8
.type _ZTI4Base, @object
.size _ZTI4Base, 16
_ZTI4Base:
.quad _ZTVN10__cxxabiv117__class_type_infoE+16
.quad _ZTS4Base
.weak _ZTS4Base
.section .rodata._ZTS4Base,"aG",@progbits,_ZTS4Base,comdat
.type _ZTS4Base, @object
.size _ZTS4Base, 6
_ZTS4Base:
.string "4Base"
.ident "GCC: (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005"
.section .note.GNU-stack,"",@progbits
Without the virtual keyword, the final assembly was:
.file "opt.cpp"
.text
.globl main
.type main, @function
main:
.LFB2:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -20(%rbp)
movq %rsi, -32(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE2:
.size main, .-main
.ident "GCC: (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005"
.section .note.GNU-stack,"",@progbits
Now, unlike the posted question, this example doesn't even use the virtual method, and the resulting assembly is already much larger. I did not try compiling with optimizations, but give it a go.

Do C++ compilers typically optimize out static (global) references?

Let's say I have a static reference to a static object or primitive in the global namespace (or any other namespace):
int a = 2;
int& b = a;

int main(int argc, char** argv) {
    b++;
    return b;
}
This is a very basic example, but what does a compiler usually do with this code? Will the resulting machine code actually traverse a pointer to read/write a, or will the compiler just insert the address of a in place of b?
The answer to this will obviously be compiler specific. I decided to try with clang-500.2.79 on x86-64 and with the -O3 flag. As given, your source yields:
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
movl _a(%rip), %eax
incl %eax
movl %eax, _a(%rip)
popq %rbp
ret
.cfi_endproc
.section __DATA,__data
.globl _a ## #a
.align 2
_a:
.long 2 ## 0x2
.section __DATA,__const
.globl _b ## #b
.align 3
_b:
.quad _a
As you can see, both the symbols a and b are retained (munged to _a and _b); this is required because these symbols have global linkage.
If you change your code slightly, to declare a and b as static, the result is quite different:
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main: ## #main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
movl $3, %eax
popq %rbp
ret
.cfi_endproc
In this case, the compiler is able to optimize a and b away completely and just have main return the constant 3, because there's no way that another compilation unit can screw with the values.
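For reference, the static variant I mean is simply this (a sketch):
static int a = 2;
static int& b = a;

int main(int argc, char** argv) {
    b++;
    return b;   // with -O3 this folds to the `movl $3, %eax` seen above
}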
I tried to disassemble the following code,
int pranit = 2;
int& sumit = pranit;

int main(int argc, char** argv) {
    sumit++;
    return sumit;
}
And the following instruction suggests that sumit holds the address of pranit:
013B13C8 8B15 04803B01 MOV EDX,DWORD PTR [sumit] ; ConsoleA.pranit
Moreover, both variables have different addresses:
Names in ConsoleA, item 313
Address=013B8004
Section=.data
Type=Library
Name=sumit
Names in ConsoleA, item 257
Address=013B8000
Section=.data
Type=Library
Name=pranit
I used OllyDbg as the disassembler.

g++ incorrect loop?

I have a real world program that is similar to this one, which I'll call test.cpp:
#include <stdlib.h>

extern void f(size_t i);

int sample(size_t x)
{
    size_t a = x;
    size_t i;
    for (i = a-2; i >= 0; i--) {
        f(i);
    }
}
And my problem is that this loop never terminates.
If I run the following command:
g++ -S -o test.s test.cpp
I get the following assembly sequence:
.file "test.cpp"
.text
.globl _Z6samplem
.type _Z6samplem, @function
_Z6samplem:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $32, %rsp
movq %rdi, -24(%rbp)
movq -24(%rbp), %rax
movq %rax, -8(%rbp)
movq -8(%rbp), %rax
subq $2, %rax
movq %rax, -16(%rbp)
.L2:
movq -16(%rbp), %rax
movq %rax, %rdi
call _Z1fm
subq $1, -16(%rbp)
jmp .L2
.cfi_endproc
.LFE0:
.size _Z6samplem, .-_Z6samplem
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
I'm no expert in assembly language, but I would expect to see code for the comparison i >= 0 and a conditional jump out of the loop. What's going on here??
GNU C++ 4.6.3 on Ubuntu Linux
size_t is unsigned, so the condition i>=0 is always true. It is impossible for i to be negative.
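If it helps, one common way to rewrite the loop so it still counts down with an unsigned type is this (my sketch, not part of the original answer; I also made the function void since nothing was returned):
#include <stdlib.h>

extern void f(size_t i);

void sample(size_t x)
{
    size_t a = x;
    // Visits a-2, a-3, ..., 1, 0 and then stops: the test checks i > 0
    // before the decrement, so the loop cannot run forever.
    for (size_t i = a - 1; i-- > 0; ) {
        f(i);
    }
}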