Devirtualizing a non-final method - c++

Suppose I have a class setup like the following:
class A {
public:
virtual void foo() { printf("default implementation\n"); }
};
class B : public A {
public:
void foo() override { printf("B implementation\n"); }
};
class C : public B {
public:
inline void foo() final { A::foo(); }
};
int main(int argc, char **argv) {
auto c = new C();
c->foo();
}
In general, can the call to c->foo() be devirtualized and inlined down to the printf("default implementation") call? Is this guaranteed, for example in gcc? My intuition is that A::foo() is non-virtual because the class is specified explicitly, and so the printf will always be inlined.

You're asking about optimizations, so in general we have to pick a compiler and try it. We can look at the assembly output to determine if the compiler is optimizing the way you want.
Let's try GCC 5.2:
.LC0:
.string "B implementation"
B::foo():
movl $.LC0, %edi
jmp puts
.LC2:
.string "default implementation"
A::foo():
movl $.LC2, %edi
jmp puts
C::foo():
movl $.LC2, %edi
jmp puts
main:
subq $8, %rsp
movl $8, %edi
call operator new(unsigned long)
movl $.LC2, %edi
call puts
xorl %eax, %eax
addq $8, %rsp
ret
And let's try out Clang 3.6:
main: # #main
pushq %rax
movl $.Lstr, %edi
callq puts
xorl %eax, %eax
popq %rdx
retq
.Lstr:
.asciz "default implementation"
In both cases you can see that pretty clearly that all of the virtual functions have been inlined.
"Is this guaranteed, for example in gcc?"
If the compiler is confident about what the actual type of an object is, then I suspect this optimization will always happen. I don't have anything to back up this claim though.

Related

Throwing exception causes SIGSEGV on OSX 10.11.4 + clang

Given the following code:
#include <stdexcept>
#include <string>
using namespace std;
class exception_base : public runtime_error {
public:
exception_base()
: runtime_error(string()) { }
};
class my_exception : public exception_base {
public:
};
int main() {
throw my_exception();
}
This works fine on GNU/Linux and Windows and used to work fine on OSX before the latest update to version 10.11.4. By fine I mean since nothing catches the exception, std::terminate is called.
However, on OSX 10.11.4 using clang (LLVM 7.3.0), the program crashes with segmentation fault. The stack trace is not helpful:
Program received signal SIGSEGV, Segmentation fault.
0x0000000100000ad1 in main () at test.cpp:17
17 throw my_exception();
(gdb) bt
#0 0x0000000100000ad1 in main () at test.cpp:17
(gdb)
Nor is what valgrind has to say about this:
==6500== Process terminating with default action of signal 11 (SIGSEGV)
==6500== General Protection Fault
==6500== at 0x100000AD1: main (test.cpp:17)
I don't think that code violates the standard in any way. Am I missing something here?
Note that even if I add a try-catch around the throw the code still crashes due to SIGSEGV.
If you look at the disassembly, you will see that a general-protection (GP) exception is occurring on an SSE movaps instruction:
a.out`main:
0x100000ad0 : pushq %rbp
0x100000ad1 : movq %rsp, %rbp
0x100000ad4 : subq $0x20, %rsp
0x100000ad8 : movl $0x0, -0x4(%rbp)
0x100000adf : movl $0x10, %eax
0x100000ae4 : movl %eax, %edi
0x100000ae6 : callq 0x100000dea ; symbol stub for: __cxa_allocate_exception
0x100000aeb : movq %rax, %rdi
0x100000aee : xorps %xmm0, %xmm0
-> 0x100000af1 : movaps %xmm0, (%rax)
0x100000af4 : movq %rdi, -0x20(%rbp)
0x100000af8 : movq %rax, %rdi
0x100000afb : callq 0x100000b40 ; my_exception::my_exception
...
Before the my_exception::my_exception() constructor is even called, a movaps instruction is used to zero out the block of memory returned by __cxa_allocate_exception(size_t). However, this pointer (0x0000000100103498 in my case) is not guaranteed to be 16-byte aligned. When the source or destination operand of a movaps instruction is a memory operand, the operand must be aligned on a 16-byte boundary or else a GP exception is generated.
One way to fix the problem temporarily is to compile without SSE instructions (-mno-sse). It's not an ideal solution because SSE instructions can improve performance.
I think that this is related to http://reviews.llvm.org/D18479 :
r246985 made changes to give a higher alignment for exception objects on the grounds that Itanium says _Unwind_Exception should be "double-word" aligned and the structure is normally declared with __attribute__((aligned)) guaranteeing 16-byte alignment. It turns out that libc++abi doesn't declare the structure with __attribute__((aligned)) and therefore only guarantees 8-byte alignment on 32-bit and 64-bit platforms. This caused a crash in some cases when the backend emitted SIMD store instructions that requires 16-byte alignment (such as movaps).
This patch makes ItaniumCXXABI::getAlignmentOfExnObject return an 8-byte alignment on Darwin to fix the crash.
.. which patch was committed on March 31, 2016 as r264998.
There's also https://llvm.org/bugs/show_bug.cgi?id=24604 and https://llvm.org/bugs/show_bug.cgi?id=27208 which appear related.
UPDATE I installed Xcode 7.3.1 (released yesterday) and the problem appears to be fixed; the generated assembly is now:
a.out`main:
0x100000ac0 : pushq %rbp
0x100000ac1 : movq %rsp, %rbp
0x100000ac4 : subq $0x20, %rsp
0x100000ac8 : movl $0x0, -0x4(%rbp)
0x100000acf : movl $0x10, %eax
0x100000ad4 : movl %eax, %edi
0x100000ad6 : callq 0x100000dea ; symbol stub for: __cxa_allocate_exception
0x100000adb : movq %rax, %rdi
0x100000ade : movq $0x0, 0x8(%rax)
0x100000ae6 : movq $0x0, (%rax)
0x100000aed : movq %rdi, -0x20(%rbp)
0x100000af1 : movq %rax, %rdi
0x100000af4 : callq 0x100000b40 ; my_exception::my_exception
...

why object code generated for noexcept and throw() is same in c++11?

Code using noexcept .
//hello.cpp
class A{
public:
A(){}
~A(){}
};
void fun() noexcept{ //c++11 style
A a[10];
}
int main()
{
fun();
}
Code using throw() .
//hello1.cpp
class A{
public:
A(){}
~A(){}
};
void fun() throw(){//c++98 style
A a[10];
}
int main()
{
fun();
}
As per various online links and scott meyer's book "If, at runtime, an exception leaves fun, fun’s exception specification is violated. With the
C++98 exception specification, the call stack is unwound to f’s caller, and, after some
actions not relevant here, program execution is terminated. With the C++11 exception
specification, runtime behavior is slightly different: the stack is only possibly
unwound before program execution is terminated." He said code using noexcept is more optimized than code using throw() .
But when I have generated machine code for above program , i found code generated for both cases is exactly same .
$ g++ --std=c++11 hello1.cpp -O0 -S -o throw1.s
$ g++ --std=c++11 hello.cpp -O0 -S -o throw.s
diff is below .
$ diff throw.s throw1.s
1c1
< .file "hello.cpp"
---
> .file "hello1.cpp"
Machine code generated is as below for function "fun" for both cases .
.LFB1202:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
pushq %r12
pushq %rbx
subq $16, %rsp
.cfi_offset 12, -24
.cfi_offset 3, -32
leaq -32(%rbp), %rax
movl $9, %ebx
movq %rax, %r12
jmp .L5
.L6:
movq %r12, %rdi
call _ZN1AC1Ev
addq $1, %r12
subq $1, %rbx
.L5:
cmpq $-1, %rbx
jne .L6
leaq -32(%rbp), %rbx
addq $10, %rbx
.L8:
leaq -32(%rbp), %rax
cmpq %rax, %rbx
je .L4
subq $1, %rbx
movq %rbx, %rdi
call _ZN1AD1Ev
jmp .L8
.L4:
addq $16, %rsp
popq %rbx
popq %r12
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE1202:
.size _Z3funv, .-_Z3funv
.globl main
.type main, #function
What is advantage of using noexcept when noexcept and throw are generating the same code ?
They are generating the same code because you are not throwing anything. Your test program is so simple that the compiler can trivially analyze it, determine that it is not throwing an exception, and in fact not doing anything at all! With optimizations enabled (-O1 and higher), the object code:
fun():
rep ret
main:
xor eax, eax
ret
shows that your test code is being optimized simply to the most trivial valid C++ application:
int main()
{
return 0;
}
If you want to really test the difference in object code generation for the two types of exception specifiers, you need to use a real (i.e., non-trivial) test program. Something that actually throws an exception, and where that throw cannot be analyzed out by a bit of compile-time analysis:
void fun(int args) throw() // C++98 style
{
if (args == 0)
{
throw "Not enough arguments!";
}
else
{
// do something
}
}
int main(int argc, char** argv)
{
fun(argc);
return 0;
}
In this code, an exception is conditionally thrown depending on the value of an input parameter (argc) passed to the main function. It is impossible for the compiler to know, at compile-time, what the value of this argument will be, so it cannot optimize out either this conditional check or the throwing of the exception. That forces it to generate exception-throwing and stack-unwinding code.
Now we can compare the resulting object code. Using GCC 5.3, with -O3 and -std=c++11, I get the following:
C++98 style (throw())
.LC0:
.string "Not enough arguments!"
fun(int):
test edi, edi
je .L9
rep ret
.L9:
push rax
mov edi, 8
call __cxa_allocate_exception
xor edx, edx
mov QWORD PTR [rax], OFFSET FLAT:.LC0
mov esi, OFFSET FLAT:typeinfo for char const*
mov rdi, rax
call __cxa_throw
add rdx, 1
mov rdi, rax
je .L4
call _Unwind_Resume
.L4:
call __cxa_call_unexpected
main:
sub rsp, 8
call fun(int)
xor eax, eax
add rsp, 8
ret
C++11 style (noexcept)
.LC0:
.string "Not enough arguments!"
fun(int) [clone .part.0]:
push rax
mov edi, 8
call __cxa_allocate_exception
xor edx, edx
mov QWORD PTR [rax], OFFSET FLAT:.LC0
mov esi, OFFSET FLAT:typeinfo for char const*
mov rdi, rax
call __cxa_throw
fun(int):
test edi, edi
je .L8
rep ret
.L8:
push rax
call fun(int) [clone .part.0]
main:
test edi, edi
je .L12
xor eax, eax
ret
.L12:
push rax
call fun(int) [clone .part.0]
Note that they are clearly different. Just as Meyers et al. have claimed, the C++98 style throw() specification, which indicates that a function does not throw, causes a standards-compliant compiler to emit code to unwind the stack and call std::unexpected when an exception is thrown from inside of that function. That is exactly what happens here. Because fun is marked throw() but in fact does throw, the object code shows the compiler emitting a call to __cxa_call_unexpected.
Clang is also standards-compliant here and does the same thing. I won't reproduce the object code, because it's longer and harder to follow (you can see it on Matt Godbolt's excellent site), but putting the C++98 style exception specification on the function causes the compiler to explicitly call std::terminate if the function throws in violation of its specification, whereas the C++11 style exception specification does not result in a call to std::terminate.

When can/will a function be inlined in C++? Can inline behavior be forced?

I am trying to get the expected behavior when I use the keyword inline.
I tried calling a function in different files, templating the function, using different implementation of the inline function, but whatever I do, the compiler is never inlining the function.
So in which case exactly will the compiler chose to inline a function in C++ ?
Here is the code I have tried :
inline auto Add(int i) -> int {
return i+1;
}
int main() {
Add(1);
return 0;
}
In this case, I get:
Add(int):
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl -4(%rbp), %eax
addl $1, %eax
popq %rbp
ret
main:
pushq %rbp
movq %rsp, %rbp
movl $1, %edi
call Add(int)
movl $0, %eax
popq %rbp
ret
Or again,
template<typename T>
inline auto Add(const T &i) -> decltype(i+1) {
return i+1;
}
int main() {
Add(1);
return 0;
}
And I got:
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl $1, -4(%rbp)
leaq -4(%rbp), %rax
movq %rax, %rdi
call decltype ({parm#1}+(1)) Add<int>(int const&)
movl $0, %eax
leave
ret
decltype ({parm#1}+(1)) Add<int>(int const&):
pushq %rbp
movq %rsp, %rbp
movq %rdi, -8(%rbp)
movq -8(%rbp), %rax
movl (%rax), %eax
addl $1, %eax
popq %rbp
ret
I used https://gcc.godbolt.org/ to get the assembly code here, but I also tried on my machine with clang and gcc (with and without optimization options).
EDIT:
Ok, I was missing something with the optimization options. If I set GCC to use o3 optimization level, my method is inlined.
But still. How does GCC, or another compiler, know when it is better to inline a function or not ?
As a rule, your code is always inlined only if you specify:
__attribute__((always_inline))
eg (from gcc documentation):
inline void foo (const char) __attribute__((always_inline));
Though it is almost never a good idea to force your compiler to inline your code.
You may set a high optimization level (though the O flag) to achieve maximum inlining, but for more details please see the gcc documentation
Inlining is actually controlled by a number of parameters. You can set them using the -finline-* options. You can have a look at them here
By the way, you did not actually declare a function. You declared a functor, an object that can be called, but can also store state. Instead of using the syntax:
inline auto Add(int i) -> int {
you mean to say simply:
inline int Add(int i) {

Strange Clang behaviour

Have a look at this piece of code:
#include <iostream>
#include <string>
void foo(int(*f)()) {
std::cout << f() << std::endl;
}
void foo(std::string(*f)()) {
std::string s = f();
std::cout << s << std::endl;
}
int main() {
auto bar = [] () -> std::string {
return std::string("bla");
};
foo(bar);
return 0;
}
Compiling it with
g++ -o test test.cpp -std=c++11
leads to:
bla
like it should do. Compiling it with
clang++ -o test test.cpp -std=c++11 -stdlib=libc++
leads to:
zsh: illegal hardware instruction ./test
And Compiling it with
clang++ -o test test.cpp -std=c++11 -stdlib=stdlibc++
leads also to:
zsh: illegal hardware instruction ./test
Clang/GCC Versions:
clang version 3.2 (tags/RELEASE_32/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
gcc version 4.7.2 (Gentoo 4.7.2-r1 p1.5, pie-0.5.5)
Anyone any suggestions what is going wrong?
Thanks in advance!
Yes, it is a bug in Clang++. I can reproduce it with CLang 3.2 in i386-pc-linux-gnu.
And now some random analysis...
I've found that the bug is in the conversion from labmda to pointer-to-function: the compiler creates a kind of thunk with the appropriate signature that calls the lambda, but it has the instruction ud2 instead of ret.
The instruction ud2, as you all probably know, is an instruction that explicitly raises the "Invalid Opcode" exception. That is, an instruction intentionally left undefined.
Take a look at the disassemble: this is the thunk function:
main::$_0::__invoke():
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl 8(%ebp), %eax
movl %eax, (%esp)
movl %ecx, 4(%esp)
calll main::$_0::operator()() const ; this calls to the real lambda
subl $4, %esp
ud2 ; <<<-- What the...!!!
So a minimal example of the bug will be simply:
int main() {
std::string(*f)() = [] () -> std::string {
return "bla";
};
f();
return 0;
}
Curiously enough, the bug doesn't happen if the return type is a simple type, such as int. Then the generated thunk is:
main::$_0::__invoke():
pushl %ebp
movl %esp, %ebp
subl $8, %esp
movl %eax, (%esp)
calll main::$_0::operator()() const
addl $8, %esp
popl %ebp
ret
I suspect that the problem is in the forwarding of the return value. If it fits in a register, such as eax all goes well. But if it is a big struct, such as std::string it is returned in the stack, the compiler is confused and emits the ud2 in desperation.
This is most likely a bug in clang 3.2. I can't reproduce the crash with clang trunk.

c++ Function pointer inlining

I know I can pass a function pointer as a template parameter and get a call to it inlined but I wondered if any compilers these days can inline an 'obvious' inline-able function like:
inline static void Print()
{
std::cout << "Hello\n";
}
....
void (*func)() = Print;
func();
Under Visual Studio 2008 its clever enough to get it down to a direct call instruction so it seems a shame it can't take it a step further?
Newer releases of GCC (4.4 and up) have an option named -findirect-inlining. If GCC can prove to itself that the function pointer is constant then it makes a direct call to the function or inlines the function entirely.
GNU's g++ 4.5 inlines it for me starting at optimization level -O1
main:
subq $8, %rsp
movl $6, %edx
movl $.LC0, %esi
movl $_ZSt4cout, %edi
call _ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_E
movl $0, %eax
addq $8, %rsp
ret
where .LC0 is the .string "Hello\n".
To compare, with no optimization, g++ -O0, it did not inline:
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movq $_ZL5Printv, -8(%rbp)
movq -8(%rbp), %rax
call *%rax
movl $0, %eax
leave
ret
Well the compiler doesn't really know if that variable will be overwritten somewhere or not (maybe in another thread?) so it errs on the side of caution and implements it as a function call.
I just checked in VS2010 in a release build and it didn't get inlined.
By the way, you decorating the function as inline is useless. The standard says that if you ever get the address of a function, any inline hint will be ignored.
edit: note however that while your function didn't get inlined, the variable IS gone. In the disassembly the call uses a direct address, it doesn't load the variable in a register and calls that.