Why does LLVM bitcode have duplicate symbols for constructors? [duplicate] - llvm

Today, I discovered a rather interesting thing about either g++ or nm: constructor definitions appear to have two entries in libraries.
I have a header thing.hpp:
class Thing
{
public:
    Thing();
    Thing(int x);
    void foo();
};
And thing.cpp:
#include "thing.hpp"
Thing::Thing()
{ }
Thing::Thing(int x)
{ }
void Thing::foo()
{ }
I compile this with:
g++ thing.cpp -c -o libthing.a
Then, I run nm on it:
%> nm -gC libthing.a
0000000000000030 T Thing::foo()
0000000000000022 T Thing::Thing(int)
000000000000000a T Thing::Thing()
0000000000000014 T Thing::Thing(int)
0000000000000000 T Thing::Thing()
U __gxx_personality_v0
As you can see, both of Thing's constructors are listed twice in the generated file. (Despite its .a name, g++ -c produces a single relocatable object file rather than an ar archive; nm reads either.) My g++ is 4.4.3, but the same behavior happens in clang, so it isn't just a GCC issue.
This doesn't cause any apparent problems, but I was wondering:
Why are defined constructors listed twice?
Why doesn't this cause "multiple definition of symbol __" problems?
EDIT: For Carl, the output without the -C (demangle) option:
%> nm -g libthing.a
0000000000000030 T _ZN5Thing3fooEv
0000000000000022 T _ZN5ThingC1Ei
000000000000000a T _ZN5ThingC1Ev
0000000000000014 T _ZN5ThingC2Ei
0000000000000000 T _ZN5ThingC2Ev
U __gxx_personality_v0
As you can see, the same function generates multiple symbols, which is still quite curious.
And while we're at it, here is a section of generated assembly:
.globl _ZN5ThingC2Ev
.type _ZN5ThingC2Ev, @function
_ZN5ThingC2Ev:
.LFB1:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movq %rdi, -8(%rbp)
leave
ret
.cfi_endproc
.LFE1:
.size _ZN5ThingC2Ev, .-_ZN5ThingC2Ev
.align 2
.globl _ZN5ThingC1Ev
.type _ZN5ThingC1Ev, @function
_ZN5ThingC1Ev:
.LFB2:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
movq %rdi, -8(%rbp)
leave
ret
.cfi_endproc
So the generated code is...well...the same.
EDIT: To see which constructor actually gets called, I changed Thing::foo() to this:
void Thing::foo()
{
Thing t;
}
The generated assembly is:
.globl _ZN5Thing3fooEv
.type _ZN5Thing3fooEv, @function
_ZN5Thing3fooEv:
.LFB550:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $48, %rsp
movq %rdi, -40(%rbp)
leaq -32(%rbp), %rax
movq %rax, %rdi
call _ZN5ThingC1Ev
leaq -32(%rbp), %rax
movq %rax, %rdi
call _ZN5ThingD1Ev
leave
ret
.cfi_endproc
So it is invoking the complete object constructor.

We'll start by declaring that GCC follows the Itanium C++ ABI.
According to the ABI, the mangled name for your Thing::foo() is easily parsed:
_Z | N | 5Thing | 3foo | E | v
prefix | nested | `Thing` | `foo`| end nested | parameters: `void`
You can read the constructor names similarly, as below. Notice how the constructor "name" isn't given, but instead a C clause:
_Z | N | 5Thing | C1 | E | i
prefix | nested | `Thing` | Constructor | end nested | parameters: `int`
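But what's this C1? Your duplicate has C2. What does this mean? (The differing codes also answer your second question: _ZN5ThingC1Ei and _ZN5ThingC2Ei are distinct names, so the linker sees two separate symbols rather than a multiple definition.)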
Well, this is quite simple too:
<ctor-dtor-name> ::= C1 # complete object constructor
::= C2 # base object constructor
::= C3 # complete object allocating constructor
::= D0 # deleting destructor
::= D1 # complete object destructor
::= D2 # base object destructor
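The breakdown above can be made concrete with a toy decoder for the simple shape used in this question: _Z, N, length-prefixed identifiers, an optional ctor/dtor code, E, then single-letter parameter codes. This is a minimal sketch under that assumption and not a real demangler. It does not handle templates, substitutions, qualifiers, or nested types. The names ToyName, parse_toy and describe_special are made up for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Toy decoder for:  _Z N <len><identifier>... [C1|C2|C3|D0|D1|D2] E <param codes>
// Anything outside that shape is rejected (ok == false).
struct ToyName {
    std::vector<std::string> scopes;  // e.g. {"Thing"} or {"Thing", "foo"}
    std::string special;              // "C1", "C2", ... or empty for ordinary functions
    std::string params;               // remaining codes after E, e.g. "i" or "v"
    bool ok = false;
};

// Meaning of each <ctor-dtor-name> code, per the grammar quoted above.
std::string describe_special(const std::string& code) {
    if (code == "C1") return "complete object constructor";
    if (code == "C2") return "base object constructor";
    if (code == "C3") return "complete object allocating constructor";
    if (code == "D0") return "deleting destructor";
    if (code == "D1") return "complete object destructor";
    if (code == "D2") return "base object destructor";
    return "";
}

ToyName parse_toy(const std::string& m) {
    ToyName out;
    if (m.rfind("_ZN", 0) != 0) return out;  // must start with _Z + N (nested name)
    std::size_t i = 3;
    while (i < m.size() && m[i] != 'E') {
        if (std::isdigit(static_cast<unsigned char>(m[i]))) {
            // <source-name> ::= <length> <identifier>
            std::size_t len = 0;
            while (i < m.size() && std::isdigit(static_cast<unsigned char>(m[i])))
                len = len * 10 + static_cast<std::size_t>(m[i++] - '0');
            if (i + len > m.size()) return out;
            out.scopes.push_back(m.substr(i, len));
            i += len;
        } else if ((m[i] == 'C' || m[i] == 'D') && i + 1 < m.size()) {
            out.special = m.substr(i, 2);  // e.g. "C1"
            if (describe_special(out.special).empty()) return out;
            i += 2;
        } else {
            return out;  // something this toy does not understand
        }
    }
    if (i >= m.size() || out.scopes.empty()) return out;  // no closing E, or no name
    out.params = m.substr(i + 1);
    out.ok = true;
    return out;
}
```

For example, parse_toy("_ZN5ThingC2Ev") yields scopes {"Thing"}, special "C2" and params "v". describe_special then maps "C2" to "base object constructor".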
Wait, why is this simple? This class has no base. Why does it have a "complete object constructor" and a "base object constructor" for each?
This Q&A implies to me that this is simply a by-product of polymorphism support (more precisely, of virtual inheritance, as explained below), even though it's not actually required in this case.
Note that c++filt used to include this information in its demangled output, but doesn't any more.
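The runtime's own demangler shows the same loss of information. abi::__cxa_demangle from <cxxabi.h> (shipped with libstdc++ and libc++abi) renders C1 and C2 identically, which is why nm -C prints what look like duplicate Thing::Thing entries. Below is a minimal wrapper, assuming a GCC- or Clang-compatible toolchain. The helper name demangle is ours.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI symbol using the C++ runtime's demangler.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // nullptr buffer + nullptr length: the demangler malloc()s a fresh buffer.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // buffer came from malloc()
    return result;
}
```

Both _ZN5ThingC1Ei and _ZN5ThingC2Ei come back as the same readable string. Only the raw names (nm without -C) reveal which variant is which.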
This forum post asks the same question, and the only response doesn't do any better at answering it, except for the implication that GCC could avoid emitting two constructors when polymorphism is not involved, and that this behaviour ought to be improved in the future.
This newsgroup posting describes a problem with setting breakpoints in constructors due to this dual-emission. It's stated again that the root of the issue is support for polymorphism.
In fact, this is listed as a GCC "known issue":
G++ emits two copies of constructors and destructors.
In general there are three types of constructors (and
destructors).
The complete object constructor/destructor.
The base object constructor/destructor.
The allocating constructor/deallocating destructor.
The first two are different, when virtual base classes are
involved.
The meaning of these different constructors seems to be as follows:
The "complete object constructor". It additionally constructs virtual base classes.
The "base object constructor". It creates the object itself, as well as data members and non-virtual base classes.
The "complete object allocating constructor". It does everything the complete object constructor does, plus it calls operator new to actually allocate the memory... but apparently this is not usually emitted.
If you have no virtual base classes, [the first two] are
identical; GCC will, at sufficient optimization levels, actually alias
the symbols to the same code for both.
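Here is a small example in which the first two constructors genuinely differ, because of a virtual base. The class names are hypothetical. The point is that V is constructed exactly once, by the most-derived class, and the initializers that A and B give for V are ignored when A and B are themselves base subobjects. Under the Itanium ABI, this is the behaviour that the complete object (C1) versus base object (C2) constructor split implements. The C++ standard itself only specifies the observable behaviour.

```cpp
#include <cassert>

// Counts how many times the virtual base's constructor runs.
static int v_ctor_calls = 0;

struct V {
    int tag;
    explicit V(int t) : tag(t) { ++v_ctor_calls; }
};

// A and B each name V in their mem-initializer lists...
struct A : virtual V { A() : V(1) {} };
struct B : virtual V { B() : V(2) {} };

// ...but inside a D, only D (the most-derived class) initializes V.
// D's complete object constructor builds V, then runs the base object
// constructors of A and B, which skip V entirely.
struct D : A, B { D() : V(3) {} };
```

A standalone A ends up with tag == 1. A D ends up with tag == 3, and V's constructor still runs only once.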

Related

Why is the constructor of a class in a shared library exported twice? [duplicate]

weak symbols and custom sections in inline assembly

I'm stuck with a problem which is illustrated by the following g++ code:
frob.hpp:
template<typename T> T frob(T x);
template<> inline int frob<int>(int x) {
asm("1: nop\n"
".pushsection \"extra\",\"a\"\n"
".quad 1b\n"
".popsection\n");
return x+1;
}
foo.cpp:
#include "frob.hpp"
extern int bar();
int foo() { return frob(17); }
int main() { return foo() + bar(); }
bar.cpp:
#include "frob.hpp"
int bar() { return frob(42); }
I'm doing these quirky custom-section things as a way to mimic a mechanism used in the Linux kernel (but in userland, and in C++).
My problem is that the instantiation of frob<int> is recognized as a weak symbol, which is fine, and one of the two copies is eventually discarded by the linker, which is fine too. However, the linker does not take into account that the extra section references that symbol (via .quad 1b), and it tries to resolve those references locally. I get:
localhost /tmp $ g++ -O3 foo.cpp bar.cpp
localhost /tmp $ g++ -O0 foo.cpp bar.cpp
`.text._Z4frobIiET_S0_' referenced in section `extra' of /tmp/ccr5s7Zg.o: defined in discarded section `.text._Z4frobIiET_S0_[_Z4frobIiET_S0_]' of /tmp/ccr5s7Zg.o
collect2: error: ld returned 1 exit status
(-O3 is fine because frob<int> is fully inlined, so no out-of-line copy, and hence no symbol, is emitted at all.)
I don't know how to work around this.
Would there be a way to tell the linker to also pay attention to symbol resolution in the extra section?
Perhaps one could trade the local labels for .weak global labels? For example:
asm(".weak exception_handler_%=\n"
"exception_handler_%=: nop\n"
".pushsection \"extra\",\"a\"\n"
".quad exception_handler_%=\n"
".popsection\n"::);
However, I fear that if I go this way, distinct asm statements in distinct compilation units may end up with the same symbol name through this mechanism (could they?).
Is there a way around that I've overlooked ?
g++ (5 and 6, at least) compiles an inline function with external linkage, such as
template<> inline int frob<int>(int x), as a weak global
symbol in a COMDAT function section in
its own section group. See:
g++ -S -O0 bar.cpp
bar.s
.file "bar.cpp"
.section .text._Z4frobIiET_S0_,"axG",@progbits,_Z4frobIiET_S0_,comdat
.weak _Z4frobIiET_S0_
.type _Z4frobIiET_S0_, @function
_Z4frobIiET_S0_:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
#APP
# 8 "frob.hpp" 1
1: nop
.pushsection "extra","a"
.quad 1b
.popsection
# 0 "" 2
#NO_APP
movl -4(%rbp), %eax
addl $1, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
...
...
The relevant directives are:
.section .text._Z4frobIiET_S0_,"axG",@progbits,_Z4frobIiET_S0_,comdat
.weak _Z4frobIiET_S0_
(The compiler-generated #APP and #NO_APP delimit your inline assembly).
Do as the compiler does by making extra likewise a COMDAT section in
a section-group:
frob.hpp (fixed)
template<typename T> T frob(T x);
template<> inline int frob<int>(int x) {
asm("1: nop\n"
".pushsection \"extra\", \"axG\", @progbits,extra,comdat" "\n"
".quad 1b\n"
".popsection\n");
return x+1;
}
and the linkage error will be cured:
$ g++ -O0 foo.cpp bar.cpp
$ ./a.out; echo $?
61
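The same idea can be checked in a single translation unit by walking the extra table at run time. This is a sketch under several assumptions:

- an x86-64 ELF target linked with GNU ld or lld, which define __start_extra and __stop_extra for a section whose name is a valid C identifier;
- 64-bit pointers, to match .quad;
- the hypothetical names frob_int and frob_int_extra.

It departs from the answer's directive in two deliberate ways. First, the section is writable data ("awG") rather than executable, so a position-independent executable can relocate the absolute addresses at load time. Second, the group signature is specific to this function, so unrelated inline functions using the same trick are not collapsed into one group named extra by COMDAT deduplication.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Linker-defined bounds of the "extra" section (GNU ld / lld provide these
// when they are referenced and the section name is a valid C identifier).
extern "C" const std::uintptr_t __start_extra[];
extern "C" const std::uintptr_t __stop_extra[];

// Hypothetical stand-in for frob<int>: each asm instance records the address
// of its "1:" label in the extra table. The extra section joins the COMDAT
// group "frob_int_extra", so duplicate copies from other object files would
// be discarded as a unit, like the function's own COMDAT section.
inline int frob_int(int x) {
    asm("1: nop\n"
        ".pushsection extra,\"awG\",@progbits,frob_int_extra,comdat\n"
        ".quad 1b\n"
        ".popsection\n");
    return x + 1;
}

// Number of 8-byte entries in the table (the usual linker-set idiom).
std::size_t extra_entries() {
    return static_cast<std::size_t>(__stop_extra - __start_extra);
}
```

In a real multi-file build, each object's extra group would follow the same keep-first rule as the function's group. It is worth checking the final map with readelf -g if inlining differs between translation units.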

Convert C++ to C [duplicate]

This question already has answers here:
How to convert C++ Code to C [closed]
(6 answers)
Closed 6 years ago.
Suppose I were to take a C++ program and compile it into an assembly (.s) file.
Then I take that assembly file and "disassemble" it into C. Would that code be recompilable on a different platform?
The reason I ask is that the platform I am trying to develop on does not have a C++ compiler, but it does have a C compiler.
Yes, it's indeed possible in the way you describe. No, it won't be portable to any CPU architecture, OS, and compiler triplet other than yours.
Let's see why. Take some basic C++ code...
#include <iostream>
int main()
{
std::cout << "Hello, world!\n";
return 0;
}
Let's turn this into assembly using g++ on an x86-64 Linux box (I turned optimizations on and left out debug symbols on purpose)...
$ g++ -o test.s -O3 -S test.cpp
And the result is...
.file "test.cpp"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Hello, world!\n"
.section .text.unlikely,"ax",@progbits
.LCOLDB1:
.section .text.startup,"ax",@progbits
.LHOTB1:
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB1027:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $.LC0, %esi
movl $_ZSt4cout, %edi
call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
xorl %eax, %eax
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE1027:
.size main, .-main
.section .text.unlikely
.LCOLDE1:
.section .text.startup
.LHOTE1:
.section .text.unlikely
.LCOLDB2:
.section .text.startup
.LHOTB2:
.p2align 4,,15
.type _GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB1032:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
movl $_ZStL8__ioinit, %edi
call _ZNSt8ios_base4InitC1Ev
movl $__dso_handle, %edx
movl $_ZStL8__ioinit, %esi
movl $_ZNSt8ios_base4InitD1Ev, %edi
addq $8, %rsp
.cfi_def_cfa_offset 8
jmp __cxa_atexit
.cfi_endproc
.LFE1032:
.size _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
.section .text.unlikely
.LCOLDE2:
.section .text.startup
.LHOTE2:
.section .init_array,"aw"
.align 8
.quad _GLOBAL__sub_I_main
.local _ZStL8__ioinit
.comm _ZStL8__ioinit,1,1
.hidden __dso_handle
.ident "GCC: (GNU) 5.3.1 20151207 (Red Hat 5.3.1-2)"
.section .note.GNU-stack,"",@progbits
That mess is the price we pay for exception handling, templates, and namespaces. Let's disassemble this into C by hand, discarding the exception handling tables for a cleaner view...
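/* std::ostream; its real (libstdc++-specific) layout is left out */
typedef struct
{
    /* ... */
} _ZSo;
/* std::ostream& std::operator<<(std::ostream&, const char*); references become pointers in C */
extern _ZSo *_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc(_ZSo *, const char *);
/* namespace std { extern ostream cout; } */
extern _ZSo _ZSt4cout;
/* Our string, of course! */
static const char *LC0 = "Hello, world!\n";
/* Not shown: _GLOBAL__sub_I_main, which constructs the std::ios_base::Init object
   (_ZNSt8ios_base4InitC1Ev) before main() runs */
int main(void)
{
    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc(&_ZSt4cout, LC0);
    return 0;
}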
Seems unreadable, yet portable, right? For now, it is. But this code won't work! You still need to define _ZSo (std::ostream) in terms of a C struct, not to mention all the exception handling machinery.
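Defining types in terms of C structs mostly means making the implicit this explicit. Below is a minimal sketch with a hypothetical Counter class. The C++ member functions and their hand-lowered, C-style counterparts (a plain struct plus free functions taking a pointer) compute the same thing. Real compiler output would use mangled names such as _ZN7Counter3addEi rather than the readable names chosen here.

```cpp
#include <cassert>

// C++ version: member functions receive `this` implicitly
// (in %rdi under the x86-64 System V ABI).
class Counter {
public:
    void add(int n) { total_ += n; }
    int total() const { return total_; }
private:
    int total_ = 0;
};

// Hand-lowered, C-style equivalent: the class becomes a plain struct and each
// member function becomes a free function taking an explicit pointer to it.
struct Counter_c { int total_; };
void Counter_c_add(struct Counter_c* self, int n) { self->total_ += n; }
int Counter_c_total(const struct Counter_c* self) { return self->total_; }
```

The lowered half uses only constructs that are also valid C. Access control, constructors and destructors have to be reproduced by convention, which is where most of the real translation effort goes.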
Things get nearly impossible to make portable once you start using try/throw/catch. There's absolutely no way that __cxa_allocate_exception or __cxa_throw will ever be portable! Your only way around this is to rewrite your whole program (and that includes every library you use, even the standard library!) to use the slower setjmp/longjmp approach to exceptions instead of zero-cost exception handling.
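To give a feel for the setjmp/longjmp alternative, here is a hand-rolled sketch in which a "throw" is a longjmp to the innermost saved jmp_buf. It only illustrates the idea and is not the compiler's own SJLJ exception model. The names current_handler, checked_divide and safe_divide are hypothetical. In C++ this is only safe because no frame skipped by longjmp owns an object with a non-trivial destructor; otherwise the behaviour is undefined.

```cpp
#include <cassert>
#include <csetjmp>

// Innermost active "try" block (hypothetical design, single-threaded only).
static std::jmp_buf* current_handler = nullptr;

// "Throws" error code 1 on division by zero. Must only be called while a
// handler is installed (here, from inside safe_divide).
int checked_divide(int a, int b) {
    if (b == 0) std::longjmp(*current_handler, 1);
    return a / b;
}

// Returns the quotient, or -1 if the "exception" fired.
int safe_divide(int a, int b) {
    std::jmp_buf env;
    std::jmp_buf* saved = current_handler;  // support nesting
    current_handler = &env;
    int result;
    if (setjmp(env) == 0) {
        result = checked_divide(a, b);  // normal path
    } else {
        result = -1;                    // "catch" path, reached via longjmp
    }
    current_handler = saved;            // leave the "try" block
    return result;
}
```

Every call site pays for saving the register context in setjmp even when nothing goes wrong. That per-call cost is why this style is slower than zero-cost, table-driven unwinding.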
Last but not least, an automated disassembler will most likely fail to do this properly, even on the simplest inputs. Remember that at the lowest levels (a.k.a. machine code and assembly language) there's no concept of types. For instance, you can never know whether a register holds a signed integer, an unsigned integer, or basically anything else. The compiler can also manipulate the stack pointer at will, making the job even harder.
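Here is a small illustration of the "no types at the machine level" point: the same 32-bit pattern, copied into differently typed objects, yields unrelated values. It assumes a 32-bit IEEE-754 float, which the static_assert checks for size only. Views and reinterpret_bits are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

// Three readings of one bit pattern, as a disassembler might have to guess.
struct Views {
    std::int32_t as_signed;
    std::uint32_t as_unsigned;
    float as_float;
};

Views reinterpret_bits(std::uint32_t bits) {
    Views v;
    // memcpy is the well-defined way to reinterpret object bytes in C++.
    std::memcpy(&v.as_signed, &bits, sizeof bits);
    std::memcpy(&v.as_unsigned, &bits, sizeof bits);
    std::memcpy(&v.as_float, &bits, sizeof bits);
    return v;
}
```

0xFFFFFFFF reads as either -1 or 4294967295. 0x3F800000 is both the integer 1065353216 and the float 1.0f. Looking only at a register holding either value, nothing says which one the programmer meant.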
Once compilation is over, the compiler, not expecting any future disassembly, wipes out all this precious information, because you normally don't need it at run time. In most cases, disassembling is really not worth it, if it's even possible, and most likely you're looking for a different solution. Translating from a higher-middle-level language (C++) to a low-level language (assembly) and then to a lower-middle-level language (C) takes this to an extreme, and approaches the limits of what can be translated into what.

copy constructor & assignment operator [duplicate]

Dual emission of constructor symbols