Compile-time or Run-time evaluation of expression in constructor - c++

I have the following class:
template<ItType I, LockType L>
class ArcItBase;
with a (one of them) constructor:
ArcItBase ( StableRootedDigraph& g_, Node const n_ ) noexcept :
srd ( g_ ),
arc ( I == ItType::in
? srd.nodes [ n_ ].head_in
: srd.nodes [ n_ ].head_out ) { }
The question (which I don't see how to test) is whether the value of the expression initializing arc will be determined at compile time or at run time (Release build, full optimization, clang-cl and VC14), given that I == ItType::in is known at compile time (I is either ItType::in or ItType::out) and therefore evaluates to either true or false.

It is not possible to have your code compiling without knowing the ItType at compile time.
The template parameter is evaluated at compile time and the conditional is a core constant expression, standard reference is C++11 5.19/2.
For I == ItType::in, for example, the compiler effectively has to generate code equivalent to
arc(true ? srd.nodes [ n_ ].head_in : srd.nodes [ n_ ].head_out)
which, if you actually wrote it, would be optimized down to the selected branch. However, the rest of the conditional will not be evaluated at compile time, since you are accessing what seems to be a non-static member, which cannot be evaluated as a core constant expression.
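To check that the condition itself is a constant expression, here is a minimal sketch. It assumes a stand-in ItType enum and a hypothetical NodeData struct in place of the question's StableRootedDigraph. `I == ItType::in` can be used in static_assert and as a template argument, which would not compile if it needed run-time evaluation. The member reads in the operands still depend on the object passed in at run time.

```cpp
#include <cassert>
#include <type_traits>

enum class ItType { in, out };

// Hypothetical probe: if `I == ItType::in` were not a constant expression,
// it could not be used as a template argument or in static_assert.
template <ItType I>
struct is_in_probe : std::integral_constant<bool, I == ItType::in> {};

static_assert(is_in_probe<ItType::in>::value, "condition folds to true");
static_assert(!is_in_probe<ItType::out>::value, "condition folds to false");

// Hypothetical stand-in for srd.nodes[n_]: its contents are only known at run time.
struct NodeData { int head_in; int head_out; };

template <ItType I>
int head(const NodeData& n) {
    // The condition is fixed per instantiation; the member loads are not.
    return I == ItType::in ? n.head_in : n.head_out;
}
```

With optimization enabled, each instantiation of head is expected to reduce to a single load of the chosen member. That is the same conclusion the answer draws for ArcItBase.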
However, compilers do not always behave as we expect, so if you actually want to test this you should dump the disassembled object file
objdump -DS file.o
and then you can better navigate the output.
Another option is to launch the debugger and inspect the code.
Don't forget that you can keep debug symbols even when optimizing, e.g.
g++ -O3 -g -c foo.cpp
Below you will find a toy implementation. In the first case the constructor of arcbase is called with constant values:
arcbase<true> a(10,9);
whereas in the second it is given non-const random values that cannot be known at compile time.
After compiling with g++ -std=c++11 -c -O3 -g, the first case produces:
Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:
0000000000000000 <arcbase<true>::arcbase(int, int)>:
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
c: 89 75 f4 mov %esi,-0xc(%rbp)
f: 89 55 f0 mov %edx,-0x10(%rbp)
12: 48 8b 45 f8 mov -0x8(%rbp),%rax
16: 8b 55 f0 mov -0x10(%rbp),%edx
19: 8b 4d f4 mov -0xc(%rbp),%ecx
1c: 89 ce mov %ecx,%esi
1e: 48 89 c7 mov %rax,%rdi
21: e8 00 00 00 00 callq 26 <arcbase<true>::arcbase(int, int)+0x26>
26: 48 8b 45 f8 mov -0x8(%rbp),%rax
2a: 8b 00 mov (%rax),%eax
2c: 48 8b 55 f8 mov -0x8(%rbp),%rdx
30: 48 83 c2 08 add $0x8,%rdx
34: 89 c6 mov %eax,%esi
36: 48 89 d7 mov %rdx,%rdi
39: e8 00 00 00 00 callq 3e <arcbase<true>::arcbase(int, int)+0x3e>
3e: c9 leaveq
3f: c3 retq
Whereas the second case:
Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:
0000000000000000 <arcbase<true>::arcbase(int, int)>:
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
0: 53 push %rbx
1: 48 89 fb mov %rdi,%rbx
4: e8 00 00 00 00 callq 9 <arcbase<true>::arcbase(int, int)+0x9>
9: 48 8d 7b 08 lea 0x8(%rbx),%rdi
d: 8b 33 mov (%rbx),%esi
f: 5b pop %rbx
10: e9 00 00 00 00 jmpq 15 <arcbase<true>::arcbase(int, int)+0x15>
Looking at the disassembly, you should notice that even in the first case the value 10 is not passed directly to the constructor as a constant; it is only placed in a register from which it is later retrieved.
Here is the output from gdb:
0x400910 <_ZN3arcC2Ei> mov %esi,(%rdi)
0x400912 <_ZN3arcC2Ei+2> retq
0x400913 nop
0x400914 nop
0x400915 nop
0x400916 nop
0x400917 nop
0x400918 nop
0x400919 nop
0x40091a nop
0x40091b nop
0x40091c nop
0x40091d nop
0x40091e nop
0x40091f nop
0x400920 <_ZN7arcbaseILb1EEC2Eii> push %rbx
0x400921 <_ZN7arcbaseILb1EEC2Eii+1> mov %rdi,%rbx
0x400924 <_ZN7arcbaseILb1EEC2Eii+4> callq 0x400900 <_ZN3srdC2Eii>
0x400929 <_ZN7arcbaseILb1EEC2Eii+9> lea 0x8(%rbx),%rdi
0x40092d <_ZN7arcbaseILb1EEC2Eii+13> mov (%rbx),%esi
0x40092f <_ZN7arcbaseILb1EEC2Eii+15> pop %rbx
0x400930 <_ZN7arcbaseILb1EEC2Eii+16> jmpq 0x400910 <_ZN3arcC2Ei>
The code for the second case is:
#include <cstdlib>
#include <ctime>
#include <iostream>
struct llist
{
int head_in;
int head_out;
llist(int a , int b ) : head_in(a), head_out(b) {}
};
struct srd
{
llist nodes;
srd(int a, int b) : nodes(a,b) {}
};
struct arc
{
int y;
arc( int x):y(x) {}
};
template< bool I > class arcbase
{
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
void print()
{
std::cout << iarc.y << std::endl;
}
};
int main(void)
{
std::srand(std::time(0));
volatile int a_ = std::rand()%100;
volatile int b_ = std::rand()%4;
arcbase<true> a(a_,b_);
a.print();
return 0;
}
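The following variant makes the split explicit. It is a hedged sketch: the Heads struct and the head_member/select_head helpers are hypothetical and not part of the original code. The choice of member is computed as a pointer-to-member in a constexpr function, and static_assert confirms that this choice is a compile-time constant. Only the load through that pointer happens at run time, because the object's contents are not known until then.

```cpp
#include <cassert>

// Hypothetical reduced version of llist (aggregate, no constructor).
struct Heads { int head_in; int head_out; };

using head_ptr = int Heads::*;

// Selecting the member is a constant expression for each value of In.
template <bool In>
constexpr head_ptr head_member() {
    return In ? &Heads::head_in : &Heads::head_out;
}

static_assert(head_member<true>() == &Heads::head_in, "true selects head_in");
static_assert(head_member<false>() == &Heads::head_out, "false selects head_out");

// Only this read depends on run-time data.
template <bool In>
int select_head(const Heads& h) {
    return h.*head_member<In>();
}
```

In practice this usually compiles to the same single load as the `I == true ? ... : ...` form. Its advantage is that static_assert documents which part is decided at compile time.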

Related

C++ assembly code analysis (compiled with clang)

I am trying to figure out what C++ binary code looks like, especially for virtual function calls. I have come across a few curious things. I have the following C++ code:
#include <iostream>
using namespace std;
class Base {
public:
virtual void print() { cout << "from base" << endl; }
};
class Derived : public Base {
public:
virtual void print() { cout << "from derived" << endl; }
};
int main() {
Base *b;
Derived d;
d.print();
b = &d;
b->print();
return 0;
}
I compiled it with clang++, and then use objdump:
00000000004008b0 <main>:
4008b0: 55 push rbp
4008b1: 48 89 e5 mov rbp,rsp
4008b4: 48 83 ec 20 sub rsp,0x20
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008bc: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
4008c3: e8 28 00 00 00 call 4008f0 <Derived::Derived()>
4008c8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008cc: e8 5f 00 00 00 call 400930 <Derived::print()>
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d5: 48 89 7d f0 mov QWORD PTR [rbp-0x10],rdi
4008d9: 48 8b 7d f0 mov rdi,QWORD PTR [rbp-0x10]
4008dd: 48 8b 07 mov rax,QWORD PTR [rdi]
4008e0: ff 10 call QWORD PTR [rax]
4008e2: 31 c0 xor eax,eax
4008e4: 48 83 c4 20 add rsp,0x20
4008e8: 5d pop rbp
4008e9: c3 ret
4008ea: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0]
My question is why the assembly contains the following instruction twice:
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
The local variable d in main() is stored at location [rbp-0x18]. This is in the automatic storage allocated on the stack for main().
lea rdi,[rbp-0x18]
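This line loads the address of d into the rdi register. Under the x86-64 System V calling convention the first argument, here the implicit this pointer, is passed in rdi, so this sets up this for the member function call that follows. The instruction is repeated because rdi is a caller-saved register: after a call returns, the compiler cannot assume rdi still holds &d, so it recomputes it before the next use (and at -O0 clang recomputes it for every use anyway).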
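The two call patterns in main can be modelled by hand. This is a conceptual sketch with hypothetical names (Object, VTable, dispatch_print). The real layout is specified by the Itanium C++ ABI, not by this code. The direct call `d.print()` needs no lookup, while `b->print()` loads the vtable pointer from the object (`mov rax,[rdi]`) and calls through its first slot (`call [rax]`).

```cpp
#include <cassert>
#include <string>

struct Object;
// Each "virtual function" receives the object address, i.e. this (passed in rdi).
using PrintFn = std::string (*)(const Object*);

struct VTable { PrintFn print; };       // slot 0, what `call [rax]` reads
struct Object { const VTable* vptr; };  // vptr at offset 0, what `mov rax,[rdi]` reads

std::string base_print(const Object*)    { return "from base"; }
std::string derived_print(const Object*) { return "from derived"; }

const VTable base_vtable{base_print};
const VTable derived_vtable{derived_print};

// Model of `b->print()`: load the vptr, then call through slot 0 with self as this.
std::string dispatch_print(const Object* self) {
    const VTable* vt = self->vptr;      // mov rax, QWORD PTR [rdi]
    return vt->print(self);             // call QWORD PTR [rax]
}
```

Returning a string instead of printing keeps the model easy to check. The dispatch steps are the same either way.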

Why does GAS inline assembly wrapped in a function generate different instructions for the caller than a pure assembly function

I've been writing some basic functions using GCC's asm to practice for an actual application.
My functions pretty, wrap, and pure generate the same instructions to unpack a 64-bit integer into a 128-bit vector. add1 and add2, which call pretty and wrap respectively, also generate the same instructions. But add3 differs: it saves its xmm0 register by storing it on the stack rather than by copying it to another xmm register. I don't understand this, because the compiler can see the details of pure and should know that none of the other xmm registers will be clobbered.
Here is the C++
#include <immintrin.h>
__m128i pretty(long long b) { return (__m128i){b,b}; }
__m128i wrap(long long b) {
asm ("mov qword ptr [rsp-0x10], rdi\n"
"vmovddup xmm0, qword ptr [rsp-0x10]\n"
:
: "r"(b)
);
}
extern "C" __m128i pure(long long b);
asm (".text\n.global pure\n\t.type pure, #function\n"
"pure:\n\t"
"mov qword ptr [rsp-0x10], rdi\n\t"
"vmovddup xmm0, qword ptr [rsp-0x10]\n\t"
"ret\n\t"
);
__m128i add1(__m128i in, long long in2) { return in + pretty(in2);}
__m128i add2(__m128i in, long long in2) { return in + wrap(in2);}
__m128i add3(__m128i in, long long in2) { return in + pure(in2);}
Compiled with g++ -c so.cpp -march=native -masm=intel -O3 -fno-inline and disassembled with objdump -d -M intel so.o | c++filt.
so.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <pure>:
0: 48 89 7c 24 f0 mov QWORD PTR [rsp-0x10],rdi
5: c5 fb 12 44 24 f0 vmovddup xmm0,QWORD PTR [rsp-0x10]
b: c3 ret
c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
0000000000000010 <pretty(long long)>:
10: 48 89 7c 24 f0 mov QWORD PTR [rsp-0x10],rdi
15: c5 fb 12 44 24 f0 vmovddup xmm0,QWORD PTR [rsp-0x10]
1b: c3 ret
1c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
0000000000000020 <wrap(long long)>:
20: 48 89 7c 24 f0 mov QWORD PTR [rsp-0x10],rdi
25: c5 fb 12 44 24 f0 vmovddup xmm0,QWORD PTR [rsp-0x10]
2b: c3 ret
2c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
0000000000000030 <add1(long long __vector(2), long long)>:
30: c5 f8 28 c8 vmovaps xmm1,xmm0
34: 48 83 ec 08 sub rsp,0x8
38: e8 00 00 00 00 call 3d <add1(long long __vector(2), long long)+0xd>
3d: 48 83 c4 08 add rsp,0x8
41: c5 f9 d4 c1 vpaddq xmm0,xmm0,xmm1
45: c3 ret
46: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
4d: 00 00 00
0000000000000050 <add2(long long __vector(2), long long)>:
50: c5 f8 28 c8 vmovaps xmm1,xmm0
54: 48 83 ec 08 sub rsp,0x8
58: e8 00 00 00 00 call 5d <add2(long long __vector(2), long long)+0xd>
5d: 48 83 c4 08 add rsp,0x8
61: c5 f9 d4 c1 vpaddq xmm0,xmm0,xmm1
65: c3 ret
66: 66 2e 0f 1f 84 00 00 nop WORD PTR cs:[rax+rax*1+0x0]
6d: 00 00 00
0000000000000070 <add3(long long __vector(2), long long)>:
70: 48 83 ec 18 sub rsp,0x18
74: c5 f8 29 04 24 vmovaps XMMWORD PTR [rsp],xmm0
79: e8 00 00 00 00 call 7e <add3(long long __vector(2), long long)+0xe>
7e: c5 f9 d4 04 24 vpaddq xmm0,xmm0,XMMWORD PTR [rsp]
83: 48 83 c4 18 add rsp,0x18
87: c3 ret
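GCC does not parse or analyse the assembly text you give it.
Since pure is an external function (its body is only assembly text), GCC cannot determine which registers it alters, so according to the ABI it has to assume that all the xmm registers, which are all call-clobbered, are changed. For pretty and wrap, which GCC compiled itself in the same translation unit, it can see which registers the callee actually uses (-fipa-ra, enabled at -O2 and above), so it keeps the value in xmm1 instead of spilling it.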
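wrap has undefined behaviour: the asm statement clobbers xmm0 and [rsp-0x10], which are not listed as clobbers or outputs (with a value which may or may not depend on b), the template hard-codes rdi although the "r"(b) constraint lets GCC place b in any general-purpose register, and the function has no return statement.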
Edit: The ABI does not apply to inline assembly; I expect your program will not work if you remove -fno-inline from the command line.
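For comparison, here is a sketch of wrap with its effects declared. It is x86-64 only, and broadcast_asm, broadcast_intrinsic and store_lanes are illustrative names. The intrinsic version avoids inline asm entirely and is usually the better choice. The asm version shows the minimum GCC needs to be told: an output operand for the result, and a template that refers to the input by operand number. It uses SSE2 movq/punpcklqdq instead of the question's vmovddup so that it does not need to go through memory below rsp.

```cpp
#include <cassert>
#include <immintrin.h>

// Idiomatic alternative: no inline asm; the compiler chooses the broadcast sequence.
inline __m128i broadcast_intrinsic(long long b) {
    return _mm_set1_epi64x(b);
}

// Extended-asm sketch where every effect is declared (x86-64, SSE2 only):
//  - the result leaves through an output operand ("=x": any of xmm0-xmm15)
//    instead of being written to a hard-coded xmm0,
//  - the input is referenced as %1 rather than assuming it is in rdi,
//  - no memory is touched, so no memory clobber is needed.
// The {att|intel} alternatives let it assemble with or without -masm=intel.
inline __m128i broadcast_asm(long long b) {
    __m128i r;
    asm("movq {%1, %0|%0, %1}\n\t"
        "punpcklqdq {%0, %0|%0, %0}"
        : "=x"(r)
        : "r"(b));
    return r;
}

// Helper for inspecting results: copies both 64-bit lanes out.
inline void store_lanes(__m128i v, long long out[2]) {
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}
```

Because the asm has a declared output and no hidden side effects, GCC is free to inline it and allocate registers around it. The original wrap only appeared to work because -fno-inline kept it behind a real call.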

No initializer list vs. initializer list with empty pairs of parentheses

This is copied from the topic "Initializing fields in constructor - initializer list vs constructor body".
The author explains the following equivalence:
public : Thing(int _foo, int _bar){
member1 = _foo;
member2 = _bar;
}
is equivalent to
public : Thing(int _foo, int _bar) : member1(), member2(){
member1 = _foo;
member2 = _bar;
}
My understanding was that
snippet 1 is a case of default-initialization (because of the absence of an initializer list)
snippet 2 is a case of value-initialization (empty pairs of parentheses).
How are these two equivalent?
Your understanding is correct (assuming member1 and member2 have type int). The two forms are not equivalent: in the first, the members are not initialized at all and cannot be used until they have been assigned; in the second, the members are initialized to 0. The two formulations are only equivalent if the members are class types with user-defined constructors.
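The following sketch makes the difference observable, using a hypothetical instrumented Counted type. For a class-type member both forms run the same default constructor. For an int member, only the empty-parentheses form guarantees a zero before the constructor body runs.

```cpp
#include <cassert>

// Hypothetical instrumented class type: counts default-constructor calls.
struct Counted {
    static int ctor_calls;
    int v;
    Counted() : v(0) { ++ctor_calls; }
    Counted& operator=(int x) { v = x; return *this; }
};
int Counted::ctor_calls = 0;

// Snippet 1 style: no mem-initializers. c is default-initialized (its
// constructor still runs); i is default-initialized, i.e. indeterminate
// until the assignment in the body.
struct NoInitList {
    Counted c;
    int i;
    NoInitList(int a, int b) { c = a; i = b; }
};

// Snippet 2 style: empty parentheses. c() calls the same constructor;
// i() value-initializes i to 0 before the body overwrites it.
struct EmptyParens {
    Counted c;
    int i;
    EmptyParens(int a, int b) : c(), i() { c = a; i = b; }
};

// The zero from value-initialization is only visible when nothing overwrites it.
struct ValueInitOnly {
    int i;
    ValueInitOnly() : i() {}
};
```

Reading `i` in NoInitList before the assignment would be undefined behaviour, which is why that case is not checked directly.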
You are right but the author is kind of right too!
Your interpretation is completely correct as are the answers given by others. In summary the two snippets are equivalent if member1 and member2 are non-POD types.
For certain POD types they are also equivalent in some sense. Let's simplify a little more and assume member1 and member2 have type int. Then, under the as-if rule, the compiler is allowed to replace the second snippet with the first one. Indeed, in the second snippet the fact that member1 is first initialized to 0 is not observable; only its assignment to _foo is. This is the same reasoning that allows the compiler to replace these two lines
int x = 0;
x = 1;
with this one
int x = 1;
For instance, I've compiled this code
struct Thing {
int member1, member2;
__attribute__ ((noinline)) Thing(int _foo, int _bar)
: member1(), member2() // initialization line
{
member1 = _foo;
member2 = _bar;
}
};
Thing dummy(255, 256);
with GCC 4.8.1 using option -O1. (The __attribute__((noinline)) prevents the compiler from inlining the function.) The generated assembly code is then the same regardless of whether the initialization line is present or not:
-O1 with or without initialization
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: 89 01 mov %eax,(%ecx)
6: 8b 44 24 08 mov 0x8(%esp),%eax
a: 89 41 04 mov %eax,0x4(%ecx)
d: c2 08 00 ret $0x8
On the other hand, when compiled with -O0 the assembly code is different depending on whether the initialization line is present or not:
-O0 without initialization
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 04 sub $0x4,%esp
6: 89 4d fc mov %ecx,-0x4(%ebp)
9: 8b 45 fc mov -0x4(%ebp),%eax
c: 8b 55 08 mov 0x8(%ebp),%edx
f: 89 10 mov %edx,(%eax)
11: 8b 45 fc mov -0x4(%ebp),%eax
14: 8b 55 0c mov 0xc(%ebp),%edx
17: 89 50 04 mov %edx,0x4(%eax)
1a: c9 leave
1b: c2 08 00 ret $0x8
1e: 90 nop
1f: 90 nop
-O0 with initialization
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 04 sub $0x4,%esp
6: 89 4d fc mov %ecx,-0x4(%ebp)
9: 8b 45 fc mov -0x4(%ebp),%eax ; extra line #1
c: c7 00 00 00 00 00 movl $0x0,(%eax) ; extra line #2
12: 8b 45 fc mov -0x4(%ebp),%eax ; extra line #3
15: c7 40 04 00 00 00 00 movl $0x0,0x4(%eax) ; extra line #4
1c: 8b 45 fc mov -0x4(%ebp),%eax
1f: 8b 55 08 mov 0x8(%ebp),%edx
22: 89 10 mov %edx,(%eax)
24: 8b 45 fc mov -0x4(%ebp),%eax
27: 8b 55 0c mov 0xc(%ebp),%edx
2a: 89 50 04 mov %edx,0x4(%eax)
2d: c9 leave
2e: c2 08 00 ret $0x8
31: 90 nop
32: 90 nop
33: 90 nop
Notice that -O0 with initialization has four more lines (marked above) than -O0 without initialization. These extra lines initialize the two members to zero.

G++ 4.6 -std=gnu++0x: Static Local Variable Constructor Call Timing and Thread Safety

void a() { ... }
void b() { ... }
struct X
{
X() { b(); }
};
void f()
{
a();
static X x;
...
}
Assume f is called multiple times from various threads (potentially contended) after main has been entered, and that the only calls to a and b are those shown above.
When the above code is compiled with gcc g++ 4.6 in -std=gnu++0x mode:
Q1. Is it guaranteed that a() will be called at least once and return before b() is called? That is to ask, on the first call to f(), is the constructor of x called at the same time an automatic duration local variable (non-static) would be (and not at global static initialization time for example)?
Q2. Is it guaranteed that b() will be called exactly once? Even if two threads execute f for the first time at the same time on different cores? If yes, by which specific mechanism does the GCC generated code provide synchronization? Edit: Additionally could one of the threads calling f() obtain access to x before the constructor of X returns?
Update: I am trying to compile an example and disassemble it to investigate the mechanism...
test.cpp:
struct X;
void ext1(int x);
void ext2(X& x);
void a() { ext1(1); }
void b() { ext1(2); }
struct X
{
X() { b(); }
};
void f()
{
a();
static X x;
ext2(x);
}
Then:
$ g++ -std=gnu++0x -c -o test.o ./test.cpp
$ objdump -d test.o -M intel > test.dump
test.dump:
test.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_Z1av>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: bf 01 00 00 00 mov edi,0x1
9: e8 00 00 00 00 call e <_Z1av+0xe>
e: 5d pop rbp
f: c3 ret
0000000000000010 <_Z1bv>:
10: 55 push rbp
11: 48 89 e5 mov rbp,rsp
14: bf 02 00 00 00 mov edi,0x2
19: e8 00 00 00 00 call 1e <_Z1bv+0xe>
1e: 5d pop rbp
1f: c3 ret
0000000000000020 <_Z1fv>:
20: 55 push rbp
21: 48 89 e5 mov rbp,rsp
24: 41 54 push r12
26: 53 push rbx
27: e8 00 00 00 00 call 2c <_Z1fv+0xc>
2c: b8 00 00 00 00 mov eax,0x0
31: 0f b6 00 movzx eax,BYTE PTR [rax]
34: 84 c0 test al,al
36: 75 2d jne 65 <_Z1fv+0x45>
38: bf 00 00 00 00 mov edi,0x0
3d: e8 00 00 00 00 call 42 <_Z1fv+0x22>
42: 85 c0 test eax,eax
44: 0f 95 c0 setne al
47: 84 c0 test al,al
49: 74 1a je 65 <_Z1fv+0x45>
4b: 41 bc 00 00 00 00 mov r12d,0x0
51: bf 00 00 00 00 mov edi,0x0
56: e8 00 00 00 00 call 5b <_Z1fv+0x3b>
5b: bf 00 00 00 00 mov edi,0x0
60: e8 00 00 00 00 call 65 <_Z1fv+0x45>
65: bf 00 00 00 00 mov edi,0x0
6a: e8 00 00 00 00 call 6f <_Z1fv+0x4f>
6f: 5b pop rbx
70: 41 5c pop r12
72: 5d pop rbp
73: c3 ret
74: 48 89 c3 mov rbx,rax
77: 45 84 e4 test r12b,r12b
7a: 75 0a jne 86 <_Z1fv+0x66>
7c: bf 00 00 00 00 mov edi,0x0
81: e8 00 00 00 00 call 86 <_Z1fv+0x66>
86: 48 89 d8 mov rax,rbx
89: 48 89 c7 mov rdi,rax
8c: e8 00 00 00 00 call 91 <_Z1fv+0x71>
Disassembly of section .text._ZN1XC2Ev:
0000000000000000 <_ZN1XC1Ev>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 48 83 ec 10 sub rsp,0x10
8: 48 89 7d f8 mov QWORD PTR [rbp-0x8],rdi
c: e8 00 00 00 00 call 11 <_ZN1XC1Ev+0x11>
11: c9 leave
12: c3 ret
I don't see the synchronization mechanism. Or is it added at link time?
Update 2: OK, when I link it I can see it:
400973: 84 c0 test %al,%al
400975: 75 2d jne 4009a4 <_Z1fv+0x45>
400977: bf 98 20 40 00 mov $0x402098,%edi
40097c: e8 1f fe ff ff callq 4007a0 <__cxa_guard_acquire@plt>
400981: 85 c0 test %eax,%eax
400983: 0f 95 c0 setne %al
400986: 84 c0 test %al,%al
400988: 74 1a je 4009a4 <_Z1fv+0x45>
40098a: 41 bc 00 00 00 00 mov $0x0,%r12d
400990: bf a0 20 40 00 mov $0x4020a0,%edi
400995: e8 a6 00 00 00 callq 400a40 <_ZN1XC1Ev>
40099a: bf 98 20 40 00 mov $0x402098,%edi
40099f: e8 0c fe ff ff callq 4007b0 <__cxa_guard_release@plt>
4009a4: bf a0 20 40 00 mov $0x4020a0,%edi
4009a9: e8 72 ff ff ff callq 400920 <_Z4ext2R1X>
4009ae: 5b pop %rbx
4009af: 41 5c pop %r12
4009b1: 5d pop %rbp
It surrounds it with __cxa_guard_acquire and __cxa_guard_release, whatever they do.
Q1. Yes. According to C++11, 6.7/4:
such a variable is initialized the first time control passes through its declaration
so it will be initialised after the first call to a().
Q2. Under GCC, and any compiler that supports the C++11 thread model: yes, initialisation of local static variables is thread safe. Other compilers might not give that guarantee. The exact mechanism is an implementation detail. I believe GCC uses an atomic flag to indicate whether it's initialised, and a mutex to protect initialisation when the flag is not set, but I could be wrong. Certainly, this thread implies that it was originally implemented like that.
UPDATE: your code does indeed contain the initialisation code. You can see it more clearly if you link it, and then disassemble the program, so that you can see which functions are being called. I also used objdump -SC to interleave the source and demangle C++ names. It uses internal locking functions __cxa_guard_acquire and __cxa_guard_release, to make sure only one thread executes the initialisation code.
#void f()
#{
400724: push rbp
400725: mov rbp,rsp
400728: push r13
40072a: push r12
40072c: push rbx
40072d: sub rsp,0x8
# a();
400731: call 400704 <a()>
# static X x;
# if (!guard) {
400736: mov eax,0x601050
40073b: movzx eax,BYTE PTR [rax]
40073e: test al,al
400740: jne 400792 <f()+0x6e>
# if (__cxa_guard_acquire(&guard)) {
400742: mov edi,0x601050
400747: call 4005c0 <__cxa_guard_acquire@plt>
40074c: test eax,eax
40074e: setne al
400751: test al,al
400753: je 400792 <f()+0x6e>
# // initialise x
400755: mov ebx,0x0
40075a: mov edi,0x601058
40075f: call 4007b2 <X::X()>
# __cxa_guard_release(&guard);
400764: mov edi,0x601050
400769: call 4005e0 <__cxa_guard_release@plt>
# }
# }
40076e: jmp 400792 <f()+0x6e>
# // exception landing pad, reached only if X::X() throws:
# // __cxa_guard_abort(&guard), then resume unwinding
400770: mov r12d,edx
400773: mov r13,rax
400776: test bl,bl
400778: jne 400784 <f()+0x60>
40077a: mov edi,0x601050
40077f: call 4005f0 <__cxa_guard_abort@plt>
400784: mov rax,r13
400787: movsxd rdx,r12d
40078a: mov rdi,rax
40078d: call 400610 <_Unwind_Resume@plt>
# ext2(x);
400792: mov edi,0x601058
400797: call 4007d1 <_Z4ext2R1X>
#}
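The lowering described above can be approximated in portable C++. This is an illustrative sketch only, not GCC's generated code. static_x, x_ptr and x_init_mutex are hypothetical names. The real guard variable and the __cxa_guard_* functions are specified by the Itanium C++ ABI. The sketch also omits the destructor registration the compiler would emit for a type with a non-trivial destructor (X here has none). It shows the double-checked pattern: an acquire load on the fast path, a lock around the one-time construction, and publication only after the constructor returns.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

// Hypothetical instrumented stand-ins for a() and b() from the question.
int a_calls = 0;
int b_calls = 0;
bool a_ran_before_b = false;
void a() { ++a_calls; }
void b() { ++b_calls; a_ran_before_b = (a_calls > 0); }

struct X { X() { b(); } };

// Illustrative replacement for `static X x;`.
std::atomic<X*> x_ptr{nullptr};   // non-null only once x is fully constructed
std::mutex x_init_mutex;          // stands in for the lock taken by __cxa_guard_acquire
alignas(X) unsigned char x_storage[sizeof(X)];

X& static_x() {
    X* p = x_ptr.load(std::memory_order_acquire);      // fast path: already initialised?
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(x_init_mutex);
        p = x_ptr.load(std::memory_order_relaxed);      // re-check under the lock
        if (p == nullptr) {
            p = new (x_storage) X();                     // constructor runs exactly once
            x_ptr.store(p, std::memory_order_release);   // publish (~ __cxa_guard_release)
        }
        // If X() throws, lock_guard unlocks and x_ptr stays null, so a later
        // caller retries; this mirrors what __cxa_guard_abort arranges.
    }
    return *p;
}

void f() {
    a();               // runs before the first construction of x
    X& x = static_x(); // other threads block here until X() has returned
    (void)x;           // ext2(x) in the question
}
```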
As far as I know it is guaranteed that b is only called once. However, it is not guaranteed that the initialisation is performed in a thread-safe way, which means another thread could potentially work with a half-initialized or uninitialized x. (That's kind of funny, because static mutexes are basically useless this way.)

"call" instruction that seemingly jumps into itself

I have some C++ code
#include <cstdio>
#include <boost/bind.hpp>
#include <boost/function.hpp>
class A {
public:
void do_it() { std::printf("aaa"); }
};
void
call_it(const boost::function<void()> &f)
{
f();
}
void
func()
{
A *a = new A;
call_it(boost::bind(&A::do_it, a));
}
which gcc 4 compiles into the following assembly (from objdump):
00000030 <func()>:
30: 55 push %ebp
31: 89 e5 mov %esp,%ebp
33: 56 push %esi
34: 31 f6 xor %esi,%esi
36: 53 push %ebx
37: bb 00 00 00 00 mov $0x0,%ebx
3c: 83 ec 40 sub $0x40,%esp
3f: c7 04 24 01 00 00 00 movl $0x1,(%esp)
46: e8 fc ff ff ff call 47 <func()+0x17>
4b: 8d 55 ec lea 0xffffffec(%ebp),%edx
4e: 89 14 24 mov %edx,(%esp)
51: 89 5c 24 04 mov %ebx,0x4(%esp)
55: 89 74 24 08 mov %esi,0x8(%esp)
59: 89 44 24 0c mov %eax,0xc(%esp)
; the rest of the function is omitted
I can't understand the operand of the call instruction here: why does it call into itself, but one byte off?
The call is probably to an external function, and the displacement you see (fc ff ff ff, i.e. 0xFFFFFFFC) is just a placeholder for the real address, which the linker and/or loader will fill in later.
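The arithmetic behind "one byte off" can be worked through for this listing. The call at offset 0x46 is encoded as e8 fc ff ff ff. The CPU adds the signed 32-bit displacement to the address of the next instruction: 0x46 + 5 = 0x4b, and 0x4b + (-4) = 0x47, which is the address of the displacement field itself. In an unlinked i386 object the R_386_PC32 relocation stores its addend (-4) in that field. On x86-64 the addend lives in the relocation entry instead, so the field is zero and objdump shows a call to the next instruction. `objdump -dr` prints the relocations next to the code. The sketch below, with a hypothetical helper name, does the decoding objdump performs.

```cpp
#include <cassert>
#include <cstdint>

// Target of an x86 `call rel32` (opcode E8), as objdump prints it:
// target = address of the next instruction + sign-extended 32-bit displacement.
// Unsigned arithmetic wraps modulo 2^32, which matches 32-bit address math.
std::uint32_t call_rel32_target(std::uint32_t insn_addr, const unsigned char bytes[5]) {
    assert(bytes[0] == 0xE8);                       // near call with rel32 operand
    std::uint32_t disp = static_cast<std::uint32_t>(bytes[1])
                       | static_cast<std::uint32_t>(bytes[2]) << 8
                       | static_cast<std::uint32_t>(bytes[3]) << 16
                       | static_cast<std::uint32_t>(bytes[4]) << 24;  // little-endian
    std::uint32_t next_insn = insn_addr + 5;        // opcode byte + 4 displacement bytes
    return next_insn + disp;
}
```

For the listing above, call_rel32_target(0x46, {0xE8, 0xFC, 0xFF, 0xFF, 0xFF}) yields 0x47, the "call 47" objdump shows.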