Retrieving vptr(pointer to virtual Table aka VTABLE)from the Objdump utility? - c++

How can we know the address of the VTABLEs (i.e corresponding vptr) using the objdump utility and dissembled code.
vptr is generally stored in the first byte of object .(correct/edit this).
There is this simple code using the virtual function :
class base
{
public:
int x;
virtual void func()
{
//test function
}
};
class der : public base
{
void func()
{
//test function
}
};
/*
*
*/
int main() {
int s = 9;
base b;
der d ;
std::cout<<"the address of Vptr is = "<<(&b+0)<<std::endl;
std::cout<<"the value at Vptr is = "<<(int*)*(int*)((&b+0))<<std::endl;
return 0;
}
following is the output of the code :
the address of Vptr is = 0x7fff86a78fe0
the value at Vptr is = **0x400c30**
Following is the part of main function - diassembly of the code :
base b;
4009b4: 48 8d 45 d0 lea -0x30(%rbp),%rax
4009b8: 48 89 c7 mov %rax,%rdi
4009bb: e8 f4 00 00 00 callq 400ab4 <_ZN4baseC1Ev>
der d ;
4009c0: 48 8d 45 c0 lea -0x40(%rbp),%rax
4009c4: 48 89 c7 mov %rax,%rdi
4009c7: e8 fe 00 00 00 callq 400aca <_ZN3derC1Ev>
It shows here that _ZN4baseC1Ev is the address of the base object and
_ZN3derC1Ev is the address of the derived object.
in the _ZN4baseC1Ev
0000000000400ab4 <_ZN4baseC1Ev>:
400ab4: 55 push %rbp
400ab5: 48 89 e5 mov %rsp,%rbp
400ab8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400abc: 48 8b 45 f8 mov -0x8(%rbp),%rax
400ac0: 48 c7 00 30 0c 40 00 movq $0x400c30,(%rax)
400ac7: c9 leaveq
400ac8: c3 retq
400ac9: 90 nop
0000000000400aca <_ZN3derC1Ev>:
}
#include<iostream>
class base
{
public:
int x;
virtual void func()
400a8a: 55 push %rbp
400a8b: 48 89 e5 mov %rsp,%rbp
400a8e: 48 89 7d f8 mov %rdi,-0x8(%rbp)
{
//test function
}
400a92: c9 leaveq
400a93: c3 retq
0000000000400a94 <_ZN3der4funcEv>:
};
class der : public base
{
void func()
400a94: 55 push %rbp
400a95: 48 89 e5 mov %rsp,%rbp
400a98: 48 89 7d f8 mov %rdi,-0x8(%rbp)
{
//test function
}
400a9c: c9 leaveq
400a9d: c3 retq
0000000000400a9e <_ZN4baseC2Ev>:
*/
#include <stdlib.h>
#include<iostream>
class base
{
400a9e: 55 push %rbp
400a9f: 48 89 e5 mov %rsp,%rbp
400aa2: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400aa6: 48 8b 45 f8 mov -0x8(%rbp),%rax
400aaa: 48 c7 00 50 0c 40 00 movq $0x400c50,(%rax)
400ab1: c9 leaveq
400ab2: c3 retq
400ab3: 90 nop
0000000000400ab4 <_ZN4baseC1Ev>:
400ab4: 55 push %rbp
400ab5: 48 89 e5 mov %rsp,%rbp
400ab8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400abc: 48 8b 45 f8 mov -0x8(%rbp),%rax
400ac0: 48 c7 00 50 0c 40 00 movq $0x400c50,(%rax)
400ac7: c9 leaveq
400ac8: c3 retq
400ac9: 90 nop
0000000000400aca <_ZN3derC1Ev>:
}
};
Here is the link to output of objdump -S exe file
also objdump -t virtualfunctionsize | grep vtable gives this :
0000000000400c40 w O .rodata 0000000000000018 vtable for base
0000000000601e00 g O .dtors 0000000000000000 .hidden __DTOR_END__
0000000000400b00 g F .text 0000000000000089 __libc_csu_init
0000000000400c20 w O .rodata 0000000000000018 vtable for der
I wanted to know
- what it is the VTABLE address and corresponding virtual function's denoted by it.
the address of Vptr is = 0x7fff86a78fe0 , what does this represent - VTABLE location?
the value at Vptr is = 0x400c30 - What does this represent - the first Virtual function of the base class?
How can the subsequent addresses of the virtual functions of the derived classes can be found?
Rgds,
softy

Gcc follows this C++ ABI: http://refspecs.linuxbase.org/cxxabi-1.83.html#vtable
To see vtable that gcc generates, you can compile it with "gcc -S" to produce assembly source, and then filter it through c++filt
You can see the following vtable generated:
.type vtable for base, #object
.size vtable for base, 24
vtable for base:
.quad 0
.quad typeinfo for base
.quad base::func()
You can see that first two 8-byte values are zero (which is offset-to-top, offset of derived class within the base), and pointer to typeinfo. Then the real vtable (pointers to virtual functions) begins. And that is the vptr value you see in debug output. That explains the +0x10 offset.

_ZN4baseC1Ev is the base::base(), the base constructor, _ZN3derC1Ev is the derived constructor. You can use a tool like c++filt to demangle the names. They are not the addresses of the actual objects.
The address of the base object b is 0x7fff86a78fe0, which is on the stack, as expected. For this compiler this is the same as the address of the vptr, which is an pointer to an array of function pointers to virtual members.
If you dereference it, you get the address of a pointer to the first virtual function (0x400c30) in your base class.
EDIT:
You need to dereference it one more time to obtain the address of base::func(). Use something like
(int*)*(int*)*(int*)(&b)

Related

C++ assembly code analysis (compiled with clang)

I am trying to figure out how the C++ binary code looks like, especially for virtual function calls. I have come up with few curious things. I have this following C++ code:
#include <iostream>
using namespace std;
class Base {
public:
virtual void print() { cout << "from base" << endl; }
};
class Derived : public Base {
public:
virtual void print() { cout << "from derived" << endl; }
};
int main() {
Base *b;
Derived d;
d.print();
b = &d;
b->print();
return 0;
}
I compiled it with clang++, and then use objdump:
00000000004008b0 <main>:
4008b0: 55 push rbp
4008b1: 48 89 e5 mov rbp,rsp
4008b4: 48 83 ec 20 sub rsp,0x20
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008bc: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
4008c3: e8 28 00 00 00 call 4008f0 <Derived::Derived()>
4008c8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008cc: e8 5f 00 00 00 call 400930 <Derived::print()>
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d5: 48 89 7d f0 mov QWORD PTR [rbp-0x10],rdi
4008d9: 48 8b 7d f0 mov rdi,QWORD PTR [rbp-0x10]
4008dd: 48 8b 07 mov rax,QWORD PTR [rdi]
4008e0: ff 10 call QWORD PTR [rax]
4008e2: 31 c0 xor eax,eax
4008e4: 48 83 c4 20 add rsp,0x20
4008e8: 5d pop rbp
4008e9: c3 ret
4008ea: 66 0f 1f 44 00 00 nop WORD PTR [rax+rax*1+0x0]
My question is why in assembly code, we have the following code:
4008b8: 48 8d 7d e8 lea rdi,[rbp-0x18]
4008d1: 48 8d 7d e8 lea rdi,[rbp-0x18]
The local variable d in main() is stored at location [rbp-0x18]. This is in the automatic storage allocated on the stack for main().
lea rdi,[rbp-0x18]
This line loads the address of d into the rdi register. By convention, member functions of Derived treat rdi as the this pointer.

Member initializer list, pointer initialization without argument

In a large framework which used to use many smart pointers and now uses raw pointers, I come across situations like this quite often:
class A {
public:
int* m;
A() : m() {}
};
The reason is because int* m used to be a smart pointer and so the initializer list called a default constructor. Now that int* m is a raw pointer I am not certain if this is equivalent to:
class A {
public:
int* m;
A() : m(nullptr) {}
};
Without the explicit nullptr is A::m still initialized to zero? A look at no optimization objdump -d makes it appear to be yes but I am not certain. The reason I feel that the answer is yes is due to this line in the objdump -d (I posted more of the objdump -d below):
400644: 48 c7 00 00 00 00 00 movq $0x0,(%rax)
Little program that tries to find undefined behavior:
class A {
public:
int* m;
A() : m(nullptr) {}
};
int main() {
A buf[1000000];
unsigned int count = 0;
for (unsigned int i = 0; i < 1000000; ++i) {
count += buf[i].m ? 1 : 0;
}
return count;
}
Compilation, execution, and return value:
g++ -std=c++14 -O0 foo.cpp
./a.out; echo $?
0
Relevant assembly sections from objdump -d:
00000000004005b8 <main>:
4005b8: 55 push %rbp
4005b9: 48 89 e5 mov %rsp,%rbp
4005bc: 41 54 push %r12
4005be: 53 push %rbx
4005bf: 48 81 ec 10 12 7a 00 sub $0x7a1210,%rsp
4005c6: 48 8d 85 e0 ed 85 ff lea -0x7a1220(%rbp),%rax
4005cd: bb 3f 42 0f 00 mov $0xf423f,%ebx
4005d2: 49 89 c4 mov %rax,%r12
4005d5: eb 10 jmp 4005e7 <main+0x2f>
4005d7: 4c 89 e7 mov %r12,%rdi
4005da: e8 59 00 00 00 callq 400638 <_ZN1AC1Ev>
4005df: 49 83 c4 08 add $0x8,%r12
4005e3: 48 83 eb 01 sub $0x1,%rbx
4005e7: 48 83 fb ff cmp $0xffffffffffffffff,%rbx
4005eb: 75 ea jne 4005d7 <main+0x1f>
4005ed: c7 45 ec 00 00 00 00 movl $0x0,-0x14(%rbp)
4005f4: c7 45 e8 00 00 00 00 movl $0x0,-0x18(%rbp)
4005fb: eb 23 jmp 400620 <main+0x68>
4005fd: 8b 45 e8 mov -0x18(%rbp),%eax
400600: 48 8b 84 c5 e0 ed 85 mov -0x7a1220(%rbp,%rax,8),%rax
400607: ff
400608: 48 85 c0 test %rax,%rax
40060b: 74 07 je 400614 <main+0x5c>
40060d: b8 01 00 00 00 mov $0x1,%eax
400612: eb 05 jmp 400619 <main+0x61>
400614: b8 00 00 00 00 mov $0x0,%eax
400619: 01 45 ec add %eax,-0x14(%rbp)
40061c: 83 45 e8 01 addl $0x1,-0x18(%rbp)
400620: 81 7d e8 3f 42 0f 00 cmpl $0xf423f,-0x18(%rbp)
400627: 76 d4 jbe 4005fd <main+0x45>
400629: 8b 45 ec mov -0x14(%rbp),%eax
40062c: 48 81 c4 10 12 7a 00 add $0x7a1210,%rsp
400633: 5b pop %rbx
400634: 41 5c pop %r12
400636: 5d pop %rbp
400637: c3 retq
0000000000400638 <_ZN1AC1Ev>:
400638: 55 push %rbp
400639: 48 89 e5 mov %rsp,%rbp
40063c: 48 89 7d f8 mov %rdi,-0x8(%rbp)
400640: 48 8b 45 f8 mov -0x8(%rbp),%rax
400644: 48 c7 00 00 00 00 00 movq $0x0,(%rax)
40064b: 5d pop %rbp
40064c: c3 retq
40064d: 0f 1f 00 nopl (%rax)
Empty () initializer stands for default-initialization in C++98 and for value-initialization in C++03 and later. For scalar types (including pointers) value-initialization/default-initialization leads to zero-initialization.
Which means that in your case m() and m(nullptr) will have exactly the same effect: in both cases m is initialized as a null pointer. In C++ it was like that since the beginning of standardized times.

Compile-time or Run-time evaluation of expression in constructor

I have the following class:
template<ItType I, LockType L>
class ArcItBase;
with a (one of them) constructor:
ArcItBase ( StableRootedDigraph& g_, Node const n_ ) noexcept :
srd ( g_ ),
arc ( I == ItType::in
? srd.nodes [ n_ ].head_in
: srd.nodes [ n_ ].head_out ) { }
The question is (which I don't see how to test) whether the value of the expression for the constructor of arc will be determined at compile-time or at run-time (Release, full optimization, clang-cl and VC14), given that I == ItType::in can be evaluated (is known, I is either ItType::in or ItType::out) at compile-time to either true or false?
It is not possible to have your code compiling without knowing the ItType at compile time.
The template parameter is evaluated at compile time and the conditional is a core constant expression, standard reference is C++11 5.19/2.
In the contrasting case the compiler would have to generate code that is equivalent to
arc(true ? : )
Which if you would actually write it would be optimized. However the rest of the conditional will not be optimized since you are accessing a what seems to be a non static member and cannot be evaluated as a core constant expression.
However, compilers may not always work as we expect so if you would actually want to test this you should dump the disassembled object file
objdump -DS file.o
and then you can better navigate the output.
Another option would be to launch the debugger and inspect the code.
Don't forget that you can always have your symbols even in case of optimizing, e.g.
g++ -O3 -g -c foo.cpp
Below you will find a toy implementation . In the first case values are given to the constructor of arcbase is called as:
arcbase<true> a(10,9);
Whereas in the second it is given non const random values that cannot be known at compile time.
After compiling with g++ --stc=c++11 -c -O3 -g the first case creates:
Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:
0000000000000000 <arcbase<true>::arcbase(int, int)>:
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 83 ec 10 sub $0x10,%rsp
8: 48 89 7d f8 mov %rdi,-0x8(%rbp)
c: 89 75 f4 mov %esi,-0xc(%rbp)
f: 89 55 f0 mov %edx,-0x10(%rbp)
12: 48 8b 45 f8 mov -0x8(%rbp),%rax
16: 8b 55 f0 mov -0x10(%rbp),%edx
19: 8b 4d f4 mov -0xc(%rbp),%ecx
1c: 89 ce mov %ecx,%esi
1e: 48 89 c7 mov %rax,%rdi
21: e8 00 00 00 00 callq 26 <arcbase<true>::arcbase(int, int)+0x26>
26: 48 8b 45 f8 mov -0x8(%rbp),%rax
2a: 8b 00 mov (%rax),%eax
2c: 48 8b 55 f8 mov -0x8(%rbp),%rdx
30: 48 83 c2 08 add $0x8,%rdx
34: 89 c6 mov %eax,%esi
36: 48 89 d7 mov %rdx,%rdi
39: e8 00 00 00 00 callq 3e <arcbase<true>::arcbase(int, int)+0x3e>
3e: c9 leaveq
3f: c3 retq
Whereas the second case:
Disassembly of section .text._ZN7arcbaseILb1EEC2Eii:
0000000000000000 <arcbase<true>::arcbase(int, int)>:
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
0: 53 push %rbx
1: 48 89 fb mov %rdi,%rbx
4: e8 00 00 00 00 callq 9 <arcbase<true>::arcbase(int, int)+0x9>
9: 48 8d 7b 08 lea 0x8(%rbx),%rdi
d: 8b 33 mov (%rbx),%esi
f: 5b pop %rbx
10: e9 00 00 00 00 jmpq 15 <arcbase<true>::arcbase(int, int)+0x15>
Looking at the dissasembly you should notice that even in the first case the value of 10 is not directly passed as is to the constructor, but instead only placed in the register from where is is retrieved.
Here is the output from gdb :
0x400910 <_ZN3arcC2Ei> mov %esi,(%rdi)
0x400912 <_ZN3arcC2Ei+2> retq
0x400913 nop
0x400914 nop
0x400915 nop
0x400916 nop
0x400917 nop
0x400918 nop
0x400919 nop
0x40091a nop
0x40091b nop
0x40091c nop
0x40091d nop
0x40091e nop
0x40091f nop
0x400920 <_ZN7arcbaseILb1EEC2Eii> push %rbx
0x400921 <_ZN7arcbaseILb1EEC2Eii+1> mov %rdi,%rbx
0x400924 <_ZN7arcbaseILb1EEC2Eii+4> callq 0x400900 <_ZN3srdC2Eii>
0x400929 <_ZN7arcbaseILb1EEC2Eii+9> lea 0x8(%rbx),%rdi
0x40092d <_ZN7arcbaseILb1EEC2Eii+13> mov (%rbx),%esi
0x40092f <_ZN7arcbaseILb1EEC2Eii+15> pop %rbx
0x400930 <_ZN7arcbaseILb1EEC2Eii+16> jmpq 0x400910 <_ZN3arcC2Ei>
The code for the second case is :
struct llist
{
int head_in;
int head_out;
llist(int a , int b ) : head_in(a), head_out(b) {}
};
struct srd
{
llist nodes;
srd(int a, int b) : nodes(a,b) {}
};
struct arc
{
int y;
arc( int x):y(x) {}
};
template< bool I > class arcbase
{
srd isrd;
arc iarc;
public:
arcbase(int a , int b) : isrd(a,b) , iarc( I == true ? isrd.nodes.head_in : isrd.nodes.head_out ) {}
void print()
{
std::cout << iarc.y << std::endl;
}
};
int main(void)
{
std::srand(time(0));
volatile int a_ = std::rand()%100;
volatile int b_ = std::rand()%4;
arcbase<true> a(a_,b_);
a.print();
return 0;
}

No initializer list vs. initializer list with empty pairs of parentheses

This is copy paste from this topic Initializing fields in constructor - initializer list vs constructor body
The author explains the following equivalence:
public : Thing(int _foo, int _bar){
member1 = _foo;
member2 = _bar;
}
is equivalent to
public : Thing(int _foo, int _bar) : member1(), member2(){
member1 = _foo;
member2 = _bar;
}
My understanding was that
snippet 1 is a case of default-initialization (because of the absence of an initializer list)
snippet 2 is a case of value-initialization (empty pairs of parentheses).
How are these two equivalent?
Your understanding is correct (assuming member1 and member2
have type `int). The two forms are not equivalent; in the
first, the members are not initialized at all, and cannot be
used until they have been assigned. In the second case, the
members will be initialized to 0. The two formulations are only
equivalent if the members are class types with user defined
constructors.
You are right but the author is kind of right too!
Your interpretation is completely correct as are the answers given by others. In summary the two snippets are equivalent if member1 and member2 are non-POD types.
For certain POD types they are also equivalent in some sense. Well, let's simplify a little more and assume member1 and member2 have type int. Then, under the as-if-rule the complier is allowed to replace the second snippet with the first one. Indeed, in the second snippet the fact that member1 is first initlialized to 0 is not observable. Only its assignment to _foo is. This is the same reasoning that allows the compiler to replace these two lines
int x = 0;
x = 1;
with this one
int x = 1;
For instance, I've compiled this code
struct Thing {
int member1, member2;
__attribute__ ((noinline)) Thing(int _foo, int _bar)
: member1(), member2() // initialization line
{
member1 = _foo;
member2 = _bar;
}
};
Thing dummy(255, 256);
with GCC 4.8.1 using option -O1. (The __atribute((noinline))__ prevents the compiler from inlining the function). Then the generated assembly code is the same regardless whether the initialization line is present or not:
-O1 with or without initialization
0: 8b 44 24 04 mov 0x4(%esp),%eax
4: 89 01 mov %eax,(%ecx)
6: 8b 44 24 08 mov 0x8(%esp),%eax
a: 89 41 04 mov %eax,0x4(%ecx)
d: c2 08 00 ret $0x8
On the other hand, when compiled with -O0 the assembly code is different depending on whether the initialization line is present or not:
-O0 without initialization
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 04 sub $0x4,%esp
6: 89 4d fc mov %ecx,-0x4(%ebp)
9: 8b 45 fc mov -0x4(%ebp),%eax
c: 8b 55 08 mov 0x8(%ebp),%edx
f: 89 10 mov %edx,(%eax)
11: 8b 45 fc mov -0x4(%ebp),%eax
14: 8b 55 0c mov 0xc(%ebp),%edx
17: 89 50 04 mov %edx,0x4(%eax)
1a: c9 leave
1b: c2 08 00 ret $0x8
1e: 90 nop
1f: 90 nop
-O0 with initialization
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 83 ec 04 sub $0x4,%esp
6: 89 4d fc mov %ecx,-0x4(%ebp)
9: 8b 45 fc mov -0x4(%ebp),%eax ; extra line #1
c: c7 00 00 00 00 00 movl $0x0,(%eax) ; extra line #2
12: 8b 45 fc mov -0x4(%ebp),%eax ; extra line #3
15: c7 40 04 00 00 00 00 movl $0x0,0x4(%eax) ; extra line #4
1c: 8b 45 fc mov -0x4(%ebp),%eax
1f: 8b 55 08 mov 0x8(%ebp),%edx
22: 89 10 mov %edx,(%eax)
24: 8b 45 fc mov -0x4(%ebp),%eax
27: 8b 55 0c mov 0xc(%ebp),%edx
2a: 89 50 04 mov %edx,0x4(%eax)
2d: c9 leave
2e: c2 08 00 ret $0x8
31: 90 nop
32: 90 nop
33: 90 nop
Notice that -O0 with initialization has four extra lines (marked above) than -O0 without initialization. These extra lines initialize the two members to zero.

"call" instruction that seemingly jumps into itself

I have some C++ code
#include <cstdio>
#include <boost/bind.hpp>
#include <boost/function.hpp>
class A {
public:
void do_it() { std::printf("aaa"); }
};
void
call_it(const boost::function<void()> &f)
{
f();
}
void
func()
{
A *a = new A;
call_it(boost::bind(&A::do_it, a));
}
which gcc 4 compiles into the following assembly (from objdump):
00000030 <func()>:
30: 55 push %ebp
31: 89 e5 mov %esp,%ebp
33: 56 push %esi
34: 31 f6 xor %esi,%esi
36: 53 push %ebx
37: bb 00 00 00 00 mov $0x0,%ebx
3c: 83 ec 40 sub $0x40,%esp
3f: c7 04 24 01 00 00 00 movl $0x1,(%esp)
46: e8 fc ff ff ff call 47 <func()+0x17>
4b: 8d 55 ec lea 0xffffffec(%ebp),%edx
4e: 89 14 24 mov %edx,(%esp)
51: 89 5c 24 04 mov %ebx,0x4(%esp)
55: 89 74 24 08 mov %esi,0x8(%esp)
59: 89 44 24 0c mov %eax,0xc(%esp)
; the rest of the function is omitted
I can't understand the operand of call instruction here, why does it call into itself, but with one byte off?
The call is probably to an external function, and the address you see (FFFFFFFC) is just a placeholder for the real address, which the linker and/or loader will take care of later.