Consider the following code
class B1 {
public:
void f0() {}
virtual void f1() {}
int int_in_b1;
};
class B2 {
public:
virtual void f2() {}
int int_in_b2;
};
class D : public B1, public B2 {
public:
void d() {}
void f2() {int temp=int_in_b1;} // override B2::f2()
int int_in_d;
};
and the following memory layout for the object d:
d:
+0: pointer to virtual method table of D (for B1)
+4: value of int_in_b1
+8: pointer to virtual method table of D (for B2)
+12: value of int_in_b2
+16: value of int_in_d
Total size: 20 Bytes.
virtual method table of D (for B1):
+0: B1::f1() // B1::f1() is not overridden
virtual method table of D (for B2):
+0: D::f2() // B2::f2() is overridden by D::f2()
D *d = new D();
d->f2();
When d->f2(); is invoked, D::f2 needs access to data from B1, but a modified this pointer
(*(*(d[+8]/*pointer to virtual method table of D (for B2)*/)[0]))(d+8) /* Call d->f2() */
is passed to D::f2, so how is D::f2 able to access it?
The code is taken (and modified) from https://en.wikipedia.org/wiki/Virtual_method_table#Multiple_inheritance_and_thunks
Your case is actually too simple: The compiler can know that you have a pointer to a D object, so it can perform the lookup from the right table, passing the unmodified this pointer to the f2() implementation.
The interesting case is, when you have a pointer to B2:
B2* myD = new D();
myD->f2();
Now we start with an adjusted base pointer, and need to find the this pointer for the whole object. One way to achieve that would be to store an offset alongside the function pointer that is used to produce a valid this pointer from the B2 pointer used to access the vtable.
Thus, in your case, the code might implicitly be compiled like this
D* myD = new D();
((B2*)myD)->f2();
adjusting the pointer two times (once deriving the B2* from the D*, then the inverse using the offset from the vtable). Your compiler may be clever enough to avoid this, though.
In any case, this is firmly within the field of implementation. Your compiler can do anything, as long as it behaves the way the standard specifies.
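For what it's worth, the adjustment is easy to observe. The following small, self-contained sketch (not part of the original question; the exact addresses and the size of the offset are entirely implementation-specific) prints the value of this inside D::f2() for both kinds of call:
#include <cstdio>

struct B1 { virtual void f1() {} int int_in_b1 = 1; };
struct B2 { virtual void f2() {} int int_in_b2 = 2; };

struct D : B1, B2 {
    void f2() override { std::printf("this inside D::f2 = %p\n", (void*)this); }
    int int_in_d = 3;
};

int main() {
    D* d = new D();
    B2* b2 = d;                              // implicit derived-to-base conversion
    std::printf("d  = %p\n", (void*)d);
    std::printf("b2 = %p\n", (void*)b2);     // typically d plus the offset of the B2 subobject
    d->f2();                                 // this equals d
    b2->f2();                                // same function, same this value: the call adjusts b2 back to d
    delete d;
}
On a typical implementation b2 differs from d, yet this inside D::f2() equals d in both calls, which is exactly the back-adjustment described above.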
Firstly, the effect you are describing as "modifying a this pointer" is an implementation detail of some particular compiler. There is no specific requirement that a compiler modify pointers like you describe.
There is also no requirement that an object have a vtable, let alone that it is laid out like you describe. The actual requirement is that the correct override of a virtual function will be called at run time, and that it will be able to correctly access data members and call member functions. Now, in practice, compilers tend to use vtables, but that is an implementation detail; compilers choose it because alternatives are less efficient by various measures.
Now, that said, the following discussion will assume every class with a virtual function has a vtable. Looking at your example, what does this do?
D *d = new D();
d->f2();
The first thing is that the compiler knows that d is a pointer to D, and knows that D has a function named f2(). It will also know that f2() is a virtual function inherited from B2 (which is one reason that it is not possible to call a class member function unless the compiler has visibility of the complete class definition).
In this case, we know what d and D are, so we know D::f2() should be called, with the this pointer equal in value to d. The compiler has the same information (it knows d is a D *) so it just does that. Now, okay, it might or might not look up D::f2() in the vtable, but that is the end of it.
The more interesting example, like cmaster said, is
B2* myD = new D();
myD->f2();
In this case, myD is a pointer to B2. The compiler knows that B2 has a virtual function named f2(), so it knows it has to call the correct override.
The thing is, in the statement myD->f2(), the compiler might not know that myD actually points to a D (e.g. the construction of the object and the calling of the member function might be in different functions, in different compilation units). However, it does know that B2 has a virtual function named f2(), and that information is what is required to call the actual overriding version correctly.
This means the compiler needs two bits of information. Firstly it needs information identifying the actual function (D::f2()) to be called. The second bit of information will be some adjustment of myD to make the call of D::f2() work correctly. This second bit of information is essentially what is needed to produce (what you are calling) the "modified this pointer" from myD.
If the compiler does all this with the help of vtables, it might include BOTH bits of information in the vtable for B2. So (assuming the second bit of information is an offset) the compiler turns
myD->f2();
into something like
(myD + myD->vtable->offset_for_f2)->(myD->vtable->entry_for_f2)();
The part (myD + myD->vtable->offset_for_f2) is essentially what you are describing as "the modified this pointer" which D::f2() will see when called. The part (myD->vtable->entry_for_f2) is essentially the address of D::f2() (say the address of the member function).
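To make that concrete, here is a hand-written model of such a "function pointer plus offset" entry. This is purely illustrative: the Entry struct, the Fake* types and d_f2 are invented for this sketch, and a real compiler keeps the equivalent data in its vtables (or in thunks) in whatever layout it prefers.
#include <cstdio>
#include <cstddef>

struct FakeB2 { int int_in_b2 = 2; };
struct FakeD  { int int_in_b1 = 1; FakeB2 b2_part; int int_in_d = 3; };

// Stand-in for D::f2(); it expects a pointer to the whole FakeD object.
void d_f2(void* whole_object) {
    FakeD* self = static_cast<FakeD*>(whole_object);
    std::printf("%d %d %d\n", self->int_in_b1, self->b2_part.int_in_b2, self->int_in_d);
}

struct Entry {
    void (*fn)(void*);      // address of the final overrider
    std::ptrdiff_t offset;  // bytes to add to the sub-object pointer before calling
};

int main() {
    FakeD object;
    FakeB2* myD = &object.b2_part;  // plays the role of the B2* pointer
    Entry entry_for_f2{ &d_f2, -static_cast<std::ptrdiff_t>(offsetof(FakeD, b2_part)) };
    // myD->f2() spelled out by hand: adjust the pointer, then call through the entry.
    void* adjusted = reinterpret_cast<char*>(myD) + entry_for_f2.offset;
    entry_for_f2.fn(adjusted);
}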
The next question to ask is how might the compiler populate the vtable? The short answer is that it does this when constructing the object.
B2* myD = new D();
The new expression (new D()) essentially expands to
void *temp = ::operator new(sizeof (D)); // assuming class does not supply its own operator new
// construct a `D` in the memory pointed to by temp
myD = (D *)temp; // the compiler knows we're creating a D, so this cast doesn't use offsets or anything funky; initialising the B2* myD from the D* then applies the usual base-class adjustment
The process of turning the memory pointed to by temp into a D is the important thing. Firstly, it invokes the constructors of the base classes (B1 and B2), then constructs or initialises D's members, then it invokes D's constructor (the C++ standard actually describes the order of events in exquisite detail). The other thing is that the compiler does bookkeeping to ensure we actually get a valid D from the process. Part of that is populating the vtable.
Now, since the compiler has complete visibility of the definition of class D (i.e. complete definition of base classes, its members, etc), it has all the information needed to populate the vtable. In other words, it has all the information it needs to give sensible values to both myD->vtable->offset_for_f2 and myD->vtable->entry_for_f2.
In the case of multiple inheritance, assuming one vtable per base class, the compiler has all the information it needs to populate all the vtables in a similar way. In other words, the compiler knows how it lays out objects in memory, including their vtables, and uses that knowledge appropriately.
But, then again, it might not. As I said, vtables is a technique commonly used in compilers to implement/support virtual function dispatch. There are other ways too.
Again, I can't comment, so I must answer here.
There is no problem in the code!
D::f2 needs access to data from B1
then how is D::f2 able to access it?
Just write B1::int_in_b1 inside D::f2 and you can access the int value.
In your example, when d->f2() is called, compiler knows that d is a pointer to class D. To call f2(), it would adjust the pointer of d to be "this" of B2 before passing it to virtual f2(), as you describe. Now, inside of the D::f2(), the compiler knows that this is D::f2() and it knows how D inherits from B2, and so it fixes the "this" of B2 to be "this" of D in the very beginning of the function, so when your code executes it would see that "this" is that of D. Therefore it can access any members of D inside of D::f2().
If you would have had
B2* b = d;
b->f2();
When b->f2() is called, the pointer being passed to f2() is "this" of B2. Inside D::f2(), the passed pointer is fixed to point to this of D.
Related
Every class which contains one or more virtual functions has a vtable associated with it. A pointer called vptr points to that vtable. Every object of that class contains a vptr which points to the same vtable. Then why isn't vptr static? Instead of associating the vptr with the object, why not associate it with the class?
The runtime class of the object is a property of the object itself. In effect, vptr represents the runtime class, and therefore can't be static. What it points to, however, can be shared by all instances of the same runtime class.
Your diagram is wrong. There is not a single vtable, there is one vtable for each polymorphic type. The vptr for A points to the vtable for A, the vptr for A1 points to the vtable for A1 etc.
Given:
class A {
public:
virtual void foo();
virtual void bar();
};
class A1 : public A {
virtual void foo();
};
class A2 : public A {
virtual void foo();
};
class A3 : public A {
virtual void bar();
virtual void baz();
};
The vtable for A contains { &A::foo, &A::bar }
The vtable for A1 contains { &A1::foo, &A::bar }
The vtable for A2 contains { &A2::foo, &A::bar }
The vtable for A3 contains { &A::foo, &A3::bar, &A3::baz }
So when you call a.foo() the compiler follows the object's vptr to find the vtable then calls the first function in the vtable.
Suppose a compiler uses your idea, and we write:
A1 a1;
A2 a2;
A& a = (std::rand() % 2) ? static_cast<A&>(a1) : static_cast<A&>(a2);
a.foo();
The compiler looks in the base class A and finds the vptr for the class A which (according to your idea) is a static property of the type A not a member of the object that the reference a is bound to. Does that vptr point to the vtable for A, or A1 or A2 or something else? If it pointed to the vtable for A1 it would be wrong 50% of the time when a refers to a2, and vice versa.
Now suppose that we write:
A1 a1;
A2 a2;
A& a = a1;
A& aa = a2;
a.foo();
aa.foo();
a and aa are both references to A, but they need two different vptrs, one pointing to the vtable for A1 and one pointing to the vtable for A2. If the vptr is a static member of A how can it have two values at once? The only logical, consistent choice is that the static vptr of A points to the vtable for A.
But that means the call a.foo() calls A::foo() when it should call A1::foo(), and the call aa.foo() also calls A::foo() when it should call A2::foo().
Clearly your idea fails to implement the required semantics, proving that a compiler using your idea cannot be a C++ compiler. There is no way for the compiler to get the vtable for A1 from a without either knowing what the derived type is (which is impossible in general, the reference-to-base could have been returned from a function defined in a different library and could refer to a derived type that hasn't even been written yet!) or by having the vptr stored directly in the object.
The vptr must be different for a1 and a2, and must be accessible without knowing the dynamic type when accessing them through a pointer or reference to base, so that when you obtain the vptr through the reference to the base class, a, it still points to the right vtable, not the base class vtable. The most obvious way to do this is to store the vptr directly in the object. An alternative, more complicated solution would be to keep a map of object addresses to vptrs, e.g. something like std::map<void*, vtable*>, and find the vtable for a by looking up &a. But this still stores one vptr per object, not one per type, and it would require a lot more work (and dynamic allocation) to update the map every time polymorphic objects are created and destroyed, and it would increase overall memory usage because the map structure would take up space. It's simpler just to embed the vptr in the objects themselves.
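For illustration only, a toy version of that side-table alternative could look like the following; every name here (VTable, side_table, call_foo) is invented for this sketch, the classes carry no real data, and nothing beyond this one lookup is supported:
#include <cstdio>
#include <map>

struct VTable { void (*foo)(void* self); };

void A_foo(void*)  { std::puts("A::foo"); }
void A1_foo(void*) { std::puts("A1::foo"); }

VTable vtable_for_A  = { &A_foo };
VTable vtable_for_A1 = { &A1_foo };

// The side table: one entry per live object, not one per type.
std::map<const void*, const VTable*> side_table;

struct A {
    A()  { side_table[this] = &vtable_for_A; }   // more derived constructors overwrite this, like a real vptr
    ~A() { side_table.erase(this); }
};
struct A1 : A {
    A1() { side_table[this] = &vtable_for_A1; }
};

// A "virtual" call done by hand: look the object up by its address.
void call_foo(A& a) { side_table[&a]->foo(&a); }

int main() {
    A  base;
    A1 derived;
    call_foo(base);     // prints A::foo
    call_foo(derived);  // prints A1::foo
}
Even in this scheme the per-object information (the map entry) cannot be avoided, which is the point made above.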
The virtual table (which is, by the way, an implementation mechanism not mentioned in the C++ standard) is used to identify the dynamic type of an object at runtime. Therefore, the object itself must hold a pointer to it. If it was static, then only the static type could be identified by it and it would be useless.
If you are thinking of somehow using typeid() internally to identify the dynamic type and then call the static pointer with it, be aware that typeid() only returns the dynamic type for objects belonging to types with virtual functions; otherwise it just returns the static type (§ 5.2.8 in the current C++ standard). Yes, this means that it works the other way around: typeid() typically uses the virtual pointer to identify the dynamic type.
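A short example of that behaviour (the exact text printed by name() is implementation-defined, but which type it describes is not):
#include <iostream>
#include <typeinfo>

struct Base { virtual ~Base() {} };   // polymorphic: has a virtual function
struct Derived : Base {};

struct PlainBase {};                   // not polymorphic
struct PlainDerived : PlainBase {};

int main() {
    Derived d;
    Base& b = d;
    std::cout << typeid(b).name() << '\n';   // dynamic type: describes Derived

    PlainDerived pd;
    PlainBase& pb = pd;
    std::cout << typeid(pb).name() << '\n';  // static type only: describes PlainBase
}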
As everyone attests, the vptr is a property of an object.
Let's see why.
Assume we have three classes:
class Base {
public:
    virtual ~Base() {}
    // class definition
};
class Derived : public Base {
    // class definition
};
class Client : public Derived {
    // class definition
};
holding relation Base<---Derived<----Client.
The Client class is derived from the Derived class, which is in turn derived from Base.
Base * Ob = new Base;
Derived * Od = new Derived;
Client* Oc = new Client;
Whenever Oc is destroyed it should destroy the Client part, then the Derived part, and then the Base part of the data. To achieve this sequence, Base's destructor should be virtual, so that Oc's vtable entry for the destructor points to Client's destructor. When the base destructor is virtual, the compiler adds code so that Client's destructor calls Derived's destructor, and Derived's destructor calls Base's destructor. This chaining ensures that all the Base, Derived and Client data is destroyed when the Client object is deleted.
If the vptr were static, then deleting Oc through a Base* would still find Base's vtable and call Base's destructor, so only the Base part of Oc would be destroyed. Oc's vptr should always lead to the most derived object's destructor, which is not possible if the vptr is static.
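A runnable sketch of that chaining, using the classes above with printing destructors added (the variable name p is mine):
#include <cstdio>

class Base {
public:
    virtual ~Base() { std::puts("~Base"); }   // virtual, so delete through Base* starts at the most derived destructor
};
class Derived : public Base {
public:
    ~Derived() { std::puts("~Derived"); }
};
class Client : public Derived {
public:
    ~Client() { std::puts("~Client"); }
};

int main() {
    Base* p = new Client;
    delete p;   // prints ~Client, ~Derived, ~Base
}
If ~Base were not virtual, deleting a Client through a Base* would be undefined behaviour, and typically only the Base part would be cleaned up.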
The whole point of the vptr is that, at compile time, you don't know exactly which class an object will have at runtime. If you knew that, then the virtual function call would be unnecessary. That is, in fact, what happens when you're not using virtual functions. But with virtual functions, if I have
class Sub : public Parent {};
and a value of type Parent*, I don't know at runtime if this is really an object of type Parent or one of type Sub. The vptr lets me figure that out.
The virtual method table is per class. An object contains a pointer (the vptr) to the vtable of its run-time type.
I don't think this is a requirement in the standard, but all compilers that I've worked with do it this way.
This is true even in your example.
@Harsh Maurya: The reason might be: static member variables must be defined before main() in the program. But if we wanted _vptr to be static, whose responsibility (compiler or programmer) would it be to define _vptr before main()? And how would the programmer know the address of the vtable to assign to _vptr? That's why the compiler takes the responsibility of assigning a value to the pointer (_vptr). This happens in the constructor of the class (hidden functionality). And once the constructor comes into the picture, there has to be one _vptr per object.
How to identify whether vptr will be used to invoke a virtual function?
Consider the below hierarchy:
class A
{
int n;
public:
virtual void funcA()
{std::cout <<"A::funcA()" << std::endl;}
};
class B: public A
{
public:
virtual void funcB()
{std::cout <<"B::funcB()" << std::endl;}
};
A* obj = new B();
obj->funcB(); //1. this does not even compile
typedef void (*fB)();
fB* func;
int* vptr = (int*)obj; //2. Accessing the vptr
func = (fB*)(*vptr);
func[1](); //3. Calling funcB using vptr.
Statement 1, i.e. obj->funcB();, does not even compile although the vtable has an entry for funcB, whereas when the vptr is accessed indirectly, funcB() can be invoked successfully.
How does compiler decide when to use the vTable to invoke a function?
In the statement A* obj = new B();, since I am using a base class pointer, I believe the vtable should be used to invoke the function.
Below is the memory layout when vptr is accessed indirectly.
So there are two answers to your question:
The short one is:
obj->funcB() is only a legal call if the static type of obj (in this case A) has a function funcB with the appropriate signature (either directly or from a base class). Only if that is the case does the compiler decide whether to translate it into a direct or a dynamic function call (e.g. using a vtable), based on whether funcB is declared virtual or not in the declaration of A (or its base type).
The longer one is this:
When the compiler sees obj->funcB(), it has no way of knowing (optimizations aside) what the runtime type of obj is, and in particular it doesn't know whether a derived class that implements funcB() exists at all. obj might, for example, be created in another translation unit, or it might be a function parameter.
And no, that information is usually not stored in the virtual function table:
The vtable is just an array of addresses, and without the prior knowledge that a specific address corresponds to a function called funcB, the compiler can't use it to implement the call obj->funcB() - or, to be more precise, it is not allowed to do so by the standard. That prior knowledge can only be provided by a virtual function declaration in the static type of obj (or its base classes).
The reason why you have that information available in the debugger (whose behavior lies outside of the standard anyway) is that it has access to the debugging symbols, which are usually not part of the distributed release binary. Storing that information in the vtable by default would be a waste of memory and performance, as the program isn't allowed to make use of it in standard C++ in the way you describe anyway. For extensions like C++/CLI that might be a different story.
Adding to Barry's comment, adding the line virtual void funcB() = 0; to class A seems to fix the problem.
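Another way to make the call without changing A is to cast back to the derived type first. This is only a sketch; it relies on knowing (or checking) that the object really is a B:
A* obj = new B();
if (B* b = dynamic_cast<B*>(obj))   // legal because A is polymorphic (it has the virtual funcA)
    b->funcB();                     // ordinary call through the B interface
// (obj is deliberately not deleted here; A would need a virtual destructor for delete through A* to be safe)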
Consider the following setup.
class I
{
public:
virtual void F() = 0;
};
class A : public I
{
public:
void F() { /* some implementation */ }
};
class B : public I
{
public:
void F() { /* some implementation */ }
};
This allows me to write a function like the following.
std::shared_ptr<I> make_I(bool x)
{
if (x) return std::make_shared<A>();
else return std::make_shared<B>();
}
In this situation, I am paying some costs for the inheritance and polymorphism, namely having a vtable and the fact that calls to F can't be inlined when used as follows (correct me if I'm wrong).
auto i = make_I(false);
i->F(); //can't be inlined
What I want to know is if I have to pay these same costs when using A or B as objects allocated on the stack, like in the following code.
A a;
a.F();
Do A and B have vtables when allocated on the stack? Can the call to F be inlined?
It seems to me that a compiler could feasibly create two memory layouts for classes in an inheritance hierarchy - one for the stack and one for the heap. Is this what a C++ compiler will/may do? Or is there a theoretical or practical reason it can't?
Edit:
I saw a comment (that looks like it was deleted) that actually raised a good point. You could always do the following, and then that A a was allocated on the stack might not be the salient point I'm trying to get at...
A a;
A* p = &a;
p->F(); //likely won't be inlined (correct me if I'm wrong)
Maybe a better way to phrase it would be "Is the behavior different for an object that is allocated on the stack and is used as a 'regular value type'?" Please help me out here with the terminology if you know what I mean but have a better way of putting it!
The point I'm trying to get at is that you could feasibly, at compile time, "flatten" the definition of the base class into the derived class you are allocating an instance of on the stack.
I think your question really has to do with whether a compiler has static knowledge of an object and can elide the vtable lookup (you mentioned this in your edit), rather than whether there is a distinction on where the object lives - stack or heap. Yes, many compilers can elide the virtual dispatch in that case.
The edit to your question asks whether you can flatten the definition of the base class, I, into a derived class such as B. If the compiler can tell, at compile time, that an object will only ever be an instance of B, then it can eliminate the vtable lookup at runtime and call B::F() for that particular call.
For example, the compiler will probably eliminate the vtable lookup at runtime below and call the derived function:
B b;
b.F();
In the code below, the compiler will not be able to eliminate the runtime lookup in doSomething, but it probably can eliminate the lookup in b.F()
void doSomething( I* object ) {
object->F(); // will involve a vtable lookup
}
B b;
b.F(); // probably won't need a vtable lookup
doSomething( &b );
Note it does not matter whether object is allocated on the stack or the heap. What matters is that the compiler is able to determine the type. Each class will still have a vtable, it just might not always be needed for each method call.
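As a side note, marking the class or the function final (C++11) gives the compiler the same kind of static knowledge even through a pointer, so it may devirtualize or even inline such calls; whether it actually does so is still up to the optimizer. A sketch using the classes from the question:
struct I { virtual ~I() {} virtual void F() = 0; };

struct A final : I {                  // final: nothing can derive from A, so A::F is the only possible overrider
    void F() override { /* some implementation */ }
};

void useValue(A& a)     { a.F(); }    // A is final, so the dynamic type must be A: no lookup needed
void usePointer(A* a)   { a->F(); }   // likewise, the compiler may devirtualize this call
void useInterface(I* i) { i->F(); }   // dynamic type unknown: normally a vtable lookup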
You mention code inlining; this is not related to how the object is allocated. When a normal function is called, arguments will be pushed onto the stack along with a return address, and the CPU will then jump to the function. With inlined code, the site of the function call is replaced with the actual code (similar to a macro).
If an object contained in an inheritance hierarchy is allocated on the stack, the compiler still needs to be able to determine what functions it can call, especially if there are virtual and non-virtual functions.
I've been looking into C++ and structs for a project I'm working on; at the moment I'm using 'chained' template structures to add data fields as pseudo-traits.
Whilst it works, I think I'd prefer something like multiple inheritance as in the example below:
struct a {
int a_data;
}; // 'Trait' A
struct b {
int b_data;
}; // 'Trait' B
struct c : public a, public b {
int c_data;
}; // A composite structure with 'traits' A and B.
struct d : public b {
int d_data;
}; // A composite structure with 'trait' B.
My experimental code examples show they work fine, but I'm a bit perplexed as to how it's actually working when things get complex.
For example:
b * basePtr = new c;
cout << basePtr->b_data << endl;
b * basePtr = new d;
cout << basePtr->b_data << endl;
This works fine every time, even through function calls that take the base pointer as a parameter.
My question is: how does the code know where b_data is stored in one of the derived structs? As far as I can tell, the structs still use a compacted layout with no extra data (i.e. a struct with 3 ints only takes up 12 bytes, one with 2 ints takes 8 bytes, etc.). Surely it needs some sort of extra data field to say where a_data and b_data are stored in a given structure?
It's more of a curiosity question as it all seems to work regardless, and if there are multiple implementations in use, I'll happily accept a single example. Though I do have a bit of a concern, as I want to transfer the bytes behind these structs through an inter-process message queue and want to know if they'll be decoded OK on the other end (all the programs using the queue will be compiled by the same compiler and run on a single platform).
In both cases, basePtr truly is a pointer to an object of type b, so there is no problem. The fact that this object is not a complete object, but rather a subobject of a more-derived object (this is actually the technical term), is not material.
The (static, implicit) conversion from d * to b *, as well as from c * to b *, takes care of adjusting the pointer value so that it really points to the b subobject. All the information is known statically, so the compiler makes all those computations automatically.
You should read the Wikipedia article on C++ classes, under the sections on memory management and class inheritance.
Basically, the compiler creates the class structure, so at compile time it knows the offset to each part of the class.
When you access a variable, the compiler knows the type and therefore its structure, and if you cast it to a base class, it just needs to jump to the right offset.
On most implementations, a pointer conversion, say from c* to b*, will automatically adjust the address if necessary. In the statement
b * basePtr = new c;
the new expression allocates a c object, which contains an a base class subobject, a b base class subobject, and a c_data member subobject. In raw memory, this will probably look like just three ints. The new expression returns the address of the created complete c object, which is (on most implementations) the same as the address of the a base class subobject and the address of the a_data member subobject.
But then the expression new c, with type c*, is used to initialize a b* pointer, which causes an implicit conversion. The compiler sets basePtr to the address of the b base class subobject within the complete c object. Not hard, since the compiler knows the offset from a c object to its unique b subobject.
Afterward, an expression like basePtr->b_data doesn't need to know what the complete object type was. It just knows that b_data is at the very beginning of b, so it can simply dereference the b* pointer.
The details of this are up to the C++ implementation, but in a case like this, with non-virtual inheritance, you can think of it like this:
c has two sub-objects, one with type a and one with type b.
When you cast a pointer to c to a pointer to b, the compiler is smart enough so that the result of the cast is a pointer to the b sub-object of the c object referenced by the original pointer. This may involve changing the numerical value of the returned pointer.
Generally, with single inheritance, the sub-object pointer will have the same numerical value as the original pointer. With multi-inheritance, it might not.
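You can see this by printing the pointer values. The exact addresses and offsets are up to the implementation, but with the structs from the question it typically looks like this:
#include <cstdio>

struct a { int a_data; };
struct b { int b_data; };
struct c : public a, public b { int c_data; };

int main() {
    c obj;
    std::printf("c object at    %p\n", (void*)&obj);
    std::printf("a subobject at %p\n", (void*)static_cast<a*>(&obj)); // usually the same address as the c object
    std::printf("b subobject at %p\n", (void*)static_cast<b*>(&obj)); // usually shifted by sizeof(a)
}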
Yes, there are extra fields that define the offset each sub-component has within the aggregate. But they are not stored in the aggregate itself; most likely (although the ultimate choice about how to do that is left to the compiler designers) they reside in auxiliary structures in a hidden part of the data segment.
Your objects are not polymorphic (and you used them wrongly, but I'll come to this later), but just compounds like:
c[a[a_data], b[b_data], c_data]
             ^
             b* points here
d[b[b_data], d_data]
  ^
  b* points here
(Note that the real layout may depend on the particular compiler and even optimization flags used)
The offset of the beginning of b with respect to the beginning of c or d does not depend on the particular object instance, so it is not a value that needs to be stored in the object; it is just part of the general description of c and d known to the compiler, but not necessarily available to you.
The compiler knows, given a c or a d, where the b component begins. But given only a b, it cannot know whether it is inside a d or a c.
The reason why you used the objects wrongly is that you did not care about their destruction. You allocated them with new, but never deleted them afterwards.
And you cannot just call delete basePtr, since there is nothing in the b subcomponent that tells what aggregate it is actually (at runtime) part of.
There are two programming styles to work around this:
The classic OOP style assumes the actual type is only known at runtime, and requires all your classes to have a virtual destructor: this gives every struct an extra "ghost" field (the vtable pointer, which points to a table in the "auxiliary descriptor" containing all the virtual functions' addresses) and makes the destructor call originating from delete actually be dispatched to the most derived one (hence delete pbase will actually call c::~c or d::~d depending on the actual object).
The generic programming style assumes you know the actual derived type in some other way (most likely from a template parameter), so you do not delete pbase, but rather a static_cast<actual_derived_class*>(pbase).
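A minimal sketch of that second style, assuming the destruction site learns the concrete type through a template parameter (the destroy helper is invented for this illustration and uses the structs from the question):
template <typename Actual>
void destroy(b* pbase) {
    // static_cast from base to derived re-applies the sub-object offset, so the
    // complete object is deleted; this is only valid if pbase really points
    // into an object whose actual type is Actual.
    delete static_cast<Actual*>(pbase);
}

// usage:
// b* basePtr = new c;
// destroy<c>(basePtr);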
Inheritance is an abstraction that lets a class reuse functions and data from another class it derives from. A function can be called on the derived class if it is declared in one of the classes it inherits from. A struct lets you group variables into a data structure, much like a class that holds variables and functions.
class trait
{
    // variable definition / declaration
    int variable_name;

    // a member function operating on the variables
    int function_name(int value)
    {
        // operation on variables in the function call
        return value + variable_name;
    }

    // a plain data structure nested in the class
    struct struct_name
    {
        int value_1;
        int value_2;
    };

    void use()
    {
        variable_name = function_name(variable_name);
        struct_name s = {1, 2};
        // operation on s.value_1
    }
};
There is a distinction between compile time knowledge and runtime knowledge. Part of the job of the compiler is to make as much use of compile time information as possible to avoid having to do things at run time.
In this case, all the details of exactly where each piece of data is in a given type are known at compile time. So the compiler doesn't need to know it at runtime. Whenever you access a particular member, it just uses its compile time knowledge to compute the appropriate offset for the data you need.
The same thing goes for pointer conversions. It will adjust pointer values when they're converted to make sure the point at the appropriate sub-part.
Part of the reason this works is that the data values from an individual class or struct are never interleaved with any other that aren't mentioned in the class definition, even when that struct is a sub-component of another struct either through composition or inheritance. So the relative layout of any individual struct is always the same no matter where in memory it's found.
(C++,MinGW 4.4.0,Windows OS)
All that is commented in the code, except labels <1> and <2>, is my guess. Please correct me in case you think I'm wrong somewhere:
class A {
public:
virtual void disp(); //not necessary to define as placeholder in vtable entry will be
//overwritten when derived class's vtable entry is prepared after
//invoking Base ctor (unless we do new A instead of new B in main() below)
};
class B :public A {
public:
B() : x(100) {}
void disp() {std::printf("%d",x);}
int x;
};
int main() {
A* aptr=new B; //memory model and vtable of B (say vtbl_B) is assigned to aptr
aptr->disp(); //<1> no error
std::printf("%d",aptr->x); //<2> error -> A knows nothing about x
}
<2> is an error and is obvious. Why is <1> not an error? What I think is happening for this invocation is: aptr->disp(); --> (*aptr->*(vtbl_B + offset to disp))(aptr), with aptr in the parameter being the implicit this pointer passed to the member function. Inside disp() we would have std::printf("%d",x); --> std::printf("%d",aptr->x); which is the SAME AS std::printf("%d",this->x); So why does <1> give no error while <2> does?
(I know vtables are implementation specific and stuff but I still think it's worth asking the question)
this is not the same as aptr inside B::disp. The B::disp implementation takes this as B*, just like any other method of B. When you invoke virtual method via A* pointer, it is converted to B* first (which may even change its value so it is not necessarily equal to aptr during the call).
I.e. what really happens is something like
typedef void (A::*disp_fn_t)();
disp_fn_t methodPtr = aptr->vtable[index_of_disp]; // methodPtr == &B::disp
B* b = static_cast<B*>(aptr);
(b->*methodPtr)(); // same as b->disp()
For a more complicated example, check this post: http://blogs.msdn.com/b/oldnewthing/archive/2004/02/06/68695.aspx. Here, if there are multiple A bases which may invoke the same B::disp, MSVC generates different entry points, each one shifting the A* pointer by a different offset. This is implementation-specific, of course; other compilers may choose to store the offset somewhere in the vtable, for example.
The rule is:
In C++, dynamic dispatch only works for member functions, not for member variables.
For a member variable, the compiler only looks up the symbol name in that particular class or its base classes.
In case 1, the appropriate method to be called is decided by fetching the vptr, fetching the address of the appropriate method from the vtable, and then calling that member function.
Thus dynamic dispatch is essentially a fetch-fetch-call instead of the plain call used with static binding.
In case 2, the compiler only looks for x in the scope of the static type of aptr (i.e. A). Obviously, it cannot find it and reports the error.
You are confused, and it seems to me that you come from more dynamic languages.
In C++, compilation and runtime are clearly isolated. A program must first be compiled and then can be run (and any of those steps may fail).
So, going backward:
<2> fails at compilation, because compilation is about static information. aptr is of type A*, thus all methods and attributes of A are accessible through this pointer. Since you declared disp() but no x, then the call to disp() compiles but there is no x.
Therefore, <2>'s failure is about semantics, and those are defined in the C++ Standard.
Getting to <1>, it works because there is a declaration of disp() in A. This guarantees the existence of the function (I would remark that you actually lie here, because you did not define it in A).
What happens at runtime is semantically defined by the C++ Standard, but the Standard provides no implementation guidance. Most (if not all) C++ compilers will use a virtual table per class + virtual pointer per instance strategy, and your description looks correct in this case.
However this is pure runtime implementation, and the fact that it runs does not retroactively impact the fact that the program compiled.
virtual void disp(); //not necessary to define as placeholder in vtable entry will be
//overwritten when derived class's vtable entry is prepared after
//invoking Base ctor (unless we do new A instead of new B in main() below)
Your comment is not strictly correct. A virtual function is odr-used unless it is pure (the converse does not necessarily hold) which means that you must provide a definition for it. If you don't want to provide a definition for it you must make it a pure virtual function.
If you make one of these modifications then aptr->disp(); works and calls the derived class disp() because disp() in the derived class overrides the base class function. The base class function still has to exist as you are calling it through a pointer to base. x is not a member of the base class so aptr->x is not a valid expression.
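For completeness, a version of the program that compiles and runs could look like this (A::disp() is given a body, and x is reached by casting to B* first, since x is only a member of B):
#include <cstdio>

class A {
public:
    virtual void disp() { std::printf("A::disp\n"); } // now defined, so it can safely be called through a base pointer
    virtual ~A() {}                                   // so that delete through an A* is well defined
};
class B : public A {
public:
    B() : x(100) {}
    void disp() { std::printf("%d", x); }
    int x;
};

int main() {
    A* aptr = new B;
    aptr->disp();                                     // <1> virtual dispatch: prints 100
    std::printf("%d", static_cast<B*>(aptr)->x);      // <2> fixed: x is a member of B, so cast first
    delete aptr;
}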