Polymorphism in C++: the idea of the vtable (dynamic binding)

I was thinking about the mechanism of polymorphism in C++ and there is one thing I can't understand. Here is a very simple piece of code with one class:
#include <iostream>
using namespace std;
class A
{
public:
int x;
void fun1();
double fun2(int, char*);
void fun3(double, float[]);
};
int main()
{
cout << sizeof(A) << endl;
return 0;
}
The console prints the size of the int member (x), which is obvious. If I modify my class by adding the keyword virtual, the size changes because the compiler adds a pointer to an array (the vtable) of virtual functions. But how is it possible that the size of my class doesn't change when I declare more virtual methods with completely different signatures? I mean that:
void (*(tab[100]) )(int, double, char*);
This defines an array that can only hold addresses of functions with the signature:
void fun(int, double, char*);
Only functions of this type can be stored in this array, so why, regardless of the types of its virtual methods, does the class contain only one pointer to one virtual table? Where is the mistake in my logic?

It could be useful:
The virtual table is actually quite simple, though it’s a little complex to describe in words. First, every class that uses virtual functions (or is derived from a class that uses virtual functions) is given its own virtual table. This table is simply a static array that the compiler sets up at compile time. A virtual table contains one entry for each virtual function that can be called by objects of the class. Each entry in this table is simply a function pointer that points to the most-derived function accessible by that class.

First things first: the standard doesn't say anything about virtual tables. It only talks about virtual functions and polymorphism. Every compiler is allowed to implement this feature in any way it likes.
Virtual tables are only a common implementation of virtual functions; they are not mandatory, and the details differ between compilers.
Lastly, on my Visual Studio 2015, this:
class A1 {
int x;
void doIT(){}
};
class A2 {
int x;
virtual void doIT(){}
};
constexpr int size = sizeof(A1);
constexpr int size2 = sizeof(A2);
makes size 4 bytes, but size2 12 bytes, which breaks your assumptions.
Again, GCC, Clang and even C++/CLI may behave differently and yield different sizes.

Related

Is forbidding template virtual functions an unnecessary cautiousness?

After reading many posts about similar topics, and thinking about it for a while, still I do not understand why it is forbidden to implement template virtual functions.
The way I see it, this case has nothing to do with mixing static polymorphism with the dynamic one, but it is rather using template differentiation of the functions at the compile-time and then using dynamic polymorphism for each individual created function at the run-time.
Consider this piece of code:
class parrent{
public:
virtual float function(float value)const{
return value;
}
virtual double function(double value)const{
return value;
}
virtual long double function(long double value)const{
return value;
}
virtual ~parrent() = default;
};
class a_child:public parrent{
public:
float function(float value)const override{
return value + 1.5;
}
double function(double value)const override{
return value + 1.5;
}
long double function(long double value)const override{
return value + 1.5;
}
};
Obviously this code is OK and will achieve the expected result.
But using template to rewrite a similar code:
class parrent{
public:
template<typename t__>
virtual t__ function(t__ value)const{
return value;
}
virtual ~parrent() = default;
};
class a_child:public parrent{
public:
template<typename t__>
t__ function(t__ value)const override{
return value + 1.5;
}
};
Is not allowed.
I am not a compiler designer, but from what I have read, compilers create a lookup table from the virtual functions and use it to call the appropriate function at run time, which is different from what they do for function templates. For each set of template parameters used with a function template at compile time, the compiler creates a unique function.
For this example, the compiler could determine the template parameters at compile time simply by looking at how this virtual template function has been used throughout the entire program. Please consider the main function now:
int main() {
parrent* a;
parrent* b;
a = new parrent;
b = new a_child;
std::cout<< a->function(1.6f) << std::endl;
std::cout<< a->function(1.6) << std::endl;
std::cout<< a->function(1.6L) << std::endl;
std::cout<< b->function(1.6f) << std::endl;
std::cout<< b->function(1.6) << std::endl;
std::cout<< b->function(1.6L) << std::endl;
delete a;
delete b;
return 0;
}
Here the compiler will see that the function was used once with a float, once with a double and once with a long double, so in each case it can easily create the right function with the appropriate template parameters.
In the end there will be three individual virtual functions, not just one.
If we have a function whose template parameters cannot be deduced from its arguments, like
template<typename t__>
virtual t__ function(int value){return value;}
Then users can just give the parameters themselves like:
object_pointer->function<double>(1234);
This is just what is already done for any function template, so why should it be different for virtual functions?
The only caveat I can think of is when the virtual function template gets instantiated through a child object and never through the parent object or pointer.
Even in that case the same practice could be applied to create the different virtual functions. Alternatively, since their virtuality is never used, they could become ordinary individual functions.
From the answer and comments it appears that there might be a serious problem with this approach that is obvious to everyone else, so please be patient and help me understand it too.
I guess the problem mentioned in the answers has to do with the compiler and/or linker not being able to know how many vtables (and of what type) it should produce for a class, given the rest of the code or the different translation units it might encounter.
Well, let's say it can produce an unfinished list of vtables and extend it as it goes along. The problem of ending up with two vtables or two different instances of the same class in the case of dynamic linking can already occur when instantiating a class template with a virtual (non-template) function.
So it seems that compilers already have a method to circumvent that problem!
First, let's not forget that, from the point of view of C, methods (non-static member functions) are nothing more than simple functions that take an object as one of their parameters, so let's not think of a class as some intricate piece of code.
Second, let's not get carried away by how compilers, linkers and so on work today. The language should be what is standardized, not the way compilers produce executables! Let's not forget that there are still many features in standard C++17 that even GCC does not cover yet!
Please explain to me in terms of logic, not in terms of how compilers and/or linkers work: what is the problem?
The way compilers implement polymorphic classes is as follows: the compiler looks at the class definition, determines how many vtable entries are needed, and statically assigns one entry in that vtable to each of the class's virtual methods. Wherever one of those virtual methods is called, the compiler generates code that retrieves the vptr from the class and looks up the entry at the statically assigned offset in order to determine the address that needs to be called.
We can now see how having a virtual template would cause issues. Suppose you had a class containing a virtual template. Now, after the end of the class definition, the compiler doesn't know how large to make the vtable. It has to wait until the end of the translation unit, to see the full list of the specializations of the template that are actually called (or to which a pointer-to-member is taken). If the class is only defined in this single translation unit, this problem could be solved by assigning vtable offsets to the template specializations in some increasing order in which they are encountered, then emitting the vtable at the end. However, if the class has external linkage, this breaks down, as when compiling different translation units, the compiler has no way of avoiding conflicts in the assignment of offsets to specializations of the virtual method template. Instead, the vtable offsets would have to be replaced with symbols that would be resolved by the linker once it has seen the list of referenced specializations from all translation units and merged them into a single list. It seems that if standard C++ required virtual templates to be supported, every implementation would have to require the linker to implement this functionality. I can guess that this will not be feasible any time soon.
I am not a compiler designer but I see a problem with what you are hoping to do.
When you have a virtual template member function, such as
template<typename t__>
virtual t__ function(t__ value)const{
return value;
}
there is no end to the types to which it is applicable. How does the compiler know whether to stop at int and double? There is an unlimited number of types for which that function can be instantiated. Would you expect the compiler to generate a vtable that takes into account all possible ways the function can be instantiated? That's infinite. It's not doable.

Why does the member function name lookup stay in the parent class?

Motivation, if it helps: I have struct member functions that are radial-basis-function kernels. They are called 1e06 x 15 x 1e05 times in a numerical simulation. I don't want to rely on devirtualization to inline virtual functions for that many calls. Also, the structs (RBF kernels) are already used as template parameters of a larger interpolation class.
Minimal working example
I have a function g() that is always the same, and I want to reuse it, so I pack it in the base class.
The function g() calls a function f() that is different in derived classes.
I don't want to use virtual functions to resolve the function names at runtime, because this incurs additional costs (I measured it in my code, it has an effect).
Here is the example:
#include <iostream>
struct A
{
double f() const { return 0; };
void g() const
{
std::cout << f() << std::endl;
}
};
struct B : private A
{
using A::g;
double f() const { return 1; };
};
struct C : private A
{
using A::g;
double f() const { return 2; };
};
int main()
{
B b;
C c;
b.g(); // Outputs 0 instead of 1
c.g(); // Outputs 0 instead of 2
}
I expected the name resolution mechanism to figure out that I want to use "A::g()", but then to return to "B" or "C" to resolve the "f()" function. Something along the lines of: "when I know a type at compile time, I will try to resolve all names in this type first, do a name lookup in the parents if something is missing, then go back to the type I was called from". However, it seems to figure out that "A::g()" is used, then it stays in "A" and just picks "A::f()", even though the actual call to "g()" came from "B" and "C".
This can be solved using virtual functions, but I don't understand and would like to know the reasoning behind the name lookup sticking to the parent class when types are known at compile time.
How can I get this to work without virtual functions?
This is a standard task for the CRTP. The base class needs to know what the static type of the object is, and then it just casts itself to that.
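#include <iostream>
template<typename Derived>
struct A
{
void g() const
{
// Cast to the derived type, known at compile time, so Derived::f() is called.
std::cout << static_cast<Derived const*>(this)->f() << std::endl;
}
};
struct B : A<B>
{
using A::g;
double f() const { return 1; };
};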
Also, responding to a comment you wrote, (which is maybe your real question?),
can you tell me what is the reasoning for the name lookup to stick to the base class, instead of returning to the derived class once it looked up g()?
Because classes are intended to be used for object-oriented programming, not for code reuse. The programmer of A needs to be able to understand what their code is doing, which means subclasses shouldn't be able to arbitrarily override functionality from the base class. That's what virtual is, really: A giving its subclasses permission to override that specific member. Anything that A hasn't opted-in to that for, they should be able to rely on.
Consider in your example: What if the author of B later added an integer member which happened to be called endl? Should that break A? Should B have to care about all the private member names of A? And if the author of A wants to add a member variable, should they be able to do so in a way that doesn't potentially break some subclass? (The answers are "no", "no", and "yes".)

C++: Template method pattern using directly the derived type

Suppose this situation:
struct base
{
void method()
{
requisites();
do_it();
}
virtual void requisites() const = 0;
void do_it() { /* do it */ }
};
struct derived : base
{
void requisites() const
{
if (!my_requisites)
throw something;
}
};
int main()
{
derived d;
d.method();
return 0;
}
In that case, where I'm not using pointers or references but instances of the derived type directly, does the compiler need to do a run-time lookup in the vtable to select the correct override of requisites (the one in derived)? Or is that kind of behaviour as efficient as using no virtual functions? In other words, does the compiler know at compile time that we are using derived::requisites()?
A vtable call is not necessarily slower.
For example, on x86 in a Unix shared object, position-independent code has been produced (gcc3, gcc4) using a hack to load ebx with the current eip. This value was used to find a jump table for any static functions. Calling a dynamic function could be performed by querying the this pointer directly, and was faster (if no static functions were called in a given function).
The compiler does know the concrete type and is able to call the function directly, but it may choose to dispatch the call virtually because:
a) it may be faster.
b) it reduces the number of code generation cases.

How to get "direct" function pointer to a virtual member function?

I am working on an embedded platform which doesn't cope very well with dynamic code (no speculative / OOO execution at all).
On this platform I call a virtual member function on the same object quite often, however the compiler fails to optimize the vtable-lookup away, as it doesn't seem to recognize the lookup is only required for the first invocation.
Therefore I wonder: Is there a manual way to devirtualize a virtual member function of a C++ class in order to get a function-pointer which points directly to the resolved address?
I had a look at C++ function pointers, but since they seem to require a specified type, I guess this won't work out.
Thank you in advance
There's no general standard-C++-only way to find the address of a virtual function, given only a reference to a base class object. Furthermore, there's no reasonable type for it, because the this pointer need not be passed as an ordinary argument following a general convention (e.g. it can be passed in a register, with the other arguments on the stack).
If you do not need portability, however, you can always do whatever works for your given compiler. E.g., with Microsoft's COM (I know, that's not your platform) there is a known memory layout with vtable pointers, so as to access the functionality from C.
If you do need portability, then I suggest designing the optimization in. For example, instead of
class Foo_base
{
public:
virtual void bar() = 0;
};
do something like
class Foo_base
{
public:
typedef void (*Bar_func)(Foo_base&);
virtual Bar_func bar_func() const = 0;
void bar() { bar_func()( *this ); }
};
supporting the same public interface as before, but now exposing the innards, so to speak, thus allowing manual optimization of repeated calls to bar.
Regarding GCC, I have seen the following while debugging the compiled assembly code.
A generic method pointer holds two pieces of data:
a) a "pointer" to the method
b) an offset to be added to the starting address of the class instance (the offset is used when multiple inheritance is involved, for methods of the second and further parent classes, whose data starts at a different point within the object).
The "pointer" to the method is interpreted as follows:
1) If the "pointer" is even, it is interpreted as a normal (non-virtual) function pointer.
2) If the "pointer" is odd, then 1 should be subtracted and the remaining value should be 0 or 4 or 8 or 12 (supposing a pointer size of 4 bytes).
This encoding obviously assumes that all normal methods start at even addresses (so the compiler must align them to even addresses).
That remaining value is the offset into the vtable from which to fetch the address of the "real" non-virtual method.
So the idea for devirtualizing the call is to convert a virtual method pointer into a non-virtual method pointer and then apply it to the "subject", that is, our class instance.
The code below does what is described.
#include <stdio.h>
#include <string.h>
#include <typeinfo>
#include <typeindex>
#include <cstdint>
struct Animal{
int weight=0x11111111;
virtual int mm(){printf("Animal1 mm\n");return 0x77;};
virtual int nn(){printf("Animal1 nn\n");return 0x99;};
};
struct Tiger:Animal{
int weight=0x22222222,height=0x33333333;
virtual int mm(){printf("Tigerxx\n");return 0xCC;}
virtual int nn(){printf("Tigerxx\n");return 0x99;};
};
typedef int (Animal::*methodPointerT)();
typedef struct {
void** functionPtr;
size_t offset;
} MP;
void devirtualize(methodPointerT& mp0,const Animal& a){
MP& t=*(MP*)&mp0;
if((intptr_t)t.functionPtr & 1){
size_t index=(t.functionPtr-(void**)1); // there is obviously a more
void** vTable=(void**)(*(void**)&a); // efficient way. Just for clearness !
t.functionPtr=(void**)vTable[index];
}
};
int main()
{
int (Animal::*mp1)()=&Animal::nn;
MP& mp1MP=*(MP*)&mp1;
Animal x;Tiger y;
(x.*mp1)();(y.*mp1)();
devirtualize(mp1,x);
(x.*mp1)();(y.*mp1)();
}
Yes, this is possible in a way that works at least with MSVC, GCC and Clang.
I was also looking for how to do this, and here is a blog post I found that explains it in detail: https://medium.com/#calebleak/fast-virtual-functions-hacking-the-vtable-for-fun-and-profit-25c36409c5e0
Taking the code from there, in short, this is what you need to do. This function works for all objects:
template <typename T>
void** GetVTable(T* obj) {
return *((void***)obj);
}
And then to get a direct function pointer to the first virtual function of the class, you do this:
typedef void(VoidMemberFn)(void*);
VoidMemberFn* fn = (VoidMemberFn*)GetVTable<BaseType>(my_obj_ptr)[0];
// ... sometime later
fn(my_obj_ptr);
So it's quite easy actually.

Are virtual functions the only way to achieve Runtime Polymorphism in C++?

One of my friends asked me "How Runtime Polymorphism is achieved in C++?" I answered "By Inheritance"
He said "No, it can be achieved only using virtual functions".
So I gave him an example of the following code :-
#include<iostream>
using namespace std;
class A
{
public:
int i;
A(){i=100;}
};
class B : public A
{
public:
int j;
B(){i = -1; j = 99;}
};
void func(A& myA)
{
cout<<myA.i << endl;
}
int main()
{
B b;
A* a = new B();
func(*a);
func(b);
delete a;
return 0;
}
Here, the function func() takes a reference to A, but we pass an object of B and can print the value of the public member "i". He said this is compile-time polymorphism.
My questions are:
1) Is runtime polymorphism achieved only with virtual functions?
2) Does the example above have runtime or compile-time polymorphism?
3) If I have the following code :-
void func2(A& myA)
{
cout << myA.i << endl;
// dynamic/static cast myA to myB
cout<<myB.j << endl;
}
what kind of polymorphism is it? Or is it even polymorphism?
The example does not show dynamic polymorphism. The method to be called is known at compile time. There is no runtime decision (based on the actual object type) as to which method should be called. There is no different behavior for different types.
For the example to show dynamic polymorphism, you need to provide a virtual member function in the base class and override it in the derived class. The actual method to be called is then decided by the actual type of the object pointed to by the base class pointer.
Online sample:
#include<iostream>
using namespace std;
class A
{
public:
virtual void doSomething()
{
std::cout<<"\nIn A::doSomething()";
}
};
class B : public A
{
public:
virtual void doSomething()
{
std::cout<<"\nIn B::doSomething()";
}
};
int main()
{
B b;
A obj;
A* a = &b;
a->doSomething();
a = &obj;
a->doSomething();
return 0;
}
Output:
In B::doSomething()
In A::doSomething()
Is runtime polymorphism achieved only with virtual functions?
No, but virtual functions are the most common and correct way to do so.
Polymorphism can also be achieved through function pointers. Consider the following code example: the actual function to call is decided at run time depending on user input. It is a form of polymorphism, though not in the strict C++ sense, which requires different behaviors for different types.
#include <iostream>
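// const char* so that a string literal can be passed (char* is ill-formed since C++11)
typedef void (*someFunction)(int, const char*);
void FirstsomeFunction(int i, const char *c)
{
std::cout<<"\n In FirstsomeFunction";
}
void SecondsomeFunction(int i, const char *c)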
{
std::cout<<"\n In SecondsomeFunction";
}
int main()
{
someFunction arr[1];
int x = 0;
std::cin >> x;
if(x ==0)
arr[0] = &FirstsomeFunction;
else
arr[0] = &SecondsomeFunction;
(arr[0])(10,"Hello");
return 0;
}
Does the example above have runtime or compile-time polymorphism?
There is no polymorphism of any kind. The same method will be called in all cases. There is no different behavior for different types and hence it does not classify as polymorphism of any kind.
The C language's fprintf is a polymorphic function.
You can pass it various handles and it can print to a file, stdout, a printer, a socket, anything which the system can represent as a stream.
// Pick any one of the following; each yields a FILE* that fprintf can write to.
FILE* file = fopen("output.txt", "w"); // a file
FILE* file = stdout; // standard output
FILE* file = fopen("/dev/usb/lp0", "w"); // a usb printer
FILE* file = popen("/usr/bin/lpr -P PDF", "w"); // a PDF file
FILE* file = fdopen(socket(AF_INET,SOCK_STREAM,0), "r+"); // network socket
fprintf(file, "Hello World.\n");
What you wrote is not polymorphism.
This is how you do polymorphism in C++:
#include<iostream>
using namespace std;
class A
{
public:
virtual void func(){
cout << "printing A" << endl;
}
virtual ~A(){}
};
class B : public A
{
public:
void func(){
cout << "printing B" << endl;
}
};
int main()
{
A* a = new A();
A* b = new B();
a->func(); // "printing A"
b->func(); // "printing B"
delete a;
delete b;
return 0;
}
If you were to remove the virtual keyword, the method func of A would be called twice.
One of my friends asked me "How Runtime Polymorphism is achieved in C++?" I answered "By Inheritance"
He said "No, it can be achieved only using virtual functions".
First, the term polymorphism is ambiguous: in the general computing science sense it refers to an ability to implicitly invoke type-specific code, whether at compile time or at run time. In the C++ Standard it is defined very narrowly as being virtual dispatch (that's the prerogative of standards). Obviously, for your friend's question to be meaningful, since he's asking how it's achieved in C++, his perspective must be from outside C++, in the larger context of computing science terminology.
Certainly, virtual functions/dispatch are an answer, but are they the only answer...?
To attempt to answer that, it helps to have a clear conception of what behaviour qualifies as run-time polymorphic. Consider:
void f(X& x)
{
// the only situation where C++ doesn't know the run-time type of a variable is
// where it's an instance of a class/struct accepted by reference or pointer
...some operation involving "x"...
}
Any mechanism that could result in different machine code for the operation being invoked involving "x", where the reason relates specifically to the run-time type of "x", is getting pretty close to run-time polymorphic, but there's one final issue: was that branching decided implicitly by the language, or arranged explicitly by the programmer?
In the case of virtual dispatch, the compiler implicitly knows to create the virtual dispatch tables and lookups that branch to the type-appropriate code.
But, say we have a function pointer that was previously set to address type-appropriate code, or a type-specific number or enum that is used to control a switch to a type-specific case. These functionally achieve the same behaviour as run-time virtual dispatch, but the set up had to be done explicitly by the developer, and there's no compiler enforcement to make sure that the determination is done purely on run-time type. Whether they qualify or not is arguable. Because C++ has a fully implicit mechanism in virtual dispatch, and because in the C++ Standard polymorphism has a narrowed definition related specifically to virtual dispatch, I'd guess that most C++ programmers would say "no".
But in the world of C, describing say qsort or bsearch (two Standard libC functions that handle arbitrary types using run-time dispatch via function pointer arguments) as run-time polymorphic might aid quick understanding... it's more normal to say that they're generic implementations though.
Still, there are doubtless hundreds of Computing Science textbooks out there with functional definitions of run-time polymorphism, and I'd bet dispatch using function pointers or other programmer-initialised metadata satisfies a good percentage of them. So, it's pointless to be too insistent that there's a single unambiguous answer.
My questions are :-
1) Is Runtime polymorphism achieved only with virtual functions?
As above, I'd lean towards "yes" in the context of C++, but it's (endlessly) arguable.
2) Does the example above have runtime or compile-time polymorphism?
Neither... there's not even two functions to choose between on the basis of type - you're always running the same machine code for func(): that picked by the compiler given an expectation that the type is A.
3) If I have the following code :-
void func2(A& myA)
{
cout << myA.i << endl;
// dynamic/static cast myA to myB
cout<<myB.j << endl;
}
what kind of polymorphism is it? Or is it even polymorphism?
Not polymorphic at all, as you have no branching based on type. A dynamic cast could consult the compiler-populated type meta-data in the run-time type of myA, and if you used that to only conditionally invoke the access to myB.j - which would be undefined behaviour unless myA was a B - then you're back at manually, explicitly developer coordinated type-specific behaviour, and whether that qualifies as "polymorphism" for you is discussed above.
Polymorphism is achieved with virtual functions. But to have any effect, i.e. different behaviour depending on type, you need inheritance too:
struct A {
virtual void f() = 0;
};
struct B : public A {
void f() {
// do B things
std::cout << "B::f() called\n";
}
};
struct C : public A {
void f() {
// do C things
std::cout << "C::f() called\n";
}
};
Now you can have pointers or references to A with different behaviour, depending on whether it's a B or C.
Polymorphism is defined as one interface controlling access to a general class of actions. There are two types of polymorphism: compile-time polymorphism and run-time polymorphism. Compile-time polymorphism is function and operator overloading. Run-time polymorphism is done using inheritance and virtual functions.
Polymorphism means that functions assume different forms at different times. At compile time this is called function overloading. For example, a program can contain two functions, one performing integer addition and the other performing addition of floating-point numbers, that share the same name, such as add. The function add() is then said to be overloaded. Two or more functions can have the same name, but their parameter lists must differ in the number of parameters or in their data types. Functions that differ only in their return types cannot be overloaded. The compiler selects the right function depending on the types of the arguments passed. In classes, constructors can be overloaded as well, since objects may be created with or without initial values.