Is virtual dispatch used if class type is known? - c++

Suppose we have base class A with at least one virtual method. Suppose then we have another class B that derives from A and may or may not override this virtual method.
Finally, suppose you create an object of class B with local scope, and call this virtual method.
From the C++ docs we know that if this virtual method is inline, the inline version will be used, because the class type is known and the call is made on the object itself rather than through a pointer or reference.
Will virtual dispatch be used in this case, or will it be bypassed? Does the same hold for normal (non-inline) methods?
I am interested in gcc / clang.

Since both the stack and the vtable are implementation details, it's probably better to phrase it:
Can the compiler use static - rather than virtual - dispatch if the object's (real, runtime) type is statically known?
to which the answer is: yes. Anywhere the compiler knows for certain what version of a virtual method will be used, it can just emit a regular statically-dispatched function call.
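Note that there are places where you might expect the object's runtime type to be obvious and be mistaken, specifically inside constructors: while a base-class constructor runs, virtual calls resolve to the base class's version, not the derived one.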
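To make the constructor caveat concrete, here is a minimal sketch. The classes A and B and the member seen_in_ctor are made up for illustration; the point is only that a virtual call made while A's constructor is running lands in A's version, even though the object being built is a B.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes, named for illustration only.
struct A {
    std::string seen_in_ctor;            // records which version ran during A's constructor
    A() { seen_in_ctor = name(); }       // resolves to A::name(), even when building a B
    virtual ~A() = default;
    virtual std::string name() const { return "A"; }
};

struct B : A {
    std::string name() const override { return "B"; }
};

// Returns what the constructor saw, followed by what a later call sees.
std::string constructor_dispatch_demo() {
    B b;
    assert(b.seen_in_ctor == "A");       // inside A::A() the object was still "just an A"
    return b.seen_in_ctor + b.name();    // after construction, b.name() is B::name()
}
```

So "I am constructing a B, therefore B's override runs" is exactly the kind of expectation that turns out to be wrong inside a constructor.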
If you want to know whether a particular compiler does emit this particular optimization for some particular code (at a particular optimization level), just check the assembly output. Even if you're not sure what both versions of a call should look like, you can compare the output with a simple and a fully-qualified call (b.B::foo() vs b.foo()). I'd expect gcc and clang to do a reasonable job in this case, but it's easy enough to check.
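A small test file for that comparison could look like the sketch below. The class and function names are placeholders, not taken from the question; the three functions give you three call sites to compare in the generated assembly.

```cpp
#include <cassert>

struct A {
    virtual ~A() = default;
    virtual int foo() const { return 1; }
};

struct B : A {
    int foo() const override { return 2; }
};

// Object with local scope: its dynamic type is statically known to be B,
// so the compiler is allowed (not required) to call B::foo directly.
int call_plain() {
    B b;
    return b.foo();
}

// Fully-qualified call: this form always bypasses virtual dispatch.
int call_qualified() {
    B b;
    return b.B::foo();
}

// Call through a reference: in general the dynamic type is not known here.
int call_via_ref(const A& a) {
    return a.foo();
}

// All three routes reach B::foo for a B object; only the mechanism may differ.
void check_same_result() {
    B b;
    assert(call_plain() == call_qualified());
    assert(call_via_ref(b) == call_plain());
}
```

Compiling this with something like g++ -O2 -S or clang++ -O2 -S and comparing the bodies of call_plain and call_qualified is one way to do the check; what you actually see depends on the compiler version and optimization level.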

Related

C++ Method Override and Overloading (Compiler level)

I know the difference between the two. Overriding lets you "redefine" a method in a child class, and overloading lets you "redefine" a method with different arguments or parameters. I'm a little confused about what's going on under the hood, though. I read that when you overload a method, the compiler collects all the overloads and picks the best match, or reports an error if none exists. This is obviously done at compile time, but I'm confused about how overriding works. I've read that handling overrides is extremely hard because you have to check that the return type matches across the class hierarchy, and there can be a lot of class levels to check
(i.e. class Living is the super class of Human and Animal. Human and Animal can have many derived classes, which means we will have a deep hierarchy of classes).
Without getting too detailed, how does overriding work at the compiler level and why is it that overriding is done during run time and not compile time?
It depends on whether the overridden method is virtual. If it is not virtual, then under the hood it usually works the same way as overloading: the compiler looks at the static type of the object and calls the correct function based on that.
For objects with virtual methods a vtable is usually used. This is a collection of function pointers to the virtual methods. The lookup is done at run time to allow for runtime polymorphism. The usual way a vtable is generated is that the compiler builds a single vtable for each class, populates it with the required pointers at compile time, and includes it in the executable. The constructor then sets a hidden pointer in the object to point to the correct vtable. When looking up a method, the generated code first dereferences the hidden pointer to find the vtable, then loads the function pointer from the correct slot of the vtable.
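The mechanism described above can be approximated by hand. This is only a sketch of the idea, not what any particular compiler emits; every name in it (Shape, ShapeVTable, call_sides, ...) is invented for the example.

```cpp
#include <cassert>

// Hand-written approximation of a vtable-based scheme.
// All names here are hypothetical.
struct Shape;

struct ShapeVTable {
    int (*sides)(const Shape*);   // one slot per virtual method
};

struct Shape {
    const ShapeVTable* vptr;      // the "hidden pointer" set by the constructor
};

int triangle_sides(const Shape*) { return 3; }
int square_sides(const Shape*)   { return 4; }

// One table per class, filled in at compile time.
const ShapeVTable triangle_vtable = { triangle_sides };
const ShapeVTable square_vtable   = { square_sides };

// "Constructors": point the object at its class's table.
Shape make_triangle() { return Shape{ &triangle_vtable }; }
Shape make_square()   { return Shape{ &square_vtable }; }

// A "virtual call": load the hidden pointer, load the slot, call through it.
int call_sides(const Shape* s) {
    assert(s != nullptr && s->vptr != nullptr);
    return s->vptr->sides(s);
}
```

call_sides has no idea which kind of shape it was given; the two loads at run time are what select the function.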

Why is the virtual keyword needed?

In other words, why doesn't the compiler just "know" that if the definition of a function is changed in a derived class, and a pointer to dynamically allocated memory of that derived class calls the changed function, then that function in particular should be called and not the base class's?
In what instances would not having the virtual keyword work to a programmer's benefit?
The virtual keyword tells the compiler to implement dynamic dispatch. That is how the language was designed.
Without such a keyword the compiler would not know whether or not to implement dynamic dispatch.
The downsides of virtual, or of dynamic dispatch in general, are:
It has a slight performance penalty. Most compilers implement dynamic dispatch using the vtable and vptr mechanism, where the function to call is looked up through the vtable, so a virtual call needs an additional indirection.
It makes your class non-POD.
One reason:
Consider base classes located in separate module, like library.
And derived classes in your application.
How would the compiler know, while compiling the library, that a given function is or must be virtual?
One of the main design principles of C++ is that it does not incur overhead for features that are not used (the "zero-overhead principle"). This follows from its focus on high performance.
This is why you need to opt in to features like virtual functions while in languages like Java, functions are virtual by default.
The compiler doesn't know, because it can't. You might deliberately choose not to use virtual functions, because every feature has a cost.
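A short sketch of what the keyword changes in practice. Base, Derived, plain and dynamic are names chosen for this example only; one function is left non-virtual and one is marked virtual, and both are called through a Base pointer to a Derived object.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

struct Base {
    virtual ~Base() = default;
    std::string plain() const { return "Base::plain"; }              // non-virtual
    virtual std::string dynamic() const { return "Base::dynamic"; }  // virtual
};

struct Derived : Base {
    std::string plain() const { return "Derived::plain"; }           // hides Base::plain
    std::string dynamic() const override { return "Derived::dynamic"; }
};

// One of the costs mentioned above: a class with virtual functions is not a POD.
static_assert(!std::is_trivially_copyable<Base>::value,
              "a class with virtual functions is not trivially copyable");

// Calls both functions through a Base pointer that points at a Derived.
std::string call_both() {
    std::unique_ptr<Base> p = std::make_unique<Derived>();
    assert(p != nullptr);
    return p->plain() + " / " + p->dynamic();
}
```

The non-virtual call is chosen from the pointer's static type (Base), the virtual one from the object's runtime type (Derived). Without the keyword the compiler has no instruction to do the second kind of lookup.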

single virtual inheritance compiler optimization in c++?

If I have this situation in a C++ project:
1 base class 'Base' containing only pure virtual functions
1 class 'Derived', which is the only class which inherits (public) from 'Base'
Will the compiler generate a VTABLE?
It seems there would be no need, because the project only contains one class that a Base* pointer could possibly point to (Derived), so this could be resolved at compile time in all cases.
This is interesting if you want to do dependency injection for unit testing but don't want to incur the VTABLE lookup costs in production code.
I don't have hard data, but I have good reasons to say no, it won't turn virtual calls into static ones.
Usually, the compiler only sees a single compilation unit. It cannot know there's only a single subclass, because five months later you may write another subclass, compile it, get some ancient object files from the backup and link them all together.
While link-time optimizations do see the whole picture, they usually work on a far lower-level representation of the program. Such representations allow e.g. inlining of static calls, but don't represent inheritance information (except perhaps as optional metadata) and already have the virtual calls and vtables spelt out explicitly. I know this is the case for Clang and IIRC gcc's whole-program optimizations also work on some low-level IR (GIMPLE?).
Also note that with dynamic loading, you can still add more subclasses long after compilation and LTO. You may not need it, but if I were a compiler writer, I'd be wary of adding an optimization that lets people royally break virtual calls in very specific, hard-to-track-down circumstances.
It's rarely worth the trouble: if you don't need virtual calls (e.g. because you know you won't need any more subclasses), don't make stuff virtual. Review your design. If you need some polymorphism but not the full power of virtual, the curiously recurring template pattern may help.
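For reference, a minimal sketch of the curiously recurring template pattern applied to the dependency-injection use case from the question. The names Greeter, Production, TestDouble and run_with are hypothetical; the idea is that the "override" is selected at compile time, so no vtable lookup is involved.

```cpp
#include <cassert>
#include <type_traits>

// CRTP: the base class is a template parameterised on the derived class.
template <typename Derived>
struct Greeter {
    int greet() const {
        // Valid as long as Derived really inherits from Greeter<Derived>.
        return static_cast<const Derived*>(this)->greet_impl();
    }
};

struct Production : Greeter<Production> {
    int greet_impl() const { return 1; }
};

struct TestDouble : Greeter<TestDouble> {
    int greet_impl() const { return 42; }
};

// No virtual functions anywhere, so no vtable pointer in the objects.
static_assert(!std::is_polymorphic<Production>::value, "no vtable needed");

// Code under test takes the dependency as a template parameter.
template <typename T>
int run_with(const Greeter<T>& g) {
    return g.greet();
}

void crtp_demo() {
    assert(run_with(Production{}) == 1);    // production wiring
    assert(run_with(TestDouble{}) == 42);   // unit-test wiring
}
```

The trade-off is that run_with becomes a template, so the choice of implementation has to be known when the calling code is compiled.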
The compiler doesn't have to use a vtable based implementation of virtual function dispatch at all so the answer to your question will be specific to the implementation that you are using.
The vtable is usually not only used for virtual functions, but it is also used to identify the class type when you do some dynamic_cast or when the program accesses the type_info for the class.
If the compiler could detect that no virtual function ever needs dynamic dispatch and that none of the other features are used, it could simply remove the vtable pointer as an optimization.
Obviously the compiler writer hasn't found it worth the trouble of doing this. Probably because it wouldn't be used very often.
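The "other features" mentioned above are dynamic_cast and typeid. A small sketch of both, with made-up class names, shows why the runtime type information has to stay reachable from the object even if no virtual call is ever dispatched dynamically:

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;   // one virtual function makes the class polymorphic
};
struct Derived : Base {};
struct Other : Base {};

// True if the object's dynamic type is exactly Derived.
bool is_exactly_derived(const Base& b) {
    return typeid(b) == typeid(Derived);
}

// True if b can be viewed as a Derived (Derived itself or a class derived from it).
bool is_a_derived(const Base& b) {
    return dynamic_cast<const Derived*>(&b) != nullptr;
}

void rtti_demo() {
    Derived d;
    Other o;
    assert(is_exactly_derived(d) && is_a_derived(d));
    assert(!is_exactly_derived(o) && !is_a_derived(o));
}
```

Both functions only see a Base reference, so whatever they consult to answer the question must live in, or be reachable from, the object itself.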

Inheritance in C++ internals

Can someone explain to me how inheritance is implemented in C++?
Does the base class actually get copied into the derived class, or does the derived class just refer to it?
What happens if a function in the base class is overridden in the derived class? Is it replaced by the new function, or is the new function stored at another location in the derived class's memory?
First of all, you need to understand that C++ is quite different from e.g. Java, because there is no notion of a "Class" retained at runtime. All OO features are compiled down to things that could also be achieved in plain C or assembler.
Having said this, what actually happens is that the compiler generates a kind of struct whenever you use your class definition. And when you invoke a "method" on your object, the compiler just encodes a call to a function that resides somewhere in the generated executable.
Now, if your class inherits from another class, the compiler somehow includes the fields of the base class in the struct it uses for the derived class. E.g. it could place these fields at the front and place the fields of the derived class after them. Please note: you must not make any assumptions regarding the concrete memory layout the C++ compiler uses. If you do so, you're basically on your own and lose any portability.
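A small sketch of the "fields of the base class are included" point, using invented names. It deliberately avoids assuming where exactly the base fields sit; it only relies on the language guarantee that a Derived object contains a Base subobject that a Base pointer can refer to.

```cpp
#include <cassert>

struct Base {
    int id = 7;
};

struct Derived : Base {
    int extra = 9;
};

// A Derived object contains a complete Base subobject, so it can be
// viewed through a Base pointer without copying anything.
int read_id_through_base(Derived& d) {
    Base* b = &d;            // implicit derived-to-base conversion
    b->id = 8;               // writes into d itself, not into a copy
    assert(d.id == 8);
    return d.id + d.extra;
}
```

Nothing is "referred to" elsewhere: the base-class fields live inside each derived object.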
How is the inheritance implemented? Well, it depends!
If you use a normal function, the compiler uses the concrete type it has figured out and just encodes a jump to the right function.
If you use a virtual function, the compiler generates a vtable and emits code that looks up a function pointer in that vtable, depending on the run-time type of the object.
This distinction is very important in practice. Note that it is not true that inheritance is always implemented through a vtable in C++ (this is a common gotcha). Only if you mark a member function as virtual (or it was marked virtual for the same member function in a base class) do you get a call that is directed to the right function at run time. Because of this, a virtual call is typically somewhat slower than a non-virtual call: it needs an extra indirection through the vtable and usually cannot be inlined.
Inheritance in C++ is often accomplished via the vtable. The linked Wikipedia article is a good starting point for your questions. If I went into more detail in this answer, it would essentially be a regurgitation of it.

If classes with virtual functions are implemented with vtables, how is a class with no virtual functions implemented?

In particular, wouldn't there have to be some kind of function pointer in place anyway?
I think that the phrase "classes with virtual functions are implemented with vtables" is misleading you.
The phrase makes it sound like classes with virtual functions are implemented "in way A" and classes without virtual functions are implemented "in way B".
In reality, classes with virtual functions are implemented the same way as any other class; they just also have a vtable. Another way to see it is that "'vtables' implement the 'virtual function' part of a class".
More details on how they both work:
All classes (with virtual or non-virtual methods) are structs. The only difference between a struct and a class in C++ is that, by default, members are public in structs and private in classes. Because of that, I'll use the term class here to refer to both structs and classes. Remember, they are almost synonyms!
Data Members
Classes are (as are structs) just blocks of contiguous memory where each member is stored in sequence. Note that sometimes there will be gaps between members for CPU architectural reasons, so the block can be larger than the sum of its parts.
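The gaps can be observed with sizeof and offsetof. The struct below is a made-up example; the exact numbers are implementation-specific, so the sketch only checks relations that hold regardless of how much padding is inserted.

```cpp
#include <cassert>
#include <cstddef>

struct Padded {
    char tag;     // 1 byte
    int value;    // typically 4 bytes, placed on an int-aligned boundary
};

// Bytes the members need on their own, ignoring alignment.
constexpr std::size_t sum_of_parts = sizeof(char) + sizeof(int);

// Offset of 'value' within the struct; any gap after 'tag' shows up here.
constexpr std::size_t value_offset = offsetof(Padded, value);

static_assert(sizeof(Padded) >= sum_of_parts, "padding can only add bytes");

// The members are stored in declaration order: 'tag' first, then 'value'.
bool members_in_order(const Padded& p) {
    const char* base = reinterpret_cast<const char*>(&p);
    const char* v = reinterpret_cast<const char*>(&p.value);
    assert(v - base == static_cast<std::ptrdiff_t>(value_offset));
    return &p.tag == base && v > base;
}
```

On many common platforms sizeof(Padded) is larger than sum_of_parts, because a gap is left after tag so that value is properly aligned.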
Methods
Methods or "member functions" are an illusion. In reality, there is no such thing as a "member function". A function is always just a sequence of machine code instructions stored somewhere in memory. To make a call, the processor jumps to that position of memory and starts executing. You could say that all methods and functions are 'global', and any indication of the contrary is a convenient illusion enforced by the compiler.
Obviously, a method acts like it belongs to a specific object, so clearly there is more going on. To tie a particular call of a method (a function) to a specific object, every member method has a hidden argument that is a pointer to the object in question. The argument is hidden in that you don't add it to your C++ code yourself, but there is nothing magical about it -- it's very real. When you say this:
void CMyThingy::DoSomething(int arg)
{
    // do something
}
The compiler really does this:
void CMyThingy_DoSomething(CMyThingy* self, int arg)  // 'self' stands in for the hidden 'this', which is a reserved word
{
    // do something
}
Finally, when you write this:
myObj.DoSomething(aValue);
the compiler says:
CMyThingy_DoSomething(&myObj, aValue);
No need for function pointers anywhere! The compiler knows already which method you are calling so it calls it directly.
Static methods are even simpler. They don't have a this pointer, so they are implemented exactly as you write them.
That's it! The rest is just convenient syntactic sugar: the compiler knows which class a method belongs to, so it makes sure it doesn't let you call the function without specifying which one. It also uses that knowledge to translate myItem to this->myItem when it's unambiguous to do so.
(yeah, that's right: member access in a method is always done indirectly via a pointer, even if you don't see one)
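Here is a compilable sketch of the same idea, with a member function and a hand-written free function side by side. Counter and Counter_add are invented names, and self stands in for this (which cannot be used as a parameter name in real code).

```cpp
#include <cassert>

struct Counter {
    int total = 0;
    void add(int n) { total += n; }   // 'this' is passed implicitly; total means this->total
};

// Roughly what the member function looks like once the hidden
// parameter is written out by hand.
void Counter_add(Counter* self, int n) {
    assert(self != nullptr);
    self->total += n;
}

// Returns the difference between doing it both ways, which should be zero.
int demo_hidden_this() {
    Counter a, b;
    a.add(5);               // member-call syntax
    Counter_add(&b, 5);     // the same operation with the object passed explicitly
    return a.total - b.total;
}
```

Both calls are plain direct calls to a function at a known address; neither needs a function pointer stored in the object.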
(Edit: Removed last sentence and posted separately so it can be criticized separately)
Non-virtual member functions are really just syntactic sugar: they are almost like ordinary functions, but with access checking and an implicit object parameter.
struct A
{
void foo ();
void bar () const;
};
is basically the same as:
struct A
{
};
void foo (A * self);         // 'self' plays the role of the implicit 'this'
void bar (A const * self);   // a const member function gets a pointer to const
The vtable is needed so that we call the right function for our specific object instance. For example, if we have:
struct A
{
virtual void foo ();
};
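The implementation of 'foo' might approximate to something like the following pseudocode (lookupVtable and the vtable member are illustrative, not real C++):
void foo (A * this) {
  // Find the entry for "foo" in the vtable of the object's most derived class
  void (*realFoo)(A *) = lookupVtable (this->vtable, "foo");
  (realFoo)(this); // Make the call to the most derived version of 'foo'
}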
Virtual methods are required when you want to use polymorphism. The virtual modifier puts the method in the VMT (virtual method table) for late binding, and it is then decided at run time which class's version of the method is executed.
If the method is not virtual, it is decided at compile time which class's version will be executed.
Function pointers are used mostly for callbacks.
If a class with a virtual function is implemented with a vtable, then a class with no virtual function is implemented without a vtable.
A vtable contains the function pointers needed to dispatch a call to the appropriate method. If the method isn't virtual, the call goes to the class's known type, and no indirection is needed.
For a non-virtual method the compiler can generate a normal function invocation (e.g., CALL to a particular address with this pointer passed as a parameter) or even inline it. For a virtual function, the compiler doesn't usually know at compile time at which address to invoke the code, therefore it generates code that looks up the address in the vtable at runtime and then invokes the method. True, even for virtual functions the compiler can sometimes correctly resolve the right code at compile time (e.g., methods on local variables invoked without a pointer/reference).
(I pulled this section from my original answer so that it can be criticized separately. It is a lot more concise and to the point of your question, so in a way it's a much better answer)
No, there are no function pointers; instead, the compiler turns the problem inside-out.
The compiler calls a global function with a pointer to the object instead of calling some pointed-to function inside the object.
Why? Because it's usually a lot more efficient that way. Indirect calls are expensive instructions.
There's no need for function pointers, because the function being called can't change at run time.
Branches are generated directly to the compiled code for the methods; just like if you have functions that aren't in a class at all, branches are generated straight to them.
The compiler/linker links directly which methods will be invoked. No need for a vtable indirection. BTW, what does that have to do with "stack vs. heap"?