Compile-time elimination of virtual tables? - C++

Assuming I have this hierarchy:
class Super
{
public:
    virtual void bar();
};

class Sub : public Super
{
public:
    virtual void bar() override;
};
is there a way for me to avoid vtables despite using the virtual keyword? (curiosity) I have read something about a compiler optimization that eliminates vtables when the object's type is known at compile time. I'm not really sure; I've been digging around Google for a while but couldn't find any answers. So does that mean these avoid it?
Sub sb;
sb.bar(); //avoids vtable?
Super& sr = sb;
sr.bar(); //avoids vtable?
Super* srp = &sb;
srp->bar(); //avoids vtable?

One of the gcc developers has a whole series of blog posts about devirtualization. I think he is also active on SO, so there may be a chance that he responds.
However, devirtualization deals mostly with eliminating virtual dispatch by analyzing the program flow and the possible types. I don't think it removes the virtual table in general, but there is an example in the second article where a virtual call gets inlined and can then be evaluated completely at compile time through constant propagation. In that case, the compiler/linker transformed the program so that it did not use the class at all, and thus it should not contain any object or vtable.
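A small sketch of the kind of transformation meant there (my own illustration with made-up names, not taken from the articles): when the whole picture is visible in one translation unit, an optimiser can prove the dynamic type and fold the call away entirely.
struct Animal {
    virtual int legs() const { return 4; }
    virtual ~Animal() = default;
};

struct Bird : Animal {
    int legs() const override { return 2; }
};

int countLegs() {
    Bird b;
    Animal& a = b;     // dynamic type is provably Bird
    return a.legs();   // may be reduced to "return 2;" with no vtable access at all
}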

Sub sb;
sb.bar(); //avoids vtable?
The above will never need to use a vtable for dispatch, as the runtime type is known (i.e. it's known to be the same as the compile-time/static type, namely Sub).
Super& sr = sb;
sr.bar(); //avoids vtable?
Super* srp = &sb;
srp->bar(); //avoids vtable?
In these cases, if the pointer/reference and the usage appear in the same function, the optimiser may well be smart enough to avoid dispatch via a vtable. If the pointer or reference is passed to some other out-of-line function that might be called with other types of pointers, then vtable-based dispatch will normally be needed.
More generally, the C++ Standard doesn't make any stipulations about how runtime polymorphism is implemented, so there is no guaranteed, portable way to eliminate "vtables".
That said, your best bets to minimise use of the vtable for dispatch are:
to mark overrides final when the freedom to override further is not actually required, and
to keep the implementation inline (even if implicitly - by having the function implementation in the class definition)
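As a rough illustration of the first point (a sketch with made-up names, not a guarantee for any particular compiler): marking the override final tells the compiler that no more-derived override can exist, so the call can be bound statically.
class Base
{
public:
    virtual void bar();
};

class Impl : public Base
{
public:
    void bar() final { /* keeping the body inline helps too */ }
};

void use(Impl& i)
{
    i.bar(); // bar() is final in Impl, so no further override can exist;
             // the compiler may call (or inline) Impl::bar directly
}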
To see if either or both help, you'll have to experiment with, or read the docs for, your own compiler/tool-chain, optimisation flags etc.
An unused vtable may or may not be removed by the linker: you may want to experiment with cross-object linker optimisation flags if you have multiple translation units.

Related

C++ Low latency Design: Function Dispatch v/s CRTP for Factory implementation

As part of a system design, we need to implement a factory pattern. In combination with the Factory pattern, we are also using CRTP, to provide a base set of functionality which can then be customized by the Derived classes.
Sample code below:
class FactoryInterface{
public:
    virtual void doX() = 0;
};

//force all derived classes to implement custom_X_impl
template <typename Derived, typename Base = FactoryInterface>
class CRTP : public Base
{
public:
    void doX() override {
        // do common processing..... then
        static_cast<Derived*>(this)->custom_X_impl();
    }
};

class Derived : public CRTP<Derived>
{
public:
    void custom_X_impl() {
        //do custom stuff
    }
};
Although this design is convoluted, it does provide a few benefits. All the calls after the initial virtual function call can be inlined. The derived class's custom_X_impl call is also made efficiently.
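A sketch of what that looks like at the call site (my own illustration, using the classes above):
void client(FactoryInterface* object)
{
    // one virtual dispatch lands in CRTP<Derived>::doX; from there the
    // static_cast call to custom_X_impl is resolved statically and can be inlined
    object->doX();
}

int main()
{
    Derived d;
    client(&d);
}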
I wrote a comparison program to compare the behavior of a similar implementation (tight loop, repeated calls) using function pointers and virtual functions. This design came out triumphant for gcc/4.8 with O2 and O3.
A C++ guru, however, told me yesterday that any virtual function call in a large executing program can take a variable time, considering cache misses, and that I can achieve potentially better performance using C-style function table look-ups and gcc hotlisting of functions. However, I still see 2x the cost in my sample program mentioned above.
My questions are as below:
1. Is the guru's assertion true? For either answer, are there any links I can refer to?
2. Is there any low-latency implementation I can refer to, which has a base class invoking a custom function in a derived class using function pointers?
3. Any suggestions on improving the design?
Any other feedback is always welcome.
Your guru refers to the hot attribute of the gcc compiler. The effect of this attribute is:
The function is optimized more aggressively and on many targets it is
placed into a special subsection of the text section so all hot
functions appear close together, improving locality.
So yes, in a very large code base, the hotlisted function may remain in cache, ready to be executed without delay, because it avoids cache misses.
You can use this attribute for member functions as well:
#include <iostream>

struct X {
    void test() __attribute__((hot)) { std::cout << "hello, world!\n"; }
};
But...
When you use virtual functions, the compiler generally generates a vtable that is shared between all objects of the class. This table is a table of pointers to functions. And indeed -- your guru is right -- nothing guarantees that this table remains in cached memory.
But, if you manually create a "C-style" table of function pointers, the problem is EXACTLY THE SAME. While the function may remain in cache, nothing ensures that your function table remains in cache as well.
The main difference between the two approaches is that:
in the case of virtual functions, the compiler knows that the virtual function is a hot spot, and could decide to make sure to keep the vtable in cache as well (I don't know if gcc can do this or if there are plans to do so).
in the case of the manual function pointer table, your compiler will not easily deduce that the table belongs to a hot spot. So this attempt of manual optimization might very well backfire.
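To make the comparison concrete, here is a rough sketch (my own illustration) of the kind of "C-style" function-pointer table being discussed; the indirect call still loads a pointer from a table in memory, exactly like a vtable slot does.
#include <cstdio>

void do_x() { std::puts("x"); }
void do_y() { std::puts("y"); }

// the table lives in data memory, just like a vtable does
void (*const dispatch_table[])() = { do_x, do_y };

void run(unsigned op)
{
    dispatch_table[op % 2](); // indirect call through the table;
                              // the cache-miss risk is the same as for a vtable slot
}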
My opinion: never try to optimize yourself what a compiler can do much better.
Conclusion
Trust in your benchmarks. And trust your OS: if your function or your data is frequently accessed, there are high chances that a modern OS will take this into account in its virtual memory management, whatever the compiler generates.

Interface vtable

Do interfaces (polymorphic classes with solely pure virtual functions) have a vtable?
Since interfaces do not implement a polymorphic function themselves and can't be directly constructed, there would be no need for the linker to place a vtable. Is that so? I'm especially concerned about the MSVC compiler.
Yes, they do. And there are a number of good reasons for that.
The first good reason is that even pure virtual methods can have an implementation, either implicit or explicit. It is relatively easy to pull off a trick calling a pure virtual function, so you can basically provide a definition for one of yours, call it and see what happens. For that reason, there should be a virtual table in the first place.
There is another reason for putting a virtual table into a base class even if all of its methods are pure virtual and there are no other data members. When polymorphism is used, a pointer to a base class is passed all around the program. In order to call a virtual method, the compiler/runtime should figure out the relative offset of the virtual table from the base pointer. If C++ had no multiple inheritance, one could assume a zero offset from the abstract base class (for example), in which case it would have been possible not to have a vtable there (but we still need it due to reason #1). But since multiple inheritance is involved, a trick a la "the vtable is there at offset 0" won't work, because there could be two or three vtables depending on the number (and type) of base classes.
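A tiny sketch of that multiple-inheritance situation (illustrative names; the exact layout is implementation-specific):
struct A { virtual void fa() = 0; };
struct B { virtual void fb() = 0; };

// In a typical implementation C holds two vptrs, one for the A subobject
// and one for the B subobject, at different offsets within the object.
struct C : A, B {
    void fa() override {}
    void fb() override {}
};

void call(B* b) { b->fb(); } // b may point into the middle of a C;
                             // dispatch goes through the B subobject's vtable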
There could be other reasons I haven't thought of as well.
Hope it helps.
From a purely C++ point of view it's an academic question. Virtual functions don't have to be implemented with vtables, and if they are, there is no portable way to get at them.
If you're particularly concerned about the MSVC compiler, you might want to decorate your interfaces with __declspec(novtable).
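For example (MSVC-specific, with a hypothetical interface name):
struct __declspec(novtable) IWidget
{
    virtual void draw() = 0;
};
// Concrete classes deriving from IWidget still get their own vtable;
// only the (never legitimately used) IWidget vtable and the code that
// would install it are dropped.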
(In general, in common implementations, an abstract class may need a vtable, e.g.:
struct Base {
    Base();
    virtual void f() {}
    virtual void g() = 0;
};

void h(Base& b) {
    b.f(); // Call f on a Base that is not (yet) a Derived
           // vtable for Base required
}

Base::Base() {
    h(*this);
}

struct Derived : Base {
    void g() {}
};

int main() {
    Derived d;
}
)
The vtable is not necessary, but rarely optimized out. MSVC provides the __declspec(novtable) extension, which tells the compiler explicitly that the vtable can be removed. In the absence of that, the compiler would have to check itself that the vtable is not used. This is not exceptionally hard, but still far from trivial. And since it doesn't provide real speed benefits in regular code, the check is not implemented in any compiler I know.

Single virtual inheritance compiler optimization in C++?

If I have this situation in C++ project:
1 base class 'Base' containing only pure virtual functions
1 class 'Derived', which is the only class which inherits (public) from 'Base'
Will the compiler generate a VTABLE?
It seems there would be no need because the project only contains 1 class to which a Base* pointer could possibly point (Derived), so this could be resolved compile time for all cases.
This is interesting if you want to do dependency injection for unit testing but don't want to incur the VTABLE lookup costs in production code.
I don't have hard data, but I have good reasons to say no, it won't turn virtual calls into static ones.
Usually, the compiler only sees a single compilation unit. It cannot know there's only a single subclass, because five months later you may write another subclass, compile it, get some ancient object files from the backup and link them all together.
While link-time optimizations do see the whole picture, they usually work on a far lower-level representation of the program. Such representations allow, e.g., inlining of static calls, but don't represent inheritance information (except perhaps as optional metadata) and already have the virtual calls and vtables spelt out explicitly. I know this is the case for Clang, and IIRC gcc's whole-program optimizations also work on some low-level IR (GIMPLE?).
Also note that with dynamic loading, you can still add more subclasses long after compilation and LTO. You may not need it, but if I were a compiler writer, I'd be wary of adding an optimization that lets people royally break virtual calls in very specific, hard-to-track-down circumstances.
It's rarely worth the trouble - if you don't need virtual calls (e.g. because you know you won't need any more subclasses), don't make stuff virtual. Review your design. If you need some polymorphism but not the full power of virtual, the curiously recurring template pattern may help.
The compiler doesn't have to use a vtable based implementation of virtual function dispatch at all so the answer to your question will be specific to the implementation that you are using.
The vtable is usually not only used for virtual functions, but it is also used to identify the class type when you do some dynamic_cast or when the program accesses the type_info for the class.
If the compiler detects that no virtual functions ever need a dynamic dispatch and none of the other features are used, it just could remove the vtable pointer as an optimization.
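For instance (a small sketch with made-up names), even if every call could be devirtualized, these uses would still keep the vtable/RTTI data alive:
#include <typeinfo>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

const char* inspect(Base& b)
{
    if (dynamic_cast<Derived*>(&b))  // RTTI is reached through the vtable pointer
        return "Derived";
    return typeid(b).name();         // so is the type_info used here
}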
Obviously the compiler writer hasn't found it worth the trouble of doing this. Probably because it wouldn't be used very often.

Inline and Virtual

Guys, I have read several threads about the interaction between inline and virtual co-existing in one function. In most cases, compilers won't consider it as inline. However, is the principle applied to the scenario where a non-virtual inline member function calls a virtual function? Say:
class ABC{
public:
    void callVirtual() { IAmVirtual(); }
protected:
    virtual void IAmVirtual();
};
What principle? I would expect the compiler to generate a call to the virtual function. The call (in effect a jump-to-function-pointer) may be inlined but the IAmVirtual function is not.
The virtual function itself is not inline, and it is not called with the qualification needed to inline it even if it were, so it can't be inlined.
The whole point of virtual functions is that the compiler generally doesn't know which of the derived class implementations will be needed at run-time, or even if extra derived classes will be dynamically loaded from shared libraries. So, in general, it's impossible to inline. The one case that the compiler can inline is when it happens to know for sure which type it's dealing with because it can see the concrete type in the code and soon afterwards - with no chance of the type having changed - see the call to the virtual function. Even then, it's not required to try to optimise or inline, it's just the only case where it's even possible.
You shouldn't try to fight this unless the profiler's proven the virtual calls are killing you. Then, first try to group a bunch of operations so one virtual call can do more work for you. If virtual dispatch is still just too slow, consider maintaining some kind of discriminated union: it's a lot less flexible and cleanly extensible, but can avoid the virtual function call overheads and allow inlining.
All that assumes you really need dynamic dispatch: some programmers and systems over-use virtual functions just because OO was the in thing 20 years ago, or they've used an OO-only language like Java. C++ has a rich selection of compile-time polymorphic mechanisms, including templates.
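One way the "discriminated union" idea mentioned above can look in modern C++ (my own sketch, using C++17's std::variant; the names are made up):
#include <type_traits>
#include <variant>

struct Circle { double r; };
struct Square { double s; };

using Shape = std::variant<Circle, Square>; // the "discriminated union"

double area(const Shape& sh)
{
    // std::visit branches on the stored alternative; the lambda bodies are
    // ordinary functions the compiler is free to inline.
    return std::visit([](const auto& s) -> double {
        if constexpr (std::is_same_v<std::decay_t<decltype(s)>, Circle>)
            return 3.14159 * s.r * s.r;
        else
            return s.s * s.s;
    }, sh);
}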
In your case callVirtual() will be inlined. Any non-virtual function can be a good candidate for being inlined (obviously the final decision is up to the compiler).
Virtual functions have to be looked up in the virtual method table, and as a result the compiler cannot simply make them inline. This is generally a runtime look-up. An inline function, however, may call a virtual one, and the compiler can put that call (the code that looks up the target in the VMT) inline.

DECLSPEC_NOVTABLE on pure virtual classes?

This is probably habitual programming redundancy. I have noticed DECLSPEC_NOVTABLE ( __declspec(novtable) ) on a bunch of interfaces defined in headers:
struct DECLSPEC_NOVTABLE IStuff : public IObject
{
    virtual void method1() = 0;
    virtual void method2() = 0;
};
The MSDN article on this __declspec extended attribute says that adding this guy will remove the constructor and destructor vtable references and thus result in a "significant code size reduction" (because the vtable can then be removed entirely).
This just doesn't make much sense to me. These guys are pure virtual, why wouldn't the compiler just do this by default?
The article also says that if you do this and then try to instantiate one of these things, you will get a run-time access violation. But when I tried this with a few compilers (with or without the __declspec extension), it didn't compile (as I would have expected).
So I guess to summarize:
Does the compiler strip out the vtable regardless for pure virtual interfaces, or have I missed something fundamental here?
What is the MSDN article talking about?
The compiler strips out the only reference to the vtable, which would have been during construction of the class. Therefore, the linker can optimize it away since there is no longer a reference in the code to it.
Also, by the way, I have made a habit of declaring an empty constructor as protected, and also using Microsoft's abstract keyword extension, to avoid that access violation at runtime. This way, the compiler catches the problem at compile time instead (since only a derived class can construct the interface subobject through the protected constructor). The derived class will of course fill in the vtable during its construction.
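Roughly what the protected-constructor part of that habit looks like (an illustrative sketch with made-up names, assuming the DECLSPEC_NOVTABLE macro from the question's headers):
struct DECLSPEC_NOVTABLE IThing
{
    virtual void method1() = 0;
protected:
    IThing() {}                // only derived classes can run this constructor
};

struct Thing : IThing
{
    void method1() override {} // Thing's constructor installs Thing's vtable,
                               // so the omitted IThing vtable is never touched
};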
It's a bit of handholding for a dumb compiler/linker. The compiler should not insert any reference to this vtable, as it is quite obvious that there is no need for this vtable. The compiler could also mark the reference in such a way that the linker can eliminate the vtable, but that's more complex of course.