Can I change a pure-virtual function (in a base class) to become non-pure without running into any binary compatibility issues? (Linux, GCC 4.1)
thanks
There are no compatibility issues when you switch from pure virtual to virtual and then recompile the code. (However, going from virtual to pure virtual may cause problems.)
The only thing you need to take care of is that non-pure virtual methods must have a body. They cannot remain unimplemented, i.e.
class A {
public:
    virtual int foo ()
    {
        return 0; //put some content
    }
};
You cannot simply declare it without providing a definition anywhere:
virtual int foo();
That causes a linker error even if you never call it, because the class's vtable still refers to the function.
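A declaration without an inline body is fine as long as the function is defined somewhere in the program. Here is a minimal sketch of that variant, assuming a class A like the one above plus a hypothetical derived class B and helper callFoo() that are not part of the original question:

```cpp
// Sketch: a formerly pure virtual function made non-pure.
// The in-class declaration may omit the body, provided a definition
// exists in some translation unit that gets linked in.
class A {
public:
    virtual ~A() {}
    virtual int foo();   // was: virtual int foo() = 0;
};

// Out-of-line definition (normally in A's .cpp file). Without it,
// the vtable for A would refer to an undefined symbol.
int A::foo() { return 0; }

// Hypothetical override written while foo() was still pure; it is unaffected.
class B : public A {
public:
    virtual int foo() { return 1; }
};

// Dispatches through the base-class interface.
int callFoo(A& a) { return a.foo(); }
```

With this in place, callFoo() on an A object should run the new default body, while callFoo() on a B object still reaches B's override.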
What does it mean to maintain binary compatibility to you?
The object layout will be the same, but you will be breaking the One Definition Rule unless you recompile all code, at which point binary compatibility is basically useless. Without recompiling, the ODR is broken, and while it might happen to work, it might also not work.
In particular if all of the virtual methods in the class are either pure or defined inline, then the compiler might generate the vtable in each translation unit that includes the header and mark it as a weak symbol. Then the linker will pick one of them and discard all the others. In this situation the linker is not required to verify that all of the vtables are exactly the same and will pick one at random (or deterministically in an undefined way), and it might pick one such vtable where the method is pure virtual, which in turn might end up crashing the application if the method is called on an object of the base class.
In theory, C++ does not have a binary interface, and the order of methods in the vtable is undefined. Change anything about a class's definition and you need to recompile every class that depends upon it, in every dll etc.
But what I would like to know is how the compilers work in practice. I would hope that they just use the order that the methods are defined in the header/class, which would make appending additional methods safe. But they could also use a hash of the mangled names to make them order independent, but also then completely non-upgradable.
If people have specific knowledge of how specific versions of specific compilers work in different operating systems etc. then that would be most helpful.
Added: Ideally linker symbols would be created for the virtual methods offsets, so that the offsets would never be hard compiled into calling functions. But my understanding is that that is never done. Correct?
It appears that with Microsoft's compiler the vtable may be reordered.
The following is copied from https://marc.info/?l=kde-core-devel&m=139744177410091&w=2
I (Nicolas Alvarez) can confirm this behavior happens.
I compiled this class:
struct Testobj {
    virtual void func1();
    virtual void func2();
    virtual void func3();
};
And a program that calls func1(); func2(); func3();
Then I added a func2(int) overload to the end:
struct Testobj {
    virtual void func1();
    virtual void func2();
    virtual void func3();
    virtual void func2(int);
};
and recompiled the class but not the program using the class.
Output of calling func1(); func2(); func3(); was
This is func1
This is func2 taking int
This is func2
This shows that if I declare func1() func2() func3() func2(int), the
vtable is laid out as func1() func2(int) func2() func3().
Tested with MSVC2010.
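To see why the stale program prints what it does, here is a small hand-written model of the situation. It is only a sketch: the "vtable" is an ordinary vector of strings rather than real function pointers, and the names SlotTable, oldLayout, newLayout and runOldClient are made up for illustration. The new layout is simply the one reported in the message above, not something derived from compiler documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hand-written stand-in for a vtable: an ordered list of "slots".
// Real vtables hold function pointers; strings keep the model printable.
typedef std::vector<std::string> SlotTable;

// Layout the old client was compiled against: func1, func2, func3.
SlotTable oldLayout() {
    SlotTable t;
    t.push_back("This is func1");
    t.push_back("This is func2");
    t.push_back("This is func3");
    return t;
}

// Layout reported for the rebuilt class: the func2 overloads sit together.
SlotTable newLayout() {
    SlotTable t;
    t.push_back("This is func1");
    t.push_back("This is func2 taking int");
    t.push_back("This is func2");
    t.push_back("This is func3");
    return t;
}

// The old client has slot indices 0, 1 and 2 baked into its call sites,
// so it reads whatever happens to live in those slots now.
std::vector<std::string> runOldClient(const SlotTable& table) {
    std::vector<std::string> out;
    for (std::size_t slot = 0; slot < 3; ++slot) {
        assert(slot < table.size());
        out.push_back(table[slot]);
    }
    return out;
}
```

Running runOldClient(newLayout()) in this model yields the same three lines as the reported output, because slots 1 and 2 no longer hold what the client was compiled to expect.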
In MSVC 2010 they are in the order you declare them. I can't think of any rationale for another compiler doing it differently although it is an arbitrary choice. It only needs to be consistent. They are just arrays of pointers so don't worry about hashes or mangling.
No matter the order, additional virtual functions added in derived classes must come after those in the base or polymorphic casts would not work.
As far as I know they are always in the order of declaration. This way you can always add declarations of new virtual methods at the end (that is, below all previous declarations of virtual methods). If you remove a virtual method or add a new one somewhere in the middle, you do need to recompile and relink everything.
I know that for sure - I already made that mistake. From my experience these rules apply to both MSVC and GCC.
Any compiler must at least place all the vtable entries for a specific class together, with those for derived classes coming either before or afterwards, and also together.
The easiest way to accomplish that is to use the header order. It is difficult to see why any compiler would do anything different, given that it requires more code, more testing, etc., and just provides another way for mistakes to occur. No identifiable benefit that I can see.
Do interfaces (polymorphic class solely with pure virtual functions) have a vtable?
Since interfaces do not implement a polymorphic function themselves and can't be directly constructed, there would be no need for the linker to place a vtable. Is that so? I'm especially concerned about the MSVC compiler.
Yes, they do. And there are a number of good reasons for that.
The first good reason is that even pure virtual methods have an implementation, either implicit or explicit. It is relatively easy to pull off a trick that calls a pure virtual function, so you can provide a definition for one of yours, call it, and see what happens. For that reason, there should be a virtual table in the first place.
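As a minimal sketch of the "explicit implementation" case: standard C++ lets a pure virtual function have an out-of-line definition, which can then be reached with a qualified call. The names Interface, Impl and viaBase are hypothetical. This shows the well-defined route only, not the trick of reaching a pure virtual through a constructor.

```cpp
struct Interface {
    virtual ~Interface() {}
    virtual int value() = 0;   // still pure: Interface stays abstract
};

// A pure virtual function may have a definition; it must be out of line.
int Interface::value() { return 42; }

struct Impl : Interface {
    // The explicitly qualified call bypasses virtual dispatch and runs
    // the base-class definition.
    virtual int value() { return Interface::value() + 1; }
};

// Ordinary virtual call through the interface.
int viaBase(Interface& i) { return i.value(); }
```

Here viaBase() on an Impl object dispatches to Impl::value(), which in turn uses the base definition.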
There is another reason for putting a virtual table into a base class even if all of its methods are pure virtual and it has no data members. When polymorphism is used, a pointer to a base class is passed all around the program. In order to call a virtual method, the compiler/runtime must figure out the relative offset of the virtual table from the base pointer. If C++ had no multiple inheritance, one could assume a zero offset from the abstract base class (for example), in which case it would have been possible not to have a vtable there (but we would still need it due to reason #1). But since multiple inheritance is involved, a trick à la "vtable is there at offset 0" won't work, because there could be two or three vtables depending on the number (and type) of base classes.
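The multiple-inheritance point can be observed with a small sketch. The names IReader, IWriter and File are hypothetical, and the address check reflects how common implementations (GCC, Clang, MSVC) lay out objects, not something the C++ standard spells out in these terms.

```cpp
struct IReader { virtual ~IReader() {} virtual int read() = 0; };
struct IWriter { virtual ~IWriter() {} virtual int write() = 0; };

// One object with two interface subobjects.
struct File : IReader, IWriter {
    virtual int read() { return 1; }
    virtual int write() { return 2; }
};

// True when the two interface subobjects live at different addresses,
// which is what typical implementations do: each one carries its own
// vtable pointer.
bool basesAtDifferentAddresses(File& f) {
    IReader* r = &f;
    IWriter* w = &f;
    return static_cast<void*>(r) != static_cast<void*>(w);
}

// The callee only sees an IWriter and must find the right function from it.
int writeThrough(IWriter& w) { return w.write(); }
```

Since an IWriter* generally does not point at the start of the File object, a call through it cannot assume "the vtable pointer is at offset 0 of the complete object".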
There could be other reasons I haven't thought of as well.
Hope it helps.
From a purely C++ point of view it's an academic question. Virtual functions don't have to be implemented with vtables, and if they are, there is no portable way to get at them.
If you're particularly concerned about the MSVC compiler you might want to decorate your interfaces with __declspec(novtable).
(In general, in common implementations, an abstract class may need a vtable, e.g.:
struct Base {
    Base();
    virtual void f() {}
    virtual void g() = 0;
};
void h(Base& b) {
    b.f(); // Call f on a Base that is not (yet) a Derived
           // vtable for Base required
}
Base::Base() {
    h(*this);
}
struct Derived : Base {
    void g() {}
};
int main() {
    Derived d;
}
)
The vtable is not necessary, but rarely optimized out. MSVC provides the __declspec(novtable) extension, which tells the compiler explicitly that the vtable can be removed. In the absence of that, the compiler would have to check itself that the vtable is not used. This is not exceptionally hard, but still far from trivial. And since it doesn't provide real speed benefits in regular code, the check is not implemented in any compiler I know.
If I have this situation in a C++ project:
1 base class 'Base' containing only pure virtual functions
1 class 'Derived', which is the only class which inherits (public) from 'Base'
Will the compiler generate a VTABLE?
It seems there would be no need because the project only contains 1 class to which a Base* pointer could possibly point (Derived), so this could be resolved at compile time for all cases.
This is interesting if you want to do dependency injection for unit testing but don't want to incur the VTABLE lookup costs in production code.
I don't have hard data, but I have good reasons to say no, it won't turn virtual calls into static ones.
Usually, the compiler only sees a single compilation unit. It cannot know there's only a single subclass, because five months later you may write another subclass, compile it, get some ancient object files from the backup and link them all together.
While link-time optimizations do see the whole picture, they usually work on a far lower-level representation of the program. Such a representation allows e.g. inlining of static calls, but doesn't represent inheritance information (except perhaps as optional metadata) and already has the virtual calls and vtables spelt out explicitly. I know this is the case for Clang, and IIRC gcc's whole-program optimizations also work on some low-level IR (GIMPLE?).
Also note that with dynamic loading, you can still add more subclasses long after compilation and LTO. You may not need it, but if I were a compiler writer, I'd be wary of adding an optimization that lets people royally break virtual calls in very specific, hard-to-track-down circumstances.
It's rarely worth the trouble - if you don't need virtual calls (e.g. because you know you won't need any more subclasses), don't make stuff virtual. Review your design. If you need some polymorphism but not the full power of virtual, the curiously recurring template pattern may help.
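For the dependency-injection use case from the question, a CRTP version might look roughly like the sketch below. Service, RealService, FakeService and describe are hypothetical names. The trade-off is that code taking a service has to become a template instead of accepting a Base*.

```cpp
#include <string>

// CRTP base: the derived type is a template parameter, so the call to
// nameImpl() is resolved at compile time without a vtable lookup.
template <typename Derived>
class Service {
public:
    std::string name() const {
        return static_cast<const Derived*>(this)->nameImpl();
    }
};

// Production implementation.
class RealService : public Service<RealService> {
public:
    std::string nameImpl() const { return "real"; }
};

// Test double injected in unit tests.
class FakeService : public Service<FakeService> {
public:
    std::string nameImpl() const { return "fake"; }
};

// Code under test is templated on the service type.
template <typename S>
std::string describe(const Service<S>& s) {
    return "using " + s.name();
}
```

Passing a RealService or a FakeService to describe() selects the implementation statically, so there is no run-time dispatch to optimize away.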
The compiler doesn't have to use a vtable based implementation of virtual function dispatch at all so the answer to your question will be specific to the implementation that you are using.
The vtable is usually not only used for virtual functions, but it is also used to identify the class type when you do some dynamic_cast or when the program accesses the type_info for the class.
If the compiler detects that no virtual function ever needs dynamic dispatch and none of the other features are used, it could simply remove the vtable pointer as an optimization.
Obviously the compiler writer hasn't found it worth the trouble of doing this. Probably because it wouldn't be used very often.
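The non-dispatch uses mentioned above can be sketched as follows, with hypothetical class names. Both helpers need run-time type information about the object they are given, and common implementations reach that information through the vtable.

```cpp
#include <typeinfo>

struct Base { virtual ~Base() {} };
struct Derived : Base {};
struct Other : Base {};

// typeid on a reference to a polymorphic type reports the dynamic type.
bool isDerived(Base& b) {
    return typeid(b) == typeid(Derived);
}

// dynamic_cast yields a null pointer when the object is not a Derived.
bool castsToDerived(Base* b) {
    return dynamic_cast<Derived*>(b) != 0;
}
```

Neither helper calls a user-declared virtual function, yet both depend on per-object type information, which is one reason dropping the vtable pointer is harder than it first looks.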
Hi guys. I have read several threads about how inline and virtual interact when both apply to the same function. In most cases, compilers won't inline such a function. However, does the same principle apply when a non-virtual inline member function calls a virtual function? Say:
class ABC{
public:
    void callVirtual(){IAmVirtual();}
protected:
    virtual void IAmVirtual();
};
What principle? I would expect the compiler to generate a call to the virtual function. The call (in effect a jump-to-function-pointer) may be inlined but the IAmVirtual function is not.
The virtual function itself is not inline, and it is not called with the qualification needed to inline it even if it were, so it can't be inlined.
The whole point of virtual functions is that the compiler generally doesn't know which of the derived class implementations will be needed at run-time, or even if extra derived classes will be dynamically loaded from shared libraries. So, in general, it's impossible to inline. The one case that the compiler can inline is when it happens to know for sure which type it's dealing with because it can see the concrete type in the code and soon afterwards - with no chance of the type having changed - see the call to the virtual function. Even then, it's not required to try to optimise or inline, it's just the only case where it's even possible.
You shouldn't try to fight this unless the profiler's proven the virtual calls are killing you. Then, first try to group a bunch of operations so one virtual call can do more work for you. If virtual dispatch is still just too slow, consider maintaining some kind of discriminated union: it's a lot less flexible and cleanly extensible, but can avoid the virtual function call overheads and allow inlining.
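A hand-rolled discriminated union along those lines might look like the sketch below. All names are hypothetical, and integer dimensions are used only to keep the example simple.

```cpp
// Payload types for each alternative.
struct SquareData { int side; };
struct RectData { int width; int height; };

// Discriminated union: a tag saying which union member is active.
struct Shape {
    enum Kind { Square, Rect };
    Kind kind;
    union {
        SquareData square;
        RectData rect;
    } data;
};

Shape makeSquare(int side) {
    Shape s;
    s.kind = Shape::Square;
    s.data.square.side = side;
    return s;
}

Shape makeRect(int width, int height) {
    Shape s;
    s.kind = Shape::Rect;
    s.data.rect.width = width;
    s.data.rect.height = height;
    return s;
}

// One switch replaces the virtual call. The compiler sees every case,
// so it is free to inline them.
int area(const Shape& s) {
    switch (s.kind) {
    case Shape::Square: return s.data.square.side * s.data.square.side;
    case Shape::Rect:   return s.data.rect.width * s.data.rect.height;
    }
    return 0; // not reached for valid tags
}
```

The cost is the one already mentioned: adding a new kind of shape means touching the enum, the union and every switch, rather than just adding a class.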
All that assumes you really need dynamic dispatch: some programmers and systems over-use virtual functions just because OO was the in thing 20 years ago, or they've used an OO-only language like Java. C++ has a rich selection of compile-time polymorphic mechanisms, including templates.
In your case callVirtual() will be inlined. Any non-virtual function can be a good candidate for inlining (obviously the final decision is up to the compiler).
Virtual functions have to be looked up in the Virtual Method Table, and as a result the compiler cannot simply move them to be inline. This is generally a runtime look up. An inline function however may call a virtual one and the compiler can put that call (the code to look up the call in the VMT) inline.
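A compilable variant of the class from the question illustrates this. The derived class DEF, the helper run() and the string return values are additions for illustration only.

```cpp
#include <string>

class ABC {
public:
    virtual ~ABC() {}
    // Non-virtual and defined in-class, so implicitly inline. The compiler
    // may expand this wrapper at the call site; the call inside it still
    // goes through virtual dispatch.
    std::string callVirtual() { return IAmVirtual(); }
protected:
    virtual std::string IAmVirtual() { return "ABC"; }
};

// Hypothetical derived class overriding the protected virtual.
class DEF : public ABC {
protected:
    virtual std::string IAmVirtual() { return "DEF"; }
};

// Calls the inline wrapper through a base reference.
std::string run(ABC& obj) { return obj.callVirtual(); }
```

Whether or not callVirtual() is inlined, run() on a DEF object still ends up in DEF::IAmVirtual(), because only the wrapper is a candidate for inlining and not the virtual target.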
This is probably habitual programming redundancy. I have noticed DECLSPEC_NOVTABLE ( __declspec(novtable) ) on a bunch of interfaces defined in headers:
struct DECLSPEC_NOVTABLE IStuff : public IObject
{
    // return types assumed to be void
    virtual void method1 () = 0;
    virtual void method2 () = 0;
};
The MSDN article on this __declspec extended attribute says that adding this guy will remove the constructor and destructor vtable entries and thus result in "significant code size reduction" (because the vtable will be removed entirely).
This just doesn't make much sense to me. These guys are pure virtual, why wouldn't the compiler just do this by default?
The article also says that if you do this, and then try to instantiate one of these things, you will get a run-time access violation. But when I tried this with a few compilers (with or without the __declspec extension), the code didn't compile (as I would have expected).
So I guess to summarize:
Does the compiler strip out the vtable regardless for pure virtual interfaces, or have I missed something fundamental here?
What is the MSDN article talking about?
The compiler strips out the only reference to the vtable, which would have been during construction of the class. Therefore, the linker can optimize it away since there is no longer a reference in the code to it.
Also by the way, I have made a habit of declaring an empty constructor as protected, and also using Microsoft's extension abstract keyword, to avoid that access violation at runtime. This way, the compiler catches the problem at compile time instead (since only a base class can instantiate the interface through the protected constructor). The derived class will of course fill in the vtable during its construction.
It's a bit of handholding for a dumb compiler/linker. The compiler should not insert any reference to this vtable, as it is quite obvious that there is no need for this vtable. The compiler could also mark the reference in such a way that the linker can eliminate the vtable, but that's more complex of course.