Are there instances where a virtual method call is optimized out? - c++

Suppose, for example, that Foo() is a virtual method of class Bar, there are no inheriting classes, and the compiler can deduce at compile time that the object's type is Bar (e.g. a call like bar.Foo() on a Bar object).
Since it's clear at compile-time that Bar::Foo() is the only possible method the call could resolve to, do compilers commonly optimize out the virtual method lookup?

Yes, in such a case the Bar::Foo() call will be optimized. Here is an explanation of how such a call can be inlined by GCC.
The whole series of articles from GCC developer Honza Hubička describes how devirtualization is implemented at a low level and what limitations it has:
Devirtualization in C++, part 1
Devirtualization in C++, part 2
Devirtualization in C++, part 3
Devirtualization in C++, part 4

The compiler optimization that removes virtual calls is called devirtualization. It requires the compiler to know the exact type of the instance, so that it knows which override is being called.
Assuming you have such classes, I would recommend using final where appropriate, to indicate either that no class can inherit from this one or that no inheriting class can override this specific method.
It all depends on your compiler, though to a certain extent this optimization is already in use.
A big catch in this optimization is that the compiler needs to know the exact type and be able to deduce that no class inherits from it or can override the method. If the class has hidden visibility, LTO could find out that a method is only implemented once; however, I haven't seen an implementation of that yet.
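For instance, here is a minimal sketch of the situation described in the question, with final making the "no derived classes" guarantee explicit (the class and function names are made up):

struct Bar final {                  // 'final': nothing can derive from Bar
    virtual int Foo() { return 42; }
};

int Use(Bar& b) {
    // The static type is Bar and Bar is final, so a compiler is free to
    // devirtualize this call (and typically inline it) instead of going
    // through the vtable.
    return b.Foo();
}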

Related

Why does my compiler insist on unused function definitions only for virtual? [duplicate]

I find it quite odd that unused virtual functions must still be defined unlike unused ordinary functions. I understand somewhat about the implicit vtables and vpointers which are created when a class object is created - this somewhat answers the question (that the function must be defined so that the pointers to the virtual function can be defined) but this pushes my query back further still.
Why would a vtable entry need to be created for a function if there's absolutely no chance that virtual function will be called at all?
class A {
    virtual bool test() const;
};

int main() {
    A a; // error: undefined reference to 'vtable for A'
}
Even though I declared A::test() it was never used in the program but it still throws up an error. Can the compiler not run through the program and realise test() was never called - and thus not require a vtable entry for it? Or is that an unreasonable thing to expect of the compiler?
Because it would inevitably be a very difficult problem to solve on the compiler writer's part, when the usefulness of being able to leave virtual functions undefined is at best dubious. Compiler authors surely have better problems to solve.
Besides, you ARE using that function even though you don't call it. You are taking its address.
The OP says that he already knows about vtables and vpointers, so he understands that there is a difference between unused virtual functions and unused non-virtual functions: an unused non-virtual function is not referenced anywhere, while a virtual function is referenced at least once, in the vtable of its class. So, essentially, the question is asking why the compiler is not smart enough to refrain from placing a reference to a virtual function in the vtable if that function is not used anywhere. That would allow the function to also go undefined.
The compiler generally sees only one .cpp file at a time, so it does not know whether you have some source file somewhere which invokes that function.
Some tools support this kind of analysis, they call it "global" analysis or something similar. You might even find it built-in in some compilers, and accessible via some compiler option. But it is never enabled by default, because it would tremendously slow down compilation.
As a matter of fact, the reason why you can leave a non-virtual function undefined is also related to lack of global analysis, but in a different way: if the compiler knew that you have omitted the definition of a function, it would probably at least warn you. But since it does not do global analysis, it can't. This is evidenced by the fact that if you do try to use an undefined function, the error will not be caught by the compiler: it will be caught by the linker.
So, just define an empty virtual function which contains an ASSERT(FALSE) and proceed with your life.
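A minimal sketch of that workaround, using assert from <cassert> in place of whatever ASSERT macro your project provides:

#include <cassert>

class A {
    virtual bool test() const;
};

// The definition exists only so the vtable entry has something to point to;
// it should never actually be reached.
bool A::test() const {
    assert(false);
    return false;
}

int main() {
    A a;   // links fine now
}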
The whole point of virtual functions is that they can be called through a base class pointer. If you never use the base class virtual function, then why did you define it? If it is used, then you either have to leave the parent implementation (if it's not pure virtual) or define your own implementation, so that code using your objects through the base class can actually make use of it. In that case, the function is used, it's just not used directly.

Can we make virtual function inline [duplicate]

Pure virtual functions are those member functions that are virtual and have the pure-specifier (= 0) in their declaration.
Clause 10.4 paragraph 2 of C++03 tells us what an abstract class is and, as a side note, the following:
[Note: a function declaration cannot provide both a pure-specifier and a definition
—end note] [Example:
struct C {
    virtual void f() = 0 { }; // ill-formed
};
—end example]
For those who are not very familiar with the issue, please note that pure virtual functions can have definitions, but the above-mentioned clause forbids such definitions from appearing inline (lexically in-class). (For uses of defining pure virtual functions you may see, for example, this GotW.)
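A minimal sketch of what is and isn't allowed here (the class names are made up):

struct Base {
    virtual void f() = 0;       // OK: pure-specifier, no in-class body
    // virtual void g() = 0 { } // ill-formed: pure-specifier plus in-class definition
    virtual ~Base() = default;
};

void Base::f() {                // OK: a pure virtual function may still be defined out of class
    // shared fallback behaviour
}

struct Derived : Base {
    void f() override {
        Base::f();              // the usual way the pure function's body ever gets executed
    }
};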
Now for all other kinds and types of functions it is allowed to provide an in-class definition, and this restriction seems at first glance absolutely artificial and inexplicable. Come to think of it, it seems such on second and subsequent glances :) But I believe the restriction wouldn't be there if there weren't a specific reason for that.
My question is: does anybody know those specific reasons? Good guesses are also welcome.
Notes:
MSVC does allow PVFs to have inline definitions, so don't be surprised :)
the word inline in this question does not refer to the inline keyword. It is supposed to mean lexically in-class
In the SO thread "Why is a pure virtual function initialized by 0?" Jerry Coffin provided this quote from Bjarne Stroustrup’s The Design & Evolution of C++, section §13.2.3, where I've added some emphasis of the part I think is relevant:
The curious =0 syntax was chosen over the obvious alternative of introducing a new keyword pure or abstract because at the time I saw no chance of getting a new keyword accepted. Had I suggested pure, Release 2.0 would have shipped without abstract classes. Given a choice between a nicer syntax and abstract classes, I chose abstract classes. Rather than risking delay and incurring the certain fights over pure, I used the traditional C and C++ convention of using 0 to represent "not there." The =0 syntax fits with my view that a function body is the initializer for a function and also with the (simplistic, but usually adequate) view of the set of virtual functions being implemented as a vector of function pointers. [ … ]
So, when choosing the syntax Bjarne was thinking of a function body as a kind of initializer part of the declarator, and =0 as an alternate form of initializer, one that indicated “no body” (or in his words, “not there”).
It stands to reason that one cannot both indicate “not there” and have a body – in that conceptual picture.
Or, put another way, that a function cannot have two initializers.
Now, that's as far as my telepathic powers, google-foo and soft-reasoning goes. I surmise that nobody's been Interested Enough™ to formulate a proposal to the committee about having this purely syntactical restriction lifted, and following up with all the work that that entails. Thus it's still that way.
You shouldn't have so much faith in the standardization committee. Not everything has a deep reason to explain it. Some things are the way they are just because at first nobody thought otherwise and afterwards nobody thought that changing them was important enough (I think that is the case here); for things old enough it could even be an artifact of the first implementation. Some are the result of evolution -- there was a deep reason at one time, but the reason was removed and the initial decision wasn't reconsidered (which could also be the case here, where the initial decision was made because any definition of the pure function was forbidden). Some are the result of negotiation between different points of view, and the result lacks coherence, but this lack was deemed necessary to reach consensus.
Good guesses... well, considering the situation:
it is legal to declare the function inline and provide an explicitly inline body (outside the class), so there's clearly no objection to the only practical implication of being declared inside the class.
I see no potential ambiguities or conflicts introduced in the grammar, so no logical reason for the exclusion of function definitions in situ.
My guess: the use for bodies for pure virtual functions was realised after the = 0 | { ... } grammar was formulated, and the grammar simply wasn't revised. It's worth considering that there are a lot of proposals for language changes / enhancements - including those to make things like this more logical and consistent - but the number that are picked up by someone and written up as formal proposals is much smaller, and the number of those the Committee has time to consider, and believes the compiler-vendors will be prepared to implement, is much smaller again. Things like this need a champion, and perhaps you're the first person to see an issue in it. To get a feel for this process, check out http://www2.research.att.com/~bs/evol-issues.html.
Good guesses are welcome you say?
I think the = 0 in the declaration comes from having the implementation in mind. Most likely this definition means that you get a NULL entry in the class's RTTI/vtbl information -- the location where the addresses of the member functions of a class are stored at runtime.
But actually, when you put a definition of the function in your *.cpp file, you introduce a name into the object file for the linker: an address in the *.o file where that specific function can be found.
The basic linker then doesn't need to know about C++ anymore. It can just link things together, even though you declared the function as = 0.
I think I read that what you described is possible, although I've forgotten the exact behaviour :-)...
Leaving destructors aside, implementations of pure virtual functions are a strange thing, because they never get called in the natural way; i.e. if you have a pointer or reference to your Base class, the underlying object will always be some Derived that overrides the function, and the override is what will always get called.
The only way to actually get the implementation to be called is by using the Base::func() syntax from one of the derived class's overrides.
This actually, in some ways, makes it a better target for inlining, as at the point where the compiler wants to invoke it, it is always clear which overload is being called.
Also, if implementations for pure virtual functions were forbidden, there would be an obvious workaround of some other (probably protected) non-virtual function in the Base class that you could just call in the regular way from your derived function. Of course the scope would be less limited in that you could call it from any function.
(By the way, I am under the assumption that Base::f() can only be called with this syntax from Derived::f() and not from Derived::anyOtherFunc(). Am I right with this assumption?).
Pure virtual destructors are a different story, in a sense. The technique is used simply to make a class abstract -- preventing anyone from creating an instance of it -- when there are no other pure virtual functions available to do so.
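A minimal sketch of that technique (names are made up); note that the pure virtual destructor still needs a definition, because every derived destructor calls it:

struct AbstractBase {
    virtual ~AbstractBase() = 0;   // pure virtual destructor makes the class abstract
};

AbstractBase::~AbstractBase() {}   // definition is mandatory: derived destructors call it

struct Concrete : AbstractBase {};

// AbstractBase b;   // error: cannot instantiate an abstract class
Concrete c;          // OK: the destructor is implicitly overridden in Concrete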
The answer to the actual question of "why" it is not permitted is really just because the standards committee said so, but my answer sheds some light on what we are trying to achieve anyway.

MSVC Compiler Error C2688: Microsoft C++ ABI corner case issue?

A very specific corner case that MSVC disallows via Compiler Error 2688 is admitted by Microsoft to be non-standard behavior. Does anyone know why MSVC++ has this specific limitation?
The fact that it involves simultaneous usage of three language features ("virtual base classes", "covariant return types", and "variable number of arguments", according to the description in the second linked page) that are semantically orthogonal and fully supported separately seems to imply that this is not a parsing or semantic issue, but a corner case in the Microsoft C++ ABI. In particular, the fact that a "variable number of arguments" is involved seems to (?) suggest that the C++ ABI is using an implicit trailing parameter to implement the combination of the other two features, but can't because there's no fixed place to put that parameter when the function is var arg.
Does anyone have enough knowledge of the Microsoft C++ ABI to confirm whether this is the case, and explain what this implicit trailing argument is used for (or what else is going on, if my guess is incorrect)? The C++ ABI is not documented by Microsoft but I know that some people outside of Microsoft have done work to match the ABI for various reasons so I'm hoping someone can explain what is going on.
Also, Microsoft's documentation is a bit inconsistent; the second page linked says:
Virtual base classes are not supported as covariant return types when the virtual function has a variable number of arguments.
but the first page more broadly states:
covariant returns with multiple or virtual inheritance not supported for varargs functions
Does anyone know what the real story is? I can do some experimentation to find out, but I'm guessing that the actual corner case is neither of these, exactly, but has to do with the specifics of the class hierarchy in a way that the documenters decided to gloss over. My guess is that it has to do with the need for a pointer adjustment in the virtual thunk, but I'm hoping someone with deeper knowledge of the situation can explain what's going on under the hood.
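For reference, a hedged sketch (declarations only, class names made up) of the kind of code the documentation seems to describe: a varargs virtual function whose covariant return type involves a virtual base. I haven't verified the exact diagnostic text, but something along these lines is what MSVC would reject with C2688 while other compilers accept it:

struct VBase    { virtual ~VBase() {} };
struct VDerived : virtual VBase {};              // VBase is a virtual base of VDerived

struct A {
    virtual VBase* get(const char* fmt, ...);    // varargs virtual function
};

struct B : A {
    VDerived* get(const char* fmt, ...) override; // covariant return through a virtual base:
                                                  // MSVC reports C2688 here
};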
I can tell you with authority that MSVC's C++ ABI uses implicit extra parameters to do things that other ABIs (namely Itanium) handle by emitting multiple separate functions, so it's not hard to imagine that one is being used here (or would be, if the case were supported).
I don't know for sure what's happening in this case, but it seems plausible that an implicit extra parameter is being passed to tell the thunk implementing the virtual function whether a downcast to the covariant return type class is required (or, more likely, whether an upcast back to the base class is required, since the actual implementing function probably returns the derived class), and that this extra parameter goes last so that it can be ignored by the base classes (which wouldn't know anything about the covariant return).
This implies that the unsupported corner case always occurs when a virtual base class is the original return type (since a thunk to the derived class will always be required), which is what is described in the first quote; it would also happen in some, but not all, cases involving multiple inheritance (which may be why it's included in the second quote, but not the first).

C++ vtable query

I have a query regarding the explanation provided here: http://www.parashift.com/c++-faq/virtual-functions.html#faq-20.4
In the sample code the function mycode(Base *p) calls the virt3 method as p->virt3(). How exactly does the compiler know that virt3 is found in the third slot of the vtable? What does it compare, and with what?
When the compiler sees the definition of Base it decides the layout of its vtable according to some algorithm1, which is common to all its derived classes as far as methods inherited from Base are concerned (derived classes may add other virtual methods, but they are put into the vtable after the stuff inherited from Base).
Thus, when the compiler sees p->virt3(), it already knows that for any object that inherits from Base the pointer to the correct virt3 is e.g. in the third slot of the vtable (because that's how it laid out the vtable of Base at the moment of its definition), so it can correctly generate the code for the virtual call.
Long story short (drawing inspiration from @David Rodríguez's comment): it knows where virt3 lives because it decided the layout itself beforehand.
1. The standard does not mandate any particular algorithm (actually, it doesn't say anything about how the C++ ABI should be implemented), but there are several widespread C++ ABI specifications, notably the COM ABI on Windows and the Itanium ABI on Linux (and in general for gcc). Obviously, given the same class definition, the algorithm must give the same vtable layout every time, otherwise it would be impossible to link together different object modules.
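To make the answer above concrete, here is a conceptual sketch of how p->virt3() turns into an indexed vtable access under the usual "one slot per declared virtual function" layout. This is illustration only, not real compiler output; the __vptr name and the lowering shown in the comments are assumptions:

struct Base {
    virtual void virt1() {}
    virtual void virt2() {}
    virtual void virt3() {}   // third declared virtual function -> third vtable slot
};

void mycode(Base* p) {
    p->virt3();
    // Conceptually, the compiler lowers this call to something like:
    //     fn = p->__vptr[2];   // index 2 (the third slot) is fixed at compile time
    //     fn(p);               // 'p' is passed as the implicit 'this' argument
}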
The layout of the vtable is specified by the Itanium C++ ABI, followed by many compilers including GCC. The compiler itself doesn't decide where the function pointers go (though I suppose it does decide to abide by the ABI!).
The order of the virtual function pointers in a virtual table is the order of declaration of the corresponding member functions in the class.
(Example.)
COM — used by Visual Studio — also emits vtable pointers in source order (though I can't find normative documentation to prove that).
Also, because the function name doesn't even exist at runtime (only a function pointer does), the layout of the vtable at compile time doesn't really matter. The function call translation works in just the same way that a normal function call translation works: the compiler is already mapping the function name to an address in its internal machinery. The only difference is that the mapping here is to a location in the vtable, rather than to the start of the actual function code.
This also addresses your concern about interoperability, to some extent.
Do remember, though, that this is all implementation machinery and C++ itself has no knowledge that virtual tables even exist.
The compiler has a well defined algorithm for allocating the entries in the vtable so that the order of the entries will always be the same regardless of which translation unit is being processed. Internal to the compiler is a mapping between the function names and their location in the vtable so the compiler can do the correct transformation between function call and vtable index.
It is important, therefore, that changes to the definition of a class with virtual functions cause all source files that depend on the class to be recompiled, otherwise bad things could happen.