I find it quite odd that unused virtual functions must still be defined unlike unused ordinary functions. I understand somewhat about the implicit vtables and vpointers which are created when a class object is created - this somewhat answers the question (that the function must be defined so that the pointers to the virtual function can be defined) but this pushes my query back further still.
Why would a vtable entry need to be created for a function if there's absolutely no chance that virtual function will be called at all?
class A {
    virtual bool test() const;
};

int main() {
    A a; // error: undefined reference to 'vtable for A'
}
Even though I declared A::test(), it is never used in the program, but it still throws up an error. Can the compiler not run through the program and realise test() was never called, and thus not require a vtable entry for it? Or is that an unreasonable thing to expect of the compiler?
Because it would inevitably be a very difficult problem to solve on the compiler writer's part, when the usefulness of being able to leave virtual functions undefined is at best dubious. Compiler authors surely have better problems to solve.
Besides, you ARE using that function even though you don't call it. You are taking its address.
The OP says that he already knows about vtables and vpointers, so he understands that there is a difference between unused virtual functions and unused non-virtual functions: an unused non-virtual function is not referenced anywhere, while a virtual function is referenced at least once, in the vtable of its class. So, essentially the question is asking why the compiler is not smart enough to refrain from placing a reference to a virtual function in the vtable when that function is not used anywhere. That would allow the function to go undefined as well.
The compiler generally sees only one .cpp file at a time, so it does not know whether you have some source file somewhere which invokes that function.
Some tools support this kind of analysis; they call it "global" analysis or something similar. You might even find it built into some compilers, accessible via some compiler option. But it is never enabled by default, because it would tremendously slow down compilation.
As a matter of fact, the reason why you can leave a non-virtual function undefined is also related to lack of global analysis, but in a different way: if the compiler knew that you have omitted the definition of a function, it would probably at least warn you. But since it does not do global analysis, it can't. This is evidenced by the fact that if you do try to use an undefined function, the error will not be caught by the compiler: it will be caught by the linker.
So, just define that virtual function with a body containing an ASSERT(FALSE) and proceed with your life.
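A minimal sketch of that workaround, using the standard assert macro from <cassert> as a stand-in for whatever ASSERT(FALSE) maps to in your codebase:

#include <cassert>

class A {
    virtual bool test() const {
        assert(false); // defined only so the vtable can be emitted; never meant to run
        return false;
    }
};

int main() {
    A a; // links now: the compiler can emit the vtable for A
}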
The whole point of virtual functions is that they can be called through a base class pointer. If you never use the base class virtual function, then why did you define it? If it is used, then you either have to leave the parent implementation (if it's not pure virtual) or define your own implementation, so that code using your objects through the base class can actually make use of it. In that case, the function is used, it's just not used directly.
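For illustration (the Shape/Circle names here are invented), the usual pattern in which such a function is "used" only indirectly, through a base reference:

struct Shape {
    virtual double area() const { return 0.0; }
    virtual ~Shape() {}
};

struct Circle : Shape {
    double r;
    explicit Circle(double radius) : r(radius) {}
    double area() const { return 3.14159265 * r * r; }
};

double measure(const Shape& s) {
    return s.area(); // virtual dispatch: Circle::area runs when s refers to a Circle
}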
Pure virtual functions are those member functions that are virtual and have the pure-specifier ( = 0; )
Clause 10.4 paragraph 2 of C++03 tells us what an abstract class is and, as a side note, the following:
[Note: a function declaration cannot provide both a pure-specifier and a definition
—end note] [Example:
struct C {
    virtual void f() = 0 { }; // ill-formed
};
—end example]
For those who are not very familiar with the issue, please note that pure virtual functions can have definitions but the above-mentioned clause forbids such definitions to appear inline (lexically in-class). (For uses of defining pure virtual functions you may see, for example, this GotW)
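To make this concrete, a minimal sketch reusing the C from the example above:

struct C {
    virtual void f() = 0; // pure-specifier; adding a body right here would be ill-formed
};

void C::f() { } // ...but an out-of-class definition of the same pure virtual is legal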
Now for all other kinds and types of functions it is allowed to provide an in-class definition, and this restriction seems at first glance absolutely artificial and inexplicable. Come to think of it, it seems such on second and subsequent glances :) But I believe the restriction wouldn't be there if there weren't a specific reason for that.
My question is: does anybody know those specific reasons? Good guesses are also welcome.
Notes:
MSVC does allow PVF's to have inline definitions. So don't get surprised :)
the word inline in this question does not refer to the inline keyword. It is supposed to mean lexically in-class
In the SO thread "Why is a pure virtual function initialized by 0?" Jerry Coffin provided this quote from Bjarne Stroustrup’s The Design & Evolution of C++, section §13.2.3, where I've added some emphasis of the part I think is relevant:
The curious =0 syntax was chosen over the obvious alternative of introducing a new keyword pure or abstract because at the time I saw no chance of getting a new keyword accepted. Had I suggested pure, Release 2.0 would have shipped without abstract classes. Given a choice between a nicer syntax and abstract classes, I chose abstract classes. Rather than risking delay and incurring the certain fights over pure, I used the traditional C and C++ convention of using 0 to represent "not there." The =0 syntax fits with my view that a function body is the initializer for a function and also with the (simplistic, but usually adequate) view of the set of virtual functions being implemented as a vector of function pointers. [ … ]
So, when choosing the syntax Bjarne was thinking of a function body as a kind of initializer part of the declarator, and =0 as an alternate form of initializer, one that indicated “no body” (or in his words, “not there”).
It stands to reason that one cannot both indicate “not there” and have a body – in that conceptual picture.
Or, still in that conceptual picture, it would amount to having two initializers.
Now, that's as far as my telepathic powers, google-foo and soft-reasoning goes. I surmise that nobody's been Interested Enough™ to formulate a proposal to the committee about having this purely syntactical restriction lifted, and following up with all the work that that entails. Thus it's still that way.
You shouldn't have so much faith in the standardization committee. Not everything has a deep reason to explain it. Some things are so just because at first nobody thought otherwise and afterwards nobody thought that changing them was important enough (I think that is the case here); for things old enough it could even be an artifact of the first implementation. Some are the result of evolution: there was a deep reason at one time, but the reason was removed and the initial decision was never reconsidered (that could also be the case here, where the initial decision was made because any definition of a pure function was forbidden). Some are the result of negotiation between different points of view, and the result lacks coherence, but this lack was deemed necessary to reach a consensus.
Good guesses... well, considering the situation:
it is legal to declare the function inline and provide an explicitly inline body (outside the class), so there's clearly no objection to the only practical implication of being declared inside the class (see the sketch after this list).
I see no potential ambiguities or conflicts introduced in the grammar, so no logical reason for the exclusion of function definitions in situ.
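To make the first point concrete, a sketch of the explicitly inline, outside-the-class form that is already legal (names invented):

struct B {
    virtual void f() = 0;
};

inline void B::f() { } // declared pure, yet given an explicitly inline body outside the class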
My guess: the usefulness of bodies for pure virtual functions was realised after the = 0 | { ... } grammar was formulated, and the grammar simply wasn't revised. It's worth considering that there are a lot of proposals for language changes / enhancements - including those to make things like this more logical and consistent - but the number that are picked up by someone and written up as formal proposals is much smaller, and the number of those that the Committee has time to consider, and believes the compiler-vendors will be prepared to implement, is smaller again. Things like this need a champion, and perhaps you're the first person to see an issue in it. To get a feel for this process, check out http://www2.research.att.com/~bs/evol-issues.html.
Good guesses are welcome you say?
I think the = 0 at the declaration comes from having the implementation in mind. Most likely this definition means that you get a NULL entry in the class's vtbl -- the location where the addresses of the member functions of a class are stored at runtime.
But actually, when you put a definition of the function in your *.cpp file, you introduce a name into the object file for the linker: an address in the *.o file where a specific function can be found.
The linker then does not need to know anything about C++ anymore. It can just link everything together, even though you declared the function as = 0.
I think I read that what you described is possible, although I forget the exact behaviour :-)...
Leaving destructors aside, implementations of pure virtual functions are a strange thing, because they never get called in the natural way: if you have a pointer or reference to your Base class, the underlying object will always be some Derived that overrides the function, and that override will always get called.
The only way to actually get the implementation to be called is using the Base::func() syntax from one of the derived class's overrides.
This actually, in some ways, makes it a better target for inlining, as at the point where the compiler wants to invoke it, it is always clear which function is being called.
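A sketch of that qualified-call syntax (the names are made up):

struct Base {
    virtual void func() = 0;
};

void Base::func() { /* shared fallback work */ }

struct Derived : Base {
    void func() {
        Base::func(); // qualified call: the only way the pure virtual's body ever runs
        // ... derived-specific work ...
    }
};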
Also, if implementations for pure virtual functions were forbidden, there would be an obvious workaround of some other (probably protected) non-virtual function in the Base class that you could just call in the regular way from your derived function. Of course the scope would be less limited in that you could call it from any function.
(By the way, I am under the assumption that Base::f() can only be called with this syntax from Derived::f() and not from Derived::anyOtherFunc(). Am I right with this assumption?).
Pure virtual destructors are a different story, in a sense. Declaring the destructor pure virtual is a technique used simply to make a class abstract, preventing anyone from creating an instance of it, when there are no other pure virtual functions. The destructor must still be defined, because every derived destructor calls it implicitly.
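A minimal sketch of that technique:

struct Abstract {
    virtual ~Abstract() = 0; // pure: makes the class abstract
};
Abstract::~Abstract() {} // a definition is still required

struct Concrete : Abstract {};

int main() {
    // Abstract a; // error: cannot instantiate an abstract class
    Concrete c;    // fine; ~Concrete() implicitly calls Abstract::~Abstract()
}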
The answer to the actual question of "why" it is not permitted is really just because the standards committee said so, but my answer sheds some light on what we are trying to achieve anyway.
For example, suppose Foo() is a virtual method of class Bar, no classes inherit from Bar, and the compiler can deduce at compile time that an object's type is Bar (e.g. a call bar.Foo() on a local object).
Since it's clear at compile-time that Bar::Foo() is the only possible method the call could resolve to, do compilers commonly optimize out the virtual method lookup?
Yes, in such a case the call will be optimized. Here is an explanation of how such a call is inlined by the GCC compiler.
The whole series of articles from GCC developer Honza Hubička describes how devirtualization is implemented on low level and what limitations it has:
Devirtualization in C++, part 1
Devirtualization in C++, part 2
Devirtualization in C++, part 3
Devirtualization in C++, part 4
The compiler optimization removing virtual calls is called devirtualisation. It requires the compiler to know the exact type of the instance to know that a certain overload is being called.
Assuming that you have such classes, I would recommend using final where useful, to indicate either that no class can inherit from a given class or that no derived class can override a specific method.
It all depends on your compiler, though to a certain extent this optimization is already in use.
A big catch in this optimization is that the compiler needs to know the exact type and must be able to deduce that no classes inherit from it or can override the method call. If the class has hidden visibility, LTO could find out that a method is only implemented once; however, I haven't seen any implementation of that yet.
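A minimal sketch of how final helps (reusing the Bar/Foo names from the question; the rest is illustrative):

struct Bar final { // nothing can derive from Bar...
    virtual int Foo() { return 42; }
};

int call(Bar& b) {
    return b.Foo(); // ...so the compiler may call Bar::Foo directly, or even inline it
}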
I recently decided to use the non-virtual interface idiom (NVI) to design an interface in C++, mostly for the purpose of using a parameter with a default value (thus avoiding problems caused by the fact that default parameters are statically bound).
I came up with a rather trivial declaration for my class, looking like this:
class Interface {
public:
    void func(Parameter p = 0);
    virtual ~Interface();
private:
    virtual void doFunc(Parameter p) = 0;
};

void Interface::func(Parameter p) { doFunc(p); }
Interface::~Interface() {}
I know that providing the function body in a header automatically marks a function as a candidate for inlining (though I don't know if placing the definition outside of the class will prevent that). I also know that virtual functions aren't inlined for obvious reasons (we don't know which function will be called at runtime, so we obviously can't replace calls with the function's body).
Then, in this case, will func() be marked as a candidate for inlining? It isn't a virtual function, but it still calls a virtual function. Does that make it inline-able?
Extra question: would it be worth it? The body consists of only one statement.
Note that this question is more about learning than about seeking optimization everywhere. I know that this function will be called only a few times (well, for now, so I might as well be prudent about how the program evolves), and that inlining would be quite superfluous and not the main concern of my program's performance.
Thanks!
I know that providing the function body in a header automatically marks a function as a candidate for inlining
More or less; but you do that by providing the function body in a class definition, or by explicitly declaring it inline if it's in a header. Otherwise, the function is subject to the One Definition Rule, and you'll get errors if you include the header in more than one translation unit.
Note that this doesn't force the compiler to inline all calls to the function; providing the definition in a header just allows it to inline it in any translation unit that includes the header, if it thinks it's worth it. Also, some compilers can perform "whole program optimisation", and inline functions even when a definition is not available at the call site.
Then, in this case, will func() be marked as a candidate for inlining? It isn't a virtual function, but it still calls a virtual function. Does that make it inline-able?
Yes, all functions are candidates for inlining. If they call themselves, then obviously you couldn't inline all the calls; nor if the function isn't known at compile time (e.g. because it must be called virtually, or through a function pointer). In this case, inlining the function would replace the direct call to func() with a virtual call to doFunc().
Note that sometimes virtual calls can be inlined, if the dynamic type is known at compile time. For example:
struct MyImplementation : Interface {/*whatever*/};
MyImplementation thing;
thing.func(); // Known to be MyImplementation, func and doFunc can be inlined
Extra question: would it be worth it?
That depends on what "it" is. If you mean run time then, as long as the function remains short, you might get some benefit (possibly significant, if the function is called many times) for negligible cost. If you mean the cost of spending time choosing where to put it, then probably not; just put it wherever's most convenient.
I have a query in regard to the explanation provided here http://www.parashift.com/c++-faq/virtual-functions.html#faq-20.4
In the sample code the function mycode(Base *p) calls the virt3 method as p->virt3(). How exactly does the compiler know that virt3 is found in the third slot of the vtable? What does it compare, and with what?
When the compiler sees the definition of Base it decides the layout of its vtable according to some algorithm1, which is common to all its derived classes as far as methods inherited from Base are concerned (derived classes may add other virtual methods, but they are put into the vtable after the stuff inherited from Base).
Thus, when the compiler sees p->virt3(), it already knows that for any object that inherits from Base the pointer to the correct virt3 is e.g. in the third slot of the vtable (because that's how it laid out the vtable of Base at the moment of its definition), so it can correctly generate the code for the virtual call.
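As a rough illustration only (this is not what any particular compiler literally emits, and the cast below is not portable C++; real ABIs also reserve extra vtable slots for RTTI and offsets), the call p->virt3() is lowered to an indexed load from the vtable followed by an indirect call, along these lines:

struct Base {
    virtual void virt1();
    virtual void virt2();
    virtual void virt3();
};

// conceptually, p->virt3() becomes something like:
void call_virt3(Base* p) {
    using Fn = void (*)(Base*);
    Fn* vtable = *reinterpret_cast<Fn**>(p); // fetch the object's hidden vptr
    vtable[2](p);                            // index the third slot: virt3
}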
Long story short (drawing inspiration from @David Rodríguez's comment): the compiler knows where the function lives because it decided that itself, earlier.
1. The standard does not mandate any particular algorithm (actually, it doesn't say anything about how the C++ ABI should be implemented), but there are several widespread C++ ABI specifications, notably the COM ABI on Windows and the Itanium ABI on Linux (and in general for gcc). Obviously, given the same class definition, the algorithm must give the same vtable layout every time, otherwise it would be impossible to link together different object modules.
The layout of the vtable is specified by the Itanium C++ ABI, followed by many compilers including GCC. The compiler itself doesn't decide where the function pointers go (though I suppose it does decide to abide by the ABI!).
The order of the virtual function pointers in a virtual table is the order of declaration of the corresponding member functions in the class.
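For instance, a sketch of that declaration-order rule (slot numbers follow the convention above):

struct S {
    virtual void first();  // slot 0
    virtual void second(); // slot 1
    virtual void third();  // slot 2
};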
COM — used by Visual Studio — also emits vtable pointers in source order (though I can't find normative documentation to prove that).
Also, because the function name doesn't even exist at runtime (only a function pointer does), the layout of the vtable at compile time doesn't really matter. The function call translation works in just the same way that a normal function call translation works: the compiler is already mapping the function name to an address in its internal machinery. The only difference is that the mapping here is to a location in the vtable, rather than to the start of the actual function code.
This also addresses your concern about interoperability, to some extent.
Do remember, though, that this is all implementation machinery and C++ itself has no knowledge that virtual tables even exist.
The compiler has a well defined algorithm for allocating the entries in the vtable so that the order of the entries will always be the same regardless of which translation unit is being processed. Internal to the compiler is a mapping between the function names and their location in the vtable so the compiler can do the correct transformation between function call and vtable index.
It is important, therefore, that changes to the definition of a class with virtual functions cause all source files that depend on the class to be recompiled; otherwise bad things could happen.