C++ vtable query

I have a question regarding the explanation provided here: http://www.parashift.com/c++-faq/virtual-functions.html#faq-20.4
In the sample code, the function mycode(Base *p) calls the virt3 method as p->virt3(). How exactly does the compiler know that virt3 is found in the third slot of the vtable? What does it compare, and against what?

When the compiler sees the definition of Base it decides the layout of its vtable according to some algorithm [1], which is common to all its derived classes as far as methods inherited from Base are concerned (derived classes may add other virtual methods, but they are put into the vtable after the stuff inherited from Base).
Thus, when the compiler sees p->virt3(), it already knows that for any object that inherits from Base the pointer to the correct virt3 is e.g. in the third slot of the vtable (because that's how it laid out the vtable of Base at the moment of its definition), so it can correctly generate the code for the virtual call.
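To make that concrete, here is a runnable sketch of the FAQ's example (the method bodies are illustrative); the comments describe what the compiler conceptually generates for the call under a typical vtable-based implementation:

    #include <cstdio>

    struct Base {
        virtual void virt1() { std::puts("virt1"); }
        virtual void virt2() { std::puts("virt2"); }
        virtual void virt3() { std::puts("virt3"); }  // declared third => slot 2 (zero-based)
    };

    void mycode(Base* p) {
        // Conceptually the compiler emits: load p's vptr, index slot 2,
        // call through the resulting function pointer.
        p->virt3();
    }

    int main() {
        Base b;
        mycode(&b);  // prints "virt3"
    }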
Long story short (drawing inspiration from @David Rodríguez's comment): the compiler knows where virt3 lives because the compiler itself decided where to put it when it laid out the vtable.
[1] The standard does not mandate any particular algorithm (actually, it doesn't say anything about how the C++ ABI should be implemented), but there are several widespread C++ ABI specifications, notably the COM ABI on Windows and the Itanium ABI on Linux (and in general for gcc). Obviously, given the same class definition, the algorithm must give the same vtable layout every time, otherwise it would be impossible to link together different object modules.

The layout of the vtable is specified by the Itanium C++ ABI, followed by many compilers including GCC. The compiler itself doesn't decide where the function pointers go (though I suppose it does decide to abide by the ABI!).
The order of the virtual function pointers in a virtual table is the order of declaration of the corresponding member functions in the class.
COM (used by Visual Studio) also emits vtable pointers in source order (though I can't find normative documentation to prove that).
Also, because the function name doesn't even exist at runtime (only a function pointer does), the precise layout the vtable gets at compile time doesn't really matter. The function-call translation works just the same way a normal function-call translation works: the compiler is already mapping the function name to an address in its internal machinery. The only difference is that the mapping here is to a location in the vtable, rather than to the start of the actual function code.
This also addresses your concern about interoperability, to some extent.
Do remember, though, that this is all implementation machinery and C++ itself has no knowledge that virtual tables even exist.
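As a hedged illustration of that machinery (this is undefined behavior as far as standard C++ is concerned, and assumes an Itanium-ABI compiler such as GCC or Clang, where the vptr is the first word of the object):

    #include <cstdio>

    struct Base {
        virtual void virt1() { std::puts("virt1"); }
        virtual void virt2() { std::puts("virt2"); }
        virtual void virt3() { std::puts("virt3"); }
    };

    int main() {
        Base b;
        using Fn = void (*)(Base*);
        // Read the vptr, then index the table of function pointers,
        // which the ABI lays out in declaration order.
        Fn* vtable = *reinterpret_cast<Fn**>(&b);
        vtable[0](&b);  // prints "virt1"
        vtable[2](&b);  // prints "virt3"
    }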

The compiler has a well defined algorithm for allocating the entries in the vtable so that the order of the entries will always be the same regardless of which translation unit is being processed. Internal to the compiler is a mapping between the function names and their location in the vtable so the compiler can do the correct transformation between function call and vtable index.
It is important, therefore, that changes to the definition of a class with virtual functions cause all source files that depend on the class to be recompiled; otherwise bad things can happen, as sketched below.
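A hedged sketch of those bad things (class and method names are hypothetical; the two versions are placed in separate namespaces only so the snippet compiles as a single file):

    namespace v1 {  // the header as an old object file saw it
    struct Widget {
        virtual void draw() {}    // slot 0
        virtual void resize() {}  // slot 1
    };
    }

    namespace v2 {  // the header after a virtual is inserted at the top
    struct Widget {
        virtual void hide() {}    // slot 0 (new)
        virtual void draw() {}    // slot 1 (was 0!)
        virtual void resize() {}  // slot 2 (was 1!)
    };
    }

    // Code compiled against v1 calls draw() through slot 0; linked against
    // the v2 layout without recompiling, that slot dispatches to hide():
    // a silent miscall rather than a link error.
    int main() {}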

Related

Why does my compiler insist on unused function definitions only for virtual? [duplicate of "Why must unused virtual functions be defined?" below]


Are there instances where a virtual method call is optimized out?

For example, if Foo() is a virtual method of class Bar, there are no classes inheriting from Bar, and the compiler can deduce at compile time that the object's type is exactly Bar.
Since it's clear at compile time that Bar::Foo() is the only possible method the call could resolve to, do compilers commonly optimize out the virtual method lookup?
Yes, in such a case the call to Bar::Foo() will be optimized; GCC, for example, can devirtualize and even inline it.
This whole series of articles from GCC developer Honza Hubička describes how devirtualization is implemented at a low level and what limitations it has:
Devirtualization in C++, part 1
Devirtualization in C++, part 2
Devirtualization in C++, part 3
Devirtualization in C++, part 4
The compiler optimization that removes virtual calls is called devirtualization. It requires the compiler to know the exact type of the instance in order to know which override is being called.
Assuming you have such classes, I would recommend using final where useful, to indicate either that no class can inherit from this one, or that no inheriting class can override this specific method.
It all depends on your compiler, though to a certain extent this optimization is already in common use.
The big catch in this optimization is that the compiler needs to know the exact type, and needs to be able to deduce that no class inherits from it or can override the method call. If the class has hidden visibility, LTO could find out that a method is only implemented once; however, I haven't seen any implementation of that yet.
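A minimal sketch of the final hint (whether the indirect call is actually removed is compiler- and flag-dependent, and can be verified in the generated assembly):

    struct Bar final {               // final: nothing can derive from Bar
        virtual int Foo() { return 42; }
    };

    int call(Bar& b) {
        // The dynamic type can only be Bar, so the compiler may replace the
        // vtable lookup with a direct (and likely inlined) call to Bar::Foo.
        return b.Foo();
    }

    int main() {
        Bar b;
        return call(b) == 42 ? 0 : 1;
    }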

Why must unused virtual functions be defined?

I find it quite odd that unused virtual functions must still be defined unlike unused ordinary functions. I understand somewhat about the implicit vtables and vpointers which are created when a class object is created - this somewhat answers the question (that the function must be defined so that the pointers to the virtual function can be defined) but this pushes my query back further still.
Why would a vtable entry need to be created for a function if there's absolutely no chance that virtual function will be called at all?
    class A {
        virtual bool test() const;
    };

    int main() {
        A a;  // error: undefined reference to 'vtable for A'
    }
Even though I declared A::test(), it was never used in the program, yet it still throws up an error. Can the compiler not run through the program, realise test() was never called, and thus not require a vtable entry for it? Or is that an unreasonable thing to expect of the compiler?
Because it would inevitably be a very difficult problem to solve on the compiler writer's part, when the usefulness of being able to leave virtual functions undefined is at best dubious. Compiler authors surely have better problems to solve.
Besides, you ARE using that function even though you don't call it. You are taking its address.
The OP says that he already knows about vtables and vpointers, so he understands that there is a difference between unused virtual functions and unused non-virtual functions: an unused non-virtual function is not referenced anywhere, while a virtual function is referenced at least once, in the vtable of its class. So, essentially, the question is asking why the compiler is not smart enough to refrain from placing a reference to a virtual function in the vtable when that function is not used anywhere. That would allow the function to also go undefined.
The compiler generally sees only one .cpp file at a time, so it does not know whether you have some source file somewhere which invokes that function.
Some tools support this kind of analysis; they call it "global" or "whole-program" analysis. You might even find it built into some compilers, accessible via some compiler option. But it is never enabled by default, because it would tremendously slow down compilation.
As a matter of fact, the reason why you can leave a non-virtual function undefined is also related to lack of global analysis, but in a different way: if the compiler knew that you have omitted the definition of a function, it would probably at least warn you. But since it does not do global analysis, it can't. This is evidenced by the fact that if you do try to use an undefined function, the error will not be caught by the compiler: it will be caught by the linker.
So, just define an empty virtual function which contains an ASSERT(FALSE) and proceed with your life.
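A minimal sketch of that stub, using the standard assert in place of a project-specific ASSERT macro:

    #include <cassert>

    class A {
    public:
        virtual bool test() const;  // declared, but never meant to be called
    };

    // This definition exists only so the compiler can emit A's vtable;
    // it fails loudly if anyone ever does call it.
    bool A::test() const {
        assert(false && "A::test() should never be called");
        return false;
    }

    int main() {
        A a;  // now links fine: the vtable for A can be emitted
    }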
The whole point of virtual functions is that they can be called through a base class pointer. If you never use the base class virtual function, then why did you declare it? If it is used, then you either have to keep the parent implementation (if it's not pure virtual) or define your own, so that code using your objects through the base class can actually make use of it. In that case, the function is used; it's just not used directly.

How does GCC store member functions in memory?

I am trying to minimise the size my class occupies in memory (both data and instructions). I know how to minimise data size, but I am not too familiar with how GCC places member functions.
Are they stored in memory, the same order they are declared in the class?
For the purposes of in-memory data representation, a C++ class can have plain or static member functions, and virtual member functions (including the virtual destructor, if any).
Plain or static member functions do not take any space in data memory, but of course their compiled code takes up resources, e.g. as binary code in the text or code segment of your executable or your process. They can also require static data (or thread-local data), or local data (e.g. local variables) on the call stack.
My answer is Linux-oriented. I don't know Windows, and I don't know how GCC works on it.
Virtual member functions are very often implemented through a virtual method table (or vtable); a class with virtual member functions usually has instances containing a single vtable pointer (assuming single inheritance) pointing to that vtable (which is, in practice, some data packed into the text segment).
Notice that vtables are not mandatory and are not required by the C++11 standard. But I don't know of any C++ implementation that does not use them.
When you are using multiple inheritance, things become more complex: objects might have several vtable pointers.
So if you have a class (either a root class, or using single-inheritance), the consumption for virtual member functions is one vtable pointer per instance (plus the small space needed by the single vtable itself). It won't change (for each instance) if you have only one virtual member function (or destructor) or a thousand of them (what would change is the vtable itself). Each class has its own single vtable (unless it has no virtual member function), and each instance has generally one (for single-inheritance case) vtable pointer.
The GCC compiler is free to organize the vtable as it wishes (its order and layout are an implementation detail you should not care about). In practice (for single inheritance), in most recent GCC versions the vtable pointer is the first word of the object and the vtable contains function pointers in the order of virtual method declaration, but you should not depend on such details.
The GCC compiler is also free to organize the functions in the code segment as it wishes, and it will actually reorder them (e.g. for optimizations). Last time I looked, it ordered them in reverse order. But you certainly should not depend on that order! BTW, GCC can inline functions (even when not marked inline) and clone functions when optimizing. You can also compile and link with link-time optimization (e.g. make CXX='g++ -flto -Os'), and you can ask for profile-guided optimization (for GCC: -fprofile-generate, -fprofile-use, -fauto-profile, etc.).
You should not depend on how the compiler (and linker) organizes function code or vtables. Leave the optimizations to the compiler (such optimizations depend upon your target machine, your compiler flags, and the compiler version). You might also use function attributes to give hints to the GCC (or Clang/LLVM) compiler (e.g. __attribute__((cold)), __attribute__((noinline)), etc.).
If you really need to know how functions are placed (which IMHO is very wrong), study the generated assembly code (e.g. using g++ -O -fverbose-asm -S) and be aware that it could vary with compiler versions!
If, on Linux and POSIX systems, you need to find the address of a function from its name at runtime, consider using dlsym (for Linux, see dlsym(3), which also documents dladdr). Be aware of name mangling, which you can disable by declaring such functions extern "C" (see the C++ dlopen mini-howto).
BTW, you might compile and link with -rdynamic (which is very useful for dlopen etc.). If you really need to know the address of functions, use nm(1), as in nm -C your-executable.
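A minimal sketch of such a runtime lookup on Linux (the function name answer is illustrative; build with something like g++ -rdynamic lookup.cpp -ldl so the executable exports its own symbols):

    #include <dlfcn.h>
    #include <cstdio>

    extern "C" int answer() { return 42; }  // extern "C" avoids C++ name mangling

    int main() {
        void* self = dlopen(nullptr, RTLD_NOW);  // handle to the main program itself
        if (!self) return 1;
        auto fn = reinterpret_cast<int (*)()>(dlsym(self, "answer"));
        if (fn) std::printf("answer() = %d\n", fn());
        dlclose(self);
        return 0;
    }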
You might also read the ABI specification and calling conventions for your target platform (and compiler), e.g. Linux x86-64 ABI spec.
Let's say we have a type T with 4 instance methods.
    class T {
    public:
        void member_function_1() { ... }
        void member_function_2() { ... }
        void member_function_3() { ... }
        void member_function_4() { ... }
    };
The amount of memory that those methods take up is the same if we instantiate 1 copy of T, or if we instantiate 1 million copies of T.
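A quick way to see both effects (exact sizes are implementation-dependent; the values in the comments are typical for x86-64):

    #include <cstdio>

    struct Plain { int x; void f() {} void g() {} };  // non-virtual methods only
    struct Virt  { int x; virtual void f() {} };      // adds one vptr per object

    int main() {
        std::printf("sizeof(Plain) = %zu\n", sizeof(Plain));  // typically 4
        std::printf("sizeof(Virt)  = %zu\n", sizeof(Virt));   // typically 16
    }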

Will adding enum definition inside a class break its binary-backward-compatibility?

I know adding a static member function is fine, but what about an enum definition? No new data members, just its definition.
A little background:
I need to add a static member function (to a class) that will recognize the version of an IP address from its string representation. The first thing that comes to my mind is to declare an enum for IPv4, IPv6, and Unknown, and make this enum the return type of my function.
But I don't want to break the binary backward compatibility.
And a really bad question (for SO): is there any source, or a question here, where I can read more about that? I mean, what breaks binary compatibility and what does not? Or does it depend on many things (like architecture, OS, compiler...)?
EDIT: Regarding @PeteKirkham's comment: okay then, at least, is there a way to test/check for a changed ABI, or is it better to post a new question about that?
EDIT2: I just found an SO question, Static analysis tool to detect ABI breaks in C++, which answers the part about tooling to check binary compatibility. That's why I link it here.
The real question here is obviously: WHY make it a class (static) member?
It seems obvious from the description that this could perfectly well be a free function in its own namespace (and probably header file) or, if the use is isolated, defined in an anonymous namespace within the source file.
Although this could still potentially break the ABI, it would really take a funny compiler to do so.
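For reference, the addition being discussed would look something like this (the names and the detection heuristic are illustrative): a nested enum is a purely compile-time entity, and a static member function adds only one new symbol, so neither touches existing layout:

    #include <string>

    class IpUtils {
    public:
        // New nested enum: a compile-time entity, absent from the object layout.
        enum class IpVersion { Unknown, V4, V6 };

        // New static member function: adds one new mangled symbol; existing
        // symbols, vtables, and data members are untouched.
        static IpVersion DetectVersion(const std::string& address) {
            // Naive heuristic, for illustration only.
            if (address.find(':') != std::string::npos) return IpVersion::V6;
            if (address.find('.') != std::string::npos) return IpVersion::V4;
            return IpVersion::Unknown;
        }
    };

    int main() {
        return IpUtils::DetectVersion("::1") == IpUtils::IpVersion::V6 ? 0 : 1;
    }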
As for ABI breakage:
modifying the size of a class: adding data members, unless you manage to stash them into previously unused padding (compiler-specific, of course)
modifying the alignment of a class: changing data members; there are tricks to artificially inflate the alignment (union), but deflating it requires compiler-specific pragmas or attributes and compliant hardware
modifying the layout of a vtable: adding a virtual method may change the offsets of previous virtual methods in the vtable. For gcc, the vtable is laid out in the order of declaration, so adding the virtual method at the end works... however, this does not work in base classes, as the vtable layout may be shared with derived classes. Best considered frozen.
modifying the signature of a function: the name of the symbol usually depends both on the name of the function itself and on the types of its arguments (plus, for methods, the name of the class and the qualifiers of the method). You can add a top-level const on an argument (it's ignored anyway), and you can normally change the return type (though this may entail other problems). Note that adding a parameter with a default value does break the ABI, since defaults are ignored as far as signatures are concerned. Best considered frozen.
removing any function or class that previously exported symbols (i.e., classes with direct or inherited virtual methods)
I may have forgotten one or two points, but that should get you going for a while already.
Example of what an ABI is: the Itanium ABI.
Formally... if you link files which were compiled against two different versions of your class, you've violated the one definition rule, which is undefined behavior. Practically... about the only things which break binary compatibility are adding data members or virtual functions (non-virtual functions are fine), changing the name or signature of a function, or anything involving base classes. And this seems to be universal: I don't know of a compiler where the rules are different.