How does a class vtable work across shared libraries? - c++

Let's suppose I have a shared library named libplugin. In this shared library, there is a class:
class Plugin
{
public:
virtual void doStuff();
};
Let's also suppose that there is another shared library named libspecialplugin. It contains the following class and function:
class SpecialPlugin : public Plugin
{
public:
virtual void doStuff();
};
Plugin *createSpecialPlugin()
{
return new SpecialPlugin;
}
Now, suppose I change Plugin and add the following method:
virtual void doMoreStuff();
I do not recompile libspecialplugin.
What happens when I do this:
Plugin *plugin = createSpecialPlugin();
plugin->doMoreStuff();
I'm guessing one of the following happens:
the application crashes
the Plugin::doMoreStuff() method is invoked
Does the libspecialplugin library contain information that libplugin can use to determine which of its methods are overridden - even at runtime? I'm a little fuzzy on what exactly is supposed to happen here.

You are effectively violating the "One Definition Rule" by having the same class (Plugin) defined differently in two different translation units within any program that uses the two libraries.
The standard says (C++11 ISO 14882:2011, §3.2 para 5):
There can be more than one definition of a class type (Clause 9) ...
in a program provided that each definition appears in a different
translation unit, and provided the definitions satisfy the following
requirements. Given such an entity named D defined in more than one
translation unit, then:
each definition of D shall consist of the same sequence of tokens; and
...
Your class Plugin has two different definitions, one baked into libplugin and the other in libspecialplugin, so it does not comply with the standard.
The outcome of this is not defined by the standard, so anything could happen.

I have to add the giant disclaimer that "Everything to do with vtables is implementation defined."
This will work fine provided that the Plugin constructor and destructor are not declared inline in the header. It has to be an actual function call to the Plugin constructor in the libplugin.so library. This means the header has to declare the constructor and destructor but not define them in order to avoid generating the compiler's automatic versions.
It would look like:
class Plugin
{
public:
Plugin();
~Plugin();
virtual void doStuff();
};
Also provided that the new virtual function is added at the end of the class. If it causes any of the other functions in the vtable to move, that will ruin the ABI.
Then when the Plugin base class is constructed it will create the new vtable with the extra function. Then SpecialPlugin will adjust its one virtual function and complete the construction.
Some of this may depend on particular compiler implementations of vtbl pointers, but I have seen it done.
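For concreteness, here is a hedged sketch of what the "version 2" header described above might look like, with the out-of-line constructor/destructor and the new virtual appended after all existing ones (whether this actually preserves the ABI is implementation defined, as the disclaimer above says):
class Plugin
{
public:
    Plugin();                    // defined out of line in libplugin.so
    ~Plugin();                   // defined out of line in libplugin.so
    virtual void doStuff();      // existing virtuals keep their vtable slots
    virtual void doMoreStuff();  // new virtual appended at the very end
};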

Related

When should a default destructor be explicitly defined in a code module

I notice that the Apache Arrow C++ libraries frequently define non-inline virtual destructors in code modules. Is there guidance for which classes should/should not have an explicitly defined non-inline destructor? For example, RandomAccessFile contains a destructor declaration in arrow/io/interfaces.h:
class ARROW_EXPORT RandomAccessFile : public InputStream, public Seekable {
...
/// Necessary because we hold a std::unique_ptr
~RandomAccessFile() override;
...
};
and a corresponding definition in arrow/io/interfaces.cc:
RandomAccessFile::~RandomAccessFile() = default;
But BufferReader has no such explicit declaration or definition. I haven't been able to find any general C++ guidance for when a default destructor should be explicitly declared or defined, or for the tradeoffs between defining it inline versus in the code module, and I would like to better understand them.
I suspect that the lack of an explicit destructor on BufferReader is the cause of linker warnings in my usage of Arrow. Specifically, I'm seeing:
INFO: From Linking ...:
/usr/bin/ld.gold: warning: while linking bazel-out/k8-fastbuild/bin/...: symbol 'virtual thunk to arrow::io::BufferReader::~BufferReader()' defined in multiple places (possible ODR violation):
/usr/include/c++/7/bits/shared_ptr_base.h:170 from bazel-out/k8-fastbuild/bin/external/arrow/_objs/arrow/function_internal.pic.o
external/arrow/cpp/src/arrow/io/memory.h:145 from bazel-out/k8-fastbuild/bin/external/arrow/_objs/parquet/0/schema.pic.o
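For context, the general pattern being asked about looks roughly like this (a sketch with hypothetical names, not Arrow's actual code): the destructor is declared in the header and defaulted in the .cpp, typically because the class holds a std::unique_ptr to a type that is incomplete in the header, and so that the destructor (and, for polymorphic classes, the vtable) is emitted in one translation unit.
// widget.h (sketch)
#include <memory>
class Impl;                       // incomplete in the header

class Widget {
public:
    Widget();
    ~Widget();                    // declared only: unique_ptr<Impl> must not be
                                  // destroyed where Impl is still incomplete
private:
    std::unique_ptr<Impl> impl_;
};

// widget.cpp (sketch)
class Impl { public: int value = 0; };
Widget::Widget() : impl_(std::make_unique<Impl>()) {}
Widget::~Widget() = default;      // Impl is complete here, so this compiles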

C++ Inheritance and dynamic libraries

The idea is the following. I have a library Version 1 with a class that looks as follows:
class MY_EXPORT MyClass
{
public:
virtual void myMethod(int p1);
};
In Version 2, the class was modified to this:
class MY_EXPORT MyClass
{
public:
virtual void myMethod(int p1);
virtual void myMethod2(int p1, int p2);
};
//implementation of myMethod2 in cpp file
void MyClass::myMethod2(int p1, int p2)
{
myMethod(p1);
//...
}
Now imagine a user compiled against Version 1 of the library and extended MyClass by overriding myMethod. Now he updates the library to Version 2 without recompiling. Let's further assume the dynamic linker still successfully finds the library and loads it.
The question is: if I call instance->myMethod2(1, 2); somewhere inside the library, will it work, or will the application crash? In both cases the class has no data members and is therefore the same size.
I don't think there is any point in guessing whether the app will crash or not; the behavior is undefined. The application has to be recompiled, since there was an ABI change in the library.
When the library calls instance->myMethod2(1, 2); it has to go through a virtual table that was created in the application code under the assumption that there is only one virtual method: myMethod. From that point on you get undefined behavior. In short, you have to recompile your application when the library ABI changes.
KDE C++ ABI guidelines specifically prohibit such a change. Virtual tables of derived classes will not contain addresses for the new methods, so virtual calls of those methods on objects of derived classes will crash.
By changing the definition of the class without recompiling, you've violated the One Definition Rule. The user who did not recompile is using the old definition, while your library is using the new definition. This results in undefined behavior.
To see how this might manifest, consider the typical implementation of virtual functions which uses a VTable to dispatch function calls. The library user has derived a class, and this derived class has only one function in the VTable. If a pointer or reference to this class is passed into the library, and the library tries to call the second function, it will attempt to access a VTable entry that doesn't exist. This will almost always result in a crash, although nothing is guaranteed when it comes to undefined behavior.
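To make that concrete, here is a rough, purely illustrative sketch of what the compiler-generated dispatch amounts to (real vtable layout is implementation defined; the structs below are hypothetical stand-ins, not anything a compiler actually emits):
#include <cstdio>

// What the old application binary was compiled against: one virtual slot.
struct VTableV1 {
    void (*myMethod)(void *self, int p1);
};

// What the updated library now assumes: two virtual slots.
struct VTableV2 {
    void (*myMethod)(void *self, int p1);
    void (*myMethod2)(void *self, int p1, int p2);
};

static void derived_myMethod(void *, int p1) { std::printf("myMethod(%d)\n", p1); }

// The derived class in the (not recompiled) application still has the old table.
static VTableV1 derivedVTable = { derived_myMethod };

struct Object { VTableV1 *vptr; };

int main() {
    Object obj{ &derivedVTable };

    // The library reinterprets the table according to the new layout.
    auto *assumed = reinterpret_cast<VTableV2 *>(obj.vptr);
    assumed->myMethod(&obj, 1);         // slot 0 still matches, so this "works"
    // assumed->myMethod2(&obj, 1, 2);  // would read a slot that does not exist:
                                        // undefined behavior, typically a crash
}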

c++ - Use of header/source files to separate interface and implementation

In C++, classes are usually declared like this:
// Object.h
class Object
{
void doSomething();
};
// Object.cpp
#include "Object.h"
void Object::doSomething()
{
// do something
}
I understand that this improves compile time because having the class in one file makes you recompile it whenever you change either the implementation or the interface (see this).
However, from an OOP point of view, I don't see how separating the interface from the implementation helps. I've read a lot of other questions and answers, but the problem I have is this: if you define the methods for a class properly (in separate header/source files), then how can you make a different implementation? If you define Object::method in two different places, how will the compiler know which one to call? Do you declare the Object::method definitions in different namespaces?
Any help would be appreciated.
If you want one interface and multiple implementations in the same program then you use an abstract virtual base.
Like so:
class Printer {
public:
virtual void print_string(const char *s) = 0;
virtual ~Printer();
};
Then you can have implementations:
class EpsonPrinter : public Printer {
public:
void print_string(const char *s) override;
};
class LexmarkPrinter : public Printer {
public:
void print_string(const char *s) override;
};
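A short usage sketch (assuming the member functions and destructor declared above are defined somewhere; the definitions below are hypothetical) shows that callers only ever see the Printer interface:
#include <cstdio>
#include <memory>

// Hypothetical out-of-line definitions for the classes above.
Printer::~Printer() = default;
void EpsonPrinter::print_string(const char *s)   { std::printf("[epson] %s\n", s); }
void LexmarkPrinter::print_string(const char *s) { std::printf("[lexmark] %s\n", s); }

// Code written purely against the interface.
void print_report(Printer &p) {
    p.print_string("quarterly report");
}

int main() {
    std::unique_ptr<Printer> p = std::make_unique<EpsonPrinter>();
    print_report(*p);   // virtual dispatch calls EpsonPrinter::print_string
}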
On the other hand, if you are looking at code which implements OS independence, it might have several subdirectories, one for each OS. The header files are the same, but the source files for Windows are only built for Windows and the source files for Linux/POSIX are only built for Linux.
However, from [an] OOP point of view, I don't see how separating the interface from the implementation helps.
It doesn't help from an OOP point of view, and isn't intended to. This is a text inclusion feature of C++ which is inherited from C, a language that has no direct support for object-oriented programming.
Text inclusion for modularity is a feature borrowed, in turn, from assembly languages. It is almost an antithesis to object-oriented programming or basically anything that is good in the area of computer program organization.
Text inclusion allows your C++ compiler to interoperate with ancient object file formats which do not store any type information about symbols. The Object.cpp file is compiled to this object format, resulting in an Object.o file or Object.obj or what have you on your platform. When other parts of the program use this module, they almost solely trust the information that is written about it in Object.h. Nothing useful emanates out of the Object.o file except for symbols accompanied by numeric information like their offsets and sizes. If the information in the header doesn't correctly reflect Object.obj, you have undefined behavior (mitigated, in some cases, by C++'s support for function overloading, which turns mismatched function calls into unresolved symbols, thanks to name mangling).
For instance if the header declares a variable extern int foo; but the object file is the result of compiling double foo = 0.0; it means that the rest of the program is accessing a double object as an int. What prevents this from happening is that Object.cpp includes its own header (thereby forcing the mismatch between the declaration and definition to be caught by the compiler) and that you have a sane build system in place which ensures that Object.cpp is rebuilt if anything touches Object.h. If that check is based on timestamps, you must also have a sane file system and version control system that don't do wacky things with timestamps.
If you define Object::method in two different places, then how will the compiler know which one to call?
It won't, and in fact you will be breaking the "One Definition Rule" if you do this, which results in undefined behavior, no diagnostic required, according to the standards.
If you want to define multiple implementations for a class interface, you should use inheritance in some way.
One way that you might do it is, use a virtual base class and override some of the methods in different subclasses.
If you want to manipulate instances of the class as value types, you can use the pImpl idiom combined with runtime polymorphism. You would have one "pointer" (handle) class that exposes the interface and holds a pointer to an abstract base class type. Then, in the .cpp file, you would define that base class and several subclasses of it, and different constructors of the handle class would instantiate different subclasses as the implementation.
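A minimal sketch of that idea, condensed into one file (the names Shape, CircleImpl, and SquareImpl are illustrative only; in real code the Impl hierarchy and the constructors would live in the .cpp):
#include <cstdio>
#include <memory>

// Abstract implementation interface (would normally be hidden in the .cpp).
struct ShapeImpl {
    virtual ~ShapeImpl() = default;
    virtual double area() const = 0;
};

struct CircleImpl : ShapeImpl {
    explicit CircleImpl(double r) : r_(r) {}
    double area() const override { return 3.14159 * r_ * r_; }
    double r_;
};

struct SquareImpl : ShapeImpl {
    explicit SquareImpl(double s) : s_(s) {}
    double area() const override { return s_ * s_; }
    double s_;
};

// The value-like "pointer" class that users would see in the header.
class Shape {
public:
    static Shape circle(double r)    { return Shape(std::make_unique<CircleImpl>(r)); }
    static Shape square(double side) { return Shape(std::make_unique<SquareImpl>(side)); }
    double area() const { return impl_->area(); }
private:
    explicit Shape(std::unique_ptr<ShapeImpl> impl) : impl_(std::move(impl)) {}
    std::unique_ptr<ShapeImpl> impl_;
};

int main() {
    std::printf("%f %f\n", Shape::circle(1.0).area(), Shape::square(2.0).area());
}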
If you want to use static polymorphism rather than run-time polymorphism, you can use the CRTP idiom (which is still ultimately based on inheritance, just not on virtual dispatch).
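And a minimal CRTP sketch (illustrative names only), where the base class calls the derived implementation without any vtable:
#include <cstdio>

template <typename Derived>
struct PrinterBase {
    void print_twice(const char *s) {
        // Resolved at compile time: no virtual dispatch involved.
        static_cast<Derived *>(this)->print_string(s);
        static_cast<Derived *>(this)->print_string(s);
    }
};

struct ConsolePrinter : PrinterBase<ConsolePrinter> {
    void print_string(const char *s) { std::printf("%s\n", s); }
};

int main() {
    ConsolePrinter p;
    p.print_twice("hello");
}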

What are the restrictions of exporting a virtual class from a dynamic library?

I'm using gcc and msvc. My goal is to make a dynamic library framework as a collection of basic functionality shared across multiple programs. A class with virtual functions seemed an intuitive way to provide default behavior that works for most programs, with virtual functions that can be overridden by the programs that differ slightly.
The main property I'm looking for in a dynamic library is that new exported symbols can be added to the library without rebuilding all of the dependent binaries. Also, adding internal functionality should be simple. My main concern is linking requirements. What are the requirements for exporting a virtual (or pure virtual) class from a dynamic library so that another binary can be compiled and linked with a child class? Specifically:
1. When does the entire base class need to be exported? i.e. using __declspec(dllexport/dllimport) or __attribute__((visibility("default"))) in the class declaration.
2. When can the export attribute be omitted from the class declaration and only be placed in front of the desired methods?
3. Do all methods declared virtual in the base class need to be exported?
4. How are the symbols the compiler adds to the base class (the vtable address and typeinfo) exported?
5. Can the symbols the compiler adds to the base class be explicitly exported without exporting the entire class?
6. In MSVC, when the entire class is exported, can individual methods or members be "unexported"? i.e. similar to using __attribute__((visibility("hidden"))) or #pragma GCC visibility push(default) on individual methods/members.
7. Do any of the requirements differ depending on whether the base class is regular, virtual, or pure virtual?
Run-time related:
8. Is it safe to declare a dynamic class, or a child of a dynamic class, on the stack, or should they always be allocated on the heap using new/delete?
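As a hedged sketch of the mechanics asked about above (the macro names are hypothetical, and the exact requirements vary by toolchain), exporting a virtual base class from a shared library often looks like this:
// framework_export.h (sketch)
#if defined(_WIN32)
  #if defined(BUILDING_FRAMEWORK)
    #define FRAMEWORK_API __declspec(dllexport)
  #else
    #define FRAMEWORK_API __declspec(dllimport)
  #endif
#else
  #define FRAMEWORK_API __attribute__((visibility("default")))
#endif

// plugin.h (sketch): exporting the whole class also exports the symbols the
// compiler adds for it (vtable and typeinfo), which out-of-library subclasses
// generally need in order to link.
class FRAMEWORK_API BasePlugin {
public:
    BasePlugin();
    virtual ~BasePlugin();
    virtual void doStuff();    // default behavior, overridable in client binaries
};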

C++: Why must private functions be declared?

Why do classes in C++ have to declare their private functions? Does this have actual technical reasons (what is its role at compile time), or is it simply for consistency's sake?
I asked why private functions have to be declared at all, since they don't add anything (neither object size nor vtable entries) that other translation units need to know about.
If you think about it, this is similar to declaring some functions static in a file. It's not visible from the outside, but it is important for the compiler itself. The compiler wants to know the signature of the function before it can use it. That's why you declare functions in the first place. Remember that C++ compilers are one pass, which means everything has to be declared before it is used.1
From the programmer's point of view, declaring private functions is still not completely useless. Imagine two classes, one of which is a friend of the other. The friendzoned class2 would need to know what the privates of that class look like (this discussion is getting weird), otherwise it can't use them.
As to why exactly C++ was designed this way, I would first point to the historical reason: the fact that you can't split a struct definition in C was adopted by C++, so you can't split a class (and this was adopted by other languages that branched from C++, too). I'd also guess that it's about simplicity: imagine how difficult it would be to devise a method of compilation in which you can split the class among different header files, let your source files know about it, and still prevent others from adding stuff to your class.
A final note: private functions can affect the vtable size, but only if they are virtual.
1 Actually not entirely. If you have inline functions in the class, they can refer to functions later defined in the same class. But probably the idea started from single pass and this exception later added to it.
2 It's inlined member functions in particular.
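A tiny sketch of that vtable point: even a private virtual function changes the class layout, because the class then needs a vtable pointer (the exact sizes printed are implementation defined):
#include <cstdio>

struct NoVirtuals {
    int x;
};

struct WithPrivateVirtual {
    int x;
private:
    virtual void helper() {}   // private, but it still forces a vptr / vtable slot
};

int main() {
    std::printf("%zu %zu\n", sizeof(NoVirtuals), sizeof(WithPrivateVirtual));
    // Typically prints something like "4 16" on a 64-bit implementation.
}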
You have to declare all members in the definition of the class itself so that the compiler knows which functions are allowed to be members. Otherwise, a second programmer could (accidentally?) come along and add members, make mistakes, and violate your object's guarantees, causing undefined behavior and/or random crashes.
There's a combination of concerns, but:
C++ doesn't let you re-open a class to declare new members in it after its initial definition.
C++ doesn't let you have different definitions of a class in different translation units that combine to form a program.
Therefore:
Any private member functions that the .cpp file wants declared in the class need to be declared in the .h file, which every user of the class sees too.
From the POV of practical binary compatibility: as David says in a comment, private virtual functions affect the size and layout of the vtable of this class and any classes that use it as a base. So the compiler needs to know about them even when compiling code that can't call them.
Could C++ have been invented differently, to allow the .cpp file to reopen the class and add certain kinds of additional member functions, with the implementation required to arrange that this doesn't break binary compatibility? Could the one definition rule be relaxed, to allow definitions that differ in certain ways? For example, static member functions and non-virtual non-static member functions.
Probably yes to both. I don't think there's any technical obstacle, although the current ODR is very strict about what makes a definition "different" (and hence is very generous to implementations in allowing binary incompatibilities between very similar-looking definitions). I think the text to introduce this kind of exception to the rule would be complex.
Ultimately it might come down to, "the designers wanted it that way", or it might be that someone tried it and encountered an obstacle that I haven't thought of.
The access level does not affect name visibility. Private functions are visible to external code and may be selected by overload resolution (which then results in a compile-time error because the selected function is inaccessible):
class A {
void F(int i) {}
public:
void F(unsigned i) {}
};
int main() {
A a;
a.F(1); // error, void A::F(int) is private
}
Imagine the confusion when this works:
class A {
public:
void F(unsigned i) {}
};
int main() {
A a;
a.F(1);
}
// add private F overload to A
void A::F(int i) {}
But changing it to the first code causes overload resolution to select a different function. And what about the following example?
class A {
public:
void F(unsigned i) {}
};
// add private F overload to A
void A::F(int i) {}
int main() {
A a;
a.F(1);
}
Or here's another example of this going wrong:
// A.h
class A {
public:
void g() { f(1); }
void f(unsigned);
};
// A_private_interface.h
class A;
void A::f(int);
// A.cpp
#include "A_private_interface.h"
#include "A.h"
void A::f(int) {}
void A::f(unsigned) {}
// main.cpp
#include "A.h"
int main() {
A().g();
}
One reason is that in C++ friends can access your privates. For friends to access them, friends have to know about them.
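A small sketch of that, with hypothetical names: the friend can only call the private helper because the helper is declared in the class definition that the friend's code also sees:
#include <cstdio>

class Account {
public:
    explicit Account(int balance) : balance_(balance) {}
    friend class Logger;          // grants Logger access to the privates below
private:
    int balance_;
    int audit_code() const { return balance_ ^ 0x5a; }   // private helper
};

class Logger {
public:
    static void log(const Account &a) {
        // Allowed only because Logger is a friend and audit_code() is declared.
        std::printf("audit: %d\n", a.audit_code());
    }
};

int main() {
    Logger::log(Account(100));
}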
Private members of a class are still members of the class, so they must be declared, as the implementation of other public members might depend on that private method. Declaring them will allow the compiler to understand a call to that function as a member function call.
If you have a method that is only used in the .cpp file and does not depend on direct access to other private members of the class, consider moving it to an anonymous namespace. Then it does not need to be declared in the header file at all.
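A minimal sketch of that suggestion, condensed into one file (hypothetical names; in real code the class definition would sit in the header and the rest in the .cpp):
#include <cstdio>

class Widget {                 // would live in widget.h, with no mention of scale()
public:
    int value() const;
};

namespace {                    // would live in widget.cpp
// File-local helper: not a member, so it never appears in the header.
int scale(int v) { return v * 2; }
}

int Widget::value() const { return scale(21); }

int main() {
    std::printf("%d\n", Widget{}.value());   // prints 42
}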
There are a couple of reasons why private functions must be declared.
First: Compile-Time Error Checks
The point of access modifiers is to catch certain classes (no pun intended) of programming errors at compile time. Private functions are functions such that calling them from outside the class would be a bug, and you want to know about it as early as possible.
Second: Casting and Inheritance
Taken from the C++ standard:
[ Note: A member of a private base class might be inaccessible as an inherited member name, but accessible directly. Because of the rules on pointer conversions (4.10) and explicit casts (5.4), a conversion from a pointer to a derived class to a pointer to an inaccessible base class might be ill-formed if an implicit conversion is used, but well-formed if an explicit cast is used.
Third: Friends
Friends show each other their privates. A private method can be called by another class that is a friend.
Fourth: General Sanity and Good Design
Ever worked on a project with another 100 developers? Having a standard and a general set of rules helps keep the code maintainable. Declaring something private has a specific meaning to everyone else in the group.
This also follows good OO design principles: deciding what to expose and what not to.