The idea is the following. I have a library Version 1 with a class that looks as follows:
class MY_EXPORT MyClass
{
public:
    virtual void myMethod(int p1);
};
In Version 2, the class was modified to this:
class MY_EXPORT MyClass
{
public:
    virtual void myMethod(int p1);
    virtual void myMethod2(int p1, int p2);
};

// implementation of myMethod2 in the cpp file
void MyClass::myMethod2(int p1, int p2)
{
    myMethod(p1);
    //...
}
Now imagine a user compiled against Version 1 of the library and extended MyClass by overriding myMethod. He then updates the library to Version 2 without recompiling. Let's further assume the dynamic linker still successfully finds and loads the library.
The question is: if I call instance->myMethod2(1, 2); somewhere inside the library, will it work, or will the application crash? In both versions the class has no data members and is therefore the same size.
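To make the scenario concrete, here is roughly what the user's code might look like (UserClass is a hypothetical name, used only for illustration); it was compiled against Version 1 and never recompiled:

class UserClass : public MyClass
{
public:
    virtual void myMethod(int p1);   // overrides the only virtual method that existed in Version 1
};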
I don't think there is any point in guessing whether the app will crash or not; the behavior is undefined. The application has to be recompiled, since there was an ABI change in the library.
When the library calls instance->myMethod2(1, 2); it has to go through a virtual table that was created in the application code under the assumption that there is only one virtual method: myMethod. From that point on, you get undefined behavior. In short, you have to recompile your application when the library ABI changes.
The KDE C++ ABI guidelines specifically prohibit such a change. The virtual tables of derived classes will not contain addresses for the new methods, so virtual calls to those methods on objects of derived classes will crash.
By changing the definition of the class without recompiling, you've violated the One Definition Rule. The user who did not recompile is using the old definition, while your library is using the new definition. This results in undefined behavior.
To see how this might manifest, consider the typical implementation of virtual functions which uses a VTable to dispatch function calls. The library user has derived a class, and this derived class has only one function in the VTable. If a pointer or reference to this class is passed into the library, and the library tries to call the second function, it will attempt to access a VTable entry that doesn't exist. This will almost always result in a crash, although nothing is guaranteed when it comes to undefined behavior.
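As a sketch of where that goes wrong (the function name is hypothetical), imagine the library internally does something like this in Version 2:

// Inside the Version 2 library. If 'instance' actually points to an object of a
// derived class that was compiled against Version 1, its vtable has only one
// slot (for myMethod). The call below dispatches through a second slot that was
// never emitted, which is undefined behavior and typically a crash.
void doWorkInsideLibrary(MyClass *instance)
{
    instance->myMethod2(1, 2);
}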
Related
Consider the following class
// in library
class A {
public:
    static A* instance() {
        static A self;
        return &self;
    }
    void foo() { }
private:
    A() {}
    int a{0};
};
that is defined in a dynamic library.
Then a new version of the class is defined with an extra field b.
class A {
    // as before
private:
    int b{0};
};
Is the new version of the library breaking the ABI?
For a regular class, I would say yes, undoubtedly, but for a singleton I think it's /not/ an ABI break (the application handles only pointers to A, which didn't change, and nothing public changed). But I'm not sure, and I would like a definitive answer.
If the application code ever needs a complete type for A in order to compile, then yes this would be an ABI break.
If the application merely stores a pointer to A (and calls into the library to obtain an A*) then there is no ABI break.
Comparisons can be made here with the FILE* pointer of C.
The fact that A is a singleton is a red herring.
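A minimal sketch of that opaque-pointer arrangement (the function names are invented for illustration): as long as the application only ever sees A through a pointer returned by the library, the library can change A's layout freely.

// Library header that the application compiles against: A is only declared,
// never defined, so the application cannot depend on its size or layout.
class A;
A*   getA();          // implemented inside the library
void useA(A* obj);    // implemented inside the library

// Application code: works with both the old and the new library, because it
// only passes the pointer around and never needs sizeof(A) or A's members.
void appCode()
{
    A* obj = getA();
    useA(obj);
}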
Yes, it is.
First, with regard to guarantees by the standard: changing any single token in the class definition requires all translation units (including libraries) to use the new definition in order not to break the ODR and cause undefined behavior.
Obviously there are situations in practice where you can rely on the implementation-specific behavior to still make it work and not break ABI in the practical sense.
In the case you are showing, at the very least the instance function will be a problem. It is an inline function and must know the exact layout of A. But because it is inline, you cannot control whether the library and the library user both use the same version of it; you can easily get a mismatch, and then the object might be constructed with the wrong layout.
Even if you make instance non-inline, you will have the same issue with the constructor, which is also inline in your code.
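A sketch of the fix being described (assuming you can change the library's header): declare instance and the constructor in the header but define them only in the library's .cpp file, so every translation unit constructs A through code that lives in the library.

// Header shipped to users: no inline bodies that depend on A's layout.
class A {
public:
    static A* instance();
    void foo();
private:
    A();
    int a{0};
    int b{0};
};

// In the library's .cpp file only:
A* A::instance() {
    static A self;
    return &self;
}
A::A() {}
void A::foo() {}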
Let's suppose I have a shared library named libplugin. In this shared library, there is a class:
class Plugin
{
public:
    virtual void doStuff();
};
Let's also suppose that there is another shared library named libspecialplugin. It contains the following class and function:
class SpecialPlugin : public Plugin
{
public:
    virtual void doStuff();
};

Plugin *createSpecialPlugin()
{
    return new SpecialPlugin;
}
Now, suppose I change Plugin and add the following method:
virtual void doMoreStuff();
I do not recompile libspecialplugin.
What happens when I do this:
Plugin *plugin = createSpecialPlugin();
plugin->doMoreStuff();
I'm guessing one of the following happens:
the application crashes
the Plugin::doMoreStuff() method is invoked
Does the libspecialplugin library contain information that libplugin can use to determine which of its methods are overridden - even at runtime? I'm a little fuzzy on what exactly is supposed to happen here.
You are effectively violating the "One Definition Rule" by having the same class (Plugin) defined differently in two different translation units within any program that uses the two libraries.
The standard says (C++11, ISO/IEC 14882:2011, §3.2 para 5):
There can be more than one definition of a class type (Clause 9) ...
in a program provided that each definition appears in a different
translation unit, and provided the definitions satisfy the following
requirements. Given such an entity named D defined in more than one
translation unit, then:
each definition of D shall consist of the same sequence of tokens; and
...
Your class Plugin has two different definitions, one baked into libplugin and the other in libspecialplugin, so it does not comply with the standard.
The outcome of this is not defined by the standard, so anything could happen.
I have to add the giant disclaimer that "Everything to do with vtables is implementation defined."
This will work fine provided that the Plugin constructor and destructor are not declared inline in the header. It has to be an actual function call to the Plugin constructor in the libplugin.so library. This means the header has to declare the constructor and destructor but not define them in order to avoid generating the compiler's automatic versions.
It would look like:
class Plugin
{
public:
    Plugin();
    ~Plugin();
    virtual void doStuff();
};
Also provided that the new virtual function is added at the end of the class. If it causes any of the other functions in the vtable to move, that will ruin the ABI.
Then when the Plugin base class is constructed it will create the new vtable with the extra function. Then SpecialPlugin will adjust its one virtual function and complete the construction.
Some of this may depend on particular compiler implementations of vtbl pointers, but I have seen it done.
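For completeness, a sketch of the matching definitions that would live in libplugin's source file under this scheme, so that construction and destruction always run the library's code:

// In libplugin's .cpp file, not in the header.
Plugin::Plugin() {}            // sets up the vtable pointer using the library's
                               // (new, larger) vtable for Plugin
Plugin::~Plugin() {}
void Plugin::doStuff() {}
void Plugin::doMoreStuff() {}  // the method added in the new version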
I was under the impression that whenever you do one of these:
Add a new public virtual method virtual void aMethod();
Add a new public non-virtual method void aMethod();
Implement a public pure-virtual method from an interface: virtual void aMethod() override;
you were actually breaking binary compatibility, meaning that if a project had been built against a previous version of the DLL, it would not be able to load the new one now that new methods are available.
From what I have tested using Visual Studio 2012, none of these break anything. Dependency Walker reports no error and my test application was calling the appropriate method.
DLL:
class EXPORT_LIB MyClass {
public:
    void saySomething();
};
Executable:
int _tmain(int argc, _TCHAR* argv[])
{
    MyClass wTest;
    wTest.saySomething();
    return 0;
}
The only undefined behavior I found was when MyClass implemented a pure-virtual interface: from my executable I was calling one of the pure-virtual methods, and then I added a new pure-virtual method before the one used by my executable. In this case, Dependency Walker did not report any error, but at runtime it was actually calling the wrong method.
class IMyInterface {
public:
    virtual void foo() = 0;
};
In the executable
IMyInterface* wTest = new MyClass();
wTest->foo();
Then I change the interface without rebuilding my executable
class IMyInterface {
public:
    virtual void bar() = 0;
    virtual void foo() = 0;
};
It is now quietly calling bar() instead of foo().
Are all three of my assumptions safe?
EDIT:
Doing this
class EXPORT_LIB MyClass {
public:
    virtual void saySomething();
};
Exec
MyClass wTest;
wTest.saySomething();
Then rebuild DLL with this:
class EXPORT_LIB MyClass {
public:
    virtual void saySomething2();
    virtual void saySomething();
    virtual void saySomething3();
};
It still calls the appropriate saySomething().
Breaking binary compatibility doesn't always result in the DLL not loading, in many cases you'll end up with memory corruption which may or may not be immediately obvious. It depends a lot on the specifics of what you've changed and how things were and now are laid out in memory.
Binary compatibility between DLLs is a complex subject. Let's start by looking at your three examples:
Add a new public virtual method virtual void aMethod();
This will almost certainly result in undefined behaviour. It's very much compiler dependent, but most compilers will use some form of vtable for virtual methods, so adding new ones will change the layout of that table.
Add a new public non-virtual method void aMethod();
This is fine for a global function or a member function. A member function is essentially just a global function with a hidden 'this' argument (see the sketch after these three cases). It doesn't change the memory layout of anything.
Implement a public pure-virtual method from an interface: virtual void aMethod() override;
This won't exactly cause any undefined behaviour, but as you've found, it won't do what you expect. Code that was compiled against the previous version of the library won't know this function has been overridden, so it will not call the new implementation; it'll carry on calling the old one. This may or may not be a problem depending on your use case, and it shouldn't cause any other side effects. However, I think your mileage could vary here depending on what compiler you're using, so it's probably best to avoid this.
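To illustrate the hidden 'this' point from the second case above, a rough sketch (Counter and increment are invented names; this is not literal compiler output):

struct Counter {
    int value = 0;
    void increment(int by) { value += by; }   // non-virtual member function
};

// Conceptually, the compiler treats c.increment(3) much like a call to a free
// function that receives the object as an extra, hidden argument:
void Counter_increment(Counter* self, int by) { self->value += by; }

// Either way, adding or removing non-virtual member functions does not change
// the size or layout of Counter objects.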
What will stop a DLL from being loaded is if you change the signature of an exported function in any way (including changing parameters and scope), or if you remove a function, as then the dynamic linker won't be able to find it. This only applies if the function in question is actually used, since the linker only imports functions that are referenced in the code.
There are also many more ways to break binary compatibility between dlls, which are beyond the scope of this answer. In my experience they usually follow a theme of changing the size or layout of something in memory.
Edit: I just remembered that there is an excellent article on the KDE Wiki about binary compatibility in C++, including a very good list of dos and don'ts with explanations and workarounds.
C++ doesn't say.
Visual Studio generally follows COM rules, allowing you to add virtual methods to the end of your most derived class unless they are overloads.
Any non-static data member will change the binary layout as well.
Non-virtual functions don't affect binary compatibility.
Templates make a huge mess because of name mangling.
Your best bet to retain binary compatibility is to use both the pimpl idiom and the NVI idiom quite liberally.
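A minimal sketch of the pimpl idiom (the names are illustrative): the public class keeps a single opaque pointer, so private members can be added or removed inside the library without changing the public class's size or layout.

// Public header: the object layout is just one pointer, and it never changes.
class Widget {
public:
    Widget();
    ~Widget();
    void doStuff();
private:
    struct Impl;      // defined only in the library's .cpp file
    Impl* d;
};

// Library .cpp file: private state can grow freely between versions.
struct Widget::Impl {
    int a = 0;
    // int b = 0;  // added in a later version without breaking the ABI
};
Widget::Widget() : d(new Impl) {}
Widget::~Widget() { delete d; }
void Widget::doStuff() { d->a++; }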
I work on a project with extremely low unit test culture. We have almost zero unit testing and every API is static.
To be able to unit test some of my code I create wrappers like
class ApiWrapper {
public:
    // parameter types are illustrative; the real signature mirrors ApiCall
    virtual int Call(int foo, int bar) {
        return ApiCall(foo, bar);
    }
};
Now in my functions, instead of:
int myfunc() {
    return ApiCall(foo, bar);
}
I do:
int myfunc(ApiWrapper* wrapper) {
    return wrapper->Call(foo, bar);
}
This way I am able to mock such functionality. The problem is that some colleagues complain that production code should not be affected by testability needs - nonsense, I know, but that's the reality.
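For what it's worth, a sketch of how the test side then looks (MockApi is a hypothetical hand-written fake; the same idea works with a mocking framework):

class MockApi : public ApiWrapper {
public:
    int lastFoo = 0;
    int lastBar = 0;
    int Call(int foo, int bar) override {
        lastFoo = foo;            // record the arguments instead of hitting
        lastBar = bar;            // the real static API
        return 0;
    }
};

// In a test:
//   MockApi mock;
//   myfunc(&mock);
//   // assert on mock.lastFoo / mock.lastBar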
Anyway, I believe I read somewhere at some point that compilers are actually smart about replacing unused polymorphic behavior with a direct call ... or that if no class overrides a virtual method, it becomes a "normal" call.
I experimented, and on GCC 4.8 it does not inline or directly call the virtual method, but instead creates a vtable.
I tried to google it, but I did not find anything about this. Is this a thing, or do I misremember ... or do I have to do something to explain this to the linker, an optimization flag or something?
Note that while in production this class is final, in the test environment it is not. This is exactly what the linker has to be smart about and detect it.
The C++ compiler will only replace a polymorphic call with a direct call if it knows for certain what the actual type is.
So in the following snippet, it will be optimized:
void f() {
ApiWrapper x;
x.Call(); // Can be replaced
}
But in the general case, it can't:
void f(ApiWrapper* wrapper) {
wrapper->Call(); // Cannot be replaced
}
You also added two conditions to your question:
if there is no class that overrides a virtual method it becomes "normal".
This will not help. Neither the C++ compiler nor the linker will look at the totality of classes to search for whether any inheritor exists. It would be futile anyway, since you can always dynamically load an instance of a new class.
By the way, this optimization is indeed performed by some JVMs (called devirtualization) - since in Java land there's a class loader which is aware of which classes are currently loaded.
in production this class is final
That will help! Clang, for example, will convert virtual calls to non-virtual calls if the method or the method's class is marked final.
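A small sketch of what that looks like (assuming something like the ApiWrapper and ApiCall from the question; the exact behavior depends on the compiler and optimization level):

class ApiWrapperProd final : public ApiWrapper {   // 'final' only in the production build
public:
    int Call(int foo, int bar) override { return ApiCall(foo, bar); }
};

void g(ApiWrapperProd* w) {
    // Because ApiWrapperProd is final, no further override can exist, so an
    // optimizing compiler may turn this into a direct (and possibly inlined)
    // call instead of a vtable dispatch.
    w->Call(1, 2);
}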
In theory, C++ does not have a binary interface, and the order of methods in the vtable is undefined. Change anything about a class's definition and you need to recompile every class that depends upon it, in every dll etc.
But what I would like to know is how compilers work in practice. I would hope that they just use the order in which the methods are defined in the header/class, which would make appending additional methods safe. But they could also use a hash of the mangled names to make the layout order-independent, which would also make it completely non-upgradable.
If people have specific knowledge of how specific versions of specific compilers work in different operating systems etc. then that would be most helpful.
Added: Ideally, linker symbols would be created for the virtual method offsets, so that the offsets would never be hard-coded into calling functions. But my understanding is that this is never done. Correct?
It appears that with Microsoft's compiler the vtable may be reordered.
The following is copied from https://marc.info/?l=kde-core-devel&m=139744177410091&w=2
I (Nicolas Alvarez) can confirm this behavior happens.
I compiled this class:
struct Testobj {
virtual void func1();
virtual void func2();
virtual void func3();
};
And a program that calls func1(); func2(); func3();
Then I added a func2(int) overload to the end:
struct Testobj {
virtual void func1();
virtual void func2();
virtual void func3();
virtual void func2(int);
};
and recompiled the class but not the program using the class.
Output of calling func1(); func2(); func3(); was
This is func1
This is func2 taking int
This is func2
This shows that if I declare func1() func2() func3() func2(int), the
vtable is laid out as func1() func2(int) func2() func3().
Tested with MSVC2010.
In MSVC 2010 they are in the order you declare them. I can't think of any rationale for another compiler doing it differently although it is an arbitrary choice. It only needs to be consistent. They are just arrays of pointers so don't worry about hashes or mangling.
No matter the order, additional virtual functions added in derived classes must come after those in the base or polymorphic casts would not work.
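As an illustration of that layout constraint (this is the typical arrangement, not something the standard mandates):

struct Base {
    virtual void f();
    virtual void g();
};

struct Derived : Base {
    void g() override;        // reuses Base's slot for g
    virtual void h();         // new virtual, appended after Base's slots
};

// Typical vtable for Derived:
//   slot 0: Base::f        (inherited)
//   slot 1: Derived::g     (override placed in Base's slot)
//   slot 2: Derived::h     (new virtual, appended at the end)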
As far as I know, they are always in the order of declaration. This way you can always add declarations of new virtual methods at the end (or below all previous declarations of virtual methods). If you remove any virtual method or add a new one somewhere in the middle, you do need to recompile and relink everything.
I know that for sure - I already made that mistake. From my experience these rules apply to both MSVC and GCC.
Any compiler must at least place all the vtable entries for a specific class together, with those for derived classes coming either before or afterwards, and also together.
The easiest way to accomplish that is to use the header order. It is difficult to see why any compiler would do anything different, given that it would require more code, more testing, etc., and just provide another way for mistakes to occur, with no identifiable benefit that I can see.