Splitting a long method while maintaining the class interface - C++

In my library there's a class like this:
class Foo {
public:
    void doSomething();
};
Now the implementation of doSomething() has grown a lot and I want to split it into two methods:
class Foo {
public:
    void doSomething();
private:
    void doSomething1();
    void doSomething2();
};
Where the implementation of doSomething() is:
void Foo::doSomething() {
    this->doSomething1();
    this->doSomething2();
}
But now the class interface has changed. If I compile this library, all existing applications that use it will stop working, because the external linkage has changed.
How can I avoid breaking binary compatibility?
I guess inlining solves this problem. Is that right? And is it portable? What happens if a compiler optimization decides not to inline these methods?
class Foo {
public:
    void doSomething();
private:
    inline void doSomething1();
    inline void doSomething2();
};

void Foo::doSomething1() {
    /* some code here */
}

void Foo::doSomething2() {
    /* some code here */
}

void Foo::doSomething() {
    this->doSomething1();
    this->doSomething2();
}
EDIT:
I tested this code before and after the method split and it seems to maintain binary compatibility. But I'm not sure this would work on every OS, with every compiler, and with more complex classes (with virtual methods, inheritance...). I have sometimes seen binary compatibility break after adding private methods like these, but I don't remember the exact situation. Maybe it was because the symbol table was looked up by index (as Steve Jessop notes in his answer).

Strictly speaking, changing the class definition at all (in either of the ways you show) is a violation of the One Definition Rule and leads to undefined behavior.
In practice, adding non-virtual member functions to a class maintains binary compatibility in every implementation out there, because if it didn't then you'd lose most of the benefits of dynamic libraries. But the C++ standard doesn't say much (anything?) about dynamic libraries or binary compatibility, so it doesn't guarantee what changes you can make.
So in practice, changing the symbol table doesn't matter provided that the dynamic linker looks up entries in the symbol table by name. There are more entries in the symbol table than before, but that's OK because all the old ones still have the same mangled names. It may be that with your implementation, private and/or inline functions (or any functions you specify) aren't dll-exported, but you don't need to rely on that.
I have used one system (Symbian) where entries in the symbol table were not looked up by name, they were looked up by index. On that system, when you added anything to a dynamic library you had to ensure that any new functions were added to the end of the symbol table, which you did by listing the required order in a special config file. You could ensure that binary compatibility wasn't broken, but it was fairly tedious.
So, you could check your C++ ABI or compiler/linker documentation to be absolutely sure, or just take my word for it and go ahead.
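Incidentally, if you want to sidestep the ABI question entirely, another option (a sketch of my own, not something from the answers here) is to leave the header completely untouched and split the work into file-local helpers inside Foo.cpp, so the class definition never changes at all:

// Foo.cpp -- Foo.h stays exactly as it was
#include "Foo.h"

namespace {
    // File-local helpers: they never appear in the class definition,
    // so neither the class layout nor the exported symbols of Foo change.
    void doSomethingPart1(Foo& /*self*/) { /* some code here */ }
    void doSomethingPart2(Foo& /*self*/) { /* some code here */ }
}

void Foo::doSomething() {
    doSomethingPart1(*this);
    doSomethingPart2(*this);
}

The trade-off is that such helpers only see Foo's public interface; anything private has to be passed to them explicitly or they have to be declared friends (which would change the header again).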

There is no problem here. The name mangling of Foo::doSomething() is always the same regardless of its implementation.

I think the ABI of the class won't change if you add non-virtual methods, because non-virtual methods are not stored in the class object but rather as free functions with mangled names. You can add as many functions as you like as long as you don't add data members.
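To illustrate that point with a small sketch (hypothetical types of my own, not from the answer above): the object layout, and therefore sizeof, is unaffected by extra non-virtual member functions.

struct FooV1 {
    int value;
    void doSomething();      // one member function
};

struct FooV2 {
    int value;
    void doSomething();      // same data members...
    void doSomething1();     // ...plus extra non-virtual member functions
    void doSomething2();
};

// Non-virtual member functions are not stored inside the object, so the
// layout (and size) of the two versions is identical.
static_assert(sizeof(FooV1) == sizeof(FooV2), "object layout unchanged");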

Related

When do we break binary compatibility

I was under the impression that whenever you do one of these:
Add a new public virtual method virtual void aMethod();
Add a new public non-virtual method void aMethod();
Implement a public pure-virtual method from an interface virtual void aMethod() override;
you were actually breaking binary compatibility, meaning that if a project had been built against a previous version of the DLL, it would not be able to load the new one now that new methods are available.
From what I have tested using Visual Studio 2012, none of these break anything. Dependency Walker reports no error and my test application was calling the appropriate method.
DLL:
class EXPORT_LIB MyClass {
public:
    void saySomething();
};
Executable:
int _tmain(int argc, _TCHAR* argv[])
{
    MyClass wTest;
    wTest.saySomething();
    return 0;
}
The only undefined behavior I found was when MyClass implemented a pure-virtual interface, my executable called one of the pure-virtual methods through that interface, and I then added a new pure-virtual method before the one my executable used. In that case, Dependency Walker did not report any error, but at runtime the wrong method was actually being called.
class IMyInterface {
public:
    virtual void foo() = 0;
};
In the executable
IMyInterface* wTest = new MyClass();
wTest->foo();
Then I change the interface without rebuilding my executable
class IMyInterface {
public:
    virtual void bar() = 0;
    virtual void foo() = 0;
};
It is now quietly calling bar() instead of foo().
Are all three of my assumptions safe to rely on?
EDIT:
Doing this
class EXPORT_LIB MyClass {
public:
    virtual void saySomething();
};
Executable:
MyClass wTest;
wTest.saySomething();
Then rebuild the DLL with this:
class EXPORT_LIB MyClass {
public:
    virtual void saySomething2();
    virtual void saySomething();
    virtual void saySomething3();
};
It still calls the appropriate saySomething().
Breaking binary compatibility doesn't always result in the DLL failing to load; in many cases you'll end up with memory corruption which may or may not be immediately obvious. It depends a lot on the specifics of what you've changed and how things were, and now are, laid out in memory.
Binary compatibility between DLLs is a complex subject. Let's start by looking at your three examples:
Add a new public virtual method virtual void aMethod();
This almost certainly will result in undefined behaviour. It's very much compiler dependent, but most compilers will use some form of vtable for virtual methods, so adding new ones will change the layout of that table.
Add a new public non-virtual method void aMethod();
This is fine for a global function or a member function. A member function is essentially just a global function with a hidden 'this' argument. It doesn't change the memory layout of anything.
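As a rough sketch of that point (my own illustration, not from the original answer), a non-virtual member call compiles down to something very close to an ordinary function call that passes the object's address:

class Widget {
public:
    // Non-virtual member function: conceptually just
    //   void Widget_resize(Widget* this_, int w);
    void resize(int w) { width_ = w; }
private:
    int width_ = 0;
};

void example() {
    Widget w;
    w.resize(42);   // an ordinary call with &w passed as the hidden 'this'
}

Nothing about resize() is stored inside the Widget object itself, so adding more functions like it leaves the object layout alone.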
Implement a public pure-virtual method from an interface virtual void aMethod() override;
This won't exactly cause any undefined behaviour, but as you've found, it won't do what you expect. Code that was compiled against the previous version of the library won't know this function has been overridden, so it will not call the new implementation; it'll carry on calling the old one. This may or may not be a problem depending on your use case, and it shouldn't cause any other side effects. However, I think your mileage could vary here depending on which compiler you're using, so it's probably best to avoid this.
What will stop a DLL from being loaded is changing the signature of an exported function in any way (including changing parameters and scope) or removing a function, because then the dynamic linker won't be able to find it. This only applies if the function in question is actually used, since the linker only imports functions that are referenced in the code.
There are also many more ways to break binary compatibility between dlls, which are beyond the scope of this answer. In my experience they usually follow a theme of changing the size or layout of something in memory.
Edit: I just remembered that there is an excellent article on the KDE wiki about binary compatibility in C++, including a very good list of dos and don'ts with explanations and workarounds.
C++ doesn't say.
Visual Studio generally follows COM rules, allowing you to add virtual methods to the end of your most derived class unless they are overloads.
Any non-static data member will change the binary layout as well.
Non-virtual functions don't affect binary compatibility.
Templates make a huge mess because of name mangling.
Your best bet for retaining binary compatibility is to use both the Pimpl idiom and the NVI (non-virtual interface) idiom quite liberally.
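For the NVI part, the shape is roughly this (a generic sketch with made-up names, not code from the answer): keep the exported, public entry points non-virtual, and do the customization through private virtuals, so the public surface that clients link against stays stable while the internals are free to evolve.

#include <cstddef>

class Codec {
public:
    // Stable, non-virtual entry point; this is what client code links against.
    void encode(const char* data, std::size_t len) { do_encode(data, len); }
    virtual ~Codec() = default;
private:
    // Private customization point that derived classes override.
    virtual void do_encode(const char* data, std::size_t len) = 0;
};

Combining this with Pimpl (hiding all data members behind a single opaque pointer) is what actually keeps the object size and layout stable across releases.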

c++ - Use of header/source files to separate interface and implementation

In C++, classes are usually declared like this:
// Object.h
class Object
{
void doSomething();
}
// Object.cpp
#include "Object.h"
void Object::doSomething()
{
// do something
}
I understand that this improves compile times, because having everything in one file makes you recompile it whenever you change either the implementation or the interface (see this).
However, from an OOP point of view, I don't see how separating the interface from the implementation helps. I've read a lot of other questions and answers, but the problem I have is this: if you define the methods for a class properly (in separate header/source files), then how can you make a different implementation? If you define Object::method in two different places, how will the compiler know which one to call? Do you declare the Object::method definitions in different namespaces?
Any help would be appreciated.
If you want one interface and multiple implementations in the same program, then you use an abstract base class.
Like so:
class Printer {
public:
    virtual void print_string(const char *s) = 0;
    virtual ~Printer();
};
Then you can have implementations:
class EpsonPrinter : public Printer {
public:
    void print_string(const char *s) override;
};

class LexmarkPrinter : public Printer {
public:
    void print_string(const char *s) override;
};
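Client code then works purely against the abstract interface; usage would look something like this (my own example, assuming the two print_string() overrides are defined in their own source files):

void print_report(Printer& printer) {
    // Only the Printer interface is visible here; any implementation
    // derived from Printer can be passed in.
    printer.print_string("quarterly report\n");
}

int main() {
    EpsonPrinter epson;
    LexmarkPrinter lexmark;
    print_report(epson);
    print_report(lexmark);
}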
On the other hand, if you are looking at code which implements OS independence, it might have several subdirectories, one for each OS. The header files are the same, but the source files for Windows are only built for Windows and the source files for Linux/POSIX are only built for Linux.
However, from [an] OOP point of view, I don't see how separating the interface from the implementation helps.
It doesn't help from an OOP point of view, and isn't intended to. This is a text inclusion feature of C++ which is inherited from C, a language that has no direct support for object-oriented programming.
Text inclusion for modularity is a feature borrowed, in turn, from assembly languages. It is almost an antithesis to object-oriented programming or basically anything that is good in the area of computer program organization.
Text inclusion allows your C++ compiler to interoperate with ancient object file formats which do not store any type information about symbols. The Object.cpp file is compiled to this object format, resulting in an Object.o file, Object.obj, or whatever it is on your platform. When other parts of the program use this module, they almost solely trust the information that is written about it in Object.h. Nothing useful emanates out of the Object.o file except symbols accompanied by numeric information like their offsets and sizes. If the information in the header doesn't correctly reflect Object.obj, you have undefined behavior (mitigated, in some cases, by C++'s support for function overloading, which turns mismatched function calls into unresolved symbols, thanks to name mangling).
For instance if the header declares a variable extern int foo; but the object file is the result of compiling double foo = 0.0; it means that the rest of the program is accessing a double object as an int. What prevents this from happening is that Object.cpp includes its own header (thereby forcing the mismatch between the declaration and definition to be caught by the compiler) and that you have a sane build system in place which ensures that Object.cpp is rebuilt if anything touches Object.h. If that check is based on timestamps, you must also have a sane file system and version control system that don't do wacky things with timestamps.
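A concrete two-file sketch of that mismatch (hypothetical, and deliberately ill-formed; the point is that including the module's own header turns a silent link-time type confusion into a compile error):

// object.h
extern int foo;      // what the rest of the program believes

// object.cpp
#include "object.h"  // including our own header is what lets the compiler
                     // notice the conflict below
double foo = 0.0;    // error: 'foo' declared with a different type

Without the #include, both files would compile cleanly and the program would simply read a double as an int at run time.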
If you define Object::method in two different places, then how will the compiler know which one to call?
It won't, and in fact you will be breaking the "One Definition Rule" if you do this, which results in undefined behavior, no diagnostic required, according to the standards.
If you want to define multiple implementations for a class interface, you should use inheritance in some way.
One way you might do it is to use a virtual base class and override some of the methods in different subclasses.
If you want to manipulate instances of the class as value types, then you can use the pImpl idiom combined with an abstract base class for the implementation. You would have one class, the "pointer" class, which exposes the interface and holds a pointer to an abstract base class type. Then, in the .cpp file, you would define that abstract base class, define multiple subclasses of it, and have different constructors of the pImpl class instantiate different subclasses as the implementation.
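A minimal sketch of that idea (all names are mine, purely illustrative):

// shape.h -- the value-like "pointer" class that clients see
#include <memory>

class ShapeImpl;                          // abstract implementation, defined in shape.cpp

class Shape {
public:
    explicit Shape(double radius);        // constructs a circle implementation
    Shape(double width, double height);   // constructs a rectangle implementation
    ~Shape();
    double area() const;
private:
    std::unique_ptr<ShapeImpl> impl_;
};

// shape.cpp -- the concrete implementations never appear in the header
class ShapeImpl {
public:
    virtual ~ShapeImpl() = default;
    virtual double area() const = 0;
};

namespace {
    class Circle : public ShapeImpl {
    public:
        explicit Circle(double r) : r_(r) {}
        double area() const override { return 3.141592653589793 * r_ * r_; }
    private:
        double r_;
    };

    class Rect : public ShapeImpl {
    public:
        Rect(double w, double h) : w_(w), h_(h) {}
        double area() const override { return w_ * h_; }
    private:
        double w_, h_;
    };
}

Shape::Shape(double radius) : impl_(new Circle(radius)) {}
Shape::Shape(double width, double height) : impl_(new Rect(width, height)) {}
Shape::~Shape() = default;
double Shape::area() const { return impl_->area(); }

Different constructors pick different implementation subclasses, and callers only ever see the Shape value type.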
If you want to use static polymorphism rather than run-time polymorphism, you can use the CRTP idiom (which is still ultimately based on inheritance, just not on virtual dispatch).
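And a tiny CRTP sketch for contrast (again just illustrative):

#include <cstdio>

// The base is a template over the implementation type, so the call is
// resolved at compile time instead of through a vtable.
template <typename Derived>
class PrinterBase {
public:
    void print_string(const char* s) {
        static_cast<Derived*>(this)->do_print(s);
    }
};

class ConsolePrinter : public PrinterBase<ConsolePrinter> {
public:
    void do_print(const char* s) { std::puts(s); }
};

int main() {
    ConsolePrinter p;
    p.print_string("hello");   // statically dispatched to ConsolePrinter::do_print
}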

How are C++ vtable methods ordered *In Practice*

In theory, C++ does not define a binary interface, and the order of methods in the vtable is undefined. Change anything about a class's definition and you need to recompile every class that depends upon it, in every DLL, etc.
But what I would like to know is how compilers work in practice. I would hope that they just use the order in which the methods are defined in the header/class, which would make appending additional methods safe. But they could also use a hash of the mangled names to make the layout independent of declaration order, which would also make it completely non-upgradable.
If people have specific knowledge of how specific versions of specific compilers work in different operating systems etc. then that would be most helpful.
Added: Ideally, linker symbols would be created for the virtual method offsets, so that the offsets would never be hard-coded into calling functions. But my understanding is that this is never done. Correct?
It appears that with Microsoft's compiler the vtable may be reordered.
The following is copied from https://marc.info/?l=kde-core-devel&m=139744177410091&w=2
I (Nicolas Alvarez) can confirm this behavior happens.
I compiled this class:
struct Testobj {
    virtual void func1();
    virtual void func2();
    virtual void func3();
};
And a program that calls func1(); func2(); func3();
Then I added a func2(int) overload to the end:
struct Testobj {
    virtual void func1();
    virtual void func2();
    virtual void func3();
    virtual void func2(int);
};
and recompiled the class but not the program using the class.
Output of calling func1(); func2(); func3(); was
This is func1
This is func2 taking int
This is func2
This shows that if I declare func1() func2() func3() func2(int), the
vtable is laid out as func1() func2(int) func2() func3().
Tested with MSVC2010.
In MSVC 2010 they are in the order you declare them. I can't think of any rationale for another compiler doing it differently, although it is an arbitrary choice. It only needs to be consistent. They are just arrays of pointers, so don't worry about hashes or mangling.
No matter the order, additional virtual functions added in derived classes must come after those in the base or polymorphic casts would not work.
As far as I know, they are always in the order of declaration. This way you can always add declarations of new virtual methods at the end (or below all previous declarations of virtual methods). If you remove any virtual method or add a new one somewhere in the middle, you do need to recompile and relink everything.
I know that for sure; I already made that mistake once. In my experience these rules apply to both MSVC and GCC.
Any compiler must at least place all the vtable entries for a specific class together, with those for derived classes coming either before or after them, and also together.
The easiest way to accomplish that is to use the header order. It is difficult to see why any compiler would do anything different, given that it requires more code, more testing, etc., and just provides another way for mistakes to occur. No identifiable benefit that I can see.

adding virtual function to the end of the class declaration avoids binary incompatibility?

Could someone explain to me why adding a virtual function to the end of a class declaration avoids binary incompatibility?
If I have:
class A
{
public:
    virtual ~A();
    virtual void someFuncA() = 0;
    virtual void someFuncB() = 0;
    virtual void other1() = 0;
private:
    int someVal;
};
And later modify this class declaration to:
class A
{
public:
    virtual ~A();
    virtual void someFuncA() = 0;
    virtual void someFuncB() = 0;
    virtual void someFuncC() = 0;
    virtual void other1() = 0;
private:
    int someVal;
};
I get a coredump from another .so compiled against the previous declaration. But if I put someFuncC() at the end of the class declaration (after "int someVal"):
class A
{
public:
    virtual ~A();
    virtual void someFuncA() = 0;
    virtual void someFuncB() = 0;
    virtual void other1() = 0;
private:
    int someVal;
public:
    virtual void someFuncC() = 0;
};
I don't see the coredump anymore. Could someone tell me why this is? And does this trick always work?
P.S. The compiler is gcc; does this work with other compilers?
From another answer:
Whether this leads to a memory leak, wipes your hard disk, gets you pregnant, sets nasty nasal demons chasing you around your apartment, or lets everything work fine with no apparent problems, is undefined. It might be this way with one compiler and change with another, change with a new compiler version, with each new compilation, with the moon phases, your mood, or depending on the number of neutrinos that passed through the processor on the last sunny afternoon. Or it might not.
All that, and an infinite number of other possibilities, are put into one term: undefined behavior.
Just stay away from it.
If you know how a particular compiler version implements its features, you might get this to work. Or you might not. Or you might think it works, but it breaks, but only if whoever sits in front of the computer just ate a yogurt.
Is there any reason why you want this to work anyway?
I suppose it works (or doesn't) the way it does because your compiler creates virtual table entries in the order the virtual functions are declared. If you mess up the order by putting a new virtual function in between the others, then when someone calls other1(), someFuncC() is called instead, possibly with the wrong arguments, but definitely at the wrong moment. (Be glad it crashes immediately.)
This is, however, just a guess, and even if it is a right one, unless your gcc version comes with a document somewhere describing this, there's no guarantee it will work this way tomorrow even with the same compiler version.
I'm a bit surprised that this particular rearrangement helps at all. It's certainly not guaranteed to work.
The class you give above will normally be translated to something on this order:
typedef void (*vfunc)(void);

struct __A__impl {
    vfunc *__vtable_ptr;
    int someVal;
};

void __A__impl__init(__A__impl *object) {
    static vfunc virtual_functions[] = { __A__dtor, __A__someFuncA, __A__someFuncB };
    object->__vtable_ptr = virtual_functions;
}
When/if you add someFuncC, you should normally get another entry added to the class' virtual functions table. If the compiler arranges that before any of the other functions, you'll run into a problem where attempting to invoke one function actually invokes another. As long as its address is at the end of the virtual function table, things should still work. C++ doesn't guarantee anything about how vtables are arranged though (or even that there is a vtable).
With respect to normal data, (non-static) members are required to be arranged in ascending order as long as there isn't an intervening access specifier (public:, protected: or private:).
If the compiler followed the same rules when mapping virtual function declarations to vtable positions, your first attempt should work, but your second could break. Obviously enough, there's no guarantee of that though -- as long as it works consistently, the compiler can arrange the vtable about any way it wants to.
Maybe G++ puts its "private" virtual table in a different place than its public one.
At any rate, if you can somehow trick the compiler into thinking an object looks different than it really does, and then use that object, you are going to invoke nasal demons. You can't depend on anything they do. This is basically what you are doing.
I suggest avoiding anything related to binary compatibility or serialization with a class. This opens up too many cans of worms and there is not enough fish to eat them all.
When transferring data, prefer XML or an ASCII-based protocol with field definitions over a binary protocol. A flexible protocol will help people (including you) debug and maintain it. ASCII and XML protocols offer higher readability than binary ones.
Classes, objects, structs and the like should never be compared as a whole by binary image. The preferred method is for each class to implement its own comparison operators or functions. After all, the class is the expert on how to compare its data members. Some data members, such as pointers and containers, can really screw up binary comparisons.
Since processors are fast and memory is cheap, change your priorities from saving space to correctness and robustness. Optimize for speed or space after the program is bug-free.
There are too many horror stories posted to Stack Overflow and the Newsgroups related to binary protocols and binary comparisons (and assignments) of objects. Let's reduce the amount of new horror stories by learning from the existing ones.
Binary compatibility is a bad thing. You should use a plaintext-based system, perhaps XML.
Edit: I misread the meaning of your question a little. Windows provides many native ways to share data, for example GetProcAddress, __declspec(dllexport) and __declspec(dllimport). GCC must offer something similar. Binary serialization is a bad thing.
Edit again:
Actually, he didn't mention executables in his post. At all. Nor what he was trying to use his binary compatibility for.

How to share classes between DLLs

I have an unmanaged Win32 C++ application that uses multiple C++ DLLs. The DLLs each need to use class Foo - definition and implementation.
Where do Foo.h and Foo.cpp live so that the DLLs link and don't end up duplicating code in memory?
Is this a reasonable thing to do?
[Edit]
There is a lot of good info in all the answers and comments below - not just the one I've marked as the answer. Thanks for everyone's input.
Providing functionality in the form of classes via a DLL is itself fine. You need to be careful to separate the interface from the implementation, however. How careful you need to be depends on how your DLL will be used. For toy projects or utilities that remain internal, you may not even need to think about it. For DLLs that will be used by multiple clients under who-knows-which compiler, you need to be very careful.
Consider:
class MyGizmo
{
public:
    std::string get_name() const;
private:
    std::string name_;
};
If MyGizmo is going to be used by 3rd parties, this class will cause you no end of headaches. Obviously, the private member variables are a problem, but the return type of get_name() is just as much of a problem. The reason is that std::string's implementation details are part of its definition. The Standard dictates a minimum functionality set for std::string, but compiler writers are free to implement that however they choose. One might have a function named realloc() to handle internal reallocation, while another may have a function named buy_node() or something. The same is true of data members. One implementation may use three size_t's and a char*, while another might use a std::vector. The point is that your compiler might think std::string is n bytes with such-and-such members, while another compiler (or even another patch level of the same compiler) might think it looks totally different.
One solution to this is to use interfaces. In your DLL's public header, you declare an abstract class representing the useful facilities your DLL provides, and a means to create the class, such as:
DLL.H :
class MyGizmo
{
public:
    static MyGizmo* Create();
    virtual void get_name(char* buffer_alloc_by_caller, size_t size_of_buffer) const = 0;
    virtual ~MyGizmo();
protected:
    MyGizmo(); // clients can't create one directly; they have to go through Create()
};
...and then in your DLL's internals, you define a class that actually implements MyGizmo:
mygizmo.cpp :
class MyConcreteGizmo : public MyGizmo
{
public:
    void get_name(char* buf, size_t sz) const { /*...*/ }
    ~MyConcreteGizmo() { /*...*/ }
private:
    std::string name_;
};

MyGizmo* MyGizmo::Create()
{
    return new MyConcreteGizmo;
}
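From the consuming side, client code only ever touches the abstract interface and the factory. A hypothetical usage example (not part of the original answer):

#include "DLL.H"

int main() {
    MyGizmo* gizmo = MyGizmo::Create();    // constructed inside the DLL
    char name[64] = {};
    gizmo->get_name(name, sizeof(name));   // virtual call through the interface
    delete gizmo;                          // works because ~MyGizmo() is virtual; if the
                                           // EXE and DLL might use different runtimes,
                                           // an exported Destroy() function is safer
}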
This might seem like a pain and, well, it is. If your DLL is going to be used internally by only one compiler, there may be no reason to go to the trouble. But if your DLL is going to be used by multiple compilers internally, or by external clients, doing this saves major headaches down the road.
Use __declspec(dllexport) to export the class to the DLL's export table, then include the header file in your other projects and link against the main DLL's export library file. That way the implementation is common.
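The usual way to make one header serve both the DLL build and its consumers is an export macro along these lines (a common pattern, sketched with hypothetical names):

// foo_export.h (hypothetical)
#ifdef FOO_BUILDING_DLL
    #define FOO_API __declspec(dllexport)   // defined when compiling the DLL itself
#else
    #define FOO_API __declspec(dllimport)   // everyone else imports
#endif

// Foo.h
#include "foo_export.h"

class FOO_API Foo {
public:
    void doSomething();
};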
Where does Foo live? In another DLL.
Is it reasonable? Not really.
If you declare a class like this:
class __declspec(dllexport) Foo { ...
then MSVC will export every member function of the class. However, the resulting DLL is very fragile: any small change to the class definition without a corresponding rebuild of every consuming DLL means that the consuming code will allocate the incorrect number of bytes for any stack and heap allocations not performed by factory functions. Likewise, inline methods will be compiled into consuming DLLs and will reference the old layout of the class.
If all the dlls are always rebuilt together, then go ahead. If not - don't :P