How to design a C++ API for binary compatible extensibility

How to design a C++ API for binary compatible extensibility - c++

I am designing an API for a C++ library which will be distributed in a dll / shared object. The library contains polymorhic classes with virtual functions. I am concerned that if I expose these virtual functions on the DLL API, I cut myself from the possibility of extending the same classes with more virtual functions without breaking binary compatibility with applications built for the previous version of the library.
One option would be to use the PImpl idiom to hide all the classes having virtual functions, but that also seem to have it's limitations: this way applications lose the possibility of subclassing the classes of the library and overriding the virtual methods.
How would you design a API class which can be subclassed in an application, without losing the possibility to extend the API with (not abstract) virtual methods in a new version of the dll while staying backward binary compatible?
Update: the target platforms for the library are windows/msvc and linux/gcc.

Several months ago I wrote an article called "Binary Compatibility of Shared Libraries Implemented in C++ on GNU/Linux Systems" [pdf]. While concepts are similar on Windows system, I'm sure they're not exactly the same. But having read the article you can get a notion on what's going on at C++ binary level that has anything to do with compatibility.
By the way, GCC application binary interface is summarized in a standard document draft "Itanium ABI", so you'll have a formal ground for a coding standard you choose.
Just for a quick example: in GCC you can extend a class with more virtual functions, if no other class inherits it. Read the article for better set of rules.
But anyway, rules are sometimes way too complex to understand. So you might be interested in a tool that verifies compatibility of two given versions: abi-compliance-checker for Linux.

There is an interesting article on the KDE knowledge base that describes the do's and don'ts when aiming at binary compatibility when writing a library: Policies/Binary Compatibility Issues With C++

C++ binary compat is generally difficult, even without inheritance. Look at GCC for example. In the last 10 years, I'm not sure how many breaking ABI changes they've had. Then MSVC has a different set of conventions, so linking to that with GCC and vice versa can't be done... If you compare this to the C world, compiler inter-op seems a bit better there.
If you're on Windows you should look at COM. As you introduce new functionality you can add interfaces. Then callers can QueryInterface() for the new one to expose that new functionality, and even if you end up changing things a lot, you can either leave the old implementation there or you can write shims for the old interfaces.

I think you misunderstand the problem of subclassing.
Here is your Pimpl:
// .h
class Derived
{
public:
virtual void test1();
virtual void test2();
private;
Impl* m_impl;
};
// .cpp
struct Impl: public Base
{
virtual void test1(); // override Base::test1()
virtual void test2(); // override Base::test2()
// data members
};
void Derived::test1() { m_impl->test1(); }
void Derived::test2() { m_impl->test2(); }
See ? No problem with overriding the virtual methods of Base, you just need to make sure to redeclare them virtual in Derived so that those deriving from Derived know they may rewrite them too (only if you wish so, which by the way is a great way of providing a final for those who lack it), and you may still redefine it for yourself in Impl which may even call the Base version.
There is no problem with Pimpl there.
On the other hand, you lose polymorphism, which may be troublesome. It's up to you to decide whether you want polymorphism or just composition.

If you expose the PImpl class in a header file, then you can inherit from it. You can still maintain backward portability since the external classes contains a pointer to the PImpl object. Of course if the client code of the library isn't very wise, it could misuse this exposed PImpl object, and ruin the binary backward compatibility. You may add some notes to warn the user in the PImpl's header file.

Related

Limit inheritance of classes to specific library in C++

I am trying to limit the inheritance of an c++ class to within the same library, while allowing it to be instantiated in other libraries.
The use case is that I have some code that needs to be real-time capable (compiled with special flags and poisoned code) and that needs to be used/interfaced to non-RT code. However, I need to make absolutely sure that no non-RT code can ever be called inside the RT code. Therefore I have to libraries: one that is RT capable and one that isn't (which depends on the RT library and may use code from it).
Now, I have some abstract classes, which I want to be only inherited from inside of the RT library. Is it possible to prohibit the inheritance of those ABCs from classes defined outside of the RT library?
What I came up so far (without it working) is defining a macro that makes the classes final outside of RT code and a templated base class that uses std::conditional
class BaseA REALTIME_FINAL
{
virtual void foo() = 0;
}
template <bool allow = REALTIME_TRUE>
class BaseB : : virtual public std::conditional<allow, std::true_t, std::nullptr_t>::type
{
virtual void foo() = 0;
}
While both of these methods prohibit the inheritance from the abstract base, it also makes it impossible in the non-RT library to call or instantiate (or even include the header) any classes derived from it in the RT lib.

You can solve this problem much more simply, by moving your realtime code into its own library with its own header files, and building it without any dependency on your non-realtime library.
Put your realtime and non-realtime headers into separate directories, and if your realtime code is built as a shared library, use the linker option to prohibit undefined symbols in that library.
Then all you have to do is remember not to add your system's equivalent of -Inon-realtime or -lnon-realtime in the realtime library's build configuration.

One thing you need to think hard is - you can solve this problem today but tomorrow an intern makes changes that sneak into your code which compiles but will break in the hands of your client. So I am quite dramatic on adopting a solution that does not leave anything to imagination.
My go-to approach is always to separate libraries with a C API. This guarantees that no C++ features, which compilers are frequently guilty of manipulating, can be tampered.
More often I use this to encapsulate code that was compiled with the old ABI (pre-C++11) which is unfortunately still common from certain vendors.
This also guarantees that I can access these features from a CD/CI perspective from other scripting languages like Python, Java, Rust more easily.

Loading Stages from external code

I wrote a Pipe and Filter based architecture. To avoid confusion the Filter's are called "Stages" in my code. Here's the basic idea :
I want other developers to have the possibility to implement their own Stage class and then I can add it into the list of Stages that already exist at run-time.
I've been reading around for a while and it seems like their are restrictions to dynamic code loading. My current Stage class looks like this :
class Stage
{
public:
void Process();
const uint16_t InputCount();
const uint16_t OutputCount();
void SetOutputPipe(size_t idx, Pipe<AmplitudeVal> *outputPipe);
void SetInputPipe(size_t idx, Pipe<AmplitudeVal> *inputPipe);
protected:
Stage(const uint16_t inputCount, const uint16_t outputCount);
virtual void init() {};
virtual bool work() = 0;
virtual void finish() {};
protected:
const uint16_t mInputCount;
const uint16_t mOutputCount;
std::vector< Pipe<AmplitudeVal>* > mInputs;
std::vector< Pipe<AmplitudeVal>* > mOutputs;
};
AmplitudeVal is simply an alias for float. This class only holds references to pipes it is connected to(mInputs and mOutputs), it doesn't deal with any algorithmic activity. I want to expose as little as I can for ease of use to external developers. Right now this class only relies on the Pipe header and some basic config file. Most examples dealing with loading DLL's propose a class with only pure virtual functions and barely any member variables. I'm not sure what I should do.

I understand you want to have stage in a DLL and have user users derive their work on your DLL.
Scenario 1: consumer and DLL build with same compiler and same standard library
If the consumer uses the same compiler, compatible compiling options, and both, sides use the same shared standard library (default with MSVC), then you solution should work as is
(See related SO question.)
Scenario 2: consumer and DLL build with same compiler but different libraries
If one side uses a different standard library or linking option thereof (for example if you use statically linked library for your DLL and shared library for consumer), then you have to make sure that all objects are ALWAYS created/released on the same side (because DLL and application would each use their own alloaction functions with different memory pools).
This will be very difficult because of:
inheritance of data
virtual destructor
storage management of standard containers (which would be different in DLL and consumer, despite the impression the source code might give that it's the same)
The first step in the right direction would then be to isolate all the data into the private section and ensure clean access via getters and setters. Fortunately, this kind of design is a sound design approach for inheritance, so it's worth to use it even if you don't need it.
Scenario 3: different compilers or incompatible compiling options
If you use different compilers, or incompatible compiling options then the real problems start:
you can't rely on the assumption that both sides have the same understanding of memory layout. So the read/write of members might occur at different locations; a huge mess ! This is why so many DLL classes have no data member. Many also use PIMPL idiom to hide a private memory layout. But PIMPL in this case of inheritance is very similar to using private data (*this would then be the implicit pointer to the private implementation)
the compiler/linker use "mangled" function names. Different compilers might use different mangling, and wouldn't understand each other's symbol definitions (i.e. the client would'nt find SetOutputPipe() despite it's there). This is why most DLL put all member functions as virtual : the functions are called via an offset in a vtable, which uses fortunately the same layout accross compilers in practice.
finally, different compilers could use different calling conventions. But I think in practice on well established platforms, this shouldn't be a major risk
In this answer to a question about DLLs with different compilers (also without inheritance), I've provided some additional explanations and references that could be relevant for such a mixed scenario.
Again using private member data isntead of protected would put you on a safer side. Exposing getters/setters (whether protected or public) using an extern "C" linkage specifier would avoid name mangling issues for non virtual functions.
Final remark:
When exposing classes in libraries, extra-care should be taken to design the class, making data private. This good practice is worth an extra thought, what ever scenario you're in.

Runtime interfaces and object composition in C++

I am searching for a simple, light-weight solution for interface-based runtime object composition in C++. I want to be able to specify interfaces (methods declarations), and objects (creatable through factory pattern) implementing these. At runtime I want mechanisms to instantiate these objects and interconnect these based on interface-connectors. The method calls at runtime should remain fairly cheap, i.e. only several more instructions per call, comparable to functor patterns.
The whole thing needs to be platform independent (at least MS Windows and Linux). And the solution needs to be licensed liberally, like open source LGPL or (even better) BSD or something, especially allowing use commercial products.
What I do not want are heavy things like networking, inter-process-communication, extra compiler steps (one-time code generation is ok though), or dependencies to some heavy libraries (like Qt).
The concrete scenario is: I have such a mechanism in a larger software, but the mechanism is not very well implemented. Interfaces are realized by base classes exported by Dlls. These Dlls also export factory functions to instantiate the implementing objects, based on hand-written class ids.
Before I now start to redesign and implement something better myself, I want to know if there is something out there which would be even better.
Edit: The solution also needs to support multi-threading environments. Additionally, as everything will happen inside the same process, I do not need data serialization mechanisms of any kind.
Edit: I know how such mechanisms work, and I know that several teaching books contain corresponding examples. I do not want to write it myself. The aim of my question is: Is there some sort of "industry standard" lib for this? It is a small problem (within a single process) and I am really only searching for a small solution.
Edit: I got the suggestion to add a pseudo-code example of what I really want to do. So here it is:
Somewhere I want to define interfaces. I do not care if it's C-Headers or some language and code generation.
class interface1 {
public:
virtual void do_stuff(void) = 0;
};
class interface2 {
public:
virtual void do_more_stuff(void) = 0;
};
Then I want to provide (multiple) implementations. These may even be placed in Dll-based plugins. Especially, these two classes my be implemented in two different Dlls not knowing each other at compile time.
class A : public interface1 {
public:
virtual void do_stuff(void) {
// I even need to call further interfaces here
// This call should, however, not require anything heavy, like data serialization or something.
this->con->do_more_stuff();
}
// Interface connectors of some kind. Here I use something like a template
some_connector<interface2> con;
};
class B : public interface2 {
public:
virtual void do_more_stuff() {
// finally doing some stuff
}
};
Finally, I may application main code I want to be able to compose my application logic at runtime (e.g. based on user input):
void main(void) {
// first I create my objects through a factory
some_object a = some_factory::create(some_guid<A>);
some_object b = some_factory::create(some_guid<B>);
// Then I want to connect the interface-connector 'con' of object 'a' to the instance of object 'b'
some_thing::connect(a, some_guid<A::con>, b);
// finally I want to call an interface-method.
interface1 *ia = a.some_cast<interface1>();
ia->do_stuff();
}
I am perfectly able to write such a solution myself (including all pitfalls). What I am searching for is a solution (e.g. a library) which is used and maintained by a wide user base.

While not widely used, I wrote a library several years ago that does this.
You can see it on GitHub zen-core library, and it's also available on Google Code
The GitHub version only contains the core libraries, which is really all the you need. The Google Code version contains a LOT of extra libraries, primarily for game development, but it does provide a lot of good examples on how to use it.
The implementation was inspired by Eclipse's plugin system, using a plugin.xml file that indicates a list of available plugins, and a config.xml file that indicates which plugins you would like to load. I'd also like to change it so that it doesn't depend on libxml2 and allow you to be able to specify plugins using other methods.
The documentation has been destroyed thanks to some hackers, but if you think this would be useful then I can write enough documentation to get you started.

A co-worker gave me two further tips:
The loki library (originating from the modern c++ book):
http://loki-lib.sourceforge.net/
A boost-like library:
http://kifri.fri.uniza.sk/~chochlik/mirror-lib/html/
I still have not looked at all the ideas I got.

Design with (pure)virtual C++

First of all I have to mention that I have read many C++ virtual questions in on stackoverflow. I have some knowledge how they work, but when I start the project and try to design something I never consider/use virtual or pure virtual implementations. Maybe it is because I am lack of knowledge how do they work or I don't know how to realize some stuff with them. I think it's bad because I don't use fully Object Oriented development.
Maybe someone can advise me how to get used to them?

Check out abstract base classes and interfaces in Java or C# to get ideas on when pure virtuals are useful.
Virtual functions are pretty basic to OO. Theree are plenty of books out there to help you. Myself, I like Larman's Applying UML and Patterns.

but when I start the project and try to design something I never consider/use virtual or pure virtual implementations.
Here's something you can try:
Figure out the set of classes you use
Do you see some class hierarchies? A Circle is-a Shape sort of relationships?
Isolate behavior
Bubble up/down behavior to form interfaces (base classes) (Code to interfaces and not implementations)
Implement these as virtual functions
The responsibility of defining the exact semantics of the operation(s) rests with the sub-classes'
Create your sub-classes
Implement (override) the virtual functions
But don't force a hierarchy just for the sake of using them. An example from real code I have been working on recently:
class Codec {
public:
virtual GUID Guid() { return GUID_NULL; }
};
class JpegEncoder : public Codec {
public:
virtual GUID Guid() { return GUID_JpegEncoder; }
};
class PngDecoder : public Codec {
public:
virtual GUID Guid() { return GUID_PngDecoder; }
};

I don't have a ton of time ATM, but here is a simple example.
In my job I maintain and application which talks to various hardware devices. Of these devices, many motors are used for various purposes. Now, I don't know if you have done any development with motors and drives, but they are all a bit different, even if they claim to follow a standard like CANOpen. Anyway, you need to create some new code when you switch vendors, perhaps you motor or drive was end-of-life'd, etc. On top of that, this code has to maintain compatibility with older devices, and we also have various models of similar devices. So, all in all, you have to deal with many different motors and interfaces.
Now, in the code I use an abstract class, named "iMotor", which contains only pure virtual functions. In the implementation code only the iMotor class is referenced. I create a dll for different types of motors with different implementations, but they all implement the iMotor interface. So, all that I need to do to add/change a motor is create a new implementation and drop that dll in place of the old one. Because the code which uses these motor implementations deals only with the iMotor interface it never needs to change, only the implementation of how each motor does what it does needs to change.

If you google for design patterns like the "strategy pattern" and "command pattern" you will find some good uses of interfaces and polymorphism. Besides that, design patterns are always very useful to know.

You don't HAVE to use them but they have their advantages.
Generally they are used as an "interface" between 2 different types of functionality that, code wise, aren't very related.
An example would be handling file loading. A simple file handling class would seem to be perfect. However at a later stage you are asked to shift all your files into a single packaged file while maintaining support for individual files for debug purposes. How do you handle loading here? Obviously things will be handled rather differently because suddenly you can't just open a file. Instead you need to be able to look up the files location and then seek to that location before loading, pretty much, as normal.
The obvious thing to do is implement an abstract base class. Perhaps call it BaseFile. The OpenFile function handling will differ dependent on whether you are using the PackageFile or the DiskFile class. So make that a pure virtual.
Then when you derive the PackageFile and DiskFile classes you provide the appropriate implementation for Opening a file.
You can then add something such as
#if !defined( DISK_FILE ) && defined ( _DEBUG )
#define DISK_FILE 1
#elif !defined( DISK_FILE )
#define DISK_FILE 0
#endif
#if DISK_FILE
typedef DiskFile File;
#else
typedef PackageFile File;
#endif
Now you would just use the "File" typedef to do all file handling. Equally if you don't pre-define DISK_FILE as 0 or 1 and debug is set it will automatically load from disk otherwise it will load from the Package file.
Of course such a construct still allows you to load from the Package file in debug simply by defining DISK_FILE to be 1 in advance and it also allows you to use disk access in a release build by setting DISK_FILE to 0.

Using C++ DLLs with different compiler versions

This question is related to "How to make consistent dll binaries across VS versions ?"
We have applications and DLLs built
with VC6 and a new application built
with VC9. The VC9-app has to use
DLLs compiled with VC6, most of
which are written in C and one in
C++.
The C++ lib is problematic due to
name decoration/mangling issues.
Compiling everything with VC9 is
currently not an option as there
appear to be some side effects.
Resolving these would be quite time
consuming.
I can modify the C++ library, however it must be compiled with VC6.
The C++ lib is essentially an OO-wrapper for another C library. The VC9-app uses some static functions as well as some non-static.
While the static functions can be handled with something like
// Header file
class DLL_API Foo
{
int init();
}
extern "C"
{
int DLL_API Foo_init();
}
// Implementation file
int Foo_init()
{
return Foo::init();
}
it's not that easy with the non-static methods.
As I understand it, Chris Becke's suggestion of using a COM-like interface won't help me because the interface member names will still be decorated and thus inaccessible from a binary created with a different compiler. Am I right there?
Would the only solution be to write a C-style DLL interface using handlers to the objects or am I missing something?
In that case, I guess, I would probably have less effort with directly using the wrapped C-library.

The biggest problem to consider when using a DLL compiled with a different C++ compiler than the calling EXE is memory allocation and object lifetime.
I'm assuming that you can get past the name mangling (and calling convention), which isn't difficult if you use a compiler with compatible mangling (I think VC6 is broadly compatible with VS2008), or if you use extern "C".
Where you'll run into problems is when you allocate something using new (or malloc) from the DLL, and then you return this to the caller. The caller's delete (or free) will attempt to free the object from a different heap. This will go horribly wrong.
You can either do a COM-style IFoo::Release thing, or a MyDllFree() thing. Both of these, because they call back into the DLL, will use the correct implementation of delete (or free()), so they'll delete the correct object.
Or, you can make sure that you use LocalAlloc (for example), so that the EXE and the DLL are using the same heap.

Interface member names will not be decorated -- they're just offsets in a vtable. You can define an interface (using a C struct, rather than a COM "interface") in a header file, thusly:
struct IFoo {
int Init() = 0;
};
Then, you can export a function from the DLL, with no mangling:
class CFoo : public IFoo { /* ... */ };
extern "C" IFoo * __stdcall GetFoo() { return new CFoo(); }
This will work fine, provided that you're using a compiler that generates compatible vtables. Microsoft C++ has generated the same format vtable since (at least, I think) MSVC6.1 for DOS, where the vtable is a simple list of pointers to functions (with thunking in the multiple-inheritance case). GNU C++ (if I recall correctly) generates vtables with function pointers and relative offsets. These are not compatible with each other.

Well, I think Chris Becke's suggestion is just fine. I would not use Roger's first solution, which uses an interface in name only and, as he mentions, can run into problems of incompatible compiler-handling of abstract classes and virtual methods. Roger points to the attractive COM-consistent case in his follow-on.
The pain point: You need to learn to make COM interface requests and deal properly with IUnknown, relying on at least IUnknown:AddRef and IUnknown:Release. If the implementations of interfaces can support more than one interface or if methods can also return interfaces, you may also need to become comfortable with IUnknown:QueryInterface.
Here's the key idea. All of the programs that use the implementation of the interface (but don't implement it) use a common #include "*.h" file that defines the interface as a struct (C) or a C/C++ class (VC++) or struct (non VC++ but C++). The *.h file automatically adapts appropriately depending on whether you are compiling a C Language program or a C++ language program. You don't have to know about that part simply to use the *.h file. What the *.h file does is define the Interface struct or type, lets say, IFoo, with its virtual member functions (and only functions, no direct visibility to data members in this approach).
The header file is constructed to honor the COM binary standard in a way that works for C and that works for C++ regardless of the C++ compiler that is used. (The Java JNI folk figured this one out.) This means that it works between separately-compiled modules of any origin so long as a struct consisting entirely of function-entry pointers (a vtable) is mapped to memory the same by all of them (so they have to be all x86 32-bit, or all x64, for example).
In the DLL that implements the the COM interface via a wrapper class of some sort, you only need a factory entry point. Something like an
extern "C" HRESULT MkIFooImplementation(void **ppv);
which returns an HRESULT (you'll need to learn about those too) and will also return a *pv in a location you provide for receiving the IFoo interface pointer. (I am skimming and there are more careful details that you'll need here. Don't trust my syntax) The actual function stereotype that you use for this is also declared in the *.h file.
The point is that the factory entry, which is always an undecorated extern "C" does all of the necessary wrapper class creation and then delivers an Ifoo interface pointer to the location that you specify. This means that all memory management for creation of the class, and all memory management for finalizing it, etc., will happen in the DLL where you build the wrapper. This is the only place where you have to deal with those details.
When you get an OK result from the factory function, you have been issued an interface pointer and it has already been reserved for you (there is an implicit IFoo:Addref operation already performed on behalf of the interface pointer you were delivered).
When you are done with the interface, you release it with a call on the IFoo:Release method of the interface. It is the final release implementation (in case you made more AddRef'd copies) that will tear down the class and its interface support in the factory DLL. This is what gets you correct reliance on a consistent dynamic stoorage allocation and release behind the interface, whether or not the DLL containing the factory function uses the same libraries as the calling code.
You should probably implement IUnknown:QueryInterface (as method IFoo:QueryInterface) too, even if it always fails. If you want to be more sophisticated with using the COM binary interface model as you have more experience, you can learn to provide full QueryInterface implementations.
This is probably too much information, but I wanted to point out that a lot of the problems you are facing about heterogeneous implementations of DLLs are resolved in the definition of the COM binary interface and even if you don't need all of it, the fact that it provides worked solutions is valuable. In my experience, once you get the hang of this, you will never forget how powerful this can be in C++ and C++ interop situations.
I haven't sketched the resources you might need to consult for examples and what you have to learn in order to make *.h files and to actually implement factory-function wrappers of the libraries you want to share. If you want to dig deeper, holler.

There are other things you need to consider too, such as which run-times are being used by the various libraries. If no objects are being shared that's fine, but that seems quite unlikely at first glance.
Chris Becker's suggestions are pretty accurate - using an actual COM interface may help you get the binary compatibility you need. Your mileage may vary :)

not fun, man. you are in for a lot of frustration, you should probably give this:
Would the only solution be to write a
C-style DLL interface using handlers
to the objects or am I missing
something? In that case, I guess, I
would probably have less effort with
directly using the wrapped C-library.
a really close look. good luck.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to design a C++ API for binary compatible extensibility - c++

There is an interesting article on the KDE knowledge base that describes the do's and don'ts when aiming at binary compatibility when writing a library: Policies/Binary Compatibility Issues With C++

Related

Limit inheritance of classes to specific library in C++

Loading Stages from external code

Runtime interfaces and object composition in C++

Design with (pure)virtual C++

Using C++ DLLs with different compiler versions

Categories

Resources