Loading Stages from external code - C++

I wrote a Pipe and Filter based architecture. To avoid confusion, the Filters are called "Stages" in my code. Here's the basic idea:
I want other developers to be able to implement their own Stage class so that I can add it to the list of existing Stages at run time.
I've been reading around for a while and it seems there are restrictions on dynamic code loading. My current Stage class looks like this:
class Stage
{
public:
    void Process();
    const uint16_t InputCount();
    const uint16_t OutputCount();
    void SetOutputPipe(size_t idx, Pipe<AmplitudeVal> *outputPipe);
    void SetInputPipe(size_t idx, Pipe<AmplitudeVal> *inputPipe);

protected:
    Stage(const uint16_t inputCount, const uint16_t outputCount);
    virtual void init() {};
    virtual bool work() = 0;
    virtual void finish() {};

protected:
    const uint16_t mInputCount;
    const uint16_t mOutputCount;
    std::vector< Pipe<AmplitudeVal>* > mInputs;
    std::vector< Pipe<AmplitudeVal>* > mOutputs;
};
AmplitudeVal is simply an alias for float. This class only holds references to the pipes it is connected to (mInputs and mOutputs); it doesn't deal with any algorithmic activity. I want to expose as little as I can, for ease of use to external developers. Right now this class only relies on the Pipe header and a basic config file. Most examples dealing with loading DLLs propose a class with only pure virtual functions and barely any member variables. I'm not sure what I should do.

I understand you want to have Stage in a DLL and have users derive their work from your DLL.
Scenario 1: consumer and DLL built with the same compiler and same standard library
If the consumer uses the same compiler, compatible compiler options, and both sides use the same shared standard library (the default with MSVC), then your solution should work as is.
(See related SO question.)
Scenario 2: consumer and DLL built with the same compiler but different libraries
If one side uses a different standard library, or a different linking option for it (for example, if your DLL statically links the library while the consumer uses the shared one), then you have to make sure that all objects are ALWAYS created/released on the same side (because the DLL and the application would each use their own allocation functions with different memory pools).
This will be very difficult because of:
inheritance of data
virtual destructor
storage management of standard containers (which would be different in DLL and consumer, despite the impression the source code might give that it's the same)
The first step in the right direction would then be to isolate all the data into the private section and ensure clean access via getters and setters. Fortunately, this is a sound design approach for inheritance anyway, so it's worth using even if you don't need it.
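A minimal sketch of that first step, reusing the question's declarations (the accessor names are my own invention):

#include <cstddef>
#include <cstdint>
#include <vector>

template <typename T> class Pipe;   // from the question's codebase
using AmplitudeVal = float;

class Stage
{
public:
    virtual ~Stage() {}
    uint16_t InputCount() const  { return mInputCount; }
    uint16_t OutputCount() const { return mOutputCount; }

protected:
    Stage(uint16_t inputCount, uint16_t outputCount)
        : mInputCount(inputCount), mOutputCount(outputCount),
          mInputs(inputCount), mOutputs(outputCount) {}

    // Derived classes reach the pipes through accessors instead of touching
    // the containers directly, so the layout stays a private detail.
    Pipe<AmplitudeVal>* inputPipe(size_t idx) const  { return mInputs[idx]; }
    Pipe<AmplitudeVal>* outputPipe(size_t idx) const { return mOutputs[idx]; }

private:
    const uint16_t mInputCount;
    const uint16_t mOutputCount;
    std::vector< Pipe<AmplitudeVal>* > mInputs;
    std::vector< Pipe<AmplitudeVal>* > mOutputs;
};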
Scenario 3: different compilers or incompatible compiling options
If you use different compilers or incompatible compiler options, then the real problems start:
you can't rely on the assumption that both sides have the same understanding of the memory layout. So reads/writes of members might occur at different locations; a huge mess! This is why so many DLL classes have no data members. Many also use the PIMPL idiom to hide a private memory layout. But PIMPL in this inheritance scenario is very similar to using private data (*this would then be the implicit pointer to the private implementation).
the compiler/linker uses "mangled" function names. Different compilers might use different mangling and wouldn't understand each other's symbol definitions (i.e. the client wouldn't find SetOutputPipe() even though it's there). This is why most DLLs make all member functions virtual: the functions are called via an offset into a vtable, which fortunately uses the same layout across compilers in practice (see the sketch after this list).
finally, different compilers could use different calling conventions. But in practice, on well-established platforms, this shouldn't be a major risk.
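Putting those constraints together, the classic portable shape is an all-virtual, data-free interface handed out by an undecorated factory. A minimal sketch under those assumptions (the names are illustrative):

class IStage
{
public:
    virtual void Process() = 0;
    // Destruction also goes through the vtable, so it runs on the DLL's side.
    virtual void Destroy() = 0;
protected:
    ~IStage() {}   // consumers call Destroy(), never delete
};

// extern "C" entry point: no C++ name mangling involved.
extern "C" IStage* CreateStage();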
In this answer to a question about DLLs with different compilers (also without inheritance), I've provided some additional explanations and references that could be relevant for such a mixed scenario.
Again, using private member data instead of protected would put you on the safer side. Exposing getters/setters (whether protected or public) through free functions with extern "C" linkage would avoid name mangling issues for the non-virtual functions.
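For those non-virtual accessors, that could look like the following (the wrapper names are invented; extern "C" only affects linkage and naming, so the parameters can still be C++ types as long as both sides agree on their layout):

extern "C" uint16_t Stage_InputCount(Stage* s)
{
    return s->InputCount();
}

extern "C" void Stage_SetInputPipe(Stage* s, size_t idx, Pipe<AmplitudeVal>* pipe)
{
    s->SetInputPipe(idx, pipe);
}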
Final remark:
When exposing classes in libraries, extra care should be taken to design the class, making data private. This good practice is worth an extra thought, whatever scenario you're in.

Related

Limit inheritance of classes to specific library in C++

I am trying to limit the inheritance of a C++ class to within the same library, while allowing it to be instantiated in other libraries.
The use case is that I have some code that needs to be real-time capable (compiled with special flags and poisoned code) and that needs to be used by/interfaced with non-RT code. However, I need to make absolutely sure that no non-RT code can ever be called inside the RT code. Therefore I have two libraries: one that is RT capable and one that isn't (which depends on the RT library and may use code from it).
Now, I have some abstract classes which I want to be inherited from only inside the RT library. Is it possible to prohibit the inheritance of those ABCs in classes defined outside of the RT library?
What I have come up with so far (without it working) is a macro that makes the classes final outside of RT code, and a templated base class that uses std::conditional:
class BaseA REALTIME_FINAL
{
    virtual void foo() = 0;
};

template <bool allow = REALTIME_TRUE>
class BaseB : virtual public std::conditional<allow, std::true_type, std::nullptr_t>::type
{
    virtual void foo() = 0;
};
While both of these methods prohibit inheritance from the abstract base, they also make it impossible for the non-RT library to call or instantiate (or even include the header of) any classes derived from it in the RT lib.
You can solve this problem much more simply, by moving your realtime code into its own library with its own header files, and building it without any dependency on your non-realtime library.
Put your realtime and non-realtime headers into separate directories, and if your realtime code is built as a shared library, use the linker option to prohibit undefined symbols in that library (e.g. -Wl,--no-undefined with the GNU toolchain).
Then all you have to do is remember not to add your system's equivalent of -Inon-realtime or -lnon-realtime to the realtime library's build configuration.
One thing you need to think hard about is this: you can solve the problem today, but tomorrow an intern makes changes that sneak into your code, compile fine, and break in the hands of your client. So I am quite emphatic about adopting a solution that leaves nothing to the imagination.
My go-to approach is always to separate the libraries with a C API. This guarantees that no C++ features, which compilers frequently handle differently, can leak across the boundary.
More often, I use this to encapsulate code that was compiled with the old ABI (pre-C++11), which is unfortunately still common from certain vendors.
This also guarantees that I can access these features more easily from other languages like Python, Java, or Rust, e.g. in a CI/CD setting.
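A sketch of that separation with invented names; the opaque handle is the whole trick, since no C++ type ever crosses the boundary:

/* c_api.h -- the only header consumers see; compiles as C and as C++ */
#ifdef __cplusplus
extern "C" {
#endif

typedef struct rt_stage rt_stage;            /* opaque handle */
rt_stage* rt_stage_create(void);
void      rt_stage_process(rt_stage* stage);
void      rt_stage_destroy(rt_stage* stage); /* frees on the library's side */

#ifdef __cplusplus
}
#endif

// c_api.cpp -- compiled into the RT library
#include "c_api.h"

class StageImpl {
public:
    void process() { /* real-time work */ }
};

extern "C" rt_stage* rt_stage_create(void)    { return reinterpret_cast<rt_stage*>(new StageImpl); }
extern "C" void rt_stage_process(rt_stage* s) { reinterpret_cast<StageImpl*>(s)->process(); }
extern "C" void rt_stage_destroy(rt_stage* s) { delete reinterpret_cast<StageImpl*>(s); }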

Name of this C++ pattern and the reasoning behind it?

In my company's C++ codebase I see a lot of classes defined like this:
// FooApi.h
class FooApi {
public:
    virtual void someFunction() = 0;
    virtual void someOtherFunction() = 0;
    // etc.
};

// Foo.h
class Foo : public FooApi {
public:
    virtual void someFunction();
    virtual void someOtherFunction();
};
Foo is the only class that inherits from FooApi, and functions that take or return pointers to Foo objects use FooApi * instead. It seems to mainly be used for singleton classes.
Is this a common, named way to write C++ code? And what is the point in it? I don't see how having a separate, pure abstract class that just defines the class's interface is useful.
Edit[0]: Sorry, just to clarify, there is only one class deriving from FooApi and no intention to add others later.
Edit[1]: I understand the point of abstraction and inheritance in general but not this particular usage of inheritance.
The only reason that I can see why they would do this is for encapsulation purposes. The point here is that most other code in the code-base only requires inclusion of the "FooApi.h" / "BarApi.h" / "QuxxApi.h" headers. Only the parts of the code that create Foo objects would actually need to include the "Foo.h" header (and link with the object-file containing the definition of the class' functions). And for singletons, the only place where you would normally create a Foo object is in the "Foo.cpp" file (e.g., as a local static variable within a static member function of the Foo class, or something similar).
This is similar to using forward-declarations to avoid including the header that contains the actual class declaration. But when using forward-declarations, you still need to eventually include the header in order to be able to call any of the member functions. But when using this "abstract + actual" class pattern, you don't even need to include the "Foo.h" header to be able to call the member functions of FooApi.
In other words, this pattern provides very strong encapsulation of the Foo class' implementation (and complete declaration). You get roughly the same benefits as from using the Compiler Firewall idiom. Here is another interesting read on those issues.
I don't know the name of that pattern. It is not very common compared to the other two patterns I just mentioned (compiler firewall and forward declarations). This is probably because this method has quite a bit more run-time overhead than the other two methods.
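Concretely, the moving parts of the pattern look roughly like this; the instance() accessor is an assumption on my part, since the question doesn't show how clients obtain the object:

// FooApi.h -- the only header most client code includes
class FooApi {
public:
    virtual ~FooApi() {}
    virtual void someFunction() = 0;
    virtual void someOtherFunction() = 0;
    static FooApi& instance();   // defined in Foo.cpp
};

// Foo.cpp -- the only file that needs Foo's complete type
#include "FooApi.h"

class Foo : public FooApi {
public:
    void someFunction() { /* ... */ }
    void someOtherFunction() { /* ... */ }
};

FooApi& FooApi::instance() {
    static Foo theFoo;   // the singleton lives entirely behind the interface
    return theFoo;
}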
This is for when the code is later added to. Let's say NewFoo also extends/implements FooApi. All the current infrastructure will work with both Foo and NewFoo.
It's likely that this has been done for the same reason that pImpl (the "pointer to implementation idiom", sometimes called the "private implementation idiom") is used: to keep private implementation details out of the header, which means common build systems like make that use file timestamps to trigger recompilation will not rebuild client code when only the implementation has changed. Instead, the object file containing the new implementation can be linked against existing client object(s). Indeed, if the implementation is distributed in a shared object (aka dynamic link library / DLL), the client application can pick up a changed implementation library the next time it runs (or does a dlopen() or equivalent if it links at run time). As well as facilitating distribution of updated implementations, this can reduce rebuild times, allowing a faster edit/test/edit/... cycle.
The cost of this is that implementations have to be accessed through out-of-line virtual dispatch, so there's a performance hit. This is typically insignificant, but if a trivial function like a get-int-member is called millions of times in a performance critical loop it may be of interest - each call can easily be an order of magnitude slower than inlined member access.
What's the "name" for it? Well, if you say you're using an "interface" most people will get the general idea. That term's a bit vague in C++, as some people use it whenever a base class has virtual methods, others expect that the base will be abstract, lack data members and/or private member functions and/or function definitions (other than the virtual destructor's). Expectations around the term "interface" are sometimes - for better or worse - influenced by Java's language keyword, which restricts the interface class to being abstract, containing no static methods or function definitions, with all functions being public, and only const final data members.
None of the well-known Gang of Four Design Patterns correspond to the usage you cite, and while doubtless lots of people have published (web- or otherwise) corresponding "patterns", they're probably not widely enough used (with the same meaning!) to be less confusing than "interface".
FooApi is an abstract base class; it provides the interface for concrete implementations (Foo).
The point is that you can implement functionality in terms of FooApi and create multiple implementations that satisfy its interface, and the functionality still works. You see the advantage when you have multiple descendants: the functionality can work with multiple implementations. One might implement a different type of Foo, or one for a different platform.
Re-reading my answer, I don't think I should talk about OO ever again.

Is it safe to use strings as private data members in a class used across a DLL boundary?

My understanding is that exposing functions that take or return STL containers (such as std::string) across DLL boundaries can cause problems due to differences in the STL implementations of those containers in the two binaries. But is it safe to export a class like:
class Customer
{
public:
    wchar_t * getName() const;
private:
    std::wstring mName;
};
Without some sort of hack, mName is not going to be usable by the executable, so it won't be able to execute methods on mName, nor construct/destruct this object.
My gut feeling is "don't do this, it's unsafe", but I can't figure out a good reason.
It is not a problem as such, because it is trumped by a bigger problem: you cannot create an object of that class in code that lives in a module other than the one containing the code for the class. Code in another module cannot accurately know the required object size; its implementation of the std::string class may well be different, which, as declared, also affects the size of the Customer object. Even the same compiler cannot guarantee this, for example when mixing optimized and debug builds of these modules. Albeit that this is usually pretty easy to avoid.
So you must create a class factory for Customer objects, a factory that lives in that same module. Which then automatically implies that any code that touches the "mName" member also lives in the same module. And is therefore safe.
The next step then is to not expose Customer at all, but to expose a pure abstract base class (aka interface) instead. Now you can prevent the client code from creating an instance of Customer and shooting its own leg off. And you'll trivially hide the std::string as well. Interface-based programming techniques are common in module interop scenarios; it is also the approach taken by COM.
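Applied to the question's class, that might look like the following sketch (the factory and release names are invented):

// Exposed header: no std::wstring in sight.
class ICustomer
{
public:
    virtual const wchar_t* getName() const = 0;
    virtual void release() = 0;   // destruction stays inside the DLL
protected:
    ~ICustomer() {}               // not deletable from the outside
};

extern "C" ICustomer* CreateCustomer(const wchar_t* name);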
As long as instances of the class are allocated and deallocated with the same allocator settings, you should be OK, but you are right to avoid this.
Differences between the .exe and .dll in debug/release or code generation settings (multi-threaded DLL vs. single-threaded) could cause problems in some scenarios.
I would recommend using abstract classes in the DLL interface, with creation and deletion done solely inside the DLL.
Interfaces like:
class A {
protected:
    virtual ~A() {}
public:
    virtual void func() = 0;
};

// exported create/delete functions
A* create_A();
void destroy_A(A*);
DLL implementation like:
class A_Impl : public A {
public:
    ~A_Impl() {}
    void func() { do_something(); }
};

A* create_A() { return new A_Impl; }

void destroy_A(A* a) {
    A_Impl* ai = static_cast<A_Impl*>(a);
    delete ai;
}
Should be ok.
Even if your class has no data members, you cannot expect it to be usable from code compiled with a different compiler. There is no common ABI for C++ classes. You can expect differences in name mangling just for starters.
If you are prepared to constrain clients to use the same compiler as you, or provide source to allow clients to compile your code with their compiler, then you can do pretty much anything across your interface. Otherwise you should stick to C style interfaces.
If you want to provide an object oriented interface in a DLL that is truly safe, I would suggest building it on top of the COM object model. That's what it was designed for.
Any other attempt to share classes between code that is compiled by different compilers has the potential to fail. You may be able to get something that seems to work most of the time, but it can't be guaranteed to work.
The chances are that at some point you're going to be relying on undefined behaviour in terms of calling conventions or class structure or memory allocation.
The C++ standard does not say anything about the ABI provided by implementations. Even on a single platform changing the compiler options may change binary layout or function interfaces.
Thus to ensure that standard types can be used across DLL boundaries it is your responsibility to ensure that either:
Resource acquisition/release for standard types is done by the same DLL. (Note: you can have multiple CRTs in a process, but a resource acquired by crt1.DLL must be released by crt1.DLL.)
This is not specific to C++. In C, for example, malloc/free and fopen/fclose call pairs must each go to a single C runtime.
This can be done by either of the below:
By explicitly exporting acquisition/release functions (Photon's answer). In this case you are forced to use a factory pattern and abstract types. Basically COM, or a COM clone.
Forcing a group of DLLs to link against the same dynamic CRT (/MD with MSVC). In this case you can safely export any kind of functions/classes.
There are also two potential bugs (among others) you must take care of, since they relate to what is "under" the language.
The first is that std::string is a template, and hence it is instantiated in every translation unit. If those units are all linked into the same module (exe or dll), the linker resolves the identical functions to a single copy, and inconsistent code (the same function with a different body) is treated as an error.
But if they are linked into different modules (an exe and a dll), the compiler and linker have nothing in common. So, depending on how the modules were compiled, you may have different implementations of the same class with different members and memory layout (for example, one may have debugging or profiling features added that the other has not). Accessing an object created on one side with methods compiled on the other side, if you have no other way to guarantee implementation consistency, may end in tears.
The second, more subtle, problem relates to allocation/deallocation of memory: because of the way Windows works, every module can have a distinct heap, and standard C++ does not specify how new and delete determine which heap an object comes from. If a string buffer is allocated in one module, then moved into a string instance in another module, you risk (upon destruction) giving the memory back to the wrong heap. (It depends on how new/delete and malloc/free are implemented with respect to HeapAlloc/HeapFree; this merely relates to the level of "awareness" the STL implementation has of the underlying OS.) The operation is not itself destructive, it just fails, but it leaks the origin heap's memory.
All that said, it is not impossible to pass a container. It is just up to you to guarantee a consistent implementation on both sides, since the compiler and linker have no way to cross-check.

How to design a C++ API for binary compatible extensibility

I am designing an API for a C++ library which will be distributed as a dll / shared object. The library contains polymorphic classes with virtual functions. I am concerned that if I expose these virtual functions on the DLL API, I cut myself off from the possibility of extending the same classes with more virtual functions without breaking binary compatibility with applications built for the previous version of the library.
One option would be to use the PImpl idiom to hide all the classes having virtual functions, but that also seems to have its limitations: this way applications lose the possibility of subclassing the library's classes and overriding the virtual methods.
How would you design an API class which can be subclassed in an application, without losing the possibility to extend the API with (non-abstract) virtual methods in a new version of the dll, while staying backward binary compatible?
Update: the target platforms for the library are Windows/MSVC and Linux/GCC.
Several months ago I wrote an article called "Binary Compatibility of Shared Libraries Implemented in C++ on GNU/Linux Systems" [pdf]. While concepts are similar on Windows system, I'm sure they're not exactly the same. But having read the article you can get a notion on what's going on at C++ binary level that has anything to do with compatibility.
By the way, GCC application binary interface is summarized in a standard document draft "Itanium ABI", so you'll have a formal ground for a coding standard you choose.
Just for a quick example: in GCC you can extend a class with more virtual functions if no other class inherits from it. Read the article for a better set of rules; a small illustration follows.
But anyway, the rules are sometimes way too complex to understand. So you might be interested in a tool that verifies the compatibility of two given versions: abi-compliance-checker for Linux.
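To illustrate the flavor of such a rule (Widget is an invented class, and the namespaces merely stand in for two versions of the same library):

// libwidget version 1
namespace v1 {
class Widget {
public:
    virtual ~Widget() {}
    virtual void draw() {}
};
}

// libwidget version 2: a virtual function appended at the end of the vtable.
// Per the rule above, this can stay binary compatible only if no class
// outside the library derives from Widget.
namespace v2 {
class Widget {
public:
    virtual ~Widget() {}
    virtual void draw() {}
    virtual void resize() {}
};
}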
There is an interesting article on the KDE knowledge base that describes the do's and don'ts when aiming at binary compatibility when writing a library: Policies/Binary Compatibility Issues With C++
C++ binary compatibility is generally difficult, even without inheritance. Look at GCC, for example: I'm not sure how many breaking ABI changes they've had in the last 10 years. And MSVC has a different set of conventions, so linking to that from GCC and vice versa can't be done... If you compare this to the C world, compiler interop seems a bit better there.
If you're on Windows you should look at COM. As you introduce new functionality you can add interfaces. Then callers can QueryInterface() for the new one to expose that new functionality, and even if you end up changing things a lot, you can either leave the old implementation there or you can write shims for the old interfaces.
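Stripped of the IUnknown machinery, the versioning idea looks roughly like this (the names are invented, not real COM declarations):

// v1 interface: frozen forever once shipped.
struct IFoo {
    virtual void DoOriginalThing() = 0;
};

// v2: new functionality goes into a new interface. Old callers are untouched;
// new callers ask (QueryInterface-style) whether IFoo2 is available.
struct IFoo2 : IFoo {
    virtual void DoNewThing() = 0;
};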
I think you misunderstand the problem of subclassing.
Here is your Pimpl:
// .h
struct Impl;   // forward declaration; the definition lives in the .cpp

class Derived
{
public:
    virtual void test1();
    virtual void test2();
private:
    Impl* m_impl;
};

// .cpp
struct Impl : public Base
{
    virtual void test1(); // override Base::test1()
    virtual void test2(); // override Base::test2()
    // data members
};

void Derived::test1() { m_impl->test1(); }
void Derived::test2() { m_impl->test2(); }
See? No problem with overriding the virtual methods of Base. You just need to make sure to redeclare them virtual in Derived, so that those deriving from Derived know they may override them too (only if you wish so, which, by the way, is a great way of providing a final for those who lack it), and you may still redefine them for yourself in Impl, which may even call the Base version.
There is no problem with Pimpl there.
On the other hand, you lose polymorphism, which may be troublesome. It's up to you to decide whether you want polymorphism or just composition.
If you expose the PImpl class in a header file, then you can inherit from it. You can still maintain backward binary compatibility, since the external class contains a pointer to the PImpl object. Of course, if the client code of the library isn't very wise, it could misuse this exposed PImpl object and ruin the binary backward compatibility. You may add some notes in the PImpl's header file to warn the user.

Using C++ DLLs with different compiler versions

This question is related to "How to make consistent dll binaries across VS versions ?"
We have applications and DLLs built with VC6 and a new application built with VC9. The VC9 app has to use DLLs compiled with VC6, most of which are written in C and one in C++.
The C++ lib is problematic due to name decoration/mangling issues.
Compiling everything with VC9 is currently not an option as there appear to be some side effects. Resolving these would be quite time consuming.
I can modify the C++ library, however it must be compiled with VC6.
The C++ lib is essentially an OO wrapper for another C library. The VC9 app uses some of its static functions as well as some non-static ones.
While the static functions can be handled with something like
// Header file
class DLL_API Foo
{
public:
    static int init();
};

extern "C"
{
    int DLL_API Foo_init();
}

// Implementation file
int Foo_init()
{
    return Foo::init();
}
it's not that easy with the non-static methods.
As I understand it, Chris Becke's suggestion of using a COM-like interface won't help me because the interface member names will still be decorated and thus inaccessible from a binary created with a different compiler. Am I right there?
Would the only solution be to write a C-style DLL interface using handles to the objects, or am I missing something?
In that case, I guess, I would probably have less effort directly using the wrapped C library.
The biggest problem to consider when using a DLL compiled with a different C++ compiler than the calling EXE is memory allocation and object lifetime.
I'm assuming that you can get past the name mangling (and calling convention), which isn't difficult if you use a compiler with compatible mangling (I think VC6 is broadly compatible with VS2008), or if you use extern "C".
Where you'll run into problems is when you allocate something using new (or malloc) from the DLL, and then you return this to the caller. The caller's delete (or free) will attempt to free the object from a different heap. This will go horribly wrong.
You can either do a COM-style IFoo::Release thing, or a MyDllFree() thing. Both of these, because they call back into the DLL, will use the correct implementation of delete (or free()), so they'll delete the correct object.
Or, you can make sure that you use LocalAlloc (for example), so that the EXE and the DLL are using the same heap.
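A sketch of the second option; MyDllFree matches the name above, but Widget and the create function are illustrative:

// Exported from the DLL, so allocation and deallocation both use the
// DLL's heap and its CRT's new/delete.
class Widget { /* ... */ };

extern "C" Widget* MyDllCreateWidget()
{
    return new Widget();
}

extern "C" void MyDllFree(Widget* w)
{
    delete w;   // runs the DLL's operator delete, not the caller's
}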
Interface member names will not be decorated -- they're just offsets into a vtable. You can define an interface (using a C struct, rather than a COM "interface") in a header file, thusly:
struct IFoo {
    virtual int Init() = 0;
};
Then, you can export a function from the DLL, with no mangling:
class CFoo : public IFoo { /* ... */ };
extern "C" IFoo * __stdcall GetFoo() { return new CFoo(); }
This will work fine, provided that you're using a compiler that generates compatible vtables. Microsoft C++ has generated the same format vtable since (at least, I think) MSVC6.1 for DOS, where the vtable is a simple list of pointers to functions (with thunking in the multiple-inheritance case). GNU C++ (if I recall correctly) generates vtables with function pointers and relative offsets. These are not compatible with each other.
Well, I think Chris Becke's suggestion is just fine. I would not use Roger's first solution, which uses an interface in name only and, as he mentions, can run into problems of incompatible compiler-handling of abstract classes and virtual methods. Roger points to the attractive COM-consistent case in his follow-on.
The pain point: you need to learn to make COM interface requests and deal properly with IUnknown, relying on at least IUnknown::AddRef and IUnknown::Release. If the implementations of interfaces can support more than one interface, or if methods can also return interfaces, you may also need to become comfortable with IUnknown::QueryInterface.
Here's the key idea. All of the programs that use the implementation of the interface (but don't implement it) use a common #include "*.h" file that defines the interface as a struct (C) or a C/C++ class (VC++) or struct (non-VC++ C++). The *.h file automatically adapts appropriately depending on whether you are compiling a C language program or a C++ language program. You don't have to know about that part simply to use the *.h file. What the *.h file does is define the interface struct or type, let's say IFoo, with its virtual member functions (and only functions, no direct visibility to data members in this approach).
The header file is constructed to honor the COM binary standard in a way that works for C and for C++ regardless of the C++ compiler that is used. (The Java JNI folk figured this one out.) This means that it works between separately compiled modules of any origin, so long as a struct consisting entirely of function-entry pointers (a vtable) is mapped to memory the same way by all of them (so they have to be all x86 32-bit, or all x64, for example).
In the DLL that implements the COM interface via a wrapper class of some sort, you only need a factory entry point. Something like an
extern "C" HRESULT MkIFooImplementation(void **ppv);
which returns an HRESULT (you'll need to learn about those too) and will also return the IFoo interface pointer in the location you provide via *ppv. (I am skimming, and there are more careful details that you'll need here. Don't trust my syntax.) The actual function signature that you use for this is also declared in the *.h file.
The point is that the factory entry point, which is always an undecorated extern "C" function, does all of the necessary wrapper class creation and then delivers an IFoo interface pointer to the location that you specify. This means that all memory management for creating the class, and all memory management for finalizing it, etc., will happen in the DLL where you build the wrapper. This is the only place where you have to deal with those details.
When you get an OK result from the factory function, you have been issued an interface pointer that has already been reserved for you (there is an implicit IFoo::AddRef operation already performed on behalf of the interface pointer you were delivered).
When you are done with the interface, you release it with a call to the IFoo::Release method of the interface. It is the final Release (in case you made more AddRef'd copies) that will tear down the class and its interface support in the factory DLL. This is what gets you correct reliance on consistent dynamic storage allocation and release behind the interface, whether or not the DLL containing the factory function uses the same libraries as the calling code.
You should probably implement IUnknown::QueryInterface (as method IFoo::QueryInterface) too, even if it always fails. If you want to be more sophisticated with the COM binary interface model as you gain experience, you can learn to provide full QueryInterface implementations.
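Pulling those pieces together, the interface shape is roughly this (a simplified sketch, not the real COM headers; MkIFooImplementation follows the naming above):

#include <cstdint>

typedef long HRESULT;   // stand-in for the Windows typedef

struct IFoo {
    // The three IUnknown-style methods come first, then the actual API.
    virtual HRESULT  QueryInterface(const void* iid, void** ppv) = 0;
    virtual uint32_t AddRef() = 0;
    virtual uint32_t Release() = 0;   // the final Release destroys the object
    virtual void     DoWork() = 0;    // illustrative business method
};

// Undecorated factory exported by the DLL.
extern "C" HRESULT MkIFooImplementation(void** ppv);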
This is probably too much information, but I wanted to point out that a lot of the problems you are facing about heterogeneous implementations of DLLs are resolved in the definition of the COM binary interface and even if you don't need all of it, the fact that it provides worked solutions is valuable. In my experience, once you get the hang of this, you will never forget how powerful this can be in C++ and C++ interop situations.
I haven't sketched the resources you might need to consult for examples and what you have to learn in order to make *.h files and to actually implement factory-function wrappers of the libraries you want to share. If you want to dig deeper, holler.
There are other things you need to consider too, such as which run-times are being used by the various libraries. If no objects are being shared that's fine, but that seems quite unlikely at first glance.
Chris Becke's suggestions are pretty accurate - using an actual COM interface may help you get the binary compatibility you need. Your mileage may vary :)
Not fun, man. You are in for a lot of frustration. You should probably give this:
"Would the only solution be to write a C-style DLL interface using handles to the objects, or am I missing something? In that case, I guess, I would probably have less effort directly using the wrapped C library."
a really close look. Good luck.