C++ Plugins ABI issues on Linux

I am working on a plugin system to replace shared libs.
I am aware of the ABI issues that arise when designing an API for shared libs: entry points in the libs, such as exported classes, should be carefully designed.
For example, adding, removing or reordering private member variables of an exported class may lead to a different memory layout and runtime errors (from my understanding, that's why the Pimpl pattern might be useful). Of course there are many other pitfalls to avoid when modifying exported classes.
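For what it's worth, here is a minimal sketch of the Pimpl idea as I understand it (the Widget names are invented for illustration): private members live behind an opaque pointer, so adding or reordering them changes only widget.cpp, not the layout that clients compiled against.
// widget.h
class Widget
{
public:
    Widget();
    ~Widget();
    int value() const;

private:
    struct Impl;    // defined only in widget.cpp
    Impl *impl_;    // clients only ever see this pointer
};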
I have built a small example here to illustrate my question.
First, I provide the following header to the plugin developer:
// character.h
#ifndef CHARACTER_H
#define CHARACTER_H

#include <string>

class Character
{
public:
    virtual std::string name() = 0;
    virtual ~Character() = 0;
};

inline Character::~Character() {}

#endif
Then the plugin is built as a shared lib "libcharacter.so" :
#include "character.h"
#include <iostream>
class Wizard : public Character
{
public:
virtual std::string name() {
return "wizard";
}
};
extern "C"
{
Wizard *createCharacter()
{
return new Wizard;
}
}
And finally the main application that uses the plugin :
#include "character.h"
#include <iostream>
#include <dlfcn.h>
int main(int argc, char *argv[])
{
(void)argc, (void)argv;
using namespace std;
Character *(*creator)();
void *handle = dlopen("../character/libcharacter.so", RTLD_NOW);
if (handle == nullptr) {
cerr << dlerror() << endl;
exit(1);
}
void *f = dlsym(handle, "createCharacter");
creator = (Character *(*)())f;
Character *character = creator();
cout << character->name() << endl;
dlclose(handle);
return 0;
}
Is it sufficient to define an abstract class to get rid of all ABI issues?

Is it sufficient to define an abstract class to get rid of all ABI issues?
Short answer:
No.
I wouldn't recommend using C++ for a plugin API (see longer answer below), but if you do decide to stick with C++ then:
Don't use any standard library types in your plugin API.
For instance, Character::name() returns a std::string. If the implementation of std::string ever changes (and it has in the past in GCC) then that will result in Undefined Behavior. Really, anything that you don't control (any third-party library) shouldn't be used in the API.
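For example, a hedged sketch of the same interface with the std::string removed from the boundary (assuming the returned pointer stays valid for the lifetime of the object):
class Character
{
public:
    virtual const char *name() = 0;   // no standard library types cross the boundary
    virtual ~Character() = 0;
};

inline Character::~Character() {}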
Don't use exceptions or RTTI across the plugin boundary. On Linux exceptions and RTTI might work if you load the plugin with RTLD_GLOBAL (not a good idea for plugins) and both the host and the plugin use the same runtime. But in general you either won't be able to catch exceptions from another module, or they might even cause heap corruption (if they are allocated by different runtimes).
Only add functions to the end of your abstract classes, or everything will silently break because of the vtable layout changing (and that can be really hard to diagnose).
Always allocate and deallocate an object from the same module. I noticed you don't have a destroyCharacter() function (main() actually leaks the character, but that's another question). Always provide symmetric create and destroy functions for resources created by a different module (shared library or plugin); a sketch follows below.
I believe on Linux with GCC the host application's operator new and operator delete get correctly propagated to the loaded plugin (through weak symbols), but if you want it to work on Windows then don't assume that operator new and operator delete in the host application and the plugin are the same. A statically linked runtime, especially built with LTO, might also mess with this.
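Applied to the question's plugin, the symmetric pair could look like this; destroyCharacter() is the hypothetical function the original code was missing:
#include "character.h"

class Wizard : public Character
{
public:
    virtual std::string name() { return "wizard"; }
};

extern "C"
{
    Character *createCharacter()
    {
        return new Wizard;
    }

    void destroyCharacter(Character *c)
    {
        delete c;   // deleted by the same module that did the new
    }
}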
Longer answer:
There are a lot of possible issues when exporting a C++ API from a plugin.
Generally speaking, there are no guarantees about it working if anything about the toolchains used to build the host application and the plugin differs. This can include (but is not limited to) compilers, versions of the language, compiler flags, preprocessor definitions, etc.
The common wisdom regarding plugins is to use a pure C89 API, because the C ABI on all common platforms is very stable.
Keeping to the common subset of C89 and C++ will mean that the host and plugin can use different language standards, standard libraries, etc. Unless the host or the plugin are built with some weird (and probably non-conforming) ABIs, this should be reasonably safe. Obviously, you still have to be careful with data structure layouts.
You can then provide a rich C++ header-only wrapper for the C API that handles lifetime and error code/exception conversions, etc.
As a nice bonus, C APIs are producible and consumable by most languages, which could allow the plugin authors to use not just C++.
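To make that concrete, here is a hedged sketch of such a surface and its wrapper; the plugin_character_* names are invented for this example and are not part of the question's code:
#include <cstddef>
#include <string>

extern "C"
{
    typedef struct plugin_character plugin_character;   // opaque handle

    plugin_character *plugin_character_create(void);
    void plugin_character_destroy(plugin_character *c);
    // copies the name into buf, returns the number of characters written
    std::size_t plugin_character_name(const plugin_character *c, char *buf, std::size_t bufsize);
}

// Header-only C++ convenience wrapper on the host side:
class CharacterHandle
{
public:
    CharacterHandle() : c_(plugin_character_create()) {}
    ~CharacterHandle() { plugin_character_destroy(c_); }
    CharacterHandle(const CharacterHandle &) = delete;
    CharacterHandle &operator=(const CharacterHandle &) = delete;

    std::string name() const
    {
        char buf[256];
        std::size_t n = plugin_character_name(c_, buf, sizeof buf);
        return std::string(buf, n);   // std::string exists only on this side of the boundary
    }

private:
    plugin_character *c_;
};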
There are actually quite a few pitfalls even in a C API. If we're being pedantic then the only safe things are functions with fixed-size arguments and return types (pointers, size_t, [u]intN_t) - not even necessarily built-in types (short, int, long, ...), or enums. E.g. in GCC: -fshort-enums can change the size of enums, -fpack-struct[=n] can change the padding within structs.
So, if you really want to be safe then don't use enums and either pack all your structs or don't expose them directly (instead expose accessor functions).
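For example, an opaque struct with accessor functions might look like this (the game_plugin_stats names are invented, in the style of the dispatch-table snippet further below):
extern "C"
{
    typedef struct game_plugin_stats game_plugin_stats;   // layout never exposed

    int game_plugin_stats_health(const game_plugin_stats *s);
    int game_plugin_stats_mana(const game_plugin_stats *s);
}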
Other considerations:
These aren't strictly related to the question but should definitely be considered before committing to a specific style of API.
Error handling: Whether or not you use C++, you'll need an alternative to exceptions.
This will probably be some form of error code. std::error_code in C++ can then be used to wrap the raw enum/int as soon as you're in C++ land, and if the API uses C++ then a std::expected-like or Boost.Outcome-like type with a stable ABI could be used.
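A minimal sketch of that wrapping, assuming a plain int is what crosses the boundary (all names invented):
#include <string>
#include <system_error>

enum game_plugin_errc { game_plugin_ok = 0, game_plugin_out_of_memory = 1 };

class game_plugin_category : public std::error_category
{
public:
    const char *name() const noexcept override { return "game_plugin"; }
    std::string message(int ev) const override
    {
        switch (ev)
        {
        case game_plugin_ok:            return "ok";
        case game_plugin_out_of_memory: return "plugin out of memory";
        default:                        return "unknown plugin error";
        }
    }
};

// Wrap the raw code as soon as it reaches C++ land:
inline std::error_code make_plugin_error(int raw)
{
    static game_plugin_category category;   // single category instance
    return std::error_code(raw, category);
}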
Loading the plugin and importing symbols: With abstract classes it's easy - a simple factory function is all you need. With a traditional C API you might end up needing to import hundreds of symbols. One way of dealing with that would be to emulate a vtable in C. Make each object that has associated functions start with a pointer to a dispatch table, e.g.
typedef struct game_string_view { const char *data; size_t size; } game_string_view;

typedef enum game_plugin_error_code { game_plugin_success = 0, /* ... */ } game_plugin_error_code;

typedef struct game_plugin_character_impl *GamePluginCharacter; // handle to a Character

typedef struct game_plugin_character_dispatch_table { // basically a vtable
    void (*destroy)(GamePluginCharacter character); // you could even put destroy() here
    game_string_view (*name)(GamePluginCharacter character);
    void (*update)(GamePluginCharacter character, /*...*/, game_plugin_error_code *ec); // might fail
} game_plugin_character_dispatch_table;

typedef struct game_plugin_character_impl {
    // every call goes through this table and takes GamePluginCharacter as its first argument
    const game_plugin_character_dispatch_table *dispatch;
} game_plugin_character_impl;
Future extensibility and compatibility: You should design the API knowing that you'll want to change it in the future and keep compatibility. IMO, a C API lends itself well to this because it forces you to be very precise in what is exposed. The plugin should be able to expose its API version to the host in a way that is forward and backward compatible.
It's a good idea to think about extensibility when designing each function signature. E.g. if a struct is passed by pointer (instead of by value), then its size can be extended without breaking compatibility (so long as at run time both the caller and the callee agree on its size).
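One common shape for this is a size-prefixed argument struct, sketched below with invented names: the caller fills in struct_size, and a newer callee uses it to tell which trailing fields the caller actually knows about.
#include <cstddef>
#include <cstdint>

extern "C"
{
    typedef struct game_plugin_init_args
    {
        std::size_t   struct_size;   // caller sets this to sizeof(game_plugin_init_args)
        std::uint32_t api_version;   // plugin API version the caller expects
        // later versions may only append fields here
    } game_plugin_init_args;

    void game_plugin_init(const game_plugin_init_args *args);
}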
Visibility: Maybe look into visibility on Linux and other platforms. This isn't really a question of API design, just helps clean up the symbols exported from a shared library.
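On GCC/Clang this typically means compiling with -fvisibility=hidden and marking only the intended entry points as visible, roughly like this (a sketch reusing the question's factory, plus the hypothetical destroy function from above):
#define PLUGIN_API __attribute__((visibility("default")))

class Character;   // from the question's character.h

extern "C"
{
    PLUGIN_API Character *createCharacter();
    PLUGIN_API void destroyCharacter(Character *c);
}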
All of the above is by no means extensive.
I would suggest the talk "Hourglass Interfaces for C++ APIs" as further "reading".
And of course there are other good talks and articles on the matter (that I can't remember off the top of my head).

Related

How to make functions with conditional implementations for different platforms, without linking the other implementation [duplicate]

I'm having a bit of a go at developing a platform abstraction library for an application I'm writing, and struggling to come up with a neat way of separating my platform independent code from the platform specific code.
As I see it there are two basic approaches possible: platform independent classes with platform specific delegates, or platform independent classes with platform specific derived classes. Are there any inherent advantages/disadvantages to either approach? And in either case, what's the best mechanism to set up the delegation/inheritance relationship such that the process is transparent to a user of the platform independent classes?
I'd be grateful for any suggestions as to a neat architecture to employ, or even just some examples of what people have done in the past and the pros/cons of the given approach.
EDIT: in response to those suggesting Qt and similar, yes I'm purposely looking to "reinvent the wheel" as I'm not just concerned with developing the app, I'm also interested in the intellectual challenge of rolling my own platform abstraction library. Thanks for the suggestion though!
I'm using platform neutral header files, keeping any platform specific code in the source files (using the PIMPL idiom where necessary). Each platform neutral header has one platform specific source file per platform, with extensions such as *.win32.cpp, *.posix.cpp. The platform specific ones are only compiled on the relevant platforms.
I also use boost libraries (filesystem, threads) to reduce the amount of platform specific code I have to maintain.
It's platform-independent class declarations with platform-specific definitions.
Pros: Works fairly well, doesn't rely on the preprocessor - no #ifdef MyPlatform, keeps platform specific code readily identifiable, allows compiler specific features to be used in platform specific source files, doesn't pollute the global namespace by #including platform headers.
Cons: It's difficult to use inheritance with pimpled classes, sometimes the PIMPL structs need their own headers so they can be referenced from other platform specific source files.
Another way is to have platform independent conventions, but substitute platform specific source code at compile time.
That is to say that if you imagine a component, Foo, that has to be platform specific (like sockets or GUI elements), but has these public members:
class Foo {
public:
    void write(const char* str);
    void close();
};
Every module that has to use a Foo, obviously has #include "Foo.h", but in a platform specific make file you might have -IWin32, which means that the compiler looks in .\Win32 and finds a Windows specific Foo.h which contains the class, with the same interface, but maybe Windows specific private members etc.
So there is never any file which contains Foo as written above, but only sets of platform specific files which are only used when selected by a platform specific make file.
Have a look at ACE. It has a pretty good abstraction using templates and inheritance.
I might go for a policy-type thing:
#include <iostream>
#include <string>

template<typename Platform>
struct PlatDetails : private Platform {
    std::string getDetails() const {
        return std::string("MyAbstraction v1.0; ") + this->getName();
    }
};

// For any serious compatibility functions, these would
// of course have to be in different headers, and the implementations
// would call some platform-specific functions to get precise
// version numbers. Using PImpl would be a smart idea for these
// classes if they need any platform-specific members, since as
// Joe Gauterin says, you want to avoid your application code indirectly
// including POSIX or Windows system headers, containing useless definitions.

struct Windows {
    std::string getName() const { return "Windows"; }
};

struct Linux {
    std::string getName() const { return "Linux"; }
};

#ifdef WIN32
typedef PlatDetails<Windows> PlatformDetails;
#else
typedef PlatDetails<Linux> PlatformDetails;
#endif

int main() {
    std::cout << PlatformDetails().getDetails() << "\n";
}
There's not a whole lot to choose though between doing this, and doing regular simulated dynamic binding with CRTP, so that the generic thing is the base and the specific thing the derived class:
template<typename Platform>
struct PlatDetails {
    std::string getDetails() const {
        return std::string("MyAbstraction v1.0; ") +
            static_cast<const Platform*>(this)->getName();
    }
};

struct Windows : PlatDetails<Windows> {
    std::string getName() const { return "Windows"; }
};

struct Linux : PlatDetails<Linux> {
    std::string getName() const { return "Linux"; }
};

#ifdef WIN32
typedef Windows PlatformDetails;
#else
typedef Linux PlatformDetails;
#endif

int main() {
    std::cout << PlatformDetails().getName() << "\n";
}
Basically in the latter version, getName must be public (although I think you can use friend) and so must be the inheritance, whereas in the former, the inheritance can be private and/or the interface functions can be protected, if desired. So the adaptor can be a firewall between the interface the platform has to implement, and the interface your application code uses. Furthermore you can have multiple policies in the former (i.e. multiple platform-dependent facets used by the same platform-independent class), but not for the latter.
The advantage of either of them over versions with delegates or non-template-using inheritance, is that you don't need any virtual functions. Arguably this isn't a whole lot of advantage, considering how scary both policy-based design and CRTP are at first contact.
In practice, though, I agree with quamrana that normally you can just have different implementations of the same thing on different platforms:
// Or just set the include path with -I or whatever
#ifdef WIN32
#include "windows/platform.h"
#else
#include "linux/platform.h"
#endif

struct PlatformDetails {
    std::string getDetails() const {
        return std::string("MyAbstraction v1.0; ") +
            porting::getName();
    }
};

// windows/platform.h
namespace porting {
    std::string getName() { return "Windows"; }
}

// linux/platform.h
namespace porting {
    std::string getName() { return "Linux"; }
}
If you'd like to use a full-blown C++ framework that is available for many platforms and liberally licensed (LGPL), use Qt.
So... you don't want to simply use Qt? For real work using C++, I'd very highly recommend it. It's an absolutely excellent cross-platform toolkit. I just wrote a few plugins to get it working on the Kindle, and now the Palm Pre. Qt makes everything easy and fun. Downright rejuvenating, even. Well, until your first encounter with QModelIndex, but they've supposedly realized they over-engineered it and they're replacing it ;)
As an academic exercise though, this is an interesting problem. As a wheel re-inventor myself, I've even done it a few times now. :)
Short answer: I'd go with PIMPL. (Qt sources have examples a-plenty)
I've used base classes and platform specific derived classes in the past, but it usually ends up a bit messier than I had in mind. I've also done part of an implementation using some degree of function pointers for platform specific bits, and I was even less happy with that.
Both times I ended up with a very strong feeling that I was over-architecting and had lost my way.
I found using private implementation classes (PIMPL) with the different platform-specific bits in different files easiest to write AND debug. However... don't be too afraid of an #ifdef or two, if it's just a few lines and very clear what's going on. I hate cluttered or nested #ifdef logic, but one or two here and there can really help avoid code duplication.
With PIMPL, you're not constantly refactoring your design as you discover new bits that require different implementations between platforms. That way be dragons.
At the implementation level, hidden from the application... there's nothing wrong with a few platform specific derived classes either. If two platform implementations are fairly well defined and share almost no data members, they'd be a good candidate for that. Just do it after realizing that, not before out of some idea that everything needs to fit your selected pattern.
If anything, the biggest gripe I have about coding today is how easily people seem to get lost in idealism. PIMPL is a pattern, having platform specific derived classes is another pattern. Using function pointers is a pattern. There's nothing that says they're mutually exclusive.
However, as a general guideline... start with PIMPL.
There are also the big boys, such as Qt4 (complete framework + GUI), GTK+ (GUI-only AFAIK), and Boost (framework only, no GUI). All three support most platforms; GTK+ is C, while Qt4 and Boost are C++ and for the most part template-based.
You might also want to take a look at poco:
The POCO C++ Libraries (POCO stands for POrtable COmponents) are open source C++ class libraries that simplify and accelerate the development of network-centric, portable applications in C++. The libraries integrate perfectly with the C++ Standard Library and fill many of the functional gaps left open by it. Their modular and efficient design and implementation makes the POCO C++ Libraries extremely well suited for embedded development, an area where the C++ programming language is becoming increasingly popular, due to its suitability for both low-level (device I/O, interrupt handlers, etc.) and high-level object-oriented development. Of course, the POCO C++ Libraries are also ready for enterprise-level challenges.
(source: pocoproject.org)

Is use of the STL on C++ APIs safe? [duplicate]

How do I pass class objects, especially STL objects, to and from a C++ DLL?
My application has to interact with third-party plugins in the form of DLL files, and I can't control what compiler these plugins are built with. I'm aware that there's no guaranteed ABI for STL objects, and I'm concerned about causing instability in my application.
The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.
Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.
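A minimal sketch of that kind of flat C surface over a C++ implementation (the widget_* names are invented for this example):
extern "C"
{
    typedef struct widget widget;   // opaque handle; the layout stays inside the DLL

    __declspec(dllexport) widget *widget_create(void);
    __declspec(dllexport) void widget_destroy(widget *w);
    __declspec(dllexport) int widget_run(widget *w, int input);
}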
If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:
Data packing/alignment
Within a given class, individual data members will usually be specially placed in memory so their addresses correspond to a multiple of the type's size. For example, an int might be aligned to a 4-byte boundary.
If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.
You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no packing will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.
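For illustration, here is what the pragma does to a small struct (std::int32_t is exactly 4 bytes, so under pack(1) the struct is 5 bytes instead of the usual 8):
#include <cstdint>

#pragma pack(push, 1)
struct Packed
{
    char tag;             // 1 byte; with pack(1) no padding follows it
    std::int32_t value;   // placed immediately after tag
};
#pragma pack(pop)

static_assert(sizeof(Packed) == 5, "packing was not applied");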
Member reordering
If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any data rearranging can cause incompatibilities between compilers. Therefore, passing data back and forth to a DLL will require standard-layout classes.
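A cheap compile-time guard for this, using C++11 <type_traits> (the Message type is invented for illustration):
#include <type_traits>

struct Message
{
    int id;
    char text[64];
};

static_assert(std::is_standard_layout<Message>::value,
              "Message must stay standard-layout to cross the DLL boundary");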
Calling convention
There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? What order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?
It's important you maintain a standard calling convention; if you declare a function as _cdecl, the default for C++, and try to call it using _stdcall, bad things will happen. _cdecl is the default calling convention for C++ functions, however, so this is one thing that won't break unless you deliberately break it by specifying an _stdcall in one place and a _cdecl in another.
Datatype size
According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.
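In sketch form (WireHeader is an invented example type):
#include <cstdint>

struct WireHeader
{
    std::uint32_t length;      // exactly 4 bytes on every conforming implementation
    std::int64_t  timestamp;   // exactly 8 bytes on every conforming implementation
    // plain int/long are avoided: their sizes are implementation-defined
};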
Heap issues
If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.
To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.
STL issues
The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.
Name mangling
Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL@@YAPAUCCDLL_v1@@XZ in MSVC.
You already won't be able to guarantee static linking to your DLL, since a DLL produced with GCC won't produce a .lib file and statically linking a DLL in MSVC requires one. Dynamically linking seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.
You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:
GCC:
EXPORTS
GetCCDLL=_Z8GetCCDLLv @1
MSVC:
EXPORTS
GetCCDLL=?GetCCDLL@@YAPAUCCDLL_v1@@XZ @1
Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.
This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.
Passing class objects to a function
This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.
Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.
//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries
#include <cstddef>   // size_t
#include <windows.h> // GetProcessHeap, HeapAlloc, HeapFree

//define malloc/free replacements to make use of Windows heap APIs
namespace pod_helpers
{
    void* pod_malloc(size_t size)
    {
        HANDLE heapHandle = GetProcessHeap();
        void* storageHandle = nullptr;

        if (heapHandle == nullptr)
        {
            return nullptr;
        }

        storageHandle = HeapAlloc(heapHandle, 0, size);
        return storageHandle;
    }

    void pod_free(void* ptr)
    {
        HANDLE heapHandle = GetProcessHeap();
        if (heapHandle == nullptr)
        {
            return;
        }

        if (ptr == nullptr)
        {
            return;
        }

        HeapFree(heapHandle, 0, ptr);
    }
}

//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
    pod();
    pod(const T& value);
    pod(const pod& copy);
    ~pod();

    pod<T>& operator=(pod<T> value);
    operator T() const;
    T get() const;
    void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)
//POD_basic_types.h: holds pod specializations for basic datatypes.
#include <cstdint> // std::uint32_t
#include <new>     // placement new
#include <utility> // std::swap

#pragma pack(push, 1)
template<>
class pod<unsigned int>
{
    //these are a couple of convenience typedefs that make the class easier to specialize and understand, since the behind-the-scenes logic is almost entirely the same except for the underlying datatypes in each specialization.
    typedef unsigned int original_type;
    typedef std::uint32_t safe_type;

public:
    pod() : data(nullptr) {}

    pod(const original_type& value)
    {
        set_from(value);
    }

    pod(const pod<original_type>& copyVal)
    {
        original_type copyData = copyVal.get();
        set_from(copyData);
    }

    ~pod()
    {
        release();
    }

    pod<original_type>& operator=(pod<original_type> value)
    {
        swap(*this, value);
        return *this;
    }

    operator original_type() const
    {
        return get();
    }

protected:
    safe_type* data;

    original_type get() const
    {
        original_type result;
        result = static_cast<original_type>(*data);
        return result;
    }

    void set_from(const original_type& value)
    {
        data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type))); //note the pod_malloc call here - we want our memory buffer to go in the process heap, not the possibly-isolated DLL heap.
        if (data == nullptr)
        {
            return;
        }
        new(data) safe_type(value);
    }

    void release()
    {
        if (data)
        {
            pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
            data = nullptr;
        }
    }

    void swap(pod<original_type>& first, pod<original_type>& second)
    {
        using std::swap;
        swap(first.data, second.data);
    }
};
#pragma pack(pop)
The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, uint will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded assignment and conversion operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic, since it's converted to an int8_t and then the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).
We can also wrap STL types in this way, although it requires a little extra work:
#include <algorithm> // std::copy
#include <iterator>  // std::back_inserter
#include <string>    // std::basic_string

#pragma pack(push, 1)
template<typename charT>
class pod<std::basic_string<charT>> //double template ftw. We're specializing pod for std::basic_string, but the specialization itself stays templated on the character type; this way we can support all the basic_string types without needing to create four separate specializations of pod.
{
    //more comfort typedefs
    typedef std::basic_string<charT> original_type;
    typedef charT safe_type;

public:
    pod() : data(nullptr), dataSize(0) {}

    pod(const original_type& value)
    {
        set_from(value);
    }

    pod(const charT* charValue)
    {
        original_type temp(charValue);
        set_from(temp);
    }

    pod(const pod<original_type>& copyVal)
    {
        original_type copyData = copyVal.get();
        set_from(copyData);
    }

    ~pod()
    {
        release();
    }

    pod<original_type>& operator=(pod<original_type> value)
    {
        swap(*this, value);
        return *this;
    }

    operator original_type() const
    {
        return get();
    }

protected:
    //this is almost the same as a basic type specialization, but we have to keep track of the number of elements being stored within the basic_string as well as the elements themselves.
    safe_type* data;
    typename original_type::size_type dataSize;

    original_type get() const
    {
        original_type result;
        result.reserve(dataSize);
        std::copy(data, data + dataSize, std::back_inserter(result));
        return result;
    }

    void set_from(const original_type& value)
    {
        dataSize = value.size();

        data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));
        if (data == nullptr)
        {
            return;
        }

        //figure out where the data to copy starts and stops, then loop through the basic_string and copy each element to our buffer.
        safe_type* dataIterPtr = data;
        safe_type* dataEndPtr = data + dataSize;
        typename original_type::const_iterator iter = value.begin();

        for (; dataIterPtr != dataEndPtr;)
        {
            new(dataIterPtr++) safe_type(*iter++);
        }
    }

    void release()
    {
        if (data)
        {
            pod_helpers::pod_free(data);
            data = nullptr;
            dataSize = 0;
        }
    }

    void swap(pod<original_type>& first, pod<original_type>& second)
    {
        using std::swap;
        swap(first.data, second.data);
        swap(first.dataSize, second.dataSize);
    }
};
#pragma pack(pop)
Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one method to figure out mangling for.
//CCDLL.h: defines a DLL interface for a pod-based DLL
struct CCDLL_v1
{
    virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};

CCDLL_v1* GetCCDLL();
This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:
struct CCDLL_v1_implementation: CCDLL_v1
{
    virtual void ShowMessage(const pod<std::wstring>* message) override;
};

CCDLL_v1* GetCCDLL()
{
    static CCDLL_v1_implementation* CCDLL = nullptr;
    if (!CCDLL)
    {
        CCDLL = new CCDLL_v1_implementation;
    }
    return CCDLL;
}
And now let's implement the ShowMessage function:
#include "CCDLL_implementation.h"
void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
std::wstring workingMessage = *message;
MessageBox(NULL, workingMessage.c_str(), TEXT("This is a cross-compiler message"), MB_OK);
}
Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a messagebox. After all, this is just a POC, not a full utility library.
Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)
Now for an EXE to call the DLL:
//main.cpp
#include "../CCDLL/CCDLL.h"

typedef CCDLL_v1* (__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;

int main()
{
    HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll")); //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.
    Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, (LPCSTR)"GetCCDLL");

    CCDLL_v1* CCDLL_lib;
    CCDLL_lib = Ptr_GetCCDLL(); //This calls the DLL's GetCCDLL method, which is an alias to the mangled function. By dynamically loading the DLL like this, we're completely bypassing the name mangling, exactly as expected.

    pod<std::wstring> message = TEXT("Hello world!");
    CCDLL_lib->ShowMessage(&message);

    FreeLibrary(ccdll); //unload the library when we're done with it

    return 0;
}
And here are the results. Our DLL works. We've successfully reached past STL ABI issues, past C++ ABI issues, past mangling issues, and our MSVC DLL is working with a GCC EXE.
In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.
@computerfreaker has written a great explanation of why the lack of ABI prevents passing C++ objects across DLL boundaries in the general case, even when the type definitions are under user control and the exact same token sequence is used in both programs. (There are two cases which do work: standard-layout classes, and pure interfaces)
For object types defined in the C++ Standard (including those adapted from the Standard Template Library), the situation is far, far worse. The tokens defining these types are NOT the same across multiple compilers, as the C++ Standard does not provide a complete type definition, only minimum requirements. In addition, name lookup of the identifiers that appear in these type definitions don't resolve the same. Even on systems where there is a C++ ABI, attempting to share such types across module boundaries results in massive undefined behavior due to One Definition Rule violations.
This is something that Linux programmers weren't accustomed to dealing with, because g++'s libstdc++ was a de-facto standard and virtually all programs used it, thus satisfying the ODR. clang's libc++ broke that assumption, and then C++11 came along with mandatory changes to nearly all Standard library types.
Just don't share Standard library types between modules. It's undefined behavior.
Some of the answers here make passing C++ classes sound really scary, but I'd like to share an alternate point of view. The pure virtual C++ method mentioned in some of the other responses actually turns out to be cleaner than you might think. I've built an entire plugin system around the concept and it's been working very well for years. I have a "PluginManager" class that dynamically loads the dlls from a specified directory using LoadLib() and GetProcAddress() (and the Linux equivalents, to make the executable cross-platform).
Believe it or not, this method is forgiving even if you do some wacky stuff like add a new function at the end of your pure virtual interface and try to load dlls compiled against the interface without that new function - they'll load just fine. Of course... you'll have to check a version number to make sure your executable only calls the new function for newer dlls that implement the function. But the good news is: it works! So in a way, you have a crude method for evolving your interface over time.
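A sketch of that version gate (the interface and function names are invented):
struct PluginV1
{
    virtual void Run() = 0;
};

struct PluginV2 : PluginV1
{
    virtual void RunFaster() = 0;   // slot appended after all the V1 slots
};

void callPlugin(PluginV1 *p, int pluginInterfaceVersion)
{
    p->Run();
    if (pluginInterfaceVersion >= 2)               // only newer plugins have the slot
        static_cast<PluginV2 *>(p)->RunFaster();   // safe only after the check
}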
Another cool thing about pure virtual interfaces - you can inherit as many interfaces as you want and you'll never run into the diamond problem!
I would say the biggest downside to this approach is that you have to be very careful about what types you pass as parameters. No classes or STL objects without wrapping them with pure virtual interfaces first. No structs (without going through the pragma pack voodoo). Just primitive types and pointers to other interfaces. Also, you can't overload functions, which is an inconvenience, but not a show-stopper.
The good news is that with a handful of lines of code you can make reusable generic classes and interfaces to wrap STL strings, vectors, and other container classes. Alternatively, you can add functions to your interface like GetCount() and GetVal(n) to let people loop through lists.
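For instance, a list of strings might be exposed through a tiny interface like this (names invented; the implementation stays on the plugin side):
struct IStringList
{
    virtual int GetCount() const = 0;
    virtual const char *GetVal(int n) const = 0;   // borrowed pointer, valid while the list lives
    virtual void Destroy() = 0;                    // released by the side that allocated it
};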
People building plugins for us find it quite easy. They don't have to be experts on the ABI boundary or anything - they just inherit the interfaces they're interested in, code up the functions they support, and return false for the ones they don't.
The technology that makes all this work isn't based on any standard as far as I know. From what I gather, Microsoft decided to do their virtual tables that way so they could make COM, and other compiler writers decided to follow suit. This includes GCC, Intel, Borland, and most other major C++ compilers. If you're planning on using an obscure embedded compiler then this approach probably won't work for you. Theoretically any compiler company could change their virtual tables at any time and break things, but considering the massive amount of code written over the years that depends on this technology, I would be very surprised if any of the major players decided to break rank.
So the moral of the story is... With the exception of a few extreme circumstances, you need one person in charge of the interfaces who can make sure the ABI boundary stays clean with primitive types and avoids overloading. If you are OK with that stipulation, then I wouldn't be afraid to share interfaces to classes in DLLs/SOs between compilers. Sharing classes directly == trouble, but sharing pure virtual interfaces isn't so bad.
You cannot safely pass STL objects across DLL boundaries, unless all the modules (.EXE and .DLLs) are built with the same C++ compiler version and the same settings and flavors of the CRT, which is highly constraining, and clearly not your case.
If you want to expose an object-oriented interface from your DLL, you should expose C++ pure interfaces (which is similar to what COM does). Consider reading this interesting article on CodeProject:
HowTo: Export C++ classes from a DLL
You may also want to consider exposing a pure C interface at the DLL boundary, and then building a C++ wrapper at the caller site.
This is similar to what happens in Win32: Win32 implementation code is almost C++, but lots of Win32 APIs expose a pure C interface (there are also APIs that expose COM interfaces). Then ATL/WTL and MFC wrap these pure C interfaces with C++ classes and objects.

How to return C++ class when using extern "C"? [duplicate]

If you want to expose an object-oriented interface from your DLL, you should expose C++ pure interfaces (which is similar to what COM does). Consider reading this interesting article on CodeProject:
HowTo: Export C++ classes from a DLL
You may also want to consider exposing a pure C interface at the DLL boundary, and then building a C++ wrapper at the caller site.
This is similar to what happens in Win32: Win32 implementation code is almost C++, but lots of Win32 APIs expose a pure C interface (there are also APIs that expose COM interfaces). Then ATL/WTL and MFC wrap these pure C interfaces with C++ classes and objects.

How do I safely pass objects, especially STL objects, to and from a DLL?

My application has to interact with third-party plugins in the form of DLL files, and I can't control what compiler these plugins are built with. I'm aware that there's no guaranteed ABI for STL objects, and I'm concerned about causing instability in my application.
The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.
Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.
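As a sketch of what that can look like (all names here are hypothetical, not from the question's code): an opaque handle plus free functions keeps every C++ detail on one side of the boundary.
// plugin_api.h -- minimal sketch of a C-style plugin boundary.
// CharacterHandle and the plugin_* functions are hypothetical names.
#ifdef __cplusplus
extern "C" {
#endif

typedef struct CharacterHandle CharacterHandle; // opaque: its layout never crosses the boundary

CharacterHandle* plugin_create_character(void);
void plugin_destroy_character(CharacterHandle* handle); // create and destroy live in the same module
int plugin_character_name(CharacterHandle* handle, char* buffer, int bufferSize); // caller-provided buffer

#ifdef __cplusplus
}
#endif
The C++ implementation behind those functions can use whatever it likes internally; only C types appear in the signatures.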
If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:
Data packing/alignment
Within a given class, individual data members will usually be padded and aligned in memory so their addresses correspond to a multiple of the type's alignment requirement. For example, an int might be aligned to a 4-byte boundary.
If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.
You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no packing will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.
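For illustration, a sketch of how that might look for a struct that crosses the boundary (the struct itself is hypothetical); a static_assert turns layout drift into a compile error instead of a runtime mystery:
#include <cstdint>

#pragma pack(push, 1) // 1-byte packing: no padding, at a potential performance cost
struct PluginVersionInfo // hypothetical boundary-crossing struct
{
    std::uint16_t versionMajor;
    std::uint16_t versionMinor;
    std::uint32_t build;
};
#pragma pack(pop)

// With 1-byte packing, the size is exactly the sum of the members,
// so a mismatch between compilers shows up at compile time:
static_assert(sizeof(PluginVersionInfo) == 8, "unexpected padding in PluginVersionInfo");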
Member reordering
If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any rearranging can cause incompatibilities between compilers. Passing data back and forth across a DLL boundary will therefore require standard-layout classes.
Calling convention
There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? What order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?
It's important you maintain a consistent calling convention: if you declare a function as __cdecl (the default for C++) and try to call it as __stdcall, bad things will happen. Since __cdecl is the default calling convention for C++ functions, this is one thing that won't break unless you deliberately break it by specifying __stdcall in one place and __cdecl in another.
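If you want to be explicit anyway, a sketch (CreateWidget is a made-up export, and __cdecl is compiler-specific syntax, supported by MSVC and Windows-targeting GCC):
// In the DLL's header: spell out the convention instead of relying on the default.
extern "C" int __cdecl CreateWidget(int widgetType);

// In the EXE: the function-pointer type used with GetProcAddress repeats it,
// so both sides agree on how arguments are passed and who cleans up the stack.
typedef int(__cdecl* fnCreateWidget)(int widgetType);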
Datatype size
According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.
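A sketch of what that means in practice (the struct is hypothetical):
#include <cstdint>

// Fixed-width typedefs from <cstdint> take the compiler's discretion out of
// the picture; plain int/long/size_t stay on the inside of the boundary.
struct PluginStats // hypothetical example type
{
    std::int32_t itemCount;
    std::uint64_t bytesProcessed;
};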
Heap issues
If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.
To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.
STL issues
The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.
Name mangling
Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL@@YAPAUCCDLL_v1@@XZ in MSVC.
You can already rule out linking against the DLL at build time: a DLL produced with GCC doesn't come with the .lib import library that MSVC requires for implicit linking. Dynamically loading the DLL seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.
You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:
GCC:
EXPORTS
GetCCDLL=_Z8GetCCDLLv #1
MSVC:
EXPORTS
GetCCDLL=?GetCCDLL@@YAPAUCCDLL_v1@@XZ #1
Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.
This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.
Passing class objects to a function
This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.
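As a sketch of that pointer-only style (GetPluginName is a hypothetical export): the "return value" becomes a caller-provided buffer, and failure is reported through a plain return code:
#include <cstring>
#include <cstdint>

// Hypothetical export: everything crosses the boundary through pointers
// and fixed-width integers, never by value and never via exceptions.
extern "C" std::int32_t __cdecl GetPluginName(char* buffer, std::int32_t bufferSize)
{
    static const char name[] = "example-plugin";
    if (buffer == nullptr || bufferSize < static_cast<std::int32_t>(sizeof(name)))
    {
        return -1; // error code instead of an exception
    }
    std::memcpy(buffer, name, sizeof(name)); // copies the terminating '\0' too
    return 0;
}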
Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.
//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries
#include <windows.h> //GetProcessHeap, HeapAlloc, HeapFree
#include <cstddef> //size_t
//define malloc/free replacements to make use of Windows heap APIs
namespace pod_helpers
{
void* pod_malloc(size_t size)
{
HANDLE heapHandle = GetProcessHeap();
void* storageHandle = nullptr; //HeapAlloc returns a raw pointer (LPVOID), not a HANDLE
if (heapHandle == nullptr)
{
return nullptr;
}
storageHandle = HeapAlloc(heapHandle, 0, size);
return storageHandle;
}
void pod_free(void* ptr)
{
HANDLE heapHandle = GetProcessHeap();
if (heapHandle == nullptr)
{
return;
}
if (ptr == nullptr)
{
return;
}
HeapFree(heapHandle, 0, ptr);
}
}
//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
pod();
pod(const T& value);
pod(const pod& copy);
~pod();
pod<T>& operator=(pod<T> value);
operator T() const;
T get() const;
void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)
//POD_basic_types.h: holds pod specializations for basic datatypes.
#include <cstdint> //fixed-width integer types
#include <new> //placement new
#pragma pack(push, 1)
template<>
class pod<unsigned int>
{
//these are a couple of convenience typedefs that make the class easier to specialize and understand, since the behind-the-scenes logic is almost entirely the same except for the underlying datatypes in each specialization.
typedef unsigned int original_type;
typedef std::uint32_t safe_type;
public:
pod() : data(nullptr) {}
pod(const original_type& value)
{
set_from(value);
}
pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}
~pod()
{
release();
}
pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);
return *this;
}
operator original_type() const
{
return get();
}
protected:
safe_type* data;
original_type get() const
{
original_type result;
result = static_cast<original_type>(*data);
return result;
}
void set_from(const original_type& value)
{
data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type))); //note the pod_malloc call here - we want our memory buffer to go in the process heap, not the possibly-isolated DLL heap.
if (data == nullptr)
{
return;
}
new(data) safe_type (value);
}
void release()
{
if (data)
{
pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
data = nullptr;
}
}
void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;
swap(first.data, second.data);
}
};
#pragma pack(pop)
The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, uint will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded assignment and conversion operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic, since it's converted to an int8_t and then the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).
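For the curious, here's roughly what that bool specialization might look like (an abridged sketch: the copy constructor, operator= and swap follow the same pattern as the specialization above, and are omitted here):
#pragma pack(push, 1)
template<>
class pod<bool> //abridged sketch, following the pattern above
{
    typedef bool original_type;
    typedef std::int8_t safe_type;
public:
    pod() : data(nullptr) {}
    pod(const original_type& value) { set_from(value); }
    ~pod() { release(); }
    operator original_type() const { return get(); }
protected:
    safe_type* data;
    original_type get() const
    {
        return *data != 0; //compare the stored int8_t against 0 to recover the bool
    }
    void set_from(const original_type& value)
    {
        //bool has no cross-compiler representation guarantees, so store an int8_t instead
        data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type)));
        if (data == nullptr)
        {
            return;
        }
        new(data) safe_type(value ? 1 : 0);
    }
    void release()
    {
        if (data)
        {
            pod_helpers::pod_free(data);
            data = nullptr;
        }
    }
};
#pragma pack(pop)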
We can also wrap STL types in this way, although it requires a little extra work:
#include <string> //std::basic_string
#include <algorithm> //std::copy
#include <iterator> //std::back_inserter
#include <utility> //std::swap
#pragma pack(push, 1)
template<typename charT>
class pod<std::basic_string<charT>> //double template ftw. We're specializing pod for std::basic_string, but we're making this specialization able to be specialized for different types; this way we can support all the basic_string types without needing to create four specializations of pod.
{
//more comfort typedefs
typedef std::basic_string<charT> original_type;
typedef charT safe_type;
public:
pod() : data(nullptr), dataSize(0) {} //initialize dataSize too, so it never holds garbage
pod(const original_type& value)
{
set_from(value);
}
pod(const charT* charValue)
{
original_type temp(charValue);
set_from(temp);
}
pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}
~pod()
{
release();
}
pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);
return *this;
}
operator original_type() const
{
return get();
}
protected:
//this is almost the same as a basic type specialization, but we have to keep track of the number of elements being stored within the basic_string as well as the elements themselves.
safe_type* data;
typename original_type::size_type dataSize;
original_type get() const
{
original_type result;
result.reserve(dataSize);
std::copy(data, data + dataSize, std::back_inserter(result));
return result;
}
void set_from(const original_type& value)
{
dataSize = value.size();
data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));
if (data == nullptr)
{
return;
}
//figure out where the data to copy starts and stops, then loop through the basic_string and copy each element to our buffer.
safe_type* dataIterPtr = data;
safe_type* dataEndPtr = data + dataSize;
typename original_type::const_iterator iter = value.begin();
for (; dataIterPtr != dataEndPtr;)
{
new(dataIterPtr++) safe_type(*iter++);
}
}
void release()
{
if (data)
{
pod_helpers::pod_free(data);
data = nullptr;
dataSize = 0;
}
}
void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;
swap(first.data, second.data);
swap(first.dataSize, second.dataSize);
}
};
#pragma pack(pop)
Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one function whose mangled name we need to work around.
//CCDLL.h: defines a DLL interface for a pod-based DLL
#include "POD_base.h" //the pod machinery from above, plus whichever header holds the basic_string specialization; the file layout is up to you
struct CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};
CCDLL_v1* GetCCDLL();
This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:
struct CCDLL_v1_implementation: CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) override;
};
CCDLL_v1* GetCCDLL()
{
static CCDLL_v1_implementation* CCDLL = nullptr;
if (!CCDLL)
{
CCDLL = new CCDLL_v1_implementation;
}
return CCDLL;
}
And now let's implement the ShowMessage function:
#include "CCDLL_implementation.h"
void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
std::wstring workingMessage = *message;
MessageBox(NULL, workingMessage.c_str(), TEXT("This is a cross-compiler message"), MB_OK);
}
Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a message box. After all, this is just a POC, not a full utility library.
Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)
Now for an EXE to call the DLL:
//main.cpp
#include "../CCDLL/CCDLL.h"
typedef CCDLL_v1*(__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;
int main()
{
HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll")); //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.
if (ccdll == NULL)
{
return 1; //bail out if the DLL isn't where we expected it
}
Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, "GetCCDLL");
if (Ptr_GetCCDLL == NULL)
{
FreeLibrary(ccdll);
return 1; //bail out if the unmangled alias isn't exported
}
CCDLL_v1* CCDLL_lib;
CCDLL_lib = Ptr_GetCCDLL(); //This calls the DLL's GetCCDLL method, which is an alias to the mangled function. By dynamically loading the DLL like this, we're completely bypassing the name mangling, exactly as expected.
pod<std::wstring> message = TEXT("Hello world!");
CCDLL_lib->ShowMessage(&message);
FreeLibrary(ccdll); //unload the library when we're done with it
return 0;
}
And here are the results: our DLL works. We've successfully gotten past the STL ABI issues, the C++ ABI issues, and the mangling issues, and our MSVC DLL is working with a GCC EXE.
In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.
@computerfreaker has written a great explanation of why the lack of a standard ABI prevents passing C++ objects across DLL boundaries in the general case, even when the type definitions are under user control and the exact same token sequence is used in both programs. (There are two cases which do work: standard-layout classes, and pure interfaces.)
For object types defined in the C++ Standard (including those adapted from the Standard Template Library), the situation is far, far worse. The tokens defining these types are NOT the same across multiple compilers, as the C++ Standard does not provide a complete type definition, only minimum requirements. In addition, name lookup of the identifiers that appear in these type definitions doesn't resolve the same way. Even on systems where there is a C++ ABI, attempting to share such types across module boundaries results in massive undefined behavior due to One Definition Rule violations.
This is something that Linux programmers weren't accustomed to dealing with, because g++'s libstdc++ was a de-facto standard and virtually all programs used it, thus satisfying the ODR. clang's libc++ broke that assumption, and then C++11 came along with mandatory changes to nearly all Standard library types.
Just don't share Standard library types between modules. It's undefined behavior.
Some of the answers here make passing C++ classes sound really scary, but I'd like to share an alternate point of view. The pure virtual C++ method mentioned in some of the other responses actually turns out to be cleaner than you might think. I've built an entire plugin system around the concept and it's been working very well for years. I have a "PluginManager" class that dynamically loads the dlls from a specified directory using LoadLibrary() and GetProcAddress() (and the Linux equivalents, dlopen() and dlsym(), so the executable is cross-platform).
Believe it or not, this method is forgiving even if you do some wacky stuff like add a new function at the end of your pure virtual interface and try to load dlls compiled against the interface without that new function - they'll load just fine. Of course... you'll have to check a version number to make sure your executable only calls the new function for newer dlls that implement the function. But the good news is: it works! So in a way, you have a crude method for evolving your interface over time.
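A minimal sketch of that versioning idea (the interface and names are hypothetical, not from my actual plugin system):
// Hypothetical version-gated interface evolution.
struct IPlugin
{
    virtual int GetVersion() = 0;  // vtable slot 0: present since version 1
    virtual bool DoWork() = 0;     // slot 1: present since version 1
    virtual bool DoNewWork() = 0;  // slot 2: added later, at the END of the interface
};

void UsePlugin(IPlugin* plugin)
{
    plugin->DoWork();
    if (plugin->GetVersion() >= 2) // only touch the new slot if the dll implements it
    {
        plugin->DoNewWork();
    }
}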
Another cool thing about pure virtual interfaces - you can inherit as many interfaces as you want and you'll never run into the diamond problem!
I would say the biggest downside to this approach is that you have to be very careful about what types you pass as parameters. No classes or STL objects without wrapping them with pure virtual interfaces first. No structs (without going through the pragma pack voodoo). Just primitive types and pointers to other interfaces. Also, you can't overload functions, which is an inconvenience, but not a show-stopper.
The good news is that with a handful of lines of code you can make reusable generic classes and interfaces to wrap STL strings, vectors, and other container classes. Alternatively, you can add functions to your interface like GetCount() and GetVal(n) to let people loop through lists.
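A sketch of that wrapping idea (IStringList and its implementation are hypothetical): only primitive types and interface pointers appear in the boundary-facing signatures, while the STL stays hidden on one side.
#include <string>
#include <vector>

// Hypothetical boundary-safe interface: primitives and pointers only.
struct IStringList
{
    virtual int GetCount() = 0;
    virtual const char* GetVal(int index) = 0; // storage stays owned by the implementing module
};

// One side implements it over an STL container, privately.
class StringListImpl : public IStringList
{
public:
    int GetCount() override { return static_cast<int>(items.size()); }
    const char* GetVal(int index) override { return items[index].c_str(); }
    void Add(const std::string& value) { items.push_back(value); }
private:
    std::vector<std::string> items; // never visible across the boundary
};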
People building plugins for us find it quite easy. They don't have to be experts on the ABI boundary or anything - they just inherit the interfaces they're interested in, code up the functions they support, and return false for the ones they don't.
The technology that makes all this work isn't based on any standard as far as I know. From what I gather, Microsoft decided to do their virtual tables that way so they could make COM, and other compiler writers decided to follow suit. This includes GCC, Intel, Borland, and most other major C++ compilers. If you're planning on using an obscure embedded compiler then this approach probably won't work for you. Theoretically any compiler company could change their virtual tables at any time and break things, but considering the massive amount of code written over the years that depends on this technology, I would be very surprised if any of the major players decided to break rank.
So the moral of the story is... With the exception of a few extreme circumstances, you need one person in charge of the interfaces who can make sure the ABI boundary stays clean with primitive types and avoids overloading. If you are OK with that stipulation, then I wouldn't be afraid to share interfaces to classes in DLLs/SOs between compilers. Sharing classes directly == trouble, but sharing pure virtual interfaces isn't so bad.
You cannot safely pass STL objects across DLL boundaries, unless all the modules (.EXE and .DLLs) are built with the same C++ compiler version and the same settings and flavors of the CRT, which is highly constraining, and clearly not your case.
If you want to expose an object-oriented interface from your DLL, you should expose C++ pure interfaces (which is similar to what COM does). Consider reading this interesting article on CodeProject:
HowTo: Export C++ classes from a DLL
You may also want to consider exposing a pure C interface at the DLL boundary, and then building a C++ wrapper at the caller site.
This is similar to what happens in Win32: Win32 implementation code is almost C++, but lots of Win32 APIs expose a pure C interface (there are also APIs that expose COM interfaces). Then ATL/WTL and MFC wrap these pure C interfaces with C++ classes and objects.
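A sketch of that layering (all names hypothetical): flat extern "C" functions at the boundary, and a small RAII wrapper on the caller's side.
// Boundary: plain C, opaque handle, symmetric open/close.
extern "C" {
typedef struct Session Session; // hypothetical opaque handle
Session* session_open(const char* name);
void session_close(Session* session);
}

// Caller site: a C++ wrapper that never crosses the boundary itself.
class SessionWrapper
{
public:
    explicit SessionWrapper(const char* name) : handle(session_open(name)) {}
    ~SessionWrapper() { if (handle) session_close(handle); }
    SessionWrapper(const SessionWrapper&) = delete;            // no accidental double-close
    SessionWrapper& operator=(const SessionWrapper&) = delete;
private:
    Session* handle;
};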

Cross platform C++ code architecture

I'm having a bit of a go at developing a platform abstraction library for an application I'm writing, and struggling to come up with a neat way of separating my platform independent code from the platform specific code.
As I see it there are two basic approaches possible: platform independent classes with platform specific delegates, or platform independent classes with platform specific derived classes. Are there any inherent advantages/disadvantages to either approach? And in either case, what's the best mechanism to set up the delegation/inheritance relationship such that the process is transparent to a user of the platform independent classes?
I'd be grateful for any suggestions as to a neat architecture to employ, or even just some examples of what people have done in the past and the pros/cons of the given approach.
EDIT: in response to those suggesting Qt and similar, yes I'm purposely looking to "reinvent the wheel" as I'm not just concerned with developing the app, I'm also interested in the intellectual challenge of rolling my own platform abstraction library. Thanks for the suggestion though!
I'm using platform neutral header files, keeping any platform specific code in the source files (using the PIMPL idiom where necessary). Each platform neutral header has one platform specific source file per platform, with extensions such as *.win32.cpp, *.posix.cpp. The platform specific ones are only compiled on the relevant platforms.
I also use Boost libraries (filesystem, threads) to reduce the amount of platform specific code I have to maintain.
It's platform-independent class declarations with platform-specific definitions.
Pros: Works fairly well, doesn't rely on the preprocessor - no #ifdef MyPlatform, keeps platform specific code readily identifiable, allows compiler specific features to be used in platform specific source files, doesn't pollute the global namespace by #including platform headers.
Cons: It's difficult to use inheritance with pimpled classes, sometimes the PIMPL structs need their own headers so they can be referenced from other platform specific source files.
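A sketch of that file layout (Thread and the file names are made up for illustration):
// thread.h -- platform-neutral; compiled everywhere
class Thread
{
public:
    Thread();
    ~Thread();
    void start();
private:
    struct Impl;  // defined per platform in thread.win32.cpp / thread.posix.cpp
    Impl* impl;
};

// thread.posix.cpp -- only compiled on POSIX platforms
#include "thread.h"
#include <pthread.h>
struct Thread::Impl
{
    pthread_t handle;
};
Thread::Thread() : impl(new Impl) {}
Thread::~Thread() { delete impl; }
// start() would call pthread_create here; the win32 file wraps CreateThread instead.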
Another way is to have platform independent conventions, but substitute platform specific source code at compile time.
That is to say that if you imagine a component, Foo, that has to be platform specific (like sockets or GUI elements), but has these public members:
class Foo {
public:
void write(const char* str);
void close();
};
Every module that has to use a Foo, obviously has #include "Foo.h", but in a platform specific make file you might have -IWin32, which means that the compiler looks in .\Win32 and finds a Windows specific Foo.h which contains the class, with the same interface, but maybe Windows specific private members etc.
So there is never any file which contains Foo as written above, but only sets of platform specific files which are only used when selected by a platform specific make file.
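For example, the Windows-specific header might look like this (a sketch; the private members are hypothetical):
// Win32/Foo.h -- picked up first because the Windows makefile passes -IWin32.
// Public interface identical to every other platform's Foo.h.
#include <winsock2.h>

class Foo {
public:
    void write(const char* str);
    void close();
private:
    SOCKET handle; // Windows-specific state; other platforms keep an int fd here
};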
Have a look at ACE. It has a pretty good abstraction using templates and inheritance.
I might go for a policy-type thing:
#include <iostream>
#include <string>
template<typename Platform>
struct PlatDetails : private Platform {
std::string getDetails() const {
return std::string("MyAbstraction v1.0; ") + getName();
}
};
// For any serious compatibility functions, these would
// of course have to be in different headers, and the implementations
// would call some platform-specific functions to get precise
// version numbers. Using PImpl would be a smart idea for these
// classes if they need any platform-specific members, since as
// Joe Gauterin says, you want to avoid your application code indirectly
// including POSIX or Windows system headers, containing useless definitions.
struct Windows {
std::string getName() const { return "Windows"; }
};
struct Linux {
std::string getName() const { return "Linux"; }
};
#ifdef WIN32
typedef PlatDetails<Windows> PlatformDetails;
#else
typedef PlatDetails<Linux> PlatformDetails;
#endif
int main() {
std::cout << PlatformDetails().getDetails() << "\n"; //getName() would be inaccessible here: the inheritance is private
}
There's not a whole lot to choose though between doing this, and doing regular simulated dynamic binding with CRTP, so that the generic thing is the base and the specific thing the derived class:
template<typename Platform>
struct PlatDetails {
std::string getDetails() const {
return std::string("MyAbstraction v1.0; ") +
static_cast<const Platform*>(this)->getName(); //const target, since getDetails() is a const member
}
};
struct Windows : PlatDetails<Windows> {
std::string getName() const { return "Windows"; }
};
struct Linux : PlatDetails<Linux> {
std::string getName() const { return "Linux"; }
};
#ifdef WIN32
typedef Windows PlatformDetails;
#else
typedef Linux PlatformDetails;
#endif
int main() {
std::cout << PlatformDetails().getDetails() << "\n";
}
Basically in the latter version, getName must be public (although I think you can use friend) and so must be the inheritance, whereas in the former, the inheritance can be private and/or the interface functions can be protected, if desired. So the adaptor can be a firewall between the interface the platform has to implement, and the interface your application code uses. Furthermore you can have multiple policies in the former (i.e. multiple platform-dependent facets used by the same platform-independent class), but not for the latter.
The advantage of either of them over versions with delegates or non-template-using inheritance, is that you don't need any virtual functions. Arguably this isn't a whole lot of advantage, considering how scary both policy-based design and CRTP are at first contact.
In practice, though, I agree with quamrana that normally you can just have different implementations of the same thing on different platforms:
// Or just set the include path with -I or whatever
#ifdef WIN32
#include "windows/platform.h"
#else
#include "linux/platform.h"
#endif
struct PlatformDetails {
std::string getDetails() const {
return std::string("MyAbstraction v1.0; ") +
porting::getName();
}
};
// windows/platform.h
namespace porting {
inline std::string getName() { return "Windows"; } //inline, since this definition lives in a header
}
// linux/platform.h
namespace porting {
inline std::string getName() { return "Linux"; }
}
If you'd like to use a full-blown C++ framework that's available for many platforms and weakly copylefted (LGPL), use Qt.
So... you don't want to simply use Qt? For real work using C++, I'd very highly recommend it. It's an absolutely excellent cross-platform toolkit. I just wrote a few plugins to get it working on the Kindle, and now the Palm Pre. Qt makes everything easy and fun. Downright rejuvenating, even. Well, until your first encounter with QModelIndex, but they've supposedly realized they over-engineered it and they're replacing it ;)
As an academic exercise though, this is an interesting problem. As a wheel re-inventor myself, I've even done it a few times now. :)
Short answer: I'd go with PIMPL. (Qt sources have examples a-plenty)
I've used base classes and platform specific derived classes in the past, but it usually ends up a bit messier than I had in mind. I've also done part of an implementation using some degree of function pointers for platform specific bits, and I was even less happy with that.
Both times I ended up with a very strong feeling that I was over-architecting and had lost my way.
I found using private implementation classes (PIMPL) with the different platform-specific bits in different files easiest to write AND debug. However... don't be too afraid of an #ifdef or two, if it's just a few lines and very clear what's going on. I hate cluttered or nested #ifdef logic, but one or two here and there can really help avoid code duplication.
With PIMPL, you're not constantly refactoring your design as you discover new bits that require different implementations between platforms. That way be dragons.
At the implementation level, hidden from the application... there's nothing wrong with a few platform specific derived classes either. If two platform implementations are fairly well defined and share almost no data members, they'd be a good candidate for that. Just do it after realizing that, not before out of some idea that everything needs to fit your selected pattern.
If anything, the biggest gripe I have about coding today is how easily people seem to get lost in idealism. PIMPL is a pattern, having platform specific derived classes is another pattern. Using function pointers is a pattern. There's nothing that says they're mutually exclusive.
However, as a general guideline... start with PIMPL.
There are also the big boys, such as Qt4 (complete framework + GUI), GTK+ (GUI-only AFAIK), and Boost (framework only, no GUI). All three support most platforms; GTK+ is C, while Qt4/Boost are C++ and for the most part template-based.
You might also want to take a look at poco:
The POCO C++ Libraries (POCO stands for POrtable COmponents) are open source C++ class libraries that simplify and accelerate the development of network-centric, portable applications in C++. The libraries integrate perfectly with the C++ Standard Library and fill many of the functional gaps left open by it. Their modular and efficient design and implementation makes the POCO C++ Libraries extremely well suited for embedded development, an area where the C++ programming language is becoming increasingly popular, due to its suitability for both low-level (device I/O, interrupt handlers, etc.) and high-level object-oriented development. Of course, the POCO C++ Libraries are also ready for enterprise-level challenges.
(source: pocoproject.org)