I'm having a bit of a go at developing a platform abstraction library for an application I'm writing, and struggling to come up with a neat way of separating my platform independent code from the platform specific code.
As I see it there are two basic approaches possible: platform independent classes with platform specific delegates, or platform independent classes with platform specific derived classes. Are there any inherent advantages/disadvantages to either approach? And in either case, what's the best mechanism to set up the delegation/inheritance relationship such that the process is transparent to a user of the platform independent classes?
I'd be grateful for any suggestions as to a neat architecture to employ, or even just some examples of what people have done in the past and the pros/cons of the given approach.
EDIT: in response to those suggesting Qt and similar, yes I'm purposely looking to "reinvent the wheel" as I'm not just concerned with developing the app, I'm also interested in the intellectual challenge of rolling my own platform abstraction library. Thanks for the suggestion though!
I'm using platform neutral header files, keeping any platform specific code in the source files (using the PIMPL idiom where necessary). Each platform neutral header has one platform specific source file per platform, with extensions such as *.win32.cpp and *.posix.cpp. The platform specific files are only compiled on the relevant platforms.
I also use boost libraries (filesystem, threads) to reduce the amount of platform specific code I have to maintain.
This gives you platform independent class declarations with platform specific definitions.
Pros: it works fairly well, it doesn't rely on the preprocessor (no #ifdef MyPlatform), it keeps platform specific code readily identifiable, it allows compiler specific features to be used in platform specific source files, and it doesn't pollute the global namespace by #including platform headers.
Cons: it's difficult to use inheritance with pimpled classes, and sometimes the PIMPL structs need their own headers so they can be referenced from other platform specific source files.
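To make that layout concrete, here's a minimal sketch of the arrangement, using a hypothetical Mutex class (the names and members are illustrative, not taken from a real library):

// Mutex.h - platform neutral header
class Mutex {
public:
    Mutex();
    ~Mutex();
    void lock();
    void unlock();
private:
    struct Impl;   // defined per platform
    Impl* impl;
};

// Mutex.win32.cpp - compiled only on Windows
#include "Mutex.h"
#include <windows.h>
struct Mutex::Impl { CRITICAL_SECTION cs; };
Mutex::Mutex() : impl(new Impl) { InitializeCriticalSection(&impl->cs); }
Mutex::~Mutex() { DeleteCriticalSection(&impl->cs); delete impl; }
void Mutex::lock() { EnterCriticalSection(&impl->cs); }
void Mutex::unlock() { LeaveCriticalSection(&impl->cs); }

// Mutex.posix.cpp - compiled only on POSIX platforms
#include "Mutex.h"
#include <pthread.h>
struct Mutex::Impl { pthread_mutex_t m; };
Mutex::Mutex() : impl(new Impl) { pthread_mutex_init(&impl->m, nullptr); }
Mutex::~Mutex() { pthread_mutex_destroy(&impl->m); delete impl; }
void Mutex::lock() { pthread_mutex_lock(&impl->m); }
void Mutex::unlock() { pthread_mutex_unlock(&impl->m); }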
Another way is to have platform independent conventions, but substitute platform specific source code at compile time.
That is to say that if you imagine a component, Foo, that has to be platform specific (like sockets or GUI elements), but has these public members:
class Foo {
public:
    void write(const char* str);
    void close();
};
Every module that has to use a Foo obviously has #include "Foo.h", but in a platform specific makefile you might have -IWin32, which means that the compiler looks in .\Win32 and finds a Windows specific Foo.h containing the class with the same public interface, but perhaps Windows specific private members, etc.
So there is never any file which contains Foo as written above, but only sets of platform specific files which are only used when selected by a platform specific make file.
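As a sketch of what those parallel headers might look like (the private members shown are purely illustrative):

// Win32/Foo.h - found by the compiler via -IWin32
#include <winsock2.h>

class Foo {
public:
    void write(const char* str);
    void close();
private:
    SOCKET handle;   // Windows specific private member
};

// Posix/Foo.h - found via -IPosix on the other platforms
class Foo {
public:
    void write(const char* str);
    void close();
private:
    int fd;          // POSIX file descriptor
};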
Have a look at ACE. It has a pretty good abstraction using templates and inheritance.
I might go for a policy-type thing:
#include <iostream>
#include <string>

template<typename Platform>
struct PlatDetails : private Platform {
    std::string getDetails() const {
        return std::string("MyAbstraction v1.0; ") + this->getName();
    }
};

// For any serious compatibility functions, these would
// of course have to be in different headers, and the implementations
// would call some platform-specific functions to get precise
// version numbers. Using PImpl would be a smart idea for these
// classes if they need any platform-specific members, since as
// Joe Gauterin says, you want to avoid your application code indirectly
// including POSIX or Windows system headers, containing useless definitions.

struct Windows {
    std::string getName() const { return "Windows"; }
};

struct Linux {
    std::string getName() const { return "Linux"; }
};

#ifdef WIN32
typedef PlatDetails<Windows> PlatformDetails;
#else
typedef PlatDetails<Linux> PlatformDetails;
#endif

int main() {
    std::cout << PlatformDetails().getDetails() << "\n";
}
There's not a whole lot to choose, though, between doing this and doing regular simulated dynamic binding with CRTP, where the generic thing is the base and the specific thing is the derived class:
template<typename Platform>
struct PlatDetails {
    std::string getDetails() const {
        return std::string("MyAbstraction v1.0; ") +
            static_cast<const Platform*>(this)->getName();
    }
};

struct Windows : PlatDetails<Windows> {
    std::string getName() const { return "Windows"; }
};

struct Linux : PlatDetails<Linux> {
    std::string getName() const { return "Linux"; }
};

#ifdef WIN32
typedef Windows PlatformDetails;
#else
typedef Linux PlatformDetails;
#endif

int main() {
    std::cout << PlatformDetails().getDetails() << "\n";
}
Basically in the latter version, getName must be public (although I think you can use friend), and so must the inheritance, whereas in the former the inheritance can be private and/or the interface functions can be protected, if desired. So the adaptor can be a firewall between the interface the platform has to implement and the interface your application code uses. Furthermore, you can have multiple policies in the former (i.e. multiple platform-dependent facets used by the same platform-independent class), but not in the latter; a sketch of that follows.
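As a rough sketch of the multiple-policies point, assuming each facet is its own policy class (all names here are made up for illustration):

#include <string>

template<typename FsPolicy, typename NetPolicy>
struct Platform : private FsPolicy, private NetPolicy {
    // one platform-independent class combining several platform-dependent facets
    std::string describe() const {
        return this->fsName() + " / " + this->netName();
    }
};

struct Win32Fs { std::string fsName() const { return "Win32 FS"; } };
struct PosixFs { std::string fsName() const { return "POSIX FS"; } };
struct WinSockNet { std::string netName() const { return "WinSock"; } };
struct BsdSocketNet { std::string netName() const { return "BSD sockets"; } };

#ifdef WIN32
typedef Platform<Win32Fs, WinSockNet> CurrentPlatform;
#else
typedef Platform<PosixFs, BsdSocketNet> CurrentPlatform;
#endif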
The advantage of either of them over versions with delegates or non-template-using inheritance, is that you don't need any virtual functions. Arguably this isn't a whole lot of advantage, considering how scary both policy-based design and CRTP are at first contact.
In practice, though, I agree with quamrana that normally you can just have different implementations of the same thing on different platforms:
// Or just set the include path with -I or whatever
#ifdef WIN32
#include "windows/platform.h"
#else
#include "linux/platform.h"
#endif

struct PlatformDetails {
    std::string getDetails() const {
        return std::string("MyAbstraction v1.0; ") +
            porting::getName();
    }
};

// windows/platform.h
namespace porting {
    std::string getName() { return "Windows"; }
}

// linux/platform.h
namespace porting {
    std::string getName() { return "Linux"; }
}
If you'd like to use a full-blown C++ framework that's available for many platforms under a permissive copyleft licence (the LGPL), use Qt.
So... you don't want to simply use Qt? For real work using C++, I'd very highly recommend it. It's an absolutely excellent cross-platform toolkit. I just wrote a few plugins to get it working on the Kindle, and now the Palm Pre. Qt makes everything easy and fun. Downright rejuvenating, even. Well, until your first encounter with QModelIndex, but they've supposedly realized they over-engineered it and they're replacing it ;)
As an academic exercise though, this is an interesting problem. As a wheel re-inventor myself, I've even done it a few times now. :)
Short answer: I'd go with PIMPL. (Qt sources have examples aplenty.)
I've used base classes and platform specific derived classes in the past, but it usually ends up a bit messier than I had in mind. I've also done part of an implementation using some degree of function pointers for platform specific bits, and I was even less happy with that.
Both times I ended up with a very strong feeling that I was over-architecting and had lost my way.
I found using private implementation classes (PIMPL) with the different platform-specific bits in different files easiest to write AND debug. However... don't be too afraid of an #ifdef or two, if it's just a few lines and very clear what's going on. I hate cluttered or nested #ifdef logic, but one or two here and there can really help avoid code duplication.
With PIMPL, you're not constantly refactoring your design as you discover new bits that require different implementations between platforms. That way be dragons.
At the implementation level, hidden from the application... there's nothing wrong with a few platform specific derived classes either. If two platform implementations are fairly well defined and share almost no data members, they'd be a good candidate for that. Just do it after realizing that, not before out of some idea that everything needs to fit your selected pattern.
If anything, the biggest gripe I have about coding today is how easily people seem to get lost in idealism. PIMPL is a pattern, having platform specific derived classes is another pattern. Using function pointers is a pattern. There's nothing that says they're mutually exclusive.
However, as a general guideline... start with PIMPL.
There are also the big boys, such as Qt4 (complete framework + GUI), GTK+ (GUI only, AFAIK), and Boost (framework only, no GUI). All three support most platforms; GTK+ is C, while Qt4 and Boost are C++ (Boost being for the most part template based).
You might also want to take a look at poco:
The POCO C++ Libraries (POCO stands for POrtable COmponents) are open source C++ class libraries that simplify and accelerate the development of network-centric, portable applications in C++. The libraries integrate perfectly with the C++ Standard Library and fill many of the functional gaps left open by it. Their modular and efficient design and implementation makes the POCO C++ Libraries extremely well suited for embedded development, an area where the C++ programming language is becoming increasingly popular, due to its suitability for both low-level (device I/O, interrupt handlers, etc.) and high-level object-oriented development. Of course, the POCO C++ Libraries are also ready for enterprise-level challenges.
(source: pocoproject.org)
How do I pass class objects, especially STL objects, to and from a C++ DLL?
My application has to interact with third-party plugins in the form of DLL files, and I can't control what compiler these plugins are built with. I'm aware that there's no guaranteed ABI for STL objects, and I'm concerned about causing instability in my application.
The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.
Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.
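For instance, a minimal sketch of such a C interface might look like this (the names are hypothetical):

//plugin.h - the only declarations that cross the DLL boundary
#ifdef __cplusplus
extern "C" {
#endif

typedef struct PluginHandle PluginHandle;  //opaque to the caller

PluginHandle* plugin_create(void);
int plugin_process(PluginHandle* p, const char* input);
void plugin_destroy(PluginHandle* p);

#ifdef __cplusplus
}
#endif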
If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:
Data packing/alignment
Within a given class, individual data members will usually be placed in memory with padding so their addresses correspond to a multiple of the type's alignment. For example, an int might be aligned to a 4-byte boundary.
If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.
You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no padding will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.
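A quick sketch of what this looks like in practice:

#include <cstdint>

#pragma pack(push, 1)   // force one-byte packing until the matching pop
struct Message
{
  char type;            // no padding inserted after this member
  std::int32_t length;  // would normally start at offset 4; here it's at offset 1
};
#pragma pack(pop)       // restore the previous packing

static_assert(sizeof(Message) == 5, "packing was not applied");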
Member reordering
If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any rearranging can cause incompatibilities between compilers. Therefore, passing data back and forth across a DLL boundary requires standard-layout classes.
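You can have the compiler enforce this with a static_assert; here's a small sketch (Boundary is a made-up example type):

#include <cstdint>
#include <type_traits>

struct Boundary
{
  std::int32_t id;
  double value;
};

// compilation fails if someone later makes Boundary non-standard-layout,
// e.g. by mixing access specifiers on data members or adding a virtual function
static_assert(std::is_standard_layout<Boundary>::value,
              "Boundary must remain standard-layout to cross the DLL boundary");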
Calling convention
There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? What order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?
It's important that you maintain a standard calling convention; if you declare a function as __cdecl, the default for C++, and try to call it using __stdcall, bad things will happen. __cdecl is the default calling convention for C++ functions, however, so this is one thing that won't break unless you deliberately break it by specifying __stdcall in one place and __cdecl in another.
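The safest habit is to spell the convention out explicitly on every boundary function instead of relying on compiler defaults. A small sketch (MSVC/MinGW syntax; plugin_version is a hypothetical function):

// explicitly annotate the exported function so both sides agree
extern "C" __declspec(dllexport) int __cdecl plugin_version(void);

// the caller declares a function pointer type with the matching convention
typedef int (__cdecl* fnPluginVersion)(void);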
Datatype size
According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.
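In practice, this means preferring the <cstdint> types in any boundary-crossing declarations; a short sketch with hypothetical functions:

#include <cstdint>

// sizes are guaranteed by <cstdint>, not left to compiler discretion
extern "C" std::int64_t get_timestamp(void);
extern "C" void fill_buffer(std::uint8_t* buffer, std::uint32_t length);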
Heap issues
If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.
To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.
STL issues
The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.
Name mangling
Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL@@YAPAUCCDLL_v1@@XZ in MSVC.
You already won't be able to guarantee static linking to your DLL, since building a DLL with GCC won't produce a .lib file and statically linking a DLL in MSVC requires one. Dynamic linking seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.
You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:
GCC:
EXPORTS
GetCCDLL=_Z8GetCCDLLv @1
MSVC:
EXPORTS
GetCCDLL=?GetCCDLL@@YAPAUCCDLL_v1@@XZ @1
Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.
This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.
Passing class objects to a function
This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.
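For example, instead of returning an object by value, a boundary function would be written against pointers, roughly like this (Result and get_result are illustrative):

#include <cstdint>

struct Result { std::int32_t code; std::int32_t detail; };

// risky: returning a class by value relies on both compilers agreeing
// on how return values are passed
// Result get_result();

// safer: only a pointer crosses the boundary, and the caller owns the buffer
extern "C" void get_result(Result* out);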
Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.
//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries

//define malloc/free replacements to make use of Windows heap APIs
#include <windows.h>

namespace pod_helpers
{
  void* pod_malloc(size_t size)
  {
    HANDLE heapHandle = GetProcessHeap();
    void* storage = nullptr;

    if (heapHandle == nullptr)
    {
      return nullptr;
    }

    storage = HeapAlloc(heapHandle, 0, size);
    return storage;
  }

  void pod_free(void* ptr)
  {
    HANDLE heapHandle = GetProcessHeap();
    if (heapHandle == nullptr)
    {
      return;
    }

    if (ptr == nullptr)
    {
      return;
    }

    HeapFree(heapHandle, 0, ptr);
  }
}
//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
  pod();
  pod(const T& value);
  pod(const pod& copy);
  ~pod();

  pod<T>& operator=(pod<T> value);
  operator T() const;

  T get() const;
  void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)
//POD_basic_types.h: holds pod specializations for basic datatypes.
#include <cstdint>
#include <new>

#pragma pack(push, 1)
template<>
class pod<unsigned int>
{
  //these convenience typedefs make the class easier to specialize and understand,
  //since the behind-the-scenes logic is almost entirely the same except for the
  //underlying datatypes in each specialization.
  typedef unsigned int original_type;
  typedef std::uint32_t safe_type;

public:
  pod() : data(nullptr) {}

  pod(const original_type& value)
  {
    set_from(value);
  }

  pod(const pod<original_type>& copyVal)
  {
    original_type copyData = copyVal.get();
    set_from(copyData);
  }

  ~pod()
  {
    release();
  }

  pod<original_type>& operator=(pod<original_type> value)
  {
    swap(*this, value);
    return *this;
  }

  operator original_type() const
  {
    return get();
  }

protected:
  safe_type* data;

  original_type get() const
  {
    original_type result;
    result = static_cast<original_type>(*data);
    return result;
  }

  void set_from(const original_type& value)
  {
    //note the pod_malloc call here - we want our memory buffer to go in the
    //process heap, not the possibly-isolated DLL heap.
    data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type)));

    if (data == nullptr)
    {
      return;
    }

    new(data) safe_type(value);
  }

  void release()
  {
    if (data)
    {
      pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
      data = nullptr;
    }
  }

  void swap(pod<original_type>& first, pod<original_type>& second)
  {
    using std::swap;
    swap(first.data, second.data);
  }
};
#pragma pack(pop)
The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, unsigned int will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded = and conversion operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic, since it's converted to an int8_t and then the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).
We can also wrap STL types in this way, although it requires a little extra work:
#pragma pack(push, 1)
//double template ftw. We're specializing pod for std::basic_string, but we're
//making this specialization a template itself; this way we can support all the
//basic_string types without needing to create four separate specializations of pod.
template<typename charT>
class pod<std::basic_string<charT>>
{
  //more comfort typedefs
  typedef std::basic_string<charT> original_type;
  typedef charT safe_type;

public:
  pod() : data(nullptr), dataSize(0) {}

  pod(const original_type& value)
  {
    set_from(value);
  }

  pod(const charT* charValue)
  {
    original_type temp(charValue);
    set_from(temp);
  }

  pod(const pod<original_type>& copyVal)
  {
    original_type copyData = copyVal.get();
    set_from(copyData);
  }

  ~pod()
  {
    release();
  }

  pod<original_type>& operator=(pod<original_type> value)
  {
    swap(*this, value);
    return *this;
  }

  operator original_type() const
  {
    return get();
  }

protected:
  //this is almost the same as a basic type specialization, but we have to keep
  //track of the number of elements being stored within the basic_string as well
  //as the elements themselves.
  safe_type* data;
  typename original_type::size_type dataSize;

  original_type get() const
  {
    original_type result;
    result.reserve(dataSize);
    std::copy(data, data + dataSize, std::back_inserter(result));
    return result;
  }

  void set_from(const original_type& value)
  {
    dataSize = value.size();
    data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));

    if (data == nullptr)
    {
      return;
    }

    //figure out where the data to copy starts and stops, then loop through the
    //basic_string and copy each element to our buffer.
    safe_type* dataIterPtr = data;
    safe_type* dataEndPtr = data + dataSize;
    typename original_type::const_iterator iter = value.begin();

    for (; dataIterPtr != dataEndPtr;)
    {
      new(dataIterPtr++) safe_type(*iter++);
    }
  }

  void release()
  {
    if (data)
    {
      pod_helpers::pod_free(data);
      data = nullptr;
      dataSize = 0;
    }
  }

  void swap(pod<original_type>& first, pod<original_type>& second)
  {
    using std::swap;
    swap(first.data, second.data);
    swap(first.dataSize, second.dataSize);
  }
};
#pragma pack(pop)
Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one method to figure out mangling for.
//CCDLL.h: defines a DLL interface for a pod-based DLL
struct CCDLL_v1
{
  virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};

CCDLL_v1* GetCCDLL();
This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:
struct CCDLL_v1_implementation : CCDLL_v1
{
  virtual void ShowMessage(const pod<std::wstring>* message) override;
};

CCDLL_v1* GetCCDLL()
{
  static CCDLL_v1_implementation* CCDLL = nullptr;

  if (!CCDLL)
  {
    CCDLL = new CCDLL_v1_implementation;
  }

  return CCDLL;
}
And now let's implement the ShowMessage function:
#include "CCDLL_implementation.h"
void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
std::wstring workingMessage = *message;
MessageBox(NULL, workingMessage.c_str(), TEXT("This is a cross-compiler message"), MB_OK);
}
Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a message box. After all, this is just a POC, not a full utility library.
Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)
Now for an EXE to call the DLL:
//main.cpp
#include "../CCDLL/CCDLL.h"

typedef CCDLL_v1* (__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;

int main()
{
  //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.
  HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll"));

  Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, (LPCSTR)"GetCCDLL");

  //This calls the DLL's GetCCDLL method, which is an alias to the mangled function.
  //By dynamically loading the DLL like this, we're completely bypassing the name
  //mangling, exactly as expected.
  CCDLL_v1* CCDLL_lib = Ptr_GetCCDLL();

  pod<std::wstring> message = TEXT("Hello world!");
  CCDLL_lib->ShowMessage(&message);

  FreeLibrary(ccdll); //unload the library when we're done with it

  return 0;
}
And here are the results. Our DLL works. We've successfully worked around the STL ABI issues, the C++ ABI issues, and the mangling issues, and our MSVC DLL is working with a GCC EXE.
In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.
@computerfreaker has written a great explanation of why the lack of an ABI prevents passing C++ objects across DLL boundaries in the general case, even when the type definitions are under user control and the exact same token sequence is used in both programs. (There are two cases which do work: standard-layout classes, and pure interfaces.)
For object types defined in the C++ Standard (including those adapted from the Standard Template Library), the situation is far, far worse. The tokens defining these types are NOT the same across multiple compilers, as the C++ Standard does not provide a complete type definition, only minimum requirements. In addition, name lookup of the identifiers that appear in these type definitions doesn't resolve the same way. Even on systems where there is a C++ ABI, attempting to share such types across module boundaries results in massive undefined behavior due to One Definition Rule violations.
This is something that Linux programmers weren't accustomed to dealing with, because g++'s libstdc++ was a de facto standard and virtually all programs used it, thus satisfying the ODR. clang's libc++ broke that assumption, and then C++11 came along with mandatory changes to nearly all Standard library types.
Just don't share Standard library types between modules. It's undefined behavior.
Some of the answers here make passing C++ classes sound really scary, but I'd like to share an alternate point of view. The pure virtual C++ method mentioned in some of the other responses actually turns out to be cleaner than you might think. I've built an entire plugin system around the concept and it's been working very well for years. I have a "PluginManager" class that dynamically loads the DLLs from a specified directory using LoadLibrary() and GetProcAddress() (and the Linux equivalents, to make the executable cross-platform).
Believe it or not, this method is forgiving even if you do some wacky stuff like add a new function at the end of your pure virtual interface and try to load dlls compiled against the interface without that new function - they'll load just fine. Of course... you'll have to check a version number to make sure your executable only calls the new function for newer dlls that implement the function. But the good news is: it works! So in a way, you have a crude method for evolving your interface over time.
Another cool thing about pure virtual interfaces - you can inherit as many interfaces as you want and you'll never run into the diamond problem!
I would say the biggest downside to this approach is that you have to be very careful about what types you pass as parameters. No classes or STL objects without wrapping them with pure virtual interfaces first. No structs (without going through the pragma pack voodoo). Just primitive types and pointers to other interfaces. Also, you can't overload functions, which is an inconvenience, but not a show-stopper.
The good news is that with a handful of lines of code you can make reusable generic classes and interfaces to wrap STL strings, vectors, and other container classes. Alternatively, you can add functions to your interface like GetCount() and GetVal(n) to let people loop through lists.
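As a sketch of what such wrapper interfaces might look like (these names are illustrative, not from an actual plugin SDK):

#include <cstdint>

// a pure virtual wrapper so string data can cross the boundary safely
struct IString
{
  virtual std::int32_t GetLength() const = 0;
  virtual char GetChar(std::int32_t index) const = 0;
};

// a list exposed through GetCount()/GetVal(n), as described above;
// only primitive types and interface pointers cross the boundary
struct IStringList
{
  virtual std::int32_t GetCount() const = 0;
  virtual const IString* GetVal(std::int32_t n) const = 0;
};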
People building plugins for us find it quite easy. They don't have to be experts on the ABI boundary or anything - they just inherit the interfaces they're interested in, code up the functions they support, and return false for the ones they don't.
The technology that makes all this work isn't based on any standard as far as I know. From what I gather, Microsoft decided to do their virtual tables that way so they could make COM, and other compiler writers decided to follow suit. This includes GCC, Intel, Borland, and most other major C++ compilers. If you're planning on using an obscure embedded compiler then this approach probably won't work for you. Theoretically any compiler company could change their virtual tables at any time and break things, but considering the massive amount of code written over the years that depends on this technology, I would be very surprised if any of the major players decided to break rank.
So the moral of the story is... With the exception of a few extreme circumstances, you need one person in charge of the interfaces who can make sure the ABI boundary stays clean with primitive types and avoids overloading. If you are OK with that stipulation, then I wouldn't be afraid to share interfaces to classes in DLLs/SOs between compilers. Sharing classes directly == trouble, but sharing pure virtual interfaces isn't so bad.
You cannot safely pass STL objects across DLL boundaries, unless all the modules (.EXE and .DLLs) are built with the same C++ compiler version and the same settings and flavors of the CRT, which is highly constraining, and clearly not your case.
If you want to expose an object-oriented interface from your DLL, you should expose C++ pure interfaces (which is similar to what COM does). Consider reading this interesting article on CodeProject:
HowTo: Export C++ classes from a DLL
You may also want to consider exposing a pure C interface at the DLL boundary, and then building a C++ wrapper at the caller site.
This is similar to what happens in Win32: Win32 implementation code is almost C++, but lots of Win32 APIs expose a pure C interface (there are also APIs that expose COM interfaces). Then ATL/WTL and MFC wrap these pure C interfaces with C++ classes and objects.
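As a sketch of that C-interface-plus-wrapper pattern (all names hypothetical):

// DLL boundary: a pure C interface around an opaque handle
extern "C" {
  typedef struct Engine Engine;  // opaque; the DLL owns the real type
  Engine* engine_create(void);
  int engine_run(Engine* e, int input);
  void engine_destroy(Engine* e);
}

// caller side: a C++ RAII wrapper built on top of the C interface
class EngineWrapper
{
public:
  EngineWrapper() : handle(engine_create()) {}
  ~EngineWrapper() { engine_destroy(handle); }

  int run(int input) { return engine_run(handle, input); }

private:
  Engine* handle;
  EngineWrapper(const EngineWrapper&);            // non-copyable,
  EngineWrapper& operator=(const EngineWrapper&); // pre-C++11 style
};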
How do I pass class objects, especially STL objects, to and from a C++ DLL?
My application has to interact with third-party plugins in the form of DLL files, and I can't control what compiler these plugins are built with. I'm aware that there's no guaranteed ABI for STL objects, and I'm concerned about causing instability in my application.
The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.
Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.
If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:
Data packing/alignment
Within a given class, individual data members will usually be specially placed in memory so their addresses correspond to a multiple of the type's size. For example, an int might be aligned to a 4-byte boundary.
If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.
You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no packing will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.
Member reordering
If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any data rearranging can cause incompatibilities between compilers. Passing data back and forth to a DLL will require standard-layout classes, therefore.
Calling convention
There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? What order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?
It's important you maintain a standard calling convention; if you declare a function as _cdecl, the default for C++, and try to call it using _stdcall bad things will happen. _cdecl is the default calling convention for C++ functions, however, so this is one thing that won't break unless you deliberately break it by specifying an _stdcall in one place and a _cdecl in another.
Datatype size
According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.
Heap issues
If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.
To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.
STL issues
The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.
Name mangling
Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL##YAPAUCCDLL_v1##XZ in MSVC.
You already won't be able to guarantee static linking to your DLL, since a DLL produced with GCC won't produce a .lib file and statically linking a DLL in MSVC requires one. Dynamically linking seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.
You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:
GCC:
EXPORTS
GetCCDLL=_Z8GetCCDLLv #1
MSVC:
EXPORTS
GetCCDLL=?GetCCDLL##YAPAUCCDLL_v1##XZ #1
Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.
This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.
Passing class objects to a function
This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.
Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.
//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries
//define malloc/free replacements to make use of Windows heap APIs
namespace pod_helpers
{
void* pod_malloc(size_t size)
{
HANDLE heapHandle = GetProcessHeap();
HANDLE storageHandle = nullptr;
if (heapHandle == nullptr)
{
return nullptr;
}
storageHandle = HeapAlloc(heapHandle, 0, size);
return storageHandle;
}
void pod_free(void* ptr)
{
HANDLE heapHandle = GetProcessHeap();
if (heapHandle == nullptr)
{
return;
}
if (ptr == nullptr)
{
return;
}
HeapFree(heapHandle, 0, ptr);
}
}
//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
pod();
pod(const T& value);
pod(const pod& copy);
~pod();
pod<T>& operator=(pod<T> value);
operator T() const;
T get() const;
void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)
//POD_basic_types.h: holds pod specializations for basic datatypes.
#pragma pack(push, 1)
template<>
class pod<unsigned int>
{
//these are a couple of convenience typedefs that make the class easier to specialize and understand, since the behind-the-scenes logic is almost entirely the same except for the underlying datatypes in each specialization.
typedef int original_type;
typedef std::int32_t safe_type;
public:
pod() : data(nullptr) {}
pod(const original_type& value)
{
set_from(value);
}
pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}
~pod()
{
release();
}
pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);
return *this;
}
operator original_type() const
{
return get();
}
protected:
safe_type* data;
original_type get() const
{
original_type result;
result = static_cast<original_type>(*data);
return result;
}
void set_from(const original_type& value)
{
data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type))); //note the pod_malloc call here - we want our memory buffer to go in the process heap, not the possibly-isolated DLL heap.
if (data == nullptr)
{
return;
}
new(data) safe_type (value);
}
void release()
{
if (data)
{
pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
data = nullptr;
}
}
void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;
swap(first.data, second.data);
}
};
#pragma pack(pop)
The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, uint will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded = and () operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic, since it's converted to a int8_t and then the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).
We can also wrap STL types in this way, although it requires a little extra work:
#pragma pack(push, 1)
template<typename charT>
class pod<std::basic_string<charT>> //double template ftw. We're specializing pod for std::basic_string, but we're making this specialization able to be specialized for different types; this way we can support all the basic_string types without needing to create four specializations of pod.
{
//more comfort typedefs
typedef std::basic_string<charT> original_type;
typedef charT safe_type;
public:
pod() : data(nullptr) {}
pod(const original_type& value)
{
set_from(value);
}
pod(const charT* charValue)
{
original_type temp(charValue);
set_from(temp);
}
pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}
~pod()
{
release();
}
pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);
return *this;
}
operator original_type() const
{
return get();
}
protected:
//this is almost the same as a basic type specialization, but we have to keep track of the number of elements being stored within the basic_string as well as the elements themselves.
safe_type* data;
typename original_type::size_type dataSize;
original_type get() const
{
original_type result;
result.reserve(dataSize);
std::copy(data, data + dataSize, std::back_inserter(result));
return result;
}
void set_from(const original_type& value)
{
dataSize = value.size();
data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));
if (data == nullptr)
{
return;
}
//figure out where the data to copy starts and stops, then loop through the basic_string and copy each element to our buffer.
safe_type* dataIterPtr = data;
safe_type* dataEndPtr = data + dataSize;
typename original_type::const_iterator iter = value.begin();
for (; dataIterPtr != dataEndPtr;)
{
new(dataIterPtr++) safe_type(*iter++);
}
}
void release()
{
if (data)
{
pod_helpers::pod_free(data);
data = nullptr;
dataSize = 0;
}
}
void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;
swap(first.data, second.data);
swap(first.dataSize, second.dataSize);
}
};
#pragma pack(pop)
Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one method to figure out mangling for.
//CCDLL.h: defines a DLL interface for a pod-based DLL
struct CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};
CCDLL_v1* GetCCDLL();
This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:
struct CCDLL_v1_implementation: CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) override;
};
CCDLL_v1* GetCCDLL()
{
static CCDLL_v1_implementation* CCDLL = nullptr;
if (!CCDLL)
{
CCDLL = new CCDLL_v1_implementation;
}
return CCDLL;
}
And now let's implement the ShowMessage function:
#include "CCDLL_implementation.h"
void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
std::wstring workingMessage = *message;
MessageBox(NULL, workingMessage.c_str(), TEXT("This is a cross-compiler message"), MB_OK);
}
Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a messagebox. After all, this is just a POC, not a full utility library.
Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)
Now for an EXE to call the DLL:
//main.cpp
#include "../CCDLL/CCDLL.h"
typedef CCDLL_v1*(__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;
int main()
{
HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll")); //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.
Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, (LPCSTR)"GetCCDLL");
CCDLL_v1* CCDLL_lib;
CCDLL_lib = Ptr_GetCCDLL(); //This calls the DLL's GetCCDLL method, which is an alias to the mangled function. By dynamically loading the DLL like this, we're completely bypassing the name mangling, exactly as expected.
pod<std::wstring> message = TEXT("Hello world!");
CCDLL_lib->ShowMessage(&message);
FreeLibrary(ccdll); //unload the library when we're done with it
return 0;
}
And here are the results. Our DLL works. We've successfully reached past STL ABI issues, past C++ ABI issues, past mangling issues, and our MSVC DLL is working with a GCC EXE.
In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.
#computerfreaker has written a great explanation of why the lack of ABI prevents passing C++ objects across DLL boundaries in the general case, even when the type definitions are under user control and the exact same token sequence is used in both programs. (There are two cases which do work: standard-layout classes, and pure interfaces)
For object types defined in the C++ Standard (including those adapted from the Standard Template Library), the situation is far, far worse. The tokens defining these types are NOT the same across multiple compilers, as the C++ Standard does not provide a complete type definition, only minimum requirements. In addition, name lookup of the identifiers that appear in these type definitions don't resolve the same. Even on systems where there is a C++ ABI, attempting to share such types across module boundaries results in massive undefined behavior due to One Definition Rule violations.
This is something that Linux programmers weren't accustomed to dealing with, because g++'s libstdc++ was a de-facto standard and virtually all programs used it, thus satisfying the ODR. clang's libc++ broke that assumption, and then C++11 came along with mandatory changes to nearly all Standard library types.
Just don't share Standard library types between modules. It's undefined behavior.
Some of the answers here make passing C++ classes sound really scary, but I'd like to share an alternate point of view. The pure virtual C++ method mentioned in some of the other responses actually turns out to be cleaner than you might think. I've built an entire plugin system around the concept and it's been working very well for years. I have a "PluginManager" class that dynamically loads the dlls from a specified directory using LoadLib() and GetProcAddress() (and the Linux equivalents so the executable to make it cross platform).
Believe it or not, this method is forgiving even if you do some wacky stuff like add a new function at the end of your pure virtual interface and try to load dlls compiled against the interface without that new function - they'll load just fine. Of course... you'll have to check a version number to make sure your executable only calls the new function for newer dlls that implement the function. But the good news is: it works! So in a way, you have a crude method for evolving your interface over time.
Another cool thing about pure virtual interfaces - you can inherit as many interfaces as you want and you'll never run into the diamond problem!
I would say the biggest downside to this approach is that you have to be very careful about what types you pass as parameters. No classes or STL objects without wrapping them with pure virtual interfaces first. No structs (without going through the pragma pack voodoo). Just primative types and pointers to other interfaces. Also, you can't overload functions, which is an inconvenience, but not a show-stopper.
The good news is that with a handful of lines of code you can make reusable generic classes and interfaces to wrap STL strings, vectors, and other container classes. Alternatively, you can add functions to your interface like GetCount() and GetVal(n) to let people loop through lists.
People building plugins for us find it quite easy. They don't have to be experts on the ABI boundary or anything - they just inherit the interfaces they're interested in, code up the functions they support, and return false for the ones they don't.
The technology that makes all this work isn't based on any standard as far as I know. From what I gather, Microsoft decided to do their virtual tables that way so they could make COM, and other compiler writers decided to follow suit. This includes GCC, Intel, Borland, and most other major C++ compilers. If you're planning on using an obscure embedded compiler then this approach probably won't work for you. Theoretically any compiler company could change their virtual tables at any time and break things, but considering the massive amount of code written over the years that depends on this technology, I would be very surprised if any of the major players decided to break rank.
So the moral of the story is... With the exception of a few extreme circumstances, you need one person in charge of the interfaces who can make sure the ABI boundary stays clean with primitive types and avoids overloading. If you are OK with that stipulation, then I wouldn't be afraid to share interfaces to classes in DLLs/SOs between compilers. Sharing classes directly == trouble, but sharing pure virtual interfaces isn't so bad.
You cannot safely pass STL objects across DLL boundaries, unless all the modules (.EXE and .DLLs) are built with the same C++ compiler version and the same settings and flavors of the CRT, which is highly constraining, and clearly not your case.
If you want to expose an object-oriented interface from your DLL, you should expose C++ pure interfaces (which is similar to what COM does). Consider reading this interesting article on CodeProject:
HowTo: Export C++ classes from a DLL
You may also want to consider exposing a pure C interface at the DLL boundary, and then building a C++ wrapper at the caller site.
This is similar to what happens in Win32: Win32 implementation code is almost C++, but lots of Win32 APIs expose a pure C interface (there are also APIs that expose COM interfaces). Then ATL/WTL and MFC wrap these pure C interfaces with C++ classes and objects.
How do I pass class objects, especially STL objects, to and from a C++ DLL?
My application has to interact with third-party plugins in the form of DLL files, and I can't control what compiler these plugins are built with. I'm aware that there's no guaranteed ABI for STL objects, and I'm concerned about causing instability in my application.
The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.
Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.
If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:
Data packing/alignment
Within a given class, individual data members will usually be specially placed in memory so their addresses correspond to a multiple of the type's size. For example, an int might be aligned to a 4-byte boundary.
If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.
You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no packing will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.
Member reordering
If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any data rearranging can cause incompatibilities between compilers. Passing data back and forth to a DLL will require standard-layout classes, therefore.
Calling convention
There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? What order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?
It's important you maintain a standard calling convention; if you declare a function as _cdecl, the default for C++, and try to call it using _stdcall bad things will happen. _cdecl is the default calling convention for C++ functions, however, so this is one thing that won't break unless you deliberately break it by specifying an _stdcall in one place and a _cdecl in another.
Datatype size
According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.
Heap issues
If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.
To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.
STL issues
The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.
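For example, instead of passing a std::string itself, you might pass its buffer and length and rebuild the string on the far side. A sketch, with both sides shown in one file for brevity (show_text is a hypothetical function):
#include <cstddef>
#include <string>
//exported by the DLL: only fundamental types cross the boundary
extern "C" void show_text(const char* text, std::size_t length)
{
    std::string rebuilt(text, length); //repacked by this module's own STL
    //...use rebuilt freely within the DLL...
}
//caller side: unpack before the call
int main()
{
    std::string message = "hello";
    show_text(message.c_str(), message.size());
}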
Name mangling
Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL@@YAPAUCCDLL_v1@@XZ in MSVC.
You already won't be able to guarantee static linking to your DLL, since a DLL produced with GCC won't produce a .lib file and statically linking a DLL in MSVC requires one. Dynamically linking seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.
You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:
GCC:
EXPORTS
GetCCDLL=_Z8GetCCDLLv @1
MSVC:
EXPORTS
GetCCDLL=?GetCCDLL@@YAPAUCCDLL_v1@@XZ @1
Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.
This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.
Passing class objects to a function
This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.
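In signature terms, that shape looks something like this (compute is a hypothetical function):
#include <cstdint>
//inputs and outputs both travel through pointers;
//the return value only reports success or failure
extern "C" int compute(const std::int32_t* input, std::int32_t* result);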
Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.
//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries
#include <windows.h>
#include <cstddef>

//define malloc/free replacements to make use of Windows heap APIs
namespace pod_helpers
{
  //inline: these live in a header, so avoid multiple-definition errors
  inline void* pod_malloc(std::size_t size)
  {
    //allocate from the process heap, which every module shares,
    //not from the CRT heap, which may be private to one module
    HANDLE heapHandle = GetProcessHeap();
    if (heapHandle == nullptr)
    {
      return nullptr;
    }

    return HeapAlloc(heapHandle, 0, size);
  }

  inline void pod_free(void* ptr)
  {
    HANDLE heapHandle = GetProcessHeap();
    if (heapHandle == nullptr || ptr == nullptr)
    {
      return;
    }

    HeapFree(heapHandle, 0, ptr);
  }
}
//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
pod();
pod(const T& value);
pod(const pod& copy);
~pod();
pod<T>& operator=(pod<T> value);
operator T() const;
T get() const;
void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)
//POD_basic_types.h: holds pod specializations for basic datatypes (these rely on <cstdint>, <new> and <utility> being included).
#pragma pack(push, 1)
template<>
class pod<unsigned int>
{
//these are a couple of convenience typedefs that make the class easier to specialize and understand, since the behind-the-scenes logic is almost entirely the same except for the underlying datatypes in each specialization.
//note they must match the type being specialized: unsigned int travels as a fixed-size uint32_t.
typedef unsigned int original_type;
typedef std::uint32_t safe_type;
public:
pod() : data(nullptr) {}
pod(const original_type& value)
{
set_from(value);
}
pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}
~pod()
{
release();
}
pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);
return *this;
}
operator original_type() const
{
return get();
}
protected:
safe_type* data;
original_type get() const
{
original_type result;
result = static_cast<original_type>(*data);
return result;
}
void set_from(const original_type& value)
{
data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type))); //note the pod_malloc call here - we want our memory buffer to go in the process heap, not the possibly-isolated DLL heap.
if (data == nullptr)
{
return;
}
new(data) safe_type (value);
}
void release()
{
if (data)
{
pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
data = nullptr;
}
}
void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;
swap(first.data, second.data);
}
};
#pragma pack(pop)
The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, unsigned int will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded assignment and conversion operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic: it's converted to an int8_t, and the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).
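For the curious, that extra logic amounts to something like the following sketch, based on the description above (constructors, operators and the other members are elided since they match the other specializations):
#pragma pack(push, 1)
template<>
class pod<bool>
{
typedef bool original_type;
typedef std::int8_t safe_type; //bool's representation isn't pinned down by the standard, so it travels as an int8_t
//...constructors, destructor and operators as in the other specializations...
protected:
safe_type* data;
void set_from(const original_type& value)
{
  data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type)));
  if (data == nullptr)
  {
    return;
  }
  new(data) safe_type(value ? 1 : 0);
}
original_type get() const
{
  return *data != 0; //compare to 0 to convert back to bool
}
//...release() and swap() as in the other specializations...
};
#pragma pack(pop)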
We can also wrap STL types in this way, although it requires a little extra work:
#pragma pack(push, 1)
template<typename charT>
class pod<std::basic_string<charT>> //this is a partial specialization: pod is specialized for std::basic_string, but stays templated on the character type, so one specialization covers all the basic_string variants without needing four separate ones.
{
//more comfort typedefs
typedef std::basic_string<charT> original_type;
typedef charT safe_type;
public:
pod() : data(nullptr), dataSize(0) {} //initialize dataSize too, so a default-constructed pod is in a consistent state
pod(const original_type& value)
{
set_from(value);
}
pod(const charT* charValue)
{
original_type temp(charValue);
set_from(temp);
}
pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}
~pod()
{
release();
}
pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);
return *this;
}
operator original_type() const
{
return get();
}
protected:
//this is almost the same as a basic type specialization, but we have to keep track of the number of elements being stored within the basic_string as well as the elements themselves.
safe_type* data;
typename original_type::size_type dataSize;
original_type get() const
{
original_type result;
result.reserve(dataSize);
std::copy(data, data + dataSize, std::back_inserter(result));
return result;
}
void set_from(const original_type& value)
{
dataSize = value.size();
data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));
if (data == nullptr)
{
return;
}
//figure out where the data to copy starts and stops, then loop through the basic_string and copy each element to our buffer.
safe_type* dataIterPtr = data;
safe_type* dataEndPtr = data + dataSize;
typename original_type::const_iterator iter = value.begin();
for (; dataIterPtr != dataEndPtr;)
{
new(dataIterPtr++) safe_type(*iter++);
}
}
void release()
{
if (data)
{
pod_helpers::pod_free(data);
data = nullptr;
dataSize = 0;
}
}
void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;
swap(first.data, second.data);
swap(first.dataSize, second.dataSize);
}
};
#pragma pack(pop)
Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one method to figure out mangling for.
//CCDLL.h: defines a DLL interface for a pod-based DLL
struct CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};
CCDLL_v1* GetCCDLL();
This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:
struct CCDLL_v1_implementation: CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) override;
};
CCDLL_v1* GetCCDLL()
{
static CCDLL_v1_implementation* CCDLL = nullptr;
if (!CCDLL)
{
CCDLL = new CCDLL_v1_implementation;
}
return CCDLL;
}
And now let's implement the ShowMessage function:
#include "CCDLL_implementation.h"
void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
std::wstring workingMessage = *message;
MessageBox(NULL, workingMessage.c_str(), TEXT("This is a cross-compiler message"), MB_OK);
}
Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a message box. After all, this is just a POC, not a full utility library.
Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)
Now for an EXE to call the DLL:
//main.cpp
#include <windows.h>
#include "../CCDLL/CCDLL.h"

typedef CCDLL_v1* (__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;

int main()
{
    HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll")); //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.
    if (ccdll == NULL)
    {
        return 1; //bail out if the DLL isn't where we expected it
    }

    Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, "GetCCDLL");
    if (Ptr_GetCCDLL == NULL)
    {
        FreeLibrary(ccdll);
        return 1; //bail out if the unmangled alias isn't exported
    }

    CCDLL_v1* CCDLL_lib = Ptr_GetCCDLL(); //This calls the DLL's GetCCDLL method, which is an alias to the mangled function. By dynamically loading the DLL like this, we're completely bypassing the name mangling.

    pod<std::wstring> message = TEXT("Hello world!");
    CCDLL_lib->ShowMessage(&message);

    FreeLibrary(ccdll); //unload the library when we're done with it
    return 0;
}
And here are the results. Our DLL works. We've successfully gotten past the STL ABI issues, past the C++ ABI issues, past the mangling issues, and our MSVC DLL is working with a GCC EXE.
In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.
@computerfreaker has written a great explanation of why the lack of a standard ABI prevents passing C++ objects across DLL boundaries in the general case, even when the type definitions are under user control and the exact same token sequence is used in both programs. (There are two cases which do work: standard-layout classes and pure interfaces.)
For object types defined in the C++ Standard (including those adapted from the Standard Template Library), the situation is far, far worse. The tokens defining these types are NOT the same across multiple compilers, as the C++ Standard does not provide a complete type definition, only minimum requirements. In addition, name lookup of the identifiers that appear in these type definitions doesn't resolve the same way. Even on systems where there is a C++ ABI, attempting to share such types across module boundaries results in massive undefined behavior due to One Definition Rule violations.
This is something that Linux programmers weren't accustomed to dealing with, because g++'s libstdc++ was a de-facto standard and virtually all programs used it, thus satisfying the ODR. clang's libc++ broke that assumption, and then C++11 came along with mandatory changes to nearly all Standard library types.
Just don't share Standard library types between modules. It's undefined behavior.
Some of the answers here make passing C++ classes sound really scary, but I'd like to share an alternate point of view. The pure virtual C++ method mentioned in some of the other responses actually turns out to be cleaner than you might think. I've built an entire plugin system around the concept and it's been working very well for years. I have a "PluginManager" class that dynamically loads the DLLs from a specified directory using LoadLibrary() and GetProcAddress() (and the Linux equivalents, dlopen() and dlsym(), to make the executable cross-platform).
Believe it or not, this method is forgiving even if you do some wacky stuff like add a new function at the end of your pure virtual interface and try to load DLLs compiled against the interface without that new function - they'll load just fine. Of course, you'll have to check a version number to make sure your executable only calls the new function for newer DLLs that implement it. But the good news is: it works! So in a way, you have a crude method for evolving your interface over time.
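A sketch of that evolution pattern (all names are hypothetical):
struct IPlugin_v1
{
    virtual int GetVersion() = 0; //each plugin reports the interface version it was built against
    virtual void DoWork() = 0;
    //v2 additions go strictly at the end, so older vtables still line up:
    virtual void DoNewWork() = 0;
};
//host side: gate new calls behind the version check
void callPlugin(IPlugin_v1* plugin)
{
    plugin->DoWork();
    if (plugin->GetVersion() >= 2)
    {
        plugin->DoNewWork(); //older plugins are never asked for this
    }
}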
Another cool thing about pure virtual interfaces - you can inherit as many interfaces as you want and you'll never run into the diamond problem!
I would say the biggest downside to this approach is that you have to be very careful about what types you pass as parameters. No classes or STL objects without wrapping them with pure virtual interfaces first. No structs (without going through the pragma pack voodoo). Just primitive types and pointers to other interfaces. Also, you can't overload functions, which is an inconvenience, but not a show-stopper.
The good news is that with a handful of lines of code you can make reusable generic classes and interfaces to wrap STL strings, vectors, and other container classes. Alternatively, you can add functions to your interface like GetCount() and GetVal(n) to let people loop through lists.
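For instance, a string can cross the boundary behind a tiny interface like this (a sketch; the names are made up):
#include <cstddef>
class IString
{
public:
    virtual const char* c_str() const = 0;   //raw buffer: a primitive type
    virtual std::size_t length() const = 0;
    virtual void set(const char* value) = 0;
    virtual void destroy() = 0;              //free the object on the side that allocated it
};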
People building plugins for us find it quite easy. They don't have to be experts on the ABI boundary or anything - they just inherit the interfaces they're interested in, code up the functions they support, and return false for the ones they don't.
The technology that makes all this work isn't based on any standard as far as I know. From what I gather, Microsoft decided to do their virtual tables that way so they could make COM, and other compiler writers decided to follow suit. This includes GCC, Intel, Borland, and most other major C++ compilers. If you're planning on using an obscure embedded compiler then this approach probably won't work for you. Theoretically any compiler company could change their virtual tables at any time and break things, but considering the massive amount of code written over the years that depends on this technology, I would be very surprised if any of the major players decided to break rank.
So the moral of the story is... With the exception of a few extreme circumstances, you need one person in charge of the interfaces who can make sure the ABI boundary stays clean with primitive types and avoids overloading. If you are OK with that stipulation, then I wouldn't be afraid to share interfaces to classes in DLLs/SOs between compilers. Sharing classes directly == trouble, but sharing pure virtual interfaces isn't so bad.
You cannot safely pass STL objects across DLL boundaries, unless all the modules (.EXE and .DLLs) are built with the same C++ compiler version and the same settings and flavors of the CRT, which is highly constraining, and clearly not your case.
If you want to expose an object-oriented interface from your DLL, you should expose C++ pure interfaces (which is similar to what COM does). Consider reading this interesting article on CodeProject:
HowTo: Export C++ classes from a DLL
You may also want to consider exposing a pure C interface at the DLL boundary, and then building a C++ wrapper at the caller site.
This is similar to what happens in Win32: Win32 implementation code is almost C++, but lots of Win32 APIs expose a pure C interface (there are also APIs that expose COM interfaces). Then ATL/WTL and MFC wrap these pure C interfaces with C++ classes and objects.
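A minimal sketch of that layering (all names are hypothetical):
//boundary: plain C, in the spirit of the Win32 API
extern "C" {
    typedef void* WidgetHandle;
    WidgetHandle widget_create(void);
    void widget_set_title(WidgetHandle w, const char* title);
    void widget_destroy(WidgetHandle w);
}
//caller-side C++ wrapper, much like ATL/WTL/MFC wrapping Win32
class Widget
{
public:
    Widget() : handle(widget_create()) {}
    ~Widget() { widget_destroy(handle); }
    void setTitle(const char* title) { widget_set_title(handle, title); }
private:
    WidgetHandle handle;
    Widget(const Widget&);            //non-copyable: the wrapper owns the handle
    Widget& operator=(const Widget&);
};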