Is this an acceptable way to use a class's private methods in C++?

In my C++ program I have a class in which several methods share the same routines, such as opening streams for reading from or writing to files, parsing files, determining MIME types, etc. The same routines are also used in the constructor. To keep the methods compact and avoid typing the same code multiple times, I split these routine operations into private methods intended for use inside the class only. However, some of these private methods depend on the results of others, so calling them in the wrong order could have pretty bad consequences.
Just a stupid example:
class Example
{
public:
    Example(int x);
    ~Example() {}
    //...
    //...
protected:
private:
    int a;
    int b;
    bool c;
    void foo_();
    void bar_();
    //...
    //...
};

Example::Example(int x) : a(x)
{
    foo_();
    bar_();
}

void Example::foo_()
{
    if (a == 0)
    {
        b = 10;
    }
    else
    {
        b = a * 2;
    }
}

void Example::bar_()
{
    if (b == 0)
    {
        c = false;
    }
    else
    {
        c = true;
    }
}
As can be seen from the above example, calling bar_() before foo_() in the constructor leads to undefined behavior because b has not yet been initialized. But should I worry about such nuances if I am definitely sure that I am using these private methods correctly inside the class, and they can never be used outside the class?

Not to mention that what you did is the recommended way! Whenever a function performs several different operations, the standard approach is to split it into multiple functions. In your case the user does not need those helper functions, so making them private was the best you could do. As for the part about "I need to call them in a specific order": it's entirely fine if the code needs calls in a particular order. It's only logical to call foo after bar if the former depends on the result of the latter; it's not much different from having to allocate memory for int* p before using it as an array. Although, as @Basil and many others have explained, be sure to document your code correctly.

calling bar_() before foo_() in the constructor will lead to undefined behavior because b has not yet been initialized
As a rule of thumb, I always explicitly initialize all member fields in a constructor (in particular those of scalar type, like pointers or numbers, e.g. your a, b, c in class Example). Advantage: the behavior of your program is more reproducible. Disadvantage: the compiled code might perform useless initializations (but clever optimizing compilers will often remove them).
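Applied to your Example class, a minimal sketch of that rule could look like this (the default values 0 and false are placeholders I chose, not something your design requires):

Example::Example(int x)
    : a(x), b(0), c(false)   // every scalar member gets a well-defined value up front
{
    foo_();   // overwrites b
    bar_();   // overwrites c; calling bar_() first would now be a logic error, not undefined behavior
}

Since C++11 you can get the same effect with in-class default member initializers (int b = 0; bool c = false; in the class definition).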
If you compile with GCC, use it as g++ -Wall -Wextra -g. It usually gives you useful warnings.
For a large C++ project, consider documenting your coding rules (in a separate written document, on paper, distributed to all developers in your team) and checking some of them with your own GCC plugin. See also the DECODER project, the Bismon static source code analyzer, and the Clang static analyzer (GCC, Bismon, and the Clang analyzer are all open source, so you can improve their source code).
In some cases C++ code is generated. See GNU bison, ANTLR, RefPerSys, FLTK, and Qt as examples of software projects generating C++ code or providing code generators that emit C++ code. On x86/64 PCs, you could generate machine code at runtime with ASMJIT or libgccjit and call that code through function pointers (on Linux, see also dlopen(3), dlsym(3), and the C++ dlopen mini-HOWTO). If your software project has C++ code generators (e.g. using GPP), you can ensure that the generated code respects some of your coding conventions and invariants. Be aware, however, of Rice's theorem.
If you debug with GDB, read about its watch command and watchpoints.
Also be aware of the C++ rule of five.

Related

Is use of the STL on C++ APIs safe? [duplicate]

How do I pass class objects, especially STL objects, to and from a C++ DLL?
My application has to interact with third-party plugins in the form of DLL files, and I can't control what compiler these plugins are built with. I'm aware that there's no guaranteed ABI for STL objects, and I'm concerned about causing instability in my application.
The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.
Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.
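For illustration, a minimal sketch of what such a boundary could look like; the header and function names below are hypothetical, not part of any real API:

// plugin_api.h -- hypothetical header shared by the EXE and every plugin DLL;
// only C types appear in the signatures. The DLL exports these functions via a
// .def file or __declspec(dllexport) in its own build.
#include <stddef.h>

#ifdef __cplusplus
extern "C" {
#endif

typedef struct Plugin Plugin;   /* opaque handle: its layout never crosses the boundary */

Plugin* Plugin_Create(void);
void    Plugin_Destroy(Plugin* p);
int     Plugin_Process(Plugin* p, const char* input, char* output, size_t output_size);

#ifdef __cplusplus
}
#endif

Each side compiles this header with its own compiler; because only C types and free functions cross the boundary, neither side depends on the other's C++ ABI.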
If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:
Data packing/alignment
Within a given class, individual data members will usually be specially placed in memory so their addresses correspond to a multiple of the type's size. For example, an int might be aligned to a 4-byte boundary.
If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.
You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no packing will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.
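As a small illustration (the struct below is hypothetical), wrapping a type in a push/pop pair keeps the forced packing from leaking into other headers:

#include <cstdint>

#pragma pack(push, 1)      // force 1-byte packing until the matching pop
struct WireMessage         // hypothetical type whose layout must match in the EXE and the DLL
{
    std::uint8_t  type;
    std::uint32_t length;  // with default packing this member would start at offset 4, not 1
};
#pragma pack(pop)          // restore the previous packing for everything that follows

static_assert(sizeof(WireMessage) == 5, "EXE and DLL must agree on this layout");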
Member reordering
If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any rearranging can cause incompatibilities between compilers. Passing data back and forth to a DLL therefore requires standard-layout classes.
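One way to make that requirement enforceable, sketched with a made-up type, is to assert the property at compile time so a later edit can't silently break the layout:

#include <type_traits>

struct PluginConfig   // hypothetical type meant to cross the DLL boundary
{
    int   version;
    float scale;
};

// Compilation fails if someone later adds a virtual function, a reference member,
// or mixed access specifiers -- the things that would let the compiler reorder members.
static_assert(std::is_standard_layout<PluginConfig>::value,
              "PluginConfig must remain standard-layout to cross the DLL boundary");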
Calling convention
There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? What order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?
It's important that you maintain a consistent calling convention: if you declare a function as __cdecl, the default for C++, and try to call it using __stdcall, bad things will happen. Since __cdecl is the default calling convention for C++ functions, this is one thing that won't break unless you deliberately break it by specifying __stdcall in one place and __cdecl in another.
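A sketch of spelling the convention out on both sides (the function name is made up):

// Shared header: both the exported function and the function-pointer type the EXE
// uses to call it name the calling convention explicitly instead of relying on defaults.
extern "C" int __cdecl AddNumbers(int a, int b);

typedef int (__cdecl* fnAddNumbers)(int, int);   // must repeat the same convention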
Datatype size
According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.
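For example, a header shared across the boundary might stick to the <cstdint> types (the struct is hypothetical):

#include <cstdint>

struct MessageHeader               // every field has an exact size on every conforming compiler
{
    std::uint32_t id;              // always 4 bytes, unlike unsigned long (4 or 8 bytes depending on platform)
    std::int64_t  timestamp;       // always 8 bytes
    std::uint16_t flags;           // always 2 bytes
};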
Heap issues
If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.
To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.
STL issues
The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.
Name mangling
Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL@@YAPAUCCDLL_v1@@XZ in MSVC.
You already won't be able to guarantee static linking to your DLL, since a DLL produced with GCC won't produce a .lib file and statically linking a DLL in MSVC requires one. Dynamically linking seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.
You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:
GCC:
EXPORTS
GetCCDLL=_Z8GetCCDLLv #1
MSVC:
EXPORTS
GetCCDLL=?GetCCDLL@@YAPAUCCDLL_v1@@XZ #1
Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.
This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.
Passing class objects to a function
This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.
Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.
//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries
#include <windows.h>   // GetProcessHeap, HeapAlloc, HeapFree
#include <cstddef>     // size_t

//define malloc/free replacements to make use of Windows heap APIs
namespace pod_helpers
{
    void* pod_malloc(size_t size)
    {
        HANDLE heapHandle = GetProcessHeap();
        void* storage = nullptr;
        if (heapHandle == nullptr)
        {
            return nullptr;
        }
        storage = HeapAlloc(heapHandle, 0, size);
        return storage;
    }

    void pod_free(void* ptr)
    {
        HANDLE heapHandle = GetProcessHeap();
        if (heapHandle == nullptr)
        {
            return;
        }
        if (ptr == nullptr)
        {
            return;
        }
        HeapFree(heapHandle, 0, ptr);
    }
}
//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
    pod();
    pod(const T& value);
    pod(const pod& copy);
    ~pod();

    pod<T>& operator=(pod<T> value);
    operator T() const;

    T get() const;
    void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)
//POD_basic_types.h: holds pod specializations for basic datatypes.
#include "POD_base.h"   // pod_helpers and the pod primary template
#include <cstdint>      // std::uint32_t
#include <new>          // placement new
#include <utility>      // std::swap

#pragma pack(push, 1)
template<>
class pod<unsigned int>
{
    //these are a couple of convenience typedefs that make the class easier to specialize and understand,
    //since the behind-the-scenes logic is almost entirely the same except for the underlying datatypes
    //in each specialization.
    typedef unsigned int  original_type;   // the type this specialization wraps
    typedef std::uint32_t safe_type;       // the fixed-size type that actually crosses the boundary

public:
    pod() : data(nullptr) {}

    pod(const original_type& value)
    {
        set_from(value);
    }

    pod(const pod<original_type>& copyVal)
    {
        original_type copyData = copyVal.get();
        set_from(copyData);
    }

    ~pod()
    {
        release();
    }

    pod<original_type>& operator=(pod<original_type> value)
    {
        swap(*this, value);
        return *this;
    }

    operator original_type() const
    {
        return get();
    }

protected:
    safe_type* data;

    original_type get() const
    {
        original_type result;
        result = static_cast<original_type>(*data);
        return result;
    }

    void set_from(const original_type& value)
    {
        //note the pod_malloc call here - we want our memory buffer to go in the process heap,
        //not the possibly-isolated DLL heap.
        data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type)));
        if (data == nullptr)
        {
            return;
        }
        new(data) safe_type(value);
    }

    void release()
    {
        if (data)
        {
            pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
            data = nullptr;
        }
    }

    void swap(pod<original_type>& first, pod<original_type>& second)
    {
        using std::swap;
        swap(first.data, second.data);
    }
};
#pragma pack(pop)
The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, uint will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded = and () operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic, since it's converted to an int8_t and then the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).
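For reference, here is a sketch of what that bool specialization could look like if you follow the same pattern; this is my reconstruction, not code from the original answer:

#pragma pack(push, 1)
template<>
class pod<bool>
{
    typedef bool        original_type;
    typedef std::int8_t safe_type;   // bool crosses the boundary as a fixed-size int8_t

public:
    pod() : data(nullptr) {}
    pod(const original_type& value) { set_from(value); }
    pod(const pod<original_type>& copyVal) { original_type copyData = copyVal.get(); set_from(copyData); }
    ~pod() { release(); }

    pod<original_type>& operator=(pod<original_type> value) { swap(*this, value); return *this; }
    operator original_type() const { return get(); }

protected:
    safe_type* data;

    original_type get() const
    {
        return *data != 0;                            // int8_t back to bool
    }

    void set_from(const original_type& value)
    {
        data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type)));
        if (data == nullptr)
        {
            return;
        }
        new(data) safe_type(value ? 1 : 0);           // bool to int8_t
    }

    void release()
    {
        if (data)
        {
            pod_helpers::pod_free(data);
            data = nullptr;
        }
    }

    void swap(pod<original_type>& first, pod<original_type>& second)
    {
        using std::swap;
        swap(first.data, second.data);
    }
};
#pragma pack(pop)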
We can also wrap STL types in this way, although it requires a little extra work:
#include <string>     // std::basic_string
#include <algorithm>  // std::copy
#include <iterator>   // std::back_inserter

#pragma pack(push, 1)
//double template ftw. We're specializing pod for std::basic_string, but we're making this specialization
//a template itself; this way we can support all the basic_string types without needing to create four
//separate specializations of pod.
template<typename charT>
class pod<std::basic_string<charT>>
{
    //more comfort typedefs
    typedef std::basic_string<charT> original_type;
    typedef charT safe_type;

public:
    pod() : data(nullptr), dataSize(0) {}

    pod(const original_type& value)
    {
        set_from(value);
    }

    pod(const charT* charValue)
    {
        original_type temp(charValue);
        set_from(temp);
    }

    pod(const pod<original_type>& copyVal)
    {
        original_type copyData = copyVal.get();
        set_from(copyData);
    }

    ~pod()
    {
        release();
    }

    pod<original_type>& operator=(pod<original_type> value)
    {
        swap(*this, value);
        return *this;
    }

    operator original_type() const
    {
        return get();
    }

protected:
    //this is almost the same as a basic type specialization, but we have to keep track of the number
    //of elements being stored within the basic_string as well as the elements themselves.
    safe_type* data;
    typename original_type::size_type dataSize;

    original_type get() const
    {
        original_type result;
        result.reserve(dataSize);
        std::copy(data, data + dataSize, std::back_inserter(result));
        return result;
    }

    void set_from(const original_type& value)
    {
        dataSize = value.size();
        data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));
        if (data == nullptr)
        {
            return;
        }
        //figure out where the data to copy starts and stops, then loop through the basic_string
        //and copy each element to our buffer.
        safe_type* dataIterPtr = data;
        safe_type* dataEndPtr = data + dataSize;
        typename original_type::const_iterator iter = value.begin();
        for (; dataIterPtr != dataEndPtr;)
        {
            new(dataIterPtr++) safe_type(*iter++);
        }
    }

    void release()
    {
        if (data)
        {
            pod_helpers::pod_free(data);
            data = nullptr;
            dataSize = 0;
        }
    }

    void swap(pod<original_type>& first, pod<original_type>& second)
    {
        using std::swap;
        swap(first.data, second.data);
        swap(first.dataSize, second.dataSize);
    }
};
#pragma pack(pop)
Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one method to figure out mangling for.
//CCDLL.h: defines a DLL interface for a pod-based DLL
struct CCDLL_v1
{
    virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};

CCDLL_v1* GetCCDLL();
This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:
struct CCDLL_v1_implementation : CCDLL_v1
{
    virtual void ShowMessage(const pod<std::wstring>* message) override;
};

CCDLL_v1* GetCCDLL()
{
    static CCDLL_v1_implementation* CCDLL = nullptr;
    if (!CCDLL)
    {
        CCDLL = new CCDLL_v1_implementation;
    }
    return CCDLL;
}
And now let's implement the ShowMessage function:
#include "CCDLL_implementation.h"

void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
    std::wstring workingMessage = *message;
    MessageBox(NULL, workingMessage.c_str(), TEXT("This is a cross-compiler message"), MB_OK);
}
Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a messagebox. After all, this is just a POC, not a full utility library.
Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)
Now for an EXE to call the DLL:
//main.cpp
#include <windows.h>   // LoadLibrary, GetProcAddress, FreeLibrary
#include "../CCDLL/CCDLL.h"

typedef CCDLL_v1* (__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;

int main()
{
    //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.
    HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll"));
    Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, (LPCSTR)"GetCCDLL");

    //This calls the DLL's GetCCDLL method, which is an alias to the mangled function.
    //By dynamically loading the DLL like this, we're completely bypassing the name mangling, exactly as expected.
    CCDLL_v1* CCDLL_lib;
    CCDLL_lib = Ptr_GetCCDLL();

    pod<std::wstring> message = TEXT("Hello world!");
    CCDLL_lib->ShowMessage(&message);

    FreeLibrary(ccdll); //unload the library when we're done with it
    return 0;
}
And here are the results. Our DLL works. We've successfully gotten past STL ABI issues, C++ ABI issues, and mangling issues, and our MSVC DLL is working with a GCC EXE.
In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.
@computerfreaker has written a great explanation of why the lack of an ABI prevents passing C++ objects across DLL boundaries in the general case, even when the type definitions are under user control and the exact same token sequence is used in both programs. (There are two cases which do work: standard-layout classes and pure interfaces.)
For object types defined in the C++ Standard (including those adapted from the Standard Template Library), the situation is far, far worse. The tokens defining these types are NOT the same across multiple compilers, as the C++ Standard does not provide a complete type definition, only minimum requirements. In addition, name lookup of the identifiers that appear in these type definitions doesn't resolve the same way. Even on systems where there is a C++ ABI, attempting to share such types across module boundaries results in massive undefined behavior due to One Definition Rule violations.
This is something that Linux programmers weren't accustomed to dealing with, because g++'s libstdc++ was a de-facto standard and virtually all programs used it, thus satisfying the ODR. clang's libc++ broke that assumption, and then C++11 came along with mandatory changes to nearly all Standard library types.
Just don't share Standard library types between modules. It's undefined behavior.
Some of the answers here make passing C++ classes sound really scary, but I'd like to share an alternate point of view. The pure virtual C++ method mentioned in some of the other responses actually turns out to be cleaner than you might think. I've built an entire plugin system around the concept and it's been working very well for years. I have a "PluginManager" class that dynamically loads the DLLs from a specified directory using LoadLib() and GetProcAddress() (and the Linux equivalents, so the same code works cross-platform).
Believe it or not, this method is forgiving even if you do some wacky stuff like add a new function at the end of your pure virtual interface and try to load dlls compiled against the interface without that new function - they'll load just fine. Of course... you'll have to check a version number to make sure your executable only calls the new function for newer dlls that implement the function. But the good news is: it works! So in a way, you have a crude method for evolving your interface over time.
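A sketch of that versioning idea (the interface and method names below are hypothetical):

struct IPlugin   // hypothetical pure virtual interface shared with plugins
{
    virtual int  GetInterfaceVersion() = 0;   // plugins report which version they were built against
    virtual void DoWork() = 0;                // present since version 1
    virtual void DoNewThing() = 0;            // appended for version 2; new methods only ever go at the end
};

// Host side, after loading a plugin:
//     if (plugin->GetInterfaceVersion() >= 2)
//         plugin->DoNewThing();   // only call the newer vtable slot when the plugin actually provides it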
Another cool thing about pure virtual interfaces - you can inherit as many interfaces as you want and you'll never run into the diamond problem!
I would say the biggest downside to this approach is that you have to be very careful about what types you pass as parameters. No classes or STL objects without wrapping them with pure virtual interfaces first. No structs (without going through the pragma pack voodoo). Just primitive types and pointers to other interfaces. Also, you can't overload functions, which is an inconvenience, but not a show-stopper.
The good news is that with a handful of lines of code you can make reusable generic classes and interfaces to wrap STL strings, vectors, and other container classes. Alternatively, you can add functions to your interface like GetCount() and GetVal(n) to let people loop through lists.
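For instance, a minimal sketch (with hypothetical names) of a pure virtual wrapper that stands in for a list of strings:

struct IStringList   // hypothetical: only primitive types appear in the exported signatures
{
    virtual int         GetCount() const = 0;
    virtual const char* GetVal(int n) const = 0;   // pointer owned by the implementation
};

// Inside the module that implements it, a std::vector<std::string> (or anything else)
// can back this interface; the STL type itself never crosses the boundary.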
People building plugins for us find it quite easy. They don't have to be experts on the ABI boundary or anything - they just inherit the interfaces they're interested in, code up the functions they support, and return false for the ones they don't.
The technology that makes all this work isn't based on any standard as far as I know. From what I gather, Microsoft decided to do their virtual tables that way so they could make COM, and other compiler writers decided to follow suit. This includes GCC, Intel, Borland, and most other major C++ compilers. If you're planning on using an obscure embedded compiler then this approach probably won't work for you. Theoretically any compiler company could change their virtual tables at any time and break things, but considering the massive amount of code written over the years that depends on this technology, I would be very surprised if any of the major players decided to break rank.
So the moral of the story is... With the exception of a few extreme circumstances, you need one person in charge of the interfaces who can make sure the ABI boundary stays clean with primitive types and avoids overloading. If you are OK with that stipulation, then I wouldn't be afraid to share interfaces to classes in DLLs/SOs between compilers. Sharing classes directly == trouble, but sharing pure virtual interfaces isn't so bad.
You cannot safely pass STL objects across DLL boundaries, unless all the modules (.EXE and .DLLs) are built with the same C++ compiler version and the same settings and flavors of the CRT, which is highly constraining, and clearly not your case.
If you want to expose an object-oriented interface from your DLL, you should expose C++ pure interfaces (which is similar to what COM does). Consider reading this interesting article on CodeProject:
HowTo: Export C++ classes from a DLL
You may also want to consider exposing a pure C interface at the DLL boundary, and then building a C++ wrapper at the caller site.
This is similar to what happens in Win32: Win32 implementation code is almost C++, but lots of Win32 APIs expose a pure C interface (there are also APIs that expose COM interfaces). Then ATL/WTL and MFC wrap these pure C interfaces with C++ classes and objects.
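A sketch of that layering, with made-up names: the DLL exports plain C, and the caller builds a small C++ wrapper entirely on its own side of the boundary:

// Exported by the DLL with a plain C ABI:
extern "C" {
    typedef struct Parser Parser;                  /* opaque handle */
    Parser* Parser_Create(void);
    void    Parser_Destroy(Parser* p);
    int     Parser_Feed(Parser* p, const char* data, int len);
}

// Written by the caller and compiled into the EXE with the caller's own compiler:
#include <string>

class ParserWrapper
{
public:
    ParserWrapper() : handle_(Parser_Create()) {}
    ~ParserWrapper() { Parser_Destroy(handle_); }
    ParserWrapper(const ParserWrapper&) = delete;             // ownership is unique; copying disabled
    ParserWrapper& operator=(const ParserWrapper&) = delete;

    int feed(const std::string& s)
    {
        return Parser_Feed(handle_, s.data(), static_cast<int>(s.size()));
    }

private:
    Parser* handle_;
};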

How to return C++ class when using extern "C"? [duplicate]


Dead virtual function elimination

Question
(Can I get clang or perhaps some other optimizing tool shipped with LLVM to identify unused virtual functions in a C++ program, to mark them for dead code elimination? I guess not.)
If there is no such functionality shipped with LLVM, how would one go about implementing a thing like this? What's the most appropriate layer to achieve this, and where can I find examples on which I could build this?
Thoughts
My first thought was an optimizer working on LLVM bitcode or IR. After all, a lot of optimizers are written for that representation. Simple dead code elimination is easy enough: any function which is neither called nor has its address taken and stored somewhere is dead code and can be omitted from the final binary. But a virtual function has its address taken and stored in the virtual function table of the corresponding class. In order to identify whether that function has a chance of getting called, an optimizer would not only have to identify all virtual function calls, but also identify the type hierarchy to map these virtual function calls to all possible implementations.
This makes things look quite hard to tackle at the bitcode level. It might be better to handle this somewhere closer to the front end, at a stage where more type information is available, and where calls to a virtual function might be more readily associated with implementations of these functions. Perhaps the VirtualCallChecker could serve as a starting point.
One problem is probably the fact that while it's possible to combine the bitcode of several objects into a single unit for link time optimization, one hardly ever compiles all the source code of a moderately sized project as a single translation unit. So the association between virtual function calls and implementations might have to be somehow maintained till that stage. I don't know if any kind of custom annotation is possible with LLVM; I have seen no indication of this in the language specification.
But I'm having a bit of trouble with the language specification in any case. The only references to virtual in there are the virtuality and virtualIndex properties of MDSubprogram, and so far I have found no information at all about their semantics: no documentation, nor any useful places inside the LLVM source code. I might be looking at the wrong documentation for my use case.
Cross references
"eliminate unused virtual functions" asked about pretty much the same thing in the context of GCC, but I'm specifically looking for an LLVM solution here. There used to be a -fvtable-gc switch in GCC, but apparently it was too buggy and got punted, and clang doesn't support it either.
Example:
struct foo {
    virtual ~foo() { }
    virtual int a() { return 12345001; }
    virtual int b() { return 12345002; }
};

struct bar : public foo {
    virtual ~bar() { }
    virtual int a() { return 12345003; }
    virtual int b() { return 12345004; }
};

int main(int argc, char** argv) {
    foo* p = (argc & 1 ? new foo() : new bar());
    int res = p->a();
    delete p;
    return res;
}
How can I write a tool to automatically get rid of foo::b() and bar::b() in the generated code?
clang++ -fuse-ld=gold -O3 -flto with clang 3.5.1 wasn't enough, as an objdump -d -C of the resulting executable showed.
Question focus changed
Originally I had been asking not only about how to use clang or LLVM to this effect, but possibly for third party tools to achieve the same if clang and LLVM were not up to the task. Questions asking for tools are frowned upon here, though, so by now the focus has shifted from finding a tool to writing one. I guess chances for finding one are slim in any case, since a web search revealed no hints in that direction.

How do I safely pass objects, especially STL objects, to and from a DLL?

How do I pass class objects, especially STL objects, to and from a C++ DLL?
My application has to interact with third-party plugins in the form of DLL files, and I can't control what compiler these plugins are built with. I'm aware that there's no guaranteed ABI for STL objects, and I'm concerned about causing instability in my application.
The short answer to this question is don't. Because there's no standard C++ ABI (application binary interface, a standard for calling conventions, data packing/alignment, type size, etc.), you will have to jump through a lot of hoops to try and enforce a standard way of dealing with class objects in your program. There's not even a guarantee it'll work after you jump through all those hoops, nor is there a guarantee that a solution which works in one compiler release will work in the next.
Just create a plain C interface using extern "C", since the C ABI is well-defined and stable.
If you really, really want to pass C++ objects across a DLL boundary, it's technically possible. Here are some of the factors you'll have to account for:
Data packing/alignment
Within a given class, individual data members will usually be specially placed in memory so their addresses correspond to a multiple of the type's size. For example, an int might be aligned to a 4-byte boundary.
If your DLL is compiled with a different compiler than your EXE, the DLL's version of a given class might have different packing than the EXE's version, so when the EXE passes the class object to the DLL, the DLL might be unable to properly access a given data member within that class. The DLL would attempt to read from the address specified by its own definition of the class, not the EXE's definition, and since the desired data member is not actually stored there, garbage values would result.
You can work around this using the #pragma pack preprocessor directive, which will force the compiler to apply specific packing. The compiler will still apply default packing if you select a pack value bigger than the one the compiler would have chosen, so if you pick a large packing value, a class can still have different packing between compilers. The solution for this is to use #pragma pack(1), which will force the compiler to align data members on a one-byte boundary (essentially, no packing will be applied). This is not a great idea, as it can cause performance issues or even crashes on certain systems. However, it will ensure consistency in the way your class's data members are aligned in memory.
Member reordering
If your class is not standard-layout, the compiler can rearrange its data members in memory. There is no standard for how this is done, so any data rearranging can cause incompatibilities between compilers. Passing data back and forth to a DLL will require standard-layout classes, therefore.
Calling convention
There are multiple calling conventions a given function can have. These calling conventions specify how data is to be passed to functions: are parameters stored in registers or on the stack? What order are arguments pushed onto the stack? Who cleans up any arguments left on the stack after the function finishes?
It's important you maintain a standard calling convention; if you declare a function as _cdecl, the default for C++, and try to call it using _stdcall bad things will happen. _cdecl is the default calling convention for C++ functions, however, so this is one thing that won't break unless you deliberately break it by specifying an _stdcall in one place and a _cdecl in another.
Datatype size
According to this documentation, on Windows, most fundamental datatypes have the same sizes regardless of whether your app is 32-bit or 64-bit. However, since the size of a given datatype is enforced by the compiler, not by any standard (all the standard guarantees is that 1 == sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)), it's a good idea to use fixed-size datatypes to ensure datatype size compatibility where possible.
Heap issues
If your DLL links to a different version of the C runtime than your EXE, the two modules will use different heaps. This is an especially likely problem given that the modules are being compiled with different compilers.
To mitigate this, all memory will have to be allocated into a shared heap, and deallocated from the same heap. Fortunately, Windows provides APIs to help with this: GetProcessHeap will let you access the host EXE's heap, and HeapAlloc/HeapFree will let you allocate and free memory within this heap. It is important that you not use normal malloc/free as there is no guarantee they will work the way you expect.
STL issues
The C++ standard library has its own set of ABI issues. There is no guarantee that a given STL type is laid out the same way in memory, nor is there a guarantee that a given STL class has the same size from one implementation to another (in particular, debug builds may put extra debug information into a given STL type). Therefore, any STL container will have to be unpacked into fundamental types before being passed across the DLL boundary and repacked on the other side.
Name mangling
Your DLL will presumably export functions which your EXE will want to call. However, C++ compilers do not have a standard way of mangling function names. This means a function named GetCCDLL might be mangled to _Z8GetCCDLLv in GCC and ?GetCCDLL##YAPAUCCDLL_v1##XZ in MSVC.
You already won't be able to guarantee static linking to your DLL, since a DLL produced with GCC won't produce a .lib file and statically linking a DLL in MSVC requires one. Dynamically linking seems like a much cleaner option, but name mangling gets in your way: if you try to GetProcAddress the wrong mangled name, the call will fail and you won't be able to use your DLL. This requires a little bit of hackery to get around, and is a fairly major reason why passing C++ classes across a DLL boundary is a bad idea.
You'll need to build your DLL, then examine the produced .def file (if one is produced; this will vary based on your project options) or use a tool like Dependency Walker to find the mangled name. Then, you'll need to write your own .def file, defining an unmangled alias to the mangled function. As an example, let's use the GetCCDLL function I mentioned a bit further up. On my system, the following .def files work for GCC and MSVC, respectively:
GCC:
EXPORTS
GetCCDLL=_Z8GetCCDLLv @1
MSVC:
EXPORTS
GetCCDLL=?GetCCDLL@@YAPAUCCDLL_v1@@XZ @1
Rebuild your DLL, then re-examine the functions it exports. An unmangled function name should be among them. Note that you cannot use overloaded functions this way: the unmangled function name is an alias for one specific function overload as defined by the mangled name. Also note that you'll need to create a new .def file for your DLL every time you change the function declarations, since the mangled names will change. Most importantly, by bypassing the name mangling, you're overriding any protections the linker is trying to offer you with regards to incompatibility issues.
This whole process is simpler if you create an interface for your DLL to follow, since you'll just have one function to define an alias for instead of needing to create an alias for every function in your DLL. However, the same caveats still apply.
Passing class objects to a function
This is probably the most subtle and most dangerous of the issues that plague cross-compiler data passing. Even if you handle everything else, there's no standard for how arguments are passed to a function. This can cause subtle crashes with no apparent reason and no easy way to debug them. You'll need to pass all arguments via pointers, including buffers for any return values. This is clumsy and inconvenient, and is yet another hacky workaround that may or may not work.
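A sketch of what "everything through pointers, including return buffers" tends to look like in practice; MakeLabel and its parameters are illustrative names of my own, not part of the code that follows:
#include <cstddef>
//instead of: std::wstring MakeLabel(int id);
//a boundary-safe signature passes inputs and the output buffer by pointer
extern "C" __declspec(dllexport) int __cdecl MakeLabel(
    const int* id,       //input, passed by pointer
    wchar_t* outBuffer,  //caller-owned buffer that receives the result
    size_t* inOutLength) //in: buffer capacity, out: characters written
{
    //...fill outBuffer from *id, update *inOutLength, return an error code...
    return 0;
}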
Putting together all these workarounds and building on some creative work with templates and operators, we can attempt to safely pass objects across a DLL boundary. Note that C++11 support is mandatory, as is support for #pragma pack and its variants; MSVC 2013 offers this support, as do recent versions of GCC and clang.
//POD_base.h: defines a template base class that wraps and unwraps data types for safe passing across compiler boundaries
#include <windows.h> //GetProcessHeap, HeapAlloc, HeapFree
#include <cstddef> //size_t
//define malloc/free replacements to make use of Windows heap APIs
namespace pod_helpers
{
void* pod_malloc(size_t size)
{
HANDLE heapHandle = GetProcessHeap();
if (heapHandle == nullptr)
{
return nullptr;
}
return HeapAlloc(heapHandle, 0, size); //HeapAlloc returns void*, so no cast is needed here
}
void pod_free(void* ptr)
{
HANDLE heapHandle = GetProcessHeap();
if (heapHandle == nullptr)
{
return;
}
if (ptr == nullptr)
{
return;
}
HeapFree(heapHandle, 0, ptr);
}
}
//define a template base class. We'll specialize this class for each datatype we want to pass across compiler boundaries.
#pragma pack(push, 1)
// All members are protected, because the class *must* be specialized
// for each type
template<typename T>
class pod
{
protected:
pod();
pod(const T& value);
pod(const pod& copy);
~pod();
pod<T>& operator=(pod<T> value);
operator T() const;
T get() const;
void swap(pod<T>& first, pod<T>& second);
};
#pragma pack(pop)
//POD_basic_types.h: holds pod specializations for basic datatypes.
#include "POD_base.h"
#include <cstdint> //std::int32_t and the other fixed-width types
#include <new> //placement new
#include <utility> //std::swap
#pragma pack(push, 1)
template<>
class pod<int>
{
//these are a couple of convenience typedefs that make the class easier to specialize and understand, since the behind-the-scenes logic is almost entirely the same except for the underlying datatypes in each specialization.
typedef int original_type;
typedef std::int32_t safe_type;
public:
pod() : data(nullptr) {}
pod(const original_type& value)
{
set_from(value);
}
pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}
~pod()
{
release();
}
pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);
return *this;
}
operator original_type() const
{
return get();
}
protected:
safe_type* data;
original_type get() const
{
original_type result;
result = static_cast<original_type>(*data);
return result;
}
void set_from(const original_type& value)
{
data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type))); //note the pod_malloc call here - we want our memory buffer to go in the process heap, not the possibly-isolated DLL heap.
if (data == nullptr)
{
return;
}
new(data) safe_type (value);
}
void release()
{
if (data)
{
pod_helpers::pod_free(data); //pod_free to go with the pod_malloc.
data = nullptr;
}
}
void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;
swap(first.data, second.data);
}
};
#pragma pack(pop)
The pod class is specialized for every basic datatype, so that int will automatically be wrapped to int32_t, uint will be wrapped to uint32_t, etc. This all occurs behind the scenes, thanks to the overloaded assignment and conversion operators. I have omitted the rest of the basic type specializations since they're almost entirely the same except for the underlying datatypes (the bool specialization has a little bit of extra logic, since it's converted to an int8_t and then the int8_t is compared to 0 to convert back to bool, but this is fairly trivial).
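Since the bool specialization is only described above, here is a hedged sketch of what it might look like; this is my reconstruction following the same pattern, not the author's omitted code:
#pragma pack(push, 1)
template<>
class pod<bool>
{
    typedef bool original_type;
    typedef std::int8_t safe_type; //bool is stored as a fixed-width int8_t
public:
    pod() : data(nullptr) {}
    pod(const original_type& value) { set_from(value); }
    pod(const pod<original_type>& copyVal)
    {
        original_type copyData = copyVal.get();
        set_from(copyData);
    }
    ~pod() { release(); }
    pod<original_type>& operator=(pod<original_type> value)
    {
        swap(*this, value);
        return *this;
    }
    operator original_type() const { return get(); }
protected:
    safe_type* data;
    original_type get() const
    {
        return *data != 0; //compare to 0 to convert back to bool
    }
    void set_from(const original_type& value)
    {
        data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type)));
        if (data == nullptr) { return; }
        new(data) safe_type(value ? 1 : 0);
    }
    void release()
    {
        if (data)
        {
            pod_helpers::pod_free(data);
            data = nullptr;
        }
    }
    void swap(pod<original_type>& first, pod<original_type>& second)
    {
        using std::swap;
        swap(first.data, second.data);
    }
};
#pragma pack(pop)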
We can also wrap STL types in this way, although it requires a little extra work:
#pragma pack(push, 1)
template<typename charT>
class pod<std::basic_string<charT>> //double template ftw. We're specializing pod for std::basic_string, but we're making this specialization able to be specialized for different types; this way we can support all the basic_string types without needing to create four specializations of pod.
{
//more comfort typedefs
typedef std::basic_string<charT> original_type;
typedef charT safe_type;
public:
pod() : data(nullptr), dataSize(0) {}
pod(const original_type& value)
{
set_from(value);
}
pod(const charT* charValue)
{
original_type temp(charValue);
set_from(temp);
}
pod(const pod<original_type>& copyVal)
{
original_type copyData = copyVal.get();
set_from(copyData);
}
~pod()
{
release();
}
pod<original_type>& operator=(pod<original_type> value)
{
swap(*this, value);
return *this;
}
operator original_type() const
{
return get();
}
protected:
//this is almost the same as a basic type specialization, but we have to keep track of the number of elements being stored within the basic_string as well as the elements themselves.
safe_type* data;
typename original_type::size_type dataSize;
original_type get() const
{
original_type result;
result.reserve(dataSize);
std::copy(data, data + dataSize, std::back_inserter(result));
return result;
}
void set_from(const original_type& value)
{
dataSize = value.size();
data = reinterpret_cast<safe_type*>(pod_helpers::pod_malloc(sizeof(safe_type) * dataSize));
if (data == nullptr)
{
return;
}
//figure out where the data to copy starts and stops, then loop through the basic_string and copy each element to our buffer.
safe_type* dataIterPtr = data;
safe_type* dataEndPtr = data + dataSize;
typename original_type::const_iterator iter = value.begin();
for (; dataIterPtr != dataEndPtr;)
{
new(dataIterPtr++) safe_type(*iter++);
}
}
void release()
{
if (data)
{
pod_helpers::pod_free(data);
data = nullptr;
dataSize = 0;
}
}
void swap(pod<original_type>& first, pod<original_type>& second)
{
using std::swap;
swap(first.data, second.data);
swap(first.dataSize, second.dataSize);
}
};
#pragma pack(pop)
Now we can create a DLL that makes use of these pod types. First we need an interface, so we'll only have one method to figure out mangling for.
//CCDLL.h: defines a DLL interface for a pod-based DLL
struct CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) = 0;
};
CCDLL_v1* GetCCDLL();
This just creates a basic interface both the DLL and any callers can use. Note that we're passing a pointer to a pod, not a pod itself. Now we need to implement that on the DLL side:
struct CCDLL_v1_implementation: CCDLL_v1
{
virtual void ShowMessage(const pod<std::wstring>* message) override;
};
CCDLL_v1* GetCCDLL()
{
static CCDLL_v1_implementation* CCDLL = nullptr;
if (!CCDLL)
{
CCDLL = new CCDLL_v1_implementation;
}
return CCDLL;
}
And now let's implement the ShowMessage function:
#include "CCDLL_implementation.h"
void CCDLL_v1_implementation::ShowMessage(const pod<std::wstring>* message)
{
std::wstring workingMessage = *message;
MessageBox(NULL, workingMessage.c_str(), TEXT("This is a cross-compiler message"), MB_OK);
}
Nothing too fancy: this just copies the passed pod into a normal wstring and shows it in a messagebox. After all, this is just a POC, not a full utility library.
Now we can build the DLL. Don't forget the special .def files to work around the linker's name mangling. (Note: the CCDLL struct I actually built and ran had more functions than the one I present here. The .def files may not work as expected.)
Now for an EXE to call the DLL:
//main.cpp
#include "../CCDLL/CCDLL.h"
typedef CCDLL_v1*(__cdecl* fnGetCCDLL)();
static fnGetCCDLL Ptr_GetCCDLL = NULL;
int main()
{
HMODULE ccdll = LoadLibrary(TEXT("D:\\Programming\\C++\\CCDLL\\Debug_VS\\CCDLL.dll")); //I built the DLL with Visual Studio and the EXE with GCC. Your paths may vary.
Ptr_GetCCDLL = (fnGetCCDLL)GetProcAddress(ccdll, (LPCSTR)"GetCCDLL");
CCDLL_v1* CCDLL_lib;
CCDLL_lib = Ptr_GetCCDLL(); //This calls the DLL's GetCCDLL method, which is an alias to the mangled function. By dynamically loading the DLL like this, we're completely bypassing the name mangling, exactly as expected.
pod<std::wstring> message = TEXT("Hello world!");
CCDLL_lib->ShowMessage(&message);
FreeLibrary(ccdll); //unload the library when we're done with it
return 0;
}
And here are the results. Our DLL works. We've successfully worked around the STL ABI issues, the C++ ABI issues, and the mangling issues, and our MSVC DLL is working with a GCC EXE.
In conclusion, if you absolutely must pass C++ objects across DLL boundaries, this is how you do it. However, none of this is guaranteed to work with your setup or anyone else's. Any of this may break at any time, and probably will break the day before your software is scheduled to have a major release. This path is full of hacks, risks, and general idiocy that I probably should be shot for. If you do go this route, please test with extreme caution. And really... just don't do this at all.
@computerfreaker has written a great explanation of why the lack of an ABI prevents passing C++ objects across DLL boundaries in the general case, even when the type definitions are under user control and the exact same token sequence is used in both programs. (There are two cases which do work: standard-layout classes and pure interfaces.)
For object types defined in the C++ Standard (including those adapted from the Standard Template Library), the situation is far, far worse. The tokens defining these types are NOT the same across multiple compilers, as the C++ Standard does not provide a complete type definition, only minimum requirements. In addition, name lookup of the identifiers that appear in these type definitions doesn't resolve the same way. Even on systems where there is a C++ ABI, attempting to share such types across module boundaries results in massive undefined behavior due to One Definition Rule violations.
This is something that Linux programmers weren't accustomed to dealing with, because g++'s libstdc++ was a de-facto standard and virtually all programs used it, thus satisfying the ODR. clang's libc++ broke that assumption, and then C++11 came along with mandatory changes to nearly all Standard library types.
Just don't share Standard library types between modules. It's undefined behavior.
Some of the answers here make passing C++ classes sound really scary, but I'd like to share an alternate point of view. The pure virtual C++ method mentioned in some of the other responses actually turns out to be cleaner than you might think. I've built an entire plugin system around the concept and it's been working very well for years. I have a "PluginManager" class that dynamically loads the DLLs from a specified directory using LoadLibrary() and GetProcAddress() (and their Linux equivalents, so the executable is cross-platform).
Believe it or not, this method is forgiving even if you do some wacky stuff like add a new function at the end of your pure virtual interface and try to load dlls compiled against the interface without that new function - they'll load just fine. Of course... you'll have to check a version number to make sure your executable only calls the new function for newer dlls that implement the function. But the good news is: it works! So in a way, you have a crude method for evolving your interface over time.
Another cool thing about pure virtual interfaces - you can inherit as many interfaces as you want and you'll never run into the diamond problem!
I would say the biggest downside to this approach is that you have to be very careful about what types you pass as parameters. No classes or STL objects without wrapping them with pure virtual interfaces first. No structs (without going through the pragma pack voodoo). Just primitive types and pointers to other interfaces. Also, you can't overload functions, which is an inconvenience, but not a show-stopper.
The good news is that with a handful of lines of code you can make reusable generic classes and interfaces to wrap STL strings, vectors, and other container classes. Alternatively, you can add functions to your interface like GetCount() and GetVal(n) to let people loop through lists.
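For illustration, here is a minimal sketch of such a wrapper interface; IStringList and its methods are made-up names in the spirit of the GetCount()/GetVal(n) idea above, not part of any real library:
//a pure virtual interface: no STL types, no data members, just primitives
//and pointers, so it is safe to hand across the DLL/SO boundary
class IStringList
{
public:
    virtual int GetCount() const = 0;
    virtual const char* GetVal(int index) const = 0; //pointer owned by the module that created the list
    virtual void Release() = 0; //let the module that allocated the object free it
protected:
    ~IStringList() {} //non-public, non-virtual: destruction goes through Release()
};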
People building plugins for us find it quite easy. They don't have to be experts on the ABI boundary or anything - they just inherit the interfaces they're interested in, code up the functions they support, and return false for the ones they don't.
The technology that makes all this work isn't based on any standard as far as I know. From what I gather, Microsoft decided to do their virtual tables that way so they could make COM, and other compiler writers decided to follow suit. This includes GCC, Intel, Borland, and most other major C++ compilers. If you're planning on using an obscure embedded compiler then this approach probably won't work for you. Theoretically any compiler company could change their virtual tables at any time and break things, but considering the massive amount of code written over the years that depends on this technology, I would be very surprised if any of the major players decided to break rank.
So the moral of the story is... With the exception of a few extreme circumstances, you need one person in charge of the interfaces who can make sure the ABI boundary stays clean with primitive types and avoids overloading. If you are OK with that stipulation, then I wouldn't be afraid to share interfaces to classes in DLLs/SOs between compilers. Sharing classes directly == trouble, but sharing pure virtual interfaces isn't so bad.
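To make the pattern concrete, here is a hedged sketch of the approach described in this answer; the IPlugin interface, CreatePlugin factory, and LoadPluginFromDll helper are illustrative names I've chosen, not the answer's actual plugin system:
//plugin_api.h, shared between the host and every plugin
class IPlugin
{
public:
    virtual int GetApiVersion() const = 0; //bumped whenever new functions are appended
    virtual const char* GetName() const = 0;
    virtual bool DoWork(int jobId) = 0;
    //newer functions are only ever appended here, never inserted or reordered
};
//each plugin exports one unmangled factory (defined and dllexported in the plugin itself)
extern "C" IPlugin* CreatePlugin();
//host side, Windows flavour; dlopen/dlsym would replace these calls on Linux
#include <windows.h>
typedef IPlugin* (*fnCreatePlugin)();
IPlugin* LoadPluginFromDll(const wchar_t* path)
{
    HMODULE module = LoadLibraryW(path);
    if (!module) { return nullptr; }
    fnCreatePlugin create = (fnCreatePlugin)GetProcAddress(module, "CreatePlugin");
    if (!create) { FreeLibrary(module); return nullptr; }
    //before calling anything appended after the first release of the interface,
    //the host checks GetApiVersion() against the version that introduced the call
    return create();
}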
You cannot safely pass STL objects across DLL boundaries, unless all the modules (.EXE and .DLLs) are built with the same C++ compiler version and the same settings and flavors of the CRT, which is highly constraining, and clearly not your case.
If you want to expose an object-oriented interface from your DLL, you should expose C++ pure interfaces (which is similar to what COM does). Consider reading this interesting article on CodeProject:
HowTo: Export C++ classes from a DLL
You may also want to consider exposing a pure C interface at the DLL boundary, and then building a C++ wrapper at the caller site.
This is similar to what happens in Win32: Win32 implementation code is almost C++, but lots of Win32 APIs expose a pure C interface (there are also APIs that expose COM interfaces). Then ATL/WTL and MFC wrap these pure C interfaces with C++ classes and objects.
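Here is a minimal sketch of that pure-C-boundary-plus-C++-wrapper idea; the Widget handle and functions are invented for illustration:
//flat C API exported by the DLL: an opaque handle plus free functions
extern "C" {
typedef struct Widget_t* WidgetHandle;
__declspec(dllexport) WidgetHandle __cdecl Widget_Create(int size);
__declspec(dllexport) void __cdecl Widget_Destroy(WidgetHandle w);
__declspec(dllexport) int __cdecl Widget_GetSize(WidgetHandle w);
}
//caller-side C++ wrapper: RAII and a friendlier interface, built entirely
//in the caller's own compiler, so no C++ ABI ever crosses the boundary
class Widget
{
public:
    explicit Widget(int size) : handle_(Widget_Create(size)) {}
    ~Widget() { Widget_Destroy(handle_); }
    int size() const { return Widget_GetSize(handle_); }
    Widget(const Widget&) = delete;            //the handle owns the resource,
    Widget& operator=(const Widget&) = delete; //so keep the wrapper non-copyable
private:
    WidgetHandle handle_;
};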

Do repetitive calls to member functions hurt?

I have programmed in both Java and C, and now I am trying to get my hands dirty with C++.
Given this code:
class Booth {
private :
int tickets_sold;
public :
int get_tickets_sold();
void set_tickets_sold();
};
In Java, wherever I needed the value of tickets_sold, I would call the getter repeatedly.
For example:
if (obj.get_tickets_sold() > 50 && obj.get_tickets_sold() < 75){
//do something
}
In C I would just get the value of the particular variable in the structure:
if( obj_t->tickets_sold > 50 && obj_t->tickets_sold < 75){
//do something
}
So while using structures in C, I save on the two calls that I would otherwise make in Java (the two getters, that is); I am not even sure whether those are actual calls or whether Java somehow inlines them.
My point is if I use the same technique that I used in Java in C++ as well, will those two calls to getter member functions cost me, or will the compiler somehow know to inline the code? (thus reducing the overhead of function call altogether?)
Alternatively, am I better off using:
int num_tickets = 0;
if ( (num_tickets = obj.get_tickets_sold()) > 50 && num_tickets < 75){
//do something
}
I want to write tight code and avoid unnecessary function calls; I would care about this in Java because, well, we all know why. But I want my code to be readable and to use the private and public keywords to correctly reflect what is to be done.
Unless your program is too slow, it doesn't really matter. In 99.9999% of code, the overhead of a function call is insignificant. Write the clearest, easiest to maintain, easiest to understand code that you can and only start tweaking for performance after you know where your performance hot spots are, if you have any at all.
That said, modern C++ compilers (and some linkers) can and will inline functions, especially simple functions like this one.
If you're just learning the language, you really shouldn't worry about this. Consider it fast enough until proven otherwise. That said, there are a lot of misleading or incomplete answers here, so for the record I'll flesh out a few of the subtler implications. Consider your class:
class Booth
{
public:
int get_tickets_sold();
void set_tickets_sold();
private:
int tickets_sold;
};
The implementation (known as a definition) of the get and set functions is not yet specified. If you'd specified function bodies inside the class declaration then the compiler would consider you to have implicitly requested they be inlined (but it may ignore that if they're excessively large). If you specify them later using the inline keyword, that has exactly the same effect. Summarily...
class Booth
{
public:
int get_tickets_sold() { return tickets_sold; }
...
...and...
class Booth
{
public:
int get_tickets_sold();
...
};
inline int Booth::get_tickets_sold() { return tickets_sold; }
...are equivalent (at least in terms of what the Standard encourages us to expect, but individual compiler heuristics may vary - inlining is a request that the compiler's free to ignore).
If the function bodies are specified later without the inline keyword, then the compiler is under no obligation to inline them, but may still choose to do so. It's much more likely to do so if they appear in the same translation unit (i.e. in the .cc/.cpp/.c++/etc. "implementation" file you're compiling or some header directly or indirectly included by it). If the implementation is only available at link time then the functions may not be inlined at all, but it depends on the way your particular compiler and linker interact and cooperate. It is not simply a matter of enabling optimisation and expecting magic. To prove this, consider the following code:
// inline.h:
void f();
// inline.cc:
#include <cstdio>
void f() { printf("f()\n"); }
// inline_app.cc:
#include "inline.h"
int main() { f(); }
Building this:
g++ -O4 -c inline.cc
g++ -O4 -o inline_app inline_app.cc inline.o
Investigating the inlining:
$ gdb inline_app
...
(gdb) break main
Breakpoint 1 at 0x80483f3
(gdb) break f
Breakpoint 2 at 0x8048416
(gdb) run
Starting program: /home/delroton/dev/inline_app
Breakpoint 1, 0x080483f3 in main ()
(gdb) next
Single stepping until exit from function main,
which has no line number information.
Breakpoint 2, 0x08048416 in f ()
(gdb) step
Single stepping until exit from function _Z1fv,
which has no line number information.
f()
0x080483fb in main ()
(gdb)
Notice the execution went from 0x080483f3 in main() to 0x08048416 in f() then back to 0x080483fb in main()... clearly not inlined. This illustrates that inlining can't be expected just because a function's implementation is trivial.
Notice that this example is with static linking of object files. Clearly, if you use library files you may actually want to avoid inlining of the functions specifically so that you can update the library without having to recompile the client code. It's even more useful for shared libraries where the linking is done implicitly at load time anyway.
Very often, classes providing trivial functions use the two forms of expected-inlined function definitions (i.e. inside class or with inline keyword) if those functions can be expected to be called inside any performance-critical loops, but the countering consideration is that by inlining a function you force client code to be recompiled (relatively slow, possibly no automated trigger) and relinked (fast, for shared libraries happens on next execution), rather than just relinked, in order to pick up changes to the function implementation.
These kind of considerations are annoying, but deliberate management of these tradeoffs is what allows enterprise use of C and C++ to scale to tens and hundreds of millions of lines and thousands of individual projects, all sharing various libraries over decades.
One other small detail: as a ballpark figure, an out-of-line get/set function is typically about an order of magnitude (10x) slower than the equivalent inlined code. That will obviously vary with CPU, compiler, optimisation level, variable type, cache hits/misses etc..
No, repetitive calls to member functions will not hurt.
If it's just a getter function, it will almost certainly be inlined by the C++ compiler (at least with release/optimized builds) and the Java Virtual Machine may "figure out" that a certain function is being called frequently and optimize for that. So there's pretty much no performance penalty for using functions in general.
You should always code for readability first. Of course, that's not to say that you should completely ignore performance outright, but if performance is unacceptable then you can always profile your code and see where the slowest parts are.
Also, by restricting access to the tickets_sold variable behind getter functions, you can pretty much guarantee that the only code that can modify the tickets_sold variable is the member functions of Booth. This allows you to enforce invariants in program behavior.
For example, tickets_sold is obviously not going to be a negative value. That is an invariant of the structure. You can enforce that invariant by making tickets_sold private and making sure your member functions do not violate that invariant. The Booth class makes tickets_sold available as a "read-only data member" via a getter function to everyone else and still preserves the invariant.
Making it a public variable means that anybody can go and trample over the data in tickets_sold, which completely destroys your ability to enforce any invariants on it. That makes it possible for someone to write a negative number into tickets_sold, which is of course nonsensical.
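For instance, a sketch of how the setter could enforce that invariant (the int parameter is my own addition; the class in the question leaves the setter's signature unspecified):
class Booth
{
public:
    int get_tickets_sold() const { return tickets_sold; }
    void set_tickets_sold(int count)
    {
        if (count >= 0)   //silently reject nonsensical values...
        {
            tickets_sold = count;
        }
        //...or assert/throw here, depending on how you want to surface misuse
    }
private:
    int tickets_sold;
};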
The compiler is very likely to inline function calls like this.
class Booth {
public:
int get_tickets_sold() const { return tickets_sold; }
private:
int tickets_sold;
};
Your compiler should inline get_tickets_sold, I would be very surprised if it didn't. If not, you either need to use a new compiler or turn on optimizations.
Any compiler worth its salt will easily optimize the getters into direct member access. The only times that won't happen are when you have optimization explicitly disabled (e.g. for a debug build) or if you're using a brain-dead compiler (in which case, you should seriously consider ditching it for a real compiler).
The compiler will very likely do the work for you, but in general, for things like this I would approach it more from the C perspective than the Java perspective, unless you want the member access to go through a const reference. However, when dealing with integers there's usually little value in a const reference over a copy (at least in 32-bit environments, since both are 4 bytes), so your example isn't really a good one here. Perhaps this may illustrate why you would use a getter/setter in C++:
#include <string>
class StringHolder
{
public:
const std::string& get_string() const { return my_string; }
void set_string(const std::string& val) { if(!val.empty()) { my_string = val; } }
private:
std::string my_string;
};
That prevents modification except through the setter, which would then allow you to perform extra logic. However, in a simple class such as this, the value of this model is nil: you've just made the coder who is calling it type more and haven't really added any value. For such a class, I wouldn't use a getter/setter model.