C++ interface design around shared library boundaries - c++

Suppose I have two projects. One is an application and the other is a shared library that contains common, reusable code that could be used by more than just this application.
My application uses STL, and my shared library also uses STL. The first problem here is that my shared library is using STL. If I ever build a newer version of STL into my application but I do not rebuild my shared library because it is not necessary, then we will have compatibility issues right away.
My first thought to solve this issue is to not use STL at all in the interface to the shared library classes. Suppose we have a function in my library that takes a string and does something with it. I would make the function prototype look like:
void DoStuffWithStrings( char const* str );
instead of:
void DoStuffWithStrings( std::string const& str );
For strings this will probably be OK between different versions of STL, but the downside is that we are going from std::string, to char*, and back to std::string, which seems like it causes performance issues.
Is the boxing/unboxing of raw types to their STL counterparts recommended? This becomes even worse when we try to do this to a std::list, since there really is no "raw type" I am aware of that we could easily pass it as without doing some sort of O(n) or similar operation.
What designs work best in this scenario? What are the pros/cons of each?

One of the pros of the const char* approach is that your library will also be callable from C, and hence from a lot of other languages that can interface with C (pretty much everything out there). This alone is a very interesting selling point.
However, if you write and maintain both libraries, and they will be used in C++ only (say for the next 5 years), I would not go through the hassle of converting everything. std::string is one thing, std::vector or std::map won't convert as nicely. Apart from that, how many times do you move to another STL implementation? And in those cases, are you really going to 'forget' to rebuild your shared library as well? Also, you can still write/generate C style wrappers afterwards if really needed.
Conclusion (biased towards my experiences with this matter): if you don't need C, go with stl.
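A minimal sketch of the C-style wrapper idea mentioned above, assuming the library keeps std::string internally and exposes only a pointer plus a length. The vowel-counting body, the length parameter and the header-only overload are made up for illustration; only the name DoStuffWithStrings comes from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace detail {
// Internal implementation: free to use std::string because it never
// crosses the library boundary. (Hypothetical body for illustration.)
std::size_t CountVowels(std::string const& text) {
    static std::string const vowels = "aeiouAEIOU";
    std::size_t count = 0;
    for (char c : text) {
        if (vowels.find(c) != std::string::npos) ++count;
    }
    return count;
}
}  // namespace detail

// Exported entry point: only C types appear in the signature.
// A real library would add its export macro here.
extern "C" std::size_t DoStuffWithStrings(char const* str, std::size_t length) {
    if (str == nullptr) return 0;
    return detail::CountVowels(std::string(str, length));  // one copy, library side
}

// Header-only overload compiled into the caller, so the layout of
// std::string never has to match between application and library.
inline std::size_t DoStuffWithStrings(std::string const& str) {
    return DoStuffWithStrings(str.data(), str.size());
}

// Usage as the application would see it.
inline void ExampleStringBoundary() {
    assert(DoStuffWithStrings(std::string("shared library")) == 4);
}
```

The cost at the boundary is one copy of the characters on the library side; whether that matters depends on how large and how frequent the strings are.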

Surely the standard C++ library should be considered a part of the C++ ABI just as much as the virtual table layout or name mangling scheme. Besides, any incompatible changes in the ABI are more likely to affect obscure corner cases rather than the layout of std::vector. In other words: if you're going to make a C++ library, feel free to use standard C++ classes.

Another problem can come up if the library uses a different heap than the application. If this is the case or could be the case, make sure library owns/manages its own memory. Multiple heaps can be an issue when the library uses a different c-library and therefore a different malloc/free and therefore a different heap.
http://msdn.microsoft.com/en-us/library/ms810466.aspx
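One way to make the library own its memory is to export the release function next to every allocating function, so both run against the same heap. This is only a sketch; LibDuplicateString and LibFreeString are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

extern "C" {

// Library side: allocates the copy on the library's own heap.
// Returns nullptr if `text` is null or the allocation fails.
char* LibDuplicateString(char const* text) {
    if (text == nullptr) return nullptr;
    std::size_t const bytes = std::strlen(text) + 1;
    char* copy = new (std::nothrow) char[bytes];
    if (copy != nullptr) std::memcpy(copy, text, bytes);
    return copy;
}

// Library side: the only valid way to release that memory.
void LibFreeString(char* text) {
    delete[] text;
}

}  // extern "C"

// Application side: never calls delete/free on the pointer itself.
inline void ExampleLibraryOwnedMemory() {
    char* copy = LibDuplicateString("hello");
    assert(copy != nullptr && std::strcmp(copy, "hello") == 0);
    LibFreeString(copy);
}
```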

Many APIs and shared libraries use "opaque" or generic pointers as function arguments, in order to avoid differences among versions.
// this:
int MyFunc(char* Param1, int Param2, bool Param3);
// into this:
struct MyParams
{
char* Param1;
int Param2;
bool Param3;
};
// "Params" is really a "struct MyParams*"
int MyFunc(void* Params);
And sometimes, if a shared library function has several arguments, they are replaced by a single pointer that is expected to point to an array, a structure, or even a class.
It depends on how you are going to work with your library, since many libraries are used like plain C even if you are writing C++.
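A hedged variation on the parameter-struct idea: instead of void*, pass a typed pointer and let the caller record the struct size, so the library can reject a struct it does not understand. The size field, the int-for-bool substitution and the return values are assumptions added for illustration, not part of the answer above.

```cpp
#include <cassert>
#include <cstddef>

extern "C" {

struct MyParams {
    std::size_t size;    // caller sets this to sizeof(MyParams)
    char const* Param1;
    int Param2;
    int Param3;          // int instead of bool, whose size is implementation-defined
};

// Returns -1 for a null or too-small struct. Otherwise returns Param2
// when Param3 is non-zero and 0 when it is zero (arbitrary demo behaviour).
int MyFunc(MyParams const* params) {
    if (params == nullptr || params->size < sizeof(MyParams)) return -1;
    return params->Param3 != 0 ? params->Param2 : 0;
}

}  // extern "C"

inline void ExampleParamStruct() {
    MyParams p = { sizeof(MyParams), "text", 7, 1 };
    assert(MyFunc(&p) == 7);
}
```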

Related

Exporting C++ from dll - Domain and collections

There are several questions already on Stack Overflow regarding classes and functions across DLL boundaries.
A lot of them reference this article https://www.codeproject.com/Articles/28969/HowTo-Export-C-classes-from-a-DLL
I want to drill down a bit and ask some more specific questions.
The summary of most of those questions is
1) Exporting templates and the STL across DLL boundaries is bad in general
2) You cannot allocate on one side of a DLL boundary and release on the other.
3) Your options are
a) Export only pure C interfaces
b) Export abstract classes that are free to use templates and the STL in their internal implementation.
Ok. Well, I've never ever done this in 10 years of coding C++ and it has never nipped me in the butt. I've followed what the article calls "The naive approach." However, I resent being called naive a little bit :), because I've always had 100% of the source in every solution and always built 100% of it with the same compiler and settings, and have been mindful of such.
Recently, I've collected evidence that I have a problem with an object being allocated on one side of a DLL boundary and released on the other. So, I may find myself having to change my ways. Perhaps this is a result of now using third-party libraries from NuGet, rather than compiling them from source.
So, I wonder, how can I employ these options when it comes to domain objects (Or in my case, objects with no functionality)
If I can never use std::string, then I assume everything exported has to use C-style strings, and the amount of copying of MBs of text just increased 500X in my solution if I want to use std::string internally in my DLLs.
Worse, I am pondering the question: How do you then have domain objects that contain a collection?
Consider a class in my "naive dll"
class CustomerList
{
public:
// where this is not trivial
void AddCustomer(const Customer & customer);
private:
std::vector<Customer> m_customers;
};
and a function or method in my dll
SubmitCustomers(const CustomerList & customers);
How do I make this safe?
Do I need to go reinvent vector in pure C without allocations?
Do I have to resort to arrays with sizes? yuck? and then do yet more copying to STL containers internally?
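For what option (b) could look like here, a rough sketch: the DLL exports an abstract interface plus a factory, and the std::vector stays hidden in the implementation. ICustomerList, CreateCustomerList and the name-only customer are hypothetical simplifications; the real Customer type is not shown in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// ---- Public header: all that clients ever see ----
struct ICustomerList {
    virtual void AddCustomer(char const* name) = 0;
    virtual std::size_t Count() const = 0;
    virtual char const* NameAt(std::size_t index) const = 0;  // nullptr if out of range
    virtual void Release() = 0;  // destroys the object inside the DLL
protected:
    ~ICustomerList() = default;  // clients cannot call delete on the interface
};

extern "C" ICustomerList* CreateCustomerList();

// ---- DLL implementation: STL is used freely here ----
namespace {
class CustomerListImpl final : public ICustomerList {
public:
    void AddCustomer(char const* name) override {
        if (name != nullptr) m_customers.emplace_back(name);
    }
    std::size_t Count() const override { return m_customers.size(); }
    char const* NameAt(std::size_t index) const override {
        return index < m_customers.size() ? m_customers[index].c_str() : nullptr;
    }
    void Release() override { delete this; }  // freed by the module that allocated it
private:
    std::vector<std::string> m_customers;
};
}  // namespace

extern "C" ICustomerList* CreateCustomerList() { return new CustomerListImpl(); }

inline void ExampleCustomerList() {
    ICustomerList* list = CreateCustomerList();
    list->AddCustomer("Ada");
    assert(list->Count() == 1);
    list->Release();
}
```

Strings are copied once when they enter the DLL; after that the collection grows and shrinks entirely on the DLL's heap.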

Implications of using std::vector in a dll exported function

I have two dll-exported classes A and B. A's declaration contains a function which uses a std::vector in its signature like:
class EXPORT A{
// ...
std::vector<B> myFunction(std::vector<B> const &input);
};
(EXPORT is the usual macro to put in place __declspec(dllexport)/__declspec(dllimport) accordingly.)
Reading about the issues related to using STL classes in a DLL interface, I gather in summary:
Using std::vector in a DLL interface would require all the clients of that DLL to be compiled with the same version of the same compiler because STL containers are not binary compatible. Even worse, depending on how clients use that DLL together with other DLLs, the "unstable" DLL API can break these client applications when system updates are installed (e.g. Microsoft KB packages) (really?).
Despite the above, if required, std::vector can be used in a DLL API by exporting std::vector<B> like:
template class EXPORT std::allocator<B>;
template class EXPORT std::vector<B>;
though, this is usually mentioned in the context when one wants to use std::vector as a member of A (http://support.microsoft.com/kb/168958).
The following Microsoft Support Article discusses how to access std::vector objects created in a DLL through a pointer or reference from within the executable (http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q172396). The above solution to use template class EXPORT ... seems to be applicable too. However, the drawback summarized under the first bullet point seems to remain.
To completely get rid of the problem, one would need to wrap std::vector and change the signature of myFunction, PIMPL etc..
My questions are:
Is the above summary correct, or do I miss here something essential?
Why does compilation of my class 'A' not generate warning C4251 (class 'std::vector<_Ty>' needs to have dll-interface to be used by clients of...)? I have no compiler warnings turned off and I don't get any warning on using std::vector in myFunction in exported class A (with VS2005).
What needs to be done to correctly export myFunction in A? Is it viable to just export std::vector<B> and B's allocator?
What are the implications of returning std::vector by value, assuming a client executable which has been compiled with a different compiler (or compiler version)? Does trouble persist when returning by value, where the vector is copied? I guess yes. Similarly for passing std::vector as a constant reference: could access to a std::vector<B> (which might have been constructed by an executable compiled with a different compiler or compiler version) lead to trouble within myFunction? I guess yes again.
Is the last bullet point listed above really the only clean solution?
Many thanks in advance for your feedback.
Unfortunately, your list is very much spot-on. The root cause of this is that the DLL-to-DLL or DLL-to-EXE interface is defined at the level of the operating system, while the interface between functions is defined at the level of a compiler. In a way, your task is similar to (although somewhat easier than) client-server interaction when the client and the server lack binary compatibility.
The compiler maps what it can to the way the DLL importing and exporting is done in a particular operating system. Since language specifications give compilers a lot of liberty when it comes to binary layout of user-defined types and sometimes even built-in types (recall that the exact size of int is compiler-dependent, as long as minimal sizing requirements are met), importing and exporting from DLLs needs to be done manually to achieve binary-level compatibility.
When you use the same version of the same compiler, this last issue above does not create a problem. However, as soon as a different compiler enters the picture, all bets are off: you need to go back to the plainly-typed interfaces, and introduce wrappers to maintain nice-looking interfaces inside your code.
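A small sketch of "plainly-typed interface plus wrapper", assuming a made-up filtering function: the exported function sees only pointers and sizes, while an inline wrapper compiled by the client restores the std::vector signature. FilterEven is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exported function: plain pointer + count in, caller-provided buffer out.
// Returns how many elements the full result needs and writes at most
// `capacity` of them, so no memory changes owner across the boundary.
extern "C" std::size_t FilterEven(int const* in, std::size_t count,
                                  int* out, std::size_t capacity) {
    std::size_t needed = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (in[i] % 2 == 0) {
            if (needed < capacity) out[needed] = in[i];
            ++needed;
        }
    }
    return needed;
}

// Header-only wrapper, compiled with the client's own compiler and STL.
inline std::vector<int> FilterEven(std::vector<int> const& input) {
    std::vector<int> result(input.size());  // the result can never be larger
    std::size_t const n =
        FilterEven(input.data(), input.size(), result.data(), result.size());
    result.resize(n);
    return result;
}

inline void ExampleFilterEven() {
    std::vector<int> const evens = FilterEven(std::vector<int>{1, 2, 3, 4});
    assert(evens.size() == 2 && evens[0] == 2 && evens[1] == 4);
}
```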
I've been having the same problem and discovered a neat solution to it.
Instead of passing std::vector, you can pass a QVector from the Qt library.
The problems you quote are then handled inside the Qt library and you do not need to deal with it at all.
Of course, the cost is having to use the library and accept its slightly worse performance.
In terms of the amount of coding and debugging time it saves you, this solution is well worth it.

STL Containers and Binary Interface Compatibility

STL Binary Interfaces
I'm curious to know if anyone is working on compatible interface layers for STL objects across multiple compilers and platforms for C++.
The goal would be to support STL types as first-class or intrinsic data types.
Is there some inherent design limitation imposed by templating in general that prevents this? This seems like a major limitation of using the STL for binary distribution.
Theory - Perhaps the answer is pragmatic
Microsoft has put effort into .NET and doesn't really care about C++ STL support being "first class".
Open-source doesn't want to promote binary-only distribution and focuses on getting things right with a single compiler instead of a mismatch of 10 different versions.
This seems to be supported by my experience with Qt and other libraries - they generally provide a build for the environment you're going to be using. Qt 4.6 and VS2008 for example.
References:
http://code.google.com/p/stabi/
Binary compatibility of STL containers
I think the problem precedes your theory: C++ doesn't specify the ABI (application binary interface).
In fact even C doesn't, but since a C library is just a collection of functions (and maybe global variables), the ABI is essentially just the names of the functions themselves. Depending on the platform, names can be mangled somehow, but since every compiler must be able to make system calls, everything ends up using the convention of the operating system vendor (on Windows, _cdecl just results in prepending a _ to the function name).
But C++ has overloading, hence more complex mangling schemes are required.
As of today, no agreement exists between compiler vendors about how such mangling must be done.
In general, you cannot compile a C++ static library with one compiler and link it to a C++ OBJ coming from another compiler. The same goes for DLLs.
And since compilers differ even in how they compile overloaded member functions, no one is actually addressing the problem of templates.
It CAN technically be addressed by introducing an indirection for every parametric type, and introducing dispatch tables, but ... this makes templated functions no different (in terms of call dispatching) from virtual functions of virtual bases, thus making template performance similar to classic OOP dispatching (although it can limit code bloat ... the trade-off is not always obvious).
Right now, there seems to be no interest among compiler vendors in agreeing on a common standard, since it would sacrifice the performance differences each vendor can achieve with its own optimizations.
C++ templates are compile-time generated code.
This means that if you want to use a templated class, you have to include its header (declaration) so the compiler can generate the right code for the templated class you need.
So, templates can't be pre-compiled to binary files.
What other libraries give you is pre-compiled base-utility classes that aren't templated.
C# generics for example are compiled into IL code in the form of dlls or executables.
But IL code is just like another programming language so this allows the compiler to read generics information from the included library.
.Net IL code is compiled into actual binary code at runtime, so the compiler at runtime has all the definitions it needs in IL to generate the right code for the generics.
"I'm curious to know if anyone is working on compatible interface layers for STL objects across multiple compilers and platforms for C++."
Yes, I am. I am working on a layer of standardized interfaces which you can (among other things) use to pass binary safe "managed" references to instances of STL, Boost or other C++ types across component boundaries. The library (called 'Vex') will provide implementations of these interfaces plus proper factories to wrap and unwrap popular std:: or boost:: types. Additionally the library provides LINQ-like query operators to filter and manipulate contents of what I call Range and RangeSource. The library is not yet ready to "go public" but I intend to publish an early "preview" version as soon as possible...
Example:
com1 passes a reference to std::vector<uint32_t> to com2:
com1:
class Com1 : public ICom1 {
std::vector<uint32_t> mVector;
virtual void SendDataTo(ICom2* pI)
{
pI->ReceiveData(Vex::Query::From(mVector) | Vex::Query::ToInterface());
}
};
com2:
class Com2 : public ICom2 {
virtual void ReceiveData(Vex::Ranges::IRandomAccessRange<uint32_t>* pItf)
{
std::deque<uint32_t> tTmp;
// filter even numbers, reverse order and process data with STL or Boost algorithms
Vex::Query::From(pItf)
| Vex::Query::Where([](uint32_t _) -> bool { return _ % 2 == 0; })
| Vex::Query::Reverse
| Vex::ToStlSequence(std::back_inserter(tTmp));
// use tTmp...
}
};
You will recognize the relationship to various familiar concepts: LINQ, Boost.Range, any_iterator and D's Ranges... One of the basic intents of 'Vex' is not to reinvent the wheel - it only adds that interface layer plus some required infrastructure and syntactic sugar for queries.
Cheers,
Paul

Dynamic C++

I'm wondering about an idea in my head. I want to ask if you know of any library or article related to this. Or you can just tell me this is a dumb idea and why.
I have a class, and I want to dynamically add methods/properties to it at runtime. I'm well aware of the techniques of using the composite/command design patterns and of using an embedded scripting language to accomplish what I'm talking about. I'm just exploring the idea, not necessarily saying that it is a good idea.
class Dynamic
{
public:
typedef std::map<std::string, boost::function<void (Dynamic&)> > FuncMap;
void addMethod(const std::string& name, boost::function<void (Dynamic&)> func) {
funcMap_[name] = func;
}
void operator[](const std::string& name) {
FuncMap::iterator funcItr = funcMap_.find(name);
if (funcItr != funcMap_.end()) {
funcItr->second(*this);
}
}
private:
FuncMap funcMap_;
};
void f(Dynamic& self) {
doStuffWithDynamic(self);
}
int main()
{
Dynamic dyn;
dyn.addMethod("f", boost::bind(&f, _1));
dyn["f"]; // invoke f
}
The idea is that I can rebind the name "f" to any function at runtime. I'm aware of the performance problem in string lookup and boost::function vs. raw function pointer. With some hard work and non-portable hack I think I can make the performance problem less painful.
With the same kind of technique, I can do "runtime inheritance" by having a "v-table" for name lookup and dispatching function calls based on dynamic runtime properties.
If you just want to tell me to use Smalltalk or Objective-C, I can respect that, but I love my C++ and I'm sticking with it.
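To make the "runtime inheritance" idea concrete, here is a rough sketch in which a failed name lookup falls back to a parent object's table. It swaps boost::function for std::function; DynamicObject, invoke and the counter member are hypothetical names used only for the demonstration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

class DynamicObject {
public:
    using Method = std::function<void(DynamicObject&)>;

    explicit DynamicObject(DynamicObject const* parent = nullptr)
        : parent_(parent) {}

    // Binds (or rebinds) a name to a callable at runtime.
    void addMethod(std::string const& name, Method method) {
        methods_[name] = std::move(method);
    }

    // Looks the name up here first, then along the parent chain.
    // Returns false if no object in the chain knows the name.
    bool invoke(std::string const& name) {
        for (DynamicObject const* obj = this; obj != nullptr; obj = obj->parent_) {
            auto it = obj->methods_.find(name);
            if (it != obj->methods_.end()) {
                Method method = it->second;  // copy, so rebinding during the call is safe
                method(*this);               // always runs against the most-derived object
                return true;
            }
        }
        return false;
    }

    int counter = 0;  // demo state touched by the methods below

private:
    DynamicObject const* parent_;
    std::map<std::string, Method> methods_;
};

inline void ExampleRuntimeInheritance() {
    DynamicObject base;
    base.addMethod("inc", [](DynamicObject& self) { self.counter += 1; });
    DynamicObject derived(&base);   // "inherits" inc from base
    assert(derived.invoke("inc") && derived.counter == 1);
}
```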
What you want is to change C++ into something very different. One of the (many) goals of C++ was efficient implementation. Doing string lookup for function calls (no matter how well you implement it), just isn't going to be very efficient compared to the normal call mechanisms.
Basically, I think you're trying to shoehorn in functionality of a different language. You CAN make it work, to some degree, but you're creating C++ code that no one else is going to be able (or willing) to try to understand.
If you really want to write in a language that can change its objects on the fly, then go find such a language (there are many choices, I'm sure). Trying to shoehorn that functionality into C++ is just going to cause you problems down the road.
Please note that I'm no stranger to bringing in non-C++ concepts into C++. I once spent a considerable amount of time removing another engineer's attempt at bringing a based-object system into a C++ project (he liked the idea of containers of 'Object *', so he made every class in the system descend from his very own 'Object' class).
Bringing in foreign language concepts almost always ends badly in two ways: The concept runs up against other C++ concepts, and can't work as well as it did in the source language, AND the concept tends to break something else in C++. You end up losing a lot of time trying to implement something that just isn't going to work out.
The only way I could see something like this working at all well, is if you implemented a new language on top of C++, with a cfront-style pre-compiler. That way, you could put some decent syntax onto the thing, and eliminate some of your problems.
If you implemented this, even as a pure library, and then used it extensively, you would in a way be using a new language - one with a hideous syntax, and a curious combination of runtime method resolution and unreliable bounds checking.
As a fan of C/C++ style syntax and apparently a fan of dynamic method dispatch, you may be interested in C# 4.0, which is now in Beta, and has the dynamic keyword to allow exactly this kind of thing to be seamlessly mixed into normal statically typed code.
I don't think it would be a good idea to change C++ enough to make this work. I'd suggest working in another language, such as Lisp or Perl or another language that's basically dynamic, or embedding a dynamic language and using it.
What you are doing is actually a variation of the Visitor pattern.
EDIT: By the way, another approach would be by using Lua, since the language allows you to add functions at runtime. So does Objective-C++.
EDIT 2: You could actually inherit from FuncMap as well:
class Dynamic;
typedef std::map<std::string, boost::function<void (Dynamic&)> > FuncMap;
class Dynamic : public FuncMap
{
public:
};
void f(Dynamic& self) {
//doStuffWithDynamic(self);
}
int main()
{
Dynamic dyn;
dyn["f"] = boost::bind(&f, _1);
dyn["f"](dyn); // invoke f; however, the 'dyn' param is awkward...
return 0;
}
If I understand what you are trying to accomplish correctly, it seems as though dynamic linking (i.e. dynamically loaded libraries on Windows or Linux) will do most of what you want.
That is, you can, at runtime, select the name of the function you want to execute (eg. the name of the DLL), which then gets loaded and executed. Much in the way that COM works. Or you can even use the name of the function exported from that library to select the correct function (C++ name mangling issues aside).
I don't think there's a library for this exact thing.
Of course, you have to have these functions pre-written somehow, so it seems there would be an easier way to do what you want. For example you could have just one method to execute arbitrary code from your favorite scripting language. That seems like an easier way to do what you want.
I keep thinking of the Visitor pattern. That allows you to do a vtable lookup on the visiting object (as well as the visited object, though that doesn't seem relevant to your question).
And at runtime, you could have some variable which refers to the visitor, and call
Dynamic dynamic;
DynamicVisitor * dv = ...;
dynamic.Accept(dv);
dv = ...; // something else
dynamic.Accept(dv);
The point is, the visitor object has a vtable, which you said you wanted, and you can change its value dynamically, which you said you wanted. Accept is basically the "function to call things I didn't know about at compile time."
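Filled out into something compilable, the snippet above might look roughly like this. The Visit method, the two concrete visitors and the log member are assumptions made for the sake of a complete example.

```cpp
#include <cassert>
#include <string>

class Dynamic;

// The behaviour that can be swapped at runtime lives behind this vtable.
struct DynamicVisitor {
    virtual ~DynamicVisitor() = default;
    virtual void Visit(Dynamic& target) = 0;
};

class Dynamic {
public:
    // "Function to call things I didn't know about at compile time."
    void Accept(DynamicVisitor* visitor) {
        if (visitor != nullptr) visitor->Visit(*this);
    }
    std::string log;  // demo state so the effect is observable
};

struct Greeter : DynamicVisitor {
    void Visit(Dynamic& target) override { target.log += "hello;"; }
};

struct Closer : DynamicVisitor {
    void Visit(Dynamic& target) override { target.log += "bye;"; }
};

inline void ExampleVisitorSwap() {
    Dynamic dynamic;
    Greeter greeter;
    Closer closer;
    DynamicVisitor* dv = &greeter;
    dynamic.Accept(dv);
    dv = &closer;  // something else
    dynamic.Accept(dv);
    assert(dynamic.log == "hello;bye;");
}
```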
I've considered doing this before as well. Basically, however, you'd be on your way to writing a simple VM or interpreter (look at, say, Lua or Topaz's source to see what I mean -- Topaz is a dead project that pre-dates Parrot).
But if you're going that route it makes sense to just use an existing VM or interpreter.

Using C++ DLLs with different compiler versions

This question is related to "How to make consistent dll binaries across VS versions ?"
We have applications and DLLs built with VC6 and a new application built with VC9. The VC9-app has to use DLLs compiled with VC6, most of which are written in C and one in C++.
The C++ lib is problematic due to name decoration/mangling issues.
Compiling everything with VC9 is currently not an option as there appear to be some side effects. Resolving these would be quite time consuming.
I can modify the C++ library, however it must be compiled with VC6.
The C++ lib is essentially an OO-wrapper for another C library. The VC9-app uses some static functions as well as some non-static.
While the static functions can be handled with something like
// Header file
class DLL_API Foo
{
public:
static int init();
};
extern "C"
{
int DLL_API Foo_init();
}
// Implementation file
int Foo_init()
{
return Foo::init();
}
it's not that easy with the non-static methods.
As I understand it, Chris Becke's suggestion of using a COM-like interface won't help me because the interface member names will still be decorated and thus inaccessible from a binary created with a different compiler. Am I right there?
Would the only solution be to write a C-style DLL interface using handles to the objects or am I missing something?
In that case, I guess, I would probably have less effort with directly using the wrapped C-library.
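For reference, a C-style handle interface for the non-static methods could be sketched as below. The Foo stand-in, its add method and the Foo_* function names are hypothetical, and the sketch uses nullptr, which VC6 does not support; replace it with 0 when building with VC6.

```cpp
#include <cassert>

// Stand-in for the existing C++ wrapper class inside the DLL.
class Foo {
public:
    explicit Foo(int start) : value_(start) {}
    int add(int n) { value_ += n; return value_; }
private:
    int value_;
};

extern "C" {

// Opaque to clients: declared but never defined.
typedef struct FooHandle_ FooHandle;

// Creates an object inside the DLL; returns nullptr on failure.
FooHandle* Foo_create(int start) {
    try {
        return reinterpret_cast<FooHandle*>(new Foo(start));
    } catch (...) {
        return nullptr;  // exceptions must not cross a C boundary
    }
}

// Forwards to the non-static method; returns -1 for a null handle.
int Foo_add(FooHandle* handle, int n) {
    return handle != nullptr ? reinterpret_cast<Foo*>(handle)->add(n) : -1;
}

// Destroys the object in the module that created it.
void Foo_destroy(FooHandle* handle) {
    delete reinterpret_cast<Foo*>(handle);
}

}  // extern "C"

inline void ExampleHandleApi() {
    FooHandle* foo = Foo_create(5);
    assert(foo != nullptr);
    assert(Foo_add(foo, 3) == 8);
    Foo_destroy(foo);
}
```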
The biggest problem to consider when using a DLL compiled with a different C++ compiler than the calling EXE is memory allocation and object lifetime.
I'm assuming that you can get past the name mangling (and calling convention), which isn't difficult if you use a compiler with compatible mangling (I think VC6 is broadly compatible with VS2008), or if you use extern "C".
Where you'll run into problems is when you allocate something using new (or malloc) from the DLL, and then you return this to the caller. The caller's delete (or free) will attempt to free the object from a different heap. This will go horribly wrong.
You can either do a COM-style IFoo::Release thing, or a MyDllFree() thing. Both of these, because they call back into the DLL, will use the correct implementation of delete (or free()), so they'll delete the correct object.
Or, you can make sure that you use LocalAlloc (for example), so that the EXE and the DLL are using the same heap.
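On the calling side, the MyDllFree() approach can be wrapped so it is hard to forget. A sketch assuming a modern compiler for the EXE: MyDllAllocInt is a hypothetical allocator, and the live-object counter exists only so the example can check itself.

```cpp
#include <cassert>
#include <memory>

// --- Stand-ins for what the DLL would export ---
static int g_liveObjects = 0;  // bookkeeping for the demo only

extern "C" int* MyDllAllocInt(int value) {
    ++g_liveObjects;
    return new int(value);  // allocated on the DLL's heap
}

extern "C" void MyDllFree(int* p) {
    if (p != nullptr) {
        --g_liveObjects;
        delete p;  // released by the same module, so the same heap
    }
}

// --- Client side ---
// Deleter that calls back into the DLL instead of using the EXE's delete.
struct DllDeleter {
    void operator()(int* p) const { MyDllFree(p); }
};
using DllIntPtr = std::unique_ptr<int, DllDeleter>;

inline DllIntPtr MakeDllInt(int value) {
    return DllIntPtr(MyDllAllocInt(value));
}

inline void ExampleDllDeleter() {
    {
        DllIntPtr p = MakeDllInt(42);
        assert(*p == 42 && g_liveObjects == 1);
    }  // MyDllFree runs here
    assert(g_liveObjects == 0);
}
```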
Interface member names will not be decorated -- they're just offsets in a vtable. You can define an interface (using a C struct, rather than a COM "interface") in a header file, thusly:
struct IFoo {
virtual int Init() = 0;
};
Then, you can export a function from the DLL, with no mangling:
class CFoo : public IFoo { /* ... */ };
extern "C" IFoo * __stdcall GetFoo() { return new CFoo(); }
This will work fine, provided that you're using a compiler that generates compatible vtables. Microsoft C++ has generated the same format vtable since (at least, I think) MSVC6.1 for DOS, where the vtable is a simple list of pointers to functions (with thunking in the multiple-inheritance case). GNU C++ (if I recall correctly) generates vtables with function pointers and relative offsets. These are not compatible with each other.
Well, I think Chris Becke's suggestion is just fine. I would not use Roger's first solution, which uses an interface in name only and, as he mentions, can run into problems of incompatible compiler-handling of abstract classes and virtual methods. Roger points to the attractive COM-consistent case in his follow-on.
The pain point: You need to learn to make COM interface requests and deal properly with IUnknown, relying on at least IUnknown::AddRef and IUnknown::Release. If the implementations of interfaces can support more than one interface or if methods can also return interfaces, you may also need to become comfortable with IUnknown::QueryInterface.
Here's the key idea. All of the programs that use the implementation of the interface (but don't implement it) use a common #include "*.h" file that defines the interface as a struct (C) or a C/C++ class (VC++) or struct (non VC++ but C++). The *.h file automatically adapts appropriately depending on whether you are compiling a C language program or a C++ language program. You don't have to know about that part simply to use the *.h file. What the *.h file does is define the interface struct or type, let's say IFoo, with its virtual member functions (and only functions, no direct visibility to data members in this approach).
The header file is constructed to honor the COM binary standard in a way that works for C and that works for C++ regardless of the C++ compiler that is used. (The Java JNI folk figured this one out.) This means that it works between separately-compiled modules of any origin so long as a struct consisting entirely of function-entry pointers (a vtable) is mapped to memory the same by all of them (so they have to be all x86 32-bit, or all x64, for example).
In the DLL that implements the COM interface via a wrapper class of some sort, you only need a factory entry point. Something like an
extern "C" HRESULT MkIFooImplementation(void **ppv);
which returns an HRESULT (you'll need to learn about those too) and will also store the IFoo interface pointer in the location (*ppv) that you provide for receiving it. (I am skimming and there are more careful details that you'll need here. Don't trust my syntax.) The actual function prototype that you use for this is also declared in the *.h file.
The point is that the factory entry, which is always an undecorated extern "C", does all of the necessary wrapper class creation and then delivers an IFoo interface pointer to the location that you specify. This means that all memory management for creation of the class, and all memory management for finalizing it, etc., will happen in the DLL where you build the wrapper. This is the only place where you have to deal with those details.
When you get an OK result from the factory function, you have been issued an interface pointer and it has already been reserved for you (there is an implicit IFoo::AddRef operation already performed on behalf of the interface pointer you were delivered).
When you are done with the interface, you release it with a call on the IFoo::Release method of the interface. It is the final release implementation (in case you made more AddRef'd copies) that will tear down the class and its interface support in the factory DLL. This is what gets you correct reliance on consistent dynamic storage allocation and release behind the interface, whether or not the DLL containing the factory function uses the same libraries as the calling code.
You should probably implement IUnknown::QueryInterface (as method IFoo::QueryInterface) too, even if it always fails. If you want to be more sophisticated with using the COM binary interface model as you gain more experience, you can learn to provide full QueryInterface implementations.
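A portable approximation of the pattern described above, for illustration only. It deliberately avoids the real Windows headers: Result, kOk, kNoInterface and the string-based interface name are simplified stand-ins for HRESULT, S_OK, E_NOINTERFACE and IIDs, and real COM methods would also use __stdcall.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>

typedef long Result;             // stand-in for HRESULT
const Result kOk = 0;            // stand-in for S_OK
const Result kNoInterface = -1;  // stand-in for E_NOINTERFACE

// Interface as seen by every client: only pure virtual functions, no data,
// and no virtual destructor (the vtable holds just these entries).
struct IFoo {
    virtual Result QueryInterface(char const* name, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual int Init() = 0;
};

namespace {
// Wrapper class, private to the DLL.
class FooImpl final : public IFoo {
public:
    Result QueryInterface(char const* name, void** ppv) override {
        if (ppv == nullptr) return kNoInterface;
        if (name != nullptr && std::strcmp(name, "IFoo") == 0) {
            *ppv = static_cast<IFoo*>(this);
            AddRef();  // every pointer handed out carries its own reference
            return kOk;
        }
        *ppv = nullptr;
        return kNoInterface;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long const left = --refs_;
        if (left == 0) delete this;  // torn down inside the DLL
        return left;
    }
    int Init() override { return 1; }
private:
    std::atomic<unsigned long> refs_{1};  // the factory's implicit AddRef
};
}  // namespace

// Undecorated factory entry point.
extern "C" Result MkIFooImplementation(void** ppv) {
    if (ppv == nullptr) return kNoInterface;
    *ppv = static_cast<IFoo*>(new FooImpl());
    return kOk;
}

inline void ExampleComStyleUsage() {
    void* pv = nullptr;
    Result const result = MkIFooImplementation(&pv);
    assert(result == kOk);
    IFoo* foo = static_cast<IFoo*>(pv);
    assert(foo->Init() == 1);
    foo->Release();  // final release destroys the object
}
```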
This is probably too much information, but I wanted to point out that a lot of the problems you are facing about heterogeneous implementations of DLLs are resolved in the definition of the COM binary interface and even if you don't need all of it, the fact that it provides worked solutions is valuable. In my experience, once you get the hang of this, you will never forget how powerful this can be in C++ and C++ interop situations.
I haven't sketched the resources you might need to consult for examples and what you have to learn in order to make *.h files and to actually implement factory-function wrappers of the libraries you want to share. If you want to dig deeper, holler.
There are other things you need to consider too, such as which run-times are being used by the various libraries. If no objects are being shared that's fine, but that seems quite unlikely at first glance.
Chris Becke's suggestions are pretty accurate - using an actual COM interface may help you get the binary compatibility you need. Your mileage may vary :)
not fun, man. you are in for a lot of frustration, you should probably give this:
"Would the only solution be to write a C-style DLL interface using handles to the objects or am I missing something? In that case, I guess, I would probably have less effort with directly using the wrapped C-library."
a really close look. good luck.