STL Binary Interfaces
I'm curious to know if anyone is working on compatible interface layers for STL objects across multiple compilers and platforms for C++.
The goal would be to support STL types as first-class or intrinsic data types.
Is there some inherent design limitation imposed by templating in general that prevents this? This seems like a major limitation of using the STL for binary distribution.
Theory - Perhaps the answer is pragmatic
Microsoft has put effort into .NET and doesn't really care about C++ STL support being "first class".
Open-source doesn't want to promote binary-only distribution and focuses on getting things right with a single compiler instead of a mismatch of 10 different versions.
This seems to be supported by my experience with Qt and other libraries - they generally provide a build for the environment you're going to be using. Qt 4.6 and VS2008 for example.
References:
http://code.google.com/p/stabi/
Binary compatibility of STL containers
I think the problem preceeds your theory: C++ doesn't specify the ABI (application binary interface).
In fact even C doesn't, but being a C library just a collection of functions (and may be global variables) the ABI is just the name of the functions themselves. Depending on the platform, names can be mangled somehow, but, since every compiler must be able to place system calss, everything ends up using the same convention of the operating system builder (in windows, _cdecl just result in prepending a _ to the function name.
But C++ has overloading, hence more complex mangling scheme are required.
As far as of today, no agreement exists between compiler manufacturers about how such mangling must be done.
It is technically impossible to compile a C++ static library and link it to a C++ OBJ coming from another compiler. The same is for DLLs.
And since compilers are all different even for compiled overloaded member functions, no one is actually affording the problem of templates.
It CAN technically be afforded by introducing a redirection for every parametric type, and introducing dispatch tables but ... this makes templated function not different (in terms of call dispatching) than virtual functions of virtual bases, thus making the template performance to become similar to classic OOP dispatching (although it can limit code bloating ... the trade-off is not always obvious)
Right now, it seems there is no interest between compiler manufacturers to agree to a common standard since it will sacrifice all the performance differences every manufacturer can have with his own optimization.
C++ templates are compile-time generated code.
This means that if you want to use a templated class, you have to include its header (declaration) so the compiler can generate the right code for the templated class you need.
So, templates can't be pre-compiled to binary files.
What other libraries give you is pre-compiled base-utility classes that aren't templated.
C# generics for example are compiled into IL code in the form of dlls or executables.
But IL code is just like another programming language so this allows the compiler to read generics information from the included library.
.Net IL code is compiled into actual binary code at runtime, so the compiler at runtime has all the definitions it needs in IL to generate the right code for the generics.
I'm curious to know if anyone is working on compatible interface
layers for STL objects across multiple compilers and platforms for
C++.
Yes, I am. I am working on a layer of standardized interfaces which you can (among other things) use to pass binary safe "managed" references to instances of STL, Boost or other C++ types across component boundaries. The library (called 'Vex') will provide implementations of these interfaces plus proper factories to wrap and unwrap popular std:: or boost:: types. Additionally the library provides LINQ-like query operators to filter and manipulate contents of what I call Range and RangeSource. The library is not yet ready to "go public" but I intend to publish an early "preview" version as soon as possible...
Example:
com1 passes a reference to std::vector<uint32_t> to com2:
com1:
class Com2 : public ICom1 {
std::vector<int> mVector;
virtual void Com2::SendDataTo(ICom1* pI)
{
pI->ReceiveData(Vex::Query::From(mVector) | Vex::Query::ToInterface());
}
};
com2:
class Com2 : public ICom2 {
virtual void Com2::ReceiveData(Vex::Ranges::IRandomAccessRange<uint32_t>* pItf)
{
std::deque<uint32_t> tTmp;
// filter even numbers, reverse order and process data with STL or Boost algorithms
Vex::Query::From(pItf)
| Vex::Query::Where([](uint32_t _) -> { return _ % 2 == 0; })
| Vex::Query::Reverse
| Vex::ToStlSequence(std::back_inserter(tTmp));
// use tTmp...
}
};
You will recognize the relationship to various familiar concepts: LINQ, Boost.Range, any_iterator and D's Ranges... One of the basic intents of 'Vex' is not to reinvent wheel - it only adds that interface layer plus some required infrastructures and syntactic sugar for queries.
Cheers,
Paul
Related
I am trying to limit the inheritance of an c++ class to within the same library, while allowing it to be instantiated in other libraries.
The use case is that I have some code that needs to be real-time capable (compiled with special flags and poisoned code) and that needs to be used/interfaced to non-RT code. However, I need to make absolutely sure that no non-RT code can ever be called inside the RT code. Therefore I have to libraries: one that is RT capable and one that isn't (which depends on the RT library and may use code from it).
Now, I have some abstract classes, which I want to be only inherited from inside of the RT library. Is it possible to prohibit the inheritance of those ABCs from classes defined outside of the RT library?
What I came up so far (without it working) is defining a macro that makes the classes final outside of RT code and a templated base class that uses std::conditional
class BaseA REALTIME_FINAL
{
virtual void foo() = 0;
}
template <bool allow = REALTIME_TRUE>
class BaseB : : virtual public std::conditional<allow, std::true_t, std::nullptr_t>::type
{
virtual void foo() = 0;
}
While both of these methods prohibit the inheritance from the abstract base, it also makes it impossible in the non-RT library to call or instantiate (or even include the header) any classes derived from it in the RT lib.
You can solve this problem much more simply, by moving your realtime code into its own library with its own header files, and building it without any dependency on your non-realtime library.
Put your realtime and non-realtime headers into separate directories, and if your realtime code is built as a shared library, use the linker option to prohibit undefined symbols in that library.
Then all you have to do is remember not to add your system's equivalent of -Inon-realtime or -lnon-realtime in the realtime library's build configuration.
One thing you need to think hard is - you can solve this problem today but tomorrow an intern makes changes that sneak into your code which compiles but will break in the hands of your client. So I am quite dramatic on adopting a solution that does not leave anything to imagination.
My go-to approach is always to separate libraries with a C API. This guarantees that no C++ features, which compilers are frequently guilty of manipulating, can be tampered.
More often I use this to encapsulate code that was compiled with the old ABI (pre-C++11) which is unfortunately still common from certain vendors.
This also guarantees that I can access these features from a CD/CI perspective from other scripting languages like Python, Java, Rust more easily.
After reading and watching much about SOLID principles I was very keen to use these principles in my work (mostly C++ development) since I do think they are good principles and that they indeed will bring much benefit to the quality of my code, readability, testability, reuse and maintainability.
But I have real hard time with the 'D' (Dependency inversion).
This principal states that:
A. High-level modules should not depend on low-level modules. Both should depend on abstractions.
B. Abstractions should not depend on details. Details should depend on abstractions.
Let me explain by example:
Lets say I am writing the following interface:
class SOLIDInterface {
//usual stuff with constructor, destructor, don't copy etc
public:
virtual void setSomeString(const std::string &someString) = 0;
};
(for the sake of simplicity please ignore the other things needed for a "correct interface" such as non virutal publics, private virtuals etc, its not part of the problem.)
notice, that setSomeString() is taking an std::string.
But that breaks the above principal since std::string is an implementation.
Java and C# don't have that problem since the language offers interfaces to all the complex common types such as string and containers.
C++ does not offer that.
Now, C++ does offer the possibility to write this interface in such a way that I could write an 'IString' interface that would take any implementation that will support an std::string interface using type erasure
(Very good article: http://www.artima.com/cppsource/type_erasure.html)
So the implementation could use STL (std::string) or Qt (QString), or my own string implementation or something else.
Like it should be.
But this means, that if I (and not only I but all C++ developers) want to write C++ API which obeys SOLID design principles ('D' included), I will have to implement a LOT of code to accommodate all the common non natural types.
Beyond being not realistic in terms of effort, this solution has other problems such as - what if STL changes?(for this example)
And its not really a solution since STL is not implementing IString, rather IString is abstracting STL, so even if I were to create such an interface the principal problem remains.
(I am not even getting into issues such as this adds polymorphic overhead, which for some systems, depending on size and HW requirements may not be acceptable)
So may question is:
Am I missing something here (which I guess the true answer, but what?), is there a way to use Dependency inversion in C++ without writing a whole new interface layer for the common types in a realistic way - or are we doomed to write API which is always dependent on some implementation?
Thanks for your time!
From the first few comments I received so far I think a clarification is needed:
The selection of std::string was just an example.
It could be QString for that matter - I just took STL since it is the standard.
Its not even important that its a string type, it could be any common type.
I have selected the answer by Corristo not because he explicitly answered my question but because the extensive post (coupled with the other answers) allowed me to extract my answer from it implicitly, realizing that the discussion tends to drift from the actual question which is:
Can you implement Dependency inversion in C++ when you use basic complex types like strings and containers and basically any of the STL with an effort that makes sense. (and the last part is a very important element of the question).
Maybe I should have explicitly noted that I am after run-time polymorphism not compile time.
The clear answer is NO, its not possible.
It might have been possible if STL would have exposed abstract interfaces to their implementations (if there are indeed reasons that prevent the STL implementations to derive from these interfaces (say, performance)) then it still could have simply maintained these abstract interfaces to match the implementations).
For types that I have full control over, yes, there is no technical problem implementing the DIP.
But most likely any such interface (of my own) will still use a string or a container, forcing it to use either the STL implementation or another.
All the suggested solutions below are either not polymorphic in runtime, or/and are forcing quiet a some coding around the interface - when you think you have to do this for all these common types the practicality is simply not there.
If you think you know better, and you say it is possible to have what I described above then simply post the code proving it.
I dare you! :-)
Note that C++ is not an object-oriented programming language, but rather lets the programmer choose between many different paradigms. One of the key principles of C++ is that of zero-cost abstractions, which in particular entails to build abstractions in such a way that users don't pay for what they don't use.
The C#/Java style of defining interfaces with virtual methods that are then implemented by derived classes don't fall into that category though, because even if you don't need the polymorphic behavior, were std::string implementing a virtual interface, every call of one of its methods would incur a vtable lookup. This is unacceptable for classes in the C++ standard library supposed to be used in all kinds of settings.
Defining interfaces without inheriting from an abstract interface class
Another problem with the C#/Java approach is that in most cases you don't actually care that something inherits from a particular abstract interface class and only need that the type you pass to a function supports the operations you use. Restricting accepted parameters to those inheriting from a particular interface class thus actually hinders reuse of existing components, and you often end up writing wrappers to make classes of one library conform to the interfaces of another - even when they already have the exact same member functions.
Together with the fact that inheritance-based polymorphism typically also entails heap allocations and reference semantics with all its problems regarding lifetime management, it is best to avoid inheriting from an abstract interface class in C++.
Generic templates for implicit interfaces
In C++ you can get compile-time polymorphism through templates.
In its simplest form, the interface that an object used in a templated function or class need to conform to is not actually specified in C++ code, but implied by what functions are called on them.
This is the approach used in the STL, and it is really flexible. Take std::vector for example. There the requirements on the value type T of objects you store in it are dependent on what operations you perform on the vector. This allows e.g. to store move-only types as long as you don't use any of the operations that need to make a copy. In such a case, defining an interface that the value types needs to conform to would greatly reduce the usefulness of std::vector, because you'd either need to remove methods that require copies or you'd need to exclude move-only types from being stored in it.
That doesn't mean you can't use dependency inversion, though: The common Button-Lamp example for dependency inversion implemented with templates would look like this:
class Lamp {
public:
void activate();
void deactivate();
};
template <typename T>
class Button {
Button(T& switchable)
: _switchable(&switchable) {
}
void toggle() {
if (_buttonIsInOnPosition) {
_switchable->deactivate();
_buttonIsInOnPosition = false;
} else {
_switchable->activate();
_buttonIsInOnPosition = true;
}
}
private:
bool _buttonIsInOnPosition{false};
T* _switchable;
}
int main() {
Lamp l;
Button<Lamp> b(l)
b.toggle();
}
Here Button<T>::toggle implicitly relies on a Switchable interface, requiring T to have member functions T::activate and T::deactivate. Since Lamp happens to implement that interface it can be used with the Button class. Of course, in real code you would also state these requirements on T in the documentation of the Button class so that users don't need to look up the implementation.
Similarly, you could also declare your setSomeString method as
template <typename String>
void setSomeString(String const& string);
and then this will work with all types that implement all the methods you used in the implementation of setSomeString, hence only relying on an abstract - although implicit - interface.
As always, there are some downsides to consider:
In the string example, assuming you only make use of .begin() and .end() member functions returning iterators that return a char when dereferenced (e.g. to copy it into the classes' local, concrete string data member), you can also accidentally pass a std::vector<char> to it, even though it isn't technically a string. If you consider this a problem is arguable, in a way this can also be seen as the epitome of relying only on abstractions.
If you pass an object of a type that doesn't have the required (member) functions, then you can end up with horrible compiler error messages that make it very hard to find the source of the error.
Only in very limited cases it is possible to separate the interface of a templated class or function from its implementation, as is typically done with separate .h and .cpp files. This can thus lead to longer compile times.
Defining interfaces with the Concepts TS
if you really care about types used in templated functions and classes to conform to a fixed interface, regardless of what you actually use, there are ways to restrict the template parameters only to types conforming to a certain interface with std::enable_if, but these are very verbose and unreadable. In order to make this kind of generic programming easier, the Concepts TS allows to actually define interfaces that are checked by the compiler and thus greatly improves diagnostics. With the Concepts TS, the Button-Lamp example from above translates to
template <typename T>
concept bool Switchable = requires(T t) {
t.activate();
t.deactivate();
};
// Lamp as before
template <Switchable T>
class Button {
public:
Button(T&); // implementation as before
void toggle(); // implementation as before
private:
T* _switchable;
bool _buttonIsInOnPosition{false};
};
If you can't use the Concepts TS (it is only implemented in GCC right now), the closest you can get is the Boost.ConceptCheck library.
Type erasure for runtime polymorphism
There is one case where compile-time polymorphism doesn't suffice, and that is when the types you pass to or get from a particular function aren't fully determined at compile-time but depend on runtime parameters (e.g. from a config file, command-line arguments passed to the executable or even the value of a parameter passed to the function itself).
If you need to store objects (even in a variable) of a type dependent on runtime parameters, the traditional approach is to store pointers to a common base class instead and to use dynamic dispatch via virtual member functions to get the behavior you need. But this still suffers from the problem described before: You can't use types that effectively do what you need but were defined in an external library, and thus don't inherit from the base class you defined. So you have to write a wrapper class.
Or you do what you described in your question and create a type-erasure class.
An example from the standard library is std::function. You declare only the interface of the function and it can store arbitrary function pointers and callables that have that interface. In general, writing a type erasure class can be quite tedious, so I refrain from giving an example of a type-erasing Switchable here, but I can highly recommend Sean Parent's talk Inheritance is the base class of evil, where he demonstrates the technique for "Drawable" objects and explores what you can build on top of it in just over 20 minutes.
There are libraries that help writing type-erasure classes though, e.g. Louis Dionne's experimental dyno, where you define the interface via what he calls "concept maps" directly in C++ code, or Zach Laine's emtypen which uses a python tool to create the type erasure classes from a C++ header file you provide. The latter also comes with a CppCon talk describing the features as well as the general idea and how to use it.
Conclusion
Inheriting from a common base class just to define interfaces, while easy, leads to many problems that can be avoided using different approaches:
(Constrained) templates allow for compile-time polymorphism, which is sufficient for the majority of cases, but can lead to hard-to-understand compiler errors when used with types that don't conform to the interface.
If you need runtime polymorphism (which actually is rather rare in my experience), you can use type-erasure classes.
So even though the classes in the STL and other C++ libraries rarely derive from an abstract interface, you can still apply dependency inversion with one of the two methods described above if you really want to.
But as always, use good judgment on a case-by-case basis whether you really need the abstraction or if it is better to simply use a concrete type. The string example you brought up is one where I'd go with concrete types, simply because the different string classes don't share a common interface (e.g. std::string has .find(), but QStrings version of the same function is called .contains()). It might be just as much effort to write wrapper classes for both as it is to write a conversion function and to use that at well-defined boundaries within the project.
Ahh, but C++ lets you write code that is independent of a particular implementation without actually using inheritance.
std::string itself is a good example... it's actually a typedef for std::basic_string<char, std::char_traits<char>, std::allocator<char>>. Which allows you to create strings using other allocators if you choose (or mock the allocator object in order to measure number of calls, if you like). There just isn't any explicit interface like an IAllocator, because C++ templates use duck-typing.
A future version of C++ will support explicit description of the interface a template parameter must adhere to -- this feature is called concepts -- but just using duck-typing enables decoupling without requiring redundant interface definitions.
And because C++ performs optimization after instantiation of templates, there's no polymorphic overhead.
Now, when you do have virtual functions, you'll need to commit to a particular type, because the virtual-table layout doesn't accommodate use of templates each of which generates an arbitrary number of instances each of which require separate dispatch. But when using templates, you'll won't need virtual functions nearly as much as e.g. Java does, so in practice this isn't a big problem.
I am building a C++ API which needs to be extendable without recompiling the software that uses it (for lots of bad reasons). This requires opaque data types so that fields can be added to classes. Something like
struct CheshireCat; // Not defined here
std::unique_ptr<CheshireCat> d_ptr;
But this API will have lots of simple property get/set methods that just access fields within CheshireCat. (I do not want to hear about "encapsulation" philosophy, that is just the way that it is for this application.)
There are clever techniques that create classes like MyInt that override operators to emulate properties like
int & operator = (const int &i) { return value = i; }
operator int () const { return value; }
But I cannot see any good way of such a MyInt getting access to the CheshireCat pointer in the containing class, let alone the hidden properties within the structure.
It seems like this must be a very common problem, so I am looking for other people's clever solutions. Possibly using some obscure macrology.
(The alternatives would be 1. Forget binary compatibility, or 2. write lots and lots of boilerplate code by hand.)
I realize that for binary compatibility new virtual methods need to be added to the end. Mangling is compiler dependent but should be stable for a given compiler (we will ship binaries made for a specific compiler/version).
This feels like C++ 101 but I could not find anything that addressed it elegantly. (My background is Java/.Net/Lisp where JITing solves this problem. A C++ system that did the final compilation phases in the (DLL) linker would be excellent but sadly that is not the way it is.)
I don't think you can get binary compatibility by exposing C++.
What you can do is expose a C interface (which is binary compatible). That means, as you have hinted, exposing functions that deal with opaque handles (i.e. pointers).
Of course, the internals can be written in C++.
I have two dll-exported classes A and B. A's declaration contains a function which uses a std::vector in its signature like:
class EXPORT A{
// ...
std::vector<B> myFunction(std::vector<B> const &input);
};
(EXPORT is the usual macro to put in place _declspec(dllexport)/_declspec(dllimport) accordingly.)
Reading about the issues related to using STL classes in a DLL interface, I gather in summary:
Using std::vector in a DLL interface would require all the clients of that DLL to be compiled with the same version of the same compiler because STL containers are not binary compatible. Even worse, depending on the use of that DLL by clients conjointly with other DLLs, the ''instable'' DLL API can break these client applications when system updates are installed (e.g. Microsoft KB packages) (really?).
Despite the above, if required, std::vector can be used in a DLL API by exporting std::vector<B> like:
template class EXPORT std::allocator<B>;
template class EXPORT std::vector<B>;
though, this is usually mentioned in the context when one wants to use std::vector as a member of A (http://support.microsoft.com/kb/168958).
The following Microsoft Support Article discusses how to access std::vector objects created in a DLL through a pointer or reference from within the executable (http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q172396). The above solution to use template class EXPORT ... seems to be applicable too. However, the drawback summarized under the first bullet point seems to remain.
To completely get rid of the problem, one would need to wrap std::vector and change the signature of myFunction, PIMPL etc..
My questions are:
Is the above summary correct, or do I miss here something essential?
Why does compilation of my class 'A' not generate warning C4251 (class 'std::vector<_Ty>' needs to have dll-interface to be used by clients of...)? I have no compiler warnings turned off and I don't get any warning on using std::vector in myFunction in exported class A (with VS2005).
What needs to be done to correctly export myFunction in A? Is it viable to just export std::vector<B> and B's allocator?
What are the implications of returning std::vector by-value? Assuming a client executable which has been compiled with a different compiler(-version). Does trouble persist when returning by-value where the vector is copied? I guess yes. Similarly for passing std::vector as a constant reference: could access to std::vector<B> (which might was constructed by an executable compiled with a different compiler(-version)) lead to trouble within myFunction? I guess yes again..
Is the last bullet point listed above really the only clean solution?
Many thanks in advance for your feedback.
Unfortunately, your list is very much spot-on. The root cause of this is that DLL-to-DLL or DLL-to-EXE is defined on the level of the operating system, while the the interface between functions is defined on the level of a compiler. In a way, your task is similar (although somewhat easier) to that of client-server interaction, when the client and the server lack binary compatibility.
The compiler maps what it can to the way the DLL importing and exporting is done in a particular operating system. Since language specifications give compilers a lot of liberty when it comes to binary layout of user-defined types and sometimes even built-in types (recall that the exact size of int is compiler-dependent, as long as minimal sizing requirements are met), importing and exporting from DLLs needs to be done manually to achieve binary-level compatibility.
When you use the same version of the same compiler, this last issue above does not create a problem. However, as soon as a different compiler enters the picture, all bets are off: you need to go back to the plainly-typed interfaces, and introduce wrappers to maintain nice-looking interfaces inside your code.
I've been having the same problem and discovered a neat solution to it.
Instead of passing std:vector, you can pass a QVector from the Qt library.
The problems you quote are then handled inside the Qt library and you do not need to deal with it at all.
Of course, the cost is having to use the library and accept its slightly worse performance.
In terms of the amount of coding and debugging time it saves you, this solution is well worth it.
As far as I know, if I have a class such as the following:
class TileSurface{
public:
Tile * tile;
enum Type{
Top,
Left,
Right
};
Type type;
Point2D screenverts[4]; // it's a rectangle.. so..
TileSurface(Tile * thetile, Type thetype);
};
There's no easy way to programatically (using templates or whatever) go through each member and do things like print their types (for example, typeinfo's typeid(Tile).name()).
Being able to loop through them would be a useful and easy way to generate class size reports, etc. Is this impossible to do, or is there a way (even using external tools) for this?
Simply not possible in C++. You would need something like Reflection to implement this, which C++ doesn't have.
As far as your code is concerned after it is compiled, the "class" doesn't exist -- the names of the variables as well as their types have no meaning in assembly, and therefore they aren't encoded into the binary.
(Note: When I say "Not possible in C++" I mean "not possible to do built into the language" -- you could of course write a C++ parser in C++ which could implement this sort of thing...)
No. There are no easy way. If to put "easy way" aside then with C++ you can do anything imaginable.
If you want just to dump your data contents run-time then simplest way is to implement operator<<(ostream&,YourClass const&) for each YourClass you are interested in. Bit more complex is to implement visitor pattern, but with visitor pattern you may have different reports done by different visitors and also the visitors may do other things, not only generate reports.
If you want it as static analysis (program is not running, you want to generate reports) then you can use debugger database. Alternatively you may analyze AST generated by some compilers (g++ and CLang have options to generate it) and generate reports from it.
If you really need run-time reflection then you have to build it into your classes. That involves overhead. For example you may use common base-classes and put all data members of classes into array too. It is often done to communicate with applications written in languages that have reflection on more equal grounds (oldest example is Lisp).
I beg to differ from the conventional wisdom. C++ does have it; it's not part of the C++ standard, but every single C++ compiler I've seen emits metadata of this sort for use by the debugger.
Moreover, two formats for the debug database cover almost all modern compilers: pdb (the Microsoft format) and dwarf2 (just about everything else).
Our DMS Software Reengineering Toolkit is what you call an "external tool" for extractingt/transforming arbitrary code. DMS is generalized compiler technology parameterized by explicit langauge definitions. It has language definitions for C, C++, Java, COBOL, PHP, ...
For C, C++, Java and COBOL versions, it provides complete access to parse trees, and symbol table information. That symbol table information includes the kind of data you are likely to want from "reflection". If you goal is to enumerate some set of fields or methods and do something with them, DMS can be used to transform the code (or generate derived code) according to what you find in the symbol tables in arbitrary ways.
If you derive all types of the member variables from your common typeinfo-provider-baseclass, then you can get that. It is a bit more work than like in Java, but possible.
External tools: you mentioned that you need reports like class size, etc.--
Doxygen could help http://www.doxygen.nl/manual/features.html to generate class member lists (including inherited members).