C++ properties for opaque data types

I am building a C++ API which needs to be extendable without recompiling the software that uses it (for lots of bad reasons). This requires opaque data types so that fields can be added to classes. Something like
struct CheshireCat; // Not defined here
std::unique_ptr<CheshireCat> d_ptr;
But this API will have lots of simple property get/set methods that just access fields within CheshireCat. (I do not want to hear about "encapsulation" philosophy, that is just the way that it is for this application.)
There are clever techniques that create classes like MyInt that override operators to emulate properties like
int & operator = (const int &i) { return value = i; }
operator int () const { return value; }
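Fleshed out, that technique looks something like this (a minimal sketch; the class and field names are illustrative):
class MyInt {
public:
    int & operator = (const int &i) { return value = i; }
    operator int () const { return value; }
private:
    int value = 0;
};
struct Widget {
    MyInt x;   // w.x = 42; int i = w.x; -- both work
};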
But I cannot see any good way of such a MyInt getting access to the CheshireCat pointer in the containing class, let alone the hidden properties within the structure.
It seems like this must be a very common problem, so I am looking for other people's clever solutions. Possibly using some obscure macrology.
(The alternatives would be 1. Forget binary compatibility, or 2. write lots and lots of boilerplate code by hand.)
I realize that for binary compatibility new virtual methods need to be added to the end. Mangling is compiler dependent but should be stable for a given compiler (we will ship binaries made for a specific compiler/version).
This feels like C++ 101 but I could not find anything that addressed it elegantly. (My background is Java/.Net/Lisp where JITing solves this problem. A C++ system that did the final compilation phases in the (DLL) linker would be excellent but sadly that is not the way it is.)

I don't think you can get binary compatibility by exposing C++.
What you can do is expose a C interface (which is binary compatible). That means, as you have hinted, exposing functions that deal with opaque handles (i.e. pointers).
Of course, the internals can be written in C++.
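A minimal sketch of that approach (the function names are illustrative, not an existing API):
/* public header: clients see only an opaque handle and free functions */
#ifdef __cplusplus
extern "C" {
#endif

typedef struct CheshireCat CheshireCat;       /* never defined for clients */

CheshireCat *cat_create(void);
void cat_destroy(CheshireCat *c);
int cat_grin(const CheshireCat *c);           /* property getter */
void cat_set_grin(CheshireCat *c, int grin);  /* property setter */

#ifdef __cplusplus
}
#endif
Fields can then be added to CheshireCat, and new functions appended to the header, without breaking already-compiled clients.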

Related

Programming language development practice, how to compile golang style interfaces to c++?

For fun I have been working on my own programming language that compiles down to C++. While most things are fairly straightforward to print, I have been having trouble compiling my golang style interfaces to c++. In golang you don't need to explicitly declare that a particular struct implements an interface, it happens automatically if the struct has all the functions declared in the interface. Originally I was going to compile the interfaces down to a class with all virtual methods like so
class MyInterface {
public:
    virtual void DoSomething() = 0;
};
and all implementing structures would simply extend from the interface like you normally would in c++
class MyClass : public MyInterface {
    // ...
};
However this would mean that my compiler would have to loop over every interface defined in the source code (and all dependencies) as well as every struct defined in the source, and check if the struct implements the interface, an operation that would take O(N*M) time where N is the number of structs and M is the number of interfaces. I did some searching and stumbled upon some C++ code here: http://wall.org/~lewis/2012/07/23/go-style-interfaces-in-cpp.html that makes golang-style interfaces in C++ a reality, in which case I could just compile my interfaces to code similar to that (albeit not exactly, since I am hesitant to use raw pointers over smart pointers) and not have to worry about explicitly implementing them. However, the author states that it should not be done for production code, which worries me a little.
This is kind of a loaded question that may be a little subjective, but could anyone with more C++ knowledge tell me whether doing it the way suggested in the article is a really bad idea, or whether it's actually not that bad and could be done, or if there is a better way to write C++ code that would let me achieve the behavior I want without resorting to the O(N*M) loop?
My initial thought is to make use of the fact that C++ supports multiple inheritance. Decompose your golang interface into single-function interfaces. Hash all interfaces by their unique signature. It now becomes an O(N) operation to find the set of C++ abstract interfaces for your concrete classes.
Similarly, when you consume an object, you find all the consumed interfaces. This is now O(M) by the same logic. The total compiler complexity then becomes O(N)+O(M) instead of O(N*M).
The slight downside is that you're going to have O(N) vtables in C++. Some of those might be merged if certain interfaces are always grouped together.
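To illustrate the decomposition, a hedged sketch (the interface and struct names are made up): the golang interface { Read() int; Close() } compiles down to two single-function abstract bases, and each concrete struct inherits exactly the set of bases its methods imply.
// single-function interfaces, hashed by their unique signature
struct IRead {
    virtual int Read() = 0;
    virtual ~IRead() {}
};
struct IClose {
    virtual void Close() = 0;
    virtual ~IClose() {}
};

// a struct with matching methods implements both bases
struct File : IRead, IClose {
    int Read() override { return 0; }
    void Close() override {}
};

// a consumer of the original two-method interface takes the bases it needs
void Drain(IRead &r, IClose &c) { r.Read(); c.Close(); }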

Loading Stages from external code

I wrote a Pipe and Filter based architecture. To avoid confusion the Filters are called "Stages" in my code. Here's the basic idea:
I want other developers to have the possibility to implement their own Stage class and then I can add it into the list of Stages that already exist at run-time.
I've been reading around for a while and it seems like there are restrictions to dynamic code loading. My current Stage class looks like this:
class Stage
{
public:
    void Process();
    const uint16_t InputCount();
    const uint16_t OutputCount();
    void SetOutputPipe(size_t idx, Pipe<AmplitudeVal> *outputPipe);
    void SetInputPipe(size_t idx, Pipe<AmplitudeVal> *inputPipe);
protected:
    Stage(const uint16_t inputCount, const uint16_t outputCount);
    virtual void init() {};
    virtual bool work() = 0;
    virtual void finish() {};
protected:
    const uint16_t mInputCount;
    const uint16_t mOutputCount;
    std::vector< Pipe<AmplitudeVal>* > mInputs;
    std::vector< Pipe<AmplitudeVal>* > mOutputs;
};
AmplitudeVal is simply an alias for float. This class only holds references to the pipes it is connected to (mInputs and mOutputs); it doesn't deal with any algorithmic activity. I want to expose as little as I can for ease of use to external developers. Right now this class only relies on the Pipe header and some basic config file. Most examples dealing with loading DLLs propose a class with only pure virtual functions and barely any member variables. I'm not sure what I should do.
I understand you want to have Stage in a DLL and have users derive their own stages from it.
Scenario 1: consumer and DLL build with same compiler and same standard library
If the consumer uses the same compiler, compatible compiling options, and both sides use the same shared standard library (the default with MSVC), then your solution should work as is.
(See related SO question.)
Scenario 2: consumer and DLL build with same compiler but different libraries
If one side uses a different standard library or linking option thereof (for example if you use a statically linked library for your DLL and the shared library for the consumer), then you have to make sure that all objects are ALWAYS created/released on the same side (because the DLL and the application would each use their own allocation functions with different memory pools).
This will be very difficult because of:
inheritance of data
virtual destructor
storage management of standard containers (which would be different in DLL and consumer, despite the impression the source code might give that it's the same)
The first step in the right direction would then be to isolate all the data into the private section and ensure clean access via getters and setters. Fortunately, this kind of design is a sound approach for inheritance anyway, so it's worth using even if you don't strictly need it.
Scenario 3: different compilers or incompatible compiling options
If you use different compilers or incompatible compiling options, then the real problems start:
you can't rely on the assumption that both sides have the same understanding of memory layout. So reads/writes of members might occur at different locations; a huge mess! This is why so many DLL classes have no data members. Many also use the PIMPL idiom to hide a private memory layout. But PIMPL in this case of inheritance is very similar to using private data (*this would then be the implicit pointer to the private implementation)
the compiler/linker use "mangled" function names. Different compilers might use different mangling, and wouldn't understand each other's symbol definitions (i.e. the client wouldn't find SetOutputPipe() even though it's there). This is why most DLLs make all member functions virtual: the functions are called via an offset into a vtable, which fortunately uses the same layout across compilers in practice.
finally, different compilers could use different calling conventions. But I think that in practice, on well-established platforms, this shouldn't be a major risk.
In this answer to a question about DLLs with different compilers (also without inheritance), I've provided some additional explanations and references that could be relevant for such a mixed scenario.
Again, using private member data instead of protected would put you on the safer side. Exposing getters/setters (whether protected or public) with an extern "C" linkage specifier would avoid name-mangling issues for non-virtual functions.
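Putting those two ideas together, a hedged sketch of the usual plugin-style boundary (names are illustrative, not from the code above): all access goes through virtual functions, and construction/destruction cross the boundary through extern "C" factory functions so that each module frees only what it allocated.
// public header shared by the DLL and its consumers
class IStage
{
public:
    virtual void Process() = 0;
    virtual void Destroy() = 0;  // releases the object on the DLL side
protected:
    ~IStage() {}                 // not public: clients must call Destroy()
};

extern "C" IStage *CreateStage(); // exported unmangled

// consumer side:
//   IStage *s = CreateStage();
//   s->Process();
//   s->Destroy(); // memory returns to the pool that allocated it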
Final remark:
When exposing classes in libraries, extra care should be taken designing the class, making data private. This good practice is worth the extra thought, whatever scenario you're in.

STL Containers and Binary Interface Compatibility

STL Binary Interfaces
I'm curious to know if anyone is working on compatible interface layers for STL objects across multiple compilers and platforms for C++.
The goal would be to support STL types as first-class or intrinsic data types.
Is there some inherent design limitation imposed by templating in general that prevents this? This seems like a major limitation of using the STL for binary distribution.
Theory - Perhaps the answer is pragmatic
Microsoft has put effort into .NET and doesn't really care about C++ STL support being "first class".
Open-source doesn't want to promote binary-only distribution and focuses on getting things right with a single compiler instead of a mishmash of 10 different versions.
This seems to be supported by my experience with Qt and other libraries - they generally provide a build for the environment you're going to be using. Qt 4.6 and VS2008 for example.
References:
http://code.google.com/p/stabi/
Binary compatibility of STL containers
I think the problem precedes your theory: C++ doesn't specify the ABI (application binary interface).
In fact even C doesn't, but since a C library is just a collection of functions (and maybe global variables), the ABI is just the names of the functions themselves. Depending on the platform, names can be mangled somehow, but since every compiler must be able to place system calls, everything ends up using the convention of the operating system vendor (on Windows, _cdecl just results in prepending a _ to the function name).
But C++ has overloading, hence more complex mangling schemes are required.
As of today, no agreement exists among compiler manufacturers about how such mangling must be done.
It is technically impossible to compile a C++ static library with one compiler and link it to a C++ OBJ coming from another. The same goes for DLLs.
And since compilers all differ even for plainly compiled overloaded member functions, no one is actually tackling the problem of templates.
It CAN technically be tackled by introducing a redirection for every parametric type and introducing dispatch tables, but... this makes templated functions no different (in terms of call dispatching) from virtual functions of virtual bases, so template performance degrades to classic OOP dispatching (although it can limit code bloat... the trade-off is not always obvious).
Right now, it seems there is no interest among compiler manufacturers in agreeing on a common standard, since it would sacrifice the performance differences each manufacturer gains from its own optimizations.
C++ templates are compile-time generated code.
This means that if you want to use a templated class, you have to include its header (declaration) so the compiler can generate the right code for the templated class you need.
So, templates can't be pre-compiled to binary files.
What other libraries give you is pre-compiled base-utility classes that aren't templated.
C# generics for example are compiled into IL code in the form of dlls or executables.
But IL is itself just another programming language, so the compiler can read the generics information from the included library.
.Net IL code is compiled into actual binary code at runtime, so the compiler at runtime has all the definitions it needs in IL to generate the right code for the generics.
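The closest C++ comes is explicit instantiation: a library can pre-generate a template's code for a fixed set of types, but clients are then limited to exactly those types. A minimal sketch (twice is an illustrative name):
// library header: declaration only
template <typename T> T twice(T v);

// library source, compiled into the binary:
template <typename T> T twice(T v) { return v + v; }
template int twice<int>(int);        // explicit instantiations
template double twice<double>(double);

// clients can link against twice<int> and twice<double>, but a call to
// twice<float> fails at link time because its code was never generated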
I'm curious to know if anyone is working on compatible interface layers for STL objects across multiple compilers and platforms for C++.
Yes, I am. I am working on a layer of standardized interfaces which you can (among other things) use to pass binary safe "managed" references to instances of STL, Boost or other C++ types across component boundaries. The library (called 'Vex') will provide implementations of these interfaces plus proper factories to wrap and unwrap popular std:: or boost:: types. Additionally the library provides LINQ-like query operators to filter and manipulate contents of what I call Range and RangeSource. The library is not yet ready to "go public" but I intend to publish an early "preview" version as soon as possible...
Example:
com1 passes a reference to std::vector<uint32_t> to com2:
com1:
class Com1 : public ICom1 {
    std::vector<uint32_t> mVector;
    virtual void SendDataTo(ICom2 *pI)
    {
        pI->ReceiveData(Vex::Query::From(mVector) | Vex::Query::ToInterface());
    }
};
com2:
class Com2 : public ICom2 {
    virtual void ReceiveData(Vex::Ranges::IRandomAccessRange<uint32_t> *pItf)
    {
        std::deque<uint32_t> tTmp;
        // filter even numbers, reverse order and process data with STL or Boost algorithms
        Vex::Query::From(pItf)
            | Vex::Query::Where([](uint32_t v) { return v % 2 == 0; })
            | Vex::Query::Reverse
            | Vex::ToStlSequence(std::back_inserter(tTmp));
        // use tTmp...
    }
};
You will recognize the relationship to various familiar concepts: LINQ, Boost.Range, any_iterator and D's ranges... One of the basic intents of 'Vex' is not to reinvent the wheel - it only adds that interface layer plus some required infrastructure and syntactic sugar for queries.

Modifying SWIG Interface file to Support C void* and structure return types

I'm using SWIG to generate my JNI layer for a large set of C APIs, and I was wondering what the best practices are for the situations below. These pertain not only to SWIG but to JNI in general.
When C functions return pointers to Structures, should the SWIG interface file (JNI logic) be heavily used or should C wrapper functions be created to return the data in pieces (i.e. a char array that contains the various data elements)?
When C functions return void*, should the C APIs be modified to return the actual data type, whether it be a primitive or structure type?
I'm unsure whether I want to add a massive amount of logic and create a middle layer (SWIG interface file/JNI logic). Thoughts?
My approach to this in the past has been to write as little code as possible to make it work. When I have to write code to make it work I write it in this order of preference:
Write as C or C++ in the original library - everyone can use this code, you don't have to write anything Java or SWIG specific (e.g. add more overloads in C++, add more versions of functions in C, use return types that SWIG knows about in them)
Write more of the target language - supply "glue" to bring some bits of the library together. In this case that would be Java.
It doesn't really matter if this is "pure" Java, outside of SWIG altogether, or as part of the SWIG interface file from my perspective. Users of the Java interface shouldn't be able to distinguish the two. You can use SWIG to help avoid repetition in a number of cases though.
Write some JNI through SWIG typemaps. This is ugly, error prone if you're not familiar with writing it, harder to maintain (arguably) and only useful to SWIG+Java. Using SWIG typemaps does at least mean you only write it once for every type you wrap.
The times I'd favour this over 2 are when one or more of the following hold:
When it comes up a lot (saves repetitious coding)
I don't know the target language at all, in which case using the language's C API probably is easier than writing something in that language
The users will expect this
Or it just isn't possible to use the previous styles.
Basically these guidelines I suggested are trying to deliver functionality to as many users of the library as possible whilst minimising the amount of extra, target language specific code you have to write and reducing the complexity of it when you do have to write it.
For the specific case of sockaddr_in*:
Approach 1
The first thing I'd try and do is avoid wrapping anything more than a pointer to it. This is what swig does by default with the SWIGTYPE_p_sockaddr_in thing. You can use this "unknown" type in Java quite happily if all you do is pass it from one thing to another, store in containers/as a member etc., e.g.
public static void main(String[] argv) {
    Module.takes_a_sockaddr(Module.returns_a_sockaddr());
}
If that doesn't do the job you could do something like write another function, in C:
const char * sockaddr2host(struct sockaddr_in *in); // Some code to get the host as a string
unsigned short sockaddr2port(struct sockaddr_in *in); // Some code to get the port
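Possible implementations of those helpers might look like this (a sketch, IPv4 only, using the standard BSD socket calls):
#include <netinet/in.h> /* struct sockaddr_in */
#include <arpa/inet.h>  /* inet_ntoa, ntohs */

const char * sockaddr2host(struct sockaddr_in *in)
{
    return inet_ntoa(in->sin_addr); /* static buffer: copy it if you keep it */
}

unsigned short sockaddr2port(struct sockaddr_in *in)
{
    return ntohs(in->sin_port);
}
SWIG wraps these with no extra work, since the parameter is the same opaque pointer type and the return values are types it already understands.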
This isn't great in this case though - you've got some complexity to handle there with address families that I'd guess you'd rather avoid (that's why you're using sockaddr_in in the first place), but it's not Java specific, it's not obscure syntax and it all happens automatically for you besides that.
Approach 2
If that still isn't good enough then I'd start to think about writing a little bit of Java - you could expose a nicer interface by hiding the SWIGTYPE_p_sockaddr_in type as a private member of your own Java type, and wrapping the call to the function that returns it in some Java that constructs your type for you, e.g.
public class MyExtension {
    private MyExtension() { }
    private SWIGTYPE_p_sockaddr_in detail;
    public static MyExtension native_call() {
        MyExtension e = new MyExtension();
        e.detail = Module.real_native_call();
        return e;
    }
    public void some_call_that_takes_a_sockaddr() {
        Module.real_call(detail);
    }
}
No extra SWIG to write, no JNI to write. You could do this through SWIG using %pragma(modulecode) to make it all overloads on the actual Module SWIG generates - this probably feels more natural to the Java users (it doesn't look like a special case) and isn't really any more complex. The hard work is still being done by SWIG; this just provides some polish that avoids repetitious coding on the Java side.
Approach 3
This would basically be the second part of my previous answer. It's nice because it looks and feels native to the Java users and the C library doesn't have to be modified either. In essence the typemap provides a clean-ish syntax for encapsulating the JNI calls for converting from what Java users expect to what C works with and neither side knows about the other side's outlook.
The downside though is that it is harder to maintain and really hard to debug. My experience has been that SWIG has a steep learning curve for things like this, but once you reach the point where writing typemaps like that doesn't take too much effort, the power they give you through re-use and encapsulation of the C-type-to-Java-type mapping is well worth it.
If you're part of a team, but the only person who really understands the SWIG interface then that puts a big "what if you get hit by a bus?" factor on the project as a whole. (Probably quite good for making you unfirable though!)

Design methods for multiple serialization targets/formats (not versions)

Whether as members, perhaps static, in separate namespaces, via friends, even via overloads, or any other C++ language feature...
When facing the problem of supporting multiple/varying formats, maybe protocols or any other kind of targets for your types, what was the most flexible and maintainable approach?
Were there any conventions or clear cut winners?
A brief note why a particular approach helped would be great.
Thanks.
[ProtoBufs-like suggestions should not cut it for an upvote, no matter how flexible that particular impl might be :)]
Reading through the already posted responses, I can only agree with a middle-tier approach.
Basically, in your original problem you have 2 distinct hierarchies:
n classes
m protocols
The naive use of a Visitor pattern (as much as I like it) will only lead to n*m methods... which is really gross and a gateway to a maintenance nightmare. I suppose you already noticed it, otherwise you would not have asked!
The "obvious" target approach is to go for a n+m solution, where the 2 hierarchies are clearly separated. This of course introduces a middle-tier.
The idea is thus ObjectA -> MiddleTier -> Protocol1.
Basically, that's what Protocol Buffers does, though its problem is a different one (going from one language to another via a protocol).
It may be quite difficult to work out the middle-tier:
Performance issues: a "translation" phase adds some overhead; instead of one conversion you now have two. This can be mitigated, but you will have to work on it.
Compatibility issues: some protocols do not support recursion, for example (xml and json do, edifact does not), so you may have to settle for a least-common-denominator approach or work out ways of emulating such behaviors.
Personally, I would go for "reimplementing" the JSON language (which is extremely simple) into a C++ hierarchy:
int
strings
lists
dictionaries
Applying the Composite pattern to combine them.
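A minimal sketch of such a hierarchy (class names are illustrative; this is the Composite pattern applied to the JSON-like primitives above):
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Value {                  // component: the common base
    virtual ~Value() {}
};
struct Int : Value { int v = 0; };
struct String : Value { std::string v; };
struct List : Value {           // composite: ordered children
    std::vector<std::unique_ptr<Value>> items;
};
struct Dict : Value {           // composite: named children
    std::map<std::string, std::unique_ptr<Value>> fields;
};
Each of the m protocols then only has to know how to walk these few node types, which is what turns n*m into n+m.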
Of course, that is the first step only. Now you have a framework, but you don't have your messages.
You should be able to specify a message in terms of primitives (and really think about versioning right now; it's too late once you need another version). Note that the two approaches are valid:
In-code specification: your message is composed of primitives / other messages
Using a code generation script: this seems overkill here, but... for the sake of completeness I thought I would mention it, as I don't know how many messages you really need :)
On to the implementation:
Herb Sutter and Andrei Alexandrescu said in their C++ Coding Standards
Prefer non-member non-friend functions
This applies really well to the MiddleTier -> Protocol step: create a Protocol1 class and then you can have:
Protocol1 myProtocol;
myProtocol << myMiddleTierMessage;
The use of operator<< for this kind of operation is well-known and very common. Furthermore, it gives you a very flexible approach: not all messages are required to implement all protocols.
The drawback is that it won't work for a dynamic choice of the output protocol. In this case, you might want to use a more flexible approach. After having tried various solutions, I settled for using a Strategy pattern with compile-time registration.
The idea is that I use a Singleton which holds a number of Functor objects. Each object is registered (in this case) for a particular Message - Protocol combination. This works pretty well in this situation.
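A hedged sketch of that registry (names are illustrative; real code would likely key on type indexes rather than strings and use typed functor signatures rather than void*):
#include <functional>
#include <map>
#include <string>
#include <utility>

class SerializerRegistry {   // the Singleton holding the functors
public:
    using Fn = std::function<void(const void *message, void *protocol)>;
    static SerializerRegistry &Instance() {
        static SerializerRegistry r;
        return r;
    }
    void Register(const std::string &msg, const std::string &prot, Fn fn) {
        mTable[msg + "/" + prot] = std::move(fn);
    }
    const Fn &Lookup(const std::string &msg, const std::string &prot) const {
        return mTable.at(msg + "/" + prot); // throws if nothing registered
    }
private:
    std::map<std::string, Fn> mTable;
};

// each Message-Protocol pair registers itself during static initialization:
// static const bool gGetUpSuperProt = (SerializerRegistry::Instance().Register(
//     "GetUp", "SuperProt",
//     [](const void *m, void *p) { /* GetUp2SuperProt logic */ }), true);
The output protocol can then be chosen at run time by looking up the right functor.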
Finally, for the BOM -> MiddleTier step, I would say that a particular instance of a Message should know how to build itself and should require the necessary objects as part of its constructor.
That of course only works if your messages are quite simple and may be built from only a few combinations of objects. If not, you might want a relatively empty constructor and various setters, but the first approach is usually sufficient.
Putting it all together.
// 1 - Your BOM
class Foo {};
class Bar {};

// 2 - Message class: GetUp
class GetUp
{
    typedef enum {} State;
    State m_state;
};

// 3 - Protocol class: SuperProt
class SuperProt : public Protocol
{
};

// 4 - GetUp to SuperProt serializer
class GetUp2SuperProt : public Serializer
{
};

// 5 - Let's use it
Foo foo;
Bar bar;
SuperProt sp;
GetUp getUp = GetUp(foo, bar);
MyMessage2ProtBase.serialize(sp, getUp); // uses GetUp2SuperProt inside
If you need many output formats for many classes, I would try to make it an n + m problem instead of an n * m problem. The first way that comes to mind is to make the classes reducible to some kind of dictionary, and then have a method to serialize those dictionaries to each output format.
Assuming you have full access to the classes that must be serialized, you need to add some form of reflection to the classes (probably including an abstract factory). There are two ways to do this: 1) a common base class or 2) a "traits" struct. Then you can write your encoders/decoders against the base class/traits struct.
Alternatively, you could require that the class provide a function to export itself to a container of boost::any and provide a constructor that takes a boost::any container as its only parameter. It should be simple to write a serialization function to many different formats if your source data is stored in a map of boost::any objects.
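A hedged sketch of that second idea, written with std::any (the standardized descendant of boost::any); the Point class is purely illustrative:
#include <any>
#include <map>
#include <string>

using AnyMap = std::map<std::string, std::any>;

class Point {
public:
    Point(int x, int y) : mX(x), mY(y) {}
    explicit Point(const AnyMap &m)               // rebuild from a dictionary
        : mX(std::any_cast<int>(m.at("x"))),
          mY(std::any_cast<int>(m.at("y"))) {}
    AnyMap Export() const {                       // export to a dictionary
        return {{"x", mX}, {"y", mY}};
    }
private:
    int mX, mY;
};

// a JSON writer, an XML writer, etc. then only ever deal with AnyMap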
That's two ways I might approach this. Which of them I would choose depends highly on the similarity of the classes to be serialized and on the diversity of target formats.
I used the OpenH323 library (famous enough among VoIP developers) for long enough to build a number of VoIP applications, from a low-density answering machine up to a 32xE1 border controller. It went through major rework over that time, so in those days I knew almost everything about the library.
Inside this library was a tool (ASN parser) which converted ASN.1 definitions into container classes. There was also a framework which allowed serialization / de-serialization of these containers using higher-layer abstractions. Note that they are auto-generated. They supported several encoding protocols (BER, PER, XER) for ASN.1, with a very complex ASN syntax and good-enough performance.
What was nice?
Auto-generated container classes which were suitable enough for clear logic implementation.
I managed to rework the whole container layer under the ASN object hierarchy with almost no modification to the upper layers.
It was relatively easy to refactor (for performance) the debug features of those ASN classes (understandably, the authors didn't expect 20xE1 calls' signalling to be logged online).
What was not suitable?
A non-STL library with lazy copy underneath. It was refactored for speed, but I would have liked STL compatibility there (at least at that time).
You can find the Wiki page of the whole project here. You should focus only on the PTLib component, the ASN parser sources, and the ASN class hierarchy / encoding / decoding policy hierarchies.
By the way, look at the "Bridge" design pattern; it might be useful.
Feel free to ask in the comments if something seems strange / insufficient / not what you actually requested.