Google Protocol Buffers in C++: Creating a message from an existing struct - c++

I'm considering Google protocol buffers as a solution to my problem of communication between C++ and C# using named pipes. But I have one concern: all I've been able to find on protobuf is how to create a message from a prototype using protobuf compiler. This is neat, but I would also need to be able to serialize existing structs. I can't seem to find any info (but maybe I'm overlooking it). Do you know if it is possible to serialize a struct in C++ using protobufs, so it can be read in .NET, without modifying said existing struct?

Yes and no.
It's possible. In fact, I have done it. Not the .NET loading part, but the serialization to protobuf and the generation of a prototype from a C++ class. However, doing so requires a number of things and is not that easy.
First of all, protobufs are quite limited in their ability to represent data. They are basically only capable of representing POD-types (in the C++ sense), and very little else. I personally had to add a few basic things to the format to make it into a proper full-featured serialization format. But if you restrict yourself to POD-types, then the plain protobuf format will work fine.
The second thing is that you'll need a serialization library of some kind, and that will require that you add some code for each struct / class to perform the serialization / de-serialization (not necessarily "intrusively", meaning that you might not have to change the classes, just add some code on the side). You can look at Boost.Serialization, that's the basic template for how to create a serialization library in C++. Boost.Serialization is not particularly flexible for this purpose, and so, you might have to change a few things (like I had to do).
The third thing is that you will need quite a bit of wizardry under-the-hood to make this happen. In particular, you are going to need a reliable and feature-rich run-time type identification system (RTTI) in order to able to have useful type names, and you might need to clever meta-programming or some intrusive class hierarchy to be able to detect user-defined types for which you need to generate a prototype.
So, that's why my answer is "yes and no" because it is possible, but not without quite a bit of work and a good framework to rely on.
N.B.: Writing code to encode/decode data into the proto-buf format (with those small-ints, and all that) is really the easy part, proto-buf format is so simple, it's almost laughable. Writing the serialization framework that will allow you to do fancy things like generating prototypes, that's the hard part.

Related

Serialize/ Deserialize C++ classes

I'm looking for a way to send C++ class between 2 clients aptication.
I was looking for a way doing so and all i can find is that I need to create for each Class Serialize/ Deserialize (to JSON for example) functions and send it over TCP/IP.
The main problem I'm faceing is that I have ~600 classes (some are classes including instances of others) that I need to pass which mean I need to spent the next writing Serialize/ Deserialize functions.
Is there any generic way writing Serialize/Deserialize functions ?
Is there any other way sending C++ classes ?
Thanks,
Guy Ergas.
Are you using a Framework at all? Qt and MFC for example have built in Serialization that would make your task easier. Otherwise I would guess that you'd need to spend at least some effort on each of the 600 classes.
As recommended above Boost Serialization is probably a good way to go, you can send the serialized class over Tcp using Boost Asio too:
http://www.boost.org/doc/libs/1_54_0/doc/html/boost_asio.html
Alternatively, there is a C++ API for Google Protocol Buffers (protobuf):
https://developers.google.com/protocol-buffers/docs/reference/cpp/
Boost Serialization
Although I haven't used it my self, it is very popular around my peers at work.
More info about it can be found in "Boost (1.54.00) Serialization"
Thrift
Thrift have a very limited serialize functionality which I don't think fits your requirements. But it can help you "move" the data from one client to anther even if they are using different languages.
More info about it can be found in "Thrift: The Missing Guide"
try s11n or nosjob
s11n (an abbreviation for serialization) is an Open Source project
focused on the generic serialization of objects (i.e., object
persistence) in the C++ programming language.
nosjob, a C++ library for generating and consuming JSON data.
You may be interested in ASN.1. It's not necessarily the easiest to use and tools/libraries are a little hard to come by (Objective Systems at http://www.obj-sys.com/index.php is worth a look, though not free).
However the big advantage is that it is very heavily standardised (so no trouble with library version incompatibilities) and most languages are supported one way or another. Handy if you need support across multiple platforms. It also does binary encodings, so its way less bloaty than XML (which it also supports). I chose it for these reasons and didn't regret it.
If you are at linux platform , You can directly use json.h library for serialization.
Here is sample code i have come across :)
Json Serializer

Working with protobuf and POCOs in C++

I would like to use protobuf with a C++ project I'm working on.
However, I don't like to work with the auto-generated classes protoc creates and prefer to stick with the POCOs I already have. This is because the POCOs are already in use in other parts of the code and I want to be able to switch the serialization mechanism with ease later on. But manually writing converters between POCOs and protobuf message classes seems tedious and wrong.
I want to know if there's a way to use protobuf to create a serializer - an auto-generated class that will be able to serialize and deserialize my POCOs, without bugging me with internals.
Thanks.
First, you may like Cap'n Proto better, it was created by one of Google's former Google Protocol Buffer maintainers. Worth looking into, anyway.
But otherwise, you really need to consider why you're using Google Protocol Buffers.
If you want to achieve the forward and backward compatibility, and to be able to open, then edit, then save an object that possibly a different person created, with a different version of your protocol buffer declaration, and then sent along to yet another person with an even different version of the declaration... then you need to just bite the bullet and use the generated C++ from the Google Protocol Buffer Compiler.
It's really not just a serialization format. It's specifically designed to make it easy living with different versions of your serialization, over time.
If you don't need that flexibility, and you don't like the generated code, you may want to consider a different serialization tool.

C++ basic serialization without using extra libraries

Someone can give me a good hint about how to perform serialization without using extra libraries other than the standard ones ?
I would question whether you have a good reason for not using libraries. There is a lot of data that shows that the code you write yourself is most likely to blow up in your face down the line and the reason for that is because its the code with the least testing behind it.
If you do have a good reason and you still need to serialize, then you have to write your own. Basically, you're looking at overloading the usual ostream and istream operators so that they support the types you need.
Again, you run the risk of Re-inventing the square wheel. Keep in mind that the best libraries (like boost) are themselves written using standard C++ and the licensing requirements on bost do not require you to release your source or any such thing. In other words, your IP is safe even after you use them.
Your options are: don't serialize, or, write your own serializer code. Its not built into the language or standard libraries.
Also, you might want to look at some similar questions:
Serialize Strings, ints and floats to character arrays for networking WITHOUT LIBRARIES
You can use something called binary serialisation using streams ,as demonstrated in
http://www.functionx.com/cpp/articles/serialization.htm

Plugin framework in C++ with

I'm designing (brainstorming) a C++ plugin framework for an extensible architecture.
Each plugin registers an interface, which is implemented by the plugin itself.
Such framework may be running on relatively capable embedded devices (e.g. Atom/ARM) so I can use STL and Boost.
At the moment I've managed to write a similar framework, in which interfaces are known in advance and plugins (loaded from dynamic libraries) register the objects implementing them. Those objects are instantiated as needed by their factory methods, and methods are called correctly.
Now I want to make it more flexible, having plugins register new interfaces (not just implementing the existing ones) thus extending the API available to the framework users.
I thought of using a std::map<std::string, FunctionPtr>, which is also mentioned by several articles and stackoverflow replies I've read. Unfortunately it doesn't seem to capture the case of different method interfaces.
I feel it might have something to do with template metaprogramming, or traits perhaps, but I can't figure out how it should work exactly. Can anyone help?
Try looking at XPCOM which solves these problems for you - by sortof re-implementing COM.
You have the issue of not knowing what interface the plugin provides to your application, so you need a way for the developer to access it, without the compiler knowing what it is (though, if you supply a header file, then suddenly you do know what it is and you can compile it without any need for plugin unknown-interface fanciness)
so, you're going to have to rely on runtime determinism of the interface, that roughly requires you to define the interface in some way so that the framework can call arbitrary methods on it, and I think the only realistic way you can do that is to define each interface as a set of function pointers that are loaded individually and then stored in data for the user to call. And that pretty much means a map of function pointers to names. It also means you can only user compiler niceties (such as overloading) by making the function names unique. The compiler does this for you by 'mangling' all functions to unique, coded names.
Type Traits will help you wrap your imported functions in your framework, so you can inspect them and create classes that work with any imported type, but it isn't going to solve the main problem of importing arbitrary functions.
Possibly one approach that you'll want to read is Metaclasses and Reflection by Vollmann. This was referenced by the C++ standard body, though I don't know if it will become part of a future spec. Alternatively you can look at Boost.Extension
Maybe the first thing you need check is COM.
Anything that can be done with templates, can be done without, though perhaps in a much less convenient way, by writing "template instances" by hand.
If your framework was compiled without seeing a declaration of class MyNewShinyInterface, it cannot store pointers of type MyNewShinyInterface *, and cannot return them to the framework users. No amount of template wizardry can change that. The framework can only store an pass around pointers to some base class. The users will have to do a dynamic_cast to retrieve the correctly typed pointer.
The same is true about function pointers, only functions have no base classes and one will have to do the error-prone reinterpret_cast to retrieve the right type. (This is just another reason to prefer proper objects over function pointers.)

How to design a C / C++ library to be usable in many client languages? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I'm planning to code a library that should be usable by a large number of people in on a wide spectrum of platforms. What do I have to consider to design it right? To make this questions more specific, there are four "subquestions" at the end.
Choice of language
Considering all the known requirements and details, I concluded that a library written in C or C++ was the way to go. I think the primary usage of my library will be in programs written in C, C++ and Java SE, but I can also think of reasons to use it from Java ME, PHP, .NET, Objective C, Python, Ruby, bash scrips, etc... Maybe I cannot target all of them, but if it's possible, I'll do it.
Requirements
It would be to much to describe the full purpose of my library here, but there are some aspects that might be important to this question:
The library itself will start out small, but definitely will grow to enormous complexity, so it is not an option to maintain several versions in parallel.
Most of the complexity will be hidden inside the library, though
The library will construct an object graph that is used heavily inside. Some clients of the library will only be interested in specific attributes of specific objects, while other clients must traverse the object graph in some way
Clients may change the objects, and the library must be notified thereof
The library may change the objects, and the client must be notified thereof, if it already has a handle to that object
The library must be multi-threaded, because it will maintain network connections to several other hosts
While some requests to the library may be handled synchronously, many of them will take too long and must be processed in the background, and notify the client on success (or failure)
Of course, answers are welcome no matter if they address my specific requirements, or if they answer the question in a general way that matters to a wider audience!
My assumptions, so far
So here are some of my assumptions and conclusions, which I gathered in the past months:
Internally I can use whatever I want, e.g. C++ with operator overloading, multiple inheritance, template meta programming... as long as there is a portable compiler which handles it (think of gcc / g++)
But my interface has to be a clean C interface that does not involve name mangling
Also, I think my interface should only consist of functions, with basic/primitive data types (and maybe pointers) passed as parameters and return values
If I use pointers, I think I should only use them to pass them back to the library, not to operate directly on the referenced memory
For usage in a C++ application, I might also offer an object oriented interface (Which is also prone to name mangling, so the App must either use the same compiler, or include the library in source form)
Is this also true for usage in C# ?
For usage in Java SE / Java EE, the Java native interface (JNI) applies. I have some basic knowledge about it, but I should definitely double check it.
Not all client languages handle multithreading well, so there should be a single thread talking to the client
For usage on Java ME, there is no such thing as JNI, but I might go with Nested VM
For usage in Bash scripts, there must be an executable with a command line interface
For the other client languages, I have no idea
For most client languages, it would be nice to have kind of an adapter interface written in that language. I think there are tools to automatically generate this for Java and some others
For object oriented languages, it might be possible to create an object oriented adapter which hides the fact that the interface to the library is function based - but I don't know if its worth the effort
Possible subquestions
is this possible with manageable effort, or is it just too much portability?
are there any good books / websites about this kind of design criteria?
are any of my assumptions wrong?
which open source libraries are worth studying to learn from their design / interface / souce?
meta: This question is rather long, do you see any way to split it into several smaller ones? (If you reply to this, do it as a comment, not as an answer)
Mostly correct. Straight procedural interface is the best. (which is not entirely the same as C btw(**), but close enough)
I interface DLLs a lot(*), both open source and commercial, so here are some points that I remember from daily practice, note that these are more recommended areas to research, and not cardinal truths:
Watch out for decoration and similar "minor" mangling schemes, specially if you use a MS compiler. Most notably the stdcall convention sometimes leads to decoration generation for VB's sake (decoration is stuff like #6 after the function symbol name)
Not all compilers can actually layout all kinds of structures:
so avoid overusing unions.
avoid bitpacking
and preferably pack the records for 32-bit x86. While theoretically slower, at least all compilers can access packed records afaik, and the official alignment requirements have changed over time as the architecture evolved
On Windows use stdcall. This is the default for Windows DLLs. Avoid fastcall, it is not entirely standarized (specially how small records are passed)
Some tips to make automated header translation easier:
macros are hard to autoconvert due to their untypeness. Avoid them, use functions
Define separate types for each pointer types, and don't use composite types (xtype **) in function declarations.
follow the "define before use" mantra as much as possible, this will avoid users that translate headers to rearrange them if their language in general requires defining before use, and makes it easier for one-pass parsers to translate them. Or if they need context info to auto translate.
Don't expose more than necessary. Leave handle types opague if possible. It will only cause versioning troubles later.
Do not return structured types like records/structs or arrays as returntype of functions.
always have a version check function (easier to make a distinction).
be careful with enums and boolean. Other languages might have slightly different assumptions. You can use them, but document well how they behave and how large they are. Also think ahead, and make sure that enums don't become larger if you add a few fields, break the interface. (e.g. on Delphi/pascal by default booleans are 0 or 1, and other values are undefined. There are special types for C-like booleans (byte,16-bit or 32-bit word size, though they were originally introduced for COM, not C interfacing))
I prefer stringtypes that are pointer to char + length as separate field (COM also does this). Preferably not having to rely on zero terminated. This is not just because of security (overflow) reasons, but also because it is easier/cheaper to interface them to Delphi native types that way.
Memory always create the API in a way that encourages a total separation of memory management. IOW don't assume anything about memory management. This means that all structures in your lib are allocated via your own memory manager, and if a function passes a struct to you, copy it instead of storing a pointer made with the "clients" memory management. Because you will sooner or later accidentally call free or realloc on it :-)
(implementation language, not interface), be reluctant to change the coprocessor exception mask. Some languages change this as part of conforming to their standards floating point error(exception-)handling.
Always pair a callbacks with an user configurable context. This can be used by the user to give the the callback state without defining global variables. (like e.g. an object instance)
be careful with the coprocessor status word. It might be changed by others and break your code, and if you change it, other code might stop working. The status word is generally not saved/restored as part of calling conventions. At least not in practice.
don't use C style varargs parameters. Not all languages allow variable number of parameters in an unsafe way
(*) Delphi programmer by day, a job that involves interfacing a lot of hardware and thus translating vendor SDK headers. By night Free Pascal developer, in charge of, among others, the Windows headers.
(**)
This is because what "C" means binary is still dependant on the used C compiler, specially if there is no real universal system ABI. Think of stuff like:
C adding an underscore prefix on some binary formats (a.out, Coff?)
sometimes different C compilers have different opinions on what to do with small structures passed by value. Officially they shouldn't support it at all afaik, but most do.
structure packing sometimes varies, as do details of calling conventions (like skipping
integer registers or not if a parameter is registerable in a FPU register)
===== automated header conversions ====
While I don't know SWIG that well, I know and use some delphi specific header tools( h2pas, Darth/headconv etc).
However I never use them in fully automatic mode, since more often then not the output sucks. Comments change line or are stripped, and formatting is not retained.
I usually make a small script (in Pascal, but you can use anything with decent string support) that splits a header up, and then try a tool on relatively homogeneous parts (e.g. only structures, or only defines etc).
Then I check if I like the automated conversion output, and either use it, or try to make a specific converter myself. Since it is for a subset (like only structures) it is often way easier than making a complete header converter. Of course it depends a bit what my target is. (nice, readable headers or quick and dirty). At each step I might do a few substitutions (with sed or an editor).
The most complicated scheme I did for Winapi commctrl and ActiveX/comctl headers. There I combined IDL and the C header (IDL for the interfaces, which are a bunch of unparsable macros in C, the C header for the rest), and managed to get the macros typed for about 80% (by propogating the typecasts in sendmessage macros back to the macro declaration, with reasonable (wparam,lparam,lresult) defaults)
The semi automated way has the disadvantage that the order of declarations is different (e.g. first constants, then structures then function declarations), which sometimes makes maintenance a pain. I therefore always keep the original headers/sdk to compare with.
The Jedi winapi conversion project might have more info, they translated about half of the windows headers to Delphi, and thus have enormous experience.
I don't know but if it's for Windows then you might try either a straight C-like API (similar to the WINAPI), or packaging your code as a COM component: because I'd guess that programming languages might want to be able to invoke the Windows API, and/or use COM objects.
Regarding automatic wrapper generation, consider using SWIG. For Java, it will do all the JNI work. Also, it is able to translate complex OO-C++-interfaces properly (provided you follow some basic guidelines, i.e. no nested classes, no over-use of templates, plus the ones mentioned by Marco van de Voort).
Think C, nothing else. C is one of the most popular programming languages. It is widely used on many different software platforms, and there are few computer architectures for which a C compiler does not exist. All popular high-level languages provide an interface to C. That makes your library accessible from almost all platforms in existence. Don't worry too much about providing an Object Oriented interface. Once you have the library done in C, OOP, functional or any other style interface can be created in appropriate client languages. No other systems programming language will give you C's flexibility and potability.
NestedVM I think is going to be slower than pure Java because of the array bounds checking on the int[][] that represents the MIPS virtual machine memory. It is such a good concept but might not perform well enough right now (until phone manufacturers add NestedVM support (if they do!), most stuff is going to be SLOW for now, n'est-ce pas)? Whilst it may be able to unpack JPEGs without error, speed is of no small concern! :)
Nothing else in what you've written sticks out, which isn't to say that it's right or wrong! The principles sound (mainly just listening to choice of words and language to be honest) like roughly standard best practice but I haven't thought through the details of everything you've said. As you said yourself, this really ought to be several questions. But of course doing this kind of thing is not automatically easy just because you're fixed on perhaps a slightly different architecture to the last code base you've worked on...! ;)
My thoughts:
All your comments on C interface compatibility sound sensible to me, pretty much best practice except you don't seem to properly address memory management policy - some sentences a bit ambiguous/vague/wrong-sounding. The design of the memory management will be to a large extent determined by the access patterns made in your application, rather than the functionality per se. I suiggest you study others' attempts at making portable interfaces like the standard ANSI C API, Unix API, Win32 API, Cocoa, J2SE, etc carefully.
If it was me, I'd write the library in a carefully chosen subset of the common elements of regular Java and Davlik virtual machine Java and also write my own custom parser that translates the code to C for platforms that support C, which would of course be most of them. I would suggest that if you restrict yourself to data types of various size ints, bools, Strings, Dictionaries and Arrays and make careful use of them that will help in cross-platform issues without affecting performance much most of the time.
your assumptions seem ok, but i see trouble ahead, much of which you have already spotted in your assumptions.
As you said, you can't really export c++ classes and methods, you will need to provide a function based c interface. What ever facade you build around that, it will remain a function based interface at heart.
The basic problem i see with that is that people choose a specific language and its runtime because their way of thinking (functional or object oriented) or the problem they address (web programming, database,...) corresponds to that language in some way or other.
A library implemented in c will probably never feel like the libraries they are used to, unless they program in c themselves.
Personally, I would always prefer a library that "feels like python" when I use python, and one that feels like java when I do Java EE, even though I know c and c++.
So your effort might be of little actual use (other than your gain in experience), because people will probably want to stick with their mindset, and rather re-implement the functionality than use a library that does the job, but does not fit.
I also fear the desired portability will seriously hamper development. Just think of the infinite build settings needed, and tests for that. I have worked on a project that tried to maintain compatibility for 5 operating systems (all posix-like, but still) and about 10 compilers, the builds were a nightmare to test and maintain.
Give it an XML interface, whether passed as a parameter and return value or as files through a command-line invocation. This may not seem as direct as a normal function interface, but is the most practical way to access an executable from, e.g., Java.