I've already chosen Thrift as the RPC framework for a project. This project involves a lot of serialization/deserialization (e.g., storing data to disk), and the serialized format should be readable from at least C++, Java, and Python. It seems that Thrift's serialization API is more involved than Protobuf's (e.g., it requires creating a protocol instance before serializing an object).
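For concreteness, here is a minimal sketch of Thrift in-memory serialization in C++ (MyStruct stands in for any Thrift-generated type; older Thrift releases take boost::shared_ptr instead of std::shared_ptr):

    #include <thrift/protocol/TBinaryProtocol.h>
    #include <thrift/transport/TBufferTransports.h>
    #include <memory>
    #include <string>
    #include "gen-cpp/my_types.h"  // hypothetical Thrift-generated header

    using apache::thrift::protocol::TBinaryProtocol;
    using apache::thrift::transport::TMemoryBuffer;

    std::string serialize(const MyStruct& obj) {
      // The extra step in question: a transport and a protocol must be
      // constructed before the generated write() can be called.
      auto buffer = std::make_shared<TMemoryBuffer>();
      TBinaryProtocol protocol(buffer);
      obj.write(&protocol);
      return buffer->getBufferAsString();
    }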
So my question is: is it worth using Protobuf for the serialization/deserialization part even though Thrift is capable of this task?
I would agree that Thrift is a better choice for cross-language RPC than Protobuf RPC (see http://pjklauser.wordpress.com/2013/02/27/why-googles-protobuf-rpc-will-not-reach-widespread-adoption/). If you're already using Thrift, it's difficult to justify pulling in a second library just for serialization to file/storage. You'd need to write endless mapping code between the two sets of generated types, and the two libraries have independent release cycles that you'd have to track separately, adding future maintenance effort. The cost of writing a line or two of extra code, or of saving a byte or two of space or a microsecond of CPU time, is nothing compared to that extra effort.
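To make the mapping cost concrete, this is the kind of hand-written bridge you would end up maintaining for every type (UserT and UserP are hypothetical Thrift- and Protobuf-generated types):

    // Hypothetical bridge between a Thrift-generated struct (UserT) and a
    // Protobuf-generated message (UserP). Every field is copied by hand,
    // and every schema change has to be applied in two IDLs plus here.
    UserP toProto(const UserT& t) {
      UserP p;
      p.set_id(t.id);
      p.set_name(t.name);
      for (const auto& tag : t.tags) {
        p.add_tags(tag);
      }
      return p;
    }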
I would like to know the best way to map/represent memory; that is, how to describe, for example, a structure with all of its fields so that it can be serialized.
I am creating an RPC library that will generate the client and server from DWARF debug data, so I need to create a function wrapper that serializes and deserializes each function's parameters.
Right now I am using GCC's mangled type names to identify all the fields, but the compiler sometimes inserts padding holes to optimize memory access time.
Any ideas?
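To illustrate the padding problem (a minimal, self-contained example):

    #include <cstddef>
    #include <cstdio>

    struct Example {
      char flag;  // 1 byte
      // On most ABIs the compiler inserts 3 bytes of padding here so that
      // the next field is 4-byte aligned.
      int value;
    };

    int main() {
      // Typically prints sizeof=8 offsetof(value)=4, revealing the hole:
      // the raw in-memory layout cannot be copied byte-for-byte.
      std::printf("sizeof=%zu offsetof(value)=%zu\n",
                  sizeof(Example), offsetof(Example, value));
    }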
I use the "cereal" library for serialization (http://uscilab.github.io/cereal/)
Alternatives include Google's Protocol Buffers, although I found it too difficult to integrate for my comparatively simple serialization tasks.
For communication between processes and languages, I've had a good experience with ZeroC's Ice library (https://zeroc.com/products/ice). You specify the structure in an external compilation step, similar to Google's Protocol Buffers. The nice part is that the network connection is also taken care of.
There are various exchange languages (JSON, etc.) that provide the ability to quickly and reliably export and parse data in a common format. This is a boon between languages, and for this there is Piqi, which generates parsable exchange formats for any type you define; it automates the process of writing boilerplate code (functions that read in some exchange data and build up an instance of some arbitrary type). Basically, the best option to date is Protocol Buffers, and if I go down the route of ocaml-rpc, I absolutely want to use Protocol Buffers.
It would be nice if there were some declarative pattern for managing function exposure, so that an OCaml library could be reached over some medium (like RPC, or mapping a function to a URL with an encoding for arguments).
Imagine offering a library as a service, where you don't want to, or can't, write actual bindings between every single pair of languages. The servers and the data parsing have already been written, so wouldn't there be some way to integrate the two and just specify which functions should be exposed, and where/how?
Lastly, it appears to me that Protocol Buffers is a mechanism for encoding/decoding data quickly, but not a transport mechanism. Is there some kind of OCaml RPC spec or OCaml RPC library? Aren't there various RPC protocols (and hence, if I point two languages using different protocols at one another, I get failure)? Additionally, is the server mechanism that waits for and receives RPC calls (possibly) another module?
How do I achieve this?
To update this: the latest efforts under the Piqi project are aimed at producing a working OCaml RPC service. The vision is that it will be easy to specify which functions to expose on the RPC service side, and that function selection on the client side will have some mechanized facility for choosing among those exposed functions.
At the moment, this RPC system for OCaml facilitates inter-language exchange of data that can be reconstructed by parsers through the use of Protocol Buffers; it is under development and still being discussed here
I think the ocaml-rpc library suits your requirements. It can infer serialization functions and can also generate client and server code. The interesting part is that it uses OCaml as the IDL. For example, this is the definition of an RPC function:
external rpc2 : ?opt:string -> variant -> unit = ""
From this, functorized server and client code is inferred that takes care of transporting, marshaling, and demarshaling the data, so that you only need to work with pure OCaml data types.
The problem with this library is that it is barely documented, so you may find it hard to use.
Also, since I now know that you're working with BAP, I'd like to bring your attention to the new BAP 1.x, which will be ready soon and will have bindings that allow calling it from any language, although currently we're mostly targeting Python.
I'm looking for a way to send C++ classes between two client applications.
All I can find is that I need to write Serialize/Deserialize functions (to JSON, for example) for each class and send the result over TCP/IP.
The main problem I'm facing is that I have ~600 classes (some containing instances of others) that I need to pass, which means I'd have to spend a very long time writing Serialize/Deserialize functions.
Is there any generic way of writing Serialize/Deserialize functions?
Is there any other way of sending C++ classes?
Thanks,
Guy Ergas.
Are you using a framework at all? Qt and MFC, for example, have built-in serialization that would make your task easier. Otherwise I would guess that you'd need to spend at least some effort on each of the 600 classes.
As recommended above, Boost Serialization is probably a good way to go; you can send the serialized class over TCP using Boost Asio too:
http://www.boost.org/doc/libs/1_54_0/doc/html/boost_asio.html
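A minimal Boost.Serialization round-trip looks like this (a sketch; Message is a made-up type, and the resulting bytes could then be sent over an Asio socket):

    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/serialization/string.hpp>
    #include <sstream>
    #include <string>

    class Message {
      friend class boost::serialization::access;
      int id_ = 0;
      std::string body_;

      // One member template handles both saving and loading.
      template <class Archive>
      void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & id_;
        ar & body_;
      }

    public:
      Message() = default;
      Message(int id, std::string body) : id_(id), body_(std::move(body)) {}
    };

    int main() {
      std::stringstream ss;
      {
        boost::archive::binary_oarchive oa(ss);
        oa << Message(1, "hello");
      }
      Message restored;
      boost::archive::binary_iarchive ia(ss);
      ia >> restored;
    }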
Alternatively, there is a C++ API for Google Protocol Buffers (protobuf):
https://developers.google.com/protocol-buffers/docs/reference/cpp/
Boost Serialization
Although I haven't used it myself, it is very popular among my peers at work.
More info about it can be found in "Boost (1.54.00) Serialization"
Thrift
Thrift has fairly limited serialization functionality, which I don't think fits your requirements. But it can help you move the data from one client to another even if they use different languages.
More info about it can be found in "Thrift: The Missing Guide"
try s11n or nosjob
s11n (an abbreviation for serialization) is an Open Source project focused on the generic serialization of objects (i.e., object persistence) in the C++ programming language.
nosjob, a C++ library for generating and consuming JSON data.
You may be interested in ASN.1. It's not necessarily the easiest to use and tools/libraries are a little hard to come by (Objective Systems at http://www.obj-sys.com/index.php is worth a look, though not free).
However, the big advantage is that it is very heavily standardised (so no trouble with library version incompatibilities) and most languages are supported one way or another, which is handy if you need support across multiple platforms. It also does binary encodings, so it's far less bloated than XML (which it also supports). I chose it for these reasons and didn't regret it.
If you are on a Linux platform, you can directly use a json.h library for serialization.
Here is sample code I have come across:
Json Serializer
I'm writing a distributed system wherein each node interfaces with local applications via some RESTful API, supports extensions and runtime customization, etc. It's a bit like an Enterprise Service Bus over the wide area, but with a lot else going on which is not related to the question at hand.
I've read a little about both MPI and Asio. Originally I was set on Asio, but then I found MPI, and now I'm again thinking Asio is the better solution for me. MPI seems to provide a lot I don't need and a higher level of abstraction than I want; for example, I only need point-to-point communication, and it is important to me to be in control of when and what data is transmitted (e.g., I have already designed a packet structure that I would ideally conform to).
So my primary question: is it worth starting from a lower level with Asio, or should I try to graft MPI onto what I'm looking for? Further, are there 'skeleton applications' available that use MPI or Asio and would aid development? (Actually, I am 100% new to C++... ;) Or does it make sense to use them in tandem?
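For reference, the level Asio operates at looks roughly like this (a blocking TCP sketch; the host, port, and byte layout are placeholders):

    #include <boost/asio.hpp>

    int main() {
      using boost::asio::ip::tcp;
      boost::asio::io_context io;  // named io_service in older Boost releases

      // Resolve and connect to a peer (placeholder address).
      tcp::resolver resolver(io);
      tcp::socket socket(io);
      boost::asio::connect(socket, resolver.resolve("127.0.0.1", "9000"));

      // Asio imposes no message format: whatever packet structure the
      // application defines is written as raw bytes.
      const unsigned char packet[] = {0x01, 0x00, 0x04, 'p', 'i', 'n', 'g'};
      boost::asio::write(socket, boost::asio::buffer(packet, sizeof(packet)));
    }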
For more perspective, maybe it's worth noting that I already have implemented the bulk of this project in Perl using Perl Object Environment, which itself is just an asynchronous event system with a ton of networking libraries.
Also, if it makes a difference, I would ideally use threads for this.
Honestly though I have not used Boost at all yet, as I hinted above, so any input is appreciated.
I should start by saying that I know nothing about Asio, but from a two-minute scan of the website and the description of your problem, it sounds like while either would probably work for you, Asio might be simpler. MPI is designed less for general-purpose network communication and more for running applications where the set of processes and applications is fairly static. While it provides a client/server-style interface if desired, that's not the main focus of the library.
MPI is also more difficult to use if you already have a packet structure designed. MPI is great for abstracting away the need to worry about packets, the location of processes, etc., but if you've already taken all of that into account in your application, you have already done the hard work.
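For contrast with the Asio sketch above, MPI's point-to-point API hides the packets and connections entirely (a minimal sketch, launched under mpirun with two ranks):

    #include <mpi.h>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
        const char msg[] = "hello";
        // Framing, addressing (by rank), and delivery are handled by the
        // library; no sockets or packet layout are visible here.
        MPI_Send(msg, static_cast<int>(sizeof(msg)), MPI_CHAR, /*dest=*/1,
                 /*tag=*/0, MPI_COMM_WORLD);
      } else if (rank == 1) {
        char buf[16];
        MPI_Recv(buf, sizeof(buf), MPI_CHAR, /*source=*/0, /*tag=*/0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      }

      MPI_Finalize();
    }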
There has been at least one other discussion of Asio vs. MPI that you can take a look at (For distributed applications, which to use, ASIO vs. MPI?) to get more opinions, too.
I know that you can use Boost Serialization to serialize to a text format and then push it over a socket, but I'd like to serialize a class of statistics data into a binary format (both for size and for encoding/decoding-overhead reasons). Is it safe to use Boost Serialization for this?
My specific worries are:
Differences between integer type sizes on different platforms (mainly 32-bit vs. 64-bit).
Though I can largely get around this by using the exact-width integer types from <cstdint>, I'd still like to understand the behavior.
Differences in endianness between systems: does Boost serialize into a standard endianness (e.g., network order) and then deserialize using the host's endianness?
It's a very nice library, but unfortunately the documentation on its binary capabilities is somewhat limited, so I just want to make sure that using it this way is safe.
No, in general boost binary serialization is not machine-independent. See here.
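A quick way to see this for yourself (a sketch; note the dump also includes the bytes of Boost's archive header):

    #include <boost/archive/binary_oarchive.hpp>
    #include <cstdio>
    #include <sstream>

    int main() {
      std::stringstream ss;
      {
        boost::archive::binary_oarchive oa(ss);
        int x = 1;
        oa << x;  // written with the native int width and byte order
      }
      // On a little-endian machine the 0x01 of x appears before its zero
      // bytes; on a big-endian machine it appears after them, so the two
      // streams are not interchangeable.
      for (unsigned char c : ss.str()) std::printf("%02x ", c);
      std::printf("\n");
    }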
As an alternative, I've been hearing a lot about Google's protobuf. It has C and C++ bindings.
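A minimal protobuf round-trip in C++ (a sketch assuming a hypothetical stats.proto compiled with protoc):

    // stats.proto (hypothetical):
    //   syntax = "proto3";
    //   message Stats { int64 count = 1; double mean = 2; }
    #include "stats.pb.h"
    #include <string>

    int main() {
      Stats s;
      s.set_count(42);
      s.set_mean(3.14);

      // Protobuf's wire format is fixed and platform-independent, which
      // sidesteps the word-size and endianness concerns above.
      std::string bytes;
      s.SerializeToString(&bytes);

      Stats restored;
      restored.ParseFromString(bytes);
    }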
You should check out Apache Thrift. It was designed by Facebook for cross-platform serialization/deserialization.