How to best expose an OCaml library to other languages?

There are various exchange formats - JSON, etc. - that provide the ability to quickly and reliably export data to, and parse data from, a common representation. This is a boon between languages, and for it there is Piqi, which generates parsable exchange formats for any type that you define; it automates the process of writing boilerplate code (functions that read in some exchange data and build up an instance of some arbitrary type). The best option to date appears to be protocol buffers, and if I go down the route of ocaml-rpc, I absolutely want to use protocol buffers.
It would be nice if there were some declarative pattern for managing function exposure, so that the OCaml library could be reached over some medium (like RPC, or mapping a function to a URL with an encoding for arguments).
Imagine offering a library as a service, where you don't want to (or can't) write actual bindings between every single pair of languages. The servers and the data parsing have already been written... so shouldn't there be some way to integrate the two, and just specify which functions should be exposed, and where/how?
Lastly, it appears to me that protocol buffers are a mechanism for encoding/decoding data quickly, but not a transport mechanism... Is there some kind of OCaml RPC spec, or some OCaml RPC library? Aren't there various RPC protocols (and therefore, if I point two languages using different protocols at one another, failure)? Additionally, is the server mechanism that waits for and receives RPC calls (possibly) another module?
How do I achieve this?

To update this: the latest efforts under the Piqi project are aimed at producing a working OCaml RPC service. The vision is that it will be easy to specify which functions to expose on the service end, and that the client side will offer some mechanized facility for selecting among those exposed functions.
At the current time, this RPC system for OCaml facilitates inter-language exchange of data that can be reconstructed by parsers through the use of protocol buffers; it is under development and still being discussed.
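To make the encode/decode-versus-transport point from the question concrete, here is a minimal C++ sketch of the role protocol buffers play. It assumes a hypothetical message Request generated by protoc from 'message Request { string name = 1; int32 arg = 2; }'; the generated header name is also an assumption.

    #include <string>
    #include "request.pb.h"  // hypothetical generated header

    // Serialize a request to raw bytes; shipping those bytes (over a socket,
    // HTTP, a message queue...) is a separate transport concern.
    std::string encode_request() {
      Request req;                    // hypothetical generated class
      req.set_name("add");
      req.set_arg(42);
      std::string bytes;
      req.SerializeToString(&bytes);  // standard protobuf C++ API
      return bytes;
    }

    // The receiving side, in any language with protobuf support, rebuilds
    // the typed message from the same bytes.
    bool decode_request(const std::string& bytes, Request* out) {
      return out->ParseFromString(bytes);
    }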

I think that the ocaml-rpc library suits your requirements. It can infer serialization functions and can also generate client and server code. The interesting part is that it uses OCaml itself as the IDL. For example, this is a definition of an RPC function:
(* ocaml-rpc's OCaml-as-IDL style: an external declaration describes the RPC's signature *)
external rpc2 : ?opt:string -> variant -> unit = ""
From this, functorized server and client code is inferred that takes care of transport, marshaling, and unmarshaling, so that you only need to work with pure OCaml data types.
The problem with this library is that it is barely documented, so you may find it hard to use.
Also, since I now know that you're working with BAP, I would like to bring your attention to the new BAP 1.x, which will be ready soon and will have bindings that allow it to be called from any language, although currently we're mostly targeting Python.

Related

How to define a class and method in DDS idl file?

I am new to DDS; so far I have a little experience with OpenDDS and CycloneDDS.
Is it possible to define a class inside the IDL file, with member variables and member methods? Or are only structures and primitive data types supported by the DDS standards?
The IDL language is defined in the OMG IDL specification. It consists of a number of building blocks that include Core Data Types, like the structures and primitive data types that you mentioned, and Interfaces, which include the methods you asked about.
However, only a subset of those building blocks is used by DDS. For the current version 4.2, section 9.3 DDS Profiles defines which of them are relevant for three different levels of support by DDS: Plain DDS, Extensible DDS and DDS over RPC.
You will see that the latter indeed includes the Building Block Interfaces - Basic, as you might expect from RPC. However, not all DDS implementations support RPC; Plain DDS and Extensible DDS are more commonly supported, and interfaces are not part of that functionality.
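To make the distinction concrete, here is a hedged sketch of the kind of plain data type Plain DDS deals in. The IDL is shown in comments, followed by roughly the C++ type an IDL compiler generates from it; the exact generated code is vendor-specific and the names here are made up.

    // IDL input (a data type, not an interface):
    //
    //   struct SensorReading {
    //     long   sensor_id;
    //     double value;
    //   };
    //
    #include <cstdint>

    // Approximately what a DDS IDL compiler emits for C++:
    struct SensorReading {
      int32_t sensor_id;  // IDL 'long' maps to a 32-bit integer
      double value;
      // no member functions: Plain DDS distributes state, not behavior
    };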
Since you asked about this in another question: note that the interface functionality as captured in DDS over RPC is not for the purpose of distributing objects with their methods, but for invoking methods on objects remotely -- as the name Remote Procedure Call implies.
Another answer to your question is that you are, perhaps, asking the follow-up question as if it were the initial one. There are many different ways of building distributed systems, and given your question, three examples seem appropriate:
those designed around remote procedure calls/remote method invocation: in this context, CORBA is the perfect reference, but there are many (RPC, gRPC, DCOM, you name it);
those designed around shipping objects with their implementation across: one example is Java/JINI, but there are many others (JavaScript in a browser could be considered one);
those designed around shipping state (a.k.a. plain old data) and adding/transforming that state: SPLICE in the ancient history, DDS today.
Your question suggests that you are looking for middleware for doing distributed object computing. If that's indeed what you are looking for, DDS is a very suboptimal choice. Yes, RPC can be built on top of it (RPC-over-DDS simply makes it a bit easier to do it) and in a system predominantly built around distributing state it makes sense to do that.
If you can serialise objects with their methods then of course you can use DDS to distribute them in the network (there are fun things you can do that way). However, that's more a function of the programming language you use than of the middleware and IDL won't help you with that.

Best way to "mangle" (represent) the memory

I would like to know the best way to map/represent memory. I mean, how to describe, for example, a structure with all its fields so that it can be serialized.
I am creating an RPC library that will generate the client and server using the DWARF debug data, so I need to create a function wrapper to serialize and deserialize the functions' parameters.
Right now I am using the gcc mangling types to identify all the fields, but the compiler sometimes inserts holes (padding) to optimize memory access time.
Any ideas?
I use the "cereal" library for serialization (http://uscilab.github.io/cereal/)
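For reference, here is a minimal cereal sketch (the struct and field names are just examples): a type is made serializable with a single member template, then round-tripped through a binary archive.

    #include <sstream>
    #include <string>
    #include <cereal/archives/binary.hpp>
    #include <cereal/types/string.hpp>

    struct Packet {
      int id;
      std::string payload;

      // cereal picks this one function up for both saving and loading
      template <class Archive>
      void serialize(Archive& ar) { ar(id, payload); }
    };

    int main() {
      std::stringstream buffer;
      Packet original{7, "hello"};
      {
        cereal::BinaryOutputArchive out(buffer);
        out(original);
      }  // archives flush on destruction
      Packet restored;
      cereal::BinaryInputArchive in(buffer);
      in(restored);  // restored.id == 7, restored.payload == "hello"
    }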
Alternatives include Google's Protocol Buffers, although I found it too difficult to integrate for my comparatively simple serialization tasks.
For communication between processes and languages, I've had a good experience with ZeroC's Ice library (https://zeroc.com/products/ice). You specify the structure in an external compilation step, similar to Google's Protocol Buffers. The nice part is that the network connection is also taken care of.

How to use our own I/O framework inside a Thrift client?

On the server side, everything is ok.
But on the client side, it seems we cannot just use Thrift to process the protocol while sending/receiving the data with our own I/O framework (such as muduo or another one).
Is there any way to implement this with C++?
I think this is a legitimate question, and it can be extended to the more general question:
How do I use other transport mechanisms with Apache Thrift?
As Hcorg pointed out, because of the modular structure of the framework, it is not that hard to achieve. Basically, one has to follow these steps (this is true for all languages supported by Thrift, not only C++):
derive a specialized class from TTransport. In some cases this is an interface, not a base class, but that does not really matter.
implement all the methods needed
for the server side, you may need a TServerTransport derivative
The existing implementations may serve as models, and despite the number of methods in TTransport, most of them are not really hard to implement; a minimal sketch follows below.
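Here is a hedged C++ sketch of the first two steps; MyIo and its two methods are hypothetical stand-ins for your framework's connection type (e.g. a muduo connection), and the remaining TTransport methods are left out because their exact signatures vary between Thrift versions.

    #include <cstdint>
    #include <thrift/transport/TVirtualTransport.h>

    struct MyIo {                                      // your framework (assumption)
      uint32_t recvSome(uint8_t* buf, uint32_t len);   // read up to len bytes
      void sendAll(const uint8_t* buf, uint32_t len);  // write exactly len bytes
    };

    class MyFrameworkTransport
        : public apache::thrift::transport::TVirtualTransport<MyFrameworkTransport> {
     public:
      explicit MyFrameworkTransport(MyIo* io) : io_(io) {}

      // TVirtualTransport routes Thrift's virtual read/write calls here:
      uint32_t read(uint8_t* buf, uint32_t len) { return io_->recvSome(buf, len); }
      void write(const uint8_t* buf, uint32_t len) { io_->sendAll(buf, len); }

      // You will also need isOpen/open/close/flush; check TTransport.h for
      // the exact signatures in your Thrift version.

     private:
      MyIo* io_;
    };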
Additionally, I also provided a specialized transport implementation for using STOMP with Delphi, based on a TStreamTransport. The relevant code can be found in the /contrib folder and is worth a look. One of the nice things about Thrift is that things work very similarly across all languages.

How to write c++ spout/bolt on Storm and Thrift usage in Storm

From here:
Storm was designed from the very beginning to be compatible with multiple languages. Nimbus is a Thrift service and topologies are defined as Thrift structures. The usage of Thrift allows Storm to be used from any language.
I see that a topology created in Java gets deployed by serializing the topology (spouts, bolts, ComponentCommon) as Thrift datatypes, which then get deployed on Nimbus. In Java it is easy to serialize an object with its methods and data, so on the other side Nimbus just needs to create the objects and invoke them. (I might be missing a detail here, but I hope I got the point correctly.)
But I wonder how to write a topology in C++ and deploy it the same way. Does Thrift help serialize a C++-based topology so that Nimbus can deploy/execute it the same way as for Java?
I have seen the links link1 and link2 in this regard, and the only solution seems to be using a ShellBolt, which invokes the process and communicates with it over standard I/O.
In order to use the Thrift way, do we need to rewrite the Storm core in C++ as well? Also, why use Thrift when it supports only JVM languages? Thrift doesn't seem to be used at all for languages like Python/C++.
I'm not sure if I understand your question correctly -- in my understanding you're asking: is it possible [without the ShellBolt hack] to use Storm [with Thrift as the comm protocol] with bolts written in C++, and with C++ as the language that creates the topology?
Because of the lack of other answers to this question, and based on my own research, I assume there is no finished, usable implementation for your problem.
Therefore, if you really have to use Storm (its common use case is the JVM; even if it could theoretically work with any language, that doesn't mean there is an ecosystem for other languages) and C++, you have no option but to use the ShellBolt hack or modify Thrift yourself.
As you know, Thrift itself has also been ported to C++, so it is possible to rebuild the API calls in C++. Basically, you'd need to port the Java TopologyBuilder. On the C++ side, you could start with the Thrift C++ tutorial.
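As a hedged starting point, this is the Thrift C++ tutorial's client skeleton pointed at Nimbus. It assumes you have run thrift --gen cpp on Storm's storm.thrift (producing Nimbus.h and a NimbusClient class), and 6627 is Nimbus's usual default Thrift port; check both against your Storm version. Older Thrift releases use boost::shared_ptr instead of std::shared_ptr.

    #include <memory>
    #include <thrift/protocol/TBinaryProtocol.h>
    #include <thrift/transport/TSocket.h>
    #include <thrift/transport/TTransportUtils.h>
    #include "Nimbus.h"  // generated from storm.thrift (assumption)

    using namespace apache::thrift;
    using namespace apache::thrift::protocol;
    using namespace apache::thrift::transport;

    int main() {
      std::shared_ptr<TTransport> socket(new TSocket("nimbus-host", 6627));
      std::shared_ptr<TTransport> transport(new TBufferedTransport(socket));
      std::shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
      NimbusClient nimbus(protocol);  // generated client class (assumption)

      transport->open();
      // From here you would build a StormTopology Thrift struct by hand --
      // the part that amounts to porting Java's TopologyBuilder -- and submit
      // it via the generated methods, e.g. nimbus.submitTopology(...).
      transport->close();
    }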
This is also something of a hack, as you basically rebuild half of the stack (in this case on top of Thrift), but in general you have very few other options with a system designed like Storm.
For example, the MySQL binary protocol has been rebuilt from scratch by various third-party client libraries.
Unless anyone has done the work for you (which I may well have missed in my research), I see no option other than doing it yourself (maybe Storm is not even the best tool for your use case!?)
If another hack (which might be even more complex and maybe even slower) besides ShellBolt is good enough for you, you could try starting a JVM from inside C++; e.g., see this SO post. I would not recommend this.
If you need an alternative distributed task queue, I have had good experience with Celery in Python environments; however, I have no experience using it from C++ directly (I usually control Python with ZeroMQ, or write my own ZeroMQ-based queues where necessary, but this is not a universal solution).

How to cross-platform remote procedure call in C++ (Linux/Win)

I want one application on a Linux host to call procedures from applications on Win7 x64 in a VM. I guess the fact that the VM runs on the Linux host does not matter, since it should use sockets. How do I approach this? Are there any libraries for it out there?
Edit:
Well, I took a look at all of them. XMLRPC seems to be some kind of predecessor of SOAP. Protobuf seems to me to focus on easy serialization of large objects. So my decision falls on SOAP. But now I am searching for a nice-to-use C++ binding. I read a few pages of the Apache Axis2 manual, but it is anything but nice to use, and it is in C anyway.
Another question in my specific case: since I know the signatures of the called procedures, wouldn't it be easier to just send some textual numbers plus parameters to identify the procedures to call, and to reply in textual form?
Edit2:
As SOAP is just a standard and RPC is just a concept (I imagined something like function pointers over IP :D), imho none of them by itself is a solution. But ZeroMQ is indeed progress.
There is no magic to RPC. I would suggest having a look at a combination of ZeroMQ and Google protobuf. ZeroMQ is a very easy-to-use messaging system (your communication layer); you would use the REQ/REP pattern. Google protobuf is used to describe and serialize/deserialize your messages. Both libraries are cross-platform and even cross-language (Ruby, Python, C++, etc.).
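A minimal sketch of the REQ side, using the plain libzmq C API from C++ (the endpoint is an example; in practice the payload would be a protobuf-serialized string rather than the placeholder text used here):

    #include <zmq.h>
    #include <cstdio>
    #include <string>

    int main() {
      void* ctx = zmq_ctx_new();
      void* req = zmq_socket(ctx, ZMQ_REQ);
      zmq_connect(req, "tcp://localhost:5555");  // example endpoint

      std::string request = "protobuf-serialized bytes would go here";
      zmq_send(req, request.data(), request.size(), 0);

      char reply[256];
      int n = zmq_recv(req, reply, sizeof reply, 0);  // blocks until REP answers
      if (n >= 0) std::printf("got %d reply bytes\n", n);

      zmq_close(req);
      zmq_ctx_destroy(ctx);
    }

The matching server binds a ZMQ_REP socket and answers each request in turn.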
Have you taken a look at SOAP? It has pros and cons but may meet your needs.
The fact that your Windows box is a VM shouldn't make a difference, provided it's up and running; as long as the system you choose treats the servers as logically separate, the way sockets normally would, your solution will remain flexible.
One decent alternative is XMLRPC, which runs on top of HTTP. It's simpler than SOAP, at least. I've used it for Java-Python communication, and it didn't take much code; I don't know any C++ libraries to recommend, though, so I'll just point to http://en.wikipedia.org/wiki/XML-RPC and its C++ section.
Edit after edited question: For a really simple solution, just use plain HTTP, mapping the request path to the function and passing parameters with GET or POST. Return the value in the HTTP response body as simply as you can: plain text for a number or a string; for more complex return values, a binary blob if binary compatibility can be ensured (same CPU, same word size, plain struct, using compiler struct-packing options if needed), or JSON (or even XML, but then you are starting to reinvent SOAP/XMLRPC...).
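As a toy illustration of the path-to-function mapping (with the HTTP layer itself left out, and all names made up):

    #include <cstdio>
    #include <iostream>
    #include <string>

    // Given a path like "/add?a=2&b=3", pick the function and return the
    // result as a plain-text response body.
    std::string handle(const std::string& path_and_query) {
      auto qpos = path_and_query.find('?');
      std::string path = path_and_query.substr(0, qpos);
      std::string query =
          (qpos == std::string::npos) ? "" : path_and_query.substr(qpos + 1);

      if (path == "/add") {
        int a = 0, b = 0;
        // assumes exactly "a=<int>&b=<int>"; a real server would parse properly
        std::sscanf(query.c_str(), "a=%d&b=%d", &a, &b);
        return std::to_string(a + b);
      }
      return "404";  // unknown procedure
    }

    int main() {
      std::cout << handle("/add?a=2&b=3") << "\n";  // prints 5
    }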