There are two C++ applications, where one application, say A, reads from an interface device, does some processing, and needs to provide the data in a certain format to an application B.
I feel this can be done in two ways, as mentioned below:
1. I serialize the data structure in app A and write it to a socket.
2. I inject the packet to an interface.
Please help me evaluate which option would be faster, or suggest another, faster way to do it.
I'm not sure what you mean by "I inject the packet to an interface."
Anyway, if your two applications are or could be on separate machines, go for the socket solution.
If they are on the same machine, you can implement some type of interprocess communication. I recommend Boost for this: http://www.boost.org/doc/libs/1_56_0/doc/html/interprocess.html
As far as performance is concerned, ideally you want to run some tests to find out which works better in your scenario. Also, if you're already familiar with sockets, it may be simpler to use them.
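If you do end up on the same machine, a minimal sketch of the Boost.Interprocess approach might look like this (the queue name "app_a_to_b" and the 256-byte message size are just assumptions for the example):

// Sketch: app A pushes fixed-size messages into a named queue that app B reads.
// Boost.Interprocess is header-only; on Linux you may need to link with -lrt.
#include <boost/interprocess/ipc/message_queue.hpp>
#include <cstring>

using namespace boost::interprocess;

int main() {
    // App A (sender): create a queue of up to 100 messages, 256 bytes each.
    message_queue::remove("app_a_to_b");        // drop any stale queue first
    message_queue mq(create_only, "app_a_to_b", 100, 256);

    char buffer[256] = {0};
    std::strncpy(buffer, "serialized payload goes here", sizeof(buffer) - 1);
    mq.send(buffer, sizeof(buffer), 0);         // last argument is the priority

    // App B would construct with open_only and call mq.receive(...) instead.
    return 0;
}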
I have two codebases: one written in C++ and the other in Common Lisp. There is a particular piece of functionality implemented in the Lisp codebase that I would like to access from my C++ code. I searched for foreign function interfaces to call Lisp functions from C++, but couldn't seem to find any (I mostly found FFIs for the opposite direction). So I decided to implement some form of RPC that fits my requirements, which are:
both programs are going to run on the same machine, so extensibility to remote-machine calls is not important.
the input from C++ is going to be a Lisp-style list, which is what the function in the Lisp code takes as input.
this call is going to be made thousands of times per execution of the code, so performance per remote call is critical.
So far, I've learnt from various resources on the web that possible solutions are:
Sockets - set up an instance of the Lisp code that will listen for function calls from the C++ code, run the function on the given input, and return the result to the C++ code.
XML-RPC - set up an XML-RPC server on the Lisp side (which will be easy since I use Allegro Common Lisp, which provides an API that supports XML-RPC) and then use one of the many XML-RPC libraries for C++ to make the client-side call.
The pros and cons I see with these approaches seem to be the following:
Sockets are a low-level construct, so it looks like I would need to do most of the connection management, reading and parsing of the data on the sockets, etc. on my own.
XML-RPC seems to suit my needs much better, but I read that it always uses HTTP, and there is no way to use UNIX domain sockets. So, it feels like XML-RPC might be overkill for what I have in mind.
Does anyone have any experience in achieving some similar integration of codes? Are there significant differences in performance between sockets and XML-RPC for local RPC? Any advice on which approach might be better would be extremely helpful. Also, suggestions on a different technique to do this would also be appreciated.
EDIT: Here are a few more details on the shared functionality. There is a function f available in the Lisp code (which is complex enough to make reimplementation in C++ prohibitively expensive). It takes as input two lists, L1 and L2. How I envision this happening is the following:
L1 and L2 are constructed in C++ and sent over to the Lisp side, and the C++ side waits for the results,
f is invoked on the Lisp side on inputs L1 and L2, and the results are returned to the C++ side,
the C++ side takes in the results and continues with its computation.
The sizes of L1 and L2 are typically not big:
L1 is a list typically containing hundreds of elements, each element being a list of at most 3-4 atoms.
L2 is also a list, containing fewer than 10 elements, each element being a list of at most 3-4 atoms.
So the total amount of data per RPC is probably a string of hundreds to thousands of bytes. This call is made at the start of each while loop in my C++ code, so it's hard to give concrete numbers for calls per second; but from my experiments, I can say it typically happens tens to hundreds of times per second. f is not a numerical computation: it's symbolic. If you're familiar with AI, it's essentially doing symbolic unification in first-order logic. So it is free of side effects.
If you look at some Common Lisp implementations, their FFIs allow calling Lisp from the C side. That's not remote, but local: sometimes it makes sense to embed Lisp directly rather than call it remotely.
Commercial Lisps like LispWorks or Allegro CL can also deliver shared libraries, which you can use from your application code.
For example, define-foreign-callable allows a LispWorks function to be called from C.
Franz ACL can do it: http://www.franz.com/support/documentation/9.0/doc/foreign-functions.htm#lisp-from-c-1
Also, something like ECL should be usable from the C side.
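For instance, a minimal sketch of embedding ECL in a C++ program might look like the following (the evaluated form is just a placeholder, and error handling is omitted):

// Sketch: boot an embedded ECL image, evaluate one form, print the result.
// Build flags come from ecl-config; link with -lecl.
#include <ecl/ecl.h>

int main(int argc, char **argv) {
    cl_boot(argc, argv);                             // start the embedded Lisp
    cl_object form = c_string_to_object("(+ 1 2)");  // read a form from a C string
    cl_object result = cl_eval(form);                // evaluate it in the Lisp image
    cl_print(1, result);                             // print the result (3)
    cl_shutdown();
    return 0;
}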
I've started working recently on a project that requires similar functionality. Here are some things I've researched so far with some commentary:
cl-mpi would in principle allow (albeit very low-level) direct inter-process communication, but encoding the data is a nightmare! The design on the C/C++ side is very uncomfortable (very, very limited, with no way around sending variable-length arrays). And on the other side, the Lisp library is both dated and seemingly at a very early stage of its development.
Apache Thrift, which is more of a language than a program. Slow, and a memory hog. Protobuf and BSON are the same. Protobuf might be the most efficient of this group, but you'd need to roll your own communication solution; it's only the encoding/decoding protocol.
XML, JSON, s-expressions. S-expressions win in this category because they are more expressive and one side already has a very efficient parser. Alas, this is even worse than Thrift/Protobuf in terms of speed and memory.
CFFI. Sigh... Managing pointers on both sides would be a nightmare. It is possible in theory, but must be very difficult in practice. It would also inevitably tax the performance of the Lisp garbage collector, because you would have to get in its way.
Finally, I switched to ECL. So far so good. I'm researching mmap'ed files as a means of sharing data. The conclusion I've reached so far is that this will be the way to go; at least I can't think of anything better at the moment.
There are many other ways to make two processes communicate. You could read the inter-process communication wikipage.
One parameter is the asynchronous or synchronous character. Is your remote processing a remote procedure call (every request from the client has exactly one response from the server), or is it asynchronous message passing (both sides send messages, but there is no notion of request and response; each side handles incoming messages as events)?
The other parameters are latency and bandwidth, i.e. the volume of data exchanged (per message and, e.g., per second).
Bandwidth does matter, even on the same machine. Of course, pipes or Unix sockets give you very high bandwidth, e.g. 100 megabytes/second, but there are scenarios where that might not be enough. In the pipe case, the data is usually copied (often twice) from memory to memory (e.g. from one process's address space to another).
But you might consider e.g. CORBA (see e.g. CLORB on the lisp side, and this tutorial on OmniORB), or RPC/XDR, or XML-RPC (with S-XML-RPC on the lisp side), or JSON-RPC etc...
If you don't have a lot of data or a lot of bandwidth (or many requests or messages per second), I would suggest using a textual protocol (perhaps serializing with JSON or YAML or XML), because it is easier than a binary protocol (BSON, protobuf, etc.).
The socket layer (which could use unix(7) AF_UNIX sockets, plain anonymous or named pipe(7)s, or tcp(7), i.e. TCP/IP, which has the advantage of letting you distribute the computation across two machines communicating over a network) is probably the simplest, as long as you have on both (C++ and Lisp) sides a multiplexing syscall like poll(2). You need to buffer messages on both sides.
Maybe you want MPI (with CL-MPI on the lisp side).
We can't help you more unless you explain, really well and in much more detail, what the "functionality" to be shared between C++ and Lisp is (what it does, how many remote calls per second, what volume and kind of data, what computation time, etc.). Is the remote function call idempotent or nullipotent? Does it have side effects? Is the protocol stateless?
The actual data types involved in the remote procedure call matter a lot: it is much more costly to serialize a complex [mathematical] cyclic graph with shared nodes than a plain human-readable string.
Given your latest details, I would suggest using JSON... It is quite fit for transmitting abstract-syntax-tree-like data. Alternatively, transmit just s-expressions (you may be left with the small issue of parsing them in C++, which is really easy once you have specified and documented your conventions; if your leaves or symbol names can contain arbitrary characters, you just need to define a convention to encode them).
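To make the s-expression option concrete, here is a minimal sketch of an emitter for lists of atoms like your L1 and L2 (it assumes atoms contain no whitespace or parentheses; otherwise you need the encoding convention mentioned above):

// Sketch: turn a list of lists of atoms into an s-expression string,
// e.g. {{"f","x","y"},{"g","z"}} -> "((f x y) (g z))".
#include <iostream>
#include <string>
#include <vector>

std::string to_sexp(const std::vector<std::vector<std::string>>& lists) {
    std::string out = "(";
    for (size_t i = 0; i < lists.size(); ++i) {
        out += "(";
        for (size_t j = 0; j < lists[i].size(); ++j) {
            if (j) out += " ";          // space-separate atoms within a sublist
            out += lists[i][j];
        }
        out += ")";
        if (i + 1 < lists.size()) out += " ";
    }
    return out + ")";
}

int main() {
    std::cout << to_sexp({{"f", "x", "y"}, {"g", "z"}}) << "\n";  // ((f x y) (g z))
}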
Background:
I want to create an automation framework in C++ where, on the one hand, "sensors" and "actors", and on the other, "logic engines", can be connected to a "core".
The "sensors" and "actors" might be connected to the machine running the "core", but some might also be accessible via a field bus or via a normal computer network. Some might work continuously or periodically (e.g. a new value every 100 milliseconds), others might work event-driven (e.g. a message with the new state only arrives when a switch is [de]activated).
The "logic engine" would be sort of pluggable into the core and would, e.g., consist of embedded, well-known scripting languages (Perl, Python, Lua, ...). Users' various small scripts would run there, able to subscribe to "sensors" and write to "actors".
The "core" would route the sensor/actor information to the subscribed scripts and call them: some just after the event occurred, others periodically as defined in a scheduler.
Additional requirements:
The systems ("server") running this automation application might also be quite
small (500MHz x86 and 256 MB RAM) or if possible even tiny (OpenWRT
based router) as power consumption is an issue
=> efficiency is important
=> multicore support not for the moment, but I'm sure it'll become important soon - so the design has to support it
Some sort of fail save mode has to be possible, e.g. two systems monitoring each other
application / framework will be GPL => all used libraries have to be compatible
the server would run Linux, but cross-platform support would be nice
The big question:
What is the best architecture for such a kind of application / framework?
My reasoning:
To avoid reinventing the wheel, I was wondering about using MPI to do all the event handling.
This would allow me to focus on the relevant stuff rather than on message handling, especially when two or more "servers" work together (acting as watchdogs for each other, each with a few sensors and actors connected). Each sensor and actor handler, as well as the logic engines themselves, would only be required to implement a predefined MPI-based interface and would thus be crash-safe. The core could restart each one whenever it stops responding.
The additional questions:
Would that even be possible with MPI? (It'd be used a bit out of context...)
Would the overhead of MPI be too big? Should I just write it myself using sockets and threads?
Are there other libraries that are better suited to this case?
You should be able to build your system using MPI, but I think MPI is too focused on high-performance computing. Moreover, since it was designed for C, it does not fit the object-oriented way of programming very well. IMO there are other approaches better suited to your needs:
Boost ASIO might be a good fit for designing your system. It includes both networking functionality and support for event-driven programming, which could be a good way to design your system (a minimal sketch follows this list). You can have a look at the Think-Async webpage for some examples of using ASIO for event-driven programming.
You could also use plain threads and borrow the network capabilities from ASIO (without using the event-driven programming parts). If you can use C++11, then you can directly use std::thread and all the other functionality available (mutex, conditional variables, futures, etc.). If you cannot use C++11, you can always use Boost Thread.
Finally, if you really want to go for MPI, you can have a look at Boost MPI. At least you will have a much more C++-friendly way of using MPI.
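To give a flavour of the event-driven style, here is a minimal ASIO sketch of a periodic "sensor poll" driven by an asynchronous timer; the 100 ms interval mirrors the periodic sensors in the question, and the sensor read itself is just a placeholder:

// Sketch: a periodic task on an io_service event loop, the same pattern the
// core could use to poll sensors or run scheduled logic-engine scripts.
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <iostream>

void poll_sensor(boost::asio::deadline_timer* timer,
                 const boost::system::error_code& ec) {
    if (ec) return;                                  // timer was cancelled
    std::cout << "reading sensor value...\n";        // placeholder for real I/O
    timer->expires_at(timer->expires_at() + boost::posix_time::milliseconds(100));
    timer->async_wait(boost::bind(poll_sensor, timer, _1));
}

int main() {
    boost::asio::io_service io;
    boost::asio::deadline_timer timer(io, boost::posix_time::milliseconds(100));
    timer.async_wait(boost::bind(poll_sensor, &timer, _1));
    io.run();                                        // single-threaded event loop
}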
I am working on a project which involves several C++ programs that each take input and generate output. The data (tens to hundreds of bytes, probably JSON) essentially flows (asynchronously) in one direction, and the programs will need to be located on different Linux computers around the LAN.
Since the data flows in only one direction, I don't believe I need a transactional model like HTTP. I think a message queue model (fire and forget) makes the most sense and should simplify the logic of each program. It is probably sufficient to merely note that the message was added to the remote queue successfully.
What I am looking for are recommendations for how to implement this message queue in C or C++. It seems that POSIX and Boost message queues are limited to a single host, RabbitMQ seems to have weak C/C++ support, and MQ4CPP seems inadequately supported for a business-critical role. Am I wrong about this? What about Boost ASIO, or ACE, or writing the socket code myself? I look forward to your suggestions.
In terms of simple messaging support, ZeroMQ is hard to beat. It's available in many language bindings and supports everything from simple send and receive to pub/sub, fanout, or even a messaging pipeline. The code is also easy to digest and makes it pretty easy to switch between patterns.
Looking at their Weather Update Server sample (available in some 20 languages) shows how easy it can be to create publish/subscribe setups:
#include <cstdio>    // snprintf
#include <cstdlib>   // rand
#include <zmq.hpp>

int main() {
    zmq::context_t context(1);
    zmq::socket_t publisher(context, ZMQ_PUB);
    publisher.bind("tcp://*:5556");
    publisher.bind("ipc://weather.ipc");

    while (1) {
        // Make up a random weather update and send it to all subscribers
        int zipcode = rand() % 100000;
        int temperature = rand() % 215 - 80;
        int relhumidity = rand() % 50 + 10;
        zmq::message_t message(20);
        snprintf((char *) message.data(), 20,
                 "%05d %d %d", zipcode, temperature, relhumidity);
        publisher.send(message);
    }
}
I've used it on some mixed C# and Python processes without much hassle.
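For completeness, the matching subscriber side is just as short; a sketch along the lines of the guide's weather client (hard-coding one zipcode filter) might look like this:

// Sketch: subscriber side of the weather example, filtering on one zipcode.
#include <zmq.hpp>
#include <iostream>
#include <string>

int main() {
    zmq::context_t context(1);
    zmq::socket_t subscriber(context, ZMQ_SUB);
    subscriber.connect("tcp://localhost:5556");
    subscriber.setsockopt(ZMQ_SUBSCRIBE, "10001 ", 6);  // only updates for 10001

    zmq::message_t update;
    subscriber.recv(&update);                           // blocks until an update arrives
    std::cout << std::string(static_cast<char*>(update.data()), update.size())
              << std::endl;
    return 0;
}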
Personally, if I understand the question correctly, I think you should use a lower-level TCP connection. It has all of the guaranteed delivery that you want and a rather good Berkeley sockets API.
I've found that if you're willing to implement a very simple protocol (e.g. a four-byte NBO message length, then n bytes of data), you can get something very simple and very customizable. If you go with this, you also (as mentioned) get great C support (which means C++ support, although things aren't in classes and methods). The socket code is also very easy, and you get asynchronous I/O with the standard async flags for the Linux/UNIX/POSIX I/O functions (that's one of the other benefits: if you know anything about POSIX programming, you basically know the socket API).
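As a sketch of that framing on the sending side (plain POSIX calls, minimal error handling, names invented for the example):

// Sketch: write one length-prefixed message (4-byte network-byte-order
// length header, then the payload) to a connected socket.
#include <arpa/inet.h>   // htonl
#include <unistd.h>      // write
#include <cstdint>

bool send_msg(int sockfd, const char* data, uint32_t len) {
    uint32_t netlen = htonl(len);                 // length header in NBO
    if (write(sockfd, &netlen, sizeof(netlen)) != sizeof(netlen))
        return false;
    ssize_t sent = 0;
    while (sent < (ssize_t)len) {                 // write() may be partial
        ssize_t n = write(sockfd, data + sent, len - sent);
        if (n <= 0) return false;
        sent += n;
    }
    return true;
}

The receiver does the mirror image: read exactly four bytes, ntohl() them, then read exactly that many payload bytes.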
Some of the best resources for learning the socket API are:
Beej's Guide to Network Programming: http://beej.us/guide/bgnet/; this is very good if you need the overall programming model in addition to specifics.
Man pages: if you just need function signatures, return values, and arguments, these are all you need. I find the Linux ones to be very well written and useful (proof: look at my console history: man, man, man, man, man, make, man, ...).
Also, for making your data network-sendable: if your data is JSON, you have no worries. Because JSON is just ASCII (or UTF-8), it can be sent raw over the network with only a length header. Unless you're trying to send something complicated in binary, this should be perfect (if you do need complicated binary data, either look at serialization or prepare for a lot of segmentation faults).
Also, if you go the socket path, you probably want to use TCP. Although UDP would give you the one-way aspect, making it reliable means pitting your home-baked solution against the top-of-the-line TCP implementation in the Linux kernel, so TCP is the obvious option.
RabbitMQ is just one implementation of AMQP. You might want to investigate Apache Qpid or other variants that might be more C/C++-friendly. There is a libamqp for C, though I have no first-hand experience with it. I don't know exactly what your requirements are, but AMQP, properly implemented, is industrial strength and should be orders of magnitude faster and more stable than anything you could build by hand in a short amount of time.
I am using Boost Serialization and socket sending for a similar application. You can find an example of serialization here:
http://code.google.com/p/cloudobserver/wiki/TutoriaslBoostSerialization
And on this page:
http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/examples.html
Under "serialization" you will find examples of how to write servers and clients. Create one server on a particular port, and you can have multiple clients on multiple computers communicate with that port.
The downside to using Boost serialization is that it has a large overhead if you have a simple data structure to serialize, but it does make things easy.
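For reference, serializing a simple struct looks roughly like this (the Reading struct is a made-up example; the resulting string would then be written to the socket as in the linked examples):

// Sketch: round-trip a struct through a Boost.Serialization text archive.
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <iostream>
#include <sstream>

struct Reading {
    int sensor_id;
    double value;
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/) {
        ar & sensor_id;                 // members are listed once and used for
        ar & value;                     // both saving and loading
    }
};

int main() {
    std::ostringstream oss;
    {
        const Reading out = {42, 3.14};
        boost::archive::text_oarchive oa(oss);
        oa << out;                      // serialize to text
    }
    Reading in = {};
    std::istringstream iss(oss.str());
    boost::archive::text_iarchive ia(iss);
    ia >> in;                           // deserialize
    std::cout << in.sensor_id << " " << in.value << "\n";
    return 0;
}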
Another recommendation is the distributed framework OpenCL. The document "The OpenCL C++ Wrapper API" provides further information on the library. In particular, the API class cl::CommandQueue could be of interest for creating queues on devices within a network setup.
Another messaging solution is ICE (http://www.zeroc.com/). It is multi-platform, multi-language. It uses more of an RPC approach.
I'm searching for an RPC library that would allow me to call a member function of an object in another process (on Windows).
The problem I'm currently encountering is that some of the server-side objects already exist and have more than one instance. The server should be able to pass a pointer/identifier to the client, which implements a proxy that then directs the calls to the remote object instance. So what I basically want is something like this:
Client:
TestProxy test = RemoteTestManager.GetTestById(123);
test.echo("bla");
where the instance of Test already exists on the server, and RemoteTestManager is a manager class on the server that the client obtained in another RPC call. It should preferably run over named pipes, as there can be multiple servers on the same machine (actually, what I want is more like an easy IPC :D).
So my question actually is: is there something like this for C++ out there, or do I have to write one myself?
In terms of low-level serialization of the messages across the network, Protocol Buffers is a common choice...
http://code.google.com/p/protobuf/
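As a taste of the workflow, you define a schema and compile it with protoc; the message below is an invented example, not something from the question:

// Sketch: assumed schema in reading.proto, compiled with protoc --cpp_out=. :
//   message Reading { required int32 sensor_id = 1; required double value = 2; }
#include <string>
#include "reading.pb.h"   // hypothetical generated header

std::string encode(int id, double v) {
    Reading r;                      // class generated by protoc
    r.set_sensor_id(id);
    r.set_value(v);
    std::string out;
    r.SerializeToString(&out);      // compact binary wire format
    return out;                     // ship this over your transport of choice
}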
For a more complete RPC stack take a look at Apache Thrift...
http://thrift.apache.org/
How about COM? Seems to fit your requirements perfectly.
You might have already found a solution. Just for the reference of others: I have created a library that matches what you asked for here. Take a look at the CppRemote library. It has the features below, which match your description:
get a pointer to objects at the server by name (std::string).
bind an existing object (non-intrusively) at the server and then get a proxy to that object from the client.
the server can bind to more than one instance of an existing object.
it has a named pipe transport.
it is lightweight and easy to use.
Server code:
// itest is the shared interface that both existing Test instances implement.
Test test1, test2;
remote::server svr;
svr.bind<itest>(&test1, "test1");   // expose existing instances by name
svr.bind<itest>(&test2, "test2");
svr.start(remote::make_basic_binding<text_serializer, named_pipe_transport>("pid"));
...
Client code:
remote::session client;
client.start(remote::make_basic_binding<text_serializer, named_pipe_transport>("pid"));
auto test1 = client.get<itest>("test1");   // proxies to the server-side objects
auto test2 = client.get<itest>("test2");
test1->echo("bla");
test2->echo("bla");
ZeroMQ is possibly the best IPC system out there at the moment and allows for quite a varied combination of client/server topologies. And it's really fast and efficient, too.
How you access the server objects depends on how they're implemented. CORBA had this facility, but I wouldn't try to use CORBA nowadays (or back then, to be honest). A lot of RPC systems allow you to create objects as needed, or to connect to a single instance. Connecting to an object that is created for you and kept for each call during that session (i.e. an object created for each client and kept alive) is still reasonably common, and so is a pool of objects. However, you have to manage the lifetime of these server objects, and I can't really advise there, as you haven't said how yours are managed.
I doubt you want named pipes; stick to TCP/IP connections. Connecting to localhost is a very lightweight operation (COM has practically zero overhead in this configuration).
There are several candidates at the top of the list, but it depends on your problem space. At a quick look, Cap'n Proto, by Kenton Varda, may be a fit. CORBA is a bit old but is used in many systems and frameworks, such as ACE. One of the issues there is pass-by-copy of capability references, whereas Cap'n Proto also provides pass-by-reference and pass-by-construction. The COM system also has some problems, which need their own discussion. ZeroMQ is really cool (so cool I once caught a cold), but it does not support RPC, which means you have to implement it yourself on top of its messaging layer. Google Protobuf, also by Kenton Varda, could be a choice if you are not looking for features such as capability security, promise pipelining, and the other nice features provided by Cap'n Proto. I think you'd better give them a try and experiment yourself.
As a reminder, RPC is not only about remote object invocation. Areas of concern such as an adequate level of abstraction and composition, pipelining, message passing, lambda calculus, and capability security are important ones that deserve close attention. So the better solution is the one that is efficient and elegant for your problem space.
Hope this is helpful.
Best,
Omid
I'm running a PHP front end to an application that does a lot of work with data and uses Cassandra as a data store.
However, I know PHP will not give me the performance I need for some of the calculations (as well as for managing the sheer amount of data that needs to be in memory).
I'd like to write the backend stuff in C++ and access it from the PHP application. I'm trying to figure out the best way to interface the two.
Some options I've looked at:
Thrift (A natural choice since I'm already using it for Cassandra)
Google's Protocol Buffers
gSOAP
Apache Axis
The above are only things I have looked at; I'm not limiting myself to them.
The data being transferred to the PHP application is very small, so streaming is not required. Only results of calculations are transferred.
What do you guys think?
If I were you, I'd use Thrift; there's no sense pulling in another RPC framework. Go with what you have and already know. Thrift makes it easy (so does Google Protocol Buffers, but you don't really need two different mechanisms).
Are you limiting yourself to having C++ as a separate application? Have you considered interfacing it with PHP directly (i.e. linking a C++ extension into your PHP application)?
I'm not saying the second approach is necessarily better than the first, but you should consider it anyway, because it offers some different tradeoffs. For example, the latency of passing data between PHP and C++ will surely be higher when the two are separate applications than when they're one dynamically linked application.
More details about how much data your computations will need would be useful. Thrift does seem like a reasonable choice. You could use it between PHP, your computation node, and the Cassandra backend. If your results are small, the RPC transport between PHP and the computation node won't make much difference.