Improving Socket Performance in Windows - C++

I am new to network communication methods. I just developed a very simple server/client connection using the procedure described on the Microsoft website:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms737889(v=vs.85).aspx
I am using the socket to transfer a large amount of data (double numbers) between a FORTRAN program (client) and a C++ program (server). (In FORTRAN, "USE IFWIN" provides most of the Windows programming functions, including the ones for defining the client socket.)
I would like to improve the performance of transferring data. Do you think using a library (like Boost) can improve the performance for large amounts of data? What exactly is the difference between the Microsoft procedure and using a library like Boost?
Any comment is appreciated.

I think you should first determine whether network performance is actually a problem for your application(s).

The easiest way to improve bulk data throughput across the network where you control both ends of communication is to compress it. I recommend zlib for this purpose. I am not sure what APIs/bindings are available in FORTRAN, but worst case you could implement the compression yourself using any of the well known, publicly available algorithms (Huffman encoding, etc.).
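As an illustration, here is a minimal sketch of compressing a buffer of doubles on the C++ side with zlib's one-shot API before sending it. Names and error handling are simplified; the receiver would call uncompress() with the original size, which you would transmit in a small header.

    // Compress a vector of doubles with zlib before sending it.
    #include <zlib.h>
    #include <vector>

    bool compress_doubles(const std::vector<double>& values,
                          std::vector<unsigned char>& out)
    {
        const uLong srcLen = static_cast<uLong>(values.size() * sizeof(double));
        out.resize(compressBound(srcLen));      // worst-case compressed size
        uLongf dstLen = static_cast<uLongf>(out.size());
        int rc = compress(out.data(), &dstLen,
                          reinterpret_cast<const Bytef*>(values.data()), srcLen);
        if (rc != Z_OK)
            return false;
        out.resize(dstLen);                     // shrink to the actual size
        return true;
    }

Note that compression only pays off if the doubles are compressible (e.g. contain repeated patterns); random-looking binary data may not shrink much.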

You could also try sending the data in chunks, so that by the time you have read the next chunk, you have already been able to process the previous one.
The wire format would look like this: [Chunk-size;Chunk-Data][Chunk-size;Chunk-Data]...
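A minimal sketch of that framing over a connected Winsock socket (send_all() is a small helper defined here, not a Winsock call):

    #include <winsock2.h>
    #include <cstdint>

    // Loop until the whole buffer has been written.
    bool send_all(SOCKET s, const char* data, int len)
    {
        while (len > 0) {
            int n = send(s, data, len, 0);
            if (n == SOCKET_ERROR) return false;
            data += n;
            len  -= n;
        }
        return true;
    }

    // Send one [Chunk-size;Chunk-Data] frame of doubles.
    bool send_chunk(SOCKET s, const double* values, uint32_t count)
    {
        uint32_t bytes = count * sizeof(double);
        uint32_t size  = htonl(bytes);          // fixed byte order for the prefix
        return send_all(s, reinterpret_cast<const char*>(&size), sizeof(size))
            && send_all(s, reinterpret_cast<const char*>(values), bytes);
    }

The receiver reads the 4-byte size first, then loops on recv() until that many payload bytes have arrived.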

Related

RPC from C++ code to Common Lisp code

I have two codebases: one written in C++ and the other in Common Lisp. There is a particular functionality implemented in the Lisp codebase that I would like to access from my C++ code. I searched for Foreign Function Interfaces to call Lisp functions from C++, but couldn't seem to find any (I found FFIs for the other direction mostly). So I decided to implement some form of RPC that fits my requirements, which are:
both codes are going to run on the same machine, so extensibility to remote machine calls is not important.
the input from C++ is going to be a Lisp-style list, which is what the function from the Lisp code is going to take as input.
this call is going to be made 1000s of times per execution of the code, so performance per remote call is critical.
So far, I've learnt from various resources on the web that possible solutions are:
Sockets - set up an instance of the Lisp code that will listen for function calls from the C++ code, run the function on the given input, and return the result to the C++ code.
XML-RPC - set up an XML-RPC server on the Lisp side (which will be easy since I use Allegro Common Lisp, which provides an API that supports XML-RPC) and then use one of the many XML-RPC libraries for C++ to make the client-side call.
The pros and cons I see with these approaches seem to be the following:
Sockets are a low-level construct, so it looks like I would need to do most of the connection management, reading, and parsing of the data on the sockets, etc., on my own.
XML-RPC seems to suit my needs much better, but I read that it always uses HTTP, and there is no way to use UNIX domain sockets. So, it feels like XML-RPC might be overkill for what I have in mind.
Does anyone have any experience in achieving some similar integration of codes? Are there significant differences in performance between sockets and XML-RPC for local RPC? Any advice on which approach might be better would be extremely helpful. Also, suggestions on a different technique to do this would also be appreciated.
EDIT: Here are a few more details on the shared functionality. There is a function f available in the Lisp code (which is complex enough to make reimplementation in C++ prohibitively expensive). It takes as input two lists L1 and L2. How I envision this happening is the following:
L1 and L2 are constructed in C++ and sent over to the Lisp side, and the C++ side waits for the results,
f is invoked on the Lisp side on inputs L1 and L2 and returns results back to the C++ side,
the C++ side takes in the results and continues with its computation.
The sizes of L1 and L2 are typically not big:
L1 is a list containing typically 100s of elements, each element being a list of at most 3-4 atoms.
L2 is also a list containing < 10 elements, each element being a list of at most 3-4 atoms.
So the total amount of data per RPC is probably a string of 100s/1000s of bytes. This call is made at the start of each while loop in my C++ code, so it's hard to give concrete numbers on calls per second. But from my experiments, I can say it's typically done 10s-100s of times per second. f is not a numerical computation: it's symbolic. If you're familiar with AI, it's essentially doing symbolic unification in first-order logic. So it is free of side effects.
If you look at some Common Lisp implementations, their FFIs allow calling Lisp from the C side. That's not remote, but local. Sometimes it makes sense to include Lisp directly, and not call it remotely.
Commercial Lisps like LispWorks or Allegro CL can also deliver shared libraries, which you can use from your application code.
For example, define-foreign-callable allows a LispWorks function to be called from C.
Franz ACL can do it: http://www.franz.com/support/documentation/9.0/doc/foreign-functions.htm#lisp-from-c-1
Also something like ECL should be usable from the C side.
I've started working recently on a project that requires similar functionality. Here are some things I've researched so far with some commentary:
cl-mpi would in principle allow (albeit very low-level) direct inter-process communication, but encoding the data is a nightmare! The design on the C/C++ side is very uncomfortable (it is extremely limited, and there's no way around sending variable-length arrays). And on the other side, the Lisp library is both dated and seems to be at a very early stage of its development.
Apache Thrift, which is more of a language than a program. Slow, memory hog. Protobuf and BSON are similar. Protobuf might be the most efficient of this group, but you'd need to roll your own communication solution; it's only the encoding/decoding protocol.
XML, JSON, S-expressions. S-expressions win in this category because they are more expressive and one side already has a very efficient parser. Alas, this is even worse than Thrift/Protobuf in terms of speed/memory.
CFFI. Sigh... Managing pointers on both sides will be a nightmare. It is possible in theory, but it would be very difficult in practice. It will also inevitably tax the performance of the Lisp garbage collector, because you would have to get in its way.
Finally, I switched to ECL. So far so good. I'm researching mmap'ed files as a means of sharing data. The conclusion I've reached so far is that this will be the way to go. At least I can't think of anything better at the moment.
There are many other ways to make two processes communicate. You could read the inter-process communication wikipage.
One of the parameters is the asynchronous or synchronous character. Is your remote processing a remote procedure call (every request from the client has exactly one response from the server), or is it asynchronous message passing (both sides are sending messages, but there is no notion of request and response; each side handles incoming messages as events)?
The other parameters are latency and bandwidth, i.e. the volume of data exchanged (per message and, e.g., per second).
Bandwidth does matter, even on the same machine. Of course, pipes or Unix sockets give you very high bandwidth, e.g. 100 megabytes/second. But there are scenarios where that might not be enough. In the pipe case, the data is usually copied (often twice) from memory to memory (e.g. from one process address space to another).
But you might consider e.g. CORBA (see e.g. CLORB on the lisp side, and this tutorial on OmniORB), or RPC/XDR, or XML-RPC (with S-XML-RPC on the lisp side), or JSON-RPC etc...
If you don't have a lot of data and a lot of bandwidth (or many requests or messages per second), I would suggest using a textual protocol (perhaps serializing with JSON or YAML or XML), because it is easier than a binary protocol (BSON, protobuf, etc.).
The socket layer (which could use unix(7) AF_UNIX sockets, plain anonymous or named pipe(7)-s, or tcp(7), i.e. TCP/IP, which has the advantage of letting you distribute the computation across two machines communicating over a network) is probably the simplest, provided you have a multiplexing syscall like poll(2) on both (C++ and Lisp) sides. You need to buffer messages on both sides.
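A minimal sketch of such a poll(2)-based read loop on the C++ side, assuming a connected AF_UNIX or TCP socket fd and (by assumption) newline-terminated messages:

    #include <poll.h>
    #include <unistd.h>
    #include <string>

    void event_loop(int fd)
    {
        std::string inbuf;                      // buffer for partial messages
        struct pollfd pfd = { fd, POLLIN, 0 };
        for (;;) {
            if (poll(&pfd, 1, -1) < 0)
                break;                          // error (EINTR handling omitted)
            if (pfd.revents & POLLIN) {
                char tmp[4096];
                ssize_t n = read(fd, tmp, sizeof tmp);
                if (n <= 0)
                    break;                      // peer closed or error
                inbuf.append(tmp, n);
                size_t pos;
                while ((pos = inbuf.find('\n')) != std::string::npos) {
                    std::string msg = inbuf.substr(0, pos);
                    inbuf.erase(0, pos + 1);
                    // handle_message(msg);     // application-specific handler
                }
            }
        }
    }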
Maybe you want MPI (with CL-MPI on the lisp side).
We can't help you more unless you explain really well, and in much more detail, what the "functionality" to be shared from C++ to Lisp is (what it does, how many remote calls per second, what volume and kind of data, what computation time, etc.). Is the remote function call idempotent or nullipotent? Does it have side effects? Is it a stateless protocol?
The actual data types involved in the remote procedure call matter a lot: it is much more costly to serialize a complex [mathematical] cyclic graph with shared nodes than a plain human-readable string...
Given your latest details, I would suggest using JSON... It is quite fit for transmitting abstract-syntax-tree-like data. Alternatively, transmit just s-expressions (you may be left with the small issue in C++ of parsing them, which is really easy once you have specified and documented your conventions; if your leaf or symbolic names have arbitrary characters, you just need to define a convention to encode them).
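For the parsing side, here is a minimal sketch of an s-expression reader in C++, assuming atoms contain no whitespace or parentheses and no quoting/escaping convention is needed (your encoding convention, as discussed above, would relax that):

    #include <cctype>
    #include <memory>
    #include <string>
    #include <vector>

    // A node is either an atom (non-empty `atom`) or a list of children.
    struct Node {
        std::string atom;
        std::vector<std::unique_ptr<Node>> children;
    };

    static void skip_ws(const std::string& s, size_t& i)
    {
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    }

    std::unique_ptr<Node> parse(const std::string& s, size_t& i)
    {
        skip_ws(s, i);
        auto node = std::make_unique<Node>();
        if (i < s.size() && s[i] == '(') {
            ++i;                                // consume '('
            skip_ws(s, i);
            while (i < s.size() && s[i] != ')') {
                node->children.push_back(parse(s, i));
                skip_ws(s, i);
            }
            if (i < s.size()) ++i;              // consume ')'
        } else {                                // atom: read until a delimiter
            size_t start = i;
            while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i]))
                   && s[i] != '(' && s[i] != ')') ++i;
            node->atom = s.substr(start, i - start);
        }
        return node;
    }

Called as size_t i = 0; auto tree = parse("(f (a b) (c))", i); this yields a tree mirroring the list structure.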

Boost.Asio with google protocol buffers

I'm currently investigating ways of improving our hand-made C++ network serialization mechanism while maintaining our existing binary protocol.
The first approach taken was to code it using Boost.Asio with Boost.Serialization using binary serialization. Anyway, it turned out that it's somewhat slower (10%) than our current hand-made implementation. Does anyone have actual real-world experience of using Google protobuf together with Boost.Asio?
I searched Google for samples but was only able to come up with this example:
Boost Asio with google protocol buffers sample
Has anybody done this in an actual project? I'm very interested in performance figures, since this has to be quite fast...
We use boost::asio and Protobuf for complex, low message rate protocols. For simple, high message rate protocols we do boost::asio and custom serialization.
The C++ Protobuf library uses std::string to represent the string fields of messages that it deserializes, which means a free-store allocation is performed by Protobuf for every string field in every message you receive. That makes Protobuf not very performant for really high-frequency messaging.
I would use Protobuf everywhere if I could, though. It's a marvelous tool for making rich, complex, platform independent, forward-and-backward-compatible protocols.
ADDENDUM
Since it seems like people are reading this answer, I should share that I've learned that in C++ Protobuf you can re-use deserialization message objects to reduce the malloc frequency when reading.
See Optimization Tips:
https://developers.google.com/protocol-buffers/docs/cpptutorial
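A minimal sketch of that reuse, assuming a hypothetical generated type MyMessage and a stand-in framing layer (Reader) that yields one serialized message at a time:

    #include "my_message.pb.h"   // hypothetical generated header
    #include <string>

    void read_loop(Reader& reader)          // Reader is a stand-in for your I/O
    {
        MyMessage msg;                      // constructed once, outside the loop
        std::string frame;
        while (reader.next_frame(frame)) {  // hypothetical framing call
            if (!msg.ParseFromString(frame))
                continue;                   // skip malformed frames
            // process(msg);                // application-specific handling
        }
    }

ParseFromString clears the message before parsing, but the std::string fields keep their already-allocated capacity, so steady-state parsing stops hitting the allocator for strings of similar size.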

Realtime TCP/IP stack

I want to program (as efficiently as possible) a TCP/IP communication stack in C or C++. It really must run as fast as possible.
Does anyone have a good example or suggestion of where to start?
This is not meant as an insult, but the people who have developed the stacks for the well-established operating systems have been doing this for years. This is what they do; unless you are in the business, I suggest you look at a different approach.
The different approach being: pick a stack that has decent performance (I hear that the latest TCP/IP stack in Solaris is nifty), then tune the hell out of it (there are lots of different flags and settings you can tune). If that fails to meet your needs, consider hardware solutions such as TCP offloading, etc.
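Much of that tuning is available without leaving user space. A minimal sketch of per-socket knobs on a POSIX system (the 1 MiB buffer size is a placeholder to measure against, not a recommendation):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    void tune_socket(int fd)
    {
        int bufsize = 1 << 20;              // 1 MiB, placeholder value
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof bufsize);
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof bufsize);

        int one = 1;                        // disable Nagle's algorithm for latency
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
    }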
Writing your own stack means you have to be confident enough to know that you can beat maybe thousands of man-years' worth of effort in this field.
If this is for self-development and learning, I suggest something simple like the source code for Minix; it may have a simple-to-understand stack.
m2c.
This is a huge task. I would recommend the Contiki operating system as a possible starting point. It has a TCP/IP stack.
As Steve points out in the comments, you do need quite a bit of experience to do this well. So rather than jumping directly to your end goal, I recommend these possible steps:
Write a reliable transport using UDP as a normal user-land protocol.
Write a custom protocol using raw sockets in user-land (a minimal starting point is sketched after this list).
Write a kernel-level protocol module/driver.
Write your stack on an FPGA network card.
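For the raw-socket step, a minimal sketch of opening a raw IPv4 socket on Linux (requires root or CAP_NET_RAW); with IP_HDRINCL set, your code supplies the complete IP header of every outgoing packet:

    #include <netinet/in.h>
    #include <sys/socket.h>

    int open_raw_socket()
    {
        int fd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
        if (fd < 0)
            return -1;                      // typically EPERM without privileges
        int one = 1;
        setsockopt(fd, IPPROTO_IP, IP_HDRINCL, &one, sizeof one);
        return fd;                          // sendto() now takes full IP packets
    }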
Linux is a good option as the details you need are easily accessible and documented.
And oh yeah, stop as soon as you realize you won't likely outperform the Linux kernel.
This may be worth looking at:
Implementing a High Performance Object Oriented TCP/IP Protocol Stack
(Master's thesis, Peter Kjellerstedt and Henrik Baard)
lwIP - A Lightweight TCP/IP stack - is a good place to start learning about TCP/IP stacks:
git clone git://git.savannah.nongnu.org/lwip.git

Using C++ for backend calculations in a web app

I'm running a PHP front end to an application that does a lot of work with data and uses Cassandra as a data store.
However, I know PHP will not give me the performance I need for some of the calculations (as well as the management of the sheer amount of data that needs to be in memory).
I'd like to write the backend stuff in C++ and access it from the PHP application. I'm trying to figure out the best way to interface the two.
Some options I've looked at:
Thrift (A natural choice since I'm already using it for Cassandra)
Google's Protocol Buffers
gSOAP
Apache Axis
The above are only the things I've looked at; I'm not limiting myself to them.
The data being transferred to the PHP application is very small, so streaming is not required. Only results of calculations are transferred.
What do you guys think?
If I were you, I'd use Thrift; there's no sense pulling in another RPC framework. Go with what you have and already know. Thrift makes it easy (so does Google Protocol Buffers, but you don't really need two different mechanisms).
Are you limiting yourself to having C++ as a separate application? Have you considered interfacing it with PHP directly (i.e. linking a C++ extension into your PHP application)?
I'm not saying the second approach is necessarily better than the first, but you should consider it anyway, because it offers different tradeoffs. For example, the latency of passing data between PHP and C++ would surely be higher when the two are separate applications than when they're one dynamically linked application.
More details about how much data your computations will need would be useful. Thrift does seem like a reasonable choice. You could use it between PHP, your computation node, and the Cassandra backend. If your result is small, your RPC transport between PHP and the computation node won't make too much difference.

Flash memory data format [duplicate]

This question already has answers here:
What free tiniest flash file system could you advice for embedded system?
I'm looking for a storage library for storing data in flash memory in an embedded system. I'm on the verge of writing a custom one for want of a format with the right mix of features and simplicity.
Ideally it would be a format and C/C++ library with something better than storing raw structures, but less complex than a full blown file system. I need to store multiple data structures some of which are optional and may change format from time to time.
Nice-to-haves would be simple wear-leveling/journaling schemes and data redundancy/reliability features. The simple journaling is because most low-level flash chips I'm working with are happiest when you write from one end to the other and start over at the top. Data redundancy/reliability could be the use and checking of parity bits, or complete extra copies.
Any suggestions?
JFFS2 is an obvious candidate. I have used it extensively on MIPS and SuperH, but only with NAND. It gives great results in wear leveling and performance. Note: it is a full-blown file system, which doesn't seem to be what you describe, but honestly, I don't think you'll find a single solution for what you want. Still, it might be the simplest one: JFFS2 + {SQLite|Protobuf|Berkeley DB}
I do hope that I'm wrong and you find one. :-)
Like Robert and Mtr, I can recommend the FatFs Generic File System Module.
I am using it on a Cortex-M3 with 3 logical devices (USB, SD card and external flash).
Especially f_mkfs was very handy for putting the file system onto the external flash.
The "only" thing I had to code myself was the low-level disk I/O functions.
If you do not need all the functionality provided by the FatFs module, reducing the module size is pretty easy using the config header (can't remember the exact name :D).
Edit: I chose FAT as it can be used by Win & Linux...
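For reference, a minimal FatFs usage sketch (the exact f_mount/f_mkfs signatures vary between FatFs releases, so check the ff.h that ships with your copy); it appends one record to a file on the default logical drive:

    #include "ff.h"   // FatFs header

    FRESULT write_record(const void* data, UINT len)
    {
        static FATFS fs;                    // file system object must outlive use
        FRESULT rc = f_mount(&fs, "", 1);   // 1 = mount the volume immediately
        if (rc != FR_OK) return rc;

        FIL fil;
        rc = f_open(&fil, "records.bin", FA_WRITE | FA_OPEN_APPEND);
        if (rc != FR_OK) return rc;

        UINT written = 0;
        rc = f_write(&fil, data, len, &written);
        f_close(&fil);
        return (rc == FR_OK && written == len) ? FR_OK : FR_DISK_ERR;
    }

The low-level disk I/O functions mentioned in the answer (disk_initialize, disk_read, disk_write, etc.) are what you implement once per physical device.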