Related
I want to send raw ints with boost.asio compatibly with any CPU architecture. Usually, I would convert ints to strings, but I may be able to get better performance by skipping the int/ascii conversion. I don't know what boost.asio already does under the hood such as using htonl. The documentation doesn't say and there is no way to test this code on my own PC.
Here are several methods I have to send an array of ints over the network:
Store the ints in an array of int32_t. Use htonl and ntohl to get correct endian-ness. If int32_t is not supported on the target machine, handle that with int_fast32_t and extra work.
Use boost.serialization. The only example I can find is from pre-C++11 times (see section on serialization), so I don't know if boost.serialization and boost.asio should still be used in this way. Also, as boost.serialization is a black box, it isn't clear what the performance overhead of using this library is.
Convert each int to the ascii representation.
I think (1) is the "correct" alternative to non-ascii conversion. Can boost.asio or boost.serialization perform any of the steps of (1) for me? If not, what is the current recommended way to send ints with boost.asio?
The other answer by #bazza has good professional advice. I won't repeat that. I get the impression you're more focusing on understanding the implementation specifics here, so I'll dive into the details of that for you here:
Yeah, option 1 seems okay for the simple use case you describe.
I don't know what boost.asio already does under the hood such as using htonl
It doesn't except perhaps in the protocol layers - but those are implementation details.
The documentation doesn't say
That's because it doesn't have an opinion on your application layer. The fact that it isn't mentioned is a good sign that no magic happens.
Can boost.asio or boost.serialization perform any of the steps of (1) for me?
Actually, no. The Boost binary archive is not portable (see also EOS Portable Archive).
Ironically, it can portably use one of the text archive formats https://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/archives.html#archive_models (XML and plain text are provided), but as you surmised they can have considerable overhead: Boost C++ Serialization overhead. Besides, there will be copying into and out of the stream format.
If not, what is the current recommended way to send ints with boost.asio?
I'd consider using the raw array/vector:
std::array<int32_t, 45> data;
boost::asio::async_write(socket_,
boost::asio::buffer(data),
...);
Of course you have to cater for endianness. Boost Endian has you covered in several ways. Compare the performance trade-offs here: https://www.boost.org/doc/libs/1_80_0/libs/endian/doc/html/endian.html#overview_performance
The simplest idea here is to use the drop-in (un)aligned arithmetic types:
std::array<boost::endian::big_int32_t, 45> data;
boost::asio::async_write(socket_,
boost::asio::buffer(data),
...);
To make the example a bit more life-like:
struct Message {
boost::endian::big_uint32_t magic = 0xCAFEBABE;
boost::endian::big_uint32_t length = 0;
std::vector<boost::endian::big_uint32_t> data;
std::array<boost::asio::const_buffer, 3> as_buffers() const {
length = data.length(); // update leading length header field
return {
boost::asio::buffer(&magic, sizeof(magic)),
boost::asio::buffer(&length, sizeof(length)),
boost::asio::buffer(data),
};
}
};
Now you can still:
Message message;
message.data = {1,2,3,4,5,6,7,8};
boost::asio::async_write(socket_,
message.as_buffers(),
...);
A lot will depend on what your long term needs are, i.e. what do you mean by "compatibly".
If the sending and receiving ends are guaranteed to be written in C++ then you can follow the "code first" approach. This is your options 1, 2) or 3) because (in this circumstance) you have decided that you can re-use the code. Though I would encourage using Boost serialisation, option 2), as ultimately that does a lot of the work for you (even if it might not feel like it for a simple case!).
However, if you think that, one day, a sending or receiving end might have to be written in a different language, then I would strongly encourage a "schema first" approach. This means using something like Google Protocol Buffers, or ASN.1, or XSD (XML schema + a good code generator), Thrift, Cap'n Proto, Avro (though I think Avro can be a bit awkward in non-dynamic languages like C++). There are more. With this approach you start off with a schema to define your messages (an array of ints in your case), and compile that to a language of your choice. That way, sending and receiving ends can be written in different programming languages (the schema is compiled accordingly), and will be able to talk without you having to be aware of exactly how that's managed.
Sending Over a Network
I would also encourage use of ZeroMQ, as that is a very easy way to use a network connection in a program. Being message orientated, instead of stream orientated, makes life an awful lot easier. For example, a ZeroMQ "message" can be the byte array the serialisation tech has output in serialising an object.
Plus, ZeroMQ is available in almost any language. Google Protocol Buffers - a good schema-first serialisation technology - is also available in many languages. My personal preference - ASN.1 - is a more strict serialisation (it does "constraints"), but good tools for lots of languages cost money.
The combination of schema-first serialisation, + ZeroMQ, makes it pretty easy to convey messages problem-free between almost any combination of language, OS, and platform.
Some of these serialisation techs (e.g. Google Protocol Buffer, Thrift) provide an RPC-like framework too, which can also satisfy the "sending over a network" problem.
I'm looking into a mechanism for serialize data to be passed over a socket or shared-memory in a language-independent mechanism. I'm reluctant to use XML since this data is going to be very structured, and encoding/decoding speed is vital. Having a good C API that's liberally licensed is important, but ideally there should be support for a ton of other languages. I've looked at google's protocol buffers and ASN.1. Am I on the right track? Is there something better? Should I just implement my own packed structure and not look for some standard?
Given your requirements, I would go with Google Protocol Buffers. It sounds like it's ideally suited to your application.
You could consider XDR. It has an RFC. I've used it and never had any performance problems with it. It was used in ONC RPC and has an and comes with a tool called rpcgen. It is also easy to create a generator yourself when you just want to serialize data (which is what I ended up doing for portability reasons, took me half a day).
There is an open source C implementation, but it can already be in a system library, so you wouldn't need the sources.
ASN.1 always seemed a bit baroque to me, but depending on your actual needs might be more appropriate, since there are some limitations to XDR.
Just wanted to throw in ASN.1 into this mix. ASN.1 is a format standard, but there's libraries for most languages, and the C interface via asn1c is much cleaner than the C interface for protocol buffers.
JSON is really my favorite for this kind of stuff. I have no prior experience with binary stuff in it though. Please post your results if you are planning on using JSON!
Thrift is a binary format created by Facebook. Here's a comparison with google protocol buffers.
Check out Hessian
There is also Binary XML but it seems not stabilized yet. The article I link to gives a bunch of links which might be of interest.
Another option is SNAC/TLV which is used by AOL in it's Oscar/AIM protocol.
Also check out Muscle. While it does quite a bit, it serializes to a binary format.
Few Thing's you need to Consider
1. Storage
2. Encoding Style (1 byte 2 byte)
3. TLV standards
ASN.1 Parser is the good for binary represenations the best part is ASN.1 is a well-established technology that is widely used both within ITU-T and outside of it. The notation is supported by a number of software vendors.
I am new to Network Communication methods. I just developed a very simple server/client connection using the procedure described in the Microsoft website:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms737889(v=vs.85).aspx
I am using the socket to transfer large amount of data (double numbers) between a FORTRAN program (client) and a C++ program(server). (In the FORTRAN, "USE IFWIN" provides most of the windows programming functions including the ones for defining clientsocket)
I would like to improve the performance of transferring data. Do you think using any library (like boost) can improve the performance for large amount of data? What exactly is the difference between the Microsoft procedure and using libraries like boost?
Any comment is appreciated
I think first you should determine if the performance of network is a problem for you application(s).
The easiest way to improve bulk data throughput across the network where you control both ends of communication is to compress it. I recommend zlib for this purpose. I am not sure what APIs/bindings are available in FORTRAN, but worst case you could implement the compression yourself using any of the well known, publicly available algorithms (Huffman encoding, etc.).
You could also try sending data in chunks.
So by the time you have read the next chunk, you would have been able to process the previous chunk.
like this: [Chunk-size;Chunk-Data] [Chunk-size;Chunk-Data]...
I am working on a project which involves several C++ programs that each take input and generate output. The data (tens to hundreds of bytes, probably JSON) essentially flows (asynchronously) in one direction, and the programs will need to be located on different Linux computers around the LAN.
Since the data flows in only one direction, I don't believe I need a transactional model like HTTP. I think a message queue model (fire and forget) makes the most sense and should simplify the logic of each program. It is probably sufficient to merely note that the message was added to the remote queue successfully.
What I am looking for are recommendations for how to implement this message queue in C or C++. It seems like POSIX and Boost message queues are limited to a single host, and RabbitMQ seems to have weak C/C++ support, and MQ4CPP seems inadequately supported for a business-critical role. Am I wrong about this? What about Boost ASIO or ACE or writing socket code myself? I look forward to your suggestions.
In terms of simple messaging support, ZeroMQ is hard to beat. It's available in many language bindings and supports everything from simple send and receive to pub/sub, fanout, or even a messaging pipeline. The code is also easy to digest and makes it pretty easy to switch between patterns.
Looking at their Weather Update Server sample (in 20 some odd languages) shows how easy it can be to create publish/subscribe setups:
zmq::context_t context (1);
zmq::socket_t publisher (context, ZMQ_PUB);
publisher.bind("tcp://*:5556");
publisher.bind("ipc://weather.ipc");
while(1) {
// Send message to all subscribers
zmq::message_t message(20);
snprintf ((char *) message.data(), 20 ,
"%05d %d %d", zipcode, temperature, relhumidity);
publisher.send(message);
}
I've used it on some mixed C# and Python processes without much hassle.
Personally, if I understand the question, I think that you should use a lower-level TCP connection. It has all of the guarantied delivery that you want, and has a rather good Berkley Sockets API.
I've found that if your willing to implement a very simple protocol (eg. four-byte NBO message length, n bytes of data), you can get very simple, very customizable, and very simple. If you go with this, you also (as mentioned) get great C support (which means C++ support, although things aren't in classes and methods). The socket code is also very easy, and they have asynchronous IO with the standard async flags for the Linux/UNIX/POSIX IO functions (thats one of the other benefits, if you know anything about POSIX programing, you basically know the socket API).
One of the best resources for learning the socket API are:
Beej's Guide to Network Programing: http://beej.us/guide/bgnet/, this is very good if you need the overall programming model in addition to specifics
Man Pages: If you just need function signatures, return values, and arguments, these are all you need. I find the Linux ones to be very well written and useful (Proof: Look at my console: man, man, man, man, man, make, man, ...)
Also, for making data network-sendable, if your data is JSON, you have no worries. Because JSON is just ASCII (or UTF-8), it can be sent raw over the network with only a length header. Unless your trying to send something complicated in binary, this should be perfect (if you need complicated in binary, either look at serialization or prepare for a lot of Segmentation Fault).
Also, you probably, if you go the socket path, want to use TCP. Although UDP will give you the one-way aspect, the fact that making it reliable is pitting your home-baked solution against the top-of-the-line TCP given by the Linux kernel, TCP is an obvious option.
RabbitMQ is just one implementation of AMQP. You might want to investigate Apache Qpid or other variants that might be more C/C++ friendly. There is a libamqp for C though I have no first hand experience with it. I don't know exactly what your requirements are but AMQP, properly implemented, is industrial strength and should be orders of magnitude faster and more stable than anything you are going to build by hand in a short amount of time.
I am using Boost Serialization and socket sending for a similar application. You can find an example of serialization here :
http://code.google.com/p/cloudobserver/wiki/TutoriaslBoostSerialization
And on this page:
http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/examples.html
under serialization you will find examples on how to make servers and clients. Make one server on a particular port and you can generate multiple clients on multiple computers which can communicate with that port.
The downside to using boost serialization is that it has a large overhead if you have a simple data structure to be serialized but it does make it easy.
Another recommendation is the distributed framework OpenCL. The document The OpenCL C++ Wrapper for API provides further information on the library. In particular, the API function cl::CommandQueue could be of interest for creating queues on devices within a network setup.
Another messaging solution is ICE (http://www.zeroc.com/). It is multi-platform, multi-language. It uses more of an RPC approach.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
Does anyone with experience with these libraries have any comment on which one they preferred? Were there any performance differences or difficulties in using?
I've been using Boost Serialization for a long time and just dug into protocol buffers, and I think they don't have the exact same purpose. BS (didn't see that coming) saves your C++ objects to a stream, whereas PB is an interchange format that you read to/from.
PB's datamodel is way simpler: you get all kinds of ints and floats, strings, arrays, basic structure and that's pretty much it. BS allows you to directly save all of your objects in one step.
That means with BS you get more data on the wire but you don't have to rebuild all of your objects structure, whereas protocol buffers is more compact but there is more work to be done after reading the archive. As the name says, one is for protocols (language-agnostic, space efficient data passing), the other is for serialization (no-brainer objects saving).
So what is more important to you: speed/space efficiency or clean code?
I've played around a little with both systems, nothing serious, just some simple hackish stuff, but I felt that there's a real difference in how you're supposed to use the libraries.
With boost::serialization, you write your own structs/classes first, and then add the archiving methods, but you're still left with some pretty "slim" classes, that can be used as data members, inherited, whatever.
With protocol buffers, the amount of code generated for even a simple structure is pretty substantial, and the structs and code that's generated is more meant for operating on, and that you use protocol buffers' functionality to transport data to and from your own internal structures.
There are a couple of additional concerns with boost.serialization that I'll add to the mix. Caveat: I don't have any direct experience with protocol buffers beyond skimming the docs.
Note that while I think boost, and boost.serialization, is great at what it does, I have come to the conclusion that the default archive formats it comes with are not a great choice for a wire format.
It's important to distinguish between versions of your class (as mentioned in other answers, boost.serialization has some support for data versioning) and compatibility between different versions of the serialization library.
Newer versions of boost.serialization may not generate archives that older versions can deserialize. (the reverse is not true: newer versions are always intended to deserialize archives made by older versions). This has led to the following problems for us:
Both our client & server software create serialized objects that the other consumes, so we can only move to a newer boost.serialization if we upgrade both client and server in lockstep. (This is quite a challenge in an environment where you don't have full control of your clients).
Boost comes bundled as one big library with shared parts, and both the serialization code and the other parts of the boost library (e.g. shared_ptr) may be in use in the same file, I can't upgrade any parts of boost because I can't upgrade boost.serialization. I'm not sure if it's possible/safe/sane to attempt to link multiple versions of boost into a single executable, or if we have the budget/energy to refactor out bits that need to remain on an older version of boost into a separate executable (DLL in our case).
The old version of boost we're stuck on doesn't support the latest version of the compiler we use, so we're stuck on an old version of the compiler too.
Google seem to actually publish the protocol buffers wire format, and Wikipedia describes them as forwards-compatible, backwards-compatible (although I think Wikipedia is referring to data versioning rather than protocol buffer library versioning). Whilst neither of these is a guarantee of forwards-compatibility, it seems like a stronger indication to me.
In summary, I would prefer a well-known, published wire format like protocol buffers when I don't have the ability to upgrade client & server in lockstep.
Footnote: shameless plug for a related answer by me.
Boost Serialisation
is a library for writing data into a stream.
does not compress data.
does not support data versioning automatically.
supports STL containers.
properties of data written depend on streams chosen (e.g. endian, compressed).
Protocol Buffers
generates code from interface description (supports C++, Python and Java by default. C, C# and others by 3rd party).
optionally compresses data.
handles data versioning automatically.
handles endian swapping between platforms.
does not support STL containers.
Boost serialisation is a library for converting an object into a serialised stream of data. Protocol Buffers do the same thing, but also do other work for you (like versioning and endian swapping). Boost serialisation is simpler for "small simple tasks". Protocol Buffers are probably better for "larger infrastructure".
EDIT:24-11-10: Added "automatically" to BS versioning.
I have no experience with boost serialization, but I have used protocol buffers. I like protocol buffers a lot. Keep the following in mind (I say this with no knowledge of boost).
Protocol buffers are very efficient so I don't imagine that being a serious issue vs. boost.
Protocol buffers provide an intermediate representation that works with other languages (Python and Java... and more in the works). If you know you're only using C++, maybe boost is better, but the option to use other languages is nice.
Protocol buffers are more like data containers... there is no object oriented nature, such as inheritance. Think about the structure of what you want to serialize.
Protocol buffers are flexible because you can add "optional" fields. This basically means you can change the structure of protocol buffer without breaking compatibility.
Hope this helps.
boost.serialization just needs the C++ compiler and gives you some syntax sugar like
serialize_obj >> archive;
// ...
unserialize_obj << archive;
for saving and loading. If C++ is the only language you use you should give boost.serialization a serious shot.
I took a fast look at google protocol buffers. From what I see I'd say its not directly comparable to boost.serialization. You have to add a compiler for the .proto files to your toolchain and maintain the .proto files itself. The API doesn't integrate into C++ as boost.serialization does.
boost.serialization does the job its designed for very well: to serialize C++ objects :)
OTOH an query-API like google protocol buffers has gives you more flexibility.
Since I only used boost.serialization so far I cannot comment on performance comparison.
Correction to above (guess this is that answer) about Boost Serialization :
It DOES allow supporting data versioning.
If you need compression - use a compressed stream.
Can handle endian swapping between platforms as encoding can be text, binary or XML.
I never implemented anything using boost's library, but I found Google protobuff's to be more thought-out, and the code is much cleaner and easier to read. I would suggest having a look at the various languages you want to use it with and have a read through the code and the documentation and make up your mind.
The one difficulty I had with protobufs was they named a very commonly used function in their generated code GetMessage(), which of course conflicts with the Win32 GetMessage macro.
I would still highly recommend protobufs. They're very useful.
I know that this is an older question now, but I thought I'd throw my 2 pence in!
With boost you get the opportunity to I'm write some data validation in your classes; this is good because the data definition and the checks for validity are all in one place.
With GPB the best you can do is to put comments in the .proto file and hope against all hope that whoever is using it reads it, pays attention to it, and implements the validity checks themselves.
Needless to say this is unlikely and unreliable if your relying on someone else at the other end of a network stream to do this with the same vigour as oneself. Plus if the constraints on validity change, multiple code changes need to be planned, coordinated and done.
Thus I consider GPB to be inappropriate for developments where there is little opportunity to regularly meet and talk with all team members.
==EDIT==
The kind of thing I mean is this:
message Foo
{
int32 bearing = 1;
}
Now who's to say what the valid range of bearing is? We can have
message Foo
{
int32 bearing = 1; // Valid between 0 and 359
}
But that depends on someone else reading this and writing code for it. For example, if you edit it and the constraint becomes:
message Foo
{
int32 bearing = 1; // Valid between -180 and +180
}
you are completely dependent on everyone who has used this .proto updating their code. That is unreliable and expensive.
At least with Boost serialisation you're distributing a single C++ class, and that can have data validity checks built right into it. If those constraints change, then no one else need do any work other than making sure they're using the same version of the source code as you.
Alternative
There is an alternative: ASN.1. This is ancient, but has some really, really, handy things:
Foo ::= SEQUENCE
{
bearing INTEGER (0..359)
}
Note the constraint. So whenever anyone consumes this .asn file, generates code, they end up with code that will automatically check that bearing is somewhere between 0 and 359. If you update the .asn file,
Foo ::= SEQUENCE
{
bearing INTEGER (-180..180)
}
all they need to do is recompile. No other code changes are required.
You can also do:
bearingMin INTEGER ::= 0
bearingMax INTEGER ::= 360
Foo ::= SEQUENCE
{
bearing INTEGER (bearingMin..<bearingMax)
}
Note the <. And also in most tools the bearingMin and bearingMax can appear as constants in the generated code. That's extremely useful.
Constraints can be quite elaborate:
Garr ::= INTEGER (0..10 | 25..32)
Look at Chapter 13 in this PDF; it's amazing what you can do;
Arrays can be constrained too:
Bar ::= SEQUENCE (SIZE(1..5)) OF Foo
Sna ::= SEQUENCE (SIZE(5)) OF Foo
Fee ::= SEQUENCE
{
boo SEQUENCE (SIZE(1..<6)) OF INTEGER (-180<..<180)
}
ASN.1 is old fashioned, but still actively developed, widely used (your mobile phone uses it a lot), and far more flexible than most other serialisation technologies. About the only deficiency that I can see is that there is no decent code generator for Python. If you're using C/C++, C#, Java, ADA then you are well served by a mixture of free (C/C++, ADA) and commercial (C/C++, C#, JAVA) tools.
I especially like the wide choice of binary and text based wireformats. This makes it extremely convenient in some projects. The wireformat list currently includes:
BER (binary)
PER (binary, aligned and unaligned. This is ultra bit efficient. For example, and INTEGER constrained between 0 and 15 will take up only 4 bits on the wire)
OER
DER (another binary)
XML (also XER)
JSON (brand new, tool support is still developing)
plus others.
Note the last two? Yes, you can define data structures in ASN.1, generate code, and emit / consume messages in XML and JSON. Not bad for a technology that started off back in the 1980s.
Versioning is done differently to GPB. You can allow for extensions:
Foo ::= SEQUENCE
{
bearing INTEGER (-180..180),
...
}
This means that at a later date I can add to Foo, and older systems that have this version can still work (but can only access the bearing field).
I rate ASN.1 very highly. It can be a pain to deal with (tools might cost money, the generated code isn't necessarily beautiful, etc). But the constraints are a truly fantastic feature that has saved me a whole ton of heart ache time and time again. Makes developers whinge a lot when the encoders / decoders report that they've generated duff data.
Other links:
Good intro
Open source C/C++ compiler
Open source compiler, does ADA too AFAIK
Commercial, good
Commercial, good
Try it yourself online
Observations
To share data:
Code first approaches (e.g. Boost serialisation) restrict you to the original language (e.g. C++), or force you to do a lot of extra work in another language
Schema first is better, but
A lot of these leave big gaps in the sharing contract (i.e. no constraints). GPB is annoying in this regard, because it is otherwise very good.
Some have constraints (e.g. XSD, JSON), but suffer patchy tool support.
For example, Microsoft's xsd.exe actively ignores constraints in xsd files (MS's excuse is truly feeble). XSD is good (from the constraints point of view), but if you cannot trust the other guy to use a good XSD tool that enforces them for him/her then the worth of XSD is diminished
JSON validators are ok, but they do nothing to help you form the JSON in the first place, and aren't automatically called. There's no guarantee that someone sending you JSON message have run it through a validator. You have to remember to validate it yourself.
ASN.1 tools all seem to implement the constraints checking.
So for me, ASN.1 does it. It's the one that is least likely to result in someone else making a mistake, because it's the one with the right features and where the tools all seemingly endeavour to fully implement those features, and it is language neutral enough for most purposes.
To be honest, if GPB added a constraints mechanism that'd be the winner. XSD is close but the tools are almost universally rubbish. If there were decent code generators of other languages, JSON schema would be pretty good.
If GPB had constraints added (note: this would not change any of the wire formats), that'd be the one I'd recommend to everyone for almost every purpose. Though ASN.1's uPER is very useful for radio links.
As with almost everything in engineering, my answer is... "it depends."
Both are well tested, vetted technologies. Both will take your data and turn it into something friendly for sending someplace. Both will probably be fast enough, and if you're really counting a byte here or there, you're probably not going to be happy with either (let's face it both created packets will be a small fraction of XML or JSON).
For me, it really comes down to workflow and whether or not you need something other than C++ on the other end.
If you want to figure out your message contents first and you're building a system from scratch, use Protocol Buffers. You can think of the message in an abstract way and then auto-generate the code in whatever language you want (3rd party plugins are available for just about everything). Also, I find collaboration simplified with Protocol Buffers. I just send over a .proto file and then the other team has a clear idea of what data is being transfered. I also don't impose anything on them. If they want to use Java, go ahead!
If I already have built a class in C++ (and this has happened more often than not) and I want to send that data over the wire now, Boost Serialization obviously makes a ton of sense (especially where I already have a Boost dependency somewhere else).
You can use boost serialization in tight conjunction with your "real" domain objects, and serialize the complete object hierarchy (inheritance). Protobuf does not support inheritance, so you will have to use aggregation. People argue that Protobuf should be used for DTOs (data transfer objects), and not for core domain objects themselves. I have used both boost::serialization and protobuf. The Performance of boost::serialization should be taken into account, cereal might be an alternative.