Advantage of serialization over sockets in C++

Currently we have integrated networking into our game using the UDP protocol, and it works fine. But we are sending strings over the network to the server, e.g. "10,10,23 - 23,9,10 - 9,23,23".
I came across advice that I need to serialize the data, as this is the right way to do it. What are the benefits of that? Does it reduce performance? Or is sending strings fine?

You're already serialising it.
I think what you're asking is whether it is beneficial to serialise to a compact, binary format rather than human-readable strings. The answer is yes, since you can reduce bandwidth requirements and parsing time.
Sometimes you can simply copy the bytes that make up your objects straight into the communications medium, though watch out for endianness, padding, width, alignment and other implementation-defined quantities. Generally you want to define a single, universal wire format for your data, and some translation may be required at one or more endpoints to express that interchange format. That said, in most cases it is still going to be cheaper than string parsing and stringification.
The downside is that you cannot snoop on the communications channel and immediately see with your own eyes what's going on when debugging your networking.
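As an illustration, here is a minimal sketch of the compact binary alternative, assuming the payload is the nine coordinate values from the question and that each value fits in a uint16_t; htons() puts them in network byte order so both ends agree regardless of host endianness (the function name pack_triplets is just for the example):

    #include <arpa/inet.h>  // htons
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Packs nine values into 18 bytes (vs. a ~30-byte string); no parsing needed on receipt.
    std::vector<std::uint8_t> pack_triplets(const std::uint16_t values[9]) {
        std::vector<std::uint8_t> buf(9 * sizeof(std::uint16_t));
        for (int i = 0; i < 9; ++i) {
            std::uint16_t be = htons(values[i]);               // network byte order
            std::memcpy(buf.data() + i * sizeof(be), &be, sizeof(be));
        }
        return buf;
    }

The receiver does the reverse: memcpy each pair of bytes back out and convert with ntohs().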

Related

How can I recv TCP socket data in one package without dividing

Since I created a TCP socket, it is fine when sending a small amount of data: no fragmentation, all the data arrives in one package. But when the data gets bigger and bigger, the TCP data is divided into pieces, which is really annoying. Is there any option to set on the socket so that it will automatically put the pieces back into one package for me?
It's a byte stream. All the bytes will arrive correctly and in the right order, but not necessarily when you want them. If you need to send anything more complex than one byte, you need another protocol on top of TCP. That's why there are all those other TCP/IP protocols like HTTP, SMTP etc.
No there is not. There are even situations where you might receive 1 byte.
Consider using higher level messaging libraries like ZMQ. It handles all the message packing and unpacking for you.
TCP provides you with a reliable bi-directional byte stream. It takes care of sequencing, transport-layer packetization, retransmission, and flow control. Decades of research went into optimizing its performance. Pretty nifty. The small price you pay for all this convenience is that you have to write and read the stream in a loop: when receiving, watching for a complete application-protocol message you can process, and when sending, looping until all the bytes of a message have been written.
Welcome to socket programming!
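To make the loop-reading advice concrete, here is a minimal sketch for POSIX sockets that keeps calling recv() until an exact number of bytes has been assembled (the helper name recv_exact is an assumption):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <cstddef>

    // Returns true once exactly `len` bytes have been read into `buf`,
    // false on error or orderly shutdown by the peer.
    bool recv_exact(int fd, void* buf, std::size_t len) {
        char* p = static_cast<char*>(buf);
        while (len > 0) {
            ssize_t n = recv(fd, p, len, 0);
            if (n <= 0) return false;                  // 0 = peer closed, <0 = error
            p   += n;                                  // recv may return as little as 1 byte
            len -= static_cast<std::size_t>(n);
        }
        return true;
    }

You would call this once for a fixed-size header, learn the body length from it, then call it again for the body.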
I'll chime in here and say that there's pretty much nothing you can do to solve your issue without adding extra dependencies on libraries which handle application protocols for you. There are some lower-level message-packing libraries (Google's protocol buffers, among others) which may help.
It's probably most beneficial to get used to reading and writing TCP data in a loop. It's proven and very portable, even if you pay a small price in writing the streaming codecs yourself.
Try it a few times. It's a useful experience which you can reuse, and it's really not as difficult and annoying once you get the hang of it (like anything else, really).
Furthermore, it's fairly easy to unit-test (rather than dealing with esoteric libraries and uncommon protocols with badly or sparsely documented options).
You can optimize socket reads to return larger chunks, on platforms that support it, by setting a low watermark with setsockopt() and SO_RCVLOWAT. But you will still have to handle the possibility of receiving fewer bytes than the watermark.
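For reference, a hedged sketch of that setsockopt() call on POSIX; support and exact behaviour vary by platform, and reads can still return fewer bytes than requested:

    #include <sys/socket.h>

    // Ask the kernel not to wake blocking reads/select until `bytes` are available.
    void set_receive_low_watermark(int fd, int bytes) {
        setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT, &bytes, sizeof(bytes));
    }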
I think you want SOCK_SEQPACKET (or possibly SOCK_RDM). See socket(2).

C++ IPC Communication

I am in a dilemma about deciding between the scenarios below and would appreciate some expert help.
Scenario: there is TCP/IP communication between two processes running on two boxes.
Communication Method 1: stream-based communication over the socket. The receiver receives the byte buffer, interprets the first few fixed bytes as a header, deserializes it to learn the message length, takes a message of that length and deserializes it, then proceeds to the next message header, and so on.
Communication Method 2: put all the messages in a vector held inside a class object, serialize the class object in one go and send it to the receiver. The receiver deserializes the class object and reads the vector elements one by one.
Please let me know which approach is more efficient, and if there is any other approach, please guide me.
Also, what are the pros and cons of class-based versus structure-based data transmission, and which is suitable for which scenario?
Your question lacks some key details, and mixes different concerns, frustrating any attempt to provide a good answer.
Specifically, Method 2 mysteriously "serialises" and "deserialises" the object and contained vector without specifying any details of how that's being done. In practice, the details are of the kind alluded to in Method 1. So, 1 and 2 aren't alternatives unless you're choosing between using a serialisation library and doing it from scratch (in which case I'd say use the library as you're new to this and the library's more likely to get it right).
What I can say:
at a TCP level, it's most efficient to read into a decent sized block (given I tend to work on PC/server hardware, I'd just use 64k though smaller may be enough to get the same kind of throughput) and have each read() or recv() read as much data from the socket as possible
after reading enough bytes (in however many read/recvs) to attempt some interpretation of the data, it's necessary to recognise the end of particular parts of the serialised input: sometimes that's implicit in the data type involved, other times it's communicated using some sentinel (e.g. a linefeed or NUL), and other times there can be a prefixed fixed-size "expect N bytes" header. This aspect/consideration often applies hierarchically to the stream of objects and nested sub objects etc..
the TCP read/recvs may deliver more data than were sent in any single request, so you may have 1 or more bytes that are logically part of the subsequent but incomplete logical message at the end of the block assembled above
the process of reading larger blocks then accessing various fixed and variable sized elements inside the buffers is already supported by C++ iostreams, but you can roll your own if you want
So, let me emphasise this: do NOT assume you will receive any more than 1 byte from any given read of the socket: if you have, say, a 20-byte header you should loop reading until you hit either an error or have assembled all 20 bytes. Sending 20 bytes in a single write() or send() does not mean the 20 bytes will be presented to a single read()/recv(). TCP is a byte stream protocol, and you have to take arbitrary numbers of bytes as and when they're provided, waiting until you have enough data to interpret it. Similarly, be prepared to get more data than the client could write in a single write()/send().
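As an illustrative sketch of the framing described above, the receiver can append whatever the socket delivers to a buffer and peel off complete messages, here assuming a 4-byte big-endian length prefix (the header layout and function name are assumptions, not part of the question):

    #include <arpa/inet.h>   // ntohl
    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>

    // Appends newly received bytes, then extracts every complete message available.
    void extract_messages(std::string& buffer,
                          const char* data, std::size_t n,
                          std::vector<std::string>& out) {
        buffer.append(data, n);
        while (buffer.size() >= 4) {
            std::uint32_t len_be;
            std::memcpy(&len_be, buffer.data(), 4);
            std::uint32_t len = ntohl(len_be);
            if (buffer.size() - 4 < len) break;       // message not complete yet
            out.emplace_back(buffer.substr(4, len));  // one whole application message
            buffer.erase(0, 4 + len);
        }
    }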
Also, what are the pros and cons of class-based versus structure-based data transmission, and which is suitable for which scenario?
These terms are completely bogus. Classes and structures are almost identical things in C++ - mechanisms for grouping data and related functions (they differ only in how they expose base classes and data members to client code by default). Either can have, or not have, member functions or support code that helps serialise and deserialise the data. For example, the simplest and most typical support is operator<< and/or operator>> streaming functions.
If you want to contrast these kinds of streaming functions with an ad-hoc "write a binary block, read a binary block" approach (perhaps from thinking of structs as being POD without support code), then I'd say prefer streaming functions where possible, starting with streaming to human-readable representations, as they'll make your system easier and quicker to develop, debug and support. Once you're really comfortable with that, if runtime performance requires it then optimise with a binary representation. If you write the serialisation code well, you won't notice much difference in performance between a cruder void*/number-of-bytes model of data and proper per-member serialisation, but the latter can more easily support unusual cases: portability across systems with different sizes of int/long, different byte ordering, intentional choices regarding shallow vs. deep copying of pointed-to data, etc.
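For what it's worth, here is a minimal sketch of the operator<< / operator>> streaming approach with a human-readable representation; the Position type is hypothetical:

    #include <istream>
    #include <ostream>

    struct Position {
        int x = 0, y = 0, z = 0;
    };

    // Human-readable, whitespace-separated representation: easy to debug on the wire.
    std::ostream& operator<<(std::ostream& os, const Position& p) {
        return os << p.x << ' ' << p.y << ' ' << p.z << '\n';
    }

    std::istream& operator>>(std::istream& is, Position& p) {
        return is >> p.x >> p.y >> p.z;
    }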
I'd also recommend looking at the boost serialisation library. Even if you don't use it, it should give you a better understanding of how this kind of thing is reasonably implemented in C++.
Both methods are equivalent. In both you must send a header with the message size and an identifier in order to deserialize. If you assume the first option is composed of a serialized 'class', like a normal message, you must implement the same code.
Another thing to keep in mind is message size, in order to fill TCP buffers and optimize communications. If your first method's messages are very small, try to improve the communication ratio with bigger messages, as in the second option you describe.
Keep in mind that it's not safe to simply stream out a struct or class directly by interpreting it as a sequence of bytes, even if it's a simple POD: there are issues like endianness (which is unlikely to be a real-world problem for most of us) and structure alignment/padding (which is a potential problem).
C++ doesn't have any built-in serialization/deserialization, so you'll either have to roll your own or take a look at things like Boost.Serialization or Google's protobuf.
If it is not a homework or study project, there may be little point in fiddling with IPC at TCP stream level especially if that's not something one is familiar with.
Use a messaging library, like ØMQ to send and receive complete messages rather than streams of bytes.

Efficient Packet types/transfer protocol

Using boost::asio in C++, I'm trying to determine the best way to encrypt packets in my program. I thought of defining the packets myself by type number, each with a different fixed packet size. The system reads the header (type, and quantity of entries for lists of data), creates the appropriate structure to receive the data, then reacts according to the data received.
However, when I look at this method, I wonder if there would be a simpler way to accomplish it without sacrificing efficiency.
These packets are to be sent between different applications through TCP. Ideally, I'm aiming for both applications to use as little bandwidth and CPU as possible while also being as simple to modify as possible. Any suggestions?
TCP uses streams of data, not packets. I highly suggest thinking of your data transmission as a stream of data instead of sequence of packets. This will make it easier to abstract into your code. Take a look at Boost.Serialization or Google Protocol Buffers.
Boost.Asio has SSL encryption capabilities, so it's trivial to encrypt the stream of data. It also has an example using serialization.
Have you considered google protobufs? While it doesn't actually do the encryption (you'll have to do that yourself), it does provide a way of encoding the structured data allowing you to send it over the wire efficiently. Additionally, there are many language bindings for it (C++, Java, and Python off the top of my head).
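To make the protobuf suggestion concrete, here is a hedged sketch of the C++ usage; PlayerState is a hypothetical message you would define in a .proto file and generate with protoc, and encrypting the resulting buffer is still up to you:

    #include <string>
    #include "player_state.pb.h"   // hypothetical generated header

    std::string encode(const PlayerState& state) {
        std::string wire;
        state.SerializeToString(&wire);   // compact binary encoding
        return wire;                      // encrypt this buffer before sending
    }

    bool decode(const std::string& wire, PlayerState* state) {
        return state->ParseFromString(wire);
    }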

How to delta encode a C/C++ struct for transmission via sockets

I need to send a C struct over the wire (using UDP sockets, and possibly XDR at some point) at a fairly high update rate, which potentially causes lots of redundant and unnecessary traffic at several kHz.
This is because some of the data in the struct may not have changed at times, so I thought that delta-encoding the current C struct against the previous C struct would be a good idea, pretty much like a "diff".
But I am wondering, what's the best approach of doing something like this, ideally in a portable manner that also ensures that data integrity is maintained? Would it be possible to simply XOR the data and proceed like this?
Similarly, it would be important that the approach remains extensible enough, so that new fields can be added to the struct or reordered if necessary (padding), which sounds as if it'd require versioning information, as well.
Any ideas or pointers (are there existing libraries?) would be highly appreciated!
Thanks
EDIT: Thanks to everyone who provided an answer; the level of detail is really appreciated. I realize I probably should not have mentioned UDP, though, because that is in fact not the main problem: there is already a corresponding protocol implemented on top of UDP that accounts for the difficulties mentioned. The question was really meant to be about feasible ways of delta-encoding a struct, not about using UDP in particular as a transport mechanism.
UDP does not guarantee that a given packet was actually received, so encoding whatever you transmit as a "difference from last time" is problematic -- you can't know that your counterpart has the same idea as you about what the "last time" was. Essentially you'd have to build some overhead on top of UDP to check what packets have been received (tagging each packet with a unique ID) -- everybody who's tried to go this route will agree that more often than not you find yourself more or less duplicating the TCP streaming infrastructure on top of UDP... only, most likely, not as solid and well-developed (although admittedly sometimes you can take advantage of very special characteristics of your payloads in order to gain some modest advantage over plain good old TCP).
Does your transmission need to be one-way, sender to receiver? If that's the case (i.e., it's not acceptable for the receiver to send acknowledgments or retransmits) then there's really not much you can do along these lines. The one thing that comes to mind: if it's OK for the receiver to be out of sync for a while, then the sender could send two kinds of packets -- one with a complete picture of the current value of the struct, and an identifying unique tag, to be sent at least every (say) 5 minutes (so realistically the receiver may be out of sync for up to 15 minutes if it misses two of these "big packets"); one with just an update (diff) from the last "big packet", including the big packet's identifying unique tag and (e.g.) a run-length-encoded version of the XOR you mention.
Of course, once it has prepared the run-length-encoded version, the server will compare its size against the size of the whole struct, and only send the delta kind of packet if the savings are substantial; otherwise it might as well send the big packet a bit earlier than needed (gains in reliability). The receiver will keep track of the last big-packet unique tag it has received and only apply deltas which pertain to it (this helps against missing packets and packets delivered out of order, depending on how sophisticated you want to make your client).
The need for versioning &c, depending on what exactly you mean (will senders and receivers with different ideas about how the struct's C layout should look need to communicate regularly? how do they handshake about what versions are known to both? etc.), will add a whole further universe of complications, but that is really another question, and your core question as summarized in the title is already plenty big enough;-).
If you can afford occasional meta-messages from the receiver back to the sender (acks or requests to resend) then depending on the various numerical parameters in play you may design different strategies. I suspect acks would have to be pretty frequent to do much good, so a request to resend a big-packet (either a specifically identified one or "whatever you have that's freshest") may be the best meta-strategy to cull the options space (which otherwise threatens to explode;-). If so then the sender may be blissfully ignorant of whatever strategy the receiver is using to request big-packet-resends, and you can experiment on the receiver side with various such strategies without needing to redeploy the sender as well.
It's hard to offer much more help without some specifics, i.e., at least ballpark numbers for all the numerical parameters -- packet sizes, frequencies of sending, how long is it tolerable for the sender to be out of sync with the receiver, a bundle of network parameters, etc etc. But I hope this somewhat generic analysis and suggestions still help.
To delta encode:
1) Send "key frames" periodically (e.g. once a second). A key frame is a complete copy (rather than a delta) so that if you lose comms for any reason, you only lose a small amount of data before you can "aquire the signal" again. Use a simple packet header that allows you to detect the start of a packet and know what type of data it contains.
2) Calculate the delta from the previous packet and encode that in a compact form. By examining the type of data you are sending and the way it typically changes, you should be able to devise a pretty compact delta. However, you may need to check the size of the delta - in some cases it may not be an efficient encoding - if it's bigger than a key frame you can just send another key frame instead. You can also decide at this point whether your deltas are lossy or lossless.
3) Add a CRC check to the packet (search for CRC32). This will allow the receiver to verify that the packet has been received intact, allowing them to skip invalid packets.
NOTES:
Be careful about doing this over UDP - it gives no guarantee that your packets will arrive in the order you sent them. Obviously a delta will only work if packets are in order. In this case, you will need to add some form of sequence ID to each packet (first packet is "1", second packet is "2", etc.) so that you can detect out-of-order receiving. You may even need to keep a buffer of "n" packets in the receiver so that you can reassemble them in the correct order when you come to decode them (but of course, this could introduce some latency). You will probably also miss some packets over UDP, in which case you'll need to wait until the next key frame before you can "re-acquire the signal" - so the key frames must be frequent enough to avoid catastrophic outages in your comms.
Consider using compression (e.g. zip etc.). You may find a full packet can be built in a zip-friendly manner (e.g. rearrange the data to group bytes that are likely to have similar values (especially zeros) together) and then compressed so well that it is smaller than an uncompressed delta, in which case you won't need to go to all the effort of deltas at all (and you won't have to worry about packet ordering etc.).
edit
- Always use a version number (or packet type) in your packets so you can add new fields or change the delta encoding in the future! You'll need this for differentiating key/delta frames anyway.
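As referenced in step 2, here is a rough sketch of the XOR delta against a key frame; State is a hypothetical fixed-layout POD, and the sequence number, key-frame tag and CRC described above would live in the packet header around this payload:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct State {                     // hypothetical fixed-layout struct
        std::uint32_t position[3];
        std::uint32_t velocity[3];
    };

    // XOR the current state against the key frame; unchanged fields become runs
    // of zero bytes, which run-length encode / compress very well.
    std::vector<std::uint8_t> xor_delta(const State& keyframe, const State& current) {
        std::vector<std::uint8_t> out(sizeof(State));
        const auto* a = reinterpret_cast<const std::uint8_t*>(&keyframe);
        const auto* b = reinterpret_cast<const std::uint8_t*>(&current);
        for (std::size_t i = 0; i < sizeof(State); ++i) out[i] = a[i] ^ b[i];
        return out;
    }

    // Receiver: XOR the delta against its own copy of the same key frame to
    // reconstruct the current state (hence the key-frame tag in the header).
    State apply_delta(const State& keyframe, const std::vector<std::uint8_t>& delta) {
        State result;
        auto* dst = reinterpret_cast<std::uint8_t*>(&result);
        const auto* src = reinterpret_cast<const std::uint8_t*>(&keyframe);
        for (std::size_t i = 0; i < sizeof(State); ++i) dst[i] = src[i] ^ delta[i];
        return result;
    }

If the (run-length-encoded) delta turns out larger than sizeof(State), just send a fresh key frame instead, as step 2 suggests.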
I'm not convinced that delta encoding values on UDP - which is inherently unreliable and out of order - is going to be particularly easy. Instead, I'd send an ID of the field which has changed and its current value. That also doesn't require anything to change if you want to add extra fields to the data structure you're sending. If you want a standard way of doing this, look at SNMP; that may be something you can drop in, or it may be a bit baggy for you (it qualifies the field names globally and uses ASN.1 - both of which give maximum interoperability, but at the cost of some bytes in the packet).
Use an RPC framework such as CORBA or protocol buffers.
Use DTLS with a compression option.
Use a packed format.
Repurpose an existing header compression library.

Question about server socket programming model [closed]

Over the last couple of months I've been working on some implementations of sockets servers in C++ and Java. I wrote a small server in Java that would handle & process input from a flash application hosted on a website and I managed to successfully write a server that handles input from a 2D game client with multiple players in C++. I used TCP in one project & UDP in the other one. Now, I do have some questions that I couldn't really find on the net and I hope that some of the experts could help me. :)
Let's say I would like to build a server in C++ that would handle the input from thousands of standalone and/or web applications, how should I design my server then? So far, I usually create a new & unique thread for each user that connects, but I doubt this is the way to go.
Also, how does one determine the layout of packets sent over the network? Is data usually sent over the network in binary or text form? How do you handle serialized objects when you send data to different media (e.g. C++ server to Flash application)?
And last, is there any easy-to-use, commonly used library that supports portability (e.g. development on a Windows machine and deployment on a Linux box) other than Boost.Asio?
Thank you.
Sounds like you have a couple of questions here. I'll do my best to answer what I can see.
1. How should I handle threading in my network server?
I would take a good look at what kind of work you're doing on the worker threads that are being spawned by your server. Spawning a new thread for each request isn't a good idea... but it might not hurt anything if the number of parallel requests is small and the tasks performed on each thread are fast-running.
If you really want to do things the right way, you could have a configurable/dynamic thread pool that would recycle the worker threads as they became free. That way you could set a max thread pool size. Your server would then work up to the pool size...and then make further requests wait until a worker thread was available.
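For illustration, a minimal sketch of such a pool using std::thread; the class name, fixed size and shutdown handling are simplifications, not a production design:

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class ThreadPool {
    public:
        explicit ThreadPool(std::size_t threads) {
            for (std::size_t i = 0; i < threads; ++i)
                workers_.emplace_back([this] { run(); });
        }
        ~ThreadPool() {
            { std::lock_guard<std::mutex> lk(m_); done_ = true; }
            cv_.notify_all();
            for (auto& t : workers_) t.join();
        }
        void submit(std::function<void()> task) {      // e.g. "handle this connection"
            { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
            cv_.notify_one();
        }
    private:
        void run() {
            for (;;) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lk(m_);
                    cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                    if (done_ && tasks_.empty()) return;
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }
                task();                                 // worker thread is recycled afterwards
            }
        }
        std::vector<std::thread> workers_;
        std::queue<std::function<void()>> tasks_;
        std::mutex m_;
        std::condition_variable cv_;
        bool done_ = false;
    };

Further requests simply queue up in submit() until a worker thread becomes free, which is the "make further requests wait" behaviour described above.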
2. How do I format the data in my packets?
Unless you're developing an entirely new protocol...this isn't something you really need to worry about. Unless you're dealing with streaming media (or another application where packet loss/corruption is acceptable), you probably won't be using UDP for this application. TCP/IP is probably going to be your best bet...and that will dictate the packet design for you.
3. Which format do I use for serialization?
The way you serialize your data over the wire depends on what kind of applications are going to be consuming your service. Binary serialization is usually faster and results in a smaller amount of data that needs to be transferred over the network. The downside is that the binary serialization of one language may not work in another, so the clients connecting to your server will most likely have to be written in the same language you are using.
XML Serialization is another option. It will take longer and have a larger amount of data to be transmitted over the network. The upside to using something like XML serialization is that you won't be limited to the types of clients that can connect to your server and consume your service.
You have to choose what fits your needs the best.
...play around with the different options and figure out what works best for you. Hopefully you'll find something that can perform faster and more reliably than anything I've mentioned here.
As far as server design is concerned, I would say that you are right: although ONE-THREAD-PER-SOCKET is a simple and easy approach, it is not the way to go, since it won't scale as well as other server design patterns.
I personally like the COMMUNICATION-THREADS/WORKER-THREADS approach, where a pool of a dynamic number of worker threads handle all the work generated by producer threads.
In this model, you will have a number of threads in a pool waiting for tasks that are going to be generated from another set of threads handling network I/O.
I found UNIX Network Programming by Richard Stevens an amazing source for these kinds of network programming approaches. And, despite its name, it will be very useful in Windows environments as well.
Regarding the layout of the packets (you should have posted a different question for this, since it is a totally different question in my opinion), there are trade-offs when choosing between a TEXT and a BINARY approach.
TEXT (i.e. XML) is probably easier to parse and document, and simpler in general, while a BINARY protocol should give you better performance in terms of processing speed and size of network packets, but you will have to deal with more complicated issues such as the ENDIANNESS of the words and things like that.
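A small sketch of the endianness point: convert multi-byte values to network byte order when writing your binary headers, so both ends agree regardless of host byte order (the helper names are just for the example):

    #include <arpa/inet.h>  // htonl, ntohl
    #include <cstdint>
    #include <cstring>

    void write_u32(std::uint8_t* dst, std::uint32_t value) {
        std::uint32_t be = htonl(value);    // host to network (big-endian) order
        std::memcpy(dst, &be, sizeof(be));
    }

    std::uint32_t read_u32(const std::uint8_t* src) {
        std::uint32_t be;
        std::memcpy(&be, src, sizeof(be));
        return ntohl(be);                   // network back to host order
    }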
Hope it helps.
Though previous answers provide good direction, just for completeness, I'd like to point out that threads are not an absolute requirement for great socket server performance. Some examples are here. There are many approaches to scalability too - thread pools, pre-forked processes, server pools, etc.
1) And last, is there any easy to use library which is commonly used that supports portability (eg development on a windows machine & deployment on a linux box) other than boost asio.
The ACE library is another alternative. It's very mature (been around since the early 90s) and widely deployed. A brief discussion about how it compares to Boost ASIO is available on the Riverace website here. Keep in mind that ACE has had to support a large number of legacy platforms for long time so it doesn't utilize modern C++ features as much as Boost ASIO, for example.
2) Let's say I would like to build a server in C++ that would handle the input from thousands of standalone and/or web applications, how should I design my server then? So far, I usually create a new & unique thread for each user that connects, but I doubt this is the way to go.
There are a number of commonly used approaches, including but not limited to: thread-per-connection (the approach you describe) and thread pool (the approach Justin described). Each has its pros and cons, and many have looked at the trade-offs. A good starting point might be the links on the Thread Pool Pattern Wikipedia page.
Dan Kegel's "The C10K Problem" web page has lots of useful notes about improving scalability as well.
3) Also, How does one determine the layout of packets sent over the network; is data usually sent over the network in a binary or text state? How do you handle serializated objects when you send data to different media (eg C++ server to flash application)?
I agree with others that sending binary data is generally going to be most efficient. The boost serialization library can be used to marshal data into a binary form (as well as text). Mature binary formats include XDR and CDR. CDR is the format used by CORBA, for instance. The company ZeroC defines the ICE encoding, which is supposed to be much more efficient than CDR.
There are lots of binary formats to choose from. My suggestion would be to avoid reinventing the wheel by at least reading about some of these binary formats so that you don't end up running into the same pitfalls these existing binary formats were designed to address.
That said, lots of middleware exists that already provides a canned solution for most of your needs. For example, OpenSplice and OpenDDS are both implementations of the OMG Data Distribution Service standard. DDS focuses on efficient distribution of data such as through a publish-subscribe model, rather than remote invocation of functions. I'm more familiar with the OMG defined technologies but I'm sure there are other middleware implementations that will fit your needs.
You're still going to need a socket to handle every client, but the idea would be to create a pool of X sockets (say 50) and then, when you get close (say 90%) to consuming all those sockets, create another pool of X sockets. At some point, after clients have connected, sent data and disconnected, some of your sockets will become available again and you can reuse them (google "socket pools" for more on this).
The layout of data is always difficult. If all your clients and servers use the same hardware and operating system, you can send data in binary format, but there are many trips and traps there (byte alignment is at the top of the list). Sending formatted text is always easier, but certainly more expensive in terms of bandwidth and processing power, because you have to convert from machine format to text before sending and, of course, back again at the receiver.
Regarding serialization, I'm sorry, I can't help you, nor with libraries (I'm too embedded to have used much of those).
About server sockets and serialization (marshalling): the most important problem is the growing number of sockets in a readable or writable state in select(). I am not talking about the FD_SET limit, which is simple enough to solve, but about the growth in signalling time, and the problem of data accumulating in unread sockets while you process the data available on the socket currently being evaluated. The solution may even lie outside software boundaries and require a multi-processor model in which the roles of the processors are limited: one reads and writes, N do the processing. In that case, all available socket data should be read when select() returns and handed off to the other processing units.
The same applies to incoming data.
About marshalling: of course a binary format is preferable for performance (by the way, XML in terms of Unicode has the same problem). But it is not simply a matter of copying a long or integer value into a socket stream. Even here htons() and htonl() can help (they send/receive in network format and the OS is responsible for the data conversion). It is safer still to send the data following a representation header which exposes the format: the placement of the most/least significant bits, the byte order and the IEEE floating-point type. This works; I have not had a case where it did not.
Kind regards and great success to everyone.
Simon Cantor