compressing socket send data - c++

I'm trying to send a lot of data (basically data records converted to a string) over a socket, and it's slowing down the rest of my program. Is it possible to compress the data using gzip etc. and uncompress it at the other end?

Yes. The easiest way to implement this is to use the venerable zlib library.
The compress() and uncompress() utility functions may be what you're after.

Yes, but compression and decompression have their costs as well.
You might want to consider using another process or thread to handle the data transfer; this is probably harder than merely compressing, but will scale better when your data load increases n-fold.

Yes, it's possible. zlib is one library for doing this sort of compression and decompression. However, you may be better served by serializing your data records in a binary format rather than as a string; that should improve performance, possibly even more so than using compression.

Of course you can do that. When sending binary data, you have to take care of the endianness of the platform.
However, are you sure your performance problems will be solved by compressing the sent data? You'll still have additional steps (compression/decompression, possibly solving endianness issues).
Think about how the communication through sockets is done. Are you using synchronous or asynchronous communication? If you do the reads and writes synchronously, you may feel performance penalties...

You may use AdOC, a library that transparently overloads the socket system calls:
http://www.labri.fr/perso/ejeannot/adoc/adoc.html
It does compression on the fly if it finds that it would be profitable.

Related

Which decompression algorithms are safe to use on attacker-supplied buffers?

I want to save network bandwidth by using compression, such as bzip2 or gzip.
Attackers, as well as normal users, may send compressed messages.
Are there sequences of bytes which will cause some decompression functions to become stuck in an infinite loop, or to use vast amounts of memory?
If so, is this a fundamental property of those algorithms, or just an implementation bug?
I can only speak for zlib's inflate. There is no input that would result in an infinite loop or uncontrolled memory consumption.
Since the maximum compression ratio of deflate is less than 1032:1, inflate, when working normally, can expand its input by up to almost 1032:1. You just need to be able to handle that possibility.
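
To put numbers on that ratio, here is a small sketch of the arithmetic. The function names and the idea of a fixed memory budget per message are assumptions made for illustration, not something zlib provides.

```cpp
#include <cassert>
#include <cstdint>

// Ratio quoted above: inflate expands its input by less than 1032:1.
constexpr std::uint64_t kMaxInflateRatio = 1032;

// Upper bound on the inflated size of a compressed message (hypothetical helper).
constexpr std::uint64_t worst_case_inflated(std::uint64_t compressed_len) {
    return compressed_len * kMaxInflateRatio;
}

// Largest compressed message whose worst-case expansion still fits the
// given memory budget (hypothetical policy helper).
constexpr std::uint64_t max_accepted_compressed(std::uint64_t memory_budget) {
    return memory_budget / kMaxInflateRatio;
}
```

So a 1 KiB compressed message can inflate to at most about 1 MiB, and with a 64 MiB budget you would reject compressed messages larger than roughly 63 KiB. When inflating in a streaming fashion you can alternatively stop as soon as the output passes your limit.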

advantage of serialization over sockets c++

Currently we have integrated networking into our game, using the UDP protocol. It works fine. But we are sending strings over the network to the server: "10,10,23 - 23,9,10 - 9,23,23"
I came across advice that I need to serialize the data, as this is the right way to do it. What are the benefits of it? Does it reduce performance? Or is sending strings fine?
You're already serialising it.
I think what you're asking is whether it is beneficial to serialise to a compact, binary format rather than human-readable strings. The answer is yes, since you can reduce bandwidth requirements and parsing time.
Sometimes you can simply copy the bytes that make up your objects straight into the communications media, though watch out for endianness, padding, width, alignment and other implementation-defined quantities; generally you want to define a single, universal format for your data and some translation may be required on one or more endpoints in order to express the data interchange. That said, in most cases, that's still going to be cheaper than string parsing and stringisation.
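
A minimal sketch of such a fixed wire format, assuming each value fits in a signed 16-bit integer and that the format is defined as big-endian. The Vec3i struct and the encode/decode names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: one "x,y,z" triple from the string protocol.
struct Vec3i { std::int16_t x, y, z; };

// Write a 16-bit value in big-endian order, independent of host endianness.
void put_u16(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
}

// Read a big-endian 16-bit value.
std::uint16_t get_u16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Encode each triple as 6 bytes.
std::vector<std::uint8_t> encode(const std::vector<Vec3i>& items) {
    std::vector<std::uint8_t> out;
    out.reserve(items.size() * 6);
    for (const Vec3i& v : items) {
        put_u16(out, static_cast<std::uint16_t>(v.x));
        put_u16(out, static_cast<std::uint16_t>(v.y));
        put_u16(out, static_cast<std::uint16_t>(v.z));
    }
    return out;
}

// Decode complete 6-byte triples; trailing partial data is ignored.
std::vector<Vec3i> decode(const std::vector<std::uint8_t>& bytes) {
    std::vector<Vec3i> items;
    for (std::size_t i = 0; i + 6 <= bytes.size(); i += 6) {
        Vec3i v;
        // Conversion back to signed assumes two's complement, as on mainstream platforms.
        v.x = static_cast<std::int16_t>(get_u16(&bytes[i]));
        v.y = static_cast<std::int16_t>(get_u16(&bytes[i + 2]));
        v.z = static_cast<std::int16_t>(get_u16(&bytes[i + 4]));
        items.push_back(v);
    }
    return items;
}
```

With this layout the three triples from the question take 18 bytes instead of a 28-character string, and decoding is a few shifts rather than string parsing. Writing byte by byte sidesteps struct padding and host endianness.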
The downside is you cannot snoop on the communications channel and immediately see with your eyes what's going on, when debugging your networking.

How can I recv TCP socket data in one package without dividing

Since I created a TCP socket, it has been fine when sending small amounts of data: no fragmentation, all the data came in one package. But as the data gets bigger and bigger, the TCP package gets divided into pieces. It's really annoying. Is there any option to set on the socket so that it will automatically put the pieces into one package for me?
It's a byte stream. All the bytes will arrive correctly and in the right order, but not necessarily when you want them. If you need to send anything more complex than one byte, you need another protocol on top of TCP. That's why there are all those other TCP/IP protocols like HTTP, SMTP etc.
No there is not. There are even situations where you might receive 1 byte.
Consider using higher level messaging libraries like ZMQ. It handles all the message packing and unpacking for you.
TCP provides you with a reliable, bi-directional byte stream. It takes care of sequencing, transport-layer packetization, retransmission, and flow control. Decades of research went into optimizing its performance. Pretty nifty. The small price you pay for all this convenience is that you have to write and read the stream in a loop, watching for a complete application protocol message you can process when receiving, and pushing out any bytes the socket has not yet accepted when sending.
Welcome to socket programming!
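
One common way to do the "watch for a complete message" part is a length prefix. The sketch below assumes a 4-byte big-endian length header in front of every message; the frame function and FrameDecoder class are names invented for this example. It is independent of the socket API: you pass it whatever recv() returned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Prefix a payload with its length as a 4-byte big-endian integer.
std::string frame(const std::string& payload) {
    std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((n >> shift) & 0xFF));
    out += payload;
    return out;
}

// Collects bytes in whatever chunks they arrive and hands back whole messages.
class FrameDecoder {
public:
    // Feed the bytes from one recv() call; returns the messages completed by them.
    std::vector<std::string> feed(const char* data, std::size_t len) {
        buffer_.append(data, len);
        std::vector<std::string> messages;
        while (buffer_.size() >= 4) {
            const unsigned char* p =
                reinterpret_cast<const unsigned char*>(buffer_.data());
            std::uint32_t n = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
                              (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
            if (buffer_.size() - 4 < n) break;  // payload not complete yet
            messages.push_back(buffer_.substr(4, n));
            buffer_.erase(0, 4 + static_cast<std::size_t>(n));
        }
        return messages;
    }

private:
    std::string buffer_;  // bytes received but not yet turned into messages
};
```

In the receive loop you would call recv() into a scratch buffer and pass the returned byte count to feed(), whether that is 1 byte or several messages at once. Real code would also want to reject absurdly large length values.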
I'll chime in here and say that there's pretty much nothing you can do to solve your issue without adding extra dependencies on libraries which handle application protocols for you. There are some lower level message packing libraries (Google's protocol buffers, among others) which may help.
It's probably the most beneficial to get used to reading and writing TCP data in a loop. It's proven and very portable.. even if you pay a small price in actually writing the streaming codecs yourself.
Try it a few times. It's a useful experience which you can re-use, and it's really not as difficult and annoying once you get the hang of it (like anything else, really).
Furthermore, it's fairly easy to unit-test (rather than dealing with esoteric libraries and uncommon protocols with badly/sparsely documented options)..
You can optimize socket reads to return larger chunks, on platforms that support it, by setting the low watermark using setsockopt() and SO_RCVLOWAT. But you will still have to handle the possibility of getting fewer bytes than the watermark.
I think you want SOCK_SEQPACKET (or possibly SOCK_RDM). See socket(2).

Efficient Packet types/transfer protocol

Using boost::asio in C++, I'm trying to determine the best way to encrypt packets in my program. I thought of defining packets all by myself by type number, each with different fixed packet sizes. The system reads the header (type, and quantity of entries for lists of data) and creates the appropriate structure to receive the data, then it reacts according to the data received.
However, when I look at this method, I wonder if there would be a simpler way to accomplish this without sacrificing efficiency.
These packets are to be sent between different applications through TCP. Ideally, I'm aiming at both applications using as little bandwidth and CPU as possible while also being as simple to modify as possible. Any suggestions?
TCP uses streams of data, not packets. I highly suggest thinking of your data transmission as a stream of data instead of sequence of packets. This will make it easier to abstract into your code. Take a look at Boost.Serialization or Google Protocol Buffers.
Boost.Asio has SSL encryption capabilities, so it's trivial to encrypt the stream of data. It also has an example using serialization.
Have you considered google protobufs? While it doesn't actually do the encryption (you'll have to do that yourself), it does provide a way of encoding the structured data allowing you to send it over the wire efficiently. Additionally, there are many language bindings for it (C++, Java, and Python off the top of my head).

Difference between stateless and stateful compression?

In the chapter Filters (scroll down ~50%) of an article about the Remote Call Framework, two ways of compression are mentioned:
ZLib stateless compression
ZLib stateful compression
What is the difference between those? Is it ZLib-related or are these common compression methods?
While searching I could only find stateful and stateless webservices. Aren't the attributes stateless/ful meant to describe the compression-method?
From Transport Layer Security Protocol Compression Methods:
Compression methods used with TLS can be either stateful (the compressor maintains its state through all compressed records) or stateless (the compressor compresses each record independently), but there seems to be little known benefit in using a stateless compression method within TLS.
Some compression methods have the ability to maintain history information when compressing and decompressing packet payloads. The compression history allows a higher compression ratio to be achieved on a stream as compared to per-packet compression, but maintaining a history across packets implies that a packet might contain data needed to completely decompress data contained in a different packet. History maintenance thus requires both a reliable link and sequenced packet delivery. Since TLS and lower-layer protocols provide reliable, sequenced packet delivery, compression history information MAY be maintained and exploited if supported by the compression method.
In general, stateless describes any process that does not have a memory of past events, and stateful describes any process that does have such a memory (and uses it to make decisions.)
In compression, then, stateless means whatever chunk of data it sees, it compresses, without depending on previous inputs. It's faster but usually compresses less; stateful compression looks at previous data to decide how to compress current data, it's slower but compresses much better.
Zlib's compression is adaptive. All compression algorithms work because the data they work on isn't entirely random. Instead, their input data has a non-uniform distribution that can be exploited. Take English text as a simple example. The letter e is far more common than the letter q. Zlib will detect this, and use fewer bits for the letter e.
Now, when you send a lot of short text messages, and you know they're all in English, you should use Zlib stateful compression. It would keep that low-bit representation of the letter e across all messages. But if there are messages in Chinese, Japanese, French, etc. intermixed, stateful compression is no longer that smart. There will be few letters e in a Japanese text. Stateless compression would check for each message which letters are common. A well-known example of ZLib stateless compression is the PNG file format, which keeps no state between 2 distinct images.