Packet oriented lossless compression library - c++

Does anyone know of a free (non-GPL), decently performing compression library that supports packet oriented compression in C/C++?
By packet oriented, I mean the kind of feature QuickLZ (GPL) has, where multiple packets of a stream can be compressed and decompressed individually while history is maintained across packets to achieve sensible compression.
I'd favor compression ratio over CPU usage as long as the CPU usage isn't ridiculous, but I've had a hard time finding this feature at all, so anything is of interest.

zlib's main deflate() function takes a flush parameter, which allows several flushing modes. If you pass Z_SYNC_FLUSH at the end of each packet, that should produce the desired effect.
The details are explained in the zlib manual.
bzip2 has flushing functionality as well, which might let you do this kind of thing. See http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html#bzCompress
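For the zlib route, a minimal sketch of what that looks like (the wrapper class and its names are mine, not part of zlib; error handling omitted): keep one deflate stream alive for the whole connection and flush with Z_SYNC_FLUSH at every packet boundary so the history carries over.

    #include <zlib.h>
    #include <cstring>
    #include <vector>

    // One long-lived deflate stream per connection; Z_SYNC_FLUSH ends each
    // packet on a byte boundary while keeping the 32 KB history window intact.
    class PacketDeflater {
    public:
        PacketDeflater() {
            std::memset(&strm_, 0, sizeof(strm_));
            deflateInit(&strm_, Z_DEFAULT_COMPRESSION);
        }
        ~PacketDeflater() { deflateEnd(&strm_); }

        // Compress one packet; matches may reference data from earlier packets.
        std::vector<unsigned char> compress(const unsigned char* data, size_t len) {
            std::vector<unsigned char> out(deflateBound(&strm_, static_cast<uLong>(len)) + 16);
            strm_.next_in   = const_cast<unsigned char*>(data);
            strm_.avail_in  = static_cast<uInt>(len);
            strm_.next_out  = out.data();
            strm_.avail_out = static_cast<uInt>(out.size());
            deflate(&strm_, Z_SYNC_FLUSH);   // consumes the input and flushes the packet
            out.resize(out.size() - strm_.avail_out);
            return out;
        }

    private:
        z_stream strm_;
    };

The receiving side does the same with a single long-lived inflate stream, calling inflate() once per packet, so the dictionary built up by earlier packets is available when decompressing later ones.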

Google's Snappy may be a good option, if you need speed more than compression and are just looking to save a moderate amount of space.
Alternatively, Ilia Muraviev put a small piece of compression code called BALZ into the public domain some time ago. It is quite decent for many kinds of data.
Both of these support stream flushing and independent state variables, so you can run multiple concurrent streams across packets.

Google's new SPDY protocol uses zlib to compress individual messages, and maintains the zlib state for the life of the connection to achieve better compression. I don't think there's a standalone library that handles this behavior exactly, but there are several open-source implementations of SPDY that could show you how it's done.

The public domain Crush algorithm by Ilia Muraviev has performance and compression ratio similar to QuickLZ's, with Crush being a bit more powerful. The two algorithms are conceptually similar as well, Crush just containing a few more tricks. The BALZ algorithm mentioned earlier is also by Ilia Muraviev. See http://compressme.net/

Maybe you could use the LZMA compression SDK; it was written and placed in the public domain by Igor Pavlov.
Since it can compress stream files and supports memory-to-memory compression, I think it should be possible to compress a packet stream with it (maybe with some changes), but I'm not sure.

Related

Is bzip2 Turing complete?

Or any other compression algorithm, for that matter.
(Then again, if there were a Turing-complete compression algorithm, would it still be considered a compression algorithm rather than a programming language?)
The question might almost make sense if you had asked about the de-compressor, as opposed to the compressor. The job of a compressor is effectively to write a program to be executed by the decompressor that will recreate the original file being compressed. The program is written in the language that is the compressed data format.
The answer to that question is no, the bzip2 decompressor is not Turing complete, since it has no way to loop or recurse. Nor does the decompressor for any other standard compression format that I am aware of.
Update:
It appears to have since been deprecated due to security concerns, but apparently WinRAR had a post-processing language built into the decompressor called RarVM, which was a Turing-complete machine for implementing arbitrarily complex pre-compression filters for data.

Have the MonetDB developers tested any other compression algorithms on it?

Have the MonetDB developers tested any other compression algorithms on it before?
Perhaps they have tested other compression algorithms but found that they had a negative impact on performance.
So why haven't they improved the database's compression performance?
I am a student from China. MonetDB really interests me and I want to try to improve its compression performance, so I'd like to find out whether anybody has done this before.
I would be grateful if you could answer my question, because I really need this. Thank you so much.
MonetDB only compresses string (varchar and char) types, using dictionary compression, and only if the number of unique strings in a column is small.
Integrating any other kind of compression (even simple ones like prefix coding, run-length encoding, delta compression, ...) needs a complete rewrite of the system, because the operators have to be made compression-aware (which pretty much means creating a new operator).
The only thing that may be feasible without a complete rewrite is having dedicated compression operators that compress/decompress data instead of spilling it to disk. However, this would be very close to the memory compression Apple implemented in Mavericks.
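To illustrate what dictionary compression of a string column means here (a simplified sketch of the general technique, not MonetDB's actual code; all names are made up):

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Simplified dictionary encoding for a string column: each distinct string
    // is stored once, and the column itself becomes a vector of integer codes.
    struct DictEncodedColumn {
        std::vector<std::string> dictionary;  // code -> string
        std::vector<uint32_t>    codes;       // one code per row
    };

    DictEncodedColumn dictEncode(const std::vector<std::string>& column) {
        DictEncodedColumn out;
        std::unordered_map<std::string, uint32_t> lookup;
        for (const std::string& value : column) {
            auto it = lookup.find(value);
            if (it == lookup.end()) {
                uint32_t code = static_cast<uint32_t>(out.dictionary.size());
                it = lookup.emplace(value, code).first;
                out.dictionary.push_back(value);
            }
            out.codes.push_back(it->second);
        }
        return out;
    }

    // Decoding a row is a single array lookup, which is why this pays off only
    // when the dictionary (the number of unique strings) stays small.
    inline const std::string& decodeRow(const DictEncodedColumn& c, size_t row) {
        return c.dictionary[c.codes[row]];
    }
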
MonetDB compresses columns using PFOR compression. See http://paperhub.s3.amazonaws.com/7558905a56f370848a04fa349dd8bb9d.pdf for details. This also answers your question about whether other compression methods were considered.
The choice of PFOR is driven by the way modern CPUs work, but really any algorithm that uses only arithmetic rather than branches will do just fine. I've hit similar speeds with arithmetic coding in the past.
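As an illustration of that branch-free, arithmetic-only style, here is a simplified frame-of-reference codec for an integer block (full PFOR additionally bit-packs the offsets and patches outliers as exceptions; names and layout here are made up):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Simplified frame-of-reference encoding: store the block minimum once and
    // keep only the offsets from it. Assumes a non-empty block whose value
    // range fits in 32 bits.
    struct ForBlock {
        int64_t               reference;  // block minimum
        std::vector<uint32_t> offsets;    // value - reference, per element
    };

    ForBlock forEncode(const std::vector<int64_t>& values) {
        ForBlock block;
        block.reference = *std::min_element(values.begin(), values.end());
        block.offsets.reserve(values.size());
        for (int64_t v : values)
            block.offsets.push_back(static_cast<uint32_t>(v - block.reference));
        return block;
    }

    // Decoding is pure arithmetic with no data-dependent branches, which suits
    // the tight scan loops a column store runs over millions of rows.
    std::vector<int64_t> forDecode(const ForBlock& block) {
        std::vector<int64_t> out(block.offsets.size());
        for (size_t i = 0; i < out.size(); ++i)
            out[i] = block.reference + block.offsets[i];
        return out;
    }
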

Streaming Real and Debug Data To Disk in C++

What is a flexible way to stream data to disk in a c++ program in Windows?
I am looking to create a flexible stream of data that may contain arbitrary data (say time, average, a flag if reset, etc) to disk for later analysis. Data may come in at non-uniform, irregular intervals. Ideally this stream would have minimal overhead and be easily readable in something like MATLAB so I could easily analyze events and data.
I'm thinking of a binary file with a header file describing types of packets followed by a wild dump of data tagged with . I'm considering a lean, custom format but would also be interested in something like HDF5.
It is probably better to use an existing file format rather than a custom one. First, you don't reinvent the wheel; second, you benefit from a well-tested and optimized library.
HDF5 seems like a good bet. It is fast and reliable, and easy to read from MATLAB. It has some overhead, but that is the price of its flexibility and compatibility.
This requirement sounds suspiciously like a "database"
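If you do go with a lean custom format as described in the question, a minimal sketch of a tagged, timestamped binary record stream might look like the following (the record layout, tag values, and names are illustrative assumptions, not an established format):

    #include <cstdint>
    #include <fstream>

    // Each record: 1-byte tag, 8-byte timestamp, 4-byte payload size, raw payload.
    // A header written up front (or a separate spec) documents what each tag
    // contains, which keeps the stream easy to parse from MATLAB with fread().
    enum RecordTag : uint8_t { TAG_SAMPLE = 1, TAG_RESET = 2, TAG_DEBUG = 3 };

    class RecordWriter {
    public:
        explicit RecordWriter(const char* path) : out_(path, std::ios::binary) {}

        void write(RecordTag tag, uint64_t timestampUs,
                   const void* payload, uint32_t size) {
            out_.put(static_cast<char>(tag));
            out_.write(reinterpret_cast<const char*>(&timestampUs), sizeof(timestampUs));
            out_.write(reinterpret_cast<const char*>(&size), sizeof(size));
            out_.write(static_cast<const char*>(payload), size);
        }

    private:
        std::ofstream out_;
    };

Usage would be along the lines of log.write(TAG_SAMPLE, timestamp, &average, sizeof(average)) whenever a value arrives, at whatever irregular intervals the data comes in; the writer imposes no fixed rate.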

C++ Serialization Performance

I'm building a distributed C++ application that needs to do lots of serialization and deserialization of data stored in std containers.
We currently use Boost.Serialization, but it performs terribly. Our B-tree also uses Boost.Serialization to store key-value pair data; if we replace Boost.Serialization with memcpy, the access speed improves by a factor of ten or more. Given how much data exchange the distributed platform requires, we need ease of programming together with high performance. I know Protocol Buffers could also be used as a serialization mechanism, but I am not sure how its performance compares to Boost.Serialization. Another question: is there a better solution that provides performance as close as possible to memcpy?
Thanks
Someone asked a very similar question:
C++ Serialization Performance
It looks like protocol buffers are a good way to go, though without knowing your applications' requirements, it is hard to recommend any particular library or technique.
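As a sketch of why the raw memcpy path is so much faster for fixed-size key-value pairs (this assumes your records are trivially copyable; the struct and function names are illustrative):

    #include <cstdint>
    #include <cstring>
    #include <type_traits>
    #include <vector>

    // Fixed-size, trivially copyable record: no pointers, no virtual functions,
    // so its bytes can be copied verbatim into a buffer and back.
    struct KeyValue {
        uint64_t key;
        double   value;
    };
    static_assert(std::is_trivially_copyable<KeyValue>::value,
                  "memcpy serialization only works for trivially copyable types");

    // Serialize a whole batch with one memcpy, instead of the per-field
    // visitation that Boost.Serialization performs.
    std::vector<char> serialize(const std::vector<KeyValue>& records) {
        std::vector<char> buffer(records.size() * sizeof(KeyValue));
        std::memcpy(buffer.data(), records.data(), buffer.size());
        return buffer;
    }

    std::vector<KeyValue> deserialize(const std::vector<char>& buffer) {
        std::vector<KeyValue> records(buffer.size() / sizeof(KeyValue));
        std::memcpy(records.data(), buffer.data(), buffer.size());
        return records;
    }

The trade-off is that this ties the byte layout to the compiler and machine (padding, endianness), which is exactly what Boost.Serialization and Protocol Buffers insulate you from; it is only safe when both ends of the distributed system agree on those details.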

best compression algorithm with the following features

What is the best compression algorithm with the following features:
should take less time to decompress (it can take reasonably more time to compress)
should be able to compress sorted data (approx list of 3,000,000 strings/integers ...)
Please suggest one along with metrics: compression ratio and algorithmic complexity for compression and decompression (if possible).
Entire site devoted to compression benchmarking here
Well, if you just want speed, then standard ZIP compression is just fine and it's most likely integrated into your language/framework already (e.g. .NET has it, Java has it). Sometimes the most universal solution is the best: ZIP is a very mature format, and any ZIP library and application will work with any other.
But if you want better compression, I'd suggest 7-Zip, as the author is very smart, easy to get hold of, and encourages people to use the format.
Providing you with compression times is impossible, as it's directly related to your hardware. If you want a benchmark, you have to do it yourself.
You don't have to worry about decompression time. The extra time spent at higher compression levels goes mostly into finding the longest matching pattern.
Decompression either:
1) writes the literal, or
2) for a (backward position, length) = (m, n) pair, goes back m bytes in the output buffer, reads n bytes, and writes those n bytes at the end of the buffer.
So the decompression time is independent of the compression level. And, with my experience with Universal Decompression Virtual Machine (RFC3320), I guess the same is true for any decompression algorithm.
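A toy sketch of that decode loop (the token representation here is made up for illustration; real formats such as DEFLATE bit-pack the tokens, but the copy logic is the same):

    #include <cstdint>
    #include <vector>

    // A decoded token is either a literal byte or a (distance, length) match
    // that copies bytes already written to the output buffer.
    struct Token {
        bool     isLiteral;
        uint8_t  literal;   // valid when isLiteral
        uint32_t distance;  // how far back in the output, valid when !isLiteral
        uint32_t length;    // how many bytes to copy, valid when !isLiteral
    };

    std::vector<uint8_t> lzDecode(const std::vector<Token>& tokens) {
        std::vector<uint8_t> out;
        for (const Token& t : tokens) {
            if (t.isLiteral) {
                out.push_back(t.literal);               // case 1: write the literal
            } else {
                size_t from = out.size() - t.distance;  // case 2: go back m bytes...
                for (uint32_t i = 0; i < t.length; ++i) // ...and copy n bytes forward,
                    out.push_back(out[from + i]);       // byte-wise so overlap works
            }
        }
        return out;
    }

Note that the work is proportional to the size of the decompressed output, not to how hard the compressor searched for matches, which is why decompression time is largely independent of the compression level.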
This is an interesting question.
On such sorted data of strings and integers, I would expect difference-coding approaches to outperform any out-of-the-box text compression approach such as LZ77 or LZ78 in terms of compression ratio. General-purpose encoders do not exploit the special properties of the data.
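As a sketch of the difference-coding idea for the sorted integer case (the varint step is one common way to store the small gaps compactly; the function names are made up):

    #include <cstdint>
    #include <vector>

    // Difference coding for a sorted integer list: store the gaps between
    // consecutive values instead of the values themselves, then shrink the
    // gaps with a LEB128-style variable-length byte encoding.
    std::vector<uint8_t> deltaVarintEncode(const std::vector<uint64_t>& sorted) {
        std::vector<uint8_t> out;
        uint64_t previous = 0;
        for (uint64_t value : sorted) {
            uint64_t gap = value - previous;   // small for dense, sorted input
            previous = value;
            while (gap >= 0x80) {              // 7 payload bits per byte,
                out.push_back(static_cast<uint8_t>(gap) | 0x80);  // high bit = "more"
                gap >>= 7;
            }
            out.push_back(static_cast<uint8_t>(gap));
        }
        return out;
    }

    std::vector<uint64_t> deltaVarintDecode(const std::vector<uint8_t>& bytes) {
        std::vector<uint64_t> values;
        uint64_t previous = 0, gap = 0;
        int shift = 0;
        for (uint8_t b : bytes) {
            gap |= static_cast<uint64_t>(b & 0x7F) << shift;
            if (b & 0x80) { shift += 7; continue; }  // continuation bit set
            previous += gap;
            values.push_back(previous);
            gap = 0; shift = 0;
        }
        return values;
    }

If the 3,000,000 values are densely distributed, most gaps fit in one or two bytes, and decoding is a single linear pass, so decompression stays cheap.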