Why is there no open source implementation of the PKWare Data Compression Library's Implode? This is compression method 10, not 6.
There are many versions of decompressors. Is it very hard to do the reverse and write a compressor? I'm interested in the answer before I attempt it myself.
To answer the question of why I want this: I need to transfer data in both directions, and I cannot use a newer method because of legacy limits.
No, it wouldn't be very hard to modify a deflate compressor to produce DCL compressed data. You're not finding it because there has not been any interest in it. It is an obsolete compressor.
Or any other compression algorithm, for that matter.
(Then again, if there was a turing-complete compression algorithm, would it still be considered a compression algorithm, rather than a programming language?)
The question might almost make sense if you had asked about the de-compressor, as opposed to the compressor. The job of a compressor is effectively to write a program to be executed by the decompressor that will recreate the original file being compressed. The program is written in the language that is the compressed data format.
The answer to that question is no, the bzip2 decompressor is not Turing complete, since it has no way to loop or recurse. Nor does the decompressor for any other standard compression format that I am aware of.
Update:
It appears to have since been deprecated due to security concerns, but apparently WinRAR had a post-processing language built into the decompressor called RarVM, which was a Turing-complete machine for implementing arbitrarily complex pre-compression filters for data.
I need to compress data for logging by appending short strings to the log file using C++/C. I first tried gzip (zlib), but it builds a symbol table for each short string and actually makes the data longer rather than shorter. I believe the thing I'm looking for is a static Huffman table. Anyway, I was wondering if there is a common algorithm for this. I would much rather use a format that anyone can read. I think the answer is no, but this is the place to ask. Thanks.
You should look at the examples/gzlog.[ch] source files in the zlib distribution. That code was written for precisely this purpose. It appends short strings to a growing compressed gzip file.
For large files or other files that are not necessarily text, how can I compress them, and what are the most efficient methods to check for data corruption? Any tutorials on these kinds of algorithms would be greatly appreciated.
For compression, LZO should be helpful. Easy to use and library easily available.
For data corruption check, CRC ca
http://cppgm.blogspot.in/2008/10/calculation-of-crc.html
For general compression, I would recommend Huffman coding. It's very easy to learn, a full-featured (2-pass) coder/decoder can be written in <4 hours if you understand it. It is part of DEFLATE which is part of the .zip format. Once you have that down, learn LZ77, then put them together and make your own DEFLATE implementation.
Alternatively, use zlib, the library everyone uses for zip files.
For large files, I wouldn't recommend CRC32 like everyone is telling you. Larger files suffer from birthday corruption pretty easily. What I mean is that as a file gets larger, a 32-bit checksum can only find an increasingly limited number of errors. A fast implementation of a hash - say, MD5 - would do you well. Yes MD5 is cryptographically broken but I'm assuming, considering your question, that you're not working on a security-conscious problem.
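To make the CRC32 versus MD5 comparison concrete, here is a minimal sketch that computes both over the same data in one pass, using Python's standard `zlib` and `hashlib` modules. The helper names `checksums` and `file_chunks` are made up for this illustration, and the 1 MiB chunk size is an arbitrary assumption.

```python
import hashlib
import zlib


def checksums(chunks):
    """Return (crc32, md5_hex) for an iterable of bytes chunks.

    Both values are updated incrementally, so a large file never has
    to be held in memory all at once.
    """
    crc = 0
    md5 = hashlib.md5()
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)  # continue from the running CRC
        md5.update(chunk)
    return crc & 0xFFFFFFFF, md5.hexdigest()


def file_chunks(path, size=1 << 20):
    """Yield a file's contents in blocks of `size` bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        while True:
            block = f.read(size)
            if not block:
                break
            yield block
```

Store the result next to the file and recompute it later, for example with `checksums(file_chunks("backup.bin"))`; a mismatch in either value means the data changed.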
Hamming codes are a possibility. The idea is to insert a few sum-bits (parity bits) for every N bits of data, and to set each of them to 0 or 1 such that the sum of that sum-bit and the data bits it covers is always 1. If one of the sums is not 1, the pattern of failing sum-bits tells you which bit was corrupted.
There are lots of other possibilities, as the previous post says.
http://en.wikipedia.org/wiki/Hamming_code#General_algorithm
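As an illustration of the parity-bit idea, here is a small sketch of the classic Hamming(7,4) code: 4 data bits, 3 parity bits, and correction of any single flipped bit. It assumes the even-parity convention (each checked group sums to 0 modulo 2) and the usual layout with parity bits at positions 1, 2 and 4; the function names are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as 7 bits laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code_bits):
    """Return (data_bits, error_position); error_position is 0 if no error.

    The three parity checks, read as a binary number, give the 1-based
    position of a single flipped bit, which is then flipped back.
    """
    c = list(code_bits)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position
```

This corrects only one bad bit per 7-bit block; two or more errors in the same block are miscorrected, so it complements a whole-file checksum rather than replacing it.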
Does anyone know of a free (non-GPL), decently performing compression library that supports packet oriented compression in C/C++?
With packet oriented, I mean the kind of feature QuickLZ (GPL) has, where multiple packets of a stream can be compressed and decompressed individually while a history is being maintained across packets to achieve sensible compression.
I'd favor compression ratio over CPU usage as long as the CPU usage isn't ridiculous, but I've had a hard time finding this feature at all, so anything is of interest.
zlib's main deflate() function takes a flush parameter, which allows various different flushing modes. If you pass Z_SYNC_FLUSH at the end of each packet, that should produce the desired effect.
The details are explained in the zlib manual.
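A minimal sketch of the Z_SYNC_FLUSH approach, written with Python's standard `zlib` module rather than the C API for brevity. The `make_packet_codec` helper is a made-up name for this example. It assumes packets arrive at the receiver in order and without loss, because both sides share one continuous deflate history.

```python
import zlib


def make_packet_codec():
    """Return (compress_packet, decompress_packet) sharing one stream each."""
    compressor = zlib.compressobj()
    decompressor = zlib.decompressobj()

    def compress_packet(payload: bytes) -> bytes:
        # Z_SYNC_FLUSH emits everything buffered so far, so the receiver
        # can decode this packet immediately, but the history window is
        # kept for later packets.
        return compressor.compress(payload) + compressor.flush(zlib.Z_SYNC_FLUSH)

    def decompress_packet(packet: bytes) -> bytes:
        return decompressor.decompress(packet)

    return compress_packet, decompress_packet
```

Because the history carries over, a packet that repeats earlier content compresses to far fewer bytes than the first occurrence did. In C the same pattern is a single `z_stream` with `deflate(&strm, Z_SYNC_FLUSH)` called at each packet boundary.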
bzip2 has flushing functionality as well, which might let you do this kind of thing. See http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html#bzCompress
Google's Snappy may be a good option, if you need speed more than compression and are just looking to save a moderate amount of space.
Alternatively, Ilia Muraviev put a small piece of compression code called BALZ in public domain some time ago. It is quite decent for many kinds of data.
Both of these support stream flushing and independent state variables to do multiple, concurrent streams across packets.
Google's new SPDY protocol uses zlib to compress individual messages, and maintains the zlib state for the life of the connection to achieve better compression. I don't think there's a standalone library that handles this behavior exactly, but there are several open-source implementations of SPDY that could show you how it's done.
The public domain Crush algorithm by Ilia Muraviev has speed and compression ratio similar to QuickLZ, with Crush being a bit more powerful. The algorithms are conceptually similar too, with Crush using a few more tricks. The BALZ algorithm mentioned earlier is also by Ilia Muraviev. See http://compressme.net/
Maybe you could use the LZMA SDK. It was written and placed in the public domain by Igor Pavlov.
Since it can compress streams and supports memory-to-memory compression, I think it should be possible to compress a packet stream with it (maybe with some changes), but I'm not sure.
I am looking for a compression algorithm (for a programming competition) and I need a full description of how to implement it (all technical details). Any lossless and patent-free algorithm will do, but ease of implementation is a bonus :)
(Although possibly irrelevant) I plan to implement the algorithm in C++...
Thanks in advance.
EDIT:
I will be compressing text files only, no other file types...
Well, I can't go so far as to complete the competition for you, but please check out this article on wiki: Run Length Encoding. It is by far one of the simplest ways to compress data, albeit not always an efficient one. Compression is also domain specific, even amongst lossless algorithms you will find that what you are compressing determines how best to encode it.
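To show how little code run-length encoding needs, here is a minimal byte-oriented sketch. The on-disk format (a count byte followed by a value byte, with runs capped at 255) is an assumption chosen for simplicity, not a standard.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs; count is 1..255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the next byte matches and the count fits in a byte.
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding each (count, value) pair."""
    out = bytearray()
    for i in range(0, len(encoded), 2):
        out += bytes([encoded[i + 1]]) * encoded[i]
    return bytes(out)
```

For example, `rle_encode(b"aaaabbbc")` gives `b"\x04a\x03b\x01c"`. Input without repeated bytes doubles in size under this format, which is why RLE is rarely a good fit for ordinary text.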
RFC 1951 describes inflate/deflate, including a brief description of the compressor's algorithm. Antaeus Feldspar's An Explanation of the Deflate Algorithm provides a bit more background.
Also, the zlib source distribution contains a simplified reference inflater in contrib/puff/puff.c that can be helpful reading to understand exactly how the bits are arranged (but it doesn't contain a deflate, only inflate).
I'd start here on Wikipedia.
There's a whole lot to choose from, but without knowing more about what you want it's difficult to help more. Are you compressing text, images, video or just random files? Each one has its own set of techniques and challenges for optimal results.
If ease of implementation is the sole criterion I'd use "filecopy" compression. Guaranteed compression ratio of exactly 1:1, and trivial implementation...
Huffman is good if you're compressing plain text. And all the commenters below assure me it's a joy to implement ;D
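For reference, a compact sketch of Huffman coding on text. It builds the code table with a heap and represents the output as a string of '0' and '1' characters for readability. A real competition entry would still need to pack the bits into bytes and store the table alongside them. All function names are made up for this example.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a {symbol: bitstring} prefix code built from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a lone symbol still needs one bit
    # Heap entries are (weight, tiebreak, partial code table); the unique
    # tiebreak keeps Python from ever comparing the dictionaries.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # the two lightest subtrees
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_encode(text, codes):
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

Frequent symbols get shorter codes, so English text typically comes out noticeably below 8 bits per character before the cost of storing the table is counted.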
Ease of implementation: Huffman, as stated before. I believe LZW is no longer under patent, but I don't know for sure. It's a relatively simple algorithm. LZ77 should be available, though. Lastly, the Burrows-Wheeler transform allows for compression, but it's significantly more difficult to implement.
I like this introduction to the Burrows-Wheeler Transform.
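To give a feel for the transform itself, here is a deliberately naive sketch that sorts all rotations of the input. It is fine for short strings but far too slow and memory hungry for real files. It assumes the NUL character never occurs in the input, because it is used as the end marker. Note that the transform alone compresses nothing: it only groups similar characters so that a later stage (move-to-front, run-length or Huffman coding) works better.

```python
def bwt(text, end="\0"):
    """Return the last column of the sorted rotations of text + end marker."""
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, end="\0"):
    """Rebuild the original text by repeatedly prepending and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        # Prepending the last column and sorting recovers one more
        # character of every rotation on each pass.
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]
```

For example, `bwt("banana")` returns `"annb\0aa"`, where the repeated letters end up next to each other.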
If you go under "View" in your internet browser, there should be an option to either "Zoom Out" or make the text smaller.
Select one of those and...
BAM!
You just got more text on the same screen! Yay compression!
The Security Now! podcast recently put out an episode highlighting data compression algorithms. Steve Gibson gives a pretty good explanation of the basics of Huffman and Lempel-Ziv compression techniques. You can listen to the audio podcast or read the transcript for Episode 205.