Trying to identify exact Lempel-Ziv variant of compression algorithm in firmware

I'm currently reverse engineering a firmware image that seems to be compressed, but I'm having a hard time identifying which algorithm it uses.
I have the original uncompressed data dumped from the flash chip; below is some of the human-readable data, uncompressed vs. (supposedly) compressed:
You can get the binary portion here, should it help: Link
From what I can tell, it might be using a Lempel-Ziv variant such as LZO, LZF or LZ4.
gzip and zlib can be ruled out because they would leave very little to no human-readable data after compression.
I did try compressing the dumped data with the Lempel-Ziv variants mentioned above using their respective Linux CLI tools, but none of them produced exactly the same output as the "compressed" data.
Another idea I have for now is to try decompressing the data with each algorithm and see what comes out. But this is very difficult due to the lack of headers in the compressed firmware. (binwalk and signsrch both detected nothing.)
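For reference, a minimal sketch of that brute-force idea, assuming the compressed stream starts at the beginning of the blob (it may not, so you might have to scan for the start) and that liblz4 and liblzo2 are available (link with -llz4 -llzo2):

#include <cstdio>
#include <vector>
#include <lz4.h>
#include <lzo/lzo1x.h>

int main(int argc, char** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s <dump.bin>\n", argv[0]); return 1; }
    std::FILE* f = std::fopen(argv[1], "rb");
    if (!f) { std::perror("fopen"); return 1; }
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    std::vector<char> in(size);
    if (std::fread(in.data(), 1, (size_t)size, f) != (size_t)size) {
        std::fprintf(stderr, "short read\n"); return 1;
    }
    std::fclose(f);

    // Generous guess at the decompressed size.
    std::vector<char> out(size * 16);

    // Try LZ4 as a raw block (no frame header, matching a headerless blob).
    int n = LZ4_decompress_safe(in.data(), out.data(), (int)size, (int)out.size());
    std::printf("lz4 raw block: %s (%d)\n", n > 0 ? "plausible" : "rejected", n);

    // Try LZO1X, the variant most embedded firmware uses.
    if (lzo_init() == LZO_E_OK) {
        lzo_uint outLen = out.size();
        int r = lzo1x_decompress_safe((const lzo_bytep)in.data(), (lzo_uint)size,
                                      (lzo_bytep)out.data(), &outLen, nullptr);
        std::printf("lzo1x: %s (%lu bytes)\n",
                    r == LZO_E_OK ? "plausible" : "rejected", (unsigned long)outLen);
    }
    return 0;
}

LZF and other candidates can be bolted on the same way; a decoder that returns success and mostly printable output would be a strong hint.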
Any suggestions on how I can proceed?

Related

How to compress ASCII text without overhead

I want to compress small texts (400 bytes) and decompress them on the other side. If I do it with a standard compressor like rar or zip, it writes metadata along with the compressed file and the result is bigger than the file itself.
Is there a way to compress the file without this metadata and open it on the other side with parameters known ahead of time?
You can do raw deflate compression with zlib. That avoids even the six-byte header and trailer of the zlib format.
However you will find that you still won't get much compression, if any at all, with just 400 bytes of input. Compression algorithms need much more history than that to get rolling, in order to build statistics and find redundancy in the data.
You should consider either a dictionary approach, where you build a dictionary of representative strings to give the compressor something to work with, or treating a sequence of these 400-byte strings as a single stream that is decompressed as a stream on the other end.
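As a concrete illustration, here is a minimal raw-deflate sketch using zlib's C API. windowBits = -15 selects raw deflate (no header or trailer), and deflateSetDictionary() primes the compressor with representative text. The dictionary string below is a placeholder; the receiving side must prime its raw inflater with the identical dictionary via inflateSetDictionary():

#include <cstdio>
#include <cstring>
#include <zlib.h>

int main() {
    // Placeholder dictionary; in practice, fill it with strings that
    // actually occur in your 400-byte messages.
    const char* dict = "the quick brown fox jumps over the lazy dog";
    const char* msg  = "the quick brown fox was here again today";

    z_stream s = {};  // zeroed, so zalloc/zfree/opaque are Z_NULL
    deflateInit2(&s, Z_BEST_COMPRESSION, Z_DEFLATED,
                 -15 /* raw deflate */, 8, Z_DEFAULT_STRATEGY);
    deflateSetDictionary(&s, (const Bytef*)dict, (uInt)std::strlen(dict));

    unsigned char out[512];
    s.next_in  = (Bytef*)msg;  s.avail_in  = (uInt)std::strlen(msg);
    s.next_out = out;          s.avail_out = sizeof(out);
    deflate(&s, Z_FINISH);

    std::printf("%zu bytes in, %lu bytes out\n", std::strlen(msg), s.total_out);
    deflateEnd(&s);
    return 0;
}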
You can have a look at compression using Huffman codes. As examples, look here and here.
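To get a feel for what Huffman coding alone buys you, here is a small self-contained sketch that builds a Huffman tree over byte frequencies and reports the coded size. Note that a real codec must also ship the code table (or agree on a fixed one up front), which is exactly the overhead the question is trying to avoid:

#include <cstdio>
#include <cstdint>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct Node {
    uint64_t freq;
    int byte;           // -1 for internal nodes
    Node *left, *right;
};
struct Cmp { bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; } };

static void assignCodes(const Node* n, const std::string& prefix,
                        std::map<int, std::string>& codes) {
    if (n->byte >= 0) { codes[n->byte] = prefix.empty() ? "0" : prefix; return; }
    assignCodes(n->left,  prefix + "0", codes);
    assignCodes(n->right, prefix + "1", codes);
}

int main() {
    std::string text = "this is an example of huffman coding on a short string";

    uint64_t freq[256] = {};
    for (unsigned char c : text) freq[c]++;

    std::priority_queue<Node*, std::vector<Node*>, Cmp> pq;
    for (int b = 0; b < 256; b++)
        if (freq[b]) pq.push(new Node{freq[b], b, nullptr, nullptr});

    // Repeatedly merge the two least frequent nodes until one tree remains.
    // (Tree nodes are leaked; fine for a throwaway sketch.)
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();
        Node* b = pq.top(); pq.pop();
        pq.push(new Node{a->freq + b->freq, -1, a, b});
    }

    std::map<int, std::string> codes;
    assignCodes(pq.top(), "", codes);

    uint64_t bits = 0;
    for (unsigned char c : text) bits += codes[c].size();
    std::printf("%zu bytes -> %llu bits (~%llu bytes), plus the code table\n",
                text.size(), (unsigned long long)bits,
                (unsigned long long)((bits + 7) / 8));
    return 0;
}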

String Compression Algorithm

I've been wanting to compress a string in C++ and display its compressed state on the console. I've been looking around for something and can't find anything appropriate so far. The closest thing I've found is this one:
How to simply compress a C++ string with LZMA
However, I can't find the lzma.h header it uses anywhere.
Basically, I am looking for a function like this:
std::string compressString(std::string uncompressedString){
    //Compression Code
    return compressedString;
}
The choice of compression algorithm doesn't really matter. Can anybody help me find something like this? Thanks in advance! :)
Based on the pointers in the article, I'm fairly certain they are using XZ Utils, so download that project and make the produced library available in your project.
However, two caveats:
dumping a compressed string to the console isn't very useful, as that string will contain all possible byte values, most of which aren't displayable on a console
compressing short strings (actually, any small quantity of data) isn't what most general-purpose compressors were designed for; in many cases the compressed result will be as big as or even bigger than the input. However, I have no experience with LZMA on small quantities of data; an extensive test with data representative of your use case will tell you whether it works as expected.
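Assuming liblzma from XZ Utils (link with -llzma), a sketch of the requested function could look like the following. lzma_easy_buffer_encode() produces a complete .xz stream in one call, and lzma_stream_buffer_bound() gives a safe worst-case output size:

#include <cstdint>
#include <stdexcept>
#include <string>
#include <lzma.h>

std::string compressString(const std::string& uncompressedString) {
    // Worst-case size of the .xz output for this input.
    size_t outCap = lzma_stream_buffer_bound(uncompressedString.size());
    std::string compressed(outCap, '\0');
    size_t outPos = 0;

    // Preset 6 is the xz default; higher presets trade speed for ratio.
    lzma_ret ret = lzma_easy_buffer_encode(
        6, LZMA_CHECK_CRC64, nullptr,
        reinterpret_cast<const uint8_t*>(uncompressedString.data()),
        uncompressedString.size(),
        reinterpret_cast<uint8_t*>(&compressed[0]), &outPos, outCap);

    if (ret != LZMA_OK)
        throw std::runtime_error("liblzma compression failed");
    compressed.resize(outPos);  // trim to the bytes actually written
    return compressed;
}

Per the caveats above, print the result as hex or Base64 rather than raw bytes.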
One algorithm I've been playing with that gives good compression on small amounts of data (tested on chunks of 300-500 bytes) is range encoding.

Hadoop, how to compress mapper output but not the reducer output

I have a map-reduce Java program in which I try to compress only the mapper output, but not the reducer output. I thought this would be possible by setting the following properties in the Configuration instance, as listed below. However, when I run my job, the output generated by the reducer is still compressed, since the generated file is part-r-00000.gz. Has anyone successfully compressed just the mapper data but not the reducer's? Is that even possible?
//Compress mapper output
conf.setBoolean("mapred.output.compress", true);
conf.set("mapred.output.compression.type", CompressionType.BLOCK.toString());
conf.setClass("mapred.output.compression.codec", GzipCodec.class, CompressionCodec.class);
mapred.compress.map.output: This compresses the intermediate data between the mapper and the reducer. If you use the Snappy codec, this will most likely increase read/write speed and reduce network overhead. Don't worry about splitting here: these files are not stored in HDFS. They are temp files that exist only for the map-reduce job.
mapred.map.output.compression.codec: I would use Snappy.
mapred.output.compress: This boolean flag defines whether the whole map/reduce job will output compressed data. I would always set this to true as well; faster read/write speeds and less disk space used.
mapred.output.compression.type: I use BLOCK. This will make the compression splittable for all compression formats (gzip, Snappy, and bzip2); just make sure you're using a splittable file format like SequenceFile, RCFile, or Avro.
mapred.output.compression.codec: This is the compression codec for the map/reduce job's output. I mostly use one of these three: Snappy (fastest read/write, 2x-3x compression), gzip (normal read, fast write, 5x-8x compression), bzip2 (slow read/write, 8x-12x compression).
Also remember that when compressing mapred output, the compression ratio will differ based on your sort order because of splitting: the closer similar data sits together, the better the compression.
With MR2, we should now set:
conf.setBoolean("mapreduce.map.output.compress", true);
conf.setBoolean("mapreduce.output.fileoutputformat.compress", false);
For more details, refer to: http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
"output compression" will compress your final output. To compress map-outputs only, use something like this:
conf.set("mapred.compress.map.output", "true")
conf.set("mapred.output.compression.type", "BLOCK");
conf.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.GzipCodec");
You need to set "mapred.compress.map.output" to true.
Optionally you can choose your compression codec by setting "mapred.map.output.compression.codec".
NOTE 1: mapred output compression should never be BLOCK. See the following JIRA for details:
https://issues.apache.org/jira/browse/HADOOP-1194
NOTE 2: GZIP and BZ2 are CPU-intensive. If you have a slow network and GZIP or BZ2 gives a better compression ratio, it may justify spending the CPU cycles. Otherwise, consider the LZO or Snappy codecs.
NOTE 3: if you want to use map output compression, consider installing the native codecs, which are invoked via JNI and give you better performance.
If you use MapR's distribution for Hadoop, you can get the benefits of compression without all the folderol with the codecs.
MapR compresses natively at the file system level so that the application doesn't need to know or care. Compression can be turned on or off at the directory level so you can compress inputs, but not outputs or whatever you like. Generally, the compression is so fast (it uses an algorithm similar to snappy by default) that most applications see a performance boost when using native compression. If your files are already compressed, that is detected very quickly and compression is turned off automatically so you don't see a penalty there, either.

Is there an open-source implementation of the Dictionary Huffman compression algorithm?

I'm working on a library to work with Mobipocket-format ebook files, and I have LZ77-style PalmDoc decompression and compression working. However, PalmDoc compression is only one of the two types of text compression currently used on ebooks in the wild, the other being Dictionary Huffman, aka huffcdic.
I've found a couple of implementations of the huffcdic decoding algorithm, but I'd like to be able to compress to the same format, and so far I haven't been able to find any examples of how to do that yet. Has someone else already figured this out and published the code?
I have been trying to use http://bazaar.launchpad.net/~kovid/calibre/trunk/view/head:/src/calibre/ebooks/compression/palmdoc.c, but compression doesn't produce identical results, and there are 3-4 discrepancies. Also read a related thread: LZ77 compression of palmdoc.

WAV compression help

How do you programmatically compress a WAV file to another format (PCM, 11,025 Hz sampling rate, etc.)?
I'd look into Audacity... I'm pretty sure they don't have a command-line utility that can do it, but they may have a library...
Update:
It looks like they use libsndfile, which is released under the LGPL. I, for one, would probably just try using that.
Use sox (Sound eXchange: the universal sound sample translator) on Linux:
SoX is a command-line program that can convert most popular audio files to most other popular audio file formats. It can optionally change the audio sample data type and apply one or more sound effects to the file during this translation.
If you mean how do you compress the PCM data to a different audio format then there are a variety of libraries you can use to do this, depending on the platform(s) that you want to support. If you just want to change the sample rate of the PCM data then you need a sample rate conversion algorithm instead, which is a completely different problem. Can you be more specific in your requirements?
You're asking about resampling, and more specifically downsampling, not compression. While both processes are lossy (meaning that you will suffer loss of information), downsampling works on raw samples instead of in the frequency domain.
If you are interested in doing compression, then you should look into the LAME or Ogg Vorbis libraries; you are no doubt familiar with MP3 and Ogg technology, though I have a feeling from your question that you are interested in getting back a PCM file with a lower sampling rate.
In that case, you need a resampling library, of which there are a few possibilities. The most widely known is libsamplerate, which I honestly would not recommend due to quality issues not only in the generated audio files, but also in the stability of the library's code. The other non-commercial possibility is sox, as a few others have mentioned. Depending on the nature of your program, you can either exec sox as a separate process, or call it from your own code by using it as a library. I personally have not tried this approach, but I'm working on a product now where we use sox (for upsampling, actually), and we're quite happy with the results.
The other option is to write your own sample rate conversion library, which can be a significant undertaking; but if you are only interested in converting by an integer factor (i.e., from 44.1 kHz to 22.05 kHz, or from 44.1 kHz to 11.025 kHz), then it is actually very easy, since you only need to keep every Nth sample, as in the sketch below.
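A minimal sketch of that integer-factor case (e.g. factor 4 takes 44.1 kHz down to 11.025 kHz); strictly speaking you should low-pass filter before decimating to avoid aliasing, which is omitted here for brevity:

#include <cstdint>
#include <vector>

// Keep every Nth sample of raw 16-bit mono PCM; the WAV header's sample
// rate field must be divided by the same factor afterwards.
std::vector<int16_t> downsample(const std::vector<int16_t>& in, int factor) {
    std::vector<int16_t> out;
    out.reserve(in.size() / factor + 1);
    for (std::size_t i = 0; i < in.size(); i += factor)
        out.push_back(in[i]);
    return out;
}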
In Windows, you can make use of the Audio Compression Manager to convert between files (the acm... functions). You will also need a working knowledge of the WAVEFORMAT structure and WAV file formats. Unfortunately, writing all this yourself will take some time, which is why it may be a good idea to investigate some of the open-source options suggested by others.
I have written my own open-source .NET audio library called NAudio that can convert WAV files from one format to another, making use of the ACM codecs installed on your machine. I know you have tagged this question C++, but if .NET is acceptable then this may save you some time. Have a look at the NAudioDemo project for an example of converting files.