ZLib Decompression in C++

I am trying to write a function to unzip a single text file compressed with .gz. It needs to uncompress the .gz file given its path and write the uncompressed text file to a given destination. I am using C++, and from what I have seen ZLIB does exactly what I need, except I cannot find a single example anywhere on the net that shows it doing this. Can anyone show me an example or at least guide me in the right direction?

If you just want to inflate a gzip-compressed file (i.e. a plain .gz file, not an archive with multiple entries) you can use something like this:
#include <zlib.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

gzFile inFileZ = gzopen(fileName, "rb");
if (inFileZ == NULL) {
    printf("Error: Failed to gzopen %s\n", fileName);
    exit(EXIT_FAILURE);
}
unsigned char unzipBuffer[8192];
int unzippedBytes;  // gzread() returns int; a negative value signals an error
std::vector<unsigned char> unzippedData;
while (true) {
    unzippedBytes = gzread(inFileZ, unzipBuffer, 8192);
    if (unzippedBytes > 0) {
        unzippedData.insert(unzippedData.end(), unzipBuffer, unzipBuffer + unzippedBytes);
    } else {
        break;  // 0 means end of stream, < 0 means a read error
    }
}
gzclose(inFileZ);
The unzippedData vector now holds your inflated data. There are probably more efficient ways to store the inflated data, especially if you know the uncompressed size in advance, but this approach works for me.
If you only want to save the inflated data to a file without any further processing you could skip the vector and just write the unzipBuffer contents to another file.

You can use the gzopen(), gzread(), and gzclose() functions of zlib, much like you would the corresponding stdio functions fopen(), etc. That will read the gzip file and decompress it. You can then use fopen(), fwrite(), etc. to write the uncompressed data back out.
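For illustration, here is a minimal sketch of that approach; the file names are placeholders, and error handling is reduced to the bare minimum:
#include <zlib.h>
#include <cstdio>

// Sketch: decompress "in.gz" to "out.txt" (both paths are placeholders).
gzFile in = gzopen("in.gz", "rb");
FILE *out = fopen("out.txt", "wb");
if (in != NULL && out != NULL) {
    char buf[8192];
    int n;
    while ((n = gzread(in, buf, sizeof buf)) > 0) {
        fwrite(buf, 1, n, out);  // write exactly the bytes gzread() produced
    }
}
if (out != NULL) fclose(out);
if (in != NULL) gzclose(in);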

You can use ZLibComplete to do this. There is a complete C++ example of GZip decompression on the front page:
http://rudi-cilibrasi.github.io/zlibcomplete/

I'll assume http://zlib.net/zlib_how.html does what you want?

Related

How do you stream a binary representation of a .zip folder in mfc c++?

I am trying to stream a .zip file to hardware using MFC C++. The hardware needs the file still in .zip format when sent over because it will do the unpacking itself.
I have been unable to find a class or method to grab a .zip file and stream it over.
Most searches lead me to questions about unzipping or zipping using C++, which is of no use in my particular case.
Any advice? Has anyone run into this situation?
The following code snippet reads the first 100 bytes of a file into a buffer using CFile:
CFile f;
if (f.Open(L"path_to_your_file", CFile::modeRead))
{
    char buffer[100];
    f.Read(buffer, sizeof buffer);
    f.Close();
}
else
{
    // handle error
    DWORD error = GetLastError();
    // error number in error
    ...
}
This is more or less all you need. Google for the documentation of CFile.
You should be able to figure out the rest.
The format of the file you're reading is irrelevant. You just need to read the contents of your file and send it to the hardware.
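As a rough sketch of what that loop could look like, where sendToHardware() is a hypothetical placeholder for however your device actually receives the data:
CFile f;
if (f.Open(L"archive.zip", CFile::modeRead))  // CFile reads raw bytes; no text translation
{
    char buffer[4096];
    UINT bytesRead;
    // Forward the .zip contents unchanged, chunk by chunk.
    while ((bytesRead = f.Read(buffer, sizeof buffer)) > 0)
    {
        sendToHardware(buffer, bytesRead);  // hypothetical device interface
    }
    f.Close();
}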

Knowing current compressed file size using gzwrite (zlib)

I'm using zlib with C++.
Quote from
http://refspecs.linuxbase.org/LSB_3.0.0/LSB-PDA/LSB-PDA/zlib-gzwrite-1.html regarding gzwrite function:
The gzwrite() function shall write data to the compressed file referenced by file, which shall have been opened in a write mode (see gzopen() and gzdopen()). On entry, buf shall point to a buffer containing len bytes of uncompressed data. The gzwrite() function shall compress this data and write it to file. The gzwrite() function shall return the number of uncompressed bytes actually written.
I interpret this as meaning that the return value will NOT tell me how much larger the file became when writing, only how much data was compressed into the file.
The only way to know how large the file is would then be to close it, and read the size from the file system. I have a requirement to only continue to write to the file until it reaches a certain size. Can this be achieved without closing the file?
A workaround would be to write until the uncompressed size reaches my limit and then close the file, read the size from file system and update my best guess of file size based on that, and then re-open the file and continue writing. This would make me close and open the file a few times towards the end (as I'm approaching the size limit).
Another workaround, which would give more of an estimate (which is not really what I want), would be to write until the uncompressed size reaches the limit, close the file, read the file size from the file system, and calculate the compression ratio so far. Then I can use this compression ratio to calculate a new limit for the uncompressed file size at which the compression should get me down to the limit for the compressed file size. If I repeat this the estimate would improve, but again, it is not what I'm looking for.
Are there better options?
The preferred option would be if zlib could tell me the compressed file size while the file is still open. I don't see why this information would not be available inside zlib at this point, since compression happens when I call gzwrite and not when I close the file.
zlib provides the function gzoffset(), which does exactly what you're asking.
If for some reason you are stuck with a version of zlib that is more than about eight years old, from before gzoffset() was added, then this is easy to do with gzdopen(). You open the output file with fopen() or open(), and provide the file descriptor (using fileno() and dup() if you used fopen()), and then provide that descriptor to gzdopen(). Then you can use ftell() or lseek() at any time to see how much has been written. Be careful not to double-close the descriptor. See the comments for gzdopen().
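A minimal sketch of that descriptor trick, assuming a POSIX environment (path and data are placeholders):
#include <zlib.h>
#include <fcntl.h>
#include <unistd.h>

// Sketch: open the output file ourselves so we can query the compressed
// offset while the gzFile is still open.
int fd = open("/tmp/out.gz", O_WRONLY | O_CREAT | O_TRUNC, 0644);
gzFile gz = gzdopen(fd, "wb");  // gz takes over fd; gzclose() will close it
const char data[] = "some uncompressed data";
gzwrite(gz, data, sizeof data - 1);
gzflush(gz, Z_SYNC_FLUSH);  // push zlib's buffered compressed bytes to fd
off_t compressedSoFar = lseek(fd, 0, SEEK_CUR);  // compressed bytes written so far
gzclose(gz);  // also closes fd; do not close(fd) separately
Without the gzflush() the offset lags behind, because zlib buffers output internally; flushing too often hurts the compression ratio, so do it only when you actually need a reading.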
You can work around this issue by using a pipe. The idea is to write the compressed data into a pipe. After that, you read the data from the other end of the pipe, count it and write it to the actual file.
To set this up you need to first open the file to write to with a plain open(). Then create a pipe via pipe2() and initialize zlib by passing the pipe's write end to gzdopen():
int out = open("/path/to/file", O_WRONLY | O_CREAT | O_TRUNC, 0644);  // O_CREAT requires a mode
int p[2];
pipe2(p, O_NONBLOCK);
gzFile zFile = gzdopen(p[1], "w");  // p[1] is the pipe's write end
You can now write the data first to the pipe and then splice it from the pipe's read end to the out file:
gzwrite(zFile, buf, 1024);  // or any other length
ssize_t bytesWritten = 0;
do {
    bytesWritten = splice(p[0], NULL, out, NULL, 1024, SPLICE_F_NONBLOCK | SPLICE_F_MORE);
} while (bytesWritten == 1024);
As you can see, you now have bytesWritten to tell you how much data was actually written. Simply sum it up in another variable and stop splicing as soon as you have written as much data as you need to (or splice it in one go by writing everything to zFile and then splicing once, with the amount of data you are allowed to store as the fifth parameter; if you want to avoid compressing unnecessary data, do it in chunks as shown above).
A note on splice: splice() is Linux-specific, and is basically just a very efficient copy. You can always replace it with a simple read-and-write combo, i.e. read data from p[0] into a buffer and then write the data from that buffer into out; splice() is just faster and less code.
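For reference, a portable version of that read-and-write fallback might look like this sketch, keeping a running total of the compressed size:
#include <unistd.h>

// Copy compressed bytes from the pipe's read end p[0] to the output file,
// counting them along the way.
char copyBuf[1024];
ssize_t n;
size_t totalCompressed = 0;
while ((n = read(p[0], copyBuf, sizeof copyBuf)) > 0) {
    write(out, copyBuf, n);
    totalCompressed += n;  // running compressed size of the file
}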

Minify output from rapidjson

I am using rapidjson to output some data for statistics and plotting of a C++ program's algorithm, like internal runtime snapshots of the algorithm.
I output JSON like this:
string filename = "output.json";
StringBuffer sb;
PrettyWriter<StringBuffer> writer(sb);
writer.StartArray();
for (std::vector<O_Class>::const_iterator netItr = O_Class_Array.begin();
     netItr != O_Class_Array.end(); ++netItr)
    netItr->Serialize(writer);
writer.EndArray();
ofstream out;
out.open(filename);
out << sb.GetString();
As the files become quite big (~100 MiB) I'd like to output minified JSON, but I didn't find a documented way of doing so.
With an external minifier I shrank the file size from 100 MB to 18 MB, and I would like to get the same result natively in my application.
Any ideas?
Thanks for any suggestions!
Replace PrettyWriter with Writer, as in the sketch below.
You could also zip the content. That will significantly reduce the size.
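A minimal sketch of the swap, reusing the asker's buffer setup (Writer produces the same JSON as PrettyWriter, just without indentation and newlines):
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"
using namespace rapidjson;

StringBuffer sb;
Writer<StringBuffer> writer(sb);  // Writer instead of PrettyWriter: no whitespace emitted
writer.StartArray();
// ... serialize the elements exactly as before ...
writer.EndArray();
// sb.GetString() now holds the minified JSON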

How to report progress of data read on a QuaGzipFile (QuaZIP library)

I am using QuaZIP 0.5.1 with Qt 5.1.1 for C++ on Ubuntu 12.04 x86_64.
My program reads a large gzipped binary file, usually 1 GB of uncompressed data or more, and performs some computations on it. It is not computationally intensive, and most of the time is spent on I/O. So if I can find out how much of the file has been read, I can report it on a progress bar, and even provide an estimate of the ETA.
I open the file with:
QuaGzipFile gzip(fileName);
if (!gzip.open(QIODevice::ReadOnly))
{
    // report error
    return;
}
But there is no functionality in QuaGzipFile to find the file size or the current position.
I do not need the size and position of the uncompressed stream; the size and position of the compressed stream are fine, because a rough estimation of progress is enough.
Currently, I can find size of compressed file, using QFile(fileName).size(). Also, I can easily find current position in uncompressed stream, by keeping sum of return values of gzip.read(). But these two numbers do not match.
I can alter the QuaZIP library, and access internal zlib-related stuff, if it helps.
There is no reliable way to determine total size of uncompressed stream. See this answer for details and possible workarounds.
However, there is a way to get position in compressed stream:
QFile file(fileName);
file.open(QFile::ReadOnly);
QuaGzipFile gzip;
gzip.open(file.handle(), QuaGzipFile::ReadOnly);
while (true) {
    QByteArray buf = gzip.read(1000);
    // process buf
    if (buf.isEmpty()) { break; }
    QFile temp_file_object;
    temp_file_object.open(file.handle(), QFile::ReadOnly);
    double progress = 100.0 * temp_file_object.pos() / file.size();
    qDebug() << qRound(progress) << "%";
}
The idea is to open the file manually and use its file descriptor to get the position. QFile cannot track external position changes, so file.pos() will always be 0. So we create temp_file_object from the file descriptor, forcing QFile to request the file position. I could use some lower-level API (such as lseek()) to get the file position, but I think my way is more cross-platform.
Note that this method is not very accurate and can report progress values bigger than the real one. That's because zlib can internally read and decode more data than you have already consumed.
In zlib 1.2.4 and greater you can use the gzoffset() function to get the current position in the compressed file. The current version of zlib is 1.2.8.
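A minimal sketch with plain zlib, assuming the compressed file size was obtained beforehand (e.g. via QFile(fileName).size(), as the asker already does; compressedFileSize stands for that value):
#include <zlib.h>

gzFile gz = gzopen("data.gz", "rb");  // path is a placeholder
char buf[8192];
int n;
while ((n = gzread(gz, buf, sizeof buf)) > 0) {
    // ... use buf ...
    z_off_t compressedPos = gzoffset(gz);  // current offset in the compressed file
    int percent = (int)(100.0 * compressedPos / compressedFileSize);
    // report percent
}
gzclose(gz);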
Using an ugly hack to zlib, I was able to find position in compressed stream.
First, I copied definition of gz_stream from gzio.c (from zlib-1.2.3.4 source), to the end of quagzipfile.cpp. Then I reimplemented the virtual function qint64 QIODevice::pos() const:
qint64 QuaGzipFile::pos() const
{
    gz_stream *s = (gz_stream *)d->gzd;
    return ftello64(s->file);
}
Since quagzipfile.cpp and quagzipfile.h seem to be independent of the other QuaZIP library files, maybe it is better to copy the functionality I need from these files and avoid this hack?
The current version of the program is something like this:
const int bufferSize = 8192;  // any reasonable chunk size

QFile infile(fileName);
if (!infile.open(QIODevice::ReadOnly))
    return;
qint64 fileSize = infile.size();  // size() is a function, not a member variable
infile.close();

QuaGzipFile gzip(fileName);
if (!gzip.open(QIODevice::ReadOnly))
    return;
qint64 nread;
char buffer[bufferSize];
while ((nread = gzip.read(buffer, bufferSize)) > 0)  // pass buffer, not &buffer
{
    // use buffer
    int percent = 100.0 * gzip.pos() / fileSize;
    // report percent
}
gzip.close();

How to get boost::iostream to operate in a mode comparable to std::ios::binary?

I have the following question on boost::iostreams. If someone is familiar with writing filters, I would appreciate your advice/help.
I am writing a pair of multichar filters that work with boost::iostreams::filtering_stream as a data compressor and decompressor.
I started by writing the compressor, picked an algorithm from the LZ family, and am now working on the decompressor.
In a couple of words, my compressor splits data into packets, which are encoded separately and then flushed to my file.
When I have to restore data from my file (in programming terms, receive a read(byte_count) request), I have to read a full packed block, buffer it, unpack it and only then give out the requested number of bytes. I've implemented this logic, but right now I'm struggling with the following problem:
When my data is packed, any symbols can appear in the output file. And I have trouble reading a file which contains the symbol 0x1A (char 26) using boost::iostreams::read(...., size).
If I was using std::ifstream, for example, I would have set the std::ios::binary mode and this symbol could then be read simply.
Is there any way to achieve the same when implementing a boost::iostreams filter which uses the boost::iostreams::read routine to read a char sequence?
Some code here:
// Compression
// -----------
filtering_ostream out;
out.push(my_compressor());
out.push(file_sink("file.out"));

// Compress 'file.in' to 'file.out'
std::ifstream stream("file.in");
out << stream.rdbuf();

// Decompression
// -------------
filtering_istream in;
in.push(my_decompressor());
in.push(file_source("file.out"));

std::string res;
while (in) {
    std::string t;
    // My decompressor wants to retrieve the full block from input (say, 4096 bytes)
    // but instead retrieves 150 bytes because it meets the '1A' char in the sequence.
    // That obviously happens because the file should be read as a binary one, but
    // how do I state that?
    std::getline(in, t); // <--------- The error happens here
    res += t;
}
Short answer for reading a file as binary: specify ios_base::binary when opening the file stream (see the MSDN documentation on ios_base::openmode).
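Applied to the boost::iostreams setup above, the same openmode can be passed to the file devices; a minimal sketch, reusing the asker's my_compressor/my_decompressor filters:
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/device/file.hpp>
#include <fstream>

namespace io = boost::iostreams;

// Open every stream involved in binary mode, so bytes like 0x1A are not
// treated as end-of-file (which is what happens in text mode on Windows).
io::filtering_ostream out;
out.push(my_compressor());  // the asker's filter
out.push(io::file_sink("file.out", std::ios_base::out | std::ios_base::binary));

std::ifstream stream("file.in", std::ios_base::in | std::ios_base::binary);
out << stream.rdbuf();

io::filtering_istream in;
in.push(my_decompressor());  // the asker's filter
in.push(io::file_source("file.out", std::ios_base::in | std::ios_base::binary));
Note also that std::getline() stops at every newline byte it finds in the decompressed data, so for a binary round-trip, reading fixed-size chunks with in.read() is a better fit than getline().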