I'm attempting to use zlib to uncompress (inflate) some IP packet payload data that is compressed with gzip. However, I'm having some difficulty understanding the zlib documentation that covers inflation. I have a char array that my program fills, but I can't seem to inflate it with the following code:
const u_char *payload; /*contains gzip data,
captured prior to this point in the program*/
/*read compressed contents*/
int ret; //return val
z_stream stream;
unsigned char out[MEM_CHUNK]; //output array, MEM_CHUNK defined as 65535
/* allocate inflate state */
stream.zalloc = Z_NULL;
stream.zfree = Z_NULL;
stream.opaque = Z_NULL;
stream.avail_in = size_payload; // size of input
stream.next_in = (Bytef *)payload; // input char array
stream.avail_out = (uInt)sizeof(out); // size of output
stream.next_out = (Bytef *)out; // output char array
ret = inflateInit(&stream);
inflate(&stream, Z_NO_FLUSH);
inflateEnd(&stream);
printf("Inflate: %s\n\n", out);
In the zlib documentation, inflate is called repeatedly in a do/while loop until it returns Z_STREAM_END. I'm a bit confused here, because they seem to be working from a file while I'm not. Do I need this loop as well, or can I provide a char array without looping over inflate?
Any guidance here would be much appreciated. I'm pretty new to both compression and C++.
Thanks.
Assuming you are giving inflate an appropriate and complete "compressed stream", and there is enough space to output the data, you would only need to call inflate once.
Edit: It is not written out as clearly as that in the zlib documentation, but it does say:
inflate decompresses as much data as possible, and stops when the input buffer becomes empty or the output buffer becomes full. It may introduce some output latency (reading input without producing any output) except when forced to flush.
Of course, for any stream that isn't already "in memory and complete", you want to run it block by block, since that's going to have less total runtime (you can decompress while the data is being received [from network or filesystem pre-fetch caching] for the next block).
Here's the whole inf() function from the zlib example code (zpipe.c) that you referred to. I've removed the explanatory text from the page to concentrate on the code, marked sections with letters (// A, // B, etc.), and tried to explain each section below.
int inf(FILE *source, FILE *dest)
{
int ret;
unsigned have;
z_stream strm;
unsigned char in[CHUNK]; // A
unsigned char out[CHUNK];
/* allocate inflate state */
strm.zalloc = Z_NULL; // B
strm.zfree = Z_NULL;
strm.opaque = Z_NULL;
strm.avail_in = 0;
strm.next_in = Z_NULL;
ret = inflateInit(&strm); // C
if (ret != Z_OK)
return ret;
/* decompress until deflate stream ends or end of file */
do {
strm.avail_in = fread(in, 1, CHUNK, source); // D
if (ferror(source)) {
(void)inflateEnd(&strm); // E
return Z_ERRNO;
}
if (strm.avail_in == 0) // F
break;
strm.next_in = in; // G
/* run inflate() on input until output buffer not full */
do {
strm.avail_out = CHUNK; // H
strm.next_out = out;
ret = inflate(&strm, Z_NO_FLUSH); // I
assert(ret != Z_STREAM_ERROR); /* state not clobbered */
switch (ret) {
case Z_NEED_DICT:
ret = Z_DATA_ERROR; /* and fall through */
case Z_DATA_ERROR:
case Z_MEM_ERROR:
(void)inflateEnd(&strm);
return ret;
}
have = CHUNK - strm.avail_out; // J
if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
(void)inflateEnd(&strm);
return Z_ERRNO;
}
} while (strm.avail_out == 0); // K
/* done when inflate() says it's done */
} while (ret != Z_STREAM_END); // L
/* clean up and return */
(void)inflateEnd(&strm);
return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}
A: in is the input buffer (we read from a file into this buffer, then pass it to inflate a little later). out is the output buffer, which inflate uses to store the decompressed data.
B: Set up a z_stream object called strm. Setting zalloc, zfree and opaque to Z_NULL tells zlib to use its default memory allocation routines. avail_in and next_in must also be initialized before inflateInit is called (here to "no input yet"). The other important fields, avail_out and next_out, are set later.
C: Start inflation process. This sets up some internal data structures and just makes the inflate function itself "ready to run".
D: Read a "CHUNK" amount of data from file. Store the number of bytes read in strm.avail_in, and the actual data goes into in.
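E: If reading failed, free the inflate state by calling inflateEnd and return Z_ERRNO. Job done.
F: No more input (end of file), so we stop. If inflate has not reported Z_STREAM_END by this point, the compressed data was incomplete, and the function returns Z_DATA_ERROR at the end.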
G: Set where our data is coming from (next_in is set to the input buffer, in).
H: We're now in the loop to inflate things. Here we set the output buffer up: next_out and avail_out indicate where the output goes and how much space there is, respectively.
I: Call inflate itself. It decompresses as much as it can, stopping when the input buffer is empty, the output buffer is full, or the end of the compressed stream is reached.
J: Calculate how much output inflate produced in this step (have is the number of bytes), and write it to dest.
K: Repeat while inflate filled the output buffer completely. If it finished with space left in out, it has used up all the input in the in buffer (rather than running out of space in out), so it is time to read more data from the input file.
L: Keep going until inflate returns Z_STREAM_END, which means the end of the compressed stream has been reached.
Now, obviously, if you are reading from a network and uncompressing into memory, you need to replace the fread and fwrite calls with a suitable read from the network and a memcpy-type copy instead. I can't tell you EXACTLY what those are, since you haven't said where your data comes from (are you calling recv, read, WSARecv, or something else?) or where it is going.
I am trying to use zlib to compress a text file. It seems to kind of work, except I'm pretty sure my calculation of the number of bytes to write to the output is wrong. My code (guided by http://zlib.net/zlib_how.html) is below:
int
deflateFile(
char *infile,
char *outfile)
{
#define CHUNKSIZE 1000
int n,nr,nw,towrite;
z_stream strm;
FILE *fin,*fout;
BYTE *inbuf,*outbuf;
int ntot=0;
printf( "Start doDeflateFile:\n" );
// ALLOC BUFFERS
inbuf = malloc( CHUNKSIZE+1 );
outbuf = malloc( CHUNKSIZE+1 );
// OPEN FILES
fin = fopen( infile, "rb" );
fout = fopen( outfile, "wb" );
// SETUP Z STREAM
strm.zalloc = Z_NULL;
strm.zfree = Z_NULL;
strm.opaque = Z_NULL;
strm.avail_in = CHUNKSIZE; // size of input
strm.next_in = inbuf; // input buffer
strm.avail_out = CHUNKSIZE; // size of output
strm.next_out = outbuf; // output buffer
deflateInit( &strm, Z_BEST_COMPRESSION ); // init stream level
while( TRUE ) { // loop til EOF on input file
// READ NEXT INPUT CHUNK
nr = fread( inbuf, 1, CHUNKSIZE, fin );
if( nr <= 0 ) {
printf( "End of input\n" );
break;
}
printf( "\nread chunk of %6d bytes\n", nr );
printf( "calling deflate...\n" );
n = deflate(&strm, Z_FINISH); // call ZLIB deflate
towrite = CHUNKSIZE - strm.avail_out; // calc # bytes to write (FIXME???)
printf( "#bytes to write %6d bytes\n", towrite );
nw = fwrite( outbuf, 1, towrite, fout );
if( nw != towrite ) break;
printf( "wrote chunk of %6d bytes\n", nw );
ntot += nw;
}
deflateEnd(&strm); // end deflate
printf( "wrote total of %d bytes\n", ntot );
printf( "End deflateFile.\n" );
return( 0 );
}
The output for a 1010-byte input file with a CHUNKSIZE of 1000 is:
Start deflateFile:
read chunk of 1000 bytes
calling deflate...
#bytes to write 200 bytes
wrote chunk of 200 bytes
read chunk of 10 bytes
calling deflate...
#bytes to write 200 bytes
wrote chunk of 200 bytes
End of input
wrote total of 400 bytes
End deflateFile.
Stack Overflow question #4538586 sort of addressed this, but not quite, and it's very old.
Can anybody point out my problem?
You should read that page again. Much more carefully this time.
You are not setting avail_in properly at the start, and you are not resetting next_in, avail_in, next_out, and avail_out in the loop. The only thing you are doing correctly is the thing you think is wrong, which is the calculation of how many bytes to write out. What you have will not even "kinda work".
First off, avail_in must always be set to the amount of input available at next_in (hence the name avail_in). You are setting it to CHUNKSIZE and calling deflateInit(), even though there is no input in that buffer yet.
Then after you read data into the input buffer, you ignore nr! You need to set avail_in to nr, to indicate how much data is actually in the buffer. It might be less than CHUNKSIZE.
You should read data into the input buffer only if you have processed all of the data that was there from the last read, indicated by avail_in being zero.
When a call of deflate() completes inside the loop, it has updated next_in, avail_in, next_out, and avail_out. To use the inbuf and outbuf buffers again, you need to reset next_in, next_out, and avail_out to their initial values. avail_in will be set at the top of the loop from nr.
You are calling deflate() with Z_FINISH every time! The way this works is that you call deflate() with Z_NO_FLUSH until the last of the input is provided, and then you use Z_FINISH, to let it know to finish. (That's why it's called that.)
Your loop will exit prematurely, since you need to finish compressing and writing the output, not just finish reading the input.
You are not checking the return code of deflate(). Always check return codes. That's why they're there.
Good luck.
What guarantees does zlib give on the state of avail_in and avail_out after a call to inflate? I am seeing peculiar behaviour with miniz and want to make sure it is not a misunderstanding of the zlib API on my part. Effectively, after calling inflate, both avail_in and avail_out are non-zero, so some input appears not to be processed. More details below.
I have been using miniz to inflate/deflate a file I stream to/from disk. My inflate/deflate loop is identical to the zlib sample in zpipe.c, including using MZ_NO_FLUSH.
This loop has almost always worked, but today I inflated a stream deflated earlier and got an MZ_DATA_ERROR consistently. After adding the proper header though, gzip was able to inflate it fine and my data was intact.
The source of my issues came down to what would be the last call to mz_inflate. I include the typical inflate loop here:
/* decompress until deflate stream ends or end of file */
do {
strm.avail_in = fread(in, 1, CHUNK, source);
if (ferror(source)) {
(void)inflateEnd(&strm);
return Z_ERRNO;
}
if (strm.avail_in == 0)
break;
strm.next_in = in;
/* run inflate() on input until output buffer not full */
do {
strm.avail_out = CHUNK;
strm.next_out = out;
ret = inflate(&strm, Z_NO_FLUSH);
assert(ret != Z_STREAM_ERROR); /* state not clobbered */
switch (ret) {
case Z_NEED_DICT:
ret = Z_DATA_ERROR; /* and fall through */
case Z_DATA_ERROR:
case Z_MEM_ERROR:
(void)inflateEnd(&strm);
return ret;
}
have = CHUNK - strm.avail_out;
if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
(void)inflateEnd(&strm);
return Z_ERRNO;
}
} while (strm.avail_out == 0);
/* done when inflate() says it's done */
} while (ret != Z_STREAM_END);
The inner do loop repeats until all of the current chunk has been processed and avail_out has extra room. However, on the last chunk of this particular stream, inflate did not return an error, but rather would reduce avail_in to some non-zero number, and would reduce avail_out also to some (other) non-zero number. So the inner do loop exits, as avail_out is non-zero, and we go try and get more data into next_in and avail_in, even though not all of avail_in has been processed, since avail_in is non-zero. This clobbers whatever was in next_in and avail_in and the inflate fails on the next call.
My workaround was to change the inner loop's termination condition from
strm.avail_out == 0
to
strm.avail_out == 0 || strm.avail_in > 0
but I have no idea if this is correct. I feel this may be a bug in miniz but am not sure. I would have thought that if avail_in indicated there was still data to be processed, that avail_out must be zero.
In case it is relevant: the input buffer size I am using is 512KB and the output buffer is 2MB.
If inflate() returns Z_OK or Z_BUF_ERROR, and avail_out is not zero, then avail_in is zero.
Can you provide the compressed data in question?
I'm trying to use zlib to inflate (decompress) .FLA files, extracting all of their contents. Since FLA files use the ZIP format, I am able to read the local file headers (https://en.wikipedia.org/wiki/Zip_(file_format)) and use the information in them to decompress the files.
It seems to work fine for regular text-based files, but when it comes to binary (I've only tried PNG and DAT files), it fails to decompress them, returning "Z_DATA_ERROR".
I'm unable to use the minizip library from zlib's contrib directory, since the central directory file header inside FLA files differs slightly from normal ZIP files (which is why I'm reading the local file headers manually).
Here's the code I use to decompress a chunk of data:
void DecompressBuffer(char* compressedBuffer, unsigned int compressedSize, std::string& out_decompressedBuffer)
{
// init the decompression stream
z_stream stream;
stream.zalloc = Z_NULL;
stream.zfree = Z_NULL;
stream.opaque = Z_NULL;
stream.avail_in = 0;
stream.next_in = Z_NULL;
int err = inflateInit2(&stream, -MAX_WBITS);
if (err != Z_OK)
{
printf("Error: inflateInit %d\n", err);
return;
}
// Set the starting point and total data size to be read
stream.avail_in = compressedSize;
stream.next_in = (Bytef*)&compressedBuffer[0];
std::stringstream strStream;
// Start decompressing
while (stream.avail_in != 0)
{
unsigned char* readBuffer = (unsigned char*)malloc(MAX_READ_BUFFER_SIZE + 1);
readBuffer[MAX_READ_BUFFER_SIZE] = '\0';
stream.next_out = readBuffer;
stream.avail_out = MAX_READ_BUFFER_SIZE;
int ret = inflate(&stream, Z_NO_FLUSH);
if (ret == Z_STREAM_END)
{
// only store the data we have left in the stream
size_t length = MAX_READ_BUFFER_SIZE - stream.avail_out;
std::string str((char*)readBuffer);
str = str.substr(0, length);
strStream << str;
break;
}
else
{
if (ret != Z_OK)
{
printf("Error: inflate %d\n", ret); // This is what it reaches when trying to inflate a PNG or DAT file
break;
}
// store the readbuffer in the stream
strStream << readBuffer;
}
free(readBuffer);
}
out_decompressedBuffer = strStream.str();
inflateEnd(&stream);
}
I have tried zipping a single PNG file and extracting that. This doesn't return any errors from inflate(), but doesn't correctly inflate the PNG either; only the first few bytes match.
The original file (left) and the file decompressed by the code (right), viewed in a hex editor (image not included).
You do things that rely on the data being text and strings, not binary data.
For example
std::string str((char*)readBuffer);
If the contents of readBuffer is raw binary data then it might contain one or more zero bytes in the middle of it. When you use it as a C-style string then the first zero will act as the string terminator character.
I suggest you try to generalize it and remove the dependency on strings. Instead, I suggest you use e.g. std::vector<int8_t>.
Meanwhile, during your transition to a more generalized way, you can do e.g.
std::string str(reinterpret_cast<char*>(readBuffer), length);
This will create a string of the specified length, and the contents will not be checked for terminators.
I am trying to implement a simple file transfer. Below are two methods that I have been testing:
Method one: sending and receiving without splitting the file.
I hard-coded the file size for easier testing.
sender:
send(sock,buffer,107,NULL); //sends a file with 107 size
receiver:
char * buffer = new char[107];
recv(sock_CONNECTION,buffer,107,0);
std::ofstream outfile (collector,std::ofstream::binary);
outfile.write (buffer,107);
The output is as expected: the file isn't corrupted, and the received .txt file has the same content as the original.
Method two: sending and receiving, splitting the contents on the receiver's side into 5 bytes per loop iteration.
sender:
send(sock,buffer,107,NULL);
Receiver:
char * buffer = new char[107]; //total file buffer
char * ptr = new char[5]; //buffer
int var = 5;
int sizecpy = size; //orig size
while(size > var ){ //collect bytes
recv(sock_CONNECTION,ptr,5,0);
strcat(buffer,ptr); //concatenate
size= size-var; //decrease
std::cout<<"Transferring.."<<std::endl;
}
std::cout<<"did it reach here?"<<std::endl;
char*last = new char[size];
recv(sock_CONNECTION,last,2,0); //last two bytes
strcat(buffer,last);
std::ofstream outfile (collector,std::ofstream::binary);
outfile.write (buffer,107);
Output: The text file contains invalid characters, especially at the beginning and the end.
Questions: How can I make method 2 work? The sizes are the same, but they yield different results: with method 2 the new file is about 98-99% identical to the original, while with method 1 it's 100%. What's the best method for transferring files?
What's the best method for transferring files?
I don't usually answer "What's the best method" questions, but in this case it's obvious:
Send the file size and a checksum, in network byte order, when starting a transfer.
Optionally, send more header data (e.g. the filename).
The client reads the file size and the checksum and converts them to host byte order.
Send the file's data in reasonably sized chunks (5 bytes isn't a reasonable size); chunks should roughly match the maximum payload size of a TCP/IP frame.
Receive chunk by chunk on the client side until the previously sent file size is reached.
Calculate the checksum of the received data on the client side and check that it matches the one received beforehand.
Note: You don't need to combine all chunks in memory on the client side; just append them to a file on the storage medium. The checksum (e.g. a CRC) can usually be calculated incrementally, one chunk at a time.
I disagree with Galik. Better not to use strcat, strncat, or any buffer other than the intended output buffer.
TCP is kinda fun. You never really know how much data a single recv call will return, but you will eventually get all of it or an error.
This will read up to MAX bytes at a time. #define MAX to whatever you want.
std::unique_ptr<char[]> buffer (new char[size]);
int loc = 0; // where in buffer to write the next batch of data (= bytes received so far)
int bytesread; //how much data was read? recv returns -1 on error, 0 if the peer closed the connection
while(size > 0)
{ //collect bytes, at most MAX per call
bytesread = recv(sock_CONNECTION,&buffer[loc],size > MAX ? MAX : size,0);
if (bytesread <= 0)
{
//handle error or early close (e.g. report it), then stop
break;
}
loc += bytesread;
size -= bytesread; //decrease
std::cout<<"Transferring.."<<std::endl;
}
std::ofstream outfile (collector,std::ofstream::binary);
outfile.write (buffer.get(),loc); // write what was actually received; size has been counted down
Even more fun: write directly to the output file so you don't have to store the whole file in memory. In this case MAX should be a bigger number.
std::ofstream outfile (collector,std::ofstream::binary);
char buffer[MAX];
int bytesread; //how much data was read? recv will return -1 on error
while(size)
{ //collect bytes
bytesread = recv(sock_CONNECTION,buffer,MAX>size?size:MAX,0);
// MAX>size?size:MAX is like a compact if-else: if (MAX>size){size}else{MAX}
if (bytesread <= 0)
{
//handle error (-1) or early close (0), then stop; otherwise size never reaches 0
break;
}
outfile.write (buffer,bytesread);
size -= bytesread; //decrease
std::cout<<"Transferring.."<<std::endl;
}
The initial problems I see are with std::strcat. You can't use it on an uninitialized buffer. Also, you are not copying a null-terminated C string; you are copying a sized buffer. Better to use std::strncat for that:
char * buffer = new char[107]; //total file buffer
char * ptr = new char[5]; //buffer
int var = 5;
int sizecpy = size; //orig size
// initialize buffer
*buffer = '\0'; // add null terminator
while(size > var ){ //collect bytes
recv(sock_CONNECTION,ptr,5,0);
strncat(buffer, ptr, 5); // strncat only 5 chars
size= size-var; //decrease
std::cout<<"Transferring.."<<std::endl;
}
Beyond that, you should really add error checking so the sockets library can tell you if anything went wrong with the communication.
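Note that strncat still treats the data as a C string: it stops at the first zero byte and always appends a terminator, so binary files can still be corrupted. A binary-safe alternative is to copy by byte count and check every recv() result. A minimal sketch, assuming POSIX sockets (on Winsock the socket type and error handling differ); recv_exact is a hypothetical name:

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: receive exactly `len` bytes into `buf`, copying by byte
// count (binary-safe, unlike strcat/strncat) and checking every recv() result.
// Returns true on success, false on error or if the peer closed early.
bool recv_exact(int sock, char *buf, std::size_t len) {
    std::size_t got = 0;
    while (got < len) {
        ssize_t n = recv(sock, buf + got, len - got, 0);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // real socket error
        }
        if (n == 0) return false;          // connection closed before len bytes
        got += static_cast<std::size_t>(n);
    }
    return true;
}
```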
I'm developing some code that needs to be capable of unzipping large gzip'd files (up to 5GB uncompressed) and reading them into memory. I would prefer to be clean about this and not simply unzip them to disk temporarily so I've been working with zlib to try to accomplish this. I've got it running, most of the way. Meaning it runs for 4 of the 5 files I've used as input. The other file gives a Z_BUF_ERROR right in the middle of processing and I'd prefer not to ignore it.
This initially happened in different code but eventually I brought it all the way back to the example code that I got from zpipe.c on the zlib web page, and no matter what code I used, it resulted in the same Z_BUF_ERROR and only with this file. I played with the code for quite a while after reading several posts about Z_BUF_ERROR and after reading the manual on this as well. Eventually I was able to find a way to make it work by changing the size of the buffer used to hold the inflated output. Normally at this point I'd call it a day until it reported an error with another file, but ideally this will be production level code at some point and I'd like to understand what the error is so I can prevent it rather than just fix it for now. Especially since gzip is able to compress and decompress the file just fine.
I've tried this with the following variations:
different platforms: CentOS, OSX
different versions of zlib: 1.2.3, 1.2.8 (same results)
values of CHUNK and the number of bytes output (complete is 783049330):
2000000: 783049330
1048576: 783049330
1000000: 783049330
100000: 783049330
30000: 248421347
25000: 31095404
20000: 783049330
19000: 155821787
18000: 412613687
17000: 55799133
16384: 37541674
16000: 783049330
any CHUNK size greater than 4100000 gives an error
tried declaring out with a value greater than CHUNK (same results)
tried using malloc to declare out (same results)
tried using gzip to uncompress and then compress the file again thinking something may have been off in the gzip metadata (same results)
tried compressing a separate uncompressed version of the file using gzip for the same purpose, but I believe the original .gz file was created from this one (same results)
I may have tried a few things outside of this list as I've been trying to get to the bottom of it for a while, but only changing the CHUNK size will make this work. My only concern is that I don't know why a different size will work and I'm worried that another CHUNK size will put other files at risk for this issue, because again, this is only an issue for one file.
CODE:
FILE* fp = fopen( argv[1], "rb" );
int ret = inf( fp, stdout );
fclose( fp );
int inf(FILE *source, FILE *dest)
{
size_t CHUNK = 100000;
int count = 0;
int ret;
unsigned have;
z_stream strm;
unsigned char in[CHUNK];
unsigned char out[CHUNK];
char out_str[CHUNK];
/* allocate inflate state */
strm.zalloc = Z_NULL;
strm.zfree = Z_NULL;
strm.opaque = Z_NULL;
strm.avail_in = 0;
strm.next_in = Z_NULL;
ret = inflateInit2(&strm, 16+MAX_WBITS);
if (ret != Z_OK)
return ret;
/* decompress until deflate stream ends or end of file */
do {
strm.avail_in = fread(in, 1, CHUNK, source);
if (ferror(source)) {
(void)inflateEnd(&strm);
return Z_ERRNO;
}
if (strm.avail_in == 0)
break;
strm.next_in = in;
/* run inflate() on input until output buffer not full */
do {
strm.avail_out = CHUNK;
strm.next_out = out;
ret = inflate(&strm, Z_NO_FLUSH);
switch (ret) {
case Z_NEED_DICT:
ret = Z_DATA_ERROR; /* and fall through */
case Z_DATA_ERROR:
case Z_MEM_ERROR:
(void)inflateEnd(&strm);
return ret;
}
have = CHUNK - strm.avail_out;
char out_str[have+1];
strncpy( out_str, (char*)out, have );
out_str[have] = '\0';
// testing the ability to store the result in a string object and viewing the output
std::cout << "out_str: " << std::string(out_str) << " ::" << std::endl;
if( ret == Z_BUF_ERROR ){
std::cout << "Z_BUF_ERROR!" << std::endl;
exit(1);
}
} while (strm.avail_out == 0);
/* done when inflate() says it's done */
} while (ret != Z_STREAM_END);
/* clean up and return */
(void)inflateEnd(&strm);
return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}
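You should read the commentary where you got that code from. Z_BUF_ERROR is just an indication that there was nothing for inflate() to do on that call. It typically happens when the previous call filled the output buffer exactly and also consumed all of the input, so the next call has no input to work with. Whether that lines up depends on where the buffer boundaries happen to fall, which is why it shows up only for some CHUNK sizes and some files. Simply continue and provide more input data and more output space for the next inflate() call.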