Why does my use of zlib decompress incorrectly? - c++

Please explain whether this is a zlib bug or whether I misunderstand how to use zlib.
I am trying to do the following:
- I have two strings, string_data_1 and string_data_2, which I compress separately with zlib as raw deflate data.
- Next, I create a third string and copy both compressed results into it, one after the other.
- Then I decompress this combined compressed data, and there is a problem.
Zlib decompressed only the "first" part of the compressed data and did not decompress the second part. Is that how it should be?
For example, with the Zstandard library (facebook/zstd), exactly the same procedure decompresses all of the compressed data, both the first and the second part.
Here is a simple code:
#include <iostream>
#include <string>
#include <zlib.h>
int my_Zlib__compress__RAW(std::string& string_data_to_be_compressed, std::string& string_compressed_result, int level_compressed)
{
//-------------------------------------------------------------------------
uLong zlib_uLong = compressBound(string_data_to_be_compressed.size());
string_compressed_result.resize(zlib_uLong);
//-------------------------------------------------------------------------
//this is the standard Zlib compress2 function - with one exception: the deflateInit2 function is used instead of the deflateInit function and the windowBits parameter is set to "-15" so that Zlib compresses the data as raw data:
int status = my_compress2((Bytef*)&string_compressed_result[0], &zlib_uLong, (const Bytef*)&string_data_to_be_compressed[0], string_data_to_be_compressed.size(), level_compressed);
if (status == Z_OK)
{
string_compressed_result.resize(zlib_uLong);
return 0;
}
else
{
return 1;
}
}
int my_Zlib__uncompress__RAW(std::string& string_data_to_be_uncompressed, std::string& string_compressed_data, size_t size_uncompressed_data)
{
//-------------------------------------------------------------------------
string_data_to_be_uncompressed.resize(size_uncompressed_data);
//-------------------------------------------------------------------------
//this is the standard Zlib uncompress function - with one exception: the inflateInit2 function is used instead of the inflateInit function and the windowBits parameter is set to "-15" so that Zlib uncompresses the data as raw data:
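uLongf zlib_size = static_cast<uLongf>(size_uncompressed_data); //use a real uLongf instead of casting a size_t* to uLongf*
int status = my_uncompress((Bytef*)&string_data_to_be_uncompressed[0], &zlib_size, (const Bytef*)&string_compressed_data[0], string_compressed_data.size());
if (status == Z_OK)
{
return 0;
}
return 1; //report failure; falling off the end of a non-void function is undefined behavior
}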
int main()
{
int level_compressed = 9;
//------------------------------------------Compress_1-------------------------------------------
std::string string_data_1 = "Hello12_Hello12_Hello125"; //The data to be compressed.
std::string string_compressed_result_RAW_1; //Compressed data will be written here
int status = my_Zlib__compress__RAW(string_data_1 , string_compressed_result_RAW_1, level_compressed);
//----------------------------------------------------------------------------------------------
//--------------------------------------Compress_2----------------------------------------------
std::string string_data_2= "BUY22_BUY22_BUY223"; //The data to be compressed.
std::string string_compressed_result_RAW_2; //Compressed data will be written here
status = my_Zlib__compress__RAW(string_data_2 , string_compressed_result_RAW_2, level_compressed);
//----------------------------------------------------------------------------------------------
std::string Total_compressed_data = string_compressed_result_RAW_1 + string_compressed_result_RAW_2; //Combine two compressed data into one string
//Now I want to uncompress the data in a string - "Total_compressed_data"
//--------------------------------------Uncompress--------------------------------
std::string string_uncompressed_result_RAW; //Uncompressed data will be written here
size_t size_that_should_be_when_unpacking = string_data_1.size() + string_data_2.size();
status = my_Zlib__uncompress__RAW(string_uncompressed_result_RAW, Total_compressed_data, size_that_should_be_when_unpacking); //the function takes three arguments; no compression level is needed to decompress
//--------------------------------------------------------------------------------
std::cout<<string_uncompressed_result_RAW<<std::endl; //Hello12_Hello12_Hello125
}
Zlib decompressed only the "first" part of the compressed data, did not decompress the "second" part.
Is that how it should be?

As noted in the comments, a concatenation of zlib streams is not a zlib stream. You need to uncompress again for the second zlib stream. Or compress the whole thing as one zlib stream in the first place.
You would need to use a variant of uncompress2(), not uncompress(), since the former will return the size of the first decompressed zlib stream in the last parameter, so that you know where to start decompressing the second one.
Better yet, you should use the inflate() functions instead for your application. The retention of the uncompressed size for use in decompression means that you'd need that on the other end. How do you get that? Are you transmitting it separately? You do not need that. You should use inflate() to decompress a chunk at a time, and then you don't need to know the uncompressed size ahead of time.
You should also use the deflate() functions for compression. Then you can keep the stream open, and keep compressing until you're done. Then you will have a single zlib stream.

Related

Streaming decompression of data in a vector

I need to do the following procedure.
Compress an input text into an array.
Split the compressed output into multiple pieces with approx. the same length and store them in a vector.
Decompress using streaming decompression.
Is it possible to do that?
Consider that in my case, the size of each block is fixed and independent of the compression scheme.
In this example here, the decompression function returns the size of the next block, I wonder if that is somewhat related to the compression scheme, i.e. you cannot randomly take a sub-array contained in the full compressed array and decompress it.
I need to use zstd, no other compression algorithms.
Here is what I tried so far.
//std::vector<std::string_view> _content_compressed passed as parameter
ZSTD_DStream* const dstream = ZSTD_createDStream();
ZSTD_initDStream(dstream);
std::vector<char*> vec;
for (auto el : _content_compressed)
{
auto ee = el.data();
char* decompressed = new char[1000];
ZSTD_inBuffer input = { el.data(), el.size(), 0 };
ZSTD_outBuffer output = { decompressed, _decompressed_size, 0 };
std::size_t toRead = ZSTD_decompressStream(dstream, &output, &input);
vec.push_back(decompressed);
}
The problem is that decompressed doesn't contain the decompressed value at the end.

adapt zlib zpipe for char arrays

I have a char array (char* dataToInflate) obtained from a .gz file I would like to inflate into another char array.
I don't know the original decompressed size, so I believe this means I can't use the uncompress function that is within the zlib library, since per the manual:
The size of the uncompressed data must have been saved previously by the compressor and transmitted to the decompressor by some mechanism outside the scope of this compression library.
I have looked at the zpipe.c example (https://zlib.net/zpipe.c), and the inf function there looks suitable, but I'm not sure how to adapt it from FILEs to char arrays.
Does anyone know how or have any other ideas for inflating a char array into another char array?
Update:
I read here: Uncompress() of 'zlib' returns Z_DATA_ERROR
that for arrays obtained through gzip files, uncompress isn't suitable.
I found that I could decompress the file in full using gzopen, gzread and gzclose like so:
gzFile in_file_gz = gzopen(gz_char_array, "rb");
char unzip_buffer[8192];
int unzipped_bytes;
std::vector<char> unzipped_data;
while (true) {
unzipped_bytes = gzread(in_file_gz, unzip_buffer, 8192);
if (unzipped_bytes > 0) {
unzipped_data.insert(unzipped_data.end(), unzip_buffer, unzip_buffer + unzipped_bytes);
} else {
break;
}
}
gzclose(in_file_gz);
but I would also like to be able to decompress the char array. I tried with the following method:
void test_inflate(Byte *compr, uLong comprLen, Byte *uncompr, uLong *uncomprLen) {
int err;
z_stream d_stream; /* decompression stream */
d_stream.zalloc = NULL;
d_stream.zfree = NULL;
d_stream.opaque = NULL;
d_stream.next_in = compr;
d_stream.avail_in = 0;
d_stream.next_out = uncompr;
err = inflateInit2(&d_stream, MAX_WBITS + 16);
CHECK_ERR(err, "inflateInit");
while (d_stream.total_out < *uncomprLen && d_stream.total_in < comprLen) {
d_stream.avail_in = d_stream.avail_out = 1; /* force small buffers */
err = inflate(&d_stream, Z_NO_FLUSH);
if (err == Z_STREAM_END)
break;
CHECK_ERR(err, "inflate");
}
err = inflateEnd(&d_stream);
*uncomprLen = d_stream.total_out;
}
but in the while loop, the inflate method returns Z_STREAM_END before the file has decompressed in full.
The method returns successfully, but only a partial buffer has been written.
I put a minimum working example here:
https://github.com/alanjtaylor/zlibExample
if anyone has time to look.
Thanks a lot!
The example you have on github, "zippedFile.gz" is a concatenation of seven independent gzip members. This is permitted by the gzip standard (RFC 1952), and the zlib gz* file functions automatically process all of the members.
pigz will show all of the members:
% pigz -lvt zippedFile.gz
method    check    timestamp    compressed   original reduced  name
gzip  8  e323586d  ------ -----     616431    1543643   60.1%  zippedFile
gzip  8  7efd928a  ------ -----     369231     921600   59.9%  <...>
gzip  8  7ebd8b2a  ------ -----     919565    2319970   60.4%  <...>
gzip  8  3dd6e2ba  ------ -----     619670    1549236   60.0%  <...>
gzip  8  c1cb922e  ------ -----     600367    1533151   60.8%  <...>
gzip  8  a9fef06c  ------ -----     620250    1541785   59.8%  <...>
gzip  8  43b57506  ------ -----     623081    1555203   59.9%  <...>
The inflate* functions will only process one member at a time, in order to let you know with Z_STREAM_END that the member decompressed successfully and that the CRC checked out ok.
All you need to do is put your inflator in a loop and run it until the input is exhausted, or you run into an error. (This is noted in the documentation for inflateInit2 in zlib.h.)
There are a few issues with your inflator, but I understand that it is just an initial attempt to get things working, so I won't comment.
uncompress is indeed designed for the case where you have all that information ready. It's a utility function.
It wraps inflate, which is what you want to use. You have to run it in a loop and manage the stream fields (next_in, avail_in, next_out, avail_out) yourself, repeatedly pointing to the next chunk of buffered data until it has all been consumed.
There's an annotated example in the documentation.

Reading multiple bytes from file and storing them for comparison in C++

I want to read a photo in binary mode in 1460-byte increments and compare consecutive packets to detect corrupted transmission. I have a Python script that I wrote and want to translate it to C++; however, I'm not sure that what I intend to use is correct.
for i in range(0, fileSize-1):
    buff=f.read(1460) # buff stores a packet of 1460 bytes where f is the opened file
    secondPacket=''
    for j in buff:
        secondPacket+="{:02x}".format(j)
    if(secondPacket==firstPacket):
        print(f'Packet {i+1} identical with {i}')
    firstPacket=secondPacket
I have found int fseek ( FILE * stream, long int offset, int origin ); but it's unclear if it reads the first byte that is located offset away from origin or everything in between.
Thanks for clarifications.
#include <iostream>
#include <fstream>
#include <array>
std::array<char, 1460> firstPacket;
std::array<char, 1460> secondPacket;
int i=0;
int main() {
std::ifstream file;
file.open("photo.jpg", std::ios::binary);
while (file.read(firstPacket.data(), firstPacket.size())){
++i;
if (firstPacket==secondPacket)
std::cout<<"Packet "<<i<<" is a copy of packet "<<i-1<<std::endl;
memcpy(&secondPacket, &firstPacket, firstPacket.size());
}
std::cout<<i; //tested to check if i iterate correctly
return 0;
}
This is the code I have so far, which doesn't work.
fseek doesn't read; it just moves the position where the next read operation will begin. If you read the file from start to end you don't need it.
To read binary data you want the aptly named std::istream::read. You can use it like this with a fixed-size buffer:
// char is one byte, could also be uint8_t, but then you would need a cast later on
std::array<char, 1460> bytes;
while(myInputStream.read(bytes.data(), bytes.size())) {
// do work with the read data
}

Binary Files in C++, changing the content of raw data on an audio file

I have never worked with binary files before. I opened an .mp3 file using the mode ios::binary, read data from it, assigned 0 to each byte read and then rewrote them to another file opened in ios::binary mode. I opened the output file on a media player, it sounds corrupted but I can still hear the song. I want to know what happened physically.
How can I access/modify the raw data ( bytes ) of an audio ( video, images, ... ) using C++ ( to practice file encryption/decryption later )?
Here is my code:
#include <iostream>
#include <fstream>
#include <cstring>
using namespace std;
int main(){
char buffer[256];
ifstream inFile;
inFile.open("Backstreet Boys - Incomplete.mp3",ios::binary);
ofstream outFile;
outFile.open("Output.mp3",ios::binary);
while(!inFile.eof()){
inFile.read(buffer,256);
for(int i = 0; i<strlen(buffer); i++){
buffer[i] = 0;
}
outFile.write(buffer,256);
}
inFile.close();
outFile.close();
}
What you did has nothing to do with binary files or audio. You simply copied the file while zeroing some of the bytes. (The reason you didn't zero all of the bytes is because you use i<strlen(buffer), which simply counts up to the first zero byte rather than reporting the size of the buffer. Also you modify the buffer which means strlen(buffer) will report the length as zero after you zero the first byte.)
So the exact change in audio you get is entirely dependent on the mp3 file format and the audio compression it uses. MP3 is not an audio format that can be directly manipulated in useful ways.
If you want to manipulate digital audio, you need to learn about how raw audio is represented by computers.
It's actually not too difficult. For example, here's a program that writes out a raw audio file containing just a 400Hz tone.
#include <cmath>
#include <fstream>
#include <limits>
int main() {
const double pi = 3.1415926535;
double tone_frequency = 400.0;
int samples_per_second = 44100;
double output_duration_seconds = 5.0;
int output_sample_count =
static_cast<int>(output_duration_seconds * samples_per_second);
std::ofstream out("signed-16-bit_mono-channel_44.1kHz-sample-rate.raw",
std::ios::binary);
for (int sample_i = 0; sample_i < output_sample_count; ++sample_i) {
double t = sample_i / static_cast<double>(samples_per_second);
double sound_amplitude = std::sin(t * 2 * pi * tone_frequency);
// encode amplitude as a 16-bit, signed integral value
short sample_value =
static_cast<short>(sound_amplitude * std::numeric_limits<short>::max());
out.write(reinterpret_cast<char const *>(&sample_value),
sizeof sample_value);
}
}
To play the sound you need a program that can handle raw audio, such as Audacity. After running the program to generate the audio file, you can File > Import > Raw data..., to import the data for playing.
How can I access/modify the raw data ( bytes ) of an audio ( video, images, ... ) using C++ ( to practice file encryption/decryption later )?
As pointed out earlier, the reason your existing code is not completely zeroing out the data is because you are using an incorrect buffer size: strlen(buffer). The correct size is the number of bytes read() put into the buffer, which you can get with the function gcount():
inFile.read(buffer,256);
int buffer_size = inFile.gcount();
for(int i = 0; i < buffer_size; i++){
buffer[i] = 0;
}
outFile.write(buffer, buffer_size);
Note: if you were to step through your program using a debugger you probably would have pretty quickly seen the problem yourself when you noticed the inner loop executing less than you expected. Debuggers are a really handy tool to learn how to use.
I notice you're using open() and close() methods here. This is sort of pointless in this program. Just open the file in the constructor, and allow the file to be automatically closed when inFile and outFile go out of scope:
{
ifstream inFile("Backstreet Boys - Incomplete.mp3",ios::binary);
ofstream outFile("Output.mp3",ios::binary);
// don't bother calling .close(), it happens automatically.
}

Uncompressing files with Qt qUnCompress function

I have read the documentation and posts about uncompressing ZIP files, but I have additional questions.
I need to uncompress a zip file in Qt. It is an XML file compressed with gzip.
I know that qUncompress can uncompress data prepared with zlib, and that zlib uses a different header than gzip.
As I read in the documentation:
Note: If you want to use this function to uncompress external data that was compressed using zlib, you first need to prepend a four byte header to the byte array containing the data. The header must contain the expected length (in bytes) of the uncompressed data, expressed as an unsigned, big-endian, 32-bit integer.
Does that mean I only have to put the length (big-endian) at the beginning, followed by the compressed data?
I did that, but I get an error from the qUncompress function:
qUncompress: Z_DATA_ERROR: Input data is corrupted
You need to write your own gUncompress() function using either zlib or some other library that implements the DEFLATE algorithm. I personally prefer miniz:
http://code.google.com/p/miniz/
Here's some code for you:
#include <stdexcept>
#include <QtCore>
#ifndef TINFL_HEADER_FILE_ONLY
# define TINFL_HEADER_FILE_ONLY
#endif // TINFL_HEADER_FILE_ONLY
extern "C" {
# include "tinfl.h"
}
#include "guncompress.hpp"
static tinfl_decompressor inflator;
static QByteArray result(TINFL_LZ_DICT_SIZE, 0);
//////////////////////////////////////////////////////////////////////////////
QByteArray gUncompress(QByteArray const& data)
{
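// Skip the fixed 10-byte gzip header; this assumes the header has no optional fields (FLG == 0).
mz_uint8 const* inPtr(reinterpret_cast<mz_uint8 const*>(data.data()) + 10);
tinfl_init(&inflator);
size_t inAvail(data.size() - 10); // the skipped header bytes are no longer available as input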
size_t outTotal(0);
tinfl_status ret;
do
{
size_t inSize(inAvail);
size_t outSize(result.size() - outTotal);
ret = tinfl_decompress(&inflator,
inPtr,
&inSize,
reinterpret_cast<mz_uint8*>(result.data()),
reinterpret_cast<mz_uint8*>(result.data()) + outTotal,
&outSize,
0
);
switch (ret)
{
case TINFL_STATUS_HAS_MORE_OUTPUT:
inAvail -= inSize;
inPtr += inSize;
result.resize(2 * result.size());
case TINFL_STATUS_DONE:
outTotal += outSize;
break;
default:
throw std::runtime_error("error decompressing gzipped content");
}
}
while (TINFL_STATUS_DONE != ret);
return QByteArray::fromRawData(result.data(), outTotal);
}
Also note that zip files and gzip files do not share the same format. Zip files need to be handled differently, as they contain a directory of files they contain.
Look for qzip.cpp, qzipreader_p.h, qzipwriter_p.h in the source for Qt. It can be used for reading and writing zip files.
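Since zlib, gzip and zip data are easy to confuse, it can help to check the first bytes before choosing a decoder. The sketch below is only an illustration (the names ContainerKind and sniff_container are made up). It relies on the gzip magic bytes 0x1f 0x8b from RFC 1952, the zip local file header signature "PK\x03\x04", and the zlib header check from RFC 1950. It is a heuristic: an empty zip archive starts with a different signature, and arbitrary data can pass the two-byte zlib check by chance.

```cpp
#include <cstddef>

enum class ContainerKind { Gzip, Zip, Zlib, Unknown };

// Guess the container format from the leading bytes of a buffer.
ContainerKind sniff_container(const unsigned char* p, std::size_t n)
{
    // gzip: ID1 = 0x1f, ID2 = 0x8b
    if (n >= 2 && p[0] == 0x1f && p[1] == 0x8b)
        return ContainerKind::Gzip;
    // zip: local file header signature "PK\x03\x04"
    if (n >= 4 && p[0] == 'P' && p[1] == 'K' && p[2] == 0x03 && p[3] == 0x04)
        return ContainerKind::Zip;
    // zlib: compression method 8 in the low nibble of the first byte, and the
    // first two bytes read as a big-endian number are a multiple of 31
    if (n >= 2 && (p[0] & 0x0f) == 8 && ((p[0] << 8) | p[1]) % 31 == 0)
        return ContainerKind::Zlib;
    return ContainerKind::Unknown;
}
```

Data that comes back as Gzip here is the kind that the zlib-oriented qUncompress rejects with Z_DATA_ERROR, regardless of any length prefix.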