Uncompress data in memory using Boost gzip_decompressor - c++

I'm trying to decompress binary data in memory using Boost gzip_decompressor. From this answer, I adapted the following code:
vector<char> unzip(const vector<char> compressed)
{
vector<char> decompressed = vector<char>();
boost::iostreams::filtering_ostream os;
os.push(boost::iostreams::gzip_decompressor());
os.push(boost::iostreams::back_inserter(decompressed));
boost::iostreams::write(os, &compressed[0], compressed.size());
return decompressed;
}
However, the returned vector has zero length. What am I doing wrong? I tried calling flush() on the os stream, but it made no difference.

Your code works for me with this simple test program:
#include <iostream>
#include <vector>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
std::vector<char> unzip(const std::vector<char> compressed)
{
std::vector<char> decompressed = std::vector<char>();
boost::iostreams::filtering_ostream os;
os.push(boost::iostreams::gzip_decompressor());
os.push(boost::iostreams::back_inserter(decompressed));
boost::iostreams::write(os, &compressed[0], compressed.size());
return decompressed;
}
int main() {
std::vector<char> compressed;
{
boost::iostreams::filtering_ostream os;
os.push(boost::iostreams::gzip_compressor());
os.push(boost::iostreams::back_inserter(compressed));
os << "hello\n";
os.reset();
}
std::cout << "Compressed size: " << compressed.size() << '\n';
const std::vector<char> decompressed = unzip(compressed);
std::cout << std::string(decompressed.begin(), decompressed.end());
return 0;
}
Are you sure your input was compressed with gzip and not some other method (e.g. raw deflate)? gzip compressed data begins with bytes 1f 8b.
I generally use reset() or put the stream and filters in their own block to make sure that output is complete. I did both for compression above, just as an example.
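The 1f 8b check mentioned above can be sketched with the standard library alone. Note that looks_like_gzip is a hypothetical helper name, not a Boost function, and it only inspects the two magic bytes; it does not validate the rest of the stream.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Boost): returns true if the buffer
// starts with the gzip magic bytes 0x1f 0x8b.
bool looks_like_gzip(const std::vector<char>& data)
{
    if (data.size() < 2) return false;
    // char may be signed, so compare the bytes as unsigned char.
    return static_cast<unsigned char>(data[0]) == 0x1f &&
           static_cast<unsigned char>(data[1]) == 0x8b;
}

// Usage example: a gzip-style header passes, plain text does not.
inline void example_usage()
{
    const std::vector<char> gz = {static_cast<char>(0x1f), static_cast<char>(0x8b), 0x08};
    const std::vector<char> text = {'h', 'e', 'l', 'l', 'o'};
    assert(looks_like_gzip(gz));
    assert(!looks_like_gzip(text));
}
```

Running a check like this before calling unzip() can help distinguish "wrong input format" from "output not flushed" when the result is empty.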

Related

Boost gzip how to output compressed string as text

I'm using boost gzip example code here.
I am attempting to compress a simple string test and am expecting the compressed string H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA as shown in this online compressor
static std::string compress(const std::string& data)
{
namespace bio = boost::iostreams;
std::stringstream compressed;
std::stringstream origin(data);
bio::filtering_streambuf<bio::input> out;
out.push(bio::gzip_compressor(bio::gzip_params(bio::gzip::best_compression)));
out.push(origin);
bio::copy(out, compressed);
return compressed.str();
}
int main(int argc, char* argv[]){
std::cout << compress("test") << std::endl;
// prints out garbage
return 0;
}
However when I print out the result of the conversion I get garbage values like +I-. ~
I know that it's a valid conversion because the decompression value returns the correct string. However I need the format of the string to be human readable i.e. H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA.
How can I modify the code to output human readable text?
Thanks
Motivation
The garbage format is not compatible with my JSON library where I will send the compressed text through.
The example site completely fails to mention they also base64 encode the result:
base64 -d <<< 'H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA' | gunzip -
Prints:
test
In short, you need to also do that:
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <iostream>
#include <sstream>
#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/transform_width.hpp>
std::string decode64(std::string const& val)
{
using namespace boost::archive::iterators;
return {
transform_width<binary_from_base64<std::string::const_iterator>, 8, 6>{
std::begin(val)},
{std::end(val)},
};
}
std::string encode64(std::string const& val)
{
using namespace boost::archive::iterators;
std::string r{
base64_from_binary<transform_width<std::string::const_iterator, 6, 8>>{
std::begin(val)},
{std::end(val)},
};
return r.append((3 - val.size() % 3) % 3, '=');
}
static std::string compress(const std::string& data)
{
namespace bio = boost::iostreams;
std::istringstream origin(data);
bio::filtering_istreambuf in;
in.push(
bio::gzip_compressor(bio::gzip_params(bio::gzip::best_compression)));
in.push(origin);
std::ostringstream compressed;
bio::copy(in, compressed);
return compressed.str();
}
static std::string decompress(const std::string& data)
{
namespace bio = boost::iostreams;
std::istringstream compressed(data);
bio::filtering_istreambuf in;
in.push(bio::gzip_decompressor());
in.push(compressed);
std::ostringstream origin;
bio::copy(in, origin);
return origin.str();
}
int main() {
auto msg = encode64(compress("test"));
std::cout << msg << std::endl;
std::cout << decompress(decode64(msg)) << std::endl;
}
Prints
H4sIAAAAAAAC/ytJLS4BAAx+f9gEAAAA
test
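For readers without Boost's archive iterators, the base64 step can also be sketched in plain standard C++. This is a minimal illustration under the assumption that the standard RFC 4648 alphabet with '=' padding is wanted; base64_encode_sketch is a hypothetical name, not a Boost or library function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: base64-encodes arbitrary bytes held in a std::string
// using the standard alphabet and '=' padding.
std::string base64_encode_sketch(const std::string& in)
{
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((in.size() + 2) / 3) * 4);
    std::size_t i = 0;
    // Encode each full 3-byte group as 4 characters.
    for (; i + 2 < in.size(); i += 3) {
        const unsigned n = (static_cast<unsigned>(static_cast<unsigned char>(in[i])) << 16) |
                           (static_cast<unsigned>(static_cast<unsigned char>(in[i + 1])) << 8) |
                           static_cast<unsigned>(static_cast<unsigned char>(in[i + 2]));
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }
    // Encode the 1 or 2 leftover bytes and pad the group with '='.
    if (i < in.size()) {
        unsigned n = static_cast<unsigned>(static_cast<unsigned char>(in[i])) << 16;
        const bool two = (i + 1 < in.size());
        if (two) n |= static_cast<unsigned>(static_cast<unsigned char>(in[i + 1])) << 8;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += two ? table[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}

// Usage example with a known value.
inline void example_usage()
{
    assert(base64_encode_sketch("test") == "dGVzdA==");
}
```

The input is taken as std::string with an explicit size, so compressed bytes containing '\0' are encoded in full rather than being cut short.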

How do you Deflate data and put it into a vector?

With zstr, a header-only C++ zlib wrapper library, I’m trying to Deflate a std::string and put it into a std::vector<unsigned char>.
zstr::ostream deflating_stream(std::cout);
deflating_stream.write(content.data(), content.size());
The above code works: it prints the Deflated data. The problem is that I'm not familiar with C++ streams and cannot get the output into a std::vector. I tried several times in vain with std::ostringstream, std::ostream, std::istringstream, std::istreambuf_iterator, std::streambuf, .rdbuf(), et cetera, and all I ever got was empty output (.tellp() == 0).
How do I Deflate a std::string and put it into a std::vector<unsigned char>?
The following are some of my attempts. I have no idea how to access the Deflated data.
// Attempt 1: share an istringstream's buffer with an ostream
std::istringstream is;
std::ostream ss(is.rdbuf());
zstr::ostream deflating_stream(ss);
deflating_stream.write(
uncompressed_string.data(),
uncompressed_string.size()
);
the_vector.insert(
the_vector.cend(),
std::istreambuf_iterator<char>(is),
std::istreambuf_iterator<char>()
);
// Attempt 2: write into an ostringstream and read it back with str()
std::ostringstream oss;
zstr::ostream deflating_stream(oss);
deflating_stream.write(
uncompressed_string.data(),
uncompressed_string.size()
);
const std::string deflated = oss.str();
the_vector.insert(
the_vector.cend(),
deflated.cbegin(),
deflated.cend()
);
// Attempt 3: the same idea with a stringstream
std::stringstream ss;
zstr::ostream deflating_stream(ss);
deflating_stream.write(
uncompressed_string.data(),
uncompressed_string.size()
);
std::string deflated = ss.str();
std::cout << deflated.size(); // Says 0.
Something like this works:
#include <iostream>
#include <sstream>
#include <vector>
#include <string>
#include <algorithm>
#include "zstr.hpp"
int main() {
std::string text{"some text\n"};
std::stringbuf buffer;
zstr::ostream compressor{&buffer};
// Must flush to get complete gzip data in buffer
compressor << text << std::flush;
// It's probably easier to use just the string...
auto compstr = buffer.str();
std::vector<unsigned char> deflated;
deflated.resize(compstr.size());
std::copy(compstr.begin(), compstr.end(), deflated.begin());
std::cout.write(reinterpret_cast<char *>(deflated.data()), deflated.size());
return 0;
}
After compiling:
$ ./a.out | zcat
some text
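The resize-and-copy step in the answer above can be shortened with the vector's iterator-range constructor. The sketch below uses only the standard library; to_bytes is a hypothetical helper name and is not part of zstr.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: copies every byte of a string, including any
// embedded '\0', into a vector of unsigned char.
std::vector<unsigned char> to_bytes(const std::string& s)
{
    return std::vector<unsigned char>(s.begin(), s.end());
}

// Usage example: the size is preserved even with a '\0' in the middle.
inline void example_usage()
{
    const std::string binary("a\0b", 3);
    const std::vector<unsigned char> bytes = to_bytes(binary);
    assert(bytes.size() == 3);
    assert(bytes[1] == 0);
}
```

With the answer's variable names, this would be called as to_bytes(buffer.str()) after the compressor has been flushed.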

Save json with rapidjson directly on file

I'm a Java programmer learning C++ for a personal project: a Bitcoin Core parser that converts the information in Bitcoin's .dat files to a JSON file.
My problem arises when I build a large JSON document with rapidjson, using a Writer on a StringBuffer.
This is a simple example from my DAO:
void DAOJson::serializationWithRapidJson(Person &person) {
rapidjson::StringBuffer s;
rapidjson::Writer<rapidjson::StringBuffer> writer(s);
person.toRapidJson(writer);
unique_ptr<string> json(new string(s.GetString()));
cout << *json;
ofstream stream(DIR_HOME + "dump_rapidJson_test.json");
stream << *json;
json.reset();
stream.close();
}
My question is:
Is it possible with rapidjson to create the JSON directly in the file rather than in a string? I need to save memory.
Here is an example of the code I would like to write:
rapidjson::Writer<rapidjson::FileWriter> writer(s);
Yes, you do have OStreamWrapper:
#include <rapidjson/ostreamwrapper.h>
#include <rapidjson/writer.h>
#include <fstream>
void f(auto person)
{
std::ofstream stream(DIR_HOME + "dump_rapidJson_test.json");
rapidjson::OStreamWrapper osw(stream);
rapidjson::Writer<rapidjson::OStreamWrapper> writer(osw);
person.toRapidJson(writer);
}
I'd define an operator if I were you:
std::ostream& operator<<(std::ostream& os, Person const& person)
{
rapidjson::OStreamWrapper osw(os);
rapidjson::Writer<rapidjson::OStreamWrapper> writer(osw);
person.toRapidJson(writer);
return os;
}
// usage (e.g.):
std::ofstream out("tmp");
Person alice, bob;
out << "Alice: " << alice << "\nBob: " << bob;
You also have a C-compatible variant: rapidjson::FileWriteStream, but it needs a buffer anyway.
#include <rapidjson/filewritestream.h>
#include <rapidjson/writer.h>
#include <cstdio>
void f(auto person)
{
// output file (a la C)
FILE* fp = std::fopen("output.json", "wb"); // non-Windows use "w"
// writer to file (through a provided buffer)
char writeBuffer[65536];
rapidjson::FileWriteStream os(fp, writeBuffer, sizeof(writeBuffer));
rapidjson::Writer<rapidjson::FileWriteStream> writer(os);
// write
person.toRapidJson(writer);
std::fclose(fp);
}

Deflation compression algorithm for huge data streams

I have a C++ program that receives a data buffer from time to time and should append it to an existing compressed file.
As a proof of concept, I read 1 KB chunks from a file, pass them to a compressing stream, and decompress the result once the data is exhausted.
I use Poco::DeflatingOutputStream to compress each chunk, and Poco::InflatingOutputStream to check that decompressing gives back the original file.
However, the decompressed data is almost identical to the original file, except that between every two consecutive chunks I get a few garbage characters such as: à¿_ÿ
Here's an example of a line that is split between two chunks. The original line looks like this:
elevated=0 path=/System/Library/CoreServices/Dock.app/Contents/MacOS/Dock exist
while the decompressed line is:
elevated=0 path=/System/Libr à¿_ÿary/CoreServices/Dock.app/Contents/MacOS/Dock exist
May 19 19:12:51 PANMMUZNG8WNREM kernel[0]: pid=904 uid=1873876126 sbit=0
Any idea what I am doing wrong? Here's my POC code:
int zip_unzip() {
std::ostringstream stream1;
Poco::DeflatingOutputStream gzipper(stream1, Poco::DeflatingStreamBuf::STREAM_ZLIB);
std::ifstream bigFile("/tmp/in.log");
constexpr size_t bufferSize = 1024;
char buffer[bufferSize];
while (bigFile) {
bigFile.read(buffer, bufferSize);
gzipper << buffer;
}
gzipper.close();
std::string zipped_string = stream1.str();
//////////////////
std::ofstream stream2("/tmp/out.log", std::ios::binary);
Poco::InflatingOutputStream gunzipper(stream2, InflatingStreamBuf::STREAM_ZLIB);
gunzipper << zipped_string;
gunzipper.close();
return 0;
}
OK, I just realized that I used the '<<' operator on each buffer read from the huge file (the original uncompressed file) without care: there is no null terminator '\0' at the end of each window read from the file, so '<<' kept reading past the end of the data.
Here's the fixed version:
#include <cerrno>
#include <fstream>
#include <Poco/DeflatingStream.h>
#include <Poco/Exception.h>
#include <iostream>
int BetterZip()
{
try {
// Create gzip file.
std::ofstream output_file("/tmp/out.gz", std::ios::binary);
Poco::DeflatingOutputStream output_stream(output_file, Poco::DeflatingStreamBuf::STREAM_GZIP);
// INPUT
std::ifstream big_file("/tmp/hugeFile");
constexpr size_t ReadBufferSize = 1024;
char buffer[ReadBufferSize];
while (big_file) {
big_file.read(buffer, ReadBufferSize);
output_stream.write(buffer, big_file.gcount());
}
output_stream.close();
} catch (const Poco::Exception& ex) {
std::cout << "Error : (error code " << ex.code() << " (" << ex.displayText() << ")";
return EINVAL;
}
return 0;
}
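The fix above comes down to writing exactly gcount() bytes after each read() instead of streaming the buffer as a C string. A minimal sketch of that loop, using plain standard streams in place of the Poco stream (an assumption made so the example stands alone), could look like this; copy_in_chunks is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: copies a stream in fixed-size chunks. Only the bytes
// actually read (gcount) are written, so no terminator is needed and
// embedded '\0' bytes are preserved.
void copy_in_chunks(std::istream& in, std::ostream& out, std::size_t chunk_size)
{
    assert(chunk_size > 0);
    std::vector<char> buffer(chunk_size);
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        out.write(buffer.data(), in.gcount());  // the last chunk may be short
    }
}

// Usage example: 2500 bytes copied in 1024-byte chunks arrive unchanged.
inline void example_usage()
{
    std::string data(2500, 'x');
    data[10] = '\0';
    std::istringstream in(data);
    std::ostringstream out;
    copy_in_chunks(in, out, 1024);
    assert(out.str() == data);
}
```

Because out is taken as a std::ostream reference, the same loop should apply unchanged to a compressing stream that derives from std::ostream.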

Simple Zlib C++ String Compression and Decompression

I need simple compression and decompression of a std::string in C++. I looked at this site, but the code there works on character arrays. What I want to implement are the two functions:
std::string original = "This is to be compressed!!!!";
std::string compressed = string_compress(original);
std::cout << compressed << std::endl;
std::string decompressed = string_decompress(compressed);
std::cout << decompressed << std::endl;
I had tried the boost compression as:
std::string CompressData(const std::string &data)
{
std::stringstream compressed;
std::stringstream decompressed;
decompressed << data;
boost::iostreams::filtering_streambuf<boost::iostreams::input> out;
out.push(boost::iostreams::zlib_compressor());
out.push(decompressed);
boost::iostreams::copy(out, compressed);
return compressed.str();
}
std::string DecompressData(const std::string &data)
{
std::stringstream compressed;
std::stringstream decompressed;
compressed << data;
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::zlib_decompressor());
in.push(compressed);
boost::iostreams::copy(in, decompressed);
return decompressed.str();
}
but the compressed output sometimes contains null characters (i.e. \u0000). How do I handle compressed data that contains these null characters? Is std::string the correct return type? How can I implement the functions string_compress and string_decompress using zlib?
You can do as @LawfulEvil suggested. Here is the code snippet that works :)
std::string original = "This is to be compressed!!!!";
std::string compressed_encoded = string_compress_encode(original);
std::cout << compressed_encoded << std::endl;
std::string decompressed_decoded = string_decompress_decode(compressed_encoded);
std::cout << decompressed_decoded << std::endl;
Using this as the base64 encode/decode library.
#include <sstream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/zlib.hpp>
#include <cpp-base64/base64.h>
std::string string_compress_encode(const std::string &data)
{
std::stringstream compressed;
std::stringstream original;
original << data;
boost::iostreams::filtering_streambuf<boost::iostreams::input> out;
out.push(boost::iostreams::zlib_compressor());
out.push(original);
boost::iostreams::copy(out, compressed);
/** need to encode here; copy the bytes out of the stream first **/
const std::string compressed_str = compressed.str();
std::string compressed_encoded = base64_encode(reinterpret_cast<const unsigned char*>(compressed_str.data()), compressed_str.size());
return compressed_encoded;
}
std::string string_decompress_decode(const std::string &data)
{
/** first decode then decompress **/
std::stringstream compressed(base64_decode(data));
std::stringstream decompressed;
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::zlib_decompressor());
in.push(compressed);
boost::iostreams::copy(in, decompressed);
return decompressed.str();
}
Compression makes use of every value available for each byte, so the output will look like 'garbage' or 'weird' characters when viewed as ASCII. That is expected. You'll need to encode the data for transmission or JSON packing to avoid nulls. I suggest base64. Code to do that is available at the link below (which I didn't author, so I won't copy it here).
http://www.adp-gmbh.ch/cpp/common/base64.html
Binary data JSONCPP
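On the question of whether std::string is a suitable return type: a std::string stores its length explicitly, so it can hold '\0' bytes; data is only lost when it is passed through a C-string interface. The sketch below illustrates the difference with the standard library only; from_bytes and as_c_string are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: keeps every byte, including embedded '\0',
// because the length is passed explicitly.
std::string from_bytes(const char* data, std::size_t size)
{
    return std::string(data, size);
}

// Hypothetical helper: shows what is lost when binary data is treated
// as a C string. Construction from c_str() stops at the first '\0'.
std::string as_c_string(const std::string& s)
{
    return std::string(s.c_str());
}

// Usage example: three bytes survive in the std::string, but only one
// survives the round trip through a C string.
inline void example_usage()
{
    const char raw[] = {'a', '\0', 'b'};
    const std::string binary = from_bytes(raw, sizeof(raw));
    assert(binary.size() == 3);
    assert(as_c_string(binary).size() == 1);
}
```

This suggests that returning std::string from the compression functions is fine in itself, and that an encoding such as base64 is only needed where the consumer (a JSON library, a terminal) expects text.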