Simple Zlib C++ String Compression and Decompression - c++

I need a simple compression and decompression of a std::string in C++. I looked at this site and the code is for Character array. What I want to implement are the two functions:
std::string original = "This is to be compressed!!!!";
std::string compressed = string_compress(original);
std::cout << compressed << std::endl;
std::string decompressed = string_decompress(compressed);
std::cout << decompressed << std::endl;
I had tried the boost compression as:
std::string CompressData(const std::string &data)
{
std::stringstream compressed;
std::stringstream decompressed;
decompressed << data;
boost::iostreams::filtering_streambuf<boost::iostreams::input> out;
out.push(boost::iostreams::zlib_compressor());
out.push(decompressed);
boost::iostreams::copy(out, compressed);
return compressed.str();
}
std::string DecompressData(const std::string &data)
{
std::stringstream compressed;
std::stringstream decompressed;
compressed << data;
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::zlib_decompressor());
in.push(compressed);
boost::iostreams::copy(in, decompressed);
return decompressed.str();
}
However, the output sometimes contains null characters, i.e. \u0000. How do I handle compressed data that contains these null characters? Is std::string the correct return type? How can I implement string_compress and string_decompress using zlib?

You can do as @LawfulEvil suggested. Here is a code snippet that works:
std::string original = "This is to be compressed!!!!";
std::string compressed_encoded = string_compress_encode(original);
std::cout << compressed_encoded << std::endl;
std::string decompressed_decoded = string_decompress_decode(compressed_encoded);
std::cout << decompressed_decoded << std::endl;
Using this as the base64 encode/decode library.
#include <sstream>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/zlib.hpp>
#include <cpp-base64/base64.h>
std::string string_compress_encode(const std::string &data)
{
std::stringstream compressed;
std::stringstream original;
original << data;
boost::iostreams::filtering_streambuf<boost::iostreams::input> out;
out.push(boost::iostreams::zlib_compressor());
out.push(original);
boost::iostreams::copy(out, compressed);
/** need to encode here; a stringstream has no c_str(), so take its string first **/
const std::string raw = compressed.str();
std::string compressed_encoded = base64_encode(reinterpret_cast<const unsigned char*>(raw.data()), raw.size());
return compressed_encoded;
}
std::string string_decompress_decode(const std::string &data)
{
std::stringstream decompressed;
/** first decode, then decompress; the filter chain needs a stream, not a std::string **/
std::stringstream compressed(base64_decode(data));
boost::iostreams::filtering_streambuf<boost::iostreams::input> in;
in.push(boost::iostreams::zlib_decompressor());
in.push(compressed);
boost::iostreams::copy(in, decompressed);
return decompressed.str();
}

Compression makes use of all the values available for each byte, so the output will look like 'garbage' or 'weird' characters when viewed as ASCII. That's expected. You'll need to encode the data for transmission or JSON packing to avoid nulls. I suggest Base64. Code to do that is available at the link below (which I didn't author, so I won't copy it here).
http://www.adp-gmbh.ch/cpp/common/base64.html
Binary data JSONCPP
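To make the Base64 step concrete, here is a minimal standard-library-only sketch. The names b64_encode and b64_decode are made up for this illustration (they are not the API of the linked library), and the decoder simply stops at the first '=' rather than validating padding strictly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Standard Base64 alphabet (RFC 4648). Helper names below are hypothetical.
static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

std::string b64_encode(const std::string& in) {
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Every full group of 3 bytes becomes 4 output characters.
    for (; i + 2 < in.size(); i += 3) {
        const std::uint32_t n = (static_cast<unsigned char>(in[i]) << 16) |
                                (static_cast<unsigned char>(in[i + 1]) << 8) |
                                static_cast<unsigned char>(in[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    // One or two leftover bytes are padded with '='.
    const std::size_t rest = in.size() - i;
    if (rest == 1) {
        const std::uint32_t n = static_cast<unsigned char>(in[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t n = (static_cast<unsigned char>(in[i]) << 16) |
                                (static_cast<unsigned char>(in[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

std::string b64_decode(const std::string& in) {
    std::string out;
    std::uint32_t acc = 0;  // bit accumulator
    int bits = 0;           // number of not-yet-emitted bits in acc
    for (const char c : in) {
        if (c == '=') break;  // padding marks the end of the data
        const char* p = (c == '\0') ? nullptr : std::strchr(kAlphabet, c);
        if (p == nullptr) throw std::invalid_argument("not Base64");
        acc = (acc << 6) | static_cast<std::uint32_t>(p - kAlphabet);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((acc >> bits) & 0xFF);
        }
    }
    return out;
}
```

Both functions take and return std::string, which is fine for binary data as long as you use the size-aware constructors and .size() rather than C-string functions. For example, b64_encode(std::string("a\0b", 3)) gives "YQBi", and decoding that returns the same three bytes including the embedded null.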

Related

Boost gzip how to output compressed string as text

I'm using boost gzip example code here.
I am attempting to compress a simple string test and am expecting the compressed string H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA as shown in this online compressor
static std::string compress(const std::string& data)
{
namespace bio = boost::iostreams;
std::stringstream compressed;
std::stringstream origin(data);
bio::filtering_streambuf<bio::input> out;
out.push(bio::gzip_compressor(bio::gzip_params(bio::gzip::best_compression)));
out.push(origin);
bio::copy(out, compressed);
return compressed.str();
}
int main(int argc, char* argv[]){
std::cout << compress("test") << std::endl;
// prints out garbage
return 0;
}
However when I print out the result of the conversion I get garbage values like +I-. ~
I know that it's a valid conversion because the decompression value returns the correct string. However I need the format of the string to be human readable i.e. H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA.
How can I modify the code to output human readable text?
Thanks
Motivation
The garbage format is not compatible with my JSON library where I will send the compressed text through.
The example site completely fails to mention they also base64 encode the result:
base64 -d <<< 'H4sIAAAAAAAACitJLS4BAAx+f9gEAAAA' | gunzip -
Prints:
test
In short, you need to also do that:
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <iostream>
#include <sstream>
#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/base64_from_binary.hpp>
#include <boost/archive/iterators/transform_width.hpp>
std::string decode64(std::string const& val)
{
using namespace boost::archive::iterators;
return {
transform_width<binary_from_base64<std::string::const_iterator>, 8, 6>{
std::begin(val)},
{std::end(val)},
};
}
std::string encode64(std::string const& val)
{
using namespace boost::archive::iterators;
std::string r{
base64_from_binary<transform_width<std::string::const_iterator, 6, 8>>{
std::begin(val)},
{std::end(val)},
};
return r.append((3 - val.size() % 3) % 3, '=');
}
static std::string compress(const std::string& data)
{
namespace bio = boost::iostreams;
std::istringstream origin(data);
bio::filtering_istreambuf in;
in.push(
bio::gzip_compressor(bio::gzip_params(bio::gzip::best_compression)));
in.push(origin);
std::ostringstream compressed;
bio::copy(in, compressed);
return compressed.str();
}
static std::string decompress(const std::string& data)
{
namespace bio = boost::iostreams;
std::istringstream compressed(data);
bio::filtering_istreambuf in;
in.push(bio::gzip_decompressor());
in.push(compressed);
std::ostringstream origin;
bio::copy(in, origin);
return origin.str();
}
int main() {
auto msg = encode64(compress("test"));
std::cout << msg << std::endl;
std::cout << decompress(decode64(msg)) << std::endl;
}
Prints
H4sIAAAAAAAC/ytJLS4BAAx+f9gEAAAA
test

How do you Deflate data and put it into a vector?

With zstr, a header-only C++ zlib wrapper library, I’m trying to Deflate a std::string and put it into a std::vector<unsigned char>.
zstr::ostream deflating_stream(std::cout);
deflating_stream.write(content.data(), content.size());
The above code works: it prints the Deflate’d data. The problem is that I’m not familiar with C++ streams and I cannot get the output into a std::vector. I tried several times in vain with std::ostringstream, std::ostream, std::istringstream, std::istreambuf_iterator, std::streambuf, .rdbuf(), et cetera, and all I ever got was empty output (.tellp() == 0).
How do I Deflate a std::string and put it into a std::vector<unsigned char>?
The following are some of my attempts. I have no idea how to access the Deflate’d data.
std::istringstream is;
std::ostream ss(is.rdbuf());
zstr::ostream deflating_stream(ss);
deflating_stream.write(
uncompressed_string.data(),
uncompressed_string.size()
);
the_vector.insert(
the_vector.cend(),
std::istreambuf_iterator<char>(is),
std::istreambuf_iterator<char>()
);
std::ostringstream oss;
zstr::ostream deflating_stream(oss);
deflating_stream.write(
uncompressed_string.data(),
uncompressed_string.size()
);
const std::string deflated = oss.str();
the_vector.insert(
the_vector.cend(),
deflated.cbegin(),
deflated.cend()
);
std::stringstream ss;
zstr::ostream deflating_stream(ss);
deflating_stream.write(
uncompressed_string.data(),
uncompressed_string.size()
);
std::string deflated = ss.str();
std::cout << deflated.size(); // Says 0.
Something like this works:
#include <iostream>
#include <sstream>
#include <vector>
#include <string>
#include <algorithm>
#include "zstr.hpp"
int main() {
std::string text{"some text\n"};
std::stringbuf buffer;
zstr::ostream compressor{&buffer};
// Must flush to get complete gzip data in buffer
compressor << text << std::flush;
// It's probably easier to use just the string...
auto compstr = buffer.str();
std::vector<unsigned char> deflated;
deflated.resize(compstr.size());
std::copy(compstr.begin(), compstr.end(), deflated.begin());
std::cout.write(reinterpret_cast<char *>(deflated.data()), deflated.size());
return 0;
}
After compiling:
$ ./a.out | zcat
some text
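As a side note on the copy step: once the compressed bytes are in a std::string, a std::vector<unsigned char> can be built directly from its iterators, without resize plus std::copy. A small sketch using only the standard library (to_bytes and bytes_to_string are hypothetical helper names, not part of zstr):

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copy every byte of `s`, including any '\0',
// into a vector of unsigned char.
std::vector<unsigned char> to_bytes(const std::string& s) {
    return std::vector<unsigned char>(s.begin(), s.end());
}

// The reverse direction, e.g. to hand the bytes back to a stream-based API.
std::string bytes_to_string(const std::vector<unsigned char>& v) {
    return std::string(v.begin(), v.end());
}
```

With the answer's code this would presumably be used as to_bytes(buffer.str()) after the flush.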

Deflation compression algorithm for huge data streams

I've got a C++ program that receives a data buffer from time to time and should append it to an existing compressed file.
I tried to make a POC by reading 1 KB chunks from a file, passing them to a compressed stream, and decompressing it once the data is exhausted.
I use Poco::DeflatingOutputStream to compress each chunk, and Poco::InflatingOutputStream to check that decompressing gives back the original file.
However, after decompressing the stream, the data is almost identical to the original file, except that between every two consecutive chunks I get a few garbage characters such as: à¿_ÿ
Here's an example of a line that is split between two chunks. The original line looks like this:
elevated=0 path=/System/Library/CoreServices/Dock.app/Contents/MacOS/Dock exist
while the decompressed line is :
elevated=0 path=/System/Libr à¿_ÿary/CoreServices/Dock.app/Contents/MacOS/Dock exist
May 19 19:12:51 PANMMUZNG8WNREM kernel[0]: pid=904 uid=1873876126 sbit=0
Any idea what I am doing wrong? Here's my POC code:
int zip_unzip() {
std::ostringstream stream1;
Poco::DeflatingOutputStream gzipper(stream1, Poco::DeflatingStreamBuf::STREAM_ZLIB);
std::ifstream bigFile("/tmp/in.log");
constexpr size_t bufferSize = 1024;
char buffer[bufferSize];
while (bigFile) {
bigFile.read(buffer, bufferSize);
gzipper << buffer;
}
gzipper.close();
std::string zipped_string = stream1.str();
//////////////////
std::ofstream stream2("/tmp/out.log", std::ios::binary);
Poco::InflatingOutputStream gunzipper(stream2, InflatingStreamBuf::STREAM_ZLIB);
gunzipper << zipped_string;
gunzipper.close();
return 0;
}
OK, I just realized that I used the '<<' operator carelessly on each buffer read from the original (uncompressed) file. There is no null terminator ('\0') at the end of each chunk I read, so '<<' treats the buffer as a C string and keeps reading past the end of the chunk until it happens to hit a zero byte.
Here's the fixed version:
#include <cerrno> // for EINVAL
#include <fstream>
#include <Poco/DeflatingStream.h>
#include <Poco/Exception.h>
#include <iostream>
int BetterZip()
{
try {
// Create gzip file.
std::ofstream output_file("/tmp/out.gz", std::ios::binary);
Poco::DeflatingOutputStream output_stream(output_file, Poco::DeflatingStreamBuf::STREAM_GZIP);
// INPUT
std::ifstream big_file("/tmp/hugeFile");
constexpr size_t ReadBufferSize = 1024;
char buffer[ReadBufferSize];
while (big_file) {
big_file.read(buffer, ReadBufferSize);
output_stream.write(buffer, big_file.gcount());
}
output_stream.close();
} catch (const Poco::Exception& ex) {
std::cout << "Error : (error code " << ex.code() << " (" << ex.displayText() << ")";
return EINVAL;
}
return 0;
}
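The read / gcount / write pattern from the fixed version can be tried in isolation with in-memory streams, no Poco required. This is only a sketch: copy_chunks is a hypothetical helper, and the chunk size is deliberately tiny so that the input spans many chunks.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copy `in` to `out` in fixed-size chunks and
// return the number of bytes copied.
std::size_t copy_chunks(std::istream& in, std::ostream& out) {
    constexpr std::size_t kChunk = 8;  // tiny on purpose, to force many chunks
    char buffer[kChunk];
    std::size_t total = 0;
    while (in) {
        in.read(buffer, kChunk);
        const std::streamsize got = in.gcount();  // bytes actually read
        out.write(buffer, got);  // explicit length, no reliance on '\0'
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

Because write() is given the exact byte count, the last partial chunk and any embedded zero bytes are copied correctly, which `out << buffer` cannot guarantee.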

Uncompress data in memory using Boost gzip_decompressor

I'm trying to decompress binary data in memory using Boost gzip_decompressor. From this answer, I adapted the following code:
vector<char> unzip(const vector<char> compressed)
{
vector<char> decompressed = vector<char>();
boost::iostreams::filtering_ostream os;
os.push(boost::iostreams::gzip_decompressor());
os.push(boost::iostreams::back_inserter(decompressed));
boost::iostreams::write(os, &compressed[0], compressed.size());
return decompressed;
}
However, the returned vector has zero length. What am I doing wrong? I tried calling flush() on the os stream, but it did not make a difference
Your code works for me with this simple test program:
#include <iostream>
#include <vector>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
std::vector<char> unzip(const std::vector<char> compressed)
{
std::vector<char> decompressed = std::vector<char>();
boost::iostreams::filtering_ostream os;
os.push(boost::iostreams::gzip_decompressor());
os.push(boost::iostreams::back_inserter(decompressed));
boost::iostreams::write(os, &compressed[0], compressed.size());
return decompressed;
}
int main() {
std::vector<char> compressed;
{
boost::iostreams::filtering_ostream os;
os.push(boost::iostreams::gzip_compressor());
os.push(boost::iostreams::back_inserter(compressed));
os << "hello\n";
os.reset();
}
std::cout << "Compressed size: " << compressed.size() << '\n';
const std::vector<char> decompressed = unzip(compressed);
std::cout << std::string(decompressed.begin(), decompressed.end());
return 0;
}
Are you sure your input was compressed with gzip and not some other method (e.g. raw deflate)? gzip compressed data begins with bytes 1f 8b.
I generally use reset() or put the stream and filters in their own block to make sure that output is complete. I did both for compression above, just as an example.
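If you want to check the input format before handing it to the decompressor, a rough sniffing helper could look like the sketch below. The names Container and sniff are invented for this example. The gzip check is the 1f 8b magic mentioned above; the zlib check follows the two-byte header rule from RFC 1950 and is only a heuristic, since raw deflate data has no header and may match by chance.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names, for illustration only.
enum class Container { Gzip, Zlib, Unknown };

Container sniff(const std::vector<char>& data) {
    if (data.size() < 2) return Container::Unknown;
    const unsigned b0 = static_cast<unsigned char>(data[0]);
    const unsigned b1 = static_cast<unsigned char>(data[1]);
    // gzip streams start with the magic bytes 1f 8b.
    if (b0 == 0x1f && b1 == 0x8b) return Container::Gzip;
    // zlib streams: compression method 8 (deflate) in the low nibble of the
    // first byte, and the two header bytes form a multiple of 31.
    if ((b0 & 0x0f) == 8 && ((b0 << 8) | b1) % 31 == 0) return Container::Zlib;
    return Container::Unknown;
}
```

If this reports Zlib or Unknown for your data, gzip_decompressor is probably the wrong filter for it.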

boost json serialization and message_queue segfault

I'm running some tests with Boost.Interprocess and the ptree structure, and I get a segfault when I try to read the message that was sent (or when I try to parse it as JSON).
I'm using Boost 1.49 on Debian Linux.
I'm serializing to JSON for later use, and because I didn't find any good documentation for direct serialization of Boost property trees.
This is the code I'm using to test (the comment marks where the segfault occurs):
recv.cc
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <boost/interprocess/ipc/message_queue.hpp>
#include <sstream>
struct test_data{
std::string action;
std::string name;
int faceID;
uint32_t Flags;
uint32_t freshness;
};
test_data recvData()
{
boost::interprocess::message_queue::remove("queue");
boost::property_tree::ptree pt;
test_data data;
std::istringstream buffer;
boost::interprocess::message_queue mq(boost::interprocess::open_or_create,"queue", 1, sizeof(buffer));
boost::interprocess::message_queue::size_type recvd_size;
unsigned int pri;
mq.receive(&buffer,sizeof(buffer),recvd_size,pri);
std::cout << buffer.str() << std::endl; //the segfault is there
boost::property_tree::read_json(buffer,pt);
data.action = pt.get<std::string>("action");
data.name = pt.get<std::string>("name");
data.faceID = pt.get<int>("face");
data.Flags = pt.get<uint32_t>("flags");
data.freshness = pt.get<uint32_t>("freshness");
boost::interprocess::message_queue::remove("queue");
return data;
}
int main()
{
test_data test;
test = recvData();
std::cout << test.action << test.name << test.faceID << test.Flags << test.freshness << std::endl;
}
sender.cc
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/json_parser.hpp>
#include <boost/interprocess/ipc/message_queue.hpp>
#include <sstream>
struct test_data{
std::string action;
std::string name;
int faceID;
uint32_t Flags;
uint32_t freshness;
};
int sendData(test_data data)
{
boost::property_tree::ptree pt;
pt.put("action",data.action);
pt.put("name",data.name);
pt.put("face",data.faceID);
pt.put("flags",data.Flags);
pt.put("freshness",data.freshness);
std::ostringstream buffer;
boost::property_tree::write_json(buffer,pt,false);
boost::interprocess::message_queue mq(boost::interprocess::open_only,"chiappen");
std::cout << sizeof(buffer) << std::endl;
mq.send(&buffer,sizeof(buffer),0);
return 0;
}
int main ()
{
test_data prova;
prova.action = "registration";
prova.name = "prefix";
prova.Flags = 0;
prova.freshness = 52;
sendData(prova);
}
I know it's a bit late to an answer right now, but anyway..
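You can't pass an istringstream as the buffer for receive. Boost message queues only transfer raw bytes; they know nothing about std objects such as streams or strings. Writing sizeof(buffer) received bytes over the stream object overwrites its internals, which is why the later buffer.str() call crashes.
To make it work, you must use a char array or any buffer previously allocated with malloc.
For example:
char buffer [1024];
mq.receive(buffer, sizeof(buffer), recvd_size, pri);
Sending works the same way: you can only send raw bytes, so you can't pass the ostringstream object itself.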
Hope it helps.
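To illustrate the raw-bytes point without needing a real queue, here is a small sketch that moves a JSON string through a plain char buffer. pack and unpack are hypothetical helpers; with Boost the same idea would presumably be mq.send(json.data(), json.size(), 0) on the sending side and std::string(buffer, recvd_size) after mq.receive on the receiving side.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper: copy the payload's bytes into a caller-provided raw
// buffer and return the number of bytes written.
std::size_t pack(const std::string& payload, char* buffer, std::size_t capacity) {
    if (payload.size() > capacity) throw std::length_error("message too large");
    std::memcpy(buffer, payload.data(), payload.size());
    return payload.size();
}

// Hypothetical helper: rebuild the string from the raw bytes. The length must
// be the number of bytes actually received, not the buffer capacity.
std::string unpack(const char* buffer, std::size_t received) {
    return std::string(buffer, received);
}
```

The unpacked string can then be wrapped in a std::istringstream and passed to read_json. Note that the size to send is the length of the serialized text (buffer.str().size() in the question's sender), not sizeof(buffer), which is only the size of the stream object.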