Can boost iostreams read and compress gzipped files on the fly? - c++

I am reading a gzipped file using boost iostreams:
The following works fine:
namespace io = boost::iostreams;
io::filtering_istream in;
in.push(boost::iostreams::basic_gzip_decompressor<>());
in.push(io::file_source("test.gz"));
stringstream ss;
copy(in, ss);
However, I don't want to take the memory hit of reading an entire gzipped file
into memory. I want to be able to read the file incrementally.
For example, if I have a data structure X that initializes itself from an istream,
X x;
x.read(in);
this fails. Presumably that is because we may have to put back characters into the stream
when doing partial reads. Any ideas whether boost iostreams supports this?

According to the iostreams documentation, the type boost::iostreams::filtering_istream derives from std::istream. That is, it should be possible to pass it anywhere a std::istream& is expected. If you get errors at run-time because you need to unget() or putback() characters, you should have a look at the pback_size parameter, which specifies how many characters can be put back at most. I haven't seen in the documentation what the default value for this parameter is.
If this doesn't solve your problem, can you describe what your problem is exactly? From the looks of it, it should work.
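For what it's worth, here is a minimal, untested sketch of what I mean. The 4096 buffer size and the 64-character pback_size are arbitrary values I picked, and I'm assuming the decompressed content is line-oriented text; if I read the push() reference right, pback_size is its optional third argument:
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/device/file.hpp>
#include <string>

namespace io = boost::iostreams;

io::filtering_istream in;
in.push(io::gzip_decompressor(), 4096 /* buffer_size */, 64 /* pback_size */);
in.push(io::file_source("test.gz"));

std::string line;
while (std::getline(in, line))   // decompresses on demand, one line at a time
{
    // hand `line` (or the stream itself) to whatever does the incremental parsing
}
This never materializes the whole decompressed file; only the filter's internal buffers are held in memory.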

I think you need to write your own filter. For instance, to read a .tar.gz and output the files contained, I wrote something like
//using namespace std;
namespace io = boost::iostreams;
struct tar_expander
{
    tar_expander() : out(0), status(header)
    {
    }
    ~tar_expander()
    {
        delete out;
    }

    /* qualify filter */
    typedef char char_type;
    struct category :
        io::input_filter_tag,
        io::multichar_tag
    { };

    template<typename Source>
    void fetch_n(Source& src, std::streamsize n = block_size)
    {
        /* my utility */
        ....
    }

    // Read up to n filtered characters into the buffer s,
    // returning the number of characters read or -1 for EOF.
    // Use src to access the unfiltered character sequence
    template<typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n)
    {
        fetch_n(src);
        const tar_header &h = cast_buf<tar_header>();
        int r;
        if (status == header)
        {
            ...
        }
    }

    std::ofstream *out;
    size_t fsize, stored;
    static const size_t block_size = 512;
    std::vector<char> buf;
    enum { header, store_file, archive_end } status;
};
My read(Source&, ...) function receives the already-unzipped text when it is called.
To use the filter:
ifstream file("/home/..../resample-1.8.1.tar.gz", ios_base::in | ios_base::binary);
io::filtering_streambuf<io::input> in;
in.push(tar_expander());
in.push(io::gzip_decompressor());
in.push(file);
io::copy(in, cout);

Related

use iostream or alternative for managing stream

I want to write a function which (simplified) takes as a parameter an input buffer of variable size, processes it (sequentially), and returns a buffer of a fixed size. The remaining part of the buffer has to stay in the "pipeline" for the next call of the function.
Question 1:
From my research it looks like iostream is the way to go, but apparently no one is using it. Is this the best way to go?
Question 2:
How can I declare the iostream object globally? Actually, as I have several streams, I will need to store the iostream objects in a vector of structs. How do I do this?
At the moment my code looks like this:
struct membuf : std::streambuf
{
membuf(char* begin, char* end) {
this->setg(begin, begin, end);
}
};
void read_stream(char* bufferIn, char* BufferOut, int lengthBufferIn)
{
char* buffer = (char*) malloc(300); //How do I do this globally??
membuf sbuf(buffer, buffer + sizeof(buffer));//How do I do this globally??
std::iostream s(&sbuf); //How do I do this globally??
s.write(bufferIn, lengthBufferIn);
s.read(BufferOut, 100);
process(BufferOut);
}
I see no need for iostream here. You can create an object that holds a reference to the buffer (so no copies involved) and the position where it left off.
So something along this:
class Transformer {
private:
char const *input_buf_;
public:
Transformer(char const *buf) : input_buf_(buf) {
}
bool has_next() const { return input_buf_ != nullptr; } // or your own condition
std::array<char, 300> read_next() {
// read from input_buf_ as much as you need
// advance input_buf_ to the remaining part
// make sure to set input_buf_ accordingly after the last part
// e.g. input_buf_ = nullptr; for how I wrote hasNext
return /*the processed fixed size buffer*/;
}
};
usage:
char *str = /*...*/;
Transformer t(str);
while (t.has_next()) {
std::array<char, 300> arr = t.read_next();
// use arr
}
Question 1: From my research it looks like iostream is the way to go, but apparently no one is using it. Is this the best way to go?
Yes (the std::istream class and its specializations are there to manage streams, and they fit the problem well).
Your code could look similar to this:
struct fixed_size_buffer
{
static const std::size_t size = 300;
std::vector<char> value;
fixed_size_buffer() : value(fixed_size_buffer::size, ' ') {}
};
std::istream& operator>>(std::istream& in, fixed_size_buffer& data)
{
std::noskipws(in); // read spaces as well as characters
std::copy_n(std::istream_iterator<char>{ in },
fixed_size_buffer::size,
std::begin(data.value)); // this leaves in in an invalid state
// if there is not enough data in the input
// stream;
return in;
}
Consuming the data:
fixed_size_buffer buffer;
std::ifstream fin{ "c:\\temp\\your_data.txt" };
while(fin >> buffer)
{
// do something with buffer here
}
while(std::cin >> buffer) // read from standard input
{
// do something with buffer here
}
std::istringstream sin{ "long-serialized-string-here" };
while(sin >> buffer) // read from the string stream
{
// do something with buffer here
}
Question 2: How can I declare the iostream object globally? Actually, as I have several streams I will need to write the iostream Object in a struct-vector. How do I do this?
iostreams do not support copy construction; because of this, you will need to keep them in a sequence of pointers or references to the base class:
auto fin = std::make_unique<std::ifstream>("path_to_input_file");
std::vector<std::istream*> streams;
streams.push_back(&std::cin);
streams.push_back(fin.get());
fixed_size_buffer buffer;
for(auto in_ptr: streams)
{
std::istream& in = *in_ptr;
while(in >> buffer)
{
// do something with buffer here
}
}

Serialize an object to encrypted std::fstream (C++)

Is there a way to create an encrypted file stream?
I want to do something like this:
string myString;
fstream myStream;
myStream.create("my path", "my password", cipherAlgorithm);
myStream.write(myString); - this code saves my string to an encrypted stream
myStream.close();
Thanks.
I would suggest having a sink object, a type of device that you write to. The sink can be wrapped in a boost stream. The stream can then be written to like any 'standard' output stream.
#ifndef ENCRYPTION_SINK_HPP__
#define ENCRYPTION_SINK_HPP__
#include <boost/iostreams/categories.hpp> // sink_tag
#include <iosfwd> // streamsize
#include <string>
#include <iostream>
class EncryptionSink
{
public:
typedef char char_type;
typedef boost::iostreams::sink_tag category;
/**
* @param underlyingStream where the data is actually written
* @param key the key to use during encryption
* @note we could pass in a path rather than a stream and construct the
* underlying stream internally. But doing it this way affords the
* flexibility of using any ostream type e.g. ofstream, ostringstream,
* cout etc.
*/
EncryptionSink(std::ostream &underlyingStream, std::string const &key);
/**
* @param buf the data that you write
* @param n number of bytes to write
* @return the number of bytes written
*/
std::streamsize write(char_type const * const buf, std::streamsize const n) const;
~EncryptionSink();
private:
std::ostream &m_underlyingStream;
std::string const m_key;
};
#endif // ENCRYPTION_SINK_HPP__
The above class takes an underlying stream (or a FILE handle if you prefer) and writes to it in the actual '.cpp' implementation of write using whatever transformations your encryption algorithm requires. For example, the following implementation applies an XOR transformation (I think this is XOR, my knowledge is rusty at best):
EncryptionSink::EncryptionSink(std::ostream &underlyingStream, std::string const &key)
: m_underlyingStream(underlyingStream)
, m_key(key)
{}
std::streamsize
EncryptionSink::write(char_type const * const buf, std::streamsize const n) const
{
    std::string::size_type t = 0;
    for(std::streamsize i = 0; i < n; ++i) {
        // XOR each byte with the current key character and write it through
        char const result = static_cast<char>(buf[i] ^ m_key[t]);
        m_underlyingStream.write(&result, 1);
        ++t;
        if(t >= m_key.length()) t = 0;
    }
    return n;
}
// etc.
Elsewhere in your code, an instantiation of the sink might be as follows:
std::fstream underlyingStream(path, std::ios::out | std::ios::binary);
EncryptionSink sink(underlyingStream, std::string("theEncryptionKey"));
boost::iostreams::stream<EncryptionSink> outputStream(sink);
outputStream << "some data!";
outputStream.flush();
I imagine you might then need an accompanying DecryptionSink to reverse the operation, using a stream copier to copy the encrypted data back to plain text. Note though that with XOR you wouldn't need to do this, since reapplying the XOR will transform the data back to the original.
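For illustration only, the decrypting side could be modelled as a Source rather than a Sink, so it can sit in an input stream. This is just my sketch (DecryptionSource and its members are made-up names); note that because the sink above restarts the key on every write() call, both sides have to agree on how the data was chunked for the key offsets to line up:
#include <boost/iostreams/categories.hpp> // source_tag
#include <istream>
#include <string>

class DecryptionSource
{
public:
    typedef char char_type;
    typedef boost::iostreams::source_tag category;

    DecryptionSource(std::istream &underlyingStream, std::string const &key)
        : m_underlyingStream(underlyingStream)
        , m_key(key)
        , m_pos(0)
    {}

    std::streamsize read(char_type *buf, std::streamsize n)
    {
        m_underlyingStream.read(buf, n);
        std::streamsize const got = m_underlyingStream.gcount();
        if (got == 0) return -1; // -1 signals end-of-sequence to boost::iostreams
        for (std::streamsize i = 0; i < got; ++i) {
            // undo the XOR applied by the sink, cycling through the key
            buf[i] ^= m_key[m_pos];
            if (++m_pos >= m_key.length()) m_pos = 0;
        }
        return got;
    }

private:
    std::istream &m_underlyingStream;
    std::string const m_key;
    std::string::size_type m_pos;
};
It would then be wrapped in a boost::iostreams::stream<DecryptionSource> the same way the sink is wrapped above.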
EDIT: off the back of this, I have written a simple C++ API demonstrating how such code can be put to more effective use: cryptex

boost::iostreams reading from source device

I've been trying to get my head around the iostreams library by boost, but I can't really fully grasp the concepts.
Say I have the following class:
Pseudocode: The below code is only to illustrate the problem.
Edit: removed the read code because it removed focus on the real problem.
class my_source {
public:
my_source():value(0x1234) {}
typedef char char_type;
typedef source_tag category;
std::streamsize read(char* s, std::streamsize n)
{
... read into "s" ...
}
private:
/* Other members */
};
Now say I want to stream this to an int.
What do I need to do? I've tried the following:
boost::iostreams::stream<my_source> stream;
stream.open(my_source());
int i = 0;
stream >> i;
// stream.fail() == true; <-- ??
This results in a fail, (failbit is set)
While the following works fine.
boost::iostreams::stream<my_source> stream;
stream.open(my_source());
char i[4];
stream >> i;
// stream.fail() == false;
Could someone explain to me why this is happening? Is this because I've set char_type to char?
I can't really find a good explanation anywhere. I've been trying to read the documentation but I can't find the defined behavior for char_type, if that is the problem. When I'm using stringstreams I can read into an int without doing anything special.
So if anyone has any insight, please enlighten me.
All iostreams are textual streams, so this will take the bytewise representation of 0x1234, interpret it as text and try to parse it as an integer.
By the way
std::streamsize read(char* s, std::streamsize n)
{
int size = sizeof(int);
memcpy(s, &value, 4);
return size;
}
This has the potential for a buffer overflow if n < 4. Also, you write four bytes but then return the size of an int. memcpy(s, &value, sizeof value); will do the job, and a simple return sizeof value; will do the rest.
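If the goal really is to get 0x1234 back as an int, a minimal sketch (my assumption of the intent, reusing the open() pattern from the question) is to bypass formatted extraction and use unformatted input:
boost::iostreams::stream<my_source> stream;
stream.open(my_source());

int i = 0;
stream.read(reinterpret_cast<char*>(&i), sizeof i); // raw bytes, no text parsing
// i now holds whatever my_source::read() produced (0x1234), provided both
// sides agree on the size and endianness of int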
The boost::iostreams::stream default constructor does nothing, so as a result the stream is not open. You need to add a fake argument to the my_source constructor.
class my_source {
public:
my_source(int fake) : value(0x1234) {}
...
boost::iostreams::stream<my_source> stream(0);

How to write custom input stream in C++

I'm currently learning C++ (Coming from Java) and I'm trying to understand how to use IO streams properly in C++.
Let's say I have an Image class which contains the pixels of an image and I overloaded the extraction operator to read the image from a stream:
istream& operator>>(istream& stream, Image& image)
{
// Read the image data from the stream into the image
return stream;
}
So now I'm able to read an image like this:
Image image;
ifstream file("somepic.img");
file >> image;
But now I want to use the same extraction operator to read the image data from a custom stream. Let's say I have a file which contains the image in compressed form. So instead of using ifstream I might want to implement my own input stream. At least that's how I would do it in Java. In Java I would write a custom class extending the InputStream class and implementing the int read() method. So that's pretty easy. And usage would look like this:
InputStream stream = new CompressedInputStream(new FileInputStream("somepic.imgz"));
image.read(stream);
So using the same pattern maybe I want to do this in C++:
Image image;
ifstream file("somepic.imgz");
compressed_stream stream(file);
stream >> image;
But maybe that's the wrong way, don't know. Extending the istream class looks pretty complicated and after some searching I found some hints about extending streambuf instead. But this example looks terribly complicated for such a simple task.
So what's the best way to implement custom input/output streams (or streambufs?) in C++?
Solution
Some people suggested not using iostreams at all and to use iterators, boost or a custom IO interface instead. These may be valid alternatives but my question was about iostreams. The accepted answer resulted in the example code below. For easier reading there is no header/code separation and the whole std namespace is imported (I know that this is a bad thing in real code).
This example is about reading and writing vertical-xor-encoded images. The format is pretty easy. Each byte represents two pixels (4 bits per pixel). Each line is xor'd with the previous line. This kind of encoding prepares the image for compression (it usually results in a lot of 0-bytes, which are easier to compress).
#include <cstring>
#include <fstream>
using namespace std;
/*** vxor_streambuf class ******************************************/
class vxor_streambuf: public streambuf
{
public:
vxor_streambuf(streambuf *buffer, const int width) :
buffer(buffer),
size(width / 2)
{
previous_line = new char[size];
memset(previous_line, 0, size);
current_line = new char[size];
setg(0, 0, 0);
setp(current_line, current_line + size);
}
virtual ~vxor_streambuf()
{
sync();
delete[] previous_line;
delete[] current_line;
}
virtual streambuf::int_type underflow()
{
// Read line from original buffer
streamsize read = buffer->sgetn(current_line, size);
if (!read) return traits_type::eof();
// Do vertical XOR decoding
for (int i = 0; i < size; i += 1)
{
current_line[i] ^= previous_line[i];
previous_line[i] = current_line[i];
}
setg(current_line, current_line, current_line + read);
return traits_type::to_int_type(*gptr());
}
virtual streambuf::int_type overflow(streambuf::int_type value)
{
int write = pptr() - pbase();
if (write)
{
// Do vertical XOR encoding
for (int i = 0; i < size; i += 1)
{
char tmp = current_line[i];
current_line[i] ^= previous_line[i];
previous_line[i] = tmp;
}
// Write line to original buffer
streamsize written = buffer->sputn(current_line, write);
if (written != write) return traits_type::eof();
}
setp(current_line, current_line + size);
if (!traits_type::eq_int_type(value, traits_type::eof())) sputc(value);
return traits_type::not_eof(value);
};
virtual int sync()
{
streambuf::int_type result = this->overflow(traits_type::eof());
buffer->pubsync();
return traits_type::eq_int_type(result, traits_type::eof()) ? -1 : 0;
}
private:
streambuf *buffer;
int size;
char *previous_line;
char *current_line;
};
/*** vxor_istream class ********************************************/
class vxor_istream: public istream
{
public:
vxor_istream(istream &stream, const int width) :
istream(new vxor_streambuf(stream.rdbuf(), width)) {}
virtual ~vxor_istream()
{
delete rdbuf();
}
};
/*** vxor_ostream class ********************************************/
class vxor_ostream: public ostream
{
public:
vxor_ostream(ostream &stream, const int width) :
ostream(new vxor_streambuf(stream.rdbuf(), width)) {}
virtual ~vxor_ostream()
{
delete rdbuf();
}
};
/*** Test main method **********************************************/
int main()
{
// Read data
ifstream infile("test.img");
vxor_istream in(infile, 288);
char data[144 * 128];
in.read(data, 144 * 128);
infile.close();
// Write data
ofstream outfile("test2.img");
vxor_ostream out(outfile, 288);
out.write(data, 144 * 128);
out.flush();
outfile.close();
return 0;
}
The proper way to create a new stream in C++ is to derive from std::streambuf and to override the underflow() operation for reading and the overflow() and sync() operations for writing. For your purpose you'd create a filtering stream buffer which takes another stream buffer (and possibly a stream from which the stream buffer can be extracted using rdbuf()) as argument and implements its own operations in terms of this stream buffer.
The basic outline of a stream buffer would be something like this:
class compressbuf
: public std::streambuf {
std::streambuf* sbuf_;
char* buffer_;
// context for the compression
public:
compressbuf(std::streambuf* sbuf)
: sbuf_(sbuf), buffer_(new char[1024]) {
// initialize compression context
}
~compressbuf() { delete[] this->buffer_; }
int underflow() {
if (this->gptr() == this->egptr()) {
// decompress data into buffer_, obtaining its own input from
// this->sbuf_; if necessary resize buffer
// the next statement assumes "size" characters were produced (if
// no more characters are available, size == 0.
this->setg(this->buffer_, this->buffer_, this->buffer_ + size);
}
return this->gptr() == this->egptr()
? std::char_traits<char>::eof()
: std::char_traits<char>::to_int_type(*this->gptr());
}
};
How underflow() looks exactly depends on the compression library being used. Most libraries I have used keep an internal buffer which needs to be filled and which retains the bytes which are not yet consumed. Typically, it is fairly easy to hook the decompression into underflow().
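As an illustration only, here is roughly how the decompression could be hooked in with zlib. This is my sketch, not part of the outline above: it assumes two extra members, a z_stream zstrm_ initialized with inflateInit() in the constructor and a fixed char in_buf_[1024] holding not-yet-consumed compressed bytes (for gzip-framed data you would use inflateInit2() with the appropriate windowBits instead):
// requires <zlib.h> and linking against zlib
int underflow() {
    if (this->gptr() == this->egptr()) {
        zstrm_.next_out  = reinterpret_cast<unsigned char*>(this->buffer_);
        zstrm_.avail_out = 1024;
        while (zstrm_.avail_out != 0) {
            if (zstrm_.avail_in == 0) {
                // refill the compressed-input buffer from the wrapped streambuf
                std::streamsize got = this->sbuf_->sgetn(in_buf_, sizeof in_buf_);
                if (got <= 0) break;              // no more compressed input
                zstrm_.next_in  = reinterpret_cast<unsigned char*>(in_buf_);
                zstrm_.avail_in = static_cast<uInt>(got);
            }
            int rc = inflate(&zstrm_, Z_NO_FLUSH); // decompress into buffer_
            if (rc != Z_OK) break;                 // Z_STREAM_END or an error: stop
        }
        std::size_t size = 1024 - zstrm_.avail_out;
        this->setg(this->buffer_, this->buffer_, this->buffer_ + size);
    }
    return this->gptr() == this->egptr()
        ? std::char_traits<char>::eof()
        : std::char_traits<char>::to_int_type(*this->gptr());
}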
Once the stream buffer is created, you can just initialize an std::istream object with the stream buffer:
std::ifstream fin("some.file");
compressbuf sbuf(fin.rdbuf());
std::istream in(&sbuf);
If you are going to use the stream buffer frequently, you might want to encapsulate the object construction into a class, e.g., icompressstream. Doing so is a bit tricky because the base class std::ios is a virtual base and is the actual location where the stream buffer is stored. To construct the stream buffer before passing a pointer to a std::ios thus requires jumping through a few hoops: It requires the use of a virtual base class. Here is how this could look roughly:
struct compressstream_base {
compressbuf sbuf_;
compressstream_base(std::streambuf* sbuf): sbuf_(sbuf) {}
};
class icompressstream
: virtual compressstream_base
, public std::istream {
public:
icompressstream(std::streambuf* sbuf)
: compressstream_base(sbuf)
, std::ios(&this->sbuf_)
, std::istream(&this->sbuf_) {
}
};
(I just typed this code without a simple way to test that it is reasonably correct; please expect typos but the overall approach should work as described)
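Hypothetical usage of that wrapper, assuming underflow() has been filled in for whatever format the file uses:
std::ifstream fin("some.file", std::ios::binary);
icompressstream in(fin.rdbuf());

std::string word;
while (in >> word) {
    // consume the decompressed text
}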
boost (which you should already have if you're serious about C++) has a whole library dedicated to extending and customizing IO streams: boost.iostreams
In particular, it already has decompressing streams for a few popular formats (bzip2, gzip, and zlib).
As you saw, extending streambuf can be an involved job, but the library makes it fairly easy to write your own filtering streambuf if you need one.
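For the use case in the question, a sketch with the ready-made gzip filter could look like this (Image and its operator>> are the ones from the question, and the file is assumed to really be gzip-compressed):
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <fstream>

namespace io = boost::iostreams;

Image image;
std::ifstream file("somepic.imgz", std::ios_base::in | std::ios_base::binary);

io::filtering_istream in;
in.push(io::gzip_decompressor());
in.push(file);
in >> image;   // works because filtering_istream derives from std::istream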
Don't, unless you want to die a terrible death of hideous design. IOstreams are the worst component of the Standard library - even worse than locales. The iterator model is much more useful, and you can convert from stream to iterator with istream_iterator.
I agree with @DeadMG and wouldn't recommend using iostreams. Apart from the poor design, the performance is often worse than that of plain old C-style I/O. I wouldn't stick to a particular I/O library though; instead, I'd create an interface (abstract class) that has all required operations, for example:
class Input {
public:
virtual void read(char *buffer, size_t size) = 0;
// ...
};
Then you can implement this interface for C I/O, iostreams, mmap or whatever.
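For example, a stdio-backed implementation of that interface could look like this (FileInput is a hypothetical name; in real code the Input base would also want a virtual destructor):
#include <cstdio>
#include <stdexcept>

class FileInput : public Input {
public:
    explicit FileInput(const char *path) : f_(std::fopen(path, "rb")) {
        if (!f_) throw std::runtime_error("cannot open file");
    }
    ~FileInput() { if (f_) std::fclose(f_); }

    void read(char *buffer, size_t size) override {
        // fill the caller's buffer or fail loudly on a short read
        if (std::fread(buffer, 1, size, f_) != size)
            throw std::runtime_error("short read");
    }

private:
    std::FILE *f_;
};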
It is probably possible to do this, but I feel that it's not the "right" usage of this feature in C++. The iostream >> and << operators are meant for fairly simple operations, such as writing the "name, street, town, postal code" of a class Person, not for parsing and loading images. That's much better done using stream::read(), e.g. Image(astream);, and you may implement a stream for compression, as described by Dietmar.

Mirror console output to file in c++

In C++, is there a smart way to mirror output from stdout to both the console and the file?
I'm hoping there is a way to do it like in this question.
Edit: It would be nice to be able to do this with just the standard libraries (ie: no boost)..
Alternatively, just start your program so it's piped to the tee command.
You could try a Tee Device provided by Boost.Iostreams.
A Tee device directs output to multiple streams. As far as I know, you can chain them to reach theoretically infinite output devices from one write call.
This answer shows an example of how to do exactly what you want.
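A minimal sketch with the Boost tee device (the file name and the sink types are my choice):
#include <boost/iostreams/tee.hpp>
#include <boost/iostreams/stream.hpp>
#include <fstream>
#include <iostream>

namespace io = boost::iostreams;

typedef io::tee_device<std::ostream, std::ofstream> TeeDevice;
typedef io::stream<TeeDevice> TeeStream;

int main()
{
    std::ofstream file("log.txt");
    TeeDevice device(std::cout, file); // duplicates every write into both sinks
    TeeStream both(device);

    both << "goes to the console and to log.txt" << std::endl;
    both.flush();
}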
You can do this by creating a class that extends std::streambuf, and has a std::ofstream member. After overriding the std::streambuf::overflow and std::streambuf::sync member functions, you'll be all set.
Most of the code below comes from here. The stuff I've added ("ADDED:") for file-mirroring is pointed out. This might be overly complex as I'm at work and can't pore over it fully to simplify it, but it works. The bonus of doing it this way (instead of just using a std::streambuf*) is that any code (say you have an external library) that writes to std::cout will write to your file.
mystreambuf.h
#ifndef MYSTREAMBUF_H
#define MYSTREAMBUF_H
template <typename charT, typename traits = std::char_traits<charT> >
class mystreambuf : public std::basic_streambuf<charT, traits>
{
public:
// The size of the input and output buffers.
static const size_t BUFF_SIZE = 1024;
typedef traits traits_type;
typedef typename traits_type::int_type int_type;
typedef typename traits_type::pos_type pos_type;
typedef typename traits_type::off_type off_type;
// You can use any method that you want, but here, we'll just take in a raw
// streambuf as a slave I/O object.
explicit mystreambuf(std::streambuf* buf)
: out_buf_(new charT[BUFF_SIZE])
{
// ADDED: store the original cout stream and open our output file
this->original_cout = buf;
outfile.open("test.txt");
// Initialize the put pointer. Overflow won't get called until this buffer is filled up,
// so we need to use valid pointers.
this->setp(out_buf_, out_buf_ + BUFF_SIZE - 1);
}
// It's a good idea to release any resources when done using them.
~mystreambuf() {
delete [] out_buf_;
// ADDED: restore cout, close file
std::cout.rdbuf(original_cout);
outfile.flush();
outfile.close();
}
protected:
// This is called when there are too many characters in the buffer (thus, a write needs to be performed).
virtual int_type overflow(int_type c);
// This is called when the buffer needs to be flushed.
virtual int_type sync();
private:
// Output buffer
charT* out_buf_;
// ADDED: tracking the original std::cout stream & the file stream to open
std::streambuf* original_cout;
std::ofstream outfile;
};
#endif
mystreambuf.cpp
// Based on class by perfectly.insane from http://www.dreamincode.net/code/snippet2499.htm
#include <fstream>
#include <iostream>
#include <streambuf>
#include "mystreambuf.h"
// This function is called when the output buffer is filled.
// In this function, the buffer should be written to wherever it should
// be written to (in this case, the streambuf object that this is controlling).
template <typename charT, typename traits>
typename mystreambuf<charT, traits>::int_type
mystreambuf<charT, traits>::overflow(typename mystreambuf<charT, traits>::int_type c)
{
charT* ibegin = this->out_buf_;
charT* iend = this->pptr();
// Reset the put pointers to indicate that the buffer is free
// (at least it will be at the end of this function).
setp(out_buf_, out_buf_ + BUFF_SIZE - 1);
// If this is the end, add an eof character to the buffer.
// This is why the pointers passed to setp are off by 1
// (to reserve room for this).
if(!traits_type::eq_int_type(c, traits_type::eof())) {
*iend++ = traits_type::to_char_type(c);
}
// Compute the write length.
int_type ilen = iend - ibegin;
// ADDED: restore cout to its original streambuf, write the buffered data to it and to the file, then point cout back at this streambuf
std::cout.rdbuf(original_cout);
std::cout.write(out_buf_, ilen);
outfile.write(out_buf_, ilen);
std::cout.rdbuf(this);
return traits_type::not_eof(c);
}
// This is called to flush the buffer.
// This is called when we're done with the file stream (or when .flush() is called).
template <typename charT, typename traits>
typename mystreambuf<charT, traits>::int_type
mystreambuf<charT, traits>::sync()
{
return traits_type::eq_int_type(this->overflow(traits_type::eof()),
traits_type::eof()) ? -1 : 0;
}
int main(int argc, char* argv[]) {
mystreambuf<char> filter(std::cout.rdbuf());
std::cout.rdbuf( &filter );
std::cout << "Hello World" << std::endl;
return 0;
}
hope this helps; cheers