Easiest way to get a line when data is coming in chunks? - c++

I am using win32's ReadFile to read from a child process's pipe. That gives me chunks of characters at a time and the size of each chunk, but they may or may not have new lines. I want to process the output line by line. What is the simplest way of doing so?
I thought about appending each chunk to a string, and at the end, using a stringstream to process it line by line, but I'd like to do this as data is coming in. I suppose the trick is, how and where should I detect the new line ending? If only streamreader's getline returned nothing when no delimiter is found...

Append to a string until you encounter a newline character or the end of the data. Then you have a line in the string: process it, empty the string, and repeat. The reason for using a string is that you don't know how long a line may be, and a string does the reallocation etc. for you as necessary.
Special case: end of data with nothing earlier on that line should probably be treated as not-a-line and ignored.
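A minimal sketch of that idea, assuming the chunks arrive as a char pointer plus a byte count (as ReadFile reports them). The names LineSplitter, feed, finish and split_demo are made up for this illustration, and stripping a trailing '\r' is an assumption about the child process writing CRLF line endings:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: accumulates chunks and hands complete lines to a callback.
class LineSplitter {
public:
    explicit LineSplitter(std::function<void(const std::string&)> on_line)
        : on_line_(std::move(on_line)) {}

    // Feed one chunk, e.g. the buffer and byte count reported by ReadFile.
    void feed(const char* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        pending_.append(data, size);
        std::size_t start = 0;
        for (;;) {
            const std::size_t nl = pending_.find('\n', start);
            if (nl == std::string::npos) break;
            std::size_t end = nl;
            if (end > start && pending_[end - 1] == '\r') --end; // strip the CR of a CRLF pair
            on_line_(pending_.substr(start, end - start));
            start = nl + 1;
        }
        pending_.erase(0, start); // keep only the unfinished tail
    }

    // Call once at end of data; a non-empty tail counts as a final line.
    void finish() {
        if (!pending_.empty()) {
            on_line_(pending_);
            pending_.clear();
        }
    }

private:
    std::function<void(const std::string&)> on_line_;
    std::string pending_;
};

// Example: three chunks that split the lines at arbitrary points.
inline std::vector<std::string> split_demo() {
    std::vector<std::string> lines;
    LineSplitter splitter([&lines](const std::string& l) { lines.push_back(l); });
    const char* chunks[] = {"first li", "ne\r\nsecond\nthi", "rd"};
    for (const char* c : chunks) splitter.feed(c, std::char_traits<char>::length(c));
    splitter.finish();
    return lines;
}
```

In the read loop you would call feed() after every successful ReadFile and finish() once the pipe is closed, so lines are processed as soon as they are complete.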
Cheers & hth.

What I can think of is using a buffer in which to store the chunks as they arrive. You know the size of each chunk via lpNumberOfBytesRead, so for each chunk you check whether it contains newline characters. If it does, you output all the buffered characters up to the newline, then keep buffering until you receive another chunk with newline characters, and so on.
Some pseudo code might look like:
wchar_t buffer[BUFFER_SIZE]; // enough for a few chunks (use new to allocate dynamic memory)
ReadFile(hFile, lpBuffer, nNumberOfBytesToRead, lpNumberOfBytesRead, lpOverlapped);
if (buffer has new_line_character) {
process_whole_line(); // don't forget to clear up the buffer
}

You can extend std::basic_streambuf and implement xsputn so it stores data in an internal buffer, and detect newlines or do whatever processing you need. If you want to process only complete lines, upon detection you can push the buffer to a std::queue up to the newline, and erase the respective part from the buffer. You'll need to implement overflow too. For example:
#include <queue>
#include <streambuf>
#include <string>

template<typename CharT, typename Traits = std::char_traits<CharT> >
class line_streambuf : public std::basic_streambuf<CharT, Traits> {
public:
    typedef CharT char_type;
    typedef Traits traits_type;
    typedef std::basic_string<char_type, traits_type> string_type;
    typedef typename string_type::size_type size_type;
    typedef typename traits_type::int_type int_type;
    line_streambuf(char_type separator = '\n') : _separator(separator) {}
    virtual ~line_streambuf() {}
    // Pops the oldest complete line; returns false if none is queued.
    bool getline(string_type& output) {
        if (_processed.empty())
            return false;
        output = _processed.front();
        _processed.pop();
        return true;
    }
protected:
    // Receives single characters, since no put area is ever set up.
    virtual int_type overflow(int_type v) {
        if (traits_type::eq_int_type(v, traits_type::eof()))
            return traits_type::not_eof(v);
        char_type c = traits_type::to_char_type(v);
        if (traits_type::eq(c, _separator)) {
            _processed.push(_buffer);
            _buffer.clear();
        } else {
            _buffer += c;
        }
        return v;
    }
    virtual std::streamsize xsputn(const char_type* p, std::streamsize n) {
        // Only the newly appended data can contain a separator.
        size_type search_from = _buffer.size();
        _buffer.append(p, p + n);
        size_type pos;
        while ((pos = _buffer.find(_separator, search_from)) != string_type::npos) {
            _processed.push(_buffer.substr(0, pos));
            _buffer.erase(0, pos + 1);
            search_from = 0;
        }
        return n;
    }
private:
    char_type _separator;
    string_type _buffer;
    std::queue<string_type> _processed;
};
NOTE: highly untested code. Let me know if you find a problem, or feel free to edit.

Related

use iostream or alternative for managing stream

I want to write a function which (simplified) takes as a parameter an input buffer of variable size, processes it (sequentially), and returns a buffer of a fixed size. The remaining part of the buffer has to stay in the "pipeline" for the next call of the function.
Question 1:
From my research it looks like iostream is the way to go, but apparently no one is using it. Is this the best way to go?
Question 2:
How can I declare the iostream object globally? Actually, as I have several streams I will need to write the iostream Object in a struct-vector. How do I do this?
At the moment my code looks like that:
struct membuf : std::streambuf
{
membuf(char* begin, char* end) {
this->setg(begin, begin, end);
}
};
void read_stream(char* bufferIn, char* BufferOut, int lengthBufferIn)
{
char* buffer = (char*) malloc(300); //How do I do this globally??
membuf sbuf(buffer, buffer + sizeof(buffer));//How do I do this globally??
std::iostream s(&sbuf); //How do I do this globally??
s.write(bufferIn, lengthBufferIn);
s.read(BufferOut, 100);
process(BufferOut);
}
I see no need for iostream here. You can create an object that holds a reference to the buffer (so no copies are involved) and the position where it left off.
So something along these lines:
class Transformer {
private:
char const *input_buf_;
public:
Transformer(char const *buf) : input_buf_(buf) {
}
bool has_next() const { return input_buf_ != nullptr; } // or your own condition
std::array<char, 300> read_next() {
// read from input_buf_ as much as you need
// advance input_buf_ to the remaining part
// make sure to set input_buf_ accordingly after the last part
// e.g. input_buf_ = nullptr; given how has_next is written
return /*the processed fixed size buffer*/;
}
};
usage:
char *str = /* ... */;
Transformer t(str);
while (t.has_next()) {
std::array<char, 300> arr = t.read_next();
// use arr
}
Question 1: From my research it looks like iostream is the way to go, but apparently no one is using it. Is this the best way to go?
Yes (the std::istream class and its specializations are there to manage streams, and they fit the problem well).
Your code could look similar to this:
struct fixed_size_buffer
{
static const std::size_t size = 300;
std::vector<char> value;
fixed_size_buffer() : value(fixed_size_buffer::size, ' ') {}
};
std::istream& operator>>(std::istream& in, fixed_size_buffer& data)
{
std::noskipws(in); // read spaces as well as characters
std::copy_n(std::istream_iterator<char>{ in },
fixed_size_buffer::size,
std::begin(data.value)); // this leaves "in" in an invalid state
// if there is not enough data in the input
// stream
return in;
}
Consuming the data:
fixed_size_buffer buffer;
std::ifstream fin{ "c:\\temp\\your_data.txt" };
while(fin >> buffer)
{
// do something with buffer here
}
while(std::cin >> buffer) // read from standard input
{
// do something with buffer here
}
std::istringstream sin{ "long-serialized-string-here" };
while(sin >> buffer) // read from an in-memory string
{
// do something with buffer here
}
Question 2: How can I declare the iostream object globally? Actually, as I have several streams I will need to write the iostream Object in a struct-vector. How do I do this?
iostreams do not support copy-construction. Because of this, you will need to keep them in a sequence of pointers / references to the base class:
auto fin = std::make_unique<std::ifstream>("path_to_input_file");
std::vector<std::istream*> streams;
streams.push_back(&std::cin);
streams.push_back(fin.get());
fixed_size_buffer buffer;
for(auto in_ptr: streams)
{
std::istream& in = *in_ptr;
while(in >> buffer)
{
// do something with buffer here
}
}

Adapting C++ std/boost iostreams to provide cyclic write to memory block

Background:
I'm trying to optimize a logging system so that it uses memory-mapped files. I need to provide an std::ostream-like interface so that the logging system can write to that memory.
I have identified std::strstream (which is deprecated, though) and boost::iostreams::basic_array_sink as candidates that could fit my needs.
Now I want to have the logging cyclic, meaning that when the output pointer is near the end of the memory block it should start over at the beginning again.
My question is where would be the best point to start in order to implement this specific behaviour.
I'm rather overwhelmed by the std::iostreams class hierarchy and don't grasp all the internal workings as for now.
I'm uncertain whether I should/need to derive from ostream, streambuf, or both.
Are these meant to be derived from, anyway?
Or, using boost::iostreams, would I need to write my own Sink?
EDIT:
The following attempt compiles and produces the expected output:
class rollingstreambuf : public std::basic_streambuf<TCHAR>
{
public:
typedef std::basic_streambuf<TCHAR> Base;
rollingstreambuf(Base::char_type* baseptr, size_t size)
{
setp(baseptr, baseptr + size);
}
protected:
virtual int_type overflow (int_type c)
{
// reset position to start of buffer
setp(pbase(), epptr());
return putchar(c);
}
virtual std::streamsize xsputn (const char* s, std::streamsize n)
{
if (n >= epptr() - pptr())
// reset position to start of buffer
setp(pbase(), epptr());
return Base::xsputn(s, n);
}
};
char buffer[100];
rollingstreambuf buf(buffer, sizeof(buffer));
std::basic_ostream<TCHAR> out(&buf);
for (int i=0; i<10; i++)
{
out << "Mumblemumble " << i << '\n';
}
out << std::ends; //write terminating NULL char
Printing the buffer gives:
Mumblemumble 6
Mumblemumble 7
Mumblemumble 8
Mumblemumble 9
(which confirms the roll-over has taken place)
What it does is that it makes the streambuf use the provided buffer as a cyclic output buffer (put area), without ever advancing the buffer window in the output sequence (stream).
(Using terminology from http://en.cppreference.com/w/cpp/io/basic_streambuf)
Now i feel very uncertain about the robustness and quality of this implementation. Please review and comment it.
This is a valid approach. overflow() should return:
"traits::eof() or throws an exception if the function fails. Otherwise, returns some value other than traits::eof() to indicate success."
E.g.:
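virtual int_type overflow (int_type c)
{
// reset position to start of buffer
setp(pbase(), epptr());
// store the character that triggered the overflow, otherwise it is lost
if (!traits_type::eq_int_type(c, traits_type::eof()))
{
*pptr() = traits_type::to_char_type(c);
pbump(1);
}
return traits_type::not_eof(c);
}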
xsputn() should probably write the beginning of the sequence to the end of the buffer, then rewind and write the remaining sequence to the front of the buffer. You could probably get away with the default implementation of xsputn() that calls sputc(c) for each character and then overflow() when the buffer is full.
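The wrap-around variant of xsputn() described above could be sketched as follows. This assumes a plain char buffer rather than TCHAR, and the names ring_streambuf and ring_demo are invented for the example; it is one possible reading of the suggestion, not a reviewed implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical cyclic buffer: writes wrap around to the front of the memory block.
class ring_streambuf : public std::streambuf {
public:
    ring_streambuf(char* base, std::size_t size) {
        assert(base != nullptr && size > 0);
        setp(base, base + size);
    }

protected:
    // Called when the put area is full: rewind, then store the pending character.
    int_type overflow(int_type c) override {
        setp(pbase(), epptr());
        if (!traits_type::eq_int_type(c, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(c);
            pbump(1);
        }
        return traits_type::not_eof(c);
    }

    // Copy as much as fits up to the end, rewind, and continue from the front.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        std::streamsize written = 0;
        while (written < n) {
            if (pptr() == epptr()) setp(pbase(), epptr()); // wrap around
            const std::streamsize room = epptr() - pptr();
            const std::streamsize count = std::min(room, n - written);
            traits_type::copy(pptr(), s + written, static_cast<std::size_t>(count));
            pbump(static_cast<int>(count));
            written += count;
        }
        return n;
    }
};

// Example: write 10 characters into an 8-byte block.
inline std::string ring_demo() {
    char block[8] = {'.', '.', '.', '.', '.', '.', '.', '.'};
    ring_streambuf buf(block, sizeof block);
    std::ostream out(&buf);
    out << "ABCDEFGHIJ";
    return std::string(block, sizeof block); // "IJ" has overwritten "AB"
}
```

Unlike the version in the question, a write that does not fit is split across the end of the block instead of restarting at the beginning, so no part of the block is left stale when a long message arrives near the end.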

What is wrong with my implementation of overflow()?

I am trying to implement a stream buffer and I'm having trouble with making overflow() work. I resize the buffer by 10 more characters and reset the buffer using setp. Then I increment the pointer back where we left off. For some reason the output is not right:
template <class charT, class traits = std::char_traits<charT>>
class stringbuf : public std::basic_stringbuf<charT, traits>
{
public:
using char_type = charT;
using traits_type = traits;
using int_type = typename traits::int_type;
public:
stringbuf()
: buffer(10, 0)
{
this->setp(&buffer.front(), &buffer.back());
}
int_type overflow(int_type c = traits::eof())
{
if (traits::eq_int_type(c, traits::eof()))
return traits::not_eof(c);
std::ptrdiff_t diff = this->pptr() - this->pbase();
buffer.resize(buffer.size() + 10);
this->setp(&buffer.front(), &buffer.back());
this->pbump(diff);
return traits::not_eof(traits::to_int_type(*this->pptr()));
}
// ...
std::basic_string<charT> str()
{
return buffer;
}
private:
std::basic_string<charT> buffer;
};
int main()
{
stringbuf<char> buf;
std::ostream os(&buf);
os << "hello world how are you?";
std::cout << buf.str();
}
When I print the string it comes out as:
hello worl how are ou?
It's missing the d and the y. What did I do wrong?
The first thing to note is that you are deriving from std::basic_stringbuf<char> without overriding all of the relevant virtual functions. For example, you don't override xsputn() or sync(), so you inherit whatever these functions end up doing. I'd strongly recommend deriving your stream buffer from std::basic_streambuf<char> instead!
The setp() calls announce a buffer to the stream buffer which is one character smaller than the string: &buffer.back() isn't a pointer to the end of the array but to the last character in the string. Personally, I would use
this->setp(&this->buffer.front(), &this->buffer.front() + this->buffer.size());
That is not the actual problem, though. After making space for more characters you omitted adding the overflowing character, i.e., the argument passed to overflow(), to the buffer:
this->pbump(diff);
*this->pptr() = traits::to_char_type(c);
this->pbump(1);
There are a few more little things which are not quite right:
It is generally a bad idea to give overriding virtual functions a default parameter. The base class function already provides the default, and the new default is only picked up when the function is called explicitly on the derived class.
The string returned may contain a number of null characters at the end because the held string is actually bigger than the sequence written so far, unless the buffer is exactly full. You should probably implement the str() function differently:
std::basic_string<charT> str() const
{
return this->buffer.substr(0, this->pptr() - this->pbase());
}
Growing the string by a constant value is a major performance problem: the cost of writing n characters is on the order of n * n. For larger n (they don't really need to become huge) this will cause problems. You are much better off growing your buffer exponentially, e.g., doubling it every time or growing by a factor of 1.5 if you feel doubling isn't a good idea.
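Putting the points above together, a corrected buffer might look roughly like the sketch below. The class name growing_buf and the helpers reset_put_area and growing_demo are made up for the example, and doubling is just one of the growth factors mentioned above:

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical name; derives from basic_streambuf rather than basic_stringbuf.
template <class CharT, class Traits = std::char_traits<CharT>>
class growing_buf : public std::basic_streambuf<CharT, Traits> {
public:
    using int_type = typename Traits::int_type;

    growing_buf() : buffer_(10, CharT()) { reset_put_area(0); }

    // Only the characters written so far, without the unused tail.
    std::basic_string<CharT, Traits> str() const {
        return buffer_.substr(0, static_cast<std::size_t>(this->pptr() - this->pbase()));
    }

protected:
    int_type overflow(int_type c) override {
        if (Traits::eq_int_type(c, Traits::eof()))
            return Traits::not_eof(c);
        const std::size_t used = static_cast<std::size_t>(this->pptr() - this->pbase());
        buffer_.resize(buffer_.size() * 2);       // exponential growth
        reset_put_area(used);
        *this->pptr() = Traits::to_char_type(c);  // store the overflowing character
        this->pbump(1);
        return c;
    }

private:
    // Re-announce the (possibly reallocated) storage and restore the write position.
    void reset_put_area(std::size_t used) {
        assert(used <= buffer_.size());
        CharT* first = &buffer_[0];
        this->setp(first, first + buffer_.size()); // end pointer is one past the last element
        this->pbump(static_cast<int>(used));
    }

    std::basic_string<CharT, Traits> buffer_;
};

// Example: the string from the question now arrives intact.
inline std::string growing_demo() {
    growing_buf<char> buf;
    std::ostream os(&buf);
    os << "hello world how are you?";
    return buf.str();
}
```

The put area has to be re-announced after every resize() because the string may have moved its storage, which is why setp() and pbump() are grouped in one helper here.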

boost::iostreams reading from source device

I've been trying to get my head around the Boost iostreams library,
but I can't fully grasp the concepts.
Say i have the following class:
Pseudocode: The below code is only to illustrate the problem.
Edit: removed the read code because it removed focus on the real problem.
class my_source {
public:
my_source():value(0x1234) {}
typedef char char_type;
typedef source_tag category;
std::streamsize read(char* s, std::streamsize n)
{
... read into "s" ...
}
private:
/* Other members */
};
Now say I want to stream this to an int.
What do I need to do? I've tried the following:
boost::iostreams::stream<my_source> stream;
stream.open(my_source());
int i = 0;
stream >> i;
// stream.fail() == true; <-- ??
This results in a failure (failbit is set),
while the following works fine:
boost::iostreams::stream<my_source> stream;
stream.open(my_source());
char i[4];
stream >> i;
// stream.fail() == false;
Could someone explain to me why this is happening? Is it because I've set char_type to char?
I can't find a good explanation anywhere. I've been trying to read the documentation, but I can't find the defined behavior for char_type, if this is the problem. When I'm using stringstreams, on the other hand, I can read into an int without doing anything special.
So if anyone has any insight, please enlighten me.
All iostreams are textual streams, so this will take the bytewise representation of 0x1234, interpret it as text and try to parse it as integer.
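The difference between formatted (textual) extraction and an unformatted read can be shown with the standard library alone. This is only an illustrative sketch: the helper names are invented, std::istringstream stands in for the Boost stream, and the value 0x01020304 is used instead of 0x1234 so that none of its bytes happens to be an ASCII digit:

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>

// The raw object representation of an int, as a binary source would supply it.
inline std::string raw_bytes_of(int value) {
    char raw[sizeof(int)];
    std::memcpy(raw, &value, sizeof raw);
    return std::string(raw, sizeof raw);
}

// Formatted extraction: operator>> expects decimal digits as text.
inline bool parses_as_text(const std::string& bytes, int& out) {
    std::istringstream in(bytes);
    return static_cast<bool>(in >> out);
}

// Unformatted read: copies sizeof(int) raw bytes back into an int.
inline bool reads_as_binary(const std::string& bytes, int& out) {
    std::istringstream in(bytes);
    char raw[sizeof(int)];
    if (!in.read(raw, sizeof raw)) return false;
    std::memcpy(&out, raw, sizeof out);
    return true;
}

inline void text_vs_binary_demo() {
    int v = 0;
    // Raw bytes are not digits, so formatted extraction sets failbit.
    assert(!parses_as_text(raw_bytes_of(0x01020304), v));
    // The textual representation parses fine.
    assert(parses_as_text("4660", v) && v == 0x1234);
    // The raw bytes round-trip through an unformatted read.
    assert(reads_as_binary(raw_bytes_of(0x01020304), v) && v == 0x01020304);
}
```

So to get an int out of a source that delivers raw bytes, you would use read() (or have the source produce text), rather than operator>>.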
By the way
std::streamsize read(char* s, std::streamsize n)
{
int size = sizeof(int);
memcpy(s, &value, 4);
return size;
}
This has the potential for a buffer overflow if n < 4. Also, you write four bytes and then return the size of an int. memcpy(s, &value, sizeof value); will do the job, a simple return sizeof value; will do the rest.
The boost::iostreams::stream constructor without arguments does nothing, and as a result the stream is not open. You need to add a dummy argument to the my_source constructor:
class my_source {
public:
my_source(int fake) : value(0x1234) {}
...
boost::iostreams::stream<my_source> stream(0);

Mirror console output to file in c++

In C++, is there a smart way to mirror output from stdout to both the console and the file?
I'm hoping there is a way to do it like in this question.
Edit: It would be nice to be able to do this with just the standard libraries (ie: no boost)..
Alternatively, just start your program so it's piped to the tee command.
You could try a Tee Device provided by Boost.Iostreams.
A Tee device directs output to multiple streams. As far as I know, you can chain them to reach theoretically infinite output devices from one write call.
This answer shows an example for how to do exactly what you want.
You can do this by creating a class that extends std::streambuf, and has a std::ofstream member. After overriding the std::streambuf::overflow and std::streambuf::sync member functions, you'll be all set.
Most of the code below comes from here. The parts I've added for file mirroring are marked "ADDED:". This might be overly complex, as I'm at work and can't pore over it fully to simplify it, but it works. The bonus of doing it this way (instead of just using a std::streambuf*) is that any code (say, an external library) that writes to std::cout will also write to your file.
mystreambuf.h
#ifndef MYSTREAMBUF_H
#define MYSTREAMBUF_H
template <typename charT, typename traits = std::char_traits<charT> >
class mystreambuf : public std::basic_streambuf<charT, traits>
{
public:
// The size of the input and output buffers.
static const size_t BUFF_SIZE = 1024;
typedef traits traits_type;
typedef typename traits_type::int_type int_type;
typedef typename traits_type::pos_type pos_type;
typedef typename traits_type::off_type off_type;
// You can use any method that you want, but here, we'll just take in a raw streambuf as a
// slave I/O object.
explicit mystreambuf(std::streambuf* buf)
: out_buf_(new charT[BUFF_SIZE])
{
// ADDED: store the original cout stream and open our output file
this->original_cout = buf;
outfile.open("test.txt");
// Initialize the put pointer. Overflow won't get called until this buffer is filled up,
// so we need to use valid pointers.
this->setp(out_buf_, out_buf_ + BUFF_SIZE - 1);
}
// It's a good idea to release any resources when done using them.
~mystreambuf() {
delete [] out_buf_;
// ADDED: restore cout, close file
std::cout.rdbuf(original_cout);
outfile.flush();
outfile.close();
}
protected:
// This is called when there are too many characters in the buffer (thus, a write needs to be performed).
virtual int_type overflow(int_type c);
// This is called when the buffer needs to be flushed.
virtual int_type sync();
private:
// Output buffer
charT* out_buf_;
// ADDED: tracking the original std::cout stream & the file stream to open
std::streambuf* original_cout;
std::ofstream outfile;
};
#endif
mystreambuf.cpp
// Based on class by perfectly.insane from http://www.dreamincode.net/code/snippet2499.htm
#include <fstream>
#include <iostream>
#include <streambuf>
#include "mystreambuf.h"
// This function is called when the output buffer is filled.
// In this function, the buffer should be written to wherever it should
// be written to (in this case, the streambuf object that this is controlling).
template <typename charT, typename traits>
typename mystreambuf<charT, traits>::int_type
mystreambuf<charT, traits>::overflow(typename mystreambuf<charT, traits>::int_type c)
{
charT* ibegin = this->out_buf_;
charT* iend = this->pptr();
// Reset the put pointers to indicate that the buffer is free
// (at least it will be at the end of this function).
this->setp(out_buf_, out_buf_ + BUFF_SIZE - 1);
// If this is not the end, append the pending character to the buffer.
// This is why the pointers passed to setp are off by 1
// (to reserve room for it).
if(!traits_type::eq_int_type(c, traits_type::eof())) {
*iend++ = traits_type::to_char_type(c);
}
// Compute the write length.
std::streamsize ilen = iend - ibegin;
// ADDED: write the buffered characters to the original cout stream buffer and to the file.
// Passing the length explicitly avoids the need for a terminating '\0',
// for which there is no room when the buffer is completely full.
original_cout->sputn(ibegin, ilen);
outfile.write(ibegin, ilen);
return traits_type::not_eof(c);
}
// This is called to flush the buffer.
// This is called when we're done with the file stream (or when .flush() is called).
template <typename charT, typename traits>
typename mystreambuf<charT, traits>::int_type
mystreambuf<charT, traits>::sync()
{
return traits_type::eq_int_type(this->overflow(traits_type::eof()),
traits_type::eof()) ? -1 : 0;
}
int main(int argc, char* argv[]) {
mystreambuf<char> filter(std::cout.rdbuf());
std::cout.rdbuf( &filter );
std::cout << "Hello World" << std::endl;
return 0;
}
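For comparison, a much smaller unbuffered variant of the same idea can be sketched with the standard library only. The names teebuf and tee_demo are invented for this example; it forwards every character to two existing stream buffers instead of keeping its own buffer, which is simpler but likely slower for large outputs:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical unbuffered tee: every character is forwarded to two stream buffers.
class teebuf : public std::streambuf {
public:
    teebuf(std::streambuf* first, std::streambuf* second)
        : first_(first), second_(second) {
        assert(first_ != nullptr && second_ != nullptr);
    }

protected:
    // No put area is set, so every character written arrives here.
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        const char ch = traits_type::to_char_type(c);
        const bool ok1 = !traits_type::eq_int_type(first_->sputc(ch), traits_type::eof());
        const bool ok2 = !traits_type::eq_int_type(second_->sputc(ch), traits_type::eof());
        return (ok1 && ok2) ? c : traits_type::eof();
    }

    // Flush both targets; report failure if either flush fails.
    int sync() override {
        const int r1 = first_->pubsync();
        const int r2 = second_->pubsync();
        return (r1 == 0 && r2 == 0) ? 0 : -1;
    }

private:
    std::streambuf* first_;
    std::streambuf* second_;
};

// Example with two in-memory targets standing in for the console and the file.
inline bool tee_demo() {
    std::ostringstream console, file;
    teebuf tee(console.rdbuf(), file.rdbuf());
    std::ostream out(&tee);
    out << "Hello World" << 42 << std::endl;
    return console.str() == "Hello World42\n" && file.str() == "Hello World42\n";
}
```

To mirror std::cout itself, you would keep the pointer returned by std::cout.rdbuf(), construct the teebuf from it and from the rdbuf() of an open std::ofstream, install it with std::cout.rdbuf(&tee), and restore the original pointer before the teebuf goes out of scope.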
hope this helps; cheers