C++: buffer the cin istream

The problem is:
I have code that operates on a fully functional istream. It uses methods like:
istream is;
is.seekg(...) // <--- going backwards at times
is.tellg() // <--- to save the position before looking forward
etc.
These methods are only available on istreams backed by, say, a file. However, if I use cin in this fashion, it will not work: cin does not have the option of saving a position, reading forward, then returning to the saved position.
// So, I can't cat the file into the program
cat file | ./program
// I can only read the file from inside the program
./program -f input.txt
// Which is the problem with a very, very large zipped file
// ... that cannot coexist on the same raid-10 drive system
// ... with the resulting output
zcat really_big_file.zip | ./program //<--- Doesn't work due to cin problem
./program -f really_big_file.zip //<--- not possible without unzipping
I can read cin into a deque and process the deque. A 1 MB deque buffer would be more than enough. However, this is problematic in three senses:
I have to rewrite everything to do this with a deque
It won't be as bulletproof as just using an istream, for which the code has already been debugged
It seems like, if I implement it as a deque with some difficulty, someone is going to come along and say, why didn't you just do it like ___
What is the proper/most efficient way to create, from cin, a usable istream object in the sense that all members are active?
(Bearing in mind that performance is important)

You could create a filtering stream buffer reading from std::cin when getting new data but buffering all received characters. You'd be able to implement seeking within the buffered range of the input. Seeking beyond the end of the already buffered input would imply reading corresponding amounts of data. Here is an example of a corresponding implementation:
#include <iostream>
#include <string>
#include <vector>

class bufferbuf
    : public std::streambuf {
private:
    std::streambuf*   d_sbuf;   // the real source, e.g. std::cin's streambuf
    std::vector<char> d_buffer; // everything read so far, kept for seeking

    int_type underflow() {
        char buffer[1024];
        std::streamsize size = this->d_sbuf->sgetn(buffer, sizeof(buffer));
        if (size == 0) {
            return std::char_traits<char>::eof();
        }
        this->d_buffer.insert(this->d_buffer.end(), buffer, buffer + size);
        this->setg(this->d_buffer.data(),
                   this->d_buffer.data() + this->d_buffer.size() - size,
                   this->d_buffer.data() + this->d_buffer.size());
        return std::char_traits<char>::to_int_type(*this->gptr());
    }
    pos_type seekoff(off_type off, std::ios_base::seekdir whence, std::ios_base::openmode) {
        switch (whence) {
        case std::ios_base::beg:
            this->setg(this->eback(), this->eback() + off, this->egptr());
            break;
        case std::ios_base::cur:
            this->setg(this->eback(), this->gptr() + off, this->egptr());
            break;
        case std::ios_base::end:
            this->setg(this->eback(), this->egptr() + off, this->egptr());
            break;
        default:
            return pos_type(off_type(-1));
        }
        return pos_type(off_type(this->gptr() - this->eback()));
    }
    pos_type seekpos(pos_type pos, std::ios_base::openmode) {
        this->setg(this->eback(), this->eback() + pos, this->egptr());
        return pos_type(off_type(this->gptr() - this->eback()));
    }

public:
    bufferbuf(std::streambuf* sbuf)
        : d_sbuf(sbuf)
        , d_buffer() {
        this->setg(0, 0, 0); // actually the default setting
    }
};

int main() {
    bufferbuf    sbuf(std::cin.rdbuf());
    std::istream in(&sbuf);

    std::streampos pos(in.tellg());
    std::string    line;
    while (std::getline(in, line)) {
        std::cout << "pass1: '" << line << "'\n";
    }
    in.clear();
    in.seekg(pos);
    while (std::getline(in, line)) {
        std::cout << "pass2: '" << line << "'\n";
    }
}
This implementation buffers input before passing it on to the reading step. You can read individual characters (e.g. change char buffer[1024]; to become char buffer[1]; or replace the use of sgetn() appropriately using sbumpc()) to provide a more direct response: there is a trade-off between immediate response and performance for batch processing.

cin is user input and should be treated as unpredictable. If you want to use the mentioned functionality and you are sure about your input, you can read the whole input into an istringstream and then operate on that.
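As a minimal sketch of that approach (assuming the entire input fits comfortably in memory; the helper's name is mine), the stream can be drained into a std::istringstream, which then supports tellg() and seekg():

```cpp
#include <istream>
#include <sstream>
#include <string>

// Drain an input stream (e.g. std::cin) into a fully seekable
// std::istringstream. Everything is read up front, so memory use
// equals the total input size.
std::istringstream slurp(std::istream& in) {
    std::ostringstream contents;
    contents << in.rdbuf();              // copy the whole stream buffer
    return std::istringstream(contents.str());
}
```

After std::istringstream buffered = slurp(std::cin);, seeking works exactly as it does on a file-backed stream.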

Related

Finding a way to reach at beginning of the buffer once input is entered and reading characters one by one

I wanted to read the buffer until EOF is found in the buffer and then set the stream as corrupt or bad. If eback() were not a protected member function, I could have tried this:
std::string s;
int position = 0;
char c;
std::streambuf *buffer_obj = std::cin.rdbuf();
/* ASSOCIATE stream_obj WITH THE INPUT BUFFER */
std::istream stream_obj(buffer_obj);
/* ENTER "ABC", THEN PRESS CTRL+D TO INPUT EOF, THEN ENTER "DEF" */
stream_obj >> s;
/* LOOP STARTS, SCANNING FOR EOF */
if ((c = buffer_obj->eback()[position++]) != EOF)
{
    std::cout << c;
}
else
{
    std::cout << "EOF in buffer" << std::endl;
    break;
}
/* LOOP ENDS */
Two questions:
Would the above code have worked if eback() were a public function?
How can I search for specific characters in the buffer, given that the pointers are protected and I can't get sbumpc() or sgetc() to do that?
Got the answer for finding a specific character in the buffer:
#include <iostream>
#include <string>

int main()
{
    int i;
    std::streambuf *obj = std::cin.rdbuf();
    std::istream sobj(obj);
    /* EXTRACT FROM THE BUFFER ONE CHARACTER AT A TIME */
    while ((i = sobj.get()) != EOF)
    {
        std::cout << (char)i;
    }
    /* EOFBIT IS SET FOR THE STREAM WHEN EOF IS ENCOUNTERED */
    if (sobj.eof())
    {
        std::cout << "EOF FOUND" << std::endl;
    }
    return 0;
}
Similarly, any character can be found in the buffer, and sobj can be replaced with cin and checked for EOF. For example, let's say the character one is looking for is D and we need to stop reading after we find D in the buffer:
int i;
while ((i = std::cin.get()) != 'D')
{
    std::cout << (char)i;
}
So if the input is avb45Dert, the output will be avb45.

Write performance of a file c++

I need to write a file in C++. The content is taken from a while loop, so right now I'm writing it line by line.
Now I'm thinking that I could improve the writing time by saving all the content in a variable and then writing the file in one go.
Does someone know which of the two ways is better?
Each line is written by this function:
void writeFile(char* filename, string value){
    ofstream outFile(filename, ios::app);
    outFile << value;
    outFile.close();
}

while(/* Something */){
    /* something */
    writeFile(..);
}
The other way is:
void writeNewFile(char* filename, string value){
    ofstream outFile(filename);
    outFile << value;
    outFile.close();
}

string res = "";
while(/* Something */){
    /* something */
    res += mydata;
}
writeNewFile(filename, res);
Have you considered:
ofstream outFile(filename);
while(/* Something */){
    /*...*/
    outFile << mydata;
}
outFile.close();
The output streams are buffered, which means a stream accumulates the data in an internal buffer (like a string) before writing it to disk; you are unlikely to be able to beat that unless you have very special requirements.
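If the default buffer size is a concern, most implementations also let you hand the filebuf a larger buffer via pubsetbuf() before the file is opened. A hedged sketch (implementations are permitted to ignore the request, and the helper name is mine):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Write a payload through an ofstream that has been given a 1 MiB
// internal buffer. pubsetbuf() must be called before open() to have
// any effect, and the buffer must outlive the stream.
void write_with_big_buffer(const char* filename, const std::string& payload) {
    std::vector<char> buf(1 << 20);      // declared first, so destroyed last
    std::ofstream out;
    out.rdbuf()->pubsetbuf(buf.data(), buf.size());
    out.open(filename, std::ios::binary);
    out.write(payload.data(), payload.size());
}   // out is flushed and closed here, before buf is destroyed
```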
In the first case you are closing the file stream and opening a new one on every loop iteration; it would be better to do something like:
void writeFile(const std::string& value, std::ostream* os = nullptr) {
    if (os != nullptr) {
        *os << value;   // note the dereference; streaming os itself would print the pointer
    }
}
Or:
void writeFile(const std::string& value, std::ostream& os) {
    os << value;
}
And then call this with the fstream object (or its address in the first case) you created in another function/main/whatever.
As for whether it's quicker to write continuously or all at the end, it really depends on the kind of computations you're performing in the while loop and how much they slow down the whole process. However, for reliability reasons, it is sometimes best to write continuously, to avoid losing all the data if the program crashes during the while loop for whatever reason.
fstream objects are already buffered, therefore accumulating into a string doesn't help you much; in fact, you may even lose performance as a result of reallocation of the string content when it exceeds the previously allocated size. One would have to test to know.
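If you do take the accumulate-then-write route, reserving the string's capacity up front sidesteps the reallocation cost just mentioned. A small sketch (the helper is illustrative, not from the question):

```cpp
#include <string>
#include <vector>

// Join chunks into one string with a single up-front allocation,
// instead of letting the string reallocate as it grows.
std::string accumulate_chunks(const std::vector<std::string>& chunks) {
    std::string::size_type total = 0;
    for (const std::string& c : chunks)
        total += c.size();
    std::string res;
    res.reserve(total);   // one allocation instead of many
    for (const std::string& c : chunks)
        res += c;
    return res;
}
```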

Parse buffered data line by line

I want to write a parser for Wavefront OBJ file format, plain text file.
Example can be seen here: people.sc.fsu.edu/~jburkardt/data/obj/diamond.obj.
Most people use old scanf to parse this format line by line, however I would prefer to load the whole file at once to reduce IO operation count. Is there a way to parse this kind of buffered data line by line?
void ObjModelConditioner::Import(Model& asset)
{
    uint8_t* buffer = SyncReadFile(asset.source_file_info());
    delete [] buffer;
}
Or would it be preferable to load whole file into a string and try to parse that?
After a while, it seems I found a sufficient (and simple) solution. Since my goal is to create an asset conditioning pipeline, the code has to be able to handle large amounts of data efficiently. The data can be read into a string at once, and once loaded, a stringstream can be initialized with this string.
std::string data;
SyncReadFile(asset.source_file_info(), data);
std::stringstream data_stream(data);
std::string line;
Then I simply call getline():
while(std::getline(data_stream, line))
{
    std::stringstream line_stream(line);
    std::string type_token;
    line_stream >> type_token;
    if (type_token == "v") {
        // Vertex position
        Vector3f position;
        line_stream >> position.x >> position.y >> position.z;
        // ...
    }
    else if (type_token == "vn") {
        // Vertex normal
    }
    else if (type_token == "vt") {
        // Texture coordinates
    }
    else if (type_token == "f") {
        // Face
    }
}
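For the "f" branch, face entries in OBJ look like f v/vt/vn, with the texture and normal indices optional. One way to pull the slash-separated indices apart (this helper is a sketch of mine, not part of the answer above):

```cpp
#include <sstream>
#include <string>

// Parse one OBJ face vertex of the form "v", "v/vt", "v//vn" or
// "v/vt/vn". Fields that are absent stay 0 (valid OBJ indices are
// 1-based, so 0 can safely mean "not present").
void parseFaceVertex(const std::string& token, int& v, int& vt, int& vn) {
    v = vt = vn = 0;
    std::istringstream s(token);
    s >> v;                      // the position index is always there
    if (s.get() == '/') {
        if (s.peek() != '/')     // "v/vt..." carries a texture index
            s >> vt;
        if (s.get() == '/')      // a second slash introduces the normal
            s >> vn;
    }
}
```

Each whitespace-separated token after the "f" can be fed through this before resolving the 1-based indices against the vertex arrays.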
Here's a function that splits a char array into a vector of strings (assuming each new string after the first starts with a '\n' symbol):
#include <string>
#include <vector>

std::vector<std::string> split(const char* arr)
{
    std::string str = arr;
    std::vector<std::string> result;
    std::string::size_type beg = 0, end; // beginning and end of each line in the array
    while ((end = str.find('\n', beg + 1)) != std::string::npos)
    {
        result.push_back(str.substr(beg, end - beg));
        beg = end;
    }
    result.push_back(str.substr(beg)); // the rest, up to the end of the array
    return result;
}
Here's the usage:
int main()
{
    const char* a = "asdasdasdasdasd \n asdasdasd \n asdasd";
    std::vector<std::string> result = split(a);
}
If you've got the raw data in a char[] (or an unsigned char[]) and you know its length, it's pretty trivial to write an input-only, no-seek streambuf which will allow you to create an std::istream and to use std::getline on it. Just call:
setg( start, start, start + length );
in the constructor. (Nothing else is needed.)
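Spelled out, that suggestion is about this much code (the class name is mine):

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Input-only streambuf over an existing character array. No copying
// and no seeking: the base class reads straight out of the array.
class MemoryStreambuf : public std::streambuf {
public:
    MemoryStreambuf(char* start, std::size_t length) {
        setg(start, start, start + length);   // the entire implementation
    }
};
```

An std::istream constructed on top of an instance (std::istream in(&buf);) then supports operator>>, std::getline, and so on over the array.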
It really depends on how you're going to parse the text. One way to do this would be simply to read the data into a vector of strings. I'll assume that you've already covered issues such as scalability / use of memory etc.
std::vector<std::string> lines;
std::string line;

std::ifstream file(filename.c_str(), std::ios_base::in);
while ( std::getline( file, line ) )
{
    lines.push_back( line );
}
file.close();
This would cache your file in lines. Next you need to go through lines
for ( std::vector<std::string>::const_iterator it = lines.begin();
      it != lines.end(); ++it )
{
    const std::string& line = *it;
    if ( line.empty() )
        continue;
    switch ( line[0] )
    {
    case 'g':
        // Some stuff
        break;
    case 'v':
        // Some stuff
        break;
    case 'f':
        // Some stuff
        break;
    default:
        // Default stuff including '#' (probably nothing)
        break;
    }
}
Naturally, this is very simplistic and depends largely on what you want to do with your file.
The size of the file that you've given as an example is hardly likely to cause IO stress (unless you're using some very lightweight equipment) but if you're reading many files at once I suppose it might be an issue.
I think your concern here is to minimise IO, and I'm not sure that this solution will really help, since you're going to be iterating over the collection twice. If you need to go back and keep reading the same file over and over again, then caching the file in memory will definitely speed things up, but there are just as easy ways to do this, such as memory-mapping the file and using normal file access. If you're really concerned, try profiling a solution like this against simply processing the file directly as you read it from IO.
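For completeness, the memory-mapping route mentioned above looks roughly like this on POSIX systems (a sketch with error handling trimmed to the essentials; not portable to Windows, and the function name is mine):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string>

// Map a whole file into memory and return its contents as a string.
// The kernel pages data in on demand; there is no user-space read() loop.
std::string readViaMmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return std::string();
    struct stat st;
    fstat(fd, &st);
    void* p = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          // the mapping survives the close
    if (p == MAP_FAILED)
        return std::string();
    std::string contents(static_cast<char*>(p), st.st_size);
    munmap(p, st.st_size);
    return contents;
}
```

In a real parser you would work directly on the mapped bytes rather than copy them into a string, but the copy keeps the sketch short.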

Reading multiple files

I want to alternate between reading multiple files. Below is a simplified version of my code.
ifstream* In_file1 = new ifstream("a.dat", ios::binary);
ifstream* In_file2 = new ifstream("b.dat", ios::binary);
ifstream* In_file;
In_file = In_file1;
int ID = 0;

//LOOPING PORTION
if (In_file->eof())
{
    In_file->seekg(0, ios_base::beg);
    In_file->close();
    switch (ID)
    {
    case 0:
        In_file = In_file2; ID = 1; break;
    case 1:
        In_file = In_file1; ID = 0; break;
    }
}
//some codes
:
:
In_file->read (data, sizeof(double));
//LOOPING PORTION
The code works well if I am reading the files one time, and I thought that everything was cool. However, if the part termed 'looping portion' is inside a loop, the behaviour becomes weird and I start getting a single repeating output. Please can someone tell me what is wrong and how I can fix it? If you have a better method of tackling the problem, please suggest it. I appreciate it.
//SOLVED
Thank you everybody for your comments, I appreciate them. Here is what I simply did:
Instead of the original
switch (ID)
{
case 0:
    In_file = In_file2; ID = 1; break;
case 1:
    In_file = In_file1; ID = 0; break;
}
I simply did
switch (ID)
{
case 0:
    In_file = new ifstream("a.dat", ios::binary); ID = 1; break;
case 1:
    In_file = new ifstream("b.dat", ios::binary); ID = 0; break;
}
Now it works like a charm and I can loop as much as I want :-). I appreciate your comments, great to know big brother still helps.
Let's see: the code you posted works fine, and you want us to tell you
what's wrong with the code you didn't post. That's rather difficult.
Still, the code you posted probably doesn't work correctly either.
std::istream::eof can only be used reliably after an input (or some
other operation) has failed; in the code you've posted, it will almost
certainly be false, regardless.
In addition: there's no need to dynamically allocate ifstream; in
fact, there are almost no cases where dynamic allocation of ifstream
is appropriate. And you don't check that the opens have succeeded.
If you want to read two files, one after the other, the simplest way is
to use two loops, one after the other (calling a common function for
processing the data). If for some reason that's not appropriate, I'd
use a custom streambuf, which takes a list of filenames in the
constructor, and advances to the next whenever it reaches end of file on
one, only returning EOF when it has reached the end of all of the
files. (The only complication in doing this is what to do if one of the
opens fails. I do this often enough that it's part of my tool kit,
and I use a callback to handle failure. For a one time use, however,
you can just hard code in whatever is appropriate.)
As a quick example:
#include <fstream>
#include <streambuf>
#include <string>
#include <vector>

// We define our own streambuf, deriving from std::streambuf.
// (All istreams and ostreams delegate to a streambuf for the
// actual data transfer; we'll use an instance of this to
// initialize the istream we're going to read from.)
class MultiFileInputStreambuf : public std::streambuf
{
    // The list of files we will process.
    std::vector<std::string> m_filenames;
    // And our current position in the list (actually one past the
    // current position, since we increment it when we open the file).
    std::vector<std::string>::const_iterator m_current;
    // Rather than create a new filebuf for each file, we'll reuse
    // this one, closing any previously open file and opening a new
    // file as needed.
    std::filebuf m_streambuf;

protected:
    // This is part of the protocol for streambuf. The base class
    // will call this function any time it needs to get a character
    // and there aren't any in the buffer. This function can set up
    // a buffer if it wants, but in this case the buffering is
    // handled by the filebuf, so it's likely not worth the bother.
    // (But this depends on the cost of virtual functions; without a
    // buffer, each character read will require a virtual function
    // call to get here.)
    //
    // The protocol is to return the next character, or EOF if
    // there isn't one.
    virtual int underflow()
    {
        // Get one character from the current streambuf.
        int result = m_streambuf.sgetc();
        // As long as 1) the current streambuf is at end of file,
        // and 2) there are more files to read, open the next file
        // and try to get a character from it.
        while ( result == EOF && m_current != m_filenames.end() ) {
            m_streambuf.close();
            m_streambuf.open( m_current->c_str(), std::ios::in );
            if ( !m_streambuf.is_open() ) {
                // Error handling here...
            }
            ++m_current;
            result = m_streambuf.sgetc();
        }
        // We've either gotten a character from the (now) current
        // streambuf, or there are no more files, and we'll return
        // the EOF from our last attempt at reading.
        return result;
    }

public:
    // Use a template and two iterators to initialize the list of
    // files from any STL sequence whose elements can be implicitly
    // converted to std::string.
    template<typename ForwardIterator>
    MultiFileInputStreambuf(ForwardIterator begin, ForwardIterator end)
        : m_filenames(begin, end)
        , m_current(m_filenames.begin())
    {
    }
};
#include <iostream>
#include <fstream>
#include <string>

#define NO_OF_FILES 2

int main ()
{
    std::ifstream in;
    std::string line;
    std::string files[NO_OF_FILES] =
    {
        "file1.txt",
        "file2.txt",
    };

    // start our engine!
    for (int i = 0; i < NO_OF_FILES; i++)
    {
        in.open(files[i].c_str(), std::fstream::in);
        if (in.is_open())
        {
            std::cout << "reading... " << files[i] << std::endl;
            while (std::getline(in, line))
            {
                std::cout << line << std::endl;
            }
            in.close();
            in.clear(); // reset eofbit/failbit before reusing the stream
            std::cout << "SUCCESS" << std::endl;
        }
        else
            std::cout << "Error: unable to open " << files[i] << std::endl;
    }
    return 0;
}

How to write console data into a text file in C++?

I'm working on a file sharing application in C++. I want to write console output into a separate file and at the same time I want to see the output in console also. Can anybody help me...Thanks in advance.
Here we go...
#include <fstream>
using std::ofstream;
#include <iostream>
using std::cout;
using std::endl;

int main( int argc, char* argv[] )
{
    ofstream file( "output.txt" ); // create an output file stream to output.txt
    if( !file ) // check the stream for an error (did it open the file correctly?)
        cout << "error opening file for writing." << endl;

    for( int i = 0; i < argc; ++i ) // argc contains the number of arguments
    {
        file << argv[i] << endl; // argv contains the char arrays of the command-line arguments
        cout << argv[i] << endl;
    }

    file.close(); // always close a file stream when you're done with it
    return 0;
}
PS: OK, read your question wrong (console output/input mixup), but you still get the idea I think.
The idea is to create a derivative of std::streambuf which will output data to both the file and cout. Then create an instance of it and use cout.rdbuf(...);
Here is the code (tested with MSVC++ 2010, should work on any compiler):
#include <cassert>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <streambuf>
#include <vector>

class StreambufDoubler : public std::streambuf {
public:
    StreambufDoubler(std::streambuf* buf1, std::streambuf* buf2) :
        _buf1(buf1), _buf2(buf2), _buffer(128)
    {
        assert(_buf1 && _buf2);
        setg(0, 0, 0);
        setp(_buffer.data(), _buffer.data() + _buffer.size()); // setp takes begin/end
    }
    ~StreambufDoubler() {
        sync();
    }
    void imbue(const std::locale& loc) {
        _buf1->pubimbue(loc);
        _buf2->pubimbue(loc);
    }
    std::streampos seekpos(std::streampos sp, std::ios_base::openmode which) {
        return seekoff(sp, std::ios_base::cur, which);
    }
    std::streampos seekoff(std::streamoff off, std::ios_base::seekdir way, std::ios_base::openmode which) {
        if (which & std::ios_base::in) // note: &, not |, to test the flag
            throw(std::runtime_error("Can't use this class to read data"));
        // which one to return? good question
        // anyway seekpos and seekoff should never be called
        _buf1->pubseekoff(off, way, which);
        return _buf2->pubseekoff(off, way, which);
    }
    int overflow(int c) {
        int retValue = sync() ? EOF : 0;
        if (c != EOF)
            sputc(c);
        return retValue;
    }
    int sync() {
        _buf1->sputn(pbase(), pptr() - pbase());
        _buf2->sputn(pbase(), pptr() - pbase());
        setp(_buffer.data(), _buffer.data() + _buffer.size());
        return _buf1->pubsync() | _buf2->pubsync();
    }
private:
    std::streambuf* _buf1;
    std::streambuf* _buf2;
    std::vector<char> _buffer;
};

int main() {
    std::ofstream myFile("file.txt");
    StreambufDoubler doubler(std::cout.rdbuf(), myFile.rdbuf());
    std::streambuf* oldBuf = std::cout.rdbuf(&doubler);
    // your code here
    std::cout.rdbuf(oldBuf); // restore before doubler goes out of scope
    return 0;
}
However note that a better implementation would use templates, a list of streambufs instead of just two, etc. but I wanted to keep it as simple as possible.
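That generalization is not much longer. An unbuffered sketch taking any number of targets (the names are mine; with no put area, every character goes through a virtual call, so it is slower than the buffered version above):

```cpp
#include <cstdio>     // for EOF
#include <sstream>    // used by the wiring example below
#include <streambuf>
#include <vector>

// Forwards every character written to it to each target streambuf.
// No put area is set up, so every sputc() lands in overflow().
class TeeStreambuf : public std::streambuf {
public:
    explicit TeeStreambuf(const std::vector<std::streambuf*>& targets)
        : _targets(targets) {}
protected:
    virtual int overflow(int c) {
        if (c == EOF)
            return 0;
        for (std::size_t i = 0; i < _targets.size(); ++i)
            if (_targets[i]->sputc((char)c) == EOF)
                return EOF;
        return c;
    }
    virtual int sync() {
        int result = 0;
        for (std::size_t i = 0; i < _targets.size(); ++i)
            result |= _targets[i]->pubsync();
        return result;
    }
private:
    std::vector<std::streambuf*> _targets;
};
```

Wiring it up works exactly as in the two-target version: build the vector from cout.rdbuf(), myFile.rdbuf(), and any further sinks, then swap it in with std::cout.rdbuf(&tee).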
What you want actually is to follow in real time the lines added to the log your application writes.
In the Unix world, there's a simple tool that has that very function, it's called tail.
Call tail -f your_file and you will see the file contents appearing in almost real time in the console.
Unfortunately, tail is not a standard tool in Windows (which I suppose you're using, according to your question's tags).
It can however be found in the GnuWin32 package, as well as MSYS.
There are also several native tools for Windows with the same functionality, I'm personally using Tail For Win32, which is licensed under the GPL.
So, to conclude, I think your program should not output the same data to different streams, as it might slow it down without real benefits, while there are established tools that have been designed specifically to solve that problem, without the need to develop anything.
I don't program in C++, but here is my advice: create a new class that takes an input stream (istream in C++ or something like it), and then transfers every incoming byte to both stdout and the file.
I am sure there is a way to replace the standard output stream with the aforementioned class. As I remember, stdout is some kind of property of cout.
And again, I spent one week on C++ more than half a year ago, so there is a chance that everything I've said is garbage.