Avoiding Error Flags when Reading Files - c++

This is how I usually read files with std::ifstream:
while (InFile.peek() != EOF)
{
char Character = InFile.get();
// Do stuff with Character...
}
This avoids the need of an if statement inside the loop. However, it seems that even peek() causes eofbit to be set, which makes calling clear() necessary if I plan on using that same stream later.
Is there a cleaner way to do this?

Typically, you would just use
char x;
while(file >> x) {
// do something with x
}
// now clear file if you want
If you forget to clear(), then use an RAII scope-based class.
Edit: Given a little more information, I'd just say
class FileReader {
std::stringstream str;
public:
FileReader(std::string filename) {
std::ifstream file(filename);
file >> str.rdbuf();
}
std::stringstream Contents() {
return str;
}
};
Now you can just get a copy and not have to clear() the stream every time. Or you could have a self-clearing reference.
template<typename T> class SelfClearingReference {
T* t;
public:
SelfClearingReference(T& tref)
: t(&tref) {}
~SelfClearingReference() {
tref->clear();
}
template<typename Operand> T& operator>>(Operand& op) {
return *t >> op;
}
};

I'm not sure I understand. Infile.peek() only sets eofbit
when it returns EOF. And if it returns EOF, and later read
is bound to fail; the fact that it sets eofbit is an
optimization, more than anything else.

Related

Is there a way to make a smart pointer that points to either a std::ifstream or std::cin?

I want to take input from an istream from either a std::ifstream or std::cin depending on some condition.
As far as I can get it working, I had to use a raw std::istream* pointer:
int main(int argc, char const* argv[]) {
std::istream *in_stream;
std::ifstream file;
if (argc > 1) {
std::string filename = argv[1];
file.open(filename);
if (not file.is_open()) {
return -1;
}
in_stream = &file;
} else {
in_stream = &std::cin;
}
// do stuff with the input, regardless of the source...
}
Is there a way to rewrite the above using smart pointers?
This may be a question about ownership: How does one wrap and pass around two different std::istream instances, one of which is owned and needs disposal (which calls close() upon destruction) and ownership transfers, whereas the other one is not owned by user code (e.g. std::cin) and cannot be deleted. The solution is polymorphism: Create an interface like
struct StreamWrapper {
virtual std::istream &stream() = 0;
virtual ~StreamWrapper() = default;
};
and two implementations, e.g. OwnedStreamWrapper and UnownedStreamWrapper. The former can contain a std::istream directly (which is moveable, but not copyable) and the latter can contain (e.g.) a std::istream&, i.e. have no ownership over the referenced stream instance.
struct OwnedStreamWrapper : public StreamWrapper {
template <typename... A>
OwnedStreamWrapper(A &&...a) : stream_{std::forward<A>(a)...} {}
std::istream &stream() override { return stream_; }
private:
std::ifstream stream_;
};
struct UnownedStreamWrapper : public StreamWrapper {
UnownedStreamWrapper(std::istream &stream) : stream_{stream} {}
std::istream &stream() override { return stream_; }
private:
std::istream &stream_;
};
Apart from the stream ownership, wrapper ownership matters as well and can be used either to handle the wrappers’ (and streams’) lifespan automatically on the stack, or to allow a wrapper’s lifespan to exceed that of the current stack frame. In the example below, the first ReadWrapper() takes ownership of the wrapper and deletes it automatically whereas the second ReadWrapper() does not take ownership. In both cases polymorphism (i.e. the implementation of StreamWrapper) determines how the underlying stream is handled, i.e. whether it outlives its wrapper (like std::cin should) or dies together with it (like a manually instantiated stream should).
void ReadWrite(std::istream &in, std::ostream &out) {
std::array<char, 1024> buffer;
while (in) {
in.read(buffer.data(), buffer.size());
out.write(buffer.data(), in.gcount());
}
}
void ReadWrapper(std::unique_ptr<StreamWrapper> stream) {
ReadWrite(stream->stream(), std::cout);
}
void ReadWrapper(StreamWrapper &stream) {
ReadWrite(stream.stream(), std::cout);
}
Below are all four combinations of { owned | unowned } { streams | stream wrappers } in a runnable example:
#include <array>
#include <fstream>
#include <iostream>
#include <memory>
#include <utility>
namespace {
/********** The three snippets above go here! **********/
} // namespace
int main() {
OwnedStreamWrapper stream1{"/proc/cpuinfo"};
std::unique_ptr<StreamWrapper> stream2{
std::make_unique<OwnedStreamWrapper>("/proc/cpuinfo")};
std::ifstream stream3_file{"/proc/cpuinfo"}; // Let’s pretend it is unowned.
UnownedStreamWrapper stream3{stream3_file};
std::ifstream stream4_file{"/proc/cpuinfo"}; // Let’s pretend it is unowned.
std::unique_ptr<StreamWrapper> stream4{
std::make_unique<UnownedStreamWrapper>(stream4_file)};
ReadWrapper(stream1); // owned stream, wrapper kept
ReadWrapper(std::move(stream2)); // owned stream, wrapper transferred
ReadWrapper(stream3); // unowned stream, wrapper kept
ReadWrapper(std::move(stream4)); // unowned stream, wrapper transferred
}
A smart pointer is kind of an awkward fit for this situation, because it requires a dynamic allocation (which is probably not necessary in this case)
1. Do your processing in another function
As #BoP pointed out in the comments a nice way to handle this could be to use another function that gets passed the appropriate input stream:
godbolt
int do_thing(std::istream& in_stream) {
// do stuff with the input, regardless of the source...
int foobar;
in_stream >> foobar;
return 0;
}
int main(int argc, char const* argv[]) {
std::ofstream{"/tmp/foobar"} << 1234;
if(argc > 1) {
std::ifstream file{argv[1]};
if(!file) return -1;
return do_thing(file);
} else {
return do_thing(std::cin);
}
}
2. Switch the stream buffer
If you won't be using std::cin at all if a filename is passed you could change the streambuffer of std::cin to the one of the file with rdbuf() - that way you can use std::cin in both cases.
godbolt
std::ifstream file;
std::streambuf* oldbuf = std::cin.rdbuf();
if (argc > 1) {
file.open(argv[1]);
if (!file) return -1;
// switch buffer of std::cin
std::cin.rdbuf(file.rdbuf());
}
// do stuff with the input, regardless of the source...
int foobar;
std::cin >> foobar; // this will reader either from std::cin or the file, depending on the current streambuffer assigned to std::cin
// restore old stream buffer
std::cin.rdbuf(oldbuf);
3. Using std::unique_ptr / std::shared_ptr with a custom deleter
The easiest solution if you still want to use smart pointers would be to use std::unique_ptr / std::shared_ptr with a custom deleter that handles the std::cin case:
godbolt
// deletes the stream only if it is not std::cin
struct istream_ptr_deleter {
void operator()(std::istream* stream) {
if(&std::cin == stream) return;
delete stream;
}
};
using istream_ptr = std::unique_ptr<std::istream, istream_ptr_deleter>;
// Example Usage:
istream_ptr in_stream;
if (argc > 1) {
in_stream.reset(new std::ifstream(argv[1]));
if (!*in_stream) return -1;
} else {
in_stream.reset(&std::cin);
}
int foobar;
*in_stream >> foobar;

why stream keep reading and won't reach to eof?

I'm implementing a very simple serialization library, to save/restore user_info object, which contains an int and a std::string, here's my code:
#include <iostream>
#include <sstream>
using namespace std;
struct user_info {
int id;
string name;
};
// for any POD type like int, read sizeof(int) from the stream
template <class stream_t, class T>
void de_serialize(stream_t& stream, T& x) {
stream.read((char*)&x, sizeof(T));
}
// for std::string, read length(int) first, then read string content when length>0
template <class stream_t>
void de_serialize(stream_t& stream, std::string& str) {
int len;
de_serialize(stream, len);
str.resize(len);
char x;
for(int i=0; i<len; i++) {
de_serialize(stream, x);
str[i] = x;
}
}
// read from stream, fill id and name
template <typename stream_t>
void de_serialize(stream_t& ss, user_info& ui) {
de_serialize(ss, ui.id);
de_serialize(ss, ui.name);
}
int main() {
// read from file, but here I use a 8-bytes-content represents the file content
stringstream ifs;
// two int, \x1\x1\x1\x1 for id, \x0\x0\x0\x0 for name
ifs.write("\x1\x1\x1\x1\x0\x0\x0\x0", 8);
while(!ifs.eof()) {
user_info ui;
de_serialize(ifs, ui);
// when first time goes here, the stream should've read 8 bytes and reaches eof,
// then break the while loop, but it doesn't
// so the second time it calls de_serialize, the program crashes
}
}
the code illustrates the part of de-serializing. The while loop is supposed to run once, and the stream reaches eof, but why it doesn't stop looping? The second loop causes a crash.
Please refer to the comments, thanks.
The eof() flag will never be set if the streem develops an error condition. It is usually wrong to loop on eof(). What I would do here is change the return type from your de_serialize() function to return the stream and then rewrite your loop around the de_serialize() function
Like this:
#include <iostream>
#include <sstream>
using namespace std;
struct user_info {
int id;
string name;
};
// for any POD type like int, read sizeof(int) from the stream
template <class stream_t, class T>
stream_t& de_serialize(stream_t& stream, T& x) {
stream.read((char*)&x, sizeof(T));
return stream; // return the stream
}
// for std::string, read length(int) first, then read string content when length>0
template <class stream_t>
stream_t& de_serialize(stream_t& stream, std::string& str) {
int len;
de_serialize(stream, len);
str.resize(len);
char x;
for(int i=0; i<len; i++) {
de_serialize(stream, x);
str[i] = x;
}
return stream; // return the stream
}
// read from stream, fill id and name
template <typename stream_t>
stream_t& de_serialize(stream_t& ss, user_info& ui) {
de_serialize(ss, ui.id);
de_serialize(ss, ui.name);
return ss; // return the stream
}
int main() {
// read from file, but here I use a 8-bytes-content represents the file content
stringstream ifs;
// two int, \x1\x1\x1\x1 for id, \x0\x0\x0\x0 for name
ifs.write("\x1\x1\x1\x1\x0\x0\x0\x0", 8);
user_info ui;
while(de_serialize(ifs, ui)) { // loop on de_serialize()
// Now you know ui was correctly read from the stream
// when first time goes here, the stream should've read 8 bytes and reaches eof,
// then break the while loop, but it doesn't
// so the second time it calls de_serialize, the program crashes
}
}

Use multiple ofstreams to write to a single output file in c++

I have class Writer that has two ofstream members.
Both streams are associated with the same output file. I'd like to use both streams in Writer::write method, but to make sure that each stream writes to the end of the real output file.
Code
class my_ofstream1:
public ofstream
{
// implement some functions.
// using internal, extended type of streambuf
};
class my_ofstream2:
public ofstream
{
// implement some functions.
// using internal, extended type of streambuf
// (not the same type as found in my_ofstream1)
};
class Writer
{
public:
void open(string path)
{
f1.open(path.c_str(),ios_base::out); f2.open(path.c_str(),ios_base::out);
}
void close()
{
f1.close(); f2.close();
}
void write()
{
string s1 = "some string 1";
string s2 = "some string 2";
f1.write(s1.c_str(), s1.size());
// TBD - ensure stream f2 writes to the end of the actual output file
assert(f1.tellp() == f2.tellp());
f2.write(s2.c_str(), s2.size());
}
private:
my_ofstream1 f1;
my_ofstream1 f2;
};
void main()
{
Writer w;
w.open("some_file.txt");
w.write();
w.close();
}
Questions
How to ensure f2 is in sync with f1? meaning, before writing, stream offset of f2 must be in sync with stream offset of f1 and vice versa?
I can't use function std::ios::rdbuf since each ofstream uses special derived streambuf. so by using rdbuf() I'll lose the necessary functionality.
I tried using some of the techniques found in Synchronizing Streams topic but could not make it happen.
This looks like both of your classes use the filtering streambuf
idiom. In any case, don't derive your classes from
std::ofstream, but directly from ostream, and have them both
use the same std::filebuf object. If you are using the
filtering streambuf idiom, don't let your filtering
streambuf's buffer; leave that to the final std::filebuf.
In other words, your "internal, extended type of streambuf"
should contain a pointer to the final sink, which will be
a filebuf (but your filtering streambufs don't need to know
this). Functions like sync, they just pass on to the final
destination, and they should never establish a buffer
themselves, but pass everything on to the filebuf.
Is this not what you are looking for? This could be easily modified to work with ostreams rather the ofstreams, which is nicer - the actual issue is synchronisation of the buffers. In this code I have simply made the filebuf bf unbuffered and it works fine. Alternatively leave it buffered but include calls to pubsync when switching between my_ofstream's. I don't understand why ios:rdbuf is not available. Are you creating your own streambuf?
#include <iostream>
#include <fstream>
#include <assert.h>
using namespace std;
class my_ofstream1 : public ofstream
{
public:
my_ofstream1& write (const char_type* s, streamsize n)
{
ofstream::write (s, n);
//rdbuf()->pubsync();
return *this;
}
void attach (filebuf* bf){
ios::rdbuf(bf);
}
};
class my_ofstream2 : public ofstream
{
public:
my_ofstream2& write (const char_type* s, streamsize n)
{
ofstream::write (s, n);
//rdbuf()->pubsync();
return *this;
}
void attach (filebuf* bf){
ios::rdbuf(bf);
}
};
class Writer
{
filebuf bf;
my_ofstream1 f1;
my_ofstream1 f2;
public:
void open(string path)
{
bf.open(path.c_str(),ios_base::out);
bf.pubsetbuf(0,0); //unbufferred
f1.attach(&bf); f2.attach(&bf);
}
void close()
{
f1.close(); f2.close();
}
void write()
{
string s1 = "some string 1";
string s2 = "some string 2";
f1.write(s1.c_str(), s1.size());
assert(f1.tellp() == f2.tellp());
f2.write(s2.c_str(), s2.size());
}
};
int main()
{
Writer w;
w.open("some_file.txt");
w.write();
w.close();
return 0;
}

Trouble implementing a line-by-line file parser in C++

I have trouble implementing a simple file parser in C++11 which reads a file line by line and tokenizes the line. It should properly manage its resources. Usage of the parser should be like:
Parser parser;
parser.open("/path/to/file");
std::pair<int> header = parser.getHeader();
while (parser.hasNext()) {
std::vector<int> tokens = parser.getNext();
}
parser.close();
So the Parser class needs one member std::ifstream file (or std::ifstream* file?)
1) How should the constructor initialize this->file?
2) How should the open method set this->file to the input file?
3) How should the next line from the file get loaded into a string?
(Is this what you would use: std::getline(this->file, line)) ?
Can you give some advice? Ideally, could you sketch out the class as a code example.
Since the Parser is probably in a pretty useless state once you've constructed it and before you've opened the file, I would suggest having your use case look something like this:
Parser parser("/path/to/file");
std::pair<int> header = parser.getHeader();
while (parser.hasNext()) {
std::vector<int> tokens = parser.getNext();
}
parser.close();
In which case, you should use the constructor's member initialization list to initialise the file member (which, yes, should be of type std::ifstream):
Parser::Parser(std::string file_name)
: file(file_name)
{
// ...
}
If you kept the constructor and open member function separate, you could just leave the constructor as default because the file member will be default constructed giving you a file stream that is not associated with any file. You would then get Parser::open to forward the file name to std::ifstream::open, like so:
void Parser::open(std::string file_name)
{
file.open(file_name);
}
Then, yes, to read lines from the file, you want to use something similar to this:
std::string line;
while (std::getline(file, line)) {
// Do something with line
}
Good job for not falling into the trap of doing while (!file.eof()).
It can be designed in many ways.
You may ask the user to provide you a stream instead of specifying a filename.
That will be more generic and will work in all streams.
That way you should have a std::ifstream& member variable though you can have a pointer type as well but you need to do *_stream << to invoke any operator.
If you take a file, you mat construct a stream in your constructor and close it if open in destructor
Actually, there is an alternative to feeding the name of the file to Parser: you could feed it a std::istream. What's interesting in this is that this way any derived class of std::istream can be used, and thus you could feed it, for example, a std::istringstream, which makes it easier to write unit-tests.
class Parser {
public:
explicit Parser(std::istream& is);
/**/
private:
std::istream& _stream;
/**/
};
Next, comes iteration. It is not idiomatic in C++ to have a has followed by a get. std::istream supports iteration (with an input iterator), you could perfectly design your parser so it does too. This way you will have the benefit of compatibility with many STL algorithms.
class ParserIterator:
public std::iterator< std::input_iterator_tag, std::vector<int> >
{
public:
ParserIterator(): _stream(nullptr) {} // end
ParserIterator(std::istream& is): _stream(&is) { this->advance(); }
// Accessors
std::vector<int> const& operator*() const { return _vec; }
std::vector<int> const* operator->() const { return &_vec; }
bool equals(ParserIterator const& other) const {
if (_stream != other._stream) { return false; }
if (_stream == nullptr) { return true; }
return false;
}
// Modifiers
ParserIterator& operator++() { this->advance(); return *this; }
ParserIterator operator++(int) {
ParserIterator tmp(*this);
this->advance();
return tmp;
}
private:
void advance() {
assert(_stream && "cannot advance an end iterator");
_vec.clear();
std::string buffer;
if (not getline(*_stream, buffer)) {
_stream = 0; // end of story
}
// parse here
}
std::istream* _stream;
std::vector<int> _vec;
}; // class ParserIterator
inline bool operator==(ParserIterator const& left, ParserIterator const& right) {
return left.equals(right);
}
inline bool operator!= (parserIterator const& left, ParserIterator const& right) {
return not left.equals(right);
}
And with that we can augment our parser:
ParserIterator Parser::begin() const {
return ParserIterator(_stream);
}
ParserIterator Parser::end() const {
return ParserIterator();
}
I'll leave the getHeader method and the actual parsing content to you ;)

How do I iterate over cin line by line in C++?

I want to iterate over std::cin, line by line, addressing each line as a std::string. Which is better:
string line;
while (getline(cin, line))
{
// process line
}
or
for (string line; getline(cin, line); )
{
// process line
}
? What is the normal way to do this?
Since UncleBen brought up his LineInputIterator, I thought I'd add a couple more alternative methods. First up, a really simple class that acts as a string proxy:
class line {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, line &l) {
std::getline(is, l.data);
return is;
}
operator std::string() const { return data; }
};
With this, you'd still read using a normal istream_iterator. For example, to read all the lines in a file into a vector of strings, you could use something like:
std::vector<std::string> lines;
std::copy(std::istream_iterator<line>(std::cin),
std::istream_iterator<line>(),
std::back_inserter(lines));
The crucial point is that when you're reading something, you specify a line -- but otherwise, you just have strings.
Another possibility uses a part of the standard library most people barely even know exists, not to mention being of much real use. When you read a string using operator>>, the stream returns a string of characters up to whatever that stream's locale says is a white space character. Especially if you're doing a lot of work that's all line-oriented, it can be convenient to create a locale with a ctype facet that only classifies new-line as white-space:
struct line_reader: std::ctype<char> {
line_reader(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
static std::vector<std::ctype_base::mask>
rc(table_size, std::ctype_base::mask());
rc['\n'] = std::ctype_base::space;
return &rc[0];
}
};
To use this, you imbue the stream you're going to read from with a locale using that facet, then just read strings normally, and operator>> for a string always reads a whole line. For example, if we wanted to read in lines, and write out unique lines in sorted order, we could use code like this:
int main() {
std::set<std::string> lines;
// Tell the stream to use our facet, so only '\n' is treated as a space.
std::cin.imbue(std::locale(std::locale(), new line_reader()));
std::copy(std::istream_iterator<std::string>(std::cin),
std::istream_iterator<std::string>(),
std::inserter(lines, lines.end()));
std::copy(lines.begin(), lines.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
return 0;
}
Keep in mind that this affects all input from the stream. Using this pretty much rules out mixing line-oriented input with other input (e.g. reading a number from the stream using stream>>my_integer would normally fail).
What I have (written as an exercise, but perhaps turns out useful one day), is LineInputIterator:
#ifndef UB_LINEINPUT_ITERATOR_H
#define UB_LINEINPUT_ITERATOR_H
#include <iterator>
#include <istream>
#include <string>
#include <cassert>
namespace ub {
template <class StringT = std::string>
class LineInputIterator :
public std::iterator<std::input_iterator_tag, StringT, std::ptrdiff_t, const StringT*, const StringT&>
{
public:
typedef typename StringT::value_type char_type;
typedef typename StringT::traits_type traits_type;
typedef std::basic_istream<char_type, traits_type> istream_type;
LineInputIterator(): is(0) {}
LineInputIterator(istream_type& is): is(&is) {}
const StringT& operator*() const { return value; }
const StringT* operator->() const { return &value; }
LineInputIterator<StringT>& operator++()
{
assert(is != NULL);
if (is && !getline(*is, value)) {
is = NULL;
}
return *this;
}
LineInputIterator<StringT> operator++(int)
{
LineInputIterator<StringT> prev(*this);
++*this;
return prev;
}
bool operator!=(const LineInputIterator<StringT>& other) const
{
return is != other.is;
}
bool operator==(const LineInputIterator<StringT>& other) const
{
return !(*this != other);
}
private:
istream_type* is;
StringT value;
};
} // end ub
#endif
So your loop could be replaced with an algorithm (another recommended practice in C++):
for_each(LineInputIterator<>(cin), LineInputIterator<>(), do_stuff);
Perhaps a common task is to store every line in a container:
vector<string> lines((LineInputIterator<>(stream)), LineInputIterator<>());
The first one.
Both do the same, but the first one is much more readable, plus you get to keep the string variable after the loop is done (in the 2nd option, its enclosed in the for loop scope)
Go with the while statement.
See Chapter 16.2 (specifically pages 374 and 375) of Code Complete 2 by Steve McConell.
To quote:
Don't use a for loop when a while loop is more appropriate. A common abuse of the flexible for loop structure in C++, C# and Java is haphazardly cramming the contents of a while loop into a for loop header.
.
C++ Example of a while loop abusively Crammed into a for Loop Header
for (inputFile.MoveToStart(), recordCount = 0; !inputFile.EndOfFile(); recordCount++) {
inputFile.GetRecord();
}
C++ Example of appropriate use of a while loop
inputFile.MoveToStart();
recordCount = 0;
while (!InputFile.EndOfFile()) {
inputFile.getRecord();
recordCount++;
}
I've omitted some parts in the middle but hopefully that gives you a good idea.
This is based on Jerry Coffin's answer. I wanted to show c++20's std::ranges::istream_view. I also added a line number to the class. I did this on godbolt, so I could see what happened. This version of the line class still works with std::input_iterator.
https://en.cppreference.com/w/cpp/ranges/basic_istream_view
https://www.godbolt.org/z/94Khjz
class line {
std::string data{};
std::intmax_t line_number{-1};
public:
friend std::istream &operator>>(std::istream &is, line &l) {
std::getline(is, l.data);
++l.line_number;
return is;
}
explicit operator std::string() const { return data; }
explicit operator std::string_view() const noexcept { return data; }
constexpr explicit operator std::intmax_t() const noexcept { return line_number; }
};
int main()
{
std::string l("a\nb\nc\nd\ne\nf\ng");
std::stringstream ss(l);
for(const auto & x : std::ranges::istream_view<line>(ss))
{
std::cout << std::intmax_t(x) << " " << std::string_view(x) << std::endl;
}
}
prints out:
0 a
1 b
2 c
3 d
4 e
5 f
6 g