A standard loop for reading from a text file in C++

I'm learning how to work with files in C++ and, as a beginner, I have some doubts that I would like to clarify.
In my book the author introduces the stream states and writes this simple piece of code to show how to read until we reach end of file or a terminator:
#include <istream>
#include <vector>
using namespace std;

void fill_vector(istream& ist, vector<int>& v, char terminator)
{
    // make ist throw if it goes bad:
    ist.exceptions(ist.exceptions() | ios_base::badbit);
    for (int i; ist >> i;) v.push_back(i);
    if (ist.eof()) return; // fine: we found end of file
    // not good(), not bad() and not eof(): it must be fail()
    ist.clear();
    char c;
    ist >> c; // read a character, hopefully terminator
    if (c != terminator) { // not the terminator, so we must fail
        ist.unget(); // maybe my caller can use that character
        ist.clear(ios_base::failbit);
    }
}
The first example provides a useful method to read data, but I'm having some issues with the second example, where the author says:
Often, we want to check our read as we go along. This is the general strategy, assuming that ist is an istream:
for (My_type var; ist >> var;) { // read until end of file
// maybe check that var is valid
// do something with var
}
if (ist.fail()) {
ist.clear();
char ch;
// error() is a helper function defined in the book:
if (!(ist >> ch && ch == '|')) error("Bad termination of input\n");
}
// carry on : we found end of file or terminator
If we don't want to accept a terminator (that is, to accept only end of file as the end), we simply delete the test before the call of error().
Here's my doubt: in the first example we check every possible state of the istream to be sure that reading terminated as we wanted, and that's fine. But I have trouble understanding the second example:
What does the author mean when he says to remove the test before the call of error()?
Is it possible to avoid triggering both eof and fail when reading? If yes, how?
I'm really confused and I can't understand the example, because in the tests I've done failbit is always set along with eofbit. So what's the point of checking for failbit if it will always be triggered? Why is the author doing that?
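The observation in the question is easy to confirm by inspecting the state bits once a "read until failure" loop has finished. The sketch below is illustrative only (`StreamFlags` and `flags_after_reading_ints` are hypothetical names): it reads ints from an in-memory stream and records which bits are set afterwards.

```cpp
#include <sstream>
#include <string>

// Hypothetical record of the stream state after the read loop.
struct StreamFlags {
    bool eof;
    bool fail;
    bool bad;
    int values_read;
};

// Reads ints until the first failed extraction, then reports the flags.
StreamFlags flags_after_reading_ints(const std::string& text) {
    std::istringstream ist(text);
    int count = 0;
    for (int i; ist >> i;) ++count;  // the loop always ends with a failed read
    return {ist.eof(), ist.fail(), ist.bad(), count};
}
```

For input such as "1 2 3" or "1 2 3\n", both eof() and fail() end up true, because the extraction that stops the loop runs into end of file. For "1 2 |", fail() is true but eof() is false. So failbit alone does not tell "ran out of input" apart from "found something that is not an int"; eof() is what distinguishes the two.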

What would happen to the code if I remove the test before the call of error(), as the author says? Wouldn't that be useless, as I would only be checking for the bad state of the stream?
I think I see what you mean. No, it's not really useless, because you would tell someone (I don't know what error actually does), either the programmer (exception) or the user (standard output), that the input contained invalid data, so that someone can act accordingly.
It may be useless, but that depends on what you want the function to do. If, for example, you want it to silently ignore the error and use the correct data already processed, it really is useless.
How can I read data from a file until I just reach the end of that file without using any other terminator ?
I can't see what you mean; you are already doing that in both examples:
if (ist.eof()) return; // fine: we found end of file
and
if (ist.fail()) { // caution: an extraction that fails at end of file sets failbit as well as eofbit, so this branch also runs at end of file
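Because the extraction that ends the loop sets failbit even when it fails only because the input ran out, one way to make the second example accept either end of file or a '|' terminator is to test eof() before treating failbit as an error. This is a sketch, not the book's code: `read_ints` is a hypothetical name, and returning false stands in for the book's error() call.

```cpp
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical sketch: read ints until end of file or a '|' terminator.
// Returns false where the book's code would call error().
bool read_ints(std::istream& ist, std::vector<int>& v) {
    for (int var; ist >> var;) {
        v.push_back(var);             // "do something with var"
    }
    if (ist.bad()) return false;      // unrecoverable stream error
    if (ist.eof()) return true;       // simply ran out of input
    // failbit without eofbit: the next character is not part of an int
    ist.clear();
    char ch;
    return (ist >> ch) && ch == '|';  // accept only the terminator
}
```

Deleting the `ch == '|'` part (as the book suggests, to accept only end of file) then turns any leftover non-number into an error, while plain end of file still succeeds. One corner case this sketch does not catch: a dangling sign such as a final "-" makes the extraction fail and also hit end of file, so it is accepted as a normal end.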


Is it always safe to use std::istream::peek()?

I usually teach my students that the safe way to tackle file input is:
while (true) {
// Try to read
if (/* failure check */) {
break;
}
// Use what you read
}
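Filled in for a stream of whitespace-separated ints, that pattern might look like this (`sum_ints` is a hypothetical example function, not from the original post):

```cpp
#include <istream>
#include <sstream>

// Hypothetical example of the "try to read, check, then use" loop.
long long sum_ints(std::istream& is) {
    long long total = 0;
    while (true) {
        int value;
        is >> value;      // try to read
        if (!is) {        // failure check: end of input, format error or bad stream
            break;
        }
        total += value;   // use what you read, only after the check
    }
    return total;
}
```

With "10 20 x 30" the loop stops at "x" and returns 30; with "1 2 3\n" it returns 6 and the trailing newline causes no extra iteration.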
This has saved me and many others from the classic, and usually wrong:
while (!is.eof()) {
// Try to read
// Use what you read
}
But people really like this form of looping, so it has become common to see this in student code:
while (is.peek()!=EOF) { // <-- I know this is not C++ style, but this is how it is usually written
// Try to read
// Use what you read
}
Now the question is: is there a problem with this code? Are there corner cases in which things don't work exactly as expected? OK, that's two questions.
EDIT FOR ADDITIONAL DETAILS: during exams we sometimes guarantee students that the file will be correctly formatted, so they don't need to do all the checks and just need to verify whether there's more data. And most of the time we deal with binary formats, which let you not worry about whitespace at all (because all the data is meaningful).
While the accepted answer is totally clear and correct, I'd still like someone to try to comment on the joint behavior of peek() and unget().
The unget() question came to my mind because I once observed (I believe it was on Windows) that when peeking at the 4096-byte internal buffer limit (effectively causing a new buffer to be loaded), ungetting the previous byte (the last one of the previous buffer) failed. But I may be wrong. So that was my additional doubt: something known that I missed, which may be specified in the standard or in some library implementations.
is.peek()!=EOF tells you whether there are still characters left in the input stream, but it doesn't tell you whether your next read will succeed:
while (is.peek()!=EOF) {
int a;
is >> a;
// Still need to test `is` to verify that the read succeeded
}
is >> a could fail for a number of reasons, e.g. the input might not actually be a number.
So there is little point to this when you could instead write:
int a;
while (is >> a) { // reads until failure of any kind
// use `a`
}
or, maybe better:
for (int a; is >> a;) { // reads until failure of any kind
// use `a`
}
Or use your first example, in which case the is.peek()!=EOF check in the loop becomes redundant.
This is assuming you want the loop to exit on every failure, following your first code example, not only on end-of-file.
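A concrete corner case is a file that ends with a newline, which most text files do. After the last number has been read, peek() still sees the '\n', so the loop body runs once more and that extraction fails. The sketch below runs the peek()-controlled loop over an in-memory string and counts what happens (`PeekLoopResult` and `run_peek_loop` are hypothetical names for illustration):

```cpp
#include <cstdio>   // EOF
#include <sstream>
#include <string>

struct PeekLoopResult {
    int iterations;        // times the loop body ran
    int successful_reads;  // times `is >> a` succeeded
};

// Runs the loop from the question over `text`.
PeekLoopResult run_peek_loop(const std::string& text) {
    std::istringstream is(text);
    PeekLoopResult r{0, 0};
    while (is.peek() != EOF) {
        ++r.iterations;
        int a;
        if (is >> a) {
            ++r.successful_reads;
        }
        // after a failed read the stream is no longer good(),
        // so the next peek() returns EOF and the loop ends
    }
    return r;
}
```

For "1 2 3\n" this gives four iterations but only three successful reads; for "1 2 3" (no trailing newline) the counts match. So even with correctly formatted files, the read inside the loop still has to be checked.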

What is the difference between !std::basic_ios::fail() and std::basic_ios::good()?

while(true)
{
int a, c;
string b;
file >> a >> b >> c;
if( file.good() )
f(a, b, c);
else
break;
}
This code is not reading the last line from the .txt file. If I change file.good() to !file.fail(), it works. Why?
bad() --> Returns true if a reading or writing operation fails. For example, in the case that we try to write to a file that is not open for writing or if the device where we try to write has no space left.
fail() --> Returns true in the same cases as bad(), but also in the case that a format error happens, like when an alphabetical character is extracted when we are trying to read an integer number.
good() --> It is the most generic state flag: it returns false in the same cases in which calling any of the previous functions would return true, and also when eof() would return true (that is, when eofbit is set). Note that good() and bad() are not exact opposites (good() checks more state flags at once).
Will elaborate later.
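The eofbit part is what explains the missing line: if the file does not end with a newline, reading the last c runs into end of file, so eofbit is set even though a, b and c were all read successfully. good() is then false, while !fail() is still true. A small sketch with an in-memory file (`count_records` is a hypothetical helper; the counter stands in for f(a, b, c)):

```cpp
#include <sstream>
#include <string>

// Runs the question's loop, checking either good() or !fail().
int count_records(const std::string& text, bool use_good) {
    std::istringstream file(text);
    int processed = 0;
    while (true) {
        int a, c;
        std::string b;
        file >> a >> b >> c;
        bool ok = use_good ? file.good() : !file.fail();
        if (ok)
            ++processed;   // stands in for f(a, b, c)
        else
            break;
    }
    return processed;
}
```

For "1 x 2\n3 y 4" the good() version processes one record and the !fail() version two; once the text ends with a newline, both process two.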
I think this is covered here.
A relevant excerpt:
"All of the stream state functions – fail, bad, eof, and good – tell you the current state of the stream rather than predicting the success of a future operation. Check the stream itself (which is equivalent to an inverted fail check) after the desired operation"

Example of Why stream::good is Wrong?

I gave an answer here in which I wanted to check the validity of the stream each time through a loop.
My original code used good and looked similar to this:
ifstream foo("foo.txt");
while (foo.good()){
string bar;
getline(foo, bar);
cout << bar << endl;
}
I was immediately pointed here and told to never test good. Clearly this is something I haven't understood but I want to be doing my file I/O correctly.
I tested my code out with several examples and couldn't make the good-testing code fail.
First (this printed correctly, ending with a new line):
bleck 1
blee 1 2
blah
ends in new line
Second (this printed correctly, ending with the last line):
bleck 1
blee 1 2
blah
this doesn't end in a new line
Third was an empty file (this printed correctly: a single newline).
Fourth was a missing file (this correctly printed nothing.)
Can someone help me with an example that demonstrates why good-testing shouldn't be done?
They were wrong. The mantra is 'never test .eof()'.
Why is iostream::eof inside a loop condition considered wrong?
Even that mantra is overboard, because both are useful to diagnose the state of the stream after an extraction failed.
So the mantra should be more like
Don't use good() or eof() to detect eof before you try to read any further
Same for fail(), and bad()
Of course, stream.good() can be usefully employed before using a stream (e.g. when the stream is a file stream that may not have been opened successfully).
However, both are very, very often abused to detect the end of input, and that's not how they work.
A canonical example of why you shouldn't use this method:
std::istringstream stream("a");
char ch;
if (stream >> ch) {
std::cout << "At eof? " << std::boolalpha << stream.eof() << "\n";
std::cout << "good? " << std::boolalpha << stream.good() << "\n";
}
Prints
At eof? false
good? true
This is already covered in other answers, but I'll go over it briefly for completeness. The only functional difference between
while(foo.good()) { // similar to while(foo), but also stops once eofbit is set
getline(foo, bar);
consume(bar); // consume() represents any operation that uses bar
}
And
while(getline(foo, bar)){
consume(bar);
}
Is that the former will do an extra iteration when there are no lines in the file, making that case indistinguishable from the case of one empty line. I would argue that this is not typically desired behaviour, but I suppose that's a matter of opinion.
As sehe says, the mantra is overboard; it's a simplification. The real point is that you must not consume() the result of reading the stream before you test for failure, or at least EOF (any test before the read is irrelevant). That is exactly what people tend to do when they test good() in the loop condition.
However, the thing about getline() is that it tests for EOF internally, for you, and leaves the string empty even if only EOF is read. Therefore, the former version is roughly similar to the following pseudo-C++:
while(foo.good()) {
    // inside getline
    bar = "";                // Reset bar to empty
    string sentry;
    if(read_until_newline(foo, sentry)) {
        // The stream's state is tested implicitly inside getline
        // after the value is read. Good.
        bar = sentry;        // The read value is used only if it's valid.
    }                        // Otherwise, bar is empty.
    consume(bar);
}
I hope that illustrates what I'm trying to say. One could say that there is a "correct" version of the read loop inside getline(). This is why the rule is at least partially satisfied by the use of getline() even if the outer loop doesn't conform.
But, for other methods of reading, breaking the rule hurts more. Consider:
while(foo.good()) {
int bar;
foo >> bar;
consume(bar);
}
Not only do you always get the extra iteration, the bar in that iteration is uninitialized!
So, in short, while(foo.good()) is OK in your case, because getline(), unlike certain other reading functions, leaves the output in a valid state after hitting EOF, and because you don't care about (or even expect) the extra iteration when the file is empty.
Both good() and eof() will give you an extra line in your code. If you have a blank file and run this:
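The extra iteration with operator>> can be counted directly. In this sketch (`GoodLoopStats` and `run_good_loop` are hypothetical names) bar is initialised to 0 only so the example has no undefined behaviour; in the original loop it would really be uninitialized in the failed iteration.

```cpp
#include <sstream>
#include <string>

struct GoodLoopStats {
    int iterations;    // times the loop body ran
    int failed_reads;  // iterations in which `foo >> bar` failed
};

GoodLoopStats run_good_loop(const std::string& text) {
    std::istringstream foo(text);
    GoodLoopStats s{0, 0};
    while (foo.good()) {
        int bar = 0;                        // initialised only for this sketch
        foo >> bar;
        ++s.iterations;
        if (foo.fail()) ++s.failed_reads;   // consume(bar) would see a bogus value here
    }
    return s;
}
```

With "1 2 3\n" the body runs four times and the last read fails. With "1 2 3" (no trailing whitespace) reading the 3 already sets eofbit, so there is no extra iteration; the problem shows up whenever the input ends in whitespace, which text files usually do.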
std::ifstream foo1("foo1.txt");
std::string line;
int lineNum = 1;
std::cout << "foo1.txt Controlled With good():\n";
while (foo1.good())
{
std::getline(foo1, line);
std::cout << lineNum++ << line << std::endl;
}
foo1.close();
foo1.open("foo1.txt");
lineNum = 1;
std::cout << "\n\nfoo1.txt Controlled With getline():\n";
while (std::getline(foo1, line))
{
std::cout << line << std::endl;
}
The output you will get is
foo1.txt Controlled With good():
1
foo1.txt Controlled With getline():
This shows that it isn't working correctly, since a blank file should never produce a line. The only way to know that is to use a read operation as the condition, since the stream is always good before the first read.
Using foo.good() just tells you that the previous read operation worked fine and that the next one might work as well. .good() checks the state of the stream at a given point; it does not check whether the end of the file has been reached. Let's say something happened while the file was being read (a network error, an OS error, ...): good() will return false, but that does not mean the end of the file was reached. However, .good() also returns false when the end of the file is reached, because the stream is not able to read any more.
On the other hand, .eof() checks if the end of file was truly reached.
So, .good() might fail while the end of file was not reached.
Hope this helps you understand why using .good() to check end of file is a bad habit.
Let me clearly say that sehe's answer is the correct one.
But the option proposed by Nathan Oliver, Neil Kirk, and user2079303, using getline as the loop condition rather than good, needs to be addressed for the sake of posterity.
We will compare the loop in the question to the following loop:
string bar;
while (getline(foo, bar)){
cout << bar << endl;
}
Because getline returns the istream passed as the first argument, because an istream converted to bool returns !(fail() || bad()), and because a getline call that hits end of file without extracting any characters sets both failbit and eofbit, getline makes a valid loop condition.
The behavior does change, however, when getline is used as the condition: if the final read finds nothing but end of file, the loop exits and that empty line is never output. This doesn't occur in Examples 2 and 4. But Example 1:
bleck 1
blee 1 2
blah
ends in new line
Prints this with the good loop condition, followed by one extra empty line:
bleck 1
blee 1 2
blah
ends in new line
But drops that final empty line with the getline loop condition:
bleck 1
blee 1 2
blah
ends in new line
Example 3 is an empty file:
Prints a single empty line with the good condition.
Prints nothing with the getline condition.
Neither of these behaviors is wrong, but that last line can make a difference in code. Hopefully this answer will help you decide between the two.
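The difference can be reproduced with in-memory input. This sketch collects what each loop style would print instead of writing to cout (`lines_with_good` and `lines_with_getline` are hypothetical helpers):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Loop controlled by good(): the body runs even when getline just failed.
std::vector<std::string> lines_with_good(const std::string& text) {
    std::istringstream foo(text);
    std::vector<std::string> out;
    while (foo.good()) {
        std::string bar;
        std::getline(foo, bar);
        out.push_back(bar);
    }
    return out;
}

// Loop controlled by getline itself: the body runs only after a successful read.
std::vector<std::string> lines_with_getline(const std::string& text) {
    std::istringstream foo(text);
    std::vector<std::string> out;
    std::string bar;
    while (std::getline(foo, bar)) {
        out.push_back(bar);
    }
    return out;
}
```

For "a\nb\n" the good() loop yields three entries, the last one empty, while the getline loop yields two. For "a\nb" both yield two, and for an empty input the good() loop yields one empty line and the getline loop none, matching Examples 1 to 3.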

Why does ofstream give me an echo ie. writes input twice [duplicate]

When I run the following code and type, for example, "Peter", I get "PeterPeter" in the file.
Why?
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
    ofstream File2;
    File2.open("File2.dat", ios::out);
    string name;
    cout << "Name?" << endl;
    while (!cin.eof())
    {
        cin >> name;
        File2 << name;
    }
    return 0;
}
When I change the while loop to
while(cin>>name)
{
File2<<name;
}
it works. But I don't understand why the first approach does not.
I can't answer my own question (as I don't have enough reputation), hence I'm writing my answer here:
Ahhhh!!! OK, thanks. Now I get it ^^
I have been testing with
while(!cin.eof())
{
cin>>name;
File2<<name;
cout<<j++<<"cin.eof() "<<cin.eof()<<endl;
}
What happens is that when I type Ctrl+Z, it is still in the while loop. The variable name stays unchanged and is written to File2 again by the next line of code.
The following is working:
while(!cin.eof())
{
cin>>name;
if(!cin.eof()){File2<<name;}
cout<<j++<<"cin.eof() "<<cin.eof()<<endl;
}
Ho hum, the millionth time this has been asked. This is wrong:
while(!cin.eof())
{
cin>>name;
File2<<name;
}
eof() doesn't do what you think it does. You think it tells you whether you're at the end of file, right?
What eof() actually does is tell you why the last read you did failed. So it's something you call after you have done a read to see why it failed, not something you do before a read to see if it will fail. The return value of eof() when the last read has not failed is more complex. It depends on what you have been reading and how you've been reading it. You are trying to use eof() in a situation where there has been no failure and so the results can vary.
The short answer is: don't do it like that, do it like this:
while(cin >> name)
{
File2<<name;
}
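The "PeterPeter" effect can be reproduced without a console by feeding the same loops from a string stream that ends in a newline, as typed input does before Ctrl+Z. The helper names below are hypothetical; output goes to a string instead of File2:

```cpp
#include <istream>
#include <sstream>
#include <string>

// The question's loop: eof() is tested before each read.
std::string copy_with_eof_loop(std::istream& in) {
    std::ostringstream out;
    std::string name;
    while (!in.eof()) {
        in >> name;    // the final attempt fails and leaves `name` unchanged...
        out << name;   // ...but it is written out again anyway
    }
    return out.str();
}

// The corrected loop: the result of each read is tested.
std::string copy_with_read_loop(std::istream& in) {
    std::ostringstream out;
    std::string name;
    while (in >> name) {
        out << name;
    }
    return out.str();
}
```

With the input "Peter\n" the first function returns "PeterPeter" and the second "Peter". With a type such as int, a format error would be worse: failbit is set, eof() never becomes true, and the eof() loop would spin forever.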
BTW, sorry for the flippant tone, I am seriously interested to know why you wrote the code wrong in the first place. We see this mistake all the time, it seems almost every newbie makes the same mistake, so I am interested to understand where this mistake comes from. Did you see that code somewhere else, did someone teach you to write that, did it just seem right to you, etc. etc. If you could explain in your case I'd appreciate it.
The basic problem with using std::cin.eof() in the loop condition is that it tests the stream state before any attempt is made to read from the stream. At this point, the stream has no idea what will be read and can't predict what will be tried. The fundamental insight is: you always have to verify that reading data was successful after reading it!
A secondary problem is that eof() only tests one of several error conditions. Reading a std::string can only go wrong if there is no further data, but for most other data types there can also be format failures. For example, reading an int can go wrong because of a format mismatch. In that case std::ios_base::failbit will be set and fail() will return true, while eof() keeps returning false.
Testing the stream itself is equivalent to testing fail() which detects that something is wrong with the stream (it actually also tests if the stream is bad()). Thus, the canonical approach for reading a file typically has one of the following forms:
while (input) {
// multiple read operations go here
if (input) {
// processing of the read data goes here
}
}
or
while (/* reading everything goes here */) {
// processing of the read data goes here
}
Obviously, you can use a for-loop instead of a while-loop. Another interesting approach to reading data uses std::istream_iterator<T> and assumes that there is an input operator for the type T. For example:
for (std::istream_iterator<std::string> it(std::cin), end; it != end; ++it) {
std::cout << "string='" << *it << "'\n";
}
None of these approaches uses eof() in the main reading loop. However, it is reasonable to use eof() after the loop to detect whether it stopped because the end of the file was reached or because of some formatting error.
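The first canonical form, with several reads per iteration, plus an eof() check after the loop, might look like this. `Person` and `read_people` are hypothetical, chosen only to have more than one read per record:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int age;
};

// Returns true if reading stopped at end of input, false on a format error
// or a bad stream.
bool read_people(std::istream& input, std::vector<Person>& people) {
    while (input) {
        Person p{};
        input >> p.name >> p.age;   // multiple read operations
        if (input) {
            people.push_back(p);    // processing only if every read succeeded
        }
    }
    // Only now is eof() meaningful: it says why the loop stopped.
    return input.eof() && !input.bad();
}
```

"alice 30\nbob 41\n" gives two people and true; "alice 30\nbob x\n" gives one person and false. Note that a truncated last record such as a trailing "bob" with no age is also reported as end of input here; telling that apart would need an extra check.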

C++: std::istream check for EOF without reading / consuming tokens / using operator>>

I would like to test if a std::istream has reached the end without reading from it.
I know that I can check for EOF like this:
if (is >> something)
but this has a series of problems. Imagine there are many, possibly virtual, methods/functions which expect std::istream& passed as an argument.
This would mean I have to do the "housework" of checking for EOF in each of them, possibly with different type of something variable, or create some weird wrapper which would handle the scenario of calling the input methods.
All I need to do is:
if (!IsEof(is)) Input(is);
the method IsEof should guarantee that the stream is not changed for reading, so that the above line is equivalent to:
Input(is)
as regards the data read in the Input method.
If there is no generic solution that would work for any std::istream, is there any way to do this for std::ifstream or cin?
EDIT:
In other words, the following assert should always pass:
while (!IsEof(is)) {
int something;
assert(is >> something);
}
The istream class has an eof bit that can be checked by using the is.eof() member.
Edit: So you want to see if the next character is the EOF marker without removing it from the stream? if (is.peek() == EOF) is probably what you want then. See the documentation for istream::peek
That's impossible. How is the IsEof function supposed to know that the next item you intend to read is an int?
Should the following also not trigger any asserts?
while(!IsEof(in))
{
int x;
double y;
if( rand() % 2 == 0 )
{
assert(in >> x);
} else {
assert(in >> y);
}
}
That said, you can use the exceptions method to keep the "house-keeping" in one place.
Instead of
if(!IsEof(is)) Input(is);
try
is.exceptions( ifstream::eofbit /* | ifstream::failbit etc. if you like */ );
try {
    Input(is);
} catch(const ifstream::failure& ) {
}
It doesn't stop you from reading before it's "too late", but it does obviate the need to have if(is >> x) if(is >> y) etc. in all the functions.
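A runnable sketch of the exceptions approach follows. It differs from the answer in two assumed details: it enables failbit and badbit rather than eofbit (with eofbit enabled, successfully reading the very last value of a stream can already throw, because that read sets eofbit), and `Input` is a hypothetical stand-in that takes an output vector so the effect is observable.

```cpp
#include <exception>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for one of the many Input(std::istream&) functions:
// no per-read checks, the stream reports problems by throwing.
void Input(std::istream& is, std::vector<int>& out) {
    int x, y;
    is >> x >> y;
    out.push_back(x);
    out.push_back(y);
}

// Returns true if Input() completed, false if a read failed.
bool try_input(std::istream& is, std::vector<int>& out) {
    is.exceptions(std::ios_base::failbit | std::ios_base::badbit);
    try {
        Input(is, out);
        return true;
    } catch (const std::exception&) {   // std::ios_base::failure derives from std::exception
        return false;
    }
}
```

With "1 2" both values are stored and try_input returns true; with "1 x" the second extraction throws before anything is stored.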
Normally,
if (is)
{
}
is enough. There are also .good(), .bad() and .fail() for more exact information.
Here is a reference link: http://www.cplusplus.com/reference/iostream/istream/
There are good reasons why there is no isEof function: it is hard to specify in a usable way. For instance, operator>> usually begins by skipping white space (depending on a flag), while some other input functions are able to read spaces. How would your isEof() handle the situation? Begin by skipping spaces or not? Would it depend on the flag used by operator>> or not? Would it restore the white space in the stream or not?
My advice is use the standard idiom and characterize input failure instead of trying to predict only one cause of them: you'd still need to characterize and handle the others.
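For the narrow case the question's assert describes (whitespace-separated values read with operator>> and the default skipws flag), a hypothetical IsEof can answer one fixed version of those questions: skip leading white space with std::ws, then peek(). This is only a sketch under that assumption. It consumes the white space (which operator>> would have skipped anyway), it cannot predict format errors, and on an interactive stream peek() blocks until input arrives.

```cpp
#include <istream>
#include <sstream>

// Hypothetical IsEof: true if nothing but white space remains
// (or the stream has already failed). Assumes the next read is a
// whitespace-skipping operator>>.
bool IsEof(std::istream& is) {
    is >> std::ws;   // discard leading white space; sets eofbit if input runs out
    return is.peek() == std::istream::traits_type::eof();
}
```

With "1 2 3\n", `while (!IsEof(is)) { int v; is >> v; }` performs exactly three successful reads, so the question's assert holds. With "1 x", IsEof is still false before the "x", and the read of an int fails anyway, which is the characterize-the-failure point above.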
No, in the general case there is no way of knowing if the next read operation will reach eof.
If the stream is connected to a keyboard, the EOF condition is that I will type Ctrl+Z/Ctrl+D at the next prompt. How would IsEof(is) detect that?