Why read file first then check? - c++

I'm just revising for my exams and can't get my head around the following provided by our lecturer:
When opening fstreams, check whether the open succeeded.
Then read before checking input_file.fail().
If you check before reading, you may end up with extra unwanted input.
It doesn't make sense to me to read first; shouldn't you check first?
If anyone is able to explain, I would be very grateful :)

input_file.fail() determines if any preceding operations have failed, not whether the upcoming operation is going to fail. Consequently, if you write this:
if (!input_file.fail()) {
int value;
input_file >> value;
/* ... process value ... */
}
Then after reading value, you have no idea whatsoever whether you actually read anything successfully or not. All you know is that right before you did the read, everything was working correctly. It's quite possible that you failed to read an integer, either because you hit the end of the file or because the data in the file wasn't an integer.
On the other hand, if you write
int value;
input_file >> value;
if (!input_file.fail()) {
/* ... process value ... */
}
Then you attempt to do a read. If it succeeds, you then process the value you've read. If not, you can then react to the fact that the last operation failed.
(You can be even cuter than this:
int value;
if (input_file >> value) {
/* ... process value ... */
}
which combines the read and test operations into one. It's much clearer here that you're confirming that the read succeeded.)
If you're doing reads in a loop, a very clean way to do this is
for (int value; input_file >> value; ) {
/* ... process value ... */
}
This makes clear that you loop while you're able to keep reading values from the file.
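One extra note (my own addition, not part of the original advice): after such a loop you can still ask the stream why it stopped, for example:
for (int value; input_file >> value; ) {
    /* ... process value ... */
}
if (input_file.eof()) {
    /* reached the end of the file: normal termination */
} else {
    /* the last extraction failed for another reason, e.g. a non-numeric token */
}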
Hope this helps!


Is it always safe to use std::istream::peek()?

I usually teach my students that the safe way to tackle file input is:
while (true) {
// Try to read
if (/* failure check */) {
break;
}
// Use what you read
}
This has saved me and many other people from the classic, and most of the time wrong:
while (!is.eof()) {
// Try to read
// Use what you read
}
But people really like this form of looping, so it has become common to see this in student code:
while (is.peek()!=EOF) { // <-- I know this is not C++ style, but this is how it is usually written
// Try to read
// Use what you read
}
Now the question is: is there a problem with this code? Are there corner cases in which things don't work exactly as expected? Ok, it's two questions.
EDIT FOR ADDITIONAL DETAILS: during exams you sometimes guarantee the students that the file will be correctly formatted, so they don't need to do all the checks and just need to verify if there's more data. And most of the time we deal with binary formats, which allow you to not worry about whitespace at all (because the data is all meaningful).
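To make the binary case concrete, here is a rough sketch of the kind of loop I mean (the Record type and the file name are made up for the example):
#include <cstdint>
#include <fstream>

struct Record {                // hypothetical fixed-size record
    std::int32_t id;
    double value;
};

int main() {
    std::ifstream is("data.bin", std::ios::binary);
    Record rec;
    // read(), like operator>>, only tells you afterwards whether it succeeded
    while (is.read(reinterpret_cast<char*>(&rec), sizeof rec)) {
        // use rec
    }
}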
While the accepted answer is totally clear and correct, I'd still like someone to try to comment on the joint behavior of peek() and unget().
The unget() question came to my mind because I once observed (I believe it was on Windows) that peeking at the 4096-byte internal buffer boundary (so effectively causing a new buffer to be loaded) made ungetting the previous byte (the last of the previous buffer) fail. But I could be wrong. So that was my additional doubt: something known that I missed, which is perhaps spelled out in the standard or in some library implementations.
is.peek()!=EOF tells you whether there are still characters left in the input stream, but it doesn't tell you whether your next read will succeed:
while (is.peek()!=EOF) {
int a;
is >> a;
// Still need to test `is` to verify that the read succeeded
}
is >> a could fail for a number of reasons, e.g. the input might not actually be a number.
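To make that concrete, a tiny sketch (illustrative only): there is still a character left, yet the formatted read fails:
#include <cstdio>      // for EOF
#include <iostream>
#include <sstream>

int main() {
    std::istringstream is("abc");
    std::cout << std::boolalpha
              << (is.peek() != EOF) << "\n";         // true: characters are left
    int a;
    std::cout << static_cast<bool>(is >> a) << "\n"; // false: "abc" is not a number
}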
So there is no point to this if you could instead do
int a;
while (is >> a) { // reads until failure of any kind
// use `a`
}
or, maybe better:
for (int a; is >> a;) { // reads until failure of any kind
// use `a`
}
or your first example, in which case the is.peek()!=EOF in the loop will become redundant.
This is assuming you want the loop to exit on every failure, following your first code example, not only on end-of-file.

A standard loop for reading from a text file C++

I'm learning how to work with files in C++ and, as a beginner, I've got some doubts that I would like to clarify:
In my book the author introduces the stream states and writes this simple piece of code to show how to read until we reach end of file or a terminator:
// somewhere make ist throw if it goes bad :
void fill_vector(istream& ist, vector<int>& v, char terminator)
{
ist.exceptions(ist.exceptions() | ios_base::badbit);
for (int i; ist >> i;) v.push_back(i);
if (ist.eof()) return; // fine: we found end of file
// not good() not bad() and not eof(), it must be fail()
ist.clear();
char c;
ist >> c; // read a character, hopefully terminator
if (c != terminator) { // not the terminator, so we must fail
ist.unget(); // maybe my caller can use that character
ist.clear(ios_base::failbit);
}
}
This was a first example, which provides a useful method to read data, but I'm having some issues with the second example, where the author says:
Often we want to check our read as we go along; this is the general strategy, assuming that ist is an istream:
for (My_type var; ist >> var;) { // read until end of file
// maybe check that var is valid
// do something with var
}
if (ist.fail()) {
ist.clear();
char ch;
// the error function is defined in the book:
if (!(ist >> ch && ch == '|')) error("Bad termination of input\n");
}
// carry on : we found end of file or terminator
If we don't want to accept a terminator, that is, to accept only the end of file as the end, we simply delete the test before the call of error().
Here's my doubt: in the first example we basically check every possible state of the istream to be sure that the reading terminated the way we wanted, and that's OK. But I have problems understanding the second example:
What does the author mean when he says to remove the test before the call of error()?
Is it possible to avoid triggering both eof and fail when reading? If yes, how?
I'm really confused because, from the tests that I've done, the failbit will always be set after the eofbit, so what's the point of checking for failbit if it will always be triggered? Why is the author doing that?
What would happen to the code if I removed the test before the call of error(), as the author says? Wouldn't that be useless, as I would only be checking for the bad state of the stream?
I think I see what you mean. No, it's not really useless, because you would be telling someone, either the programmer (via an exception) or the user (via standard output), that the input contained invalid data, and someone has to act accordingly (I don't know what error actually does).
It may be useless, but that depends on what you want the function to do: if, for example, you want it to silently ignore the error and use the correct data already processed, then it really is useless.
How can I read data from a file until I just reach the end of that file, without using any other terminator?
I can't see what you mean; you are already doing that in both examples:
if (ist.eof()) return; // fine: we found end of file
and
if (ist.fail()) { //If 'ist' didn't fail (reaching eof is not a failure), just skip 'if'
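And if you want only the end of file to count as a clean end, with no terminator at all, a minimal sketch of that variant (my own phrasing, not the book's exact code) would be:
for (My_type var; ist >> var;) {
    // do something with var
}
if (!ist.eof())                        // we stopped on something other than end of file
    error("Bad termination of input\n");
// carry on: we found end of file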

Example of Why stream::good is Wrong?

I gave an answer here in which I wanted to check the validity of the stream each time through a loop.
My original code used good and looked similar to this:
ifstream foo("foo.txt");
while (foo.good()){
string bar;
getline(foo, bar);
cout << bar << endl;
}
I was immediately pointed here and told to never test good. Clearly this is something I haven't understood but I want to be doing my file I/O correctly.
I tested my code out with several examples and couldn't make the good-testing code fail.
First (this printed correctly, ending with a new line):
bleck 1
blee 1 2
blah
ends in new line
Second (this printed correctly, ending with the last line):
bleck 1
blee 1 2
blah
this doesn't end in a new line
Third was an empty file (this printed correctly: a single newline).
Fourth was a missing file (this correctly printed nothing.)
Can someone help me with an example that demonstrates why good-testing shouldn't be done?
They were wrong. The mantra is 'never test .eof()'.
Why is iostream::eof inside a loop condition considered wrong?
Even that mantra is overboard, because both are useful to diagnose the state of the stream after an extraction failed.
So the mantra should be more like
Don't use good() or eof() to detect eof before you try to read any further
Same for fail(), and bad()
Of course, stream.good() can be usefully employed before using a stream (e.g. in case the stream is a file stream which has not been successfully opened).
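For example, a minimal check right after opening (the file name is just a placeholder):
std::ifstream foo("foo.txt");
if (!foo.good()) {                 // or simply: if (!foo)
    // report that the file could not be opened and bail out
}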
However, both are very very very often abused to detect the end of input, and that's not how it works.
A canonical example of why you shouldn't use this method:
std::istringstream stream("a");
char ch;
if (stream >> ch) {
std::cout << "At eof? " << std::boolalpha << stream.eof() << "\n";
std::cout << "good? " << std::boolalpha << stream.good() << "\n";
}
Prints
At eof? false
good? true
This is already covered in other answers, but I'll go over it briefly for completeness. The only functional difference between
while(foo.good()) { // effectively same as while(foo) {
getline(foo, bar);
consume(bar); // consume() represents any operation that uses bar
}
and
while(getline(foo, bar)){
consume(bar);
}
is that the former will do an extra iteration when there are no lines in the file, making that case indistinguishable from the case of one empty line. I would argue that this is not typically desired behaviour. But I suppose that's a matter of opinion.
As sehe says, the mantra is overboard; it's a simplification. The real point is that you must not consume() the result of reading the stream before you test for failure, or at least for EOF (and any test before the read is irrelevant). That is exactly what people end up doing when they test good() in the loop condition.
However, the thing about getline() is that it tests for EOF internally, for you, and returns an empty string even if only EOF is read. Therefore, the former version could be seen as roughly similar to the following pseudo-C++:
while(foo.good()) {
    // inside getline:
    bar = "";                              // reset bar to empty
    string sentry;
    if(read_until_newline(foo, sentry)) {
        // the stream's state is tested implicitly inside getline,
        // after the value is read
        bar = sentry;                      // the read value is used only if it's valid
    }                                      // otherwise, bar stays empty
    consume(bar);
}
I hope that illustrates what I'm trying to say. One could say that there is a "correct" version of the read loop inside getline(). This is why the rule is at least partially satisfied by the use of getline(), even if the outer loop doesn't conform.
But, for other methods of reading, breaking the rule hurts more. Consider:
while(foo.good()) {
int bar;
foo >> bar;
consume(bar);
}
Not only do you always get the extra iteration, the bar in that iteration is uninitialized!
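The fix is the same as elsewhere in this thread: make the read itself the loop condition, e.g. (same consume() placeholder as above):
for (int bar; foo >> bar;) {
    consume(bar);          // bar is only used after a successful read
}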
So, in short, while(foo.good()) is OK in your case, because getline(), unlike certain other reading functions, leaves the output in a valid state after hitting end of file, and because you don't mind, or even expect, the extra iteration when the file is empty.
Both good() and eof() will give you an extra iteration of your loop. If you have a blank file and run this:
std::ifstream foo1("foo1.txt");
std::string line;
int lineNum = 1;
std::cout << "foo1.txt Controlled With good():\n";
while (foo1.good())
{
std::getline(foo1, line);
std::cout << lineNum++ << line << std::endl;
}
foo1.close();
foo1.open("foo1.txt");
lineNum = 1;
std::cout << "\n\nfoo1.txt Controlled With getline():\n";
while (std::getline(foo1, line))
{
std::cout << line << std::endl;
}
The output you will get is
foo1.txt Controlled With good():
1
foo1.txt Controlled With getline():
This proves that it isn't working correctly, since a blank file should produce no lines at all. The only way to know that is to use the read itself as the condition, since the stream will always be good the first time it is read.
Using foo.good() just tells you that the previous read operation worked fine and that the next one might work as well. .good() checks the state of the stream at a given point; it does not check whether the end of the file has been reached. Let's say something happened while the file was being read (network error, OS error, ...): good() will fail. That does not mean the end of the file was reached. Nevertheless, .good() also fails when the end of file is reached, because the stream is not able to read any more.
On the other hand, .eof() checks if the end of file was truly reached.
So, .good() might fail while the end of file was not reached.
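If you do need to tell those cases apart, a small sketch of the usual post-loop check (reusing the foo/bar names from the question):
std::string bar;
while (std::getline(foo, bar)) {
    // use bar
}
if (foo.bad()) {
    // a genuine I/O error (disk, network, ...), not just end of file
} else if (!foo.eof()) {
    // stopped for some other reason (for formatted reads: malformed data)
}
// otherwise: clean end of file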
Hope this helps you understand why using .good() to check end of file is a bad habit.
Let me clearly say that sehe's answer is the correct one.
But the option proposed by Nathan Oliver, Neil Kirk, and user2079303 is to use getline as the loop condition rather than good. This needs to be addressed for the sake of posterity.
We will compare the loop in the question to the following loop:
string bar;
while (getline(foo, bar)){
cout << bar << endl;
}
Because getline returns the istream passed as the first argument, because an istream converted to bool yields !fail() (true unless the failbit or badbit is set), and because trying to read past the end of the file sets both the failbit and the eofbit, this makes getline a valid loop condition.
The behavior does change, however, when using getline as the condition: when the final newline is immediately followed by end of file, the getline loop exits without producing a trailing empty line. This doesn't occur in Examples 2 and 4. But Example 1:
bleck 1
blee 1 2
blah
ends in new line
Prints this, followed by an extra empty line, with the good loop condition:
bleck 1
blee 1 2
blah
ends in new line
But omits that final empty line with the getline loop condition:
bleck 1
blee 1 2
blah
ends in new line
Example 3 is an empty file:
It prints a single empty line with the good condition.
It prints nothing with the getline condition.
Neither of these behaviors are wrong. But that last line can make a difference in code. Hopefully this answer will be helpful to you when deciding between the two for coding purposes.

what will happen if the input stream is invalid

What will happen if the input stream is invalid? For example, as follows:
#include <iostream>
using namespace std;

int main()
{
    int value;
    while(!(cin>>value).eof());
}
If the entered sequence is: 1 2 3 q 4 5, the while loop falls into an endless loop when cin scans 'q', and value stays 3.
My questions are:
1. Why can't cin ignore 'q' and proceed to scan 4?
2. What's the underlying implementation of the input stream? Are there any materials I can refer to?
Thank you!
Why can't cin ignore 'q' and proceed to scan 4?
You can if you want to. You could get that effect with the following:
int value;
while(std::cin>>value);
if (!std::cin)
{
std::cin.clear(); // clear error state
std::cin.ignore(); // ignore the q
while(std::cin>>value); // read rest of values until 5
}
std::cin >> value just does not do that by default, because the desired behavior differs from program to program. For many people it would be undesirable for std::cin to ignore a read failure and keep scanning. The default behavior allows you, the programmer, to decide what to do on failure.
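For completeness, if you did want a "skip anything that is not a number" policy, a rough sketch (one of several ways to write it) is to clear and ignore inside the loop:
#include <iostream>

int main() {
    int value;
    while (true) {
        if (std::cin >> value) {
            std::cout << "read " << value << "\n";
        } else if (std::cin.eof() || std::cin.bad()) {
            break;                 // end of input, or an unrecoverable error
        } else {                   // failbit only: the next token was not a number
            std::cin.clear();      // clear the error state
            std::cin.ignore();     // throw away one offending character and retry
        }
    }
}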
Also, note that eof() is for checking for end of file. You should not use it to check if a read was successful or not. The common idiom would be:
while(std::cin>>value)
{
// Do stuff
}
What's the underlying implementation of the input stream? Are there any materials I can refer to?
std::cin is a global static object and is defined like so:
extern std::istream cin;
In other words it is an instance of std::basic_istream<char> (std::istream is a typedef for it).
If you would like more information, here are some references:
http://en.cppreference.com/w/cpp/io/cin
https://github.com/cplusplus/draft
However, most likely you would benefit from a good C++ book.
If you want to get deep into iostreams, I also recommend these articles.

C++: std::istream check for EOF without reading / consuming tokens / using operator>>

I would like to test if a std::istream has reached the end without reading from it.
I know that I can check for EOF like this:
if (is >> something)
but this has a series of problems. Imagine there are many, possibly virtual, methods/functions which expect std::istream& passed as an argument.
This would mean I have to do the "housework" of checking for EOF in each of them, possibly with a different type of the something variable, or create some weird wrapper which would handle the scenario of calling the input methods.
All I need to do is:
if (!IsEof(is)) Input(is);
the method IsEof should guarantee that the stream is not changed for reading, so that the above line is equivalent to:
Input(is)
as regards the data read in the Input method.
If there is no generic solution which would work for any std::istream, is there any way to do this for std::ifstream or cin?
EDIT:
In other words, the following assert should always pass:
while (!IsEof(is)) {
int something;
assert(is >> something);
}
The istream class has an eof bit that can be checked by using the is.eof() member.
Edit: So you want to see if the next character is the EOF marker without removing it from the stream? if (is.peek() == EOF) is probably what you want then. See the documentation for istream::peek
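A sketch of that idea, wrapped into the helper the question asks for (note that it deliberately skips whitespace first, just as operator>> would, but doing so does consume that whitespace; see also the discussion of whitespace further down):
#include <cstdio>      // for EOF
#include <istream>

// hypothetical helper: true if nothing but whitespace remains in the stream
bool IsEof(std::istream& is)
{
    is >> std::ws;             // skip whitespace the way operator>> would
    return is.peek() == EOF;
}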
That's impossible. How is the IsEof function supposed to know that the next item you intend to read is an int?
Should the following also not trigger any asserts?
while(!IsEof(in))
{
int x;
double y;
if( rand() % 2 == 0 )
{
assert(in >> x);
} else {
assert(in >> y);
}
}
That said, you can use the exceptions method to keep the "housekeeping" in one place.
Instead of
if (!IsEof(is)) Input(is);
try
is.exceptions( ifstream::eofbit /* | ifstream::failbit etc. if you like */ );
try {
Input(is);
} catch(const ifstream::failure& ) {
}
It doesn't stop you from reading before it's "too late", but it does obviate the need to have if(is >> x) if(is >> y) etc. in all the functions.
Normally,
if (is)
{
}
is enough. There are also .good(), .bad(), and .fail() for more exact information.
Here is a reference link: http://www.cplusplus.com/reference/iostream/istream/
There are good reasons why there is no isEof function: it is hard to specify in a usable way. For instance, operator>> usually begins by skipping whitespace (depending on a flag), while some other input functions are able to read whitespace. How would your isEof() handle that? Begin by skipping spaces or not? Would it depend on the flag used by operator>> or not? Would it put the whitespace back into the stream or not?
My advice is to use the standard idiom and characterize input failures, instead of trying to predict only one cause of them: you'd still need to characterize and handle the others.
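To make the whitespace problem concrete, here is a tiny illustration (not the only failure mode, just an example), assuming the caller reads ints from a stream whose last number is followed by a newline:
#include <cstdio>      // for EOF
#include <iostream>
#include <sstream>

int main() {
    std::istringstream is("42\n");    // trailing newline after the last value
    int x;
    is >> x;                          // reads 42, leaves the '\n' in the stream
    std::cout << std::boolalpha
              << (is.peek() != EOF) << "\n";         // true: a character is left...
    std::cout << static_cast<bool>(is >> x) << "\n"; // ...but the next read fails
}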
No, in the general case there is no way of knowing if the next read operation will reach eof.
If the stream is connected to a keyboard, the EOF condition is that I will type Ctrl+Z/Ctrl+D at the next prompt. How would IsEof(is) detect that?