C++: end-of-file interpretation when using std::cin as a condition

I know that we can use std::cin as a condition, for example in
while (std::cin >> value)
Using std::cin as a condition calls the member function std::ios::operator bool. The reference says that it "returns whether an error flag is set (either failbit or badbit)", which does not include eofbit. Despite this, sending end-of-file (with Ctrl+D) terminates the loop. Why? Does hitting end-of-file also set failbit or badbit?
I also found this explanation, but the C++ reference specifically says that "this function does not return the same as member good".

The loop above does not test for end of file; it tests for failure to read a value, and end of file is just one possible cause of that failure. Even end of file does not necessarily cause a failure to read a value: imagine reading an integer whose digits are terminated by end of file. You still read an integer even though you hit the end of file.
The bottom line is that a failure to read a value for any reason sets failbit, and this loop tests for that.
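A minimal sketch of this distinction (the values here are only illustrative): extracting an integer whose digits run right up to end-of-file succeeds and sets eofbit, while a second extraction finds nothing, sets failbit, and is what finally makes the loop condition false.

#include <iostream>
#include <sstream>

int main()
{
    std::istringstream in("42");      // digits terminated by the end of the stream
    int value = 0;

    if (in >> value) {
        // The extraction succeeded even though the stream hit end-of-file.
        std::cout << "read " << value
                  << ", eof=" << in.eof()    // 1: eofbit is set
                  << ", fail=" << in.fail()  // 0: failbit is not set
                  << '\n';
    }

    if (!(in >> value)) {
        // A second extraction finds no data, so failbit is set and the
        // stream now tests false in a condition such as while (in >> value).
        std::cout << "second read failed, fail=" << in.fail() << '\n';
    }
}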

Related

Are there cases where the last operation succeeds but the istream/ostream goes fail()?

I'm a new learner of C++. In my understanding, fail() indicates whether the last operation on the istream/ostream succeeded or not. For example, let ifs be an ifstream opened to some valid file:
string s;
while (ifs >> s) {
// 1. assume the reading was taken successfully
// and do something according to that
}
// 2. assume the reading was unsuccessful
// and do something according to that
May I ask whether it is possible that, in some corner case, we arrive at 2. while the read was actually successful?
If so, what we do at 2. could be problematic. For example, if we have set ifs.exceptions(ifs.exceptions() | ios_base::badbit);, then at 2. we would (probably) have assumed that ifs is empty.
I came up with this question when I was self-learning C++ using Stroustrup's book Programming: Principles and Practice Using C++ (2nd ed.). In Chapter 11.7 of this book, the author designs a Punct_stream class that takes an istream as its source, processes data extracted from the source, and then puts the processed data into a data member of type istringstream called buffer, from which it is later read into the program. The operator bool() method is designed as follows (page 404):
Punct_stream::operator bool()
{
return !(source.fail() || source.bad()) && buffer.good();
}
Firstly, I think !(source.fail() || source.bad()) should be equivalent to just checking source, since fail() accounts for both failbit and badbit, and checking source is equivalent to checking !fail().
What is more related to my question is that, by this design, operator bool() will first check whether the source is in an error state, and if so, it won't even look at the buffer. But reading data from a Punct_stream into a string will succeed as long as there is a character in buffer. So, if the buffer is not empty and the source has somehow gone into a fail() state, then reading from the Punct_stream into a string will actually succeed while checking the Punct_stream as a condition will yield false.
I wonder whether the implementation of the standard library istreams may behave in a similar way, such that in some cases the read actually succeeds but the condition evaluates to false?
All iostream input functions start by creating a sentry object which checks the good state of the stream. If the state is not good (that is, any of badbit, eofbit or failbit is set), then the sentry will set failbit and return false, which will cause the input function to return immediately without reading anything, without modifying (or even checking) the stream buffer, and without writing anything into the destination of the input function.
So, by definition, if the stream test in the while is false, either the stream was already failed or at eof before the call (and nothing was read into the string s), or this call failed to read anything into the string and set the state accordingly. In either case, nothing was read into the string.
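A small sketch of the sentry behaviour described above, using an istringstream purely for illustration: once the stream has already hit end-of-file, a further operator>> sets failbit and returns without touching the destination string.

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::istringstream in("one");
    std::string s;

    in >> s;   // reads "one" and hits the end of the stream (eofbit set, failbit not set)
    std::cout << s << ' ' << in.eof() << ' ' << in.fail() << '\n';   // one 1 0

    s = "unchanged";
    if (!(in >> s)) {
        // The sentry saw a non-good stream, set failbit, and the extraction
        // returned immediately without writing anything into s.
        std::cout << s << ' ' << in.fail() << '\n';                  // unchanged 1
    }
}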

Explain how the for loop is terminating?

I came across some code that I didn't understand. It was on a coding website. The code was like this:
#include<iostream>
using namespace std;
char s[4];
int x;
int main()
{
for(cin>>s;cin>>s;x+=44-s[1]);
cout<<x;
}
My question is: how does the for loop terminate? Since it was on a coding website, as far as I know the answers are checked by redirecting input from a file. But if we run it in an IDE, this for loop does not terminate; instead it keeps on taking input from the user. So what's the explanation for this?
Sample Input
3
x++
x--
--x
Output
-1
EDIT
This is the problem link - Bit++
This is the solution link - In status filter set language to MS C++ Author name - wafizaini (Solution id - 27116030)
The loop is terminating because istream has operator bool() (prior to C++11 it was operator void*), which returns false once an input operation has failed, for example because no additional input is available. Basically, the reason the loop stops is the same as why the more common while loop terminates:
while (cin >> s) {
...
}
The reason this does not terminate when you run with an IDE is that you need to supply an end-of-stream mark, which is delivered in a system-dependent way. On UNIX and other systems derived from it you press Ctrl+d, while on Windows you press Ctrl+z.
Note: your program risks a buffer overrun if an end-user enters more than three characters (the fourth element of s is needed for the string's null terminator). Also note that the value read by the initial cin>>s is thrown away, because the loop condition reads into s again before the increment expression ever uses it.
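As a side note, here is a sketch of a safer variant of the same program, using std::string so there is no fixed-size buffer to overrun; the logic is otherwise unchanged.

#include <iostream>
#include <string>

int main()
{
    std::string s;
    int x = 0;

    std::cin >> s;                // discard the first line (the statement count)
    while (std::cin >> s)         // stops once extraction fails, e.g. at end-of-file
        x += 44 - s[1];           // '+' is 43 and '-' is 45, so this adds +1 or -1

    std::cout << x << '\n';
}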
That's perfectly valid, although a bit difficult to read, C++11 code.
std::istream::operator>>()
returns a reference to the input stream itself, and
std::istream::operator bool()
in turn evaluates the stream to a boolean value, returning false whenever a fail bit is set.
When reading from a file, that loop will eventually try to read past the end of the file, causing failbit (along with eofbit) to be set and thus stopping the loop.
However, when running that code on a shell, you need to manually input the EOF control code on the stream, otherwise the for loop won't stop. This can be done by pressing Ctrl+D on Unix shells, for example.
A more common loop condition is while (cin >> s).
The convention is that operator>> returns a reference to the stream (cin). The stream classes then have a conversion operator that will return false in if (cin) or while (cin) after some input has failed.
This would work in the middle part of a for-loop as well, but is a bit unusual.

What does the function cin.clear() do in C++? Detailed description

Good day. My teacher said that I should learn what the function cin.clear() does in C++. I searched, but never found a proper explanation. The cplusplus.com reference says that this function
Sets a new value for the stream's internal error state flags. The current value of the flags is overwritten: All bits are replaced by those in state; If state is goodbit (which is zero) all error flags are cleared.
But I do not quite understand what this "state" is, where these flags and errors come from and why, what the "flags" are and why they are needed, or how we replace them with the value 0. My teacher also said I should know what parameters the function cin.clear() takes and what it returns; I understand that it returns nothing, but does it take anything? Please help. Sorry for the bad English, I am writing through a translator.
The function std::basic_ios<>::clear() affects the std::ios_base::iostate bits, which are, for the most part, error conditions. The standard defines "four" bits:
badbit
Set if the last input failed because of some hardware failure, e.g. a read error on the disk. (In practice, I'm not sure that all implementations check for this; I suspect that some will just treat this as if there were an end of file.)
failbit
Set if the last input failed for some reason other than one which would have set badbit. The most common reasons are a format error (trying to read an `int` when the next characters in the input were `"abc"`) and encountering end of file _before_ having been able to read sufficient data for the requested input.
eofbit
This is _not_ an error condition; it will be set any time the stream sees the end of file. This may be because it needs yet another character in order to parse the input, in which case failbit will also be set; but it may also be because the input stream saw the end of file on look-ahead. (For this last case, consider inputting an int, where the remaining characters in the stream are "123", with no trailing whitespace, not even a newline. In order to know that it has processed all of the relevant characters, the stream must try to read a character after the 3. In that case, it sets eofbit, to remember that it has seen the end of file, but it does _not_ set failbit, because "123" is a valid complete input for an int.)
goodbit
This isn't even a bit pattern, but simply a special value in which none of the preceding bits are set.
For the most part, failbit and eofbit are only relevant on input; you'll get (or should get) badbit on output if the disk is full.
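A minimal sketch of the typical use of cin.clear(): after a failed formatted read, the error flags have to be cleared (and the bad input discarded) before the stream will perform any further extractions. The prompt text is only illustrative.

#include <iostream>
#include <limits>

int main()
{
    int n = 0;
    while (!(std::cin >> n)) {        // extraction failed, so failbit is set
        if (std::cin.eof()) {         // no more input at all; give up
            std::cerr << "no number entered\n";
            return 1;
        }
        std::cin.clear();             // reset the error state to goodbit
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
                                      // throw away the offending line
        std::cout << "Please enter a number: ";
    }
    std::cout << "You entered " << n << '\n';
}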

Why does string extraction from a stream set the eof bit?

Let's say we have a stream containing simply:
hello
Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, const char* argv[])
{
std::stringstream ss("hello");
std::string result;
ss >> result;
std::cout << ss.eof() << std::endl; // Outputs 1
return 0;
}
However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object which proceeds to eat away whitespace characters. In this example, there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor continues to get on with the actual extraction of the string.
The extraction is then defined as:
Characters are extracted and appended until any of the following occurs:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).
Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:
If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)
However, this is definitely not happening yet because the failbit isn't being set.
A consequence of the eof bit being set here is that the only reason the evil idiom while (!stream.eof()) doesn't work when reading files is the extra \n at the end, and not that the eof bit isn't yet set. My compiler is happily setting the eof bit when the extraction stops at the end of file.
So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?
To make it easier, the relevant sections of the standard are:
21.4.8.9 Inserters and extractors [string.io]
27.7.2.2 Formatted input functions [istream.formatted]
27.7.2.1.3 Class basic_istream::sentry [istream::sentry]
std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).
27.7.2.1 Class template basic_istream
2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.
Also, "extracting" means calling these two functions.
3 Two groups of member function signatures share common properties: the formatted input functions (or extractors) and the unformatted input functions. Both groups of input functions are described as if they obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use other public members of istream.
So eof must be set.
Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.
I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.
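A small sketch of that distinction, reusing the stream from the question: after extracting "hello" the stream reports eof() but not fail(), so the extraction itself still counts as successful.

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    std::stringstream ss("hello");
    std::string result;

    if (ss >> result) {
        // operator bool() is !fail(), so this branch is taken even though
        // the read ran into the end of the stream and set eofbit.
        std::cout << "got \"" << result << "\", eof=" << ss.eof()
                  << ", fail=" << ss.fail() << '\n';   // eof=1, fail=0
    }
}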

Is the inconsistency of C++'s istream::eof() a bug in the spec or a bug in the implementation?

The following program demonstrates an inconsistency in the way that std::istream (specifically in my test code, std::istringstream) sets eof().
#include <sstream>
#include <cassert>
int main(int argc, const char * argv[])
{
// EXHIBIT A:
{
// An empty stream doesn't recognize that it's empty...
std::istringstream stream( "" );
assert( !stream.eof() ); // (Not yet EOF. Maybe should be.)
// ...until I read from it:
const int c = stream.get();
assert( c < 0 ); // (We received garbage.)
assert( stream.eof() ); // (Now we're EOF.)
}
// THE MORAL: EOF only happens when actually attempting to read PAST the end of the stream.
// EXHIBIT B:
{
// A stream that still has data beyond the current read position...
std::istringstream stream( "c" );
assert( !stream.eof() ); // (Clearly not yet EOF.)
// ... clearly isn't eof(). But when I read the last character...
const int c = stream.get();
assert( c == 'c' ); // (We received something legit.)
assert( !stream.eof() ); // (But we're already EOF?! THIS ASSERT FAILS.)
}
// THE MORAL: EOF happens when reading the character BEFORE the end of the stream.
// Conclusion: MADNESS.
return 0;
}
So, eof() "fires" when you read the character before the actual end-of-file. But if the stream is empty, it only fires when you actually attempt to read a character. Does eof() mean "you just tried to read off the end?" or "If you try to read again, you'll go off the end?" The answer is inconsistent.
Moreover, whether the assert fires or not depends on the compiler. Apple Clang 4.1, for example, fires the assertion (it sets eof() when reading the last character before the end of the stream). GCC 4.7.2, for example, does not.
This inconsistency makes it hard to write sensible loops that read through a stream but handle both empty and non-empty streams well.
OPTION 1:
while( stream && !stream.eof() )
{
const int c = stream.get(); // BUG: Wrong if stream was empty before the loop.
// ...
}
OPTION 2:
while( stream )
{
const int c = stream.get();
if( stream.eof() )
{
// BUG: Wrong when c in fact got the last character of the stream.
break;
}
// ...
}
So, friends, how do I write a loop that parses through a stream, handles each character in turn, and stops without fuss either when we hit EOF or, in the case where the stream is empty to begin with, never starts?
And okay, the deeper question: I have the intuition that using peek() could maybe workaround this eof() inconsistency somehow, but...holy crap! Why the inconsistency?
The eof() flag is only useful for determining whether you hit end of file after some operation. Its primary use is to avoid an error message when a read reasonably failed because there wasn't anything more to read. Trying to control a loop or similar using eof() is bound to fail. In all cases you need to check, after you tried to read, whether the read was successful; before the attempt the stream can't know what you are going to read.
The semantics of eof() comes down to "this flag gets set when reading the stream caused the stream buffer to return a failure". The exact statement isn't quite that easy to find in the standard, if I recall correctly, but this is what it amounts to. At some point the standard also says that the stream is allowed to read more than it has to in some situations, which may cause eof() to be set when you don't necessarily expect it. One such example is reading a character: the stream may end up detecting that there is nothing following that character and set eof().
If you want to handle an empty stream, it's trivial: look at something from the stream and proceed only if you know it's not empty:
if (stream.peek() != std::char_traits<char>::eof()) {
do_what_needs_to_be_done_for_a_non_empty_stream();
}
else {
do_something_else();
}
Never, ever check for eof alone.
The eof flag (which is the same as the eofbit bit flag in a value returned by rdstate()) is set when end-of-file is reached during an extract operation. If there were no extract operations, eofbit is never set, which is why your first check returns false.
However, eofbit is no indication as to whether the operation was successful. For that, check failbit|badbit in rdstate(). failbit means "there was a logical error", and badbit means "there was an I/O error". Conveniently, there's a fail() function that reports exactly rdstate() & (failbit|badbit). Even more conveniently, there's an operator bool() function that returns !fail(). So you can do things like while (stream.read(buffer, size)) { ... }.
If the operation has failed, you may check eofbit, badbit and failbit separately to figure out why it has failed.
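A brief sketch of that pattern (the input string is only illustrative): loop on the extraction itself, and only after the loop ends look at bad(), eof() and fail() to see why it stopped.

#include <iostream>
#include <sstream>

int main()
{
    std::istringstream in("10 20 thirty");
    int value;

    while (in >> value)          // operator bool() is !fail()
        std::cout << value << '\n';

    // The loop has ended; now diagnose why.
    if (in.bad())
        std::cout << "I/O error\n";
    else if (in.eof() && in.fail())
        std::cout << "ran out of input\n";
    else if (in.fail())
        std::cout << "malformed input\n";   // here: "thirty" is not an int
}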
What compiler / standard c++ library are you using? I tried it with gcc 4.6.3/4.7.2 and clang 3.1, and all of them worked just fine (i.e. the assertion does not fire).
I think you should report this as a bug in your tool-chain, since my reading of the standard accords with your intuition that eof() should not be set as long as get() is able to return a character.
It's not a bug, in the sense that it's the intended behavior. It is not the intent that you test for eof() until after input has failed. Its main purpose is for use inside extraction functions, where, in early implementations, the fact that std::streambuf::sgetc() returned EOF didn't mean that it would do so the next time it was called: the intent was that any time sgetc() returned EOF (now std::char_traits<>::eof()), this would be memorized, and the stream would make no further calls to the streambuf.
Practically speaking: we really need two eof(): one for internal use,
as above, and another which will reliably state that failure was due to
having reached end of file. As it is, given something like:
std::istringstream s( "1.23e+" );
s >> aDouble;
There's no way of detecting that the error is due to a format error,
rather than the stream not having any more data. In this case, the
internal eof should return true (because we have seen end of file, when
looking ahead, and we want to suppress all further calls to the
streambuf extractor functions), but the external one should be false,
because there was data present (even after skipping initial whitespace).
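A small sketch of that ambiguity, assuming a conforming library: a truncated number and a genuinely empty stream end up with the same failbit/eofbit combination, so eof() alone cannot tell the two apart.

#include <iostream>
#include <sstream>

int main()
{
    double d;

    std::istringstream truncated("1.23e+");   // data present, but malformed at the end
    truncated >> d;
    std::cout << truncated.fail() << ' ' << truncated.eof() << '\n';   // 1 1

    std::istringstream empty("");             // no data at all
    empty >> d;
    std::cout << empty.fail() << ' ' << empty.eof() << '\n';           // 1 1
}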
If you're not implementing an extractor function, of course, you should
never test ios_base::eof() until you've actually had an input failure.
It was never the intent that this would provide any useful information (which makes one wonder why they defined ios_base::good(): the fact that it returns false if eof() is set means that it can provide no reliable information until fail() returns true, at which point we know that it will return false, so there's no point in calling it).
And I'm not sure what your problem is. Because the stream cannot know
in advance what your next input will be (e.g. whether it will skip
whitespace or not), it cannot know in advance whether your next input
will fail because of end of file or not. The idiom adopted is clear:
try the input, then test whether it succeeded or not. There is no
other way, because no other alternative can be implemented. Pascal did
it differently, but a file in Pascal was typed—you could only read
one type from it, so it could always read ahead one element under the
hood, and return end of file if this read-ahead failed. Not having end of file known in advance is the price we pay for being able to read more than one type from a file.
The behavior is somewhat subtle. eofbit is set when an attempt is made to read past the end of the file, but that may or may not cause failure of the current extraction operation.
For example:
ifstream blah;
// assume the file got opened
int i, j;
blah >> i;
if (!blah.eof())
blah >> j;
If the file contains 142<EOF>, then the sequence of digits is terminated by end of file, so eofbit is set AND the extraction succeeds. Extraction of j won't be attempted, because the end of file has already been encountered.
If the file contains 142 <EOF>, then the sequence of digits is terminated by whitespace (extraction of i succeeds). eofbit is not set yet, so blah >> j will be executed, and it will reach end of file without finding any digits, so it will fail.
Notice how the innocuous-looking whitespace at the end of file changed the behavior.
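A compact sketch of that difference, using istringstream in place of a file so it can be run directly (the trailing space in the second string stands in for the whitespace before <EOF>):

#include <iostream>
#include <sstream>

int main()
{
    int i, j;

    std::istringstream a("142");    // digits run right into the end of the stream
    a >> i;
    std::cout << a.eof() << ' ' << a.fail() << '\n';   // 1 0: eofbit set, extraction succeeded

    std::istringstream b("142 ");   // digits terminated by whitespace
    b >> i;
    std::cout << b.eof() << ' ' << b.fail() << '\n';   // 0 0: eofbit not set yet
    b >> j;                          // skips the space, hits end of stream, finds no digits
    std::cout << b.eof() << ' ' << b.fail() << '\n';   // 1 1: this extraction failed
}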