Why does string extraction from a stream set the eof bit? - c++

Let's say we have a stream containing simply:
hello
Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, const char* argv[])
{
    std::stringstream ss("hello");
    std::string result;
    ss >> result;
    std::cout << ss.eof() << std::endl; // Outputs 1
    return 0;
}
However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object which proceeds to eat away whitespace characters. In this example, there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor continues to get on with the actual extraction of the string.
The extraction is then defined as:
Characters are extracted and appended until any of the following occurs:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).
Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:
If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)
However, this is definitely not happening yet because the failbit isn't being set.
The consequence of the eof bit being set is that the only reason the evil-idiom while (!stream.eof()) doesn't work when reading files is because of the extra \n at the end and not because the eof bit isn't yet set. My compiler is happily setting the eof bit when the extraction stops at the end of file.
So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?
To make it easier, the relevant sections of the standard are:
21.4.8.9 Inserters and extractors [string.io]
27.7.2.2 Formatted input functions [istream.formatted]
27.7.2.1.3 Class basic_istream::sentry [istream::sentry]

std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).
27.7.2.1 Class template basic_istream
2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.
Also, "extracting" means calling these two functions.
3 Two groups of member function signatures share common properties: the formatted input functions (or
extractors) and the unformatted input functions. Both groups of input functions are described as if they
obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use
other public members of istream.
So eof must be set.

Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.
I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.

Related

Does istream::ignore discard more than n characters?

(this is possibly a duplicate of Why does std::basic_istream::ignore() extract more characters than specified?, however my specific case doesn't deal with the delim)
From cppreference, the description of istream::ignore is the following:
Extracts and discards characters from the input stream until and including delim.
ignore behaves as an UnformattedInputFunction. After constructing and checking the sentry object, it extracts characters from the stream and discards them until any one of the following conditions occurs:
count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max()
end of file conditions occurs in the input sequence, in which case the function calls setstate(eofbit)
the next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof()
However, let's say I've got the following program:
#include <iostream>
int main(void) {
    int x;
    char p;
    if (std::cin >> x) {
        std::cout << x;
    } else {
        std::cin.clear();
        std::cin.ignore(2);
        std::cout << "________________";
        std::cin >> p;
        std::cout << p;
    }
}
Now, let's say I input something like p when my program starts. I expect cin to 'fail', then clear to be called and ignore to discard 2 characters from the buffer. So the 'p' and '\n' that are left in the buffer should be discarded. However, the program still waits for input after ignore gets called, so in reality it only gets to the final std::cin >> p after I've given it more than 2 characters to discard.
My issue:
Inputting something like 'b' and hitting Enter immediately after the first input (that is, after the 2 characters 'p' and '\n' get discarded) keeps 'b' in the buffer and immediately passes it to cin, without first printing the message. How can I make it so that the message gets printed immediately after the two characters are discarded, and only then >> is called?
After a lot of back and forth in the comments (and reproducing the problem myself), it's clear the problem is that:
You enter p<Enter>, which isn't parsable
You try to discard exactly two characters with ignore
You output the underscores
You prompt for the next input
but in fact things seem to stop at step 2 until you give it more input, and the underscores only appear later. Well, bad news, you're right, the code is blocking at step 2 in ignore. ignore is blocking waiting for a third character to be entered (really, checking if it's EOF after those two characters), and by the spec, this is apparently the correct thing to do, I think?
The problem here is the same basic issue as the problem you linked, just a different manifestation. When ignore terminates because it has read the number of characters requested, it always attempts to read one more character, because it needs to know whether condition 2 might also be true (that is, whether it happened to read the last character), so it can take the appropriate action: putting cin in EOF state, or otherwise leaving the next character in the buffer for the next read:
Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:
n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
Since you didn't provide an end character for ignore, it's looking for EOF, and if it doesn't find it after two characters, it must read one more to see if it shows up after the ignored characters (if it does, it'll leave cin in EOF state, if not, the character it peeked at will be the next one you read).
Simplest solution here is to not try to specifically discard exactly two characters. You want to get rid of everything through the newline, so do that with:
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
instead of std::cin.ignore(2);; that will read any and all characters until the newline (or EOF), consume the newline, and it won't ever overread (in the sense that it continues forever until the delimiter or EOF is found, there is no condition under which it finishes reading a count of characters and needs to peek further).
If for some reason you want to specifically ignore exactly two characters (how do you know they entered p<Enter> and not pabc<Enter>?), just call .get() on it a couple of times or .read(two_byte_buffer, 2) or the like, so you read the raw characters without the possibility of trying to peek beyond them.
For the record, this seems a little different from the cppreference description (which may be wrong); condition 2 in the spec doesn't specify that it needs to verify whether it is at EOF after reading count characters, and cppreference claims condition 3 (which would need to peek) is explicitly not checked if the "delimiter" is the default Traits::eof(). But the spec quote found in your other answer doesn't include that line about condition 3 not applying for Traits::eof(), and condition 2 might allow for checking if you're at EOF, which would end up with the observed behavior.
Your problem is related to your terminal. When you press ENTER, you are most likely getting two characters -- '\r' and '\n'. Consequently, there is still one character left in the input stream to read from. Change that line to:
std::cin.ignore(10, '\n'); // 10 is not magical. You may use any number > 2
to see the behavior you are expecting.
Passing exact number of characters in buffer will do the trick:
std::cin.ignore(std::cin.rdbuf()->in_avail());

Get amount of read characters from formatted input operation

I am reading numbers from an istream by using the >> operator overload. This works fine, but now I need to know how many characters have been consumed by that operation. I'm currently using something like
int startPos = in.tellg();
double number;
in >> number;
int readChars = in.tellg() - startPos;
This does work in some cases but it is quite fragile. When using std::cin as in, this doesn't work at all though (I assume that this is because std::cin doesn't have a position in the stream, as it's potentially an endless one).
My question is (I think) rather simple: How can I get the amount of characters that have been read when using the >> operator?
During my search I encountered gcount() but this only works for unformatted input.
The documentation of the >> operator doesn't seem to give a hint on this either: http://www.cplusplus.com/reference/istream/istream/operator%3E%3E/
If the stream is formatted can't you just check the length of it?
Anyways, std::istream::operator>> for C++ 98:
The function is considered to perform formatted input: Internally, the function accesses the input sequence by first constructing a sentry object (with noskipws set to false). Then (if good), it extracts characters from its associated stream buffer object as if calling its member functions sbumpc or sgetc, and finally destroys the sentry object before returning.
For C++ 11:
The function is considered to perform unformatted input: Internally, the function accesses the input sequence by first constructing a sentry object (with noskipws set to true). Then (if good), it extracts characters from its associated stream buffer object as if calling its member functions sbumpc or sgetc, and finally destroys the sentry object before returning.
The number of characters successfully read and stored by this function can be accessed by calling member gcount.
So it seems that you can only count characters from unformatted input.
But:
The unformatted input operations that modify the value returned by this function (gcount()) are: get, getline, ignore, peek, read, readsome, putback and unget.
Notice though, that peek, putback and unget do not actually extract any characters, and thus gcount will always return zero after calling any of them.
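So maybe you can use an unformatted member function such as std::istream::getline or std::istream::get to read the raw characters, check gcount(), and then parse the number from what you read. Note that the non-member std::getline(istream& is, string& str) does not affect the value returned by gcount(), so it won't help here.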

ifstream getline does not find delim

If the ifstream::getline call does not find the delimiter I know it sets failbit but does it also clear out the buffer or does it leave the buffer intact and just set the fail bit to let you know?
There seems to be some confusion about the different states of an input stream (and rightly so, they are confusing):
C++ Standard, table 124
badbit indicates a loss of integrity in an input or output sequence (such as an irrecoverable read error from a file);
eofbit indicates that an input operation reached the end of an input sequence;
failbit indicates that an input operation failed to read the expected characters, or
that an output operation failed to generate the desired characters.
That is, failbit is set when basic_istream::getline(char_type* s, std::streamsize count, char_type delim) extracts count-1 characters without finding the delimiter (-1 for the terminating \0 stored). This does not indicate the stream is bad, but it rather indicates that getline has failed to find the delimiter.
Description of basic_istream::getline in the C++ Standard:
[istream.unformatted]/18
Effects: [...] After constructing a sentry object [= preparation for input and error checks], extracts characters and stores them into successive locations of an array whose first element is designated by s. Characters are extracted and stored until one of the following occurs:
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
traits::eq(c, delim) [= delimiter found] for the next available input character c (in which case the input character is extracted but not stored);
n is less than one or n - 1 characters are stored (in which case the function calls setstate(failbit)).
[...]
These conditions are tested in the order shown.
If the function extracts no characters, it calls setstate(failbit) [...]
In any case, if n is greater than zero, it then stores a null character [...] into the next successive location of the array.
[emphasis and omissions mine]
From here:
Internally, the function accesses the input sequence by first
constructing a sentry object (with noskipws set to true). Then (if
good), it extracts characters from its associated stream buffer object
as if calling its member functions sbumpc or sgetc, and finally
destroys the sentry object before returning.
It seems that the buffer is filled until there is a problem. (See DyP's comments)

Contents of the string after failed extraction from istream

If I do this:
ifstream stream("somefilewhichopenssuccesfully.txt");
string token;
if( stream >> token )
cout << token;
else
cout << token;
Is the output in the second case guaranteed to be an empty string? I can't seem to find the answer to this on cplusplus.com.
Thanks!
Is the output in the second case guaranteed to be an empty string?
The answer is : no, because it depends, as described below.
The else block is executed only if an attempt to read from the stream fails, and that can happen at any point in the course of reading.
If it fails at the very first attempt, then there is no character extraction from the stream, and hence token will be empty (as it was).
If it fails after a few reads, then token will not be empty. It will contain the characters successfully read so far from the stream.
The section §21.3.7.9 from the Standard says,
Begins by constructing a sentry object k as if k were constructed by typename basic_istream<charT,traits>::sentry k(is). If bool(k) is true, it calls str.erase() and then extracts characters from is and appends them to str as if by calling str.append(1,c). If is.width() is greater than zero, the maximum number n of characters appended is is.width(); otherwise n is str.max_size(). Characters are extracted and appended until any of the following occurs:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.4.4.3).
Also note that the section §21.3.1/2 from the Standard guarantees that the default constructed string will be empty. The Standard says its size will be zero, that means, empty.
I deleted my original answer because I wanted to test this. This is what I see: if there is an error whilst reading (EOF is not counted in this context), the original string is modified and the else branch sees the modified version. To test, I created a 2 GB file (touch, then truncate) and ran the above code to read it. Whilst the code was running, I removed the file (this should set the failbit, I think). The read stops immediately, but the string is modified: it has a larger size.
To me this indicates that the string is modified even if the stream operation fails.
No, even if the operation fails, the string will contain the characters extracted so far.
The standard says (§21.4.8.9):
Effects: Behaves as a formatted input function (27.7.2.2.1). After constructing a sentry object, if the sentry converts to true, calls str.erase() and then extracts characters from is and appends them to str as if by calling str.append(1,c). If is.width() is greater than zero, the maximum number n of characters appended is is.width(); otherwise n is str.max_size(). Characters are extracted and appended until any of the following occurs:
— n characters are stored;
— end-of-file occurs on the input sequence;
— isspace(c,is.getloc()) is true for the next available input character c.

How do I set EOF on an istream without reading formatted input?

I'm doing a read in on a file character by character using istream::get(). How do I end this function with something to check if there's nothing left to read in formatted in the file (eg. only whitespace) and set the corresponding flags (EOF, bad, etc)?
Construct an istream::sentry on the stream. This will have a few side effects, the one we care about being:
If its skipws format flag is set, and the constructor is not passed true as second argument (noskipws), all leading whitespace characters (locale-specific) are extracted and discarded. If this operation exhausts the source of characters, the function sets both the failbit and eofbit internal state flags
You can strip any amount of leading (or trailing, as it were) whitespace from a stream at any time by reading to std::ws. For instance, if we were reading a file from STDIN, we would do:
std::cin >> std::ws;
Credit to this comment on another version of this question, asked four years later.
How do I end this function with something to check if there's nothing left to read in formatted in the file (eg. only whitespace)?
Whitespace characters are characters in the stream. You cannot assume that the stream will do intelligent processing for you unless you write your own filtering stream.
By default, all of the formatted extraction operations (overloads of operator>>()) skip over whitespace before extracting an item -- are you sure you want to part ways with this approach?
If yes, then you could probably achieve what you want by deriving a new class, my_istream, from istream, and overriding each operator>>() to call the following method at the end:
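void skip_whitespace() {
    char ch;
    // flags() returns the previous format flags, whose type is ios_base::fmtflags
    ios_base::fmtflags old_flags = flags(ios_base::skipws);
    *this >> ch; // Skips over whitespace to read a character
    flags(old_flags); // Restore the caller's format flags
    if (*this) { // I.e. not at end of file and no errors occurred
        unget(); // Put the non-whitespace character back
    }
}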
It's quite a bit of work. I'm leaving out a few details here (such as the fact that a more general solution would be to override the class template basic_istream<CharT, Traits>).
istream is not going to help a lot - it functions as designed. However, it delegates the actual reading to streambufs. If your streambuf wrapper trims trailing whitespace, an istream reading from that streambuf won't notice it.