Contents of the string after failed extraction from istream - c++

If I do this:
ifstream stream("somefilewhichopenssuccesfully.txt");
string token;
if( stream >> token )
cout << token;
else
cout << token;
Is the output in the second case guaranteed to be an empty string? I can't seem to find the answer to this on cplusplus.com.
Thanks!

Is the output in the second case guaranteed to be an empty string?
The answer is no; it depends on when the extraction fails, as described below.
The else block is executed only if an attempt to read from the stream fails, and that can happen at any point in the course of reading.
If it fails at the very first attempt, no characters are extracted from the stream, and hence token will be empty (as it was).
If it fails after a few reads, token will not be empty: it will contain the characters successfully read from the stream so far.
The section §21.3.7.9 from the Standard says,
Begins by constructing a sentry object k as if k were constructed by typename basic_istream<charT,traits>::sentry k(is). If bool(k) is true, it calls str.erase() and then extracts characters from is and appends them to str as if by calling str.append(1,c). If is.width() is greater than zero, the maximum number n of characters appended is is.width(); otherwise n is str.max_size(). Characters are extracted and appended until any of the following occurs:
— n characters are stored;
— end-of-file occurs on the input sequence;
— isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.4.4.3).
Also note that the section §21.3.1/2 from the Standard guarantees that the default constructed string will be empty. The Standard says its size will be zero, that means, empty.
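A minimal sketch of the "fails at the very first attempt" case, assuming a std::istringstream stands in for the file stream and that the library follows the wording quoted above; the helper name token_after_extraction is hypothetical. The point it illustrates: when the sentry converts to false, str.erase() is never called, so the string keeps whatever it held before (empty only if it started empty).

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: tries to extract one token from `input` into a string
// that already holds `initial`, and returns the string's contents afterwards.
std::string token_after_extraction(const std::string& input,
                                   const std::string& initial) {
    std::istringstream stream(input);
    std::string token = initial;
    stream >> token;  // may fail; the result is deliberately ignored here
    return token;
}
```

For example, token_after_extraction("", "old") leaves the string as "old", while token_after_extraction("abc def", "old") replaces it with "abc".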

I deleted my original answer because I wanted to test this. What I see is that if there is an error while reading (EOF does not count in this context), the original string is modified and the else branch sees the modified version. To test this, I created a 2 GB file (touch, then truncate) and ran the above code to read it. While the code was running, I removed the file (this should set the failbit, I think). Reading stops immediately, but the string has been modified: it has a larger size.
To me this indicates that the string is modified even if the stream operation fails.

No, even if the operation fails, the string will contain the characters extracted so far.
The standard says (§21.4.8.9):
Effects: Behaves as a formatted input function (27.7.2.2.1). After constructing a sentry object, if the sentry converts to true, calls str.erase() and then extracts characters from is and appends them to str as if by calling str.append(1,c). If is.width() is greater than zero, the maximum number n of characters appended is is.width(); otherwise n is str.max_size(). Characters are extracted and appended until any of the following occurs:
— n characters are stored;
— end-of-file occurs on the input sequence;
— isspace(c,is.getloc()) is true for the next available input character c.
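The three stopping conditions in that quote can be sketched as follows. This is only an illustration under assumptions: read_token is a hypothetical helper, and std::setw is used to set is.width() before the extraction.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: extracts one token, optionally limited to `width`
// characters (0 means no limit, i.e. is.width() == 0).
std::string read_token(std::istream& is, int width = 0) {
    std::string token;
    is >> std::setw(width) >> token;
    return token;
}
```

With the input "abcdef ghi", read_token(is, 3) stops after three characters are stored, the next read_token(is) stops at the space, and the last one stops at end-of-file (setting eofbit but not failbit, since characters were extracted).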

Related

Does istream::ignore discard more than n characters?

(this is possibly a duplicate of Why does std::basic_istream::ignore() extract more characters than specified?, however my specific case doesn't deal with the delim)
From cppreference, the description of istream::ignore is the following:
Extracts and discards characters from the input stream until and including delim.
ignore behaves as an UnformattedInputFunction. After constructing and checking the sentry object, it extracts characters from the stream and discards them until any one of the following conditions occurs:
count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max()
end of file conditions occurs in the input sequence, in which case the function calls setstate(eofbit)
the next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof()
However, let's say I've got the following program:
#include <iostream>

int main(void) {
    int x;
    char p;
    if (std::cin >> x) {
        std::cout << x;
    } else {
        std::cin.clear();    // reset failbit so further reads can work
        std::cin.ignore(2);  // intended to discard 'p' and '\n'
        std::cout << "________________";
        std::cin >> p;
        std::cout << p;
    }
}
Now, let's say I input something like p when my program starts. I expect cin to 'fail', then clear to be called and ignore to discard 2 characters from the buffer. So the 'p' and '\n' that are left in the buffer should be discarded. However, the program still expects input after ignore gets called, so in reality it only gets to the final std::cin >> p after I've given it more than 2 characters to discard.
My issue:
Inputting something like 'b' and hitting Enter immediately after the first input (so after the 2 characters, 'p' and '\n', get discarded) keeps 'b' in the buffer and immediately passes it to cin, without first printing the message. How can I make it so that the message gets printed immediately after the two characters are discarded, and only then >> is called?
After a lot of back and forth in the comments (and reproducing the problem myself), it's clear the problem is that:
You enter p<Enter>, which isn't parsable
You try to discard exactly two characters with ignore
You output the underscores
You prompt for the next input
but in fact things seem to stop at step 2 until you give it more input, and the underscores only appear later. Well, bad news, you're right, the code is blocking at step 2 in ignore. ignore is blocking waiting for a third character to be entered (really, checking if it's EOF after those two characters), and by the spec, this is apparently the correct thing to do, I think?
The problem here is the same basic issue as the problem you linked, just a different manifestation. When ignore terminates because it has read the number of characters requested, it always attempts to read one more character, because it needs to know whether condition 2 might also be true (that is, whether it happened to read the last character, so it can take the appropriate action: putting cin in EOF state, or otherwise leaving the next character in the buffer for the next read):
Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:
n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
Since you didn't provide an end character for ignore, it's looking for EOF, and if it doesn't find it after two characters, it must read one more to see if it shows up after the ignored characters (if it does, it'll leave cin in EOF state, if not, the character it peeked at will be the next one you read).
Simplest solution here is to not try to specifically discard exactly two characters. You want to get rid of everything through the newline, so do that with:
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
instead of std::cin.ignore(2); (this needs #include <limits>). That will read any and all characters until the newline (or EOF), consume the newline, and it won't ever overread (it continues until the delimiter or EOF is found, so there is no condition under which it finishes reading a count of characters and needs to peek further).
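As a rough sketch of that fix, here is the question's logic rewritten to take any std::istream, so it can be tried against a std::istringstream instead of a terminal. The function name recover_and_read is hypothetical, and the prompt output is left out to keep it short.

```cpp
#include <istream>
#include <limits>
#include <sstream>

// Hypothetical helper mirroring the question's program: if the int cannot be
// parsed, clear the error, drop the rest of the line, and read one char.
// Returns '\0' when the int parsed fine (nothing to recover from).
char recover_and_read(std::istream& in) {
    int x;
    char p = '\0';
    if (in >> x) {
        return '\0';
    }
    in.clear();  // reset failbit
    // Discard everything up to and including the newline.
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    in >> p;
    return p;
}
```

With the input "p\nb\n" this returns 'b', and it behaves the same for "pabc\nb\n", since the whole bad line is dropped rather than a fixed number of characters.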
If for some reason you want to specifically ignore exactly two characters (how do you know they entered p<Enter> and not pabc<Enter>?), just call .get() on it a couple times or .read(&two_byte_buffer, 2) or the like, so you read the raw characters without the possibility of trying to peek beyond them.
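A small sketch of the .get() variant, assuming you really do want to drop an exact number of characters; discard_exactly is a hypothetical helper name. Each get() extracts one character and nothing more.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: discards exactly `count` characters using get().
// Returns how many characters were actually discarded (fewer if EOF is hit).
int discard_exactly(std::istream& in, int count) {
    int discarded = 0;
    while (discarded < count &&
           in.get() != std::char_traits<char>::eof()) {
        ++discarded;
    }
    return discarded;
}
```

On the input "p\nb", discard_exactly(in, 2) returns 2 and the next character read is 'b'.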
For the record, this seems a little different from the cppreference spec (which may be wrong); condition 2 in the spec doesn't specify that it needs to verify whether it is at EOF after reading count characters, and cppreference claims condition 3 (which would need to peek) is explicitly not checked if the "delimiter" is the default Traits::eof(). But the spec quote found in your other answer doesn't include that line about condition 3 not applying for Traits::eof(), and condition 2 might allow for checking if you're at EOF, which would end up with the observed behavior.
Your problem is related to your terminal. When you press ENTER, you are most likely getting two characters -- '\r' and '\n'. Consequently, there is still one character left in the input stream to read from. Change that line to:
std::cin.ignore(10, '\n'); // 10 is not magical. You may use any number > 2
to see the behavior you are expecting.
Passing the exact number of characters in the buffer will do the trick:
std::cin.ignore(std::cin.rdbuf()->in_avail());

ifstream getline does not find delim

If the ifstream::getline call does not find the delimiter, I know it sets failbit, but does it also clear out the buffer, or does it leave the buffer intact and just set the fail bit to let you know?
There seems to be some confusion about the different states of an input stream (and rightly so, they are confusing):
C++ Standard, table 124
badbit indicates a loss of integrity in an input or output sequence (such as an irrecoverable read error from a file);
eofbit indicates that an input operation reached the end of an input sequence;
failbit indicates that an input operation failed to read the expected characters, or
that an output operation failed to generate the desired characters.
That is, failbit is set when basic_istream::getline(char_type* s, std::streamsize count, char_type delim) extracts count-1 characters without finding the delimiter (-1 for the terminating \0 stored). This does not indicate the stream is bad, but it rather indicates that getline has failed to find the delimiter.
Description of basic_istream::getline in the C++ Standard:
[istream.unformatted]/18
Effects: [...] After constructing a sentry object [= preparation for input and error checks], extracts characters and stores them into successive locations of an array whose first element is designated by s. Characters are extracted and stored until one of the following occurs:
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit));
traits::eq(c, delim) [= delimiter found] for the next available input character c (in which case the input character is extracted but not stored);
n is less than one or n - 1 characters are stored (in which case the function calls setstate(failbit)).
[...]
These conditions are tested in the order shown.
If the function extracts no characters, it calls setstate(failbit) [...]
In any case, if n is greater than zero, it then stores a null character [...] into the next successive location of the array.
[emphasis and omissions mine]
From here:
Internally, the function accesses the input sequence by first
constructing a sentry object (with noskipws set to true). Then (if
good), it extracts characters from its associated stream buffer object
as if calling its member functions sbumpc or sgetc, and finally
destroys the sentry object before returning.
It seems that the buffer is filled until there is a problem. (See DyP's comments)
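A quick way to see this, sketched with a std::istringstream in place of a file and a deliberately tiny buffer (getline_four is a hypothetical helper): the characters that fit are stored and null-terminated, failbit is set, and the rest of the line stays in the stream.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: calls member getline with a 4-char buffer (room for
// 3 characters plus the terminating '\0') and reports whether failbit was set.
std::string getline_four(std::istream& in, bool& failed) {
    char buffer[4];
    in.getline(buffer, sizeof buffer);  // default delimiter is '\n'
    failed = in.fail();
    return buffer;  // getline null-terminates the buffer when n > 0
}
```

For the input "abcdefgh\n" this returns "abc" with failed set to true; after in.clear(), reading the rest of the line yields "defgh". For "abc\n" the delimiter is found in time, so failbit is not set.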

Why does string extraction from a stream set the eof bit?

Let's say we have a stream containing simply:
hello
Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.
int main(int argc, const char* argv[])
{
std::stringstream ss("hello");
std::string result;
ss >> result;
std::cout << ss.eof() << std::endl; // Outputs 1
return 0;
}
However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object which proceeds to eat away whitespace characters. In this example, there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor continues to get on with the actual extraction of the string.
The extraction is then defined as:
Characters are extracted and appended until any of the following occurs:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).
Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:
If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)
However, this is definitely not happening yet because the failbit isn't being set.
The consequence of the eof bit being set is that the only reason the evil-idiom while (!stream.eof()) doesn't work when reading files is because of the extra \n at the end and not because the eof bit isn't yet set. My compiler is happily setting the eof bit when the extraction stops at the end of file.
So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?
To make it easier, the relevant sections of the standard are:
21.4.8.9 Inserters and extractors [string.io]
27.7.2.2 Formatted input functions [istream.formatted]
27.7.2.1.3 Class basic_istream::sentry [istream::sentry]
std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).
27.7.2.1 Class template basic_istream
2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.
Also, "extracting" means calling these two functions.
3 Two groups of member function signatures share common properties: the formatted input functions (or
extractors) and the unformatted input functions. Both groups of input functions are described as if they
obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use
other public members of istream.
So eof must be set.
Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.
I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.
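To make the eofbit/failbit distinction concrete, here is a small sketch; the StreamState struct and state_after_reads helper are hypothetical names used only for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical snapshot of the two flags discussed above.
struct StreamState {
    bool eof;
    bool fail;
};

// Hypothetical helper: performs `reads` string extractions on a stream
// holding `text` and reports the flags afterwards.
StreamState state_after_reads(const std::string& text, int reads) {
    std::stringstream ss(text);
    std::string result;
    for (int i = 0; i < reads; ++i) {
        ss >> result;
    }
    return {ss.eof(), ss.fail()};
}
```

With "hello", one read sets eofbit but not failbit. With "hello\n", one read sets neither, because extraction stops at the newline; only the second read sets both.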

std::ifstream::read or std::ofstream::write with a zero parameter?

Is it perfectly ok (= well defined behaviour according to the standard) to call :
mystream.read(buffer, 0);
or
mystream.write(buffer, 0);
(and of course nothing will be read or written).
I would like to know if I have to test if the provided size is null before calling one of these two functions.
Yes, the behavior is well-defined: both functions will go through the motions for unformatted input/output functions (constructing the sentry, setting failbit if eofbit is set, flushing the tied stream if necessary), and then they will get to this clause:
§27.7.2.3[istream.unformatted]/30
Characters are extracted and stored until either of the following occurs:
— n characters are stored;
§27.7.3.7[ostream.unformatted]/5
Characters are inserted until either of the following occurs
— n characters are inserted;
"zero characters are stored/inserted" is true before anything is stored or extracted.
Looking at actual implementations, I see for (; gcount < n; ++gcount) in libc++ and sgetn(buffer, n); in libstdc++, which has the equivalent loop.
Another extract from 27.7.2.3 Unformatted input functions/1 gives us a clue that zero-size input buffers are a valid case:
unformatted input functions taking a character array of non-zero size as an argument shall also store a null character (using charT()) in the first location of the array.
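A small check of the zero-length case, sketched with string streams rather than file streams (an assumption made so it runs anywhere); both helper names are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical check: a zero-length read leaves the stream good, reports
// gcount() == 0, does not touch the buffer, and consumes nothing.
// `contents` is assumed to be a single non-empty word without whitespace.
bool zero_length_read_is_noop(const std::string& contents) {
    std::istringstream in(contents);
    char buffer[1] = {'?'};
    in.read(buffer, 0);
    const bool untouched =
        in.good() && in.gcount() == 0 && buffer[0] == '?';
    std::string all;
    in >> all;  // everything should still be there
    return untouched && all == contents;
}

// Hypothetical check: a zero-length write inserts nothing.
bool zero_length_write_is_noop() {
    std::ostringstream out;
    out.write("xyz", 0);
    return out.good() && out.str().empty();
}
```

So there should be no need to test the size for zero before calling read or write.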

Does using ignore(numeric_limits<streamsize>::max()) in the IOStreams library handle arbitrarily massive streams?

In the C++ standard (section 27.6.1.3/24), the description of the istream ignore() function in the IOStreams library implies that if you supply numeric_limits<streamsize>::max() as the argument for 'n', it will continue to ignore characters forever until the delimiter is found, even way beyond the actual max value for streamsize (i.e. the 'n' argument is interpreted as infinite).
For the gcc implementation this does indeed appear to be how ignore() is implemented, but I'm still unclear as to whether this is implementation-specific or mandated by the standard.
Can someone who knows this well confirm that this is guaranteed by a standard-compliant iostreams library?
The standard says that numeric_limits<streamsize>::max() is a special value that doesn't affect the number of characters skipped.
Effects: Behaves as an unformatted input function (as described in 27.7.2.3, paragraph 1). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:
-- if n != numeric_limits<streamsize>::max() (18.3.2), n characters are extracted
-- end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
-- traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
According to here:
istream& istream::ignore ( streamsize n = 1, int delim = EOF );
Extract and discard characters
Extracts characters from the input sequence and discards them.
The extraction ends when n characters have been extracted and discarded or when the character delim is found, whichever comes first. In the latter case, the delim character itself is also extracted.
In your case, when numeric_limits<streamsize>::max() characters have been reached, the first condition is met.
[Per Bo]
However, according to spec, the above case is applied only when n is not equal to numeric_limits<streamsize>::max().
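The difference between a finite count and the special value can be sketched like this. The input is tiny, so it cannot demonstrate skipping more than streamsize max characters; it only shows which condition ends the call. next_after_ignore is a hypothetical helper.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: calls ignore(n, delim) on a stream holding `text`
// and returns the next character. Assumes at least one character remains.
char next_after_ignore(const std::string& text, std::streamsize n, char delim) {
    std::istringstream in(text);
    in.ignore(n, delim);
    char c = '\0';
    in.get(c);
    return c;
}
```

On "aaaa;b", a count of 2 stops after two characters (the next one is 'a'), while numeric_limits<streamsize>::max() disables the count test and runs through the ';' (the next one is 'b').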