C++ istream::peek - shouldn't it be nonblocking?

It seems well accepted that the istream::peek operation is blocking.
The standard, though arguably a bit ambiguous, leans towards nonblocking behavior. peek calls sgetc in turn, whose behavior is:
"The character at the current position of the controlled input sequence, as a value of type int.
If there are no more characters to read from the controlled input sequence, the function returns the end-of-file value (EOF)."
It doesn't say "If there are no more characters... wait until there are".
Am I missing something here? Or are the peek implementations we use just kinda wrong?

The controlled input sequence is the file (or whatever) from which you're reading. So if you're at end of file, it returns EOF. Otherwise it returns the next character from the file.
I see nothing here that's ambiguous at all--if it needs a character that hasn't been read from the file, then it needs to read it (and wait till it's read, and return it).
If you're reading from something like a socket, then it's going to wait until data arrives (or the network stack detects EOF, such as the peer disconnecting).
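To see this concretely, here's a minimal sketch (the prompt text is mine, not from the question): run it with no input redirected and peek will sit waiting; redirect an empty file into it and it returns EOF immediately.

#include <iostream>
#include <string> // for std::char_traits

int main() {
    std::cout << "peeking...\n";
    // Blocks here until a character is available (or the stream hits EOF),
    // because the underlying streambuf has to read from the file/terminal.
    int c = std::cin.peek();
    if (c == std::char_traits<char>::eof())
        std::cout << "got EOF\n";
    else
        std::cout << "next char is '" << static_cast<char>(c) << "' (still not extracted)\n";
}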

The description from cppreference.com might be clearer than the one in your question:
"Ensures that at least one character is available in the input area by [...] reading more data in from the input sequence (if applicable)."
"if applicable" does apply in this case; and "reading data from the input sequence" entails waiting for more data if there is none and the stream is not in an EOF or other error state.

When I get confused about console input I remind myself that console input can be redirected to come from a file, so the behavior of the keyboard more or less mimics the behavior of a file. When you try to read a character from a file, you can get one of two results: you get a character, or you get EOF because you've reached the end of the file -- there are no more characters to be read. Same thing for keyboard input: either you get a character, or you get EOF because you've reached the end of the file.

With a file, there is no notion of waiting for more characters: either a file has unread characters or it doesn't. Same thing for the keyboard. So if you haven't reached EOF on the keyboard, reading a character returns the next character. You reach EOF on the keyboard by typing whatever character your system recognizes as EOF; on Unix systems that's Ctrl-D, on Windows it's Ctrl-Z. If you haven't reached EOF, there are more characters to be read.
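As an illustration of that model, the following sketch reads characters until EOF and behaves identically whether stdin is the keyboard or a redirected file:

#include <iostream>
#include <string>

int main() {
    int c;
    // Loop ends at EOF: Ctrl-D on Unix, Ctrl-Z on Windows, or the real end
    // of a redirected file.
    while ((c = std::cin.get()) != std::char_traits<char>::eof())
        std::cout.put(static_cast<char>(c));
}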

Related

Does istream::ignore discard more than n characters?

(this is possibly a duplicate of Why does std::basic_istream::ignore() extract more characters than specified?, however my specific case doesn't deal with the delim)
From cppreference, the description of istream::ignore is the following:
Extracts and discards characters from the input stream until and including delim.
ignore behaves as an UnformattedInputFunction. After constructing and checking the sentry object, it extracts characters from the stream and discards them until any one of the following conditions occurs:
count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max()
an end-of-file condition occurs in the input sequence, in which case the function calls setstate(eofbit)
the next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof()
However, let's say I've got the following program:
#include <iostream>

int main(void) {
    int x;
    char p;
    if (std::cin >> x) {
        std::cout << x;
    } else {
        std::cin.clear();
        std::cin.ignore(2);
        std::cout << "________________";
        std::cin >> p;
        std::cout << p;
    }
}
Now, let's say I input something like p when my program starts. I expect cin to 'fail', then clear to be called and ignore to discard 2 characters from the buffer. So the 'p' and '\n' that are left in the buffer should be discarded. However, the program still expects input after ignore gets called, so in reality it only gets to the final std::cin >> p after I've given it more than 2 characters to discard.
My issue:
Inputting something like 'b' and hitting Enter immediately after the first input (so after the 2 characters, 'p' and '\n', get discarded) keeps 'b' in the buffer and immediately passes it to cin, without first printing the message. How can I make it so that the message gets printed immediately after the two characters are discarded and then >> is called?
After a lot of back and forth in the comments (and reproducing the problem myself), it's clear the problem is that:
1. You enter p<Enter>, which isn't parsable
2. You try to discard exactly two characters with ignore
3. You output the underscores
4. You prompt for the next input
but in fact things seem to stop at step 2 until you give it more input, and the underscores only appear later. Well, bad news, you're right, the code is blocking at step 2 in ignore. ignore is blocking waiting for a third character to be entered (really, checking if it's EOF after those two characters), and by the spec, this is apparently the correct thing to do, I think?
The problem here is the same basic issue as the problem you linked, just a different manifestation. When ignore terminates because it's read the number of characters requested, it always attempts to read one more character, because it needs to know if condition 2 might also be true (if it happened to read the last character, it can take the appropriate action: putting cin in EOF state, or otherwise leaving the next character in the buffer for the next read):
Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:
n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
Since you didn't provide an end character for ignore, it's looking for EOF, and if it doesn't find it after two characters, it must read one more to see if it shows up after the ignored characters (if it does, it'll leave cin in EOF state, if not, the character it peeked at will be the next one you read).
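You can observe this look-ahead without a terminal by backing the stream with a string; whether eof() reports 1 here depends on whether your implementation does the extra read (a probe sketch, not part of the original answer):

#include <iostream>
#include <sstream>

int main() {
    std::istringstream ss("ab");   // exactly two characters, nothing after them
    ss.ignore(2);                  // may peek at a third character to check for EOF
    std::cout << "eof: " << ss.eof() << ", gcount: " << ss.gcount() << '\n';
}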
Simplest solution here is to not try to specifically discard exactly two characters. You want to get rid of everything through the newline, so do that with:
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
instead of std::cin.ignore(2). That will read any and all characters until the newline (or EOF), consume the newline, and it won't ever overread (in the sense that it continues until the delimiter or EOF is found; there is no condition under which it finishes reading a count of characters and needs to peek further).
If for some reason you want to specifically ignore exactly two characters (how do you know they entered p<Enter> and not pabc<Enter>?), just call .get() on it a couple times or .read(&two_byte_buffer, 2) or the like, so you read the raw characters without the possibility of trying to peek beyond them.
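A minimal sketch of that alternative (two_byte_buffer is the hypothetical name from above):

#include <iostream>

int main() {
    char two_byte_buffer[2];
    // read() consumes exactly 2 raw characters and never looks past them
    std::cin.read(two_byte_buffer, 2);
    // or, equivalently:
    // std::cin.get();
    // std::cin.get();
    std::cout << "discarded exactly two characters\n";
}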
For the record, this seems to deviate a little from the cppreference description (which may be wrong); condition 2 in that description doesn't specify that ignore needs to verify whether it is at EOF after reading count characters, and cppreference claims condition 3 (which would need to peek) is explicitly not checked if the "delimiter" is the default Traits::eof(). But the spec quote found in your other answer doesn't include that line about condition 3 not applying for Traits::eof(), and condition 2 might allow for checking if you're at EOF, which would end up with the observed behavior.
Your problem is related to your terminal. When you press ENTER, you are most likely getting two characters -- '\r' and '\n'. Consequently, there is still one character left in the input stream to read from. Change that line to:
std::cin.ignore(10, '\n'); // 10 is not magical. You may use any number > 2
to see the behavior you are expecting.
Passing the exact number of characters in the buffer will do the trick:
std::cin.ignore(std::cin.rdbuf()->in_avail());

Can "eof" be set in ofstream?

I could not find much information on this. Can eofbit be set (meaning ofstream_instance.eof() is true), and if so, under what circumstances?
I am more interested in an independent ofstream, one that is not associated with an ifstream within some fstream, so that the "shared" eofbit can't be set by the ifstream (if something like that is possible).
If I simply write to a file and there is no space left on disk, or the operating system does not provide more space for the writing, then I'd expect only failbit or badbit to be set; reaching end of file while writing does not make sense to me. However, I was not able to find any discussion of this.
No. eof() returns the eofbit, which has no real meaning for an output stream with no associated input stream.
eofbit indicates that an input operation reached the end of an input sequence
[ios.types] / 3.1, Table 107
The actions that set eofbit are enumerated here, and they all act on input streams only.
We could imagine some weird implementation-specific scenario in which EOF (as opposed to some other error condition) would be hit while writing to a file - maybe there is a fixed-size file buffer we are writing to through some OS functions - but as far as I know the standard library abstractions do not deal with that case, and I have never seen or heard of such an API in the first place.
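As a quick illustration (the path below is a made-up one that presumably cannot be opened), even a failing output stream reports failbit, not eofbit:

#include <fstream>
#include <iostream>

int main() {
    std::ofstream out("/no_such_dir/out.txt"); // hypothetical unwritable path
    out << "hello";                            // write fails; stream is in an error state
    std::cout << "fail: " << out.fail()
              << " bad: " << out.bad()
              << " eof: " << out.eof() << '\n'; // eof stays 0
}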

Clarify the difference between input/output stream and input/output buffer

Input stream and Input buffer
From what I understand, when a key is pressed on the keyboard, the characters go into the input stream (stdin) and get stored in a buffer. Then scanf (in the case of C) or cin (in the case of C++) extracts the characters from the buffer and places them in main memory.
Output stream and Output buffer
Similarly, before characters are displayed on the screen, they are first stored in a buffer. Then printf (in the case of C) or cout (in the case of C++) extracts the characters from the buffer (when it is full) and sends them to the output stream (stdout).
Am I right? I've been stuck on this for quite a while now and my logic may be flawed.
Side note: scanf() is not the function to read input, see more here.
Now for your question: when asking about C (and C++), i.e., the language, you should stay within the abstract concepts the language provides. So, don't start at the keyboard; that's far outside your program.
Start here: The operating system wants to deliver some input to you. Now, your C runtime provides a stream of input to your code. The stream is an abstract concept, it just means something you can continuously read from. This stream can be buffered or unbuffered, and if it's buffered, there are different modes (fully buffered or line buffered) available. You can configure all of that.
If your stream is unbuffered, this means the operating system has to wait until your code wants to read from the input stream. By default, your standard input stream is line buffered, which means your C runtime accepts the input immediately and puts it into a buffer until there is a newline -- your code calling input functions will get a result from that buffer.
Conceptually the same happens with output, just the other way around. If your output stream is for example line buffered, your C runtime will fill a buffer until there is a newline and deliver that whole line to the operating system for output. If the output is unbuffered, every single character is immediately passed to the operating system.
Disclaimer: this is still a lot simplified, but should be enough to start with.
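For example, with the C standard library you can switch a stream's buffering mode with setvbuf (this must be done before any other operation on the stream); a sketch:

#include <cstdio>

int main() {
    // _IONBF: unbuffered, _IOLBF: line buffered, _IOFBF: fully buffered
    std::setvbuf(stdout, nullptr, _IONBF, 0); // every character goes straight to the OS

    std::fputs("no newline needed for this to appear", stdout);
}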
As you ask about the term "buffer overflow" in the comments, mentioning gets() -- this is about a buffer inside your own code. With any input function that reads more than a single value/char, you have to provide your own buffer for it to store the result in. With gets(), there's no way to tell the function how large this buffer is, so it will just overflow it whenever the available input is too large. This is why gets() was ill-designed and has since been removed from the C language. It has nothing to do with the buffers of your C runtime that may be used in the implementation of the I/O streams.
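Compare gets() with fgets(), which takes the buffer size and therefore cannot overflow your buffer (a sketch):

#include <cstdio>

int main() {
    char buffer[16];
    // fgets stops after at most sizeof(buffer) - 1 characters (plus the
    // terminating '\0'), no matter how long the input line is.
    if (std::fgets(buffer, sizeof buffer, stdin))
        std::fputs(buffer, stdout);
}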

What does the function cin.clear() do in C++? (detailed description)

Good day, my teacher said that I should learn what the function cin.clear() does in C++. I looked for it, but never found a proper explanation. The cplusplus.com resource says that this function
Sets a new value for the stream's internal error state flags. The current value of the flags is overwritten: All bits are replaced by those in state; If state is goodbit (which is zero) all error flags are cleared.
But I do not quite understand what this "state" is, where these flags and errors come from and why, and how we replace them with the 0 value. And what are these "flags", and why are they needed? He also said that I should know what parameters or data the function cin.clear() takes and what it returns; I understand that it returns nothing, but does it also take something? Please help. Sorry for the bad English, I am writing through a translator.
The function std::basic_ios<>::clear() affects the std::ios_base::iostate bits, which are, for the most part, error conditions. The standard defines "four" bits:

badbit
Set if the last input failed because of some hardware failure, e.g. a read error on the disk. (In practice, I'm not sure that all implementations check for this; I suspect that some will just treat this as if there were an end of file.)

failbit
Set if the last input failed for some reason other than one which would have set the badbit. The most common reasons are a format error (trying to read an `int` when the next characters in input were `"abc"`) and encountering end of file _before_ having been able to read sufficient data for the requested input.

eofbit
This is _not_ an error condition; it will be set anytime the stream sees the end of file. This may be because it needs yet another character in order to parse the input, in which case the failbit will also be set; but it may also be because the input stream saw the end of file on look-ahead. (For this last case, consider inputting an int, where the remaining characters in the stream are "123", with no trailing whitespace, not even a newline. In order to know that it has processed all of the relevant characters, the stream must try to read a character after the 3. In that case, it sets eofbit, to remember that it has seen the end of file, but it does _not_ set failbit, because "123" is a valid complete input for an int.)

goodbit
This isn't even a bit pattern, but simply a special value in which none of the preceding bits are set.

For the most part, failbit and eofbit are only relevant on input; you'll get (or should get) badbit on output if the disk is full.
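A typical use, and a minimal sketch of it: a failed extraction sets failbit, clear() resets the flags so the stream can be used again, and ignore() discards the offending input.

#include <iostream>
#include <limits>

int main() {
    int n;
    std::cout << "Enter a number: ";
    while (!(std::cin >> n)) {     // extraction failed: failbit is now set
        if (std::cin.eof())        // no more input at all; give up
            return 1;
        std::cin.clear();          // put the flags back to goodbit
        // throw away the rest of the bad line
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        std::cout << "Not a number, try again: ";
    }
    std::cout << "Read " << n << '\n';
}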

Why does string extraction from a stream set the eof bit?

Let's say we have a stream containing simply:
hello
Note that there's no extra \n at the end like there often is in a text file. Now, the following simple code shows that the eof bit is set on the stream after extracting a single std::string.
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, const char* argv[])
{
    std::stringstream ss("hello");
    std::string result;
    ss >> result;
    std::cout << ss.eof() << std::endl; // Outputs 1
    return 0;
}
However, I can't see why this would happen according to the standard (I'm reading C++11 - ISO/IEC 14882:2011(E)). operator>>(basic_istream<...>&, basic_string<...>&) is defined as behaving like a formatted input function. This means it constructs a sentry object which proceeds to eat away whitespace characters. In this example, there are none, so the sentry construction completes with no problems. When converted to a bool, the sentry object gives true, so the extractor continues to get on with the actual extraction of the string.
The extraction is then defined as:
Characters are extracted and appended until any of the following occurs:
n characters are stored;
end-of-file occurs on the input sequence;
isspace(c,is.getloc()) is true for the next available input character c.
After the last character (if any) is extracted, is.width(0) is called and the sentry object k is destroyed.
If the function extracts no characters, it calls is.setstate(ios::failbit), which may throw ios_base::failure (27.5.5.4).
Nothing here actually causes the eof bit to be set. Yes, extraction stops if it hits the end-of-file, but it doesn't set the bit. In fact, the eof bit should only be set if we do another ss >> result;, because when the sentry attempts to gobble up whitespace, the following situation will occur:
If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(failbit | eofbit)
However, this is definitely not happening yet because the failbit isn't being set.
The consequence of the eof bit being set here is that the only reason the evil idiom while (!stream.eof()) doesn't work when reading files is the extra \n at the end, not that the eof bit isn't yet set. My compiler is happily setting the eof bit when the extraction stops at the end of the file.
So should this be happening? Or did the standard mean to say that setstate(eofbit) should occur?
To make it easier, the relevant sections of the standard are:
21.4.8.9 Inserters and extractors [string.io]
27.7.2.2 Formatted input functions [istream.formatted]
27.7.2.1.3 Class basic_istream::sentry [istream::sentry]
std::stringstream is a basic_istream and the operator>> of std::string "extracts" characters from it (as you found out).
27.7.2.1 Class template basic_istream
2 If rdbuf()->sbumpc() or rdbuf()->sgetc() returns traits::eof(), then the input function, except as explicitly noted otherwise, completes its actions and does setstate(eofbit), which may throw ios_base::failure (27.5.5.4), before returning.
Also, "extracting" means calling these two functions.
3 Two groups of member function signatures share common properties: the formatted input functions (or extractors) and the unformatted input functions. Both groups of input functions are described as if they obtain (or extract) input characters by calling rdbuf()->sbumpc() or rdbuf()->sgetc(). They may use other public members of istream.
So eofbit must be set.
Intuitively speaking, the EOF bit is set because during the read operation to extract the string, the stream did indeed hit the end of the file. Specifically, it continuously read characters out of the input stream, stopping because it hit the end of the stream before encountering a whitespace character. Accordingly, the stream set the EOF bit to mark that the end of stream was reached. Note that this is not the same as reporting failure - the operation was completed successfully - but the point of the EOF bit is not to report failure. It's to mark that the end of the stream was encountered.
I don't have a specific part of the spec to back this up, though I'll try to look for one when I get the chance.
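To make the distinction concrete, here's a small extension of the question's program (a sketch): eofbit is set, but the extraction itself succeeded, so failbit is not.

#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::stringstream ss("hello");
    std::string result;
    ss >> result;
    std::cout << "eof: " << ss.eof()    // 1: end of stream reached during extraction
              << " fail: " << ss.fail() // 0: the extraction succeeded
              << " got: " << result << '\n';
}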