Why two EOF needed as input? [duplicate] - c++

This question already has an answer here:
Canonical vs. non-canonical terminal input
(1 answer)
Closed 7 years ago.
When I run the code below, I use three inputs (in Ubuntu terminal):
1) abc(Ctrl+D)(Ctrl+D)
2) abc(Ctrl+D)(Enter)(Ctrl+D)
3) abc(Enter)(Ctrl+D)
The code behaves correctly in all three cases. My question is: why do I need two EOFs in cases 1) and 2)?
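#include <cstdio>   // for EOF
#include <iostream>
int main()
{
    int character;
    // read characters one at a time until get() reports end-of-file
    while ((character = std::cin.get()) != EOF) {}
    std::cout << std::endl << character << std::endl;
}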

You don't have "two EOF". Bash is putting the tty in raw mode, and interpreting ^D differently depending on context. If you type ^D after a newline, bash closes the input stream on the foreground process. If you type a few characters first, bash requires you to type ^D twice before doing so. (The first ^D is treated like 'delete')

That's how the "EOF" character works (in "canonical" mode input, which is the default). It's actually never sent to the application, so it would be more accurate to call it the EOF signal.
The EOF character (normally Ctrl-D) causes the current line to be returned to the application program immediately. That's very similar to the behaviour of the EOL character (Enter), but unlike EOL, the EOF character is not included in the line.
If the EOF character is typed at the beginning of a line, then zero bytes are returned to the application program (since the EOF character is not sent). But if a read system call returns 0 bytes, that is considered an end-of-file indication. So at the beginning of a line, an EOF will be treated as terminating input; anywhere else, it will merely terminate the line and so you need two of them to terminate input.
For more details, see the POSIX terminal interface specification.

Related

Does istream::ignore discard more than n characters?

(this is possibly a duplicate of Why does std::basic_istream::ignore() extract more characters than specified?, however my specific case doesn't deal with the delim)
From cppreference, the description of istream::ignore is the following:
Extracts and discards characters from the input stream until and including delim.
ignore behaves as an UnformattedInputFunction. After constructing and checking the sentry object, it extracts characters from the stream and discards them until any one of the following conditions occurs:
count characters were extracted. This test is disabled in the special case when count equals std::numeric_limits<std::streamsize>::max()
end of file condition occurs in the input sequence, in which case the function calls setstate(eofbit)
the next available character c in the input sequence is delim, as determined by Traits::eq_int_type(Traits::to_int_type(c), delim). The delimiter character is extracted and discarded. This test is disabled if delim is Traits::eof()
However, let's say I've got the following program:
#include <iostream>
int main(void) {
    int x;
    char p;
    if (std::cin >> x) {
        std::cout << x;
    } else {
        std::cin.clear();
        std::cin.ignore(2);
        std::cout << "________________";
        std::cin >> p;
        std::cout << p;
    }
}
Now, let's say I input something like p when my program starts. I expect cin to 'fail', then clear to be called and ignore to discard 2 characters from the buffer. So the 'p' and '\n' that are left in the buffer should be discarded. However, the program still waits for input after ignore gets called, so in reality it only gets to the final std::cin >> p after I've given it more than 2 characters to discard.
My issue:
Inputting something like 'b' and hitting Enter immediately after the first input (that is, after the 2 characters 'p' and '\n' get discarded) keeps 'b' in the buffer and immediately passes it to cin, without first printing the message. How can I make it so that the message gets printed immediately after the two characters are discarded, and only then >> is called?
After a lot of back and forth in the comments (and reproducing the problem myself), it's clear the problem is that:
1. You enter p<Enter>, which isn't parsable
2. You try to discard exactly two characters with ignore
3. You output the underscores
4. You prompt for the next input
but in fact things seem to stop at step 2 until you give it more input, and the underscores only appear later. Well, bad news, you're right, the code is blocking at step 2 in ignore. ignore is blocking waiting for a third character to be entered (really, checking if it's EOF after those two characters), and by the spec, this is apparently the correct thing to do, I think?
The problem here is the same basic issue as the problem you linked, just a different manifestation. When ignore terminates because it's read the number of characters requested, it always attempts to read one more character, because it needs to know whether condition 2 might also be true (that is, whether it happened to read the last character), so it can take the appropriate action: putting cin in EOF state, or otherwise leaving the next character in the buffer for the next read:
Effects: Behaves as an unformatted input function (as described above). After constructing a sentry object, extracts characters and discards them. Characters are extracted until any of the following occurs:
n != numeric_limits<streamsize>::max() (18.3.2) and n characters have been extracted so far
end-of-file occurs on the input sequence (in which case the function calls setstate(eofbit), which may throw ios_base::failure (27.5.5.4));
traits::eq_int_type(traits::to_int_type(c), delim) for the next available input character c (in which case c is extracted).
Since you didn't provide an end character for ignore, it's looking for EOF, and if it doesn't find it after two characters, it must read one more to see if it shows up after the ignored characters (if it does, it'll leave cin in EOF state, if not, the character it peeked at will be the next one you read).
Simplest solution here is to not try to specifically discard exactly two characters. You want to get rid of everything through the newline, so do that with:
std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
instead of std::cin.ignore(2); (this needs #include <limits>). That will read any and all characters until the newline (or EOF), consume the newline, and it won't ever overread (in the sense that it continues until the delimiter or EOF is found; there is no condition under which it finishes reading a count of characters and needs to peek further).
If for some reason you want to specifically ignore exactly two characters (how do you know they entered p<Enter> and not pabc<Enter>?), just call .get() on it a couple times or .read(&two_byte_buffer, 2) or the like, so you read the raw characters without the possibility of trying to peek beyond them.
For the record, this seems to differ a little from the cppreference description (which may be wrong); condition 2 in the spec doesn't specify it needs to verify if it is at EOF after reading count characters, and cppreference claims condition 3 (which would need to peek) is explicitly not checked if the "delimiter" is the default Traits::eof(). But the spec quote found in your other answer doesn't include that line about condition 3 not applying for Traits::eof(), and condition 2 might allow for checking if you're at EOF, which would end up with the observed behavior.
Your problem is related to your terminal. When you press ENTER, you are most likely getting two characters -- '\r' and '\n'. Consequently, there is still one character left in the input stream to read from. Change that line to:
std::cin.ignore(10, '\n'); // 10 is not magical. You may use any number > 2
to see the behavior you are expecting.
Passing the exact number of characters in the buffer will do the trick:
std::cin.ignore(std::cin.rdbuf()->in_avail());

What is the general purpose of cin.clear? [duplicate]

This question already has answers here:
Why would we call cin.clear() and cin.ignore() after reading input?
(4 answers)
Closed 5 years ago.
I am a beginner to C++, and I just can't wrap my head around what cin.ignore and cin.clear do; they make absolutely no sense to me. When you explain this to me, please be very descriptive.
In C++ input processing, cin.fail() returns true if the last input operation on cin failed.
Usually, cin.fail() returns true in the following cases:
any time you reach EOF and try to read anything, cin.fail() returns true.
if you try to read an integer and the input cannot be converted to an integer.
When cin.fail() returns true, the stream is in an "error state", and that state blocks further input processing: every subsequent read fails immediately until the state is reset.
Therefore, you have to use cin.clear(). It overwrites the stream's internal error state flags: all bits are replaced by those in its state argument, and since that argument defaults to goodbit, calling cin.clear() with no arguments clears all error flags.
As for cin.ignore, it first constructs a sentry object to access the input sequence. After that, it extracts characters from its associated stream buffer object as if calling its member functions sbumpc or sgetc, and finally destroys the sentry object before returning.
It is therefore commonly used to extract and discard characters. A classic case for cin.ignore is using getline() after cin >>: the >> extraction leaves the newline in the buffer, and getline() would read that as an empty line. That's why you must discard the newline from the buffer first.
std::cin.ignore() can be called three different ways:
No arguments: A single character is taken from the input buffer and discarded:
std::cin.ignore(); //discard 1 character
One argument: The number of characters specified are taken from the input buffer and discarded:
std::cin.ignore(33); //discard 33 characters
Two arguments: discard the number of characters specified, or discard characters up to and including the specified delimiter (whichever comes first):
std::cin.ignore(26, '\n'); //ignore 26 characters or to a newline, whichever comes first
source: http://www.augustcouncil.com/~tgibson/tutorial/iotips.html

C++ istream::peek - shouldn't it be nonblocking?

It seems well accepted that the istream::peek operation is blocking.
The standard, though arguably a bit ambiguous, leans towards nonblocking behavior. peek calls sgetc in turn, whose behavior is:
"The character at the current position of the controlled input sequence, as a value of type int.
If there are no more characters to read from the controlled input sequence, the function returns the end-of-file value (EOF)."
It doesn't say "If there are no more characters.......wait until there are"
Am I missing something here? Or are the peek implementations we use just kinda wrong?
The controlled input sequence is the file (or whatever) from which you're reading. So if you're at end of file, it returns EOF. Otherwise it returns the next character from the file.
I see nothing here that's ambiguous at all--if it needs a character that hasn't been read from the file, then it needs to read it (and wait till it's read, and return it).
If you're reading from something like a socket, then it's going to wait until data arrives (or the network stack detects EOF, such as the peer disconnecting).
The description from cppreference.com might be clearer than the one in your question:
"Ensures that at least one character is available in the input area by [...] reading more data in from the input sequence (if applicable)."
"if applicable" does apply in this case; and "reading data from the input sequence" entails waiting for more data if there is none and the stream is not in an EOF or other error state.
When I get confused about console input I remind myself that console input can be redirected to come from a file, so the behavior of the keyboard more or less mimics the behavior of a file. When you try to read a character from a file, you can get one of two results: you get a character, or you get EOF because you've reached the end of the file -- there are no more characters to be read. Same thing for keyboard input: either you get a character, or you get EOF because you've reached the end of the file. With a file, there is no notion of waiting for more characters: either a file has unread characters or it doesn't. Same thing for the keyboard. So if you haven't reached EOF on the keyboard, reading a character returns the next character. You reach EOF on the keyboard by typing whatever character your system recognizes as EOF; on Unix systems that's ctrl-D, on Windows (if I remember correctly) that's ctrl-Z. If you haven't reached EOF, there are more characters to be read.

eof() is returning last character twice [duplicate]

This question already has answers here:
Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?
(5 answers)
Closed 7 years ago.
I am reading in from an input file "input.txt" which has the string 'ABCDEFGH' and I am reading it in char by char. I am doing this using the code:
ifstream plaintext("input.txt");
char ch;
if (plaintext.is_open())
{
    while(!plaintext.eof()){
        plaintext.get(ch);
        cout<<ch<<endl;
    }
    plaintext.close();
}
The string 'ABCDEFGHH' is printed out. I have no idea why it is printing 'H' twice. Any help would be appreciated. I got this code example from HERE
This is because the EOF test does not mean "our crystal ball tells us that there are no more characters available in this stream". Rather, it is a test which we apply after an input operation fails to determine whether the input failed due to running out of data (EOF) or some other condition (an I/O error of some sort).
In other words, EOF can be false even after we have successfully read what will be the last character. We will then try to read again, and this time get will fail, and not overwrite the existing value of ch, so it still holds the H.
Streams cannot predict the end of the data because then they could not be used for communication devices such as serial lines, interactive terminals or network sockets. On a terminal, we cannot tell that the user has typed the last character they will ever type. On a network, we cannot tell that the byte we have just received is the last one. Rather, we know that the previous byte was the last one, because the current read operation has failed.

Why doesn't the EOF character work if put at the end of a line?

I'm learning C++ and trying to understand,
Why doesn't the EOF character (Ctrl + Z on Windows) break the while loop if put at the end of a line?
My code:
#include <iostream>
using namespace std;
int main() {
    char ch;
    while(cin >> ch) {
        cout << ch;
    }
}
When I enter ^Z, the loop breaks;
But when I enter 12^Z, it doesn't.
You won't find an answer to your question in the C++ standard.
cin >> ch will be a "true" condition as long as there's neither an end-of-file condition nor an input error. How an end-of-file condition is triggered is not specified by the language, and it can and will vary from one operating system to another, and even with configuration options in the same OS. (For example, Unix-like systems use control-D by default, but that can be altered by the stty command.)
Windows uses Control-Z to trigger an end-of-file condition for a text input stream; it just happens not to do so other than at the beginning of a line.
Unix behaves a bit differently; it uses Control-D (by default) at the beginning of a line, or two Control-Ds in the middle of a line.
For Unix, this applies only when reading from a terminal; if you're reading from a file, control-D is just another non-printing character, and it doesn't trigger an end-of-file condition. Windows appears to recognize control-Z as an end-of-file trigger even when reading from a disk file.
Bottom line: Different operating systems behave differently, largely for obscure historical reasons. C++ is designed to work with any of these behaviors, which is why it's not specific about some of the details.
The C and C++ standards allow text streams to do quite Unholy things in text mode, which is the default. These Unholy Things include translation between internal newline markers and external newline control characters, as well as treating certain characters or character sequences as denoting end of file. In Unix-land it's not done, but in Windows-land it's done, so that the code can relate only to the original Unix-land conventions.
This means that in Windows, there is no way to write a portable C or C++ program that will copy its input exactly to its output.
While in Unix-land, that's no problem at all.
In Windows, a line consisting of a single [Ctrl Z] is by convention an End Of File marker. This is so not only in the console, but also in text files (depending a bit on the tools). Windows inherited this from DOS, which in turn inherited the general idea from CP/M.
I'm not sure where CP/M got it from, but it's only similar, not at all the same!, as Unix' [Ctrl D].
Over in Unix-land the general convention for end of file is just "no more data". In the console a [Ctrl D] will by default send your typed text immediately to the waiting program. When you haven't typed anything on the line yet, 0 bytes are sent, and a read that returns 0 bytes has by convention encountered end-of-file.
The main difference is that internally in Windows the text end of file marker is data, that can occur within a file, while internally in Unix it's lack of data, which can't occur within a file. Of course Windows also supports ordinary end of file (no more data!) for text. Which complicates things – Windows is just more complicated.
#include <iostream>
using namespace std;
int main()
{
    char ch;
    while(cin >> ch) {
        cout << 0+ch << " '" << ch << "'" << endl;
    }
}
This happens because cin >> ch fails when the input is just ^Z.
In more detail: the read hits end-of-file before extracting anything, so both eofbit and failbit are set. The while condition converts the stream to bool, which is equivalent to !cin.fail() (it does not call eof()), so it evaluates to false and the loop ends.
If you input 12^Z, the extraction succeeds because there is valid input to read, so fail() stays false and the loop does not stop.
You might be interested in this SO also:
SO on semantics of flags