Is it always safe to use std::istream::peek()? - c++

I usually teach my students that the safe way to tackle file input is:
while (true) {
// Try to read
if (/* failure check */) {
break;
}
// Use what you read
}
This has saved me and many others from the classic and usually wrong:
while (!is.eof()) {
// Try to read
// Use what you read
}
But people really like this form of looping, so it has become common to see this in student code:
while (is.peek()!=EOF) { // <-- I know this is not C++ style, but this is how it is usually written
// Try to read
// Use what you read
}
Now the question is: is there a problem with this code? Are there corner cases in which things don't work exactly as expected? OK, that's two questions.
EDIT FOR ADDITIONAL DETAILS: during exams we sometimes guarantee the students that the file will be correctly formatted, so they don't need to do all the checks and only need to verify whether there is more data. And most of the time we deal with binary formats, which let you not worry about whitespace at all (because all the data is meaningful).
While the accepted answer is totally clear and correct, I'd still like someone to try to comment on the joint behavior of peek() and unget().
The unget() stuff came to my mind because I once observed (I believe it was on Windows) that peeking right at the 4096-byte internal buffer boundary (thus forcing a new buffer to be loaded) made a subsequent unget() of the previous byte (the last of the previous buffer) fail. But I may be wrong. So that was my additional doubt: something known that I missed, which is perhaps well specified in the standard or in some library implementations.

is.peek()!=EOF tells you whether there are still characters left in the input stream, but it doesn't tell you whether your next read will succeed:
while (is.peek()!=EOF) {
int a;
is >> a;
// Still need to test `is` to verify that the read succeeded
}
is >> a could fail for a number of reasons, e.g. the input might not actually be a number.
So there is no point to this if you could instead do
int a;
while (is >> a) { // reads until failure of any kind
// use `a`
}
or, maybe better:
for (int a; is >> a;) { // reads until failure of any kind
// use `a`
}
or your first example, in which case the is.peek()!=EOF in the loop condition becomes redundant.
This is assuming you want the loop to exit on every failure, following your first code example, not only on end-of-file.

Related

A standard loop for reading from a text file C++

I'm learning how to work with files in C++, and as a beginner I've got some doubts I would like to clarify:
In my book the author introduces the stream states and writes this simple piece of code to show how to read until we reach end of file or a terminator:
// somewhere make ist throw if it goes bad:
void fill_vector(istream& ist, vector<int>& v, char terminator)
{
ist.exceptions(ist.exceptions() | ios_base::badbit);
for (int i; ist >> i;) v.push_back(i);
if (ist.eof()) return; // fine: we found end of file
// not good() not bad() and not eof(), it must be fail()
ist.clear();
char c;
ist >> c; // read a character, hopefully terminator
if (c != terminator) { // not the terminator, so we must fail
ist.unget(); // maybe my caller can use that character
ist.clear(ios_base::failbit);
}
}
This was a first example, which provides a useful method to read data, but I'm having some issues with the second example, where the author says:
Often we want to check our reads as we go along; this is the general strategy, assuming that ist is an istream:
for (My_type var; ist >> var;) { // read until end of file
// maybe check that var is valid
// do something with var
}
if (ist.fail()) {
ist.clear();
char ch;
// the error function is created into the book :
if (!(ist >> ch && ch == '|')) error("Bad termination of input\n");
}
// carry on : we found end of file or terminator
If we don't want to accept a terminator, that is, to accept only the end of file as the end, we simply delete the test before the call of error().
Here's my doubt : In the first example we basically check for every possible state of the istream to be sure that the reading terminated as we wanted to, and that's ok. But I have problems in understanding the second example :
What does the author mean when he says to remove the test before the call of error()?
Is it possible to avoid triggering both eof and fail when reading? If yes, how?
I'm really confused and can't understand the example, because from the tests I've done the failbit is always set after the eofbit. So what's the point of checking for failbit if it will always be set? Why is the author doing that?
What would happen to the code if I removed the test before the call of error(), as the author says? Wouldn't that be useless, since I would only be checking for the bad state of the stream?
I think I see what you mean. No, it's not really useless: you would be telling someone, either the programmer (via an exception) or the user (via standard output; I don't know what error() actually does), that the input contained invalid data, and someone has to act accordingly.
It may be useless, but that depends on what you want the function to do; if, for example, you want it to silently ignore the error and use the correct data already processed, then it really is useless.
How can I read data from a file until I just reach the end of the file, without using any other terminator?
I can't see what you mean; you are already doing that in both examples:
if (ist.eof()) return; // fine: we found end of file
and
if (ist.fail()) { // if 'ist' didn't fail (reaching eof is not a failure), the 'if' is simply skipped

Output info from 2 struct arrays into one file

I apologize if this doesn't make sense. I am not sure what to google.
Let's say I have two arrays:
string a_1[16];
string a_2[20];
I need to output these to a file with a function: first a_1[0] to a_1[n], then the a_2's.
It should also be possible to run the function again to append more a_1's and a_2's to the output file.
so the format will be:
//output_file.txt
a_1[0].....a_1[n]
a_2[0].....a_2[M]
a_1[n+1]...a_1[16]
a_2[M+1]...a_2[20]
My question is: is there a way to read output_file.txt back in so that all of the a_1's are read in order, a_1[0] to a_1[16],
and then a_2[0] to a_2[20]?
Maybe just put "something" between each group, so that when "something" is read the code knows to stop reading a_1's and switch to reading a_2's...
What the OP calls "something" is typically called a sentinel or canary value. To be used as a sentinel, it has to be a pattern that cannot exist in the data stream. This is hard, because pretty much anything can be in a string. If you use, say, "XxXxXx" as your sentinel, then you have to be very careful that it is never written to the file as data.
The concept of escape characters (look it up) can be used here, but a better approach is to store a count of the stored strings at the beginning of each group. Consider an output file that looks like:
4
string a1_1
string a1_2
string a1_3
string a1_4
2
string a2_1
string a2_2
Read the count, four, then read that many strings; then read the next count and that many more strings.
OK, so you're thinking this sucks: I can't just insert a new string into a1 without also changing the number at the front of the file.
Well, good luck with inserting data into the middle of a file without totally smurfing up the file. It can be done, but only after moving everything after the insertion over by the size of the insertion, and that's not as trivial as it sounds. At the point in a programming career where this is the sort of task to which you are assigned, and you have to ask for help, you are pretty much doomed to reading the file into memory, inserting the new values, and writing the file back out again, so just go with it.
So what does this look like in code? First we ditch the arrays in favour of std::vector. Vectors are smart. They grow to fit. They know how much stuff is in them. They look after themselves so there is no unnecessary new and delete nonsense. You gotta be stupid not to use them.
Reading:
std::ifstream infile(filename); // filename: path to the input file
std::vector<std::string> input;
std::size_t count = 0; // size_t avoids the signed/unsigned comparison with input.size()
if (infile >> count)
{
infile.ignore(); // discard end of line
std::string line;
while (input.size() < count && getline(infile, line))
{
input.push_back(line);
}
if (input.size() != count)
{
//handle bad file
}
}
else
{
// handle bad file
}
and writing
std::ofstream outfile(filename); // filename: path to the output file
if(outfile << output.size())
{
for (std::string & out: output)
{
if (!(outfile << out << '\n')) // note the parentheses: !outfile << out would not compile
{
// handle write error
}
}
}
else
{
// handle write error
}
But this looks like homework, so the OP is probably not allowed to use one. In that case the logic is the same, but you have to write
std::unique_ptr<std::string[]> inarray(new std::string[count]);
or
std::string * inarray = new std::string[count];
to allocate storage for the strings you are reading in. The second one looks like less work than the first. Looks are deceiving. The first one looks after your memory for you. The second requires at least one delete[] in your code at the right place to put the memory away. Miss it and you have a memory leak.
You also need to have a variable keeping track of the size of the array, because pointers don't know how big whatever they are pointing at is. This makes the write for loop less convenient.

what will happen if the input stream is invalid

What will happen if the input stream is invalid? For example, as follows:
#include <iostream>
using namespace std;

int main()
{
int value;
while(!(cin>>value).eof());
}
If the entered sequence is 1 2 3 q 4 5, the while falls into an endless loop when cin scans 'q', and value stays 3.
My questions are:
1. Why can't cin ignore 'q' and proceed to scan 4?
2. What's the underlying implementation of the input stream? Are there any materials I can refer to?
Thank you!
Why can't cin ignore 'q' and proceed to scan 4?
You can if you want to. You could get that effect with the following:
int value;
while(std::cin>>value);
if (!std::cin)
{
std::cin.clear(); // clear error state
std::cin.ignore(); // ignore the q
while(std::cin>>value); // read rest of values until 5
}
std::cin >> value just does not do that by default, as the behavior desired is different depending on the program. For many people it would be undesirable for std::cin to ignore a read failure and keep scanning. The default behavior allows you the programmer to decide what to do on failure.
Also, note that eof() checks for end of file. You should not use it to check whether a read was successful. The common idiom is:
while(std::cin>>value)
{
// Do stuff
}
What's the underlying implementation of the input stream? Are there any materials I can refer to?
std::cin is a global object, declared like so:
extern std::istream cin;
In other words it is an instance of std::basic_istream<char> (std::istream is a typedef for it).
If you would like more information, here are some references:
http://en.cppreference.com/w/cpp/io/cin
https://github.com/cplusplus/draft
However, most likely you would benefit from a good C++ book.
If you want to dig deeper into iostreams, I also recommend these articles.

Why read file first then check?

I'm just revising for my exams and can't get my head around the following provided by our lecturer:
When opening fstreams, check whether the open succeeded
Then read before checking input_file.fail()
If you check before reading, you may end up with an
extra unwanted input
It doesn't make sense to me to read first, shouldn't you check first?
If anyone is able to explain, I would be very grateful :)
input_file.fail() determines if any preceding operations have failed, not whether the upcoming operation is going to fail. Consequently, if you write this:
if (!input_file.fail()) {
int value;
input_file >> value;
/* ... process value ... */
}
Then after reading value, you have no idea whatsoever whether you actually read anything successfully or not. All you know is that right before you did the read, everything was working correctly. It's quite possible that you failed to read an integer, either because you hit the end of the file, or the data in the file wasn't an integer.
On the other hand, if you write
int value;
input_file >> value;
if (!input_file.fail()) {
/* ... process value ... */
}
Then you attempt to do a read. If it succeeds, you then process the value you've read. If not, you can then react to the fact that the last operation failed.
(You can be even cuter than this:
int value;
if (input_file >> value) {
/* ... process value ... */
}
which combines the read and test operations into one. It's much clearer here that you're confirming that the read succeeded.)
If you're doing reads in a loop, a very clean way to do this is
for (int value; input_file >> value; ) {
/* ... process value ... */
}
This makes clear that you loop while you're able to keep reading values from the file.
Hope this helps!

C++: std::istream check for EOF without reading / consuming tokens / using operator>>

I would like to test if a std::istream has reached the end without reading from it.
I know that I can check for EOF like this:
if (is >> something)
but this has a series of problems. Imagine there are many, possibly virtual, methods/functions which expect a std::istream& as an argument.
This would mean I have to do the "housework" of checking for EOF in each of them, possibly with a different type for the something variable, or create some weird wrapper to handle calling the input methods.
All I need to do is:
if (!IsEof(is)) Input(is);
the method IsEof should guarantee that the stream is left unchanged for subsequent reading, so that the above line is equivalent to:
Input(is)
as regards the data read in the Input method.
If there is no generic solution that would work for any std::istream, is there any way to do this for std::ifstream or cin?
EDIT:
In other words, the following assert should always pass:
while (!IsEof(is)) {
int something;
assert(is >> something);
}
The istream class has an eof bit that can be checked by calling the is.eof() member function.
Edit: So you want to see if the next character is the EOF marker without removing it from the stream? if (is.peek() == EOF) is probably what you want then. See the documentation for istream::peek
That's impossible. How is the IsEof function supposed to know that the next item you intend to read is an int?
Should the following also not trigger any asserts?
while(!IsEof(in))
{
int x;
double y;
if( rand() % 2 == 0 )
{
assert(in >> x);
} else {
assert(in >> y);
}
}
That said, you can use the exceptions method to keep the "housekeeping" in one place.
Instead of
if (!IsEof(is)) Input(is);
try
is.exceptions( ifstream::eofbit /* | ifstream::failbit etc. if you like */ );
try {
Input(is);
} catch(const ifstream::failure& ) {
}
It doesn't stop you from reading before it's "too late", but it does obviate the need to have if(is >> x) if(is >> y) etc. in all the functions.
Normally,
if (is)
{
}
is enough. There are also .good(), .bad(), and .fail() for more exact information.
Here is a reference link: http://www.cplusplus.com/reference/iostream/istream/
There are good reasons why there is no isEof function: it is hard to specify in a usable way. For instance, operator>> usually begins by skipping whitespace (depending on a flag), while some other input functions can read whitespace. How would your isEof() handle that? Would it begin by skipping spaces or not? Would that depend on the flag used by operator>>? Would it restore the whitespace to the stream or not?
My advice is to use the standard idiom and characterize input failure instead of trying to predict only one cause of it: you'd still need to characterize and handle the others.
No, in the general case there is no way of knowing if the next read operation will reach eof.
If the stream is connected to a keyboard, the EOF condition is that I will type Ctrl+Z/Ctrl+D at the next prompt. How would IsEof(is) detect that?