C++ istream operator>> bad-data handling - c++

Every time I ask a question here on SO, it turns out to be some very dumb mistake (check my history if you don't believe me), so bear with me if you can here.
It feels like my question should be very popular, but I couldn't find anything about it and I've run out of ideas to try.
Anyway, without further ado:
I'm trying to overload the input operator>>. It's supposed to read one integer at a time from a file, skipping invalid data such as chars, floats, etc.
Naturally, I'm checking if(in >> inNum) to both get() the next token and check for successful get().
If successful, not much to say there.
If it fails, however, I assume that one of two things happened:
It stumbled upon a non-integer
It reached the eof
Here's how I tried to deal with it:
istream& operator>> (istream& in, SortSetArray& setB) {
bool eof = false;
int inNum = -1;
while(!eof) {
if(in >> inNum) {
cout << "DEBUG SUCCESS: inNum = " << inNum << endl;
setB.insert(inNum);
}
else {
// check eof, using peek()
// 1. clear all flags since peek() returns eof regardless of what
// flag is raised, even if it's not `eof`
in.clear();
cout << "DEBUG FAIL: inNum = " << inNum << endl;
// 2. then check eof with peek()
eof = (in.peek() == std::char_traits<char>::eof());
}
}
return in;
}
The file contains [1 2 3 4 a 5 6 7], and the program naturally goes into infinite loop.
Okay, easy guess, peek() doesn't consume the char 'a', and maybe in >> inNum also failed to consume it somehow. No biggie, I'll just try something that does.
And that's pretty much where I've been for the last 2 hours. I tried istream::ignore(), istream::get(), ios::rdstate to check eof, double and string instead of char in the file, just in case char is read numerically.
Nothing works and I'm desperate.
Weirdly enough, the approach above worked for a previous program where I had to read a triplet of data entries on a line of the format: string int int
The only difference is I used an ifstream object for that one, and an istream object for this one.
Bonus Question: inNum has the value of 0 when the hiccup occurs. I'm guessing it's something that istream::operator>> does?

Implementation description
try to read an int
if successful;
insert the read value to setB
next iteration
else;
clear error flags
check so that we haven't reached the end of the file
still more data? next iteration.
The above is the logic description of your function, but there's something missing...
When we try to read a value but fail, std::istream handles the case by setting the appropriate error flags, but it will not discard any data.
The problem with your implementation is that after trying to read invalid data, you just try to read the same invalid data again, over and over, ad infinitum.
Solution
After clearing the error flags you can use std::istream::ignore to discard any data from the stream.
The function's 1st argument is the maximum number of chars to ignore, and the 2nd is the "if you hit this char, don't ignore any more" delimiter.
Let's ignore the maximum amount of characters, or until we hit ' ' (space):
#include <limits> // std::numeric_limits
in.ignore (std::numeric_limits<std::streamsize>::max(), ' ');

Related

Does this STL operator>> function do something magic?

I have a weird problem when I test C++ STL features.
If I uncomment the line if(eee), my while loop never exits.
I'm using vs2015 under 64-bit Windows.
int i = 0;
istream& mystream = data.getline(mycharstr,128);
size_t mycount = data.gcount();
string str(mycharstr,mycharstr+mycount);
istringstream myinput(str);
WORD myfunclist[9] = {0};
for_each(myfunclist,myfunclist+9, [](WORD& i){ i = UINT_MAX;});
CALLEESET callee_set;
callee_set.clear();
bool failbit = myinput.fail();
bool eof = myinput.eof();
while (!failbit && !eof)
{
int eee = myinput.peek();
if (EOF == eee) break;
//if (eee) // if i uncomment this line ,the failbit and eof will always be false,so the loop will never exit.
{
myinput >> myfunclist[i++];
}
//else break;
failbit = myinput.fail();
eof = myinput.eof();
cout << myinput.rdstate() << endl;
}
I think that
int eee = myinput.peek();
at some point returns zero.
Then due to
if (eee)
you stop reading from the stream and never reach EOF.
Try to do
if (eee >= 0)
instead
As an alternative you could do:
if (eee < 0)
{
break;
}
// No need for further check of eee - just do the read
myinput >> myfunclist[i++];
The root cause of your problem is a misunderstanding about the way streams set their flags: fail() and eof() are only set once a read operation fails or tries to read past the last byte.
In other words, with C++ streams you may perfectly have read the last byte of your input and be at the end of file, yet eof() will stay false until you try to read more. You will find on StackOverflow many questions and answers about why you should not loop on eof in a C++ stream.
Consequences:
You will always enter into the loop, even if there is no character to read in myinput.
You therefore have to check for the special case of peek() returning EOF.
If you're still in the loop after the peek, then there are still characters to read. Keep in mind that peek() does not consume the characters. If you do not read them in a proper way, you stay at the same position in the stream. So if for any reason you do not reach myinput >> myfunclist[i++];, you're stuck in an endless loop, peeking the same character over and over again. This is the 0 case that is well described in 4386427's answer: the character is still there and you make no progress in the stream.
Other comments:
Since your input can be 128 bytes long and you read integers in text encoding, evil input could contain 64 different words, causing your loop to go out of bounds and cause, for example, memory corruption.
It is not clear why you peek at all.
I'd suggest to forget about the flags, use the usual stream reading idiom and simplify the code to:
...
callee_set.clear(); // until there, no change
while (i<9 && myinput >> myfunclist[i++])
{
cout << myinput.rdstate() << endl; // if you really want to know ;-)
}

Differences between eof and fail

I know there have been hundreds of questions about fail and eof, but none of them answered my question.
In this example, only eof bit will be set:
while (!Ifstream.eof()){
Ifstream >> Buffer;
cout << Buffer << endl;
}
But in this example, eof bit as well as fail bit will be set:
while (!Ifstream.fail()){
Ifstream >> Buffer;
cout << Buffer << endl;
}
What is the reason for this difference? I am considering only the situation where the stream reaches the end of the file.
The difference is very slight.
The first piece of code tries reading as long as it doesn't hit the EOF condition. Then it breaks. But if for some reason an error occurs (e.g. failure to convert data through the >> operator), this piece of code will loop indefinitely, since FAIL will be set on error, reading will stop, and EOF will never be hit.
The second piece of code works with a small trick. It reads as long as it can and stops when an error occurs. OK, logical. However, when hitting end-of-file, IIRC the EOF condition will not be set yet. So the loop will run once more, try to read at EOF state, and that will raise the FAIL flag and break the loop. But that also means that you will get one extra processing step (the cout << Buffer << endl line) with stale data after the failed read.
The right way to do it is to check immediately whether READING succeeded:
while (true){
if(!(Ifstream >> Buffer))
break;
cout << Buffer << endl;
}
only that will guarantee you that the loop will stop as soon as read fails, be it EOF or FAIL or other cause.
As MatsPetersson and πάντα ῥεῖ have suggested, the above snippet may be "squashed" to form:
while (Ifstream >> Buffer)
cout << Buffer << endl;
and that's the usual way to write it. No fail/good/eof checks needed. The value returned from operator>> is the stream itself (like in operator<<) and a stream is testable for "truthiness", so it's just as simple as that. When talking about fail/good/eof differences, I like to write the longer form to emphasize that you shouldn't use fail/good/eof in the while-condition at all, and that you should check the state of the stream only after actually trying to read from it. while(true) shows the point very clearly.
fail is different from eof in that it covers various other error conditions than "file reached its end".
For example, if Buffer is int Buffer then the second will stop on reading ABC, where the first one will loop forever (not making any progress, as ABC is not numeric input).
Of course, the CORRECT thing to do is:
while(Ifstream >> Buffer)
{
cout << Buffer << endl;
}
that will stop both on EOF and invalid input (if applicable), as well as not performing the cout << Buffer << endl; when the fail or eof condition happens.
[Note that the while(!eof()) solution is valid in, for example, Pascal, because in Pascal the input is "pre-read", so that the current read knows if "the next read will result in EOF" before you actually TRY to read it. C and C++ don't mark EOF until you actually READ past the end of the file.]
Programmatically, 'EOF on read' and 'failure of read' describe different things.
EOF indicates End Of File,
so the programmer knows when to stop reading the file.
'fail', on the other hand, indicates that an operation did not succeed:
some operation ended in a bad state, or an exception occurred while it was executing.

C++ Read in file with only numbers (doubles)

I'm trying to read in a file that should contain only numbers in it. I can successfully read in the entire file if it meets that criteria, but if it so happened to have a letter in it, I need to return false with an error statement.
The problem is I'm finding it hard for my program to error when it finds this character. It can find it no problem, but when it does, it decides to just skip over it.
My code to read in the file and attempt to read in only numbers:
bool compute::Read (ifstream& stream)
{
double value;
string line;
int lineNumber = 1;
if (stream)
{
while (getline(stream, line))
{
lineNumber++;
istringstream strStream(line);
while (strStream >> value)
{
cout << value << endl;
}
}
}
return true;
}
The input file which I use for this is
70.5 61.2 A8 10.2
2
Notice that there is a non-number character in my input file. It should fail and return false at that point.
Currently, all it does is once it hits the "A", it simply returns to the next line, continuing the getline while loop.
Any help with this would be much appreciated.
The stringstream does catch those errors, but you're doing nothing to stop the enclosing loop from continuing when an error is found. You need to tailor your main loop so that it stops when the stringstream finds an error. Keep in mind that the inner while loop always leaves the stringstream in a failed state, whether it ran out of input or hit a bad token, so the thing to check is whether it stopped before reaching the end of the line. You can construct the stringstream once, in the declaration part of a for() loop, and reset it for each line. For example:
for (std::istringstream iss; std::getline(stream, line);)
{
iss.clear(); // reset the flags left over from the previous line
iss.str(line);
while (iss >> value)
{
std::cout << value << '\n';
}
if (!iss.eof()) // stopped on bad data rather than at the end of the line
return false;
}
Furthermore, it doesn't look like you need to use std::getline() or std::istringstream if you just want to print each value. Just do:
while (stream >> value) {
std::cout << value << '\n';
}
The above will stop when it finds an invalid character for a double.
You need the code to stop streaming but return false if it hasn't yet reached the end of the "input".
One way, possibly not the most efficient but still one way, to do that is parse a word at a time.
If you read first into a std::string and if it works (so the string is not empty) create an istringstream from that string, or reuse an existing one, and try streaming that into a double value.
If that fails, you have an invalid character.
Of course you can read a line at a time from the file, then split that into words, so that you can output a meaningful error message showing what line the bad text was found.
The issue of reading straight into doubles is that the stream will fail when it reaches end of file.
However, it is possible to work around that too, because the reason for failing has an error status which you can check, i.e. you can check if eofbit is set. Although the f in eofbit stands for "file", it applies to any stream, not just files.
Although this method may sound better than reading words into a string first, I prefer that method in normal circumstances because you want to be able to report the error so you'll want to print in the error what was read.

Simple C++ not reading EOF

I'm having a hard time understanding why while (cin.get(Ch)) doesn't see the EOF. I read in a text file with 3 words, and when I debug my WordCount is at 3 (just what I hoped for). Then it goes back to the while loop and gets stuck. Ch then has no value. I thought that after the newline it would read the EOF and break out. I am not allowed to use <fstream>, I have to use redirection in DOS. Thank you so much.
#include <iostream>
using namespace std;
int main()
{
char Ch = ' ';
int WordCount = 0;
int LetterCount = 0;
cout << "(Reading file...)" << endl;
while (cin.get(Ch))
{
if ((Ch == '\n') || (Ch == ' '))
{
++WordCount;
LetterCount = 0;
}
else
++LetterCount;
}
cout << "Number of words => " << WordCount << endl;
return 0;
}
while (cin >> noskipws >> Ch) // noskipws: by default >> skips whitespace, so ' ' and '\n' would never be seen
{ // we get in here if, and only if, the >> was successful
if ((Ch == '\n') || (Ch == ' '))
{
++WordCount;
LetterCount = 0;
}
else
++LetterCount;
}
That's the safe, and common, way to rewrite your code with minimal changes.
(Your code is unusual, trying to scan all characters and count whitespace and newlines. I'll give a more general answer to a slightly different question - how to read in all the words.)
The safest way to check if a stream is finished is if(stream). Beware of if(stream.good()) - it doesn't always work as expected and will sometimes quit too early. The last >> into a char will not take us to EOF, but the last >> into an int or string will take us to EOF. This inconsistency can be confusing. Therefore, it is not correct to use good(), or any other test that tests EOF.
string word;
while(cin >> word) {
++word_count;
}
There is an important difference between if(cin) and if(cin.good()). The former is the operator bool conversion. Usually, in this context, you want to test:
"did the last extraction operation succeed or fail?"
This is not the same as:
"are we now at EOF?"
After the last word has been read by cin >> word, the stream is at EOF. But the read was still successful, and word contains the last word.
TLDR: The eof bit is not important. The fail bit is (along with the bad bit). These tell us that the last extraction was a failure.
The Counting
The program counts newline and space characters as words. In your file contents "this is fun!" I see two spaces and no newline. This is consistent with the observed output indicating two words.
Have you tried looking at your file with a hex editor or something similar to be sure of the exact contents?
You could also change your program to count one more word if the last character read in the loop was a letter. This way you don't have to have newline terminated input files.
Loop Termination
I have no explanation for your loop termination issues. The while-condition looks fine to me. istream::get(char&) returns a stream reference. In a while-condition, depending on the C++ level your compiler implements, operator bool or operator void* will be applied to the reference to indicate if further reading is possible.
Idiom
The standard idiom for reading from a stream is
char c = 0;
while( cin >> c )
process(c);
I do not deviate from it without serious reason.
Your input file is
this is fun!{EOF}
Two spaces make WordCount increase to 2,
and then EOF, exit loop! If you add a newline, your input file is
this is fun!\n{EOF}
I took your program, loaded it into Visual Studio 2013, changed cin to an fstream object that opened a file called stuff.txt which contains the exact characters "This is fun!\n\r", and the program worked. As previous answers have indicated, be careful, because if there's not a \n at the end of the text the program will miss the last word. However, I wasn't able to replicate the application hanging in an infinite loop. The code as written looks correct to me.
cin.get(char) returns a reference to an istream object, which then has its operator bool() called; that returns false when failbit or badbit is set. There are some better ways to write this code to deal with other error conditions... but this code works for me.
In your case, the correct way to bail out of the loop is:
while (cin.good()) {
char Ch = cin.get();
if (cin.good()) {
// do something with Ch
}
}
That said, there are probably better ways to do what you're trying to do.

How does ifstream's eof() work?

#include <iostream>
#include <fstream>
int main() {
std::fstream inf( "ex.txt", std::ios::in );
while( !inf.eof() ) {
std::cout << inf.get() << "\n";
}
inf.close();
inf.clear();
inf.open( "ex.txt", std::ios::in );
char c;
while( inf >> c ) {
std::cout << c << "\n";
}
return 0;
}
I'm really confused about eof() function. Suppose that my ex.txt's content was:
abc
It always reads an extra character and shows -1 when reading using eof(). But the inf >> c gave the correct output which was 'abc'? Can anyone help me explain this?
-1 is get's way of saying you've reached the end of file. Compare it using the std::char_traits<char>::eof() (or std::istream::traits_type::eof()) - avoid -1, it's a magic number. (Although the other one is a bit verbose - you can always just call istream::eof)
The EOF flag is only set once a read tries to read past the end of the file. If I have a 3 byte file, and I only read 3 bytes, EOF is false, because I've not tried to read past the end of the file yet. While this seems confusing for files, which typically know their size, EOF is not known until a read is attempted on some devices, such as pipes and network sockets.
The second example works because inf >> foo always returns inf, with the side effect of attempting to read something and store it in foo. inf, in an if or while, evaluates to true as long as no read has failed (neither failbit nor badbit is set). Thus, when a read fails, inf evaluates to false, and your loop properly aborts. However, take this common error:
while(!inf.eof()) // EOF is false here
{
inf >> x; // read fails, EOF becomes true, x is not set
// use x // we use x, despite our read failing.
}
However, this:
while(inf >> x) // Attempt read into x, return false if it fails
{
// will only be entered if read succeeded.
}
Which is what we want.
The EOF flag is only set after a read operation attempts to read past the end of the file. get() is returning the symbolic constant traits::eof() (which just happens to equal -1) because it reached the end of the file and could not read any more data, and only at that point will eof() be true. If you want to check for this condition, you can do something like the following:
int ch;
while ((ch = inf.get()) != EOF) {
std::cout << static_cast<char>(ch) << "\n";
}
iostream doesn't know it's at the end of the file until it tries to read that first character past the end of the file.
The sample code at cplusplus.com says to do it like this: (But you shouldn't actually do it this way)
while (is.good()) // loop while extraction from file is possible
{
c = is.get(); // get character from file
if (is.good())
cout << c;
}
A better idiom is to move the read into the loop condition, like so:
(You can do this with all istream read operations that return *this, including the >> operator)
char c;
while(is.get(c))
cout << c;
eof() checks the eofbit in the stream state.
On each read operation, if the position is at the end of stream and more data has to be read, eofbit is set to true. Therefore you're going to get an extra character before you get eofbit=1.
The correct way is to check whether the eof was reached (or, whether the read operation succeeded) after the reading operation. This is what your second version does - you do a read operation, and then use the resulting stream object reference (which >> returns) as a boolean value, which results in check for fail().