How Can I Detect That a Binary File Has Been Completely Consumed? - c++

If I do this:
ofstream output("foo.txt");
output << 13;
output.close();
ifstream input("foo.txt");
int dummy;
input >> dummy;
cout << input.good() << endl;
I'll get the result: "0"
However if I do this:
ofstream output("foo.txt", ios_base::binary);
auto dummy = 13;
output.write(reinterpret_cast<const char*>(&dummy), sizeof(dummy));
output.close();
ifstream input("foo.txt", ios_base::binary);
input.read(reinterpret_cast<char*>(&dummy), sizeof(dummy));
cout << input.good() << endl;
I'll get the result: "1"
This is frustrating to me. Do I have to resort to inspecting the ifstream's buffer to determine whether it has been entirely consumed?

Regarding
How Can I Detect That a Binary File Has Been Completely Consumed?
A slightly inefficient but easy to understand way is to measure the size of the file:
ifstream input("foo.txt", ios_base::binary);
input.seekg(0, ios_base::end); // go to end of the file
auto filesize = input.tellg(); // current position is the size of the file
input.seekg(0, ios_base::beg); // go back to the beginning of the file
Then check current position whenever you want:
if (input.tellg() == filesize)
    cout << "The file was consumed";
else
    cout << "Some stuff left in the file";
This way has some disadvantages:
Not efficient - goes back and forth in the file
Doesn't work with special files (e.g. pipes)
Doesn't work if the file is changed (e.g. you open your file in read-write mode)
Only works for binary files (seems your case, so OK), not text files
So better just use the regular way people do it, that is, try to read and bail if it fails:
if (input.read(reinterpret_cast<char*>(&dummy), sizeof(dummy)))
    cout << "I have read the stuff, will work on it now";
else
    cout << "No stuff in file";
Or (in a loop)
while (input.read(reinterpret_cast<char*>(&dummy), sizeof(dummy)))
{
    cout << "Working on your stuff now...";
}

You are doing totally different things.
The operator>> is greedy and will read as much as possible into dummy. It so happens that while doing so, it runs into the end of file. That sets the input.eof(), and the stream is no longer good(). As it did find some digits before the end, the operation is still successful.
In the second read, you ask for a specific number of bytes (4 most likely) and the read is successful. So the stream is still good().
The stream interface doesn't predict the outcome of any future I/O, because in the general case it cannot know. If you use cin instead of input there might now be more to read, if the user continued typing.
Specifically, the eof() state doesn't appear until someone tries to read past end-of-file.

For text streams, as you have written only the integer value and not even a space or an end of line, at read time the library must try to read one character past the 1 and the 3, and it hits the end of file. So the good bit is false and eof is true.
For binary streams, you have written 4 bytes (sizeof(int)), assuming ints are 32 bits wide, and you read 4 bytes. Fine. No problem has occurred yet, so the good bit is true and eof is false. Only the next read will hit the end of file.
But beware. In the text example, if you open the text file in an editor and simply save it without changing anything, chances are that the editor automatically adds an end of line. In that case, the read will stop at the end of line and, as in the binary case, the good bit will be true and eof false. The same happens if you write with output << 13 << std::endl;
All this means that you must never assume that the element you just read is not the last one in the file just because good is true and eof is false: the end of file may only be hit on the next read, even if nothing is returned then.
TL;DR: the only foolproof way to know that there is nothing left in a file is when you are no longer able to read something from it.

You do not need to resort to inspecting the buffer. You can determine whether the whole file has been consumed with:
cout << (input.peek() != char_traits<char>::eof()) << endl;
This prints 0 once the file has been consumed. It uses peek, which:
Reads the next character from the input stream without extracting it
good in the case of the example is:
Returning false after the last extraction operation, which occurs because the int extraction operator has to read until it finds a character that is not a digit. In this case it runs into the end of the file instead, and hitting the end while looking for a delimiter still sets the stream's eofbit, causing good to fail
Returning true after calling read, because read extracts exactly sizeof(int) bytes, so even if the end of the file comes immediately afterwards no attempt is made to read past it, leaving the stream's eofbit unset and good passing
peek can be used after either of these and will correctly return char_traits<char>::eof() in both cases. Effectively this is inspecting the buffer for you, but with one vital distinction for binary files: if you were to inspect a binary file yourself you'd find that it may contain bytes that look like EOF once stored in a char. (EOF is typically -1, which truncates to the byte 0xFF; the binary representation of a 32-bit int holding -1 contains 4 of them.) If you are inspecting the buffer's next char you won't know whether that's actually the end of the file or not.
peek doesn't just return a char though, it returns an int_type. If peek returns 0x000000FF then you're looking at a 0xFF byte, but not the end of the file. If peek returns char_traits<char>::eof() (typically 0xFFFFFFFF) then you're looking at the end of the file.

Related

C++ open() not working for any apparent reason

ifstream infile;
infile.open("BONUS.txt");
string info;
if (!infile)
cout << "File Open Failure" << endl;
else
{
while (infile >> info)
cout << info << endl;
infile.close();
}
This is my code. And no matter what I do, my file always fails to open. It enters the if and exits. What could possibly be the problem? My text file is saved in the correct directory and nothing seems to be wrong with it.
There are two parameters in open(), file to be opened and mode. The mode refers to what you can do with that file, i.e. write to, read from, etc.
There are six possible modes when using open():
Parameter in stands for input. The internal stream buffer enables input. (Use for reading the file.)
Parameter out stands for output. The same internal buffer enables output. (Use for writing to the file.)
Parameter binary opens the file in binary mode, which disables text-mode translations such as newline conversion.
Parameter ate stands for at end: the stream seeks to the end of the file immediately after it is opened.
Parameter app stands for append: every write happens at the end of the file.
Parameter trunc stands for truncate. All contents in existence before it is opened are deleted.
Your code reads from the file, so the mode you need is in, which is already the default for an ifstream:
ifstream infile;
infile.open("BONUS.txt", ios::in);
An invalid combination of modes makes open() fail. If you have any more questions, look up fstream::open().

Calling unget when EOF is triggered

I am reading characters from an ifstream. If those characters don't match certain criteria, I call unget() a number of times equal to the number of characters read. This all works fine up until I get to the end of the file. Then if I try to unget(), the good bit is set to 0.
This means no characters are put back, and when I read again I get blank characters. How can I unget properly if I reach EOF?
Thanks :)
EDIT:
I have attempted to replace my unget() with putback(), however now I am encountering the problem that, rather than putting back characters, it seems to be overwriting characters in the ifstream. This is the code I am using:
if (substr.length() > 0)
{
for (int i = substr.length()-1; i >= 0; --i)
{
std::cout << "at " << i << ": " << substr.at(i) << " ";
infile.putback(substr.at(i));
}
}
Where substr is the string to putback to the stream. I put it back in reverse order, so it is read in the correct order next time.
If the stream has the following string up next ", r1", and I am putting " #1" back to the stream. After the putback, I expect it to read the next 5 characters as " #1, ", however it is reading it as "#r1"... Somewhere the "1, " is being overwritten.
EDIT/SOLVED:
Rather than using unget() or putback(), I instead used seekg() and tellg(). I store the starting position with tellg(), then read however many characters, find the matching string and seek back to the start position + the length of the matching string.
First of all, you need to be sure that no iostate bits are set (fail, bad or eof; since C++11 the eof bit is cleared automatically) before calling unget.
Then check the stream state after the operation. If failbit is set, then you forgot to clear the stream state before calling unget. If badbit is set, then there is an internal error; possibly the stream cannot back up any further.
You are not guaranteed that you can unget even one character, unlike putback, where putting back at least one character should be supported by the library.

Difference between while(!file.eof()) and while(file >> variable)

First things first - I've got a text file in which there are binary numbers, one number for each row. I'm trying to read them and sum them up in a C++ program. I've written a function which transforms them to decimal and adds them after that and I know for sure that function's ok. And here's my problem - for these two different ways of reading a text file, I get different results (and only one of these results is right) [my function is decimal()]:
ifstream file;
file.open("sample.txt");
int sum = 0;
string BinaryNumber;
while (!file.eof()){
file >> BinaryNumber;
sum+=decimal(BinaryNumber);
}
and that way my sum is too large, but by a small quantity.
ifstream file;
file.open("sample.txt");
int sum = 0;
string BinaryNumber;
while (file >> BinaryNumber){
sum+=decimal(BinaryNumber);
}
and this way gives me the right sum. After some testing I came to the conclusion that the while loop with eof() makes one more iteration than the other while loop. So my question is: what is the difference between those two ways of reading from a text file? Why does the first while loop give me the wrong result, and what is this extra iteration that it's doing?
The difference is that >> reads the data first, and then tells you whether it has been a success or not, while file.eof() does the check prior to the reading. That is why you get an extra read with the file.eof() approach, and that read is invalid.
You can modify the file.eof() code to make it work by moving the check to a place after the read, like this:
// This code has a problem, too!
while (true) {                 // We do not know if it's EOF until we try to read
    file >> BinaryNumber;      // Try reading first
    if (file.eof()) {          // Now it's OK to check for EOF
        break;                 // We're at the end of file - exit the loop
    }
    sum += decimal(BinaryNumber);
}
However, this code would break if there is no delimiter following the last data entry. So your second approach (i.e. checking the result of >>) is the correct one.
When using file.eof() to test the input, the last input probably fails and the value stays unchanged and is, thus, processed twice: when reading a string, the stream first skips leading whitespace and then reads characters until it finds a space. Assuming the last value is followed by a newline, the stream hasn't touched EOF, yet, i.e., file.eof() isn't true but reading a string fails because there are no non-whitespace characters.
When using file >> value the operation is executed and checked for success: always use this approach! The use of eof() is only to determine whether the failure to read was due to EOF being hit or something else.

Having problems with 0x0A character in C++ even in binary mode. (interprets it as new file)

Hi, this might seem a bit noobie, but here we go. I'm developing a program that downloads leaderboards of a certain game from the internet and transforms them into a proper format to work with (elaborate rankings, etc).
The files contain the names, ordered by rank, but between each name there are 7 random control codes (obviously unprintable). The txt file looks like this:
..C...hName1..)...&Name2......)Name3..é...þName4..Ü...†Name5..‘...QName6..~...bName7..H...NName8..|....Name9..v...HName10.
I checked with a hex editor and saw that the first control code after each name is always a null character (0x00). So, what I do is read everything, and then cout every character. When a 0x00 character is found, skip 7 characters and keep couting. Therefore you end up with the list, right?
At first I had the problem that among those random control codes you would sometimes find a "soft EOF" (0x1A), and the program would stop reading there. So I finally figured out to open the file in binary mode. It worked, and then everything would be couted... or that's what I thought.
But I came across another file which still didn't work, and finally found out that there was an EOF character! (0x0A) Which doesn't make sense since I'm opening it in binary mode. But still, after reading that character, C++ interprets that as a new file, and hence skips 7 characters, so the name after that character will always appear cut.
Here's my current code:
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main () {
string scores;
system("wget http://certainwebsite/001.txt"); //download file
ifstream highin ("001.txt", ios::binary);
ofstream highout ("board.txt", ios::binary);
if (highin.is_open())
{
while ( highin.good() )
{
getline (highin, scores);
for (int i=0;i<scores.length(); i++)
{
if (scores[i]==0x00){
i=i+7; //skip 7 characters if 'null' is found
cout << endl;
highout << endl;
}
cout << scores[i];
highout << scores[i]; //cout names and save them in output file
}
}
highin.close();
}
else cout << "Unable to open file";
system("pause>nul");
}
Not sure how to ignore that character if being already in binary mode doesn't work. Sorry for the long question but I wanted to be detailed and specific. In this case, the EOF character is located before the Name3, and hence this is how the output looks like:
http://i.imgur.com/yu1NjoZ.png
By default getline() reads until the end of line and discards the newline character. However, the delimiter character could be customized (by supplying the third parameter). If you wish to read until the null character (not until the end of line), you could try using getline (highin, scores, '\0'); (and adjusting the logic of skipping the characters).
I'm glad you figured it out and it doesn't surprise me that getline() was the culprit. I had a similar issue dealing with the newline character when I was trying to read in a CSV file. There are several different getline() functions in C++ depending on how you call the function and each seems to handle the newline character differently.
As a side note, in your for loop, I'd recommend against performing a method call in your test. That adds unnecessary overhead to the loop. It'd be better to call the method once and put that value into a variable, then enter the loop and test i against the length variable. Unless you expect the length to change, calling the length() method each iteration is a waste of system resources.
Thank you all guys, it worked, it was the getline() which was giving me problems indeed. Due to the 'while' loop, each time it found a new line character, it restarted the process, hence skipping those 7 characters.

strange eof flags in stream

I have encountered a strange problem when parsing text file using c++ file stream. Here is the code:
while (true)
{
std::getline(inFile, line);
if (!inFile.good())
{
std::cout << "Fail, bad and eof flags:" << inFile.fail() << inFile.bad() << inFile.eof() << std::endl;
break;
}
parseLine(line);
}
When the read terminates, the output is:
Fail, bad and eof flags:001
But the reader has not actually reached the end of the file. I opened the file and found that the next character is 26 (ASCII code). So the questions are: 1) why is the eof flag set when reading this character, and how can I avoid this kind of false termination? and 2) how can I recover from this state? Thanks!
PS: thanks for the replies. What if I read the file in binary mode? Is there any better solution? I use the Windows platform but the file seems to be a Unix file.
why the eof flag is set when reading this character
Because it's the EOF marker character.
From Wikipedia:
In Microsoft's DOS and Windows (and in CP/M and many DEC operating
systems), reading from the terminal will never produce an EOF.
Instead, programs recognize that the source is a terminal (or other
"character device") and interpret a given reserved character or
sequence as an end-of-file indicator; most commonly this is an ASCII
Control-Z, code 26.
how to avoid this kind of false termination
It's not a "false" termination.
how to recover from this state?
You don't need to.
If you were trying to read a "binary file" where arbitrary characters would be expected, you would open your file stream in binary mode.
The ASCII character 26 is the SUB control character, which in caret notation is ^Z. This might be recognizable to you as the Windows end of file character. So assuming ASCII and Windows, there you go.
Here you go:
Getline and 16h (26d) character
Looks like you have to write your own getline function. Seems there is no way around it :p That I know of, and it seems no one else knows. If anyone knows a better way, chime in.